Package parsimony :: Module estimators :: Class Clustering

Class Clustering


   object --+    
            |    
BaseEstimator --+
                |
               Clustering

Estimator for the clustering problem, i.e. for minimising

    f(C, mu) = sum_{i=1}^K sum_{x in C_i} |x - mu_i|²,

where C = {C_1, ..., C_K} is a partition of the data points into K
clusters, mu_i is the mean of cluster C_i, and |.|² is the squared
Euclidean norm.

This loss function is known as the within-cluster sum of squares.
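
For concreteness, here is a minimal NumPy sketch of this loss (not the
library's implementation; the cluster assignments labels and the means mus
are assumed to be given):

>>> import numpy as np
>>> def within_cluster_ss(X, mus, labels):
...     """Sum of squared distances from each point to its cluster's mean."""
...     total = 0.0
...     for i in range(mus.shape[0]):
...         diff = X[labels == i] - mus[i]  # points in cluster i, centred on mu_i
...         total += (diff ** 2).sum()      # add their squared Euclidean norms
...     return total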

Parameters
----------
K : Positive integer. The number of clusters to find.

algorithm : The clustering algorithm to use. Currently, only the K-means
        algorithm (Lloyd's algorithm) is available. Should be one of:
            1. KMeans(...)

        Default is KMeans(...).

algorithm_params : A dictionary. The dictionary algorithm_params contains
        parameters that should be set in the algorithm. Passing
        algorithm=MyAlgorithm(**params) is equivalent to passing
        algorithm=MyAlgorithm() and algorithm_params=params. Default
        is an empty dictionary.
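
As an illustration of this equivalence (a sketch; max_iter is just one of
KMeans' keyword arguments, taken from the example below, and KMeans' other
arguments are assumed to have defaults):

>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> # The following two estimators are configured identically:
>>> est1 = estimators.Clustering(3, algorithm=cluster.KMeans(3, max_iter=100))
>>> est2 = estimators.Clustering(3, algorithm=cluster.KMeans(3),
...                              algorithm_params=dict(max_iter=100))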

Examples
--------
>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> import numpy as np
>>> np.random.seed(1337)
>>>
>>> K = 3
>>> n, p = 150, 2
>>> X = np.vstack((2 * np.random.rand(n // 3, 2) - 2,
...                0.5 * np.random.rand(n // 3, 2),
...                np.hstack([0.5 * np.random.rand(n // 3, 1) - 1,
...                           0.5 * np.random.rand(n // 3, 1)])))
>>> lloyds = cluster.KMeans(K, max_iter=100, repeat=10)
>>> KMeans = estimators.Clustering(K, algorithm=lloyds)
>>> error = KMeans.fit(X).score(X)
>>> print(error)
27.6675491884
>>>
>>> #import matplotlib.pyplot as plot
>>> #mus = KMeans._means
>>> #plot.plot(X[:, 0], X[:, 1], '*')
>>> #plot.plot(mus[:, 0], mus[:, 1], 'rs')
>>> #plot.show()

Nested Classes

Inherited from BaseEstimator: __metaclass__

Instance Methods

__init__(self, K, algorithm=None, algorithm_params={})
    x.__init__(...) initializes x; see help(type(x)) for signature

get_params(self)
    Return a dictionary containing the estimator's own input parameters.

fit(self, X, means=None)
    Fit the estimator to the data.

predict(self, X)
    Perform prediction using the fitted parameters.

parameters(self)
    Returns the estimator's fitted means.

score(self, X)
    Computes the within-cluster sum of squares.

Inherited from BaseEstimator: get_info, set_params

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables

__abstractmethods__ = frozenset([])

Properties

Inherited from object: __class__

Method Details

__init__(self, K, algorithm=None, algorithm_params={})
(Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)

get_params(self)

Return a dictionary containing the estimator's own input parameters.

Overrides: BaseEstimator.get_params

fit(self, X, means=None)

Fit the estimator to the data.

Overrides: BaseEstimator.fit
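
A short usage sketch (the means argument is assumed here to take an array of
K initial cluster centres; that is an assumption based on the signature, not
documented behaviour):

>>> import numpy as np
>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> np.random.seed(42)
>>> X = np.random.rand(30, 2)
>>> init = X[np.random.choice(30, 3, replace=False), :]  # hypothetical initial centres
>>> km = estimators.Clustering(3, algorithm=cluster.KMeans(3))
>>> km = km.fit(X, means=init)  # fit returns the estimator itself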

predict(self, X)

Perform prediction using the fitted parameters.

Finds the closest cluster centre to each point, i.e. assigns a cluster
index to each point.

Returns
-------
closest : A list. A list with one element per point (row of X): the index
        of the closest cluster centre.

Overrides: BaseEstimator.predict
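
Conceptually, the assignment step corresponds to the following sketch (not
the library's implementation):

>>> import numpy as np
>>> def assign_to_closest(X, mus):
...     """Index of the nearest mean, in squared Euclidean norm, for each row of X."""
...     sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (n, K)
...     return sq_dists.argmin(axis=1).tolist()  # one cluster index per point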

parameters(self)

Returns the estimator's fitted means.

Overrides: BaseEstimator.parameters

score(self, X)

Computes the within-cluster sum of squares.

Overrides: BaseEstimator.score