Package parsimony :: Module estimators :: Class Clustering

Class Clustering


   object --+    
            |    
BaseEstimator --+
                |
               Clustering

Estimator for the clustering problem, i.e. for minimising

    f(C, mu) = sum_{i=1}^K sum_{x in C_i} |x - mu_i|²,

where C = {C_1, ..., C_K} is a partition of the data points into K
clusters, mu_i is the mean of cluster C_i, and |.|² is the squared
Euclidean norm.

This loss function is known as the within-cluster sum of squares.
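
For concreteness, here is a minimal NumPy sketch of this loss (not the
library's implementation; the cluster assignments labels and the means mus
are assumed to be given):

>>> import numpy as np
>>> def within_cluster_ss(X, mus, labels):
...     """Sum of squared distances from each point to its cluster's mean."""
...     total = 0.0
...     for i in range(mus.shape[0]):
...         diff = X[labels == i] - mus[i]  # points in cluster i, centred on mu_i
...         total += (diff ** 2).sum()      # add their squared Euclidean norms
...     return total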

Parameters
----------
K : Positive integer. The number of clusters to find.

algorithm : The clustering algorithm to use. Currently, only the K-means
        algorithm (Lloyd's algorithm) is available. Should be one of:
            1. KMeans(...)

        Default is KMeans(...).

algorithm_params : A dictionary. The dictionary algorithm_params contains
        parameters that should be set in the algorithm. Passing
        algorithm=MyAlgorithm(**params) is equivalent to passing
        algorithm=MyAlgorithm() and algorithm_params=params. Default
        is an empty dictionary.
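
As an illustration of this equivalence (a sketch; max_iter is just one of
KMeans' keyword arguments, taken from the example below, and KMeans' other
arguments are assumed to have defaults):

>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> # The following two estimators are configured identically:
>>> est1 = estimators.Clustering(3, algorithm=cluster.KMeans(3, max_iter=100))
>>> est2 = estimators.Clustering(3, algorithm=cluster.KMeans(3),
...                              algorithm_params=dict(max_iter=100))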

Examples
--------
>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> import numpy as np
>>> np.random.seed(1337)
>>>
>>> K = 3
>>> n, p = 150, 2
>>> X = np.vstack((2 * np.random.rand(n // 3, 2) - 2,
...                0.5 * np.random.rand(n // 3, 2),
...                np.hstack([0.5 * np.random.rand(n // 3, 1) - 1,
...                           0.5 * np.random.rand(n // 3, 1)])))
>>> lloyds = cluster.KMeans(K, max_iter=100, repeat=10)
>>> KMeans = estimators.Clustering(K, algorithm=lloyds)
>>> error = KMeans.fit(X).score(X)
>>> print(error)
27.6675491884
>>>
>>> #import matplotlib.pyplot as plot
>>> #mus = KMeans._means
>>> #plot.plot(X[:, 0], X[:, 1], '*')
>>> #plot.plot(mus[:, 0], mus[:, 1], 'rs')
>>> #plot.show()

Nested Classes

Inherited from BaseEstimator: __metaclass__

Instance Methods

__init__(self, K, algorithm=None, algorithm_params={})
    x.__init__(...) initializes x; see help(type(x)) for signature

get_params(self)
    Return a dictionary containing the estimator's own input parameters.

fit(self, X, means=None)
    Fit the estimator to the data.

predict(self, X)
    Perform prediction using the fitted parameters.

parameters(self)
    Returns the estimator's fitted means.

score(self, X)
    Computes the within-cluster sum of squares.

Inherited from BaseEstimator: get_info, set_params

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables

__abstractmethods__ = frozenset([])

Properties

Inherited from object: __class__

Method Details

__init__(self, K, algorithm=None, algorithm_params={})
(Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)

get_params(self)

Return a dictionary containing the estimator's own input parameters.

Overrides: BaseEstimator.get_params

fit(self, X, means=None)

Fit the estimator to the data.

Overrides: BaseEstimator.fit
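
A short usage sketch (the means argument is assumed here to take an array of
K initial cluster centres; that is an assumption based on the signature, not
documented behaviour):

>>> import numpy as np
>>> import parsimony.estimators as estimators
>>> import parsimony.algorithms.cluster as cluster
>>> np.random.seed(42)
>>> X = np.random.rand(30, 2)
>>> init = X[np.random.choice(30, 3, replace=False), :]  # hypothetical initial centres
>>> km = estimators.Clustering(3, algorithm=cluster.KMeans(3))
>>> km = km.fit(X, means=init)  # fit returns the estimator itself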

predict(self, X)

Perform prediction using the fitted parameters.

Finds the closest cluster centre to each point, i.e. assigns a cluster
index to each point.

Returns
-------
closest : A list. A list with one element per point (row of X): the index
        of the closest cluster centre.

Overrides: BaseEstimator.predict
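
Conceptually, the assignment step corresponds to the following sketch (not
the library's implementation):

>>> import numpy as np
>>> def assign_to_closest(X, mus):
...     """Index of the nearest mean, in squared Euclidean norm, for each row of X."""
...     sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (n, K)
...     return sq_dists.argmin(axis=1).tolist()  # one cluster index per point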

parameters(self)

Returns the estimator's fitted means.

Overrides: BaseEstimator.parameters

score(self, X)

Computes the within-cluster sum of squares.

Overrides: BaseEstimator.score