Spectral Clustering

sklearn.cluster.SpectralClustering

Spectral Clustering does a low-dimension embedding of the affinity matrix between samples, followed by a KMeans in the low dimensional space. It is especially efficient if the affinity matrix is sparse and the pyamg module is installed. SpectralClustering requires the number of clusters to be specified. It works well for a small number of clusters but is not advised when using many clusters.

For two clusters, it solves a convex relaxation of the normalised cuts problem on the similarity graph: cutting the graph in two so that the weight of the edges cut is small compared to the weights of the edges inside each cluster. This criteria is especially interesting when working on images: graph vertices are pixels, and edges of the similarity graph are a function of the gradient of the image.

Usable examples: Segmenting the picture of a raccoon face in regions

[2]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import os, sys
import numpy as np
from skimage import io
import matplotlib.pylab as plt
sys.path += [os.path.abspath('.'), os.path.abspath('..')]  # Add path to root
import notebooks.notebook_utils as uts
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/mnt/datagrid/personal/borovec/Applications/vEnv2/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)

load datset

[7]:
p_dataset = os.path.join(uts.DEFAULT_PATH, uts.SYNTH_DATASETS_FUZZY[0])
print ('loading dataset: ({}) exists -> {}'.format(os.path.exists(p_dataset), p_dataset))

p_atlas = os.path.join(uts.DEFAULT_PATH, 'dictionary/atlas.png')
atlas_gt = io.imread(p_atlas)
nb_patterns = len(np.unique(atlas_gt))
print ('loading ({}) <- {}'.format(os.path.exists(p_atlas), p_atlas))
plt.imshow(atlas_gt, interpolation='nearest')
_ = plt.title('Atlas; unique %i' % nb_patterns)
loading dataset: (True) exists -> /datagrid/Medical/microscopy/drosophila/synthetic_data/atomicPatternDictionary_v0/datasetFuzzy_raw
loading (True) <- /datagrid/Medical/microscopy/drosophila/synthetic_data/atomicPatternDictionary_v0/dictionary/atlas.png
../_images/notebooks_method_SpectralClust_4_1.png
[8]:
list_imgs = uts.load_dataset(p_dataset)
print ('loaded # images: ', len(list_imgs))
img_shape = list_imgs[0].shape
print ('image shape:', img_shape)
('loaded # images: ', 800)
('image shape:', (64, 64))

Pre-Processing

[9]:
X = np.array([im.ravel() for im in list_imgs])  # - 0.5
print ('input data shape:', X.shape)

plt.figure(figsize=(7, 3))
_= plt.imshow(X, aspect='auto'), plt.xlabel('features'), plt.ylabel('samples'), plt.colorbar()
('input data shape:', (800, 4096))
../_images/notebooks_method_SpectralClust_7_1.png
[10]:
uts.show_sample_data_as_imgs(X, img_shape, nb_rows=1, nb_cols=5)
<matplotlib.figure.Figure at 0x7fb2827ee7d0>
../_images/notebooks_method_SpectralClust_8_1.png

SparsePCA

[18]:
from sklearn.cluster import SpectralClustering
sc = SpectralClustering(n_clusters=nb_patterns, n_jobs=-1) # , assign_labels='discretize'

sc.fit(X[:1200, :].T)
atlas = sc.labels_.reshape(atlas_gt.shape)
[18]:
SpectralClustering(affinity='rbf', assign_labels='kmeans', coef0=1, degree=3,
          eigen_solver=None, eigen_tol=0.0, gamma=1.0, kernel_params=None,
          n_clusters=7, n_init=10, n_jobs=-1, n_neighbors=10,
          random_state=None)
[19]:
plt.imshow(atlas)
('labels:', (4096,))
[19]:
<matplotlib.image.AxesImage at 0x7fb275e30a90>
../_images/notebooks_method_SpectralClust_11_2.png
[ ]: