
IML: Unsupervised clustering

Link to the HackMD note

Why do we care

  • Find patterns in the data (data mining problem)
  • Visualize the data in a simpler way
  • Infer some properties of a given data point based on how it relates to other data points (statistical learning)

Why is it tricky

Belongs to unsupervised learning

  • No ground truth available to learn from or to evaluate the quality of the algorithm

How to assess how closely data points are related to each other?

  • Which criteria (features) are the most relevant
  • Which metrics make the most sense

How to assess the soundness of the resulting clusters? Are they relevant?

Families of clustering approaches

  • Distance-based clustering
    • centroid-based approaches ($k$-means)
    • connectivity-based approaches (based on distance)
  • Density-based clustering
    • set of dense points
  • Distribution-based clustering
    • likelihood of points belonging to the same distribution
  • Fuzzy clustering
    • Relaxed clustering paradigm where a data point can be assigned to multiple clusters with a quantified degree of belongingness metric (fuzzy $c$-means clustering,…).

$k$-means clustering

Partition $n$ observations $x_1,…,x_n$ into $k$ clusters $C=\{C_1,…,C_k\}$ where each observation $x_i$ belongs to the cluster $C_{j^*}$ whose mean $\mu_{j^*}$ is the closest: $x_i\in C_{j^*}$ with $j^{*}=\operatorname{argmin}_j\Vert x_i-\mu_j \Vert_2$
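A minimal sketch of the corresponding Lloyd's algorithm in Python/NumPy (the function name, iteration cap and empty-cluster handling are my own illustrative choices, not part of the original notes):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # k random points as initial means
    for _ in range(n_iter):
        # Assignment step: each x_i goes to the cluster whose mean is closest (L2 norm)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each cluster mean (keep the old one if a cluster is empty)
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return labels, means
```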

The cross represents the cluster center; a point is added to the cluster whose center is at the smallest distance:

  • Minimize within-cluster sum of squares (variance)
  • Overall optimization problem:
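In the notation above, this objective is usually written as

$$\underset{C}{\operatorname{argmin}}\;\sum_{j=1}^{k}\sum_{x_i\in C_j}\Vert x_i-\mu_j\Vert_2^2$$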

  • NP-hard problem, no guarantee of finding the optimal solution
  • Stochastic and very sensitive to initial conditions
  • Sensitive to outliers (thank you $L_2$ norm)
  • Probably the most used clustering algorithm

$k$-means and Voronoi tessellation

  • $k$-means provides a way to obtain a Voronoi tessellation of the input space, where the seeds are the final cluster means (see the sketch below)
  • Alternatively, one can use some pre-computed Voronoi tessellation seeds as initial clusters for $k$-means
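As an illustration (the toy data and parameters are arbitrary), one might build the tessellation induced by the fitted means with SciPy:

```python
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(500, 2))          # toy 2D data
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# The final cluster means act as seeds of a Voronoi tessellation of the input space
vor = Voronoi(km.cluster_centers_)
voronoi_plot_2d(vor)                                         # needs matplotlib installed
```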

Determining the optimal number of clusters

At a glance, how many clusters for this image?

2, 3, 4, 14….

Compute the explained variance for an increasing number of clusters $k$
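This is the classic elbow method; a small sketch with scikit-learn (placeholder data, arbitrary range of $k$):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))           # placeholder data

total_ss = ((X - X.mean(axis=0)) ** 2).sum()                 # total sum of squares
explained = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    explained.append(1 - km.inertia_ / total_ss)             # explained variance ratio
# Plot `explained` against k and look for the "elbow" where the gain flattens out.
```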

Sometimes, $k$-means works…

But most of the time not as expected.

  • Sensitive to the curse of dimensionality
  • Forms “normalized Gaussian” clusters
  • Does not adapt to manifold geometry (see the sketch below)
  • Sensitive to class imbalance
  • Sensitive to outliers
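For instance, on the classic two-moons toy dataset (a hypothetical illustration of the manifold issue), $k$-means draws straight Voronoi boundaries that cut both moons in half:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two interleaving half-circles: the "true" clusters live on curved manifolds
X, y = make_moons(n_samples=500, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# `labels` splits the plane with a linear boundary instead of separating the two moons.
```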

Simple Linear Iterative Clustering

A kick-ass image segmentation algorithm using $k$-means
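A rough usage sketch with scikit-image (the image and parameter values are arbitrary); SLIC essentially runs a localized $k$-means in combined color and spatial coordinates:

```python
from skimage import data
from skimage.segmentation import slic, mark_boundaries

image = data.astronaut()                                 # any RGB image
segments = slic(image, n_segments=250, compactness=10)   # superpixel label per pixel
overlay = mark_boundaries(image, segments)               # visualize superpixel borders
```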

$k$-medoids clustering

Possible extension to $k$-means

$\Rightarrow$ Replace centroids by medoids (the point with the smallest distance to all other points in the cluster)

$\Rightarrow$ $k$-medoids algorithm

Overall objective: find $k$ medoids $m_1,\dots,m_k$ that minimize the partitioning cost
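With $d$ an arbitrary dissimilarity measure and each point assigned to its closest medoid, this cost is usually written as

$$\sum_{j=1}^{k}\sum_{x_i\in C_j} d(x_i, m_j)$$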

Fuzzy $c$-means clustering

Gaussian mixture models

$k$-means on steroids

$k$-means works for spherical clusters but fails in any other case $\Rightarrow$ try harder: model the probability density function $f$ of the data as a mixture of multivariate Gaussians
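In the usual notation, with mixture weights $\phi_j$ summing to $1$, this reads

$$f(x)=\sum_{j=1}^{k}\phi_j\,\mathcal N(x\mid\mu_j,\Sigma_j)$$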

This curve is a superposition of several Gaussians:

The EM algorithm

Initialization

  • Select $k$ random points as initial means $\hat\mu_1,…,\hat\mu_k$
  • Init all covariance matrices $\hat\Sigma_1,…,\hat\Sigma_k$ as the whole data sample covariance matrix $\hat\Sigma$
  • Set uniform mixture weights $\hat\phi_1,…,\hat\phi_k=\frac{1}{k}$

Alternate until convergence

Expectation step: compute the membership weight $\hat\gamma_{ij}$ of $x_i$ with respect to the $j^{th}$ component $\mathcal N(x\vert\mu_j,\Sigma_j)$
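The standard expression for this weight (Bayes' rule with the current parameter estimates) is

$$\hat\gamma_{ij}=\frac{\hat\phi_j\,\mathcal N(x_i\mid\hat\mu_j,\hat\Sigma_j)}{\sum_{l=1}^{k}\hat\phi_l\,\mathcal N(x_i\mid\hat\mu_l,\hat\Sigma_l)}$$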

Maximization step: update the mixture weights, means and covariance matrices (in that order)
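The standard GMM updates, in that order, are

$$\hat\phi_j=\frac{1}{n}\sum_{i=1}^{n}\hat\gamma_{ij},\qquad \hat\mu_j=\frac{\sum_{i=1}^{n}\hat\gamma_{ij}\,x_i}{\sum_{i=1}^{n}\hat\gamma_{ij}},\qquad \hat\Sigma_j=\frac{\sum_{i=1}^{n}\hat\gamma_{ij}\,(x_i-\hat\mu_j)(x_i-\hat\mu_j)^{T}}{\sum_{i=1}^{n}\hat\gamma_{ij}}$$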

$k$-means vs GMM

Let the fight begin!

Kernel Density Estimation

Nonparametric estimation

The kernel density estimator with bandwidth $h$ at a given point $x$ is given by
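In one dimension, with kernel $K$ (e.g. a Gaussian), the usual expression is

$$\hat f_h(x)=\frac{1}{nh}\sum_{i=1}^{n}K\!\left(\frac{x-x_i}{h}\right)$$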

Examples

Mean shift clustering

Examples

We can do the same thing on color images:
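A possible sketch with scikit-learn (the subsampling and bandwidth choices are mine, purely for illustration), clustering pixels by color:

```python
from skimage import data
from sklearn.cluster import MeanShift, estimate_bandwidth

image = data.astronaut()                                   # RGB image, shape (H, W, 3)
pixels = image.reshape(-1, 3).astype(float)                # one sample per pixel (color only)

sample = pixels[::50]                                      # subsample: mean shift is slow
bandwidth = estimate_bandwidth(sample, quantile=0.1)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(sample)
segmented = ms.predict(pixels).reshape(image.shape[:2])    # per-pixel cluster map
```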

DBSCAN

Density-based spatial clustering of applications with noise

  • Divide points into 3 categories (core, boundary, outliers) depending on whether or not there are at least $minPts$ points in their $\epsilon$-neighborhood
  • Find the connected components of core points (ignoring all non-core points)
  • Assign each non-core point to a nearby cluster if it is less than $\epsilon$ away, otherwise assign it to noise
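A minimal scikit-learn sketch (the $\epsilon$ and $minPts$ values are illustrative); with suitable parameters DBSCAN separates the two moons that $k$-means could not:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
# eps plays the role of epsilon, min_samples the role of minPts
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
# Points labeled -1 are the outliers (noise)
```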

Spectral clustering

  • Compute a similarity graph (but which one?) of the data $x_1,…,x_n$

  • Compute (weighted) adjacency matrix $W$, degree matrix $D$ and Laplacian matrix $L=D-W$
  • Perform the eigendecomposition of $L$, yielding eigenvectors $E$ and eigenvalues $\Delta$
  • Perform $k$-means clustering on the rows of the matrix of the $k$ eigenvectors associated with the smallest eigenvalues, $[e_1,…,e_k]_{n\times k}$ (see the sketch below)
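A minimal unnormalized spectral clustering sketch in Python (the fully connected RBF similarity graph and its `gamma` parameter are one arbitrary choice among the possible graphs mentioned above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k, gamma=1.0):
    """Unnormalized spectral clustering on a fully connected RBF similarity graph."""
    W = rbf_kernel(X, gamma=gamma)              # weighted adjacency matrix
    D = np.diag(W.sum(axis=1))                  # degree matrix
    L = D - W                                   # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)              # eigendecomposition, eigenvalues ascending
    E = eigvecs[:, :k]                          # n x k matrix of the k smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(E)
```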

Hierarchical clustering

A very natural way of handling data

  • Leaves of the dendrogram = initial data points
  • Inner nodes of the dendrogram = clusters

Example

Agglomerative vs divisive clustering

Agglomerative: merge clusters from fine to coarse (bottom-up approach)
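A short SciPy sketch of agglomerative clustering (toy data; the Ward linkage and the cut at 3 clusters are arbitrary choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(50, 2))    # toy data

Z = linkage(X, method="ward")                        # bottom-up merges, fine to coarse
dendrogram(Z)                                        # full merge tree (needs matplotlib)
labels = fcluster(Z, t=3, criterion="maxclust")      # cut the dendrogram into 3 clusters
```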

Divisive clustering: split clusters (top-down approach)

  • Needs some heuristics to avoid the $O(2^n)$ ways of splitting each cluster…
  • Not so used in practice

Bestiary

Overall comparison of all methods

This post is licensed under CC BY 4.0 by the author.