Link to the HackMD note

Agenda for lecture 2

Introduction
Global image descriptors
Clustering
Local feature detectors

Introduction

Summary of last lecture

Machine learning

Machine learning = searching for the best model in a hypothesis space
Inductive machine learning, optimization-based
Inductive bias, bias/variance trade-off
Supervised, reinforcement, unsupervised learning
Regression, classification, density estimation
Model validation: test generalisation, separate/decorrelate test & training sets

Template matching

Sum of squared differences $(T-I)^2$, or correlation-based methods ($T\times I$)
Normalization needed for correlation-based methods
Tolerates translation and small noise, but not rotation, intensity shift, …
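
A minimal NumPy sketch of the two kinds of score recalled above, assuming grayscale images stored as 2-D arrays. The names `match_template_ssd` and `ncc` are illustrative and do not come from the practice session, and the correlation shown is the zero-mean normalized variant, which is one possible normalization among several.

```python
import numpy as np

def match_template_ssd(image, template):
    """Return the sum-of-squared-differences score for every valid position."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    ih, iw = image.shape
    scores = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            scores[y, x] = np.sum((template - patch) ** 2)  # lower is better
    return scores

def ncc(template, patch):
    """Zero-mean normalized cross-correlation of two same-size arrays."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt(np.sum(t ** 2) * np.sum(p ** 2))
    return float(np.sum(t * p) / denom) if denom > 0 else 0.0

# Toy image: the template is pasted at row 2, column 3.
template = np.array([[1.0, 2.0], [3.0, 4.0]])
image = np.zeros((6, 6))
image[2:4, 3:5] = template
scores = match_template_ssd(image, template)
best = np.unravel_index(np.argmin(scores), scores.shape)
```

Here `best` points to the position where the template was pasted. The explicit double loop is only for readability; OpenCV's `cv2.matchTemplate` offers optimized versions of such scores.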

Debrief of practice session 1

PS1 content

Jupyter tricks
NumPy reminders
Intro to image manipulations
Twin it! part 1: template matching
(Bonus level: segmentation)

Take home messages

How annoying was it to manually adjust color thresholds to select the duck? How could we have automated it?

Results with method SQDIFF_NORMED (lower is better)

Strengths and weaknesses of template matching for the Twin It! case? Effects of normalization?

Next practice session

Twin it! again, with a slightly more elaborate approach

1. Pre-select bubbles based on their colors $\Rightarrow$ color histograms

Color histogram: in detail

1.1. Color quantization: reduce the colors of the bubbles

1.2. Compute the color histogram of each bubble

1.3. Compute the distance matrix between all pairs of bubbles, using their color histograms

1.4. Visualize the bubbles in an interesting way using hierarchical clustering

2. For the pre-selected bubbles, check that their content is similar

$\Rightarrow$ Detect stable points and extract the patches around them
Compare (match) those patches

Image descriptors

Issues with methods based on pixel comparison

What is important? What do these methods consider? Raw pixels!

We want to be able to make use of domain knowledge
Like sensitivity to shape, or dominant color information

Overview

Different sizes and contents

Different kinds of descriptors

Different problems $\Rightarrow$ Different choices

Computation/memory constraints
Which perturbations do we have to tolerate?

Global image descriptors

Two approaches

Global image descriptors

Compute statistics about the content of the image
Produce a single global vector

Very attractive because they are very fast to compute and match, but…

Bag of Features techniques (lecture 4)

Select regions of interest in the image (may be a variable quantity)
Compute descriptors for each region
Index each part separately (like a text search engine which indexes words)

It is always possible to build a single descriptor from multiple ones

Color histograms

High invariance to many transformations

rotation, scaling (thanks to normalization), perspective

But limited discriminative power

Easy to implement

Reduce the colors (optional when performing backprojection)
Compute a reduced color histogram on each image
Use a distribution distance to compare the descriptors

Some results on Twin It!

Step by step

1: Color reduction

Use K-Means or any other clustering technique to find N useful colors
Project each pixel onto its closest color
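
The projection step can be sketched as follows, assuming the N colors have already been found and are given as an `(N, 3)` RGB array. The function name `project_to_palette` and the toy palette are hypothetical.

```python
import numpy as np

def project_to_palette(pixels, palette):
    """Map each RGB pixel to the index of its nearest palette color."""
    pixels = np.asarray(pixels, dtype=np.float64).reshape(-1, 3)
    palette = np.asarray(palette, dtype=np.float64)
    # Squared Euclidean distance between every pixel and every palette color.
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy palette: black, red, white.
palette = np.array([[0, 0, 0], [255, 0, 0], [255, 255, 255]])
pixels = np.array([[10, 5, 0], [250, 20, 10], [240, 240, 250], [200, 30, 30]])
labels = project_to_palette(pixels, palette)
```

The result is one palette index per pixel; reshaping it to the image height and width gives the quantized image.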

One possible result on the Twin It! poster

2: Histogram computation

You already know how to do this (remember to normalize the histogram)
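
As a reminder, a minimal sketch of a normalized histogram, assuming each pixel has already been replaced by the integer index of its reduced color. The name `color_histogram` is illustrative.

```python
import numpy as np

def color_histogram(labels, n_colors):
    """Normalized histogram of quantized color indices (sums to 1)."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=n_colors)
    return counts / counts.sum()

# Hypothetical 2x3 bubble whose pixels were mapped to a 4-color palette.
labels = np.array([[0, 0, 1],
                   [3, 1, 0]])
hist = color_histogram(labels, n_colors=4)
```

Dividing by the pixel count is what makes histograms of bubbles of different sizes comparable.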

3: Descriptor comparison
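
One possible distribution distance is the chi-square distance, sketched below together with the pairwise distance matrix. The choice of chi-square and the names `chi2_distance` and `distance_matrix` are assumptions made for illustration; other distances (histogram intersection, Bhattacharyya, ...) would fit the same slot.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two normalized histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # eps avoids a division by zero when a bin is empty in both histograms.
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def distance_matrix(hists):
    """Symmetric matrix of chi-square distances between all histogram pairs."""
    n = len(hists)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = chi2_distance(hists[i], hists[j])
    return dist

hists = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
D = distance_matrix(hists)
```

With normalized histograms this distance lies between 0 (identical) and 1 (no overlapping bin).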

Other global image descriptors

More global descriptors

GIST of a scene:

Oliva, Torralba, “Modeling the shape of the scene”

Global descriptors

Drawback

According to F. Perronnin: highly efficient to compute and to match $\Rightarrow$ perfect in theory

But the robustness vs informativeness trade-off is hard to set

(personal conclusion)

Approaches based on global image descriptors have so far been confined to near-duplicate detection applications
Modern search engines use local representations and leverage them

Clustering

Finding groups in data

Many techniques:

Connectivity models
- hierarchical clustering,…
- cluster = set of neighbors
Centroid models: k-means
- cluster = centroid point
Distribution models
- Gaussian mixture models estimated with Expectation-Maximization
- cluster = statistical distribution
Density models
Graph-based models

Always the same goal:

Minimise the differences between elements within the same cluster
Maximise the differences between elements in different clusters

Number of clusters:

Many methods require choosing it beforehand
Several techniques to adjust the number of clusters automatically

Outlier rejection:

Some techniques do not assign lonely points to any cluster

Focus on HAC and K-Means today

Hierarchical Agglomerative Clustering

Some linkage types

Single linkage
- minimizes the distance between the closest observations
Maximum or complete linkage
Average linkage
Centroid linkage
Ward criterion
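
To make the first four linkage types concrete, here is a small sketch that computes the distance between two clusters under each of them, assuming Euclidean distance between points. The function `linkage_distance` is illustrative; the Ward criterion is left out because it is defined through the increase in within-cluster variance rather than a point-to-point distance.

```python
import numpy as np

def linkage_distance(a, b, kind="single"):
    """Distance between clusters a and b (arrays of points) for a linkage type."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # All pairwise Euclidean distances between a point of a and a point of b.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    if kind == "single":      # closest pair
        return float(d.min())
    if kind == "complete":    # farthest pair
        return float(d.max())
    if kind == "average":     # mean over all pairs
        return float(d.mean())
    if kind == "centroid":    # distance between the cluster means
        return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))
    raise ValueError(f"unknown linkage: {kind}")

a = [[0, 0], [1, 0]]
b = [[3, 0], [5, 0]]
results = {k: linkage_distance(a, b, k)
           for k in ("single", "complete", "average", "centroid")}
```

At each HAC step, the two clusters with the smallest such distance are merged.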

Divisive clustering

HAC is bottom-up, divisive clustering is top-down.

Classical approach:

Start with all data
Apply flat clustering
Recursively apply the approach on each cluster until some termination criterion is met

Pros: can have more than 2 sub-trees, much faster than HAC
Cons: same issues as flat clustering, non-determinism

K-means

K-Means clustering (again)

The K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion

It does not maximize the inter-cluster distance
It places the centers so as to get the best coverage (they may not be on a density peak!)
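
Written out, with notation introduced here for illustration ($K$ clusters $C_1, \dots, C_K$ and $\mu_k$ the centroid of $C_k$), the inertia criterion reads:

$$\text{inertia} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2, \qquad \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$

Only distances from points to their own centroid appear in the sum, which is why nothing in the criterion explicitly pushes different clusters apart.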

Algorithm

Initialization:

Randomly select the cluster centers

Then repeat until the assignments stop changing:

Calculate the distances points $\Leftrightarrow$ centers
Assign each point to the closest center
Update the cluster centers: average of the assigned points
Result: cluster centers (centroids)

local optimum (a local minimum of the inertia)
tessellation / Voronoi set over the dataset
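
The algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration under simple assumptions (initial centers drawn among the data points, Euclidean distance, a fixed iteration cap); the names `kmeans` and `assign` are not from any library.

```python
import numpy as np

def assign(points, centers):
    """Index of the closest center for each point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Batch K-Means; returns (centers, labels, inertia)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialization: pick k distinct data points as starting centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    labels = assign(points, centers)
    inertia = float(((points - centers[labels]) ** 2).sum())
    return centers, labels, inertia

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels, inertia = kmeans(data, k=2)
```

On this toy dataset the two groups of three points are recovered. On harder data, different seeds can lead to different local optima, which is why several restarts are commonly used.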

The previous algorithm is called “Batch K-Means” or simply “K-Means” because it considers the whole dataset at each iteration.

Batch K-Means is not only sensitive to outliers and initialization, it is also very slow to compute on large datasets.

It is possible to avoid this speed/memory issue by randomly sampling the dataset at each step.

Results are only slightly worse
Speed and memory requirements make it usable on bigger datasets
This approach is called “Online K-Means” or “MiniBatch K-Means”
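
A rough sketch of the sampling idea, assuming one common update rule in which each center keeps a count of the points it has received and moves towards each new point with step `1 / count`. The notes do not specify the update rule, so this choice, the function name `minibatch_kmeans` and the parameter values are assumptions.

```python
import numpy as np

def minibatch_kmeans(points, k, batch_size=32, n_steps=200, seed=0):
    """Mini-batch K-Means: update centers from a random sample at each step."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(n_steps):
        # Only a small random batch is looked at, not the whole dataset.
        batch = points[rng.choice(len(points), size=batch_size)]
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for x, j in zip(batch, nearest):
            counts[j] += 1
            # Running mean of all the points assigned to center j so far.
            centers[j] += (x - centers[j]) / counts[j]
    return centers

# Synthetic data: two Gaussian blobs around (0, 0) and (10, 10).
data_rng = np.random.default_rng(1)
data = np.vstack([data_rng.normal(0.0, 0.5, size=(200, 2)),
                  data_rng.normal(10.0, 0.5, size=(200, 2))])
centers = minibatch_kmeans(data, k=2)
```

Each step costs a number of distance computations proportional to `batch_size * k`, independent of the dataset size.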

Application: Color quantization

Many clustering techniques to play with!

Evaluation of clustering

Need some supervision?

By construction, clustering algorithms are optimal on their training set, in the sense that they are expected to find some optimal balance between high intra-cluster similarity and low inter-cluster similarity.

How do these internal criteria translate into good effectiveness for applications?

A common approach is to rely on labeled data to compute new indicators:

Purity: sort of “agreement” inside each cluster
Normalized Mutual Information (NMI) and Entropy: information measures
Rand Index (RI) and F measure: error counts
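
Two of these indicators are short enough to sketch directly, assuming we have one cluster id and one ground-truth class label per sample. The function names and the toy labels are illustrative.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, classes):
    """Fraction of samples belonging to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [y for k, y in zip(clusters, classes) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

def rand_index(clusters, classes):
    """Fraction of sample pairs on which clustering and labels agree."""
    pairs = list(combinations(range(len(clusters)), 2))
    agree = sum(
        (clusters[i] == clusters[j]) == (classes[i] == classes[j])
        for i, j in pairs
    )
    return agree / len(pairs)

clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
p = purity(clusters, classes)       # 5 of 6 samples are in their cluster's majority class
ri = rand_index(clusters, classes)  # 10 of 15 pairs agree
```

Purity trivially reaches 1 when every sample is its own cluster, so it is best read together with the number of clusters.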

Modern density estimation point of view

But what if we leave some samples out to test generalization?

HAC or K-Means “overfit” the underlying data distribution.

It does not always make sense, but if we are interested in density estimation, then we can assess how well our model estimates the probability $P(x)$ of unseen data. The “E” step of the EM algorithm is based on this idea.
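
A minimal sketch of this held-out evaluation, assuming the simplest possible density model: a single 1-D Gaussian fitted on the training samples. The toy data and the deliberately too-narrow comparison model are made up for illustration.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def mean_log_likelihood(samples, mean, std):
    """Average log P(x) of the samples under the Gaussian model."""
    return sum(gaussian_logpdf(x, mean, std) for x in samples) / len(samples)

train = [-1.0, -0.5, 0.0, 0.5, 1.0]
held_out = [-1.5, -0.2, 0.3, 1.2]

# Maximum-likelihood fit on the training set only.
mean = sum(train) / len(train)
std = math.sqrt(sum((x - mean) ** 2 for x in train) / len(train))

fitted = mean_log_likelihood(held_out, mean, std)
# A model that concentrates too tightly around the mean, for comparison.
too_narrow = mean_log_likelihood(held_out, mean, 0.1)
```

The over-concentrated model gets a much lower held-out log-likelihood than the fitted one, which is the kind of signal that training-set criteria alone cannot provide.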