Link to the HackMD note
Agenda for lecture 3
- Introduction
- Finish lecture about local feature detectors
- Local feature descriptors
- Descriptor matching and indexing
- Projective transformations
- Homography estimation
A word about divisive clustering
HAC is bottom-up; divisive clustering is performed top-down
Classical approach:
- Start with all data
- Apply flat clustering
- Recursively apply the approach on each cluster until some termination criterion is met
Pro: each split can yield more than 2 sub-trees (see the sketch below)
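A minimal sketch of this recursion, assuming k-means as the flat clustering step; the function name and the `k` and `min_size` parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(data, k=2, min_size=10):
    """Top-down clustering: recursively apply flat k-means."""
    if len(data) < min_size:          # termination criterion (illustrative)
        return {"points": data, "children": []}
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(data)
    children = [divisive_clustering(data[labels == i], k, min_size)
                for i in range(k)]
    return {"points": data, "children": children}

tree = divisive_clustering(np.random.rand(200, 2), k=3)  # k=3: 3 sub-trees per split
```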
Summary of last lecture
Global image descriptors
- Color histogram
- Limited descriptive power
- Which distance function?
Clustering
- K-means
- Hierarchical Agglomerative Clustering
Local feature detectors
- Image gradients
- Edge detector: Sobel, Canny
- Corner detector: Harris
- Large image gradient in 2 directions
- Corner detector: FAST
- Corner detectors: LoG, DoG, DoH
- Blob detector: MSER
Next practice sessions
Compute and match descriptors (max. 1 hour, continuing practice session 2)
Play with ORB keypoint matching to implement a simple AR technique (practice session 3)
Introduction
How are panorama pictures created from multiple pictures?
- Detect small parts invariant under viewpoint change: keypoints
- Find pairs of matching keypoints using a description of their neighborhood
- Compute the most likely transformation to blend images together
Local feature descriptors
Harris & Stephens conclusion
Harris-Stephens trick to avoid computing eigenvalues:
Nowadays linear algebra is cheap, so compute the real eigenvalues.
Then filter using $\min(\lambda_1, \lambda_2) > \lambda$, where $\lambda$ is a threshold
This is the Shi-Tomasi variant
Build your own edge/corner detector
We need the eigenvalues $\lambda_1$ and $\lambda_2$ of the structure tensor (the gradient products $I_x^2$, $I_xI_y$, $I_y^2$ summed block-wise over a neighborhood)
```python
import cv2

# Per-pixel eigenvalues and eigenvectors of the structure tensor
dst = cv2.cornerEigenValsAndVecs(src, neighborhood_size, sobel_aperture)
# Only the smaller eigenvalue (enough for Shi-Tomasi-style filtering)
dst = cv2.cornerMinEigenVal(src, neighborhood_size, ksize=sobel_aperture)
```
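For example, a minimal Shi-Tomasi-style detector built on `cornerMinEigenVal` (image path and threshold factor are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
min_eig = cv2.cornerMinEigenVal(img, 3)               # min(lambda1, lambda2) per pixel
corners = np.argwhere(min_eig > 0.1 * min_eig.max())  # keep strong responses only
```

OpenCV's `cv2.goodFeaturesToTrack` packages this same criterion together with non-maximum suppression.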
Harris summary
Pros
Translation invariant
- Large gradients in both directions = stable points
Cons
Not so fast
- It would be nice to avoid computing all those derivatives
Not scale invariant
- We would like to detect corners at different scales
Corner detectors, binary tests: FAST
Features from accelerated segment test (FAST)
Keypoint detector used by ORB
Segment test: compare the intensity $I_p$ of pixel $P$ with the surrounding pixels (a circle of 16 pixels)
If $n$ contiguous pixels are either:
- all darker than $I_p-t$
- all brighter than $I_p+t$
then $P$ is detected as a corner
Tricks
- Cascading
- If $n=12$ ($\frac{3}{4}$ of the circle), testing only a few pixels (e.g. 1, 5, 9, 13) quickly rejects most non-corners
- Machine learning
- How to perform non-maximal suppression
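A minimal sketch with OpenCV's FAST implementation; the threshold is the $t$ of the segment test and `nonmaxSuppression` toggles the non-maximal suppression trick (values and image path are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
fast = cv2.FastFeatureDetector_create(threshold=20,   # t of the segment test
                                      nonmaxSuppression=True)
keypoints = fast.detect(img, None)
```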
FAST summary
Pros
Very fast
- 20 times faster than Harris
- 40 times faster than DoG
Very robust to transformations (perspective in particular)
Cons
Very sensitive to blur
Corner detectors at different scales: LoG, DoG, DoH
Laplacian of Gaussian (LoG)
The theoretical, slow way
Band-pass filter: it detects objects of a certain size
Laplacian = second derivative
Like Sobel, with one more differentiation
Taylor, again: $I(x+1) + I(x-1) - 2I(x) \approx I_{xx}(x)$
New filter: $I_{xx} = \begin{bmatrix}1 & -2 & 1\end{bmatrix} * I$
Laplacian filter $\nabla^2I(x,y)$
Edge detector, like Sobel but with second derivatives
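A minimal sketch of this second-derivative filter with OpenCV (image path and kernel size are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
# I_xx + I_yy; float output keeps the negative responses
lap = cv2.Laplacian(img, cv2.CV_64F, ksize=3)
```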
Laplacian of Gaussian
The kernel has the "Mexican hat" shape
LoG = detector of circular shapes:
LoG filter extrema locate "blobs"
- maxima = dark blobs on light background
- minima = light blobs on dark background
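A sketch of LoG blob detection, assuming SciPy is available; `sigma` tunes the band-pass to the blob size, and the thresholds are illustrative:

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_laplace

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(float)  # placeholder path
log = gaussian_laplace(img, sigma=3.0)            # band-pass tuned to blob size
dark_blobs = np.argwhere(log > 0.9 * log.max())   # maxima: dark blobs on light bg
light_blobs = np.argwhere(log < 0.9 * log.min())  # minima: light blobs on dark bg
```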
Detecting corners/blobs
Build a scale space representation: stack of images (3D) with increasing $\sigma$
Difference of Gaussian (DoG)
Fast approximation of LoG (used by SIFT)
LoG can be approximated by a difference of 2 Gaussians (DoG) at different scales
DoG filter
It is a band-pass filter
The idea is: "do my bumps fall between such and such wavelengths?"
Intuition
- Gaussian ($g$) is a low-pass filter
- The difference of two Gaussians with different $\sigma$ keeps the frequencies between the two cut-offs: a band-pass filter
DoG computation in practice
Take an image
Blur it
Take the difference
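A minimal sketch of these three steps (the $\sigma$ values are illustrative; a ratio of about 1.6 is a common LoG approximation):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder path
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)   # kernel size derived from sigma
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)
dog = g1 - g2                                    # band-pass response
```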
DoG scale generation trick
DoG computation uses "octaves"
- “Octave” because frequency doubles/halves between octaves
- If $\sigma=\sqrt{2}$, then $3$ levels per octave
- Downsample images for the next octave (equivalent to doubling the kernel size)
- Compute the DoG between images
DoG: Corner selection
Throw out weak responses and edges
Estimate gradients
- Similar to Harris: look at nearby responses
- Not the whole image, only a few points, so it is faster
- Throw out weak responses
Find corner-like points
- Same idea: structure matrix, use determinant and trace info (SIFT variant)
Determinant of Hessian (DoH)
Faster approximation
LoG vs DoG vs DoH
A "Laplacian of Gaussian" is preferred for detecting small stars
LoG, DoG, DoH summary
Pros
Very robust to transformations
- Perspective
- Blur
Adjustable size detector
Cons
Slow
Blob detectors: MSER
Maximally Stable Extremal Regions (MSER)
Detects regions which are stable over thresholds
- Create min & max-tree of the image
- Tree of nested components obtained by thresholding the image at each possible level
- Example: the brain has a tumor
- Select most stable regions
- between $t-\Delta$ and $t+\Delta$
- $R_{t^*}$ is maximally stable iff $q(t)=\vert R_{t-\Delta}\setminus R_{t+\Delta}\vert/\vert R_t\vert$ has a local minimum at $t^{*}$ (see the sketch below)
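A minimal sketch with OpenCV's MSER implementation; `delta` corresponds to the $\Delta$ above (values and image path are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
mser = cv2.MSER_create(delta=5)                      # delta plays the role of Delta
regions, bboxes = mser.detectRegions(img)            # stable regions + bounding boxes
```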
Summary
Pros
Very robust to transformations
- Affine transformations
- Intensity changes
Quite fast
Cons
Does not cope with blur
Local feature detectors: Conclusion
- Harris-Stephens: can be very stable when combined with DoG
- Shi-Tomasi: assumes an affine transformation (avoid it when there is perspective)
- DoG: slow but very robust (perspective, blur, illumination)
- DoH: faster than DoG, misses small elements, better with perspective
- FAST: very fast, robust to perspective change (like DoG), but blur quickly kills it
- MSER: fast, very stable, good choice when no blur
Introduction
Given some keypoints in image 1, what are the most similar ones in image 2?
This is a nearest neighbor problem in descriptor space
This is also a geometrical problem in coordinate space
Matching
Matching problem
Goal: given 2 sets of descriptors, find the best matching pairs
Need a distance norm: it depends on the descriptor
- Distribution (histogram)? Stats?
- Data type?
- Float, integers: Euclidean, cosine
- Binary: Hamming
1-way matching
For each $x_i$ in the set of descriptors $D_1$, find the closest element $y_i$ in $D_2$
We have a match $m(x_i, y_i)$ for each $x_i$
Symmetry test aka cross check aka 2-way matching
For each $x_i$ in the set of descriptors $D_1$, find the closest element $y_i$ in $D_2$ such that $x_i$ is also the closest element to $y_i$ (see the sketch below)
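A sketch with OpenCV's brute-force matcher, assuming binary descriptors `des1` and `des2` (e.g. ORB) were computed beforehand:

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # 2-way matching built in
matches = bf.match(des1, des2)   # only symmetric matches are returned
```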
Ratio test
For each $x_i$ in $D_1$, find the 2 closest elements $y_i$ and $y_j$ in $D_2$; keep the match only if the ratio $d(x_i, y_i)/d(x_i, y_j)$ is below a threshold
Calibrate the ratio
Adjust it on a training set!
For each correct/incorrect match in your annotated database, plot the PDF of the ratio of the closest to the next-closest distance.
What is a good ratio in D. Lowe's experiment? (His paper suggests rejecting matches with a ratio above 0.8; see the sketch below.)
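A sketch of the ratio test with `knnMatch` (crossCheck only works with $k=1$, so it stays off here); the 0.8 threshold follows Lowe:

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = bf.knnMatch(des1, des2, k=2)   # 2 closest elements per descriptor
good = [m for m, n in knn if m.distance < 0.8 * n.distance]  # ratio test
```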
Geometric validation
Summary
Indexing
Indexing pipeline
Use case: we have a database of images and we want to find an object in it
Bruteforce matching aka linear matching
Simply scan all data and keep the closest elements
Does not scale to large databases, but can be faster on small ones
kD-Trees
binary tree in which every leaf node is a k-dimensional point
FLANN - Efficient indexing
- Original version: hierarchical k-means
- Construction: repeated k-means on the data (then inside each cluster) until the minimum cluster size is reached
- Lookup: traverse the tree in a best-bin-first manner with a backtracking queue; backtrack until enough points are returned (see the sketch below)
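A sketch using OpenCV's FLANN wrapper with a KD-tree index (one possible index; the hierarchical k-means index described above is another FLANN option). It assumes float32 descriptors `des1` and `des2` (e.g. SIFT); parameter values are illustrative:

```python
import cv2

index_params = dict(algorithm=1, trees=5)   # 1 = FLANN_INDEX_KDTREE
search_params = dict(checks=50)             # backtracking budget
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
```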
Locality Sensitive Hashing (LSH)
Hash items using a family of hash functions which project similar items into the same bucket with high probability
Not cryptographic hashing!
Depending on which side of the separating hyperplane the point lies, the bit is set to 1 or 0
A binary approximation of the $\cos$ distance
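A NumPy sketch of this random-hyperplane signature (the numbers of bits and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.standard_normal((16, 128))   # 16 random hyperplanes for 128-D vectors

def lsh_signature(x):
    # One bit per hyperplane: 1 or 0 depending on which side x lies
    return (planes @ x > 0).astype(np.uint8)

# Hamming distance between signatures approximates the cosine distance
sig = lsh_signature(rng.standard_normal(128))
```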
Locality Sensitive Hashing (LSH)
- Fast and efficient with large spaces and lots of data
- Returns a "good match", maybe not the best one
- kNN can be costly
Which indexing?
Experiment
Advice for the practice session:
Projective transformations
A linear transformation of pixel coordinates
Image Mappings Overview
Math. foundations & assumptions
- For planar surfaces, the 3D-to-2D perspective projection reduces to a 2D-to-2D transformation
- This is just a change of coordinate system
- This transformation is invertible
Translation
Scale
Rotation
Notation: Partitioned matrices
Affine
Projective
More on projective transform
- Each point in 2D is actually a vector in 3D
- Equivalent up to scaling factor
- Have to normalize to get back to 2D
- Using a homography to project a point (sketched below)
- Multiply $\tilde x$ by $\tilde H$, then normalize
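A minimal sketch of this projection step:

```python
import numpy as np

def apply_homography(H, x, y):
    """Multiply the homogeneous point by H, then normalize back to 2D."""
    p = H @ np.array([x, y, 1.0])    # point as a 3D homogeneous vector
    return p[0] / p[2], p[1] / p[2]  # divide by the scaling factor
```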
Summary
Homography estimation, Geometric validation
We want to recover H from keypoint matches
Recover the parameters of a perspective transform
How many correspondences are needed?
Depends on the type of transform:
- How many for translation?
- For rotation?
- …
- For general projective transform?
Reminder: we have 2 knowns for each match (its $x$ and $y$ coordinates), i.e. 2 equations
Linear system
Use linear least squares to solve $Ma=b$
Solve the system (see the sketch below)
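A sketch of building $M$ and $b$ and solving by least squares, fixing $h_{33}=1$ so each match contributes 2 equations (4 matches give the 8 needed); the function name is illustrative:

```python
import numpy as np

def estimate_homography(src, dst):
    """Least-squares homography from >= 4 point matches (h33 fixed to 1)."""
    M, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), linearized:
        M.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        M.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    a, *_ = np.linalg.lstsq(np.asarray(M, float), np.asarray(b, float), rcond=None)
    return np.append(a, 1.0).reshape(3, 3)
```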
How reliable is the estimate?
Even worse
Is our data perfect?
We have quite a lot of noise
Overcoming Least Squares Limitations
- We need a robust estimation
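The usual robust estimator for this problem is RANSAC (not named on this slide, so treat that as an assumption); OpenCV combines it with homography fitting. `pts1` and `pts2` are assumed to be Nx2 float32 arrays of matched keypoint coordinates:

```python
import cv2
import numpy as np

# pts1, pts2: matched keypoint coordinates from the two images
H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransacReprojThreshold=3.0)
# mask flags the inlier matches that survived the robust fit
```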