Link to the HackMD note
Agenda for lecture 3
- Introduction
- Finish lecture about local feature detectors
- Local feature descriptors
- Descriptor matching and indexing
- Projective transformations
- Homography estimation
A word about divisive clustering
HAC is bottom-up; divisive clustering is performed top-down
Classical approach:
- Start with all data
- Apply flat clustering
- Recursively apply the approach on each cluster until some termination criterion is met
Pro: each split can yield more than 2 sub-trees (see the sketch below)
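A minimal sketch of this recursion, assuming k-means as the flat clustering step; the function name and the `k` and `min_size` parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(data, k=2, min_size=10):
    """Top-down clustering: recursively apply flat k-means."""
    if len(data) < min_size:          # termination criterion (illustrative)
        return {"points": data, "children": []}
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(data)
    children = [divisive_clustering(data[labels == i], k, min_size)
                for i in range(k)]
    return {"points": data, "children": children}

tree = divisive_clustering(np.random.rand(200, 2), k=3)  # k=3: 3 sub-trees per split
```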
Summary of last lecture
Global image descriptors
- Color histogram
- Limited descriptive power
- Which distance function?
Clustering
- K-means
- Hierarchical Agglomerative Clustering
Local feature detectors
- Image gradients
- Edge detector: Sobel, Canny
- Corner detector: Harris
- Large image gradient in 2 directions
- Corner detector: FAST
- Corner detectors: LoG, DoG, DoH
- Blob detector: MSER
Next practice sessions
Compute and match descriptors (max. 1 hour, continuing practice session 2)
Play with ORB keypoint matching to implement a simple AR technique (practice session 3)
Introduction
How are panorama pictures created from multiple pictures?
- Detect small parts invariant under viewpoint change: keypoints
- Find pairs of matching keypoints using a description of their neighborhood
- Compute the most likely transformation to blend images together
Local feature descriptors
Harris & Stephens conclusion
Harris-Stephens trick to avoid computing eigenvalues:
Nowadays linear algebra is cheap, so compute the real eigenvalues.
Then filter using $\min(\lambda_1, \lambda_2) > \lambda$, where $\lambda$ is a threshold
This is the Shi-Tomasi variant
Build your own edge/corner detector
We need the eigenvalues $\lambda_1$ and $\lambda_2$ of the structure tensor (the gradient products $I_x^2$, $I_xI_y$, $I_y^2$ summed block-wise over a neighborhood)
```python
import cv2

# Per-pixel eigenvalues and eigenvectors of the structure tensor
dst = cv2.cornerEigenValsAndVecs(src, neighborhood_size, sobel_aperture)
# Only the smaller eigenvalue (enough for Shi-Tomasi-style filtering)
dst = cv2.cornerMinEigenVal(src, neighborhood_size, ksize=sobel_aperture)
```
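For example, a minimal Shi-Tomasi-style detector built on `cornerMinEigenVal` (image path and threshold factor are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
min_eig = cv2.cornerMinEigenVal(img, 3)               # min(lambda1, lambda2) per pixel
corners = np.argwhere(min_eig > 0.1 * min_eig.max())  # keep strong responses only
```

OpenCV's `cv2.goodFeaturesToTrack` packages this same criterion together with non-maximum suppression.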
Harris summary
Pros
Translation invariant
- Large gradients in both directions = stable points
Cons
Not so fast
- It would be nice to avoid computing all those derivatives
Not scale invariant
- We would like to detect corners at different scales
Corner detectors, binary tests: FAST
Features from accelerated segment test (FAST)
Keypoint detector used by ORB
Segment test: compare the intensity $I_p$ of pixel $P$ with the surrounding pixels (a circle of 16 pixels)
If $n$ contiguous pixels are either:
- all darker than $I_p-t$
- all brighter than $I_p+t$
then $P$ is detected as a corner
Tricks
- Cascading
- If $n=12$ ($\frac{3}{4}$ of the circle), testing only a few pixels (e.g. 1, 5, 9, 13) quickly rejects most non-corners
- Machine learning
- How to perform non-maximal suppression
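A minimal sketch with OpenCV's FAST implementation; the threshold is the $t$ of the segment test and `nonmaxSuppression` toggles the non-maximal suppression trick (values and image path are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
fast = cv2.FastFeatureDetector_create(threshold=20,   # t of the segment test
                                      nonmaxSuppression=True)
keypoints = fast.detect(img, None)
```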
FAST summary
Pros
Very fast
- 20 times faster than Harris
- 40 times faster than DoG
Very robust to transformations (perspective in particular)
Cons
Very sensitive to blur
Corner detectors at different scales: LoG, DoG, DoH
Laplacian of Gaussian (LoG)
The theoretical, slow way
Band-pass filter: it detects objects of a certain size
Laplacian = second derivative
Like Sobel, with one more differentiation
Taylor, again: $I(x+1) + I(x-1) - 2I(x) \approx I_{xx}(x)$
New filter: $I_{xx} = \begin{bmatrix}1 & -2 & 1\end{bmatrix} * I$
Laplacian filter $\nabla^2I(x,y)$
Edge detector, like Sobel but with second derivatives
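A minimal sketch of this second-derivative filter with OpenCV (image path and kernel size are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
# I_xx + I_yy; float output keeps the negative responses
lap = cv2.Laplacian(img, cv2.CV_64F, ksize=3)
```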
Laplacian of Gaussian
The kernel has the "Mexican hat" shape
LoG = detector of circular shapes:
LoG filter extrema locate "blobs"
- maxima = dark blobs on light background
- minima = light blobs on dark background
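A sketch of LoG blob detection, assuming SciPy is available; `sigma` tunes the band-pass to the blob size, and the thresholds are illustrative:

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_laplace

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(float)  # placeholder path
log = gaussian_laplace(img, sigma=3.0)            # band-pass tuned to blob size
dark_blobs = np.argwhere(log > 0.9 * log.max())   # maxima: dark blobs on light bg
light_blobs = np.argwhere(log < 0.9 * log.min())  # minima: light blobs on dark bg
```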
Detecting corners/blobs
Build a scale space representation: stack of images (3D) with increasing $\sigma$
Difference of Gaussian (DoG)
Fast approximation of LoG (used by SIFT)
LoG can be approximated by a difference of 2 Gaussians (DoG) at different scales
DoG filter
It is a band-pass filter
The idea is: "do my bumps fall between such and such wavelengths?"
Intuition
- Gaussian ($g$) is a low-pass filter
- The difference of two Gaussians with different $\sigma$ keeps the frequencies between the two cut-offs: a band-pass filter
DoG computation in practice
Take an image
Blur it
Take the difference
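A minimal sketch of these three steps (the $\sigma$ values are illustrative; a ratio of about 1.6 is a common LoG approximation):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder path
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)   # kernel size derived from sigma
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)
dog = g1 - g2                                    # band-pass response
```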
DoG scale generation trick
DoG computation uses "octaves"
- “Octave” because frequency doubles/halves between octaves
- If $\sigma=\sqrt{2}$, then $3$ levels per octave
- Downsample images for the next octave (equivalent to doubling the kernel size)
- Compute the DoG between images
DoG: Corner selection
Throw out weak responses and edges
Estimate gradients
- Similar to Harris: look at nearby responses
- Not the whole image, only a few points, so it is faster
- Throw out weak responses
Find corner-like points
- Same idea: structure matrix, use determinant and trace info (SIFT variant)
Determinant of Hessian (DoH)
Faster approximation
LoG vs DoG vs DoH
A "Laplacian of Gaussian" is preferred for detecting small stars
LoG, DoG, DoH summary
Pros
Very robust to transformations
- Perspective
- Blur
Adjustable size detector
Cons
Slow
Blob detectors: MSER
Maximally Stable Extremal Regions (MSER)
Detects regions which are stable over thresholds
- Create min & max-tree of the image
- Tree of nested components obtained by thresholding the image at each possible level
- Example: the brain has a tumor
- Select most stable regions
- between $t-\Delta$ and $t+\Delta$
- $R_{t^*}$ is maximally stable iff $q(t)=\vert R_{t-\Delta}\setminus R_{t+\Delta}\vert/\vert R_t\vert$ has a local minimum at $t^{*}$ (see the sketch below)
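A minimal sketch with OpenCV's MSER implementation; `delta` corresponds to the $\Delta$ above (values and image path are illustrative):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
mser = cv2.MSER_create(delta=5)                      # delta plays the role of Delta
regions, bboxes = mser.detectRegions(img)            # stable regions + bounding boxes
```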
Summary
Pros
Very robust to transformations
- Affine transformations
- Intensity changes
Quite fast
Cons
Does not cope with blur
Local feature detectors: Conclusion
- Harris-Stephens: can be very stable when combined with DoG
- Shi-Tomasi: assumes an affine transformation (avoid it when there is perspective)
- DoG: slow but very robust (perspective, blur, illumination)
- DoH: faster than DoG, misses small elements, better with perspective
- FAST: very fast, robust to perspective change (like DoG), but blur quickly kills it
- MSER: fast, very stable, good choice when no blur
Introduction
Given some keypoints in image 1, what are the most similar ones in image 2?
This is a nearest neighbor problem in descriptor space
This is also a geometrical problem in coordinate space
Matching
Matching problem
Goal: given 2 sets of descriptors, find the best matching pairs
Need a distance norm: it depends on the descriptor
- Distribution (histogram)? Stats?
- Data type?
- Float, integers: Euclidean, cosine
- Binary: Hamming
1-way matching
For each $x_i$ in the set of descriptors $D_1$, find the closest element $y_i$ in $D_2$
We have a match $m(x_i, y_i)$ for each $x_i$
Symmetry test aka cross check aka 2-way matching
For each $x_i$ in the set of descriptors $D_1$, find the closest element $y_i$ in $D_2$ such that $x_i$ is also the closest element to $y_i$ (see the sketch below)
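A sketch with OpenCV's brute-force matcher, assuming binary descriptors `des1` and `des2` (e.g. ORB) were computed beforehand:

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # 2-way matching built in
matches = bf.match(des1, des2)   # only symmetric matches are returned
```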
Ratio test
For each $x_i$ in $D_1$, find the 2 closest elements $y_i$ and $y_j$ in $D_2$; keep the match only if the ratio $d(x_i, y_i)/d(x_i, y_j)$ is below a threshold
Calibrate the ratio
Adjust it on a training set!
For each correct/incorrect match in your annotated database, plot the PDF of the ratio of the closest to the next-closest distance.
What is a good ratio in D. Lowe's experiment? (His paper suggests rejecting matches with a ratio above 0.8; see the sketch below.)
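A sketch of the ratio test with `knnMatch` (crossCheck only works with $k=1$, so it stays off here); the 0.8 threshold follows Lowe:

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = bf.knnMatch(des1, des2, k=2)   # 2 closest elements per descriptor
good = [m for m, n in knn if m.distance < 0.8 * n.distance]  # ratio test
```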
Geometric validation
Summary
Indexing
Indexing pipeline
Use case: we have a database of images and we want to find an object in it
Bruteforce matching aka linear matching
Simply scan all data and keep the closest elements
Does not scale to large databases, but can be faster on small ones
kD-Trees
binary tree in which every leaf node is a k-dimensional point
FLANN - Efficient indexing
- Original version: hierarchical k-means
- Construction: repeated k-means on the data (then inside each cluster) until the minimum cluster size is reached
- Lookup: traverse the tree in a best-bin-first manner with a backtracking queue; backtrack until enough points are returned (see the sketch below)
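A sketch using OpenCV's FLANN wrapper with a KD-tree index (one possible index; the hierarchical k-means index described above is another FLANN option). It assumes float32 descriptors `des1` and `des2` (e.g. SIFT); parameter values are illustrative:

```python
import cv2

index_params = dict(algorithm=1, trees=5)   # 1 = FLANN_INDEX_KDTREE
search_params = dict(checks=50)             # backtracking budget
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
```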
Locality Sensitive Hashing (LSH)
Hash items using a family of hash functions which project similar items into the same bucket with high probability
Not cryptographic hashing!
Depending on which side of the separating hyperplane the point lies, the bit is set to 1 or 0
A binary approximation of the $\cos$ distance
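A NumPy sketch of this random-hyperplane signature (the numbers of bits and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.standard_normal((16, 128))   # 16 random hyperplanes for 128-D vectors

def lsh_signature(x):
    # One bit per hyperplane: 1 or 0 depending on which side x lies
    return (planes @ x > 0).astype(np.uint8)

# Hamming distance between signatures approximates the cosine distance
sig = lsh_signature(rng.standard_normal(128))
```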
Locality Sensitive Hashing (LSH)
- Fast and efficient with large spaces and lots of data
- Returns a "good match", maybe not the best one
- kNN can be costly
Which indexing?
Experiment
Advice for the practice session:
Projective transformations
A linear transformation of pixel coordinates
Image Mappings Overview
Math. foundations & assumptions
- For planar surfaces, the 3D-to-2D perspective projection reduces to a 2D-to-2D transformation
- This is just a change of coordinate system
- This transformation is invertible
Translation
Scale
Rotation
Notation: Partitioned matrices
Affine
Projective
More on projective transform
- Each point in 2D is actually a vector in 3D
- Equivalent up to scaling factor
- Have to normalize to get back to 2D
- Using a homography to project a point (sketched below)
- Multiply $\tilde x$ by $\tilde H$, then normalize
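A minimal sketch of this projection step:

```python
import numpy as np

def apply_homography(H, x, y):
    """Multiply the homogeneous point by H, then normalize back to 2D."""
    p = H @ np.array([x, y, 1.0])    # point as a 3D homogeneous vector
    return p[0] / p[2], p[1] / p[2]  # divide by the scaling factor
```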
Summary
Homography estimation, Geometric validation
We want to recover H from keypoint matches
Recover the parameters of a perspective transform
How many correspondences are needed?
Depends on the type of transform:
- How many for translation?
- For rotation?
- …
- For general projective transform?
Reminder: we have 2 knowns for each match (its $x$ and $y$ coordinates), i.e. 2 equations
Linear system
Use linear least squares to solve $Ma=b$
Solve the system (see the sketch below)
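A sketch of building $M$ and $b$ and solving by least squares, fixing $h_{33}=1$ so each match contributes 2 equations (4 matches give the 8 needed); the function name is illustrative:

```python
import numpy as np

def estimate_homography(src, dst):
    """Least-squares homography from >= 4 point matches (h33 fixed to 1)."""
    M, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), linearized:
        M.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        M.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    a, *_ = np.linalg.lstsq(np.asarray(M, float), np.asarray(b, float), rcond=None)
    return np.append(a, 1.0).reshape(3, 3)
```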
How reliable is the estimate?
Even worse
Is our data perfect?
We have quite a lot of noise
Overcoming Least Squares Limitations
- We need a robust estimation
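The usual robust estimator for this problem is RANSAC (not named on this slide, so treat that as an assumption); OpenCV combines it with homography fitting. `pts1` and `pts2` are assumed to be Nx2 float32 arrays of matched keypoint coordinates:

```python
import cv2
import numpy as np

# pts1, pts2: matched keypoint coordinates from the two images
H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransacReprojThreshold=3.0)
# mask flags the inlier matches that survived the robust fit
```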