05 – Matching, Indexing & Search

SIFT keypoints, feature matching, bag of visual words, homography, and scale-space analysis

1. SIFT Keypoint Detection (interest points)

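SIFT (Scale-Invariant Feature Transform) detects keypoints at multiple scales by finding local extrema in a Difference-of-Gaussians (DoG) pyramid. The demo below does not run SIFT itself; it approximates interest-point detection with a Harris corner detector implemented in JavaScript. The detector computes image gradients, accumulates them into the structure tensor M, and evaluates the Harris response R = det(M) − k·trace(M)². A large positive R indicates a corner, a negative R an edge, and R near zero a flat region.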
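The response formula can be sketched in a few lines. This is a minimal illustration, not the page's actual implementation: the function name `harrisResponse`, the central-difference gradients, and the square box window of half-width `win` are all assumptions made for the example.

```javascript
// Harris response for a grayscale image stored row-major in a Float32Array.
// Assumptions: central-difference gradients, unweighted box window (half-width win).
function harrisResponse(gray, width, height, k = 0.04, win = 1) {
  const ix = new Float32Array(width * height);
  const iy = new Float32Array(width * height);
  // Image gradients; border pixels are left at zero.
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      ix[i] = (gray[i + 1] - gray[i - 1]) / 2;
      iy[i] = (gray[i + width] - gray[i - width]) / 2;
    }
  }
  const response = new Float32Array(width * height);
  for (let y = win; y < height - win; y++) {
    for (let x = win; x < width - win; x++) {
      // Structure tensor M = [[sxx, sxy], [sxy, syy]] summed over the window.
      let sxx = 0, syy = 0, sxy = 0;
      for (let dy = -win; dy <= win; dy++) {
        for (let dx = -win; dx <= win; dx++) {
          const j = (y + dy) * width + (x + dx);
          sxx += ix[j] * ix[j];
          syy += iy[j] * iy[j];
          sxy += ix[j] * iy[j];
        }
      }
      const det = sxx * syy - sxy * sxy;
      const trace = sxx + syy;
      response[y * width + x] = det - k * trace * trace; // R = det(M) - k*trace(M)^2
    }
  }
  return response;
}
```

A Gaussian-weighted window is the more common choice in practice; the box window is used here only to keep the sketch short.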

Harris k: 0.04

Threshold %: 10%

NMS radius: 8
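The threshold and NMS radius suggest a post-processing step that keeps only strong, locally maximal responses. The sketch below shows one way this could work. It assumes the threshold is a fraction of the largest response and that suppression uses a square window of half-width `radius`; the page may define either differently, and `selectCorners` is a hypothetical name.

```javascript
// Keep pixels whose response exceeds a fraction of the global maximum and
// is the largest value inside a (2*radius+1)^2 neighbourhood.
// Assumptions: relative threshold, square NMS window, ties go to the earlier pixel.
function selectCorners(response, width, height, thresholdFrac = 0.1, radius = 8) {
  let maxR = 0;
  for (const r of response) if (r > maxR) maxR = r;
  const minR = maxR * thresholdFrac;
  const corners = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = y * width + x;
      const r = response[i];
      if (r <= 0 || r < minR) continue; // below threshold
      let isMax = true;
      for (let dy = -radius; dy <= radius && isMax; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          const nx = x + dx, ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
          const j = ny * width + nx;
          // Suppress if a neighbour is stronger, or equal and earlier in scan order.
          if (response[j] > r || (response[j] === r && j < i)) {
            isMax = false;
            break;
          }
        }
      }
      if (isMax) corners.push({ x, y, response: r });
    }
  }
  return corners;
}
```

A larger radius spreads the surviving corners more evenly across the image at the cost of dropping nearby secondary corners.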

2. Feature Matching (patch descriptors)

Feature matching pairs keypoints between images using descriptor similarity. Here we extract small grayscale patches around detected corners and match them via normalised cross-correlation (NCC). Lines connect the best matches; green = strong match, yellow = weaker.
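As a rough illustration of the matching step, NCC between two flattened patches can be computed as below. The names `ncc` and `matchPatches`, the greedy best-match strategy, and the convention of returning 0 for constant patches are assumptions for this sketch rather than details taken from the page.

```javascript
// Normalised cross-correlation of two equally sized patches (flat arrays).
// Result lies in [-1, 1]; constant patches return 0 (an assumed convention).
function ncc(a, b) {
  const n = a.length;
  let meanA = 0, meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;
  let num = 0, varA = 0, varB = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    num += da * db;
    varA += da * da;
    varB += db * db;
  }
  const denom = Math.sqrt(varA * varB);
  return denom > 0 ? num / denom : 0;
}

// For each patch in A, pick the patch in B with the highest NCC score.
function matchPatches(patchesA, patchesB) {
  return patchesA.map((pa, i) => {
    let best = -1, bestScore = -Infinity;
    patchesB.forEach((pb, j) => {
      const s = ncc(pa, pb);
      if (s > bestScore) {
        bestScore = s;
        best = j;
      }
    });
    return { a: i, b: best, score: bestScore };
  });
}
```

Because the mean is subtracted and the result is divided by both standard deviations, the score is unchanged by a gain or offset in brightness, which is why NCC tolerates moderate exposure differences between the two images.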

Max matches: 25

Patch size: 11

3. Bag of Visual Words (image retrieval)

Bag of Visual Words represents each image as a histogram over a vocabulary of "visual words" (cluster centres from k-means on local descriptors). Similar images produce similar histograms. Here we use colour-patch features as a proxy for SIFT descriptors, then cluster with k-means.
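The pipeline described above (cluster descriptors, assign each descriptor to its nearest centre, count) might be sketched as follows. All function names are hypothetical. The deterministic seeding with the first k descriptors is chosen only to keep the example reproducible, and histogram intersection is one of several possible similarity measures.

```javascript
function sqDist(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the nearest cluster centre (the "visual word") for one descriptor.
function nearest(desc, centres) {
  let best = 0, bestD = Infinity;
  centres.forEach((c, j) => {
    const d = sqDist(desc, c);
    if (d < bestD) {
      bestD = d;
      best = j;
    }
  });
  return best;
}

// Lloyd's k-means. Assumption: seeded with the first k descriptors;
// random or k-means++ seeding is more usual in practice.
function kmeans(descriptors, k, iterations = 10) {
  let centres = descriptors.slice(0, k).map((d) => d.slice());
  for (let it = 0; it < iterations; it++) {
    const sums = centres.map((c) => new Array(c.length).fill(0));
    const counts = new Array(k).fill(0);
    for (const d of descriptors) {
      const j = nearest(d, centres);
      counts[j]++;
      for (let i = 0; i < d.length; i++) sums[j][i] += d[i];
    }
    // Empty clusters keep their previous centre.
    centres = centres.map((c, j) => (counts[j] ? sums[j].map((s) => s / counts[j]) : c));
  }
  return centres;
}

// L1-normalised histogram of visual-word counts for one image.
function bowHistogram(descriptors, centres) {
  const hist = new Array(centres.length).fill(0);
  for (const d of descriptors) hist[nearest(d, centres)]++;
  return hist.map((h) => h / descriptors.length);
}

// Histogram intersection: 1 for identical normalised histograms, 0 for disjoint ones.
function histogramIntersection(h1, h2) {
  return h1.reduce((acc, v, i) => acc + Math.min(v, h2[i]), 0);
}
```

For retrieval, the vocabulary would be built once from descriptors pooled over all images, and a query image's histogram would then be compared against each stored histogram.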

K (visual words): 8

Patch grid: 8×8

4. Homography Warping (projective transform)

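A homography is a 3×3 projective transformation that maps points on one plane to points on another. The matrix is defined only up to scale, so it has 8 degrees of freedom. Each point correspondence contributes two equations, so 4 correspondences (with no three points collinear) are enough to solve for it. Click 4 points on each image (in matching order) to define the mapping, or use presets.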
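One way to solve for the matrix from 4 correspondences is to fix the bottom-right entry to 1 and solve the resulting 8×8 linear system. The sketch below does this with Gaussian elimination. The function names are hypothetical, and the normalisation h33 = 1 is an assumption that fails for the rare mappings whose true h33 is zero; an SVD-based solution avoids that limitation.

```javascript
// Solve A x = b by Gaussian elimination with partial pivoting (A is n x n).
function solveLinear(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("Degenerate point configuration");
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  const x = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * x[c];
    x[r] = s / M[r][r];
  }
  return x;
}

// src and dst are arrays of four [x, y] pairs in matching order.
function homographyFrom4(src, dst) {
  const A = [];
  const b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = src[i];
    const [u, v] = dst[i];
    // u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  const h = solveLinear(A, b);
  return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

// Map a point through H, including the projective divide.
function applyHomography(H, [x, y]) {
  const w = H[2][0] * x + H[2][1] * y + H[2][2];
  return [
    (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
    (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
  ];
}
```

To warp an image, it is usual to iterate over destination pixels and map each one back through the inverse homography, which can be obtained here by swapping the `src` and `dst` arguments.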

Click 4 points on the left image, then 4 on the right image.

Panels: source image, destination image, warped result.

5. Laplacian of Gaussian (LoG) (scale space)

The Laplacian of Gaussian (LoG) smooths a signal with a Gaussian of width σ and then takes its second derivative. It is used for blob detection and edge detection in scale space. For a 1D signal, edges appear as zero-crossings of the LoG response.

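LoG(x) = −1/(π·σ⁴) · (1 − x²/σ²) · exp(−x² / (2σ²))

The prefactor −1/(π·σ⁴) is the one usually quoted for the 2D LoG; the exactly normalised second derivative of a 1D Gaussian has prefactor −1/(√(2π)·σ³). The two differ only by a constant scale, so the kernel's zero-crossings at x = ±σ are unaffected.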
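The kernel formula above can be sampled and convolved with a 1D signal as in the following sketch. The function names, the truncation at ±4σ, the grid spacing `dx`, and the edge-replication border policy are assumptions made for illustration and may not match the page's implementation.

```javascript
// LoG kernel value, using the formula exactly as written above.
function logValue(x, sigma) {
  const s2 = sigma * sigma;
  return (-1 / (Math.PI * s2 * s2)) * (1 - (x * x) / s2) * Math.exp(-(x * x) / (2 * s2));
}

// Sample the kernel on a grid of spacing dx, truncated at about +/- 4 sigma (assumed).
function logKernel(sigma, dx) {
  const half = Math.ceil((4 * sigma) / dx);
  const kernel = [];
  for (let i = -half; i <= half; i++) kernel.push(logValue(i * dx, sigma) * dx);
  return kernel;
}

// Same-size convolution; samples outside the signal replicate the nearest edge value.
function convolve1d(signal, kernel) {
  const half = (kernel.length - 1) / 2;
  return signal.map((_, i) => {
    let acc = 0;
    for (let j = 0; j < kernel.length; j++) {
      const idx = Math.min(signal.length - 1, Math.max(0, i + half - j));
      acc += signal[idx] * kernel[j];
    }
    return acc;
  });
}

// Indices i where the response changes sign between samples i and i + 1.
function zeroCrossings(response) {
  const out = [];
  for (let i = 0; i + 1 < response.length; i++) {
    if (response[i] * response[i + 1] < 0) out.push(i);
  }
  return out;
}
```

For a step edge, the response is positive on one side of the step and negative on the other, so the sign change marks the edge position. Repeating the convolution for several σ values gives a stack of multi-scale responses.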

σ (interactive): 0.100

Plots: input signal, LoG kernel, convolution result, multi-scale LoG responses (8 σ values).

Made with ❤️ by Mark Žnidar