
Lecture 10 – Object Detection

From HOG features to region proposals and non-maximum suppression

1. HOG (Histogram of Oriented Gradients)

HOG captures local edge and gradient structure: gradient orientations are computed per pixel, accumulated into magnitude-weighted histograms over small cells, and normalized over blocks of neighboring cells. The resulting descriptor is robust to small deformations and illumination changes, which makes it well suited to pedestrian detection.
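The cell-histogram and block-normalization steps can be sketched in a few lines of NumPy. This is a simplified illustration, not a reference implementation. The function name `hog_descriptor` and the parameter values (8×8-pixel cells, 9 unsigned orientation bins, 2×2-cell blocks with L2 normalization) are assumptions chosen to mirror common HOG settings. Each pixel votes into a single bin, without the interpolation a full implementation would normally use.

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified HOG for a 2-D grayscale image (hard binning, no interpolation)."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                      # per-pixel gradients
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((orientation / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            ys = slice(cy * cell, (cy + 1) * cell)
            xs = slice(cx * cell, (cx + 1) * cell)
            # Magnitude-weighted orientation histogram for this cell
            hist[cy, cx] = np.bincount(
                bin_idx[ys, xs].ravel(),
                weights=magnitude[ys, xs].ravel(),
                minlength=bins,
            )

    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            features.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # L2 block norm
    return np.concatenate(features)
```

With these assumed settings, a 64×128 window (128 rows, 64 columns) gives 7 × 15 blocks of 36 values each, so the descriptor has 3780 entries. Because each block is normalized, multiplying the image by a constant leaves the descriptor essentially unchanged.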

2. HOG Person Detection

Pipeline: A fixed-size detection window (e.g., 64×128 pixels) slides across the image at multiple scales. At each position, HOG features are extracted and fed to a trained linear SVM. Positions scoring above the threshold are reported as detections. Overlapping detections are merged via NMS.
[Figure: sliding window concept. Highlighted windows mark detections (SVM score > threshold).]
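The sliding-window pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: `sliding_window_detect`, `resize_nearest`, the stride of 8 pixels, and the scale list are hypothetical choices, and the feature extractor is passed in as a callable `extract` so any descriptor can be plugged in. The linear SVM is reduced to its decision function, `weights · features + bias`, with already-trained parameters assumed.

```python
import numpy as np

def resize_nearest(image, scale):
    """Nearest-neighbor rescaling, used here only to keep the sketch dependency-free."""
    h, w = image.shape
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def sliding_window_detect(image, extract, weights, bias,
                          window=(128, 64), stride=8,
                          scales=(1.0, 0.75, 0.5), threshold=0.0):
    """Return (x1, y1, x2, y2, score) tuples in original-image coordinates.

    window is (height, width), i.e. a 64x128-pixel detection window.
    """
    win_h, win_w = window
    detections = []
    for scale in scales:
        scaled = resize_nearest(image, scale)
        height, width = scaled.shape
        for y in range(0, height - win_h + 1, stride):
            for x in range(0, width - win_w + 1, stride):
                patch = scaled[y:y + win_h, x:x + win_w]
                # Linear SVM decision value for this window
                score = float(np.dot(weights, extract(patch)) + bias)
                if score > threshold:
                    # Map the window back to the original image scale
                    detections.append((x / scale, y / scale,
                                       (x + win_w) / scale, (y + win_h) / scale,
                                       score))
    return detections
```

Shrinking the image while keeping the window size fixed is what lets one window size find people of different sizes. The raw output usually contains many overlapping boxes per person, which is why NMS follows.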

3. Selective Search (Region Proposals)

Selective Search starts with an over-segmentation of the image and iteratively merges the most similar neighboring regions, where similarity combines color, texture, size, and fill (how tightly the two regions fit inside a shared bounding box). The bounding boxes of all intermediate merged regions form the proposal set, typically 1000-2000 proposals covering objects at various scales.
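The greedy merging loop can be sketched as below. This is a deliberately reduced illustration, not the full algorithm: it assumes the over-segmentation has already been computed and is given as a dictionary of regions (each with a hypothetical `bbox`, pixel `size`, and normalized color histogram `hist`) plus a list of neighboring region pairs. The texture term is omitted, and the names `merge_regions` and `similarity` are illustrative.

```python
import numpy as np

def bbox_union(a, b):
    """Smallest (x1, y1, x2, y2) box containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def similarity(r, s, image_area):
    """Color + size + fill similarity (texture term left out of this sketch)."""
    color = np.minimum(r["hist"], s["hist"]).sum()          # histogram intersection
    size = 1.0 - (r["size"] + s["size"]) / image_area       # favors merging small regions
    bb = bbox_union(r["bbox"], s["bbox"])
    bb_area = (bb[2] - bb[0]) * (bb[3] - bb[1])
    fill = 1.0 - (bb_area - r["size"] - s["size"]) / image_area  # penalizes gaps
    return color + size + fill

def merge_regions(regions, neighbors, image_area):
    """Greedily merge the most similar neighbors; return all boxes seen."""
    regions = dict(regions)                                  # id -> region
    neighbors = {tuple(sorted(p)) for p in neighbors}
    proposals = [r["bbox"] for r in regions.values()]
    next_id = max(regions) + 1
    while neighbors:
        i, j = max(neighbors,
                   key=lambda p: similarity(regions[p[0]], regions[p[1]], image_area))
        a, b = regions.pop(i), regions.pop(j)
        total = a["size"] + b["size"]
        merged = {
            "bbox": bbox_union(a["bbox"], b["bbox"]),
            "size": total,
            "hist": (a["size"] * a["hist"] + b["size"] * b["hist"]) / total,
        }
        regions[next_id] = merged
        proposals.append(merged["bbox"])
        # Re-point neighbor pairs of the two old regions at the merged region
        updated = set()
        for p in neighbors:
            if p == (i, j):
                continue
            q = tuple(sorted(next_id if k in (i, j) else k for k in p))
            if q[0] != q[1]:
                updated.add(q)
        neighbors = updated
        next_id += 1
    return proposals
```

Every merge contributes one new box, so the proposals span the whole hierarchy from small initial segments up to the full image. This sketch also includes the boxes of the initial regions and does not remove duplicate boxes.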

4. Classified Proposals (R-CNN)

R-CNN Pipeline: (1) Generate ~2000 region proposals via selective search. (2) Warp each proposal to a fixed size (e.g., 227×227). (3) Extract CNN features (e.g., AlexNet fc7). (4) Classify each proposal with class-specific SVMs. (5) Apply bounding box regression for refinement. R-CNN was among the first methods to combine deep features with region proposals for detection.
[Figure: classified proposals, labeled "person" or "background objects".]
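Step (5), bounding box regression, can be illustrated with the center/size parameterization commonly associated with R-CNN: the regressor predicts offsets `(dx, dy, dw, dh)` that shift the proposal's center in units of its width and height and rescale its size in log space. The sketch below only applies given offsets. The function name `apply_box_deltas` is hypothetical, and the regressor that would produce the offsets is assumed to exist elsewhere.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine an (x1, y1, x2, y2) proposal with predicted (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = proposal
    pw, ph = x2 - x1, y2 - y1                 # proposal width and height
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph     # proposal center
    dx, dy, dw, dh = deltas

    gx = pw * dx + px                         # center shift scales with box size
    gy = ph * dy + py
    gw = pw * math.exp(dw)                    # log-space scaling keeps sizes positive
    gh = ph * math.exp(dh)
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)
```

For example, offsets of `(0.1, 0, 0, 0)` move a 10-pixel-wide box one pixel to the right, and all-zero offsets return the proposal unchanged.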

5. Non-Maximum Suppression (NMS)

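NMS removes redundant detections: (1) Sort all boxes by confidence. (2) Pick the top-scoring box and remove all remaining boxes whose IoU with it exceeds the threshold. (3) Repeat with the remaining boxes until none are left. The aim is one detection per object, although a threshold set too low can suppress true detections of nearby objects and one set too high leaves duplicates.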
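The three steps above can be sketched with NumPy as follows. This is a minimal greedy version, assuming boxes in `(x1, y1, x2, y2)` form with positive area and a default IoU threshold of 0.5. The function name `nms` and its signature are illustrative, not taken from a specific library.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. Returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)               # step 1: sort by confidence, descending
    keep = []
    while order.size > 0:
        i = order[0]                          # step 2: take the top-scoring box
        keep.append(int(i))
        rest = order[1:]
        # IoU between box i and every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = np.maximum(areas[i] + areas[rest] - inter, 1e-12)
        iou = inter / union
        order = rest[iou <= iou_threshold]    # step 3: drop overlaps, then repeat
    return keep
```

In multi-class detection this routine is typically run separately per class, so that a box for one class does not suppress an overlapping box for another.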

6. IoU (Intersection over Union)

IoU measures how much two bounding boxes overlap. It's the standard metric for evaluating detection accuracy (mAP uses IoU thresholds like 0.5 or 0.75) and for NMS suppression decisions.
IoU = Area(A ∩ B) / Area(A ∪ B)
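For axis-aligned boxes the formula reduces to a few min/max operations. The sketch below assumes boxes given as `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`. The function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Area(A ∪ B) = Area(A) + Area(B) - Area(A ∩ B)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1 / (4 + 4 − 1) = 1/7 ≈ 0.14, well below the 0.5 threshold often used to count a detection as correct.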

Made with ❤️ by Mark Žnidar