
06 – Classification

CIFAR-10, color features, k-NN classification, and softmax temperature

1

CIFAR-10 Dataset

CIFAR-10 is a benchmark dataset of 60,000 tiny 32×32 colour images in 10 classes, with 6,000 images per class. It is split into 50,000 training and 10,000 test images. The small image size makes it a popular sandbox for testing classification algorithms.

Key point: Each image is just 32×32×3 = 3,072 numbers. A classifier must map this high-dimensional input to one of 10 labels. Simple color-based features already carry some signal — we explore this below.
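To make the 3,072 figure concrete, here is a minimal sketch that flattens an image-shaped array into a feature vector. The array is random stand-in data, not a real CIFAR-10 image, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for one CIFAR-10 image: random pixel values, not real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten height x width x channels into a single feature vector.
x = image.reshape(-1)
print(x.shape)  # (3072,)
```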
2

Color Feature Classification & k-NN

A simple idea: count blue vs green pixels in each image. Ships tend to have more blue (sky/water), while deer tend to have more green (grass/forest). We plot these as 2D features and classify with k-NN. Click anywhere on the plot to classify a new query point.
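One possible way to count blue and green pixels is sketched below. The text does not say how a pixel is judged "blue" or "green", so the rule used here (one channel exceeding both others by a fixed `margin`) is an assumption, and `color_counts` is a hypothetical helper name.

```python
import numpy as np

def color_counts(image, margin=20):
    """Return (blue_count, green_count) for an HxWx3 uint8 RGB image.

    Assumption: a pixel counts as "blue" when its blue channel exceeds both
    other channels by at least `margin` (likewise for "green").
    """
    rgb = image.astype(np.int32)  # avoid uint8 wrap-around when subtracting
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    blue = (b - r >= margin) & (b - g >= margin)
    green = (g - r >= margin) & (g - b >= margin)
    return int(blue.sum()), int(green.sum())

# Toy image: top half pure blue, bottom half pure green.
toy = np.zeros((32, 32, 3), dtype=np.uint8)
toy[:16, :, 2] = 255
toy[16:, :, 1] = 255
print(color_counts(toy))  # (512, 512)
```

Each image then becomes a single 2D point (blue count, green count), which is the kind of feature pair a k-NN classifier can work with.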

[Interactive plot: 2D scatter of blue-pixel vs. green-pixel counts. Legend: Ship (blue pixels), Deer (green pixels), Query point.]
Try it: Click near the overlap region where ship and deer clusters meet. Then change k to see how the decision flips. Larger k smooths the boundary but may lose detail.
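The flip can be reproduced in a few lines. This is a minimal k-NN sketch with Euclidean distance and majority voting; the training points are invented for illustration (label 0 = ship, 1 = deer), and `knn_predict` is a hypothetical helper, not the page's own implementation.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points nearest to `query`."""
    dists = np.linalg.norm(train_x - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = np.bincount(train_y[nearest], minlength=2)
    return int(np.argmax(votes))                      # ties go to the lower label

# Invented (blue count, green count) features: label 0 = ship, 1 = deer.
train_x = np.array([[80, 10], [70, 20], [75, 15],     # ships
                    [15, 70], [20, 80], [10, 75],     # deer
                    [50, 45]], dtype=float)           # a ship in the overlap
train_y = np.array([0, 0, 0, 1, 1, 1, 0])

query = np.array([42.0, 52.0])
print(knn_predict(train_x, train_y, query, k=1))  # 0 (ship): nearest point is the lone ship
print(knn_predict(train_x, train_y, query, k=5))  # 1 (deer): three of five neighbours are deer
```

With k = 1 the single nearby ship decides the outcome; with k = 5 the surrounding deer points outvote it.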
3

k-NN Accuracy vs k

How does the choice of k affect classification accuracy? We generate synthetic train/test data and evaluate k-NN for odd values of k from 1 to 25. The stem plot shows accuracy at each k.

Observation: Very small k (1 or 3) may overfit to noise; very large k washes out local structure. Typically there is a sweet spot in between. Try reducing cluster separation to make the problem harder.
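A rough version of this experiment is sketched below. The data generator (two unit-variance Gaussian clusters), the sample sizes, and the `separation` value are all assumptions chosen for illustration; the page's own generator may differ.

```python
import numpy as np

def make_clusters(n_per_class, separation, rng):
    """Two 2D Gaussian clusters (unit variance) with centres `separation` apart."""
    offset = separation / 2.0
    a = rng.normal(loc=[-offset, 0.0], scale=1.0, size=(n_per_class, 2))
    b = rng.normal(loc=[+offset, 0.0], scale=1.0, size=(n_per_class, 2))
    x = np.vstack([a, b])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y

def knn_accuracy(train_x, train_y, test_x, test_y, k):
    # Pairwise Euclidean distances, shape (n_test, n_train).
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # With binary labels and odd k, "mean neighbour label > 0.5" is a majority vote.
    pred = (train_y[nearest].mean(axis=1) > 0.5).astype(int)
    return float((pred == test_y).mean())

rng = np.random.default_rng(0)
train_x, train_y = make_clusters(100, separation=2.0, rng=rng)
test_x, test_y = make_clusters(50, separation=2.0, rng=rng)

# Odd k from 1 to 25, as in the text.
results = {k: knn_accuracy(train_x, train_y, test_x, test_y, k)
           for k in range(1, 26, 2)}
for k, acc in results.items():
    print(f"k={k:2d}  accuracy={acc:.2f}")
```

Lowering `separation` makes the clusters overlap more, which should pull accuracy down across all k.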
4

Softmax & Temperature Scaling

The softmax function converts a vector of raw scores (logits) into a probability distribution. The temperature τ controls the “sharpness” of the output.

softmax(x)_i = exp(x_i / τ) / Σ_j exp(x_j / τ)
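The formula translates almost directly into NumPy. In this sketch, subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged; the example logits are arbitrary.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax; requires tau > 0."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()            # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]       # example scores, chosen for illustration
for tau in (0.25, 1.0, 4.0):
    print(tau, np.round(softmax(logits, tau), 3))
```

Lower τ concentrates probability on the largest logit, while higher τ spreads it more evenly across the classes.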
[Interactive figure with a temperature slider (cold: τ < 0.5, standard: τ ≈ 1, hot: τ > 2) and three panels: input logits, softmax output (probabilities), and entropy vs temperature.]
τ → 0 (cold): Output approaches a one-hot vector (argmax). The model is “confident.”
τ = 1: Standard softmax — the default.
τ → ∞ (hot): Output approaches a uniform distribution. The model is “uncertain.”
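These limits can be checked numerically by tracking the Shannon entropy of the output as τ varies. The sketch below uses arbitrary example logits and measures entropy in nats; for three classes the maximum possible entropy is ln 3.

```python
import numpy as np

def softmax(logits, tau):
    z = np.asarray(logits, dtype=float) / tau
    e = np.exp(z - z.max())    # max-shift for numerical stability
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats; terms with p = 0 contribute 0."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

logits = [2.0, 1.0, 0.1]       # example scores, chosen for illustration
taus = [0.05, 0.5, 1.0, 2.0, 100.0]
entropies = [entropy(softmax(logits, t)) for t in taus]
for t, h in zip(taus, entropies):
    print(f"tau={t:6.2f}  entropy={h:.3f}  (max possible: {np.log(3):.3f})")
```

Entropy is close to 0 at the coldest setting (near one-hot output) and approaches ln 3 at the hottest (near uniform output).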

Made with ❤️ by Mark Žnidar