Understanding what neural networks learn through filter visualization, embeddings, saliency maps, and input maximization.
First-layer convolution filters in CNNs learn basic visual features: edges at various orientations, color blobs, and Gabor-like texture detectors. Below are 64 synthetic 7×7 filters replicating well-known first-layer patterns.
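As a rough illustration of how such Gabor-like patterns can be synthesized, the sketch below builds a bank of 64 oriented 7×7 kernels with NumPy. The function name `gabor_filter` and the wavelength, sigma, and phase values are assumptions chosen for illustration, not the parameters used to draw the filters on this page. The bank covers only the oriented-edge patterns, not color blobs.

```python
import numpy as np

def gabor_filter(size=7, theta=0.0, wavelength=4.0, sigma=2.0, phase=0.0):
    """Return a size x size Gabor kernel: a sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoid varies along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + phase)
    kernel = envelope * carrier
    kernel -= kernel.mean()                 # zero mean: no response to flat regions
    return kernel / np.linalg.norm(kernel)  # unit norm for comparable responses

# 8 orientations x 8 phases = 64 filters (illustrative parameter grid).
thetas = np.linspace(0.0, np.pi, 8, endpoint=False)
phases = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
bank = np.stack([gabor_filter(theta=t, phase=p) for t in thetas for p in phases])
print(bank.shape)
```

Varying `theta` rotates the detected edge orientation, while `phase` switches between even (bar-like) and odd (edge-like) profiles.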
Second-layer filters combine first-layer features into more complex detectors for corners, curves, and oriented textures. Each group shows the filters for one input channel.
2D PCA projection of class embedding vectors for 30 ImageNet classes. Semantically similar classes cluster together. Scroll to zoom, drag to pan.
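A 2D PCA projection of this kind can be sketched with a singular value decomposition. The embedding matrix below is random placeholder data with an assumed dimensionality of 512; the real class embeddings are not reproduced here, and `pca_project` is a hypothetical helper name.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project rows of `embeddings` onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)     # fraction of variance per component
    return coords, explained[:n_components]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 512))     # placeholder for 30 class vectors
coords, explained = pca_project(embeddings)
print(coords.shape, explained)
```

Each row of `coords` gives the (x, y) position of one class in the scatter plot.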
Compare PCA (linear) and t-SNE (non-linear) embeddings. PCA preserves global structure; t-SNE reveals local cluster separation. Adjust perplexity to see its effect on t-SNE clustering.
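To give a sense of what the perplexity setting controls, the sketch below covers only the neighbor-affinity step of t-SNE, not the full algorithm. For a single point, it searches for the Gaussian kernel width whose conditional neighbor distribution has the requested perplexity, which acts roughly as an effective number of neighbors. The function names and the random 2D points are illustrative assumptions.

```python
import numpy as np

def conditional_probs(sq_dists_i, beta):
    """P(j|i) for one point, with precision beta = 1 / (2 * sigma**2)."""
    logits = -beta * sq_dists_i
    logits -= logits.max()                  # stabilize the exponentials
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    """Perplexity 2**H(p), with the entropy H measured in bits."""
    nz = p[p > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

def find_beta(sq_dists_i, target_perplexity, n_iter=60):
    """Bisect on log(beta) until the distribution has the target perplexity."""
    lo, hi = 1e-10, 1e10
    for _ in range(n_iter):
        beta = np.sqrt(lo * hi)             # geometric midpoint
        if perplexity(conditional_probs(sq_dists_i, beta)) > target_perplexity:
            lo = beta                       # too many neighbors: narrow the kernel
        else:
            hi = beta
    return beta

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))
sq_dists = np.sum((points[1:] - points[0]) ** 2, axis=1)  # point 0 to the rest
beta = find_beta(sq_dists, target_perplexity=5.0)
achieved = perplexity(conditional_probs(sq_dists, beta))
print(beta, achieved)
```

A larger target perplexity yields a wider kernel, so each point attends to more neighbors and the resulting layout emphasizes broader structure.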
ResNet18 predictions on the input image. The model confidently predicts "Golden Retriever".
Slide a black patch across the image and observe how occlusion of different regions affects the model's prediction confidence. The heatmap shows which regions are most important for classification. Drag the patch or adjust its size.
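The occlusion procedure can be sketched as follows. To keep the example self-contained, a toy scoring function (`toy_score`, a hypothetical stand-in) replaces the real classifier; in practice `score_fn` would return the model's probability for the predicted class. The patch size and stride are arbitrary illustrative values.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8, stride=4):
    """Average drop in score when a black patch covers each pixel."""
    h, w = image.shape[:2]
    base = score_fn(image)
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = 0.0   # black patch
            drop = base - score_fn(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    return heat / np.maximum(counts, 1)     # average over overlapping patches

def toy_score(img):
    """Stand-in for a classifier: mean brightness of a central 'object' region."""
    return float(img[8:24, 8:24].mean())

image = np.ones((32, 32))
heat = occlusion_map(image, toy_score)
print(heat[16, 16], heat[0, 0])
```

Large values in `heat` mark regions whose occlusion lowers the score most, which is what the heatmap overlay displays.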
A saliency map highlights the pixels most relevant to the predicted class. It is computed as the absolute value of the gradient of the class score with respect to each input pixel. Bright regions mark pixels where a small change would most affect the class score.
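The gradient computation can be illustrated on a tiny two-layer ReLU network with random weights, where the gradient is easy to write by hand. This is a toy stand-in, not the ResNet18 used on this page, and all sizes and names are assumptions. Taking the maximum over color channels is one common way to reduce the gradient to a single value per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, HIDDEN, CLASSES = 8, 8, 3, 16, 5   # illustrative sizes
W1 = rng.normal(scale=0.1, size=(HIDDEN, H * W * C))
W2 = rng.normal(scale=0.1, size=(CLASSES, HIDDEN))

def class_scores(image):
    """Toy network: linear layer, ReLU, linear layer."""
    hidden = np.maximum(W1 @ image.ravel(), 0.0)
    return W2 @ hidden

def saliency_map(image, target):
    """Absolute gradient of the target class score, reduced over channels."""
    pre = W1 @ image.ravel()
    # Chain rule: d score / d x = W1^T (W2[target] * relu'(pre)).
    grad = W1.T @ (W2[target] * (pre > 0))
    grad = grad.reshape(image.shape)
    return np.abs(grad).max(axis=-1)        # one value per pixel

image = rng.uniform(size=(H, W, C))
target = int(np.argmax(class_scores(image)))
sal = saliency_map(image, target)
print(sal.shape)
```

With a real model, an automatic-differentiation framework would supply the gradient in place of the hand-written chain rule.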
Input maximization starts from random noise and iteratively adjusts pixels via gradient ascent to maximize a target neuron's activation. A total-variation (TV) regularizer penalizes differences between neighboring pixels, which suppresses high-frequency noise and yields more interpretable patterns.
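A minimal sketch of this loop is shown below, under simplifying assumptions. The "neuron" is a linear template match, so its gradient with respect to the input is just the template. The regularizer is a squared-difference variant of TV, chosen here because its gradient is simple. Pixel values are clipped to [-1, 1], and the step size, weight, and function names are illustrative.

```python
import numpy as np

def tv_penalty_and_grad(x):
    """Sum of squared neighbor differences and its gradient with respect to x."""
    dx = x[:, 1:] - x[:, :-1]
    dy = x[1:, :] - x[:-1, :]
    penalty = np.sum(dx ** 2) + np.sum(dy ** 2)
    grad = np.zeros_like(x)
    grad[:, 1:] += 2 * dx
    grad[:, :-1] -= 2 * dx
    grad[1:, :] += 2 * dy
    grad[:-1, :] -= 2 * dy
    return penalty, grad

def maximize_input(template, steps=200, lr=0.05, tv_weight=0.5, seed=0):
    """Gradient ascent on sum(template * x) - tv_weight * TV(x), from noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.1, size=template.shape)   # random starting image
    for _ in range(steps):
        _, tv_grad = tv_penalty_and_grad(x)
        # The activation gradient of the linear toy neuron is the template.
        x += lr * (template - tv_weight * tv_grad)
        x = np.clip(x, -1.0, 1.0)                    # keep pixels in range
    return x

template = np.ones((16, 16))
template[:, :8] = -1.0                               # toy vertical-edge detector
x_reg = maximize_input(template)
x_raw = maximize_input(template, tv_weight=0.0)
print(tv_penalty_and_grad(x_reg)[0], tv_penalty_and_grad(x_raw)[0])
```

Comparing the two printed penalties shows the effect of the regularizer: the regularized result still activates the toy neuron strongly but has a smoother transition across the edge.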
Made with ❤️ by Mark Žnidar