Sparse Pathways: Domain-Aware Neuron Routing
Paper: Sparse Pathways: Domain-Aware Neuron Routing for Efficient Inference
Summary
FFN neurons exhibit strong domain specialization: roughly 50% of late-layer neurons are domain-specialized, and the degree of specialization correlates with model scale at r = 0.999. The paper reports a potential 2-4x reduction in compute via negative neuron selection, that is, skipping neurons that hurt performance on a given domain.
Key Finding
Individual FFN neurons are not general-purpose. Many neurons activate strongly for specific domains (medical, legal, code) and weakly or negatively for others. By identifying and skipping domain-negative neurons, you can reduce compute while maintaining or improving accuracy.
All neurons active: 100% compute, baseline accuracy
Domain-aware routing: 25-50% compute, maintained accuracy
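The comparison above can be made concrete with a toy layer. The following is a minimal sketch in plain Python, assuming a two-layer ReLU FFN stored as nested lists. The names `ffn_forward` and `keep` are illustrative and do not come from the paper or from Model Garage, and the weights are arbitrary.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def ffn_forward(x, w_in, w_out, keep=None):
    """Toy two-layer ReLU FFN: y = sum_j relu(w_in[j] . x) * w_out[j].

    w_in[j] is the input weight vector of hidden neuron j and w_out[j] is its
    output weight vector. `keep` is an optional list of neuron indices;
    neurons not listed are skipped, so no multiply-adds are spent on them.
    Returns (output vector, number of neurons evaluated).
    """
    indices = range(len(w_in)) if keep is None else keep
    y = [0.0] * len(w_out[0])
    evaluated = 0
    for j in indices:
        h = relu(sum(w * xi for w, xi in zip(w_in[j], x)))
        evaluated += 1
        for k, w in enumerate(w_out[j]):
            y[k] += h * w
    return y, evaluated


# Hypothetical 4-neuron layer; the values are arbitrary.
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [0.5, 0.5]]
w_out = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.0, 0.5]]
x = [1.0, 2.0]

dense_y, dense_n = ffn_forward(x, w_in, w_out)
sparse_y, sparse_n = ffn_forward(x, w_in, w_out, keep=[0, 1, 3])  # skip neuron 2
compute_fraction = sparse_n / dense_n  # share of neurons actually evaluated
```

Skipping a neuron changes the output by exactly that neuron's contribution, so whether accuracy is maintained depends on which neurons are skipped. The sketch only shows where the compute saving comes from.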
Key Results
- ~50% of late-layer neurons show domain specialization
- r = 0.999 correlation between model scale and neuron specialization degree
- Larger models have more specialized neurons (more "experts" emerge naturally)
- 2-4x potential compute reduction by skipping domain-negative neurons
- Negative selection (removing harmful neurons) outperforms positive selection (keeping helpful ones)
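The 2-4x figure lines up with skipping 50-75% of FFN neurons if FFN cost is assumed to scale linearly with the number of neurons evaluated. A small sketch of that arithmetic follows; `ffn_speedup` is a hypothetical helper, and the linear-cost assumption ignores attention layers and any routing overhead.

```python
def ffn_speedup(skip_fraction):
    """Ideal FFN speedup when cost is proportional to neurons evaluated."""
    if not 0.0 <= skip_fraction < 1.0:
        raise ValueError("skip_fraction must be in [0, 1)")
    return 1.0 / (1.0 - skip_fraction)


for skipped in (0.5, 0.75):
    print(f"skip {skipped:.0%} of neurons -> {ffn_speedup(skipped):.0f}x fewer FFN FLOPs")
```

Because only the FFN portion shrinks, the end-to-end speedup of a full model would be smaller than this upper bound.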
Scale-Specialization Correlation
| Model Size | Specialization Degree |
|---|---|
| Small (125M) | Low |
| Medium (1.3B) | Moderate |
| Large (7B+) | High (~50% in late layers) |
The correlation is near-perfect (r = 0.999), suggesting specialization is an emergent property of scale.
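For reference, r is presumably the Pearson correlation coefficient; this summary does not name the statistic, so that is an assumption. With $s_i$ denoting the scale of model $i$ and $d_i$ its specialization degree, it would be computed as:

$$
r = \frac{\sum_i (s_i - \bar{s})(d_i - \bar{d})}{\sqrt{\sum_i (s_i - \bar{s})^2}\,\sqrt{\sum_i (d_i - \bar{d})^2}}
$$

Here $\bar{s}$ and $\bar{d}$ are the means over the models compared. Whether $s_i$ is the raw or the log parameter count is not stated in this summary.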
Negative Neuron Selection
The key insight: rather than finding neurons that help a domain (positive selection), it's more effective to find neurons that hurt a domain (negative selection) and skip them.
# Conceptual approach using Model Garage
from model_garage.snapshot.capture import SnapshotCapture
from model_garage.core.hooks import HookManager  # not used in this snippet

# Placeholders: `model`, `medical_inputs`, `general_inputs`, and `all_ffn_layers`
# must be defined by the caller before this runs.

# Capture neuron activations per domain
capture = SnapshotCapture(model)
medical_snapshots = capture.run(medical_inputs, layers=all_ffn_layers)
general_snapshots = capture.run(general_inputs, layers=all_ffn_layers)

# Identify domain-negative neurons
# (neurons whose activation correlates with worse performance on medical tasks)
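The identification step that the snippet leaves as a comment could look roughly like the sketch below. It is one reading of that comment, not the paper's confirmed procedure: it flags neurons whose activation is positively correlated with per-example loss. It does not use Model Garage, since the format of the captured snapshots is not described here. The names `find_negative_neurons` and `threshold`, and the toy numbers, are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def find_negative_neurons(activations, losses, threshold=0.5):
    """Return indices of neurons whose activation rises with per-example loss.

    activations[i][j]: mean activation of neuron j on domain example i.
    losses[i]: task loss on example i (higher means worse).
    threshold: hypothetical correlation cutoff; it would need tuning.
    """
    num_neurons = len(activations[0])
    negative = []
    for j in range(num_neurons):
        column = [row[j] for row in activations]
        if pearson(column, losses) > threshold:
            negative.append(j)
    return negative


# Toy data: 4 domain examples, 3 neurons (made-up values).
activations = [
    [0.9, 0.1, 0.5],
    [0.8, 0.3, 0.5],
    [0.2, 0.7, 0.5],
    [0.1, 0.9, 0.5],
]
losses = [0.2, 0.4, 1.1, 1.5]

negative_neurons = find_negative_neurons(activations, losses)
keep = [j for j in range(3) if j not in negative_neurons]
```

Correlation alone does not show that a neuron causes the worse performance. A stricter check would skip each candidate neuron and confirm that the domain loss actually drops.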
Implications
- Efficient inference — Skip 50-75% of FFN computation for domain-specific tasks
- Dynamic routing — Route tokens through domain-appropriate neuron subsets at runtime
- Model compression — Prune domain-irrelevant neurons for specialized deployments
- Understanding emergence — Specialization increases predictably with scale
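Dynamic routing and model compression can both be driven by the same per-domain lists of negative neurons. The sketch below illustrates this under assumptions: `DomainRouter` and `prune_ffn` are hypothetical names, the negative-neuron sets are invented, and how the domain of an input is detected at runtime is left out because this summary does not specify it.

```python
def prune_ffn(w_in, w_out, keep):
    """Return FFN weights restricted to the kept hidden neurons.

    w_in[j] and w_out[j] are the input and output weight vectors of neuron j.
    """
    return [w_in[j] for j in keep], [w_out[j] for j in keep]


class DomainRouter:
    """Maps a domain label to the hidden-neuron indices worth evaluating."""

    def __init__(self, num_neurons, negative_by_domain):
        self.num_neurons = num_neurons
        self._keep = {}
        for domain, negative in negative_by_domain.items():
            skipped = set(negative)
            self._keep[domain] = [j for j in range(num_neurons) if j not in skipped]

    def keep_indices(self, domain):
        # Unknown domains fall back to the dense layer (all neurons kept).
        return self._keep.get(domain, list(range(self.num_neurons)))


# Hypothetical negative-neuron sets for a 6-neuron layer.
router = DomainRouter(6, {"medical": [1, 4], "code": [0, 2, 3]})

# Dynamic routing: look up the subset for the current domain at runtime.
medical_keep = router.keep_indices("medical")

# Model compression: materialize a smaller layer for a code-only deployment.
w_in = [[float(j), 1.0] for j in range(6)]   # arbitrary toy weights
w_out = [[1.0, float(j)] for j in range(6)]
code_w_in, code_w_out = prune_ffn(w_in, w_out, router.keep_indices("code"))
```

Routing keeps the full weights and chooses a subset per request, while pruning fixes the subset once and shrinks the stored layer. The second is simpler to deploy but serves a single domain.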
Model Garage Modules Used
- analyze: Per-neuron activation measurement across domains
- snapshot: Hidden state capture at FFN granularity
- core.hooks: Neuron-level interception during forward passes