Sparse Pathways: Domain-Aware Neuron Routing¶

Paper: Sparse Pathways: Domain-Aware Neuron Routing for Efficient Inference

Summary¶

FFN neurons exhibit strong domain specialization (~50% in late layers), with r=0.999 correlation between model scale and specialization degree. The paper demonstrates 2-4x potential compute reduction via negative neuron selection — skipping neurons that hurt performance on a given domain.

Key Finding¶

Individual FFN neurons are not general-purpose. Many neurons activate strongly for specific domains (medical, legal, code) and weakly or negatively for others. By identifying and skipping domain-negative neurons, you can reduce compute while maintaining or improving accuracy.

All neurons active:    100% compute, baseline accuracy
Domain-aware routing:  25-50% compute, maintained accuracy

Key Results¶

~50% of late-layer neurons show domain specialization
r = 0.999 correlation between model scale and neuron specialization degree
Larger models have more specialized neurons (more "experts" emerge naturally)
2-4x potential compute reduction by skipping domain-negative neurons
Negative selection (removing harmful neurons) outperforms positive selection (keeping helpful ones)

Scale-Specialization Correlation¶

Model Size	Specialization Degree
Small (125M)	Low
Medium (1.3B)	Moderate
Large (7B+)	High (~50% in late layers)

The correlation is near-perfect (r = 0.999), suggesting specialization is an emergent property of scale.

Negative Neuron Selection¶

The key insight: rather than finding neurons that help a domain (positive selection), it's more effective to find neurons that hurt a domain (negative selection) and skip them.

# Conceptual approach using Model Garage
from model_garage.snapshot.capture import SnapshotCapture
from model_garage.core.hooks import HookManager

# Capture neuron activations per domain
capture = SnapshotCapture(model)
medical_snapshots = capture.run(medical_inputs, layers=all_ffn_layers)
general_snapshots = capture.run(general_inputs, layers=all_ffn_layers)

# Identify domain-negative neurons
# (neurons whose activation correlates with worse performance on medical tasks)

Implications¶

Efficient inference — Skip 50-75% of FFN computation for domain-specific tasks
Dynamic routing — Route tokens through domain-appropriate neuron subsets at runtime
Model compression — Prune domain-irrelevant neurons for specialized deployments
Understanding emergence — Specialization increases predictably with scale

Model Garage Modules Used¶

analyze — Per-neuron activation measurement across domains
snapshot — Hidden state capture at FFN granularity
core.hooks — Neuron-level interception during forward passes