# Self-Debate Chambers
Add multi-perspective reasoning to any model by splitting hidden states into divergent perspectives, processing them separately, and reconciling through learned gating.
## Basic Usage
```python
import torch
from model_garage.core.loader import quick_load
from model_garage.inject.debate import SelfDebate

model, tokenizer, _ = quick_load("gpt2")
input_ids = tokenizer("Artificial intelligence will", return_tensors="pt").input_ids

debate = SelfDebate(
    model,
    layer_idx=6,
    divergence_method="dropout",
    divergence_strength=0.15,
)

with debate:
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=30,
            do_sample=True,
            temperature=0.8,
        )

info = debate.get_debate_info()
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## How It Works
```mermaid
graph LR
    A[Hidden State] --> B[Perspective A]
    A --> C[Perspective B]
    B --> D[Divergence]
    C --> E[Divergence]
    D --> F[Reconciliation]
    E --> F
    F --> G[Enriched Output]
```
1. **Split** — The hidden state at the injection layer is duplicated
2. **Diverge** — Each copy is perturbed differently (dropout, noise, or scaling)
3. **Process** — Both perspectives pass through the rest of the layer
4. **Reconcile** — The two outputs are merged via gating, averaging, or a learned combination
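The four steps above can be condensed into a standalone sketch. This is an illustration only, not the library's implementation: `debate_step` and `layer_fn` are hypothetical names, and it assumes dropout divergence with simple averaging as the reconciliation step.

```python
import torch
import torch.nn.functional as F

def debate_step(hidden, layer_fn, p=0.15):
    """Illustrative split -> diverge -> process -> reconcile pass."""
    # Split: duplicate the hidden state into two perspectives
    a, b = hidden.clone(), hidden.clone()
    # Diverge: perturb each copy with an independent dropout mask
    a = F.dropout(a, p=p, training=True)
    b = F.dropout(b, p=p, training=True)
    # Process: run both perspectives through the remaining layer computation
    a, b = layer_fn(a), layer_fn(b)
    # Reconcile: merge the perspectives (here, a simple average)
    return 0.5 * (a + b)

hidden = torch.randn(1, 5, 64)           # (batch, seq, hidden_dim)
out = debate_step(hidden, layer_fn=torch.nn.Linear(64, 64))
print(out.shape)  # torch.Size([1, 5, 64])
```

In the real hook, the split happens at `layer_idx` and `layer_fn` is the model's own remaining layer computation; the sketch just makes the data flow explicit.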
## Divergence Methods
| Method | Description | Best For |
|---|---|---|
| `dropout` | Random dropout with different masks | General exploration |
| `noise` | Gaussian noise injection | Robustness testing |
| `scaling` | Different scale factors per perspective | Amplitude analysis |
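The three methods can be illustrated directly on a tensor. This is a hedged sketch of the idea, not the library's code; the variable names are mine, and in practice these perturbations are applied inside the hooked layer:

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 4, 8)   # a stand-in hidden state
strength = 0.15

# dropout: each perspective gets an independent random mask
drop_a = F.dropout(h, p=strength, training=True)
drop_b = F.dropout(h, p=strength, training=True)

# noise: independent Gaussian noise scaled by the divergence strength
noise_a = h + strength * torch.randn_like(h)
noise_b = h + strength * torch.randn_like(h)

# scaling: push the two perspectives apart with opposite scale factors
scale_a = h * (1 + strength)
scale_b = h * (1 - strength)
```

Note that `dropout` and `noise` are stochastic (the perspectives differ on every call), while `scaling` is deterministic, which makes it the easiest to reason about when analyzing amplitudes.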
## Reconciliation Methods
| Method | Description | Quality |
|---|---|---|
| `gated` | Learned gating mechanism | Best (+8.9% vs identity) |
| `average` | Simple mean of perspectives | Good baseline |
| `weighted` | Fixed weighted combination | Manual control |
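One common form of learned gating is a linear layer over the concatenated perspectives whose sigmoid output interpolates between them per dimension. The sketch below (`GatedReconcile` is a hypothetical name) shows that shape of mechanism; the library's actual gate may be parameterized differently:

```python
import torch
import torch.nn as nn

class GatedReconcile(nn.Module):
    """Merge two perspectives with a learned per-dimension gate (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a, b):
        # g in (0, 1) decides, per dimension, how much of each perspective to keep
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1 - g) * b

rec = GatedReconcile(dim=8)
a, b = torch.randn(1, 4, 8), torch.randn(1, 4, 8)
merged = rec(a, b)
print(merged.shape)  # torch.Size([1, 4, 8])
```

Because the gate is learned, it can favor whichever perspective is more useful for each dimension, which is one plausible reason `gated` outperforms a plain `average` in the table above.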
## Inspecting Debate Results
```python
info = debate.get_debate_info()

for round_info in info:
    print(f"Cosine similarity: {round_info['cosine_similarity']:.4f}")
```
The cosine similarity between perspectives tells you whether the debate created genuine diversity:
- **> 0.99** — Perspectives are nearly identical (increase divergence strength)
- **0.90–0.99** — Healthy divergence with maintained coherence
- **< 0.90** — Strong divergence (may impact fluency)
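You can compute the same diagnostic yourself on any pair of perspective tensors. The helper name `perspective_similarity` is mine, and the exact reduction `get_debate_info()` uses internally is an assumption, but the statistic is standard cosine similarity:

```python
import torch
import torch.nn.functional as F

def perspective_similarity(a, b):
    """Mean cosine similarity between flattened perspective states."""
    return F.cosine_similarity(a.flatten(1), b.flatten(1), dim=-1).mean().item()

h = torch.randn(1, 5, 64)
weak = h + 0.01 * torch.randn_like(h)    # mild perturbation: similarity near 1.0
strong = h + 1.0 * torch.randn_like(h)   # strong perturbation: much lower similarity
print(perspective_similarity(h, weak))
print(perspective_similarity(h, strong))
```

A run like this makes the thresholds above concrete: the mild perturbation lands in the "nearly identical" zone, while the strong one falls well below 0.99.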
## Configuration
```python
debate = SelfDebate(
    model,
    layer_idx=6,                    # which layer to split at
    divergence_method="dropout",    # how to create diversity
    divergence_strength=0.15,       # how much divergence
    reconciliation_method="gated",  # how to merge perspectives
)
```
**Layer Selection**

The N-4 rule from the Blades research applies here too: for a 12-layer model, layer 8 (12 − 4) tends to produce the best results.
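As a quick sanity check, the heuristic is just a subtraction; `n_minus_4` below is a hypothetical helper, not part of the library's API:

```python
def n_minus_4(num_layers: int) -> int:
    """N-4 heuristic: pick the injection layer four layers before the end."""
    return max(0, num_layers - 4)

print(n_minus_4(12))  # 8, matching the 12-layer example above
```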
## Next Steps
- Read the Blades paper for the validated principles behind injection
- Analyze how debate affects hidden state statistics
- Extract the debate layer for use in other models