Confabulation Scaling
Predict LLM Factual Recall Errors
Confabulation Scaling is a Python package that models how large language models systematically fail to recall facts. Using calibrated sigmoid scaling laws, we jointly model topic frequency and parameter count to predict when and why LLMs hallucinate.
Why Confabulation Scaling?
Large language models are powerful but unreliable for factual recall. They "confabulate", confidently stating false information. Understanding when and how often they fail is critical for safe deployment.
Confabulation Scaling provides:
- Predictive scaling laws that forecast recall errors across model sizes
- Topic frequency modeling linking corpus statistics to hallucination rates
- Calibrated uncertainty via sigmoid-based probability estimates
- Production-ready tooling for auditing LLM factuality
Key Capabilities
Estimate topic frequency across reference corpora using multiple data sources. The CorpusFrequencyEstimator measures how often a topic appears in Wikipedia, web text, and other references, which is the foundation for predicting recall.
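To make the idea of topic frequency concrete, here is a minimal sketch that counts phrase occurrences per million tokens in a small in-memory corpus. The helper `occurrences_per_million` and the toy corpus are hypothetical and written only for illustration; this is not how CorpusFrequencyEstimator is implemented, and a real estimator would query much larger reference corpora.

```python
import re
from typing import Sequence


def occurrences_per_million(topic: str, documents: Sequence[str]) -> float:
    """Count case-insensitive whole-phrase matches of `topic` per million tokens.

    Hypothetical helper for illustration only.
    """
    # Match the topic as a whole phrase, bounded by word boundaries.
    pattern = re.compile(r"\b" + re.escape(topic.lower()) + r"\b")
    total_tokens = 0
    hits = 0
    for doc in documents:
        text = doc.lower()
        # Crude word tokenization; a real pipeline would use a proper tokenizer.
        total_tokens += len(re.findall(r"\w+", text))
        hits += len(pattern.findall(text))
    if total_tokens == 0:
        return 0.0
    return hits / total_tokens * 1_000_000


# Toy corpus (made up for this example).
toy_corpus = [
    "Machine learning is a subfield of artificial intelligence.",
    "Deep learning is a kind of machine learning.",
    "The weather was pleasant.",
]
rate = occurrences_per_million("machine learning", toy_corpus)
print(f"Occurrences per million tokens: {rate:.0f}")
```

Normalizing by token count keeps the number comparable across corpora of different sizes, which matters when several sources are combined.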
Model error rates across parameter counts with calibrated sigmoid functions. Predict how the hallucination rate of a 7B model differs from that of a 70B model on the same topic.
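One way to picture a sigmoid scaling law over both inputs is a logistic function of log topic frequency and log parameter count. The sketch below assumes that functional form; the function name `predicted_error_rate` and the coefficients `a`, `b`, and `c` are illustrative placeholders, not fitted values and not the package's API.

```python
import math


def predicted_error_rate(topic_frequency: float, n_params: float,
                         a: float = 11.0, b: float = 1.0, c: float = 1.0) -> float:
    """Hypothetical sigmoid law: error falls as frequency and model size grow.

    error = sigmoid(a - b * log10(frequency) - c * log10(parameters))
    The coefficients are placeholders chosen for illustration, not fitted.
    """
    if topic_frequency <= 0 or n_params <= 0:
        raise ValueError("topic_frequency and n_params must be positive")
    z = a - b * math.log10(topic_frequency) - c * math.log10(n_params)
    return 1.0 / (1.0 + math.exp(-z))


# Compare a 7B and a 70B model on the same topic (frequency in occurrences per million).
frequency = 10.0
err_7b = predicted_error_rate(frequency, 7e9)
err_70b = predicted_error_rate(frequency, 70e9)
print(f"7B:  {err_7b:.3f}")
print(f"70B: {err_70b:.3f}")
```

Working in log space means each tenfold increase in frequency or parameter count shifts the sigmoid input by a fixed amount, so the output always stays between 0 and 1.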
Benchmark LLM recall performance against predicted baselines. Identify the topics and model sizes where hallucination risk is highest.
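Comparing observed errors against a predicted baseline can be sketched as a simple filter. The `TopicResult` class, the `flag_high_risk` function, the thresholds, and the numbers below are all hypothetical, chosen only to show the comparison; they are not taken from the package.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TopicResult:
    topic: str
    predicted_error: float  # baseline from a scaling-law model
    observed_error: float   # fraction of wrong answers measured on the model


def flag_high_risk(results: Sequence[TopicResult],
                   margin: float = 0.05,
                   ceiling: float = 0.5) -> List[TopicResult]:
    """Return topics that underperform the baseline or are risky outright.

    A topic is flagged when its observed error exceeds the prediction by more
    than `margin`, or exceeds `ceiling` regardless of the prediction.
    Results are sorted with the highest observed error first.
    """
    flagged = [
        r for r in results
        if r.observed_error - r.predicted_error > margin
        or r.observed_error > ceiling
    ]
    return sorted(flagged, key=lambda r: r.observed_error, reverse=True)


# Made-up measurements for illustration.
results = [
    TopicResult("topic A", predicted_error=0.10, observed_error=0.12),
    TopicResult("topic B", predicted_error=0.30, observed_error=0.45),
    TopicResult("topic C", predicted_error=0.60, observed_error=0.58),
]
flagged = flag_high_risk(results)
for r in flagged:
    print(f"{r.topic}: observed {r.observed_error:.2f}, predicted {r.predicted_error:.2f}")
```

Using both a relative margin and an absolute ceiling separates two cases: topics where the model does worse than the scaling law predicts, and topics that are expected to be hard and should be treated with caution anyway.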
Modular, tested architecture with scipy, numpy, and spacy for robust numerical computing and NLP pipelines. Full test coverage and type hints.
Quick Start
Installation
Install from source:
git clone https://github.com/yourusername/confabulation-scaling.git
cd confabulation-scaling
pip install -e .
Basic Usage
from confabulation_scaling.corpus import CorpusFrequencyEstimator
# Estimate how often a topic appears in reference corpora
estimator = CorpusFrequencyEstimator()
frequency = estimator.estimate("machine learning")
print(f"Topic frequency: {frequency}")
Documentation
- Getting Started: Installation, setup, and first predictions
- API Reference: Complete module and class documentation
- Scaling Laws: Understanding the math behind predictions
- Examples: Tutorials and reproducible notebooks
Architecture
src/confabulation_scaling/
├── corpus.py       # Corpus frequency estimation
├── scaling.py      # Scaling law models
├── calibration.py  # Sigmoid calibration
└── audit.py        # LLM factuality auditing
Dependencies:
- scipy, numpy: Numerical computing and optimization
- spacy: NLP tokenization and entity recognition
- requests: HTTP corpus queries
- python-Levenshtein: String similarity
- wikitextparser: Wikipedia parsing
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Citation
If you use Confabulation Scaling in your research, please cite:
@software{confabulation_scaling_2024,
title={Confabulation Scaling: Predicting LLM Factual Recall Errors via Scaling Laws},
author={Your Name},
year={2024},
url={https://github.com/yourusername/confabulation-scaling}
}