Feb 2026 – May 2026
AI · SWE
Auditing Forgetting in Limited Memory Language Models
This project introduces a causal evaluation framework for memory separation in Limited Memory Language Models, building on Zhao et al. (2025). The core question: when a model is asked to forget specific information, what residual signal remains?
We decompose post-deletion correctness into three components: parametric leakage (information stored in the model's weights), retrieval-mediated correctness (information returned via retrieval over the active database), and retrieval artifacts (correctness arising from inference patterns rather than direct recall).
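A minimal sketch of how such a three-way attribution could work, assuming each datapoint is scored under three hypothetical ablation runs (names and conditions are illustrative, not the paper's actual protocol): a closed-book run with retrieval disabled, an open-book run over the post-deletion database, and a run where direct mentions of the deleted fact are scrubbed so only inference patterns remain.

```python
# Hypothetical attribution of residual post-deletion correctness.
# The three ablation conditions and their names are assumptions for
# illustration, not the evaluation protocol from the project itself.
from dataclasses import dataclass
from collections import Counter

@dataclass
class Outcome:
    closed_book: bool      # correct with retrieval disabled (weights only)
    open_book: bool        # correct with retrieval over the post-deletion DB
    scrubbed: bool         # correct with direct mentions of the fact removed

def attribute(o: Outcome) -> str:
    """Map one datapoint's ablation results to a leakage component."""
    if o.closed_book:
        return "parametric_leakage"       # stored in the weights
    if o.scrubbed:
        return "retrieval_artifact"       # inferred, not directly recalled
    if o.open_book:
        return "retrieval_mediated"       # returned by retrieval itself
    return "forgotten"

# Aggregate the decomposition over a toy batch of datapoints.
batch = [
    Outcome(closed_book=False, open_book=True, scrubbed=True),
    Outcome(closed_book=True,  open_book=True, scrubbed=False),
    Outcome(closed_book=False, open_book=False, scrubbed=False),
]
print(Counter(attribute(o) for o in batch))
```

In this toy framing, checking `scrubbed` before `open_book` credits correctness to inference patterns whenever the answer survives removal of the direct evidence; the real framework's causal ordering may differ.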
Across 1,404 datapoints, six prompt formulations, and thirteen database variants, we orchestrated evaluations via Slurm and tracked experiments with Weights & Biases. The headline finding: retrieval artifacts, not parametric leakage, dominate residual correctness after deletion.
The work was developed with research feedback from NLP researchers at Stanford, Cornell, and UC Berkeley.
Affiliation
UC Berkeley
Report
- Manuscript
Keywords
- Machine Unlearning
- Knowledge Editing
- Causal Inference
- Python
- PyTorch
- CUDA
- LLMs
- NLP
- Slurm
- Weights & Biases
Deepdive
Under development.