Feb 2026 – May 2026

AI · SWE

Auditing Forgetting in Limited Memory Language Models

This project introduces a causal evaluation framework for memory separation in Limited Memory Language Models, building on Zhao et al. (2025). The core question: when a model is asked to forget specific information, what residual signal remains?

We decompose post-deletion correctness into three components: parametric leakage (information stored in the model's weights), retrieval-mediated correctness (information returned via retrieval over the active database), and retrieval artifacts (correctness arising from inference patterns rather than direct recall).
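
To make the decomposition concrete, here is a minimal sketch of one way it could be operationalized: evaluate each question closed-book (no retrieval), against the live post-deletion database, and against an ablated database with the documents supporting the answer removed. The answer callable, the Attribution fields, and the ablation scheme are illustrative assumptions, not the project's actual protocol.

# Minimal sketch of a three-way attribution, assuming a hypothetical
# answer(question, db) callable where db=None means closed-book.
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Attribution:
    parametric: bool          # correct with retrieval disabled -> weights leak
    retrieval_mediated: bool  # correct only via the live database
    artifact: bool            # correct even with direct support removed

def attribute(
    question: str,
    gold: str,
    answer: Callable[[str, Optional[Sequence[str]]], str],
    live_db: Sequence[str],
    ablated_db: Sequence[str],  # live_db minus documents supporting `gold`
) -> Attribution:
    closed_book = answer(question, None) == gold
    with_db = answer(question, live_db) == gold
    without_support = answer(question, ablated_db) == gold
    return Attribution(
        parametric=closed_book,
        retrieval_mediated=with_db and not closed_book and not without_support,
        artifact=without_support and not closed_book,
    )

Under a scheme like this, a datapoint counts as a retrieval artifact only when correctness survives removal of its direct support yet disappears closed-book, i.e., it is inferred from the remaining documents rather than recalled from the weights.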

Across 1,404 datapoints, six prompt formulations, and thirteen database variants, we orchestrated evaluations via Slurm and tracked experiments with Weights & Biases. The headline finding: retrieval artifacts, not parametric leakage, dominate residual correctness after deletion.
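
As a rough picture of how such a sweep might be wired up (the project name, config keys, and metric below are placeholders, not the actual setup), each Slurm array task can evaluate one prompt × database cell and log it to Weights & Biases:

# Hypothetical sweep launcher: one Slurm array task per cell of the
# 6-prompt x 13-database grid (sbatch --array=0-77).
import itertools
import os
import wandb

PROMPTS = [f"prompt_{i}" for i in range(6)]
DB_VARIANTS = [f"db_{j}" for j in range(13)]
GRID = list(itertools.product(PROMPTS, DB_VARIANTS))  # 78 configurations

def run_eval(prompt: str, db_variant: str) -> dict:
    # Placeholder for the real evaluation over the 1,404 datapoints.
    return {"post_deletion_accuracy": 0.0}

task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
prompt, db = GRID[task_id]

run = wandb.init(project="forgetting-audit",
                 config={"prompt": prompt, "db_variant": db})
wandb.log(run_eval(prompt, db))
run.finish()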

The work was developed with feedback from NLP researchers at Stanford, Cornell, and UC Berkeley.

Affiliation

UC Berkeley

Report

  • Manuscript

Keywords

  • Machine Unlearning
  • Knowledge Editing
  • Causal Inference
  • Python
  • PyTorch
  • CUDA
  • LLMs
  • NLP
  • Slurm
  • Weights & Biases

Deepdive

Under development.