How do we legislate an LLM's memory?

Kennon Stewart

Published: 5/6/2026 • Authors: Kennon Stewart

The right to delete data used to be a database problem, and a conceptually simple one. A user asked a company to remove a record, and the company found the row and deleted it. That single change then had to propagate through backups, indexes, and downstream systems, which is one of the costs of a more connected world.

This was already difficult in practice, but at least the object of deletion was visible. Modern machine learning makes the problem stranger. A datum is not merely stored; it is transformed into gradients, parameters, embeddings, normalization constants, checkpoints, optimizer state, cached predictions, and deployment artifacts. The original row may be gone while its statistical influence remains.

This is why machine unlearning matters. It gives statisticians and engineers the language for a question privacy law increasingly forces organizations to ask: after a valid deletion request, what exactly remains of the deleted person’s data inside a learned system?

The EU’s Approach: legislators take a swing at statistics.

GDPR is often described as the reason machine unlearning suddenly matters, but the relationship is more subtle. GDPR defines personal data broadly as information relating to an identified or identifiable natural person. It also defines roles such as data subject, controller, and processor, which determine who has obligations over the data and who acts on whose behalf. These are legal categories, not machine-learning categories. A model parameter, a gradient sketch, or an embedding vector is not automatically classified by GDPR simply because it was produced during training. The question is whether it still relates to an identifiable person, or whether personal data can be extracted, inferred, or otherwise obtained from it. (GDPR)

The right to erasure, often called the “right to be forgotten,” is similarly more precise than its popular description. Article 17 gives data subjects the right to obtain erasure of personal data without undue delay when one of the listed grounds applies; the right is not absolute. The UK ICO’s guidance also emphasizes that the right applies only in certain circumstances and to data held at the time the request is received. This matters because “delete the training row,” “remove a model’s residual influence,” and “make the model legally anonymous” are related but distinct operations. (GDPR)

AI models make this distinction unavoidable. The European Data Protection Board’s 2024 opinion on AI models says that models trained on personal data cannot always be treated as anonymous. The assessment has to be case-specific, including whether personal data could be extracted from the model or accidentally obtained through interaction with the system. That is the regulatory opening for machine unlearning: if models can retain personal information in non-obvious statistical form, then compliance cannot stop at deleting raw records. (European Data Protection Board)

The cleanest technical answer is retraining. Remove the datum from the dataset, rerun the training procedure, and deploy the model that would have existed had the datum never been present. In theory, this gives a natural unlearning method for any model. In practice, it can be expensive, slow, and operationally disruptive. Large models are costly to train, and even smaller online or continuously updated models may receive deletion requests after many downstream updates have already occurred. Retraining also does not automatically solve the whole governance problem: one must still account for checkpoints, logs, derived features, evaluation sets, cached outputs, and copies of the model already deployed.

This cost is what motivates machine unlearning. Instead of always retraining from scratch, can we update a trained system so that it becomes equivalent, or at least statistically close, to the system we would have obtained if the deleted datum had never been used? In the strongest version, unlearning is counterfactual reconstruction. In the weaker version, it is certified approximation: the unlearned model should be hard to distinguish from the model trained without the deleted data.
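As a minimal sketch of the strongest version, consider a model whose fit depends on the data only through a few running sums. The example below is an illustration under stated assumptions, not a method from the literature discussed here: the `LinearFit` class, its method names, and the toy data are all hypothetical. It fits a one-dimensional least-squares line and "unlearns" a point by subtracting that point's contribution from the sums, which reproduces the retrained model up to floating-point error.

```python
from dataclasses import dataclass


@dataclass
class LinearFit:
    """1-D least-squares fit y ~ a*x + b, stored as sufficient statistics."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def learn(self, x, y):
        # Add one observation's contribution to every running sum.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def unlearn(self, x, y):
        # Subtract exactly what learn() added for this observation.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def coefficients(self):
        # Closed-form slope and intercept; needs at least two distinct x values.
        denom = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b


def fit(points):
    model = LinearFit()
    for x, y in points:
        model.learn(x, y)
    return model


# Hypothetical toy data; the last record is the one to be deleted.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 20.0)]
deleted = data[-1]

updated = fit(data)
updated.unlearn(*deleted)      # in-place deletion
retrained = fit(data[:-1])     # counterfactual: never saw the record

print(updated.coefficients())
print(retrained.coefficients())
```

Most models of practical interest do not reduce to sufficient statistics this cleanly, which is why the approximate and certified notions matter.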

Statisticians give it their best shot.

The certified-removal literature makes this idea mathematically explicit. Guo, Goldstein, Hannun, and van der Maaten define certified removal as a guarantee that a model after removal cannot be distinguished from a model that never observed the removed data. Their work focuses on settings where this guarantee can be made precise, such as certain linear classifiers. The important conceptual move is that deletion becomes a statistical goal, rather than the hope that “the data is probably gone.” (arXiv)
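One common way to write this kind of guarantee is sketched below. The notation is introduced here for illustration and is an assumption, not a quotation: \(A\) is a randomized learning algorithm, \(M\) is a removal mechanism applied to the trained model, \(D\) is the training set, \(x\) is the point to be removed, and \(\mathcal{H}\) is the hypothesis space.

```latex
% epsilon-certified removal: the output of the removal mechanism is nearly as
% likely to land in any set of models T as the output of retraining without x.
\[
  e^{-\varepsilon}
  \;\le\;
  \frac{\Pr\big[M(A(D), D, x) \in T\big]}
       {\Pr\big[A(D \setminus \{x\}) \in T\big]}
  \;\le\;
  e^{\varepsilon}
  \qquad \text{for all } T \subseteq \mathcal{H},\ \text{all } D,\ \text{all } x \in D .
\]

% Relaxed (epsilon, delta) version: the same comparison with additive slack delta,
% required in both directions.
\[
  \Pr\big[M(A(D), D, x) \in T\big]
  \;\le\; e^{\varepsilon}\, \Pr\big[A(D \setminus \{x\}) \in T\big] + \delta,
  \qquad
  \Pr\big[A(D \setminus \{x\}) \in T\big]
  \;\le\; e^{\varepsilon}\, \Pr\big[M(A(D), D, x) \in T\big] + \delta .
\]
```

The probabilities are taken over the internal randomness of \(A\) and \(M\). Smaller \(\varepsilon\) and \(\delta\) mean the unlearned model is harder to tell apart from a retrained one.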

Other work approaches the problem from system design. SISA training, for example, shards and slices training data so that a deletion request requires retraining only a limited part of the system rather than the whole model. This does not magically erase the complexity of unlearning, but it changes the design philosophy: if deletion is a foreseeable event, the training pipeline should be built so influence is localized and easier to remove. (arXiv)
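The sharding half of that idea can be sketched in a few lines. This is a simplified illustration, not the SISA implementation: it omits slicing and checkpointing, uses a trivial stand-in learner (the mean label of a shard), and the `ShardedEnsemble` class, record ids, and data are hypothetical.

```python
from statistics import fmean


def train(points):
    """Stand-in learner: a constant predictor equal to the shard's mean label."""
    return fmean(y for _, y in points)


class ShardedEnsemble:
    def __init__(self, data, num_shards):
        # Assign each record id to exactly one shard (round-robin here).
        self.shards = [dict() for _ in range(num_shards)]
        self.location = {}
        for i, (record_id, point) in enumerate(data.items()):
            shard = i % num_shards
            self.shards[shard][record_id] = point
            self.location[record_id] = shard
        # One model per shard, each trained only on its own shard.
        self.models = [train(list(s.values())) for s in self.shards]
        self.retrain_count = 0

    def predict(self):
        # Aggregate the shard models by simple averaging.
        return fmean(self.models)

    def delete(self, record_id):
        # Only the shard that held the record is retrained.
        # (This sketch assumes the shard is not left empty.)
        shard = self.location.pop(record_id)
        del self.shards[shard][record_id]
        self.models[shard] = train(list(self.shards[shard].values()))
        self.retrain_count += 1
        return shard


# Hypothetical records: id -> (feature, label).
records = {
    "u1": (0.0, 1.0), "u2": (1.0, 2.0), "u3": (2.0, 3.0),
    "u4": (3.0, 4.0), "u5": (4.0, 5.0), "u6": (5.0, 6.0),
}
ensemble = ShardedEnsemble(records, num_shards=3)
before = list(ensemble.models)
touched = ensemble.delete("u5")
print(touched, before, ensemble.models)
```

The design choice is that each record's influence is confined to one shard model by construction, so the cost of a deletion scales with the shard size rather than the dataset size.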

This is where approximate unlearning becomes attractive. Exact unlearning gives the cleanest counterfactual, but approximate unlearning is more likely to be deployable. The relevant question is not whether approximation is philosophically satisfying. The relevant questions are what is being approximated, how the error is measured, and whether the approximation is certified strongly enough for the risk level of the application. A recommender system, a medical model, an employment-screening model, and a public-benefits triage model should not be treated as though they pose the same privacy risks and downstream harms.

The legal side appears to be moving toward this kind of proportionality. CNIL’s 2026 recommendations state that exercising rights over an AI model may, by default, require retraining if the organization still holds the training data. But CNIL also recognizes proportionality: where retraining is disproportionate, alternative measures such as robust filters may be relevant if the organization can show they are sufficiently effective. This is not the same as saying approximate unlearning always satisfies GDPR. It is closer to saying that the technical response must be justified in relation to the data, the model, the risk, and the burden of the remedy. (CNIL)

The scope of the remedy matters as well, because parameter-only unlearning is often too narrow. A trained parameter vector is not the whole system. A model may depend on feature pipelines, embeddings, retrieval indexes, optimizer memory, random seeds, hyperparameter-selection procedures, checkpoints, and monitoring infrastructure. In online learning, the state of the optimizer may itself encode a history of previously seen data. Deleting a datum from the final parameter vector while leaving the optimizer state untouched may be insufficient if that state continues to shape future updates.

Stateful Optimizers: when the algorithm has a stable “brain”

This is especially important for stateful optimizers. In stochastic gradient methods, momentum terms encode past gradient directions. In adaptive optimizers, preconditioners encode estimates of curvature or coordinate-wise variation. In quasi-Newton methods such as L-BFGS, stored curvature pairs encode a local memory of the training path. If the deleted datum affected that path, then the model’s future behavior may still be influenced even after the current parameter vector has been corrected. For online systems, the object of unlearning is not only the model at time \(t\), but the learning process that will produce models at times \(t+1, t+2, \ldots\).
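A toy example makes the point concrete. The sketch below is an assumption-laden illustration: it uses heavy-ball SGD on a one-dimensional quadratic loss, and the function name, learning rate, momentum coefficient, and data stream are all hypothetical. It compares two ways of handling a deletion: correcting only the parameter, and correcting both the parameter and the momentum buffer.

```python
def sgd_momentum(points, lr=0.1, beta=0.9, theta=0.0, velocity=0.0):
    """Heavy-ball SGD on the loss 0.5 * (theta - x)**2, one pass over points."""
    for x in points:
        grad = theta - x
        velocity = beta * velocity + grad   # buffer accumulates past gradients
        theta = theta - lr * velocity
    return theta, velocity


# Hypothetical stream; 50.0 is the record that must later be deleted.
stream = [1.0, 2.0, 50.0, 3.0]
clean_stream = [1.0, 2.0, 3.0]

theta_full, v_full = sgd_momentum(stream)
theta_clean, v_clean = sgd_momentum(clean_stream)

next_point = [4.0]

# Parameter-only unlearning: theta is set to its counterfactual value,
# but the momentum buffer that saw the deleted record is kept.
theta_param_only, _ = sgd_momentum(next_point, theta=theta_clean, velocity=v_full)

# State-level unlearning: both theta and the buffer are counterfactual.
theta_state_clean, _ = sgd_momentum(next_point, theta=theta_clean, velocity=v_clean)

# Reference: the run that never saw the deleted record.
theta_reference, _ = sgd_momentum(clean_stream + next_point)

print(theta_param_only, theta_state_clean, theta_reference)
```

In this sketch, the parameter-only correction matches the counterfactual at the moment of deletion and then diverges from it at the very next update, because the stale buffer still carries the deleted record's gradient.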

This suggests a stronger standard: certify the learning state, not merely the current output. Neel, Roth, and Sharifi-Malvajerdi make a related distinction between strong deletion criteria, where the entire internal state is statistically indistinguishable from retraining, and weaker criteria, where only the observable output must be indistinguishable. The weaker criterion may be more efficient, but the stronger criterion is closer to the intuition behind future-safe unlearning: if the state is clean, future models generated from that state are less likely to reintroduce the deleted datum’s influence. (Proceedings of Machine Learning Research)

Still, “certify once and never again” is too strong. In production, deletion is not a single event. Systems are updated, copied, fine-tuned, distilled, monitored, rolled back, and integrated into other systems. A meaningful unlearning guarantee has to specify its boundary. Does it apply only to the main model? To optimizer state? To embeddings? To derived datasets? To downstream systems? To future updates? A certificate with no system boundary is not a certificate; it is an aspiration.
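One way to make a boundary explicit is to require every certificate to name the artifact kinds it covers. The sketch below is purely hypothetical: the `UnlearningCertificate` class, its fields, and the list of artifact kinds are assumptions chosen to mirror the questions above, not an existing standard or library.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical vocabulary of artifact kinds a certificate may cover.
ARTIFACT_KINDS = frozenset({
    "model_weights",
    "optimizer_state",
    "embeddings",
    "derived_datasets",
    "downstream_systems",
    "future_updates",
})


@dataclass(frozen=True)
class UnlearningCertificate:
    record_id: str
    covered: FrozenSet[str]

    def __post_init__(self):
        # Reject certificates that cover nothing or use unknown artifact kinds.
        if not self.covered:
            raise ValueError("a certificate must name at least one artifact kind")
        unknown = self.covered - ARTIFACT_KINDS
        if unknown:
            raise ValueError(f"unknown artifact kinds: {sorted(unknown)}")

    def uncovered(self):
        # Everything the certificate makes no claim about.
        return ARTIFACT_KINDS - self.covered


cert = UnlearningCertificate(
    record_id="user-123",
    covered=frozenset({"model_weights", "optimizer_state"}),
)
print(sorted(cert.uncovered()))
```

The useful output here is the complement: the list of artifacts about which the certificate is silent.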

There is also a hard statistical problem hiding in ordinary model development. Many unlearning guarantees assume a fixed learning algorithm and a fixed target, such as the empirical risk minimizer on the dataset with the deleted point removed. But production models are rarely trained by a single clean optimization pass. They are selected by cross-validation, tuned through experiments, regularized through heuristics, and deployed after many choices that may themselves depend on the data. Suriyakumar and Wilson show that approximate data-removal methods can fail when common hyperparameter tuning procedures such as cross-validation are part of the model-selection process. This widens the unlearning target from “remove influence from the fitted parameter” to “remove influence from the procedure that selected the fitted parameter.” (arXiv)

GDPR’s current role in machine learning research.

The conclusion is not that machine unlearning is impossible or legally useless. The conclusion is that unlearning should be treated as a statistical compliance primitive rather than a complete compliance regime. Privacy law asks whether personal data is still being processed and whether data subject rights have been handled appropriately. Machine unlearning asks whether the influence of a datum can be removed, bounded, or made statistically indistinguishable from a counterfactual training run. These are not currently the same question, but they are increasingly inseparable.

For statisticians, engineers, and mathematicians, this is a productive tension. GDPR does not define a norm on model space. It does not tell us whether the correct distance is parameter distance, prediction distance, total variation distance between model distributions, membership-inference risk, or distance between optimizer states. But production systems need such definitions. Without them, “we deleted your data” remains an operational claim with no statistical content.
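To make the candidates concrete, here is one way each could be written down. The notation is an assumption introduced for illustration: \(\tilde{\theta}\) denotes the unlearned model, \(\theta_{-z}\) the model retrained without the deleted datum \(z\), \(f_\theta\) the predictor, \(P\) and \(Q\) the distributions of the unlearned and retrained models under the algorithm's randomness, and \(\mathcal{A}\) a membership-inference attacker.

```latex
% Parameter distance
\[
  d_{\mathrm{param}} = \big\lVert \tilde{\theta} - \theta_{-z} \big\rVert_2
\]

% Prediction distance over an input domain \mathcal{X}
\[
  d_{\mathrm{pred}} = \sup_{x \in \mathcal{X}}
    \big\lvert f_{\tilde{\theta}}(x) - f_{\theta_{-z}}(x) \big\rvert
\]

% Total variation distance between model distributions
\[
  d_{\mathrm{TV}}(P, Q) = \sup_{T} \big\lvert P(T) - Q(T) \big\rvert
\]

% Membership-inference advantage of an attacker \mathcal{A}
\[
  \mathrm{Adv}(\mathcal{A}) =
    \big\lvert
      \Pr[\mathcal{A} = 1 \mid z \text{ was used}]
      - \Pr[\mathcal{A} = 1 \mid z \text{ was not used}]
    \big\rvert
\]
```

These quantities live on different objects (a single parameter vector, a function, a distribution over models, an attacker's success rate), which is part of why choosing among them is a substantive decision rather than a matter of convention.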

The future of unlearning will therefore depend on whether we can make deletion measurable. Not merely by checking whether a row is gone, but by specifying the counterfactual system that should have existed, the metric by which residual influence is measured, the certificate that bounds that influence, and the audit procedure that traces influence through downstream artifacts.

This doesn’t answer all of our questions, but it creates better ones.

Open question: What is the right deletion target? The first unresolved question is whether unlearning should target the final trained parameter, the empirical risk minimizer trained without the datum, the full training trajectory, the optimizer state, the output distribution, or the entire deployed decision system. This matters for data privacy in the age of AI because personal influence may survive in places other than the final model weights. For statisticians, the problem is to define the estimand: what counterfactual object are we trying to recover? For engineers, the problem is to identify the production boundary: which artifacts must be updated or invalidated? For mathematicians, the problem is to define a target rich enough to capture the learning process while still being tractable enough to prove useful guarantees.

Open question: What counts as influence removed? A datum’s influence can be measured through parameter sensitivity, prediction changes, membership-inference risk, reconstruction risk, loss perturbation, distributional indistinguishability, or distance to a leave-one-out model. These are not equivalent. A model can have nearly identical average accuracy while still leaking information about a particular person; conversely, a model can shift parameters noticeably without creating meaningful privacy risk. This is central to AI privacy because deletion requests are individual, while most statistical learning theory is population-oriented. Statisticians need influence measures that respect both individual and distributional effects. Engineers need diagnostics that can be computed in real systems. Mathematicians need to clarify which metrics imply which privacy protections, and which merely provide convenient but incomplete proxies.

Open question: How should approximate guarantees map to legal standards? Certified unlearning often uses mathematical indistinguishability, sometimes with \((\varepsilon,\delta)\)-style guarantees. GDPR, however, does not specify an acceptable \(\varepsilon\), nor does it say that statistical indistinguishability automatically satisfies the right to erasure. This is one of the major translation problems between law and machine learning. In the age of AI, privacy will increasingly depend on quantitative claims that regulators, auditors, and affected individuals can understand. Statisticians can help calibrate uncertainty and residual risk. Engineers can build systems that expose auditable deletion logs and certificates. Mathematicians can determine when approximation error composes safely across repeated deletions, model updates, and downstream systems.
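As a reference point for the composition question, differential privacy has a basic composition theorem under which guarantees add up. The sketch below states that theorem and works one example. Whether a given unlearning mechanism composes this way is an assumption, not something established here, and the numbers are hypothetical.

```latex
% Basic composition for differential privacy: k mechanisms, the i-th being
% (\varepsilon_i, \delta_i)-DP, are jointly
\[
  \Big( \textstyle\sum_{i=1}^{k} \varepsilon_i ,\; \sum_{i=1}^{k} \delta_i \Big)\text{-DP}.
\]

% Worked example, assuming the same additive accounting applied to deletions:
% k = 100 deletions, each with \varepsilon_i = 0.05 and \delta_i = 10^{-6}.
\[
  \varepsilon_{\mathrm{total}} = 100 \times 0.05 = 5,
  \qquad
  \delta_{\mathrm{total}} = 100 \times 10^{-6} = 10^{-4}.
\]
```

Under this additive accounting, a per-deletion guarantee that looks strong in isolation can become weak after many requests, which is one reason a legally meaningful threshold cannot be read off a single certificate.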

Open question: How should organizations audit downstream propagation? Even if the main model is unlearned, the deleted datum may have affected embeddings, retrieval indexes, synthetic data, distilled models, cached outputs, model-selection decisions, logs, analytics dashboards, backups, and third-party recipients. This is the production version of the privacy problem: data influence propagates. For AI privacy, the relevant system is not just the model but the ecosystem of artifacts produced around it. Statisticians need methods for tracing influence through derived objects. Engineers need data lineage and model lineage tools that make deletion operational rather than heroic. Mathematicians need compositional theories of unlearning that explain when local deletion guarantees survive contact with pipelines, ensembles, and downstream decision systems.
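A minimal sketch of what lineage tooling has to compute: given a graph recording which artifacts were derived from which, list everything downstream of a deleted record. The graph contents, artifact names, and the `downstream_artifacts` function are hypothetical, and the sketch assumes the lineage was recorded in the first place, which is often the harder problem.

```python
from collections import deque


def downstream_artifacts(lineage, source):
    """Return every artifact reachable from `source` in a lineage graph.

    `lineage` maps an artifact name to the artifacts derived directly from it.
    """
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            if child not in seen:      # also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: parent -> artifacts built from it.
lineage = {
    "record:user-123": ["training_set_v3"],
    "training_set_v3": ["embeddings_v3", "model_v7"],
    "embeddings_v3": ["retrieval_index_v3"],
    "model_v7": ["distilled_model_v2", "cached_predictions"],
    "record:user-456": ["training_set_v4"],
}

affected = downstream_artifacts(lineage, "record:user-123")
print(sorted(affected))
```

Reachability only says which artifacts could carry influence. Deciding how much influence each one carries, and what remedy it needs, is the statistical part of the question.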