The Results of Our Data Deletion Experiment.

Kennon Stewart

doi:10.48550/arXiv.2604.23046

Published: 2/8/2026 • Authors: Kennon Stewart

Cite this work

APA

Kennon Stewart (2026). The Results of Our Data Deletion Experiment. https://doi.org/10.48550/arXiv.2604.23046

BibTeX

@inproceedings{data_deletion_2026,
  title={The Results of Our Data Deletion Experiment.},
  author={Kennon Stewart},
  year={2026},
  doi={10.48550/arXiv.2604.23046},
}

Data Deletion Is Not Just About Model Performance

Machine unlearning asks a deceptively simple question: what should happen to a model when someone asks for their data to be deleted?

The clean answer is retraining. Remove the requested data, train the model again, and compare the new model to the one that originally saw the full dataset. But retraining is expensive, especially for systems that learn continuously from streams of data. This has made approximate unlearning attractive: rather than retraining from scratch, we update the model so that it behaves like the retrained counterfactual.

That criterion is useful, but it is incomplete.

In our experiments, a model can recover ordinary performance after deletion while its optimizer state remains geometrically misaligned with the counterfactual state. The model appears to have recovered, but the learning process is no longer the one that would have existed had the deleted data never appeared.

This distinction matters most for stateful optimizers. In first-order methods, the learner’s state is mostly its parameters. In second-order methods, the learner also carries a geometry: a matrix or curvature estimate that records accumulated directional information from past gradients. When data is deleted, the question is therefore not only whether the model’s weights or predictions recover. The question is whether the optimizer’s internal state has also been brought back into alignment.

Put differently: a model can forget in the usual performance sense while still remembering in its optimizer geometry.

Why deletion becomes harder in machine learning systems

Privacy regulation has made data deletion an operational requirement. Under the GDPR, individuals have a right to request erasure of personal data in certain circumstances, often described as the “right to be forgotten.” For ordinary databases, deletion is already difficult but conceptually clear: remove the row, remove the record, update any dependent systems.

Machine learning complicates this. A model trained on a data point may no longer contain the original record, but its parameters may still reflect the influence of that record. The hard question becomes:

What does it mean for an algorithm to forget data that has already shaped its internal state?

The most direct answer is exact retraining. Remove the data, train again from scratch, and use the retrained model as the gold standard. This gives a clean counterfactual target: the model that would have existed if the deleted data had never been observed.

But retraining is often too expensive. It can require storing the original training data, repeating costly optimization, and rebuilding models each time a deletion request arrives. Approximate unlearning methods try to avoid this cost by updating the trained model directly. The hope is that the updated model becomes close enough to the retrained counterfactual.

Much of the unlearning literature evaluates that closeness through model parameters, predictions, gradients, or loss. Those are reasonable starting points. But they miss an important case: optimizers whose internal state contains information beyond the current parameter vector.

The usual benchmark: does the unlearned model act like the retrained model?

A common benchmark for unlearning is behavioral. Suppose we have three models:

the original model, trained on all data;
the retrained counterfactual model, trained as if the deleted data never existed;
the unlearned model, produced by applying a deletion update to the original model.

The unlearned model is considered successful if it is close to the retrained counterfactual. Depending on the method, closeness might mean similar predictions, similar loss, similar parameters, or a certified distributional guarantee.

This framing works best when the model state is fully summarized by its parameters. If the optimizer has no memory beyond the current iterate, then making the parameters match the counterfactual is a plausible unlearning target.

But many optimizers are not memoryless. Second-order and quasi-Newton methods maintain curvature information, preconditioners, or low-rank summaries of the optimization trajectory. These objects are not just implementation details. They determine future updates.

That means deletion has two targets:

the model parameters;
the optimizer state that controls how those parameters will change in the future.

Our experiment focuses on that second target.

The experiment: compare the observed learner to the counterfactual learner

We study online deletion in a controlled convex learning problem. The learner receives data sequentially, updates its parameters over time, and then receives a deletion request partway through the stream.

The central comparison is between two trajectories.

The first trajectory is the observed learner. It processes the data stream, receives a deletion request, and continues learning after some unlearning intervention.

The second trajectory is the counterfactual learner. It follows the stream that would have existed had the deleted observations never appeared.

Successful unlearning should bring the observed learner close to this counterfactual path.

We compare two kinds of learners:

Online Gradient Descent (OGD), a first-order baseline with no second-order memory.
Online Newton Step (ONS), a second-order optimizer that maintains an internal matrix $A_t$.

For OGD, the state is essentially the parameter vector $w_t$. For ONS, the state is better written as

$$S_t = (w_t, A_t),$$

where $w_t$ is the model parameter and $A_t$ is the second-order optimizer state.

The ONS matrix has the form

$$A_t = \lambda I + \sum_{s=1}^{t} g_s g_s^\top,$$

where $\lambda > 0$ is a regularization constant and $g_s$ is the gradient observed at round $s$. This matrix defines the geometry of future updates. It determines how strongly the optimizer responds in each direction, and it contains a compressed record of the trajectory that produced it.

So when a deletion request arrives, removing the influence of the deleted data means more than adjusting $w_t$. It may also require changing $A_t$.
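The sketch below makes the two update rules concrete and shows where the extra ONS state lives. It makes several assumptions for illustration only: squared loss on a linear model, an unconstrained domain (so the usual ONS projection step is omitted), and the values of the step size eta, the regularizer lam (the $\lambda$ above), and the ONS parameter gamma. None of these choices are taken from the experiments.

```python
import numpy as np

# Sketch only: squared loss f_t(w) = 0.5 * (x_t @ w - y_t)**2, no projection step.
# eta, lam, and gamma are illustrative hyperparameters (assumptions).


def squared_loss_grad(w, x, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w."""
    return (x @ w - y) * x


class OGD:
    """Online Gradient Descent: the whole learner state is w_t."""

    def __init__(self, dim, eta=0.05):
        self.w = np.zeros(dim)
        self.eta = eta

    def step(self, x, y):
        g = squared_loss_grad(self.w, x, y)
        self.w = self.w - self.eta * g
        return g


class ONS:
    """Online Newton Step: the learner state is the pair (w_t, A_t)."""

    def __init__(self, dim, lam=1.0, gamma=1.0):
        self.w = np.zeros(dim)
        self.A = lam * np.eye(dim)  # A_0 = lam * I
        self.gamma = gamma

    def step(self, x, y):
        g = squared_loss_grad(self.w, x, y)
        self.A = self.A + np.outer(g, g)  # A_t = A_{t-1} + g_t g_t^T
        # Newton-style step measured in the geometry defined by A_t.
        self.w = self.w - np.linalg.solve(self.A, g) / self.gamma
        return g

    def state(self):
        """Return a copy of S_t = (w_t, A_t)."""
        return self.w.copy(), self.A.copy()


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
ons, ogd = ONS(dim=3), OGD(dim=3)
grads = []
for _ in range(200):
    x = rng.normal(size=3)
    y = x @ w_true
    grads.append(ons.step(x, y))
    ogd.step(x, y)

w, A = ons.state()
# A_t is exactly lam * I plus the accumulated outer products of past gradients.
assert np.allclose(A, np.eye(3) + sum(np.outer(g, g) for g in grads))
```

After the loop, the OGD learner carries only a vector, while the ONS learner also carries a 3×3 matrix built from all 200 gradients. Deleting an observation therefore has a second object to act on.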

What we measure

We use two classes of measurements.

The first class measures ordinary online-learning behavior:

regret, or cumulative loss relative to a comparator;
tracking error, or distance between the observed learner and the counterfactual learner;
parameter shock, or the instantaneous displacement at deletion time.
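The three performance-side metrics can be written as short functions. This is a minimal sketch: the function names are hypothetical, and using Euclidean distance for tracking error and parameter shock is an assumption, since the text does not fix a norm.

```python
import numpy as np

# Hypothetical helpers; the names and the choice of Euclidean distance are assumptions.


def cumulative_regret(learner_losses, comparator_losses):
    """Running sum of f_t(w_t) - f_t(u) against a fixed comparator u."""
    return np.cumsum(np.asarray(learner_losses, float) - np.asarray(comparator_losses, float))


def tracking_error(observed_ws, counterfactual_ws):
    """Per-round distance ||w_t - w_t^(-U)|| between the two trajectories."""
    diff = np.asarray(observed_ws, float) - np.asarray(counterfactual_ws, float)
    return np.linalg.norm(diff, axis=1)


def parameter_shock(w_before, w_after):
    """Displacement of the observed learner caused by the deletion-time update."""
    return float(np.linalg.norm(np.asarray(w_after, float) - np.asarray(w_before, float)))


print(cumulative_regret([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [0. 1. 3.]
print(tracking_error([[0, 0], [1, 1]], [[0, 0], [0, 1]]))     # [0. 1.]
print(parameter_shock([3.0, 0.0], [0.0, 4.0]))                # 5.0
```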

The second class looks inside the second-order optimizer state:

the trace of $A_t$;
the condition number of $A_t$;
the eigenvalue behavior of $A_t$;
cosine alignment between the observed and counterfactual optimizer states.

These state-space diagnostics are the important part of the experiment. They tell us whether the optimizer’s internal geometry aligns with the counterfactual geometry, even when ordinary performance metrics look stable.
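One plausible way to compute these diagnostics is sketched below. Measuring cosine alignment with the Frobenius inner product $\langle A, B \rangle_F = \operatorname{tr}(A^\top B)$ is an assumption made here for illustration; the text does not specify the metric. The function names are hypothetical.

```python
import numpy as np

# Cosine alignment via the Frobenius inner product is an assumption, not a stated definition.


def spectral_diagnostics(A):
    """Trace, condition number, and eigenvalues of a symmetric positive definite A."""
    eigvals = np.linalg.eigvalsh(A)  # ascending order
    return {
        "trace": float(np.trace(A)),
        "condition_number": float(eigvals[-1] / eigvals[0]),
        "eigenvalues": eigvals,
    }


def cosine_alignment(A, B):
    """<A, B>_F / (||A||_F ||B||_F); equals 1.0 when B is a positive multiple of A."""
    inner = float(np.sum(A * B))
    return inner / (np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro"))


A_obs = np.diag([1.0, 4.0])
A_cf = np.diag([4.0, 1.0])
print(spectral_diagnostics(A_obs)["condition_number"])  # 4.0
print(cosine_alignment(A_obs, A_cf))                    # 8/17, about 0.47
```

The example pair has the same trace (5) and the same condition number (4), yet the two matrices are oriented differently and their cosine alignment is only 8/17. Scale-only diagnostics such as the trace cannot see this kind of misalignment.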

Result 1: first-order learners recover in finite time

OGD behaves the way we would expect a memoryless online learner to behave.

In the stationary setting, deletion creates a small transient disturbance. The learner moves away from the counterfactual path, then recovers. In our experiments, the mean recovery time is about 23 rounds.

In the drifting setting, recovery is slower. The learner is not only adapting to deletion; it is also tracking a changing environment. In that case, recovery takes about 78 rounds on average.

This gives a useful baseline. For first-order learners, deletion affects the parameter trajectory, but there is no auxiliary curvature state that can preserve a separate memory of the deleted data. Recovery is primarily a parameter-level phenomenon.
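A recovery time needs a definition of "recovered." The sketch below assumes that recovery means the tracking error falls below a tolerance tol and stays below it for the rest of the run. Both the rule and the tolerance are hypothetical. They are not the definition behind the reported averages of about 23 and 78 rounds.

```python
import numpy as np


def recovery_time(errors, deletion_round, tol):
    """Rounds after deletion until errors[t] < tol for every remaining t.

    Assumed definition (hypothetical): returns 0 if the error never reaches tol
    after deletion, and None if it is still at or above tol at the final round.
    """
    post = np.asarray(errors, float)[deletion_round:]
    violations = np.flatnonzero(post >= tol)
    if violations.size == 0:
        return 0
    last = int(violations[-1])
    if last == len(post) - 1:
        return None
    return last + 1


errs = [0.0, 0.0, 0.0, 5.0, 3.0, 1.0, 0.5, 0.05, 0.01]
print(recovery_time(errs, deletion_round=3, tol=0.1))  # 4
```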

Result 2: second-order learners recover performance faster than they recover state

ONS shows a different pattern.

Viewed through regret alone, ONS appears extremely robust. Across intervention settings in the drifting experiments, regret-based recovery is essentially immediate. A narrow performance-based evaluation would suggest that the second-order learner handles deletion well.

But the state diagnostics tell a different story.

The same runs show nontrivial parameter shock at deletion and persistent separation from the counterfactual trajectory. The learner continues to perform, but it does not return to the same internal geometry as the learner that never saw the deleted observations.

This is the key result: regret recovery does not certify state recovery.

The second-order learner can remain effective while becoming geometrically misaligned.

Result 3: spectral interventions alter geometry more than performance

To test the role of second-order memory, we apply deletion-time interventions directly to the ONS state matrix.

We use two kinds of spectral intervention.

The first is partial reset, which subtracts a constant from the eigenvalues of the optimizer state. This bluntly reduces stored curvature.

The second is eigenvalue decay, which rescales the spectrum of the optimizer state. This more smoothly reduces the magnitude of the stored second-order information.

These interventions are not ordinary parameter updates. They directly change the geometry under which the optimizer continues to learn.
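Both interventions can be sketched through an eigendecomposition of the symmetric state matrix. This is a minimal illustration, not the experimental code. The floor argument in partial_reset is an added assumption that keeps the matrix positive definite and therefore invertible. The uniform decay shown here is equivalent to multiplying the matrix by rho.

```python
import numpy as np

# Sketch of the two deletion-time interventions on a symmetric positive definite A.
# The positivity floor in partial_reset is an assumption added so A stays invertible.


def _rebuild(eigvals, eigvecs):
    """Reassemble V diag(eigvals) V^T."""
    return (eigvecs * eigvals) @ eigvecs.T


def partial_reset(A, c, floor):
    """Subtract the constant c from every eigenvalue, clipping at floor > 0."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return _rebuild(np.maximum(eigvals - c, floor), eigvecs)


def eigenvalue_decay(A, rho):
    """Rescale every eigenvalue by 0 < rho <= 1 (equivalent to rho * A)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return _rebuild(rho * eigvals, eigvecs)


A = np.array([[5.0, 0.0], [0.0, 2.0]])
assert np.allclose(partial_reset(A, c=3.0, floor=1.0), np.diag([2.0, 1.0]))
assert np.allclose(eigenvalue_decay(A, rho=0.5), 0.5 * A)
```

The two interventions change the geometry in different ways. In this example the partial reset changes the shape of the spectrum: the condition number drops from 2.5 to 2. The decay keeps the eigenvectors and the condition number and shrinks only the scale.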

The result is revealing. Spectral interventions substantially change the optimizer state, but they do not necessarily produce catastrophic predictive failure. The learner continues optimizing under a modified metric. Performance can remain stable even as the internal geometry diverges from the counterfactual.

This suggests that there is a class of solutions that are nearly equivalent in loss but different in state. They act similarly from the outside, but they are not the same learning system internally.

Result 4: deletion creates geometric hysteresis

The failure mode is not that the model suddenly stops working. It does not.

The failure mode is hysteresis. The optimizer’s future behavior depends on the path it took through the data stream, and deletion does not automatically erase that path dependence.

The trace of $A_t$ continues to evolve smoothly after deletion, which indicates that learning proceeds normally at the level of aggregate gradient accumulation. But the condition number and cosine-alignment diagnostics show discontinuities and volatility after intervention. Deletion perturbs the shape and orientation of the second-order state more than it perturbs the overall scale of learning.

This is why ordinary performance metrics can be misleading. They can show recovery while the optimizer state remains misaligned.

The learner still learns. It still performs. But it is no longer the learner that would have existed had the deleted data never been processed.
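The toy run below illustrates this path dependence. Its assumptions are made for illustration and are not the paper's: squared loss, ONS without projection, synthetic Gaussian data, and hypothetical deletion indices. The observed learner processes every row and then subtracts the outer products of the deleted gradients from its matrix. The counterfactual learner never sees those rows. Even this exact downdate of the recorded gradients does not reproduce the counterfactual matrix. Every gradient recorded after a deleted row was evaluated at parameters that the deleted row had already moved.

```python
import numpy as np


def run_ons(xs, ys, lam=1.0, gamma=1.0):
    """ONS on squared loss (no projection); returns final w, A, and every gradient."""
    w, A = np.zeros(xs.shape[1]), lam * np.eye(xs.shape[1])
    grads = []
    for x, y in zip(xs, ys):
        g = (x @ w - y) * x
        A = A + np.outer(g, g)
        w = w - np.linalg.solve(A, g) / gamma
        grads.append(g)
    return w, A, grads


rng = np.random.default_rng(1)
n = 60
xs = rng.normal(size=(n, 3))
ys = xs @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=n)
deleted = [5, 12]  # hypothetical deletion request
keep = [i for i in range(n) if i not in deleted]

# Observed learner: sees every row, then removes the deleted rank-one terms from A.
w_obs, A_obs, grads = run_ons(xs, ys)
A_downdated = A_obs - sum(np.outer(grads[i], grads[i]) for i in deleted)

# Counterfactual learner: the deleted rows never arrive.
w_cf, A_cf, _ = run_ons(xs[keep], ys[keep])

rel_gap = np.linalg.norm(A_downdated - A_cf) / np.linalg.norm(A_cf)
print(f"relative Frobenius gap after downdating A: {rel_gap:.2e}")
```

The downdate touches only the matrix and leaves w_obs as it was. Even for the matrix alone, removing the deleted terms does not undo their effect on the gradients recorded later in the stream.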

Why this matters

Second-order optimizers are useful because they remember. They use curvature information to adapt quickly, improve conditioning, and make better updates in difficult environments.

That same memory creates a deletion problem.

If the optimizer stores a compressed record of past gradients, then deleted data may influence not only the current parameters but also the geometry of future learning. A deletion request therefore targets the entire learner state, not only the visible model weights.

For stateful optimizers, unlearning should be framed as a counterfactual state-alignment problem:

$$S_t \approx S_t^{(-U)}.$$

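That is, the observed learner state after deletion should approximate the state that would have arisen had the deleted data never appeared. The superscript $(-U)$ marks the counterfactual learner that never observes the set $U$ of deleted observations.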

For first-order methods, this mostly reduces to parameter alignment:

$$w_t \approx w_t^{(-U)}.$$

For second-order methods, it requires both parameter and optimizer-state alignment:

$$(w_t, A_t) \approx (w_t^{(-U)}, A_t^{(-U)}).$$

This is the central conceptual shift. Unlearning is not just about making the model act like the retrained ideal. It is also about making the learning process continue from the right internal state.
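The state-alignment criterion can be written as a simple check. This is a minimal sketch with hypothetical choices: Euclidean distance for parameters, relative Frobenius distance for the optimizer matrix, and user-chosen tolerances tol_w and tol_A. It is not a certified unlearning test. Passing no matrix reduces it to the first-order case, $w_t \approx w_t^{(-U)}$.

```python
import numpy as np


def state_aligned(w, w_cf, A=None, A_cf=None, tol_w=1e-2, tol_A=1e-2):
    """Check S_t ≈ S_t^(-U): parameters always, optimizer matrix when one exists."""
    w_gap = float(np.linalg.norm(np.asarray(w) - np.asarray(w_cf)))
    if A is None:  # first-order learner: the state is just w_t
        return w_gap <= tol_w
    A_gap = float(np.linalg.norm(A - A_cf) / np.linalg.norm(A_cf))
    return w_gap <= tol_w and A_gap <= tol_A


w = np.array([1.0, 2.0])
print(state_aligned(w, w + 1e-4))  # True: parameters match
print(state_aligned(w, w + 1e-4, np.diag([1.0, 9.0]), np.diag([9.0, 1.0])))  # False
```

The second call shows the failure mode described in the results. The parameters match within tolerance, but the curvature matrix is oriented differently from the counterfactual one.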

What this paper does not claim

This experiment does not solve certified machine unlearning. It also does not prove that every second-order optimizer leaks private information in a directly exploitable way.

The claim is narrower and more structural.

Current parameter-based definitions of unlearning are underspecified for stateful optimizers. They can miss persistent differences in internal optimizer geometry, even when external performance appears to recover.

That gap matters because optimizer state determines future learning. If two learners have similar predictions but different internal states, they may respond differently to future data, future deletions, or future distribution shift.
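A small illustration of this point uses a hypothetical squared-loss ONS step without projection. This setup is an assumption chosen for illustration. The two learners share the same parameters, so they make identical predictions and incur identical loss. Their matrices differ, and given the same next observation their updates diverge.

```python
import numpy as np


def ons_update(w, A, x, y, gamma=1.0):
    """One ONS step on squared loss without projection (illustrative assumption)."""
    g = (x @ w - y) * x
    A_next = A + np.outer(g, g)
    return w - np.linalg.solve(A_next, g) / gamma, A_next


w = np.array([0.5, 0.5])
A_1 = np.diag([10.0, 1.0])  # learner 1: curvature concentrated on the first axis
A_2 = np.diag([1.0, 10.0])  # learner 2: same trace, opposite orientation
x_new, y_new = np.array([1.0, 1.0]), 3.0

# Same parameters, so both learners predict x_new @ w = 1.0 and see the same loss.
w_1, _ = ons_update(w, A_1, x_new, y_new)
w_2, _ = ons_update(w, A_2, x_new, y_new)
print(w_1)  # approximately [0.537, 0.870]
print(w_2)  # approximately [0.870, 0.537]
```

After one shared observation the two learners already disagree on other inputs. On x = [1, 0], for example, they predict about 0.537 and 0.870.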

The takeaway

A model can continue to act normally while its optimizer state remains misaligned.

For stateful optimizers, deletion cannot be judged only by regret, loss, prediction accuracy, or parameter distance. Those metrics remain useful, but they do not see the whole learner.

The right object is the full state of the learning process.

In first-order optimization, that state may be close to the parameter vector. In second-order optimization, the state includes the geometry accumulated along the way. That geometry is where memory lives.

So the question is not only:

Does the unlearned model perform like the retrained model?

The stronger question is:

Does the unlearned learner occupy the state it would have occupied if the deleted data had never existed?

Our experiments suggest that, for second-order optimizers, the answer is often no.