
Information States as Minimal Representations.

Published: 12/28/2025 • Authors: Kennon Stewart


Motivation

A quiet assumption baked into reinforcement learning and control theory is that some parts of the past can be forgotten. Or more precisely: that there exists a representation of history such that, once learned, the rest of the past no longer matters for sequential decision-making. This representation of history, called a state, is sufficient only when it enables optimal decision-making despite imperfect knowledge of the environment.

In classical MDPs, this role is played by the state. In POMDPs, by the belief state. And in modern learning systems, by something far messier: buffers or embeddings or even a corpus of natural language traces.

What follows is an attempt, drawing on the literature, to make these sufficiency conditions more concrete. We lean on classical control-theory fundamentals like filtering while referencing newer constructions like the approximate information state.

The goal is not to restate the textbook definition of an information state, but to reinterpret it as a minimal representation problem: a problem of compression under decision-theoretic constraints. Along the way, we will read Sinha & Mahajan carefully, quote them directly, and explain the direct applications to the forefront of reinforcement learning.

The Core Problem: Growing Histories Used to Describe Stable States

In a partially observed system, the agent does not see the true environment state. What it sees instead is a growing history of observations and actions.

This sentence is doing more work than it appears to.

The problem is not partial observability per se. It is unbounded memory growth. If the agent were allowed to retain the full history forever, optimal control would be trivial in principle (if not in practice).

Every notion of β€œstate” in control theory is, at bottom, an answer to the same question:

What can we throw away without hurting optimal decision-making?

The belief state is the canonical answer---but it comes with two fatal flaws in modern settings:

  1. It requires a known generative model.
  2. It is computationally and representationally expensive.

Sinha and Mahajan’s contribution is to relax the belief-state requirement while preserving the logic of dynamic programming.

Agent States: History Compression Without a Model

Instead of assuming a belief state, Sinha introduces a more general object: the agent state.

This is the conceptual pivot of the paper.

The agent state is not required to be optimal, sufficient, or even Markov. It is simply a recursively updateable compression of history: $Z_{t+1} = \phi(Z_t, Y_{t+1}, A_t)$.

In modern terms: replay buffers, RNN hidden states, evidence graphs, optimizer memory, and cache summaries all qualify.
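As a toy illustration (my own sketch; the update rule, dimensions, and constants are invented), the recursion above can be as simple as an exponential moving average of observations plus the most recent action:

```python
import numpy as np

def phi(z, y, a):
    # Z_{t+1} = phi(Z_t, Y_{t+1}, A_t): one toy choice of agent state,
    # an exponential moving average of observations plus the last action.
    # Any fixed-size, recursively updateable summary of history qualifies.
    ema, _ = z
    return (0.9 * ema + 0.1 * y, a)

rng = np.random.default_rng(0)
z = (np.zeros(4), None)          # initial agent state Z_0
for t in range(100):
    y = rng.normal(size=4)       # observation Y_{t+1}
    a = t % 3                    # action A_t
    z = phi(z, y, a)             # memory stays O(1) as history grows

print(z[0].shape, z[1])
```

The point of the sketch is only the shape of the computation: the history grows linearly, but the agent state never does.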

The key move is philosophical as much as technical: state is no longer given by the environment---it is chosen by the agent designer.

At this level, β€œagent state” is a permissive concept. Almost anything qualifies. But permissiveness alone does not recover optimality.

For that, we need information states.

Information States: Sufficiency, Not Memory

Sinha defines an information state via two precise sufficiency conditions.

This is the precise definition we should use, because it ties β€œstate” directly to control.

Sinha and Mahajan are not saying that $Z_t$ is a good summary because it is small, low-entropy, or elegant. They are saying it is good when two concrete substitutions are valid:

  • (P1) Replace the full history with $Z_t$ when evaluating expected one-step reward.
  • (P2) Replace the full history with $Z_t$ when propagating next-step state dynamics.
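In symbols (my paraphrase of the standard AIS-style conditions, writing $H_t$ for the history up to time $t$), the two substitutions read:

```latex
% (P1) sufficient for one-step reward evaluation
\mathbb{E}\left[ R_t \mid H_t, A_t \right] = \mathbb{E}\left[ R_t \mid Z_t, A_t \right]

% (P2) sufficient to propagate its own dynamics
\mathbb{P}\left( Z_{t+1} \in B \mid H_t, A_t \right) = \mathbb{P}\left( Z_{t+1} \in B \mid Z_t, A_t \right)
```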

In other words, sufficiency here is operational: if these equalities hold, dynamic programming over $Z_t$ is behaviorally faithful to the original history process.

The test is not β€œdid we compress?” but β€œdid we preserve decision-relevant prediction?” If both properties hold, then dynamic programming goes through cleanly.

This is the first real payoff of the framework.

The theorem says that once $Z_t$ satisfies Sinha-Mahajan sufficiency, history dependence is no longer needed for optimal control. You can optimize in the compressed state space and recover an optimal policy for the original problem.

Put differently: this is not β€œrepresentation learning because compression feels elegant.” It is a formal equivalence result between history-based control and state-based control under (P1) and (P2).

It also draws a clean boundary for engineers putting it into practice: the hard part is not solving the DP over $Z_t$; the hard part is constructing a $Z_t$ that reaches sufficiency.

Approximate Information States: Controlled Lossy Compression

Most agent states in practice fail (P1) and (P2). Sinha’s response is not to abandon them, but to bound the damage.

The practical value of the approximate information state (AIS) is that it relaxes an otherwise infeasible requirement: perfect knowledge of the dynamics of a nonstationary system. The canonical belief state lives in a continuous, often infinite, belief space that is difficult to maintain without storing the system dynamics explicitly, action-by-action and state-by-state. A well-defined notion of approximation is an appropriate substitute---and a much more efficient one.

Suppose one design updates a compact latent state $Z_t$, while another stores a running natural-language trace. The AIS lens asks a concrete question: which representation yields smaller $(\varepsilon_t, \delta_t)$ for the same task and compute budget?

In many settings, latent states better preserve transition-relevant structure (smaller $\delta_t$), while language traces improve auditability but expand memory and can degrade control fidelity if used as the sole control state.
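A toy way to make this comparison concrete (a sketch of mine, not the paper's construction): simulate a two-state POMDP whose reward equals the hidden bit, and estimate the reward-prediction error $\varepsilon$ for two agent states---the exact belief state versus a one-observation memory. All constants and helper names here are invented for the example.

```python
import random

random.seed(0)
FLIP, OBS_ACC = 0.1, 0.8   # hidden-bit flip rate, observation accuracy

def predict(b):
    # E[R_{t+1} | belief b]: probability the next hidden state is 1,
    # since the reward in this toy POMDP is the hidden state itself.
    return b * (1 - FLIP) + (1 - b) * FLIP

def belief_update(b, y):
    # Exact Bayes filter over the hidden bit: predict step, then correct.
    bp = b * (1 - FLIP) + (1 - b) * FLIP
    l1 = OBS_ACC if y == 1 else 1 - OBS_ACC
    l0 = 1 - OBS_ACC if y == 1 else OBS_ACC
    return bp * l1 / (bp * l1 + (1 - bp) * l0)

# Compare two agent states by their reward-prediction error (epsilon):
# the full belief state vs. a memory of only the last observation.
se_belief = se_obs = n = 0.0
for _ in range(2000):
    s, b, y = random.choice([0, 1]), 0.5, None
    for t in range(20):
        if y is not None:
            p_b = predict(b)                       # belief-state prediction
            p_o = predict(1.0 if y == 1 else 0.0)  # last-obs prediction
        s = 1 - s if random.random() < FLIP else s # hidden-state transition
        y_new = s if random.random() < OBS_ACC else 1 - s
        r = float(s)                               # reward = hidden state
        if y is not None:
            se_belief += (r - p_b) ** 2
            se_obs += (r - p_o) ** 2
            n += 1
        b = belief_update(b, y_new)
        y = y_new

mse_belief, mse_obs = se_belief / n, se_obs / n
print(round(mse_belief, 3), round(mse_obs, 3))
```

Because the belief is the exact conditional expectation given the full history, its squared prediction error lower-bounds that of any coarser agent state; the gap between the two numbers is an empirical estimate of what the compression costs.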

The engineering answer is usually a mix of the two: an embedded latent state for live inference, with batch runs that convert model inference into interpretable natural language. This is essentially a lambda architecture, separating the real-time streaming path that users see from a batch path for the less urgent interpretability metrics.

I would even argue that this is the implicit basis of Newton and quasi-Newton optimization methods. This family of methods differs mainly in the representation used to store and invert a particularly expensive Hessian.

Optimization Algorithms as Agent States

Newton’s method, BFGS, and L-BFGS all maintain internal state across iterations. But precision matters. Jorge Nocedal showed that the reduction in precision induced by limiting the BFGS memory incurs a controllable loss in accuracy. Mokhtari and Ribeiro do a similar job in the online setting. And this is exactly the approximation of an otherwise intractable information state.

But there is also the converse of approximation, which is refinement. Under this framing, the auxiliary state of the Newton optimizer corresponds to a high-complexity agent state: the Hessian itself. The low-memory variant L-BFGS corresponds to a compressed agent state. And online variants correspond to approximate information states, operating under nonstationarity, where an adaptive policy may outperform a stationary one.
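A minimal sketch of that compression on a toy quadratic objective (my own illustration, not any paper's reference code): L-BFGS never stores the Hessian, only a bounded deque of curvature pairs $(s_k, y_k)$, which plays the role of a compressed agent state.

```python
from collections import deque
import numpy as np

def lbfgs_direction(g, memory):
    # Two-loop recursion: apply the inverse-Hessian estimate implied by
    # at most m stored curvature pairs (s_k, y_k), newest last.
    q = g.copy()
    cache = []
    for s, y in reversed(memory):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        cache.append((a, rho, s, y))
    if memory:
        s, y = memory[-1]
        q *= (s @ y) / (y @ y)       # standard initial Hessian scaling
    for a, rho, s, y in reversed(cache):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q

# Minimize f(x) = 0.5 x^T A x in 10 dimensions with memory m = 5.
A = np.diag(np.arange(1.0, 11.0))
f = lambda x: 0.5 * x @ A @ x
x = np.random.default_rng(0).normal(size=10)
memory = deque(maxlen=5)             # compressed agent state: O(m n) floats
for _ in range(100):
    g = A @ x
    if np.linalg.norm(g) < 1e-10:    # converged; avoid degenerate updates
        break
    d = lbfgs_direction(g, memory)
    t = 1.0
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):  # Armijo backtracking
        t *= 0.5
    x_new = x + t * d
    memory.append((x_new - x, A @ x_new - g))        # curvature pair (s, y)
    x = x_new

grad_norm = float(np.linalg.norm(A @ x))
print(grad_norm)
```

Swapping `maxlen=5` for unbounded memory recovers full BFGS behavior, and storing the Hessian itself recovers Newton: the same algorithm family, indexed by how aggressively the agent state is compressed.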

This is not a claim that these methods optimize MDL; I would consider that reading a mistake. I instead claim that they can be reinterpreted as navigating a tradeoff between the quality of memory, prediction, and decision-making.

Information States as Minimal Representations

I’m willing to admit at this point that my disagreement with the framework is narrow. But it is urgent in domains where precision comes at a cost.

Sinha and Mahajan give a rigorous behavioral definition of sufficiency: if two representations induce the same control-relevant reward and transition predictions (exactly or approximately), they are equally valid for dynamic programming. That is a strong theorem and a valuable one.

But it leaves an engineering gap. Two agent states can be equally sufficient in the Sinha-Mahajan sense while having very different content, memory footprint, latency, interpretability, and deployment cost. In practice, teams do not choose state representations in a vacuum---they choose under compute budgets, hardware limits, safety review requirements, and maintenance constraints.

So the open problem is not whether Theorem 1 is correct. It is whether control-relevance alone is the right endpoint for representation design.

My claim is that it is not.

A useful next definition of sufficiency should require both adequate control performance and short codelength. Such a definition would mirror the notion of a minimal sufficient statistic in probability theory: the coarsest summary that still preserves everything needed for inference.

This is a direction to watch, particularly as agents are deployed to memory-constrained devices where large agent states are infeasible. It becomes a barrier to a truly smart, reactive city when the models embedded in buses and trains add lag to the user experience.

It becomes imperative that we optimize on more than just control.

Concretely, that suggests adding a second objective to the AIS lens: among representations with comparable $(\varepsilon_t, \delta_t)$, prefer the one with lower descriptive complexity and implementation burden. In upcoming work, I will connect this to MDL/Kolmogorov-style language, where representation quality is judged not only by control fidelity but also by codelength.
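One hypothetical way to write that combined objective down (my sketch, not a result from Sinha & Mahajan): score a candidate update rule $\phi$ by its codelength plus its AIS error,

```latex
\min_{\phi} \; L(\phi) \;+\; \lambda \left( \varepsilon(\phi) + \delta(\phi) \right)
```

where $L(\phi)$ is the description length of the update rule and its state, and $\lambda$ sets the exchange rate between bits and control fidelity.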

In that sense, information states are not just filters over history. They are design decisions tailored to the agent’s environment and the problem at hand. As such, we do well to define our constraints appropriately.