Beyond Belief States — Reading Notes
The Art of a Good Decision
While working at HP, I was embedded with a Marketing Returns team responsible for analyzing a global digital marketing portfolio. The opportunity was formative—not because of the scale of the data, but because of the discipline required to use it well.
I was assigned a set of digital marketplaces across North and South America and became the single-threaded owner of marketing returns and share-of-shelf analyses. The outputs fed directly into conversations between business development teams and external partners. I wasn’t just building dashboards; I was scanning the waves of data for whatever my stakeholders would need to know before those conversations.
Because the decisions were global, the data had to be good. But “good” did not mean throwing the entire database at my stakeholders. The team had enough experience to immediately filter the firehose: weekly volume sold, digital share of shelf, click-through rate. Everything else, no matter how detailed or expensive to compute, was usually ignored.
That experience sharpened my intuition about decision-making. Making a good decision appears deceptively simple:
keep the good information, and discard the rest.
But that immediately raises the harder question: what counts as good information?
Decisions Are Made Under Information Constraints
In practice, decisions are almost never made with full knowledge. They are made under constraints: time, noise, delayed feedback, and limited attention. Some signals matter now. Some mattered yesterday. Some will matter only if you wait too long.
This is not a failure of tooling or analytics. It is a structural feature of decision-making.
Classic work in decision theory and operations research—most famously Herbert Simon’s notion of bounded rationality—recognized this early. Real agents do not optimize over all available information. They satisfice using compressed summaries that are “good enough” for the task at hand.
The HP example fits this pattern exactly. The challenge was not collecting more data, but deciding which summaries were sufficient to justify a concrete action.
Why Naïve Decision Rules Fail
A common response to uncertainty is to delay commitment and gather more information. That can be the right move for crucial decisions, but not every decision is crucial, and often the call for more information is really just a hedge for more time. It shows up when I hear things like:
- “Let’s wait for one more report.”
- “Let’s add another KPI.”
- “Let’s instrument the pipeline more thoroughly.”
These rules feel safe, but they fail in predictable ways. The information overload from “just one more source” can obscure the signal present in any refined body of information. Outdated or irrelevant information can distract from more pressing and relevant data, and any delay to taking action carries real opportunity cost. Pushing off a decision is, in fact, making a decision.
What is missing is a way to reason about sequences of decisions: when to observe, when to act, and when to stop. This is exactly the gap that sequential decision models were designed to fill.
Markov Decision Processes as a Language for Sequential Decisions
A Markov Decision Process (MDP) provides a minimal mathematical language for reasoning about sequential decisions under uncertainty. Formally, an MDP is defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where:
- $\mathcal{S}$ is a set of states,
- $\mathcal{A}$ is a set of actions,
- $P(s' \mid s, a)$ defines transition dynamics,
- $R(s, a)$ assigns rewards,
- $\gamma \in [0, 1)$ discounts the future.
In the marketing example:
- the state is not the raw data warehouse, but a summary of what is currently known,
- actions include acquiring more information or committing to a strategy,
- rewards reflect downstream business outcomes minus the cost of delay or analysis.
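To make the tuple concrete, here is a minimal sketch, with states, actions, rewards, and probabilities that are entirely illustrative (none of this is HP data): a two-state toy in which the summary is either "weak evidence" or "strong evidence," the choice is between gathering more data and committing to a strategy, and standard value iteration recovers the obvious policy.

```python
import numpy as np

# Hypothetical toy MDP for the marketing example; all numbers are illustrative.
states = ["weak_evidence", "strong_evidence"]
actions = ["gather_data", "commit"]

# P[a][s][s'] : transition probabilities under each action.
P = {
    "gather_data": np.array([[0.4, 0.6],    # gathering data may upgrade the evidence
                             [0.0, 1.0]]),  # strong evidence stays strong
    "commit":      np.array([[1.0, 0.0],    # committing does not change the evidence
                             [0.0, 1.0]]),
}

# R[a][s] : expected immediate reward (business outcome minus delay/analysis cost).
R = {
    "gather_data": np.array([-1.0, -1.0]),  # gathering always costs time
    "commit":      np.array([-5.0, 10.0]),  # committing early is risky, later is valuable
}

gamma = 0.9  # discount factor

# Standard value iteration: iterate the Bellman optimality update
# V(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ].
V = np.zeros(len(states))
for _ in range(200):
    V = np.max([R[a] + gamma * P[a] @ V for a in actions], axis=0)

policy = {s: actions[int(np.argmax([R[a][i] + gamma * P[a][i] @ V for a in actions]))]
          for i, s in enumerate(states)}
print(policy)  # e.g. {'weak_evidence': 'gather_data', 'strong_evidence': 'commit'}
```

The point is not the numbers; it is that the state here is already a summary of what is known, not the raw warehouse.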
A crucial point is often missed here. The Markov property—“the future depends only on the current state”—is not a claim about how the world works. It is a design constraint. We choose states so that this property holds approximately enough to support good decisions.
This perspective is developed rigorously in the dynamic programming literature (e.g., Bellman) and formalized in texts such as Puterman’s Markov Decision Processes: Discrete Stochastic Dynamic Programming, with modern intuition provided by Sutton and Barto.
The Real Problem: What Is the State?
Once you look closely, the hardest part of an MDP is not defining rewards or actions. It is defining the state.
Real systems are almost never Markov in their raw observations. Important variables are hidden, effects are delayed, and observations are noisy. If we treat the entire history as the state, the model becomes intractable. If we treat the latest observation as the state, it becomes wrong.
This tension leads naturally to partially observed settings, where the agent must act without direct access to the true underlying state. But naming the formalism is less important than confronting the core issue:
what information from the past is actually necessary to make good future decisions?
Information States: Keeping Only What Matters
An information state answers that question directly.
Informally, an information state is:
a compressed summary of the past that is sufficient for evaluating future decisions.
Unlike full histories, information states deliberately discard detail. Unlike perfect Bayesian belief states, they do not require complete probabilistic models of the world. What matters is sufficiency.
In the agent-state literature, sufficiency is typically defined by two requirements:
- the state must be sufficient to evaluate expected rewards of future actions,
- the state must be sufficient to predict its own evolution under those actions.
If these hold, then optimal (or near-optimal) policies can be derived as if the system were fully observed.
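Written out, using symbols that are my own shorthand rather than anything quoted from the papers (with $Z_t$ the information state computed from the history $H_t$, $A_t$ the action, and $R_t$ the reward), the two requirements look roughly like this:

```latex
% Sketch of the two sufficiency conditions; Z_t is the information state
% computed from the history H_t, A_t the action, R_t the reward.
\begin{align}
  % (1) sufficient to evaluate expected rewards of future actions:
  \mathbb{E}\left[ R_t \mid H_t, A_t \right]
    &= \mathbb{E}\left[ R_t \mid Z_t, A_t \right] \\
  % (2) sufficient to predict its own evolution under those actions:
  \mathbb{P}\left( Z_{t+1} \mid H_t, A_t \right)
    &= \mathbb{P}\left( Z_{t+1} \mid Z_t, A_t \right)
\end{align}
```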
This idea is developed formally in the agent-state and information-state framework of Sinha and Mahajan, as well as in related common-information and approximate-state work. The key insight is that belief states are not the only game in town. Any representation that satisfies the sufficiency conditions can support principled control.
Approximation Is Not a Hack—It Is the Point
Exact information states are rare. In realistic systems, they are usually impossible.
This is not a problem to be patched over; it is the central design question. Information states are almost always approximate: lossy compressions tuned to a task, a cost structure, and a decision horizon.
From this perspective, approximation is not an engineering compromise. It is a control-theoretic necessity.
Ideas from information theory make this precise. Rate–distortion theory formalizes the tradeoff between compression and fidelity. The information bottleneck reframes representation learning as task-relevant compression. Approximate dynamic programming studies how near-optimal policies emerge from imperfect state abstractions.
All of these point to the same conclusion: good decisions depend on the right compression, not perfect memory.
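As a toy illustration of task-relevant compression, entirely my own sketch rather than anything from those papers: an exponential moving average keeps a single number in place of the full observation history, and whether that single number is "enough" depends only on the decision it has to support.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signal: a slowly drifting conversion rate observed with noise.
true_rate = 0.05 + 0.00002 * np.arange(5_000)
observations = true_rate + rng.normal(0.0, 0.01, size=true_rate.shape)

# Approximate information state: one scalar, updated recursively.
# The full history is discarded; only the summary survives.
alpha = 0.02
state = observations[0]
decisions = []
for y in observations[1:]:
    state = (1 - alpha) * state + alpha * y   # lossy compression of the past
    decisions.append(state > 0.10)            # decision rule consumes only the state

# The compressed state is "sufficient" exactly to the extent that the decisions
# it produces match the decisions the full history would have produced.
print(f"final state ~ {state:.3f}, acted on {sum(decisions)} of {len(decisions)} steps")
```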
Case Study: The Hessian as an Information State
Second-order optimization provides a concrete example.
In Newton’s method, the Hessian matrix accumulates curvature information over time. Individual data points are forgotten. Exact gradients are discarded. What remains is a compact object that encodes how the loss surface responds to change.
Viewed through the lens of decision-making:
- the gradient alone is often an insufficient state,
- the Hessian is a richer information state,
- it preserves precisely the structure needed to choose effective updates.
The Hessian is not a record of the past. It is a compressed memory optimized for future decisions. This is exactly what an information state is meant to be.
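A minimal sketch of that idea, using ordinary least squares as the stand-in (my own illustration, where the Hessian of the squared loss is just the accumulated matrix $\sum_i x_i x_i^\top$): each data point is folded into two small accumulators and then discarded, and the Newton step at the end is computed from the summaries alone.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
w_true = rng.normal(size=d)

# Accumulators: the only "memory" the optimizer keeps.
H = np.zeros((d, d))   # accumulated curvature of the squared loss: sum of x x^T
g = np.zeros(d)        # accumulated gradient information: sum of y * x

for _ in range(10_000):
    x = rng.normal(size=d)
    y = x @ w_true + rng.normal(0.0, 0.1)
    H += np.outer(x, x)    # fold the point into the curvature summary
    g += y * x             # fold the point into the gradient summary
    # x and y are now discarded; the raw history is gone.

# Newton step from the origin for the loss 0.5 * sum (x @ w - y)^2:
# w = H^{-1} g recovers the least-squares solution from the summaries alone.
w_hat = np.linalg.solve(H, g)
print(np.round(w_hat - w_true, 3))  # close to zero: the compression was sufficient
```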
From MDPs to Agentic Reasoning Systems
MDPs also underpin reinforcement learning, where agents learn policies through interaction with an environment. But it is easy to misinterpret the lesson.
The agent is not the center of intelligence. The state representation is.
In modern agentic systems—tool-using language models, retrieval-augmented generation, multi-agent orchestration—the same pattern appears. Reasoning does not live inside the agent as an anthropomorphic entity. It lives in the evolving state: an evidence set, a graph, a plan, a compressed summary.
Agents are policy executors. If the state is sufficient, they are interchangeable.
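A sketch of what that interchangeability might look like in code, with hypothetical names and structure rather than any particular framework's API: the state object carries the evidence and the open questions, and any policy that maps that state to an action can be swapped in without touching the loop around it.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningState:
    """Compressed, decision-sufficient summary of the episode so far."""
    evidence: list[str] = field(default_factory=list)       # retained findings only
    open_questions: list[str] = field(default_factory=list)
    budget_remaining: int = 5

def cautious_policy(state: ReasoningState) -> str:
    # Keep gathering while questions remain and budget allows.
    if state.open_questions and state.budget_remaining > 0:
        return "gather"
    return "answer"

def eager_policy(state: ReasoningState) -> str:
    # Commit as soon as any evidence exists at all.
    return "answer" if state.evidence else "gather"

# Both policies consume the same state and emit the same action vocabulary,
# so the surrounding loop does not care which "agent" is plugged in.
state = ReasoningState(evidence=["partner A outsells partner B"],
                       open_questions=["is the trend seasonal?"])
for policy in (cautious_policy, eager_policy):
    print(policy.__name__, "->", policy(state))
```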
This reframing makes it clear why many contemporary failures are not failures of language modeling, but failures of decision design: poor stopping rules, bloated states, redundant information, and uncontrolled accumulation of context.
A Provocative Claim
Stronger reasoning systems do not primarily need more information.
They need better decisions.
Most failures are stopping failures. Most inefficiencies are state-design failures. Scaling models without controlling the decision process often makes things worse, not better.
Progress looks less like adding tokens, and more like learning what to throw away.
Open Questions
This perspective opens several research questions that continue to motivate my work:
- How do we measure the sufficiency of an information state?
- What is the minimum description length of a decision-relevant state?
- When does compression break optimality, and by how much?
- How can we experimentally test state sufficiency and agent interchangeability?
These are not abstract concerns. They are the difference between systems that reason and systems that merely think longer.