We answer a query q about a latent correct answer Y∈Y. At time t, the agent has an evidence graph
Gt=Φ(r1:t,q),
and its uncertainty is
Ht;=;H(Y∣q,Gt).
Let the filtration (the information the agent has seen so far) be
Ft;=;σ(q,a1:t,r1:t).
Because Gt is a deterministic function of (q,r1:t), both Gt and Ht are Ft-measurable. (For background on stopping times and measurability of stopped/optional objects, see the standard definitions and results in the general theory of stochastic processes.)
A (no systematic misinformation): Conditioning on Ft, the expected entropy after incorporating the next result does not increase:
E[H(Y∣q,Gt∪rt+1),,Ft]\leH(Y∣q,Gt).
B (causality): The policy chooses at using only current information, i.e., at is Ft-measurable.
Because Ht is Ft-measurable, the event {Ht≤ε}∈Ft, hence τε is a stopping time. (Stopped/optional objects at stopping times are measurable in the expected σ-fields.)
Under standard integrability/regularity (boundedness or bounded increments suffices in our discrete setup), optional stopping for supermartingales yields
(For the discrete-time optional sampling/sampling inequality, see Doob’s theorem in Pascucci’s Chapter 6; it is stated for sub/martingales and used analogously for supermartingales with signs adjusted. )
We’re in discrete time, (Ht) is bounded below, and (\tau\varepsilon) is (by design) a bounded/finite stopping time under any admissible policy that either hits (\varepsilon) or halts with abstention. Under these mild conditions, the optional sampling/optional stopping statements we used are standard.
A sigma-algebra lists the events you can measure; a filtration(Ft) is a growing family of such sigma-algebras capturing information over time. (Formal definitions and the role of optional/predictable σ-fields are standard; see, e.g., the development of optional processes and their measurability at stopping times.)
Stopping times are random times whose occurrence is knowable from the past: {τ≤t}∈Ft. (Definition and basic properties; also how stopped processes inherit measurability.)
Optional sampling/Doob: for (super/sub)martingales, conditional expectations at stopping times behave as expected; see the discrete statement used repeatedly in practice.
Because Ht is Ft-measurable, {Ht≤ε}∈Ft. The event {τε≤t} equals ⋃s≤t{Hs≤ε}∈Ft, so τε is a stopping time (and standard section/projection results ensure the usual measurability behaviors at τ).
Combine the bits-per-cost bound with graph-theoretic lower bounds (e.g., Steiner-style) to capture structural hardness as well as informational hardness.