
Sigma-Algebras: a Principled Approach to Asking Good Questions

Published: 9/18/2025
Authors: Kennon Stewart


To Compare the Probability of Events, We Need to Define an Event

In our work, we constantly deal with uncertainty. We build models to predict customer behavior, forecast demand, and find patterns in noisy data. But have you ever wondered what makes the math of probability actually work?

How can we be sure that our calculations are logically sound, especially when dealing with complex, real-world events?

The answer lies in a concept called the σ-algebra (pronounced: sigma-algebra). This is a collection of all the outcomes that we can group into events, which are then assigned probabilities (definition very loosely adapted from [Durrett_1999]). It defines the set of “reasonable questions” we’re allowed to ask about our data. Without this framework we have no way of distinguishing one event from another, much less comparing their probabilities.

Is it more likely to rain tomorrow or to snow? We can only compare the probabilities if we have a clear definition of the event space that distinguishes rain and snow.

This is pretty simple in a finite, discrete space of outcomes. It becomes much more difficult when the outcome space is infinite or the outcomes are continuous.

What Is a Sigma-Algebra? A Principled Approach to Asking Questions

Let’s start with a simple experiment: flipping a coin twice (adapted from [Casella_Berger_2024]). The set of all possible outcomes, called the sample space (Ω), is {HH, HT, TH, TT}. We might want to ask about the probability of an “event,” like “getting at least one head,” which corresponds to the subset {HH, HT, TH}.
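On a finite space like this, the sample space and an event are literally just sets we can enumerate. A minimal Python sketch (the variable names are ours, not standard):

```python
from itertools import product

# Sample space for two coin flips: every H/T string of length 2.
omega = {"".join(p) for p in product("HT", repeat=2)}

# The event "at least one head" is the subset of outcomes containing an H.
at_least_one_head = {w for w in omega if "H" in w}

print(sorted(omega))              # ['HH', 'HT', 'TH', 'TT']
print(sorted(at_least_one_head))  # ['HH', 'HT', 'TH']
```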

A σ-algebra is simply the collection of all such events that we want to be able to measure. For this collection to be useful and consistent, it must follow three common-sense rules:

  1. Certainty is measurable: The event “one of the outcomes happens” (i.e., the entire sample space {HH, HT, TH, TT}) must be in our collection. Its probability is 1.
  2. Opposites are measurable: If we can ask for the probability of an event, we must be able to ask for the probability of it not happening. If “at least one head” is a valid event, its opposite, “no heads” ({TT}), must also be a valid event. This is called being closed under complements.
  3. Combinations are measurable: If we have a list of measurable events, the event that “at least one of them occurs” must also be measurable. If “getting exactly one head” ({HT, TH}) and “getting two heads” ({HH}) are measurable, their combination, “getting at least one head,” must also be. This is called being closed under countable unions.

That’s it. A collection of events satisfying these three rules is a σ-algebra. It’s the “well-behaved” set of questions we can ask, for which probability theory guarantees a consistent answer.
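On a finite sample space, the three rules can be checked by brute force. A small sketch, using a hypothetical helper `is_sigma_algebra` (note that on a finite space, closure under pairwise unions already gives closure under countable unions):

```python
def is_sigma_algebra(omega, F):
    """Check the three rules on a finite collection F of subsets of omega."""
    F = {frozenset(e) for e in F}
    omega = frozenset(omega)
    # Rule 1: certainty is measurable — the whole sample space is in F.
    if omega not in F:
        return False
    # Rule 2: closed under complements.
    if any(omega - e not in F for e in F):
        return False
    # Rule 3: closed under unions (pairwise suffices on a finite space).
    return all(a | b in F for a in F for b in F)

omega = {"HH", "HT", "TH", "TT"}
F = {frozenset(), frozenset({"TT"}),
     frozenset({"HH", "HT", "TH"}), frozenset(omega)}
print(is_sigma_algebra(omega, F))  # True

# Dropping the complement of {TT} breaks rule 2:
print(is_sigma_algebra(omega, {frozenset(omega), frozenset({"TT"})}))  # False
```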

A Very Special Case: Data Over Time (Stochastic Processes)

This framework truly shines when we analyze data that arrives sequentially, like stock prices, sensor readings, or user interactions on a website. This is the domain of stochastic processes.

Imagine you are monitoring a system in real-time. As time passes, you gain more information. We can model this accumulation of knowledge using a filtration, which is nothing more than a sequence of nested σ\sigma-algebras [Puterman_1994].

Let’s write ℱ_t for the σ-algebra representing our knowledge at time t.

  • ℱ_0 represents our knowledge at the start (maybe just the initial state).
  • ℱ_1 represents all the events we can determine after one minute. It includes everything in ℱ_0 plus new information.
  • ℱ_2 represents our knowledge after two minutes, containing everything in ℱ_1, and so on.

This sequence, ℱ_0 ⊆ ℱ_1 ⊆ ℱ_2 ⊆ …, is the filtration. When we say a process is adapted to a filtration, it’s a mathematically precise way of saying that at any point in time, the process’s value is knowable based only on the information gathered so far.
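One concrete way to build such a filtration for the two-flip experiment: let ℱ_t be generated by the partition of outcomes that agree on the first t flips. A sketch (the helper `sigma_algebra_at` is our own construction, not a library function):

```python
from itertools import product, combinations

omega = ["".join(p) for p in product("HT", repeat=2)]

def sigma_algebra_at(t):
    """sigma-algebra generated by the first t flips: all unions of the
    partition blocks 'outcomes sharing the same first t results'."""
    partition = {}
    for w in omega:
        partition.setdefault(w[:t], set()).add(w)
    blocks = list(partition.values())
    # Every union of blocks (including the empty union) is measurable at time t.
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

F0, F1, F2 = sigma_algebra_at(0), sigma_algebra_at(1), sigma_algebra_at(2)
print(len(F0), len(F1), len(F2))  # 2 4 16
print(F0 <= F1 <= F2)             # True: the sigma-algebras are nested
```

The event counts show information accumulating: before any flip we can only ask about ∅ and Ω; after one flip we can also ask “was the first flip heads?”; after two flips every subset of outcomes is a valid question.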

The Sigma-Algebra’s Role in Learning Theory

The σ-algebra isn’t just an abstract concept; it’s deeply embedded in the principles of machine learning.

In ML, a feature (like age or income) is a random variable. Formally, a random variable is a function that maps outcomes to numbers. The σ-algebra guarantees that we can ask probabilistic questions about these features, like “What is the probability that income is over $100k?”
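The coin-flip space makes this concrete: a question about a random variable is really a question about the preimage event it carves out of Ω, and that event must lie in our σ-algebra. A sketch with `num_heads` as our illustrative random variable (assuming a fair coin, so each outcome has probability 1/4):

```python
from itertools import product

omega = {"".join(p) for p in product("HT", repeat=2)}

# A random variable is just a function from outcomes to numbers.
def num_heads(w):
    return w.count("H")

# "P(num_heads >= 1)" secretly asks about the preimage event
# {w in omega : num_heads(w) >= 1}, which must be measurable.
event = {w for w in omega if num_heads(w) >= 1}
prob = len(event) / len(omega)  # fair coin: uniform probability over omega

print(sorted(event), prob)  # ['HH', 'HT', 'TH'] 0.75
```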

For probabilistic modeling, models like Bayesian networks, Gaussian processes, and VAEs are built on the laws of probability. The σ-algebra is the underlying structure that ensures the probability distributions these models learn are mathematically sound.

But this also has relevance for controlling the information we don’t want to learn. For sensitive data, we consciously decide what information a model should or should not have access to. Defining the events we choose to measure is equivalent to constructing a specific σ-algebra, giving us a powerful tool to control information flow.

References

  1. Durrett, Richard (1999). Essentials of stochastic processes. Springer.
  2. Casella, George, Berger, Roger (2024). Statistical Inference. Chapman and Hall/CRC. https://doi.org/10.1201/9781003456285
  3. Puterman, Martin L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley. https://doi.org/10.1002/9780470316887