
Probability Theory: From Events to Numbers

Published: 9/22/2025
Authors: Kennon Stewart


Why It Matters to the Lab

Cities, data streams, and neural networks all share a common feature: uncertainty. Probability theory gives us the language to quantify it.

For Second Street Labs, this matters when we interpret noisy sensor data, design randomized algorithms, or ensure robustness in the face of incomplete information.


The Probability Space

A probability space is a triple $(S, \mathcal{F}, P)$ where:

  • $S$: the sample space (all possible outcomes).
  • $\mathcal{F}$: a $\sigma$-algebra of subsets of $S$ (the measurable events).
  • $P$: a probability measure that assigns a number in $[0, 1]$ to each event.

Together, these ensure that every “legal” event gets a consistent probability.
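As a minimal sketch of the definition (the `ProbSpace` class and its names are our own illustration, not part of the post), here is a finite probability space where $\mathcal{F}$ is the full power set of $S$:

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(s):
    """All subsets of s: the largest sigma-algebra on a finite sample space."""
    s = list(s)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return [frozenset(c) for c in subsets]

class ProbSpace:
    """A finite probability space (S, F, P) with F = 2^S."""
    def __init__(self, weights):
        assert sum(weights.values()) == 1, "normalization: P(S) must be 1"
        self.S = frozenset(weights)    # sample space
        self.F = powerset(self.S)      # sigma-algebra: every subset is measurable
        self.weights = weights         # point masses defining P

    def P(self, event):
        """Probability measure: add up point masses over the event."""
        assert frozenset(event) <= self.S, "event must be a subset of S"
        return sum(self.weights[outcome] for outcome in event)

# A fair coin: S = {H, T}, P({H}) = P({T}) = 1/2.
coin = ProbSpace({"H": Fraction(1, 2), "T": Fraction(1, 2)})
print(coin.P({"H"}))       # 1/2
print(coin.P({"H", "T"}))  # 1 (the whole sample space)
```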


Kolmogorov’s Axioms

Probability is built on three axioms:

  1. Non-negativity: $P(A) \geq 0$ for all $A \in \mathcal{F}$.
  2. Normalization: $P(S) = 1$.
  3. Countable Additivity: If $A_1, A_2, \dots \in \mathcal{F}$ are pairwise disjoint, then
     $$P\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty P(A_i).$$

From these, we can derive continuity of probability measures, the complement rule, and the machinery behind conditional probability.
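For instance, the complement rule is a one-line consequence of axioms 2 and 3, applied to the disjoint pair $A$ and $A^c$:

$$
1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)
\quad\Longrightarrow\quad
P(A^c) = 1 - P(A).
$$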


Examples

  • Coin Tosses: $P(\{H\}) = 0.5$, $P(\{T\}) = 0.5$.
  • Twins Example: Assign probabilities to “identical twins,” “fraternal twins,” and “female twins” using set intersections and unions.
  • Continuous Example: $S = \{x \in \mathbb{R} : x > 0\}$, with $P$ defined by an exponential distribution.
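To make the continuous example concrete, here is a quick numerical check of the survival probability $P(X > t) = e^{-\lambda t}$ for an exponential lifetime (the rate $\lambda = 0.5$ and threshold $t = 3$ are our choices for illustration, not from the post):

```python
import math
from scipy.stats import expon

lam = 0.5   # rate parameter, chosen here for illustration
t = 3.0     # a lifetime threshold

# P(X > t) for X ~ Exponential(lam), two ways:
closed_form = math.exp(-lam * t)          # survival function e^{-lambda * t}
via_scipy = expon.sf(t, scale=1 / lam)    # SciPy parameterizes by scale = 1 / rate

print(closed_form, via_scipy)  # both ≈ 0.2231
```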

Why Machine Learning Cares

  • Bayesian inference: Posterior updates are probability measures over parameter sets.
  • Generalization bounds: Expressed in terms of probability of error over unseen samples.
  • Stochastic optimization: Algorithms like SGD rely on treating gradients as random variables drawn from a probability space.

Whether explicit or implicit, probability is the glue that holds machine learning theory together.
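As a concrete instance of the stochastic optimization point, here is a minimal SGD sketch in which each minibatch gradient is a random variable whose expectation is the full-batch gradient (the least-squares objective and all constants are illustrative choices, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize E[(x . w - y)^2] over w.
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr = 0.05

for _ in range(2000):
    # Each minibatch is a random draw of indices; the resulting gradient is
    # a random variable whose expectation is the full-batch gradient.
    idx = rng.integers(0, len(X), size=32)
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w -= lr * grad

print(np.linalg.norm(w - true_w))  # small residual: the noisy updates average out
```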

Key Takeaways

  • A probability space $(S, \mathcal{F}, P)$ formalizes randomness.
  • Kolmogorov’s three axioms make the theory consistent.
  • Examples range from coin flips to continuous lifetimes.
  • Machine learning applies these foundations at every level.


Further Reading

  • Casella & Berger, Statistical Inference (Chapter 1, probability axioms)
  • Kolmogorov, Foundations of the Theory of Probability