
Set Theory in Practice

Published: 9/15/2025
Authors: Kennon Stewart


Why It Matters to the Lab

Set theory is the language of modern probability and machine learning. It gives us a precise way to describe collections of outcomes and how they interact. By mapping sets to probabilities, we connect abstract math to real-world uncertainty.

The importance here is a bit more abstract than we may be used to. A random variable maps a piece of data (e.g., heads and tails, a collection of streets, a particular neighborhood) to a numeric value that can be more easily interpreted. This process of turning data into something quantitative is powerful, especially when that data comes from a city.
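As a minimal sketch of that idea (the outcome and the mapping here are illustrative, not from any real dataset), a random variable can be modeled as a plain function from outcomes to numbers:

    # A random variable maps raw outcomes to numbers.
    # Here X counts the heads in a sequence of coin tosses.
    def X(outcome):
        return sum(1 for toss in outcome if toss == "H")

    print(X(("H", "T", "H", "H")))  # 3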

We now have a way of describing the state of a city in numbers. When this happens, what was previously mathematics becomes a way of learning from a million human stories represented as data.


What Is a Set?

At its core, a set is just a collection of distinct objects. These objects are called elements or members. We write sets using curly braces:

  • A = \{1, 2, 3\}
  • B = \{\text{cat}, \text{dog}, \text{parrot}\}

If an element x belongs to A, we write x \in A. If not, we write x \notin A.

In statistics, sets often represent sample spaces: the universe of all possible outcomes of an experiment.
Example: tossing a coin four times gives the sample space

S = \{ (H,H,H,H), (H,H,H,T), \dots, (T,T,T,T) \}.

There are 2^4 = 16 possible outcomes.
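A quick way to sanity-check that count is to enumerate the sample space directly; a minimal sketch in Python:

    from itertools import product

    # Enumerate every outcome of tossing a coin four times.
    S = set(product("HT", repeat=4))
    print(len(S))  # 16, i.e. 2**4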


Operations on Sets

Sets become powerful when we define operations on them. Just as we can combine numbers with algebraic operations like addition, subtraction, and multiplication, we can combine sets with operations of their own.

These operations parallel everyday logical reasoning:

  • Union (A \cup B): everything in A or B.
  • Intersection (A \cap B): everything in both A and B.
  • Complement (A^c): everything not in A.
  • Difference (A \setminus B): everything in A but not in B.

Example: If A = “twin females” and B = “identical twins,” then A \cap B is “identical twin females,” and A \setminus B is “fraternal twin females.”
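Python's built-in sets support all four operations directly; a minimal sketch (the elements below are arbitrary):

    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    U = {1, 2, 3, 4, 5, 6}  # a universe, needed to take complements

    print(A | B)  # union: {1, 2, 3, 4, 5}
    print(A & B)  # intersection: {3, 4}
    print(U - A)  # complement of A relative to U: {5, 6}
    print(A - B)  # difference A \ B: {1, 2}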

These simple rules give us a toolkit for describing events in probability theory.


Axioms and Identities

The structure of set theory rests on a few basic identities:

  • A \setminus B = A \cap B^c
  • B = (B \cap A) \cup (B \cap A^c)
  • A \cup B = A \cup (B \cap A^c)

These may look abstract, but they mirror intuitive reasoning: any element of B is either in A or outside A. With practice, these identities become second nature when simplifying probabilities.
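The identities can also be spot-checked mechanically; a minimal sketch that verifies all three on small example sets (the sets chosen here are arbitrary):

    U = set(range(10))  # a small universe
    A = {0, 1, 2, 3}
    B = {2, 3, 4, 5}
    Ac = U - A          # complement of A within U

    assert A - B == A & (U - B)        # A \ B = A ∩ B^c
    assert B == (B & A) | (B & Ac)     # B = (B ∩ A) ∪ (B ∩ A^c)
    assert A | B == A | (B & Ac)       # A ∪ B = A ∪ (B ∩ A^c)
    print("all three identities hold")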


Sets and Probability

The connection between set theory and probability is elegant:

  • The sample space S is the set of all possible outcomes.
  • An event is any subset of S.
  • A probability measure P assigns a number to each set, with 0 \leq P(A) \leq 1.

Key property:
If A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots, then

P\Big(\bigcup_{i=1}^\infty A_i\Big) = \lim_{n \to \infty} P(A_n).

This continuity of probability measures follows directly from the foundational axioms.
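A minimal numerical sketch of this property, using the nested events A_n = {1, ..., n} under a distribution chosen purely for illustration:

    # Assign P({k}) = 2**-k to each positive integer k,
    # and take the nested events A_n = {1, ..., n}.
    def P(event):
        return sum(2.0 ** -k for k in event)

    for n in (1, 5, 10, 20):
        print(n, P(range(1, n + 1)))  # climbs toward P(union of all A_n) = 1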


Why Machine Learning Cares

Set theory isn’t just an intro chapter—it underlies how we structure data:

  • In LLMs, the vocabulary is a set of tokens, and the context window is a sequence drawn from this set.
  • In recommender systems, the set of items a user has interacted with helps define future recommendation sets.
  • In clustering, each cluster is a set of points; in classification, each class label corresponds to a subset of the sample space.

Whenever we reason about categories, overlap, or exclusion, we’re doing set theory.
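As one concrete sketch, the recommender-system bullet above amounts to plain set difference (the item names and histories here are made up):

    # Hypothetical interaction histories for one user.
    seen_by_user = {"item_a", "item_b", "item_c"}
    candidates = {"item_b", "item_c", "item_d", "item_e"}

    # Recommend only items the user has not already interacted with.
    recommendations = candidates - seen_by_user
    print(recommendations)  # {'item_d', 'item_e'}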


Summary

  • Sets define collections of outcomes.
  • Core operations: union, intersection, complement, difference.
  • Events in probability are subsets of the sample space.
  • Probability axioms build directly on set theory.
  • Applications stretch from coin tosses to deep learning models.


Further Reading

  • Casella & Berger, Statistical Inference