
Set Theory in Practice

Published: 9/15/2025
Authors: Kennon Stewart


Why It Matters to the Lab

Set theory is the language of modern probability and machine learning. It gives us a precise way to describe collections of outcomes and how they interact. By mapping sets to probabilities, we connect abstract math to real-world uncertainty.

The importance here is a bit more abstract than we may be used to. A random variable maps a piece of data (e.g., heads and tails, a collection of streets, a particular neighborhood) to a numeric value that can be more easily interpreted. This process of turning data into something quantitative is powerful, especially when that data comes from a city.
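As a minimal sketch of that idea (the outcome and the mapping here are illustrative, not from any real dataset), a random variable can be modeled as a plain function from outcomes to numbers:

    # A random variable maps raw outcomes to numbers.
    # Here X counts the heads in a sequence of coin tosses.
    def X(outcome):
        return sum(1 for toss in outcome if toss == "H")

    print(X(("H", "T", "H", "H")))  # 3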

We now have a way of describing the state of a city in numbers. When this happens, what was previously mathematics becomes a way of learning from a million human stories represented as data.


What Is a Set?

At its core, a set is just a collection of distinct objects. These objects are called elements or members. We write sets using curly braces:

  • A = \{1, 2, 3\}
  • B = \{\text{cat}, \text{dog}, \text{parrot}\}

If an element x belongs to A, we write x \in A. If not, we write x \notin A.

In statistics, sets often represent sample spaces: the universe of all possible outcomes of an experiment.
Example: tossing a coin four times gives the sample space

S = \{ (H,H,H,H), (H,H,H,T), \dots, (T,T,T,T) \}.

There are 2^4 = 16 possible outcomes.
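A quick way to sanity-check that count is to enumerate the sample space directly; a minimal sketch in Python:

    from itertools import product

    # Enumerate every outcome of tossing a coin four times.
    S = set(product("HT", repeat=4))
    print(len(S))  # 16, i.e. 2**4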


Operations on Sets

Sets become powerful when we define operations on them. Just as we can combine numbers with algebraic operations like addition, subtraction, and multiplication, we can combine sets with operations of their own.

These operations parallel everyday logical reasoning:

  • Union (A \cup B): everything in A or B.
  • Intersection (A \cap B): everything in both A and B.
  • Complement (A^c): everything not in A.
  • Difference (A \setminus B): everything in A but not in B.

Example: If A = “twin females” and B = “identical twins,” then A \cap B is “identical twin females,” and A \setminus B is “fraternal twin females.”
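Python's built-in sets support all four operations directly; a minimal sketch (the elements below are arbitrary):

    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    U = {1, 2, 3, 4, 5, 6}  # a universe, needed to take complements

    print(A | B)  # union: {1, 2, 3, 4, 5}
    print(A & B)  # intersection: {3, 4}
    print(U - A)  # complement of A relative to U: {5, 6}
    print(A - B)  # difference A \ B: {1, 2}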

These simple rules give us a toolkit for describing events in probability theory.


Axioms and Identities

The structure of set theory rests on a few basic identities:

  • A \setminus B = A \cap B^c
  • B = (B \cap A) \cup (B \cap A^c)
  • A \cup B = A \cup (B \cap A^c)

These may look abstract, but they mirror intuitive reasoning: any element of B is either in A or outside A. With practice, these identities become second nature when simplifying probabilities.
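The identities can also be spot-checked mechanically; a minimal sketch that verifies all three on small example sets (the sets chosen here are arbitrary):

    U = set(range(10))  # a small universe
    A = {0, 1, 2, 3}
    B = {2, 3, 4, 5}
    Ac = U - A          # complement of A within U

    assert A - B == A & (U - B)        # A \ B = A ∩ B^c
    assert B == (B & A) | (B & Ac)     # B = (B ∩ A) ∪ (B ∩ A^c)
    assert A | B == A | (B & Ac)       # A ∪ B = A ∪ (B ∩ A^c)
    print("all three identities hold")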


Sets and Probability

The connection between set theory and probability is elegant:

  • The sample space S is the set of all possible outcomes.
  • An event is any subset of S.
  • A probability measure P assigns a number to each set, with 0 \leq P(A) \leq 1.

Key property:
If A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots, then

P\Big(\bigcup_{i=1}^\infty A_i\Big) = \lim_{n \to \infty} P(A_n).

This continuity of probability measures follows directly from the foundational axioms.
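A minimal numerical sketch of this property, using the nested events A_n = {1, ..., n} under a distribution chosen purely for illustration:

    # Assign P({k}) = 2**-k to each positive integer k,
    # and take the nested events A_n = {1, ..., n}.
    def P(event):
        return sum(2.0 ** -k for k in event)

    for n in (1, 5, 10, 20):
        print(n, P(range(1, n + 1)))  # climbs toward P(union of all A_n) = 1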


Why Machine Learning Cares

Set theory isn’t just an intro chapter—it underlies how we structure data:

  • In LLMs, the vocabulary is a set of tokens, and the context window is a sequence drawn from this set.
  • In recommender systems, the set of items a user has interacted with helps define future recommendation sets.
  • In clustering, each cluster is a set of points; in classification, each class label corresponds to a subset of the sample space.

Whenever we reason about categories, overlap, or exclusion, we’re doing set theory.
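As one concrete sketch, the recommender-system bullet above amounts to plain set difference (the item names and histories here are made up):

    # Hypothetical interaction histories for one user.
    seen_by_user = {"item_a", "item_b", "item_c"}
    candidates = {"item_b", "item_c", "item_d", "item_e"}

    # Recommend only items the user has not already interacted with.
    recommendations = candidates - seen_by_user
    print(recommendations)  # {'item_d', 'item_e'}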


Summary

  • Sets define collections of outcomes.
  • Core operations: union, intersection, complement, difference.
  • Events in probability are subsets of the sample space.
  • Probability axioms build directly on set theory.
  • Applications stretch from coin tosses to deep learning models.


Further Reading

  • Casella & Berger, Statistical Inference