Combinatorics is a branch of mathematics with applications in many fields, such as physics, economics, computer programming, and many others. In particular, probability theory is one of the fields that makes heavy use of combinatorics in a wide variety of contexts. For example, when calculating probabilities, you often need to know the number of possible orderings or groupings of certain events, outcomes of experiments, or generally any kinds of objects.

Here’s what I’m talking about.

# Introduction

Say you have the following set of two elements: {A, B}. In how many ways can you order them? Well, just 2:

- A, B
- B, A

What if you had 3 elements, like {A, B, C}? Then the number of orderings becomes 6:

- A, B, C
- A, C, B
- B, A, C
- B, C, A
- C, A, B
- C, B, A

When the number of elements is large, it gets more than impractical to find the number of orderings by listing and counting. For example, the number of possible orderings of 4 elements is 24, of 5 elements it’s 120, of 6 elements it’s 720, and of 10 elements it’s… 3 628 800. So, you see that the number grows very fast, even for relatively small numbers of elements. Therefore, it’s very useful to have general formulas. You can easily use them to find the number of orderings of any number of elements.

In this post, I’m going to derive a few formulas that are frequently used in probability theory, but the list is going to be nowhere near comprehensive. Hence, the general goal is mostly to give the intuition behind counting orderings and groupings of elements from one or more sets.

Combinatorics is a big field (not all of which is necessarily relevant to calculating probabilities), but once you have the intuition, understanding the derivations of other formulas is going to get much easier.

# Using combinatorics to calculate probabilities

In my post about sample spaces, I showed how the probability of an event can be thought of as the fraction of the sample space the event “occupies”. This way of thinking about probabilities is especially useful and intuitive when the sample space consists of a finite number of discrete outcomes. Then you can directly apply the classical definition of probability and calculate the probability of an event as the number of outcomes favorable to the event, divided by the total number of possible outcomes:

The idea is to reduce the sample space to equally likely outcomes and count those that satisfy the event. For example, there are 6 possible outcomes of rolling a die, 3 of which (1, 3, and 5) are odd numbers. Therefore, the probability of rolling an odd number is:

- P(Odd number) = 3/6 = 0.5

## Application: number of shuffles

Say you want to calculate something slightly more complicated. You have a random shuffle of the numbers 3, 4, 5, 8, and 13 and you want to calculate the probability that the number 13 is going to be in the first position. In this case the shuffles that satisfy your requirement are:

- 13, 3, 4, 5, 8
- 13, 5, 8, 4, 3
- 13, 8, 4, 3, 5
- Etc.

So, the probability is:

The number of possible shuffles is the number of all possible orderings, both those that have 13 at the first position and those that don’t:

- 3, 4, 5, 8, 13
- 13, 8, 3, 4, 5
- 5, 13, 8, 4, 3
- Etc.

As I said earlier, listing and counting all shuffles is impractical and often impossible (for example, when their number becomes large enough to require billions of years of computation) so there should be a more direct way of solving these kinds of combinatorics problems.

# The rule of product

Imagine you have two boxes and one of them holds letters while the other holds numbers. Currently, the first box has {E, L} and the second has {2, 5}.

So, how many pairs can you create between the objects from the two boxes, such that each pair has an object from the first box in the first position and an object from the second box in the second position? The answer is 4:

- (E, 2)
- (E, 5)
- (L, 2)
- (L, 5)

Here’s a graphical illustration of the pairing process:

## Adding more objects

What if you added the number 14 as a third object to the second box? Now the box contains {2, 5, 14} and the new number of pairs becomes 6:

- (E, 2)
- (E, 5)
- (E, 14)
- (L, 2)
- (L, 5)
- (L, 14)

You see the pattern, right? Basically, the number of pairs is equal to the number of objects in the first box (call that number M) times the number of objects in the second box (call it N). That’s because M objects from the first box can occupy the first position in the pair. Each object can be combined with N objects from the second box. So, the general rule for the number of possible pairs from two boxes is M * N.

## Adding more boxes

What if you added a third box containing the colors {Red, Blue} and wanted to create triplets from the three boxes? Now you have 3 positions to fill with the objects of 3 boxes but the problem remains essentially the same. Here are all the possibilities:

- (E, 2, Red)
- (E, 2, Blue)
- (E, 5, Red)
- (E, 5, Blue)
- (E, 14, Red)
- (E, 14, Blue)
- (L, 2, Red)
- (L, 2, Blue)
- (L, 5, Red)
- (L, 5, Blue)
- (L, 14, Red)
- (L, 14, Blue)

So, the new number of triplets is 12 and is the product of the number of objects in each box: 2 * 3 * 2 = 12. That’s because you’re combining each pair created from the first two boxes (of which there are 6) with each object of the third box (2). This rule can be extended to any number of boxes. In combinatorics, it’s known as the **rule of product**.

In the next section, I’m going to show how some basic problems in combinatorics can be solved by reducing them to “boxes” containing “objects” and applying the rule of product.

# Permutations, combinations, and the power set

## Permutations

I already defined the concept of a **permutation** in the first sections of this post, but without actually using that term. This is one of the most basic and frequently encountered concepts in (enumerative) combinatorics. A permutation of a set of items is just a particular ordering of them.

So, let’s go back to the problem I described in the beginning of this post (the probability of a random shuffle starting with 13) and see how we can use permutations to solve it.

We want the probability of getting a permutation of the numbers {3, 4, 5, 8, and 13} that starts with the number 13. To get an idea of a random permutation, take a look at this animated image (click on it to start the animation):

Click on the image to start/restart the animation.

As you can see, the first position of the permutation can be occupied by all 5 numbers. Then, once there is a number in the first position, there are only 4 available numbers for the second position, 3 for the third, 2 for the fourth, and only 1 for the last position. If you think about it, this is exactly like combining objects from five boxes where the first box has 5 objects and every other box has one fewer object than the box to its left. Therefore, all possible permutations can be calculated using the rule of product:

- 5 * 4 * 3 * 2 * 1 = 120

### The general pattern

This rule holds when counting the permutations of any number of elements. There’s a shortcut notation for multiplying all positive integers from 1 to a certain number N: **N! **(the exclamation mark is part of the notation). This is the factorial function you probably already know about:

- N! = N * (N-1) * (N-2) * … * 1

For example:

- 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1

This leads to the general formula:

With this formula, we can now easily solve the problem. Remember, the probability is equal to the number of permutations that start with 13, divided by the total number of permutations (which we already know is equal to 5!). The permutations starting with 13 can be counted in an identical way. This time, the first slot is only allowed to be filled by a single number (13). The remaining slots can still be occupied by 4, 3, 2, and 1 numbers:

- Number of permutations starting with 13 = 1 * 4 * 3 * 2 * 1 = 4!

The final answer to the problem is:

- P(“First Place 13”) = 4!/5! = 24/120 = 1/5

### Partial permutations

Remember, permutations are essentially the possible orderings of N numbers into N slots. There is a generalization of this concept which counts all possible orderings of N numbers into K slots. Here, K is some integer between 1 and N.

This generalization is sometimes called **partial permutations** and is also known as the **K-permutations of N**. However, calculating the number of partial permutations is as easy as calculating permutations.

For example, if you wanted to get the possible orderings of 5 numbers in 2 slots, you would have 5 possible numbers for the first slot and 4 possible numbers for the second:

Therefore, the total number of 2-permutations of 5 is:

- 5 * 4 = 20.

Similarly, the number of 3-permutations of 5 is equal to 5 * 4 * 3 = 60. In general, the number of K-permutations of N is given by the formula:

- K-permutations of N = N * (N-1) * (N-2) * … * (N-K+1)

In other words, the K-permutations of N is equal to the product of K terms, the first being N and each of the remaining terms being the previous term minus 1. The formula that achieves this is:

For example, the number of 2-permutations of 5 would be given by:

- 5!/3! = 5 * 4 * 3 * 2 * 1 / 3 * 2 * 1 = 5 * 4 = 20

Basically, the idea is for the denominator to cancel exactly those terms from the numerator that are not needed. This is achieved by (N-K)!.

## Combinations

Say you’re listing the 2-permutations of the numbers {1, 2, 3, 4}:

- (1, 2)
**(2, 1)**- (1, 3)
**(3, 1)**- (1, 4)
**(4, 1)**- (2, 3)
**(3, 2)**- (2, 4)
**(4, 2)**- (3, 4)
**(4, 3)**

Notice that every other partial permutation (the ones in bold) has the same elements as the one before it, but in a different order. In this case, since there are only two available slots, the possible orders of each pair of numbers is only 2. But if you’re counting the 3-permutations, or higher, of N numbers, there are going to be more than 2 different orderings of the same elements.

For example, among the 3-permutations of the numbers {1, 2, 3, 4, 5}, you would have 6 partial permutations for each set of 3 numbers. Consider the partial permutations consisting of the subset {2, 3, 4}:

- 2, 3, 4
- 2, 4, 3
- 3, 2, 4
- 3, 4, 2
- 4, 2, 3
- 4, 3, 2

So, say you only want to count the partial permutations consisting of unique numbers. In that case, you would divide the number of K-permutations of N by the number of permutations consisting of the same elements. How many is that? You probably already guessed that it’s K!, because you’re basically counting the number of orderings (permutations) of K numbers.

### The binomial coefficient

In combinatorics, the term used for a permutation consisting of unique numbers is a **combination**. Getting the K-combinations of N items is as easy as dividing the K-permutations of N items by K!. So, the formula, commonly referred to as the **binomial coefficient**, is:

The notation on the left-hand side is very common for representing the binomial coefficient and is read as “N choose K”. The need to calculate the binomial coefficient arises in many problems in probability theory, such as calculating the probability of flipping heads K times out of N coin flips.

## The power set

The final concept I want to talk about in this post is the **power set** of a set of items. The power set of a set consists of all subsets of that set. For example, the power set of {1, 2, 3} consists of:

- {} (the empty set)
- {1}
- {2}
- {3}
- {1, 2}
- {1, 3}
- {2, 3}
- {1, 2, 3} (the set itself)

Given any set, how many elements does its power set have? In the example above it was 8. You can always find this number by listing and counting all subsets, but that’s impractical for the same reasons it’s impractical to list and count permutations. Is there any way to reduce this problem to some form where you can apply the rule of product? Actually, there is.

### Constructing elements of the power set

Imagine you construct an element of the power set of a set X by passing through each element of X and making a binary yes/no decision for whether to include it or not. Here’s an animated visualization of the process of constructing an element of the power set of {A, B, C, D} (click on the image to start the animation):

Click on the image to start/restart the animation.

The particular subset obtained this time was {A, B, D}, but you can obtain any subset with this procedure. For example, the empty set is obtained by choosing “no” for each of the 4 binary decisions.

### Counting the elements

So, how many possible subsets are there? Intuitively, you can think of the selection process as a series of 4 “boxes”, each of which has exactly 2 objects: “yes” and “no”. In that case, how many quadruplets can you create out of those 4 boxes? Naturally, this is answered by applying the rule of product:

- 2 * 2 * 2 * 2 = 16

In general, if the number of elements of your set is equal to K, there will be K “boxes”, each with two “objects” (“yes” and “no”). Therefore, in general, the number of elements of the power set of a set with K elements is given by multiplying the number 2 by itself K times:

The vertical bar notation represents the number of elements of a set. Hence, you can read the formula as:

- “The number of elements of the power set of a set with K elements is equal to 2
^{K}.”

By the way, the P(S) notation represents the power set of the set S. Therefore, you shouldn’t confuse it with the notation for the probability of an event. It’s slightly confusing, but there’s only so many letters in the alphabet.

### Relevance to probability theory

So, how can this be useful in probability theory? Actually, in more than one ways, but I’m going to mention one of them here.

In my earlier post about compound events, I showed that the probability of the union of several events, like P(A ∪ B ∪ C…), is a sum consisting of positive and negative terms (the formula is actually somewhat long, so check out the post I linked to if you’re not familiar with calculating probabilities of unions). The total number of such terms is the number of elements of the power set of {A, B, C…}. This number grows exponentially with the number of elements (K). So, calculating the probability of the union of the events can quickly become computationally intractable, even for a relatively small number of events (like K=50). Therefore, knowing this can save you some time when planning the implementation of a probabilistic model.

# Summary

In this post, I introduced the following concepts (along with their mathematical formulas):

- The rule of product
- Permutations and partial permutations
- Combinations and the binomial coefficient
- The power set

In particular, I showed an intuitive way of thinking about the concepts by reducing them to a form (“boxes” containing “objects”) in which they can be solved with the so-called product rule. These concepts don’t necessarily have their roots in probability theory, but you will frequently rely on them (along with other ideas from combinatorics) when solving a wide variety of probabilistic problems.

Bob says

That was an amazing explanation. You’re very good at distilling the essence of the subject and laying it out in a easy to follow incremental fashion. Nice work.

The Cthaeh says

Thank you, Bob!

Faisal says

Loved the post! Can’t wait to read more. Your visualisations are top notch. Btw just a small spelling mistake…it should be “whether”, not “weather”.

I always thought the hardest thing about probability is the lack of intuition and how it can be counter-intuitive. But your posts prove otherwise!

The Cthaeh says

Thank you, Faisal! I’m happy you find my posts are helpful.

Oh, yeah, thanks for pointing out that mistake. Corrected!