Combinatorics is a branch of mathematics with applications in fields like physics, economics, computer programming, and many others. In particular, probability theory is one of the fields that makes heavy use of combinatorics in a wide variety of contexts.

For example, when calculating probabilities, you often need to know the number of possible orderings or groupings of events, outcomes of experiments, or generally any kind of objects.

Here’s what I’m talking about.

Table of Contents

## Introduction to combinatorics

Say you have the following set of two elements: {A, B}.

In how many ways can you order them? Well, just 2:

- A, B
- B, A

Then, what if you had 3 elements, like {A, B, C}? In this case, the number of orderings becomes 6:

- A, B, C
- A, C, B
- B, A, C
- B, C, A
- C, A, B
- C, B, A

When the number of elements is large, it gets more than impractical to find the number of orderings by listing and counting. For example, the number of possible orderings of 4 elements is 24. For 5 elements it is 120, for 6 elements it is 720, and for 10 elements it is… 3 628 800.

So, you see that the number grows very fast, even for relatively small numbers of elements. Therefore, it’s very useful to have general formulas for finding the number of orderings. Then you can easily apply them for any number of elements.

In this post I’m going to derive a few combinatorics formulas that are frequently used in probability theory but the list is going to be nowhere near comprehensive. Hence, the general goal is mostly to give the intuition behind counting orderings and groupings of elements from one or more sets.

Combinatorics is a big field (not all of which is necessarily relevant to calculating probabilities), but once you have the intuition, understanding the derivations of other formulas is going to be much easier.

## Using combinatorics to calculate probabilities

In my post about sample spaces, I showed how you can think about the probability of an event as the fraction of the sample space the event “occupies”. This way of thinking about probabilities is especially useful and intuitive when the sample space consists of a finite number of discrete outcomes. Then you can directly apply the classical definition of probability and calculate the probability of an event as the number of outcomes favorable to the event divided by the total number of possible outcomes:

In other words, the idea is to reduce the sample space into equally likely outcomes and count those that satisfy a particular event. For example, there are 6 possible outcomes of rolling a die, 3 of which (1, 3, and 5) are odd numbers. Therefore, the probability of rolling an odd number is:

### Application: number of shuffles

Say you want to calculate something slightly more complicated. You have a random shuffle of the numbers 3, 4, 5, 8, and 13 and you want to know the probability that the number 13 is going to be in the first position. In this case, the shuffles that satisfy your requirement are:

- 13, 3, 4, 5, 8
- 13, 5, 8, 4, 3
- 13, 8, 4, 3, 5
- Etc.

So, the probability is:

The number of possible shuffles is the number of all possible orderings (both those that have 13 at the first position and those that don’t):

- 3, 4, 5, 8, 13
- 13, 8, 3, 4, 5
- 5, 13, 8, 4, 3
- Etc.

As I said earlier, listing and counting all shuffles is impractical and often impossible (for example, when their number becomes large enough to require billions of years of computation) so there should be a more direct way of solving these kinds of combinatorics problems.

Let me give some basic theoretical background knowledge and show you how to solve this problem using combinatorics.

## The rule of product

Now imagine you have two boxes and one of them holds letters and the other holds numbers. Currently, the first box has {E, L} and the second has {2, 5}.

So, how many pairs can you create between the objects from the two boxes, such that each pair has an object from the first box in the first position and an object from the second box in the second position? The answer is 4:

- (E, 2)
- (E, 5)
- (L, 2)
- (L, 5)

Here’s a graphical illustration of the pairing process:

### Adding more objects to a box

What if you added the number 14 as a third object to the second box? Now the box contains {2, 5, 14} and the new number of pairs becomes 6:

- (E, 2)
- (E, 5)
- (E, 14)
- (L, 2)
- (L, 5)
- (L, 14)

You see the pattern, right? Basically, the number of pairs is equal to the number of objects in the first box (call that number M) times the number of objects in the second box (call it N). That’s because M objects from the first box can occupy the first position in the pair. Then, each object can be combined with N objects from the second box. So, the general rule for the number of possible pairs from two boxes is .

### Adding more boxes

What if you added a third box containing the colors {Red, Blue} and wanted to create triplets from the three boxes? Now you have 3 positions to fill with the objects of 3 boxes but the problem remains essentially the same. Here are all the possibilities:

- (E, 2, Red)
- (E, 2, Blue)
- (E, 5, Red)
- (E, 5, Blue)
- (E, 14, Red)
- (E, 14, Blue)
- (L, 2, Red)
- (L, 2, Blue)
- (L, 5, Red)
- (L, 5, Blue)
- (L, 14, Red)
- (L, 14, Blue)

So, the new number of triplets is 12 and is the product of the number of objects in each box: . That’s because you’re combining each pair created from the first two boxes (of which there are 6) with each object of the third box (2). Similarly, this rule can be extended to any number of boxes. In combinatorics, it’s known as the **rule of product**.

In the next section, I’m going to show how you can solve basic problems in combinatorics by reducing them to “boxes” containing “objects” and applying the rule of product.

## Formulas based on the rule of product

You see the rule of product is very simple. But it’s also very powerful. Now I want to show you how you can use it to derive some fundamental formulas in combinatorics.

### Permutations

I already defined the concept of a **permutation** in the first sections of this post but without actually using that term. In fact, this is one of the most basic and frequently encountered concepts in (enumerative) combinatorics. A permutation of a set of items is just a particular ordering of them.

So, let’s go back to the problem I described in the beginning (the probability of a random shuffle starting with 13) and see how we can use permutations to solve it.

We want the probability of getting a permutation of the numbers {3, 4, 5, 8, and 13} that starts with the number 13. To get an idea of a random permutation, take a look at this animated image (click on it to start the animation):

Click on the image to start/restart the animation.

As you can see, the first position of the permutation can be occupied by all 5 numbers. Then, once there is a number in the first position, there are only 4 remaining numbers for the second position, 3 for the third, 2 for the fourth, and only 1 for the last position.

If you think about it, this is exactly like combining objects from five boxes where the first box has 5 objects and every other box has one fewer object than the box to its left. Therefore, all possible permutations can be calculated using the rule of product:

#### The general pattern

Conveniently, this rule holds when counting the permutations of any number of elements. There’s a shortcut notation for multiplying all positive integers from 1 to a certain number N: **N! **(the exclamation mark is part of the notation). This is the factorial function you probably already know about:

For example:

Then, this leads to the general formula:

With this formula, we can now easily solve the problem above.

Remember, the probability is equal to the number of permutations that start with 13 divided by the total number of permutations (which we already know is equal to 5!). The permutations starting with 13 can be counted in an identical way. This time, the first slot is only allowed to be filled by a single number (13). The remaining slots can still have 4, 3, 2, and 1 numbers:

The final answer to the problem is:

#### Partial permutations

Remember, permutations are essentially the possible orderings of N numbers into N slots. There is a generalization of this concept which counts all possible orderings of N numbers into K slots. Here, K is some integer between 1 and N.

This generalization is sometimes called **partial permutations** and is also known as the **K-permutations of N**. However, calculating the number of partial permutations is as easy as calculating permutations.

For example, if you wanted to get the possible orderings of 5 numbers into 2 slots, you would have 5 possible numbers for the first slot and 4 possible numbers for the second:

Therefore, the total number of 2-permutations of 5 is:

Similarly, the number of 3-permutations of 5 is equal to . In general, the number of K-permutations of N is given by the formula:

In other words, the K-permutations of N is equal to the product of K terms, the first being N and each of the remaining terms being the previous term minus 1. The formula that achieves this is:

For example, the number of 2-permutations of 5 is:

Basically, the idea is for the denominator to cancel exactly those terms from the numerator that are not needed. This is achieved by (N-K)!.

### Combinations and the binomial coefficient

Let’s list all 2-permutations of the numbers {1, 2, 3, 4}:

- (1, 2)
**(2, 1)**- (1, 3)
**(3, 1)**- (1, 4)
**(4, 1)**- (2, 3)
**(3, 2)**- (2, 4)
**(4, 2)**- (3, 4)
**(4, 3)**

Notice that every other permutation (the ones in bold) has the same elements as the one before it, but in a different order. In this case, since there are only two available slots, the possible orders of each pair of numbers is only 2. But if you’re counting the 3-permutations (or higher) of N numbers, there’s going to be more than 2 different orderings of the same elements.

For example, among the 3-permutations of the numbers {1, 2, 3, 4, 5}, you would have 6 partial permutations for each set of 3 numbers. Consider the partial permutations consisting of the subset {2, 3, 4}:

- 2, 3, 4
- 2, 4, 3
- 3, 2, 4
- 3, 4, 2
- 4, 2, 3
- 4, 3, 2

Say you only want to count the partial permutations consisting of unique combinations of numbers. In that case, you would divide the number of K-permutations of N by the number of permutations consisting of the same elements. How many is that?

You probably already guessed that it’s K!, because you’re basically counting the number of orderings (permutations) of K numbers.

#### The binomial coefficient

In combinatorics, the term used for a permutation consisting of unique numbers is a **combination**. And getting the K-combinations of N items is as easy as dividing the K-permutations of N items by K!. So, the formula, commonly referred to as the **binomial coefficient**, is:

The notation on the left-hand side is very common for representing the binomial coefficient and is read as “N choose K“. Intuitively speaking, this formula counts the number of ways in which you can pick exactly K out of N items, regardless of their order. Alternatively, it counts the number of ways in which you can partition N items into 2 groups — one of K items and the other of N-K (the remaining) items. Notice the symmetry in the formula.

From this symmetry it follows that , , , and so on.

Therefore, a slightly more explicit (but rarely used) notation for the binomial coefficient is:

The need to calculate the binomial coefficient arises in many problems in combinatorics and probability theory, such as calculating the probability of flipping heads K times out of N coin flips.

### The multinomial coefficient

There is a natural generalization of the binomial coefficient for the case of partitioning N items into more than 2 groups. This generalization is called the **multinomial coefficient**.

For example, what if you wanted to count the number of ways in which you can split 10 items into 3 groups of size 2, 3, and 5 (disregarding the order of items within each group)? The notation in this case looks like this:

Or, more generally, if you want partition N items into M groups of different size:

Here are the number of elements in each group with the constraint that .

Finally, the formula for the multinomial coefficient is:

The intuition here is identical to the one for the binomial coefficient. Because we don’t care about the order of items in each group, we need to divide the total number of permutations of N items by the number of ways of ordering the elements in each group.

And if you wanted to be really pedantic, you can write as:

But that’s not too important and would make the formula too hairy. The important thing is to understand the intuition behind the formula.

### The power set

The final concept I want to talk about is the **power set** of a set of items. It’s not directly related to combinatorics (or probability theory for that matter), however counting the elements of a power set is a useful combinatorics exercise.

The power set of a set consists of all subsets of that set. For example, the power set of {1, 2, 3} consists of:

- {} (the empty set)
- {1}
- {2}
- {3}
- {1, 2}
- {1, 3}
- {2, 3}
- {1, 2, 3} (the set itself)

Then, given any set, how many elements does its power set have? In the example above it was 8.

You can always find this number by listing and counting all subsets, but that’s impractical for the same reasons it’s impractical to list and count permutations. Then, is there any way to reduce this problem to some form where you can apply the rule of product? Actually, there is.

#### Constructing elements of the power set

Imagine you construct an element of the power set of a set **S** by passing through each element of S and making a binary yes/no decision for whether to include it or not. Here’s an animated visualization of the process of constructing an element of the power set of {A, B, C, D} (click on the image to start the animation):

Click on the image to start/restart the animation.

The particular subset obtained this time was {A, B, D}, but you can obtain any subset with this procedure. For example, you can get the empty set by choosing “no” for each of the 4 binary decisions.

#### Counting the elements

So, how many possible subsets are there? Intuitively, you can think of the selection process as a series of 4 “boxes”, each of which has exactly 2 objects: “yes” and “no”. In that case, how many quadruplets can you create out of those 4 boxes? Naturally, this is answered by applying the rule of product:

In general, if the number of elements of your set is K, there will be K “boxes”, each with two “objects” (“yes” and “no”). Therefore, in general the number of elements of the power set of a set with K elements is given by multiplying the number 2 by itself K times:

The vertical bar notation represents the number of elements of a set. Hence, you can read the formula as:

- “The number of elements of the power set of a set with K elements is equal to 2
^{K}.”

By the way, the P(S) notation represents the power set of S. You shouldn’t mistake it with the notation for the probability of an event. It’s slightly confusing, but there’s only so many letters in the alphabet to use as notation.

#### Relevance to probability theory

So, how is all this useful in probability theory? Actually, in more than one ways, but I’m going to mention one of them here.

In my earlier post about compound events, I showed that the probability of the union of several events, like P(A ∪ B ∪ C…), is a sum consisting of positive and negative terms (the formula is actually somewhat long, so check out the post if you’re not familiar with calculating probabilities of unions).

The total number of such terms is the number of elements of the power set of {A, B, C…}. This number grows exponentially with the number of elements (K). Then, calculating the probability of the union of the events can quickly become computationally intractable, even for a relatively small number of events (like K=50). Therefore, knowing this can save you time when planning the implementation of a probabilistic model. For an example of where this is relevant, check out my post on the birthday problem.

## Summary

To sum up, in this post I introduced the following combinatorics concepts, along with their mathematical formulas:

- The rule of product
- Permutations and partial permutations
- Combinations and the binomial coefficient
- The power set

In particular, I showed an intuitive way of thinking about the concepts by reducing them to a form in which they can be solved with the rule of product (“boxes” containing “objects”).

These concepts don’t necessarily have their roots in probability theory but you will frequently rely on them when solving a wide variety of probabilistic problems.

If you want to see a really cool application of combinatorics and the concepts I covered in this post, check out my series Cryptography: Historical Intro & Combinatoric Analysis.

Bob says

That was an amazing explanation. You’re very good at distilling the essence of the subject and laying it out in a easy to follow incremental fashion. Nice work.

The Cthaeh says

Thank you, Bob!

Faisal says

Loved the post! Can’t wait to read more. Your visualisations are top notch. Btw just a small spelling mistake…it should be “whether”, not “weather”.

I always thought the hardest thing about probability is the lack of intuition and how it can be counter-intuitive. But your posts prove otherwise!

The Cthaeh says

Thank you, Faisal! I’m happy you find my posts helpful.

Oh, yeah, thanks for pointing out that mistake. Corrected!

Saleem says

Good Job, it is really explicit Explanation