This is a bonus post for my main post on the binomial distribution. Here I want to give a formal proof for the binomial distribution mean and variance formulas I previously showed you.
This post is part of my series on discrete probability distributions.
In the main post, I told you that these formulas are:
$$\mathbb{E}[X] = np \qquad \mathrm{Var}(X) = np(1 - p)$$
I gave you an intuitive derivation for both. The intuition was related to the properties of the sum of independent random variables. Namely, the mean and variance of such a sum are equal to the sum of the means and variances of the individual random variables that form it. We could prove this statement itself too, but I don’t want to do that here and I’ll leave it for a future post.
Instead, I want to take the general formulas for the mean and variance of discrete probability distributions and derive the specific binomial distribution mean and variance formulas from the binomial probability mass function (PMF):
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
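To make the PMF concrete, here’s a minimal Python sketch that evaluates it and checks that the probabilities for k from 0 to n add up to 1. The function name `binomial_pmf` and the example values n = 10 and p = 0.3 are my own assumptions, chosen purely for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example parameters (not from the post).
n, p = 10, 0.3

# Summing the PMF over every possible outcome should give 1,
# up to floating-point rounding.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)
```

Floating-point arithmetic means the printed total may differ from 1 in the last few decimal places.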
To do that, I’m first going to derive a few auxiliary arithmetic properties and equations. We’re going to use those as pieces of the main proofs. But their usefulness is much bigger and you can apply them for many other derivations.
I think this post will be a great exercise for those of you who don’t have much experience in formal derivations of mathematical formulas. I’m going to be as explicit as I can and try to not skip even the smallest steps. So, don’t be scared by the quantity of equations in this post. I promise, you’ll be able to follow everything!
And if you happen to get stuck somewhere, I’m going to answer all of your questions in the comments. Don’t hesitate to ask me anything.
Auxiliary properties and equations
To make it easy to refer to them later, I’m going to label the important properties and equations with numbers, starting from 1. These identities are all we need to prove the binomial distribution mean and variance formulas. The derivations I’m going to show you also generally rely on arithmetic properties and, if you’re not too experienced with those, you might benefit from going over my post breaking down the main ones.
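The first two equations are two important identities involving the sum operator which I proved in my recent post on the topic. The first lets us take a constant factor c out of a sum and the second lets us split the sum of a sum into two sums:
$$\sum_{i=1}^{n} c \cdot x_i = c \cdot \sum_{i=1}^{n} x_i \qquad (1)$$
$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i \qquad (2)$$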
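If you’d like to see the two sum properties in action (pulling a constant factor out of a sum, and splitting a sum of sums into two sums), here’s a small sketch. The lists `xs` and `ys` and the constant `c` are arbitrary values I made up, and I’m using exact fractions so that rounding can’t blur the comparison.

```python
from fractions import Fraction

# Arbitrary illustrative values (assumptions, not from the post).
xs = [Fraction(1, 2), Fraction(3), Fraction(-2, 5)]
ys = [Fraction(4), Fraction(1, 3), Fraction(7, 2)]
c = Fraction(5, 3)

# Property (1): a constant factor can be taken out of the sum.
constant_factor_holds = sum(c * x for x in xs) == c * sum(xs)

# Property (2): the sum of (x_i + y_i) splits into two separate sums.
split_holds = sum(x + y for x, y in zip(xs, ys)) == sum(xs) + sum(ys)

print(constant_factor_holds, split_holds)
```

A check like this is not a proof, of course. It only confirms the identities for one particular set of numbers.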
Next, in another recent post on different variance formulas I showed you the following alternative variance formula for a random variable X with mean M:
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - M^2 \qquad (3)$$
It’s a pretty nice formula used in many derivations, not just the ones I’m about to show you (for more intuition, check out the link above). According to this formula, the variance can also be expressed as the expected value of $X^2$ minus the square of the mean. As a reminder (and for comparison), here’s the main variance formula:
$$\mathrm{Var}(X) = \mathbb{E}[(X - M)^2]$$
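Here’s a minimal sketch comparing the two variance formulas on a toy distribution. The distribution in `toy_pmf` and the helper `expectation` are hypothetical names and values I picked for illustration. Both formulas should produce the same number.

```python
from fractions import Fraction

# Hypothetical toy distribution: value -> probability (sums to 1).
toy_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}


def expectation(f, pmf):
    """Expected value of f(X) for a discrete distribution given as a dict."""
    return sum(f(x) * prob for x, prob in pmf.items())


mean = expectation(lambda x: x, toy_pmf)

# Main formula: expected squared deviation from the mean.
var_main = expectation(lambda x: (x - mean) ** 2, toy_pmf)

# Alternative formula: expected value of X^2 minus the squared mean.
var_alt = expectation(lambda x: x**2, toy_pmf) - mean**2

print(var_main, var_alt)  # both print 19/16
```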
A property of the binomial coefficient
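Finally, I want to show you a simple property of the binomial coefficient which we’re going to use in proving both formulas. Remember the binomial coefficient formula:
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
The first useful result I want to derive is for the expression $\binom{n-1}{k-1}$. Let’s apply the formula to this expression and simplify:
$$\binom{n-1}{k-1} = \frac{(n-1)!}{(k-1)!\,\big((n-1) - (k-1)\big)!} = \frac{(n-1)!}{(k-1)!\,(n-k)!}$$
Now let’s do something else. One of the simplest properties of the factorial function is:
$$n! = n \cdot (n-1)!$$
I want to use this to derive the main property of binomial coefficients we’re interested in here. First, let’s use it to rewrite the right-hand side of the binomial coefficient formula in the following way:
$$\binom{n}{k} = \frac{n \cdot (n-1)!}{k \cdot (k-1)!\,(n-k)!}$$
And using the commutative property of multiplication ($a \cdot b = b \cdot a$), we can rewrite the right-hand side as:
$$\binom{n}{k} = \frac{n}{k} \cdot \frac{(n-1)!}{(k-1)!\,(n-k)!}$$
Now let’s say we start with another expression: $k \cdot \binom{n}{k}$. Using the result above, we can equate it to:
$$k \cdot \binom{n}{k} = k \cdot \frac{n}{k} \cdot \frac{(n-1)!}{(k-1)!\,(n-k)!} = n \cdot \frac{(n-1)!}{(k-1)!\,(n-k)!}$$
In the last step, I simply canceled out the two k’s. Finally, using $\binom{n-1}{k-1} = \frac{(n-1)!}{(k-1)!\,(n-k)!}$ (derived above), we get the following identity, which holds for k from 1 to n:
$$k \cdot \binom{n}{k} = n \cdot \binom{n-1}{k-1} \qquad (4)$$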
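The binomial coefficient identity from equation (4) is easy to spot-check with Python’s built-in `math.comb`. The helper name `identity_holds` and the range of n I test are my own choices for this sketch.

```python
from math import comb


def identity_holds(n: int, k: int) -> bool:
    """Check k * C(n, k) == n * C(n - 1, k - 1) for one pair (n, k)."""
    return k * comb(n, k) == n * comb(n - 1, k - 1)


# Try every k from 1 to n for a range of small n (an arbitrary cutoff).
all_hold = all(
    identity_holds(n, k)
    for n in range(1, 30)
    for k in range(1, n + 1)
)
print(all_hold)  # True
```

Because `math.comb` works with exact integers, there is no rounding involved in this comparison.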
And with all that out of the way, let’s get to the main proofs of today’s post!
Mean of binomial distributions derivation
Well, here we reach the main point of this post! Let’s use these equations and properties to derive the formulas we’re interested in.
First is the mean. Here’s the general formula again:
$$\mathbb{E}[X] = \sum_{x} x \cdot P(x)$$
Let’s plug the binomial distribution PMF into this formula. To be consistent with the binomial distribution notation, I’m going to use k for the argument (instead of x) and the index for the sum will naturally range from 0 to n. So, with $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ in mind, we have:
$$\mathbb{E}[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
But notice that when k = 0 the first term of the sum is zero and doesn’t contribute to the overall sum. Therefore, we can also write the formula by having the index start from k = 1:
$$\mathbb{E}[X] = \sum_{k=1}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
Now let’s see how we can manipulate the right-hand side to get the desired $np$. Let’s do the proof step by step.
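The first step in the derivation is to apply the binomial property from equation (4) to the right-hand side:
$$\begin{aligned}
\mathbb{E}[X] &= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k (1-p)^{n-k} \\
&= n \sum_{k=1}^{n} \binom{n-1}{k-1} p^k (1-p)^{n-k}
\end{aligned}$$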
In the second line, I simply used equation (1) to get n out of the sum operator (because it doesn’t depend on k).
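Next, we’re going to use the product rule of exponents:
$$x^a \cdot x^b = x^{a+b}$$
A special case of this rule is:
$$x^a = x \cdot x^{a-1}$$
And we’re going to use it to rewrite $p^k$ as $p \cdot p^{k-1}$:
$$\begin{aligned}
\mathbb{E}[X] &= n \sum_{k=1}^{n} \binom{n-1}{k-1}\, p \cdot p^{k-1} (1-p)^{n-k} \\
&= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k}
\end{aligned}$$
Again, in the last line I simply took the constant term p out of the sum operator. Finally, let’s apply the identity $n - k = (n-1) - (k-1)$ to the exponent of (1 – p) (you’ll see why we do this in a moment):
$$\mathbb{E}[X] = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}$$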
The final proof
Believe it or not, we’re almost done here. Notice that, after the last manipulation, there are a lot of terms like n – 1 and k – 1 inside the sum operator. To make the expression a little more readable, let’s rewrite it by applying the following variable substitutions:
$$m = n - 1 \qquad j = k - 1$$
This results in:
$$\mathbb{E}[X] = np \sum_{j=0}^{m} \binom{m}{j} p^j (1-p)^{m-j}$$
Here j starts from 0 because j = k – 1 and the k index started from 1 before the variable substitution. And because the number of terms in the sum must be preserved, the index runs until n – 1 = m.
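The change of index from k to j = k – 1 (with m = n – 1) only relabels the terms of the sum, so the value shouldn’t change. Here’s a minimal sketch that evaluates the sum both ways. The values n = 7 and p = 2/5 are hypothetical, and I’m using exact fractions so the two results can be compared with `==`.

```python
from fractions import Fraction
from math import comb

# Hypothetical example values (not from the post).
n, p = 7, Fraction(2, 5)
m = n - 1

# The sum written with the original index k, running from 1 to n.
sum_over_k = sum(
    comb(n - 1, k - 1) * p ** (k - 1) * (1 - p) ** ((n - 1) - (k - 1))
    for k in range(1, n + 1)
)

# The same sum after substituting j = k - 1, running from 0 to m.
sum_over_j = sum(
    comb(m, j) * p**j * (1 - p) ** (m - j)
    for j in range(m + 1)
)

print(sum_over_k == sum_over_j, sum_over_j)  # True 1
```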
Now, do you recognize the term inside the sum operator?
$$\binom{m}{j} p^j (1-p)^{m-j}$$
It looks exactly like the binomial PMF, doesn’t it? Only k has been replaced with j and n with m. And since the sum is from 0 to m, this is simply the sum of probabilities of all outcomes, right? Then, by definition:
$$\sum_{j=0}^{m} \binom{m}{j} p^j (1-p)^{m-j} = 1$$
And plugging this last result into what we have so far, we get:
$$\mathbb{E}[X] = np \cdot 1 = np$$
Therefore, we can now confidently state:
$$\mathbb{E}[X] = np$$
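As a sanity check on the mean formula, here’s a small sketch that computes the mean directly from the definition and compares it with n times p. The function name `mean_from_definition`, its `start` parameter, and the example values are assumptions of mine. The `start` parameter lets you confirm that dropping the k = 0 term doesn’t change the result.

```python
from fractions import Fraction
from math import comb


def mean_from_definition(n, p, start=0):
    """Sum k * P(X = k) for k from `start` to n, using the binomial PMF."""
    return sum(
        k * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(start, n + 1)
    )


# Hypothetical example parameters, using an exact fraction for p.
n, p = 10, Fraction(3, 10)

mean_from_zero = mean_from_definition(n, p, start=0)
mean_from_one = mean_from_definition(n, p, start=1)

print(mean_from_zero == mean_from_one == n * p)  # True
```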
Variance of binomial distributions derivation
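Now it’s time to prove the variance formula. Remember the general variance formula for discrete probability distributions:
$$\mathrm{Var}(X) = \sum_{x} (x - M)^2 \cdot P(x)$$
Like before, for the argument of the PMF I’m going to use k, instead of x. Furthermore, I’m going to use the alternative variance formula from equation (3) that I introduced earlier:
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - M^2$$
Using the result from the previous section:
$$M = \mathbb{E}[X] = np$$
And we can plug in to get:
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - (np)^2$$
We already solved half of the problem! Now let’s focus on the $\mathbb{E}[X^2]$ term. Using the binomial PMF, this expected value is equal to:
$$\mathbb{E}[X^2] = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}$$
Like before, when k = 0 the first term in the sum becomes zero. So we can similarly write the same sum with the index starting from 1:
$$\mathbb{E}[X^2] = \sum_{k=1}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}$$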
With this setup, let’s start with the actual proof. The focus is going to be on manipulating the last equation. When we’re done with that, we’re going to plug in the final result into the main formula.
You’ll see that the mathematical tricks we use are going to be very similar to the ones we used in the previous proof.
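First, using equation (4), let’s rewrite the part inside the sum as:
$$k^2 \binom{n}{k} = k \cdot k \binom{n}{k} = k \cdot n \binom{n-1}{k-1}$$
Plugging this in and taking the constant term n out of the sum operator, we get:
$$\mathbb{E}[X^2] = n \sum_{k=1}^{n} k \binom{n-1}{k-1} p^k (1-p)^{n-k}$$
Next, let’s use the identities $p^k = p \cdot p^{k-1}$ and $n - k = (n-1) - (k-1)$ to rewrite this as:
$$\mathbb{E}[X^2] = np \sum_{k=1}^{n} k \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}$$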
Simplifying the sum
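Now let’s ignore the constant product np for a moment and just focus on the sum. Let’s apply the same variable substitution rules as before:
$$m = n - 1 \qquad j = k - 1$$
to rewrite the sum (note that k = j + 1) as:
$$\sum_{j=0}^{m} (j + 1) \binom{m}{j} p^j (1-p)^{m-j}$$
Next, let’s expand the product with the distributive property and use equation (2) to split the result into two sums:
$$\sum_{j=0}^{m} j \binom{m}{j} p^j (1-p)^{m-j} + \sum_{j=0}^{m} \binom{m}{j} p^j (1-p)^{m-j}$$
Well, these individual sums are nothing but the expected value and the sum of probabilities of a binomial distribution with parameters m and p:
$$\sum_{j=0}^{m} j \binom{m}{j} p^j (1-p)^{m-j} = mp \qquad \sum_{j=0}^{m} \binom{m}{j} p^j (1-p)^{m-j} = 1$$
Therefore, the final sum reduces to:
$$mp + 1$$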
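Here’s a minimal sketch that checks this reduction numerically: the sum of (j + 1) times the binomial PMF with parameters m and p should equal m times p plus 1. The function name `shifted_sum` and the example values m = 6 and p = 2/5 are hypothetical.

```python
from fractions import Fraction
from math import comb


def shifted_sum(m, p):
    """Sum of (j + 1) * C(m, j) * p^j * (1 - p)^(m - j) for j from 0 to m."""
    return sum(
        (j + 1) * comb(m, j) * p**j * (1 - p) ** (m - j)
        for j in range(m + 1)
    )


# Hypothetical example parameters, using an exact fraction for p.
m, p = 6, Fraction(2, 5)

result = shifted_sum(m, p)
print(result, result == m * p + 1)  # 17/5 True
```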
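And when we plug this into the full expression for $\mathbb{E}[X^2]$ we get:
$$\mathbb{E}[X^2] = np\,(mp + 1)$$
Which, substituting back m = n – 1, we can rewrite as:
$$\mathbb{E}[X^2] = np\,\big((n-1)p + 1\big) = np\,(np - p + 1)$$
So, finally we get:
$$\mathbb{E}[X^2] = n^2p^2 - np^2 + np$$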
The final proof
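We’re in the home stretch. Let’s remember what we started with:
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - (np)^2$$
In the previous section we established that:
$$\mathbb{E}[X^2] = n^2p^2 - np^2 + np$$
Plugging this in, the $n^2p^2$ terms cancel out:
$$\mathrm{Var}(X) = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1 - p)$$
Which is what we wanted to prove!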
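To double-check the variance result, here’s a short sketch that computes the mean and variance straight from the binomial PMF (using the alternative variance formula) and compares them with the closed-form expressions. The function name `binomial_moments` and the values n = 12 and p = 1/4 are my own assumptions for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (mean, variance) computed directly from the binomial PMF,
    with the variance obtained as E[X^2] - E[X]^2."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(probs))
    second_moment = sum(k**2 * prob for k, prob in enumerate(probs))
    return mean, second_moment - mean**2


# Hypothetical example parameters, using an exact fraction for p.
n, p = 12, Fraction(1, 4)

mean, variance = binomial_moments(n, p)
print(mean == n * p, variance == n * p * (1 - p))  # True True
```

Exact fractions keep the comparison free of rounding error. With ordinary floats you’d want to compare using a small tolerance instead.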
In this post, I showed you a formal derivation of the binomial distribution mean and variance formulas. This is the first formal proof I’ve ever done on my website and I’m curious if you found it useful. Let me know if it was easy to follow.
Before the actual proofs, I showed a few auxiliary properties and equations.
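The two properties of the sum operator (equations (1) and (2)):
$$\sum_{i=1}^{n} c \cdot x_i = c \cdot \sum_{i=1}^{n} x_i \qquad \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
An alternative formula for the variance of a random variable (equation (3)):
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - M^2$$
The binomial coefficient property (equation (4)):
$$k \cdot \binom{n}{k} = n \cdot \binom{n-1}{k-1}$$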
Using these identities, as well as a few simple mathematical tricks, we derived the binomial distribution mean and variance formulas. In the last two sections below, I’m going to give a summary of these derivations.
I know there was a lot of mathematical expression manipulation, some of which was a little bit on the hairy side. However, I’m firmly convinced that even less experienced readers can understand these proofs. If you struggled to follow any part of this post (or even the post as a whole), don’t hesitate to ask me any question!
By the way, if you’re new to mathematical proofs but find it an interesting subject, check out this Wikipedia article on mathematical proofs which gives a good overview of the subject.
Mean of binomial distributions proof
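We start by plugging the binomial PMF into the general formula for the mean of a discrete probability distribution:
$$\mathbb{E}[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
Then we use $k \binom{n}{k} = n \binom{n-1}{k-1}$ and $p^k = p \cdot p^{k-1}$ to rewrite it as:
$$\mathbb{E}[X] = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}$$
Finally, we use the variable substitutions m = n – 1 and j = k – 1 and simplify:
$$\mathbb{E}[X] = np \sum_{j=0}^{m} \binom{m}{j} p^j (1-p)^{m-j} = np \cdot 1 = np$$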
Variance of binomial distributions proof
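Again, we start by plugging the binomial PMF into the formula for the variance of a discrete probability distribution (in its alternative form from equation (3), with M = np):
$$\mathrm{Var}(X) = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k} - (np)^2$$
Then we use $k \binom{n}{k} = n \binom{n-1}{k-1}$ and $p^k = p \cdot p^{k-1}$ to rewrite it as:
$$\mathrm{Var}(X) = np \sum_{k=1}^{n} k \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} - (np)^2$$
Next, we use the variable substitutions m = n – 1 and j = k – 1:
$$\mathrm{Var}(X) = np \sum_{j=0}^{m} (j + 1) \binom{m}{j} p^j (1-p)^{m-j} - (np)^2$$
Finally, we simplify:
$$\begin{aligned}
\mathrm{Var}(X) &= np\,(mp + 1) - (np)^2 \\
&= np\,\big((n-1)p + 1\big) - n^2p^2 \\
&= n^2p^2 - np^2 + np - n^2p^2 \\
&= np(1 - p)
\end{aligned}$$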