This is a bonus post for my main post on the binomial distribution. Here I want to give a formal proof for the binomial distribution mean and variance formulas I previously showed you.
In the main post, I told you that these formulas are:
$$\text{Mean} = n \cdot p$$
$$\text{Variance} = n \cdot p \cdot (1 - p)$$
I also gave you an intuitive derivation of them. The intuition was related to the properties of the sum of independent random variables. Namely, the mean and variance of such a sum are equal to the sum of the means and variances, respectively, of the individual random variables that form it. We could prove this statement itself too, but I don’t want to do that here and I’ll leave it for a future post.
Instead, I want to take the general formulas for the mean and variance of discrete probability distributions and derive the specific binomial distribution mean and variance formulas from the binomial probability mass function (PMF):
$$P(k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
To do that, I’m first going to derive a few auxiliary arithmetic properties and equations. We’re going to use those as pieces of the main proofs. But their usefulness is much bigger and you can apply them for many other derivations.
I think this post will be a great exercise for those of you who don’t have much experience in formal derivations of mathematical formulas. I’m going to be as explicit as I can and try to not skip even the smallest steps. So, don’t be scared by the quantity of equations in this post. I promise, you’ll be able to follow everything!
And if you happen to get stuck somewhere, I’m going to answer all of your questions in the comments. Don’t hesitate to ask me anything.
Auxiliary properties and equations
To make it easy to refer to them later, I’m going to number the important properties and equations (starting from 1) and group them into meaningful subsections.
These equations and properties are all we need to prove the binomial distribution mean and variance formulas. So let’s get started!
Properties of the sum operator
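Remember the distributive property of multiplication I showed you in the main post:
$$(a + b) \cdot (c + d) = a \cdot c + a \cdot d + b \cdot c + b \cdot d$$
If one of the factors is a constant, instead of a binomial, this property allows us to do things like:
$$c \cdot (a + b) = c \cdot a + c \cdot b$$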
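Or, more generally, if you multiply a sum with any number of terms by a constant, you get the following equality:
$$c \cdot (x_1 + x_2 + \dots + x_n) = c \cdot x_1 + c \cdot x_2 + \dots + c \cdot x_n$$
Notice that we can express the sum in the parentheses with the sum operator:
$$x_1 + x_2 + \dots + x_n = \sum_{k=1}^{n} x_k$$
Therefore, we can state the following general property of the sum operator:
$$\sum_{k=1}^{n} c \cdot x_k = c \cdot \sum_{k=1}^{n} x_k \qquad (1)$$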
Anything inside the sum operator that doesn’t depend on the index k is a constant in the context of that sum. So, the above equation simply states that such constants can be taken out of the sum without changing the final value.
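Here’s a quick numerical spot check of equation (1). It’s only a minimal sketch: the constant c and the list xs are made-up example values, not something from the main post.

```python
# Made-up example values; any numbers would work here.
c = 3
xs = [2, 5, 7, 11]

lhs = sum(c * x for x in xs)  # the constant stays inside the sum
rhs = c * sum(xs)             # the constant is taken out of the sum

print(lhs, rhs)  # 75 75
```

A check like this doesn’t prove the property, of course, but it’s a handy way to catch mistakes when you manipulate sums by hand.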
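Another useful property of the sum operator is related to the commutative property of addition:
$$a + b = b + a$$
Now, say we have the sum:
$$\sum_{k=1}^{2} (x_k + y_k) = (x_1 + y_1) + (x_2 + y_2)$$
The commutative property allows us to rearrange the terms and get:
$$(x_1 + x_2) + (y_1 + y_2) = \sum_{k=1}^{2} x_k + \sum_{k=1}^{2} y_k$$
In this case, the n parameter of the sum is 2 but you see that we can easily generalize to any n:
$$\sum_{k=1}^{n} (x_k + y_k) = \sum_{k=1}^{n} x_k + \sum_{k=1}^{n} y_k \qquad (2)$$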
Properties of the binomial coefficient
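Remember the binomial coefficient formula:
$$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$$
The first useful result I want to derive is for the expression $\binom{n-1}{k-1}$. Let’s apply the formula to this expression and simplify:
$$\binom{n-1}{k-1} = \frac{(n-1)!}{(k-1)! \, \big((n-1) - (k-1)\big)!} = \frac{(n-1)!}{(k-1)! \, (n-k)!} \qquad (3)$$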
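Now let’s do something else. One of the simplest properties of the factorial function is:
$$n! = n \cdot (n-1)!$$
I want to use this to derive a useful property of the binomial coefficient. First, let’s use it to rewrite the right-hand side of the binomial coefficient formula in the following way:
$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!} = \frac{n \cdot (n-1)!}{k \cdot (k-1)! \, (n-k)!}$$
And using the commutative property of multiplication ($a \cdot b = b \cdot a$), we can rewrite the right-hand side as:
$$\binom{n}{k} = \frac{n}{k} \cdot \frac{(n-1)!}{(k-1)! \, (n-k)!}$$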
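Now let’s say we start with another expression: $k \cdot \binom{n}{k}$. Using the result above, we can equate it to:
$$k \cdot \binom{n}{k} = k \cdot \frac{n}{k} \cdot \frac{(n-1)!}{(k-1)! \, (n-k)!} = n \cdot \frac{(n-1)!}{(k-1)! \, (n-k)!}$$
In the last step, I simply canceled out the two k’s. Finally, using equation (3) we get the following identity:
$$k \cdot \binom{n}{k} = n \cdot \binom{n-1}{k-1} \qquad (4)$$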
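If you’d like to convince yourself that equation (4) really holds before we rely on it, here’s a small sketch that checks it for every valid pair of n and k up to 20. The helper name identity_holds and the range are my own choices for illustration; math.comb is Python’s built-in binomial coefficient function.

```python
from math import comb

def identity_holds(n, k):
    """Check k * C(n, k) == n * C(n - 1, k - 1) for one pair (n, k)."""
    return k * comb(n, k) == n * comb(n - 1, k - 1)

# Try every n from 1 to 20 and every k from 1 to n.
all_ok = all(identity_holds(n, k)
             for n in range(1, 21)
             for k in range(1, n + 1))

print(all_ok)  # True
```

Because everything here is integer arithmetic, there are no rounding issues to worry about.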
Expected value of x-squared
In my post about expected value I showed you the following formula for the expected value of a random variable:
$$E[X] = \sum_{i} x_i \cdot P(x_i)$$
Another notation for the expected value of a random variable X (which I’m going to use in this post) is:
$$E[X] = \sum_{x} x \cdot P(x)$$
In the above formulas $x_i$ and $x$ stand for the possible outcomes of the random variable.
In my post on the mean and variance of probability distributions I showed you that the expected value and the mean (M) of a probability distribution (random variable) are essentially the same thing:
$$M = E[X] = \sum_{x} x \cdot P(x) \qquad (5)$$
Now let’s define another random variable Y, which is the square of X:
$$Y = X^2$$
If we only have the probability distribution of X, how can we calculate the expected value of Y? Well, it’s pretty simple actually. We use the same formula but substitute $x$ with $x^2$:
$$E[Y] = E[X^2] = \sum_{x} x^2 \cdot P(x) \qquad (6)$$
This is specifically about the function $g(x) = x^2$ but the result is valid for any other function $g(x)$.
I’m going to use equation (6) to derive an important formula in the next section. And we’re going to use this when deriving the formula for the variance of a binomial distribution.
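To make equation (6) a bit more concrete, here’s a minimal sketch that applies it to a fair six-sided die. The die is just an example I picked (it isn’t part of the derivation), and I’m using exact fractions so the printed values aren’t affected by rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each outcome has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())     # E[X], equation (5)
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2], equation (6)

print(mean)     # 7/2
print(e_x2)     # 91/6
print(mean**2)  # 49/4
```

Notice that the expected value of the square (91/6) is not the same as the square of the expected value (49/4). The difference between the two is going to matter in the next section.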
An alternative variance formula
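Here’s the general formula for the variance of a probability distribution:
$$\text{Variance} = \sum_{x} (x - M)^2 \cdot P(x)$$
Using the binomial theorem, let’s expand the squared difference inside the sum (with M = Mean):
$$(x - M)^2 = x^2 - 2xM + M^2$$
Then we can rewrite the variance formula as:
$$\begin{aligned} \text{Variance} &= \sum_{x} (x^2 - 2xM + M^2) \cdot P(x) \\ &= \sum_{x} \big( x^2 P(x) - 2xM \, P(x) + M^2 P(x) \big) \end{aligned}$$
In the last line, I simply used the distributive property of multiplication.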
Breaking down the sum
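Now, using equation (2), we can rewrite the right-hand side as:
$$\text{Variance} = \sum_{x} x^2 P(x) - \sum_{x} 2xM \, P(x) + \sum_{x} M^2 P(x)$$
Finally, using equation (1), we can take the constant terms outside the sums and rewrite this as:
$$\text{Variance} = \sum_{x} x^2 P(x) - 2M \sum_{x} x \, P(x) + M^2 \sum_{x} P(x)$$
Now, notice that:
$$\sum_{x} P(x) = 1$$
because this is the sum of the probabilities of all possible values (which by definition is equal to 1). Furthermore:
$$\sum_{x} x \, P(x) = M$$
which is simply the mean formula from equation (5). Also, from equation (6) we have:
$$\sum_{x} x^2 P(x) = E[X^2]$$
Plugging these in the last expression for the variance formula, we get:
$$\text{Variance} = E[X^2] - 2M \cdot M + M^2 \cdot 1 = E[X^2] - M^2 = E[X^2] - (E[X])^2$$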
The last two expressions are identical. In various articles you’ll typically see the second one and, admittedly, it looks more elegant. But I personally prefer the first one because I find it more readable. Therefore, the main equation from this section we just derived is:
$$\text{Variance} = E[X^2] - M^2 \qquad (7)$$
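As a sanity check on equation (7), the sketch below computes the variance of a fair six-sided die in two ways: with the general variance formula and with the alternative formula. Again, the die is only my example and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an example distribution).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
e_x2 = sum(x**2 * p for x, p in pmf.items())

# General formula: sum of (x - M)^2 * P(x)
var_general = sum((x - mean)**2 * p for x, p in pmf.items())
# Alternative formula, equation (7): E[X^2] - M^2
var_alternative = e_x2 - mean**2

print(var_general, var_alternative)  # 35/12 35/12
```

Both routes give the same number, which is exactly what the derivation in this section says should happen.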
Mean of binomial distributions derivation
Well, here we finally reach the main point of this post! Let’s use these equations and properties to derive the formulas we’re interested in.
First is the mean. Here’s the general formula again:
$$M = \sum_{x} x \cdot P(x)$$
Let’s plug the binomial distribution PMF into this formula. To be consistent with the binomial distribution notation, I’m going to use k for the argument (instead of x) and the index for the sum will naturally range from 0 to n. So, with that in mind, we have:
$$M = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
But notice that when k = 0 the first term of the sum is zero and doesn’t contribute to the overall sum. Therefore, we can also write the formula by having the index start from k = 1:
$$M = \sum_{k=1}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
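Now let’s see how we can manipulate the right-hand side to get the desired $n \cdot p$. Let’s do the proof step by step.
The first step in the derivation is to apply the binomial property from equation (4) to the right-hand side:
$$\begin{aligned} M &= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k (1-p)^{n-k} \\ &= n \sum_{k=1}^{n} \binom{n-1}{k-1} p^k (1-p)^{n-k} \end{aligned}$$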
In the second line, I simply used equation (1) to get n out of the sum operator (because it doesn’t depend on k).
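Next, we’re going to use the product rule of exponents:
$$a^b \cdot a^c = a^{b+c}$$
A special case of this rule is:
$$a^b = a \cdot a^{b-1}$$
And we’re going to use it to rewrite $p^k$ as $p \cdot p^{k-1}$:
$$\begin{aligned} M &= n \sum_{k=1}^{n} \binom{n-1}{k-1} \, p \cdot p^{k-1} (1-p)^{n-k} \\ &= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} \end{aligned}$$
Again, in the last line I simply took the constant term p outside of the sum operator. Finally, let’s apply the identity $n - k = (n-1) - (k-1)$ to the exponent of (1 – p) (you’ll see why we do this in a moment):
$$M = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}$$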
The final proof
Believe it or not, we’re almost done here. Notice that, after the final manipulation, there are a lot of terms like n – 1 and k – 1 inside the sum operator. To make the expression a little more readable, let’s rewrite it by applying the following variable substitutions:
$$m = n - 1$$
$$j = k - 1$$
This results in:
$$M = np \sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j}$$
Here j starts from 0 because j = k – 1 (the k index used to start from 1 before the variable substitution). And because the number of terms in the sum must be preserved, the index runs until n – 1 = m.
Now, do you recognize the term inside the sum operator?
$$\binom{m}{j} p^{j} (1-p)^{m-j}$$
It looks exactly like the binomial PMF, doesn’t it? Only k has been replaced with j and n with m. And since the sum is from 0 to m, this is simply the sum of probabilities of all outcomes, right? Then, by definition:
$$\sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j} = 1$$
And plugging this last result into what we have so far, we get:
$$M = np \cdot 1 = np$$
Therefore, we can now confidently state:
$$\text{Mean} = n \cdot p$$
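If you want to see the result of this proof with actual numbers, here’s a minimal sketch that sums k times the binomial PMF directly and compares it with n times p. The function name binomial_pmf and the parameters n = 10 and p = 3/10 are my own example choices. Exact fractions keep the comparison free of rounding errors.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, Fraction(3, 10)  # example parameters, chosen arbitrarily

# The probabilities of all outcomes must add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
# The mean, computed straight from the general formula.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(total)        # 1
print(mean, n * p)  # 3 3
```

Feel free to try other values of n and p. The two numbers on the last line should always match.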
Variance of binomial distributions derivation
Now it’s time to prove the variance formula. Remember the general variance formula for discrete probability distributions:
$$\text{Variance} = \sum_{x} (x - M)^2 \cdot P(x)$$
Like before, for the argument of the PMF I’m going to use k, instead of x. Furthermore, I’m going to use the alternative variance formula from equation (7) we derived earlier:
$$\text{Variance} = E[X^2] - M^2$$
Using the result from the previous section:
$$M = np$$
And we can plug it in to get:
$$\text{Variance} = E[X^2] - (np)^2$$
We already solved half of the problem! Now let’s focus on the $E[X^2]$ term. Using the binomial PMF, this expected value is equal to:
$$E[X^2] = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}$$
Like before, when k = 0 the first term in the sum becomes zero again. So we can similarly write the same sum with the index starting from 1:
$$E[X^2] = \sum_{k=1}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}$$
With this setup, let’s start with the actual proof. The focus is going to be on manipulating the last equation. When we’re done with that, we’re going to plug in the final result into the main formula.
You’ll see that the mathematical tricks we use are going to be very similar to the ones we used in the previous proof.
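First, using equation (4), let’s rewrite the part inside the sum as:
$$k^2 \binom{n}{k} = k \cdot k \binom{n}{k} = k \cdot n \binom{n-1}{k-1}$$
Plugging this in and taking the constant term n out of the sum operator, we get:
$$E[X^2] = n \sum_{k=1}^{n} k \binom{n-1}{k-1} p^k (1-p)^{n-k}$$
Next, let’s use the identity $p^k = p \cdot p^{k-1}$ to rewrite this as:
$$E[X^2] = np \sum_{k=1}^{n} k \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k}$$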
Simplifying the sum
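Now let’s ignore the constant product np for a moment and just focus on the sum. Let’s apply the same variable substitution rules as before:
$$m = n - 1$$
$$j = k - 1$$
to rewrite the sum as:
$$\sum_{j=0}^{m} (j + 1) \binom{m}{j} p^{j} (1-p)^{m-j}$$
Here I also used the fact that k = j + 1 and n – k = m – j. Next, let’s use equation (2) to split this sum into two sums by expanding with the distributive property:
$$\sum_{j=0}^{m} j \binom{m}{j} p^{j} (1-p)^{m-j} + \sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j}$$
Well, these individual sums are nothing but the expected value and the sum of probabilities of a binomial distribution with parameters m and p:
$$\sum_{j=0}^{m} j \binom{m}{j} p^{j} (1-p)^{m-j} = mp$$
$$\sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j} = 1$$
Therefore, the final sum reduces to:
$$mp + 1 = (n - 1)p + 1$$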
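The reduced sum is easy to double-check numerically. In the sketch below, binomial_pmf and the values m = 9 and p = 3/10 are my own example choices (m = 9 corresponds to n = 10). The code adds up (j + 1) times the binomial PMF over all j and compares the total with mp + 1.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(j, m, p):
    """Binomial PMF with m trials and success probability p."""
    return comb(m, j) * p**j * (1 - p)**(m - j)

m, p = 9, Fraction(3, 10)  # example parameters, chosen arbitrarily

# Sum of (j + 1) * P(j) for j from 0 to m.
shifted_sum = sum((j + 1) * binomial_pmf(j, m, p) for j in range(m + 1))

print(shifted_sum, m * p + 1)  # 37/10 37/10
```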
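And when we plug this into the full expression for $E[X^2]$ we get:
$$E[X^2] = np \cdot \big( (n - 1)p + 1 \big)$$
Which we can rewrite as:
$$E[X^2] = np \cdot (np - p + 1) = n^2p^2 - np^2 + np$$
So, finally we get:
$$E[X^2] = n^2p^2 + np(1 - p)$$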
The final proof
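We’re in the home stretch. Let’s remember what we started with:
$$\text{Variance} = E[X^2] - (np)^2$$
In the previous section we established that:
$$E[X^2] = n^2p^2 + np(1 - p)$$
Plugging this into the variance formula, we get:
$$\text{Variance} = n^2p^2 + np(1 - p) - (np)^2 = np(1 - p)$$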
Which is what we wanted to prove!
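And here’s a numerical check of the variance result, in the same spirit as before. It’s a sketch with made-up parameters (n = 10, p = 3/10) and a helper function binomial_pmf of my own. It computes the expected value of the square directly from the PMF, applies equation (7), and compares the outcome with n times p times (1 – p).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, Fraction(3, 10)  # example parameters, chosen arbitrarily

mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
e_x2 = sum(k**2 * binomial_pmf(k, n, p) for k in range(n + 1))

variance = e_x2 - mean**2  # alternative variance formula, equation (7)

print(e_x2)                       # 111/10
print(variance, n * p * (1 - p))  # 21/10 21/10
```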
In this post, I showed you a formal derivation of the binomial distribution mean and variance formulas. This is the first formal proof I’ve ever done on my website and I’m curious if you found it useful. Let me know if it was easy to follow.
To summarize, I showed you the proofs of the two formulas I gave in my main post on the binomial distribution. In the first section, I derived a few auxiliary properties and equations. Here’s the list of the most important ones.
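The two properties of the sum operator (equations (1) and (2)):
$$\sum_{k=1}^{n} c \cdot x_k = c \cdot \sum_{k=1}^{n} x_k$$
$$\sum_{k=1}^{n} (x_k + y_k) = \sum_{k=1}^{n} x_k + \sum_{k=1}^{n} y_k$$
The binomial coefficient property (equation (4)):
$$k \cdot \binom{n}{k} = n \cdot \binom{n-1}{k-1}$$
The expected value of the square of a random variable (equation (6)):
$$E[X^2] = \sum_{x} x^2 \cdot P(x)$$
An alternative formula for the variance of a random variable (equation (7)):
$$\text{Variance} = E[X^2] - M^2$$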
Using these equations and properties, as well as a few other simple mathematical tricks, we derived the binomial distribution mean and variance formulas. In the last two sections below, I’m going to give a summary of these derivations.
I know there was a lot of mathematical expression manipulation, some of which was a little bit on the hairy side. However, I’m firmly convinced that even less experienced readers can understand these proofs. If you struggled to follow any part of this post (or even the post as a whole), don’t hesitate to ask me any question!
By the way, if you’re new to mathematical proofs but find it an interesting subject, check out this Wikipedia article on mathematical proofs which gives a good overview.
Mean of binomial distributions proof summary
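We start by plugging the binomial PMF into the general formula for the mean of a discrete probability distribution:
$$M = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
Then we use $k \binom{n}{k} = n \binom{n-1}{k-1}$ and $p^k = p \cdot p^{k-1}$ to rewrite it as:
$$M = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}$$
Finally, we use the variable substitutions m = n – 1 and j = k – 1 and simplify:
$$M = np \sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j} = np \cdot 1 = np$$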
Variance of binomial distributions proof summary
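Again, we start by plugging the binomial PMF into the general formula for the variance of a discrete probability distribution (in its alternative form from equation (7)):
$$\text{Variance} = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k} - (np)^2$$
Then we use $k \binom{n}{k} = n \binom{n-1}{k-1}$ and $p^k = p \cdot p^{k-1}$ to rewrite it as:
$$\text{Variance} = np \sum_{k=1}^{n} k \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} - (np)^2$$
Next, we use the variable substitutions m = n – 1 and j = k – 1:
$$\text{Variance} = np \sum_{j=0}^{m} (j + 1) \binom{m}{j} p^{j} (1-p)^{m-j} - (np)^2$$
Finally, we simplify:
$$\text{Variance} = np \cdot \big( (n-1)p + 1 \big) - (np)^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1 - p)$$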