Welcome to the introductory post of a large series that I’m starting today. The main purpose of this post is to get you in the mood for the posts to follow, in which we’ll explore and solve interesting probability questions from the real world.
Most of my posts so far have been more on the theoretical side. In previous posts, I introduced important concepts from probability theory (and related fields like statistics and combinatorics):
- probabilities and sample spaces
- the law of large numbers and expected value
- permutations, combinations, and other combinatorics concepts
- mean and variance
- probability distributions
- Bayes’ theorem
And I’m going to continue introducing more concepts in the future, basic and advanced alike. I personally find them fascinating in their own right. In the context of mathematics, they are interesting, thought-provoking, and (some would say) even beautiful.
But they’re not just interesting, they’re also extremely useful. And when I say useful, I don’t just mean useful for mathematicians and scientists. I would argue they are a potentially useful tool for everybody.
Skills in the real world
Think about the following skills for a moment:
- digesting nutrition
- absorbing water
You might find it strange that I’m calling these “skills”, but essentially they are. Of course, they are skills related to basic biological survival and, by definition, (almost) every living organism needs to have them in order to remain such.
Now, what about skills like:
- being fluent in a popular language
- detecting misinformation
- imagination and creativity
- performing CPR
In my opinion, skills like these (and many more) are always good to have. Regardless of your job, your age, or where you live. They are obviously not as essential as the previous category, but all of them are things that will allow you to achieve outcomes and take advantage of situations which you otherwise might not be able to.
Are probability theory skills useful in the real world?
So, where does having probability theory skills fit in all this? Well, I think it easily fits in the second category, though this isn’t as obviously true as some of the other skills in the list above. But think about it, what is probability theory really about? What does the ability to accurately calculate (or at least estimate) probabilities of events give you?
Well, probability theory is really about providing a measure for our uncertainty about an event’s occurrence and/or giving us insights about the frequency of an event’s long-term occurrence. In short, it helps us build good expectations about real-world events and phenomena. And, consequently, this helps us make better decisions (in the most general sense).
There’s uncertainty in so many fields. You can apply probability theory in science, games, economics, education, politics, and many more. Really, it’s hard to even come up with examples where probability theory can’t help. Regardless of what you do or find interesting, probability theory is a very useful tool to have under your belt.
Well, that’s how I feel about it anyway.
My motivation for these posts
Convincing you that probability theory is cool
So, in an effort to justify my position, in this series I want to show you many probability questions from diverse areas in life. I’m going to start with simpler problems which are more fun than useful. And, eventually, I’m going to build up to more complicated ones.
More importantly, the process of solving these problems is itself useful for training your brain to think about probability questions. Often, the principles involved in solving simpler problems are the same as (or at least similar to) the ones used for solving more complex ones.
Even though most of my posts so far have been theoretical, I’ve also written a few more practical ones. For example, I’ve shown you how to apply some of the theoretical concepts listed at the beginning to things like:
- solving the inverse problem
- calculating the bias of a coin
- predicting presidential elections
- Occam’s razor
But on more than one occasion I’ve been asked to give more examples of practical applications of the theoretical concepts, as well as just examples of solving probability-related problems. Hopefully, this series will be a good first step in this direction.
Probability questions from the book Understanding Probability
People have also asked me for recommendations on probability theory and statistics books that give a decent overview of all important concepts from these fields.
For the first posts in this series, I’m going to use twelve probability questions from the book Understanding Probability: Chance Rules in Everyday Life by the author Henk Tijms. Tijms is a Dutch mathematician who specializes in probability theory and many related fields. If you’ve been interested in probability theory for long enough, this is a name that you’ve likely already heard.
I personally read this book a little less than 10 years ago, while I was still finishing my master’s degree in cognitive neuroscience, and back then I found it to be one of the most interesting books on the subject. I was pleasantly surprised when I recently received an email from Henk Tijms himself in which he shared some positive words about Probabilistic World. And he was kind enough to give me permission to use the probability questions from his book in my posts.
The very first image in this post (the funny laundry cartoon) is actually from the same book. It is the header image of the first chapter in which the twelve questions are introduced. I say introduced because the actual solutions are given in later chapters.
Anyway, if you’re new to probability theory and statistics and you’re looking for a good comprehensive book on the subject, I recommend you start with this book. Now, some of the concepts Henk Tijms discusses in the book are things that I’ve discussed myself. And the rest are things I’m going to discuss in the future. But when you read about the same concept explained in different ways by different people, this helps you consolidate your knowledge and understanding. This is an approach that I myself have used for a very long time and I find it very effective in learning.
So, I think the Understanding Probability book is a very good complement to my website.
Answering probability questions with simulations
My third main motive for this series is that I want to introduce you to the method of answering probability questions using simulations. This is an extremely important technique and sometimes it’s the only way certain questions can be answered. Why? Well, as you’ll see in future posts, there are a lot of problems for which we don’t have an analytic solution.
I’ve already used simulations in some of my previous posts:
- estimating coin bias
- the mean, the mode, and the median
- the law of large numbers
- expected value
- probability distributions
- mean and variance of probability distributions
But, except for the first post in this list, I didn’t share the computer code used in these simulations. In order to show you how to use simulations yourself, in this series I’m going to be much more explicit with my explanations. And, for all simulations, I’m going to use my favorite programming language Python.
Python is an extremely powerful language and is one of the top choices for programmers, scientists, and basically anybody doing math-related programming for whatever reason. It’s also extremely beginner-friendly, easy to learn, and surprisingly similar to a natural language (English).
But don’t worry. If you don’t know anything about Python or programming in general, I’m going to make sure you still benefit from this series to the fullest extent. The simulations themselves are going to be ones you can perform even with a pen and paper. The role of the programming code is simply to make your computer perform the same steps automatically and much faster. Even without a programming background, you’ll gain intuition about the simulations.
For each probability question, I’m going to first show its analytic solution and then compare it to the answer we get with a simulation. In other words, we’re going to reach (approximately) the same answer by two entirely different paths, which is going to be a very useful exercise for gaining intuition about the law of large numbers too!
What is a computer simulation?
In a nutshell, computer simulations are used for estimating probabilities empirically. This involves repeating the process that leads to the outcomes we’re interested in a large number of times. In the meantime, you simply keep track of the number of times each outcome occurs. And the goal of the computer is to automate the steps in order to save you (lots of) time and effort.
In my post on the law of large numbers I showed you a few examples of such empirical estimates of probabilities. When it comes to the process of flipping a fair coin, the law guarantees that the percentage of flips that turn up “heads” will converge to the probability of “heads”. As the number of simulated flips increases, the empirical estimate of the probability gets closer and closer to the real probability.
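To make the coin example concrete, here is a minimal sketch of such a simulation in Python. The function name estimate_heads_probability and its seed parameter are made up for illustration (they don’t come from any earlier post), and the sketch assumes a fair coin, so each simulated flip lands “heads” with probability 0.5.

```python
import random


def estimate_heads_probability(num_flips, seed=None):
    """Estimate P(heads) of a fair coin from num_flips simulated flips."""
    # A dedicated generator; passing a seed makes the run repeatable.
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so it is below 0.5 half of the time.
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


# The estimate should drift toward 0.5 as the number of flips grows.
for n in (10, 100, 10_000, 100_000):
    print(n, estimate_heads_probability(n, seed=42))
```

With only 10 flips the estimate can be quite far from 0.5, while with 100,000 flips it should typically be within a fraction of a percent.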
Similarly, if the process is rolling a fair die, the relative frequency of each of the six possible outcomes will approach its probability of 1/6.
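The die version can be sketched the same way. Again, this is only an illustration under the assumption of a fair six-sided die, and the helper name die_relative_frequencies is hypothetical. It counts how often each face shows up and divides by the number of rolls:

```python
import random
from collections import Counter


def die_relative_frequencies(num_rolls, seed=None):
    """Return the relative frequency of each face after num_rolls rolls."""
    rng = random.Random(seed)
    # randint(1, 6) includes both endpoints, like a six-sided die.
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    # Counter returns 0 for faces that never appeared.
    return {face: counts[face] / num_rolls for face in range(1, 7)}


for face, freq in die_relative_frequencies(100_000, seed=42).items():
    print(face, round(freq, 4))
```

With enough rolls, every printed value should land close to 1/6 (about 0.1667).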
Technically, you don’t need a computer for this. Just take a real coin or a real die and flip/roll it multiple times while keeping track of the outcomes. Of course, doing it like that is just extremely laborious (especially for more complicated processes), so it’s much better to use a computer simulation.
The bottom line is that, as long as you can simulate the outcome-generating process with a computer (with programming or otherwise), you can empirically estimate the probability of any outcome. All thanks to the law of large numbers!
You know nothing about programming?
If you’ve never done any programming in your life but still want to run the simulations, you can do it. And I really mean that. You don’t have to first read a book about programming or Python. You don’t have to follow any online tutorials. None of that.
Don’t get me wrong, if you’re generally interested in getting into programming, you can do those things as well. But I’m a big fan of the philosophy called “learning by doing”. Especially for programming. If right now you’re thinking to yourself “Really? I can still run and understand the code even if I know absolutely nothing about programming?”… Yes, trust me, you will be able to. And you’ll most likely start picking up programming concepts in the process, even if you don’t set this as an explicit goal.
For one thing, you’ll be able to run the code by simple copy/pasting even if you don’t understand it at all. But, like I said, Python is one of the most readable programming languages in existence and, even if you read the code as if you were reading plain English, you’ll still understand a lot. Especially combined with my brief explanations.
By the way, like I said earlier, even if you choose to skip the programming parts of my posts, you won’t lose anything from the analytic answers to the probability questions. But if you want to make your first steps in programming with actual probability questions, this is going to be a very good opportunity for you. And the only thing you’re going to need to start is Python itself.
Normally, you can simply download and install Python from the official Python website. But if you’re completely new to Python and/or programming, I strongly recommend installing it with the platform called Anaconda and using it with the web application called Jupyter Notebook that comes along with Anaconda.
Anaconda and Jupyter Notebook
You can download Anaconda from their official website. Definitely download the one with the latest Python 3 version (not Python 2) and just be careful to choose the right option for your operating system.
Once you download and install Anaconda, you can get familiar with it by following this quick guide. In particular, pay attention to the part about Jupyter Notebook. This is an awesome web application for running Python code (among other things) that is automatically installed when you install Anaconda.
Jupyter Notebook is an extremely popular tool among programmers working in fields like data science, machine learning, artificial intelligence, and many others intersecting with probability theory, statistics, and mathematics in general. It runs in your browser (the same one you’re currently reading this post from) and you’ll be able to run the code from my posts with it. Not only that, Anaconda will automatically install many popular Python packages that have a ton of useful functionality for the fields I mentioned.
If you want to get your hands dirty, you can take a look at this somewhat more extensive Jupyter Notebook tutorial. But you don’t have to read all these things at once, you can also do that when you start practicing with the code from my posts.
Bottom line, all you need to do to be able to start running the code from my posts is:
- Download and install Anaconda
- Learn how to run Jupyter Notebook from the command prompt (spoiler: the command is simply jupyter notebook)
- Optionally, go through the short tutorials I linked to
If you encounter any issues with these steps, let me know in the comments below and I or another reader will help you with that.
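Once Jupyter Notebook is running, one quick way to check that everything works is to paste a couple of lines into the first cell and run it. This is just a suggested sanity check, and the helper name running_python_3 is made up for illustration. It uses only Python’s built-in sys module and confirms that you ended up with Python 3 rather than Python 2:

```python
import sys


def running_python_3():
    """Return True if the current interpreter is a Python 3 version."""
    return sys.version_info.major == 3


# Show the full version string and the result of the check.
print(sys.version)
print("Python 3:", running_python_3())
```

If the second line of output says True, you’re ready to run the simulations.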
The probability questions
So, here are the titles of the twelve probability questions, as listed in the opening chapter of the book Understanding Probability:
- A birthday problem (analytic solution and Python simulation)
- Probability of winning streaks
- A scratch-and-win lottery
- A lotto problem
- Hitting the jackpot
- Who is the murderer?
- A coincidence problem
- A sock problem
- A statistical test problem
- The best-choice problem
- The Monty Hall dilemma
- An offer you can’t refuse — or can you?
Many of these questions are famous problems in probability theory but here Henk Tijms presents them in a fun and informal format. My posts won’t necessarily be in the same order, since answering some of these questions requires knowledge of concepts I haven’t talked about yet and I might put them on hold until I do.
Of course, these twelve questions are only a starting point. I’m going to write many other posts on other questions: some of them famous, some of them ones I came up with myself, and others that are simply interesting real-world questions that can be answered with probability theory.