Conditional Probability
An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand.
Background
I recently stumbled across a blog post from 2015, “Hey - guess what? There really is a hot hand!” The post includes some intriguing R code that explores the following proposition, quoted from a paper the post cites:
Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this conditional relative frequency to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample 1 million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.
What? OK, so let’s follow along with the R code. The first block runs the simulation:
rep <- 1e6
n <- 4
data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))
prob <- rep(NA, rep)
for (i in 1:rep) {
  heads1 <- data[i, 1:(n - 1)] == 1
  heads2 <- data[i, 2:n] == 1
  prob[i] <- sum(heads1 & heads2) / sum(heads1)
}
The second block naively calculates the average:
print(mean(prob))
[1] NaN
This doesn’t work because, as the post points out, “sometimes the first three flips are tails, so the probability is 0/0.” Discarding these NaN values gives the average the quote predicts, approximately 0.4 rather than the 0.5 Jack expects:
print(mean(prob, na.rm = TRUE))
[1] 0.40483
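As a quick sanity check (my own addition, not from the original post), we can count how many simulated sequences produce NaN. A sequence is dropped only when its first three flips are all tails, which happens with probability (1/2)^3 = 1/8, so roughly an eighth of the million sequences are discarded:

# Count and proportion of sequences whose conditional frequency is 0/0 (NaN)
sum(is.nan(prob))
mean(is.nan(prob))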
Reading this and trying the code myself led me to ask: what the heck is going on here?
Huh?
Let’s follow along with the R code and try to work out why the conditional probability is 0.4.
Simulation
Looking at the first part of the code:
rep <- 1e6
n <- 4
data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))
This code simulates flipping the coin 4 times in a row 1 million times and stores the results in a matrix:
head(data)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 0
[2,] 0 1 1 0
[3,] 1 1 1 0
[4,] 0 0 0 0
[5,] 0 0 1 1
[6,] 0 0 0 1
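A quick dimension check (my own addition) confirms the shape of the matrix: one row per simulated coin and one column per flip.

dim(data)  # 1000000 rows, 4 columns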
By convention, 1 is heads and 0 is tails. If the coin is fair, we should expect the proportion of heads to be about 0.5.
mean(data)
[1] 0.5001815
round(mean(data), 2)
[1] 0.5
While there is some sampling variation, the proportion is approximately 0.5.
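We can also look at each flip position separately (again my own check, not from the original post); every column of the matrix should hover around 0.5 as well:

# Proportion of heads at each of the four flip positions
round(colMeans(data), 3)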
Calculation
Looking at the second part of the code:
prob <- rep(NA, rep)
for (i in 1:rep) {
  heads1 <- data[i, 1:(n - 1)] == 1
  heads2 <- data[i, 2:n] == 1
  prob[i] <- sum(heads1 & heads2) / sum(heads1)
}
This counts the relative frequency of heads immediately after heads by finding heads in positions 1-3 (heads1), comparing them to heads in positions 2-4 (heads2), and calculating the proportion of heads after heads (prob[i]). To see how this works in practice, we can test all possible combinations of heads and tails:
calc_prob <- function(flips) {
  heads1 <- flips[1:(n - 1)] == 1
  heads2 <- flips[2:n] == 1
  sum(heads1 & heads2) / sum(heads1)
}

test_data <- expand.grid(0:1, 0:1, 0:1, 0:1)
test_prob <- rep(NA, nrow(test_data))

for (i in seq_len(nrow(test_data))) {
  f <- test_data[i, ]
  input <- paste0("c(", toString(f), ")")
  test_prob[i] <- calc_prob(f)
  print(paste0(i, ": ", input, " = ", test_prob[i]))
}
[1] "1: c(0, 0, 0, 0) = NaN"
[1] "2: c(1, 0, 0, 0) = 0"
[1] "3: c(0, 1, 0, 0) = 0"
[1] "4: c(1, 1, 0, 0) = 0.5"
[1] "5: c(0, 0, 1, 0) = 0"
[1] "6: c(1, 0, 1, 0) = 0"
[1] "7: c(0, 1, 1, 0) = 0.5"
[1] "8: c(1, 1, 1, 0) = 0.666666666666667"
[1] "9: c(0, 0, 0, 1) = NaN"
[1] "10: c(1, 0, 0, 1) = 0"
[1] "11: c(0, 1, 0, 1) = 0"
[1] "12: c(1, 1, 0, 1) = 0.5"
[1] "13: c(0, 0, 1, 1) = 1"
[1] "14: c(1, 0, 1, 1) = 0.5"
[1] "15: c(0, 1, 1, 1) = 1"
[1] "16: c(1, 1, 1, 1) = 1"
There are 16 possible combinations of 4 coin flips, with 5 possible outcomes: 0, 1/2, 2/3, 1, and 0/0 (NaN).
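Tabulating the outcomes (my own addition) summarises the listing above: 0 occurs six times, 1/2 four times, 2/3 once, 1 three times, and NaN twice.

# Tally the distinct outcomes across all 16 sequences
# (the two NaN results are counted under <NA>)
table(round(test_prob, 3), useNA = "ifany")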
How it works
Looking at the permutations, the conditional probability starts to make sense. Calculating the conditional relative frequency for all permutations gives us:
mean(test_prob, na.rm = TRUE)
[1] 0.4047619
Which is approximately 0.4. Because each of the 16 sequences is equally likely, the simulated average (with the NaN sequences dropped) converges to this value as the number of simulated coins grows.
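Working that average out by hand, using the counts from the listing above, gives the same number:

# Average over the 14 non-NaN sequences:
# six 0s, four 1/2s, one 2/3, and three 1s
(6 * 0 + 4 * 1/2 + 1 * 2/3 + 3 * 1) / 14  # = 17/42, approximately 0.4048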
Recall that the conditional probability is: \(\large{P(A \mid B) = \frac{P(A \cap B)}{P(B)}}\) (I had to look it up).
In this case, we are trying to calculate the probability of heads (\(A\)) given a heads occurring in at least one of the first three flips (\(B\)), which is what mean(test_prob, na.rm = TRUE) computes.