Conditional Probability

notes

Author

John Benninghoff

Published

March 26, 2024

Modified

March 26, 2024

An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand.

# no libraries

Background

I recently stumbled across a blog post from 2015, “Hey - guess what? There really is a hot hand!” The article had some R code in it that was intriguing, exploring the following proposition the post quoted from a paper it cited:

Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this conditional relative frequency to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample 1 million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.

What? OK, so let’s follow along with the R code. The first block runs the simulation:

rep <- 1e6
n <- 4
data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))
prob <- rep(NA, rep)
for (i in 1:rep) {
  heads1 <- data[i, 1:(n - 1)] == 1
  heads2 <- data[i, 2:n] == 1
  prob[i] <- sum(heads1 & heads2) / sum(heads1)
}

The second block naively calculates the average:

print(mean(prob))

[1] NaN

This doesn’t work since, as the post points out, “sometimes the first three flips are tails, so the probability is 0/0.” Discarding these gets us the correct average, which is approximately 0.4 and not 0.5 as the quote predicts:

print(mean(prob, na.rm = TRUE))

[1] 0.4046258

Reading this and trying the code myself led me to ask, What the heck is going on here?

Huh?

Let’s follow along with the R code and try to work out why the conditional probability is 0.4.

Simulation

Looking at the first part of the code:

rep <- 1e6
n <- 4
data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))

This code simulates flipping the coin 4 times in a row 1 million times and stores the results in a matrix:

head(data)

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    0
[2,]    1    0    1    0
[3,]    1    0    0    1
[4,]    1    1    1    0
[5,]    1    0    0    1
[6,]    1    0    1    0

By convention, 1 is heads and 0 is tails. If the coin is fair, we should expect the proportion of heads to be about 0.5.

mean(data)

[1] 0.4997455

round(mean(data), 2)

[1] 0.5

While there is some expected variance, the proportion is approximately 0.5.

Calculation

Looking at the second part of the code:

prob <- rep(NA, rep)
for (i in 1:rep) {
  heads1 <- data[i, 1:(n - 1)] == 1
  heads2 <- data[i, 2:n] == 1
  prob[i] <- sum(heads1 & heads2) / sum(heads1)
}

This counts the relative frequency of heads immediately after heads, by finding heads in positions 1-3 (heads1), comparing to heads in positions 2-4 (heads2), and calculating the proportion of heads after heads (prob[i]). To see how this works in practice, we can test all possible combinations of heads and tails:

calc_prob <- function(flips) {
  heads1 <- flips[1:(n - 1)] == 1
  heads2 <- flips[2:n] == 1
  sum(heads1 & heads2) / sum(heads1)
}

test_data <- expand.grid(0:1, 0:1, 0:1, 0:1)
test_prob <- rep(NA, nrow(test_data))

for (i in seq_len(nrow(test_data))) {
  f <- test_data[i, ]
  input <- paste0("c(", toString(f), ")")
  test_prob[i] <- calc_prob(f)
  print(paste0(i, ": ", input, " = ", test_prob[i]))
}

[1] "1: c(0, 0, 0, 0) = NaN"
[1] "2: c(1, 0, 0, 0) = 0"
[1] "3: c(0, 1, 0, 0) = 0"
[1] "4: c(1, 1, 0, 0) = 0.5"
[1] "5: c(0, 0, 1, 0) = 0"
[1] "6: c(1, 0, 1, 0) = 0"
[1] "7: c(0, 1, 1, 0) = 0.5"
[1] "8: c(1, 1, 1, 0) = 0.666666666666667"
[1] "9: c(0, 0, 0, 1) = NaN"
[1] "10: c(1, 0, 0, 1) = 0"
[1] "11: c(0, 1, 0, 1) = 0"
[1] "12: c(1, 1, 0, 1) = 0.5"
[1] "13: c(0, 0, 1, 1) = 1"
[1] "14: c(1, 0, 1, 1) = 0.5"
[1] "15: c(0, 1, 1, 1) = 1"
[1] "16: c(1, 1, 1, 1) = 1"

There are 16 possible combinations of 4 coin flips, with 5 possible outcomes: 0, 1/2, 2/3, 1, and 0/0 (NaN).

How it works

Looking at the permutations, the conditional probability starts to make sense. Calculating the conditional relative frequency for all permutations gives us:

mean(test_prob, na.rm = TRUE)

[1] 0.4047619

Which is approximately 0.4. As we repeat coin flips, the frequency approaches this value.

Recall that the conditional probability is: \(\large{P(A \mid B) = \frac{P(A \cap B)}{P(B)}}\) (I had to look it up).

In this case, we are trying to calculate the probability of heads (\(A\)) given heads occurring at least once (\(B\)), which is equivalent to mean(test_prob, na.rm = TRUE).