---
title: "C12 Probability"
author: "David Housman"
format: docx
editor: visual
fig-width: 6
fig-height: 4
---

```{r}
#| message: false
#| warning: false
#| echo: false
library(tidyverse)
```

## Introduction

A **random experiment** is a repeatable activity with a well-defined set of possible **outcomes**, although the outcome of any particular **run** of the experiment cannot be predicted with certainty. The **sample space** is the set of all possible outcomes. An **event** is a set of outcomes, often described verbally.

A simple example of a random experiment is *roll a six-sided die*. The instructor will demonstrate a run of the experiment. The sample space is $S = \{1,2,3,4,5,6\}$. One event is *a composite number is rolled*, or equivalently, $C = \{4,6\}$.

Suppose $S$ is a sample space in which all outcomes are assumed to be equally likely, and $E$ is an event. The **theoretician's probability** of $E$ is

$$P(E) = \dfrac{\text{the number of outcomes in } E}{\text{the number of outcomes in } S}.$$

For our example, $P(C) = \dfrac{2}{6} = \dfrac{1}{3} \approx 0.333 = 33.3\%$.

The **frequentist's probability** of $E$ is the long-run proportion of runs in which the event occurs when the experiment is repeated a large number of times. For our example, if a six-sided die is rolled a large number of times, we expect a composite number to appear about 1/3 of the time. Is this really true? Illustrate by hand and then in R with a table (tibble) having columns roll, side, composite, and cumulative fraction composite. Then create a time series graph of the cumulative fraction composite.
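The counting definition of the theoretician's probability can be carried out in R by listing the sample space as a vector, keeping the outcomes that belong to the event, and dividing the two counts. The sketch below is one way to do this for the die example; the helper `is_composite()` and the names `S`, `C`, and `p_C` are introduced here for illustration only. The resulting value is the reference level that a simulated cumulative fraction composite should settle toward.

```{r}
# Sample space for one roll of a six-sided die
S = 1:6

# Hypothetical helper: n is composite if it has a divisor strictly between
# 1 and n; the smallest composite is 4, and n >= 4 keeps 2:(n - 1) valid
is_composite = function(n) n >= 4 && any(n %% 2:(n - 1) == 0)

# Event C: the outcomes in S that are composite (here 4 and 6)
C = S[sapply(S, is_composite)]

# Theoretician's probability: (outcomes in C) / (outcomes in S)
p_C = length(C) / length(S)
p_C
```

Deriving $C$ from a test rather than typing `c(4, 6)` keeps the code tied to the verbal description of the event, so the same pattern works for other events on the same sample space.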
```{r}
# One row per roll of a six-sided die
n.rolls = 3000
rolls = tibble(
  roll = 1:n.rolls,                                   # roll number
  side = sample(1:6, size = n.rolls, replace = TRUE), # side showing
  composite = side %in% c(4, 6),                      # composite number rolled?
  cum.frac.composite = cumsum(composite) / roll       # cumulative fraction composite
)
```

```{r}
# Cumulative fraction composite, with the theoretical value 1/3 in red
ggplot(rolls, aes(x = roll, y = cum.frac.composite)) +
  geom_line() +
  geom_hline(yintercept = 1/3, color = "red")
```

If we were to actually roll a particular six-sided die 300 times and obtain 110 composite numbers, then we might say that the **empirical probability** of obtaining a composite number when rolling this die is $\dfrac{110}{300} = \dfrac{11}{30}$.

The **subjectivist's probability** of $E$ is the fair payment for a lottery in which one unit is won should event $E$ occur. For our example, it may seem fair to pay $1/3$ of a dollar to win one dollar if a composite number is the outcome of rolling a six-sided die.

The **mathematician's probability** of $E$ is $P(E)$, where $P$ is a *probability measure*: a real-valued function defined on the events of a sample space $S$ that satisfies

- Axiom 1. For any event $E$, $P(E) \geq 0$.
- Axiom 2. $P(S) = 1$.
- Axiom 3. If $E_1, E_2, \ldots$ are mutually exclusive events ($E_i \cap E_j = \emptyset$ for all $i \ne j$), then $P(E_1 \cup E_2 \cup \cdots) = P(E_1) + P(E_2) + \cdots$.

For our example, we define $P(E) = \frac16 (\text{the number of elements in }E)$. It is straightforward to verify that this defines a probability measure, and so $P(C) = 2/6 = 1/3$.

## Exercises

1. What is the probability that the millionth decimal digit of $\pi$ is 7?
2. Roll a 4-sided die and a 6-sided die simultaneously.
   a. List the equally likely outcomes.
   b. What is the probability that the maximum number rolled is 4?
   c. What is the probability that the sum is 4?
   d. What is the probability that an odd number has been rolled?
3. Draw two balls without replacement from an urn containing one red and two blue balls.
   a. List the equally likely outcomes.
   b. What is the probability of obtaining a red ball?
   c. What is the probability of obtaining a red ball on the first draw?
   d. What is the probability of obtaining a red ball on the second draw?
4. Draw two balls without replacement from an urn containing one red, two blue, and four green balls.
   a. How many equally likely outcomes are there?
   b. Use a tree diagram to cut down on counting the outcomes.
   c. What is the probability that the colors match?
   d. What is the probability of obtaining at least one blue ball?
   e. Approximate the probability of obtaining at least one blue ball with a simulation.

```{r}
# Urn with one red, two blue, and four green balls
urn = c("r", "b", "b", rep("g", 4))
# One run: draw two balls without replacement; TRUE if at least one is blue
rblue = function() {
  any(sample(urn, size = 2) == "b")
}
n = 1000000
# Fraction of runs with at least one blue ball
sum(replicate(n, rblue())) / n
```

5. Draw five cards from a standard deck of cards.
   a. How many equally likely outcomes are there?
   b. What is the probability of a spade flush?
   c. What is the probability of a flush?
   d. What is the probability of three kings and two queens?
   e. What is the probability of a full house?
6. What is the probability that David shares his birthday with at least one other person in a room of 30 people?

```{r}
# Each of the other 29 people misses David's birthday with probability 365/366
1 - (365/366)^29
```

7. What is the probability that at least two people share a birthday in a room of 30 people?

```{r}
# Complement: all 30 birthdays are different
1 - prod(366:337/366)
```

8. Approximate the probability calculated in the previous question using a simulation.

```{r}
people = 30
days = 1:366
# One run: TRUE if some birthday appears more than once among the people
rbdmatch = function() {
  length(unique(sample(days, people, replace = TRUE))) != people
}
sum(replicate(10000, rbdmatch())) / 10000
```
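The complement calculation `1 - prod(366:337/366)` for a room of 30 people works for any group size. A minimal sketch, assuming (as the exercises do) 366 equally likely birthdays, wraps it in a hypothetical helper `p_shared()` and evaluates it for rooms of 1 to 60 people; the names `p_shared`, `sizes`, and `probs` are introduced here for illustration.

```{r}
# Hypothetical helper: probability that at least two of n people share a
# birthday, assuming `days` equally likely birthdays (366 as in the exercises)
p_shared = function(n, days = 366) {
  if (n < 2) return(0)     # fewer than two people: no match is possible
  if (n > days) return(1)  # pigeonhole principle: a match is certain
  # Complement: the n birthdays are all different
  1 - prod((days - n + 1):days / days)
}

# Should agree with the direct computation for a room of 30 people
p_shared(30)

# Probabilities for rooms of 1 to 60 people
sizes = 1:60
probs = sapply(sizes, p_shared)

# Smallest room in which a shared birthday is more likely than not
min(sizes[probs > 0.5])
```

Under the 366-day assumption the last line should return 23, the same threshold as in the familiar 365-day version, and `p_shared(30)` should match `1 - prod(366:337/366)` up to floating-point rounding.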