---
title: "C11 Simulation"
author: "David Housman"
format: docx
editor: visual
fig-width: 6
fig-height: 4
---

```{r}
#| echo: false
library(tidyverse)
```

## Introduction

A **random** or **probabilistic** **experiment** is a repeatable activity with a well-defined set of possible outcomes, although the outcome of any particular **run** of the experiment cannot be predicted with certainty. When something is used as a surrogate for a physical experiment, the word **simulate** is often substituted for run. The set of all possible **outcomes** is called the **sample space** of the experiment. When results are recorded in a table, each row typically corresponds to one run of the experiment. One good way to think about a random experiment is pulling an outcome randomly from a hat containing all possible (weighted) outcomes.

A **random variable** is a question that can be asked about the outcome of an experiment. In more formal language, a random variable is a function defined on the sample space of an experiment. The question usually has numeric answers and is often stated in a declarative, rather than interrogative, form. The answers are called **values** of the random variable.

A simple example of a random experiment is to roll a fair four-sided die. Its possible outcomes are 1, 2, 3, or 4, so the sample space is $\{ 1, 2, 3, 4 \}$. One random variable for this experiment is the remainder when the outcome is divided by two. Its values are 0 (for an outcome of 2 or 4) and 1 (for an outcome of 1 or 3).

The instructor will carry out one run of this random experiment.

The next three code chunks simulate this random experiment with base R (returning a vector) and the `dplyr` package from the `tidyverse` (returning a tibble). The first code chunk shows the way we will be thinking about pseudo-random number generation: a uniformly random number is chosen from the real number interval $(0, 4)$ and then rounded up to the nearest integer to obtain one of the four possible outcomes.
The other two code chunks are more readable but hide the underlying mechanism.

```{r}
# Draw u uniformly from (0, 4) and round up to get 1, 2, 3, or 4.
ceiling(runif(n = 1, min = 0, max = 4))
```

```{r}
sample(1:4, size = 1)
```

```{r}
tibble(side = 1:4) |>
  slice_sample(n = 1)
```

If the die were not fair but instead had sides weighted by their number (1 has weight 0.1, 2 has weight 0.2, 3 has weight 0.3, and 4 has weight 0.4), then the following code chunks show the modification. The first code chunk generates a random number $u$ from the real interval $(0, 1)$ and then returns 1 if $u < 0.1$, 2 if $0.1 \leq u < 0.3$, 3 if $0.3 \leq u < 0.6$, and 4 if $0.6 \leq u < 1$.

```{r}
# Step function with jumps at the cumulative weights, evaluated at a uniform draw.
stepfun(c(0.1, 0.3, 0.6), 1:4)(runif(n = 1))
```

```{r}
sample(1:4, prob = c(0.1, 0.2, 0.3, 0.4), size = 1)
```

```{r}
tibble(side = 1:4, weight = c(0.1, 0.2, 0.3, 0.4)) |>
  slice_sample(n = 1, weight_by = weight) |>
  select(side)
```

## Ten Coin Flips Wrangling

Consider the random experiment of flipping a fair coin ten times. Students in various Math 323 and 233 classes have been asked to simulate this experiment four times and then run this experiment four times.

1.  Read the results from *C11 Coin Flips Human Simulation.csv* and *C11 Coin Flips Actual.csv*.

```{r}
# Read every column as an integer.
human = read_csv("C11 Coin Flips Human Simulation.csv",
                 col_types = cols(.default = col_integer()))
actual = read_csv("C11 Coin Flips Actual.csv",
                  col_types = cols(.default = col_integer()))
```

2.  Define a function that simulates the experiment and obtain one simulation.

```{r}
# One run: ten independent fair flips coded 0 or 1 (1 is counted as a head later).
rtenflips = function() {
  sample(0:1, size = 10, replace = TRUE)
}
rtenflips()
```

3.  Run the computer simulation 343 times and store the results in a tibble.

```{r}
# replicate() returns a 10 x 343 matrix; transpose so each row is one simulation.
computer = t(replicate(343, rtenflips()))
colnames(computer) = paste0("F", 1:10)
computer = as_tibble(computer)
```

4.  Place the three separate tibbles into a single tibble with an identifying categorical variable.

```{r}
tenflips = rbind(
  human |> mutate(type = "human"),
  actual |> mutate(type = "actual"),
  computer |> mutate(type = "computer"))
```

```{r}
tenflips |>
  slice_sample(n = 8)
```

5.  Add the variable *number of heads* to the `tenflips` tibble.

```{r}
# Sum the ten flip columns within each row.
tenflips = tenflips |>
  rowwise() |>
  mutate(heads = sum(c_across(starts_with("F")))) |>
  ungroup()
```

6.  Create a function that returns the length of the longest run of consecutive identical values in a vector.

```{r}
# rle() gives the lengths of consecutive runs of equal values; keep the longest.
max.run.length = function(x) {
  max(rle(x)$lengths)
}
```

7.  Add the variable *maximum run length* to the `tenflips` tibble.

```{r}
tenflips = tenflips |>
  rowwise() |>
  mutate(mrl = max.run.length(c_across(starts_with("F")))) |>
  ungroup()
```

## Ten Coin Flips Description

1.  Obtain a visualization that compares the number of heads for the three types of ten coin flips. What is your conclusion?

```{r}
ggplot(tenflips, aes(x = type, y = heads)) +
  geom_boxplot()
```

```{r}
ggplot(tenflips, aes(x = heads, fill = type, y = after_stat(density))) +
  geom_histogram(binwidth = 1, color = "white") +
  scale_x_continuous(breaks = 0:10) +
  facet_wrap(~ type, ncol = 1)
```

2.  Obtain a visualization that compares the maximum run length for the three types of ten coin flips. What is your conclusion?

```{r}
ggplot(tenflips, aes(x = type, y = mrl)) +
  geom_boxplot()
```

```{r}
ggplot(tenflips, aes(x = mrl, fill = type, y = after_stat(density))) +
  geom_histogram(binwidth = 1, color = "white") +
  scale_x_continuous(breaks = 0:10) +
  facet_wrap(~ type, ncol = 1)
```
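
As a numeric reference point for the two comparisons above, the following is a minimal sketch that estimates the relative frequencies of the number of heads and of the maximum run length from a larger computer simulation. It repeats the definitions of `rtenflips()` and `max.run.length()` from the wrangling section so that it runs on its own and uses base R only. The seed `323`, the count `n_sims = 10000`, and the names `sims`, `heads_freq`, and `mrl_freq` are assumptions made for illustration, not part of the original exercise.

```{r}
# Definitions repeated from the wrangling section so this chunk runs on its own.
rtenflips = function() {
  sample(0:1, size = 10, replace = TRUE)
}
max.run.length = function(x) {
  max(rle(x)$lengths)
}

set.seed(323)    # arbitrary seed (an assumption) for reproducibility
n_sims = 10000   # arbitrary number of simulations (an assumption)

# Each column of sims is one simulated run of ten flips.
sims = replicate(n_sims, rtenflips())
heads = colSums(sims)
mrl = apply(sims, 2, max.run.length)

# Relative frequency of each possible value across the simulations.
heads_freq = prop.table(table(factor(heads, levels = 0:10)))
mrl_freq = prop.table(table(factor(mrl, levels = 1:10)))

round(heads_freq, 3)
round(mrl_freq, 3)
```

These relative frequencies can be read alongside the `computer` panels of the histograms; large differences in the `human` or `actual` panels would be worth noting in your conclusion.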