---
title: "A07 Discrete Random Variables (245 points)"
author: "TYPE YOUR NAME HERE"
date: "TYPE DATE HERE"
format: docx
editor: visual
---

```{r}
#| message: false
#| warning: false
#| echo: false
library(tidyverse)
library(latex2exp)
```

## Instructions

Complete each exercise either in this qmd file or on paper. Include your name as the author and the date completed in the YAML code at the top of this file, and replace the first sentence in the Acknowledgements section as directed. Any parts completed on paper should be either (a) handed to David Housman, (b) placed in SC 117, or (c) scanned into a pdf file. This qmd file should be rendered to an html, docx, or pdf file. Zip together all relevant files: Rproj, qmd, rendered file, any data or image files, and (optionally) the pdf file containing your answers completed on paper. Upload the zip file in Moodle. Points will be taken off if these instructions are not followed.

## Acknowledgements

Replace this sentence with either (1) an acknowledgment of any person who gave you assistance and/or any resource that was used, or (2) a statement that you did not use any outside assistance. By submitting this assignment, the author attests to abiding by the *Collaboration and Academic Integrity* policy stated in the course syllabus.

## Exercise 1 (40 points)

Randomly draw five balls in turn and without replacement from an urn with 4 red and 9 blue balls. Let $Y$ be the draw number of the first blue ball. For example, one outcome could be (red, red, blue, blue, red), for which the value of $Y$ is 3.

a. (10 points) Obtain and display a tibble with the probability distribution function (pdf) and cumulative distribution function (cdf) for $Y$.

b. (10 points) Obtain and display a tibble with the mean, standard deviation, and probabilities of being within one, two, and three standard deviations of the mean. Compare with the empirical rule.

c. (10 points) Obtain a tibble that simulates $Y$ 10000 times, where each simulation is based on simulating the experiment and calculating $Y$. Display the simulated distribution of $Y$ with a histogram overlaid with a spike graph of the theoretical distribution of $Y$.

d. (10 points) Obtain a tibble that simulates $Y$ 10000 times, where each simulation is based on simulating a number uniformly from the interval $(0, 1)$ and making use of the cdf obtained in part (a). Display the simulated distribution of $Y$ with a histogram overlaid with a spike graph of the theoretical distribution of $Y$.

## Exercise 2 (25 points)

Calculate the following probabilities.

a. Ledolter exercise 2.4-2. The university football team has 11 games on its schedule. Assume that the probability of winning each game is 0.40 and that there are no ties. Assuming independence, calculate the probability that this year's team will have a winning season (i.e., that the team will win at least six games).

b. A student who has not studied decides to randomly answer each of the ten multiple choice questions (each having one correct answer among the four provided answers) on the quiz. Determine the probability that the first correct answer is in response to question #3.

c. A student who has not studied decides to randomly answer each of the ten multiple choice questions (each having one correct answer among the four provided answers) on the quiz. Determine the probability that exactly four questions on the quiz are answered correctly.

d. A student who has not studied decides to randomly answer each of the ten multiple choice questions (each having one correct answer among the four provided answers) on the quiz. Determine the probability that at least six questions on the quiz are answered correctly.

e. Ledolter exercise 2.5-2. On average, 2.5 telephone calls per minute are received at a corporation's switchboard. Find the probability that at any given minute there will be more than two calls.
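
As a hint for Exercise 2, the chunk below is a minimal sketch of how base R's built-in distribution functions can produce probabilities of these kinds. The parameter values and the variable names are illustrative assumptions, not the values from the exercises. Note that `dgeom()` counts the number of failures before the first success, so a first success on trial $k$ corresponds to `dgeom(k - 1, prob)`.

```{r}
# Illustrative parameter values only; these are not the values from the exercises.

# Binomial(n = 5, p = 0.5): exactly 2 successes, and at least 3 successes.
p_exactly_2 = dbinom(2, size = 5, prob = 0.5)
p_at_least_3 = pbinom(2, size = 5, prob = 0.5, lower.tail = FALSE)

# Geometric(p = 0.5): first success on trial 2.
# dgeom() counts failures before the first success, so trial k corresponds to k - 1.
p_first_on_trial_2 = dgeom(2 - 1, prob = 0.5)

# Poisson(lambda = 1): more than 1 event, i.e. the upper tail beyond 1.
p_more_than_1 = ppois(1, lambda = 1, lower.tail = FALSE)

c(p_exactly_2, p_at_least_3, p_first_on_trial_2, p_more_than_1)
```

Using `lower.tail = FALSE` gives $P(X > x)$ directly, which avoids writing `1 - pbinom(...)` or `1 - ppois(...)` by hand.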
## Exercise 3 (15 points)

Derive the variance for a geometric random variable with parameter $p$.

## Exercise 4 (15 points)

Derive the variance for a Poisson random variable with parameter $\lambda$.

## Exercise 5 (28 points)

For each of the following verbal descriptions of a graph, state which of the random variables Uniform($n$), Binomial($n,p$), Geometric($p$), and/or Poisson($\lambda$) have pdf graphs that correspond to the description. For some descriptions, it is important to specify the possible parameter values.

a. Flat.

b. Unimodal.

c. Symmetric.

d. Skewed right.

e. Skewed left.

f. Decreasing and concave up.

g. Increasing and concave up.

## Exercise 6 (20 points)

Consider flipping a fair coin 99 times, and let $X$ be the number of heads.

a. (4 points) Find the mean and standard deviation of $X$.

b. (6 points) Find the probabilities that $X$ is within one, two, and three standard deviations of the mean.

c. (6 points) Obtain a spike graph of the pdf of $X$ with domain within four standard deviations of the mean.

d. (4 points) Compare your results with the empirical rule.

## Exercise 7 (15 points)

Write R code based only on `runif` that will generate $n$ numbers from the $\text{Geometric}(p)$ random variable and compare a histogram of the generated random numbers with a spike graph of the theoretical probabilities. Choose $n \geq 1000$ and an interesting value for the parameter $p$ to test your code.

## Exercise 8 (15 points)

Ledolter exercise 2.5-6. Bortkiewicz collected data on the number of horsemen, $Y$, that were killed by kicks from horses in each of 10 Prussian cavalry regiments. Data for 20 years (thus, 200 observations in total) are given in the tibble.

```{r}
horsemen = tibble(
  killed = 0:5,
  regiments = c(109, 65, 22, 3, 1, 0)
)
```

Assume that the data come from a Poisson random variable $Y$. Note that this makes sense because fatalities due to kicks from horses are very rare events. Calculate an estimate of the parameter $\lambda$. Using this estimate, calculate the probabilities and expected frequencies of $Y = 0, 1, 2, 3, 4, \text{ and } \geq 5$, and compare them with the observed frequencies. Comment on the fit.

## Exercise 9 (24 points)

Let $X$ and $Y$ have the joint probability distribution function given in the table.

| f(x,y) | x=1  | x=2  | x=3  |
|:------:|:----:|:----:|:----:|
|  y=1   | 0.05 | 0.20 | 0.15 |
|  y=2   | 0.10 | 0.10 | 0.10 |
|  y=3   | 0.15 | 0.10 | 0.05 |

a. (4 points) Find the marginal probability distribution functions.

b. (8 points) Calculate the means, variances, covariance, and correlation coefficient.

c. (4 points) Determine, with explanation, whether $X$ and $Y$ are independent.

d. (8 points) Find the conditional distributions and their means and variances.

## Exercise 10 (24 points)

For the random variables defined in Exercise 9, find the mean and variance of the following random variables: $V = 2X + 3Y$, $W = 2X - 3Y$, and $Z = XY$.

## Exercise 11 (24 points)

For each of the following joint densities for random variables $X$ and $Y$, compute the correlation coefficient and state, with explanation, whether $X$ and $Y$ are independent.

| $f(x,y)$ | $x=1$ | $x=2$ |     | $g(x,y)$ | $x=1$ | $x=2$ |     | $h(x,y)$ | $x=1$ | $x=2$ | $x=3$ |
|:--------:|:-----:|:-----:|:---:|:--------:|:-----:|:-----:|:---:|:--------:|:-----:|:-----:|:-----:|
|  $y=1$   |  0.0  |  0.6  |     |  $y=1$   |  0.2  |  0.2  |     |  $y=1$   |  0.1  |  0.3  |  0.1  |
|  $y=2$   |  0.4  |  0.0  |     |  $y=2$   |  0.3  |  0.3  |     |  $y=2$   |  0.2  |  0.1  |  0.2  |
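
As a hint for Exercises 9 through 11, the chunk below is a minimal sketch of one way to get marginals, moments, covariance, correlation, and an independence check from a joint table stored as a matrix. The joint table, the random variable names $U$ and $V$, and all object names are hypothetical and are not taken from the exercises. Rows are assumed to index the values of $V$ and columns the values of $U$, matching the layout of the tables above.

```{r}
# Hypothetical joint pmf (not from the exercises):
# rows are v = 1, 2 and columns are u = 1, 2.
u_vals = 1:2
v_vals = 1:2
joint = matrix(c(0.1, 0.2,
                 0.3, 0.4), nrow = 2, byrow = TRUE)

# Marginal pmfs: sum over the other variable.
f_u = colSums(joint)
f_v = rowSums(joint)

# Means and variances from the marginals.
mean_u = sum(u_vals * f_u)
mean_v = sum(v_vals * f_v)
var_u = sum(u_vals^2 * f_u) - mean_u^2
var_v = sum(v_vals^2 * f_v) - mean_v^2

# E[UV]: outer(v_vals, u_vals) holds v * u in row v, column u.
mean_uv = sum(outer(v_vals, u_vals) * joint)
cov_uv = mean_uv - mean_u * mean_v
cor_uv = cov_uv / sqrt(var_u * var_v)

# U and V are independent exactly when every joint probability
# equals the product of the corresponding marginal probabilities.
independent = isTRUE(all.equal(joint, outer(f_v, f_u), check.attributes = FALSE))

c(mean_u = mean_u, mean_v = mean_v, cov_uv = cov_uv, cor_uv = cor_uv)
independent
```

A zero correlation alone does not establish independence, which is why the sketch compares the whole joint table with the product of the marginals.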