---
title: "A02 (156 points)"
author: "TYPE YOUR NAME HERE"
date: "TYPE DATE HERE"
format: docx
editor: visual
---

## Acknowledgements

Replace this sentence with either (1) an acknowledgment of any person who gave you assistance and/or any resource that was used, or (2) a statement that you did not use any outside assistance. By submitting this assignment, the author attests to abiding by the *Collaboration and Academic Integrity* policy stated in the course syllabus.

## By Hand Exercises

Exercises 1-7 should be done “by hand.” You may use a calculator or R for simple calculations, but you should indicate your steps in a manner similar to what we did in class. You may either include your solutions in this `qmd` file, or you may write your solutions on paper and then either (a) hand in your papers to David Housman or SC 117, or (b) scan and include your solutions as a `pdf` file in your zipped Moodle submission.

Exercises 1-4 are based on the Introductions Questionnaire completed by students taking Math 323. You should use the subset of the data reproduced below.

| Gender | Religious Affiliation | Excitement Level | Grade Prediction | Height (in) | Weight (lb) |
|------------|------------|------------|------------|------------|------------|
| Male | Mennonite | fearful | A- | 70 | 180 |
| Female | Mennonite | fearful | B | 66 | 134 |
| Male | Mennonite | neutral | A- | 67 | 165 |
| Female | Other Christian | neutral | A- | 65 | 129 |
| Male | None | neutral | A | 73 | 150 |
| Female | Mennonite | excited | A | 65 | 114 |
| Male | Mennonite | neutral | A- | 76 | 190 |
| Male | Mennonite | excited | A- | 78 | 192 |
| Male | Mennonite | fearful | B+ | 72 | 165 |
| Male | Mennonite | neutral | B+ | 73 | 158 |
| Male | None | very excited | A | 6 | 180 |
| Male | Other Christian | neutral | A | 72 | 160 |
| Male | Mennonite | excited | A- | 71 | 180 |
| Male | Mennonite | very excited | A- | 69 | 125 |
| Male | Other Christian | excited | A- | 66 | 150 |
| Female | Mennonite | neutral | B+ | 71 | 150 |

## Exercise 1 (10 points)

a. (1 point) What type of variable is religious affiliation?
b. (4 points) Construct a table of absolute and relative frequencies of religious affiliation.
c. (5 points) Draw a bar chart of religious affiliation with the values ordered so that the heights of the bars are decreasing. Include appropriate labels.

## Exercise 2 (10 points)

a. (1 point) What type of variable is excitement level?
b. (4 points) Construct a table of absolute and relative frequencies of excitement level.
c. (5 points) Draw a bar chart of excitement level. Include appropriate labels.

## Exercise 3 (38 points)

a. (1 point) What type of variable is height in inches?

    In the following parts, exclude the clearly incorrect datum!

b. (12 points) Draw a histogram and box plot. They should be drawn in a vertical column with the same horizontal scales so that they are easy to compare visually. The histogram should have bins 62.5-65.5, 65.5-68.5, …, 77.5-80.5.
c. (16 points) Calculate the mean, median, first quartile, third quartile, mean deviation from the median, variance, and standard deviation. Include appropriate units with your answers.
d. (5 points) Mark on one of the horizontal scales the mean ($\bar{x}$), the mean plus or minus one standard deviation ($\bar{x} \pm s$), the mean plus or minus two standard deviations ($\bar{x} \pm 2s$), and the mean plus or minus three standard deviations ($\bar{x} \pm 3s$).
e. (4 points) What percentage of the data is within one standard deviation of the mean? What percentage of the data is within two standard deviations of the mean? What percentage of the data is within three standard deviations of the mean? Compare with the empirical rule.

## Exercise 4 (6 points)

What type of variable is grade prediction? Why? If grade prediction had been reported as quality points (a number between 0.0 and 4.0), what type of variable would it have been? Why?

## Exercise 5 (20 points)

Explore the effect of an outlier by considering the data set consisting of $n-1$ measurements equal to $-1$ and one measurement equal to $n-1$. For example, if $n=4$, then the data are $-1, -1, -1, 3$.

a. (4 points) Find the mean. Show your work.
b. (2 points) Find the median.
c. (4 points) Find the mean deviation from the median. Show your work.
d. (4 points) Find the standard deviation. Show your work.
e. (6 points) Draw a histogram and mark the mean, median, mean plus one standard deviation, and mean plus two standard deviations. To help draw the histogram, you might set $n=9$ or $n=16$; however, the labels should be in terms of arbitrary $n$.

## Exercise 6 (10 points)

Consider the data set consisting of 1, 2, 6, 8, and 11 miles.

a. Draw a graph of the empirical cumulative distribution function (measurements on the horizontal axis and proportions on the vertical axis).
b. Overlay on the part (a) graph a graph of the quantiles as defined by default in R.
Note that if $q_{0.4}$ is the 0.4^th^ quantile or 40^th^ percentile, then the corresponding point to plot has $q_{0.4}$ on the horizontal axis and $0.4$ on the vertical axis.

## Exercise 7 (6 points)

Suppose that $x_1, x_2, \ldots, x_n$ are $n$ quantitative data. Prove that the median $\tilde{x}$ minimizes the function $f(x) = \displaystyle\sum_{i=1}^{n} |x_i - x|$.

## RStudio Exercises

The remaining exercises are to be completed in RStudio. The code chunk below provides the setup by loading the *tidyverse* package and reading the *03 Introduction.csv* data into `intro` (suppressing messages when run).

```{r}
#| message: false
library(tidyverse)
intro = read_csv("03 Introduction.csv")
```

## Exercise 8 (10 points)

a. (7 points) Obtain a good relative frequency bar chart of `Gender` with the bars sorted so that their heights are descending.
b. (3 points) Interpret the left-most bar.

## Exercise 9 (26 points)

a. (8 points) Obtain a good histogram of `Weight`.
b. (3 points) Interpret the left-most bar.
c. (9 points) From the histogram, estimate with explanation the median, mean, and standard deviation of weight.
d. (3 points) Calculate the median, mean, and standard deviation of weight.
e. (3 points) Compare your estimates with the calculated values.

## Exercise 10 (20 points)

a. (8 points) Obtain a good empirical cumulative distribution function plot of `Grade`.
b. (6 points) From the plot, estimate with explanation the first quartile, median, and third quartile.
c. (3 points) Calculate the first quartile, median, and third quartile grade.
d. (3 points) Compare your estimates with the calculated values.

## Submission

1. Make sure you have included your name as the author and the date completed in the YAML code at the top of this file. Also replace the first sentence in the Acknowledgements section as directed.
2. Render this `qmd` file as an `html`, `docx`, or `pdf` file.
3. Zip together all relevant files: *Assignments.Rproj*, *A02.qmd*, *03 Introduction.csv*, the rendered file, and (optionally) the `pdf` file containing your answers to the "by hand" exercises.
4. Upload the zip file in Moodle.
5. If you have not included your answers to the "by hand" exercises in the zip file, submit your papers to David Housman or SC 117.
6. Points will be taken off if one or more parts of the submission process are not completed.