---
title: "C02 Variable and Data Types"
author: "David Housman"
format: html
editor: visual
---

## Course Overview

A *variable* is a question asked of entities of interest, and *data* are the known answers. More formally, a *variable* is a function from a *population* or *sample* (a set of *observations* or *cases*) to a set of possible *values*, and *data* are the values the function assigns to the cases in a given sample.

## Variable Types

Variables are classified into types based on how numerically meaningful their values are.

*Qualitative* variables are either *nominal* (the values have no numeric meaning) or *ordinal* (the values have only a meaningful order). *Identification* variables are qualitative variables that uniquely identify each case.

*Quantitative* variables have numeric values whose meaning goes beyond order. On an *interval* scale, equal differences correspond to equal changes in intensity. A *ratio* scale also has a meaningful zero, so doubling (and ratios in general) is meaningful. An *absolute* scale, such as a count, has no units.

In R, qualitative variables can be stored as `chr` (character) or `fct` (factor) variables, and quantitative variables as `int` (integer), `dbl` (double), or `time` variables.

## Software

1. R is a free software environment for statistical computing and graphics. You should install it on your computer or use a lab computer.
2. RStudio is an integrated development environment (IDE) for R and Python. You should install it on your computer or use a lab computer.
3. An RStudio project keeps related files together in a single working directory. Double-clicking the *Classes.Rproj* file starts RStudio in the project we will often use during class.
4. A Quarto document is an efficient way to combine text, code, and results in both draft and polished formats. The words you are reading are in the *02 Variable and Data Types.qmd* file.
5. The tidyverse package (which includes several dependent packages) extends the capabilities of R. You should install this package (see the *Packages* tab in the lower right pane of RStudio).
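
The *Packages* tab is one way to install; a minimal console sketch of the same one-time step is below. It uses base R's `install.packages()` and only downloads the tidyverse (from CRAN, so an internet connection is assumed) when `requireNamespace()` reports that it is not already installed. The chunk is marked `#| eval: false` so that rendering the document does not try to install anything.

```{r}
#| eval: false
# Install the tidyverse (and its dependencies) only if it is not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
```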

   This makes the package(s) available to be loaded with `library()`.

## Basic Computation

Create a code chunk by pressing Ctrl-Alt-I (Cmd-Option-I on a Mac), and run the code in the current chunk by pressing Ctrl-Shift-Enter (Cmd-Shift-Enter on a Mac).

1. Compute $3+4\cdot 3^2$.

```{r}
3 + 4*3^2
```

2. Create a vector with four single-digit integers.

```{r}
# Whole numbers typed this way are stored as doubles (dbl);
# append L, as in c(3L, 7L, 2L, 1L), to store them as integers (int)
c(3, 7, 2, 1)
```

3. Create a vector of four names.

```{r}
c("David", "Jeanne", "Kate", "Genevieve")
```

4. Create a vector with the one-digit integers in order.

```{r}
0:9
```

5. Create a vector of the numbers 0.0, 0.5, 1.0, 1.5, ..., 9.5, 10.0.

```{r}
seq(0.0, 10.0, 0.5)
```

```{r}
# The same call with the arguments named explicitly
seq(from = 0.0, to = 10.0, by = 0.5)
```

6. Guess what each of the following computations does, and then check your guess.

```{r}
#| eval: false
x = 1:5
x + 3
```

```{r}
#| eval: false
x + x
```

```{r}
#| eval: false
1:8 * 1:2
```

## Data Frames

1. Load the tidyverse package and the packages it attaches. Observe the message.

```{r}
library(tidyverse)
```

2. View the `mpg` data frame and its documentation. Each row is a case, and each column is a variable.

```{r}
#| eval: false
View(mpg)  # opens the data viewer
?mpg       # opens the documentation in the Help tab
```

3. Obtain the second, ninth, and fourth rows of `mpg` using base R and the tidyverse. Provide an interpretation of the second row. What is the type of each variable according to our classification scheme and according to R?

```{r}
# Base R: row positions go before the comma; leaving the part after it empty keeps all columns
mpg[c(2, 9, 4), ]
```

```{r}
# Tidyverse: slice() keeps rows by position, in the order given
mpg |> slice(c(2, 9, 4))
```

The second row describes an Audi (nominal) a4 (nominal) manufactured in 1999 (interval). It has an engine displacement of 1.8 litres (ratio), 4 cylinders (absolute), a manual transmission (nominal, or binary if the number of speeds is ignored), front-wheel drive (nominal), and premium fuel (nominal), and it is a compact car (nominal, though not quite ordinal). In EPA testing, it obtained 21 miles per gallon in city driving (ratio) and 29 miles per gallon in highway driving (ratio). According to R, `displ` is `dbl`; `year`, `cyl`, `cty`, and `hwy` are `int`; and the remaining variables are `chr`.

4. Read in the *C02Data.csv* data file and save it to `math323`.

```{r}
math323 = read_csv("C02Data.csv")
```

5. Observe the presence of `math323` in the *Environment* tab in the upper right pane of RStudio. Double-click it to view the data frame.

6. Observe the message that states the data type assigned to each variable. Use help (`?read_csv`) to determine better types to assign, and then view the third row.

```{r}
# Compact column specification, one letter per column in order:
# c = character, f = factor, t = time, d = double, i = integer
math323 = read_csv("C02Data.csv", col_types = "cfftdif")
math323 |> slice(3)
```

## Render

1. Render this file as an HTML document. Observe how the YAML header at the top of the file is incorporated.
2. Add an option to the code chunk containing the `View` command so that it is not evaluated.
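
As a recap that connects the classification in *Variable Types* with the `col_types` letters in item 6 of *Data Frames*, here is a minimal sketch. It assumes only the tidyverse; the column names and values are made up for illustration (they are not the class data), and the toy file is written to a temporary location so the chunk runs anywhere. `glimpse()` prints each column's R type with the same abbreviations used above (`<chr>`, `<fct>`, `<time>`, `<dbl>`, `<int>`).

```{r}
library(tidyverse)

# Hypothetical toy data (not the class data), one column per kind of variable:
#   id = identification, section = nominal, arrival = interval,
#   height_cm = ratio, siblings = absolute (a count)
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,section,arrival,height_cm,siblings",
  "A17,01,08:55:00,171.5,2",
  "B02,02,09:03:30,160.0,0",
  "C44,01,08:59:10,182.3,3"
), toy_path)

# Compact specification, one letter per column in order:
# c = character, f = factor, t = time, d = double, i = integer
toy <- read_csv(toy_path, col_types = "cftdi")
glimpse(toy)

# An ordinal variable can be stored as an ordered factor,
# so that comparisons follow the order of the levels
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size < "large"
```

Reading `section` as a factor rather than as a number signals that its values are labels, not quantities, and the ordered factor `size` keeps exactly the order information that makes a variable ordinal.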