---
title: "C05 Qualitative Explanatory Variable"
author: "David Housman"
format: html
editor: visual
fig-width: 6
fig-asp: 0.618
out-width: "70%"
fig-align: center
---

## Introduction

Previously, we considered techniques to describe a single variable. We now examine techniques to describe the relationship between two variables. We usually think of one variable as the *dependent* or *response* variable and the other variable as the *independent*, *predictor*, or *explanatory* variable. Again, the description techniques used depend on the types of the variables.

Load the required packages. The `janitor` package provides a convenient, tidyverse-compatible way to create tables.

```{r}
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
```

Import the illustrative data set, and treat `Excitement` as an ordinal factor variable.

```{r}
##| eval: false
# Order the levels from 5 down to 1 so that plots and tables list them that way.
intro = read_csv("03 Introduction.csv") |>
  mutate(Excitement = factor(Excitement, levels = 5:1))
```

## Quantitative Response and Qualitative Explanatory Variables

The approach is to apply the visualizations and statistics used for a quantitative variable separately for each value of the qualitative variable. Consider `Weight` as a function of `Gender`.

1.  Exclude the non-binary students (coded `N`).

```{r}
# Show the rows that will be dropped.
intro |> 
  filter(Gender == "N")
```

```{r}
# Keep everyone else.
intro2 = intro |> 
  filter(Gender != "N")
```

2.  Obtain box plots. Overlay with blue jitter plots.

```{r}
ggplot(intro2, aes(y = Weight, x = Gender)) +
  geom_boxplot() +
  # Jitter horizontally only, so the plotted weights stay exact.
  geom_jitter(height = 0, width = 0.1, color = "blue")
```

3.  Estimate, then compute, the standard five-number summaries.

```{r}
intro2 |> 
  group_by(Gender) |> 
  summarise(
    min = min(Weight),
    Q1 = quantile(Weight, 0.25),
    med = median(Weight),
    Q3 = quantile(Weight, 0.75),
    max = max(Weight)
  )
```

```{r}
# Summary of every column, not split by Gender.
summary(intro2)
```

4.  Obtain overlaid density plots. Unlike the other graphs, the response variable is typically represented on the horizontal axis.

```{r}
ggplot(intro2, aes(x = Weight, color = Gender, fill = Gender)) +
  geom_density(alpha = 0.5) +
  xlab("Weight (lb)")
```

5.  Estimate, then compute, the counts, means, and standard deviations.

```{r}
intro2 |> 
  group_by(Gender) |> 
  summarise(
    n = n(),
    mean = mean(Weight),
    sd = sd(Weight)
  )
```

6.  Obtain vertically aligned histograms.

```{r}
ggplot(intro2, aes(x = Weight)) +
  geom_histogram(bins = 8, fill = "gray", color = "black") +
  # One panel per Gender, stacked so the Weight axes line up.
  facet_wrap(~Gender, nrow = 2) +
  xlab("Weight (lb)")
```

7.  Obtain a plot of the empirical cumulative distribution functions, grouped by `Gender`.

```{r}
ggplot(intro2, aes(x = Weight, color = Gender)) +
  stat_ecdf(geom = "step")
```

## Qualitative Response and Explanatory Variables

The approach is to apply the visualizations and statistics used for a qualitative variable separately for each value of the explanatory qualitative variable. Consider `Excitement` as a function of `Gender`.

1.  Obtain a stacked frequency bar chart.

```{r}
ggplot(intro, aes(fill = Excitement, x = Gender)) +
  geom_bar()
```

2.  Obtain a dodged frequency bar chart.

```{r}
ggplot(intro, aes(fill = Excitement, x = Gender)) +
  geom_bar(position = "dodge2")
```

3.  Obtain a stacked bar chart of relative frequencies within each group.

```{r}
ggplot(intro, aes(fill = Excitement, x = Gender)) +
  geom_bar(position = "fill")
```

4.  Obtain a table of frequencies.

```{r}
intro |> 
  tabyl(Excitement, Gender)
```

5.  Obtain a table of percentages by group.

```{r}
intro |> 
  tabyl(Excitement, Gender) |> 
  adorn_percentages("col") |> 
  adorn_pct_formatting(digits = 2)
```
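
The percentage table above divides each count by its `Gender` column total, which is the same normalization that `position = "fill"` applies in the relative frequency bar chart. The following is a minimal sketch of that calculation, assuming a small made-up data frame `toy` in place of `intro` (the values are hypothetical and not taken from the course data). It cross-checks the `janitor` result against base R's `prop.table()`.

```{r}
library(janitor)

# Hypothetical toy data standing in for `intro`; the values are invented.
toy <- data.frame(
  Gender = c("F", "F", "F", "F", "M", "M", "M", "M"),
  Excitement = factor(c(5, 4, 5, 3, 3, 4, 4, 5), levels = 5:1)
)

# Column proportions with janitor: each count is divided by its Gender total.
col_pct <- toy |>
  tabyl(Excitement, Gender) |>
  adorn_percentages("col")
col_pct

# The same proportions with base R; margin = 2 normalizes within columns.
base_pct <- prop.table(table(toy$Excitement, toy$Gender), margin = 2)
base_pct

# Each Gender column sums to 1 because percentages are taken within groups.
colSums(base_pct)
```

Passing `"row"` instead of `"col"` to `adorn_percentages()` would instead describe `Gender` within each `Excitement` level, which reverses the roles of the response and explanatory variables.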