library(tidyverse)
STAT 118: Notes F
Making plots with ggplot2
: line graphs, histograms & boxplots
Recall: The mtcars
dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). It’s available inside the ggplot
package which is already installed.
#load the data
data(mtcars)
#code to update `mtcars` dataset so that `am` is treated as a factor rather than a continuous numeric variable
<- mtcars %>%
mtcars mutate(am = as.factor(am))
Histograms
Histograms are great for looking at the distributions of numeric variables
%>%
mtcars ggplot(aes(x=mpg)) +
geom_histogram()
You can control the bin size with binwidth
%>%
mtcars ggplot(aes(x=mpg)) +
geom_histogram(binwidth=3)
Boxplots
Boxplots are good for displaying the spread, central tendency, and distribution of one numeric variable.
%>%
mtcars ggplot(aes(y=mpg)) +
geom_boxplot()
Side-by-side boxplots are good for displaying one categorical variable and one numeric variable. One advantage of boxplots over barplots is that they are able to show a bit about the spread and distribution of the numeric variable!
%>%
mtcars ggplot(aes(x=am, y=mpg)) +
geom_boxplot()
Line Graphs
Line graphs are great for showing trends with respect to ordinal (ordered variables). Time is used quite commonly.
We consider data on river flow rates collected by volunteers of the Pierce Conservation District in WA.
<- read.csv("https://www.openintro.org/data/csv/flow_rates.csv") flow_rates
%>%
flow_rates ggplot(aes(x=date, y=flow, group=site, color=site)) +
geom_line() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=7)) +
labs(x="Date", y="Flow Rate")
What happens if we didn’t have the group=
command?
%>%
flow_rates ggplot(aes(x=date, y=flow)) +
geom_line() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=7))
Yikes!
If there is grouping in the data, you need to either: - seperate by group by using the group=
command - Merge together (add? average?) the groups using summarize()
first before trying to graph
Facet Wrap
facet_wrap()
is a function in the ggplot2
package that allows you to create a multi-panel plot showing a similar plot over different subsets of the data, usually different values of a categorical variable.
# Example of Facet Wrap with `flow_rates` dataset
%>%
flow_rates ggplot(aes(x=date, y=flow, group=site, color=site)) +
geom_line() +
facet_wrap(~site) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=4))
You can create facets over any categorical variable in the dataset!
#Example of Facet Wrap with `mtcars` dataset
%>%
mtcars mutate(vs=as.factor(vs)) %>% #engine shape 0=v-shaped, 1=manual
ggplot(aes(x=vs, y=mpg, fill=am)) +
geom_boxplot() +
facet_wrap(~am) +
theme(legend.position="none")