Maps with maps and sf

Before we get started, some context:

  • R is fantastic for spatial analysis (not covered in this class… look for classes related to spatial statistics)
  • R is great for interactive data visualization (via leaflet or shiny… more on this on Thursday)
  • R is okay at spatial data visualization (creating maps).
    • There are many different packages in R for creating maps. I've found that different packages perform best for different maps. We will talk about a few different ones today.
    • If you have a highly map-centric project, there is nothing wrong with working in ArcGIS or QGIS if you find the mapping tools in R insufficient. Newer packages (like sf, which supersedes the older sp, rgdal and rgeos) provide much of the functionality of GIS software! Exciting! (They are not very beginner friendly and require familiarity with GIS concepts.)

Using the sf package

Vector data for maps are typically encoded using the “simple features” standard produced by the Open Geospatial Consortium. The sf package developed by Edzer Pebesma provides an excellent toolset for working with such data, and the geom_sf() and coord_sf() functions in ggplot2 are designed to work together with the sf package.

#LOAD PACKAGES
library(tidyverse)

#install.packages("sf") - note some students are getting a pop-up when they install the sf package for the first time. Select the "no" option when it pops up in your console. 
library(sf)

#some students need to install the rgeos package separately as well
#library(rgeos)

For our first example, we will be working with a dataset of North Carolina that is built into the sf package.

demo(nc, ask = FALSE, echo = FALSE)

You should notice that the nc dataset is now saved in your R environment. This dataset contains information about Sudden Infant Death Syndrome (SIDS) for North Carolina counties over two time periods (1974-78 and 1979-84). Let's take a look at that dataset.
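If demo() does not work in your setup, one alternative (a minimal sketch, not the method used above) is to read the shapefile copy of the same North Carolina data that ships with sf and inspect it directly. When read this way, the geometry column is typically named geometry rather than geom.

```r
library(sf)
library(dplyr)

# Alternative to demo(nc): read the shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Number of rows (one per county) and columns (attributes plus geometry)
dim(nc)

# Compact overview of every column and its type
glimpse(nc)

# Coordinate reference system the county polygons are stored in
st_crs(nc)$input
```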

Each row represents a county in North Carolina. This data frame contains the following columns:

  • AREA: County polygon areas in degree units
  • PERIMETER: County polygon perimeters in degree units
  • CNTY_: Internal county ID
  • NAME: County names
  • FIPS: County ID
  • FIPSNO: County ID
  • CRESS_ID: Cressie papers ID
  • BIR74: births, 1974-78
  • SID74: SID deaths, 1974-78
  • NWBIR74: non-white births, 1974-78
  • BIR79: births, 1979-84
  • SID79: SID deaths, 1979-84
  • NWBIR79: non-white births, 1979-84
  • geom: geometry information needed to plot the map for each county

Let's begin by simply plotting the map using geom_sf. Note that you don't need to specify the x- or y-axes; sf figures that out for you.

nc %>%
  ggplot() +
  geom_sf()

Let's pretty it up:

nc %>%
  ggplot() +
  geom_sf(col="black", fill="darkgrey") +
  theme_light() +
  ggtitle("North Carolina Counties")
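The coord_sf() function mentioned in the introduction controls the map projection. Here is a minimal sketch, assuming you want North Carolina's state plane projection (EPSG:32119, NAD83 / North Carolina), which is chosen only as an example. The data is re-read from the shapefile bundled with sf so the block runs on its own, and nc_projected_map is a made-up name.

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf() draws the polygons; coord_sf() reprojects them for display
nc_projected_map <- ggplot(nc) +
  geom_sf(col = "black", fill = "darkgrey") +
  coord_sf(crs = st_crs(32119)) +  # example CRS: NAD83 / North Carolina
  theme_light() +
  ggtitle("North Carolina Counties (state plane projection)")

nc_projected_map
```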

Choropleth maps

What is a choropleth map?

A choropleth map is a type of thematic map where areas (such as countries, states, or regions) are shaded or colored based on data values. It's commonly used to visualize statistical information, such as population density, election results, or income levels, by using different shades or colors to represent varying data ranges.

An example from USA Today

Suppose we want to shade each of these counties based on the number of births in 1974-78 (the BIR74 column).

map <- nc %>%
  ggplot() +
  geom_sf(aes(fill = BIR74), col = "black") +
  theme_light() +
  ggtitle("North Carolina, Births in 1974-78")

map

Color Palettes

Qualitative Color Palettes

  • Best for: Categories (unordered)
  • Examples: Species, Groups, Brands
  • RColorBrewer palettes: "Set1", "Dark2", "Paired"
    • Example R code: scale_fill_brewer(palette = "Set1")
  • wesanderson palettes: "GrandBudapest1", "Darjeeling1", "Moonrise2"
    • Example R code: scale_fill_manual(values = wes_palette("GrandBudapest1")) (requires library(wesanderson))

Sequential Color Palettes

  • Best for: Ordered, continuous data
  • Examples: Temperature, Population Density
  • RColorBrewer palettes: "Blues", "Reds", "Greens"
    • Example R code: scale_fill_distiller(palette = "Blues", direction = 1) for continuous values, or scale_fill_brewer(palette = "Blues") for binned values
  • viridis palettes: "viridis", "magma", "plasma", "cividis"
    • Example R code: scale_fill_viridis_c(option = "magma")
  • Build your own: scale_fill_gradientn(colours = c("yellow", "red"))

Note: Be sure that higher values are encoded with the darkest colors!
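Because scale_fill_brewer() expects discrete values, a continuous column like BIR74 must either be binned first or mapped with scale_fill_distiller(). The sketch below shows both options. The five quantile-based classes and the object names (nc_binned, births_class, binned_map, continuous_map) are arbitrary choices made for illustration.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Bin births into 5 ordered classes using quintiles as break points
nc_binned <- nc %>%
  mutate(births_class = cut(BIR74,
                            breaks = quantile(BIR74, probs = seq(0, 1, by = 0.2)),
                            include.lowest = TRUE))

# Binned data: scale_fill_brewer() runs from light (lowest class) to dark (highest)
binned_map <- ggplot(nc_binned) +
  geom_sf(aes(fill = births_class), col = "black") +
  scale_fill_brewer(palette = "Blues", name = "Births, 1974-78") +
  theme_light()

# Continuous data: direction = 1 keeps the darkest blue for the highest values
continuous_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), col = "black") +
  scale_fill_distiller(palette = "Blues", direction = 1, name = "Births, 1974-78") +
  theme_light()

binned_map
continuous_map
```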

Diverging Color Palettes

  • Best for: Data with a central midpoint
  • Examples: Election Results, Anomaly Detection
  • RColorBrewer palettes: "RdBu", "Spectral"
    • Example R code: scale_fill_distiller(palette = "RdBu") for continuous values, or scale_fill_brewer(palette = "RdBu") for binned values
  • Build your own: scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0)
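A diverging palette needs a meaningful midpoint. As a minimal sketch, suppose the quantity of interest is the change in the SIDS rate (per 1,000 births) between the two periods. Zero change is then a natural midpoint for scale_fill_gradient2(). The rate columns and the blue/red choice are illustrative and are not part of the dataset.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# SIDS deaths per 1,000 births in each period, and the change between them
nc_change <- nc %>%
  mutate(
    sids_rate74 = SID74 / BIR74 * 1000,
    sids_rate79 = SID79 / BIR79 * 1000,
    rate_change = sids_rate79 - sids_rate74
  )

# Decreases shade toward blue, increases toward red, no change stays white
diverging_map <- ggplot(nc_change) +
  geom_sf(aes(fill = rate_change), col = "black") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0,
                       name = "Change in SIDS deaths\nper 1,000 births") +
  theme_light()

diverging_map
```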

Some general guidelines when choosing color palettes:

✅ Match palette type to data type

✅ Choose colorblind-friendly palettes when designing for general audiences

✅ Limit colors to avoid overwhelming the reader: for categorical data, limit the number of distinct colors to 5-8 at most (beyond that, consider grouping)

✅ Consider the meaning of colors in your audience's cultural context.

✅ If the data is skewed, consider a log scale (for example, a log10 transformation from the scales package).

❌ Avoid: Using blue for land in maps
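One way to apply the log-scale guideline to the births map is sketched below, assuming a log10 transformation suits the data. It uses the trans argument of the fill scale, which applies a transformation from the scales package. In ggplot2 3.5.0 and later this argument is named transform; trans is deprecated but still accepted. log_map is a made-up name.

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Check the skew first: compare the median with the maximum
summary(nc$BIR74)

# Log-transform the fill scale; all birth counts are positive, so log10 is defined
log_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), col = "black") +
  scale_fill_viridis_c(option = "magma", direction = -1, trans = "log10",
                       name = "Births, 1974-78\n(log scale)") +
  theme_light()

log_map
```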

Customizing choropleth maps

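library(RColorBrewer) # optional here: scale_fill_viridis_c() does not need it; display.brewer.all() previews the Brewer palettes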

map + 
  scale_fill_viridis_c(option = "magma", direction = -1) 

Adding labels with geom_sf_text()

map + 
  scale_fill_viridis_c(option = "magma", direction = -1)+ 
  geom_sf_text(aes(label = NAME), size = 1)
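With 100 counties, labeling every polygon gets crowded quickly. One workaround is sketched below, assuming you only want to name the ten counties with the most births. It passes a filtered sf object to the label layer's data argument. geom_sf_label() is the boxed counterpart of geom_sf_text() and keeps names readable against dark fills. The object names top_counties and labeled_map are made up.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Keep only the 10 counties with the most births in 1974-78
top_counties <- slice_max(nc, BIR74, n = 10, with_ties = FALSE)

labeled_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), col = "black") +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  # Only the filtered counties are passed to the label layer
  geom_sf_label(data = top_counties, aes(label = NAME), size = 2.5) +
  theme_light()

labeled_map
```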

Warning

Because many quantities (births, deaths, sales) largely track population size, choropleth maps of raw counts often end up as population maps in disguise and rarely provide useful or surprising information.

XKCD

❌ Correlation doesn't imply causation! Just because two variables show similar patterns doesn't mean one causes the other.

✅ Use rates, percentages, or per capita values rather than absolute numbers. Example: Instead of showing total website users per state, show website users per 100,000 residents.
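Applied to the North Carolina data, the sketch below divides SIDS deaths by births instead of mapping a raw count. Expressing the rate per 1,000 births is an arbitrary but readable choice, and sids_per_1000, nc_rates and rate_map are made-up names.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Rate instead of count: SIDS deaths per 1,000 births, 1974-78
nc_rates <- nc %>%
  mutate(sids_per_1000 = SID74 / BIR74 * 1000)

rate_map <- ggplot(nc_rates) +
  geom_sf(aes(fill = sids_per_1000), col = "black") +
  scale_fill_viridis_c(option = "magma", direction = -1,
                       name = "SIDS deaths\nper 1,000 births") +
  theme_light() +
  ggtitle("North Carolina, SIDS Deaths per 1,000 Births, 1974-78")

rate_map
```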

✅ Use location quotients or z-scores to highlight areas with unusually high or low values relative to expectations. Example: Show the percentage of a state's population that subscribes to Martha Stewart Living relative to the national average.
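Both ideas can be sketched with the SIDS data. Here the location quotient is taken to be a county's SIDS rate divided by the statewide rate, so a value of 1 means "same as the state". The z-score compares each county's rate to the mean and standard deviation of all county rates. The column and object names are hypothetical.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

nc_lq <- nc %>%
  mutate(
    sids_rate = SID74 / BIR74,
    # Location quotient: county rate relative to the statewide rate
    location_quotient = sids_rate / (sum(SID74) / sum(BIR74)),
    # z-score: distance from the mean county rate, in standard deviations
    rate_z = (sids_rate - mean(sids_rate)) / sd(sids_rate)
  )

# A location quotient of 1 is the natural midpoint for a diverging palette
lq_map <- ggplot(nc_lq) +
  geom_sf(aes(fill = location_quotient), col = "black") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 1,
                       name = "SIDS location\nquotient, 1974-78") +
  theme_light()

lq_map
```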