#LOAD PACKAGES
library(tidyverse)
#install.packages("sf") - note: some students get a pop-up when they install the sf package for the first time. Select the "no" option when it appears in your console.
library(sf)
#some students also need to install the rgeos package separately
#library(rgeos)
Maps with maps and sf
Before we get started, some context:
- R is fantastic for spatial analysis (not covered in this class... look for classes related to spatial statistics).
- R is great for interactive data visualization (via leaflet or shiny... more on this on Thursday).
- R is okay at spatial data visualization (creating maps).
  - There are many different packages in R for creating maps. I've found that different packages perform best for different maps. We will talk about a few different ones today.
  - If you have a highly map-centric project, there is nothing wrong with working in ArcGIS or QGIS if you find the mapping tools in R insufficient. There have been many recent improvements through packages (like sp, rgdal and rgeos) that provide much of the functionality of GIS packages! Exciting! (Not very beginner friendly: requires familiarity with GIS concepts.)
Using the sf package
Vector data for maps are typically encoded using the "simple features" standard produced by the Open Geospatial Consortium. The sf package developed by Edzer Pebesma provides an excellent toolset for working with such data, and the geom_sf() and coord_sf() functions in ggplot2 are designed to work together with the sf package.
For our first example, we will be working with a dataset of North Carolina that is built into the sf package.
demo(nc, ask = FALSE, echo = FALSE)
You should notice that the nc dataset is now saved in your R environment. This dataset contains information about Sudden Infant Death Syndrome (SIDS) for North Carolina counties, over two time periods (1974-78 and 1979-84). Let's take a look at that dataset.
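Here is a minimal sketch of one way to take that look, assuming the sf and dplyr packages are installed. So that it runs on its own, it reads the North Carolina GeoPackage that ships with sf directly (as far as I know, this is the same file that demo(nc) loads). Using glimpse() here is my choice for illustration; head(nc) or View(nc) would work just as well.

```r
library(sf)
library(dplyr)

# Read the North Carolina county data bundled with the sf package
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

class(nc)    # an sf object is also a regular data frame
glimpse(nc)  # one row per county, with the geometry stored in its own column
st_crs(nc)   # the coordinate reference system attached to the geometries
```

Because nc is still a data frame, the usual tidyverse verbs (filter(), mutate(), and so on) work on it while the geometry column comes along for the ride.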
Each row represents a county in North Carolina. This data frame contains the following columns:
- AREA: County polygon areas in degree units
- PERIMETER: County polygon perimeters in degree units
- CNTY_: Internal county ID
- NAME: County names
- FIPS: County ID
- FIPSNO: County ID
- CRESS_ID: Cressie papers ID
- BIR74: births, 1974-78
- SID74: SID deaths, 1974-78
- NWBIR74: non-white births, 1974-78
- BIR79: births, 1979-84
- SID79: SID deaths, 1979-84
- NWBIR79: non-white births, 1979-84
- geom: information needed to plot the map for each county
Let's begin by simply plotting the map using geom_sf(). Note that you don't need to specify the x- or y-axes: sf figures that out for you.
nc %>%
  ggplot() +
  geom_sf()
Let's pretty it up:
nc %>%
  ggplot() +
  geom_sf(col = "black", fill = "darkgrey") +
  theme_light() +
  ggtitle("North Carolina Counties")
Choropleth maps
A choropleth map is a type of thematic map where areas (such as countries, states, or regions) are shaded or colored based on data values. It's commonly used to visualize statistical information, such as population density, election results, or income levels, by using different shades or colors to represent varying data ranges.
Suppose we want to shade each of these counties based on the number of births in 1974-78 (the BIR74 column).
map <- nc %>%
  ggplot() +
  geom_sf(aes(fill = BIR74), col = "black") +
  theme_light() +
  ggtitle("North Carolina, Births, 1974-78")
map
Color Palettes
Qualitative Color Palettes
Best for | Categories (unordered) |
Examples | Species, Groups, Brands |
RColorBrewer Palettes | "Set1", "Dark2", "Paired" |
Example R Code | scale_fill_brewer(palette = "Set1") |
wesanderson Palettes | "GrandBudapest1", "Darjeeling1", "Moonrise2" |
Example R Code | scale_fill_manual(values = wes_palette("GrandBudapest1")) |
Sequential Color Palettes
Best for | Ordered, continuous data |
Examples | Temperature, Population Density |
RColorBrewer Palettes | "Blues", "Reds", "Greens" |
Example R Code | scale_fill_distiller(palette = "Blues", direction = 1) for continuous data; scale_fill_brewer(palette = "Blues") for binned or ordered categories |
viridis Palettes | "viridis", "magma", "plasma", "cividis" |
Example R Code | scale_fill_viridis_c(option = "magma") |
Build your Own | scale_fill_gradientn(colours = c("yellow", "red")) |
Note: Be sure that higher values are encoded with the darkest colors!
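As a rough sketch of how a sequential palette applies to the North Carolina map, the block below fills counties by BIR74 with the "Blues" palette. It assumes sf and ggplot2 are installed and re-reads the bundled data so it runs on its own; the object name p_seq is just for illustration. Since BIR74 is continuous, it uses scale_fill_distiller() rather than scale_fill_brewer(), which expects discrete values.

```r
library(sf)
library(ggplot2)

# Read the North Carolina county data bundled with the sf package
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

p_seq <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), col = "black") +
  # direction = 1 keeps the palette's light-to-dark order,
  # so higher birth counts get the darker blues
  scale_fill_distiller(palette = "Blues", direction = 1) +
  theme_light()

p_seq
```

Leaving direction at its default of -1 reverses the palette, which would put the darkest color on the lowest values.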
Diverging Color Palettes
Best for | Data with a central midpoint |
Examples | Election Results, Anomaly Detection |
RColorBrewer Palettes | "RdBu", "Spectral" |
Example R Code | scale_fill_distiller(palette = "RdBu") for continuous data; scale_fill_brewer(palette = "RdBu") for binned data |
Build your Own | scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) |
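To see a diverging palette in action, we need a variable with a meaningful midpoint. The sketch below invents one for illustration: the change in SIDS deaths per 1,000 births between the two periods (sid_rate_change is a hypothetical column name, not part of the nc dataset), where zero means "no change". It assumes sf, dplyr, and ggplot2 are installed.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Read the North Carolina county data bundled with the sf package
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

# Hypothetical derived variable: change in SIDS deaths per 1,000 births
nc_change <- nc %>%
  mutate(sid_rate_change = 1000 * SID79 / BIR79 - 1000 * SID74 / BIR74)

p_div <- ggplot(nc_change) +
  geom_sf(aes(fill = sid_rate_change), col = "black") +
  # White marks the midpoint (no change); blue = decrease, red = increase
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_light()

p_div
```

Setting midpoint explicitly matters: without it, the neutral color may not line up with the value that actually means "neutral" in your data.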
- Match palette type to data type.
- Choose colorblind-friendly palettes when designing for general audiences.
- Limit colors to avoid overwhelming the reader: for categorical data, limit the number of distinct colors to 5-8 max (beyond that, consider grouping).
- Consider the meaning of colors in your audience's cultural context.
- If the data is skewed, consider using the scales package to log-scale the values.
- Avoid: Using blue for land in maps.
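The log-scaling tip can be sketched as follows, assuming sf, ggplot2, and scales are installed. County birth counts tend to be skewed (a few large counties dominate), so putting the fill on a log10 scale lets the smaller counties show more variation. The object name p_log is just for illustration.

```r
library(sf)
library(ggplot2)

# Read the North Carolina county data bundled with the sf package
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

p_log <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), col = "black") +
  scale_fill_viridis_c(
    option = "magma",
    direction = -1,                  # darker colors for higher values
    trans = "log10",                 # newer ggplot2 releases call this argument `transform`
    labels = scales::label_comma()   # print legend values as 1,000 rather than 1e+03
  ) +
  theme_light()

p_log
```

The legend still shows the original birth counts; only the spacing of the colors changes.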
Customizing Choropleth maps
library(RColorBrewer)
map +
  scale_fill_viridis_c(option = "magma", direction = -1)
Adding labels with geom_sf_text()
map +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  geom_sf_text(aes(label = NAME), size = 1)
Since population density naturally drives most raw counts, choropleth maps of absolute numbers often just show where people live and frequently fail to provide any useful or surprising information.
- Correlation doesn't imply causation! Just because two variables show similar patterns doesn't mean one causes the other.
- Use rates, percentages, or per capita values rather than absolute numbers. Example: Instead of showing total website users per state, show website users per 100,000 residents.
- Use location quotients or z-scores to highlight areas with unusually high or low values relative to expectations. Example: Show the percentage of a state's population that subscribes to Martha Stewart Living relative to the national average.
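Applied to the North Carolina data, these last two suggestions might look like the sketch below. It assumes sf, dplyr, and ggplot2 are installed, and the new column names (sid_rate74, sid_z74, sid_lq74) are made up for illustration. The z-score here standardizes against the unweighted mean of the county rates, and the location-quotient-style ratio compares each county's rate to the statewide rate; other definitions are possible.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Read the North Carolina county data bundled with the sf package
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)

nc_rates <- nc %>%
  mutate(
    # Rate instead of raw count: SIDS deaths per 1,000 births, 1974-78
    sid_rate74 = 1000 * SID74 / BIR74,
    # z-score: how many standard deviations a county is from the average county rate
    sid_z74 = (sid_rate74 - mean(sid_rate74)) / sd(sid_rate74),
    # Ratio of the county rate to the statewide rate (1 = same as the state overall)
    sid_lq74 = (SID74 / BIR74) / (sum(SID74) / sum(BIR74))
  )

p_z <- ggplot(nc_rates) +
  geom_sf(aes(fill = sid_z74), col = "black") +
  # z-scores have a natural midpoint at 0, so a diverging scale fits
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_light() +
  ggtitle("North Carolina, SIDS rate z-scores, 1974-78")

p_z
```

Keep in mind that rates in counties with very few births are noisy, so an extreme z-score in a small county may reflect only a handful of cases.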