Introduction to ggplot2

Lucy D’Agostino McGowan

ggplot2 \(\in\) tidyverse

  • ggplot2 is the tidyverse’s data visualization package
  • The structure of the code for plots can be summarized as:
ggplot(data = [dataset], 
       mapping = aes(x = [x-variable], 
                     y = [y-variable])) +
  geom_xxx() +
  other options

Data: Palmer Penguins
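
The plots in these slides use functions from ggplot2 and a data frame called penguins, but the slides do not show how either is loaded. A minimal setup sketch follows, assuming both the ggplot2 and palmerpenguins packages are already installed (the install line in the comment is an assumption about your environment, not part of the original slides).

```r
# Assumption: both packages are already installed, e.g. via
# install.packages(c("ggplot2", "palmerpenguins"))
library(ggplot2)         # plotting functions: ggplot(), aes(), geom_point(), ...
library(palmerpenguins)  # provides the `penguins` data frame

# Quick look at the columns used in the plots that follow
str(penguins[, c("species", "island", "bill_depth_mm", "bill_length_mm")])
```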

ggplot(data = penguins, 
       mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species")

Plot in Layers

Start with the penguins data frame.

ggplot(data = penguins)

Start with the penguins data frame and map bill depth to the x-axis.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm))

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm))

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm)) + 
  geom_point()

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point()

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length” and add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, and label the x and y axes “Bill depth (mm)” and “Bill length (mm)”, respectively.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", 
       y = "Bill length (mm)")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes “Bill depth (mm)” and “Bill length (mm)”, respectively, and label the legend “Species”.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", 
       y = "Bill length (mm)",
       color = "Species")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)",
       y = "Bill length (mm)",
       color = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source. Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", 
       y = "Bill length (mm)",
       color = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") + 
  scale_color_viridis_d()

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() + 
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       color = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") + 
  scale_color_viridis_d()
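
The step-by-step build-up above retypes the whole call each time. As a sketch of the same layering idea, a plot can also be stored in a variable and extended with `+` one component at a time. The variable names (p, p_points, p_labelled, p_final) are hypothetical and chosen only for illustration; the code assumes ggplot2 and palmerpenguins are installed.

```r
library(ggplot2)
library(palmerpenguins)

# Base plot: data and aesthetic mappings only, no geoms yet
p <- ggplot(data = penguins,
            mapping = aes(x = bill_depth_mm,
                          y = bill_length_mm,
                          color = species))

# Each `+` returns a new plot object with one more component added
p_points <- p + geom_point()
p_labelled <- p_points +
  labs(title = "Bill depth and length", color = "Species")
p_final <- p_labelled + scale_color_viridis_d()

p_final  # printing the object draws the plot
```

Printing any of the intermediate objects (for example p_points) draws the plot as it stands at that step, which mirrors the slide-by-slide progression.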


Aesthetics


Characteristics of the plotted points that are commonly mapped to a variable in the data:

  • color
  • shape
  • size
  • alpha (transparency)

Color

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species)) + 
  geom_point() +
  scale_color_viridis_d()

Shape

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species,
                     shape = island)) + 
  geom_point() +
  scale_color_viridis_d()

Size

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species,
                     shape = species,
                     size = body_mass_g)) + 
  geom_point() +
  scale_color_viridis_d()

Alpha

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     color = species,
                     shape = species,
                     size = body_mass_g,
                     alpha = flipper_length_mm)) + 
  geom_point() +
  scale_color_viridis_d()

Mapping

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           size = body_mass_g, 
           alpha = flipper_length_mm)) + 
  geom_point()

Setting

ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) + 
  geom_point(size = 2, alpha = 0.5)

Mapping vs. setting

  • Mapping: Determine the size, alpha, etc. of points based on the values of a variable in the data
      • goes into aes()
  • Setting: Determine the size, alpha, etc. of points not based on the values of a variable in the data
      • goes into geom_*() (this was geom_point() in the previous example, but we’ll learn about other geoms soon!)
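
One way to see why the distinction matters is to put a constant in the wrong place. The sketch below contrasts setting a color with accidentally mapping it. The object names p_set and p_mapped are hypothetical, and the code assumes ggplot2 and palmerpenguins are installed.

```r
library(ggplot2)
library(palmerpenguins)

# Setting: the constant sits outside aes(), so every point is drawn in blue
p_set <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(color = "blue")

# Mapping a constant by mistake: inside aes(), "blue" is treated as data
# (a single category), so ggplot2 assigns it a default color from the
# color scale and adds a legend with the one entry "blue"
p_mapped <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(color = "blue"))

p_set     # blue points, no legend
p_mapped  # points in the scale's default color, plus a legend
```

If a plot shows an unexpected legend or ignores the requested color, checking whether a fixed value ended up inside aes() is a reasonable first step.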

Application Exercise

  1. Open the Welcome Penguins folder from Day 1 in RStudio Pro (can’t find it? Copy the files again by following these instructions)

  2. Create a new R chunk (don’t remember how to do this? Make sure you are using the visual editor then click Insert > Code Chunk > R). Using the code in the chunk labeled plot as a template, create a plot that examines the relationship between x = Flipper Length and y = Bill Depth.

  3. Update the plot from part 2 to have a different shape depending on the Island the penguin is from.

  4. Change the size of all of the points to 3.

  5. Be sure to update all labels to describe what you have created.

BONUS: Is there any missing data? What is the plot doing with the missing values? See if you can get rid of the warning message.
