R for Data Science Exercises: Communication

R for Data Science 2nd Edition Exercises (Wickham, Çetinkaya-Rundel and Grolemund, 2023)

Communication

Run the code in your script for the answers! I'm just exploring as I go.

Packages to load

library(tidyverse)
library(ggplot2)
library(scales)
library(ggrepel)
library(patchwork)

Introduction

Now that you understand your data, you need to communicate that understanding to others. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. This chapter focuses on the tools you need to create good graphics.

Labels

The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels.

Questions

  1. Create one plot on the fuel economy data with customized title, subtitle, caption, x, y, and color labels.

  2. Recreate the following plot using the fuel economy data.
    Note that both the colors and shapes of points vary by type of drive train.

    #| echo: false
    #| fig-alt: |
    #|   Scatterplot of highway versus city fuel efficiency. Shapes and 
    #|   colors of points are determined by type of drive train.
    
    ggplot(mpg, aes(x = cty, y = hwy, color = drv, shape = drv)) +
      geom_point() +
      labs(
        x = "City MPG",
        y = "Highway MPG",
        shape = "Type of\ndrive train",
        color = "Type of\ndrive train"
      )
    
  3. Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.

Answers

Solution 1:

ggplot(
  data = mpg,
  mapping = aes(x = fct_reorder(class, hwy), y = hwy)
) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = "Compact cars get about 10 more highway mpg than pickup trucks",
    subtitle = "Comparing the median highway mpg in each class",
    caption = "Data from fueleconomy.gov",
    x = "Car Class",
    y = "Highway Miles per Gallon"
  )

Solution 2 (Slightly Redundant!?):

ggplot(mpg, aes(x = cty, y = hwy, color = drv, shape = drv)) +
  geom_point() +
  labs(
    x = "City MPG",
    y = "Highway MPG",
    shape = "Type of\ndrive train",
    color = "Type of\ndrive train"
  )

Solution 3:

label_info <- mpg |>
  group_by(drv) |>
  arrange(desc(displ)) |>
  slice_head(n = 1) |>
  mutate(
    drive_type = case_when(
      drv == "f" ~ "front-wheel drive",
      drv == "r" ~ "rear-wheel drive",
      drv == "4" ~ "4-wheel drive"
    )
  ) |>
  select(displ, hwy, drv, drive_type)

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  geom_label_repel(
    data = label_info, 
    aes(x = displ, y = hwy, label = drive_type),
    fontface = "bold", size = 5, nudge_y = 5, nudge_x = -1
  ) +
  theme(legend.position = "none") +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Drive train",
    title = "Fuel efficiency in front-, rear-, and 4-wheel drive vehicles",
    caption = "Data from fueleconomy.gov"
  )

Annotations

In addition to labelling major components of your plot, it's often useful to label individual observations or groups of observations. This makes it possible to add context to observations made by your plots.
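As a rough illustration of labelling a group of observations, the sketch below highlights the cars with unusually high highway mileage and labels them by model. The cutoff of 40 mpg and the name high_mileage are assumptions made for this example, not something taken from the book.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

# Hypothetical cutoff: treat highway mileage above 40 mpg as worth calling out.
high_mileage <- mpg |>
  filter(hwy > 40)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # Circle the highlighted cars so the labels have an obvious target.
  geom_point(
    data = high_mileage,
    color = "red", size = 3, shape = "circle open"
  ) +
  # geom_text_repel() nudges the labels so they do not overlap each other.
  geom_text_repel(data = high_mileage, aes(label = model))

p
```

Passing a filtered data frame to the labelling layer is what restricts the labels to the chosen observations; the other layers still use the full mpg data.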

Questions

  1. Use geom_text() with infinite positions to place text at the four corners of the plot.

  2. Use annotate() to add a point geom in the middle of your last plot without having to create a tibble.
    Customize the shape, size, or color of the point.

  3. How do labels with geom_text() interact with faceting?
    How can you add a label to a single facet?
    How can you put a different label in each facet?
    (Hint: Think about the dataset that is being passed to geom_text().)

  4. What arguments of geom_label() control the appearance of the background box?

  5. What are the four arguments to arrow()?
    How do they work?
    Create a series of plots that demonstrate the most important options.

Answers

Solution 1:

Use code similar to the example in the text, but set vjust and hjust so that the text stays inside the plot; each corner needs different values. Because geom_text() accepts hjust and vjust as aesthetics, add them as columns of the label data and map them. A single geom_text() call then replaces four separate calls, each with its own data, hjust, and vjust arguments.

label <- tribble(
  ~displ, ~hwy, ~label, ~vjust, ~hjust,
  Inf, Inf, "Top right", "top", "right",
  Inf, -Inf, "Bottom right", "bottom", "right",
  -Inf, Inf, "Top left", "top", "left",
  -Inf, -Inf, "Bottom left", "bottom", "left"
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label, vjust = vjust, hjust = hjust), data = label)

Solution 2:

With annotate() you pass what would otherwise be aesthetic mappings directly as arguments, so no tibble is needed. Here the point is placed at the midpoint of the displ and hwy ranges:

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  annotate("point",
    x = mean(range(mpg$displ)), y = mean(range(mpg$hwy)),
    shape = "diamond", size = 5, color = "red"
  )

Solution 3:

If the facet variable is not specified, the text is drawn in all facets.

label <- tibble(
  displ = Inf,
  hwy = Inf,
  label = "Increasing engine size is \nrelated to decreasing fuel economy."
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label),
    data = label, vjust = "top", hjust = "right",
    size = 2
  ) +
  facet_wrap(~class)

To draw the label in only one facet, add a column to the label data frame with the value of the faceting variable(s) in which to draw it.

label <- tibble(
  displ = Inf,
  hwy = Inf,
  class = "2seater",
  label = "Increasing engine size is \nrelated to decreasing fuel economy."
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label),
    data = label, vjust = "top", hjust = "right",
    size = 2
  ) +
  facet_wrap(~class)

To draw a different label in each facet, give the label data frame one row per value of the faceting variable(s):

label <- tibble(
  displ = Inf,
  hwy = Inf,
  class = unique(mpg$class),
  label = str_c("Label for ", class)
)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = label),
    data = label, vjust = "top", hjust = "right",
    size = 3
  ) +
  facet_wrap(~class)

Solution 4:

  • label.padding: padding around label
  • label.r: amount of rounding in the corners
  • label.size: size of label border
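A minimal sketch of two of these arguments in use, assuming a ggplot2 version in which geom_label() accepts label.padding and label.r as listed above. The label_info data frame and the chosen values are made up for this example; fill is the aesthetic that colors the box, and label.size is left at its default.

```r
library(ggplot2)
library(dplyr)

# One label per drive train: the car with the largest engine in each group.
label_info <- mpg |>
  group_by(drv) |>
  slice_max(displ, n = 1, with_ties = FALSE) |>
  ungroup()

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3) +
  geom_label(
    data = label_info,
    aes(label = drv),
    fill = "lightyellow",                # background color of the box
    label.padding = unit(0.5, "lines"),  # space between the text and the border
    label.r = unit(0.4, "lines")         # radius of the rounded corners
  )

p
```

Changing one argument at a time and re-running the plot is probably the quickest way to see what each one does.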

Solution 5:

The four arguments are (from the help for arrow()):

  • angle : angle of arrow head
  • length : length of the arrow head
  • ends: ends of the line to draw arrow head
  • type: "open" or "closed": whether the arrow head is an open or a closed triangle
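The exercise also asks for a series of plots. One possible sketch draws the same segment five times, changing a single arrow() argument each time and combining the results with patchwork. The helper arrow_plot(), the segment coordinates, and the chosen argument values are assumptions made for illustration.

```r
library(ggplot2)
library(patchwork)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3)

# Hypothetical helper: draw the same segment with a given arrow specification.
arrow_plot <- function(arrow_spec, title) {
  base +
    annotate(
      "segment",
      x = 6, y = 40, xend = 3, yend = 25,
      arrow = arrow_spec
    ) +
    labs(title = title)
}

# Each entry changes one argument relative to the defaults
# (angle = 30, length = unit(0.25, "inches"), ends = "last", type = "open").
arrow_specs <- list(
  "Defaults" = arrow(),
  "angle = 15" = arrow(angle = 15),
  "length = 0.5 inches" = arrow(length = unit(0.5, "inches")),
  "ends = both" = arrow(ends = "both"),
  "type = closed" = arrow(type = "closed")
)

plots <- Map(arrow_plot, arrow_specs, names(arrow_specs))
combined <- wrap_plots(plots, ncol = 2)
combined
```

Keeping the segment fixed across panels means any visible difference comes from the arrow specification alone.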

Scales

Scales control how the aesthetic mappings manifest visually.
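For instance, a continuous y scale decides where the axis breaks fall and how they are written. The sketch below is only an illustration: the break positions and the axis title are choices made for this example, and label_dollar() comes from the scales package loaded at the top.

```r
library(ggplot2)
library(scales)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 1 / 20) +
  scale_y_continuous(
    name = "Price",                     # axis title
    breaks = seq(0, 20000, by = 5000),  # where the ticks go
    labels = label_dollar()             # how each tick is written
  )

p
```

The data and the mapping are untouched; only the way the price values appear on the axis changes.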

Questions

  1. Why doesn't the following code override the default scale?

    df <- tibble(
      x = rnorm(10000),
      y = rnorm(10000)
    )
    
    ggplot(df, aes(x, y)) +
      geom_hex() +
      scale_color_gradient(low = "white", high = "red") +
      coord_fixed()
    
  2. What is the first argument to every scale?
    How does it compare to labs()?

  3. Change the display of the presidential terms by:

    a. Combining the two variants that customize colors and x axis breaks.
    b. Improving the display of the y axis.
    c. Labelling each term with the name of the president.
    d. Adding informative plot labels.
    e. Placing breaks every 4 years (this is trickier than it seems!).

  4. First, create the following plot.
    Then, modify the code using override.aes to make the legend easier to see.

    ggplot(diamonds, aes(x = carat, y = price)) +
      geom_point(aes(color = cut), alpha = 1/20)
    

Answers

Solution 1:

It does not override the default scale because the colors in geom_hex() are set by the fill aesthetic, not the color aesthetic.

ggplot(df, aes(x, y)) +
  geom_hex() +
  scale_fill_gradient(low = "white", high = "red") +
  coord_fixed()

Solution 2:

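The first argument to every scale is name, the title shown on the axis or legend for that scale. Setting it is equivalent to setting the corresponding label with labs(), so the two plots below are labelled identically.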

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type"
  )

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  scale_x_continuous("Engine displacement (L)") +
  scale_y_continuous("Highway fuel economy (mpg)") +
  scale_colour_discrete("Car type")

Solution 3:

fouryears <- lubridate::make_date(seq(year(min(presidential$start)),
  year(max(presidential$end)),
  by = 4
), 1, 1)

presidential %>%
  mutate(
    id = 33 + row_number(),
    name_id = fct_inorder(str_c(name, " (", id, ")"))
  ) %>%
  ggplot(aes(start, name_id, colour = party)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = name_id)) +
  scale_colour_manual("Party", values = c(Republican = "red", Democratic = "blue")) +
  scale_y_discrete(NULL) +
  scale_x_date(NULL,
    breaks = presidential$start, date_labels = "'%y",
    minor_breaks = fouryears
  ) +
  ggtitle("Terms of US Presidents",
    subtitle = "Eisenhower (34th) to Obama (44th)"
  ) +
  theme(
    panel.grid.minor = element_blank(),
    axis.ticks.y = element_blank()
  )

Solution 4:

The problem with the legend is that the low alpha value makes the colors hard to see. Override the alpha value so that the points in the legend are drawn solid.

ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = cut), alpha = 1 / 20) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))

Themes

Themes control all the non-data elements of your plot.

Questions

  1. Pick a theme offered by the ggthemes package and apply it to the last plot you made.

  2. Make the axis labels of your plot blue and bolded.

Answers

Solution 1:

# theme_few() comes from the ggthemes package, which must be installed.
ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = cut), alpha = 1 / 20) +
  ggthemes::theme_few() +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))

Solution 2:

ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = cut), alpha = 1 / 20) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    # axis.title styles the axis labels; axis.text would style the tick labels.
    axis.title = element_text(face = "bold", color = "blue")
  ) +
  guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))

Layout

The patchwork package allows you to combine separate plots into the same graphic. To place two plots next to each other, you can simply add (+) them to each other.
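A minimal sketch of that side-by-side case; the names left and right are placeholders chosen for this example.

```r
library(ggplot2)
library(patchwork)

left <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
right <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()

# Adding two finished plots places them next to each other.
side_by_side <- left + right
side_by_side
```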

Questions

  1. What happens if you omit the parentheses in the following plot layout?
    Can you explain why this happens?

    p1 <- ggplot(mpg, aes(x = displ, y = hwy)) + 
      geom_point() + 
      labs(title = "Plot 1")
    p2 <- ggplot(mpg, aes(x = drv, y = hwy)) + 
      geom_boxplot() + 
      labs(title = "Plot 2")
    p3 <- ggplot(mpg, aes(x = cty, y = hwy)) + 
      geom_point() + 
      labs(title = "Plot 3")
    
    (p1 | p2) / p3
    
  2. Using the three plots from the previous exercise, recreate the following patchwork.
    Figure description: three plots. Plot 1 is a scatterplot of highway mileage versus engine size. Plot 2 is side-by-side box plots of highway mileage versus drive train. Plot 3 is side-by-side box plots of city mileage versus drive train. Plot 1 is on the first row. Plots 2 and 3 are on the next row, each spanning half the width of Plot 1. Plot 1 is labelled "Fig. A", Plot 2 is labelled "Fig. B", and Plot 3 is labelled "Fig. C".

Answers

Solution 1:

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Plot 2")
p3 <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "Plot 3")

# Without the parentheses, p1 stands alone on the left and p2 is stacked above p3 on the right.
# R gives / higher precedence than |, so the layout is read as p1 | (p2 / p3).
p1 | p2 / p3
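The grouping can be checked without drawing anything, because R fixes it when it parses the expression. The sketch below uses base R's quote() to capture the layout unevaluated and then inspects the pieces; the name layout_expr is just a label for this example.

```r
# Capture the layout expression without evaluating it.
layout_expr <- quote(p1 | p2 / p3)

layout_expr[[1]]  # the outermost call is |
layout_expr[[2]]  # its left operand is p1
layout_expr[[3]]  # its right operand is the whole call p2 / p3
```

Since p2 / p3 forms the right-hand operand of |, patchwork stacks those two plots first and then places the stack beside p1.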

Solution 2:

p1 / (p2 + p3) +
  plot_annotation(
    tag_levels = c("A"),
    tag_prefix = "Fig. ",
    tag_suffix = ":"
  )

Reference

Wickham, H., Çetinkaya-Rundel, M. and Grolemund, G. (2023) R for Data Science. 2nd ed. Sebastopol, CA: O’Reilly Media.