R for Data Science Exercises: Communication
R for Data Science 2nd Edition Exercises (Wickham, Çetinkaya-Rundel and Grolemund, 2023)
Communication
Run the code in your script for the answers! I'm just exploring as I go.
Packages to load
library(tidyverse)
library(ggplot2)
library(scales)
library(ggrepel)
library(patchwork)
Introduction
Now that you understand your data, you need to communicate that understanding to others. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. This chapter focuses on the tools you need to create good graphics.
Labels
The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels.
Questions
- Create one plot on the fuel economy data with customized title, subtitle, caption, x, y, and color labels.
- Recreate the following plot using the fuel economy data. Note that both the colors and shapes of points vary by type of drive train.
#| echo: false
#| fig-alt: |
#|   Scatterplot of highway versus city fuel efficiency. Shapes and
#|   colors of points are determined by type of drive train.
ggplot(mpg, aes(x = cty, y = hwy, color = drv, shape = drv)) +
  geom_point() +
  labs(
    x = "City MPG",
    y = "Highway MPG",
    shape = "Type of\ndrive train",
    color = "Type of\ndrive train"
  )
- Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.
Answers
Solution 1:
ggplot(
data = mpg,
mapping = aes(x = fct_reorder(class, hwy), y = hwy)
) +
geom_boxplot() +
coord_flip() +
labs(
title = "Compact cars get about 10 more highway MPG than pickup trucks",
subtitle = "Comparing the median highway mpg in each class",
caption = "Data from fueleconomy.gov",
x = "Car Class",
y = "Highway Miles per Gallon"
)
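The exercise also asks for a color label, which the boxplot above cannot show because nothing is mapped to color. A minimal sketch of one way to cover all six labels, assuming a scatterplot with drv mapped to color (a choice made here for illustration, not part of the solution above):

```r
library(ggplot2)

# Assumption: map drv to color so that the color label has a legend to title.
p_labels <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Points colored by type of drive train",
    caption = "Data from fueleconomy.gov",
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Drive train"
  )
p_labels
```

The color argument in labs() only has a visible effect when a color aesthetic is mapped, which is why this variant switches to a scatterplot.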
Solution 2 (Slightly Redundant!?):
ggplot(mpg, aes(x = cty, y = hwy, color = drv, shape = drv)) +
geom_point() +
labs(
x = "City MPG",
y = "Highway MPG",
shape = "Type of\ndrive train",
color = "Type of\ndrive train"
)
Solution 3:
label_info <- mpg |>
group_by(drv) |>
arrange(desc(displ)) |>
slice_head(n = 1) |>
mutate(
drive_type = case_when(
drv == "f" ~ "front-wheel drive",
drv == "r" ~ "rear-wheel drive",
drv == "4" ~ "4-wheel drive"
)
) |>
select(displ, hwy, drv, drive_type)
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
geom_label_repel(
data = label_info,
aes(x = displ, y = hwy, label = drive_type),
fontface = "bold", size = 5, nudge_y = 5, nudge_x = -1
) +
theme(legend.position = "none") +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
color = "Drive train",
title = "Fuel efficiency of front-, rear-, and 4-wheel drive vehicles",
caption = "Data from fueleconomy.gov"
)
Annotations
In addition to labelling major components of your plot, it's often useful to label individual observations or groups of observations. This makes it possible to add context to observations made by your plots.
Questions
- Use geom_text() with infinite positions to place text at the four corners of the plot.
- Use annotate() to add a point geom in the middle of your last plot without having to create a tibble. Customize the shape, size, or color of the point.
- How do labels with geom_text() interact with faceting? How can you add a label to a single facet? How can you put a different label in each facet? (Hint: Think about the dataset that is being passed to geom_text().)
- What arguments of geom_label() control the appearance of the background box?
- What are the four arguments to arrow()? How do they work? Create a series of plots that demonstrate the most important options.
Answers
Solution 1:
Use code similar to the example in the text. However, set vjust and hjust so that the text appears inside the plot; these need to be different for each corner. Because geom_text() takes hjust and vjust as aesthetics, add them to the data and the mappings, and use a single geom_text() call instead of four separate geom_text() calls, each with its own data argument and its own hjust and vjust values.
label <- tribble(
~displ, ~hwy, ~label, ~vjust, ~hjust,
Inf, Inf, "Top right", "top", "right",
Inf, -Inf, "Bottom right", "bottom", "right",
-Inf, Inf, "Top left", "top", "left",
-Inf, -Inf, "Bottom left", "bottom", "left"
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label, vjust = vjust, hjust = hjust), data = label)
Solution 2:
With annotate(), you pass what would be aesthetic mappings directly as arguments:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
annotate("text",
x = Inf, y = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy.",
vjust = "top", hjust = "right"
)
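The exercise asks specifically for a point geom, while the code above adds text. A minimal sketch of the point version, assuming "the middle of the plot" means the midpoint of each axis range in mpg (the names mid_x, mid_y, and p_point are illustrative):

```r
library(ggplot2)

# Assumption: the "middle" is the midpoint of the observed range on each axis.
mid_x <- mean(range(mpg$displ))
mid_y <- mean(range(mpg$hwy))

p_point <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  annotate(
    "point",
    x = mid_x, y = mid_y,
    shape = 17, size = 6, color = "red" # customized shape, size, and color
  )
p_point
```

As with the text example, the values that would normally be aesthetics (x, y, shape, size, color) are passed as plain arguments, so no tibble is needed.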
Solution 3:
If the facet variable is not specified, the text is drawn in all facets.
label <- tibble(
displ = Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label),
data = label, vjust = "top", hjust = "right",
size = 2
) +
facet_wrap(~class)
To draw the label in only one facet, add a column to the label data frame with the value of the faceting variable(s) in which to draw it.
label <- tibble(
displ = Inf,
hwy = Inf,
class = "2seater",
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label),
data = label, vjust = "top", hjust = "right",
size = 2
) +
facet_wrap(~class)
To draw a different label in each facet, include the faceting variable(s) in the label data frame, with one row per facet:
label <- tibble(
displ = Inf,
hwy = Inf,
class = unique(mpg$class),
label = str_c("Label for ", class)
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label),
data = label, vjust = "top", hjust = "right",
size = 3
) +
facet_wrap(~class)
Solution 4:
label.padding: padding around the label
label.r: amount of rounding in the corners
label.size: size of the label border
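A rough sketch of two of these arguments in use. The labelled observation, the unit values, and the fill color are all choices made here for illustration; fill is an aesthetic rather than one of the arguments listed above, and it is included on the assumption that it sets the background color of the box.

```r
library(ggplot2)

# One observation to label, picked only for illustration.
best <- mpg[which.max(mpg$hwy), ]

p_box <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_label(
    data = best,
    aes(label = model),
    label.padding = unit(0.6, "lines"), # more space between text and border
    label.r = unit(0.4, "lines"),       # rounder corners
    fill = "lightyellow",               # background color of the box
    hjust = "left", nudge_x = 0.1
  )
p_box
```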
Solution 5:
The four arguments are (from the help for arrow()):
angle: angle of the arrow head
length: length of the arrow head
ends: ends of the line on which to draw the arrow head
type: "open" or "closed", whether the arrow head is an open or a closed triangle
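The exercise also asks for a series of plots. One possible sketch, assuming a single diagonal segment is enough to show each option; arrow_demo() is a hypothetical helper defined here, not a ggplot2 function:

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: draw one segment with the given arrow specification.
arrow_demo <- function(arrow_spec, title) {
  ggplot() +
    annotate(
      "segment",
      x = 0, y = 0, xend = 1, yend = 1,
      arrow = arrow_spec
    ) +
    labs(title = title, x = NULL, y = NULL)
}

a1 <- arrow_demo(arrow(), "Defaults")
a2 <- arrow_demo(arrow(angle = 60), "angle = 60")
a3 <- arrow_demo(arrow(length = unit(0.6, "inches")), "length = 0.6 in")
a4 <- arrow_demo(
  arrow(ends = "both", type = "closed"),
  "ends = both, type = closed"
)

# Arrange the four demos in a 2 x 2 grid with patchwork.
(a1 | a2) / (a3 | a4)
```

Each panel changes one or two arguments relative to the defaults, so the effect of angle, length, ends, and type can be compared side by side.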
Scales
Scales control how the aesthetic mappings manifest visually.
Questions
- Why doesn't the following code override the default scale?
#| fig-show: "hide"
df <- tibble(
  x = rnorm(10000),
  y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
  geom_hex() +
  scale_color_gradient(low = "white", high = "red") +
  coord_fixed()
- What is the first argument to every scale? How does it compare to labs()?
- Change the display of the presidential terms by:
  a. Combining the two variants that customize colors and x axis breaks.
  b. Improving the display of the y axis.
  c. Labelling each term with the name of the president.
  d. Adding informative plot labels.
  e. Placing breaks every 4 years (this is trickier than it seems!).
- First, create the following plot. Then, modify the code using override.aes to make the legend easier to see.
#| fig-show: hide
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut), alpha = 1/20)
Answers
Solution 1:
It does not override the default scale because the colors in geom_hex() are set by the fill aesthetic, not the color aesthetic.
ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_gradient(low = "white", high = "red") +
coord_fixed()
Solution 2:
The first argument to every scale is name, the label for the scale (shown as the axis or legend title). Setting it is equivalent to using the labs() function.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Car type"
)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
scale_x_continuous("Engine displacement (L)") +
scale_y_continuous("Highway fuel economy (mpg)") +
scale_colour_discrete("Car type")
Solution 3:
fouryears <- lubridate::make_date(seq(year(min(presidential$start)),
year(max(presidential$end)),
by = 4
), 1, 1)
presidential %>%
mutate(
id = 33 + row_number(),
name_id = fct_inorder(str_c(name, " (", id, ")"))
) %>%
ggplot(aes(start, name_id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = name_id)) +
scale_colour_manual("Party", values = c(Republican = "red", Democratic = "blue")) +
scale_y_discrete(NULL) +
scale_x_date(NULL,
breaks = presidential$start, date_labels = "'%y",
minor_breaks = fouryears
) +
ggtitle("Terms of US Presidents",
subtitle = "Eisenhower (34th) to Obama (44th)"
) +
theme(
panel.grid.minor = element_blank(),
axis.ticks.y = element_blank()
)
Solution 4:
The problem with the legend is that the low alpha value makes the colors hard to see. Override the alpha value so that the points in the legend are solid.
ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1 / 20) +
theme(legend.position = "bottom") +
guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))
Themes
Themes control all the non-data elements of your plot.
Questions
- Pick a theme offered by the ggthemes package and apply it to the last plot you made.
- Make the axis labels of your plot blue and bolded.
Answers
Solution 1:
ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1 / 20) +
theme_classic() +
theme(legend.position = "bottom") +
guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))
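Note that theme_classic() ships with ggplot2 itself, whereas the exercise asks for a theme from ggthemes. A sketch of the same plot with a ggthemes theme, assuming the package is installed; theme_few() is just one of the themes it offers, and the code falls back to the default theme otherwise:

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = cut), alpha = 1 / 20) +
  guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))

# Assumption: ggthemes is installed; if not, keep the ggplot2 default theme.
if (requireNamespace("ggthemes", quietly = TRUE)) {
  themed_plot <- base_plot + ggthemes::theme_few()
} else {
  themed_plot <- base_plot
}

# Set the legend position last so the complete theme does not override it.
themed_plot <- themed_plot + theme(legend.position = "bottom")
themed_plot
```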
Solution 2:
ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1 / 20) +
theme_classic() +
theme(legend.position = "bottom",
axis.text.x = element_text(face="bold", color="blue",
size=11, angle=45),
axis.text.y = element_text(face="bold", color="blue",
size=11, angle=45)) +
guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))
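The code above styles the tick labels (axis.text). If "axis labels" is read as the axis titles instead, a variant along these lines may be closer to what the exercise intends; this is one interpretation, not a definitive answer:

```r
library(ggplot2)

p_titles <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = cut), alpha = 1 / 20) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    # axis.title styles the axis titles ("carat", "price"), not the tick labels
    axis.title = element_text(face = "bold", colour = "blue")
  ) +
  guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))
p_titles
```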
Layout
The patchwork package allows you to combine separate plots into the same graphic. To place two plots next to each other, you can simply add (+) them to each other.
Questions
- What happens if you omit the parentheses in the following plot layout? Can you explain why this happens?
#| fig-show: hide
p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Plot 2")
p3 <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "Plot 3")
(p1 | p2) / p3
- Using the three plots from the previous exercise, recreate the following patchwork.
[Figure: Three plots. Plot 1 is a scatterplot of highway mileage versus engine size. Plot 2 is side-by-side box plots of highway mileage versus drive train. Plot 3 is side-by-side box plots of city mileage versus drive train. Plot 1 is on the first row. Plots 2 and 3 are on the next row, each spanning half the width of Plot 1. Plot 1 is labelled "Fig. A", Plot 2 is labelled "Fig. B", and Plot 3 is labelled "Fig. C".]
Answers
Solution 1:
p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
labs(title = "Plot 1")
p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
geom_boxplot() +
labs(title = "Plot 2")
p3 <- ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
labs(title = "Plot 3")
# Without the parentheses, / binds more tightly than |, so p1 stands alone
# on the left and p2 is stacked above p3 on the right.
p1 | p2 / p3
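The layout change comes from R's operator precedence rather than from patchwork itself. A small sketch that inspects the parsed expression without evaluating it (the variable names are illustrative, and p1, p2, and p3 do not need to exist for this to run):

```r
# Parse the expression without evaluating it and inspect the call tree.
expr <- quote(p1 | p2 / p3)

# The outermost call is the operator that is applied last.
top_operator <- as.character(expr[[1]])

# Its right-hand operand is itself a call; this is that call's operator.
inner_operator <- as.character(expr[[3]][[1]])

top_operator
inner_operator
```

Here top_operator is "|" and inner_operator is "/", which means R reads the expression as p1 | (p2 / p3).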
Solution 2:
#| fig-width: 7
#| fig-asp: 0.8
p1 / (p2 + p3) +
plot_annotation(
tag_levels = c("A"),
tag_prefix = "Fig. ",
tag_suffix = ":"
)
Reference
Wickham, H., Çetinkaya-Rundel, M. and Grolemund, G. (2023) R for Data Science. 2nd ed. Sebastopol, CA: O’Reilly Media.