Overview
The aim of this tutorial is to demonstrate one way to create a colour palette for figures used throughout a document. It deals with this situation:
- Creating a palette for n categories.
- Creating a gradient of m shades for each category.
The idea here is that, for example, you have experiments using nine different chemicals plus a control condition (ten categories) at five different concentrations (including zero). You want a palette with one colour for each chemical, and you also want to be able to change the shade according to the concentration of each chemical.
By setting up a palette and corresponding gradients, you can create as many figures as you need that all follow a consistent scheme, such as when writing a thesis or dissertation.
Prerequisites: I assume some familiarity with the tidyverse packages ggplot2, dplyr and purrr.
If these are new to you, check out R for Data Science for an introduction, or my BSPR workshop materials.
The tidyverse can be installed by running install.packages("tidyverse").
Creating a vector of colours to use as a custom palette
Colours in R can be specified by name or by HEX code, e.g. #F2EDEE is a shade of grey. Here I use HEX codes.
I want hue is a useful site for creating and choosing colours.
I chose a palette of ten colours and asked for them to be colourblind friendly. I then copied the HEX codes to create a character vector called my_colours.
# Hex codes from 'I want hue'
my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
"#af0043","#bae179","#ec5646","#4f9059","#af3c00")
Plotting my_colours looks like this:
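The code behind the palette figure is not shown, so here is a minimal sketch of one way such a swatch plot could be drawn. The data frame palette_df and its columns position and hex are names chosen for this illustration, not part of the original tutorial.

```r
library(ggplot2)

# Hex codes from 'I want hue' (copied from the vector defined above)
my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
                "#af0043","#bae179","#ec5646","#4f9059","#af3c00")

# One row per colour; 'position' and 'hex' are illustrative column names
palette_df <- data.frame(position = seq_along(my_colours),
                         hex = my_colours,
                         stringsAsFactors = FALSE)

palette_plot <- ggplot(palette_df,
                       aes(x = factor(position), y = 1, fill = hex)) +
  geom_tile(colour = "white") +
  scale_fill_identity() +   # use the HEX codes themselves as the fill colours
  labs(x = "Index in my_colours", y = NULL) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        panel.grid = element_blank())

palette_plot
```

Each tile is labelled with its index, which makes it easier to see which element of my_colours goes with which category.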
Creating a set of gradients from a custom palette
I also want to create gradients for nine of my_colours. The first colour in my_colours will be used for the control category and is the grey base of every gradient, representing zero. For convenience I assign it to another character vector: gradient_base.
I can use my_colours to create a gradient palette by combining the map() function with colorRampPalette() to create a list called my_gradients.
The map() function takes my_colours[2:10] as its first argument. Its second argument is a function I’ve written that takes each of those colours in turn and passes it to colorRampPalette(), along with gradient_base, to make a gradient with five steps.
This returns a list, my_gradients, in which each element is a character vector of five shades. There is one vector for each of the nine colours, running from gradient_base as the lowest value up to the colour itself as the highest.
(The same thing can be achieved with lapply().)
# Load the tidyverse packages
library(tidyverse)
# Lowest colour for all gradients
gradient_base <- my_colours[1]
# Create a list of gradients for each colour 2 to 10 over five steps from
# gradient_base grey (low) to colour (high)
my_gradients <- map(my_colours[2:10],
function(x) colorRampPalette(c(gradient_base,x))(5))
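For readers who prefer base R, the lapply() alternative mentioned above might look like the following sketch. The name my_gradients_base is chosen here only to keep it distinct from the purrr version.

```r
# Hex codes and base colour as defined above
my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
                "#af0043","#bae179","#ec5646","#4f9059","#af3c00")
gradient_base <- my_colours[1]

# lapply() also returns a list, one element per input colour
my_gradients_base <- lapply(my_colours[2:10],
                            function(x) colorRampPalette(c(gradient_base, x))(5))

length(my_gradients_base)    # number of gradients
lengths(my_gradients_base)   # number of shades in each gradient
```

colorRampPalette() returns a function, which is why it is immediately called with (5) to generate the five shades.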
Plotting my nine gradients yields this:
Using a custom palette with some experimental data and ggplot2
To use the palette, we can pass my_colours directly to any ggplot2 plot requiring up to ten colours with scale_fill_manual(values = my_colours).
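One optional refinement, which is an assumption of mine rather than something the tutorial does, is to name the colour vector so that each category keeps the same colour even when a figure shows only some of the categories. The sketch below uses a made-up three-row data frame called toy and hypothetical labels for Drugs 5 to 9.

```r
library(ggplot2)

my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
                "#af0043","#bae179","#ec5646","#4f9059","#af3c00")

# Hypothetical category labels: one control plus nine drugs
categories <- c("Control", paste("Drug", 1:9))
named_colours <- setNames(my_colours, categories)

# Made-up values purely for illustration
toy <- data.frame(drug = c("Control", "Drug 3", "Drug 7"),
                  movements = c(250, 180, 120))

named_plot <- ggplot(toy, aes(drug, movements, fill = drug)) +
  geom_col() +
  scale_fill_manual(values = named_colours) +  # matched by name, not position
  theme_minimal()

named_plot
```

With an unnamed vector, colours are assigned in the order of the factor levels present in the plot, so a subset of categories would otherwise pick up the first few colours instead of their own.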
I’ve created an example data set, worm-sample-data-10-11-2018.csv, loaded as dat. It has control readings (no treatment) and readings for four drugs that were used to treat nematode worms, where movements is a measure of the motion of the worm. The drugs were applied at different micromolar concentrations in different experiments.
Exploring the dataset
Looking at the data, we have 529 observations of 4 variables. Three variables are factors and movements is numeric.
glimpse(dat)
## Observations: 529
## Variables: 4
## $ concentration <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ drug <fct> Control, Control, Control, Control, Control, Con...
## $ experiment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ movements <dbl> 281, 275, 244, 276, 255, 250, 265, 259, 279, 282...
There are eight experiments and different experiments use different drugs. In fact each drug has two experiments associated with it.
dat %>% filter(concentration != 0) %>%
group_by(drug,experiment) %>%
summarise()
## # A tibble: 8 x 2
## # Groups: drug [?]
## drug experiment
## <fct> <fct>
## 1 Drug 1 1
## 2 Drug 1 2
## 3 Drug 2 3
## 4 Drug 2 4
## 5 Drug 3 5
## 6 Drug 3 6
## 7 Drug 4 7
## 8 Drug 4 8
And various concentrations for the different drugs:
dat %>% filter(concentration != 0) %>%
group_by(drug,concentration) %>%
summarise()
## # A tibble: 13 x 2
## # Groups: drug [?]
## drug concentration
## <fct> <fct>
## 1 Drug 1 0.5
## 2 Drug 1 1
## 3 Drug 1 10
## 4 Drug 1 25
## 5 Drug 2 1
## 6 Drug 3 0.5
## 7 Drug 3 1
## 8 Drug 3 0.25
## 9 Drug 3 1.5
## 10 Drug 4 0.5
## 11 Drug 4 1
## 12 Drug 4 3.75
## 13 Drug 4 2
All the drugs appear to have been used at 1 uM concentration in all experiments, but let’s check:
dat %>% filter(concentration == 1) %>%
group_by(experiment,drug,concentration) %>%
summarise()
## # A tibble: 8 x 3
## # Groups: experiment, drug [?]
## experiment drug concentration
## <fct> <fct> <fct>
## 1 1 Drug 1 1
## 2 2 Drug 1 1
## 3 3 Drug 2 1
## 4 4 Drug 2 1
## 5 5 Drug 3 1
## 6 6 Drug 3 1
## 7 7 Drug 4 1
## 8 8 Drug 4 1
Plot the 1 uM treatments
Having seen that all the experiments have a 1 uM treatment, I’ll use my palette to plot the comparison.
Here I filter dat for the control rows and the 1 uM dose rows, then I group the output by drug and concentration to pool the experiments. This is passed to ggplot() to create a dynamite plot using stat_summary().
The colour of the bars is mapped with the fill aesthetic, and the colours themselves are supplied by scale_fill_manual(values = my_colours).
Note: I wouldn’t generally recommend dynamite plots, as they obscure the data by hiding the individual data points. However, they are commonly used in biology, hence I provide that example here.
dat %>% filter(concentration == 0 | concentration == 1) %>%
group_by(drug,concentration) %>%
ggplot(aes(drug,movements, fill = drug)) +
stat_summary(geom = "bar", fun = "mean") + # fun replaces fun.y, deprecated since ggplot2 3.3.0
stat_summary(geom = "errorbar", fun.data = "mean_se", width = .2) +
scale_fill_manual(values = my_colours) +
ggtitle("1 uM treatment comparison") +
xlab("") +
theme_minimal()
So we can see that Drug 1 appears to have the greatest effect on the motion of the worms.
Using custom colour gradients
In the next example, I plot Drug 1 to explore the effect of different concentrations of a single drug on the motion of the worms.
This time I filter for the Control and Drug 1 rows and then plot movements against concentration, also assigning concentration to the fill aesthetic.
scale_fill_manual() this time uses my_gradients[[1]], which corresponds to the first colour gradient in the list I made earlier. Remember that a single element of a list is extracted with [[ ]] indexing.
Note: I used the first colour in my_colours for the Control condition and also used the same colour for the lowest value in each gradient. This means that the Control condition will always be the same grey colour. This is why I didn’t make a gradient for my_colours[1]: I don’t need it, and it’s easier to remember that my_gradients[[1]] goes with Drug 1, even though Drug 1 corresponds to my_colours[2].
dat %>% filter(drug == "Control" | drug == "Drug 1") %>%
ggplot(aes(concentration,movements, fill = concentration)) +
stat_summary(geom = "bar", fun = "mean") +
stat_summary(geom = "errorbar", fun.data = "mean_se", width = .2) +
scale_fill_manual(values = my_gradients[[1]]) + # Drug 1 is gradient 1
ggtitle("Drug 1 treatment comparison") +
theme_minimal()
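If remembering that gradient 1 goes with Drug 1 and colour 2 feels fragile, one option is to name the list elements and look gradients up by drug. This is a sketch of that idea and not part of the original workflow. lapply() stands in for map() so that the block has no package dependencies, and the labels Drug 5 to Drug 9 are hypothetical.

```r
my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
                "#af0043","#bae179","#ec5646","#4f9059","#af3c00")
gradient_base <- my_colours[1]

# Same gradients as before, built with base R
my_gradients <- lapply(my_colours[2:10],
                       function(x) colorRampPalette(c(gradient_base, x))(5))

# Name each gradient after the drug it belongs to (labels are assumptions)
names(my_gradients) <- paste("Drug", 1:9)

# Look up by name instead of by position
my_gradients[["Drug 1"]]

# The darkest shade of the Drug 1 gradient should match my_colours[2],
# apart from letter case (colorRampPalette() returns upper-case HEX codes)
toupper(my_colours[2]) == my_gradients[["Drug 1"]][5]
```

Positional indexing such as my_gradients[[1]] still works after naming, so existing plotting code is unaffected.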
Combining multiple plots with gradients
There may be a neater way to do this, but my solution was to use the cowplot package.
First I create a plot as above for each of the drugs of interest, remembering which of my_gradients to use for each drug, and then combine them using plot_grid() from the cowplot package.
# Load cowplot for multi-panel figures
library(cowplot)
# Assign Drug 1 plot to p1
p1 <- dat %>% filter(drug == "Control" | drug == "Drug 1") %>%
ggplot(aes(concentration,movements, fill = concentration)) +
stat_summary(geom = "bar", fun = "mean") +
stat_summary(geom = "errorbar", fun.data = "mean_se", width = .2) +
scale_fill_manual(values = my_gradients[[1]]) + # Drug 1 is gradient 1
ggtitle("Drug 1 treatment comparison") +
theme_minimal()
# Assign Drug 3 plot to p3
p3 <- dat %>% filter(drug == "Control" | drug == "Drug 3") %>%
group_by(experiment) %>%
ggplot(aes(concentration,movements, fill = concentration)) +
stat_summary(geom = "bar", fun = "mean", position = "dodge") +
stat_summary(geom = "errorbar", fun.data = "mean_se", width = .2) +
scale_fill_manual(values = my_gradients[[3]]) + # Drug 3 is gradient 3
ggtitle("Drug 3 treatment comparison") +
theme_minimal()
# Use plot_grid to plot both drugs together
plot_grid(p1,p3)
This probably isn’t a very useful comparison as the range of concentrations is different for the two drugs, but gives an idea of what is possible.
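As one possible neater way, the repeated plotting code could be wrapped in a small function and the panels built in a loop. The sketch below is only an illustration under stated assumptions: plot_drug() is a hypothetical helper, toy is randomly generated stand-in data and not the worm data set, and fun requires ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(cowplot)

my_colours <- c("#f2edee","#8a79f4","#f0d359","#ff70c3","#53ecc0",
                "#af0043","#bae179","#ec5646","#4f9059","#af3c00")
gradient_base <- my_colours[1]
my_gradients <- lapply(my_colours[2:10],
                       function(x) colorRampPalette(c(gradient_base, x))(5))

# Made-up stand-in for dat: a control plus two drugs at three doses each
set.seed(42)
toy <- data.frame(
  drug = rep(c("Control", "Drug 1", "Drug 3"), each = 30),
  concentration = factor(c(rep(0, 30),
                           rep(c(0.5, 1, 10), each = 10),
                           rep(c(0.25, 0.5, 1), each = 10))),
  movements = rnorm(90, mean = 200, sd = 30)
)

# Hypothetical helper: plot the control and one drug with a given gradient
plot_drug <- function(data, drug_name, gradient) {
  # Keep the control and the chosen drug, dropping unused concentration levels
  subset_df <- droplevels(data[data$drug %in% c("Control", drug_name), ])
  ggplot(subset_df, aes(concentration, movements, fill = concentration)) +
    stat_summary(geom = "bar", fun = "mean") +
    stat_summary(geom = "errorbar", fun.data = "mean_se", width = .2) +
    scale_fill_manual(values = gradient) +
    ggtitle(paste(drug_name, "treatment comparison")) +
    theme_minimal()
}

# Drug number i uses my_gradients[[i]]
drug_index <- c(1, 3)
plots <- lapply(drug_index,
                function(i) plot_drug(toy, paste("Drug", i), my_gradients[[i]]))

# plot_grid() accepts a list of plots through its plotlist argument
combined <- plot_grid(plotlist = plots)
combined
```

Adding another drug then only requires extending drug_index, and the pairing of drug and gradient is written down once.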