## Overview

The aim of this tutorial is to demonstrate one way to create a colour palette for use in figures throughout a document. It covers two steps:

1. Creating a palette for n categories.
2. Creating a gradient of m shades for each category.

Suppose, for example, that you have experiments using nine different chemicals plus a control condition (ten categories) at five different concentrations (including zero). You want a distinct colour for each chemical, and you also want to change the shade according to the concentration of each chemical.

By setting up a palette and corresponding gradients, you can create as many figures as you need with a consistent colour scheme, for example when writing a thesis or dissertation.

Prerequisites: I assume some familiarity with the tidyverse packages ggplot2, dplyr and purrr.

If these are new to you, check out R for Data Science for an introduction, or my BSPR workshop materials.

The tidyverse can be installed by running install.packages("tidyverse").

## Creating a vector of colours to use as a custom palette

Colours in R can be specified by name or by hex code; for example, #F2EDEE is a very light grey. Here I use hex codes.

'I want hue' is a useful site for generating and choosing colour palettes.

I chose a palette of ten colours and asked for them to be colourblind friendly. I then copied the HEX codes to create a character vector called my_colours.

```r
# Hex codes from 'I want hue'
my_colours <- c("#f2edee", "#8a79f4", "#f0d359", "#ff70c3", "#53ecc0",
                "#af0043", "#bae179", "#ec5646", "#4f9059", "#af3c00")
```

Plotting my_colours looks like this:
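The original figure isn't reproduced here. The following is a sketch of one way to draw a similar row of swatches. It assumes ggplot2 and uses geom_tile() with scale_fill_identity(), so each tile is filled with its own hex code. The names swatches and swatch_plot are illustrative.

```r
library(ggplot2)

# Hex codes from 'I want hue' (as above)
my_colours <- c("#f2edee", "#8a79f4", "#f0d359", "#ff70c3", "#53ecc0",
                "#af0043", "#bae179", "#ec5646", "#4f9059", "#af3c00")

# One row per colour: its position in the vector and its hex code
swatches <- data.frame(position = seq_along(my_colours),
                       colour = my_colours,
                       stringsAsFactors = FALSE)

# scale_fill_identity() uses the hex codes in 'colour' as the fill values
swatch_plot <- ggplot(swatches, aes(x = position, y = 1, fill = colour)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = colour), angle = 90, size = 3) +
  scale_fill_identity() +
  theme_void()
```

Printing swatch_plot draws the ten swatches, each labelled with its hex code.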

## Creating a set of gradients from a custom palette

I also want to create gradients for nine of my_colours. The first colour in my_colours will be used for the control category, and it is also the grey base of every gradient, representing zero. For convenience I assign it to another character vector: gradient_base.

I can then use the remaining colours to create a list of gradient palettes, called my_gradients, by combining purrr's map() function with colorRampPalette().

map() takes my_colours[2:10] as its first argument and passes each colour in turn to an anonymous function. That function calls colorRampPalette() with gradient_base and the colour, which returns a new function, and then calls that function with 5 to make a gradient of five shades.

The result is a list of nine character vectors, one for each of my_colours[2:10]. Each vector holds five shades, running from gradient_base (lowest) up to the corresponding colour in my_colours (highest).

(The same thing can be achieved with lapply().)

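```r
# Load the tidyverse packages (includes purrr for map())
library(tidyverse)

# Lowest colour for all gradients: the grey used for the control
gradient_base <- my_colours[1]

# Create a list of gradients for each colour 2 to 10 over five steps from
# gradient_base grey (low) to colour (high)
my_gradients <- map(my_colours[2:10],
                    function(x) colorRampPalette(c(gradient_base, x))(5))
```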
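To see what this produces, and to confirm that the lapply() version gives the same list, here is a small self-contained check. It assumes only base R plus purrr. colorRampPalette() interpolates linearly in RGB space between its endpoints, and R writes the resulting hex codes in upper case.

```r
library(purrr)

my_colours <- c("#f2edee", "#8a79f4", "#f0d359", "#ff70c3", "#53ecc0",
                "#af0043", "#bae179", "#ec5646", "#4f9059", "#af3c00")
gradient_base <- my_colours[1]

# purrr version: one five-step gradient per colour 2 to 10
my_gradients <- map(my_colours[2:10],
                    function(x) colorRampPalette(c(gradient_base, x))(5))

# Base R equivalent with lapply()
my_gradients_lapply <- lapply(my_colours[2:10],
                              function(x) colorRampPalette(c(gradient_base, x))(5))

identical(my_gradients, my_gradients_lapply)  # TRUE

# Nine gradients, each with five shades
length(my_gradients)
lengths(my_gradients)

# Every gradient starts at the same control grey (upper-case hex)
all(vapply(my_gradients, function(g) g[1], character(1)) ==
      toupper(gradient_base))  # TRUE
```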

Plotting my nine gradients yields this:
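As with the single palette, the original figure isn't shown here. The following is a sketch of one way to draw all nine gradients, with one row per element of my_gradients. It assumes ggplot2 and purrr. The names gradient_tiles and gradient_plot are illustrative.

```r
library(ggplot2)
library(purrr)

my_colours <- c("#f2edee", "#8a79f4", "#f0d359", "#ff70c3", "#53ecc0",
                "#af0043", "#bae179", "#ec5646", "#4f9059", "#af3c00")
gradient_base <- my_colours[1]
my_gradients <- map(my_colours[2:10],
                    function(x) colorRampPalette(c(gradient_base, x))(5))

# Long format: one row per shade, recording which gradient and which step
gradient_tiles <- data.frame(
  gradient = rep(seq_along(my_gradients), each = 5),
  step = rep(1:5, times = length(my_gradients)),
  colour = unlist(my_gradients),
  stringsAsFactors = FALSE
)

# Each tile is filled with its own hex code
gradient_plot <- ggplot(gradient_tiles,
                        aes(x = step, y = factor(gradient), fill = colour)) +
  geom_tile(colour = "white") +
  scale_fill_identity() +
  labs(x = "Step (low to high)", y = "my_gradients[[i]]") +
  theme_minimal()
```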

## Using a custom palette with some experimental data and ggplot2

To use the palette in any ggplot2 plot that needs up to ten colours, pass my_colours directly to scale_fill_manual(values = my_colours), or to scale_colour_manual() for the colour aesthetic.

I've created an example data set, worm-sample-data-10-11-2018.csv, loaded as dat. It contains control readings (no treatment) and readings for four drugs used to treat nematode worms, where movements is a measure of the motion of the worm. The drugs were applied at different micromolar concentrations in different experiments.
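The import code isn't shown in the original. The following is a minimal sketch. It assumes the CSV has one column per variable, named as in the glimpse() output below, and uses readr to read the three grouping columns as factors. load_worm_data() is a hypothetical helper name.

```r
library(readr)

# Hypothetical helper: read the worm data with concentration, drug and
# experiment as factors and movements as a number.
# col_factor() with no levels keeps levels in order of first appearance.
load_worm_data <- function(path) {
  read_csv(path, col_types = cols(
    concentration = col_factor(),
    drug = col_factor(),
    experiment = col_factor(),
    movements = col_double()
  ))
}

# Usage (assumes the file is in the working directory):
# dat <- load_worm_data("worm-sample-data-10-11-2018.csv")
```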

### Exploring the dataset

The data have 529 observations of 4 variables. Three variables are factors and movements is numeric.

```r
glimpse(dat)
## Observations: 529
## Variables: 4
## $ concentration <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ drug          <fct> Control, Control, Control, Control, Control, Con...
## $ experiment    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ movements     <dbl> 281, 275, 244, 276, 255, 250, 265, 259, 279, 282...
```

There are eight experiments, and different experiments use different drugs: each drug was tested in two experiments.

```r
dat %>% filter(concentration != 0) %>%
  group_by(drug, experiment) %>%
  summarise()
## # A tibble: 8 x 2
## # Groups:   drug [?]
##   drug   experiment
##   <fct>  <fct>
## 1 Drug 1 1
## 2 Drug 1 2
## 3 Drug 2 3
## 4 Drug 2 4
## 5 Drug 3 5
## 6 Drug 3 6
## 7 Drug 4 7
## 8 Drug 4 8
```
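Calling summarise() with no arguments just returns the distinct groups. A more explicit alternative uses dplyr's count(), which returns the same drug and experiment pairs plus the number of readings in each. The following sketch assumes dplyr and a data frame shaped like dat. experiments_per_drug() is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: one row per drug/experiment pair among the treated
# (non-zero concentration) rows, with n = number of readings in that pair
experiments_per_drug <- function(data) {
  data %>%
    filter(concentration != 0) %>%
    count(drug, experiment)
}

# Usage: experiments_per_drug(dat)
```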

The drugs were also used at various concentrations:

```r
dat %>% filter(concentration != 0) %>%
  group_by(drug, concentration) %>%
  summarise()
## # A tibble: 13 x 2
## # Groups:   drug [?]
##    drug   concentration
##    <fct>  <fct>
##  1 Drug 1 0.5
##  2 Drug 1 1
##  3 Drug 1 10
##  4 Drug 1 25
##  5 Drug 2 1
##  6 Drug 3 0.5
##  7 Drug 3 1
##  8 Drug 3 0.25
##  9 Drug 3 1.5
## 10 Drug 4 0.5
## 11 Drug 4 1
## 12 Drug 4 3.75
## 13 Drug 4 2
```
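The order above suggests that the concentration levels are stored in order of first appearance rather than numeric order: 0.25 comes after 1 for Drug 3, and 2 comes after 3.75 for Drug 4. This matters for the gradients, because scale_fill_manual() assigns colours in level order, so the palest shade might not go with the lowest dose. The following sketch assumes forcats (part of the tidyverse) and reorders the levels by numeric value with fct_inseq(). The level vector is copied from the output above rather than read from the data.

```r
library(forcats)

# Concentration levels in the order they appear in the summary above
# (assumed to match levels(dat$concentration))
observed_levels <- c("0", "0.5", "1", "10", "25", "0.25", "1.5", "3.75", "2")
concentration <- factor(observed_levels, levels = observed_levels)

# fct_inseq() reorders levels by their numeric value; labels are unchanged
concentration_sorted <- fct_inseq(concentration)
levels(concentration_sorted)
# "0" "0.25" "0.5" "1" "1.5" "2" "3.75" "10" "25"
```

Applied to the real data, this would be dat <- dat %>% mutate(concentration = fct_inseq(concentration)), run before plotting.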

It appears that each drug was used at 1 uM in every experiment, but let's check:

```r
dat %>% filter(concentration == 1) %>%
  group_by(experiment, drug, concentration) %>%
  summarise()
## # A tibble: 8 x 3
## # Groups:   experiment, drug [?]
##   experiment drug   concentration
##   <fct>      <fct>  <fct>
## 1 1          Drug 1 1
## 2 2          Drug 1 1
## 3 3          Drug 2 1
## 4 4          Drug 2 1
## 5 5          Drug 3 1
## 6 6          Drug 3 1
## 7 7          Drug 4 1
## 8 8          Drug 4 1
```

### Plotting the 1 uM treatments

Having seen that all the experiments have a 1 uM treatment, I’ll use my palette to plot the comparison.

Here I filter dat for the control rows and the 1 uM rows, then pass the result to ggplot() to create a dynamite plot with stat_summary(). Because drug is on the x axis, stat_summary() computes the mean and standard error for each drug, pooling that drug's experiments.

The colours of the bars come from the fill aesthetic, whose values are set by scale_fill_manual(values = my_colours).

Note: I wouldn't generally recommend dynamite plots, as they obscure the data by hiding the individual data points. However, they are commonly used in biology, hence this example.

```r
dat %>% filter(concentration == 0 | concentration == 1) %>%
  ggplot(aes(drug, movements, fill = drug)) +
  # Bars show the mean (ggplot2 >= 3.3.0 uses fun; fun.y is deprecated)
  stat_summary(geom = "bar", fun = mean) +
  # Error bars show the mean +/- standard error
  stat_summary(geom = "errorbar", fun.data = mean_se, width = .2) +
  scale_fill_manual(values = my_colours) +
  ggtitle("1 uM treatment comparison") +
  xlab("") +
  theme_minimal()
```

So we can see that Drug 1 appears to have the greatest effect on the motion of the worms.
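Because of the note above about dynamite plots, here is a sketch of the same comparison with the individual readings visible, using the same palette. It assumes ggplot2, dplyr, and a data frame shaped like dat. plot_1uM_points() is a hypothetical helper. The points use shape 21 so that they take the fill palette and also get a dark outline, which keeps the pale control colour visible on a white background.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: individual readings plus mean +/- SE for the
# control and 1 uM rows, filled with a custom palette
plot_1uM_points <- function(data, colours) {
  data %>%
    filter(concentration == 0 | concentration == 1) %>%
    ggplot(aes(drug, movements, fill = drug)) +
    # Jittered raw readings; shape 21 has both a fill and an outline
    geom_jitter(shape = 21, colour = "grey30", width = 0.2, height = 0) +
    # Mean +/- standard error drawn on top of the points
    stat_summary(fun.data = mean_se, geom = "pointrange") +
    scale_fill_manual(values = colours) +
    ggtitle("1 uM treatment comparison") +
    xlab("") +
    theme_minimal()
}

# Usage: plot_1uM_points(dat, my_colours)
```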

In the next example, I plot Drug 1 to explore the effect of different concentrations of a single drug on the motion of the worms.

This time I filter for the Control and Drug 1 rows, then plot movements against concentration, also assigning concentration to the fill aesthetic.

This time scale_fill_manual() uses my_gradients[[1]], the first colour gradient in the list I made earlier. Remember that [[ ]] extracts a single element from a list.

Note: I used the first colour in my_colours for the Control condition and also as the lowest value in each gradient. This means the Control condition is always the same grey. It is also why I didn't make a gradient for my_colours[1]: I don't need one, and it's easier to remember that my_gradients[[1]] goes with Drug 1, even though Drug 1 corresponds to my_colours[2].

```r
dat %>% filter(drug == "Control" | drug == "Drug 1") %>%
  ggplot(aes(concentration, movements, fill = concentration)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = .2) +
  # Shade the bars with the Drug 1 gradient (grey control to full colour)
  scale_fill_manual(values = my_gradients[[1]]) +
  ggtitle("Drug 1 treatment comparison") +
  theme_minimal()
```
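One way to avoid the off-by-one between my_colours and my_gradients, which the note above works around by convention, is to name the gradients after the drugs and look them up by name. The following sketch assumes the drugs are labelled "Drug 1", "Drug 2" and so on, as in dat. The names for Drug 5 to Drug 9 are placeholders, since the example data only has four drugs.

```r
library(purrr)

my_colours <- c("#f2edee", "#8a79f4", "#f0d359", "#ff70c3", "#53ecc0",
                "#af0043", "#bae179", "#ec5646", "#4f9059", "#af3c00")
gradient_base <- my_colours[1]
my_gradients <- map(my_colours[2:10],
                    function(x) colorRampPalette(c(gradient_base, x))(5))

# Name each gradient after the drug it belongs to (assumed labels)
names(my_gradients) <- paste("Drug", seq_along(my_gradients))

# Look up by name instead of by position; positional [[1]] still works
drug_1_shades <- my_gradients[["Drug 1"]]
```

With names in place, scale_fill_manual(values = my_gradients[["Drug 1"]]) reads the same as the drug being plotted.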

## Combining multiple plots with gradients

There may be a neater way to do this, but my solution was to use the cowplot package.

First I create a plot as above for each drug of interest, using the matching element of my_gradients for each drug, and then combine the plots with plot_grid() from the cowplot package.

```r
# Load cowplot for multi-panel figures
library(cowplot)

# Assign Drug 1 plot to p1, shaded with my_gradients[[1]]
p1 <- dat %>% filter(drug == "Control" | drug == "Drug 1") %>%
  ggplot(aes(concentration, movements, fill = concentration)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = .2) +
  scale_fill_manual(values = my_gradients[[1]]) +
  ggtitle("Drug 1 treatment comparison") +
  theme_minimal()

# Assign Drug 3 plot to p3, shaded with my_gradients[[3]]
p3 <- dat %>% filter(drug == "Control" | drug == "Drug 3") %>%
  ggplot(aes(concentration, movements, fill = concentration)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = .2) +
  scale_fill_manual(values = my_gradients[[3]]) +
  ggtitle("Drug 3 treatment comparison") +
  theme_minimal()

# Combine the two plots side by side
plot_grid(p1, p3)
```
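A possibly neater variant is to wrap the repeated plotting code in a function and build the panels with purrr. Facets don't help here, because all facets share a single fill scale, whereas each panel needs its own gradient. The following sketch assumes dat, my_gradients and cowplot as above, and indexes my_gradients by drug number as in the text. plot_drug() and plot_drugs() are hypothetical helpers.

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(cowplot)

# Hypothetical helper: bar plot of one drug against the control,
# shaded with that drug's gradient
plot_drug <- function(data, drug_name, shades) {
  data %>%
    filter(drug == "Control" | drug == drug_name) %>%
    ggplot(aes(concentration, movements, fill = concentration)) +
    stat_summary(geom = "bar", fun = mean) +
    stat_summary(geom = "errorbar", fun.data = mean_se, width = .2) +
    scale_fill_manual(values = shades) +
    ggtitle(paste(drug_name, "treatment comparison")) +
    theme_minimal()
}

# Hypothetical wrapper: one panel per drug number, combined with plot_grid()
plot_drugs <- function(data, gradients, drug_numbers) {
  panels <- map(drug_numbers, function(i) {
    plot_drug(data, paste("Drug", i), gradients[[i]])
  })
  plot_grid(plotlist = panels)
}

# Usage: plot_drugs(dat, my_gradients, c(1, 3))
```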