# Data Science Workshop

*British Society for Proteome Research Meeting 2018*

*Alistair Bailey*

*July 16 2018*

# Overview

This book covers:

- An introduction to R and RStudio
- An introduction to tidyverse and base R
- Importing and transforming proteomics data
- Visualisation of proteomics analysis

The analysis uses an example data set of observations for 7702 proteins from cells in three control experiments and three treatment experiments. The observations are signal intensity measurements from the mass spectrometer. These intensities relate to the concentration of each protein observed in each experiment and under each condition. The analysis transforms the data to examine the effect of treatment on the cellular proteome and visualises the output using a volcano plot, a heatmap, a Venn diagram and peptide sequence logos. The data set is available to download as a csv file.

## Requirements

An up-to-date version of R (R Core Team 2018) and RStudio (RStudio Team 2018).

If you are new to R, the first thing to know is that R is a programming language and RStudio is a program for working with R, known as an integrated development environment (IDE). You can use R without RStudio, but not the other way around. Further details are given in Chapter 1.1.

R can be downloaded from https://www.R-project.org/ and RStudio Desktop from http://www.rstudio.com/.

These materials were generated using R version 3.5.0.
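To see which version of R you are running, and whether it is at least as recent as the one used here, something like the following can be typed into the console. This is a small sketch rather than part of the original materials; it relies only on base R's `R.version.string` and `getRversion()`.

```r
# Print the version of R running in this session
print(R.version.string)

# getRversion() returns a package_version object, which compares
# correctly against a version string such as "3.5.0"
if (getRversion() < "3.5.0") {
  message("This R is older than 3.5.0; consider updating before the workshop.")
} else {
  message("R version is 3.5.0 or later.")
}
```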

Once you’ve installed R and RStudio, you’ll also need a few R packages. Packages are collections of functions.

Open RStudio, type the code below into the `Console` window and press `Enter` to install these seven packages.

```
install.packages(c("plyr","tidyverse","gplots","pheatmap",
                   "gridExtra","VennDiagram","ggseqlogo"))
```
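A quick way to confirm that the installation worked is sketched below. It is an addition to the original materials, not part of the workshop code: it asks base R's `requireNamespace()` whether each of the seven packages can be loaded, and reports any that cannot.

```r
# The seven packages installed above
pkgs <- c("plyr", "tidyverse", "gplots", "pheatmap",
          "gridExtra", "VennDiagram", "ggseqlogo")

# requireNamespace() returns TRUE if a package can be loaded
# (without attaching it), and FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that failed to install
missing <- pkgs[!installed]
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
} else {
  message("All packages installed.")
}

# To use a package's functions, attach it in each new session, e.g.
# library(tidyverse)
```

Installing a package only needs to be done once, but a package has to be attached with `library()` in each new R session before its functions can be called by name.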

### References

R Core Team. 2018. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

RStudio Team. 2018. *RStudio: Integrated Development Environment for R*. Boston, MA: RStudio, Inc. http://www.rstudio.com/.