Machine Learning for MHC I peptide classification

Using the R interface to Keras / TensorFlow

Alistair Bailey

1st May 2018

Overview

This notebook is based upon a blog post by Leon Eyrich Jessen: https://tensorflow.rstudio.com/blog/dl-for-cancer-immunotherapy.html

Leon’s blog shows how to build a deep learning model that classifies the strength of binding of different peptides to a single MHC I allotype (HLA-A*02:01).

Machine learning in general attempts to find transformations of input data into more useful representations of that data to solve a problem.

Deep learning does this through successive layers of data transformations. The "depth" refers to the number of layers, not to the extent of insight gained.
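The idea of successive transformations can be sketched in a few lines. This is a minimal illustration in plain Python, not the Keras model built for this notebook: the function names `dense_relu` and `forward` are hypothetical, and the weights are made-up numbers rather than learned values.

```python
# Illustrative only: the weights below are made-up numbers, not a trained model.

def dense_relu(inputs, weights, biases):
    """One layer: a weighted sum of the inputs per unit, then ReLU (negatives become 0)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs


def forward(inputs, layers):
    """Apply each layer in turn; the 'depth' is simply len(layers)."""
    representation = inputs
    for weights, biases in layers:
        representation = dense_relu(representation, weights, biases)
    return representation


# Each layer is a (weights, biases) pair; weights hold one row per output unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [-1.0]),                   # layer 2: 2 inputs -> 1 unit
]

result = forward([2.0, 1.0], layers)
depth = len(layers)
```

Each layer receives the previous layer's output as its input, so the data is re-represented once per layer. Training a real model amounts to finding weights that make the final representation useful for the problem at hand.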

This layered approach is sometimes referred to as a neural network; however, as François Chollet, one of the authors of Deep Learning with R, tweeted:

Here I present a variation on Leon’s theme after I was kindly invited to present at the Computational Biology Club Meeting at the University of Southampton on April 25th 2018.

I took the invitation as an opportunity to learn a bit more about how peptide prediction tools such as NetMHC work.

What follows is a toy model that attempts to classify peptides from some of my experiments according to one of five MHC class I allotypes present in the cells I use.

The repository with all the code and datasets is here: https://github.com/ab604/mhc_tensorflow

Some background

MHC I allotypes (allo means other) are the proteins expressed by the most diverse genes in the human genome. We each carry up to six MHC I gene variants, or alleles, depending upon those carried by our parents. Almost all our cells therefore express up to six MHC I allotype proteins: two A, two B and two C allotypes.

MHC I proteins have evolved to sample fragments of other proteins from the complement of proteins inside our cells and present them to our immune system at the cell surface. These small fragments are called peptides, and the peptides that MHC I selects are predominantly 9 amino acids long.
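To make the 9 amino acid length concrete, here is a small sketch that lists every contiguous 9-residue fragment of a protein sequence. It is an assumption-laden illustration: the function name `candidate_peptides` and the sequence are made up, and real antigen processing is selective, so a cell does not present every possible window. Enumerating windows like this is only a simple way to generate candidate peptides for a classifier to score.

```python
def candidate_peptides(protein, length=9):
    """Return every contiguous fragment of `length` residues, in sequence order.

    Returns an empty list when the protein is shorter than `length`.
    """
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Made-up 16-residue sequence in one-letter amino acid code, for illustration only.
protein = "MKTAYIAKQRQISFVK"
peptides = candidate_peptides(protein)
```

A protein of n residues yields n - 9 + 1 overlapping 9-mers, so the 16-residue example above gives 8 candidates.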

Pamela Bjorkman solved the first MHC I structure in 1987; the bound peptide is shown in red in the picture below.