January 16 2020

What is R? What is RStudio?

  • The term R is used to refer to both the programming language and the software that interprets the scripts written using it.
  • RStudio is an integrated development environment (IDE) which, as this definition suggests, assists us in using R.

Understanding R

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

— John Chambers
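Both slogans can be checked at the console. The lines below are a minimal sketch using only base R; the vector name x is illustrative.

```r
# Operators are ordinary functions that can be called by name
`+`(1, 2)          # the same call as 1 + 2

# Subsetting is a function call too
x <- c(10, 20, 30)
`[`(x, 2)          # the same call as x[2]

# Functions are themselves objects, so they can be inspected like any other
class(sum)
is.function(`+`)
```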

Reproducible science

“Reproducibility involves being able to recalculate the exact numbers in a data analysis using the code and raw data provided by the analyst…Reproducibility should not be confused with “correctness” of a data analysis. A data analysis can be fully reproducible and recreate all numbers in an analysis and still be misleading or incorrect.”

Jeff Leek, The Elements of Data Analytic Style

Reproducible R

Don’t save your workspace, save your code. Consider your scripts/notebooks as real, not the objects in your environment.

Where does your data live? Absolute path

Where does your data live? Relative path

Where does your data live? Home directory path

Where does your data live? Working directory

The working directory is where R looks for the files that you ask it to load. Relative paths are resolved against it.
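The four kinds of path can be compared side by side in base R. This is a sketch only: the folder and file names ("data", "raw", "survey.csv") are hypothetical and the file does not need to exist for the code to run.

```r
# Working directory: relative paths are resolved against this folder
wd <- getwd()
wd

# Relative path: built from components, so it works on any machine
# that has the same project layout (names here are hypothetical)
rel_path <- file.path("data", "raw", "survey.csv")
rel_path

# Absolute path: the same file spelled out from the top of the file system
abs_path <- file.path(wd, rel_path)
abs_path

# Home directory path: "~" is shorthand that path.expand() spells out
path.expand("~")
```

Relative paths are usually the better choice for shared code, because an absolute path only works on the machine where it was written.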

Reproducible R

R Projects keep all the files associated with a project together — input data, R scripts, analytical results, figures.

Naming things

  1. Machine readable (no white space, punctuation, upper AND lower-case…)
  2. Human readable (makes sense in 6 months or 2 years time)
  3. Plays well with default ordering (numerical or date order)
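The third rule is easy to see with sort(). The sketch below uses hypothetical script names; zero-padding the numeric prefix is one common way, assumed here, of making default ordering match the intended order.

```r
# Unpadded prefixes: "10_plot.R" sorts ahead of "2_clean.R"
unpadded <- c("1_import.R", "2_clean.R", "10_plot.R")
sort(unpadded)

# Zero-padded prefixes, built with sprintf(), sort in the intended order
padded <- sprintf("%02d_%s.R", c(1L, 2L, 10L), c("import", "clean", "plot"))
sort(padded)
```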

Getting help

  • Google: ‘Typically adding “R” to a query is enough to restrict it to relevant results’. Hadley Wickham
  • Check out the help pages using ?function_name e.g. ?mean
  • Join RStudio Community
  • Learn how to make a reproducible example
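A reproducible example is a small script that someone else can paste into a fresh R session and run. The sketch below is one possible shape, using made-up data and base R only: it creates the data, shows how to share it with dput(), and then shows the behaviour being asked about.

```r
# 1. A tiny data set with made-up values that shows the problem
scores <- data.frame(id = 1:3, score = c(10, NA, 30))

# 2. dput() prints R code that rebuilds the object exactly,
#    which can be pasted into a question
dput(scores)

# 3. The behaviour in question: a missing value makes mean() return NA
mean(scores$score)

# ...unless missing values are removed first
mean(scores$score, na.rm = TRUE)
```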

Assigning objects

Names are labels bound to objects.
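A short sketch of what binding means in practice, with illustrative names x and y: assignment attaches a name to a value, and changing the value through one name leaves the other name's value untouched.

```r
# Bind the name x to a numeric vector
x <- c(1, 2, 3)

# Bind a second name, y, to the same values
y <- x

# Changing y gives y a modified copy; x still points at the original values
y[1] <- 100

x
y
```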

Atomic vectors

One-dimensional groups, the key building blocks of R objects

Indexing atomic vectors

# Make a character vector using the combine function, c()
cards <- c("ace", "king", "queen", "jack", "ten")

# Return the values of cards
cards
## [1] "ace"   "king"  "queen" "jack"  "ten"
# Return the third value of cards
# Indexing starts at 1
cards[3]
## [1] "queen"

Type of atomic vectors

# Make a character vector using the combine function, c()
cards <- c("ace", "king", "queen", "jack", "ten")
# Make an integer vector using seq()
my_sequence <- seq(1, 7)

# Check the type of each vector
typeof(cards)
## [1] "character"
typeof(my_sequence)
## [1] "integer"

Data frames

Two-dimensional versions of lists in which each atomic vector (column) is the same length.
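A minimal sketch of that definition, reusing the cards vector from earlier. The value column is a hypothetical set of point values added for illustration.

```r
# Two atomic vectors of the same length
cards <- c("ace", "king", "queen", "jack", "ten")
value <- c(14, 13, 12, 11, 10)   # hypothetical point values

# Combine them column-wise into a data frame
deck <- data.frame(face = cards, value = value)
deck

# Under the hood a data frame is a list of columns
typeof(deck)
dim(deck)

# Index by row and column, or pull out one column by name
deck[3, "face"]
deck$value
```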

Tidy data

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each cell contains a single value
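The three rules can be illustrated with the same made-up population figures stored two ways. This is a base R sketch; the country labels and numbers are invented.

```r
# Untidy: the variable "year" is hidden in the column names
untidy <- data.frame(
  country  = c("A", "B"),
  pop_2002 = c(100, 50),
  pop_2007 = c(120, 60)
)

# Tidy: country, year and pop are each a column,
# and each country-year observation is a row
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(100, 120, 50, 60)
)
tidy
```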

dplyr::filter()
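filter() keeps the rows that satisfy a condition. A small sketch, assuming dplyr is installed and using a toy data frame with made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(100, 120, 50, 60),
  gdpPercap = c(10, 12, 20, 25)
)

# Keep only the rows where year equals 2007
recent <- dat %>% filter(year == 2007)
recent
```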

dplyr::arrange()
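arrange() reorders rows by the values of one or more columns. A small sketch, assuming dplyr is installed and using made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(100, 120, 50, 60)
)

# Sort rows from largest to smallest population
by_pop <- dat %>% arrange(desc(pop))
by_pop
```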

dplyr::select()
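select() keeps the named columns and drops the rest. A small sketch, assuming dplyr is installed and using made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(100, 120, 50, 60),
  gdpPercap = c(10, 12, 20, 25)
)

# Keep only the country and pop columns
slim <- dat %>% select(country, pop)
slim
```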

dplyr::mutate()
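mutate() adds new columns computed from existing ones. A small sketch, assuming dplyr is installed and using made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(100, 120, 50, 60),
  gdpPercap = c(10, 12, 20, 25)
)

# Add a gdp column: population times GDP per capita
with_gdp <- dat %>% mutate(gdp = pop * gdpPercap)
with_gdp
```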

dplyr::summarise()
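summarise() collapses many rows into a single row of summary values. A small sketch, assuming dplyr is installed and using made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country = c("A", "A", "B", "B"),
  pop     = c(100, 120, 50, 60)
)

# One row out: the mean of the pop column across all rows
overall <- dat %>% summarise(mean_pop = mean(pop))
overall
```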

dplyr::summarise() and dplyr::group_by()
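Combined with group_by(), summarise() returns one row per group instead of one row overall. A small sketch, assuming dplyr is installed and using made-up values:

```r
library(dplyr)

# Toy data with made-up values
dat <- data.frame(
  country = c("A", "A", "B", "B"),
  pop     = c(100, 120, 50, 60)
)

# One row per country: the total of pop within each group
per_country <- dat %>%
  group_by(country) %>%
  summarise(total_pop = sum(pop))
per_country
```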

tidyr::pivot_longer()
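pivot_longer() is provided by the tidyr package. It gathers several columns into a pair of name and value columns, making the table longer. A small sketch, assuming tidyr is installed and using made-up values:

```r
library(tidyr)

# Wide toy data: one column per year (made-up values).
# check.names = FALSE keeps "2002" and "2007" as column names.
wide <- data.frame(
  country = c("A", "B"),
  `2002`  = c(100, 50),
  `2007`  = c(120, 60),
  check.names = FALSE
)

# Gather the year columns into a "year" column and a "pop" column
long <- pivot_longer(
  wide,
  cols      = c(`2002`, `2007`),
  names_to  = "year",
  values_to = "pop"
)
long
```

The new year column holds the old column names, so it is a character vector until converted.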

tidyr::pivot_wider()
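pivot_wider() is provided by the tidyr package. It spreads a name column and a value column across several columns, making the table wider. A small sketch, assuming tidyr is installed and using made-up values:

```r
library(tidyr)

# Long toy data: one row per country-year (made-up values)
long <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(100, 120, 50, 60)
)

# Spread the years across columns, filling each cell with pop
wide <- pivot_wider(long, names_from = year, values_from = pop)
wide
```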

dplyr::inner_join()

dplyr::left_join()

dplyr::right_join()

dplyr::full_join()

dplyr::semi_join()

dplyr::anti_join()
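The six joins above differ only in which rows they keep. The sketch below runs all of them on the same two toy tables, assuming dplyr is installed; the table names x and y, the key column id and all values are made up.

```r
library(dplyr)

# Two toy tables sharing the key column "id" (made-up values)
x <- data.frame(id = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y <- data.frame(id = c(1, 2, 4), y_val = c("y1", "y2", "y4"))

# Mutating joins: add columns from y to x
inner <- inner_join(x, y, by = "id")  # ids in both tables: 1, 2
left  <- left_join(x, y, by = "id")   # every row of x: 1, 2, 3
right <- right_join(x, y, by = "id")  # every row of y: 1, 2, 4
full  <- full_join(x, y, by = "id")   # every id from either table: 1, 2, 3, 4

# Filtering joins: keep or drop rows of x, without adding columns
semi <- semi_join(x, y, by = "id")    # rows of x with a match in y: 1, 2
anti <- anti_join(x, y, by = "id")    # rows of x with no match in y: 3

full
```

Where a row has no partner in the other table, the mutating joins fill the missing columns with NA.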

Functions

Name, body and set of arguments

# Roll two dice function
roll <- function(){
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}
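A short usage sketch for roll(). The function is repeated so the block runs on its own; the seed value and the number of repetitions are arbitrary choices made for illustration.

```r
# Roll two dice function (repeated from above)
roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

# Fix the random seed so the same rolls come out on every run
set.seed(2020)

# Call the function once, then many times with replicate()
roll()
rolls <- replicate(1000, roll())

# Every total lies between 2 and 12; table() shows how often each occurs
range(rolls)
table(rolls)
```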

Functions

GDP calculator

# Takes a dataset and multiplies the population column
# with the GDP per capita column.
library(dplyr)  # provides %>%, filter() and mutate()

calcGDP <- function(dat, yr = NULL, ctry = NULL) {
  # Is there a year argument? If so, keep only that year
  if (!is.null(yr)) {
    dat <- dat %>% filter(year == yr)
  }
  # Is there a country argument? If so, keep only that country
  if (!is.null(ctry)) {
    dat <- dat %>% filter(country == ctry)
  }
  # Create new GDP column
  new <- dat %>% mutate(gdp = pop * gdpPercap)
  return(new)
}
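A usage sketch for calcGDP(), assuming dplyr is installed. The function is repeated so the block runs on its own, and the input is a toy data frame with made-up values that has the four columns the function expects (country, year, pop, gdpPercap).

```r
library(dplyr)

# GDP calculator (repeated from above)
calcGDP <- function(dat, yr = NULL, ctry = NULL) {
  if (!is.null(yr)) {
    dat <- dat %>% filter(year == yr)
  }
  if (!is.null(ctry)) {
    dat <- dat %>% filter(country == ctry)
  }
  dat %>% mutate(gdp = pop * gdpPercap)
}

# Toy data with made-up values
dat <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(100, 120, 50, 60),
  gdpPercap = c(10, 12, 20, 25)
)

# No optional arguments: GDP for every row
all_rows <- calcGDP(dat)

# Restrict to one year, then to one year and one country
year_2007 <- calcGDP(dat, yr = 2007)
a_2007    <- calcGDP(dat, yr = 2007, ctry = "A")
a_2007
```

Defaulting yr and ctry to NULL makes both filters optional, so one function covers all four combinations.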