| title | type | genre | pages | year | publisher |
|---|---|---|---|---|---|
| Plenty | Non-fiction | Cookery | 288 | 2010 | Chronicle Books |
| In a Sunburned Country | Non-fiction | Travel | 335 | 2000 | Broadway Books |
| The Catcher in the Rye | Fiction | Literature | 234 | 1951 | Little, Brown and Company |
| A Time of Gifts | Non-fiction | Travel | 321 | 1977 | John Murray |
| In Patagonia | Non-fiction | Travel | 204 | 1977 | Jonathan Cape |
4 Importing and exporting data
For detailed advice and examples, I direct you to two sections of R4DS, data import and spreadsheets.
4.1 Organising your data in spreadsheets
Data Organization in Spreadsheets by Karl W. Broman & Kara H. Woo is full of practical advice and context. It is well worth reading to help you think through best practices for organising your data, both for yourself and when working with others.
As per previous discussion about naming things (Section 3.5.2), the recommendation for naming variables (column names) is to use snake case. This means replacing spaces with underscores _ and typically using lower case only. So for example, the non-syntactic variable Student ID in snake case would be student_id.
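As a rough illustration of the renaming described above, the sketch below converts column names to snake case using base R only. The helper to_snake and the example data frame students are made up for this illustration and are not part of any package.

```r
# Hypothetical example data with non-syntactic column names
students <- data.frame(
  "Student ID" = c(101, 102),
  "Full Name" = c("Ada", "Grace"),
  check.names = FALSE  # keep the spaces rather than letting R alter the names
)

# Illustrative helper (an assumption, not a library function):
# trim whitespace, replace runs of non-alphanumeric characters
# with an underscore, then convert to lower case
to_snake <- function(x) {
  tolower(gsub("[^A-Za-z0-9]+", "_", trimws(x)))
}

names(students) <- to_snake(names(students))
names(students)
# "student_id" "full_name"
```

Renaming once, straight after import, means the rest of a script can refer to student_id without backticks.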
4.2 Plain text files
Plain text files hold values separated by a delimiter, often a comma, tab or space. That is to say, they contain only the data, as text. The file extensions .csv and .tsv indicate comma-separated values and tab-separated values respectively.
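To make the idea of a delimiter concrete, here is a minimal sketch that parses the same record from comma-, tab- and space-delimited text with readr. It assumes readr 2.0 or later, where wrapping a string in I() tells the reader to treat it as literal data rather than a file path. The values are borrowed from the books data used in this section.

```r
library(readr)

# The same record written with three different delimiters
comma_text <- "title,pages\nPlenty,288\n"
tab_text   <- "title\tpages\nPlenty\t288\n"
space_text <- "title pages\nPlenty 288\n"

from_csv   <- read_csv(I(comma_text), show_col_types = FALSE)
from_tsv   <- read_tsv(I(tab_text), show_col_types = FALSE)
from_delim <- read_delim(I(space_text), delim = " ", show_col_types = FALSE)

# A field that itself contains the delimiter must be quoted,
# otherwise it would be split across two columns
quoted <- read_csv(
  I('title,publisher\nThe Catcher in the Rye,"Little, Brown and Company"\n'),
  show_col_types = FALSE
)
quoted$publisher
```

All three readers should give the same one-row result. The quoted example shows why a publisher name containing a comma is wrapped in double quotes in raw csv.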
Here is some comma-separated (csv) data about some books in its raw state:
title,type,genre,pages,year,publisher
Plenty,Non-fiction,Cookery,288,2010,Chronicle Books
In a Sunburned Country,Non-fiction,Travel,335,2000,Broadway Books
The Catcher in the Rye,Fiction,Literature,234,1951,"Little, Brown and Company"
A Time of Gifts,Non-fiction,Travel,321,1977,John Murray
Here is the same data formatted as a table:
4.2.1 Importing plain text
Reading a csv file from the data folder with read_csv and assigning it to an object called books:
books <- read_csv("data/books-2024-04-30.csv")
4.2.2 Exporting plain text
Writing the penguins data to a csv file in the data folder with write_csv:
penguins |>
  write_csv("data/palmer-penguins-data.csv")
4.3 Excel spreadsheets
Excel saves files in its own proprietary formats, .xls or .xlsx, which hold information in addition to the data itself (metadata). For reading Excel files in R, the tidyverse readxl package is the place to start: readxl homepage.
When we install the tidyverse packages, readxl is installed too, but it is not loaded when we run library(tidyverse), so we have to load it separately with library(readxl).
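A short sketch of that loading step, followed by a first look at a workbook. To stay self-contained it uses datasets.xlsx, an example workbook that ships with readxl and is located with readxl_example(), rather than any file from this course.

```r
library(tidyverse)
library(readxl)   # installed with the tidyverse, but attached separately

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# A workbook can hold several sheets, so list their names first
excel_sheets(path)

# Read one sheet by name into a tibble
iris_xl <- read_excel(path, sheet = "iris")
glimpse(iris_xl)
```

Checking the sheet names before reading is a cheap way to avoid importing the wrong sheet, since read_excel reads the first sheet unless told otherwise.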
For exporting to MS Excel, the writexl package from rOpenSci has to be installed separately and then loaded with library(writexl):
install.packages("writexl")
4.3.1 Importing Excel sheets
Reading an Excel file from the data folder with read_xlsx and assigning it to an object called books_xl:
books_xl <- read_xlsx("data/books-2024-05-08.xlsx")
4.3.2 Exporting Excel sheets
Writing the penguins data to an Excel file in the data folder with write_xlsx:
penguins |>
write_xlsx("data/palmer-penguins-data.xlsx")
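One way to check an export is to read the file straight back in. The sketch below does this round trip with two rows of the books data, writing to a temporary file so that nothing in the data folder is touched. The object names books_small and books_back are chosen here purely for illustration.

```r
library(tibble)
library(readxl)
library(writexl)

# Two rows taken from the books data shown earlier
books_small <- tibble(
  title = c("Plenty", "In a Sunburned Country"),
  pages = c(288, 335),
  year  = c(2010, 2000)
)

# Write to a temporary .xlsx file, then read it back
tmp <- tempfile(fileext = ".xlsx")
write_xlsx(books_small, tmp)
books_back <- read_xlsx(tmp)

# Compare the original and re-imported data column by column
all.equal(as.data.frame(books_small), as.data.frame(books_back))
```

The same pattern can be tried with write_csv and read_csv. A round trip like this is only a sanity check: column types such as factors or dates may not survive every file format unchanged.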