4  Importing and exporting data

Summary

This chapter covers importing tabular data from plain text files and spreadsheets using the tidyverse packages readr and readxl. And exporting data using readr and writexl.

For detailed advice and examples, I direct you to two sections of R4DS, data import and spreadsheets.

4.1 Organising your data in spreadsheets

Data Organization in Spreadsheets by Karl W. Broman & Kara H. Woo is full of practical advice and context. It is well worth reading for helping you think about best practices for organising your data for yourself, and when working with others.

As per previous discussion about naming things (Section 3.5.2), the recommendation for naming variables (column names) is to use snake case. This means replacing spaces with underscores _ and typically using lower case only. So for example, the non-syntactic variable Student ID in snake case would be student_id.

4.2 Plain text files

Plain text separated by a delimiter, often a comma, tab or space. That is to say they contain only the data, as text information. File formats .csv and .tsv indicate comma separated variables and tab separated variables respectively.

Here is some comma separated variable csv data about some books in it’s raw state:

title,type,genre,pages,year,publisher
Plenty,Non-fiction,Cookery,288,2010,Chronicle Books
In a Sunburned Country,Non-fiction,Travel,335,2000,Broadway Books
The Catcher in the Rye,Fiction,Literature,234,1951,"Little, Brown and Company"
A Time of Gifts,Non-fiction,Travel,321,1977,John Murray

Here is the same data formatted as a table:

title

type

genre

pages

year

publisher

Plenty

Non-fiction

Cookery

288

2,010

Chronicle Books

In a Sunburned Country

Non-fiction

Travel

335

2,000

Broadway Books

The Catcher in the Rye

Fiction

Literature

234

1,951

Little, Brown and Company

A Time of Gifts

Non-fiction

Travel

321

1,977

John Murray

In Patagonia

Non-fiction

Travel

204

1,977

Jonathan Cape

4.2.1 Importing plain text

Listing 4.1: Importing a plain text csv file from the data folder and assigning it to an object called books
books <- read_csv("data/books-2024-04-30.csv")

4.2.2 Exporting plain text

Listing 4.2: Writing the penguins data to a csv file in the data folder with write_csv
penguins |> 
write_csv("data/palmer-penguins-data.csv")

4.3 Excel spreadsheets

Excel files saves files into its own proprietary format .xls or .xlsx that holds information in addition to the data itself (metadata). For reading and writing excel files in R, tidyverse readxl package is the place to start: readxl homepage.

When we install the tidyverse packages, readxl is installed, but not loaded by default when we load library(tidyverse) so we have to load it separately, library(readxl).

For exporting to MS Excel, writexl from R open science writexl has to be installed separately

Listing 4.3: Install writexl package from ROpenSci for writing to MS Excel
install.packages("writexl")

4.3.1 Importing Excel sheets

Listing 4.4: Importing a Excel file from the data folder and assigning it to an object called books
books_xl <- read_xlsx("data/books-2024-05-08.xlsx")

4.3.2 Exporting Excel sheets

Listing 4.5: Writing the penguins data to a csv file in the data folder with write_csv
penguins |> 
write_xlsx("data/palmer-penguins-data.xlsx")

4.4 Cleaning names