4 Importing and exporting data

Summary

This chapter covers importing tabular data from plain text files and spreadsheets using the tidyverse packages readr and readxl. And exporting data using readr and writexl.

For detailed advice and examples, I direct you to two sections of R4DS, data import and spreadsheets.

4.1 Organising your data in spreadsheets

Data Organization in Spreadsheets by Karl W. Broman & Kara H. Woo is full of practical advice and context. It is well worth reading for helping you think about best practices for organising your data for yourself, and when working with others.

As per previous discussion about naming things (Section 3.5.2), the recommendation for naming variables (column names) is to use snake case. This means replacing spaces with underscores _ and typically using lower case only. So for example, the non-syntactic variable Student ID in snake case would be student_id.

4.2 Plain text files

Plain text separated by a delimiter, often a comma, tab or space. That is to say they contain only the data, as text information. File formats .csv and .tsv indicate comma separated variables and tab separated variables respectively.

Here is some comma separated variable csv data about some books in it’s raw state:

title,type,genre,pages,year,publisher
Plenty,Non-fiction,Cookery,288,2010,Chronicle Books
In a Sunburned Country,Non-fiction,Travel,335,2000,Broadway Books
The Catcher in the Rye,Fiction,Literature,234,1951,"Little, Brown and Company"
A Time of Gifts,Non-fiction,Travel,321,1977,John Murray

Here is the same data formatted as a table:

title	type	genre	pages	year	publisher
Plenty	Non-fiction	Cookery	288	2,010	Chronicle Books
In a Sunburned Country	Non-fiction	Travel	335	2,000	Broadway Books
The Catcher in the Rye	Fiction	Literature	234	1,951	Little, Brown and Company
A Time of Gifts	Non-fiction	Travel	321	1,977	John Murray
In Patagonia	Non-fiction	Travel	204	1,977	Jonathan Cape

4.2.1 Importing plain text

Listing 4.1: Importing a plain text csv file from the data folder and assigning it to an object called books

books <- read_csv("data/books-2024-04-30.csv")

4.2.2 Exporting plain text

Listing 4.2: Writing the penguins data to a csv file in the data folder with write_csv

penguins |> 
write_csv("data/palmer-penguins-data.csv")

4.3 Excel spreadsheets

Excel files saves files into its own proprietary format .xls or .xlsx that holds information in addition to the data itself (metadata). For reading and writing excel files in R, tidyverse readxl package is the place to start: readxl homepage.

When we install the tidyverse packages, readxl is installed, but not loaded by default when we load library(tidyverse) so we have to load it separately, library(readxl).

For exporting to MS Excel, writexl from R open science writexl has to be installed separately

Listing 4.3: Install writexl package from ROpenSci for writing to MS Excel

install.packages("writexl")

4.3.1 Importing Excel sheets

Listing 4.4: Importing a Excel file from the data folder and assigning it to an object called books

books_xl <- read_xlsx("data/books-2024-05-08.xlsx")

4.3.2 Exporting Excel sheets

Listing 4.5: Writing the penguins data to a csv file in the data folder with write_csv

penguins |> 
write_xlsx("data/palmer-penguins-data.xlsx")

4.1 Organising your data in spreadsheets

4.2 Plain text files

4.2.1 Importing plain text

4.2.2 Exporting plain text

4.3 Excel spreadsheets

4.3.1 Importing Excel sheets

4.3.2 Exporting Excel sheets

4.4 Cleaning names