title |
type |
genre |
pages |
year |
publisher |
---|---|---|---|---|---|
Plenty |
Non-fiction |
Cookery |
288 |
2,010 |
Chronicle Books |
In a Sunburned Country |
Non-fiction |
Travel |
335 |
2,000 |
Broadway Books |
The Catcher in the Rye |
Fiction |
Literature |
234 |
1,951 |
Little, Brown and Company |
A Time of Gifts |
Non-fiction |
Travel |
321 |
1,977 |
John Murray |
In Patagonia |
Non-fiction |
Travel |
204 |
1,977 |
Jonathan Cape |
4 Importing and exporting data
For detailed advice and examples, I direct you to two sections of R4DS, data import and spreadsheets.
4.1 Organising your data in spreadsheets
Data Organization in Spreadsheets by Karl W. Broman & Kara H. Woo is full of practical advice and context. It is well worth reading for helping you think about best practices for organising your data for yourself, and when working with others.
As per previous discussion about naming things (Section 3.5.2), the recommendation for naming variables (column names) is to use snake case. This means replacing spaces with underscores _
and typically using lower case only. So for example, the non-syntactic variable Student ID
in snake case would be student_id
.
4.2 Plain text files
Plain text separated by a delimiter, often a comma, tab or space. That is to say they contain only the data, as text information. File formats .csv
and .tsv
indicate comma separated variables and tab separated variables respectively.
Here is some comma separated variable csv
data about some books in it’s raw state:
title,type,genre,pages,year,publisher
Plenty,Non-fiction,Cookery,288,2010,Chronicle Books
In a Sunburned Country,Non-fiction,Travel,335,2000,Broadway Books
The Catcher in the Rye,Fiction,Literature,234,1951,"Little, Brown and Company" A Time of Gifts,Non-fiction,Travel,321,1977,John Murray
Here is the same data formatted as a table:
4.2.1 Importing plain text
4.2.2 Exporting plain text
4.3 Excel spreadsheets
Excel files saves files into its own proprietary format .xls
or .xlsx
that holds information in addition to the data itself (metadata). For reading and writing excel files in R, tidyverse readxl
package is the place to start: readxl homepage.
When we install the tidyverse packages, readxl
is installed, but not loaded by default when we load library(tidyverse)
so we have to load it separately, library(readxl)
.
For exporting to MS Excel, writexl
from R open science writexl has to be installed separately