Package: haven


Function: haven::read_sav()


  • Note: I am only showing how to read in SPSS files here but the haven package also has functions to read in Stata and SAS files.

1. Read in an SPSS file

Type ?read_sav in the console for available arguments

d <- haven::read_sav("name-of-import.sav")

2. Read in an SPSS file with NA value labels

  • Note: If the file has user labelled missing values that you want to save, add the option user_na=TRUE. Otherwise the labelled NA values will be removed and read in as NA.
d <- haven::read_sav("name-of-import.sav", user_na=TRUE)

Package: readxl


Function: readxl::read_excel()


1. Read in an Excel file

Type ?read_excel in the console for available arguments

  • Note: Default for col_names argument is TRUE so the first row of your data will be treated a your column names unless you change the argument to FALSE
d <- readxl::read_excel("name-of-import.xlsx")

2. Read in “Sheet 2” of an Excel file and skip the first 3 rows and tell R that 0 is to be read in as NA

Add the argument sheet= to call the sheet you want to read in, add the argument skip= to denote the number of rows you wish to skip, and add the argument na= to denote what values should be considered NA.

  • Note: Be aware that if your column names are in the first row of your data, using the “skip” argument will remove that first row with column names because the “skip” happens before data is read. Also, if you leave the default col_names=TRUE, you column names will become whatever data is in the 4th row of your Excel sheet.
d <- readxl::read_excel("name-of-import.xlsx", sheet="Sheet2", skip=3, na="0")

3. Read in an Excel file and skip the second row only

Here the second row of the file contains metadata that I don’t want in my dataframe.

First I am reading the file in and only grabbing the column names.

Then I am reading the file in again, this time skipping the first two rows of data and then reassigning the names back to the dataframe.

This is the preferred method as opposed to reading the data in first and then dropping the second row because if we read the data in with the metadata, all of our columns will be read in as character columns. Instead, if we remove the 2nd row during import, our columns will be imported with the appropriate class.

d_names <- readxl::read_excel("name-of-import.xlsx") %>%
  names()

d <- readxl::read_excel("name-of-import.xlsx", skip = 2, 
                        col_names = d_names)

Package: readr


Function: readr::read_csv()


1. Read in an csv file

Type ?read_csv in the console for available arguments

  • Note: Default for col_names is TRUE.
d <- readr::read_csv("name-of-import.csv")

2. Read in a csv file with no column names and assign your own column names

d <- readr::read_csv("name-of-import.csv", 
                     col_names = c("stu_id", "grade", "test1", "test2", "test3", "test4"))

3. Read in a csv file and specify the “col_types”

You can specify one or all column types.

d <- readr::read_csv("name-of-import.csv", 
                     col_types = cols(stu_id = col_double(), 
                                   grade = col_factor("6", "7", "8"), 
                                   test1 = col_double(), test2 =  col_double(), 
                                   test3 = col_double(), test4 = col_double()))

Function: readr::read_tsv()


1. Read in a tab delimited file.

d <- read_tsv('name-of-import.txt', col_names = TRUE)

Return to Import Files