Package: purrr


Function: map_dfr()


1. Read in all sheets of an Excel file and bind them into one large data frame.

View the sheets in our workbook

path <- (here::here("import-data", "data", "all-school_attendance.xlsx"))

path %>% 
readxl::excel_sheets()
[1] "Sheet1" "Sheet2" "Sheet3"

We first use readxl::excel_sheets() to grab the sheet names and put them into a character vector. Then we set the names of the sheets in the vector using purrr::set_names(). Last we read the sheets in using readxl::read_excel() and bind them using purrr::map_dfr(). We add the purrr::map_dfr() argument .id to add a new variable that we call source so we know which sheet each row comes from.

data <- path %>% 
  readxl::excel_sheets() %>% 
  purrr::set_names() %>% 
  purrr::map_dfr(readxl::read_excel, path = path, .id = "source")

head(data)
# A tibble: 6 x 26
  source  YEAR PROPORTIONAL_ATTENDANCE_TOTAL_90ABV PROPORTIONAL_ATTENDANCE_TOTAL
  <chr>  <dbl>                               <dbl>                         <dbl>
1 Sheet1  2015                                518.                          534 
2 Sheet1  2016                                519                           534 
3 Sheet1  2017                                528                           551 
4 Sheet1  2018                                529.                          564.
5 Sheet1  2019                                558.                          582.
6 Sheet2  2015                                518.                          534 
# i 22 more variables: PROPORTIONAL_ATTENDANCE_TOTAL_PCT <dbl>,
#   PROPORTIONAL_ATTENDANCE_BLACK_90ABV <chr>,
#   PROPORTIONAL_ATTENDANCE_BLACK <chr>,
#   PROPORTIONAL_ATTENDANCE_BLACK_PCT <chr>,
#   PROPORTIONAL_ATTENDANCE_HISPANIC_90ABV <dbl>,
#   PROPORTIONAL_ATTENDANCE_HISPANIC <dbl>,
#   PROPORTIONAL_ATTENDANCE_HISPANIC_PCT <dbl>, ...
  • Update 2024-06-13: purrr::map_dfr() is now superseded. You can still use it but the preferred method of binding files now is using purrr::list_rbind() from purrr version 1.0.2. Similar to dplyr::bind_rows(), used in purrr::map_dfr(), purrr::list_rbind() matches by variable names and requires variables to be the same types. It also allows some variables to be missing from some sheets (filling with NA as needed).
data <- path %>% 
  readxl::excel_sheets() %>% 
  purrr::set_names() %>%
  purrr::map(readxl::read_excel, path = path) %>%
  purrr::list_rbind(names_to = "sheet_name")

head(data)
# A tibble: 6 x 26
  sheet_name  YEAR PROPORTIONAL_ATTENDANCE_TOTAL_90ABV PROPORTIONAL_ATTENDANCE~1
  <chr>      <dbl>                               <dbl>                     <dbl>
1 Sheet1      2015                                518.                      534 
2 Sheet1      2016                                519                       534 
3 Sheet1      2017                                528                       551 
4 Sheet1      2018                                529.                      564.
5 Sheet1      2019                                558.                      582.
6 Sheet2      2015                                518.                      534 
# i abbreviated name: 1: PROPORTIONAL_ATTENDANCE_TOTAL
# i 22 more variables: PROPORTIONAL_ATTENDANCE_TOTAL_PCT <dbl>,
#   PROPORTIONAL_ATTENDANCE_BLACK_90ABV <chr>,
#   PROPORTIONAL_ATTENDANCE_BLACK <chr>,
#   PROPORTIONAL_ATTENDANCE_BLACK_PCT <chr>,
#   PROPORTIONAL_ATTENDANCE_HISPANIC_90ABV <dbl>,
#   PROPORTIONAL_ATTENDANCE_HISPANIC <dbl>, ...

Return to Import Files