map()1. Read in all Excel files from the directory and create individual data frames for each file.
Review the files in our directory
here::here() function which implicitly sets my directory to
the top level (root directory) of my current project. Then any
subsequent folders can be listed in descending order. In this case my
folder “attendance” is within another folder called “import-data” which
is located in the top level of my project directory. You could also just
list your entire file path.fs::dir_ls(path = here::here("import-data", "attendance"))
[1] "C:/Users/Me/project/import-data/attendance/school-1_attendance.csv"
[2] "C:/Users/Me/project/import-data/attendance/school-2_attendance.xlsx"
[3] "C:/Users/Me/project/import-data/attendance/school-3_attendance.xlsx"
[4] "C:/Users/Me/project/import-data/attendance/school-4_attendance.xlsx"
[5] "C:/Users/Me/project/import-data/attendance/school-5_attendance.xlsx"
We see that there are 4 Excel files in the directory and one csv file. We only want to import the Excel files.
There are several ways to do this.
fs::dir_ls(). By
adding the fs::dir_ls() argument glob = “*.xlsx”
we are saying we only want to read in the Excel files with the extension
“.xlsx”. In this case, while there are 5 files in the directory, only 4
of them were Excel files, one file was csv format.Then when I use that path in the
readxl::read_excel() function, I use base R bracket
indexing [] to extract the element we are referencing (only
shown for 2 of the 4 files here)
Note: In the function readxl::read_excel() I added
the argument na= because I know the files I am reading in use
“*” to denote missing values.
# Create a list of files
files <- fs::dir_ls(here::here("import-data", "attendance"), glob = "*.xlsx")
# Read in each file
File1 <- readxl::read_excel(path = files[1], na = "*")
File2 <- readxl::read_excel(path = files[2], na = "*")
f <- function(x) {
fs::dir_ls(here::here("import-data", "attendance"), glob = "*.xlsx")[x] %>%
readxl::read_excel(path = ., na = "*")
}
File1 <- f(1)
File2 <- f(2)
File3 <- f(3)
File4 <- f(4)
purrr::map() in conjunction with the
base::list2env() function, which creates environment
variables from a named list.Note: We use fs::dir_ls() to read the names of our
files into a vector and we use purrr::set_names() (to
create a named vector), in conjunction with
base::basename() and stringr::str_remove() to
create more usable names.
Note: We use purrr::map() to first create a list of
our files. The main arguments you include for purrr::map()
are .x, a list or atomic vector (files to read), and .f, which in this
scenario is a function. Below “.” refers to the character vector we
created in the previous step.
# Create our list of data frames
my_file_list <- fs::dir_ls(here::here("import-data", "attendance"),
glob ="*.xlsx") %>%
purrr::set_names(basename(stringr::str_remove(., ".xlsx"))) %>%
purrr::map(., .f = readxl::read_excel, na="*")
# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)
<environment: R_GlobalEnv>
# Review all objects in environment
ls()
[1] "my_file_list" "school-2_attendance" "school-3_attendance"
[4] "school-4_attendance" "school-5_attendance"
2. Read in all Excel files from the directory, select the same variables from each file, and create individual data frames for each file.
Using the same files as above, if we only cared about certain
variables in our data frames, we could add a
dplyr::select() statement in a purrr::map()
function to select our variables from our list of data frames before
creating our environment variables.
tidyselection helper to grab similar names across files,
such as tidyselect::contains().# Create our list of data frames
my_file_list <- fs::dir_ls(path = here::here("import-data", "attendance"),
glob ="*.xlsx") %>%
purrr::set_names(basename(stringr::str_remove(., ".xlsx"))) %>%
purrr::map(., .f = readxl::read_excel, na="*") %>%
purrr::map(., .f = . %>% dplyr::select(YEAR, tidyselect::contains("TOTAL")))
# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)
<environment: R_GlobalEnv>
# Review names of one object
names(`school-2_attendance`)
[1] "YEAR" "PROPORTIONAL_ATTENDANCE_TOTAL_90ABV"
[3] "PROPORTIONAL_ATTENDANCE_TOTAL" "PROPORTIONAL_ATTENDANCE_TOTAL_PCT"
Return to Import Files