str_remove()
1. Remove all “cm” or “CM” from item1
Review the data (d3)
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 12cm
2 2 4cm
3 3 6cm
4 4 2.5CM
In this case I only want the number, not the measurement type, so I need to remove all versions of “cm”.
Note: I am using dplyr::mutate() to create a new variable “item1” that replaces the existing “item1”.
Note: I am using | to mean “or”, so the pattern matches either “cm” or “CM”.
d3 %>%
dplyr::mutate(item1 = stringr::str_remove(item1, "cm|CM"))
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 12
2 2 4
3 3 6
4 4 2.5
If I did not want to list both the upper and lower case versions of “cm”, I could wrap the pattern in stringr::regex() and add ignore_case = TRUE.
d3 %>%
dplyr::mutate(item1 = stringr::str_remove(item1, stringr::regex("cm", ignore_case = TRUE)))
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 12
2 2 4
3 3 6
4 4 2.5
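As a small sketch of why ignoring case can be the safer choice: "cm|CM" only covers the two spellings it lists, so a mixed-case value would slip through. The vector below, including the "8Cm" value, is made up for illustration and is not part of d3.

```r
library(stringr)

# Hypothetical values; "8Cm" does not appear in the original data
x <- c("12cm", "2.5CM", "8Cm")

# Listing both spellings misses the mixed-case "Cm"
listed <- str_remove(x, "cm|CM")

# Ignoring case removes the unit however it is capitalised
any_case <- str_remove(x, regex("cm", ignore_case = TRUE))

print(listed)
print(any_case)
```

With real data it is worth checking which spellings actually occur before settling on a pattern.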
2. Remove all variations of the word “year” from tch_years
Review the data (d15)
# A tibble: 5 x 2
tch_id tch_years
<dbl> <chr>
1 100 10 years
2 101 3
3 102 2.5
4 103 15 yrs
5 107 1.5 Years
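Here, since every unit is some spelling of “years”, we can use a regex to remove it: \\s\\w+ matches one whitespace character followed by one or more word characters (letters, digits, or underscores), so the space and the unit are removed together. Values with no space, such as “3”, are left unchanged.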
d15 %>%
dplyr::mutate(tch_years = stringr::str_remove(tch_years, "\\s\\w+"))
# A tibble: 5 x 2
tch_id tch_years
<dbl> <chr>
1 100 10
2 101 3
3 102 2.5
4 103 15
5 107 1.5
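To check what a pattern will remove before using it, one option is stringr::str_extract(), which returns the first match instead of deleting it. This is only a sketch; the vector repeats the tch_years values from d15 rather than using the data frame itself.

```r
library(stringr)

# Same values as tch_years in d15, typed in as a plain vector
tch_years <- c("10 years", "3", "2.5", "15 yrs", "1.5 Years")

# str_extract() returns the first match, or NA when nothing matches
matched <- str_extract(tch_years, "\\s\\w+")

print(matched)
```

The NA entries are the values that contain no whitespace, which is why str_remove() leaves them untouched.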
We could now go one step further and recode this variable to numeric if we wanted to.
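A minimal sketch of that recoding step, assuming base R's as.numeric() is all that is needed once the text has been removed. The vector again repeats the tch_years values from d15.

```r
library(stringr)

# Same values as tch_years in d15, typed in as a plain vector
tch_years <- c("10 years", "3", "2.5", "15 yrs", "1.5 Years")

# Remove the unit, then convert the remaining text to numbers
cleaned <- str_remove(tch_years, "\\s\\w+")
tch_years_num <- as.numeric(cleaned)

print(tch_years_num)
```

Inside dplyr::mutate() the same idea would mean wrapping the str_remove() call in as.numeric(). Any value that still contains text becomes NA with a warning, which is a useful signal that the pattern missed something.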
3. Remove all instances of the word “and” from item1
Review the data (d4)
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 1, and 2, and 3
2 2 1, and 3
3 3 3
4 4 1, and 2, and 3
Notice that if we simply use stringr::str_remove(), we only remove the first instance of “and” in each string.
d4 %>%
dplyr::mutate(item1 = stringr::str_remove(item1, "and"))
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 1, 2, and 3
2 2 1, 3
3 3 3
4 4 1, 2, and 3
To remove all instances, we need to use stringr::str_remove_all().
d4 %>%
dplyr::mutate(item1 = stringr::str_remove_all(item1, "and"))
# A tibble: 4 x 2
id item1
<dbl> <chr>
1 1 1, 2, 3
2 2 1, 3
3 3 3
4 4 1, 2, 3
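One detail worth checking: removing only the letters “and” leaves behind the spaces that surrounded the word. The sketch below shows two possible ways to tidy that up. Both are suggestions rather than part of the original example, and the vector repeats the distinct item1 values from d4.

```r
library(stringr)

# Distinct item1 values from d4, typed in as a plain vector
item1 <- c("1, and 2, and 3", "1, and 3", "3")

# Removing only "and" keeps the space on each side of it
removed <- str_remove_all(item1, "and")

# Option 1: collapse repeated whitespace afterwards
squished <- str_squish(removed)

# Option 2: remove the word together with its trailing space;
# \\b stops the pattern from matching inside a longer word
with_space <- str_remove_all(item1, "\\band\\b ")

print(removed)
print(squished)
print(with_space)
```

The word boundaries matter more in free text, where a bare "and" would also match inside words such as “sandwich”.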
4. Read in files and remove the date from the file name
See what files we have
base::basename(fs::dir_ls(here::here("strings", "data")))
[1] "scoresa_2023-01-23.xlsx" "scoresb_2023-01-24.xlsx"
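Read in all files from the data folder, remove the underscore and date from each file name, and create individual data frames.
In the pattern, \\d matches a digit. The backslash is doubled because an R string needs \\ to pass a single backslash to the regex, which is what keeps “d” from being read as the letter d. {4} and {2} say how many digits to match.
# Create our list of data frames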
my_file_list <- fs::dir_ls(path = here::here("strings", "data"),
glob ="*.xlsx") %>%
purrr::set_names(basename(stringr::str_remove(., "_\\d{4}-\\d{2}-\\d{2}"))) %>%
purrr::map(readxl::read_excel)
# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)
<environment: R_GlobalEnv>
# Review all objects in environment
ls()
[1] "my_file_list" "scoresa.xlsx" "scoresb.xlsx"
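The objects created above keep “.xlsx” in their names, so they have to be referred to with backticks. A possible variation, sketched here on the file names alone so that no files are needed, also drops the extension. Using tools::file_path_sans_ext() for this is an assumption on my part and not part of the original workflow.

```r
library(stringr)

# File names as listed by dir_ls() earlier; nothing is read from disk
files <- c("scoresa_2023-01-23.xlsx", "scoresb_2023-01-24.xlsx")

# Remove the underscore and the date
no_date <- str_remove(files, "_\\d{4}-\\d{2}-\\d{2}")

# Assumption: also drop the extension so the names are plain identifiers
clean_names <- tools::file_path_sans_ext(no_date)

print(no_date)
print(clean_names)
```

Passing names like these to purrr::set_names() would give data frames that can be typed without backticks.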
But there are many ways to match a date pattern, so we can choose whatever makes the most sense to us. Here the square brackets define a character class rather than an escape: [0-9] matches any single digit from 0 to 9, and the curly braces say how many times to repeat it (4, 2, and 2 times). -{1} matches exactly one hyphen, which is the same as writing a plain -.
# Create our list of data frames
my_file_list <- fs::dir_ls(path = here::here("strings", "data"),
glob ="*.xlsx") %>%
purrr::set_names(basename(stringr::str_remove(., "_[0-9]{4}-{1}[0-9]{2}-{1}[0-9]{2}"))) %>%
purrr::map(readxl::read_excel)
# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)
<environment: R_GlobalEnv>
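As a quick check that the two date patterns behave the same on these file names, the sketch below applies both to the names as plain strings. It also includes a third pattern, added here for comparison, that writes the hyphens without {1}.

```r
library(stringr)

# File names as listed by dir_ls() earlier; nothing is read from disk
files <- c("scoresa_2023-01-23.xlsx", "scoresb_2023-01-24.xlsx")

# Digit shorthand
with_d <- str_remove(files, "_\\d{4}-\\d{2}-\\d{2}")

# Character class, with an explicit {1} after each hyphen
with_class <- str_remove(files, "_[0-9]{4}-{1}[0-9]{2}-{1}[0-9]{2}")

# Character class with plain hyphens (comparison pattern, not in the original)
with_plain <- str_remove(files, "_[0-9]{4}-[0-9]{2}-[0-9]{2}")

print(identical(with_d, with_class))
print(identical(with_class, with_plain))
```

Which form to use is mostly a matter of readability for dates like these.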