Remove values

Package: stringr

Function: `str_remove()`

1. Remove all “cm” or “CM” from item1

Review the data (d3)

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12cm 
2     2 4cm  
3     3 6cm  
4     4 2.5CM

In this case I only want the number, not the measurement type, so I need to remove all versions of “cm”.

Note: I am using dplyr::mutate() to overwrite the existing “item1” variable with the cleaned values.
Note: I am using | (the regex “or” operator) so that the pattern matches either “cm” or “CM”.

d3 %>%
  dplyr::mutate(item1 = stringr::str_remove(item1, "cm|CM"))

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12   
2     2 4    
3     3 6    
4     4 2.5

If I did not want to list both the upper and lower case versions of “cm”, I could wrap the pattern in regex() and set ignore_case = TRUE.

d3 %>%
  dplyr::mutate(item1 = stringr::str_remove(item1, stringr::regex("cm", ignore_case = TRUE)))

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12   
2     2 4    
3     3 6    
4     4 2.5
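
A minimal sketch of the difference between the two approaches, using a plain character vector as a stand-in for item1. The value “7Cm” is a hypothetical addition that is not part of d3; it is included only to show a spelling that the alternation does not cover.

```r
# Stand-in for d3$item1, plus one hypothetical mixed-case value ("7Cm")
item1 <- c("12cm", "4cm", "6cm", "2.5CM", "7Cm")

# Alternation only matches the two spellings that are listed
either <- stringr::str_remove(item1, "cm|CM")

# ignore_case = TRUE matches any capitalisation of "cm"
any_case <- stringr::str_remove(item1, stringr::regex("cm", ignore_case = TRUE))

print(either)    # "7Cm" is left untouched
print(any_case)  # "7Cm" becomes "7"
```

For the four values actually in d3 the two patterns give the same result, so the choice is mostly about how many spellings we expect to see.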

2. Remove all variations of the word “year” from tch_years

Review the data (d15)

# A tibble: 5 x 2
  tch_id tch_years
   <dbl> <chr>    
1    100 10 years 
2    101 3        
3    102 2.5      
4    103 15 yrs   
5    107 1.5 Years

Here, since every unit is some spelling of “years”, we can use a simple regex to remove the whitespace character and the word that follows it.

Note: Pay attention to what is written. If someone wrote 8 months, then more consideration would be needed for how to deal with this variable.

d15 %>%
  dplyr::mutate(tch_years = stringr::str_remove(tch_years, "\\s\\w+"))

# A tibble: 5 x 2
  tch_id tch_years
   <dbl> <chr>    
1    100 10       
2    101 3        
3    102 2.5      
4    103 15       
5    107 1.5
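
If other units might appear, as the note above warns, a more targeted pattern is one option. The sketch below is an assumption about how that could look, not part of the original example: the “8 months” value is hypothetical, and the pattern only covers the spellings “year”, “years”, “yr” and “yrs”.

```r
# Stand-in for d15$tch_years, plus one hypothetical value ("8 months")
tch_years <- c("10 years", "3", "2.5", "15 yrs", "1.5 Years", "8 months")

# Optional whitespace, then "year(s)" or "yr(s)" in any capitalisation
year_pattern <- stringr::regex("\\s*(years?|yrs?)", ignore_case = TRUE)

cleaned <- stringr::str_remove(tch_years, year_pattern)

# Anything that still contains letters needs a manual decision
needs_review <- stringr::str_detect(cleaned, "[A-Za-z]")

print(cleaned)
print(needs_review)
```

Unlike "\\s\\w+", this leaves “8 months” in place, so it can be flagged and handled separately rather than silently turned into 8.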

We could now go one step further and recode this variable to numeric if we wanted to.
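
A minimal sketch of that recoding step, assuming as.numeric() is an acceptable way to convert. The tibble below is rebuilt by hand from the printed d15 so that the example runs on its own.

```r
# Rebuild d15 from the printed output above
d15 <- tibble::tibble(
  tch_id    = c(100, 101, 102, 103, 107),
  tch_years = c("10 years", "3", "2.5", "15 yrs", "1.5 Years")
)

# Remove the unit, then convert the remaining text to numeric
d15_num <- dplyr::mutate(
  d15,
  tch_years = as.numeric(stringr::str_remove(tch_years, "\\s\\w+"))
)

print(d15_num)
```

as.numeric() returns NA (with a warning) for any value that still contains text, which makes leftover problems easy to spot.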

3. Remove all instances of the word “and” from item1

Review the data (d4)

# A tibble: 4 x 2
     id item1          
  <dbl> <chr>          
1     1 1, and 2, and 3
2     2 1, and 3       
3     3 3              
4     4 1, and 2, and 3

Notice here, if we simply use stringr::str_remove() we will only remove the first instance of “and”.

d4 %>%
  dplyr::mutate(item1 = stringr::str_remove(item1, "and"))

# A tibble: 4 x 2
     id item1       
  <dbl> <chr>       
1     1 1,  2, and 3
2     2 1,  3       
3     3 3           
4     4 1,  2, and 3

So if we want to remove all instances, we need to use stringr::str_remove_all(). Note that the spaces on either side of each “and” are left behind.

d4 %>%
  dplyr::mutate(item1 = stringr::str_remove_all(item1, "and"))

# A tibble: 4 x 2
     id item1    
  <dbl> <chr>    
1     1 1,  2,  3
2     2 1,  3    
3     3 3        
4     4 1,  2,  3
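
The result still has a double space wherever an “and” was removed. Two possible ways to tidy that up are sketched below, using a plain character vector as a stand-in for item1; which one fits best is a judgment call.

```r
# Stand-in for the distinct values of d4$item1
item1 <- c("1, and 2, and 3", "1, and 3", "3")

# Option 1: remove "and", then collapse repeated whitespace
squished <- stringr::str_squish(stringr::str_remove_all(item1, "and"))

# Option 2: remove the whole word "and" together with the space after it
direct <- stringr::str_remove_all(item1, "\\band\\s")

print(squished)
print(direct)
```

The \\b in the second option is a word boundary, so the pattern would not touch an “and” that sits inside a longer word.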

4. Read in files and remove the date from the file name

See what files we have

base::basename(fs::dir_ls(here::here("strings", "data")))

[1] "scoresa_2023-01-23.xlsx" "scoresb_2023-01-24.xlsx"

Read in all files from the data folder, remove the underscore and date from the file name, and create individual data frames.

Note: In an R string, \\d is the regex \d. The backslash is an escape character that makes “d” mean “any digit” rather than the letter d, and {4} or {2} says how many digits to expect.

# Create our list of data frames

my_file_list <- fs::dir_ls(path = here::here("strings", "data"),
                           glob = "*.xlsx") %>%
  purrr::set_names(basename(stringr::str_remove(., "_\\d{4}-\\d{2}-\\d{2}"))) %>%
  purrr::map(readxl::read_excel)

# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)

<environment: R_GlobalEnv>

# Review all objects in environment
ls()

[1] "my_file_list" "scoresa.xlsx" "scoresb.xlsx"

But there are many ways we can write a date pattern, so we can choose whatever makes the most sense to us. Here the square brackets define a character class, [0-9], that matches any single digit from 0 to 9, and the curly braces say how many times it repeats: 4 times, 2 times, and 2 times. The -{1} means “exactly one hyphen” and is equivalent to a plain -.

# Create our list of data frames
my_file_list <- fs::dir_ls(path = here::here("strings", "data"),
                           glob = "*.xlsx") %>%
  purrr::set_names(basename(stringr::str_remove(., "_[0-9]{4}-{1}[0-9]{2}-{1}[0-9]{2}"))) %>%
  purrr::map(readxl::read_excel)

# Create environment variables from our list
list2env(my_file_list, envir=.GlobalEnv)

<environment: R_GlobalEnv>
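
Both date patterns can be checked on plain strings before reading any files. The sketch below uses the two file names listed earlier. Dropping the “.xlsx” extension with tools::file_path_sans_ext() is an optional extra step, not part of the original workflow, shown only as one way to get object names without the extension.

```r
# The two file names returned by dir_ls() above
files <- c("scoresa_2023-01-23.xlsx", "scoresb_2023-01-24.xlsx")

# Same date removed with the two different notations
with_escape <- stringr::str_remove(files, "_\\d{4}-\\d{2}-\\d{2}")
with_class  <- stringr::str_remove(files, "_[0-9]{4}-[0-9]{2}-[0-9]{2}")

# For these file names the two patterns behave identically
same_result <- identical(with_escape, with_class)

# Optional: also drop the extension so the names are plain object names
object_names <- tools::file_path_sans_ext(with_escape)

print(with_escape)
print(same_result)
print(object_names)
```

Names without the extension can be typed directly (scoresa), whereas a name such as scoresa.xlsx also works but is easier to mistake for a file.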
