Package: dplyr


Function: if_else()


Note: dplyr::if_else() will evaluate in order, so proceed from most specific to most general. The function will not write over a previous evaluation even if something meets a new criteria.

Note: dplyr::if_else() will not evaluate NAs as TRUE or FALSE and therefore will return NAs. If you want to specifically evaluate NAs you will need to call out NAs.


1. Collapse all “Google Meet” in character var (online_platform) into one category

Review the data (d6)

# A tibble: 7 x 2
  tch_id online_platform
   <dbl> <chr>          
1    105 google         
2    106 meet           
3    107 Zoom           
4    108 Zoom           
5    109 Google Meet    
6    112 Microsoft Teams
7    113 <NA>           

Collapse all spellings of Google Meet into one uniform category

  • Note: Using stringr::str_detect() we can detect different spellings of google meet. I enter a pattern (“google|meet”) that means to look for any value in online_platform that contains the word “google” OR “meet” and recategorize those to “Google Meet”
d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::if_else(
        stringr::str_detect(online_platform, "google|meet"),
        "Google Meet",
        online_platform
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               

We can also do this using %in% instead of stringr::str_detect()

d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::if_else(
        online_platform %in% c("google", "meet", "Google Meet"),
        "Google Meet",
        online_platform
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               

2. Collapse all “Google Meet” in character var (online_platform) into one category and address case sensitivity

Review the data (d12)

# A tibble: 7 x 2
  tch_id online_platform
   <dbl> <chr>          
1    105 Google         
2    106 meet           
3    107 Zoom           
4    108 Zoom           
5    109 Google Meet    
6    112 Microsoft Teams
7    113 <NA>           

Collapse all spellings of Google Meet into one uniform category

  • Note: Using stringr::str_detect() we can detect different spellings of google meet. It is important to note that this function is case sensitive. As we saw above in the data, the “google|meet” pattern we used before will no longer catch our upper case “Google”. So there are a number of ways we can deal with this. The first way we will try is by adding stringr::regex() and adding the argument ignore_case = TRUE.
d12 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::if_else(
        stringr::str_detect(
          online_platform,
          stringr::regex("google|meet", ignore_case = TRUE)
        ),
        "Google Meet",
        online_platform
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 Google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               

We can also do this by adding (?i) to the pattern, which means match case insensitive.

d12 %>%
  dplyr::mutate(
    online_platform_new = dplyr::if_else(
      stringr::str_detect(online_platform, "(?i)google|meet"),
      "Google Meet",
      online_platform
    )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 Google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               

3. Collapse a numeric variable (Var2) into discrete numeric categories

Review the data (d10)

# A tibble: 5 x 2
  Var1   Var2
  <chr> <dbl>
1 a         1
2 b        16
3 c         3
4 d         4
5 e        NA

Create an indicator variable. 1 = 8 or lower, 0 = 9 or higher

*Note: NA will remain as NA because if_else does not evaluate it as TRUE or FALSE.

d10 %>%
  dplyr::mutate(Var2_new = dplyr::if_else(Var2 <= 8, 1, 0))
# A tibble: 5 x 3
  Var1   Var2 Var2_new
  <chr> <dbl>    <dbl>
1 a         1        1
2 b        16        0
3 c         3        1
4 d         4        1
5 e        NA       NA

*Note: If you want to evaluate NA, you will need to specifically add it

d10 %>%
  dplyr::mutate(Var2_new = dplyr::if_else(Var2 <= 8 |
                                            is.na(Var2), 1, 0))
# A tibble: 5 x 3
  Var1   Var2 Var2_new
  <chr> <dbl>    <dbl>
1 a         1        1
2 b        16        0
3 c         3        1
4 d         4        1
5 e        NA        1

You could also specifically add NA this way as well

d10 %>%
  dplyr::mutate(Var2_new = dplyr::if_else(Var2 > 8 &
                                            !is.na(Var2), 0, 1))
# A tibble: 5 x 3
  Var1   Var2 Var2_new
  <chr> <dbl>    <dbl>
1 a         1        1
2 b        16        0
3 c         3        1
4 d         4        1
5 e        NA        1

Function: case_when()


Note: dplyr::case_when() will evaluate in order, so proceed from most specific to most general. The function will not write over a previous evaluation even if something meets a new criteria.


1. Collapse all “Google Meet” into one category

Review the data (d6)

# A tibble: 7 x 2
  tch_id online_platform
   <dbl> <chr>          
1    105 google         
2    106 meet           
3    107 Zoom           
4    108 Zoom           
5    109 Google Meet    
6    112 Microsoft Teams
7    113 <NA>           

Collapse all spellings of Google Meet into one uniform category

d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::case_when(
        stringr::str_detect(online_platform, "google|meet") ~ "Google Meet",
        TRUE ~ online_platform
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               
  • Note: If you do not add TRUE ~ online_platform, you will get NA for all categories that are not specifically mentioned in the statement.
d6 %>%
  dplyr::mutate(online_platform_new =
                  dplyr::case_when(
                    stringr::str_detect(online_platform, "google|meet") ~ "Google Meet"
                  ))
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            <NA>               
4    108 Zoom            <NA>               
5    109 Google Meet     <NA>               
6    112 Microsoft Teams <NA>               
7    113 <NA>            <NA>               
  • Note: And if you don’t care about the other platforms, you can always collapse the remaining categories into another name, such as “other”. Notice here that our NA value will be recoded into “Other”.
d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::case_when(
        stringr::str_detect(online_platform, "google|meet") ~ "Google Meet",
        TRUE ~ "Other"
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Other              
4    108 Zoom            Other              
5    109 Google Meet     Other              
6    112 Microsoft Teams Other              
7    113 <NA>            Other              
  • Note: If you want to keep NA as NA, while recoding the rest to “other”, you would need to specifically call out NA. For dplyr::case_when() you need to call out the specific type of NA. In this case NA_character_.
d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::case_when(
        is.na(online_platform) ~ NA_character_,
        stringr::str_detect(online_platform, "google|meet") ~ "Google Meet",
        TRUE ~ "Other"
      )
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Other              
4    108 Zoom            Other              
5    109 Google Meet     Other              
6    112 Microsoft Teams Other              
7    113 <NA>            <NA>               

2. Collapse all “Google Meet” into one category and clarify that “google” means Google Hangouts

Review the data (d9)

# A tibble: 10 x 2
   tch_id online_platform
    <dbl> <chr>          
 1    105 google         
 2    106 meet           
 3    107 Zoom           
 4    108 Zoom           
 5    109 Google Meet    
 6    112 Microsoft Teams
 7    113 google meet    
 8    114 google met     
 9    115 hangouts       
10    116 <NA>           

Collapse all spellings of Google Meet into one uniform category and clarify that google means Google Hangouts

  • Note: Here I am adding the base::tolower() function to deal with the varying capitalization. You can also use the solutions we showed above in dplyr::if_else() where we add either (?i) to the pattern or add stringr::regex() around the pattern with the argument ignore_case = TRUE.

  • Note: Note that I am not putting “google” at the top because if I did, all instances of “google” would be recoded to google hangouts.

d9 %>%
  dplyr::mutate(online_platform =
                  tolower(online_platform)) %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::case_when(
        stringr::str_detect(online_platform, "met|meet") ~ "google meet",
        stringr::str_detect(online_platform, "google|hangouts") ~ "google hangouts",
        TRUE ~ online_platform
      )
  )
# A tibble: 10 x 3
   tch_id online_platform online_platform_new
    <dbl> <chr>           <chr>              
 1    105 google          google hangouts    
 2    106 meet            google meet        
 3    107 zoom            zoom               
 4    108 zoom            zoom               
 5    109 google meet     google meet        
 6    112 microsoft teams microsoft teams    
 7    113 google meet     google meet        
 8    114 google met      google meet        
 9    115 hangouts        google hangouts    
10    116 <NA>            <NA>               
  • Note: See how the result would be different if I put “google” at the top. The ordering matters. Once a statement is evaluated, it will not be rewritten.
d9 %>%
  dplyr::mutate(online_platform =
                  tolower(online_platform)) %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::case_when(
        stringr::str_detect(online_platform, "google|hangouts") ~ "google hangouts",
        stringr::str_detect(online_platform, "met|meet") ~ "google meet",
        TRUE ~ online_platform
      )
  )
# A tibble: 10 x 3
   tch_id online_platform online_platform_new
    <dbl> <chr>           <chr>              
 1    105 google          google hangouts    
 2    106 meet            google meet        
 3    107 zoom            zoom               
 4    108 zoom            zoom               
 5    109 google meet     google hangouts    
 6    112 microsoft teams microsoft teams    
 7    113 google meet     google hangouts    
 8    114 google met      google hangouts    
 9    115 hangouts        google hangouts    
10    116 <NA>            <NA>               

3. Collapse a numeric variable (Var2) into discrete character categories

Review the data (d8)

# A tibble: 5 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b        16     0
3 c         3     1
4 d         4     3
5 e        14     0

Collapse Var2 into low, medium, high categories.

  • Note: We use the & operator to say if a value meets both conditions

  • Note: We cannot use dplyr::between because our left side is not >=

d8 %>%
  dplyr::mutate(Var2_new =
                  dplyr::case_when(Var2 > 14 ~ "high",
                                   Var2 > 6 & Var2 <= 14 ~ "medium",
                                   TRUE ~ "low"))
# A tibble: 5 x 4
  Var1   Var2  Var3 Var2_new
  <chr> <dbl> <dbl> <chr>   
1 a         1     2 low     
2 b        16     0 high    
3 c         3     1 low     
4 d         4     3 low     
5 e        14     0 medium  
  • Note: Because dplyr::case_when() evaluates in order, we don’t actually have to add the right side of our “medium” logic because all of the highs have been assigned and will not be written over by the mediums
d8 %>%
  dplyr::mutate(Var2_new =
                  dplyr::case_when(Var2 > 14 ~ "high",
                                   Var2 > 6 ~ "medium",
                                   TRUE ~ "low"))
# A tibble: 5 x 4
  Var1   Var2  Var3 Var2_new
  <chr> <dbl> <dbl> <chr>   
1 a         1     2 low     
2 b        16     0 high    
3 c         3     1 low     
4 d         4     3 low     
5 e        14     0 medium  

Function: recode()


1. Collapse all “Google Meet” into one category

Review the data (d6)

# A tibble: 7 x 2
  tch_id online_platform
   <dbl> <chr>          
1    105 google         
2    106 meet           
3    107 Zoom           
4    108 Zoom           
5    109 Google Meet    
6    112 Microsoft Teams
7    113 <NA>           

Collapse all spellings of Google Meet into one uniform category

d6 %>%
  dplyr::mutate(
    online_platform_new =
      dplyr::recode(online_platform, google = "Google Meet", meet = "Google Meet")
  )
# A tibble: 7 x 3
  tch_id online_platform online_platform_new
   <dbl> <chr>           <chr>              
1    105 google          Google Meet        
2    106 meet            Google Meet        
3    107 Zoom            Zoom               
4    108 Zoom            Zoom               
5    109 Google Meet     Google Meet        
6    112 Microsoft Teams Microsoft Teams    
7    113 <NA>            <NA>               

Return to Recode