Package: stringr


Function: str_c()


1. Combine two variables (school and district) with a separator (,)

Review the data (d9)

# A tibble: 4 x 3
     id school  district 
  <dbl> <chr>   <chr>    
1     1 schoola districtb
2     2 schoolb districtb
3     3 schoolc districte
4     4 schoold districte

I will first use dplyr::mutate() to create my new variable, which I am calling sch_dist.

I can then use stringr::str_c() to combine (concatenate) the two variables. I want the values separated by a ",", so I will add the argument sep = ",". (The collapse argument does something different: it joins all rows into a single string.)

d9 %>%
  dplyr::mutate(sch_dist = stringr::str_c(school, district, sep = ","))
# A tibble: 4 x 4
     id school  district  sch_dist         
  <dbl> <chr>   <chr>     <chr>            
1     1 schoola districtb schoola,districtb
2     2 schoolb districtb schoolb,districtb
3     3 schoolc districte schoolc,districte
4     4 schoold districte schoold,districte
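
The difference between sep and collapse can be sketched on two small vectors. This is only a minimal illustration, assuming stringr is installed; the vector values and the ";" collapse string are made up for the example and are not part of d9.

```r
library(stringr)

# Hypothetical vectors standing in for the school and district columns
school   <- c("schoola", "schoolb")
district <- c("districtb", "districtb")

# sep joins the inputs element by element: one string per row
by_row <- str_c(school, district, sep = ",")

# collapse then joins those per-row results into a single string
one_string <- str_c(school, district, sep = ",", collapse = ";")

print(by_row)
print(one_string)
```

Inside dplyr::mutate() the per-row form (sep only) is usually what is wanted, since mutate() expects one value per row.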

2. Combine two variables (item1 and measure) with no separator

Review the data (d10)

# A tibble: 4 x 3
     id item1 measure
  <dbl> <dbl> <chr>  
1     1    12 in     
2     2    10 in     
3     3    15 cm     
4     4    20 cm     

Again, I will use dplyr::mutate() first to create my new variable, item1_measure.

Then I can use stringr::str_c() to combine the values.

In this case I don't want any separator between my two values, so I will write my separator argument as sep = "". (This is also the default for str_c(), so the argument could be omitted.)

d10 %>%
  dplyr::mutate(item1_measure = stringr::str_c(item1, measure, sep = ""))
# A tibble: 4 x 4
     id item1 measure item1_measure
  <dbl> <dbl> <chr>   <chr>        
1     1    12 in      12in         
2     2    10 in      10in         
3     3    15 cm      15cm         
4     4    20 cm      20cm         
  • Note: This is equivalent to using base::paste0(), which inserts no separator, unlike base::paste(), whose default separator is a single space.
d10 %>%
  dplyr::mutate(item1_measure = base::paste0(item1, measure))
# A tibble: 4 x 4
     id item1 measure item1_measure
  <dbl> <dbl> <chr>   <chr>        
1     1    12 in      12in         
2     2    10 in      10in         
3     3    15 cm      15cm         
4     4    20 cm      20cm         
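
The separator defaults of the two base functions can be checked directly. The sketch below uses base R only; the two short vectors are hypothetical stand-ins for the item1 and measure columns.

```r
# Hypothetical vectors standing in for the item1 and measure columns
item1   <- c(12, 10)
measure <- c("in", "cm")

# paste() separates the pieces with a single space by default
with_space <- paste(item1, measure)

# paste0() uses no separator
no_space <- paste0(item1, measure)

# paste() with sep = "" gives the same result as paste0()
same_as_paste0 <- paste(item1, measure, sep = "")

print(with_space)
print(no_space)
```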


Package: tidyr


Function: unite()


1. Combine multiple dummy variables into one variable

Review the data (d16)

# A tibble: 3 x 4
     id cheese pepperoni mushrooms
  <dbl> <chr>  <chr>     <chr>    
1    10 cheese <NA>      mushrooms
2    11 <NA>   <NA>      <NA>     
3    12 <NA>   pepperoni <NA>     

Combine responses to the question “Which pizza toppings do you like?”

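Since this was a select-all question, each column holds the answer when it was selected and NA otherwise. The problem with stringr::str_c() and base::paste0() is that they don't handle NA values well: str_c() returns NA for a row as soon as any of its inputs is NA, while paste0() turns each NA into the literal text "NA". In the paste0() call below, collapse = "," additionally joins every row into one long string, which is then repeated down the column.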

d16 %>%
  mutate(toppings = str_c(cheese, pepperoni, mushrooms, sep = ","))
# A tibble: 3 x 5
     id cheese pepperoni mushrooms toppings
  <dbl> <chr>  <chr>     <chr>     <chr>   
1    10 cheese <NA>      mushrooms <NA>    
2    11 <NA>   <NA>      <NA>      <NA>    
3    12 <NA>   pepperoni <NA>      <NA>    
d16 %>%
  mutate(toppings = paste0(cheese, pepperoni, mushrooms, collapse = ","))
# A tibble: 3 x 5
     id cheese pepperoni mushrooms toppings                              
  <dbl> <chr>  <chr>     <chr>     <chr>                                 
1    10 cheese <NA>      mushrooms cheeseNAmushrooms,NANANA,NApepperoniNA
2    11 <NA>   <NA>      <NA>      cheeseNAmushrooms,NANANA,NApepperoniNA
3    12 <NA>   pepperoni <NA>      cheeseNAmushrooms,NANANA,NApepperoniNA
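
The two failure modes can be reproduced on plain vectors. This is a minimal sketch, assuming stringr is installed; the vectors mirror the d16 columns, and the paste() call with sep = "," is added here only to show the per-row result.

```r
library(stringr)

# Vectors mirroring the three topping columns of d16
cheese    <- c("cheese", NA, NA)
pepperoni <- c(NA, NA, "pepperoni")
mushrooms <- c("mushrooms", NA, NA)

# str_c(): any NA input makes the whole result for that row NA
from_str_c <- str_c(cheese, pepperoni, mushrooms, sep = ",")

# paste(): NA is converted to the literal text "NA", row by row
from_paste <- paste(cheese, pepperoni, mushrooms, sep = ",")

# paste0() with collapse: rows are pasted with no separator,
# then all rows are joined into one string
collapsed <- paste0(cheese, pepperoni, mushrooms, collapse = ",")

print(from_str_c)
print(from_paste)
print(collapsed)
```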

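A great alternative is tidyr::unite(). Here we can use the argument na.rm = TRUE, which drops NA values before the columns are combined. By default, unite() also removes the input columns, and a row with no selections becomes an empty string ("").

  • Note: dplyr::mutate() is not needed here because unite() creates the new column itself.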

d16 %>%
  unite(col = toppings, cheese, pepperoni, mushrooms, sep = ",", na.rm = TRUE)
# A tibble: 3 x 2
     id toppings          
  <dbl> <chr>             
1    10 "cheese,mushrooms"
2    11 ""                
3    12 "pepperoni"       
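
If the original columns should be kept, or if an empty string is not a useful value for respondents who selected nothing, the call can be adjusted. The following is a sketch under stated assumptions rather than part of the original walkthrough: it rebuilds d16 by hand, uses unite()'s remove = FALSE argument, and converts "" back to NA with dplyr::na_if(). The object name out is hypothetical.

```r
library(dplyr)
library(tidyr)

# d16 rebuilt by hand from the printed tibble above
d16 <- tibble(
  id        = c(10, 11, 12),
  cheese    = c("cheese", NA, NA),
  pepperoni = c(NA, NA, "pepperoni"),
  mushrooms = c("mushrooms", NA, NA)
)

out <- d16 %>%
  # remove = FALSE keeps the three input columns next to the new one
  unite(col = "toppings", cheese, pepperoni, mushrooms,
        sep = ",", na.rm = TRUE, remove = FALSE) %>%
  # turn the empty string left by an all-NA row back into NA
  mutate(toppings = na_if(toppings, ""))

print(out)
```

Whether "" or NA is the better value for "nothing selected" depends on how the variable will be analyzed later, so this step is optional.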
