str_c()
1. Combine two variables (school and district) with a separator (,)
Review the data (d9)
# A tibble: 4 x 3
id school district
<dbl> <chr> <chr>
1 1 schoola districtb
2 2 schoolb districtb
3 3 schoolc districte
4 4 schoold districte
I will first use dplyr::mutate() to create my new
variable, which I am calling sch_dist.
I can then use stringr::str_c() to combine (concatenate) the two
variables. I want to separate them with a comma, so I will add the
argument sep = ",".
d9 %>%
dplyr::mutate(sch_dist = stringr::str_c(school, district, sep = ","))
# A tibble: 4 x 4
id school district sch_dist
<dbl> <chr> <chr> <chr>
1 1 schoola districtb schoola,districtb
2 2 schoolb districtb schoolb,districtb
3 3 schoolc districte schoolc,districte
4 4 schoold districte schoold,districte
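The sep and collapse arguments of str_c() are easy to confuse, so a small side-by-side comparison may help. This is a minimal sketch on plain character vectors; the vectors school and district are stand-ins for the columns of d9, and the names by_row and one_string are made up for illustration.

```r
library(stringr)

# Stand-in vectors mirroring the school and district columns of d9
school   <- c("schoola", "schoolb", "schoolc", "schoold")
district <- c("districtb", "districtb", "districte", "districte")

# sep joins the inputs element by element, so there is one value per row
by_row <- str_c(school, district, sep = ",")
by_row

# collapse then joins all of those values into a single string,
# which is usually not what you want inside mutate()
one_string <- str_c(school, district, sep = ",", collapse = ";")
one_string
```

Inside mutate(), sep is normally the argument to reach for, since it keeps one combined value per row.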
2. Combine two variables (item1 and
measure) with no separator
Review the data (d10)
# A tibble: 4 x 3
id item1 measure
<dbl> <dbl> <chr>
1 1 12 in
2 2 10 in
3 3 15 cm
4 4 20 cm
Again, I will use dplyr::mutate() first to create my new
variable item1_measure.
Then I can use stringr::str_c() to combine the values.
In this case I don't want any separator between my two values, so I will write my separator argument as sep = "".
d10 %>%
dplyr::mutate(item1_measure = stringr::str_c(item1, measure, sep = ""))
# A tibble: 4 x 4
id item1 measure item1_measure
<dbl> <dbl> <chr> <chr>
1 1 12 in 12in
2 2 10 in 10in
3 3 15 cm 15cm
4 4 20 cm 20cm
Alternatively, I can use base::paste0(), which, unlike base::paste(), adds no separator.
d10 %>%
dplyr::mutate(item1_measure = base::paste0(item1, measure))
# A tibble: 4 x 4
id item1 measure item1_measure
<dbl> <dbl> <chr> <chr>
1 1 12 in 12in
2 2 10 in 10in
3 3 15 cm 15cm
4 4 20 cm 20cm
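To make the difference between the two base functions concrete, here is a small sketch using only base R. The vectors item1 and measure are stand-ins for the columns of d10, and with_space and no_space are illustrative names.

```r
# Stand-in vectors mirroring the item1 and measure columns of d10
item1   <- c(12, 10, 15, 20)
measure <- c("in", "in", "cm", "cm")

# paste() separates values with a single space by default
with_space <- paste(item1, measure)
with_space

# paste0() behaves like paste(..., sep = "")
no_space <- paste0(item1, measure)
no_space
```

Both functions convert the numeric item1 values to text before joining, so no explicit as.character() call is needed.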
unite()
1. Combine multiple dummy variables into one variable
Review the data (d16)
# A tibble: 3 x 4
id cheese pepperoni mushrooms
<dbl> <chr> <chr> <chr>
1 10 cheese <NA> mushrooms
2 11 <NA> <NA> <NA>
3 12 <NA> pepperoni <NA>
Combine responses to the question “Which pizza toppings do you like?”
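The problem with stringr::str_c() and base::paste0() is that they
don't handle NA values well: str_c() returns NA whenever any of its
inputs is NA, and paste0() converts NA to the literal text "NA".
Because this was a select-all-that-apply question, each column holds
the topping name when it was selected and NA otherwise.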
d16 %>%
mutate(toppings = str_c(cheese, pepperoni, mushrooms, sep = ","))
# A tibble: 3 x 5
id cheese pepperoni mushrooms toppings
<dbl> <chr> <chr> <chr> <chr>
1 10 cheese <NA> mushrooms <NA>
2 11 <NA> <NA> <NA> <NA>
3 12 <NA> pepperoni <NA> <NA>
d16 %>%
mutate(toppings = paste0(cheese, pepperoni, mushrooms, collapse = ","))
# A tibble: 3 x 5
id cheese pepperoni mushrooms toppings
<dbl> <chr> <chr> <chr> <chr>
1 10 cheese <NA> mushrooms cheeseNAmushrooms,NANANA,NApepperoniNA
2 11 <NA> <NA> <NA> cheeseNAmushrooms,NANANA,NApepperoniNA
3 12 <NA> pepperoni <NA> cheeseNAmushrooms,NANANA,NApepperoniNA
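The two failure modes can be reproduced on plain vectors. In the paste0() output above, every row is identical because collapse = "," glues all rows into one string, which is then repeated down the column. The sketch below uses sep instead so each row keeps its own value; the vectors stand in for the columns of d16, and str_c_result and paste_result are illustrative names.

```r
library(stringr)

# Stand-in vectors mirroring the columns of d16
cheese    <- c("cheese", NA, NA)
pepperoni <- c(NA, NA, "pepperoni")
mushrooms <- c("mushrooms", NA, NA)

# str_c() returns NA as soon as any input in that position is NA
str_c_result <- str_c(cheese, pepperoni, mushrooms, sep = ",")
str_c_result

# paste() converts NA to the literal text "NA" before joining
paste_result <- paste(cheese, pepperoni, mushrooms, sep = ",")
paste_result
```

Neither result is usable as a list of selected toppings, which is the motivation for a function that can drop NA values while combining.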
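A great alternative is to use tidyr::unite(). Here we
can use the argument na.rm = TRUE to drop NA values before combining.
*Note: dplyr::mutate() is not needed here because unite() creates the
new column itself. By default it also removes the input columns.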
d16 %>%
unite(col = toppings, cheese, pepperoni, mushrooms, sep = ",", na.rm = TRUE)
# A tibble: 3 x 2
id toppings
<dbl> <chr>
1 10 "cheese,mushrooms"
2 11 ""
3 12 "pepperoni"
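If you want to keep the original dummy columns, or would rather have NA than an empty string for respondents who selected nothing, one possible variation is sketched below. It rebuilds d16 from the printed data, and the name d16_toppings is made up for illustration. The remove argument of unite() and dplyr::na_if() are used here as optional extras and are not part of the walkthrough above.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Rebuild d16 from the printed data above
d16 <- tibble(
  id        = c(10, 11, 12),
  cheese    = c("cheese", NA, NA),
  pepperoni = c(NA, NA, "pepperoni"),
  mushrooms = c("mushrooms", NA, NA)
)

d16_toppings <- d16 %>%
  # remove = FALSE keeps the original dummy columns alongside the new one
  unite(col = toppings, cheese, pepperoni, mushrooms,
        sep = ",", na.rm = TRUE, remove = FALSE) %>%
  # turn the empty string for "no toppings selected" back into NA
  mutate(toppings = na_if(toppings, ""))

d16_toppings
```

Whether an empty string or NA is the better representation of "nothing selected" depends on how the variable will be analyzed later, so treat the na_if() step as a design choice.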