separate_rows()Note: Unlike tidyr::separate() which splits columns
into new columns, tidyr::separate_rows() splits a column
with multiple delimited values into new rows.
Note: Since writing this, tidyr::separate_rows() has
been superseded by separate_longer_delim(). See this link
for more info.
1. Split a variable (toppings) into multiple
rows
Review the data (d5)
# A tibble: 3 x 2
name toppings
<chr> <chr>
1 Marge anchovies,garlic,tomato
2 Homer sausage,red pepper
3 Ned cheese
Split toppings into new rows
d5 %>%
tidyr::separate_rows(toppings, sep=",")
# A tibble: 6 x 2
name toppings
<chr> <chr>
1 Marge anchovies
2 Marge garlic
3 Marge tomato
4 Homer sausage
5 Homer red pepper
6 Ned cheese
tidyr::separate_rows() is a very
clean way of accomplishing the goal above, I think it is nice to see
that there are other methods to accomplishing this result. One of those
other ways is using stringr::str_split().Notice here though that you have to use dplyr::mutate()
to create a new column (or in this case replace the existing
column).
d5 %>%
mutate(toppings = stringr::str_split(toppings, pattern = ","))
# A tibble: 3 x 2
name toppings
<chr> <list>
1 Marge <chr [3]>
2 Homer <chr [2]>
3 Ned <chr [1]>
You’ll also notice that you do not get a normal character vector for
toppings. You get a nested list-column that now requires us
to tidyr::unnest() in order to see the information.
So while this method does work, you can see what
tidyr::separte_rows() is much simpler for this use
case.
d5 %>%
mutate(toppings = stringr::str_split(toppings, pattern = ",")) %>%
tidyr::unnest()
# A tibble: 6 x 2
name toppings
<chr> <chr>
1 Marge anchovies
2 Marge garlic
3 Marge tomato
4 Homer sausage
5 Homer red pepper
6 Ned cheese
Return to Separate