mutate()1. Create a single race variable from a series of dummy coded variables
Review the data (d11)
# A tibble: 3 x 7
stu_id race1 race2 race3 race4 race5 race6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 0 0 0 0 0 1
2 101 1 0 0 0 0 0
3 102 0 0 1 0 0 1
We want a single race variable where each value is represented if it is the only one selected, but if more than one race is selected, the value is set to 7 (to represent multi-racial).
We will use dplyr::case_when() to draft the logic, along
with dplyr::if_all() to check values across multiple
variables.
d11 %>%
mutate(
race =
case_when(
race1 == 1 & if_all(race2:race6, ~. == 0) ~ 1,
race2 == 1 & if_all(c(race1, race3:race6), ~. == 0) ~ 2,
race3 == 1 & if_all(c(race1:race2, race4:race6), ~. == 0) ~ 3,
race4 == 1 & if_all(c(race1:race3, race5:race6), ~. == 0) ~ 4,
race5 == 1 & if_all(c(race1:race4, race6), ~. == 0) ~ 5,
race6 == 1 & if_all(race1:race5, ~. == 0) ~ 6,
rowSums(across(race1:race6)) > 1 ~ 7))
# A tibble: 3 x 8
stu_id race1 race2 race3 race4 race5 race6 race
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 0 0 0 0 0 1 6
2 101 1 0 0 0 0 0 1
3 102 0 0 1 0 0 1 7
While the code above is definitely the more explicit way to assign
values, you could also simplify the above code by putting the logic for
the multi-racial value at the top because
dplyr::case_when() reviews logic in order and does not
replace previously assigned values.
d11 %>%
mutate(
race =
case_when(
rowSums(across(race1:race6)) > 1 ~ 7,
race1 == 1 ~ 1,
race2 == 1 ~ 2,
race3 == 1 ~ 3,
race4 == 1 ~ 4,
race5 == 1 ~ 5,
race6 == 1 ~ 6,
))
# A tibble: 3 x 8
stu_id race1 race2 race3 race4 race5 race6 race
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 0 0 0 0 0 1 6
2 101 1 0 0 0 0 0 1
3 102 0 0 1 0 0 1 7
**Note: You could of course use this same logic for other variables, not just race. For example, dummy coded grade levels that teachers work with (assigning a new value when teachers work with multiple grade levels).
Return to Create New Variables