Package: dplyr


Function: mutate()


1. Create a single race variable from a series of dummy coded variables

Review the data (d11)

# A tibble: 3 x 7
  stu_id race1 race2 race3 race4 race5 race6
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     0     0     0     0     0     1
2    101     1     0     0     0     0     0
3    102     0     0     1     0     0     1
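
To follow along, d11 can be recreated from the printed output above. This is a minimal sketch that assumes the columns are plain numeric (double) columns, as the `<dbl>` headers suggest; the original page does not show how d11 was built.

```r
library(dplyr)

# Rebuild the example data row by row from the printed tibble
d11 <- tibble::tribble(
  ~stu_id, ~race1, ~race2, ~race3, ~race4, ~race5, ~race6,
  100, 0, 0, 0, 0, 0, 1,
  101, 1, 0, 0, 0, 0, 0,
  102, 0, 0, 1, 0, 0, 1
)

d11
```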

We want a single race variable that takes the value of the selected race (1 through 6) when exactly one race is selected, and the value 7 (to represent multi-racial) when more than one race is selected.

We will use dplyr::case_when() to write the logic, along with dplyr::if_all() (available in dplyr 1.0.4 and later) to check that a condition holds across multiple variables.

library(dplyr)

d11 %>%
  mutate(
    race =
      case_when(
        # exactly one race selected: that variable is 1 and all others are 0
        race1 == 1 & if_all(race2:race6, ~ . == 0) ~ 1,
        race2 == 1 & if_all(c(race1, race3:race6), ~ . == 0) ~ 2,
        race3 == 1 & if_all(c(race1:race2, race4:race6), ~ . == 0) ~ 3,
        race4 == 1 & if_all(c(race1:race3, race5:race6), ~ . == 0) ~ 4,
        race5 == 1 & if_all(c(race1:race4, race6), ~ . == 0) ~ 5,
        race6 == 1 & if_all(race1:race5, ~ . == 0) ~ 6,
        # more than one race selected: multi-racial
        rowSums(across(race1:race6)) > 1 ~ 7))

# A tibble: 3 x 8
  stu_id race1 race2 race3 race4 race5 race6  race
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     0     0     0     0     0     1     6
2    101     1     0     0     0     0     0     1
3    102     0     0     1     0     0     1     7
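
One edge case worth keeping in mind: case_when() returns NA for any row where no condition is TRUE, so a student with no race selected would end up with a missing value. The sketch below illustrates this with a hypothetical student 103 (not part of d11) and only two of the conditions, to keep it short.

```r
library(dplyr)

# Hypothetical data: student 103 selected no race at all
d_none <- tibble::tribble(
  ~stu_id, ~race1, ~race2, ~race3, ~race4, ~race5, ~race6,
  102, 0, 0, 1, 0, 0, 1,
  103, 0, 0, 0, 0, 0, 0
)

res <- d_none %>%
  mutate(
    race = case_when(
      race6 == 1 & if_all(race1:race5, ~ . == 0) ~ 6,
      rowSums(across(race1:race6)) > 1 ~ 7))

# Student 102 matches the multi-racial condition; student 103 matches nothing
res$race
```

Here student 102 is assigned 7, while student 103 is left as NA because neither condition applies.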

The code above is the more explicit way to assign values. You could also simplify it by putting the multi-racial condition first: dplyr::case_when() evaluates conditions in order and, for each row, uses the first one that is TRUE, so later conditions never replace a value that has already been assigned.

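d11 %>%
  mutate(
    race =
      case_when(
        # check for more than one race first, so it takes priority
        rowSums(across(race1:race6)) > 1 ~ 7,
        # remaining rows have at most one race selected
        race1 == 1 ~ 1,
        race2 == 1 ~ 2,
        race3 == 1 ~ 3,
        race4 == 1 ~ 4,
        race5 == 1 ~ 5,
        race6 == 1 ~ 6))

# A tibble: 3 x 8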
  stu_id race1 race2 race3 race4 race5 race6  race
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     0     0     0     0     0     1     6
2    101     1     0     0     0     0     0     1
3    102     0     0     1     0     0     1     7
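
To see why the position of the multi-racial condition matters in the simplified version, here is a sketch of what happens when it is moved to the end instead. The data is d11 rebuilt from the printed output; the name wrong_order is only for illustration.

```r
library(dplyr)

d11 <- tibble::tribble(
  ~stu_id, ~race1, ~race2, ~race3, ~race4, ~race5, ~race6,
  100, 0, 0, 0, 0, 0, 1,
  101, 1, 0, 0, 0, 0, 0,
  102, 0, 0, 1, 0, 0, 1
)

wrong_order <- d11 %>%
  mutate(
    race = case_when(
      race1 == 1 ~ 1,
      race2 == 1 ~ 2,
      race3 == 1 ~ 3,
      race4 == 1 ~ 4,
      race5 == 1 ~ 5,
      race6 == 1 ~ 6,
      # never reached for student 102: race3 == 1 has already matched
      rowSums(across(race1:race6)) > 1 ~ 7))

wrong_order$race
```

Student 102 is coded 3 rather than 7, because the first matching condition wins. Without the if_all() checks, the multi-racial condition has to come first.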

Note: You could use this same logic for other sets of dummy coded variables, not just race. For example, with dummy coded grade levels that teachers work with, you could assign a separate value when a teacher works with multiple grade levels.
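
As a rough sketch of the grade-level case, the same pattern might look like this. The data frame teachers, its column names, and the value 4 for "multiple grade levels" are all hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data: one row per teacher, one dummy variable per grade level
teachers <- tibble::tribble(
  ~tch_id, ~grade1, ~grade2, ~grade3,
  1, 1, 0, 0,
  2, 0, 1, 1,
  3, 0, 0, 1
)

teachers <- teachers %>%
  mutate(
    grade = case_when(
      # hypothetical code 4: teacher works with more than one grade level
      rowSums(across(grade1:grade3)) > 1 ~ 4,
      grade1 == 1 ~ 1,
      grade2 == 1 ~ 2,
      grade3 == 1 ~ 3))

teachers$grade
```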
