Package: dplyr


Function: mutate()


1. Create new scored variables for all of the “item” variables

Review the data (d1)

# A tibble: 3 x 5
     id item1 item2 item3 item4
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    10     3     5     3    NA
2    11     3     5     1     5
3    12     3     1     3     5

item1 through item4 were all multiple choice variables with one correct answer. We now want to score these variables to be correct (1) or incorrect (0).

Item1: correct answer = 3
Item2: correct answer = 5
Item3: correct answer = 3
Item4: correct answer = 5

Since item1 and item3 have the same correct answers, and item2 and item4 have the same correct answers, we can use dplyr::across() within dplyr::mutate() to score these variables at the same time. We use . to denote we want the criteria to apply to the variables we have selected.

  • Note: We use the dplyr::across() argument .names to rename our newly scored variables. You can use {col} to stand for the selected column name.

  • Note: We have NA values in our variables, so since dplyr::case_when() recodes in order of how statements are provided, we will first want to say we want NA values to stay as NA values (NA_real_).

Note: In the dplyr::case_when() function, .default means if a value was not evaluated in my arguments above, replace with the value I give. In this case, I am saying replace remaining values with 0.

  • Note: We are adding the dplyr::across() argument .names to rename our newly created variables with the suffix “_scored”.
d1 %>%
  mutate(across(c(item1, item3),
      ~ case_when(is.na(.) ~ NA_real_,
                  . == 3 ~ 1,
                  .default = 0),
      .names = "{col}_scored"
    ),
    across(c(item2, item4),
      ~ case_when(is.na(.) ~ NA_real_,
                  . == 5 ~ 1,
                  .default = 0),
      .names = "{col}_scored"
    )
  ) %>%
  dplyr::select(id, tidyselect::contains("scored"))
# A tibble: 3 x 5
     id item1_scored item3_scored item2_scored item4_scored
  <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1    10            1            1            1           NA
2    11            1            0            1            1
3    12            1            1            0            1

We could also write the formula this way, with explicitly listing the values to be recoded to 0 and recoding NA at the end, but this is a bit longer.

d1 %>%
  mutate(across(c(item1, item3),
      ~ case_when(. == 3 ~ 1,
                  . %in% c(1, 2, 4, 5) ~ 0),
      .names = "{col}_scored"
    ),
    across(c(item2, item4),
      ~ case_when(. == 5 ~ 1,
                  . %in% c(1, 2, 3, 4) ~ 0),
      .names = "{col}_scored"
    )
  ) %>%
  dplyr::select(id, tidyselect::contains("scored"))
# A tibble: 3 x 5
     id item1_scored item3_scored item2_scored item4_scored
  <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1    10            1            1            1           NA
2    11            1            0            1            1
3    12            1            1            0            1

Return to Create New Variables