mutate()
1. Create new scored variables for all of the “item” variables
Review the data (d1)
# A tibble: 3 x 5
id item1 item2 item3 item4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 10 3 5 3 NA
2 11 3 5 1 5
3 12 3 1 3 5
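To follow along, the printed data can be rebuilt by hand. This is only a sketch: the values are copied from the tibble shown above, and the use of tibble::tribble() is an assumption about how d1 might be created, not necessarily how it was originally built.

```r
library(tibble)

# Rebuild d1 from the printed output above.
# item4 is missing (NA) for id 10; tribble() coerces the column to double.
d1 <- tribble(
  ~id, ~item1, ~item2, ~item3, ~item4,
  10,  3,      5,      3,      NA,
  11,  3,      5,      1,      5,
  12,  3,      1,      3,      5
)

d1
```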
item1 through item4 are all multiple-choice
variables, each with one correct answer. We now want to score each
response as correct (1) or incorrect (0).
Item1: correct answer = 3
Item2: correct answer = 5
Item3: correct answer = 3
Item4: correct answer = 5
Since item1 and item3 share the same correct
answer (3), and item2 and item4 share the same
correct answer (5), we can use dplyr::across() within
dplyr::mutate() to score each pair of variables at the same time.
Inside the formula, . stands for the current column, so the criteria
are applied to every variable we have selected.
Note: We use the dplyr::across() argument
.names to rename our newly scored variables. You can use {col}
to stand for the selected column name.
Note: Our variables contain NA values.
dplyr::case_when() evaluates its conditions in the order they are
written and uses the first one that is TRUE, so we start with a condition
that keeps NA values as NA (NA_real_). Without it, NA values would fall
through to .default and be scored 0.
Note: In the dplyr::case_when() function,
.default is the value given to any observation that did not match one
of the conditions above it. In this case, all remaining values are
recoded to 0.
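The two notes on dplyr::case_when() can be checked on a small standalone vector. This is a minimal sketch: the vector x and the object names are made up for illustration, and the .default argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical responses: one correct (3), one incorrect (1), one missing
x <- c(3, 1, NA)

# Without an explicit NA rule, NA == 3 is NA (not TRUE),
# so the missing value falls through to .default and is scored 0
without_na_rule <- case_when(x == 3 ~ 1,
                             .default = 0)

# With the NA rule listed first, the missing value stays missing
with_na_rule <- case_when(is.na(x) ~ NA_real_,
                          x == 3 ~ 1,
                          .default = 0)

without_na_rule
with_na_rule
```

The first result scores the missing response as incorrect, while the second keeps it as NA, which is the behavior we want for the item variables.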
Here we use the dplyr::across() argument
.names to give our newly created variables the suffix
“_scored”.
d1 %>%
  mutate(across(c(item1, item3),
                ~ case_when(is.na(.) ~ NA_real_,
                            . == 3 ~ 1,
                            .default = 0),
                .names = "{col}_scored"
                ),
         across(c(item2, item4),
                ~ case_when(is.na(.) ~ NA_real_,
                            . == 5 ~ 1,
                            .default = 0),
                .names = "{col}_scored"
                )
         ) %>%
  dplyr::select(id, tidyselect::contains("scored"))
# A tibble: 3 x 5
id item1_scored item3_scored item2_scored item4_scored
<dbl> <dbl> <dbl> <dbl> <dbl>
1 10 1 1 1 NA
2 11 1 0 1 1
3 12 1 1 0 1
We could also write the formula by explicitly listing the values to be recoded to 0. Any value that matches none of the conditions, including NA, is returned as NA because no .default is supplied. This version is a bit longer.
d1 %>%
  mutate(across(c(item1, item3),
                ~ case_when(. == 3 ~ 1,
                            . %in% c(1, 2, 4, 5) ~ 0),
                .names = "{col}_scored"
                ),
         across(c(item2, item4),
                ~ case_when(. == 5 ~ 1,
                            . %in% c(1, 2, 3, 4) ~ 0),
                .names = "{col}_scored"
                )
         ) %>%
  dplyr::select(id, tidyselect::contains("scored"))
# A tibble: 3 x 5
id item1_scored item3_scored item2_scored item4_scored
<dbl> <dbl> <dbl> <dbl> <dbl>
1 10 1 1 1 NA
2 11 1 0 1 1
3 12 1 1 0 1
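As a quick check that the two versions above agree, both results can be stored and compared. This is a sketch under a few assumptions: d1 is rebuilt by hand from the printed data, the object names scored_default and scored_explicit are made up for illustration, and .default requires dplyr 1.1.0 or later.

```r
library(dplyr)
library(tibble)

# d1 rebuilt from the printed data at the top of this example
d1 <- tribble(
  ~id, ~item1, ~item2, ~item3, ~item4,
  10,  3,      5,      3,      NA,
  11,  3,      5,      1,      5,
  12,  3,      1,      3,      5
)

# Version 1: keep NA first, send everything else to .default
scored_default <- d1 %>%
  mutate(across(c(item1, item3),
                ~ case_when(is.na(.) ~ NA_real_, . == 3 ~ 1, .default = 0),
                .names = "{col}_scored"),
         across(c(item2, item4),
                ~ case_when(is.na(.) ~ NA_real_, . == 5 ~ 1, .default = 0),
                .names = "{col}_scored")) %>%
  select(id, contains("scored"))

# Version 2: list the incorrect values; unmatched values (NA) become NA
scored_explicit <- d1 %>%
  mutate(across(c(item1, item3),
                ~ case_when(. == 3 ~ 1, . %in% c(1, 2, 4, 5) ~ 0),
                .names = "{col}_scored"),
         across(c(item2, item4),
                ~ case_when(. == 5 ~ 1, . %in% c(1, 2, 3, 4) ~ 0),
                .names = "{col}_scored")) %>%
  select(id, contains("scored"))

# Compare the two results as plain data frames
all.equal(as.data.frame(scored_default), as.data.frame(scored_explicit))
```

One difference worth keeping in mind: the explicit version returns NA for any value outside the listed response options, whereas the .default version would score such a value as 0.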