mutate()1. Create an indicator column
(scale1_risk)
# A tibble: 5 x 4
id grade cut_scale1 scale1_sum
<dbl> <dbl> <dbl> <dbl>
1 1 2 2.5 1
2 2 3 2 4
3 3 3 2 1
4 4 NA NA 3
5 5 2 2.5 4
I have a sum score and a cut score for a scale. The cut scores are different depending on grade level.
I want to create an indicator of risk level:
sum score < cut score = 0 score score >= cut score = 1
dplyr::case_when() to recode my sum
variable into a dichotomous variable.My logic says:
grade or my selected sum score variable is
NA, then my new variable should be coded to NA.TRUE) code to 1.d9 %>%
dplyr::mutate(scale1_risk =
dplyr::case_when(
is.na(grade) | is.na(scale1_sum) ~ NA_real_,
scale1_sum < cut_scale1 ~ 0,
TRUE ~ 1
))
# A tibble: 5 x 5
id grade cut_scale1 scale1_sum scale1_risk
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 2.5 1 0
2 2 3 2 4 1
3 3 3 2 1 0
4 4 NA NA 3 NA
5 5 2 2.5 4 1
2. Create multiple _risk indicator
columns
Review the data (d6)
# A tibble: 5 x 8
id grade cut_scale1 cut_scale2 cut_scale3 scale1_sum scale2_sum scale3_sum
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 2.5 3.1 3 1 2 4
2 2 3 2 3 2 2 4 1
3 3 3 2 3 2 3 1 2
4 4 NA NA NA NA 3 4 1
5 5 2 2.5 3.1 3 4 2 NA
This time I have sum scores for 3 different scales. I also have 3 different cut scores for each of those scales. The cut scores are different depending on grade level.
I want to create an indicator of risk level:
sum score < cut score = 0 score score >= cut score = 1
I could write this calculation out one piece at a time, but I would prefer to create a function that I can use that will minimize duplication.
First I create a cutoff function that I can use.
In this function I call my data frame and I use
dplyr::mutate() to create my new at_risk
variable which will be named dynamically.
:=. I also have to surround all data frame variables with
{{}} to denote that I am referring to data frame variables,
not environment variables.cutoff <- function(df, new_name, cut_score, sum_score) {
df %>%
dplyr::mutate({{new_name}} := dplyr::case_when(
is.na(grade) | is.na({{sum_score}}) ~ NA_real_,
{{sum_score}} < {{cut_score}} ~ 0,
TRUE ~ 1))
}
Now I can use my new function to create my new _risk
variables.
d6 <- cutoff(d6, scale1_risk, cut_scale1, scale1_sum)
d6 <- cutoff(d6, scale2_risk, cut_scale2, scale2_sum)
d6 <- cutoff(d6, scale3_risk, cut_scale3, scale3_sum)
d6 %>%
dplyr::select(tidyselect::contains("risk"))
# A tibble: 5 x 3
scale1_risk scale2_risk scale3_risk
<dbl> <dbl> <dbl>
1 0 0 1
2 1 1 0
3 1 0 1
4 NA NA NA
5 1 0 NA
Return to Create New Variables