Add new indicator column

Package: dplyr

Function: `mutate()`

1. Create an indicator column (scale1_risk)

# A tibble: 5 x 4
     id grade cut_scale1 scale1_sum
  <dbl> <dbl>      <dbl>      <dbl>
1     1     2        2.5          1
2     2     3        2            4
3     3     3        2            1
4     4    NA       NA            3
5     5     2        2.5          4

I have a sum score and a cut score for a scale. The cut scores are different depending on grade level.

I want to create an indicator of risk level:

sum score < cut score = 0 score score >= cut score = 1

Note: I use dplyr::case_when() to recode my sum variable into a dichotomous variable.

My logic says:

If either grade or my selected sum score variable is NA, then my new variable should be coded to NA.
Then if my selected sum score variable is less than my selected cut score variable, code my new variable to 0.
Else (TRUE) code to 1.

d9 %>%
  dplyr::mutate(scale1_risk =
                  dplyr::case_when(
    is.na(grade) | is.na(scale1_sum) ~ NA_real_,
    scale1_sum < cut_scale1 ~ 0,
    TRUE ~ 1
  ))

# A tibble: 5 x 5
     id grade cut_scale1 scale1_sum scale1_risk
  <dbl> <dbl>      <dbl>      <dbl>       <dbl>
1     1     2        2.5          1           0
2     2     3        2            4           1
3     3     3        2            1           0
4     4    NA       NA            3          NA
5     5     2        2.5          4           1

2. Create multiple _risk indicator columns

Review the data (d6)

# A tibble: 5 x 8
     id grade cut_scale1 cut_scale2 cut_scale3 scale1_sum scale2_sum scale3_sum
  <dbl> <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1     1     2        2.5        3.1          3          1          2          4
2     2     3        2          3            2          2          4          1
3     3     3        2          3            2          3          1          2
4     4    NA       NA         NA           NA          3          4          1
5     5     2        2.5        3.1          3          4          2         NA

This time I have sum scores for 3 different scales. I also have 3 different cut scores for each of those scales. The cut scores are different depending on grade level.

I want to create an indicator of risk level:

sum score < cut score = 0 score score >= cut score = 1

I could write this calculation out one piece at a time, but I would prefer to create a function that I can use that will minimize duplication.

First I create a cutoff function that I can use.

In this function I call my data frame and I use dplyr::mutate() to create my new at_risk variable which will be named dynamically.

Note: In order to name this variable dynamically I have to use :=. I also have to surround all data frame variables with {{}} to denote that I am referring to data frame variables, not environment variables.

cutoff <- function(df, new_name, cut_score, sum_score) {
  df %>%
    dplyr::mutate({{new_name}} := dplyr::case_when(
      is.na(grade) | is.na({{sum_score}}) ~ NA_real_,
      {{sum_score}} < {{cut_score}} ~ 0,
      TRUE ~ 1))
}

Now I can use my new function to create my new _risk variables.

d6 <- cutoff(d6, scale1_risk, cut_scale1, scale1_sum)

d6 <- cutoff(d6, scale2_risk, cut_scale2, scale2_sum)

d6 <- cutoff(d6, scale3_risk, cut_scale3, scale3_sum)

d6 %>%
  dplyr::select(tidyselect::contains("risk"))

# A tibble: 5 x 3
  scale1_risk scale2_risk scale3_risk
        <dbl>       <dbl>       <dbl>
1           0           0           1
2           1           1           0
3           1           0           1
4          NA          NA          NA
5           1           0          NA

Return to Create New Variables

Add new indicator column

Package: dplyr

Function: mutate()

Function: `mutate()`