Package: dplyr


Function: recode()


Note: dplyr::recode() takes its arguments as old value = new value. This is the opposite of dplyr::rename(), which takes new name = old name.
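A minimal sketch of the two argument orders side by side. The tibble `demo` and its column names are made up for illustration and are not part of the data used below.

```r
library(dplyr)

# Hypothetical example data, used only to illustrate argument order
demo <- dplyr::tibble(grp = c("a", "b", "c"))

# recode(): old value on the left, new value on the right
demo_recoded <- demo %>%
  dplyr::mutate(grp = dplyr::recode(grp, a = "apple", b = "banana"))

# rename(): new name on the left, old name on the right
demo_renamed <- demo %>%
  dplyr::rename(group = grp)

demo_recoded$grp     # "apple" "banana" "c" (the unmatched value "c" is kept)
names(demo_renamed)  # "group"
```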


1. Reverse code multiple variables (Var2 and Var3) in the same way

Review the data (d2)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     4
2 b        NA     5
3 c         3     1

Reverse code Var2 and Var3 and rename the variables with _r at the end.

  • Note: We are modifying existing variables using dplyr::mutate().

  • Note: We are adding dplyr::across() to select the variables we want to modify.

  • Note: dplyr::recode() treats the old value as an argument name. To use a number as a name, you have to surround it with backticks (``).

  • Note: Last, in this scenario we are writing over the existing variables and then renaming them by using the dplyr::rename_with() function along with base::paste0().

d2_new <- d2 %>%
  dplyr::mutate(dplyr::across(Var2:Var3, ~ dplyr::recode(
    .,
    `1` = 5,
    `2` = 4,
    `3` = 3,
    `4` = 2,
    `5` = 1
  ))) %>%
  dplyr::rename_with( ~ paste0(., "_r"), Var2:Var3)

d2_new
# A tibble: 3 x 3
  Var1  Var2_r Var3_r
  <chr>  <dbl>  <dbl>
1 a          5      2
2 b         NA      1
3 c          3      5

We could also do this using the formula (1 + highest possible scale value) - original value. In this case I know that the highest possible value for this scale is 5.

Note that I don't need to use the dplyr::recode() function here. I can simply apply a formula across my selected variables.

d2_new <- d2 %>%
  dplyr::mutate(dplyr::across(Var2:Var3, ~ 6 - .x)) %>%
  dplyr::rename_with( ~ paste0(., "_r"), Var2:Var3)

d2_new
# A tibble: 3 x 3
  Var1  Var2_r Var3_r
  <chr>  <dbl>  <dbl>
1 a          5      2
2 b         NA      1
3 c          3      5
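The formula can also be wrapped in a small helper so the scale range is stated once. This is only a sketch: `reverse_by_formula()`, its arguments, and `d2_demo` (a copy of the toy data above) are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical helper: reverse a scale that runs from min_value to max_value
reverse_by_formula <- function(x, max_value, min_value = 1) {
  (min_value + max_value) - x
}

# Copy of the toy data shown above
d2_demo <- dplyr::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, NA, 3),
  Var3 = c(4, 5, 1)
)

d2_demo_new <- d2_demo %>%
  dplyr::mutate(dplyr::across(Var2:Var3, ~ reverse_by_formula(.x, max_value = 5))) %>%
  dplyr::rename_with(~ paste0(.x, "_r"), Var2:Var3)

d2_demo_new$Var2_r  # 5 NA 3
d2_demo_new$Var3_r  # 2 1 5
```

Missing values stay missing because arithmetic on NA returns NA.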

Check that your reverse code worked correctly

base::table(d2$Var2, d2_new$Var2_r)
   
    3 5
  1 0 1
  3 1 0
base::table(d2$Var3, d2_new$Var3_r)
   
    1 2 5
  1 0 0 1
  4 0 1 0
  5 1 0 0
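As a complementary check, each original value and its reversed value should sum to 1 + the highest scale value (6 here). The sketch below uses stand-in vectors that mirror Var2 before and after recoding. It also passes useNA = "ifany" to base::table() so the missing value appears in the cross-tab, which the default call drops.

```r
# Stand-in vectors mirroring Var2 before and after reverse coding
original <- c(1, NA, 3)
reversed <- c(5, NA, 3)

# Every non-missing pair should sum to 1 + highest scale value (here 6)
sums_ok <- all(original + reversed == 6, na.rm = TRUE)
sums_ok  # TRUE

# Include NA in the cross-tab so missing values can be checked too
base::table(original, reversed, useNA = "ifany")
```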

If you want to keep both the old variables and the newly created variables, add the .names argument to dplyr::across(). This names the recoded variables as you create them, so you do not write over the old variables and do not need dplyr::rename_with() as a second step.

d2 <- d2 %>%
  dplyr::mutate(dplyr::across(
    Var2:Var3,
    ~ dplyr::recode(
      .,
      `1` = 5,
      `2` = 4,
      `3` = 3,
      `4` = 2,
      `5` = 1
    ),
    .names = '{col}_r'
  ))

d2
# A tibble: 3 x 5
  Var1   Var2  Var3 Var2_r Var3_r
  <chr> <dbl> <dbl>  <dbl>  <dbl>
1 a         1     4      5      2
2 b        NA     5     NA      1
3 c         3     1      3      5

Function: case_when()


1. Reverse code a single variable

Review the data (d2)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     4
2 b        NA     5
3 c         3     1

Check the values of Var2

d2 %>% 
  janitor::tabyl(Var2)
 Var2 n   percent valid_percent
    1 1 0.3333333           0.5
    3 1 0.3333333           0.5
   NA 1 0.3333333            NA

Reverse code Var2 and name it with _r at the end.

  • Note: We are recoding into a new variable using dplyr::mutate() and giving the new variable a different name than the original. This keeps both the new and old versions of the variable. If we used dplyr::transmute() instead, it would keep only the new variable and drop every other column, including the old version.

d2 <- d2 %>% 
  dplyr::mutate(Var2_r = dplyr::case_when(
  Var2 == 1 ~ 5,
  Var2 == 2 ~ 4,
  Var2 == 3 ~ 3, 
  Var2 == 4 ~ 2,
  Var2 == 5 ~ 1,
  TRUE ~ NA_real_))
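
A quick sketch of how dplyr::mutate() and dplyr::transmute() differ in which columns they return. The tibble `d2_demo` is a copy of the toy data, and the 6 - Var2 formula stands in for the recode only to keep the example short.

```r
library(dplyr)

# Copy of the toy data shown above
d2_demo <- dplyr::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, NA, 3),
  Var3 = c(4, 5, 1)
)

# mutate() appends the new column and keeps everything else
kept <- d2_demo %>% dplyr::mutate(Var2_r = 6 - Var2)

# transmute() returns only the columns it creates
only_new <- d2_demo %>% dplyr::transmute(Var2_r = 6 - Var2)

names(kept)      # "Var1" "Var2" "Var3" "Var2_r"
names(only_new)  # "Var2_r"
```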

Check that your reverse code worked correctly

d2 %>% 
  janitor::tabyl(Var2, Var2_r)
 Var2 3 5 NA_
    1 0 1   0
    3 1 0   0
   NA 0 0   1

2. Turn your reverse scale into a function to use for multiple variables (Data Science in Education Using R example)

Review the data (d2)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     4
2 b        NA     5
3 c         3     1

Check the values of Var2 and Var3

d2 %>% 
  janitor::tabyl(Var2)
 Var2 n   percent valid_percent
    1 1 0.3333333           0.5
    3 1 0.3333333           0.5
   NA 1 0.3333333            NA
d2 %>% 
  janitor::tabyl(Var3)
 Var3 n   percent
    1 1 0.3333333
    4 1 0.3333333
    5 1 0.3333333

Create your function

reverse_scale <- function(Var) {
  # Return the recoded vector directly; NA and values outside 1-5 become NA
  dplyr::case_when(
    Var == 1 ~ 5,
    Var == 2 ~ 4,
    Var == 3 ~ 3,
    Var == 4 ~ 2,
    Var == 5 ~ 1,
    TRUE ~ NA_real_
  )
}
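
Before applying the function to a data frame, it can help to try it on a plain vector. In this sketch the function is repeated so the block runs on its own, and the input vector (including the out-of-range value 7) is made up for illustration.

```r
library(dplyr)

# Same mapping as the function above, repeated so this block is self-contained
reverse_scale <- function(Var) {
  dplyr::case_when(
    Var == 1 ~ 5,
    Var == 2 ~ 4,
    Var == 3 ~ 3,
    Var == 4 ~ 2,
    Var == 5 ~ 1,
    TRUE ~ NA_real_
  )
}

# NA and the out-of-range value 7 both fall through to the TRUE ~ NA_real_ branch
result <- reverse_scale(c(1, NA, 3, 7))
result  # 5 NA 3 NA
```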

Reverse code Var2 and Var3 using your function and rename the new variables with _r at the end

  • Note: We are recoding into new variables using dplyr::mutate() and giving the new variables different names than the originals. This keeps both the new and old versions of the variables.

d2 <- d2 %>%
  dplyr::mutate(Var2_r = reverse_scale(Var2),
                Var3_r = reverse_scale(Var3))

Check that the recode worked

d2 %>% 
  janitor::tabyl(Var2, Var2_r)
 Var2 3 5 NA_
    1 0 1   0
    3 1 0   0
   NA 0 0   1
d2 %>% 
  janitor::tabyl(Var3, Var3_r)
 Var3 1 2 5
    1 0 0 1
    4 0 1 0
    5 1 0 0

You can also use your function this way

Reverse code Var2 and Var3 using your function and rename the new variables with _r at the end

  • Note: We are creating new variables using dplyr::mutate(); the original variables are kept.

  • Note: We are using dplyr::across() to apply the function to the selected columns and adding the .names argument to name the recoded variables as they are created, rather than writing over the old variables.

d2 <- d2 %>%
  dplyr::mutate(dplyr::across(Var2:Var3, reverse_scale,
                       .names = '{col}_r'))

Check that the recode worked

base::table(d2$Var2, d2$Var2_r)
   
    3 5
  1 0 1
  3 1 0
base::table(d2$Var3, d2$Var3_r)
   
    1 2 5
  1 0 0 1
  4 0 1 0
  5 1 0 0
