Recode values

Package: dplyr

Function: `recode()`

Note: The dplyr::recode() formula is: old value=new value, this is opposite of dplyr::rename().

1. Recode character values into different character values (for gender)

Review the data (d4)

# A tibble: 3 x 3
  id    gender lunch
  <chr> <chr>  <chr>
1 a     m      f    
2 b     f      r    
3 c     n      p

Recode gender

Note: We did not have to put quotes around the old character values because dplyr::recode() replaces character or factor values by their name.
Note: Parentheses are required around the new value when recoding into a character/factor variable.
Note: We are recoding back into the same variable using dplyr::mutate(). However, we could have recoded into a new variable by changing the name of gender.

d4 %>%
  dplyr::mutate(gender = dplyr::recode(gender, m = "male", f = "female", n = "nonbinary"))

# A tibble: 3 x 3
  id    gender    lunch
  <chr> <chr>     <chr>
1 a     male      f    
2 b     female    r    
3 c     nonbinary p

2. Recode a character variable (gender) into a numeric variable

Review the data (d4)

# A tibble: 3 x 3
  id    gender lunch
  <chr> <chr>  <chr>
1 a     m      f    
2 b     f      r    
3 c     n      p

Recode gender

Note: Since we are recoding into a numeric variable, no quotes are necessary for the numeric new value.
Note: We are recoding back into the same variable using dplyr::mutate(). However, we could have recoded into a new variable by changing the name of gender.

d4 %>%
  dplyr::mutate(gender = dplyr::recode(gender, m = 1, f = 2, n = 3))

# A tibble: 3 x 3
  id    gender lunch
  <chr>  <dbl> <chr>
1 a          1 f    
2 b          2 r    
3 c          3 p

3. Recode just one value in a numeric variable (Var2)

Review the data (d5)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b        16     0
3 c         3     1

Recode 16 -> 4 in Var2

Note: We did have to put tick marks around the old values because dplyr::recode() replaces numeric values by either their name or their position. If a number with no quotes/backticks is given, it will assume it is a position.

Note: Notice here that we are only recoding one value in the variable. There is a default option that you can set all not recoded values to, and if no default is supplied and the replacement values are the same variable class as the original values (ex: numeric and numeric), then unmatched values are unchanged. However, if the replacement value is a new variable class (ex: numeric to character) then all other values will be recoded to NA.

d5 %>%
  dplyr::mutate(Var2 = dplyr::recode(Var2, `16` = 4))

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b         4     0
3 c         3     1

4. Recode just one value in a character variable (Var)

Review the data (d5)

# A tibble: 3 x 2
     id Var1 
  <dbl> <chr>
1   123 1.5  
2   234 2.2  
3   345 MR

In this case I want Var1 to be numeric. However, missing values were entered as character values (“MR” for missing response) so I need to recode these values before I can convert this variable to numeric.

Note: Parentheses are required around the new value when recoding because the values in Var1 are still considered character even though they all appear as numeric after recoding.

d14 <- d14 %>%
  dplyr::mutate(Var1 = dplyr::recode(Var1, MR = "-99"))

d14

# A tibble: 3 x 2
     id Var1 
  <dbl> <chr>
1   123 1.5  
2   234 2.2  
3   345 -99

Note: Notice again that we are only recoding one value in the variable. If no default replacement option is supplied and the replacement values are the same variable class as the original values (ex: character to character), then unmatched values are unchanged.

I can then now convert my variable to numeric without having my character values converted to NAs. If I tried to convert my Var1 variable to numeric before recoding “MR” to -99, the dplyr::recode() function would convert “MR” to NA.

d14 %>%
  dplyr::mutate(Var1 = base::as.numeric(Var1))

# A tibble: 3 x 2
     id  Var1
  <dbl> <dbl>
1   123   1.5
2   234   2.2
3   345 -99

5. Recode one value in a numeric haven labelled variable (Var3)

Review the data (d5)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b        16     0
3 c         3     1

View the variable labels for Var3

Note: You’ll see in this example there are three values and two value labels. In this instance, 2s are supposed to also be yes, but they do not contain a value label.

labelled::val_labels(d5$Var3)

 no yes 
  0   1

Recode 2 -> 1 in Var3

d5 <- d5 %>% 
  dplyr::mutate(Var3 = dplyr::recode(Var3, `2`=1))

d5

# A tibble: 3 x 3
  Var1   Var2      Var3
  <chr> <dbl> <dbl+lbl>
1 a         1   1 [yes]
2 b        16   0 [no] 
3 c         3   1 [yes]

View the variable labels for Var3

Note: The value labels are retained. If the labelled package is loaded, additional functionality is available for dplyr::recode(), such as additional arguments for value labels. For example, there is an argument, .keep_value_labels, that you can add to dplyr::recode() and the default is TRUE. However, if you do not want to keep the value labels you can set the option to FALSE.

labelled::val_labels(d5$Var3)

 no yes 
  0   1

6. Recode multiple values in a numeric haven labelled variable (Var3)

Review the data (d13)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     6
2 b        16     5
3 c         3     6

View the variable labels for Var3

d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()

$Var3
 no yes 
  5   6

Recode 5 -> 0 and 6 -> 1 in Var3

d13 <- d13 %>%
  dplyr::mutate(Var3 = dplyr::recode(Var3, `5` = 0, `6` = 1))

d13

# A tibble: 3 x 3
  Var1   Var2      Var3
  <chr> <dbl> <dbl+lbl>
1 a         1         1
2 b        16         0
3 c         3         1

View the variable labels for Var3

Note: We left the labelled and dplyr::recode() default of .keep_value_labels is TRUE so our value labels are still funky.

d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()

$Var3
 no yes 
  5   6

In this instance I don’t want to keep the original value labels. I don’t necessarily need to set the .keep_value_labels to FALSE though. I can just add new value labels and they will overwrite the previous labels.

d13 <- d13 %>% 
  labelled::set_value_labels(Var3 = c(no = 0, yes = 1))

d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()

$Var3
 no yes 
  0   1

Function: `case_when()`

1. Recode Var3 based on decision_var

Review the data (d21)

# A tibble: 6 x 3
  decision_var  Var2  Var3
         <dbl> <dbl> <dbl>
1            2     5    NA
2            4     4    NA
3           NA     0     4
4            2     4     1
5            4     5     3
6            3     1    NA

My logic is

If decision_var != 4 then Var3 should be -99, else it should be it’s current value

d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        !decision_var %in% 4 ~ -99,
        TRUE ~ Var3
      )
  )

# A tibble: 6 x 3
  decision_var  Var2  Var3
         <dbl> <dbl> <dbl>
1            2     5   -99
2            4     4    NA
3           NA     0   -99
4            2     4   -99
5            4     5     3
6            3     1   -99

Notice that I use the ! outside of decision_var and use %in% instead of !=. This is just one way to deal with the NA values that exist in decision_var. In this case I wanted values to be recoded to -99 even if decision_var is NA.

Let’s see what would happen if I wrote this the other way.

d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        TRUE ~ Var3
      )
  )

# A tibble: 6 x 3
  decision_var  Var2  Var3
         <dbl> <dbl> <dbl>
1            2     5   -99
2            4     4    NA
3           NA     0     4
4            2     4   -99
5            4     5     3
6            3     1   -99

Using != above, the NAs in decision_var are not evaluated and so the value of Var3 remains the same value. I could solve this problem another way by adding an explicit call to recode NAs like this below.

d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        is.na(decision_var) ~ -99,
        TRUE ~ Var3
      )
  )

# A tibble: 6 x 3
  decision_var  Var2  Var3
         <dbl> <dbl> <dbl>
1            2     5   -99
2            4     4    NA
3           NA     0   -99
4            2     4   -99
5            4     5     3
6            3     1   -99

And last, if I wanted NAs in the decision_var to be dealt with in a different way from the other values, I would not want to use %in%. See example below.

My new logic is

If decision_var !=4 then Var3 should be -99, If decision_var is NA then Var3 should be NA, else it should be it’s current value

d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        is.na(decision_var) ~ NA_real_,
        TRUE ~ Var3
      )
  )

# A tibble: 6 x 3
  decision_var  Var2  Var3
         <dbl> <dbl> <dbl>
1            2     5   -99
2            4     4    NA
3           NA     0    NA
4            2     4   -99
5            4     5     3
6            3     1   -99

2. Recode Var3 into a dichotomous variable

Review the data (d22)

# A tibble: 6 x 3
   Var1  Var2  Var3
  <dbl> <dbl> <dbl>
1     2     5     4
2     4     4     3
3    NA     0     4
4     2     4     1
5     4     5     3
6     3     1    NA

My logic is

If Var3 !=4 then 0, else 1

d22 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        !Var3 %in% 4 ~ 0,
        TRUE ~ 1
      )
  )

# A tibble: 6 x 3
   Var1  Var2  Var3
  <dbl> <dbl> <dbl>
1     2     5     1
2     4     4     0
3    NA     0     1
4     2     4     0
5     4     5     0
6     3     1     0

Similar to above, using ! and %in% ensures that the NA is accounted for. If however, I wanted NA to stay NA, I would want to write something like this.

d22 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        Var3 !=4 ~ 0,
        is.na(Var3) ~ NA_real_,
        TRUE ~ 1
      )
  )

# A tibble: 6 x 3
   Var1  Var2  Var3
  <dbl> <dbl> <dbl>
1     2     5     1
2     4     4     0
3    NA     0     1
4     2     4     0
5     4     5     0
6     3     1    NA

Return to Recode

Recode values

Package: dplyr

Function: recode()

Function: case_when()

Function: `recode()`

Function: `case_when()`