recode()
Note: The dplyr::recode() syntax is
old value = new value, which is the opposite of
dplyr::rename() (new name = old name).
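To make the argument order concrete, here is a minimal sketch on a made-up tibble. The object toy and its columns are hypothetical and are not part of the data used on this page.

```r
library(dplyr)

# Hypothetical example data, used only for illustration
toy <- tibble::tibble(id = c("a", "b"), sex = c("m", "f"))

# recode(): old value = new value
recoded <- toy %>%
  dplyr::mutate(sex = dplyr::recode(sex, m = "male", f = "female"))

# rename(): new name = old name
renamed <- toy %>%
  dplyr::rename(gender = sex)
```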
1. Recode character values into different character values
(for gender)
Review the data (d4)
# A tibble: 3 x 3
id gender lunch
<chr> <chr> <chr>
1 a m f
2 b f r
3 c n p
Recode gender
Note: We did not have to put quotes around the old character
values because dplyr::recode() replaces character or factor
values by their name.
Note: Quotes are required around the new value when recoding into a character/factor variable.
Note: We are recoding back into the same variable using
dplyr::mutate(). However, we could have recoded into a new
variable by changing the name of gender.
d4 %>%
  dplyr::mutate(gender = dplyr::recode(gender, m = "male", f = "female", n = "nonbinary"))
# A tibble: 3 x 3
id gender lunch
<chr> <chr> <chr>
1 a male f
2 b female r
3 c nonbinary p
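The note above says we could have recoded into a new variable instead. A minimal sketch of that, assuming d4 looks like the printed output; gender_full is a hypothetical column name chosen for illustration.

```r
library(dplyr)

# Rebuild d4 from the printed output above
d4 <- tibble::tibble(
  id = c("a", "b", "c"),
  gender = c("m", "f", "n"),
  lunch = c("f", "r", "p")
)

# gender_full is a hypothetical new column; the original gender is left untouched
d4_new <- d4 %>%
  dplyr::mutate(
    gender_full = dplyr::recode(gender, m = "male", f = "female", n = "nonbinary")
  )
```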
2. Recode a character variable (gender) into a
numeric variable
Review the data (d4)
# A tibble: 3 x 3
id gender lunch
<chr> <chr> <chr>
1 a m f
2 b f r
3 c n p
Recode gender
Note: Since we are recoding into a numeric variable, no quotes are necessary for the numeric new value.
Note: We are recoding back into the same variable using
dplyr::mutate(). However, we could have recoded into a new
variable by changing the name of gender.
d4 %>%
  dplyr::mutate(gender = dplyr::recode(gender, m = 1, f = 2, n = 3))
# A tibble: 3 x 3
id gender lunch
<chr> <dbl> <chr>
1 a 1 f
2 b 2 r
3 c 3 p
3. Recode just one value in a numeric variable
(Var2)
Review the data (d5)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b 16 0
3 c 3 1
Recode 16 -> 4 in Var2
Note: We did have to put backticks around the old value
because dplyr::recode() replaces numeric values by either
their name or their position. A bare number is not a valid argument name in R, so it needs
backticks (or quotes); replacements given without names are matched by position.
Note: Here we recode only one value in the variable. You can supply a default (the .default argument) that all values not recoded are set to. If no default is supplied and the replacement values are the same class as the original values (e.g. numeric to numeric), unmatched values are left unchanged. However, if the replacement values are a different class (e.g. numeric to character), all other values are recoded to NA.
d5 %>%
  dplyr::mutate(Var2 = dplyr::recode(Var2, `16` = 4))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b 4 0
3 c 3 1
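The three situations described in the note can be compared side by side. This is a small sketch on a plain vector with the same values as Var2. The replacement "sixteen" is a made-up value used only to force a change of class.

```r
library(dplyr)

x <- c(1, 16, 3)  # same values as Var2 in d5

# Same class, no .default: unmatched values are kept as they are
same_class <- dplyr::recode(x, `16` = 4)

# Explicit .default: every unmatched value is set to 0
with_default <- dplyr::recode(x, `16` = 4, .default = 0)

# New class (numeric to character), no .default: unmatched values become NA.
# dplyr warns about this, so the warning is silenced here.
new_class <- suppressWarnings(dplyr::recode(x, `16` = "sixteen"))
```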
4. Recode just one value in a character variable
(Var1)
Review the data (d14)
# A tibble: 3 x 2
id Var1
<dbl> <chr>
1 123 1.5
2 234 2.2
3 345 MR
In this case I want Var1 to be numeric. However, missing values were entered as character values (“MR” for missing response) so I need to recode these values before I can convert this variable to numeric.
d14 <- d14 %>%
  dplyr::mutate(Var1 = dplyr::recode(Var1, MR = "-99"))
d14
# A tibble: 3 x 2
id Var1
<dbl> <chr>
1 123 1.5
2 234 2.2
3 345 -99
Note: Notice again that we are only recoding one value in the variable. If no default replacement option is supplied and the replacement values are the same variable class as the original values (ex: character to character), then unmatched values are unchanged.
I can now convert my variable to numeric without having my
character values converted to NAs. If I had tried to convert my Var1
variable to numeric before recoding “MR” to -99, the
base::as.numeric() function would have converted “MR” to NA.
d14 %>%
  dplyr::mutate(Var1 = base::as.numeric(Var1))
# A tibble: 3 x 2
id Var1
<dbl> <dbl>
1 123 1.5
2 234 2.2
3 345 -99
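To see why the order of the two steps matters, here is a short sketch on a plain character vector with the same values as Var1. The object names are hypothetical.

```r
library(dplyr)

# Same values as Var1 in d14, before recoding
raw <- c("1.5", "2.2", "MR")

# as.numeric() cannot parse "MR", so it returns NA (with a coercion warning)
converted_raw <- suppressWarnings(as.numeric(raw))

# Recode first, then convert: the missing-response code survives as -99
converted_recoded <- as.numeric(dplyr::recode(raw, MR = "-99"))
```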
5. Recode one value in a numeric haven labelled variable
(Var3)
Review the data (d5)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b 16 0
3 c 3 1
View the value labels for Var3
labelled::val_labels(d5$Var3)
no yes
0 1
Recode 2 -> 1 in Var3
d5 <- d5 %>%
  dplyr::mutate(Var3 = dplyr::recode(Var3, `2` = 1))
d5
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl+lbl>
1 a 1 1 [yes]
2 b 16 0 [no]
3 c 3 1 [yes]
View the value labels for Var3
Note: The value labels are retained. When the labelled
package is loaded, dplyr::recode() gains extra arguments
for labelled vectors. One of them, .keep_value_labels,
defaults to TRUE. If you do not want to keep the value labels,
set it to FALSE.
labelled::val_labels(d5$Var3)
no yes
0 1
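A minimal sketch of the .keep_value_labels argument, assuming the labelled package is installed and provides the recode method described above. The vector x is a hypothetical stand-in for Var3, not the actual d5 column.

```r
library(dplyr)
library(labelled)

# Hypothetical labelled vector with the same values and labels as Var3 in d5
x <- labelled::labelled(c(2, 0, 1), labels = c(no = 0, yes = 1))

# Default (.keep_value_labels = TRUE): the value labels are carried over
kept <- dplyr::recode(x, `2` = 1)

# .keep_value_labels = FALSE: the result comes back without value labels
dropped <- dplyr::recode(x, `2` = 1, .keep_value_labels = FALSE)

labelled::val_labels(kept)
labelled::val_labels(dropped)
```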
6. Recode multiple values in a numeric haven labelled
variable (Var3)
Review the data (d13)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 6
2 b 16 5
3 c 3 6
View the value labels for Var3
d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()
$Var3
no yes
5 6
Recode 5 -> 0 and 6 -> 1 in Var3
d13 <- d13 %>%
  dplyr::mutate(Var3 = dplyr::recode(Var3, `5` = 0, `6` = 1))
d13
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl+lbl>
1 a 1 1
2 b 16 0
3 c 3 1
View the value labels for Var3
Note: We left .keep_value_labels at its default of
TRUE, so the value labels still point to the old
values (5 and 6), which no longer appear in the data.
d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()
$Var3
no yes
5 6
In this instance I don’t want to keep the original value labels. I don’t necessarily need to set the .keep_value_labels to FALSE though. I can just add new value labels and they will overwrite the previous labels.
d13 <- d13 %>%
  labelled::set_value_labels(Var3 = c(no = 0, yes = 1))
d13 %>%
  dplyr::select(Var3) %>%
  labelled::val_labels()
$Var3
no yes
0 1
case_when()
1. Recode Var3 based on
decision_var
Review the data (d21)
# A tibble: 6 x 3
decision_var Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 NA
2 4 4 NA
3 NA 0 4
4 2 4 1
5 4 5 3
6 3 1 NA
My logic is
If decision_var != 4 then Var3 should be
-99, else it should keep its current value
d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        !decision_var %in% 4 ~ -99,
        TRUE ~ Var3
      )
  )
# A tibble: 6 x 3
decision_var Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 -99
2 4 4 NA
3 NA 0 -99
4 2 4 -99
5 4 5 3
6 3 1 -99
Notice that I use the ! outside of
decision_var and use %in% instead of
!=. This is just one way to deal with the NA values that
exist in decision_var. In this case I wanted values to be
recoded to -99 even if decision_var is NA.
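The difference between the two conditions comes down to how each one evaluates when the value is NA. A small base R sketch on a hypothetical vector x:

```r
x <- c(2, 4, NA)

# != propagates NA: comparing NA with 4 is neither TRUE nor FALSE
neq <- x != 4        # TRUE FALSE NA

# %in% never returns NA, so negating it gives TRUE for the NA element
not_in <- !x %in% 4  # TRUE FALSE TRUE
```

Because dplyr::case_when() only acts on conditions that are TRUE, the NA in the first version is skipped, while the second version matches it.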
Let’s see what would happen if I wrote this the other way.
d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        TRUE ~ Var3
      )
  )
# A tibble: 6 x 3
decision_var Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 -99
2 4 4 NA
3 NA 0 4
4 2 4 -99
5 4 5 3
6 3 1 -99
Using != above, the comparison returns NA wherever decision_var is
NA. case_when() treats NA as not matching, so those rows fall through to
TRUE ~ Var3 and Var3 keeps its current
value. I could solve this problem another way by adding an explicit condition
to recode NAs, like this below.
d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        is.na(decision_var) ~ -99,
        TRUE ~ Var3
      )
  )
# A tibble: 6 x 3
decision_var Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 -99
2 4 4 NA
3 NA 0 -99
4 2 4 -99
5 4 5 3
6 3 1 -99
And last, if I wanted NAs in the decision_var to be
dealt with in a different way from the other values, I would
not want to use %in%. See example below.
My new logic is
If decision_var != 4 then Var3 should be
-99, if decision_var is NA then Var3 should be
NA, else it should keep its current value
d21 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        decision_var != 4 ~ -99,
        is.na(decision_var) ~ NA_real_,
        TRUE ~ Var3
      )
  )
# A tibble: 6 x 3
decision_var Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 -99
2 4 4 NA
3 NA 0 NA
4 2 4 -99
5 4 5 3
6 3 1 -99
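case_when() checks its conditions in order and uses the first one that is TRUE. The is.na() line can sit after the != line here only because NA != 4 is NA rather than TRUE. A sketch on hypothetical vectors, showing that putting the NA check first gives the same result:

```r
library(dplyr)

# Hypothetical vectors, shortened from d21 for illustration
decision_var <- c(2, 4, NA)
Var3 <- c(1, 3, 4)

# NA check listed second: works because NA != 4 is NA, which is not a match
na_second <- dplyr::case_when(
  decision_var != 4 ~ -99,
  is.na(decision_var) ~ NA_real_,
  TRUE ~ Var3
)

# NA check listed first: same result, and the intent is stated up front
na_first <- dplyr::case_when(
  is.na(decision_var) ~ NA_real_,
  decision_var != 4 ~ -99,
  TRUE ~ Var3
)
```

Listing the NA condition first is a style choice, not a requirement. It can make the handling of missing values easier to spot when there are many conditions.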
2. Recode Var3 into a dichotomous
variable
Review the data (d22)
# A tibble: 6 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 4
2 4 4 3
3 NA 0 4
4 2 4 1
5 4 5 3
6 3 1 NA
My logic is
If Var3 != 4 then 0, else 1
d22 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        !Var3 %in% 4 ~ 0,
        TRUE ~ 1
      )
  )
# A tibble: 6 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 1
2 4 4 0
3 NA 0 1
4 2 4 0
5 4 5 0
6 3 1 0
Similar to above, using ! and %in% ensures
that the NA is accounted for. If, however, I wanted NA to stay NA, I
would write something like this.
d22 %>%
  dplyr::mutate(
    Var3 =
      dplyr::case_when(
        Var3 != 4 ~ 0,
        is.na(Var3) ~ NA_real_,
        TRUE ~ 1
      )
  )
# A tibble: 6 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 2 5 1
2 4 4 0
3 NA 0 1
4 2 4 0
5 4 5 0
6 3 1 NA