replace_na()
1. Replace NA values for one variable (Var2)
Review the data (d18)
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 NA NA 3
2 1 NA 4
3 1 0 2
Recode NA to 0
d18 %>%
dplyr::mutate(Var2 = tidyr::replace_na(Var2, replace = 0))
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 NA 0 3
2 1 0 4
3 1 0 2
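The example above does not show how d18 was created. A minimal reproducible sketch, assuming d18 was built with tibble::tibble() from the values in the printed output (the object name d18_recoded is hypothetical), might look like this:

```r
library(dplyr)
library(tidyr)

# Rebuild d18 from the printed output (assumed construction, not shown in the original)
d18 <- tibble::tibble(
  Var1 = c(NA, 1, 1),
  Var2 = c(NA, NA, 0),
  Var3 = c(3, 4, 2)
)

# Replace NA in Var2 only; the NA in Var1 is left as-is
d18_recoded <- d18 %>%
  dplyr::mutate(Var2 = tidyr::replace_na(Var2, replace = 0))

d18_recoded
```

Because only Var2 is named inside dplyr::mutate(), the NA in Var1 is untouched.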
2. Replace NA for multiple variables (Var1 and Var2) with the same value
Review the data (d18)
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 NA NA 3
2 1 NA 4
3 1 0 2
Recode NA for Var1 and Var2 to 0
Use dplyr::across() to apply a transformation to multiple columns.
d18 %>%
dplyr::mutate(dplyr::across(Var1:Var2, ~ tidyr::replace_na(., 0)))
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 0 0 3
2 1 0 4
3 1 0 2
3. Replace NA for multiple variables (Var1 and Var2) with the same value using your project data dictionary
Review the data (d18)
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 NA NA 3
2 1 NA 4
3 1 0 2
You may have built a data dictionary before collecting data, and that dictionary may include a column that identifies which variables need to be recoded in a particular way.
Let’s review our very simple data dictionary.
dict
# A tibble: 3 x 3
var_name label var_type
<chr> <chr> <chr>
1 Var1 Select all: Option 1 binary
2 Var2 Select all: Option 2 binary
3 Var3 Question wording of Var3 likert
Here we see that our data dictionary has a column called “var_type”, which tells us which variables are binary (coded 0 and 1).
Oftentimes when you download “select all” questions from a survey platform, each option is downloaded with a 1 (if selected) and NA (if not selected). There are times when this format makes sense (say for visualizations). However, for analysis purposes we probably want those variables to show 1s (Yes) and 0s (No) instead of 1s and NAs.
Instead of listing out all of the variables we want to recode, we can pull a vector of variable names using our dictionary.
vars <- dict %>%
  dplyr::filter(var_type == "binary") %>%
  dplyr::pull(var_name)
We can now use this vector in our tidyr::replace_na() function. We use tidyselect::all_of() to denote that we are selecting variable names from an external vector (“vars”).
d18 %>%
dplyr::mutate(dplyr::across(tidyselect::all_of(vars), ~tidyr::replace_na(., 0)))
# A tibble: 3 x 3
Var1 Var2 Var3
<dbl> <dbl> <dbl>
1 0 0 3
2 1 0 4
3 1 0 2
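To try the dictionary-driven approach end to end, here is a self-contained sketch. It assumes d18 and dict were built with tibble::tibble() from the values in the printed output above; the object name d18_binary is hypothetical.

```r
library(dplyr)
library(tidyr)

# Assumed construction of the example data and dictionary (values copied from the printed output)
d18 <- tibble::tibble(
  Var1 = c(NA, 1, 1),
  Var2 = c(NA, NA, 0),
  Var3 = c(3, 4, 2)
)

dict <- tibble::tibble(
  var_name = c("Var1", "Var2", "Var3"),
  label = c("Select all: Option 1", "Select all: Option 2", "Question wording of Var3"),
  var_type = c("binary", "binary", "likert")
)

# Pull the names of the binary variables into a character vector
vars <- dict %>%
  dplyr::filter(var_type == "binary") %>%
  dplyr::pull(var_name)

# Recode NA to 0 only for the variables named in vars
d18_binary <- d18 %>%
  dplyr::mutate(dplyr::across(tidyselect::all_of(vars), ~ tidyr::replace_na(., 0)))

d18_binary
```

One reason to prefer tidyselect::all_of() here is that it is strict: if the dictionary names a variable that does not exist in the data, the call errors instead of silently skipping it, which helps catch mismatches between the dictionary and the data.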
4. Replace NA for multiple variables of the same class (Var2 and Var3) with the same value
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
Recode NA for Var2 and Var3 to -999
Use the tidyselect selection helper where().
d1 %>%
dplyr::mutate(dplyr::across(where(is.numeric), ~tidyr::replace_na(., -999)))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b -999 8.5
3 c 3 -999
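After a recode like this, it can help to confirm that no NA values remain. The sketch below rebuilds d1 from the printed output (an assumed construction) and counts NA values per column; the names d1_recoded and na_counts are hypothetical.

```r
library(dplyr)
library(tidyr)

# Assumed construction of d1 (values copied from the printed output)
d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(2, NA, 3),
  Var3 = c(3.6, 8.5, NA)
)

# Recode NA to -999 in every numeric column; the character column is not selected
d1_recoded <- d1 %>%
  dplyr::mutate(dplyr::across(where(is.numeric), ~ tidyr::replace_na(., -999)))

# Count the NA values left in each column as a quick check
na_counts <- d1_recoded %>%
  dplyr::summarise(dplyr::across(dplyr::everything(), ~ sum(is.na(.))))

na_counts
```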
5. Replace NA in Var2 and Var3 with different values
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
Replace NA with -999 for Var2 and replace NA with 0 for Var3.
Here we can use base::list() to list the variables we want to transform, each paired with its replacement value. Because we are passing the whole data frame to tidyr::replace_na() rather than a single vector, we do not have to use dplyr::mutate() to do our transformation.
d1 %>%
tidyr::replace_na(list(Var2 = -999, Var3 = 0))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b -999 8.5
3 c 3 0
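Columns that are not named in the list are left alone. The sketch below illustrates this with a hypothetical variant of d1 (called d1_extra here) that also has an NA in the character column Var1; the replacement value "unknown" is likewise only an illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of d1 with an extra NA in the character column
d1_extra <- tibble::tibble(
  Var1 = c("a", NA, "c"),
  Var2 = c(2, NA, 3),
  Var3 = c(3.6, 8.5, NA)
)

# Only Var2 and Var3 are listed, so the NA in Var1 is kept
partial <- d1_extra %>%
  tidyr::replace_na(list(Var2 = -999, Var3 = 0))

# Each column can get a replacement that matches its own type
full <- d1_extra %>%
  tidyr::replace_na(list(Var1 = "unknown", Var2 = -999, Var3 = 0))

partial
full
```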
na_if()
1. Replace a value for the entire dataset with NA
Review the data (d24)
# A tibble: 3 x 3
stu_id q1 q2
<dbl> <dbl> <dbl>
1 1 1 1
2 2 -99 1
3 3 -99 -99
Replace -99 for the entire dataset with NA
d24 %>%
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::na_if(., -99)))
# A tibble: 3 x 3
stu_id q1 q2
<dbl> <dbl> <dbl>
1 1 1 1
2 2 NA 1
3 3 NA NA
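A reproducible sketch of this step, assuming d24 was built from the values in the printed output. It also shows a hypothetical variant with a character column (name), where restricting the recode to numeric columns is the safer choice, since comparing a character column against the number -99 may fail in recent versions of dplyr.

```r
library(dplyr)

# Assumed construction of d24 (values copied from the printed output)
d24 <- tibble::tibble(
  stu_id = c(1, 2, 3),
  q1 = c(1, -99, -99),
  q2 = c(1, 1, -99)
)

# All columns are numeric, so everything() is safe here
d24_clean <- d24 %>%
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::na_if(., -99)))

# Hypothetical variant with a character column: limit the recode to numeric columns
d24_named <- d24 %>%
  dplyr::mutate(name = c("x", "y", "z"))

d24_named_clean <- d24_named %>%
  dplyr::mutate(dplyr::across(where(is.numeric), ~ dplyr::na_if(., -99)))

d24_clean
d24_named_clean
```

Note that everything() also applies the recode to stu_id, so an ID that happened to equal -99 would become NA as well.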
2. Replace a value for select variables (Var2 and Var3) with NA
Review the data (d3)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b -999 0
3 c 3 -999
Replace -999 for Var2 and Var3 with NA
d3 %>%
dplyr::mutate(dplyr::across(Var2:Var3, ~dplyr::na_if(., -999)))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b NA 0
3 c 3 NA
3. Replace a value for multiple variables of the same class (in this case numeric variables) with NA
Review the data (d3)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b -999 0
3 c 3 -999
Replace -999 for numeric variables with NA
d3 %>%
dplyr::mutate(dplyr::across(where(is.numeric), ~ dplyr::na_if(.,-999)))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b NA 0
3 c 3 NA
Note: A potentially preferred alternative to the formula shorthand ~ (which comes from {rlang}) is the anonymous function shorthand \(x), which is native to R 4.1 and up.
d3 %>%
dplyr::mutate(dplyr::across(where(is.numeric), \(x) dplyr::na_if(x,-999)))
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b NA 0
3 c 3 NA
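To confirm that the different ways of writing the function give the same result, here is a small sketch. It assumes d3 was built from the values in the printed output and that R 4.1 or later is available for the \(x) form; the with_* object names are hypothetical.

```r
library(dplyr)

# Assumed construction of d3 (values copied from the printed output)
d3 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, -999, 3),
  Var3 = c(2, 0, -999)
)

# Formula shorthand
with_formula <- d3 %>%
  dplyr::mutate(dplyr::across(where(is.numeric), ~ dplyr::na_if(., -999)))

# Anonymous function shorthand (R 4.1 and up)
with_lambda <- d3 %>%
  dplyr::mutate(dplyr::across(where(is.numeric), \(x) dplyr::na_if(x, -999)))

# Full anonymous function, which also works in older versions of R
with_function <- d3 %>%
  dplyr::mutate(dplyr::across(where(is.numeric), function(x) dplyr::na_if(x, -999)))

# Both comparisons should return TRUE
isTRUE(all.equal(with_formula, with_lambda))
isTRUE(all.equal(with_lambda, with_function))
```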