coalesce()1. Create a complete 20-21 gender variable
(gender_2021) with no missing data
Review the data (d3)
# A tibble: 5 x 5
s_id gender_1718 gender_1819 gender_1920 gender_2021
<dbl> <chr> <chr> <chr> <chr>
1 123 male male <NA> male
2 145 non-binary non-binary non-binary <NA>
3 150 female female female female
4 164 male <NA> male <NA>
5 170 female male <NA> male
Fill in missing 20-21 self-reported gender information from the most recently available year of data.
gender_2021_new)
by using dplyr::mutate().d3 %>%
dplyr::mutate(gender_2021_new = dplyr::coalesce(gender_2021, gender_1920, gender_1819, gender_1718))
# A tibble: 5 x 6
s_id gender_1718 gender_1819 gender_1920 gender_2021 gender_2021_new
<dbl> <chr> <chr> <chr> <chr> <chr>
1 123 male male <NA> male male
2 145 non-binary non-binary non-binary <NA> non-binary
3 150 female female female female female
4 164 male <NA> male <NA> male
5 170 female male <NA> male male
However, imagine you have 20 years worth of gender variables and you
do not want to list all of those variables out. Instead you can use the
splice operator (!!!) to select all variables that meet a
certain criteria. The coalescing happens by variable order, so if you
wanted to use the most recent values and fill from there, you would need
to reorder your variables first.
d3 <- d3 %>%
dplyr::relocate(s_id, gender_2021, gender_1920, gender_1819)
d3 %>%
dplyr::mutate(gender_2021_new = dplyr::coalesce(!!! dplyr::select(., contains("gender"))))
# A tibble: 5 x 6
s_id gender_2021 gender_1920 gender_1819 gender_1718 gender_2021_new
<dbl> <chr> <chr> <chr> <chr> <chr>
1 123 male <NA> male male male
2 145 <NA> non-binary non-binary non-binary non-binary
3 150 female female female female female
4 164 <NA> male <NA> male male
5 170 male <NA> male female male
dplyr::case_when() to fill empty values (similar to what we
did in the Recode
section). If you are planning to coalesce using many variables, I think
recoding blanks to NA_character first would be your best option.Here is an example below. Our new data has blanks rather than NAs.
d6
# A tibble: 5 x 5
s_id gender_1718 gender_1819 gender_1920 gender_2021
<dbl> <chr> <chr> <chr> <chr>
1 123 male "male" "" "male"
2 145 non-binary "non-binary" "non-binary" ""
3 150 female "female" "female" "female"
4 164 male "" "male" ""
5 170 female "male" "" "male"
I can select all gender variables that I want to recode using
dplyr::across() along with
tidyselect::contains() and then recode all “” to NAs by
using dplyr::na_if(). Now that my blank values are coded to
NA, coalesce will work fine.
d6 %>%
dplyr::mutate(dplyr::across(tidyselect::contains("gender"),
~ dplyr::na_if(., ""))) %>%
dplyr::mutate(gender_2021_new = dplyr::coalesce(gender_2021, gender_1920, gender_1819, gender_1718))
# A tibble: 5 x 6
s_id gender_1718 gender_1819 gender_1920 gender_2021 gender_2021_new
<dbl> <chr> <chr> <chr> <chr> <chr>
1 123 male male <NA> male male
2 145 non-binary non-binary non-binary <NA> non-binary
3 150 female female female female female
4 164 male <NA> male <NA> male
5 170 female male <NA> male male
Return to Complete