case_when()1. Recode all missing survey dates to the midpoint of the date range for id 3 and 6
Review the data (d17)
# A tibble: 7 x 2
id svy_date
<dbl> <date>
1 1 2022-10-01
2 2 2022-10-05
3 3 NA
4 4 2022-10-14
5 5 NA
6 6 NA
7 7 2022-10-07
But first we need to find the midpoint of the svy_date
variable. We can do that by getting the min and max (using
base::range()) and then taking the mean using
base::mean() of that range. We need to make sure to add the
argument na_rm = TRUE to calculate the range despite having NA
values. We can save the value in a vector to be used later.
date_mid <- base::mean(base::range(d17$svy_date, na.rm = TRUE))
date_mid
[1] "2022-10-07"
Now we can use this vector in our recoding statement. We can use
dplyr::case_when() to recode the NA values.
Note: I am using dplyr::mutate() to create a new
svy_date variable which will replace the existing
variable.
Note: TRUE means, if a value was not evaluated in my arguments above, replace with the value I give. In this case, I am saying just replace with the existing value.
d17_new <- d17 %>%
dplyr::mutate(svy_date = dplyr::case_when(
id %in% c(3,6) ~ date_mid,
TRUE ~ svy_date
))
d17_new
# A tibble: 7 x 2
id svy_date
<dbl> <date>
1 1 2022-10-01
2 2 2022-10-05
3 3 2022-10-07
4 4 2022-10-14
5 5 NA
6 6 2022-10-07
7 7 2022-10-07
Awesome. This looks like it worked great. But wait, let’s do one last check. Let’s make a table of the values.
d17_new %>%
janitor::tabyl(svy_date)
svy_date n percent valid_percent
2022-10-01 1 0.1428571 0.1666667
2022-10-05 1 0.1428571 0.1666667
2022-10-07 1 0.1428571 0.1666667
2022-10-07 2 0.2857143 0.3333333
2022-10-14 1 0.1428571 0.1666667
<NA> 1 0.1428571 NA
Well that doesn’t look right at all. For some reason this
janitor::tabyl() function is not recognizing that my new
“2022-10-07” values should be the same as the existing “2022-10-07”
values. So I decide to try out a few date formats in my recode function
until I find one that forces R to recognize them as the same value. I
land on lubridate::ymd().
d17_new <- d17 %>%
dplyr::mutate(svy_date = dplyr::case_when(
id %in% c(3,6) ~ lubridate::ymd(date_mid),
TRUE ~ svy_date
))
d17_new
# A tibble: 7 x 2
id svy_date
<dbl> <date>
1 1 2022-10-01
2 2 2022-10-05
3 3 2022-10-07
4 4 2022-10-14
5 5 NA
6 6 2022-10-07
7 7 2022-10-07
d17_new %>%
janitor::tabyl(svy_date)
svy_date n percent valid_percent
2022-10-01 1 0.1428571 0.1666667
2022-10-05 1 0.1428571 0.1666667
2022-10-07 3 0.4285714 0.5000000
2022-10-14 1 0.1428571 0.1666667
<NA> 1 0.1428571 NA
Return to Recode