Package: dplyr


Function: case_when()


1. Recode all missing survey dates to the midpoint of the date range for id 3 and 6

Review the data (d17)

# A tibble: 7 x 2
     id svy_date  
  <dbl> <date>    
1     1 2022-10-01
2     2 2022-10-05
3     3 NA        
4     4 2022-10-14
5     5 NA        
6     6 NA        
7     7 2022-10-07

But first we need to find the midpoint of the svy_date variable. We can do that by getting the min and max (using base::range()) and then taking the mean using base::mean() of that range. We need to make sure to add the argument na_rm = TRUE to calculate the range despite having NA values. We can save the value in a vector to be used later.

  • Note: Notice that the class of our “svy_date” variable is date (we can see above). If it was character we would need to convert it to date before performing calculations.
date_mid <- base::mean(base::range(d17$svy_date, na.rm = TRUE))

date_mid
[1] "2022-10-07"

Now we can use this vector in our recoding statement. We can use dplyr::case_when() to recode the NA values.

  • Note: I am using dplyr::mutate() to create a new svy_date variable which will replace the existing variable.

  • Note: TRUE means, if a value was not evaluated in my arguments above, replace with the value I give. In this case, I am saying just replace with the existing value.

d17_new <- d17 %>%
  dplyr::mutate(svy_date = dplyr::case_when(
    id %in% c(3,6) ~ date_mid,
    TRUE ~ svy_date
  ))

d17_new
# A tibble: 7 x 2
     id svy_date  
  <dbl> <date>    
1     1 2022-10-01
2     2 2022-10-05
3     3 2022-10-07
4     4 2022-10-14
5     5 NA        
6     6 2022-10-07
7     7 2022-10-07

Awesome. This looks like it worked great. But wait, let’s do one last check. Let’s make a table of the values.

d17_new %>%
  janitor::tabyl(svy_date)
   svy_date n   percent valid_percent
 2022-10-01 1 0.1428571     0.1666667
 2022-10-05 1 0.1428571     0.1666667
 2022-10-07 1 0.1428571     0.1666667
 2022-10-07 2 0.2857143     0.3333333
 2022-10-14 1 0.1428571     0.1666667
       <NA> 1 0.1428571            NA

Well that doesn’t look right at all. For some reason this janitor::tabyl() function is not recognizing that my new “2022-10-07” values should be the same as the existing “2022-10-07” values. So I decide to try out a few date formats in my recode function until I find one that forces R to recognize them as the same value. I land on lubridate::ymd().

d17_new <- d17 %>%
  dplyr::mutate(svy_date = dplyr::case_when(
    id %in% c(3,6) ~ lubridate::ymd(date_mid),
    TRUE ~ svy_date
  ))

d17_new
# A tibble: 7 x 2
     id svy_date  
  <dbl> <date>    
1     1 2022-10-01
2     2 2022-10-05
3     3 2022-10-07
4     4 2022-10-14
5     5 NA        
6     6 2022-10-07
7     7 2022-10-07
d17_new %>%
  janitor::tabyl(svy_date)
   svy_date n   percent valid_percent
 2022-10-01 1 0.1428571     0.1666667
 2022-10-05 1 0.1428571     0.1666667
 2022-10-07 3 0.4285714     0.5000000
 2022-10-14 1 0.1428571     0.1666667
       <NA> 1 0.1428571            NA

Return to Recode