rows_update()1. Fill in missing values using a second dataset
Review the data (d5)
# A tibble: 5 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 22 3 4 5
2 23 3 5 4
3 24 4 2 1
4 25 5 NA NA
5 26 4 NA NA
Notice we are missing data for id 25 and 26, item1 and
item2
Luckily that data was collected later on another form.
Let’s review that form (extra_form)
# A tibble: 2 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 25 5 2 3
2 26 4 4 5
We can use dplyr::rows_update() to fill in the missing
data on our original form with our new form data.
d5 %>%
dplyr::rows_update(extra_form, by = "id")
# A tibble: 5 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 22 3 4 5
2 23 3 5 4
3 24 4 2 1
4 25 5 2 3
5 26 4 4 5
What would happen if our extra_form had different values than we already have for existing data though?
For example in this case where we have a new form (extra_form2), where the grade level has changed for id 25 (from 5 -> NA) and id 26 (from 4 -> 5).
# A tibble: 2 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 25 NA 2 3
2 26 5 4 5
We see that if we use this new form with updated information for existing variables, that all of our data will update based on the new data, not just the missing data. See that grade level for id 25 has changed from 5 to NA and grade level for id 26 has changed from 4 to 5.
We would need to make a decision about which values for grade level are more accurate, the old values or the new values.
d5 %>%
dplyr::rows_update(extra_form2, by = "id")
# A tibble: 5 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 22 3 4 5
2 23 3 5 4
3 24 4 2 1
4 25 NA 2 3
5 26 5 4 5
If we do not want to overwrite all data with new information, we can
use another function, dplyr::rows_patch() which only
overwrites NA values.
d5 %>%
dplyr::rows_patch(extra_form2, by = "id")
# A tibble: 5 x 4
id grade item1 item2
<dbl> <dbl> <dbl> <dbl>
1 22 3 4 5
2 23 3 5 4
3 24 4 2 1
4 25 5 2 3
5 26 4 4 5
Return to Complete