rowSums()
1. Calculate the sum of NA values across all items in the “toca” measure
Review the data (d5)
# A tibble: 5 x 5
  stu_id toca1 toca2 toca3 toca4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1   1234     3     2    NA     3
2   2345     4     3     1     4
3   3456     2    NA     1    NA
4   4567     4     5     1     6
5   5678     1     3     2     2
Calculate the sum of NAs
Note: We are calculating a new variable using dplyr::mutate().
Note: Adding dplyr::across() allows you to select the specific columns you want base::rowSums() to be calculated over. Otherwise rowSums() will be applied across all columns.
Note: We use the tidyselect selection helper contains() to refer to all variables whose names contain the string “toca”.
Note: Adding base::is.na() returns a logical value (TRUE or FALSE) for each cell. Because TRUE counts as 1 and FALSE as 0, these values can be counted/summed.
d5 %>%
  mutate(toca_na_sum = rowSums(is.na(across(contains("toca")))))
# A tibble: 5 x 6
  stu_id toca1 toca2 toca3 toca4 toca_na_sum
   <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl>
1   1234     3     2    NA     3           1
2   2345     4     3     1     4           0
3   3456     2    NA     1    NA           2
4   4567     4     5     1     6           0
5   5678     1     3     2     2           0
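To see why the nested call works, it can help to run the pieces one at a time. The following is only a sketch: it rebuilds d5 from the values printed above using tibble::tribble(), and the intermediate object names (toca_items, toca_missing, na_counts) are made up for illustration.

```r
library(dplyr)

# Rebuild the example data shown above (values copied from the printed tibble)
d5 <- tibble::tribble(
  ~stu_id, ~toca1, ~toca2, ~toca3, ~toca4,
     1234,      3,      2,     NA,      3,
     2345,      4,      3,      1,      4,
     3456,      2,     NA,      1,     NA,
     4567,      4,      5,      1,      6,
     5678,      1,      3,      2,      2
)

# Step 1: keep only the columns whose names contain "toca"
toca_items <- d5 %>% select(contains("toca"))

# Step 2: is.na() on a data frame returns a logical matrix,
# TRUE wherever a value is missing
toca_missing <- is.na(toca_items)

# Step 3: rowSums() treats TRUE as 1 and FALSE as 0,
# so each row's sum is its count of missing items
na_counts <- rowSums(toca_missing)
na_counts
```

The mutate() call above does the same three steps in a single expression and stores the result as a new column.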
We could also select the variables in other ways, such as a range of adjacent columns (toca1:toca4), and it would work just as well.
d5 %>%
  mutate(toca_na_sum = rowSums(is.na(across(toca1:toca4))))
# A tibble: 5 x 6
  stu_id toca1 toca2 toca3 toca4 toca_na_sum
   <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl>
1   1234     3     2    NA     3           1
2   2345     4     3     1     4           0
3   3456     2    NA     1    NA           2
4   4567     4     5     1     6           0
5   5678     1     3     2     2           0
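As a rough illustration of "other ways", here are three more selections that should pick out the same four columns in this data set. These particular helpers (starts_with(), matches(), and an explicit c() of names) are our own choices rather than ones used above, and d5 is rebuilt here only so the sketch runs on its own.

```r
library(dplyr)

# Rebuild the example data shown above
d5 <- tibble::tribble(
  ~stu_id, ~toca1, ~toca2, ~toca3, ~toca4,
     1234,      3,      2,     NA,      3,
     2345,      4,      3,      1,      4,
     3456,      2,     NA,      1,     NA,
     4567,      4,      5,      1,      6,
     5678,      1,      3,      2,      2
)

# Select by prefix
by_prefix <- d5 %>%
  mutate(toca_na_sum = rowSums(is.na(across(starts_with("toca")))))

# Select by regular expression: "toca" followed only by digits
by_regex <- d5 %>%
  mutate(toca_na_sum = rowSums(is.na(across(matches("^toca[0-9]+$")))))

# Select by listing each column explicitly
by_name <- d5 %>%
  mutate(toca_na_sum = rowSums(is.na(across(c(toca1, toca2, toca3, toca4)))))
```

Which selector is safest depends on the rest of your column names. For example, contains("toca") would also pick up a hypothetical column named toca_total, while the regular expression above would not.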
Finally, rather than wrapping the is.na() function around our dplyr::across() function, we can select our variables first and then apply is.na() to each of them with an anonymous function inside across().
d5 %>%
  mutate(toca_na_sum = rowSums(across(toca1:toca4, ~ is.na(.))))
# A tibble: 5 x 6
  stu_id toca1 toca2 toca3 toca4 toca_na_sum
   <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl>
1   1234     3     2    NA     3           1
2   2345     4     3     1     4           0
3   3456     2    NA     1    NA           2
4   4567     4     5     1     6           0
5   5678     1     3     2     2           0
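The function passed to across() can be spelled in a few interchangeable ways. The sketch below compares three of them. It assumes R 4.1 or later for the \(x) shorthand, and the result names (with_formula, with_lambda, with_name) are hypothetical.

```r
library(dplyr)

# Rebuild the example data shown above
d5 <- tibble::tribble(
  ~stu_id, ~toca1, ~toca2, ~toca3, ~toca4,
     1234,      3,      2,     NA,      3,
     2345,      4,      3,      1,      4,
     3456,      2,     NA,      1,     NA,
     4567,      4,      5,      1,      6,
     5678,      1,      3,      2,      2
)

# Formula shorthand with the .x pronoun (equivalent to the `.` used above)
with_formula <- d5 %>%
  mutate(toca_na_sum = rowSums(across(toca1:toca4, ~ is.na(.x))))

# Base R anonymous function shorthand (assumes R >= 4.1)
with_lambda <- d5 %>%
  mutate(toca_na_sum = rowSums(across(toca1:toca4, \(x) is.na(x))))

# Passing the function by name, with no anonymous wrapper at all
with_name <- d5 %>%
  mutate(toca_na_sum = rowSums(across(toca1:toca4, is.na)))
```

In each case across() returns a data frame of logical columns, which rowSums() then adds up row by row.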