Package: dplyr


Function: summarise()


1. Calculate percent adherence for item1_t1

Review the data (obs_data)

# A tibble: 6 x 4
  tch_id item1_t1 item1_t2 item1_t3
   <dbl>    <dbl>    <dbl>    <dbl>
1    100        1        0        1
2    101        1        1        1
3    102        0        1        1
4    104        1        1        1
5    107        1        1        0
6    109        1        1        0
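To follow along, the sketch below recreates obs_data from the printed values with tibble::tibble(). The tutorial does not show how obs_data was originally built, so this construction is an assumption; only the values are copied from the table.

```r
library(tibble)

# Recreate obs_data from the printed table above
# (assumption: the original construction of obs_data is not shown)
obs_data <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0)
)

obs_data
```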

Here we want to know the percentage of teachers who implemented item1 at time 1. In this variable:

1 = implemented
0 = did not implement
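Because the item is coded 0/1, summing the column counts the teachers who implemented it, and dividing that count by the number of teachers gives a proportion, which for 0/1 data is simply the mean. A minimal base R sketch using the item1_t1 values from the table above:

```r
# item1_t1 for the six teachers, copied from the table above
item1_t1 <- c(1, 1, 0, 1, 1, 1)

# Count of 1s divided by the number of teachers
prop_sum <- sum(item1_t1) / length(item1_t1)

# With 0/1 coding, the mean is the same proportion
prop_mean <- mean(item1_t1)

prop_sum          # 0.8333333
prop_mean * 100   # 83.33333 (expressed as a percentage)
```

The summarise() calls in this tutorial return proportions between 0 and 1; multiply by 100 if you want to report a percentage.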

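  • Note: By default, base::sum() returns NA if any value is missing. To sum only the non-missing values, add the argument na.rm = TRUE. Keep in mind that n() still counts rows with missing values, so with this formula a missing observation is effectively treated as 0 (did not implement).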
library(dplyr)
obs_data %>%
  summarise(item1_t1_per = sum(item1_t1, na.rm = TRUE)/n())
# A tibble: 1 x 1
  item1_t1_per
         <dbl>
1        0.833
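To see how the denominator choice matters when data are missing, here is a sketch on a hypothetical copy of item1_t1 with one value set to NA (the NA is invented for illustration; obs_data itself has no missing values). sum(..., na.rm = TRUE)/n() keeps the missing row in the denominator, whereas mean(..., na.rm = TRUE) drops it from both numerator and denominator.

```r
library(tibble)
library(dplyr)

# Hypothetical data: item1_t1 with one teacher's score set to NA
# (invented for illustration only)
demo <- tibble(item1_t1 = c(1, 1, 0, 1, NA, 1))

res <- demo %>%
  summarise(
    # NA row stays in the denominator: 4 / 6
    per_sum_n = sum(item1_t1, na.rm = TRUE) / n(),
    # NA row is dropped from numerator and denominator: 4 / 5
    per_mean  = mean(item1_t1, na.rm = TRUE)
  )

res
```

Which one is appropriate depends on whether a missing observation should count as "did not implement" or be excluded.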

2. Calculate percent adherence for item1 across all time periods

Review the data (obs_data)

# A tibble: 6 x 4
  tch_id item1_t1 item1_t2 item1_t3
   <dbl>    <dbl>    <dbl>    <dbl>
1    100        1        0        1
2    101        1        1        1
3    102        0        1        1
4    104        1        1        1
5    107        1        1        0
6    109        1        1        0

Because our data are in wide format (one item1 column per time point), this is not a simple single-column calculation. We first need to restructure the data into long format so that all item1 values are in the same column.

First restructure using tidyr::pivot_longer()

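library(tidyr)
obs_data_long <- obs_data %>%
  pivot_longer(
    cols = starts_with("item"),       # gather every item column
    names_to = c(".value", "time"),   # "item1" stays a column; "t1"/"t2"/"t3" go into time
    names_sep = "_")                  # split names such as "item1_t1" at the underscore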

head(obs_data_long)
# A tibble: 6 x 3
  tch_id time  item1
   <dbl> <chr> <dbl>
1    100 t1        1
2    100 t2        0
3    100 t3        1
4    101 t1        1
5    101 t2        1
6    101 t3        1
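As a check on the reshaping, the sketch below (with obs_data recreated from the printed values) confirms the long table has one row per teacher per time point. It also shows pivot_longer()'s names_pattern argument, which describes the name parts with regex capture groups instead of a separator; this is an alternative, not the tutorial's method, and can help when column names do not split cleanly on a single "_".

```r
library(tibble)
library(tidyr)

# obs_data recreated from the printed values in this section
obs_data <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0)
)

# As in the text: split names such as "item1_t1" at "_"
long_sep <- pivot_longer(
  obs_data,
  cols = starts_with("item"),
  names_to = c(".value", "time"),
  names_sep = "_"
)

# Alternative: capture the two name parts with a regular expression
long_pat <- pivot_longer(
  obs_data,
  cols = starts_with("item"),
  names_to = c(".value", "time"),
  names_pattern = "(item\\d+)_(t\\d+)"
)

nrow(long_sep)                 # 18 rows: 6 teachers x 3 time points
all.equal(long_sep, long_pat)  # TRUE
```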

Then calculate percent adherence

obs_data_long %>%
  summarise(item1_per = sum(item1, na.rm = TRUE)/n())
# A tibble: 1 x 1
  item1_per
      <dbl>
1     0.778
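Long format also makes it easy to break the result down by time point. A sketch, assuming the same obs_data values, that adds group_by(time) before the same summarise() call:

```r
library(tibble)
library(dplyr)
library(tidyr)

# obs_data recreated from the printed values in this section
obs_data <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0)
)

obs_data_long <- obs_data %>%
  pivot_longer(
    cols = starts_with("item"),
    names_to = c(".value", "time"),
    names_sep = "_"
  )

# One summary row per time point instead of one overall value
per_time <- obs_data_long %>%
  group_by(time) %>%
  summarise(item1_per = sum(item1, na.rm = TRUE) / n())

per_time
```

With these values the proportions are 0.833, 0.833 and 0.667 for t1, t2 and t3. Their average equals the overall 0.778 only because every time point has the same number of teachers.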

3. Calculate percent adherence for multiple items across all time periods

Review the data (obs_data2)

# A tibble: 6 x 7
  tch_id item1_t1 item1_t2 item1_t3 item2_t1 item2_t2 item2_t3
   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1    100        1        0        1        0        0        1
2    101        1        1        1        1        0        1
3    102        0        1        1        1        1        1
4    104        1        1        1        0        0        1
5    107        1        1        0        1        1        0
6    109        1        1        0        1        1        1
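As before, a sketch that recreates obs_data2 from the printed values so the steps below can be run; how obs_data2 was originally built is not shown, so this construction is an assumption.

```r
library(tibble)

# Recreate obs_data2 from the printed table above
# (assumption: the original construction of obs_data2 is not shown)
obs_data2 <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0),
  item2_t1 = c(0, 1, 1, 0, 1, 1),
  item2_t2 = c(0, 0, 1, 0, 1, 1),
  item2_t3 = c(1, 1, 1, 1, 0, 1)
)

obs_data2
```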

As above, we first restructure the data using tidyr::pivot_longer()

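obs_data2_long <- obs_data2 %>%
  pivot_longer(
    cols = starts_with("item"),       # gather every item1_* and item2_* column
    names_to = c(".value", "time"),   # item1 and item2 become columns; t1/t2/t3 go into time
    names_sep = "_")                  # split names such as "item2_t3" at the underscore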

head(obs_data2_long)
# A tibble: 6 x 4
  tch_id time  item1 item2
   <dbl> <chr> <dbl> <dbl>
1    100 t1        1     0
2    100 t2        0     0
3    100 t3        1     1
4    101 t1        1     1
5    101 t2        1     0
6    101 t3        1     1

Then calculate percent adherence for item1 and item2

  • Note: The .names argument of dplyr::across() controls the names of the new columns. It takes a glue specification, so "{.col}_per" appends _per to each original column name.
obs_data2_long %>%
  summarise(across(
    contains("item"),
    ~ sum(., na.rm = TRUE)/n(),
    .names = "{.col}_per"
  ))
# A tibble: 1 x 2
  item1_per item2_per
      <dbl>     <dbl>
1     0.778     0.667
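across() also accepts a named list of functions, in which case "{.fn}" in .names is replaced by each function's name. The sketch below uses that form with mean(.x, na.rm = TRUE); with these data it matches sum(.)/n() because there are no missing values (see the NA caveat in the first note).

```r
library(tibble)
library(dplyr)
library(tidyr)

# obs_data2 recreated from the printed values in this section
obs_data2 <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0),
  item2_t1 = c(0, 1, 1, 0, 1, 1),
  item2_t2 = c(0, 0, 1, 0, 1, 1),
  item2_t3 = c(1, 1, 1, 1, 0, 1)
)

obs_data2_long <- obs_data2 %>%
  pivot_longer(
    cols = starts_with("item"),
    names_to = c(".value", "time"),
    names_sep = "_"
  )

# Named list of functions: "{.fn}" becomes the list name "per"
res <- obs_data2_long %>%
  summarise(across(
    contains("item"),
    list(per = ~ mean(.x, na.rm = TRUE)),
    .names = "{.col}_{.fn}"
  ))

res
```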
