summarise()
1. Calculate percent adherence for item1_t1
Review the data (obs_data)
# A tibble: 6 x 4
tch_id item1_t1 item1_t2 item1_t3
<dbl> <dbl> <dbl> <dbl>
1 100 1 0 1
2 101 1 1 1
3 102 0 1 1
4 104 1 1 1
5 107 1 1 0
6 109 1 1 0
Here we want to know the percentage of teachers who implemented
item1 at time 1. This variable is coded as:
1 = implemented
0 = did not implement
By default, base::sum() does not calculate a sum if any NA value exists; it returns NA instead.
If you still want to calculate a sum despite missing values, you can add the
argument na.rm = TRUE.
obs_data %>%
summarise(item1_t1_per = sum(item1_t1, na.rm = TRUE)/n())
# A tibble: 1 x 1
item1_t1_per
<dbl>
1 0.833
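One consequence of the na.rm = TRUE note is worth checking on your own data: sum(..., na.rm = TRUE) drops missing values from the numerator, but n() still counts every row in the denominator. The sketch below uses a small hypothetical tibble (obs_na is not part of the original data) to contrast that with mean(..., na.rm = TRUE), which divides by the number of non-missing values only. Which denominator is appropriate depends on how missing observations should be treated in your study.

```r
library(dplyr)

# Hypothetical data (assumption): same layout as obs_data, with one missing observation
obs_na <- tibble(
  tch_id   = c(100, 101, 102, 104),
  item1_t1 = c(1, NA, 0, 1)
)

res <- obs_na %>%
  summarise(
    # NA dropped from the sum, but n() still counts all 4 rows
    per_all_rows = sum(item1_t1, na.rm = TRUE) / n(),
    # NA dropped from both numerator and denominator (3 observed rows)
    per_observed = mean(item1_t1, na.rm = TRUE)
  )

res
```

With no missing values, as in obs_data, the two calculations give the same result.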
2. Calculate percent adherence for item1 across all time periods
Review the data (obs_data)
# A tibble: 6 x 4
tch_id item1_t1 item1_t2 item1_t3
<dbl> <dbl> <dbl> <dbl>
1 100 1 0 1
2 101 1 1 1
3 102 0 1 1
4 104 1 1 1
5 107 1 1 0
6 109 1 1 0
Since our data is in wide format, this is not a simple column calculation. We first need to restructure our wide data into long format so that all item1 values are in the same column.
First restructure using tidyr::pivot_longer()
obs_data_long <- obs_data %>%
  pivot_longer(
    cols = starts_with("item"),
    names_sep = "_",
    names_to = c(".value", "time"))
head(obs_data_long)
# A tibble: 6 x 3
tch_id time item1
<dbl> <chr> <dbl>
1 100 t1 1
2 100 t2 0
3 100 t3 1
4 101 t1 1
5 101 t2 1
6 101 t3 1
Then calculate percent adherence
obs_data_long %>%
summarise(item1_per = sum(item1, na.rm = TRUE)/n())
# A tibble: 1 x 1
item1_per
<dbl>
1 0.778
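Once the data is in long format, the same calculation can also be broken out by time period. The following is a minimal sketch, not part of the original walkthrough: it rebuilds obs_data from the printed values above and adds dplyr::group_by() before summarise(). The name by_time is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# obs_data rebuilt from the values printed above
obs_data <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0)
)

by_time <- obs_data %>%
  pivot_longer(
    cols = starts_with("item"),
    names_sep = "_",
    names_to = c(".value", "time")) %>%
  group_by(time) %>%                                        # one group per time period
  summarise(item1_per = sum(item1, na.rm = TRUE) / n())     # n() is now rows per group

by_time
```

The t1 row of this result should match the 0.833 from the first example, since both divide the same five implementations by six teachers.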
3. Calculate percent adherence for multiple items across all time periods
Review the data (obs_data2)
# A tibble: 6 x 7
tch_id item1_t1 item1_t2 item1_t3 item2_t1 item2_t2 item2_t3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 0 1 0 0 1
2 101 1 1 1 1 0 1
3 102 0 1 1 1 1 1
4 104 1 1 1 0 0 1
5 107 1 1 0 1 1 0
6 109 1 1 0 1 1 1
Similar to above, we first restructure using tidyr::pivot_longer()
obs_data2_long <- obs_data2 %>%
  pivot_longer(
    cols = starts_with("item"),
    names_sep = "_",
    names_to = c(".value", "time"))
head(obs_data2_long)
# A tibble: 6 x 4
tch_id time item1 item2
<dbl> <chr> <dbl> <dbl>
1 100 t1 1 0
2 100 t2 0 0
3 100 t3 1 1
4 101 t1 1 1
5 101 t2 1 0
6 101 t3 1 1
Then calculate percent adherence for item1 and item2
Here we use the dplyr::across() argument .names to rename our new columns.
obs_data2_long %>%
summarise(across(
contains("item"),
~ sum(., na.rm = TRUE)/n(),
.names = "{.col}_per"
))
# A tibble: 1 x 2
item1_per item2_per
<dbl> <dbl>
1 0.778 0.667
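An alternative layout, sketched here as one possible approach rather than the method used above, is to pivot both the item and the time into their own columns and then group by item. This returns one row per item instead of one column per item, which can be easier to extend when there are many items. The column names item, implemented and per, and the object by_item, are assumptions made for this illustration.

```r
library(dplyr)
library(tidyr)

# obs_data2 rebuilt from the values printed above
obs_data2 <- tibble(
  tch_id   = c(100, 101, 102, 104, 107, 109),
  item1_t1 = c(1, 1, 0, 1, 1, 1),
  item1_t2 = c(0, 1, 1, 1, 1, 1),
  item1_t3 = c(1, 1, 1, 1, 0, 0),
  item2_t1 = c(0, 1, 1, 0, 1, 1),
  item2_t2 = c(0, 0, 1, 0, 1, 1),
  item2_t3 = c(1, 1, 1, 1, 0, 1)
)

by_item <- obs_data2 %>%
  pivot_longer(
    cols = starts_with("item"),
    names_sep = "_",
    names_to = c("item", "time"),      # split "item1_t1" into item = "item1", time = "t1"
    values_to = "implemented") %>%
  group_by(item) %>%
  summarise(per = sum(implemented, na.rm = TRUE) / n())

by_item
```

The values should agree with the across() result: 14 of 18 observations for item1 and 12 of 18 for item2.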