count()
1. Count the number of students per teacher
Review the data (d1)
# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     50
4 b          17     22
5 c          18     25
6 c          18     35
In this case our dataset is at the student level, so both teacher and school are grouping variables.
Count the number of students for each value of the grouping variable tch_id.
d1 %>%
  dplyr::count(tch_id)
# A tibble: 4 x 2
  tch_id     n
   <dbl> <int>
1     12     2
2     13     1
3     17     1
4     18     2
You’ll notice above that dplyr::count() does the grouping for you. If you want to see the steps more explicitly, you could first group by tch_id using dplyr::group_by() and then use dplyr::summarize() to create a new variable, n_stud, with the dplyr::n() function counting the number of rows per group.
d1 %>%
  dplyr::group_by(tch_id) %>%
  dplyr::summarize(n_stud = dplyr::n())
# A tibble: 4 x 2
  tch_id n_stud
   <dbl>  <int>
1     12      2
2     13      1
3     17      1
4     18      2
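As a quick check that the two approaches give the same answer, the sketch below rebuilds d1 by hand (values copied from the printout above) and passes name = "n_stud" to dplyr::count() so that its count column has the same name as the one created with dplyr::summarize(). It assumes the dplyr and tibble packages are installed; the object names via_count and via_summarize are only illustrative.

```r
library(dplyr)

# Rebuild d1 from the printed tibble above
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

# count() with a custom name for the count column
via_count <- d1 %>%
  dplyr::count(tch_id, name = "n_stud")

# The explicit group_by() + summarize() version
via_summarize <- d1 %>%
  dplyr::group_by(tch_id) %>%
  dplyr::summarize(n_stud = dplyr::n())

# Both should have one row per teacher with matching counts
print(via_count)
print(via_summarize)
```

In other words, count(x) behaves like group_by(x) followed by summarize(n = n()), so the name argument is a convenient way to control the column name without writing out the longer pipeline.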
2. Count the number of distinct teachers per school
Review the data (d1)
# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     50
4 b          17     22
5 c          18     25
6 c          18     35
Since this is a student-level dataset, not a teacher-level dataset, teachers appear more than once in our data frame (once for every student they are associated with). In this case we don’t want to count a teacher more than once, so we need to count only the distinct teacher ids per school. Otherwise it will appear that we have more teachers per school than we really do.
We can do this using dplyr::distinct() to first remove
duplicate teachers and then do our count.
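Setting .keep_all = TRUE keeps all of the other columns (including school), taken from the first row for each teacher, rather than returning only the tch_id variable.
d1 %>%
  dplyr::distinct(tch_id, .keep_all = TRUE) %>%
  dplyr::count(school)
# A tibble: 3 x 2
  school     n
  <chr>  <int>
1 a          1
2 b          2
3 c          1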
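To make the point about duplicate teachers concrete, here is a minimal sketch comparing a plain dplyr::count(school) with the distinct-then-count version. It assumes dplyr and tibble are installed and rebuilds d1 from the printout; the names naive and deduped are only illustrative. Without distinct(), every student row is counted, so schools a and c each appear to have two teachers.

```r
library(dplyr)

# Rebuild d1 from the printed tibble above
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

# Counts rows (students) per school, not teachers
naive <- d1 %>%
  dplyr::count(school)

# Removes repeated teachers first, then counts per school
deduped <- d1 %>%
  dplyr::distinct(tch_id, .keep_all = TRUE) %>%
  dplyr::count(school)

print(naive)    # a = 2, b = 2, c = 2
print(deduped)  # a = 1, b = 2, c = 1
```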
Or, as above, if you want to group by school more explicitly, you can first group using dplyr::group_by() and then use dplyr::summarize() to create a new variable, n_tch, with dplyr::n_distinct() counting the number of distinct teachers per group.
d1 %>%
  dplyr::group_by(school) %>%
  dplyr::summarize(n_tch = dplyr::n_distinct(tch_id))
# A tibble: 3 x 2
  school n_tch
  <chr>  <int>
1 a          1
2 b          2
3 c          1
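Both approaches give the same answer for d1, but they make different assumptions. The sketch below adds a hypothetical extra row (teacher 12 also teaching a student at school c, which is not in the original data) to show that distinct(tch_id, .keep_all = TRUE) keeps only the first school for each teacher, whereas counting distinct school and teacher pairs, or using n_distinct() within each school, counts teacher 12 at both schools. It assumes dplyr and tibble are installed; d1_multi and the other object names are illustrative only, and whether a teacher can belong to more than one school depends on your data.

```r
library(dplyr)

# d1 plus one HYPOTHETICAL row: teacher 12 also linked to school c
d1_multi <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18, 12),
  stu_id = c(30, 20, 50, 22, 25, 35, 40)
)

# Keeps the first row per teacher, so teacher 12 is counted only at school a
by_teacher <- d1_multi %>%
  dplyr::distinct(tch_id, .keep_all = TRUE) %>%
  dplyr::count(school)

# Keeps each distinct school/teacher pair before counting
by_pair <- d1_multi %>%
  dplyr::distinct(school, tch_id) %>%
  dplyr::count(school)

# Counts distinct teachers within each school
by_group <- d1_multi %>%
  dplyr::group_by(school) %>%
  dplyr::summarize(n_tch = dplyr::n_distinct(tch_id))

print(by_teacher)  # school c shows 1 teacher
print(by_pair)     # school c shows 2 teachers
print(by_group)    # school c shows 2 teachers
```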
3. Count the number of students in each of our demographic grouping variables
Review the data (d5)
# A tibble: 7 x 4
  stu_id   frl gender  race
   <dbl> <dbl>  <dbl> <dbl>
1     30     1      1     1
2     30     2      1     2
3     31     2      3     1
4     32     3      2     4
5     33     1      1     3
6     33     3      2     5
7     24     3      4     6
Here we can use purrr::map() to map our count function
across all of our selected variables, returning a list.
d5 %>%
  dplyr::select(frl, gender, race) %>%
  purrr::map(~ dplyr::count(d5, {{ .x }}))
$frl
# A tibble: 3 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     2
2       2     2
3       3     3

$gender
# A tibble: 4 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     3
2       2     2
3       3     1
4       4     1

$race
# A tibble: 6 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     2
2       2     1
3       3     1
4       4     1
5       5     1
6       6     1
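In the output above, the value column of each tibble is labelled `<dbl>` rather than frl, gender or race, apparently because {{ .x }} ends up passing the column’s values rather than its name to count(). One way to keep the variable names, sketched below under the assumption that dplyr, purrr and tibble are installed, is to map over the column names as strings and look each column up with the .data pronoun. purrr::set_names() with no second argument names the list elements after the strings themselves; demo_counts is just an illustrative name.

```r
library(dplyr)

# Rebuild d5 from the printed tibble above
d5 <- tibble::tibble(
  stu_id = c(30, 30, 31, 32, 33, 33, 24),
  frl    = c(1, 2, 2, 3, 1, 3, 3),
  gender = c(1, 1, 3, 2, 1, 2, 4),
  race   = c(1, 2, 1, 4, 3, 5, 6)
)

demo_counts <- c("frl", "gender", "race") %>%
  purrr::set_names() %>%                       # list names: frl, gender, race
  purrr::map(~ dplyr::count(d5, .data[[.x]]))  # look up each column by its name

# Each element now has a properly named value column plus n
demo_counts$gender
```

Because each .x here is a string, the same pattern works for any character vector of column names, for example one built with names(d5).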