Count values by groups

Package: dplyr

Function: `count()`

1. Count the number of students per teacher

Review the data (d1)

# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     50
4 b          17     22
5 c          18     25
6 c          18     35
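
To follow along, the printed data can be recreated by hand. This is a minimal sketch that assumes d1 is an ordinary tibble holding exactly the values printed above; how the original d1 was created is not shown here.

```r
library(dplyr)

# Rebuild the example data from the printed tibble (values copied from above)
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

d1
```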

In this case our dataset is student level (one row per student), so both teacher (tch_id) and school are grouping variables.

Count the number of students per teacher, using tch_id as the grouping variable.

d1 %>%
  dplyr::count(tch_id)

# A tibble: 4 x 2
  tch_id     n
   <dbl> <int>
1     12     2
2     13     1
3     17     1
4     18     2

As shown above, dplyr::count() does the grouping for you. To see the steps more explicitly, first group by tch_id using dplyr::group_by(), then use dplyr::summarize() to create a new variable n_stud, with dplyr::n() counting the number of rows per group.

d1 %>%
  dplyr::group_by(tch_id) %>%
  dplyr::summarize(n_stud = dplyr::n())

# A tibble: 4 x 2
  tch_id n_stud
   <dbl>  <int>
1     12      2
2     13      1
3     17      1
4     18      2
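
The two approaches differ only in the name of the count column. As a small sketch, count() can name that column itself through its name argument, which should give the same n_stud column as the group_by() and summarize() version. The d1 below is rebuilt from the printed values, and the object names by_count and by_summarize are only for illustration.

```r
library(dplyr)

# Example data, copied from the printed tibble above
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

# count() with a custom name for the count column
by_count <- d1 %>%
  dplyr::count(tch_id, name = "n_stud")

# The explicit group_by() + summarize() version
by_summarize <- d1 %>%
  dplyr::group_by(tch_id) %>%
  dplyr::summarize(n_stud = dplyr::n())

# Compare the count columns of the two results
identical(by_count$n_stud, by_summarize$n_stud)
```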

2. Count the number of distinct teachers per school

Review the data (d1)

# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     50
4 b          17     22
5 c          18     25
6 c          18     35

Because this is a student-level dataset, not a teacher-level dataset, teachers appear more than once in our data frame (once for every student they are associated with). We don’t want to count a teacher more than once, so we need to count only distinct teacher ids per school. Otherwise it will appear that we have more teachers per school than we really do.

We can do this using dplyr::distinct() to first remove duplicate teachers and then do our count.

Note: You will need to add the argument .keep_all = TRUE in order to keep all variables after using the distinct function (otherwise it will only keep the tch_id variable).

d1 %>%
  dplyr::distinct(tch_id, .keep_all = TRUE) %>%
  dplyr::count(school)

# A tibble: 3 x 2
  school     n
  <chr>  <int>
1 a          1
2 b          2
3 c          1
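
A possible alternative that avoids .keep_all is to pass both school and tch_id to dplyr::distinct(), which keeps one row per school and teacher pair. This is a sketch using d1 rebuilt from the printed values; the name tch_per_school is only for illustration. It matches the result above as long as each teacher id belongs to a single school. If a teacher id appeared in more than one school, this version would count that teacher once in each school.

```r
library(dplyr)

# Example data, copied from the printed tibble above
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

# Keep one row per (school, tch_id) pair, then count rows per school
tch_per_school <- d1 %>%
  dplyr::distinct(school, tch_id) %>%
  dplyr::count(school)

tch_per_school
```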

Or, like above, if you want to group by school more explicitly, you can first group using dplyr::group_by(), then create a new variable n_tch using dplyr::summarize(), with dplyr::n_distinct() counting the number of distinct teachers per group.

d1 %>%
  dplyr::group_by(school) %>%
  dplyr::summarize(n_tch = dplyr::n_distinct(tch_id))

# A tibble: 3 x 2
  school n_tch
  <chr>  <int>
1 a          1
2 b          2
3 c          1

3. Count the number of students in each of our demographic grouping variables

Review the data (d5)

# A tibble: 7 x 4
  stu_id   frl gender  race
   <dbl> <dbl>  <dbl> <dbl>
1     30     1      1     1
2     30     2      1     2
3     31     2      3     1
4     32     3      2     4
5     33     1      1     3
6     33     3      2     5
7     24     3      4     6

Here we can use purrr::map() to map our count function across all of our selected variables, returning a list.

d5 %>%
  dplyr::select(frl, gender, race) %>%
  purrr::map(~ dplyr::count(d5, {{ .x }}))

$frl
# A tibble: 3 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     2
2       2     2
3       3     3

$gender
# A tibble: 4 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     3
2       2     2
3       3     1
4       4     1

$race
# A tibble: 6 x 2
  `<dbl>`     n
    <dbl> <int>
1       1     2
2       2     1
3       3     1
4       4     1
5       5     1
6       6     1
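
In the output above, the first column of each table is labeled `<dbl>` rather than with the variable name, because map() hands each column's values (not its name) to count(). One way to work around this, sketched below, is to map over the variable names instead and look each one up with the .data pronoun. The d5 here is rebuilt from the printed values, and demo_vars and counts are names chosen only for illustration. In recent dplyr versions the first column should then carry the variable name (frl, gender, race), though that labeling is worth checking on your installed version.

```r
library(dplyr)
library(purrr)

# Example data, copied from the printed tibble above
d5 <- tibble::tibble(
  stu_id = c(30, 30, 31, 32, 33, 33, 24),
  frl    = c(1, 2, 2, 3, 1, 3, 3),
  gender = c(1, 1, 3, 2, 1, 2, 4),
  race   = c(1, 2, 1, 4, 3, 5, 6)
)

# Variable names to count (as strings, not column values)
demo_vars <- c("frl", "gender", "race")

# set_names() names the vector by itself, so the result is a named list;
# .data[[.x]] looks up the column whose name is stored in .x
counts <- demo_vars %>%
  purrr::set_names() %>%
  purrr::map(~ dplyr::count(d5, .data[[.x]]))

counts
```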
