Package: dplyr


Function: count()


1. Count how many students we have in our data frame

Review the data (d1)

# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     50
4 b          17     22
5 c          18     25
6 c          18     35
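The page does not show where d1 comes from. To follow along, one way is to rebuild it by hand from the printout above. This is a minimal sketch, assuming the tibble package is installed. The values are copied from the printed tibble, and `c(12, ...)` produces doubles to match the `<dbl>` column types.

```r
# Rebuild d1 from the printout above (values copied by hand)
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

# Print it to compare with the tibble shown above
d1
```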

Count our number of students.

Since each row in our data should be a unique student, we can simply count the number of rows in the data frame.

d1 %>%
  dplyr::count()
# A tibble: 1 x 1
      n
  <int>
1     6

We could also use base::nrow() in this case.

  • Note: the output type differs, though. dplyr::count() above returned a tibble, whereas base::nrow() returns an integer vector of length one.
d1 %>%
  base::nrow()
[1] 6
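The difference in output type can be checked directly. The sketch below rebuilds d1 by hand (an assumption, since the original source of the data isn't shown) and inspects both results. It also uses dplyr::pull() as one way to extract the count from the tibble when a plain number is needed.

```r
library(dplyr)  # attaches %>% along with count() and pull()

# Rebuild d1 from the printout (values copied by hand)
d1 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 50, 22, 25, 35)
)

n_tbl <- d1 %>% dplyr::count()  # a 1 x 1 tibble with a column named n
n_int <- d1 %>% base::nrow()    # a length-one integer vector

inherits(n_tbl, "tbl_df")  # TRUE: count() returns a tibble
is.integer(n_int)          # TRUE: nrow() returns a bare integer

# Extract the n column as a plain integer when a single value is needed
dplyr::pull(n_tbl, n)
```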

2. Count the number of students we have in our data frame when there are duplicate students

Review the data (d3)

# A tibble: 6 x 3
  school tch_id stu_id
  <chr>   <dbl>  <dbl>
1 a          12     30
2 b          13     20
3 a          12     30
4 b          17     22
5 c          18     25
6 c          18     35

Notice that ID 30 now appears twice. We may have accidentally collected data on this student twice, but we don't want to count them twice in our sample.
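To see the problem concretely, the sketch below rebuilds d3 by hand from the printout (an assumption, as the original data source isn't shown). A plain count() returns 6 because it counts rows, not students. Passing stu_id to count() and filtering to `n > 1` is one way to find which IDs are duplicated. This check is an addition for illustration and is not part of the original walkthrough.

```r
library(dplyr)  # attaches %>% along with count() and filter()

# Rebuild d3 from the printout (values copied by hand)
d3 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 30, 22, 25, 35)
)

# A naive row count includes the duplicate record for student 30
naive <- d3 %>% dplyr::count()

# Count rows per stu_id, then keep only IDs that appear more than once
dups <- d3 %>%
  dplyr::count(stu_id) %>%
  dplyr::filter(n > 1)

naive
dups
```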

In this case, we can use dplyr::distinct() to keep only the distinct IDs before we count.

d3 %>%
  dplyr::distinct(stu_id) %>%
  dplyr::count()
# A tibble: 1 x 1
      n
  <int>
1     5
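If a plain number is enough, dplyr::n_distinct() counts unique values directly without building an intermediate tibble. Combining distinct() with base::nrow() gives the same result. This is a sketch for comparison, again rebuilding d3 by hand from the printout.

```r
library(dplyr)  # attaches %>% along with distinct() and n_distinct()

# Rebuild d3 from the printout (values copied by hand)
d3 <- tibble::tibble(
  school = c("a", "b", "a", "b", "c", "c"),
  tch_id = c(12, 13, 12, 17, 18, 18),
  stu_id = c(30, 20, 30, 22, 25, 35)
)

# Count unique student IDs directly as a single number
n_students <- dplyr::n_distinct(d3$stu_id)

# Equivalent: keep distinct IDs, then count the remaining rows
n_rows <- d3 %>%
  dplyr::distinct(stu_id) %>%
  base::nrow()

n_students
n_rows
```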
