Package: dplyr


Function: filter()


1. Filter to only (grade = 1 for sch = “a”) and only (grade = 9 for sch = “b”)

Review the data (d11)

# A tibble: 7 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 a         21     3    12   250
5 b         22     9    13   217
6 b         31    10    NA    NA
7 b         32    10    14   251

Conditionally filter

  • Note: Note that we put each condition within () and add the and/or operator between the two conditions
d11 %>%
  filter((sch == "a" & grade == 1) | (sch == "b" & grade == 9))
# A tibble: 3 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 b         22     9    13   217

You can use this same conditional filtering for more than just getting exact matches. Here is an example where we use the same logic but get all grades >= 9 for school b.

d11 %>%
  filter((sch == "a" & grade == 1) | (sch == "b" & grade >= 9))
# A tibble: 5 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 b         22     9    13   217
4 b         31    10    NA    NA
5 b         32    10    14   251

2. Filter to (grade between 1 & 2 for sch = “a”) and (grade = 9 for sch = “b”)

Review the data (d11)

# A tibble: 7 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 a         21     3    12   250
5 b         22     9    13   217
6 b         31    10    NA    NA
7 b         32    10    14   251

Conditionally filter

d11 %>%
  filter((sch == "a" & between(grade, 1, 2)) | (sch == "b" & grade==9))
# A tibble: 4 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 b         22     9    13   217

I could have also used and/or rather than between in this situation.

d11 %>%
  filter((sch == "a" & (grade == 1 | grade ==2)) | (sch == "b" & grade == 9))
# A tibble: 4 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 b         22     9    13   217

Or, I also could have used `%in%

d11 %>%
  filter((sch == "a" & (grade %in% 1:2)) | (sch == "b" & grade == 9))
# A tibble: 4 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 b         22     9    13   217

3. Filter to only (grade = 1 for sch = “a”) and (grade = 9 for sch = “b”), but also keep all other schools, no matter the grade level

Review the data (d15)

Notice this dataset also has a school c

# A tibble: 9 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 a         11     2    15   220
4 a         21     3    12   250
5 b         22     9    13   217
6 b         31    10    NA    NA
7 b         32    10    14   251
8 c         40     2    10   200
9 c         41     3    14   220

Conditionally filter

d15 %>%
  filter((sch == "a" & grade == 1) | (sch == "b" & grade == 9) | (sch == "c"))
# A tibble: 5 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 b         22     9    13   217
4 c         40     2    10   200
5 c         41     3    14   220

If I did not add a condition to retain school c, only a and b would remain

d15 %>%
  filter((sch == "a" & grade == 1) | (sch == "b" & grade == 9))
# A tibble: 3 x 5
  sch   stu_id grade  var2  var3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 a         10     1    20   205
2 a         15     1    14   201
3 b         22     9    13   217

Return to Filter