Package: dplyr


Function: filter()

Examples using one criterion for one variable (character)


1. Keep any row where tch_name exactly matches “harris”.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Keep any row that has “harris” for tch_name.

  • Note: Use the logical operator == to denote exactly equal to.
d8 %>% 
  dplyr::filter(tch_name == "harris")
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 c        -999   -999     12        250 harris  

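You could get the same result here with the stringr::str_detect() function. Note that str_detect() matches the pattern anywhere in the string, so it agrees with == only because no other name in d8 contains 'har'.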

d8 %>% 
  dplyr::filter(stringr::str_detect(tch_name, 'har') )
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 c        -999   -999     12        250 harris  
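
To see where == and str_detect() part ways, here is a minimal sketch on made-up data. The teachers tibble and the name "harrison" are hypothetical and not part of d8; they are only there to show a partial match.

```r
library(dplyr)
library(stringr)

# Hypothetical data: "harrison" is an invented name added for illustration
teachers <- tibble::tibble(
  stu_id   = c(10, 11, 12, 13),
  tch_name = c("harris", "steve", "harrison", "lewis")
)

# Partial match: keeps both "harris" and "harrison"
partial <- teachers %>%
  dplyr::filter(stringr::str_detect(tch_name, "har"))

# Exact match with ==: keeps only "harris"
exact <- teachers %>%
  dplyr::filter(tch_name == "harris")

# Anchored regular expression: ^ and $ pin the pattern to the whole string,
# so this also keeps only "harris"
anchored <- teachers %>%
  dplyr::filter(stringr::str_detect(tch_name, "^harris$"))
```

If you need an exact match, == is the simpler choice; str_detect() is more useful when you want to match part of a value.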

2. Remove any row where tch_name exactly matches “harris”.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Filter out any row that has “harris” for tch_name.

  • Note: Use the logical operator != to denote not equal to.
d8 %>% 
  dplyr::filter(tch_name != "harris")
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 d           4      0     13        217 lewis   

Again, you could get the same result here by negating the stringr::str_detect() function with the ! operator.

d8 %>% 
  dplyr::filter(!stringr::str_detect(tch_name, 'har'))
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 d           4      0     13        217 lewis   

You could also drop the ! and set the stringr::str_detect() argument negate = TRUE, which returns TRUE for the non-matching elements.

d8 %>% 
  dplyr::filter(stringr::str_detect(tch_name, 'har', negate=TRUE))
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 d           4      0     13        217 lewis   
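
A quick way to confirm that the three approaches agree on this data is to run them side by side. This is only a sketch: d8 is rebuilt by hand from the printed table, and the object names not_equal, bang, and negated are made up for illustration.

```r
library(dplyr)
library(stringr)

# Rebuild d8 from the printed table above
d8 <- tibble::tibble(
  extra1     = c("a", "b", "c", "d"),
  extra2     = c(1, -999, -999, 4),
  extra3     = c(2, 0, -999, 0),
  stu_id     = c(10, 11, 12, 13),
  test_score = c(205, 220, 250, 217),
  tch_name   = c("harris", "steve", "harris", "lewis")
)

# Three ways to drop the "harris" rows
not_equal <- d8 %>%
  dplyr::filter(tch_name != "harris")
bang <- d8 %>%
  dplyr::filter(!stringr::str_detect(tch_name, "har"))
negated <- d8 %>%
  dplyr::filter(stringr::str_detect(tch_name, "har", negate = TRUE))

# Compare which students remain under each approach
identical(not_equal$stu_id, bang$stu_id)
identical(bang$stu_id, negated$stu_id)
```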

Function: filter()

Examples using one criterion for one variable (date)


3. Keep any row where test_date occurred in the year 2021.

Review the data (d12)

# A tibble: 6 x 5
  sch   stu_id test_date     q1    q2
  <chr>  <dbl> <chr>      <dbl> <dbl>
1 a         10 2019-04-11    20   205
2 a         10 2020-03-21    14   201
3 a         10 2021-03-28    15   220
4 b         22 2019-04-12    13   217
5 b         22 <NA>          NA    NA
6 b         22 2021-03-29    14   251

First, let's check what type of variable test_date is.

We find that it is a character. To filter it as a date, we need to convert it to a date, either beforehand or within the filter.

class(d12$test_date)
[1] "character"
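
If you would rather convert the variable beforehand, a minimal sketch could look like this. It rebuilds d12 by hand from the printed table; the object names d12_dates and d12_2021 are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Rebuild d12 from the printed table above
d12 <- tibble::tibble(
  sch       = c("a", "a", "a", "b", "b", "b"),
  stu_id    = c(10, 10, 10, 22, 22, 22),
  test_date = c("2019-04-11", "2020-03-21", "2021-03-28",
                "2019-04-12", NA, "2021-03-29"),
  q1        = c(20, 14, 15, 13, NA, 14),
  q2        = c(205, 201, 220, 217, NA, 251)
)

# Convert the character column to a date once, up front
d12_dates <- d12 %>%
  dplyr::mutate(test_date = lubridate::as_date(test_date))

class(d12_dates$test_date)

# Later filters can then use test_date directly
d12_2021 <- d12_dates %>%
  dplyr::filter(test_date >= lubridate::as_date("2021-01-01"))
```

Converting once keeps later filters shorter; converting inside filter() leaves the original column untouched.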

Keep any row where the test date occurred in the year 2021.

  • Note: We added lubridate::as_date() to change the variable class to date.

  • Note: Use the logical operator >= to denote greater than or equal to.

d12 %>% 
  dplyr::filter(lubridate::as_date(test_date) >= "2021-01-01")
# A tibble: 2 x 5
  sch   stu_id test_date     q1    q2
  <chr>  <dbl> <chr>      <dbl> <dbl>
1 a         10 2021-03-28    15   220
2 b         22 2021-03-29    14   251
  • Note: I knew that the latest year in the data was 2021, so I didn’t need to add an upper bound to this filter. However, if there were years after 2021, I could write the filter like this.
d12 %>% 
  dplyr::filter(lubridate::as_date(test_date) >= "2021-01-01" &
                  lubridate::as_date(test_date) < "2022-01-01")
# A tibble: 2 x 5
  sch   stu_id test_date     q1    q2
  <chr>  <dbl> <chr>      <dbl> <dbl>
1 a         10 2021-03-28    15   220
2 b         22 2021-03-29    14   251
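
Another way to express a two-sided range is dplyr::between(), which keeps values from the lower bound through the upper bound, both included. This is a sketch only: d12 is rebuilt by hand from the printed table, the name in_2021 is made up, and it assumes a recent dplyr version (older releases may warn when between() is given dates).

```r
library(dplyr)
library(lubridate)

# Rebuild d12 from the printed table above
d12 <- tibble::tibble(
  sch       = c("a", "a", "a", "b", "b", "b"),
  stu_id    = c(10, 10, 10, 22, 22, 22),
  test_date = c("2019-04-11", "2020-03-21", "2021-03-28",
                "2019-04-12", NA, "2021-03-29"),
  q1        = c(20, 14, 15, 13, NA, 14),
  q2        = c(205, 201, 220, 217, NA, 251)
)

# between() is inclusive on both ends, so the upper bound is the last day of 2021
in_2021 <- d12 %>%
  dplyr::filter(
    dplyr::between(
      lubridate::as_date(test_date),
      lubridate::as_date("2021-01-01"),
      lubridate::as_date("2021-12-31")
    )
  )
```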

I could also simplify this by dropping the month and day and comparing only the year, using the lubridate::year() function.

d12 %>% 
  dplyr::filter(lubridate::year(lubridate::as_date(test_date)) == 2021)
# A tibble: 2 x 5
  sch   stu_id test_date     q1    q2
  <chr>  <dbl> <chr>      <dbl> <dbl>
1 a         10 2021-03-28    15   220
2 b         22 2021-03-29    14   251
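
The row with a missing test_date never appears in these results because filter() keeps only rows where the condition is TRUE, and a comparison against NA evaluates to NA. If you wanted to keep that row as well, one option is to add is.na() to the condition. A minimal sketch, with d12 rebuilt by hand from the printed table and the names only_2021 and with_missing made up for illustration:

```r
library(dplyr)
library(lubridate)

# Rebuild d12 from the printed table above
d12 <- tibble::tibble(
  sch       = c("a", "a", "a", "b", "b", "b"),
  stu_id    = c(10, 10, 10, 22, 22, 22),
  test_date = c("2019-04-11", "2020-03-21", "2021-03-28",
                "2019-04-12", NA, "2021-03-29"),
  q1        = c(20, 14, 15, 13, NA, 14),
  q2        = c(205, 201, 220, 217, NA, 251)
)

# The condition is NA for the missing date, so that row is dropped (2 rows kept)
only_2021 <- d12 %>%
  dplyr::filter(lubridate::year(lubridate::as_date(test_date)) == 2021)

# Explicitly keep rows with a missing date as well (3 rows kept)
with_missing <- d12 %>%
  dplyr::filter(
    is.na(test_date) |
      lubridate::year(lubridate::as_date(test_date)) == 2021
  )
```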

Function: filter()

Examples using one criterion for one variable (numeric)


1. Keep any row whose value is greater than or equal to 0 for extra2.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Keep any row that has a value greater than or equal to zero for extra2.

  • Note: Use the logical operator >= to denote greater than or equal to.
d8 %>% 
  dplyr::filter(extra2 >= 0)
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 d           4      0     13        217 lewis   
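
The difference between >= and > does not show up for extra2, because it has no zeros. The extra3 variable does, so a small sketch on that column makes the difference visible. Here d8 is rebuilt by hand from the printed table, and the names at_least_zero and above_zero are made up for illustration.

```r
library(dplyr)

# Rebuild d8 from the printed table above
d8 <- tibble::tibble(
  extra1     = c("a", "b", "c", "d"),
  extra2     = c(1, -999, -999, 4),
  extra3     = c(2, 0, -999, 0),
  stu_id     = c(10, 11, 12, 13),
  test_score = c(205, 220, 250, 217),
  tch_name   = c("harris", "steve", "harris", "lewis")
)

# >= 0 keeps the zeros: rows a, b, and d
at_least_zero <- d8 %>%
  dplyr::filter(extra3 >= 0)

# > 0 drops the zeros: only row a
above_zero <- d8 %>%
  dplyr::filter(extra3 > 0)
```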
