Package: dplyr


Function: filter()


  • Note: Using dplyr::across() in dplyr::filter() is deprecated. dplyr::if_any() and dplyr::if_all() are predicate functions used to select columns within dplyr::filter(). This function is available in version 1.0.5 of dplyr. dplyr::if_any() returns a true when the statement is true for any of the variables. dplyr::if_all() returns a true when the statement is true for all of the variables. See Filter using if_all or if_any for further explanation

1. Keep rows that have -999 for extra1 AND/OR -999 for extra2.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Keep rows that have -999 for extra2 and/or extra3.

  • Note: Use of the | operator to request and/or.
d8 %>% 
  dplyr::filter(extra2 == -999 | extra3 == -999)
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 c        -999   -999     12        250 harris  

Another way I could write this is using dplyr::if_any()

d8 %>%
  dplyr::filter(dplyr::if_any(extra2:extra3, ~ . == -999))
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 c        -999   -999     12        250 harris  

2. Keep rows that have -999 for extra1 OR -999 for extra2 (not for both).

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Keep rows that have -999 for extra2 OR -999 for extra3 (but not -999 for both).

  • Note: Use the xor operator to request OR (not both).
d8 %>% 
  dplyr::filter(xor(extra2 == -999 , extra3 == -999))
# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   

3. Keep rows that have BOTH -999 for extra1 AND -999 for extra2.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Keep rows that have -999 for extra2 and also have -999 extra3.

  • Note: Use the & operator to request and.
d8 %>% 
  dplyr::filter(extra2 == -999 & extra3 == -999)
# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 c        -999   -999     12        250 harris  

Another way I could write this is using dplyr::if_all()

d8 %>%
  dplyr::filter(dplyr::if_all(extra2:extra3, ~ . == -999))
# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 c        -999   -999     12        250 harris  

4. Remove any row that has -999 for extra2 AND/OR has -999 for extra3.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Filter out any row that has -999 for extra2 and/or has -999 extra3.

  • Note: Notice that I put the operator ! outside of the parentheses. While there are other ways to write this equation, I find this the least confusing way to remove rows based on removal criteria from multiple variables.
d8 %>% 
  dplyr::filter(!(extra2 == -999 | extra3 == -999))
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 d           4      0     13        217 lewis   

I could also write this using dplyr::if_any()

d8 %>% 
  dplyr::filter(!dplyr::if_any(extra2:extra3, ~ . == -999))
# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 d           4      0     13        217 lewis   

5. Remove any row that has -999 for BOTH extra2 AND for extra3.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis   

Filter out any row that has -999 for extra2 and has -999 extra3.

  • Note: Notice that I put the operator ! outside of the parentheses. While there are other ways to write this equation, I find this the least confusing way to remove rows based on removal criteria from multiple variables.
d8 %>% 
  dplyr::filter(!(extra2 ==-999 & extra3 ==-999))
# A tibble: 3 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 d           4      0     13        217 lewis   

I could also write this using dplyr::if_all()

d8 %>% 
  dplyr::filter(!dplyr::if_all(extra2:extra3, ~ . == -999))
# A tibble: 3 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 d           4      0     13        217 lewis   

6. Keep rows where any variable that starts with the word “item” contains the value “MR”.

Review the data (d16)

# A tibble: 3 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    10 harris       6 5     3    
2    11 lewis        4 3     MR   
3    12 harris       6 4     1    

Keep rows where any of the “item” variables contains the value “MR”

  • Note: We use the tidyverse selection helper starts_with() to select variables.
d16 %>%
  dplyr::filter(dplyr::if_any(tidyselect::starts_with("item"), ~ . == "MR"))
# A tibble: 1 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    11 lewis        4 3     MR   

7. Keep rows where any variable that starts with the word “item” contain letters.

Review the data (d16)

# A tibble: 3 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    10 harris       6 5     3    
2    11 lewis        4 3     MR   
3    12 harris       6 4     1    

We want to review any rows where either item1 OR item2 contain letters.

  • Note: We use stringr::str_detect() to detect a specific pattern, assigned in the pattern argument. Here we use a regular expression to denote that we would like to identify zero or more letters from the A-Z alphabet, lower or upper case.
d16 %>% 
  dplyr::filter(dplyr::if_any(item1:item2, 
                ~ stringr::str_detect(., pattern = "[a-zA-Z]")))
# A tibble: 1 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    11 lewis        4 3     MR   

Return to Filter