Filter rows based on one criteria for multiple variables

Package: dplyr

Function: `filter()`

Note: Using dplyr::across() in dplyr::filter() is deprecated. dplyr::if_any() and dplyr::if_all() are predicate functions used to select columns within dplyr::filter(). This function is available in version 1.0.5 of dplyr. dplyr::if_any() returns a true when the statement is true for any of the variables. dplyr::if_all() returns a true when the statement is true for all of the variables. See Filter using if_all or if_any for further explanation

1. Keep rows that have -999 for extra1 AND/OR -999 for extra2.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis

Keep rows that have -999 for extra2 and/or extra3.

Note: Use of the | operator to request and/or.

d8 %>% 
  dplyr::filter(extra2 == -999 | extra3 == -999)

# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 c        -999   -999     12        250 harris

Another way I could write this is using dplyr::if_any()

d8 %>%
  dplyr::filter(dplyr::if_any(extra2:extra3, ~ . == -999))

# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve   
2 c        -999   -999     12        250 harris

2. Keep rows that have -999 for extra1 OR -999 for extra2 (not for both).

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis

Keep rows that have -999 for extra2 OR -999 for extra3 (but not -999 for both).

Note: Use the xor operator to request OR (not both).

d8 %>% 
  dplyr::filter(xor(extra2 == -999 , extra3 == -999))

# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 b        -999      0     11        220 steve

3. Keep rows that have BOTH -999 for extra1 AND -999 for extra2.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis

Keep rows that have -999 for extra2 and also have -999 extra3.

Note: Use the & operator to request and.

d8 %>% 
  dplyr::filter(extra2 == -999 & extra3 == -999)

# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 c        -999   -999     12        250 harris

Another way I could write this is using dplyr::if_all()

d8 %>%
  dplyr::filter(dplyr::if_all(extra2:extra3, ~ . == -999))

# A tibble: 1 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 c        -999   -999     12        250 harris

4. Remove any row that has -999 for extra2 AND/OR has -999 for extra3.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis

Filter out any row that has -999 for extra2 and/or has -999 extra3.

Note: Notice that I put the operator ! outside of the parentheses. While there are other ways to write this equation, I find this the least confusing way to remove rows based on removal criteria from multiple variables.

d8 %>% 
  dplyr::filter(!(extra2 == -999 | extra3 == -999))

# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 d           4      0     13        217 lewis

I could also write this using dplyr::if_any()

d8 %>% 
  dplyr::filter(!dplyr::if_any(extra2:extra3, ~ . == -999))

# A tibble: 2 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 d           4      0     13        217 lewis

5. Remove any row that has -999 for BOTH extra2 AND for extra3.

Review the data (d8)

# A tibble: 4 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 c        -999   -999     12        250 harris  
4 d           4      0     13        217 lewis

Filter out any row that has -999 for extra2 and has -999 extra3.

Note: Notice that I put the operator ! outside of the parentheses. While there are other ways to write this equation, I find this the least confusing way to remove rows based on removal criteria from multiple variables.

d8 %>% 
  dplyr::filter(!(extra2 ==-999 & extra3 ==-999))

# A tibble: 3 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 d           4      0     13        217 lewis

I could also write this using dplyr::if_all()

d8 %>% 
  dplyr::filter(!dplyr::if_all(extra2:extra3, ~ . == -999))

# A tibble: 3 x 6
  extra1 extra2 extra3 stu_id test_score tch_name
  <chr>   <dbl>  <dbl>  <dbl>      <dbl> <chr>   
1 a           1      2     10        205 harris  
2 b        -999      0     11        220 steve   
3 d           4      0     13        217 lewis

6. Keep rows where any variable that starts with the word “item” contains the value “MR”.

Review the data (d16)

# A tibble: 3 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    10 harris       6 5     3    
2    11 lewis        4 3     MR   
3    12 harris       6 4     1

Keep rows where any of the “item” variables contains the value “MR”

Note: We use the tidyverse selection helper starts_with() to select variables.

d16 %>%
  dplyr::filter(dplyr::if_any(tidyselect::starts_with("item"), ~ . == "MR"))

# A tibble: 1 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    11 lewis        4 3     MR

7. Keep rows where any variable that starts with the word “item” contain letters.

Review the data (d16)

# A tibble: 3 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    10 harris       6 5     3    
2    11 lewis        4 3     MR   
3    12 harris       6 4     1

We want to review any rows where either item1 OR item2 contain letters.

Note: We use stringr::str_detect() to detect a specific pattern, assigned in the pattern argument. Here we use a regular expression to denote that we would like to identify zero or more letters from the A-Z alphabet, lower or upper case.

d16 %>% 
  dplyr::filter(dplyr::if_any(item1:item2, 
                ~ stringr::str_detect(., pattern = "[a-zA-Z]")))

# A tibble: 1 x 5
     id tch_name grade item1 item2
  <dbl> <chr>    <dbl> <chr> <chr>
1    11 lewis        4 3     MR

Return to Filter

Filter rows based on one criteria for multiple variables

Package: dplyr

Function: filter()

Function: `filter()`