Package: dplyr


Function: filter()


  • Note: Using dplyr::across() in dplyr::filter() is deprecated. dplyr::if_any() and dplyr::if_all() are predicate functions used to select columns within dplyr::filter(). This function is available in version 1.0.5 of dplyr. dplyr::if_any() returns a true when the statement is true for any of the variables. dplyr::if_all() returns a true when the statement is true for all of the variables.


1. Keep rows where respondent has missing responses for AT LEAST ONE variable (var1:var4). Missing data is denoted by “MR”

Review the data (d20)

# A tibble: 6 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    13 ""    MR    MR    MR    MR   
5    14 ""    MR    MR    MR    MR   
6    15 ""    1     2     3     MR   

You can read this as “If a row has a value of "MR” for ANY variables (var1:var4), return the row”

d20 %>%
  dplyr::filter(dplyr::if_any(var1:var4, ~ . == "MR"))
# A tibble: 5 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    11 "b"   3     MR    2     4    
2    12 "b"   1     3     1     MR   
3    13 ""    MR    MR    MR    MR   
4    14 ""    MR    MR    MR    MR   
5    15 ""    1     2     3     MR   

2. Keep rows where respondent has missing responses for ALL variables (var1:var4). Missing data is denoted by “MR”

Review the data (d20)

# A tibble: 6 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    13 ""    MR    MR    MR    MR   
5    14 ""    MR    MR    MR    MR   
6    15 ""    1     2     3     MR   

You can read this as “If a row has a value of "MR” for ALL variables (var1:var4), return the row”

d20 %>%
  dplyr::filter(dplyr::if_all(var1:var4, ~ . == "MR"))
# A tibble: 2 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    13 ""    MR    MR    MR    MR   
2    14 ""    MR    MR    MR    MR   

3. Remove rows where respondent has missing responses for ALL variables (var1:var4). Missing data is denoted by “MR”

Review the data (d20)

# A tibble: 6 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    13 ""    MR    MR    MR    MR   
5    14 ""    MR    MR    MR    MR   
6    15 ""    1     2     3     MR   

This example can be solved in one of two ways. Use whichever makes the most sense to you.

The first and probably least confusing way to me is using dplyr::if_all().

Here we use the same formula that we used in #2, where we first use dplyr::if_all() to pull rows that have values of "MR” for ALL variables.

But then, we add the ! operator outside of dplyr::if_all() to say, after we grab those rows, we want to drop them.

d20 %>%
  dplyr::filter(!dplyr::if_all(var1:var4, ~ . == "MR"))
# A tibble: 4 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    15 ""    1     2     3     MR   

The other way we can solve this is reframing our question to say, “Keep rows where respondent has responses that are NOT missing for ANY variable (var1:var4)”.

Reframing our question this way allows us to see how we can also you dplyr::if_any() to get the rows we want.

You can read the code below as “If a row has a value that is NOT "MR” for ANY variable (var1:var4), return the row”

Because if a row has a value that is NOT "MR” for any variable (var1:var4) then they are NOT missing all responses and should be returned.

d20 %>%
  dplyr::filter(dplyr::if_any(var1:var4, ~ . != "MR"))
# A tibble: 4 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    15 ""    1     2     3     MR   

4. Remove rows where respondent has missing responses for AT LEAST ONE variable (var1:var4). Missing data is denoted by “MR”

Review the data (d20)

# A tibble: 6 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 "a"   3     4     3     3    
2    11 "b"   3     MR    2     4    
3    12 "b"   1     3     1     MR   
4    13 ""    MR    MR    MR    MR   
5    14 ""    MR    MR    MR    MR   
6    15 ""    1     2     3     MR   

Again, this example can be solved in one of two ways. Use whichever makes the most sense to you.

The first and probably least confusing way to me is using dplyr::if_any().

Here we use the same formula that we used in #1, where we first use dplyr::if_any() to pull rows that have values of "MR” for ANY variables.

But then, we add the ! operator outside of dplyr::if_any() to say, after we grab those rows, we want to drop them.

d20 %>%
  dplyr::filter(!dplyr::if_any(var1:var4, ~ . == "MR"))
# A tibble: 1 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3    

The other way we can solve this is reframing our question to say, “Keep rows where respondent has NO missing responses across ALL variables (var1:var4)”.

Reframing our question this way allows us to see how we can also you dplyr::if_all() to get the rows we want.

You can read the code below as “If values for ALL variables (var1:var4) in a row are NOT "MR”, return the row”

d20 %>%
  dplyr::filter(dplyr::if_all(var1:var4, ~ . != "MR"))
# A tibble: 1 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3    

Return to Filter