Filter based on row sums or means

Package: dplyr

Function: `filter()`

1. Keep rows that have two or fewer values of “MR” across var1:var4.

Review the data (d23)

# A tibble: 6 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3    
2    11 b     3     MR    2     4    
3    12 b     1     3     1     MR   
4    13 c     MR    MR    MR    MR   
5    14 c     MR    MR    <NA>  MR   
6    15 a     1     2     3     MR

Keep rows that have two or fewer values of “MR” across var1:var4

Note: Here I am not using if_any() or if_all() as we typically would in a filter statement. Those helpers collapse the selected columns into a single TRUE/FALSE per row, whereas base::rowSums() needs the columns themselves, so I select them with dplyr::across().
Note: I am using an anonymous function within the dplyr::across() statement to check whether each value equals “MR”. This returns one logical column per variable, and base::rowSums() counts the TRUE values in each row.
Note: I add the base::rowSums() argument na.rm = TRUE because comparing NA to “MR” returns NA, and I still want the “MR” values counted in rows that contain NA values.

d23 %>%
  dplyr::filter(rowSums(dplyr::across(var1:var4, ~ . == "MR"), na.rm = TRUE) <= 2)

# A tibble: 4 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3    
2    11 b     3     MR    2     4    
3    12 b     1     3     1     MR   
4    15 a     1     2     3     MR
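
To see what the filter condition is evaluating, it can help to store the row count in a column first. The sketch below rebuilds d23 from the printed tibble above and adds a helper column; the names `mr_count` and `d23_counts` are my own and are not part of the original example.

```r
library(dplyr)

# Rebuild d23 from the printed tibble (all var columns are character)
d23 <- tibble::tribble(
  ~id, ~form, ~var1, ~var2, ~var3,         ~var4,
  10,  "a",   "3",   "4",   "3",           "3",
  11,  "b",   "3",   "MR",  "2",           "4",
  12,  "b",   "1",   "3",   "1",           "MR",
  13,  "c",   "MR",  "MR",  "MR",          "MR",
  14,  "c",   "MR",  "MR",  NA_character_, "MR",
  15,  "a",   "1",   "2",   "3",           "MR"
)

# mr_count (hypothetical helper column) holds the number of "MR" values per row;
# na.rm = TRUE skips the NA produced by comparing NA to "MR" in row 5
d23_counts <- d23 %>%
  dplyr::mutate(
    mr_count = rowSums(dplyr::across(var1:var4, ~ . == "MR"), na.rm = TRUE)
  )

d23_counts$mr_count
# 0 1 1 4 3 1
```

Rows 4 and 5 have counts of 4 and 3, which is why they are dropped from the filtered result.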

If the pattern we were trying to count was not always exactly the same across variables, we could use something like stringr::str_detect() to capture values instead.

d23 %>%
  dplyr::filter(rowSums(dplyr::across(var1:var4, ~ stringr::str_detect(., "MR")), na.rm = TRUE) <= 2)

# A tibble: 4 x 6
     id form  var1  var2  var3  var4 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3    
2    11 b     3     MR    2     4    
3    12 b     1     3     1     MR   
4    15 a     1     2     3     MR
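
In d23 every code is exactly “MR”, so both approaches return the same rows. The difference only shows up when the codes vary. The sketch below uses a small made-up tibble (`d_codes`, with invented codes such as “MR1” and “MR2”) to compare exact matching with stringr::str_detect(); the data and column names are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical data: codes carry suffixes, so they are not always exactly "MR"
d_codes <- tibble::tribble(
  ~id, ~var1, ~var2,         ~var3,
  1,   "3",   "MR1",         "2",
  2,   "MR1", "MR2",         "MR",
  3,   "MR2", NA_character_, "4"
)

d_codes_counts <- d_codes %>%
  dplyr::mutate(
    # Exact match only counts the bare "MR" value
    exact_count = rowSums(dplyr::across(var1:var3, ~ . == "MR"), na.rm = TRUE),
    # str_detect() counts any value that contains "MR"
    detect_count = rowSums(dplyr::across(var1:var3, ~ stringr::str_detect(., "MR")), na.rm = TRUE)
  )

d_codes_counts$exact_count
# 0 1 0
d_codes_counts$detect_count
# 1 3 1
```

Note that stringr::str_detect() returns NA for NA input, so na.rm = TRUE is still needed.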

2. Keep rows where the mean of var1 and var2 is greater than 2.

Review the data (d24)

# A tibble: 4 x 3
     id  var1  var2
  <dbl> <dbl> <dbl>
1    20     3     4
2    21     4     5
3    22     3     1
4    23     5    NA

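Here we don’t need to add an anonymous function within dplyr::across() because we are averaging the values as they are, not testing them first. With na.rm = TRUE, the mean for id 23 is computed from var1 alone, so that row is kept.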

d24 %>%
  dplyr::filter(rowMeans(dplyr::across(var1:var2), na.rm = TRUE) > 2)

# A tibble: 3 x 3
     id  var1  var2
  <dbl> <dbl> <dbl>
1    20     3     4
2    21     4     5
3    23     5    NA
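
The na.rm argument decides what happens to id 23. The sketch below rebuilds d24 from the printed tibble and computes the row mean both ways; the helper columns `mean_na_rm` and `mean_strict` are hypothetical names used only for comparison.

```r
library(dplyr)

# Rebuild d24 from the printed tibble
d24 <- tibble::tribble(
  ~id, ~var1, ~var2,
  20,  3,     4,
  21,  4,     5,
  22,  3,     1,
  23,  5,     NA_real_
)

d24_means <- d24 %>%
  dplyr::mutate(
    # NA values are dropped before averaging
    mean_na_rm = rowMeans(dplyr::across(var1:var2), na.rm = TRUE),
    # Any NA in the row makes the mean NA
    mean_strict = rowMeans(dplyr::across(var1:var2))
  )

d24_means$mean_na_rm
# 3.5 4.5 2.0 5.0
d24_means$mean_strict
# 3.5 4.5 2.0  NA
```

Because dplyr::filter() drops rows where the condition is NA, leaving out na.rm = TRUE would remove id 23 as well as id 22.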
