filter()
1. Keep rows that have two or fewer values of “MR” across var1:var4.
Review the data (d23)
# A tibble: 6 x 6
     id form  var1  var2  var3  var4
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3
2    11 b     3     MR    2     4
3    12 b     1     3     1     MR
4    13 c     MR    MR    MR    MR
5    14 c     MR    MR    <NA>  MR
6    15 a     1     2     3     MR
Keep rows that have two or fewer values of “MR” across var1:var4
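Note: Here I am not using if_any() or
if_all() like we typically use in a filter statement. Those helpers
return a single TRUE/FALSE per row, but base::rowSums() needs the
selected columns themselves, so I am using dplyr::across() to select
the variables within base::rowSums().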
Note: I am using an anonymous function within our
dplyr::across() statement to check if the values of our
variables are equal to “MR”.
Note: I add the base::rowSums() argument na.rm = TRUE
because there are NA values in my data and I still want the
“MR” values summed for rows that contain NA values.
d23 %>%
  dplyr::filter(rowSums(dplyr::across(var1:var4, ~ . == "MR"), na.rm = TRUE) <= 2)
# A tibble: 4 x 6
     id form  var1  var2  var3  var4
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3
2    11 b     3     MR    2     4
3    12 b     1     3     1     MR
4    15 a     1     2     3     MR
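To see why these four rows are kept, it can help to store the per-row count as its own column before filtering. The following is a minimal sketch, assuming d23 is rebuilt by hand from the printed tibble above; the column name n_mr and the object names mr_counts and kept are made up for illustration.

```r
library(dplyr)

# Rebuild d23 from the printed tibble above (assumed to match the original data)
d23 <- tibble::tribble(
  ~id, ~form, ~var1, ~var2, ~var3, ~var4,
  10,  "a",   "3",   "4",   "3",   "3",
  11,  "b",   "3",   "MR",  "2",   "4",
  12,  "b",   "1",   "3",   "1",   "MR",
  13,  "c",   "MR",  "MR",  "MR",  "MR",
  14,  "c",   "MR",  "MR",  NA,    "MR",
  15,  "a",   "1",   "2",   "3",   "MR"
)

# n_mr (hypothetical name) holds the number of "MR" values in each row;
# na.rm = TRUE skips the NA in row 5 instead of returning NA for that row
mr_counts <- d23 %>%
  dplyr::mutate(
    n_mr = rowSums(dplyr::across(var1:var4, ~ . == "MR"), na.rm = TRUE)
  )

# Filtering on the stored count gives the same rows as the one-step filter
kept <- mr_counts %>%
  dplyr::filter(n_mr <= 2)
```

Printing mr_counts before filtering makes it easy to check the counts by eye against the raw data.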
If the pattern we were trying to count was not always exactly the
same across variables, we could use something like
stringr::str_detect() to capture values instead.
d23 %>%
  dplyr::filter(rowSums(dplyr::across(var1:var4, ~ stringr::str_detect(., "MR")), na.rm = TRUE) <= 2)
# A tibble: 4 x 6
     id form  var1  var2  var3  var4
  <dbl> <chr> <chr> <chr> <chr> <chr>
1    10 a     3     4     3     3
2    11 b     3     MR    2     4
3    12 b     1     3     1     MR
4    15 a     1     2     3     MR
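With d23 the two approaches return the same rows, because every code is exactly “MR”. The sketch below shows a case where they differ. The data set d_codes and its values (“MR-skip”, “MR-refused”) are hypothetical and not part of the original example, as are the names exact_n and pattern_n.

```r
library(dplyr)

# Hypothetical data: the missing-response code is not always exactly "MR"
d_codes <- tibble::tribble(
  ~id, ~var1,     ~var2,        ~var3,
  1,   "MR-skip", "MR-refused", "MR",
  2,   "2",       "MR-skip",    "4",
  3,   "1",       "3",          NA
)

# Exact matching only counts values that are identical to "MR"
exact_n <- d_codes %>%
  dplyr::mutate(
    n_mr = rowSums(dplyr::across(var1:var3, ~ . == "MR"), na.rm = TRUE)
  )

# Pattern matching counts any value that contains "MR"
pattern_n <- d_codes %>%
  dplyr::mutate(
    n_mr = rowSums(
      dplyr::across(var1:var3, ~ stringr::str_detect(., "MR")),
      na.rm = TRUE
    )
  )
```

Here the exact comparison counts one “MR” value for id 1, while stringr::str_detect() counts three, so a filter on two or fewer values would keep id 1 under the first approach and drop it under the second.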
2. Keep rows where the mean of var1 and
var2 is greater than 2.
Review the data (d24)
# A tibble: 4 x 3
     id  var1  var2
  <dbl> <dbl> <dbl>
1    20     3     4
2    21     4     5
3    22     3     1
4    23     5    NA
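Here we don’t need to add an anonymous function within
dplyr::across() because base::rowMeans() works on the numeric values
directly. With na.rm = TRUE the mean is taken over the non-missing
values, so id 23 is kept with a mean of 5, while id 22 is dropped
because its mean is exactly 2.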
d24 %>%
  dplyr::filter(rowMeans(dplyr::across(var1:var2), na.rm = TRUE) > 2)
# A tibble: 3 x 3
     id  var1  var2
  <dbl> <dbl> <dbl>
1    20     3     4
2    21     4     5
3    23     5    NA
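The role of na.rm = TRUE is easiest to see by running the filter both ways. This is a minimal sketch, assuming d24 is rebuilt by hand from the printed tibble above; the object names with_na_rm and without_na_rm are made up for illustration.

```r
library(dplyr)

# Rebuild d24 from the printed tibble above (assumed to match the original data)
d24 <- tibble::tribble(
  ~id, ~var1, ~var2,
  20,  3,     4,
  21,  4,     5,
  22,  3,     1,
  23,  5,     NA_real_
)

# With na.rm = TRUE, the mean for id 23 is computed from var1 alone
with_na_rm <- d24 %>%
  dplyr::filter(rowMeans(dplyr::across(var1:var2), na.rm = TRUE) > 2)

# Without na.rm, the mean for id 23 is NA, and filter() drops rows
# where the condition evaluates to NA
without_na_rm <- d24 %>%
  dplyr::filter(rowMeans(dplyr::across(var1:var2)) > 2)
```

Whether id 23 should count as having a mean above 2 is a judgment call about the data; the sketch only shows how each choice behaves.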