filter()dplyr::across() in
dplyr::filter() is deprecated. dplyr::if_any()
and dplyr::if_all() are predicate functions used to select
columns within dplyr::filter(). This function is available
in version 1.0.5 of dplyr. dplyr::if_any()
returns a true when the statement is true for any of
the variables. dplyr::if_all() returns a true when the
statement is true for all of the variables. See Filter
using if_all or if_any for further explanation 1. Keep rows that have -999 for extra1
AND/OR -999 for extra2.
Review the data (d8)
# A tibble: 4 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 c -999 -999 12 250 harris
4 d 4 0 13 217 lewis
Keep rows that have -999 for extra2 and/or extra3.
d8 %>%
dplyr::filter(extra2 == -999 | extra3 == -999)
# A tibble: 2 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 b -999 0 11 220 steve
2 c -999 -999 12 250 harris
Another way I could write this is using
dplyr::if_any()
d8 %>%
dplyr::filter(dplyr::if_any(extra2:extra3, ~ . == -999))
# A tibble: 2 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 b -999 0 11 220 steve
2 c -999 -999 12 250 harris
2. Keep rows that have -999 for extra1
OR -999 for extra2 (not for both).
Review the data (d8)
# A tibble: 4 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 c -999 -999 12 250 harris
4 d 4 0 13 217 lewis
Keep rows that have -999 for extra2 OR -999 for
extra3 (but not -999 for both).
d8 %>%
dplyr::filter(xor(extra2 == -999 , extra3 == -999))
# A tibble: 1 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 b -999 0 11 220 steve
3. Keep rows that have BOTH -999 for
extra1 AND -999 for
extra2.
Review the data (d8)
# A tibble: 4 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 c -999 -999 12 250 harris
4 d 4 0 13 217 lewis
Keep rows that have -999 for extra2 and also have -999 extra3.
d8 %>%
dplyr::filter(extra2 == -999 & extra3 == -999)
# A tibble: 1 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 c -999 -999 12 250 harris
Another way I could write this is using
dplyr::if_all()
d8 %>%
dplyr::filter(dplyr::if_all(extra2:extra3, ~ . == -999))
# A tibble: 1 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 c -999 -999 12 250 harris
4. Remove any row that has -999 for extra2
AND/OR has -999 for extra3.
Review the data (d8)
# A tibble: 4 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 c -999 -999 12 250 harris
4 d 4 0 13 217 lewis
Filter out any row that has -999 for extra2 and/or has
-999 extra3.
d8 %>%
dplyr::filter(!(extra2 == -999 | extra3 == -999))
# A tibble: 2 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 d 4 0 13 217 lewis
I could also write this using dplyr::if_any()
d8 %>%
dplyr::filter(!dplyr::if_any(extra2:extra3, ~ . == -999))
# A tibble: 2 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 d 4 0 13 217 lewis
5. Remove any row that has -999 for BOTH
extra2 AND for extra3.
Review the data (d8)
# A tibble: 4 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 c -999 -999 12 250 harris
4 d 4 0 13 217 lewis
Filter out any row that has -999 for extra2 and has -999
extra3.
d8 %>%
dplyr::filter(!(extra2 ==-999 & extra3 ==-999))
# A tibble: 3 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 d 4 0 13 217 lewis
I could also write this using dplyr::if_all()
d8 %>%
dplyr::filter(!dplyr::if_all(extra2:extra3, ~ . == -999))
# A tibble: 3 x 6
extra1 extra2 extra3 stu_id test_score tch_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 a 1 2 10 205 harris
2 b -999 0 11 220 steve
3 d 4 0 13 217 lewis
6. Keep rows where any variable that starts with the word “item” contains the value “MR”.
Review the data (d16)
# A tibble: 3 x 5
id tch_name grade item1 item2
<dbl> <chr> <dbl> <chr> <chr>
1 10 harris 6 5 3
2 11 lewis 4 3 MR
3 12 harris 6 4 1
Keep rows where any of the “item” variables contains the value “MR”
tidyverse selection helper
starts_with() to select variables.d16 %>%
dplyr::filter(dplyr::if_any(tidyselect::starts_with("item"), ~ . == "MR"))
# A tibble: 1 x 5
id tch_name grade item1 item2
<dbl> <chr> <dbl> <chr> <chr>
1 11 lewis 4 3 MR
7. Keep rows where any variable that starts with the word “item” contain letters.
Review the data (d16)
# A tibble: 3 x 5
id tch_name grade item1 item2
<dbl> <chr> <dbl> <chr> <chr>
1 10 harris 6 5 3
2 11 lewis 4 3 MR
3 12 harris 6 4 1
We want to review any rows where either item1 OR
item2 contain letters.
stringr::str_detect() to detect a specific
pattern, assigned in the pattern argument. Here we use a
regular expression to denote that we would like to identify zero or more
letters from the A-Z alphabet, lower or upper case.d16 %>%
dplyr::filter(dplyr::if_any(item1:item2,
~ stringr::str_detect(., pattern = "[a-zA-Z]")))
# A tibble: 1 x 5
id tch_name grade item1 item2
<dbl> <chr> <dbl> <chr> <chr>
1 11 lewis 4 3 MR
Return to Filter