filter()
dplyr::across()
in
dplyr::filter()
is deprecated. dplyr::if_any()
and dplyr::if_all()
are predicate functions used to select
columns within dplyr::filter()
. This function is available
in version 1.0.5 of dplyr
. dplyr::if_any()
returns a true when the statement is true for any of
the variables. dplyr::if_all()
returns a true when the
statement is true for all of the variables.1. Keep rows where respondent has missing responses for AT LEAST ONE variable (var1:var4). Missing data is denoted by “MR”
Review the data (d20)
# A tibble: 6 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 13 "" MR MR MR MR
5 14 "" MR MR MR MR
6 15 "" 1 2 3 MR
You can read this as “If a row has a value of "MR” for ANY variables (var1:var4), return the row”
d20 %>%
dplyr::filter(dplyr::if_any(var1:var4, ~ . == "MR"))
# A tibble: 5 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 11 "b" 3 MR 2 4
2 12 "b" 1 3 1 MR
3 13 "" MR MR MR MR
4 14 "" MR MR MR MR
5 15 "" 1 2 3 MR
2. Keep rows where respondent has missing responses for ALL variables (var1:var4). Missing data is denoted by “MR”
Review the data (d20)
# A tibble: 6 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 13 "" MR MR MR MR
5 14 "" MR MR MR MR
6 15 "" 1 2 3 MR
You can read this as “If a row has a value of "MR” for ALL variables (var1:var4), return the row”
d20 %>%
dplyr::filter(dplyr::if_all(var1:var4, ~ . == "MR"))
# A tibble: 2 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 13 "" MR MR MR MR
2 14 "" MR MR MR MR
3. Remove rows where respondent has missing responses for ALL variables (var1:var4). Missing data is denoted by “MR”
Review the data (d20)
# A tibble: 6 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 13 "" MR MR MR MR
5 14 "" MR MR MR MR
6 15 "" 1 2 3 MR
This example can be solved in one of two ways. Use whichever makes the most sense to you.
The first and probably least confusing way to me is using
dplyr::if_all()
.
Here we use the same formula that we used in #2, where we first use
dplyr::if_all()
to pull rows that have values of "MR” for
ALL variables.
But then, we add the !
operator outside of
dplyr::if_all()
to say, after we grab those rows, we want
to drop them.
d20 %>%
dplyr::filter(!dplyr::if_all(var1:var4, ~ . == "MR"))
# A tibble: 4 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 15 "" 1 2 3 MR
The other way we can solve this is reframing our question to say, “Keep rows where respondent has responses that are NOT missing for ANY variable (var1:var4)”.
Reframing our question this way allows us to see how we can also you
dplyr::if_any()
to get the rows we want.
You can read the code below as “If a row has a value that is NOT "MR” for ANY variable (var1:var4), return the row”
Because if a row has a value that is NOT "MR” for any variable (var1:var4) then they are NOT missing all responses and should be returned.
d20 %>%
dplyr::filter(dplyr::if_any(var1:var4, ~ . != "MR"))
# A tibble: 4 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 15 "" 1 2 3 MR
4. Remove rows where respondent has missing responses for AT LEAST ONE variable (var1:var4). Missing data is denoted by “MR”
Review the data (d20)
# A tibble: 6 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 "a" 3 4 3 3
2 11 "b" 3 MR 2 4
3 12 "b" 1 3 1 MR
4 13 "" MR MR MR MR
5 14 "" MR MR MR MR
6 15 "" 1 2 3 MR
Again, this example can be solved in one of two ways. Use whichever makes the most sense to you.
The first and probably least confusing way to me is using
dplyr::if_any()
.
Here we use the same formula that we used in #1, where we first use
dplyr::if_any()
to pull rows that have values of "MR” for
ANY variables.
But then, we add the !
operator outside of
dplyr::if_any()
to say, after we grab those rows, we want
to drop them.
d20 %>%
dplyr::filter(!dplyr::if_any(var1:var4, ~ . == "MR"))
# A tibble: 1 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 a 3 4 3 3
The other way we can solve this is reframing our question to say, “Keep rows where respondent has NO missing responses across ALL variables (var1:var4)”.
Reframing our question this way allows us to see how we can also you
dplyr::if_all()
to get the rows we want.
You can read the code below as “If values for ALL variables (var1:var4) in a row are NOT "MR”, return the row”
d20 %>%
dplyr::filter(dplyr::if_all(var1:var4, ~ . != "MR"))
# A tibble: 1 x 6
id form var1 var2 var3 var4
<dbl> <chr> <chr> <chr> <chr> <chr>
1 10 a 3 4 3 3
Return to Filter