Filter rows based on matching variable values

Package: dplyr

Function: `filter()`

1. Keep any row where q1 equals q2.

Review the data (d13)

# A tibble: 3 x 3
     id q1     q2    
  <dbl> <chr>  <chr> 
1    10 harris Harris
2    20 steve  steve 
3    30 lewis  <NA>

Filter rows

d13 %>% 
  dplyr::filter(q1 == q2)

# A tibble: 1 x 3
     id q1    q2   
  <dbl> <chr> <chr>
1    20 steve steve
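
To run this example yourself, d13 can be rebuilt from the printout above. This is a minimal sketch that assumes the column types shown in the tibble header; the object name kept is only for illustration. It also prints the comparison that filter() evaluates, which shows why only one row is returned: the comparison is case sensitive, and comparing against NA gives NA rather than TRUE or FALSE.

```r
library(dplyr)

# Rebuild d13 from the printout (types assumed from the header row)
d13 <- tibble::tribble(
  ~id, ~q1,      ~q2,
  10,  "harris", "Harris",
  20,  "steve",  "steve",
  30,  "lewis",  NA_character_
)

# The logical vector that filter() evaluates, one value per row
d13$q1 == d13$q2
# [1] FALSE  TRUE    NA

# filter() keeps only the rows where the condition is TRUE
kept <- d13 %>%
  dplyr::filter(q1 == q2)
```

Rows whose condition is FALSE or NA are both dropped, so id 10 (different case) and id 30 (NA in q2) do not appear in the result.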

2. Keep any row where q1 does not equal q2, while also keeping rows that have NA for q2.

Review the data (d13)

# A tibble: 3 x 3
     id q1     q2    
  <dbl> <chr>  <chr> 
1    10 harris Harris
2    20 steve  steve 
3    30 lewis  <NA>

Filter to keep rows where q1 is not equal to q2

Note: Use the logical operator != to denote not equal to.
Note: I make an explicit call to keep rows with NA values for q2. If I did not do this, the comparison q1 != q2 would evaluate to NA for the last row, and filter would drop that row.

d13 %>% 
  dplyr::filter(q1 != q2 | is.na(q2))

# A tibble: 2 x 3
     id q1     q2    
  <dbl> <chr>  <chr> 
1    10 harris Harris
2    30 lewis  <NA>
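
The effect of the is.na() clause can be checked by running the filter with and without it. This is a sketch on a rebuilt copy of d13 (column types assumed from the printout); the names without_na and with_na are only for illustration.

```r
library(dplyr)

# Rebuild d13 from the printout (types assumed from the header row)
d13 <- tibble::tribble(
  ~id, ~q1,      ~q2,
  10,  "harris", "Harris",
  20,  "steve",  "steve",
  30,  "lewis",  NA_character_
)

# Without is.na(): the condition is NA for id 30, so that row is dropped
without_na <- d13 %>%
  dplyr::filter(q1 != q2)

# With is.na(): NA | TRUE evaluates to TRUE, so id 30 is kept
with_na <- d13 %>%
  dplyr::filter(q1 != q2 | is.na(q2))
```

Here without_na contains only id 10, while with_na contains ids 10 and 30.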

There are other ways to keep NA values as well. As we saw in Filter rows that contain NA values, the %in% operator is one of them.

However, in this scenario, where we are comparing two vectors, you cannot use the %in% operator the same way that we used it in Filter rows that contain NA values. Here you need to add the dplyr::rowwise() function so that each q1 value is compared only with the q2 value in the same row. This is also a better method when both q1 and q2 might have NA values and you want to keep rows when either column has NA, but not when both columns have NA.

d13 %>% 
  dplyr::rowwise() %>%
  dplyr::filter(!q1 %in% q2)

# A tibble: 2 x 3
# Rowwise: 
     id q1     q2    
  <dbl> <chr>  <chr> 
1    10 harris Harris
2    30 lewis  <NA>
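
Why rowwise() matters is easier to see on data where the two approaches disagree. The sketch below uses a hypothetical tibble, swapped, that is not part of the original example: every q1 value appears somewhere in q2, but never in the same row. The final dplyr::ungroup() call is an addition that removes the rowwise grouping from the result.

```r
library(dplyr)

# Hypothetical data: each q1 value exists in q2, but in a different row
swapped <- tibble::tibble(
  id = c(1, 2),
  q1 = c("a", "b"),
  q2 = c("b", "a")
)

# Without rowwise(): each q1 is checked against the whole q2 column,
# so both values are found and no row is kept
no_rowwise <- swapped %>%
  dplyr::filter(!q1 %in% q2)

# With rowwise(): each q1 is checked only against the q2 in its own row,
# so both rows are kept
by_row <- swapped %>%
  dplyr::rowwise() %>%
  dplyr::filter(!q1 %in% q2) %>%
  dplyr::ungroup()  # return a regular tibble instead of a rowwise one
```

On d13 both versions happen to return the same two rows, because no q1 value that differs from its own q2 appears elsewhere in the q2 column.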

Yet another way to keep these rows is to use the dplyr::case_when() function.

d13 %>%
  dplyr::filter(
    dplyr::case_when(
      q1 == q2 ~ FALSE,
      TRUE ~ TRUE))

# A tibble: 2 x 3
     id q1     q2    
  <dbl> <chr>  <chr> 
1    10 harris Harris
2    30 lewis  <NA>
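
The case_when() approach works because a condition that evaluates to NA does not match, so the row falls through to the TRUE ~ TRUE fallback. A small sketch on a rebuilt copy of d13 (column types assumed from the printout) shows the logical vector that filter() receives; the name keep is only for illustration.

```r
library(dplyr)

# Rebuild d13 from the printout (types assumed from the header row)
d13 <- tibble::tribble(
  ~id, ~q1,      ~q2,
  10,  "harris", "Harris",
  20,  "steve",  "steve",
  30,  "lewis",  NA_character_
)

# Evaluate case_when() on its own to see what filter() is given
keep <- dplyr::case_when(
  d13$q1 == d13$q2 ~ FALSE,  # matching rows are marked for removal
  TRUE ~ TRUE                # everything else, including NA comparisons, is kept
)

keep
# [1]  TRUE FALSE  TRUE
```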
