select()
1. Drop columns using a data dictionary
Review the data (d2).
# A tibble: 5 x 6
extra1 extra2 extra3 id test_score1 test_score2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 2 10 205 500
2 b -999 0 11 220 480
3 c 3 -999 12 250 540
4 d 4 0 13 217 499
5 <NA> NA NA NA NA NA
Review our data dictionary.
We often build a data dictionary that lists every variable in the raw data. It is reasonable to add a column recording what you plan to do with each variable, such as whether to keep or drop it (the keep column below).
# A tibble: 6 x 3
var_name label keep
<chr> <chr> <chr>
1 extra1 extra var from qualtrics no
2 extra2 extra var from qualtrics no
3 extra3 extra var from qualtrics no
4 id student id yes
5 test_score1 1st test score yes
6 test_score2 2nd test score yes
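To follow along, both objects can be recreated by hand. This is only a sketch: the values are copied from the printed output above, and using tibble::tribble() to build them is an assumption, since the original objects were presumably read in from files.

```r
library(tibble)

# Example data, typed in from the printed tibble above
d2 <- tribble(
  ~extra1, ~extra2, ~extra3, ~id, ~test_score1, ~test_score2,
  "a",        1,       2,     10,  205,          500,
  "b",     -999,       0,     11,  220,          480,
  "c",        3,    -999,     12,  250,          540,
  "d",        4,       0,     13,  217,          499,
  NA,        NA,      NA,     NA,   NA,           NA
)

# Data dictionary: one row per variable, with a keep flag
dictionary <- tribble(
  ~var_name,     ~label,                     ~keep,
  "extra1",      "extra var from qualtrics", "no",
  "extra2",      "extra var from qualtrics", "no",
  "extra3",      "extra var from qualtrics", "no",
  "id",          "student id",               "yes",
  "test_score1", "1st test score",           "yes",
  "test_score2", "2nd test score",           "yes"
)
```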
We can now create a character vector of all of the variables we wish to drop using our data dictionary.
Note: We use dplyr::filter() to retain only the dictionary rows for the variables we wish to drop (those where keep is "no").
Note: We use dplyr::pull() to extract the var_name column from the filtered dictionary as a character vector.
drop_vars <- dictionary %>%
  dplyr::filter(keep == "no") %>%
  dplyr::pull(var_name)
drop_vars
[1] "extra1" "extra2" "extra3"
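For comparison, the same vector can be built with base R subsetting. This is an alternative sketch, not the approach used in this tutorial; the dictionary here is a plain data frame typed in from the printout above (the label column is left out for brevity).

```r
# Assumed stand-in for the dictionary shown above (label column omitted)
dictionary <- data.frame(
  var_name = c("extra1", "extra2", "extra3", "id", "test_score1", "test_score2"),
  keep = c("no", "no", "no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Logical subsetting: take the var_name entries whose keep value is "no"
drop_vars <- dictionary$var_name[dictionary$keep == "no"]
drop_vars
# [1] "extra1" "extra2" "extra3"
```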
We can now use this character vector to select/remove variables from our dataset.
Note: We use tidyselect::all_of() to select variables whose names are stored in a character vector (an environment variable, meaning an object in the R environment rather than a column in the data).
Note: We add the - to denote that we want to remove variables.
d2 %>%
  dplyr::select(-all_of(drop_vars))
# A tibble: 5 x 3
id test_score1 test_score2
<dbl> <dbl> <dbl>
1 10 205 500
2 11 220 480
3 12 250 540
4 13 217 499
5 NA NA NA
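The same dictionary can also drive the selection from the other direction, by listing the variables to keep rather than the ones to drop. The sketch below is an illustration under assumptions: keep_vars, kept, and dropped are names introduced here, and the data are typed in from the printouts above.

```r
library(dplyr)

# Example data and dictionary, copied from the printed output above
d2 <- tibble::tibble(
  extra1 = c("a", "b", "c", "d", NA),
  extra2 = c(1, -999, 3, 4, NA),
  extra3 = c(2, 0, -999, 0, NA),
  id = c(10, 11, 12, 13, NA),
  test_score1 = c(205, 220, 250, 217, NA),
  test_score2 = c(500, 480, 540, 499, NA)
)
dictionary <- tibble::tibble(
  var_name = c("extra1", "extra2", "extra3", "id", "test_score1", "test_score2"),
  keep = c("no", "no", "no", "yes", "yes", "yes")
)

# Names flagged "no" (to drop) and names flagged "yes" (to keep)
drop_vars <- dictionary %>% filter(keep == "no") %>% pull(var_name)
keep_vars <- dictionary %>% filter(keep == "yes") %>% pull(var_name)

# Dropping the "no" variables and keeping the "yes" variables
# should leave the same three columns
dropped <- d2 %>% select(-all_of(drop_vars))
kept <- d2 %>% select(all_of(keep_vars))

names(dropped)
names(kept)
# both: [1] "id" "test_score1" "test_score2"
```

One design point worth knowing: all_of() raises an error if any name in the vector is missing from the data, which helps catch a dictionary that has fallen out of sync with the dataset. tidyselect::any_of() ignores missing names instead, which may be preferable when the dictionary covers several datasets.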