rename()
dplyr::rename()
The formula is new name = old name. This is the opposite of dplyr::recode(), which uses old value = new value.
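As a quick illustration of the two argument orders, here is a minimal sketch. The small tibble `d` and the replacement value "apple" are made up for this example.

```r
library(dplyr)

# Hypothetical one-column data frame for illustration
d <- tibble::tibble(Var1 = c("a", "b", "c"))

# rename(): new name on the left, old name on the right
renamed <- d %>% dplyr::rename(stu_id = Var1)
names(renamed)

# recode(): old value on the left, new value on the right
recoded <- dplyr::recode(d$Var1, a = "apple")
recoded
```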
1. Set all column names using an existing file (for example, a data dictionary).
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
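Read in and review the data dictionary. The new names must be in the first column and the old names in the second; if they are not, use dplyr::relocate() to reorder them.
dict <- readxl::read_excel("dictionary.xlsx")
dict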
# A tibble: 3 x 2
new_name old_name
<chr> <chr>
1 stu_id Var1
2 read_score Var2
3 math_score Var3
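If a dictionary arrives with the old names first, dplyr::relocate() can move the new names to the front. The sketch below uses a hypothetical `dict_raw` built by hand, not the Excel file above.

```r
library(dplyr)

# Hypothetical dictionary with the columns in the reverse order
dict_raw <- tibble::tibble(
  old_name = c("Var1", "Var2", "Var3"),
  new_name = c("stu_id", "read_score", "math_score")
)

# With no .before/.after argument, relocate() moves the column to the front
dict_fixed <- dict_raw %>% dplyr::relocate(new_name)
names(dict_fixed)
```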
In this case we can use the existing file to rename the variables for us, rather than us hand entering “stu_id = Var1, read_score = Var2, math_score = Var3”. You can see how this would save us time if we have many variables to rename.
The first thing we need to do is use the function tibble::deframe() to convert our two-column dictionary data frame to a named vector. The first column supplies the names and the second column supplies the values.
dict_names <- tibble::deframe(dict)
dict_names
stu_id read_score math_score
"Var1" "Var2" "Var3"
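To make clear what deframe() does with a two-column data frame, the sketch below builds the same named vector by hand with stats::setNames(). The dictionary is recreated inline so the example runs without the Excel file.

```r
# Dictionary recreated by hand for illustration
dict <- tibble::tibble(
  new_name = c("stu_id", "read_score", "math_score"),
  old_name = c("Var1", "Var2", "Var3")
)

# First column becomes the names, second column becomes the values
via_deframe <- tibble::deframe(dict)

# Base R equivalent: values first, names second
via_setnames <- stats::setNames(dict$old_name, dict$new_name)

identical(via_deframe, via_setnames)
```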
Now we can rename our variables using this vector. We wrap it in tidyselect::all_of() to remove ambiguity between columns and external variables. See this link for more details: https://tidyselect.r-lib.org/reference/faq-external-vector.html
d1 %>%
  dplyr::rename(tidyselect::all_of(dict_names))
# A tibble: 3 x 3
stu_id read_score math_score
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
There are many other ways to rename variables using another column of names. However, I like this way best because of its versatility.
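For comparison, one possible base R alternative is sketched below. It is only an illustration; `d1` and `dict` are recreated by hand, and `d1_base` is a name introduced for this example.

```r
# Data and dictionary recreated by hand for illustration
d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(2, NA, 3),
  Var3 = c(3.6, 8.5, NA)
)
dict <- tibble::tibble(
  new_name = c("stu_id", "read_score", "math_score"),
  old_name = c("Var1", "Var2", "Var3")
)

# Look up the position of each old name, then overwrite it with the new name
d1_base <- d1
names(d1_base)[match(dict$old_name, names(d1_base))] <- dict$new_name
names(d1_base)
```

Note that match() returns NA for any old name that is absent from the data, so this version needs extra handling in that case.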
Even if the data dictionary rows are not in the same order as the variables in our data frame, the renaming will still work.
For example, here is the dictionary where the rows are in a different order.
dict2
# A tibble: 3 x 2
new_name old_name
<chr> <chr>
1 math_score Var3
2 stu_id Var1
3 read_score Var2
And we can still use this to rename our variables.
dict_names <- tibble::deframe(dict2)
dict_names
math_score stu_id read_score
"Var3" "Var1" "Var2"
d1 %>%
  dplyr::rename(tidyselect::all_of(dict_names))
# A tibble: 3 x 3
stu_id read_score math_score
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
We can also use a dictionary even if it doesn’t have new names for all of our variables. Take this dictionary, which can only rename 2 of our variables.
dict3
# A tibble: 2 x 2
new_name old_name
<chr> <chr>
1 stu_id Var1
2 read_score Var2
We can still rename.
dict_names <- tibble::deframe(dict3)
dict_names
stu_id read_score
"Var1" "Var2"
d1 %>%
  dplyr::rename(tidyselect::any_of(dict_names))
# A tibble: 3 x 3
stu_id read_score Var3
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
Last, if for some reason your data dictionary contains variables that are not in your dataset (for instance, maybe you previously removed/dropped some variables), you can still rename your variables using this data dictionary (without having to filter those variables out).
Let’s take this data dictionary that has one extra variable compared to our dataset.
dict4
# A tibble: 4 x 2
new_name old_name
<chr> <chr>
1 math_score Var3
2 stu_id Var1
3 read_score Var2
4 DROP extra_var
dict_names <- tibble::deframe(dict4)
dict_names
math_score stu_id read_score DROP
"Var3" "Var1" "Var2" "extra_var"
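We can still rename if we use tidyselect::any_of() rather than tidyselect::all_of(). all_of() throws an error when a name in the vector is not a column in the data, whereas any_of() simply ignores it.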
d1 %>%
  dplyr::rename(tidyselect::any_of(dict_names))
# A tibble: 3 x 3
stu_id read_score math_score
<chr> <dbl> <dbl>
1 a 2 3.6
2 b NA 8.5
3 c 3 NA
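The difference between the two helpers can be seen side by side in the sketch below. The data and the named vector are recreated by hand, and the tryCatch() wrapper is added only so the example runs to completion; `strict_result` and `lenient_result` are names introduced for this illustration.

```r
library(dplyr)

# Data and lookup vector recreated by hand for illustration
d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(2, NA, 3),
  Var3 = c(3.6, 8.5, NA)
)
dict_names <- c(
  math_score = "Var3", stu_id = "Var1",
  read_score = "Var2", DROP = "extra_var"
)

# all_of() is strict: a column missing from d1 raises an error, caught here
strict_result <- tryCatch(
  d1 %>% dplyr::rename(tidyselect::all_of(dict_names)),
  error = function(e) "error"
)

# any_of() is lenient: names that are not columns in d1 are ignored
lenient_result <- d1 %>% dplyr::rename(tidyselect::any_of(dict_names))
names(lenient_result)
```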