Add variable labels using a data dictionary

Package: labelled

Function: `set_variable_labels()`

1. Set labels for one or more variables (id, gradelevel, q1) using a data dictionary

Review the data (d11)

# A tibble: 3 x 3
     id gradelevel    q1
  <dbl>      <dbl> <dbl>
1  1234          6     4
2  2345          6     5
3  3456          7     3

Review our data dictionary (dict)

# A tibble: 3 x 2
  varname    label                              
  <chr>      <chr>                              
1 id         study ID                           
2 gradelevel student grade level                
3 q1         do you get along with your teacher?

Add variable labels

Note: First we need to create a named list. labelled::set_variable_labels() works with named lists or character vectors. I prefer to work with named lists, which match labels to variables by name. A character vector is matched by position, so if its labels are not in exactly the same order as the variables in our data frame, they will be assigned to the wrong variables.
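To make the ordering risk concrete, here is a minimal sketch. It uses a small hypothetical data frame `d` (a stand-in for d11) and a character vector whose first two labels are deliberately swapped. The object names are my own, not part of the original example.

```r
library(labelled)

# Hypothetical stand-in for d11 (same column names as the example data)
d <- data.frame(id = c(1234, 2345), gradelevel = c(6, 6), q1 = c(4, 5))

# Character vector: labels are applied by position, so order matters.
# The first two labels are deliberately swapped here.
by_position <- c("student grade level", "study ID",
                 "do you get along with your teacher?")
d_vec <- set_variable_labels(d, .labels = by_position)

# Named list: labels are matched on variable name, so order does not matter.
by_name <- list(gradelevel = "student grade level",
                id = "study ID",
                q1 = "do you get along with your teacher?")
d_list <- set_variable_labels(d, .labels = by_name)

var_label(d_vec)$id   # picks up the label that was meant for gradelevel
var_label(d_list)$id  # the label that was intended for id
```

The named list gives the intended result even though its elements are not in column order, which is why I prefer it.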

We can use tibble::deframe() to create a named character vector and then turn it into a list using base::as.list(). Make sure varname is the first column of your data dictionary and label is the second: the first column becomes the “names” and the second column becomes the values.

var_labels <- dict %>%
  tibble::deframe() %>%
  base::as.list()
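If you want to see what each step produces, the sketch below rebuilds the example dictionary by hand (an assumption, since the original does not show how dict was created) and inspects the intermediate objects. `named_vec` is a name I introduce for illustration.

```r
library(magrittr)  # provides %>%

# Rebuild the example dictionary so this sketch runs on its own
dict <- tibble::tibble(
  varname = c("id", "gradelevel", "q1"),
  label   = c("study ID", "student grade level",
              "do you get along with your teacher?")
)

# Step 1: deframe() uses column 1 as names and column 2 as values
named_vec <- tibble::deframe(dict)

# Step 2: as.list() keeps the names, giving one list element per variable
var_labels <- named_vec %>%
  base::as.list()

names(var_labels)  # the variable names taken from varname
str(var_labels)    # one label string per variable
```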

We could also get the same named list using dplyr::pull() in conjunction with base::as.list(). However, note that the order of the arguments is reversed here: label comes first (the values to pull) and varname comes second (the name argument), so that varname becomes our “names”.

var_labels <- dict %>%
  dplyr::pull(label, varname) %>%
  base::as.list()
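As a quick sanity check, the two approaches can be compared directly. This is a minimal sketch that rebuilds the example dictionary by hand; `from_deframe` and `from_pull` are names I made up for the comparison.

```r
library(magrittr)  # provides %>%

# Rebuild the example dictionary so this sketch runs on its own
dict <- tibble::tibble(
  varname = c("id", "gradelevel", "q1"),
  label   = c("study ID", "student grade level",
              "do you get along with your teacher?")
)

# Approach 1: column order decides which column becomes the names
from_deframe <- dict %>%
  tibble::deframe() %>%
  base::as.list()

# Approach 2: pull(var, name) takes values from label, names from varname
from_pull <- dict %>%
  dplyr::pull(label, varname) %>%
  base::as.list()

# Both checks should be TRUE if the two lists carry the same names and labels
identical(names(from_deframe), names(from_pull))
identical(unname(from_deframe), unname(from_pull))
```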

Now we can use this list to add our variable labels using the argument .labels.

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels)

labelled::var_label(d11)

$id
[1] "study ID"

$gradelevel
[1] "student grade level"

$q1
[1] "do you get along with your teacher?"
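For anyone who wants to run the steps above from scratch, here is a self-contained sketch. The code that builds d11 and dict is my own reconstruction from the printed tibbles, not the original source of the data.

```r
library(magrittr)  # provides %>%

# Reconstruct the example data and dictionary from the printed output
d11 <- tibble::tibble(
  id         = c(1234, 2345, 3456),
  gradelevel = c(6, 6, 7),
  q1         = c(4, 5, 3)
)
dict <- tibble::tibble(
  varname = c("id", "gradelevel", "q1"),
  label   = c("study ID", "student grade level",
              "do you get along with your teacher?")
)

# Named list of labels, then apply it to the data frame
var_labels <- dict %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels)

# Each label is stored as a "label" attribute on its column
attr(d11$q1, "label")
```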

This also works if you only want to add variable labels to some variables, not all. There may be times when some of your variables already have labels or you simply do not want to add labels to all variables. If there are existing labels and those variables are not in your data dictionary, those labels will remain. If they are in your data dictionary, those labels will be replaced with your data dictionary labels.

Here is an example where we add labels to only 2 of our variables, starting again from the unlabelled d11.

Note: I use dplyr::filter() to narrow my dictionary to just the two variables of interest.

You’ll see labels are only added to gradelevel and q1.

var_labels <- dict %>%
  dplyr::filter(varname %in% c("gradelevel", "q1")) %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels)

labelled::var_label(d11)

$id
NULL

$gradelevel
[1] "student grade level"

$q1
[1] "do you get along with your teacher?"

If “id” already had a label, as in the example below, and that variable was not in the data dictionary, that label would be retained.

Again, we narrow the data dictionary to only gradelevel and q1 to demonstrate this point. Here are the labels on d11 before we add anything.

$id
[1] "research study id"

$gradelevel
NULL

$q1
NULL

var_labels <- dict %>%
  dplyr::filter(varname %in% c("gradelevel", "q1")) %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels)

labelled::var_label(d11)

$id
[1] "research study id"

$gradelevel
[1] "student grade level"

$q1
[1] "do you get along with your teacher?"
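The example above does not show how id got its earlier label. A minimal sketch of one way to reproduce the whole scenario follows; setting the label by hand with var_label() is my assumption about the starting state, and the data are reconstructed from the printed tibbles.

```r
library(magrittr)  # provides %>%
library(labelled)

# Reconstruct the example data and dictionary
d11 <- tibble::tibble(
  id         = c(1234, 2345, 3456),
  gradelevel = c(6, 6, 7),
  q1         = c(4, 5, 3)
)
dict <- tibble::tibble(
  varname = c("id", "gradelevel", "q1"),
  label   = c("study ID", "student grade level",
              "do you get along with your teacher?")
)

# Assumed starting state: id already carries a label set by hand
var_label(d11$id) <- "research study id"

# Narrow the dictionary so that id is not in it
var_labels <- dict %>%
  dplyr::filter(varname %in% c("gradelevel", "q1")) %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  set_variable_labels(.labels = var_labels)

var_label(d11)$id  # the hand-set label is still there
```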

However, if id were in our data dictionary, its existing label would be overwritten with the data dictionary label.

var_labels <- dict %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels)

labelled::var_label(d11)

$id
[1] "study ID"

$gradelevel
[1] "student grade level"

$q1
[1] "do you get along with your teacher?"

Note: If we have more variables in our data dictionary than we have in our current data frame, set_variable_labels() will by default stop with an error. We can add the argument .strict = FALSE to denote that we are aware that the files don’t match exactly; labels for variables that are not in the data frame are then ignored.

Here is an example.

Review our data dictionary (dict2)

# A tibble: 4 x 2
  varname    label                              
  <chr>      <chr>                              
1 id         study ID                           
2 gradelevel student grade level                
3 q1         do you get along with your teacher?
4 q2         something

Add our labels

var_labels <- dict2 %>%
  tibble::deframe() %>%
  base::as.list()

d11 <- d11 %>%
  labelled::set_variable_labels(.labels = var_labels, .strict = FALSE)

labelled::var_label(d11)

$id
[1] "study ID"

$gradelevel
[1] "student grade level"

$q1
[1] "do you get along with your teacher?"
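For contrast, here is a sketch of what happens if .strict is left at its default. To my understanding the call stops with an error because q2 has no matching column, so I wrap it in tryCatch() to keep the script running. The setdiff() check at the end is my own addition, a quick way to list the dictionary entries that have no matching variable before labelling.

```r
library(magrittr)  # provides %>%

# Reconstruct the example data and the longer dictionary (dict2)
d11 <- tibble::tibble(
  id         = c(1234, 2345, 3456),
  gradelevel = c(6, 6, 7),
  q1         = c(4, 5, 3)
)
dict2 <- tibble::tibble(
  varname = c("id", "gradelevel", "q1", "q2"),
  label   = c("study ID", "student grade level",
              "do you get along with your teacher?", "something")
)

var_labels <- dict2 %>%
  tibble::deframe() %>%
  base::as.list()

# Default .strict = TRUE: capture the condition instead of halting
result <- tryCatch(
  labelled::set_variable_labels(d11, .labels = var_labels),
  error = function(e) e
)
inherits(result, "error")

# Dictionary entries with no matching column in the data
setdiff(names(var_labels), names(d11))
```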

