Package: labelled

Note: Using the labelled package, adding missing label values will not allow these values to be treated as NA in R. However, they will be labelled as missing when you export to a file type such as .sav and will be treated as missing values in those programs.


Function: set_na_values()


1.Add a label for one missing value (-999), for one variable (Var3).

Review the data (d1)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b      -999     0
3 c         3  -999

Add NA label for -999

d1 <- d1 %>%
  labelled::set_na_values(Var3 = -999)

labelled::na_values(d1$Var3)
[1] -999

Let’s review what this data looks like now.

You can see that -999 has a label of “NA” for Var3 now.

d1
# A tibble: 3 x 3
  Var1   Var2       Var3
  <chr> <dbl> <hvn_lbl_>
1 a         1     2     
2 b      -999     0     
3 c         3  -999 (NA)

You can also see that R recognizes these values as NA.

is.na(d1)
      Var1  Var2  Var3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE

Be careful though, many functions, however, will not recognize these values as NA. As just one example, both rowSums() and sum() have the argument na.rm. However, these labelled NA values will not be recognized as NA and will be included in your sum, giving you very large negative numbers.

On the flip side, other functions, such as those within the pointblank package and even the function tidyr::replace_na() both seem to recognize labelled NA as true NA values.

As a rule, test the function first to see how it interacts with labelled missing values. You may need to convert these to true NAs in order to interact with them in your normal capacity.

2. Add labels for multiple missing values (-999, 0), for one variable (Var3)

Review the data (d1)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b      -999     0
3 c         3  -999

Add NA labels for -999 and 0

d1 <- d1 %>%
  labelled::set_na_values(Var3 = c(-999, 0))

labelled::na_values(d1$Var3)
[1] -999    0

If all values -999 to 0 are considered missing you could use the labelled::set_na_range() function instead. This is a good alternative if you have more user-defined missing values than are allowed in other statistical programs such as SPSS, which only allow 3 distinct values.

Note that the range will not appear when you use labelled::na_values() to review the values. However, if you export the data to a program like SPSS, the ranges will appear as user defined missing values.

d1 <- d1 %>% 
  labelled::set_na_range(Var3=c(-999,0))

3. Add labels to multiple variables (Var2 and Var3) with the same missing value

Review the data (d1)

# A tibble: 3 x 3
  Var1   Var2  Var3
  <chr> <dbl> <dbl>
1 a         1     2
2 b      -999     0
3 c         3  -999

Add NA labels for -999 and 0

d1 <- d1 %>%
  labelled::set_na_values(
    Var2 = c(-999, 0),
    Var3 = c(-999, 0))

labelled::na_values(d1)
$Var1
NULL

$Var2
[1] -999    0

$Var3
[1] -999    0

If I wanted to just review NA value labels for specific variables, I can select my variables of interest using dplyr::select() and then iterate the labelled::na_values() function over those variables using the purrr::map() function

d1 %>%
  dplyr::select(Var2, Var3) %>%
  purrr::map(labelled::na_values)
$Var2
[1] -999    0

$Var3
[1] -999    0

Since both of these variables have the same missing value labels, I could write the function like this.

  • Note: The na_values<- needs to be in backticks with no spaces. And you also need to have the labelled package loaded.
d1 %>% 
  dplyr::mutate(
  dplyr::across(Var2:Var3, ~(`na_values<-`
                                (., c(-999, 0)))))

And again, if all values between -999 and 0 are missing values, I could use labelled::na_range() instead.

d1 %>% 
  dplyr::mutate(
  dplyr::across(Var2:Var3, ~(`na_range<-`
                                (., c(-999, 0)))))

Return to Label Data