Note: With the labelled package, flagging values as user-defined missing
does not convert them to true NA in R. is.na() will report them as
missing, but many other functions will still treat them as ordinary
values. They will, however, be marked as missing when you export to a
file type such as .sav, and programs that read those files will treat
them as missing values.
set_na_values()
1. Add a label for one missing value (-999), for one variable
(Var3)
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b -999 0
3 c 3 -999
Add NA label for -999
d1 <- d1 %>%
labelled::set_na_values(Var3 = -999)
labelled::na_values(d1$Var3)
[1] -999
Let’s review what the data looks like now.
You can see that -999 now has a label of “NA” for Var3.
d1
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <hvn_lbl_>
1 a 1 2
2 b -999 0
3 c 3 -999 (NA)
You can also see that is.na() recognizes these values as NA.
is.na(d1)
Var1 Var2 Var3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE TRUE
Be careful, though: many functions will not recognize these
values as NA. As just one example, both rowSums() and
sum() have the argument na.rm. However, labelled NA values
are not dropped by na.rm = TRUE; they are included in
your sum, giving you very large negative numbers.
On the flip side, other functions, such as those within the
pointblank package and even tidyr::replace_na(), seem to
recognize labelled NA values as true NA values.
As a rule, test a function first to see how it interacts with labelled missing values. You may need to convert these to true NAs in order to work with them as you normally would.
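One way to do that conversion is sketched here, assuming the labelled package's user_na_to_na() helper suits your data. It turns the user-defined missing values into true NAs before summing. The data frame mirrors d1 from above; the names var3_na and total are only illustrative.

```r
library(labelled)

# Rebuild the example data and flag -999 as missing for Var3
d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, -999, 3),
  Var3 = c(2, 0, -999)
)
d1 <- set_na_values(d1, Var3 = -999)

# Convert the user-defined missing values to true NA
var3_na <- user_na_to_na(d1$Var3)

# Drop the labelled class, then sum with na.rm = TRUE
total <- sum(as.numeric(var3_na), na.rm = TRUE)
total
```

To convert every labelled column at once, user_na_to_na() can also be applied to the whole data frame.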
2. Add labels for multiple missing values (-999, 0), for one
variable (Var3)
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b -999 0
3 c 3 -999
Add NA labels for -999 and 0
d1 <- d1 %>%
labelled::set_na_values(Var3 = c(-999, 0))
labelled::na_values(d1$Var3)
[1] -999 0
If all values from -999 to 0 are considered missing, you could use the
labelled::set_na_range() function instead. This is a good
alternative if you have more user-defined missing values than are
allowed in other statistical programs such as SPSS, which allows only
three distinct values.
Note that the range will not appear when you use
labelled::na_values() to review the values. However, if you
export the data to a program like SPSS, the range will appear as
user-defined missing values.
d1 <- d1 %>%
  labelled::set_na_range(Var3 = c(-999, 0))
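To check the range from within R, the labelled package also provides a na_range() getter. A minimal sketch, rebuilding the same example data so it runs on its own:

```r
library(labelled)

# Rebuild the example data and mark -999 through 0 as a missing range
d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, -999, 3),
  Var3 = c(2, 0, -999)
)
d1 <- set_na_range(d1, Var3 = c(-999, 0))

# The range does not show up among the discrete missing values
na_values(d1$Var3)

# na_range() reports the lower and upper bounds of the missing range
na_range(d1$Var3)
```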
3. Add labels to multiple variables (Var2 and
Var3) with the same missing values
Review the data (d1)
# A tibble: 3 x 3
Var1 Var2 Var3
<chr> <dbl> <dbl>
1 a 1 2
2 b -999 0
3 c 3 -999
Add NA labels for -999 and 0
d1 <- d1 %>%
labelled::set_na_values(
Var2 = c(-999, 0),
Var3 = c(-999, 0))
labelled::na_values(d1)
$Var1
NULL
$Var2
[1] -999 0
$Var3
[1] -999 0
If I wanted to review the NA value labels for specific variables only, I
could select my variables of interest using dplyr::select()
and then iterate the labelled::na_values() function over
those variables using the purrr::map() function.
d1 %>%
dplyr::select(Var2, Var3) %>%
purrr::map(labelled::na_values)
$Var2
[1] -999 0
$Var3
[1] -999 0
Since both of these variables have the same missing value labels, I could write the function like this.
Note that `na_values<-` needs to be in backticks with
no spaces, and you also need to have the labelled package
loaded.
d1 %>%
  dplyr::mutate(
    dplyr::across(Var2:Var3, ~ `na_values<-`(.x, c(-999, 0)))
  )
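If the backtick syntax feels awkward, the replacement function can be wrapped in an ordinary function first. The helper add_na_values() below is hypothetical (it is not part of labelled), and this is only a sketch of the same idea:

```r
library(dplyr)
library(labelled)

d1 <- tibble::tibble(
  Var1 = c("a", "b", "c"),
  Var2 = c(1, -999, 3),
  Var3 = c(2, 0, -999)
)

# Hypothetical helper: sets the missing values and returns the vector
add_na_values <- function(x, values) {
  na_values(x) <- values
  x
}

# Apply the same missing values to both variables
d1 <- d1 %>%
  mutate(across(Var2:Var3, ~ add_na_values(.x, c(-999, 0))))

na_values(d1$Var2)
na_values(d1$Var3)
```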
And again, if all values between -999 and 0 are missing values, I
could use `na_range<-` instead.
d1 %>%
  dplyr::mutate(
    dplyr::across(Var2:Var3, ~ `na_range<-`(.x, c(-999, 0)))
  )