as.numeric()
1. Convert a character variable (Var3) to numeric
Review the data (d2)
# A tibble: 3 x 5
Var1 Var2 Var3 Var4 Var5
<chr> <int> <chr> <chr> <lgl>
1 b 2 3.6 10/10/2004 TRUE
2 a NA 8.5 12/14/2007 FALSE
3 c 3 X 08/09/2020 TRUE
View the class for Var3
Var3 will read in as character because an “X” was used to denote missing values in the data.
class(d2$Var3)
[1] "character"
Convert Var3 to numeric.
Note: When you convert Var3 you will get a warning message that says “NAs introduced by coercion”. In this case, I am okay with that because “X”s were used to denote NAs previously and I want those to be converted to NA.
HOWEVER, if your variable contains any unexpected character values (spaces, extra decimal points, letters) that you were unaware of, those values will also be converted to NA, which you may not want. Whenever you get the warning message above, always look into the reason before moving on. Your variable may require some sort of transformation (such as a recode) before you convert the type.
Note: We are using dplyr::mutate() and saving over the original variable by giving the new variable the same name as the original.
d2 <- d2 %>%
  dplyr::mutate(Var3 = as.numeric(Var3))
class(d2$Var3)
[1] "numeric"
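One way to look into the coercion warning before saving over a column is to list the values that fail to convert. The following is a minimal sketch in base R. The vector var3 is a stand-in for d2$Var3, and the names converted and problem_values are made up for illustration.

```r
# Stand-in for d2$Var3 (same values as the example data)
var3 <- c("3.6", "8.5", "X")

# Try the conversion, silencing the expected coercion warning
converted <- suppressWarnings(as.numeric(var3))

# Values that were not missing before but became NA after conversion
problem_values <- unique(var3[is.na(converted) & !is.na(var3)])
problem_values
```

If the only value listed is one you already treat as missing (here "X"), the conversion should be safe to run. Anything else is worth recoding first.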
2. Convert a logical variable (Var5) to numeric
Review the data (d2)
# A tibble: 3 x 5
Var1 Var2 Var3 Var4 Var5
<chr> <int> <chr> <chr> <lgl>
1 b 2 3.6 10/10/2004 TRUE
2 a NA 8.5 12/14/2007 FALSE
3 c 3 X 08/09/2020 TRUE
View the class for Var5
class(d2$Var5)
[1] "logical"
Convert Var5 to numeric.
d2 <- d2 %>%
dplyr::mutate(Var5 = as.numeric(Var5))
d2$Var5
[1] 1 0 1
class(d2$Var5)
[1] "numeric"
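The conversion above maps TRUE to 1 and FALSE to 0. As a small base R sketch (the vector flags is hypothetical and not part of d2), a missing logical value stays NA after conversion, and the same mapping is what lets you count TRUE values with sum().

```r
# Hypothetical logical vector with one missing value
flags <- c(TRUE, FALSE, NA, TRUE)

# TRUE becomes 1, FALSE becomes 0, NA stays NA
flags_num <- as.numeric(flags)
flags_num

# Number of TRUE values, ignoring the missing one
n_true <- sum(flags, na.rm = TRUE)
n_true
```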
3. Convert all character variables to numeric
Review the data (d4)
# A tibble: 3 x 4
ID Var2 Var3 Var4
<dbl> <chr> <chr> <chr>
1 1 2 3.6 4
2 2 X 8.5 6
3 3 2.5 X X
View the class for all variables
Note: Var2, Var3 and Var4
are read in as character variables because an “X” was used to denote
missing values in the data.
Note: Another way to deal with columns that use “X” to denote NA is to read in the data using a function where you explicitly state what the missing values are. Example: readr::read_csv("file.csv", na = "X"). If you read in your file this way, the columns would read in as numeric rather than character.
str(d4)
tibble [3 x 4] (S3: tbl_df/tbl/data.frame)
$ ID : num [1:3] 1 2 3
$ Var2: chr [1:3] "2" "X" "2.5"
$ Var3: chr [1:3] "3.6" "8.5" "X"
$ Var4: chr [1:3] "4" "6" "X"
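The read-time alternative mentioned in the note above could look roughly like this. It is a sketch only: the temporary CSV file and the name d4_from_file are assumptions made for illustration, with contents that mirror d4.

```r
library(readr)

# Write a small CSV that mirrors d4 to a temporary file (hypothetical data file)
path <- tempfile(fileext = ".csv")
writeLines(c("ID,Var2,Var3,Var4",
             "1,2,3.6,4",
             "2,X,8.5,6",
             "3,2.5,X,X"), path)

# Treat "X" as missing while reading, so the columns are parsed as numbers
d4_from_file <- suppressMessages(read_csv(path, na = "X"))
str(d4_from_file)
```

Passing na = "X" replaces the default missing strings ("" and "NA"). Use na = c("", "NA", "X") if you want to keep those as well.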
Convert all character variables to numeric variables
Note: Using dplyr::across() we are applying a transformation across multiple columns.
Note: We are selecting all character columns using the tidyselect selection helper where().
Note: We are using dplyr::mutate() and saving over the original variables.
d4 <- d4 %>%
  dplyr::mutate(dplyr::across(where(is.character), as.numeric))
View the class for all variables
str(d4)
tibble [3 x 4] (S3: tbl_df/tbl/data.frame)
$ ID : num [1:3] 1 2 3
$ Var2: num [1:3] 2 NA 2.5
$ Var3: num [1:3] 3.6 8.5 NA
$ Var4: num [1:3] 4 6 NA
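Because across() converts several columns in one step, it can help to count beforehand how many non-missing values in each character column would turn into NA. This is a sketch under assumptions: d4_raw rebuilds the unconverted example data, and na_counts is a made-up name.

```r
library(dplyr)

# Rebuild the unconverted example data (same values as d4 before conversion)
d4_raw <- tibble::tibble(
  ID   = c(1, 2, 3),
  Var2 = c("2", "X", "2.5"),
  Var3 = c("3.6", "8.5", "X"),
  Var4 = c("4", "6", "X")
)

# For each character column, count values that would turn into NA
na_counts <- d4_raw %>%
  summarise(across(
    where(is.character),
    ~ sum(is.na(suppressWarnings(as.numeric(.x))) & !is.na(.x))
  ))
na_counts
```

Here each column reports one such value, which matches the single “X” per column in the data.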
You can also call out the exact variables you want to convert
d4 %>%
dplyr::mutate(dplyr::across(Var2:Var4, as.numeric))
# A tibble: 3 x 4
ID Var2 Var3 Var4
<dbl> <dbl> <dbl> <dbl>
1 1 2 3.6 4
2 2 NA 8.5 6
3 3 2.5 NA NA
Or in the case of this data frame, since you essentially want all
variables to be numeric (ID just happens to already be
numeric), you could convert all variables to numeric using the
tidyselect selection helper everything().
d4 %>%
dplyr::mutate(dplyr::across(tidyselect::everything(), as.numeric))
# A tibble: 3 x 4
ID Var2 Var3 Var4
<dbl> <dbl> <dbl> <dbl>
1 1 2 3.6 4
2 2 NA 8.5 6
3 3 2.5 NA NA
4. Convert a factor variable (Var3) to numeric
Review the data (d3)
# A tibble: 3 x 4
Var1 Var2 Var3 Var4
<chr> <int> <fct> <chr>
1 b 2 3 10/10/2004
2 a NA 8 12/14/2007
3 c 3 2 08/09/2020
View the class for Var3
class(d3$Var3)
[1] "factor"
Convert Var3 to numeric.
Note: Using base::as.numeric() directly on a factor will convert our factor values to their internal level codes (3=2, 8=3, 2=1), which is not what we want. Compare the first example with the second example.
Don’t do this
d3 %>%
dplyr::mutate(Var3 = as.numeric(Var3))
# A tibble: 3 x 4
Var1 Var2 Var3 Var4
<chr> <int> <dbl> <chr>
1 b 2 2 10/10/2004
2 a NA 3 12/14/2007
3 c 3 1 08/09/2020
Do this
d3 <- d3 %>%
dplyr::mutate(Var3 = as.numeric(as.character(Var3)))
d3
# A tibble: 3 x 4
Var1 Var2 Var3 Var4
<chr> <int> <dbl> <chr>
1 b 2 3 10/10/2004
2 a NA 8 12/14/2007
3 c 3 2 08/09/2020
class(d3$Var3)
[1] "numeric"
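The difference between level codes and labels can be seen on a standalone factor. This is a small base R sketch: the vector f is hypothetical, built from the same values as Var3. The last line shows another common base R idiom that gives the same result as as.numeric(as.character()).

```r
# Hypothetical factor with the same values as d3$Var3
f <- factor(c("3", "8", "2"))

# Levels are the sorted unique labels
levels(f)                                   # "2" "3" "8"

# Direct conversion returns the position of each value in levels(f)
codes <- as.numeric(f)                      # 2 3 1

# Converting the labels returns the numbers you actually see
values <- as.numeric(as.character(f))       # 3 8 2

# Equivalent idiom: convert the levels once, then index by the factor
values_alt <- as.numeric(levels(f))[f]      # 3 8 2
```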