mdy()1. Convert a character variable (Var4) to a
date
Review the data (d2)
# A tibble: 3 x 5
Var1 Var2 Var3 Var4 Var5
<chr> <int> <chr> <chr> <lgl>
1 b 2 3.6 10/10/2004 TRUE
2 a NA 8.5 12/14/2007 FALSE
3 c 3 X 08/09/2020 TRUE
View the class for Var4
class(d2$Var4)
[1] "character"
Convert Var4 to a date
Note: We are recoding into a new variable using
dplyr::mutate() and saving over the original variable by
naming the new variable the same name as the original.
Note: mdy() is just the format for
Var4, you can adjust the format for your character variable
and use functions such as dmy(), ymd(),
ymd_hms().
d2 <- d2 %>%
dplyr::mutate(Var4 = lubridate::mdy(Var4))
d2$Var4
[1] "2004-10-10" "2007-12-14" "2020-08-09"
class(d2$Var4)
[1] "Date"
parse_date_time()1. Convert a character variable with various date formats
(svy_date) to a date
Review the data (d7)
# A tibble: 7 x 2
id svy_date
<dbl> <chr>
1 10 2-13-23
2 20 06/04/2022
3 30 1 Mar 2022
4 40 12/27/22
5 50 Feb 3 2023
6 60 30 Jan 2012
7 70 02282022
Convert all dates to a standard date format
I currently see both “mdy” and “dmy” formats so I will add both of those to the orders argument.
Make sure to check your work to ensure everything converted as expected. Unfortunately with an open text format date like this, you can never know for sure, what date someone intended to input (for example “02-03-2023” could be February 3rd or March 2nd)
d7 %>%
dplyr::mutate(svy_date_fixed =
lubridate::parse_date_time(svy_date, orders = c('mdy', 'dmy')))
# A tibble: 7 x 3
id svy_date svy_date_fixed
<dbl> <chr> <dttm>
1 10 2-13-23 2023-02-13 00:00:00
2 20 06/04/2022 2022-06-04 00:00:00
3 30 1 Mar 2022 2022-03-01 00:00:00
4 40 12/27/22 2022-12-27 00:00:00
5 50 Feb 3 2023 2023-02-03 00:00:00
6 60 30 Jan 2012 2012-01-30 00:00:00
7 70 02282022 2022-02-28 00:00:00
If I wanted this format without time, I can wrap this in
lubridate::as_date()
d7 %>%
dplyr::mutate(svy_date_fixed =
lubridate::as_date(lubridate::parse_date_time(svy_date, orders = c('mdy', 'dmy'))))
# A tibble: 7 x 3
id svy_date svy_date_fixed
<dbl> <chr> <date>
1 10 2-13-23 2023-02-13
2 20 06/04/2022 2022-06-04
3 30 1 Mar 2022 2022-03-01
4 40 12/27/22 2022-12-27
5 50 Feb 3 2023 2023-02-03
6 60 30 Jan 2012 2012-01-30
7 70 02282022 2022-02-28
2. Convert a month-day variable from Excel into the numeric values
Review the data (d9)
# A tibble: 3 x 2
id age
<dbl> <chr>
1 100 July-11
2 102 May-03
3 103 Oct-14
In this case, age is supposed to be a variable that represents the age of the participant, denoted as years-months old. However, Excel has converted this in to a date. We want to convert this back to the numeric values, stored as a character variable.
We can use lubridate::parse_date_time() to identify that age is currently a date, structured as month-day (denoted by %m%d), and then we can use lubridate::month() and lubridate::day() to pull out the numeric values of each. Last we can paste the pieces together, with a delimiter (“-”) to get our final desired outcome.
d9 %>%
dplyr::mutate(age = paste0(
lubridate::month(lubridate::parse_date_time(age, "%m%d")),
"-",
lubridate::day(lubridate::parse_date_time(age, "%m%d"))))
# A tibble: 3 x 2
id age
<dbl> <chr>
1 100 7-11
2 102 5-3
3 103 10-14
We could also replace %m%d with “bd” (b = abbreviated or full month) to refer to the month and day format.
d9 %>%
dplyr::mutate(age = paste0(
lubridate::month(lubridate::parse_date_time(age, "bd")),
"-",
lubridate::day(lubridate::parse_date_time(age, "bd"))))
# A tibble: 3 x 2
id age
<dbl> <chr>
1 100 7-11
2 102 5-3
3 103 10-14
as_date(), as_datetime()1. Convert a character variable that appears numeric
(created_timestamp) to a datetime (POSIXct)
Review the data (d5)
# A tibble: 2 x 3
type id created_timestamp
<chr> <dbl> <chr>
1 b 333 1643909614746
2 a 444 1644593624206
View the class for created_timestamp
class(d5$created_timestamp)
[1] "character"
Convert created_timestamp to a datetime
Note: I know for this timestamp variable that the value is in
milliseconds, not seconds. So in order to use the function I want to
use, lubridate::as_datetime(), I need to convert my
character variable to numeric using base::as.numeric() and
then divide it by 1000. If the time was in seconds, I would not need to
convert the character variable to numeric before applying the
lubridate function.
Note: We are recoding into a new variable using
dplyr::mutate() and saving over the original variable by
naming the new variable the same name as the original.
Note: You can change the timezone from UTC to something else (like “America/Chicago”) by adding the argument tz = “America/Chicago”. You can get your current timezone by typing “Sys.timezone()” in your console. Time zones must be from tz database (Olson database). You can see all time zones by typing “OlsonNames()” in your console.
d5_new <- d5 %>%
dplyr::mutate(created_timestamp =
lubridate::as_datetime(as.numeric(created_timestamp)/1000))
d5_new$created_timestamp
[1] "2022-02-03 17:33:34 UTC" "2022-02-11 15:33:44 UTC"
class(d5_new$created_timestamp)
[1] "POSIXct" "POSIXt"
If I no longer cared about the time portion of this variable and only
wanted the date, I could further modify the variable using
lubridate::as_date().
dplyr::mutate() and saving into a new variable.d5_new <- d5_new %>%
dplyr::mutate(created_timestamp_date =
lubridate::as_date(created_timestamp))
d5_new$created_timestamp_date
[1] "2022-02-03" "2022-02-11"
class(d5_new$created_timestamp_date)
[1] "Date"
2. Convert a numeric date variable from Excel to a date
Review the data (d6)
# A tibble: 2 x 3
type id date
<chr> <dbl> <dbl>
1 b 333 44491
2 a 444 44530
View the class for date
class(d6$date)
[1] "numeric"
d6 <- d6 %>%
dplyr::mutate(date = lubridate::as_date(date, origin = "1899-12-30"))
d6$date
[1] "2021-10-22" "2021-11-30"
class(d6$date)
[1] "Date"
janitor::excel_numeric_to_date()1. Convert a numeric date variable from Excel to a date
Review the data (d6)
# A tibble: 2 x 3
type id date
<chr> <dbl> <dbl>
1 b 333 44491
2 a 444 44530
View the class for date
class(d6$date)
[1] "numeric"
If you don’t want to stress over remembering the origin date, another
helpful function for working with Excel numeric dates is
janitor::excel_numeric_to_date(), which includes the origin
time.
d6 <- d6 %>%
dplyr::mutate(date = janitor::excel_numeric_to_date(date))
d6$date
[1] "2021-10-22" "2021-11-30"
class(d6$date)
[1] "Date"
Return to Data Types