Package: lubridate


Function: mdy()


1. Convert a character variable (Var4) to a date

Review the data (d2)

# A tibble: 3 x 5
  Var1   Var2 Var3  Var4       Var5 
  <chr> <int> <chr> <chr>      <lgl>
1 b         2 3.6   10/10/2004 TRUE 
2 a        NA 8.5   12/14/2007 FALSE
3 c         3 X     08/09/2020 TRUE 

View the class for Var4

class(d2$Var4)
[1] "character"

Convert Var4 to a date

  • Note: We are recoding into a new variable using dplyr::mutate() and saving over the original variable by naming the new variable the same name as the original.

  • Note: mdy() is just the format for Var4, you can adjust the format for your character variable and use functions such as dmy(), ymd(), ymd_hms().

d2 <- d2 %>% 
  dplyr::mutate(Var4 = lubridate::mdy(Var4))

d2$Var4
[1] "2004-10-10" "2007-12-14" "2020-08-09"
class(d2$Var4)
[1] "Date"


Function: parse_date_time()


1. Convert a character variable with various date formats (svy_date) to a date

Review the data (d7)

# A tibble: 7 x 2
     id svy_date   
  <dbl> <chr>      
1    10 2-13-23    
2    20 06/04/2022 
3    30 1 Mar 2022 
4    40 12/27/22   
5    50 Feb 3 2023 
6    60 30 Jan 2012
7    70 02282022   

Convert all dates to a standard date format

I currently see both “mdy” and “dmy” formats so I will add both of those to the orders argument.

Make sure to check your work to ensure everything converted as expected. Unfortunately with an open text format date like this, you can never know for sure, what date someone intended to input (for example “02-03-2023” could be February 3rd or March 2nd)

d7 %>%
  dplyr::mutate(svy_date_fixed = 
           lubridate::parse_date_time(svy_date, orders = c('mdy', 'dmy')))
# A tibble: 7 x 3
     id svy_date    svy_date_fixed     
  <dbl> <chr>       <dttm>             
1    10 2-13-23     2023-02-13 00:00:00
2    20 06/04/2022  2022-06-04 00:00:00
3    30 1 Mar 2022  2022-03-01 00:00:00
4    40 12/27/22    2022-12-27 00:00:00
5    50 Feb 3 2023  2023-02-03 00:00:00
6    60 30 Jan 2012 2012-01-30 00:00:00
7    70 02282022    2022-02-28 00:00:00

If I wanted this format without time, I can wrap this in lubridate::as_date()

d7 %>%
  dplyr::mutate(svy_date_fixed = 
           lubridate::as_date(lubridate::parse_date_time(svy_date, orders = c('mdy', 'dmy'))))
# A tibble: 7 x 3
     id svy_date    svy_date_fixed
  <dbl> <chr>       <date>        
1    10 2-13-23     2023-02-13    
2    20 06/04/2022  2022-06-04    
3    30 1 Mar 2022  2022-03-01    
4    40 12/27/22    2022-12-27    
5    50 Feb 3 2023  2023-02-03    
6    60 30 Jan 2012 2012-01-30    
7    70 02282022    2022-02-28    

2. Convert a month-day variable from Excel into the numeric values

Review the data (d9)

# A tibble: 3 x 2
     id age    
  <dbl> <chr>  
1   100 July-11
2   102 May-03 
3   103 Oct-14 

In this case, age is supposed to be a variable that represents the age of the participant, denoted as years-months old. However, Excel has converted this in to a date. We want to convert this back to the numeric values, stored as a character variable.

We can use lubridate::parse_date_time() to identify that age is currently a date, structured as month-day (denoted by %m%d), and then we can use lubridate::month() and lubridate::day() to pull out the numeric values of each. Last we can paste the pieces together, with a delimiter (“-”) to get our final desired outcome.

d9 %>%
  dplyr::mutate(age = paste0(
    lubridate::month(lubridate::parse_date_time(age, "%m%d")), 
    "-", 
    lubridate::day(lubridate::parse_date_time(age, "%m%d"))))
# A tibble: 3 x 2
     id age  
  <dbl> <chr>
1   100 7-11 
2   102 5-3  
3   103 10-14

We could also replace %m%d with “bd” (b = abbreviated or full month) to refer to the month and day format.

d9 %>%
  dplyr::mutate(age = paste0(
    lubridate::month(lubridate::parse_date_time(age, "bd")), 
    "-", 
    lubridate::day(lubridate::parse_date_time(age, "bd"))))
# A tibble: 3 x 2
     id age  
  <dbl> <chr>
1   100 7-11 
2   102 5-3  
3   103 10-14

Function: as_date(), as_datetime()


1. Convert a character variable that appears numeric (created_timestamp) to a datetime (POSIXct)

  • Note: Sometimes dates will come in as numeric, epoch time, typically number of seconds (sometimes a different unit) since a start date, usually Jan 1, 1970. This happens a lot when importing data from Excel but can also happen with other sources. For example, I’ve imported data from the Twitter API and had dates in epoch time in milliseconds. You will need to know information about your data before converting a variable.

Review the data (d5)

# A tibble: 2 x 3
  type     id created_timestamp
  <chr> <dbl> <chr>            
1 b       333 1643909614746    
2 a       444 1644593624206    

View the class for created_timestamp

  • Note: You’ll notice this variable appeared numeric but is actually character which is important to know.
class(d5$created_timestamp)
[1] "character"

Convert created_timestamp to a datetime

  • Note: I know for this timestamp variable that the value is in milliseconds, not seconds. So in order to use the function I want to use, lubridate::as_datetime(), I need to convert my character variable to numeric using base::as.numeric() and then divide it by 1000. If the time was in seconds, I would not need to convert the character variable to numeric before applying the lubridate function.

  • Note: We are recoding into a new variable using dplyr::mutate() and saving over the original variable by naming the new variable the same name as the original.

  • Note: You can change the timezone from UTC to something else (like “America/Chicago”) by adding the argument tz = “America/Chicago”. You can get your current timezone by typing “Sys.timezone()” in your console. Time zones must be from tz database (Olson database). You can see all time zones by typing “OlsonNames()” in your console.

d5_new <- d5 %>% 
  dplyr::mutate(created_timestamp = 
           lubridate::as_datetime(as.numeric(created_timestamp)/1000))

d5_new$created_timestamp
[1] "2022-02-03 17:33:34 UTC" "2022-02-11 15:33:44 UTC"
class(d5_new$created_timestamp)
[1] "POSIXct" "POSIXt" 

If I no longer cared about the time portion of this variable and only wanted the date, I could further modify the variable using lubridate::as_date().

  • Note: We are recoding into a new variable using dplyr::mutate() and saving into a new variable.
d5_new <- d5_new %>% 
  dplyr::mutate(created_timestamp_date =
           lubridate::as_date(created_timestamp))

d5_new$created_timestamp_date
[1] "2022-02-03" "2022-02-11"
class(d5_new$created_timestamp_date)
[1] "Date"

2. Convert a numeric date variable from Excel to a date

Review the data (d6)

# A tibble: 2 x 3
  type     id  date
  <chr> <dbl> <dbl>
1 b       333 44491
2 a       444 44530

View the class for date

class(d6$date)
[1] "numeric"
  • Note: Here you’ll notice that I provide an origin date. Unlike the Unix epoch of 1970-01-01, the origin date for Excel on Windows, is December 30, 1899 for dates after 1900. For Excel on Mac, the origin date is January 1, 1904. If I did not provide an origin date, we would get some very strange dates.
d6 <- d6 %>% 
  dplyr::mutate(date = lubridate::as_date(date, origin = "1899-12-30"))

d6$date
[1] "2021-10-22" "2021-11-30"
class(d6$date)
[1] "Date"

Package: janitor


Function: janitor::excel_numeric_to_date()



1. Convert a numeric date variable from Excel to a date

Review the data (d6)

# A tibble: 2 x 3
  type     id  date
  <chr> <dbl> <dbl>
1 b       333 44491
2 a       444 44530

View the class for date

class(d6$date)
[1] "numeric"

If you don’t want to stress over remembering the origin date, another helpful function for working with Excel numeric dates is janitor::excel_numeric_to_date(), which includes the origin time.

  • Note: There are still more issues where certain versions of Excel for Mac may still use different date systems which you can work with by adding the argument date_system. You can view the options for adding this argument by typing ?excel_numeric_to_date in your console.
d6 <- d6 %>% 
  dplyr::mutate(date = janitor::excel_numeric_to_date(date))

d6$date
[1] "2021-10-22" "2021-11-30"
class(d6$date)
[1] "Date"

Return to Data Types