Package: dplyr


Function: consecutive_id()


1. Assign an ID for each combination of two variables

Review the data (d10)

# A tibble: 8 x 3
  tch_id date_observed rater
   <dbl> <date>        <dbl>
1      1 2025-01-02        5
2      1 2025-01-02       10
3      1 2024-01-01        4
4      2 2025-01-10        5
5      2 2025-01-15        5
6      2 2025-01-15       10
7      2 2025-01-07        4
8      3 2025-01-08        4
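
To follow along, d10 can be rebuilt by hand. This is a minimal sketch, assuming the tibble package is installed (it ships with the tidyverse). The values are copied from the printout above, and the way the data are constructed here is an assumption, not necessarily how the original d10 was created.

```r
library(tibble)

# Rebuild the example data, one row per rating (values copied from the printout)
d10 <- tribble(
  ~tch_id, ~date_observed, ~rater,
        1, "2025-01-02",        5,
        1, "2025-01-02",       10,
        1, "2024-01-01",        4,
        2, "2025-01-10",        5,
        2, "2025-01-15",        5,
        2, "2025-01-15",       10,
        2, "2025-01-07",        4,
        3, "2025-01-08",        4
)

# Convert the date strings to Date so they sort chronologically
d10$date_observed <- as.Date(d10$date_observed)
```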

I want to create a new variable, video_id, that assigns an ID to each unique date within each tch_id group. I also want those IDs to be ordered chronologically.

To do this:

  1. I first group by tch_id (using dplyr::group_by())
  2. Then I sort the dates chronologically within each tch_id (using dplyr::arrange())
  3. Then I create the new variable using dplyr::mutate(), with dplyr::consecutive_id() generating an identifier that increments every time the value of date_observed changes
  4. Last, I make sure to dplyr::ungroup() the data

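library(dplyr)

d10 %>%
  group_by(tch_id) %>%
  # arrange() ignores grouping by default, so tch_id is listed explicitly
  arrange(tch_id, date_observed) %>%
  # the ID restarts at 1 within each tch_id group
  mutate(video_id = consecutive_id(date_observed)) %>%
  ungroup()
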
# A tibble: 8 x 4
  tch_id date_observed rater video_id
   <dbl> <date>        <dbl>    <int>
1      1 2024-01-01        4        1
2      1 2025-01-02        5        2
3      1 2025-01-02       10        2
4      2 2025-01-07        4        1
5      2 2025-01-10        5        2
6      2 2025-01-15        5        3
7      2 2025-01-15       10        3
8      3 2025-01-08        4        1
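
The sorting step matters because consecutive_id() numbers runs of identical adjacent values, not distinct values. Here is a small sketch on a toy vector (x is hypothetical and not part of d10), assuming dplyr 1.1.0 or later, where consecutive_id() was introduced:

```r
library(dplyr)

# Hypothetical toy vector: "a" appears in two separate runs
x <- c("a", "a", "b", "a")

# Unsorted: the second run of "a" gets a new ID
unsorted_ids <- consecutive_id(x)        # 1 1 2 3

# Sorted: each distinct value forms a single run
sorted_ids <- consecutive_id(sort(x))    # 1 1 1 2
```

If the dates were left unsorted, a date that reappeared later within a teacher's rows would receive a second, different video_id.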

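You could also do this without dplyr::consecutive_id() by using base::as.factor() instead. Because factor levels are sorted, converting the factor to numeric numbers each group's dates in chronological order.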

d10 %>%
  group_by(tch_id) %>%
  arrange(tch_id, date_observed) %>%
  mutate(video_id = as.numeric(as.factor(date_observed))) %>%
  ungroup()
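
One difference worth noting is that the as.factor() approach does not depend on row order, since the codes come from the sorted factor levels. Here is a minimal sketch using a hypothetical vector of dates (the unsorted tch_id 2 dates from the example):

```r
# Hypothetical unsorted dates, deliberately left out of order
dates <- as.Date(c("2025-01-10", "2025-01-15", "2025-01-15", "2025-01-07"))

# Levels are sorted chronologically, so the earliest date is coded 1
ids <- as.numeric(as.factor(dates))   # 2 3 3 1
```

Inside a grouped mutate(), the levels are computed separately for each group, so the numbering restarts at 1 for every tch_id.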
