Package: tidyr


Function: pivot_wider()


  • Note: There are many arguments available in tidyr::pivot_wider(). Type ?pivot_wider in your console to see all arguments. But the most basic arguments to be aware of include:
    names_from = name of columns/s to get the name of the output column,
    values_from = name of column/s to get the values from


1. Restructure our longitudinal data into wide format using the “wave” column

Review the data (d2)

# A tibble: 6 x 4
  tch_id treatment  wave    q1
   <dbl>     <dbl> <dbl> <dbl>
1   1234         1     1     3
2   1234         1     2     5
3   2345         0     1     4
4   2345         0     2     5
5   3456         1     1    NA
6   3456         1     2     2

Restructure the data into wide format, adding “w#_” as a prefix to the “q1” variable.

  • Note: Here we use the names_from argument to say what the output column should be, and the values_from argument to get cell values. We also use names_glue to create a custom column name where we add “w#_” as a prefix to our question name. If we did not add the names_glue argument, our column values would simply be the name of the wave (which in this case is not what we want because we want to know the question as well).
d2 %>%
  tidyr::pivot_wider(
    names_from = wave,
    names_glue = "w{wave}_{.value}",
    values_from = q1)
# A tibble: 3 x 4
  tch_id treatment w1_q1 w2_q1
   <dbl>     <dbl> <dbl> <dbl>
1   1234         1     3     5
2   2345         0     4     5
3   3456         1    NA     2

2. Restructure our longitudinal data into wide format using the “year” and “cycle” column

Review the data (d7)

# A tibble: 8 x 5
     id year  cycle    q1    q2
  <dbl> <chr> <dbl> <dbl> <dbl>
1  1234 1819      1     1     2
2  1234 1819      2     2     3
3  1234 1920      1     2     4
4  1234 1920      2     2     3
5  2345 1819      1     0     1
6  2345 1819      2     1     3
7  2345 1920      1     2     3
8  2345 1920      2     3     3

Restructure the data into wide format, adding the year and cycle as a prefix to “q1” and “q2”.

  • Note: If we didn’t specify our own name using names_glue, the year and cycle would be added as a suffix since there are multiple value columns.
d7 %>%
  pivot_wider(
    names_from = c(year,cycle),
    values_from = c(q1, q2),
    names_glue = "y{year}_c{cycle}_{.value}"
  )
# A tibble: 2 x 9
     id y1819_c1_q1 y1819_c2_q1 y1920_c1_q1 y1920_c2_q1 y1819_c1_q2 y1819_c2_q2
  <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
1  1234           1           2           2           2           2           3
2  2345           0           1           2           3           1           3
# ... with 2 more variables: y1920_c1_q2 <dbl>, y1920_c2_q2 <dbl>

3. Restructure the data into wide format to calculate a statistic such as intraclass correlation coefficient

Review the data (d3)

# A tibble: 6 x 3
  tch_id rater_id score
   <dbl>    <dbl> <dbl>
1   1234       16    23
2   1234       22    24
3   2345       16    22
4   2345       22    19
5   3456       16    18
6   3456       22    20

For ICC analysis, we typically only want two columns of data, one for rater 1 and one for rater 2 (or one column per rater if we have more than 2 raters per case).

  • Note: Instead of using names_glue we used names_prefix this time to state that we just want to add “rater_” to the beginning of our rater_id column name.
d3 %>%
    tidyr::pivot_wider(names_from=rater_id,
                values_from=score,
                names_prefix= "rater_") %>%
  select(-tch_id)
# A tibble: 3 x 2
  rater_16 rater_22
     <dbl>    <dbl>
1       23       24
2       22       19
3       18       20

If we are only concerned with the pairs, and not the individual raters, we can also use this method even if the raters are not consistent. Take this data for example, where the raters change, but we have a column to indicate which rater is primary and which rater is secondary.

# A tibble: 7 x 4
  tch_id rater_id primary score
   <dbl>    <dbl>   <dbl> <dbl>
1   1234       16       1    23
2   1234       22       2    24
3   2345       16       1    22
4   2345       25       2    19
5   3456       18       1    18
6   3456       22       2    20
7   4567       18       1    22

We can still restructure this data.

  • Note: The only thing different here, compared to the prior example, is that we add the argument id_cols to indicate the the “tch_id” is what makes our rows unique.
d4 %>%
    tidyr::pivot_wider(
      id_cols = tch_id,
      names_from=primary,
      values_from=score,
      names_prefix= "rater_")  %>% 
    dplyr::select(-tch_id)
# A tibble: 4 x 2
  rater_1 rater_2
    <dbl>   <dbl>
1      23      24
2      22      19
3      18      20
4      22      NA

Return to Restructure