Package: janitor


Function: compare_df_cols()


1. Compare two data frames that should have identical column types to see if there are any differences

Read in the data

  • Note: This example reads in CSV files, but the same approach works for any type of data you can read into R (.xlsx, .sav, etc.)
entry1 <- readr::read_csv("project-a_forms_entry1.csv")
entry3 <- readr::read_csv("project-b_forms_entry3.csv")

Review entry1

# A tibble: 4 x 5
  stu_id grade    q1    q2    q3
   <dbl> <dbl> <dbl> <dbl> <dbl>
1   1234     1     2     4     6
2   1235     2     1     5     6
3   1236     1    NA    12     4
4   1237     3     3     2     4

Review entry3

# A tibble: 4 x 5
  stu_id grade    q1    q2    q3
   <dbl> <chr> <dbl> <dbl> <dbl>
1   1234 1         1     4     6
2   1235 2         1     5     6
3   1236 1        NA     1     4
4   1237 3         3     2     4

Now check whether any column classes differ between the two data frames. The function janitor::compare_df_cols() lists the class of each column in each data frame, which shows whether the data frames will bind together by rows successfully.

janitor::compare_df_cols(entry1, entry3)
  column_name  entry1    entry3
1       grade numeric character
2          q1 numeric   numeric
3          q2 numeric   numeric
4          q3 numeric   numeric
5      stu_id numeric   numeric
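If you only need a yes/no answer, janitor also provides compare_df_cols_same(), which returns TRUE or FALSE. The sketch below is illustrative: it rebuilds the two example data frames by hand from the printed output above (an assumption, so that it runs without the csv files) and stores the result in a hypothetical variable named can_bind.

```r
library(janitor)

# Rebuild the example data by hand from the printed tibbles above
# (assumption: this stands in for reading the csv files)
entry1 <- data.frame(
  stu_id = c(1234, 1235, 1236, 1237),
  grade  = c(1, 2, 1, 3),
  q1     = c(2, 1, NA, 3),
  q2     = c(4, 5, 12, 2),
  q3     = c(6, 6, 4, 4)
)

entry3 <- data.frame(
  stu_id = c(1234, 1235, 1236, 1237),
  grade  = c("1", "2", "1", "3"),  # character, unlike entry1
  q1     = c(1, 1, NA, 3),
  q2     = c(4, 5, 1, 2),
  q3     = c(6, 6, 4, 4)
)

# TRUE if the data frames can be bound by rows, FALSE otherwise;
# verbose = FALSE suppresses the printed table of mismatched columns
can_bind <- compare_df_cols_same(entry1, entry3, verbose = FALSE)
can_bind
```

Because grade is numeric in entry1 and character in entry3, can_bind is FALSE here.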

If we only want to see the differences, we can add the argument return = "mismatch". We can easily see here that grade is a different variable type across the two files, so we will need to transform the variable in entry1 or entry3 if we want to bind the two data frames.

janitor::compare_df_cols(entry1, entry3, return = "mismatch")
  column_name  entry1    entry3
1       grade numeric character
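One possible way to resolve the mismatch is sketched below. It assumes you want grade to be numeric in both data frames, so it converts the column in entry3 (converting entry1 to character would work equally well). The data frames are rebuilt by hand from the printed output above so the example runs on its own, and the names entry3_fixed, combined, and source are hypothetical.

```r
library(dplyr)
library(janitor)

# Rebuild the example data by hand from the printed tibbles above
# (assumption: this stands in for reading the csv files)
entry1 <- tibble::tibble(
  stu_id = c(1234, 1235, 1236, 1237),
  grade  = c(1, 2, 1, 3),
  q1     = c(2, 1, NA, 3),
  q2     = c(4, 5, 12, 2),
  q3     = c(6, 6, 4, 4)
)

entry3 <- tibble::tibble(
  stu_id = c(1234, 1235, 1236, 1237),
  grade  = c("1", "2", "1", "3"),
  q1     = c(1, 1, NA, 3),
  q2     = c(4, 5, 1, 2),
  q3     = c(6, 6, 4, 4)
)

# Assumption: grade should be numeric, so convert it in entry3
entry3_fixed <- entry3 %>%
  mutate(grade = as.numeric(grade))

# Re-run the check; no rows means no remaining class mismatches
compare_df_cols(entry1, entry3_fixed, return = "mismatch")

# Bind by rows, keeping track of which data frame each row came from
combined <- bind_rows(entry1 = entry1, entry3 = entry3_fixed, .id = "source")
combined
```

Re-running compare_df_cols() after the conversion is a cheap way to confirm the fix before binding.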
