A Comparison of Packages to Generate Codebooks in R

⚡️ R-Ladies NYC Lightning Talk ⚡️

Crystal Lewis

Documentation

Codebooks

Criteria


✅ Compatible with haven labelled data (class haven_labelled)

✅ Exportable to txt, Word or PDF format

✅ Produces this variable-level information:

  • Variable name
  • Variable label
  • Variable type
  • Code values
  • Code labels
  • Missing value codes
  • Missing value labels
  • Total valid N
  • Total missing N
  • N per value
  • % per value
  • N per missing value
  • % per missing value
  • Range for continuous
  • Mean for continuous
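
To make the checklist concrete, here is a minimal base R sketch of what several of these statistics look like when computed by hand. It uses the pet_3 and participant_age values from the example data and assumes -99 is the only missing-value code; the object names are illustrative only.

```r
# Responses to pet_3 from the example survey; -99 is the missing-response code
pet_3 <- c(1, -99, 1, 1)
missing_codes <- -99

is_missing <- pet_3 %in% missing_codes

n_valid   <- sum(!is_missing)   # total valid N
n_missing <- sum(is_missing)    # total missing N

# N and % per value, counting the missing code as its own category
n_per_value   <- table(pet_3)
pct_per_value <- round(100 * prop.table(n_per_value), 1)

# Range and mean only apply to continuous variables such as age
participant_age <- c(12, 14, 15, 13)
age_range <- range(participant_age)
age_mean  <- mean(participant_age)
```

A codebook package that meets the criteria should report these numbers for every variable without this kind of manual work.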

Example Data

Pet Ownership and Attachment Survey

participant_id  participant_age  pet_type  pet_1  pet_2  pet_3
10              12               1         2      1      1
22              14               2         2      2      -99
13              15               1         4      4      1
11              13               1         2      1      1

  • pet_1: Within your family, your pet likes you best
  • pet_2: You talk to your pet as a friend
  • pet_3: You buy presents for your pet

(1 = almost never, 2 = sometimes, 3 = often, 4 = almost always, -99 = missing response)
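
The talk does not show how the survey object was built, so the following is only one possible sketch of the example data as haven labelled vectors. It assumes -99 is stored as a user-defined missing value via haven::labelled_spss(); label_item is a hypothetical helper, and pet_type is left unlabelled because its value labels are not given.

```r
library(haven)

# Values copied from the example table
survey <- data.frame(
  participant_id  = c(10, 22, 13, 11),
  participant_age = c(12, 14, 15, 13),
  pet_type        = c(1, 2, 1, 1),   # value labels for pet_type are not shown in the talk
  pet_1           = c(2, 2, 4, 2),
  pet_2           = c(1, 2, 4, 1),
  pet_3           = c(1, -99, 1, 1)
)

# Response scale shared by pet_1 to pet_3
scale_labels <- c("almost never" = 1, "sometimes" = 2, "often" = 3,
                  "almost always" = 4, "missing response" = -99)

# Hypothetical helper: attach the variable label, the value labels,
# and the -99 missing-value code to one survey item
label_item <- function(x, question) {
  labelled_spss(x, labels = scale_labels, na_values = -99, label = question)
}

survey$pet_1 <- label_item(survey$pet_1, "Within your family, your pet likes you best")
survey$pet_2 <- label_item(survey$pet_2, "You talk to your pet as a friend")
survey$pet_3 <- label_item(survey$pet_3, "You buy presents for your pet")
```

With the labels stored on the columns themselves, a codebook function has everything it needs to report variable labels, code labels, and missing value codes.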

Metadata

Review of 4 packages

codebookr::codebook()


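# Build the codebook from the labelled survey data
study_codebook <- codebookr::codebook(survey,
                    title = "Pet Relationship Study",
                    subtitle = "Various Authors",
                    # paste() keeps stray line breaks and indentation
                    # out of the printed description
                    description = paste(
                      "This study was funded by the Pet Society.",
                      "Here is a basic description of our study,",
                      "our methods, our sample, and protocols."))


# Write the codebook to a Word file
print(study_codebook, "codebookr.docx")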

codebookr::codebook()


  • Prints to a Word document
  • Works well with haven labelled data
  • Options to add additional variable attributes
  • Prints almost all summary statistics
  • Can add overall project metadata

codebook::codebook()


Creates the pre-filled .Rmd document

codebook::new_codebook_rmd()


Creates the codebook

codebook::codebook(survey)

codebook::codebook()


  • Prints to HTML, Word, PDF and other formats
  • Works fairly well with haven labelled data
  • Options to add additional variable attributes
  • Prints some of the summary statistics (not all)
  • Provides additional statistics such as scale reliability estimates


memisc::codebook()


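# Generate the codebook from the labelled survey data
study_codebook <- memisc::codebook(survey)


# Write the codebook to a plain-text file in the project's code folder
memisc::Write(study_codebook, file =
                here::here("code","my_memisc_codebook.txt"))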

memisc::codebook()


  • Prints to a .txt file
  • Works well with haven labelled data
  • Options to add additional variable attributes
  • Prints most of the summary statistics (not all)
  • Provides some additional summary statistics for continuous variables
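
memisc also has its own way of attaching metadata, separate from haven labels. The sketch below assumes you build a memisc item by hand for pet_3, with value labels, a missing-value code, and a description; it is not how the talk prepared its data.

```r
library(memisc)

# Build a memisc item with value labels and -99 declared as missing
pet_3 <- as.item(
  c(1, -99, 1, 1),
  labels = c("almost never" = 1, "sometimes" = 2, "often" = 3,
             "almost always" = 4, "missing response" = -99),
  missing.values = -99
)

# Add the question text as the variable description
description(pet_3) <- "You buy presents for your pet"

# Codebook entry for this single item
pet_3_codebook <- codebook(pet_3)
pet_3_codebook
```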

sjPlot::view_df()


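sjPlot::view_df(survey,
                show.type = TRUE,           # variable type
                show.frq = TRUE,            # N per value
                show.prc = TRUE,            # % per value
                show.na = TRUE,             # missing values
                show.string.values = TRUE,  # values of character variables
                # save the table as an HTML file
                file = here::here("code", "my_sjplot_codebook.html"))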

sjPlot::view_df()


  • Prints to an HTML file
  • Works well with haven labelled data
  • Prints most of the summary statistics (not all)
  • There are several arguments included to control your level of detail

A review of 10 packages

You can see a table of all 10 packages I reviewed on GitHub

Thank you!


cghlewis.com

@Cghlewis

linkedin.com/in/crystal-lewis-922b4193/

github.com/Cghlewis

meetup.com/rladies-st-louis/