class: inverse, left, middle
background-image: url(img/cover2.png)

# Data Management Overview: Session 3

## Training for Schoen Research

----

## Crystal Lewis

Slides available at [cghlewis.github.io/schoen-data-mgmt-series-public](https://cghlewis.github.io/schoen-data-mgmt-series-public/)

---

# Plan for this series

.pull-left[

Session 1

* ~~Data flow~~
* ~~Documentation~~

<br>

]

.pull-right[

Session 2

* ~~Creating instruments~~
* ~~Tracking data~~
* ~~Capturing and storing data~~
* ~~Preparing to clean and validate data~~

]

---

# Plan for this series

.pull-left[

Session 3

* Why R?
* Getting acclimated with R and RStudio
* Understanding objects, functions, and packages
* Code writing best practices

Session 4

* Packages and functions for data wrangling

]

.pull-right[

Session 5

* Setting up a reproducible syntax file
* Cleaning and validating data with R

Session 6

* Additional data wrangling with R

<img src="img/r-project.svg" width="300px" style="display: block; margin: auto;" />

]

---
background-image: url(img/syntax.PNG)
background-size: contain

---
background-image: url(img/syntax2.PNG)
background-size: contain

---

# Why use R for Data Management?

.pull-left[

* Writing syntax:
  - Automates your work
  - Allows your work to be reproducible
  - Facilitates collaboration
  - Allows others to check your work

<img src="img/heartyr.gif" width="300px" style="display: block; margin: auto;" />

.center[Source: @allison_horst]

]

.pull-right[

* R specifically:
  - **Free and open source**
  - Platform independent
  - Supportive community
  - Powerful packages that allow us to quickly manipulate data
  - Integrates well with other languages, file types, and applications
  - Increasingly used in education research

]

---

.panelset[
.panel[.panel-name[messy_data]

<table class="table table-striped" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> Teach Years </th> <th style="text-align:left;"> Teach grade </th> <th style="text-align:left;"> School District ID </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 123 </td> <td style="text-align:left;"> 12yrs </td> <td style="text-align:left;"> k </td> <td
style="text-align:left;"> 50_100 </td> </tr>
<tr> <td style="text-align:right;"> 234 </td> <td style="text-align:left;"> 15 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 50_100 </td> </tr>
<tr> <td style="text-align:right;"> 345 </td> <td style="text-align:left;"> 22.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 60_102 </td> </tr>
<tr> <td style="text-align:right;"> 456 </td> <td style="text-align:left;"> 4yrs </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 60_102 </td> </tr>
<tr> <td style="text-align:right;"> 567 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 60_102 </td> </tr>
<tr> <td style="text-align:right;"> 678 </td> <td style="text-align:left;"> .5 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 50_100 </td> </tr>
</tbody>
</table>

]

.panel[.panel-name[cleaning_code]

```r
library(tidyverse)
library(stringr)
library(janitor)

clean <- mess %>%
  # fix names
  clean_names() %>%
  # remove "yrs" from teach_years
  mutate(teach_years = str_remove_all(teach_years, "yrs")) %>%
  # make teach_years numeric and round up
  mutate(teach_years = ceiling(as.numeric(teach_years))) %>%
  # for teach_grade, recode k to 0 and make it a numeric variable
  mutate(teach_grade = as.numeric(recode(teach_grade, `k` = "0"))) %>%
  # separate school and district id
  separate(school_district_id, into = c("sch_id", "district_id"), sep = "_")
```

]

.panel[.panel-name[clean_data]

<table class="table table-striped" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> teach_years </th> <th style="text-align:right;"> teach_grade </th> <th style="text-align:left;"> sch_id </th> <th style="text-align:left;"> district_id </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 123 </td> <td style="text-align:right;"> 12 </td> <td
style="text-align:right;"> 0 </td> <td style="text-align:left;"> 50 </td> <td style="text-align:left;"> 100 </td> </tr> <tr> <td style="text-align:right;"> 234 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 50 </td> <td style="text-align:left;"> 100 </td> </tr> <tr> <td style="text-align:right;"> 345 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 102 </td> </tr> <tr> <td style="text-align:right;"> 456 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 102 </td> </tr> <tr> <td style="text-align:right;"> 567 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 102 </td> </tr> <tr> <td style="text-align:right;"> 678 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 50 </td> <td style="text-align:left;"> 100 </td> </tr> </tbody> </table> ] ] --- class: inverse, top, center # What Else Can R Do? 
<img src="img/r-package.PNG" width="800px" style="display: block; margin: auto;" /> Source: [<span style="color: white; ">rviews</span>](https://rviews.rstudio.com/2019/06/19/a-gentle-intro-to-tidymodels/) --- class: top, center # Analyze Data <img src="img/models.png" width="800px" height="450px" style="display: block; margin: auto;" /> .footer[Source: [statisticsglobe](https://statisticsglobe.com/extract-standard-error-t-and-p-value-from-regression-in-r)] --- class: top, center # Data Visualization <img src="img/graphs.jpg" width="800px" height="450px" style="display: block; margin: auto;" /> .footer[Source: @mohsinramay_] --- class: top, center # Mapping <img src="img/map.png" width="700px" height="450px" style="display: block; margin: auto;" /> .footer[Source: [revolutionanalytics](https://blog.revolutionanalytics.com/2012/02/creating-beautiful-maps-with-r.html)] --- class: top, center .left-column[ # Reporting ] .right-column[ <img src="img/psc.gif" width="550px" height="600px" /> ] .footer[Source: [RfortheRestofUs](https://rfortherestofus.com/consulting/)] --- class: top, center # Applications <img src="img/app.PNG" width="600px" height="425px" style="display: block; margin: auto;" /> Source: [Ashley Edwards](https://wordreadinggrowth.shinyapps.io/exposures_to_mastery/) --- class: top, center # Slide Decks <img src="img/slides.PNG" width="650px" height="475px" style="display: block; margin: auto;" /> --- class: top, center # Websites <img src="img/website.PNG" width="750px" height="475px" style="display: block; margin: auto;" /> Source: [Meghan Hall](https://meghan.rbind.io/) --- background-image: url(img/smiling_r_user.jpg) background-size: contain --- .pull-left[ <img src="img/r.PNG" width="480px" height="415px" style="display: block; margin: auto;" /> ### R is a free, open-source programming language for statistics and data visualization ] .pull-right[  ### RStudio is an integrated development environment (IDE) for R ] --- background-image: 
url(img/dashboard.PNG)

.footnote[Source: [ModernDive](https://moderndive.com/1-getting-started.html#r-rstudio)]

---

.pull-left[

]

.pull-right[

<img src="img/justRStud.PNG" width="540px" height="540px" style="display: block; margin: auto;" />

]

---
background-image: url(img/rstudio_types.PNG)
background-size: contain

---
background-image: url(img/cruella.gif)
background-size: contain

---

# Disclaimer

<img src="img/r_rollercoaster.png" width="900px" height="500px" style="display: block; margin: auto;" />

---
class: inverse

# Let's Get to Know R and RStudio

<style>
.exercise {
  font-size: 2em;
  font-weight: bold;
}
.tan {
  color: #CEB888;
}
</style>

.exercise[
.tan[Exercises]
]

### 1. Open R

### 2. Open RStudio

### 3. Find which versions of R and RStudio you have

---

## Tour of RStudio

<img src="img/panes.PNG" width="800px" height="500px" style="display: block; margin: auto;" />

---
class: inverse
background-image: url(img/reminders.PNG)
background-size: 80%

---

## Source Pane

<img src="img/panes2.PNG" width="800px" height="500px" style="display: block; margin: auto;" />

---
class: inverse
background-image: url(img/reminders2.PNG)
background-size: 80%

---
class: inverse

# Let's Get to Know our Settings

.pull-left[

.exercise[
.tan[Exercises]
]

### Tools -> Global Options

1. General Options
2. Code Options
3. Appearance Options
4. Pane Layout

]

.pull-right[

]

---

# Terminology

<style>
.yellow {
  color: yellow;
  font-weight: bold;
}
</style>

Everything that exists in R is an **object**

Recall the assignment operator .yellow[`<-`]

```r
x <- 5
y <- 6
```

Everything that happens is a **function** call

Consider the function .yellow[sum]

```r
sum(x, y)
```

```
[1] 11
```

.footnote[Source: [R for the Rest of Us](https://rfortherestofus.com/courses/getting-started/)]

---

# Objects

.pull-left[

Six commonly used types of **objects** in R:

+ **Vectors**
+ Lists
+ Matrices
+ Arrays
+ Factors
+ **Data Frames** (Tibbles)

]

.pull-right[

]

---

# Objects

.pull-left[

* Data Frame
  - Simply put, this is your dataset: a two-dimensional data structure where each column is a variable and each row is a case.
  - In R you can create your own data frame or read one in from your computer, a website, or a package.
  - You may also hear the term tibble, the tidyverse version of a data frame.

```r
data <- data.frame(
  id = c(123, 234, 456),
  age = c(12, 10, 9))

data
```

```
   id age
1 123  12
2 234  10
3 456   9
```

]

.pull-right[

* Vector
  - This is the simplest object
  - It consists of one or more elements, all of the same type (e.g., all numeric or all character)
  - Think of a vector as a variable outside of a data frame

```r
id <- c(123, 234, 456)

id
```

```
[1] 123 234 456
```

]

---

# Best Practices for Object Naming

.pull-left[

1. No spaces
2. Use all lower case
3. Use underscores to separate words
4. Be descriptive
5. Don't reuse the names of existing functions
6. Don't start with a number
7. No special characters

]

.pull-right[

```r
# Good
day_one
day_1

# Bad
1_day
DayOne
day-one
x
```

```r
# Bad
T <- FALSE
c <- 10
sum <- x + y
```

Source: [Advanced R](http://adv-r.had.co.nz/Style.html)

]

---

# Every R Object has a Type and Class

.pull-left[

#### **Type**: How an object is stored in memory

1. Character: **"apple"**, **"12_405"**
2. Double: **2**, **2.5**
3. Integer: **2L**
4. Logical: **TRUE**, **FALSE**
5. Complex: **1+4i**

]

.pull-right[

#### **Class**: The abstract type

1. Character
1. Numeric
1. Integer
1. Factor
1. Date
1. POSIXct
1. Logical

]

.footnote[Source: [The Carpentries](https://swcarpentry.github.io/r-novice-inflammation/13-supp-data-structures/)]

---

# Examples of Type and Class

.pull-left[

```r
# Numeric vector
age <- c(12, 14)

class(age)
```

```
[1] "numeric"
```

```r
typeof(age)
```

```
[1] "double"
```

]

.pull-right[

```r
# Date
birth_date <- as.Date(c("2005-01-14", "2006-07-22"))

class(birth_date)
```

```
[1] "Date"
```

```r
typeof(birth_date)
```

```
[1] "double"
```

]

---

# Examples of Type and Class

```r
# Data frame
data <- data.frame(
  id = c(123, 234, 456),
  age = c(12, 10, 9))

print(data)
```

```
   id age
1 123  12
2 234  10
3 456   9
```

```r
class(data)
```

```
[1] "data.frame"
```

```r
typeof(data)
```

```
[1] "list"
```

---

# Functions

.pull-left[

**Option 1:** Write your own function

```r
my_sum <- function(x, y) {
  x + y
}

my_sum(x = 1, y = 2)
```

```
[1] 3
```

]

.pull-right[

**Option 2:** Use an existing function (calling a function)

```r
x <- 1
y <- 2

sum(x, y)
```

```
[1] 3
```

]

---

# Anatomy of a Function Call

<style>
.large {
  font-size: 2em;
  font-weight: bold;
}
</style>

.center[.large[**function_name(arguments)**]]

.pull-left[

]

.pull-right[

]

---

#
Let's Practice Some Base R Functions
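
A minimal sketch for trying the functions on this slide in the console. The values and the object names (`age`, `mixed`, `id`, `students`) are illustrative assumptions, not part of the training data.

```r
# Combine elements into a vector with c()
# (illustrative values, not from the training data)
age <- c(12, 10, NA, 9)

# Check the class and length of the vector
class(age)
length(age)

# mean() returns NA when a value is missing unless na.rm = TRUE
mean(age)
mean(age, na.rm = TRUE)

# Mixing types in c() coerces every element to a single type
mixed <- c(1, "a")
class(mixed)

# Create a data frame from vectors of the same length
id <- c(123, 234, 456, 567)
students <- data.frame(id = id, age = age)

# Check the internal structure of the data frame
str(students)
```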

| Task | Function | Arguments |
|-------|------------|-------|
| combine elements | `c` | objects |
| check the class of an object | `class` | object |
| check the length of a vector | `length` | object |
| get the mean of values | `mean` | object, na.rm |
| create a data frame | `data.frame` | vectors of the same length |
| check internal structure of an object | `str` | object |

<br>
<br>

---
class: inverse

# Packages

A package is a collection of functions. Packages are written by a worldwide community of R users and can be downloaded for free from the internet.

<img src="img/packages.PNG" width="600px" height="300px" style="display: block; margin: auto;" />

.footnote[Source: [<span style="color: white;">ModernDive</span>](https://moderndive.com/1-getting-started.html#r-rstudio)]

---

# Packages

.pull-left[

#### A huge collection of packages is hosted on the internet

1. CRAN
   + A central repository supported by the R Foundation
   + **C**omprehensive **R** **A**rchive **N**etwork
   + These packages must meet certain quality standards and are regularly tested
   + Anyone can submit their package to CRAN and have it published for broad use
   + https://cran.r-project.org/

#### You may also find packages in other places

2. Bioconductor
3. GitHub
4. Your own personal computer or network drive

]

.pull-right[

]

---

# How to Access Packages

.pull-left[

#### Step 1: Install (do this only once)

Either install a package through code

```r
install.packages("package_name")
```

Or install manually

<img src="img/install.PNG" width="600px" height="275px" />

]

.pull-right[

#### Step 2: Load (do this every time you start R)

```r
library(package_name)

library(stringr)
library(dplyr)
library(readr)
library(haven)
```

]

---
class: inverse

# Getting Help

If you want to understand a package, you have several options:

.pull-left[

Review the documentation.
]

.pull-right[

Type .white[`?package_name`] in your console

]

---
class: inverse

# Getting Help in General

.pull-left[

Google is your friend

]

.pull-right[

Turn to the supportive community

Source: [<span style="color: white;">Shannon Pileggi</span>](https://www.pipinghotdata.com/)

]

---
class: inverse

# Restart R Session

.pull-left[

]

.pull-right[

]

---

#
Let's Use our First Package
.pull-left[

1. Open a script
2. Install a package
   + `readr`
3. Load a package
   + `library(readr)`
4. Use a function from that package
   + `read_csv()`
   + To read in this Seattle pet names data from a website: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-26/seattle_pets.csv

]

.pull-right[

```r
# Install readr package (only needed once)
install.packages("readr")

# Load package
library(readr)

# Read in data using readr and assign to an object
pet_names <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-26/seattle_pets.csv")
```

]

---

# General Code Writing Best Practices

1. Comment your code
   + Comments begin with `#` and a space
   + Comments should explain what is happening and why

```r
# Load raw data

# Drop duplicate cases from raw data
```

2\. Add sections to further organize your comments and make your file more searchable
   + Add a section by ending a comment with at least four dashes (`----`)
   + You can also add a section from the menu: Code -> Insert Section

```r
# Load raw data ----

# (1) Read in the data

# (2) Review the data
```

---

# General Code Writing Best Practices

3\. Use spacing
   + Place spaces around most operators, such as `<-`, `=`, `+`, and `/` (the colon `:` is an **exception**)
   + Do not put spaces inside or outside parentheses for regular function calls

```r
# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
x <- 1:10
mean(x, na.rm = TRUE)

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)
x <- 1 : 10
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )
```

---

# General Code Writing Best Practices

4\. Function-indent long lines
   + Strive to limit your code to 80 characters per line
   + If a function call is too long, place each argument on its own line

```r
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)
```

.footnote[Source: [Tidyverse Style Guide](https://style.tidyverse.org/syntax.html) and [Advanced R](http://adv-r.had.co.nz/Style.html)]

---

# Shortcuts

| Task | Windows | Mac |
|-------|------------|-------|
| Run current line of code | `Ctrl + Enter` | `Cmd + Return` |
| Run all lines of code | `Ctrl + A`, then `Ctrl + Enter` | `Cmd + A`, then `Cmd + Return` |
| Code completion | `Tab` | `Tab` |
| Insert assignment operator | `Alt + -` | `Option + -` |
| Insert pipe operator | `Ctrl + Shift + M` | `Cmd + Shift + M` |
| Comment or uncomment a line | `Ctrl + Shift + C` | `Cmd + Shift + C` |
| Restart R Session | `Ctrl + Shift + F10` | `Cmd + Shift + F10` |
| Multi-line cursor | `Alt + click and drag` | `Option + click and drag` |
| Previous command in console | `up-arrow` | `up-arrow` |

---

#
Let's Practice Shortcuts
<img src="img/shortcuts.PNG" width="600px" height="500px" style="display: block; margin: auto;" />

.footnote[Source: [Zenkit](https://zenkit.com/en/blog/shortcuts-for-all/)]

---
class: inverse, center, middle

# Questions?