Calculate row occurrences of strings

Package: base

Function: `rowSums()`

1. Calculate all occurrences of “prefer not to answer” across a selection of variables (q1:q4)

Review the data (d7)

# A tibble: 3 x 5
     id q1                   q2                   q3    q4                  
  <dbl> <chr>                <chr>                <chr> <chr>               
1     1 no                   yes                  no    <NA>                
2     2 prefer not to answer no                   yes   no                  
3     3 no                   prefer not to answer yes   prefer not to answer

Here, using dplyr::mutate() I can create a new variable prefer_sum.

Then I can use base::rowSums() to count the occurrences of the string “prefer”, detected using stringr::str_detect(), across my variables, selected using dplyr::across().

I add the rowSums() argument na.rm = TRUE so that rows containing NA values still get a sum. Without it, row 1 would return NA because q4 is missing.

d7 %>%
  mutate(prefer_sum =
                  rowSums(across(q1:q4, ~ str_detect(., "prefer")), na.rm = TRUE))

# A tibble: 3 x 6
     id q1                   q2                   q3    q4            prefer_sum
  <dbl> <chr>                <chr>                <chr> <chr>              <dbl>
1     1 no                   yes                  no    <NA>                   0
2     2 prefer not to answer no                   yes   no                     1
3     3 no                   prefer not to answer yes   prefer not t~          2
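
To see why this works, it can help to split the calculation into its two steps. This is a minimal sketch, not a replacement for the code above: it rebuilds d7 by hand with data.frame(), and the intermediate object name flags is my own, used only for illustration.

```r
library(dplyr)
library(stringr)

# Rebuild the example data shown above (d7)
d7 <- data.frame(
  id = c(1, 2, 3),
  q1 = c("no", "prefer not to answer", "no"),
  q2 = c("yes", "no", "prefer not to answer"),
  q3 = c("no", "yes", "yes"),
  q4 = c(NA, "no", "prefer not to answer"),
  stringsAsFactors = FALSE
)

# Step 1: across() returns one logical column per variable (TRUE / FALSE / NA)
flags <- d7 %>%
  transmute(across(q1:q4, ~ str_detect(.x, "prefer")))

# Step 2: rowSums() counts TRUE as 1 and FALSE as 0; na.rm = TRUE skips the NA
prefer_sum <- rowSums(flags, na.rm = TRUE)
prefer_sum
```

Printing flags before summing is a quick way to check that the pattern matches the cells I expect.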

I could also write this by testing for the exact text with == instead of detecting a pattern.

d7 %>%
  mutate(prefer_sum =
                  rowSums(across(q1:q4, ~ . == "prefer not to answer"), na.rm = TRUE))

# A tibble: 3 x 6
     id q1                   q2                   q3    q4            prefer_sum
  <dbl> <chr>                <chr>                <chr> <chr>              <dbl>
1     1 no                   yes                  no    <NA>                   0
2     2 prefer not to answer no                   yes   no                     1
3     3 no                   prefer not to answer yes   prefer not t~          2
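
Since rowSums() is a base function, the exact-match version can also be sketched without any tidyverse packages. The example below assumes the same d7 values as above, rebuilt with data.frame(). The helper vector q_cols is a name I made up for illustration.

```r
# Rebuild the example data shown above (d7)
d7 <- data.frame(
  id = c(1, 2, 3),
  q1 = c("no", "prefer not to answer", "no"),
  q2 = c("yes", "no", "prefer not to answer"),
  q3 = c("no", "yes", "yes"),
  q4 = c(NA, "no", "prefer not to answer"),
  stringsAsFactors = FALSE
)

# Columns to check (hypothetical helper vector)
q_cols <- c("q1", "q2", "q3", "q4")

# Comparing a data frame to a string gives a logical matrix;
# rowSums() then counts the TRUE values in each row
d7$prefer_sum <- rowSums(d7[q_cols] == "prefer not to answer", na.rm = TRUE)
d7
```

Note that an exact comparison only counts cells that equal the full string, so a response with different capitalization or extra spaces would not be counted.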

2. Calculate a sum of occurrences of kids between ages 1 and 17 (kid1:kid3)

Review the data (d8)

# A tibble: 4 x 4
     id  kid1  kid2  kid3
  <dbl> <dbl> <dbl> <dbl>
1    10     8    10    NA
2    11    15    19    20
3    12    NA    NA    NA
4    13     4    NA    NA

Similar to above, I can use stringr::str_detect() with a regular expression to match only the values 1 through 17.

d8 %>%
  mutate(minor_kids =
                  rowSums(across(
                    kid1:kid3, ~ str_detect(., "^[1-9]$|^[1]{1}[0-7]$")), na.rm = TRUE))

# A tibble: 4 x 5
     id  kid1  kid2  kid3 minor_kids
  <dbl> <dbl> <dbl> <dbl>      <dbl>
1    10     8    10    NA          2
2    11    15    19    20          1
3    12    NA    NA    NA          0
4    13     4    NA    NA          1
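
A regular expression like this is easy to get subtly wrong, so it can be worth checking it against a range of values first. The sketch below uses base grepl() rather than str_detect() so that it runs without extra packages. The test vector ages and the object names are my own, for illustration only.

```r
# The same pattern used above: a single digit 1-9, or 10-17
pattern <- "^[1-9]$|^[1]{1}[0-7]$"

# Hypothetical test values covering both edges of the range
ages <- 0:20

# Convert to text explicitly, then keep only the values the pattern matches
matched <- ages[grepl(pattern, as.character(ages))]
matched
```

If the pattern is right, matched should contain only the whole numbers 1 through 17, with 0, 18, 19 and 20 excluded.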

Since the values are all numeric, I could also simplify the code to a numeric comparison.

d8 %>%
  mutate(minor_kids =
                  rowSums(across(kid1:kid3, ~ . < 18), na.rm = TRUE))

# A tibble: 4 x 5
     id  kid1  kid2  kid3 minor_kids
  <dbl> <dbl> <dbl> <dbl>      <dbl>
1    10     8    10    NA          2
2    11    15    19    20          1
3    12    NA    NA    NA          0
4    13     4    NA    NA          1
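
One caveat is that . < 18 would also count a 0 or a negative value, which the regex version would not. Neither occurs in d8, but if the data might contain them, a sketch using dplyr::between() keeps the check to ages 1 through 17. The data are rebuilt by hand here, and the object name result is my own.

```r
library(dplyr)

# Rebuild the example data shown above (d8)
d8 <- data.frame(
  id   = c(10, 11, 12, 13),
  kid1 = c(8, 15, NA, 4),
  kid2 = c(10, 19, NA, NA),
  kid3 = c(NA, 20, NA, NA)
)

# between(x, 1, 17) is TRUE only when 1 <= x <= 17, and NA when x is NA
result <- d8 %>%
  mutate(minor_kids =
           rowSums(across(kid1:kid3, ~ between(.x, 1, 17)), na.rm = TRUE))
result
```

For this data the result matches the two versions above.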

