Package: stringr


Function: str_trunc()


1. Truncate all _text variables to a max of 10 characters

Review the data (d14)

# A tibble: 3 x 5
     id item1_text   item2_text item3_text                               item4~1
  <dbl> <chr>        <chr>      <chr>                                    <lgl>  
1     1 <NA>         -1         broccoli                                 NA     
2     2 cheese pizza -1         I love pancakes in the morning and I lo~ NA     
3     3 <NA>         -8         strawberries                             NA     
# ... with abbreviated variable name 1: item4_text

In this case, we need to import this data into a program that limits character columns to a maximum of 10 characters.

First, we can check the current maximum character count for each variable.

  • Note: Some variables contain NA. Unless we pass additional arguments to base::nchar(), the maximum returned for those columns will be NA.

d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x)))
$item1_text
[1] NA

$item2_text
[1] 2

$item3_text
[1] 66

$item4_text
[1] NA

That is probably not what we want. Adding the argument keepNA = FALSE makes nchar() return 2 for each NA, the number of printing characters used when strings are written to output, so max() returns a value even when a column contains NA. A column that is entirely NA returns a max of 2.

d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x, keepNA = FALSE)))
$item1_text
[1] 12

$item2_text
[1] 2

$item3_text
[1] 66

$item4_text
[1] 2
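
To isolate what keepNA changes, here is a minimal sketch on a small standalone vector. The vector x and its values are made up for illustration and are not part of d14.

```r
# Hypothetical vector for illustration; not taken from d14
x <- c("cheese pizza", NA_character_, "broccoli")

# Default (keepNA = NA): the missing string yields NA, so max() is NA too
n_default <- nchar(x)
max(n_default)

# keepNA = FALSE: the missing string is counted as 2 characters
n_keep_false <- nchar(x, keepNA = FALSE)
max(n_keep_false)

# Alternative: keep NA in the counts and drop it when taking the max
n_na_rm <- max(nchar(x), na.rm = TRUE)
n_na_rm
```

The na.rm = TRUE alternative avoids counting NA as 2, but on a column that is entirely NA it returns -Inf with a warning, which may be why keepNA = FALSE is the more convenient choice here.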

Now we can truncate our text variables.

d14 <- d14 %>%
  dplyr::mutate(dplyr::across(contains("text"),
                ~stringr::str_trunc(., 10, "right")))
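
The width passed to str_trunc() includes the ellipsis, so with the default "..." only 7 characters of the original text survive. A small sketch on made-up strings (x is hypothetical, not part of d14) shows this, along with the ellipsis argument as one way to keep all 10 characters:

```r
library(stringr)

# Hypothetical strings for illustration; not taken from d14
x <- c("strawberries", "broccoli", NA)

# width = 10 includes the 3-character ellipsis, so 7 original characters remain
with_ellipsis <- str_trunc(x, 10, side = "right")
with_ellipsis

# An empty ellipsis keeps the first 10 characters of the original text
no_ellipsis <- str_trunc(x, 10, side = "right", ellipsis = "")
no_ellipsis
```

Strings already at or under the width, and NA values, pass through unchanged in both calls.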

Let’s see what the data looks like now.

d14
# A tibble: 3 x 5
     id item1_text item2_text item3_text item4_text
  <dbl> <chr>      <chr>      <chr>      <chr>     
1     1 <NA>       -1         broccoli   <NA>      
2     2 cheese ... -1         I love ... <NA>      
3     3 <NA>       -8         strawbe... <NA>      

And let’s see what the new max values are.

d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x, keepNA = FALSE)))
$item1_text
[1] 10

$item2_text
[1] 2

$item3_text
[1] 10

$item4_text
[1] 2
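
Rather than reading the maximums by eye, the limit could also be checked programmatically. The sketch below is one possible approach; the tibble d and the name max_width are hypothetical stand-ins, with d mimicking two of the truncated columns.

```r
library(dplyr)

# Hypothetical stand-in for the truncated data; not the real d14
d <- tibble::tibble(
  id = 1:3,
  item1_text = c(NA, "cheese ...", NA),
  item3_text = c("broccoli", "I love ...", "strawbe...")
)

# Assumed limit imposed by the receiving program
max_width <- 10

# TRUE for each text column whose non-missing values all fit the limit
within_limit <- d %>%
  select(contains("text")) %>%
  sapply(\(x) all(nchar(x) <= max_width, na.rm = TRUE))

within_limit

# Stop with an error if any column exceeds the limit
stopifnot(all(within_limit))
```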
