str_trunc()
1. Truncate all _text variables to a max of 10 characters
Review the data (d14)
# A tibble: 3 x 5
id item1_text item2_text item3_text item4~1
<dbl> <chr> <chr> <chr> <lgl>
1 1 <NA> -1 broccoli NA
2 2 cheese pizza -1 I love pancakes in the morning and I lo~ NA
3 3 <NA> -8 strawberries NA
# ... with abbreviated variable name 1: item4_text
In this case, we need to import this data into a program but the program only allows character columns to have a max of 10 characters.
We can see what our max count is for each variable right now.
Note: when columns contain NA values and we use the base::nchar function, the value returned for those columns will be NA.
d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x)))
$item1_text
[1] NA
$item2_text
[1] 2
$item3_text
[1] 66
$item4_text
[1] NA
That is probably not what we want. We can add the argument keepNA = FALSE so that nchar() returns a count even when a value is NA: each NA is counted as 2, the number of printing characters used when strings are written to output. A column that is all NAs therefore returns a maximum of 2.
d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x, keepNA = FALSE)))
$item1_text
[1] 12
$item2_text
[1] 2
$item3_text
[1] 66
$item4_text
[1] 2
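If counting each NA as two characters is not wanted, one alternative is to drop the missing lengths before taking the maximum. The sketch below is only an illustration, not part of the original workflow: the small `df` data frame and the `max_chars()` helper are hypothetical, and an all-NA character column is reported as NA rather than 2.

```r
# Hypothetical stand-in for some of the _text columns of d14
df <- data.frame(
  item1_text = c(NA, "cheese pizza", NA),
  item2_text = c("-1", "-1", "-8"),
  item4_text = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: length of the longest non-missing string,
# or NA when every value in the column is missing
max_chars <- function(x) {
  n <- nchar(x)        # missing strings give NA here
  n <- n[!is.na(n)]    # keep only the observed lengths
  if (length(n) == 0) NA_integer_ else max(n)
}

res <- lapply(df, max_chars)
res
```

With this approach a column's maximum reflects only the strings that are actually present, so it is never inflated by the two characters of a printed "NA".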
Now we can truncate our text variables.
d14 <- d14 %>%
  dplyr::mutate(dplyr::across(contains("text"),
                              ~stringr::str_trunc(., 10, "right")))
Let’s see what the data looks like now.
d14
# A tibble: 3 x 5
id item1_text item2_text item3_text item4_text
<dbl> <chr> <chr> <chr> <chr>
1 1 <NA> -1 broccoli <NA>
2 2 cheese ... -1 I love ... <NA>
3 3 <NA> -8 strawbe... <NA>
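The ellipsis counts toward the width, which is why only seven of the original characters survive in each truncated value. A small sketch of that behaviour on a single string, using the `ellipsis` argument of stringr::str_trunc; setting it to an empty string is an assumption about what a reader might prefer, not something the workflow above does.

```r
library(stringr)

x <- "cheese pizza"  # 12 characters

# The default ellipsis "..." uses 3 of the 10 allowed characters
with_dots <- str_trunc(x, 10, "right")

# An empty ellipsis keeps the first 10 characters of the string instead
no_dots <- str_trunc(x, 10, "right", ellipsis = "")

with_dots
no_dots
```

Either form satisfies the 10-character limit; the choice is between marking values as truncated and keeping three more characters of the original text.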
And let’s see what the new max values are.
d14 %>%
  dplyr::select(contains("text")) %>%
  lapply(., \(x) max(nchar(x, keepNA = FALSE)))
$item1_text
[1] 10
$item2_text
[1] 2
$item3_text
[1] 10
$item4_text
[1] 2
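Before exporting, it can be worth asserting the limit rather than reading the output by eye. A minimal sketch, assuming a hypothetical `df` that stands in for two of the truncated `_text` columns and a hypothetical `within_limit()` helper. Missing values are treated as passing, which is an assumption about how the target program handles them.

```r
# Hypothetical stand-in for the truncated columns shown above
df <- data.frame(
  item1_text = c(NA, "cheese ...", NA),
  item3_text = c("broccoli", "I love ...", "strawbe..."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every non-missing value fits the limit
within_limit <- function(x, limit = 10) {
  n <- nchar(x)
  all(is.na(n) | n <= limit)
}

ok <- vapply(df, within_limit, logical(1))

# Stops with an error if any column still exceeds 10 characters
stopifnot(all(ok))
```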