Package: stringr


Function: str_extract()


1. Extract all numeric values from item1

Review the data (d3)

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12cm 
2     2 4cm  
3     3 6cm  
4     4 2.5CM
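
If you want to follow along, here is a minimal sketch for recreating d3. The values are copied by hand from the printed tibble above; building it with tibble::tibble() is an assumption, since the original data source is not shown.

```r
# Recreate the example data; values are typed in from the printed output above
d3 <- tibble::tibble(
  id = c(1, 2, 3, 4),                       # stored as double, matching <dbl>
  item1 = c("12cm", "4cm", "6cm", "2.5CM")  # stored as character, matching <chr>
)

d3
```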

We first use dplyr::mutate() to create the new variable. Because it is given the same name as the existing variable, item1, it overwrites the original.

Next we use the pattern argument of our stringr function to describe what we want to extract. If it weren’t for the 2.5 value, this pattern would be fairly simple: something like “\\d*”, meaning zero or more digits, would do.

But since we have a “2.5” value, the pattern is a little more complicated. We say the pattern starts at the beginning of the string (“^”) with zero or more digits (“\\d*”), followed by an optional period (“\\.?”), followed by zero or more digits (“\\d*”).
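
To see why the longer pattern is needed, here is a small sketch that tries both patterns on a plain character vector. The vector x is just the item1 values typed out by hand, and the names simple and full are only for illustration.

```r
# The item1 values from d3, as a plain character vector
x <- c("12cm", "4cm", "6cm", "2.5CM")

# Digits only: the match stops at the period, so "2.5CM" is cut short
simple <- stringr::str_extract(x, pattern = "\\d*")
simple
# "12" "4" "6" "2"

# Digits, an optional period, then digits: the decimal part is kept
full <- stringr::str_extract(x, pattern = "^\\d*\\.?\\d*")
full
# "12" "4" "6" "2.5"
```

Every piece of the longer pattern is optional, so on a string that does not begin with a number it would match an empty string rather than return NA.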

d3 %>%
  dplyr::mutate(item1 = stringr::str_extract(item1, pattern = "^\\d*\\.?\\d*"))
# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12   
2     2 4    
3     3 6    
4     4 2.5  
  • Note: item1 is still a character variable. To work with it as a number, we would need to convert it to numeric as a further step.
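
One way to do that conversion, as a sketch rather than the only option, is to wrap the result in base R's as.numeric() inside the same mutate() call. The object name d3_num is made up for this example, and d3 is rebuilt by hand so the block runs on its own.

```r
library(magrittr)  # provides the %>% pipe

# Example data, typed in from the printed tibble
d3 <- tibble::tibble(
  id = c(1, 2, 3, 4),
  item1 = c("12cm", "4cm", "6cm", "2.5CM")
)

d3_num <- d3 %>%
  dplyr::mutate(
    # Step 1: pull out the leading number as text
    item1 = stringr::str_extract(item1, pattern = "^\\d*\\.?\\d*"),
    # Step 2: convert that text to a numeric (double) column
    item1 = as.numeric(item1)
  )

d3_num
```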


Package: tidyr


Function: extract()


1. Extract all numeric values from item1

Review the data (d3)

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12cm 
2     2 4cm  
3     3 6cm  
4     4 2.5CM

With tidyr::extract() you no longer need dplyr::mutate(), because the into argument is where you name the new extracted variable(s).

The other thing to note is that, in the regex argument, the pattern for each “into” variable must be wrapped in parentheses (a capture group).

d3 %>%
  tidyr::extract(col = item1, into = "item1", regex = "(^\\d*\\.?\\d*)")
# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12   
2     2 4    
3     3 6    
4     4 2.5  

Note again that the variable is still character and would need to be converted to numeric as a next step.
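
As a possible shortcut, tidyr::extract() also has a convert argument that runs type conversion on the new columns. The sketch below assumes the same d3 data (rebuilt by hand here), and the name d3_conv is made up for the example.

```r
library(magrittr)  # provides the %>% pipe

# Example data, typed in from the printed tibble
d3 <- tibble::tibble(
  id = c(1, 2, 3, 4),
  item1 = c("12cm", "4cm", "6cm", "2.5CM")
)

d3_conv <- d3 %>%
  tidyr::extract(
    col = item1,
    into = "item1",
    regex = "(^\\d*\\.?\\d*)",
    convert = TRUE  # convert the extracted text to a suitable type
  )

d3_conv
```

With convert = TRUE, item1 should come back as a numeric column, so the separate conversion step is not needed.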


Package: readr


Function: parse_number()


1. Extract all numeric values from item1

Review the data (d3)

# A tibble: 4 x 2
     id item1
  <dbl> <chr>
1     1 12cm 
2     2 4cm  
3     3 6cm  
4     4 2.5CM

Another function, readr::parse_number(), pulls the first number it finds out of each string, drops the non-numeric characters around it, and converts the result to numeric for us, which is pretty cool.

d3 %>%
  dplyr::mutate(item1 = readr::parse_number(item1))
# A tibble: 4 x 2
     id item1
  <dbl> <dbl>
1     1  12  
2     2   4  
3     3   6  
4     4   2.5
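
To get a feel for how parse_number() behaves on other inputs, here is a small sketch. The extra strings ("$1,234.50" and "3 of 7") are made-up examples, not part of d3, and the comma handling assumes readr's default locale.

```r
# Two values from d3 plus two made-up strings
x <- c("12cm", "2.5CM", "$1,234.50", "3 of 7")

parsed <- readr::parse_number(x)
parsed
# 12.0 2.5 1234.5 3.0
```

Note that only the first number in each string is kept ("3 of 7" gives 3), and that the comma is treated as a grouping mark and ignored.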
