Applies a set of regular-expression-based text normalization rules to one or more strings. All performed replacements are displayed on the console by default
(verbose = TRUE).
Usage
str_normalize(
string,
rules = yay::regex_text_normalization,
n_context_chrs = 20L,
verbose = TRUE
)
Arguments
- string
Input vector. Either a character vector, or something coercible to one.
- rules
A data frame of regular expression patterns and replacements. pattern can optionally be a list column condensing multiple patterns to the same replacement rule. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex(). Replacements are interpreted as-is, except that references of the form \1, \2, etc. will be replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, meaning that first listed pairs are applied before later listed ones.
- n_context_chrs
The (maximum) number of characters displayed around the actual string and its replacement. The number refers to a single side of string/replacement, so the total number of context characters is at most 2 * n_context_chrs. Only relevant if verbose = TRUE.
- verbose
Whether or not to display replacements on the console.
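The ordered, rule-by-rule processing described for rules can be illustrated with a minimal base R sketch. This is not the package's implementation: apply_rules() is a hypothetical helper, the rules table is made up for illustration, the column name replacement is an assumption, and base gsub() stands in for the stringi-based matching.

```r
# Hypothetical rules table (not yay::regex_text_normalization).
# `pattern` is a list column: the second rule condenses two patterns
# into one replacement. The column name `replacement` is assumed.
rules <- data.frame(
  pattern = I(list(
    "\\s+",
    c("(\\d+) ?- ?(\\d+)", "(\\d+) to (\\d+)")
  )),
  replacement = c(" ", "\\1-\\2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each rule in row order, trying every
# pattern of a rule before moving on to the next rule.
apply_rules <- function(string, rules) {
  for (i in seq_len(nrow(rules))) {
    for (p in unlist(rules$pattern[i])) {
      # \\1, \\2 refer to the groups captured by () in the pattern
      string <- gsub(p, rules$replacement[i], string, perl = TRUE)
    }
  }
  string
}

apply_rules("pages 10  -  12", rules)
#> [1] "pages 10-12"

# Reversing the rule order changes the result: the range pattern no
# longer matches because the whitespace has not been collapsed yet.
apply_rules("pages 10  -  12", rules[2:1, ])
#> [1] "pages 10 - 12"
```

The second call shows why the order of pattern-replacement pairs matters: a later rule can depend on the clean-up done by an earlier one.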
See also
Regular expression rules:
regex_text_normalization,
regex_file_normalization
Other string functions:
str_normalize_file(),
str_replace_file(),
str_replace_verbose()
Examples
"This kind of “text normalization” is e.g. useful to apply before feeding stuff to ‘Pandoc’" |>
yay::str_normalize()
#> 1× - This kind of “text normalization” …
#> + This kind of "text normalization” …
#> 1× - … “text normalization” is e.g. useful to a…
#> + … “text normalization" is e.g. useful to a…
#> 1× - …re feeding stuff to ‘Pandoc’
#> + …re feeding stuff to 'Pandoc’
#> 1× - …ing stuff to ‘Pandoc’
#> + …ing stuff to ‘Pandoc'
#> [1] "This kind of \"text normalization\" is e.g. useful to apply before feeding stuff to 'Pandoc'"