
Applies a set of regular-expression-based text normalization rules to one or more strings. All performed replacements are displayed on the console by default (verbose = TRUE).

Usage

str_normalize(
  string,
  rules = yay::regex_text_normalization,
  n_context_chrs = 20L,
  verbose = TRUE
)

Arguments

string

Input vector. Either a character vector, or something coercible to one.

rules

A data frame of regular expression patterns and their replacements. The pattern column can optionally be a list column, condensing multiple patterns into a single replacement rule. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex. Replacements are used literally, except that backreferences of the form \1, \2, etc. are replaced with the contents of the respective capturing group (created in patterns using parentheses). Pattern-replacement pairs are processed in the order given: earlier rows are applied before later ones, so later patterns operate on the output of earlier replacements.
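The ordering and backreference semantics can be illustrated with a small base-R sketch. This is not how yay implements str_normalize(): apply_rules() is a hypothetical helper, the column name replacement is an assumption, and the sketch uses gsub(perl = TRUE) (PCRE) instead of stringi's ICU regex engine, so results may differ for more advanced patterns.

```r
# Hypothetical helper (not part of yay): apply pattern/replacement rules in
# the listed order, using base R's PCRE engine instead of stringi's ICU engine.
apply_rules <- function(string, rules) {
  string <- as.character(string)
  for (i in seq_len(nrow(rules))) {
    # A rule's `pattern` entry may hold several patterns sharing one replacement
    for (pattern in unlist(rules$pattern[[i]])) {
      string <- gsub(pattern, rules$replacement[[i]], string, perl = TRUE)
    }
  }
  string
}

# Hypothetical rules table; `pattern` is a list column, `replacement` is an assumed name
rules <- data.frame(
  pattern = I(list(
    c("\u201c", "\u201d"),        # curly double quotes
    c("\u2018", "\u2019"),        # curly single quotes
    "(\\d+) ?\u2013 ?(\\d+)"      # en-dash number ranges with two capturing groups
  )),
  replacement = c("\"", "'", "\\1-\\2"),  # \1 and \2 refer to the captured numbers
  stringsAsFactors = FALSE
)

apply_rules("\u201cPandoc\u201d, pages 10 \u2013 12", rules)
```

Here the call returns "\"Pandoc\", pages 10-12": the first row maps both curly double quotes to a straight one, and the third row rebuilds the range from its two captured groups.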

n_context_chrs

The maximum number of context characters displayed on each side of the matched text and its replacement. Since the number refers to a single side, the total amount of context is at most 2 * n_context_chrs characters. Only relevant if verbose = TRUE.
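The context window can be sketched as follows. context_lines() is a hypothetical helper, not part of yay; it only approximates the layout of the verbose output shown in the examples, and the exact prefixes and indentation it uses are assumptions. Context that is cut off is marked with "…".

```r
# Hypothetical helper (not part of yay): show one replacement with at most
# `n` characters of context on each side, marking cut-off context with "…".
context_lines <- function(string, start, end, replacement, n = 20L) {
  len <- nchar(string)
  from <- max(1L, start - n)   # first character of the left context
  to <- min(len, end + n)      # last character of the right context
  left <- substr(string, from, start - 1L)
  right <- substr(string, end + 1L, to)
  if (from > 1L) left <- paste0("\u2026", left)
  if (to < len) right <- paste0(right, "\u2026")
  c(paste0("- ", left, substr(string, start, end), right),
    paste0("   + ", left, replacement, right))
}

s <- "This kind of \u201ctext normalization\u201d is e.g. useful to apply before feeding stuff to \u2018Pandoc\u2019"
# The opening double quote is character 14, so no left-hand "…" is needed
context_lines(s, 14L, 14L, "\"")
```

Because the match sits near the start of the string, only the right-hand context reaches the full 20 characters and gets truncated.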

verbose

Whether or not to display replacements on the console.

Value

A character vector: string with all normalization rules applied.


Examples

"This kind of “text normalization” is e.g. useful to apply before feeding stuff to ‘Pandoc’" |>
  yay::str_normalize()
#> - This kind of “text normalization” …
#>    + This kind of "text normalization” …
#> - … “text normalization” is e.g. useful to a…
#>    + … “text normalization" is e.g. useful to a…
#> - …re feeding stuff to ‘Pandoc’
#>    + …re feeding stuff to 'Pandoc’
#> - …ing stuff to ‘Pandoc’
#>    + …ing stuff to ‘Pandoc'
#> [1] "This kind of \"text normalization\" is e.g. useful to apply before feeding stuff to 'Pandoc'"