Apply regular-expression-based text normalization to strings

Applies a set of regular-expression-based text normalization rules to one or more strings. All performed replacements are displayed on the console by default (verbose = TRUE).

Usage

str_normalize(
  string,
  rules = yay::regex_text_normalization,
  n_context_chrs = 20L,
  verbose = TRUE
)

Arguments

string: Input vector. Either a character vector, or something coercible to one.
rules: A tibble of regular expression patterns and replacements. It must have the columns pattern and replacement. pattern can optionally be a list column, which lets several patterns share the same replacement. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex(). Replacements are interpreted as-is, except that references of the form \1, \2, etc. are replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, so earlier pairs are applied before later ones.
n_context_chrs: The maximum number of context characters displayed around each replaced string and its replacement. The number applies to each side of the string/replacement, so at most 2 * n_context_chrs context characters are shown in total. Only relevant if verbose = TRUE.
verbose: Whether or not to display replacements on the console.
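
As a rough illustration of the rules argument, the sketch below passes a custom tibble instead of the default yay::regex_text_normalization. The two rules and the input string are made up for this example; only the column names pattern and replacement, the \1 group reference and the in-order processing come from the argument description above.

```r
# Hypothetical custom rules (not shipped with the package): the columns
# `pattern` and `replacement` are the ones the `rules` argument requires.
custom_rules <- tibble::tibble(
  pattern = c(
    "\\s+",          # rule 1: any run of whitespace ...
    "(\\d+)\\s*%"    # rule 2: a number followed by a percent sign ...
  ),
  replacement = c(
    " ",             # ... is collapsed to a single space
    "\\1 percent"    # ... is spelled out, reusing the captured number via \1
  )
)

# Rules are applied in row order, so rule 2 sees the output of rule 1.
normalized <- yay::str_normalize(
  string = "Growth   of 5 %",
  rules = custom_rules,
  verbose = FALSE    # do not print the individual replacements
)

normalized
```

Because the pairs are processed in the order given, the row order of the tibble matters whenever one rule's output can match a later rule's pattern.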

Value

The normalized string(s) as a character vector.

Examples

"This kind of “text normalization” is e.g. useful to apply before feeding stuff to ‘Pandoc’" |>
  yay::str_normalize()
#> 1× - This kind of “text normalization” …
#>    + This kind of "text normalization” …
#> 1× - … “text normalization” is e.g. useful to a…
#>    + … “text normalization" is e.g. useful to a…
#> 1× - …re feeding stuff to ‘Pandoc’
#>    + …re feeding stuff to 'Pandoc’
#> 1× - …ing stuff to ‘Pandoc’
#>    + …ing stuff to ‘Pandoc'
#> [1] "This kind of \"text normalization\" is e.g. useful to apply before feeding stuff to 'Pandoc'"
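
A second, quieter variant might look like the following sketch: it passes several strings at once and turns the console report off. The input strings are invented for illustration and rely on the default rules handling typographic quotes as in the example above.

```r
# Several inputs at once; verbose = FALSE suppresses the console report.
inputs <- c("“quoted”", "‘single’")

cleaned <- yay::str_normalize(inputs, verbose = FALSE)

cleaned
```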
