
Applies a set of regular-expression-based text normalization rules to one or more files. By default, changes are shown on the console only, without actually modifying any files. Set run_dry = FALSE to apply the changes.

Usage

str_normalize_file(
  path,
  rules = yay::regex_text_normalization,
  run_dry = TRUE,
  process_line_by_line = FALSE,
  n_context_chrs = 20L,
  verbose = TRUE
)

Arguments

path

Paths to the text files. A character vector.

rules

A tibble of regular expression patterns and replacements. It must have the columns pattern and replacement. pattern can optionally be a list column that condenses multiple patterns into a single replacement rule. Patterns are interpreted as regular expressions as described in the stringi::stringi-search-regex help page. Replacements are interpreted as-is, except that references of the form \1, \2, etc. are replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, i.e. earlier pairs are applied before later ones.
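The following is a minimal sketch of a custom rule set, assuming the tibble package is installed. The rules, the variable rules and the helper apply_rules_sketch() are hypothetical. The helper is only a rough base-R approximation of in-order rule application (it uses PCRE via gsub(perl = TRUE) rather than the stringi/ICU engine described above) and is not how yay implements it.

```r
# Hypothetical rule set: only the column names come from the documentation
rules <- tibble::tibble(
  pattern = list(c("\\t", " {2,}"),    # list column: two patterns, one rule
                 "(\\d+) ?percent"),   # () creates group 1, reused as \1
  replacement = c(" ", "\\1%")
)

# Rough approximation: apply each rule (and each of its patterns) in order.
# NOT yay's implementation; base R's PCRE engine stands in for ICU here.
apply_rules_sketch <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    for (p in rules$pattern[[i]]) {
      x <- gsub(p, rules$replacement[[i]], x, perl = TRUE)
    }
  }
  x
}

apply_rules_sketch("50  percent\tdone", rules)
#> [1] "50% done"

# Reversed order: the percent rule runs before the double space is collapsed
apply_rules_sketch("50  percent", rules[2:1, ])
#> [1] "50 percent"
```

The second call shows why order matters: the percent rule only matches after the whitespace rule has reduced the two spaces to one.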

run_dry

Whether or not to show replacements on the console only, without actually modifying any files. Implies verbose = TRUE.
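A minimal sketch of the dry-run workflow on a throwaway file, assuming the tibble package is installed. The file content and the one-row rule set are made up for illustration.

```r
# Throwaway file with British spellings (illustrative content)
f <- tempfile(fileext = ".txt")
writeLines(c("colour", "flavour"), f)

# Hypothetical one-row rule set: "our" at the end of a word becomes "or"
rules <- tibble::tibble(pattern = "our\\b", replacement = "or")

# Dry run (the default): replacements are printed, the file is left untouched
yay::str_normalize_file(path = f, rules = rules)
before <- readLines(f)

# Real run: the file is rewritten; verbose = FALSE keeps the console quiet
yay::str_normalize_file(path = f, rules = rules, run_dry = FALSE, verbose = FALSE)
after <- readLines(f)
```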

process_line_by_line

Whether each line in a file should be treated as a separate string, or the whole file as one single string. The latter performs better, but then "^" and "$" match only at the beginning and end of the whole file content rather than at every line, so you probably want the former if your patterns use these anchors.
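The effect on the anchors can be reproduced with stringi directly, whose ICU-based regex syntax the rules use. This sketch only illustrates the anchor semantics and is not how str_normalize_file() processes files internally. The sample text is made up.

```r
library(stringi)

txt <- "a  \nb  \nc  "   # three lines, each with trailing spaces

# Whole text as one string: "$" matches only at the end of the input,
# so only the trailing spaces of the last line are removed
whole <- stri_replace_all_regex(txt, " +$", "")

# Line by line: "$" now matches at the end of every line
lines <- stri_split_lines1(txt)
by_line <- paste(stri_replace_all_regex(lines, " +$", ""), collapse = "\n")

whole
#> [1] "a  \nb  \nc"
by_line
#> [1] "a\nb\nc"
```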

n_context_chrs

The maximum number of context characters displayed around the matched string and its replacement. The number applies to each side separately, so at most 2 * n_context_chrs context characters are shown in total. Only relevant if verbose = TRUE.
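A small sketch of narrowing the displayed context, assuming the tibble package is installed. The sentence and the one-row rule set are illustrative only.

```r
f <- tempfile(fileext = ".txt")
writeLines("The colour of the sky is blue.", f)

# Hypothetical one-row rule set
rules <- tibble::tibble(pattern = "colour", replacement = "color")

# Dry run showing at most 5 context characters on each side of the match,
# i.e. at most 10 context characters in total; the file stays unchanged
yay::str_normalize_file(path = f, rules = rules, n_context_chrs = 5L)
```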

verbose

Whether or not to display replacements on the console.

Value

path invisibly.
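Because path is returned invisibly, the result can be captured and reused, for example to inspect several files after normalizing them in one call. A minimal sketch, assuming the tibble package is installed; the files and the rule are made up.

```r
# Two throwaway files (illustrative content)
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines("colour", paths[1])
writeLines("flavour", paths[2])

# Hypothetical one-row rule set
rules <- tibble::tibble(pattern = "our\\b", replacement = "or")

# path accepts a character vector; the same vector is returned invisibly
out <- yay::str_normalize_file(path = paths, rules = rules,
                               run_dry = FALSE, verbose = FALSE)

# Read back the normalized contents via the returned paths
contents <- vapply(out, readLines, character(1), USE.NAMES = FALSE)
```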


Examples

# Use POSIX-related file normalization rule(s) included in this package
temp_file <- tempfile()
download.file(url = paste0("https://raw.githubusercontent.com/RcppCore/Rcpp/72f0652b93f196d",
                           "64faab6b108cd02a197510a7b/inst/include/Rcpp/utils/tinyformat.h"),
              destfile = temp_file,
              quiet = TRUE,
              mode = "wb")

yay::regex_file_normalization |>
  dplyr::filter(category == "posix") |>
  yay::str_normalize_file(path = temp_file)
#>  Running in dry mode. Set `run_dry = FALSE` to actually modify any files.
#>  Processing file ../../../../../../../../../tmp/RtmpQvZFQN/file151f92460a2b15……
#>- …TINYFORMAT_H_INCLUDED
#>    + …TINYFORMAT_H_INCLUDE\1\n
#>  Processing file ../../../../../../../../../tmp/RtmpQvZFQN/file151f92460a2b15…… done [45ms]
#>