Apply regular-expression-based text normalization to files

Applies a set of regular-expression-based text normalization rules to one or more files. By default, changes are shown on the console only, without actually modifying any files. Set run_dry = FALSE to apply the changes.

Usage

str_normalize_file(
  path,
  rules = yay::regex_text_normalization,
  run_dry = TRUE,
  process_line_by_line = FALSE,
  n_context_chrs = 20L,
  verbose = TRUE
)

Arguments

path: Paths to the text files. A character vector.
rules: A data frame of regular expression patterns and replacements. The pattern column can optionally be a list column, so that several patterns map to the same replacement. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex. Replacements are used as-is, except that references of the form \1, \2, etc. are replaced with the contents of the corresponding capture group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, so earlier pairs are applied before later ones.
run_dry: Whether or not to show replacements on the console only, without actually modifying any files. Implies verbose = TRUE.
process_line_by_line: Whether each line in a file should be treated as a separate string or the whole file as one single string. While the latter is more performant, you probably want the former if you're using "^" or "$" in your patterns.
n_context_chrs: The (maximum) number of characters displayed around the matched string and its replacement. The number refers to a single side of the string/replacement, so the total number of context characters is at most 2 * n_context_chrs. Only relevant if verbose = TRUE.
verbose: Whether or not to display replacements on the console.
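
The following is a minimal sketch of two behaviours described in the argument list above: rules are applied in order with \1-style backreferences, and "$" behaves differently depending on whether lines are processed separately or as one string. It uses base R's gsub() with perl = TRUE as a stand-in; it is not the package's implementation, which relies on stringi's regex engine and may differ in details. The rules table, its replacement column name, and the apply_rules() helper are hypothetical and exist only for illustration.

```r
# Base R stand-in for illustration only; not yay's actual code.

# Hypothetical rules table. The `replacement` column name is an assumption.
rules <- data.frame(
  pattern = c("(\\w+)@example\\.org", "<(.+)>"),
  replacement = c("<\\1>", "[\\1]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each pattern-replacement pair in the order given.
apply_rules <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    x <- gsub(rules$pattern[i], rules$replacement[i], x, perl = TRUE)
  }
  x
}

# Rule 1 rewrites the address to "<jane>", then rule 2 rewrites that to "[jane]".
in_order <- apply_rules("mail jane@example.org", rules)

# With the rows swapped, "<(.+)>" runs first and finds nothing to match yet.
reversed <- apply_rules("mail jane@example.org", rules[2:1, ])

# Anchors: strip trailing spaces, once per line and once on the whole text.
lines <- c("foo  ", "bar  ")

# Each line is its own string, so "$" matches at the end of every line.
per_line <- gsub(" +$", "", lines, perl = TRUE)

# One single string: "$" only matches at the very end of the text.
whole <- gsub(" +$", "", paste(lines, collapse = "\n"), perl = TRUE)

print(in_order)
print(reversed)
print(per_line)
print(whole)
```

In this sketch the per-line variant cleans both lines, while the whole-string variant only cleans the last one, which is why line-by-line processing is the safer choice for patterns that use "^" or "$".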

Value

path invisibly.

Examples

# Use POSIX-related file normalization rule(s) included in this package
temp_file <- tempfile()
download.file(url = paste0("https://raw.githubusercontent.com/RcppCore/Rcpp/72f0652b93f196d",
                           "64faab6b108cd02a197510a7b/inst/include/Rcpp/utils/tinyformat.h"),
              destfile = temp_file,
              quiet = TRUE,
              mode = "wb")

yay::regex_file_normalization |>
  dplyr::filter(category == "posix") |>
  yay::str_normalize_file(path = temp_file)
#> ℹ Running in dry mode. Set `run_dry = FALSE` to actually modify any files.
#> ℹ Processing file ../../../../../../../../../tmp/RtmpYBZVAE/file68ca623976875……
#> 1× - …TINYFORMAT_H_INCLUDED
#>    + …TINYFORMAT_H_INCLUDE\1\n
#> ✔ Processing file ../../../../../../../../../tmp/RtmpYBZVAE/file68ca623976875…… done [39ms]
#>
