Apply regular-expression-based text normalization to files
Source: R/yay.gen.R
str_normalize_file.Rd
Applies a set of regular-expression-based text normalization rules to one or more files. By default, changes are shown on the console only, without actually modifying any files. Set run_dry = FALSE to apply the changes.
Usage
str_normalize_file(
  path,
  rules = yay::regex_text_normalization,
  run_dry = TRUE,
  process_line_by_line = FALSE,
  n_context_chrs = 20L,
  verbose = TRUE
)
Arguments
- path
Paths to the text files. A character vector.
- rules
A tibble of regular expression patterns and replacements. It must have the columns pattern and replacement. pattern can optionally be a list column condensing multiple patterns to the same replacement rule. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex(). Replacements are interpreted as-is, except that references of the form \1, \2, etc. will be replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, meaning that first listed pairs are applied before later listed ones.
- run_dry
Whether or not to show replacements on the console only, without actually modifying any files. Implies verbose = TRUE.
- process_line_by_line
Whether each line in a file should be treated as a separate string or the whole file as one single string. While the latter is more performant, you probably want the former if you're using "^" or "$" in your patterns.
- n_context_chrs
The (maximum) number of characters displayed around the actual string and its replacement. The number refers to a single side of string/replacement, so the total number of context characters is at most 2 * n_context_chrs. Only relevant if verbose = TRUE.
- verbose
Whether or not to display replacements on the console.
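The notes on rule order and on "^"/"$" can be illustrated with a small sketch. It uses base R gsub() as a stand-in and is not the package's implementation; apply_rules() and the two rules are hypothetical names and values chosen for illustration. Note that \1 is written as "\\1" inside an R string literal.

```r
# Illustration only: base R gsub() stands in for the regex engine used by yay.
# apply_rules() is a hypothetical helper, not part of the package.
apply_rules <- function(x, rules) {
  # Apply each pattern-replacement pair in the order given
  for (i in seq_len(nrow(rules))) {
    x <- gsub(rules$pattern[i], rules$replacement[i], x, perl = TRUE)
  }
  x
}

# Hypothetical rules: collapse runs of spaces, then drop a space before "," or "."
rules <- data.frame(
  pattern = c(" {2,}", "(\\w) ([,.])"),
  replacement = c(" ", "\\1\\2"),
  stringsAsFactors = FALSE
)

input <- "Hello  , world ."

# Rules in the listed order versus reversed order give different results
in_order <- apply_rules(input, rules)
reversed <- apply_rules(input, rules[2:1, ])

# "^" only anchors at the start of each string, so treating a whole file as
# one string differs from treating every line as its own string
text <- "  first\n  second"
whole <- gsub("^ +", "", text, perl = TRUE)
per_line <- gsub("^ +", "", strsplit(text, "\n", fixed = TRUE)[[1]], perl = TRUE)

print(in_order)
print(reversed)
print(whole)
print(per_line)
```

In this sketch only the first line loses its indentation when the text is handled as a single string, which is the situation the process_line_by_line argument is meant to address.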
See also
Regular expression rules: regex_text_normalization, regex_file_normalization
Other string functions: str_normalize(), str_replace_file(), str_replace_verbose()
Examples
# Use POSIX-related file normalization rule(s) included in this package
temp_file <- tempfile()

download.file(url = paste0("https://raw.githubusercontent.com/RcppCore/Rcpp/72f0652b93f196d",
                           "64faab6b108cd02a197510a7b/inst/include/Rcpp/utils/tinyformat.h"),
              destfile = temp_file,
              quiet = TRUE,
              mode = "wb")

yay::regex_file_normalization |>
  dplyr::filter(category == "posix") |>
  yay::str_normalize_file(path = temp_file)
#> ℹ Running in dry mode. Set `run_dry = FALSE` to actually modify any files.
#> ℹ Processing file ../../../../../../../../../tmp/RtmpV44gVt/filec40401220632e……
#> 1× - …TINYFORMAT_H_INCLUDED
#> + …TINYFORMAT_H_INCLUDE\1\n
#> ✔ Processing file ../../../../../../../../../tmp/RtmpV44gVt/filec40401220632e…… done [53ms]
#>
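A custom rule set can be passed in the same way. The following is a minimal sketch, assuming the yay and tibble packages are installed and that a tibble with only the pattern and replacement columns is accepted, as the description of the rules argument suggests. The file contents and the two rules are hypothetical.

```r
# Sketch only: demo_file, its contents and custom_rules are made up for illustration
demo_file <- tempfile(fileext = ".txt")
original <- c("foo( bar )   ", "baz")
writeLines(original, demo_file)

if (requireNamespace("yay", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE)) {

  # Rules are applied in the order listed: strip trailing whitespace first,
  # then rewrite "( x )" as "(x)" via a group reference
  custom_rules <- tibble::tibble(
    pattern = c("[ \\t]+$", "\\( (\\w+) \\)"),
    replacement = c("", "(\\1)")
  )

  # Dry run: replacements are only shown on the console, the file is left as-is
  yay::str_normalize_file(path = demo_file,
                          rules = custom_rules,
                          run_dry = TRUE,
                          process_line_by_line = TRUE)
}
```

process_line_by_line = TRUE is chosen here because the first pattern relies on "$". Setting run_dry = FALSE would write the changes to the file.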