Regular expression patterns and replacements for text normalization

Usage

regex_text_normalization

Format

A tibble.

Examples

# unnest the pattern column
tidyr::unnest_longer(data = yay::regex_text_normalization,
                     col = pattern)
#> # A tibble: 9 × 5
#>   id                            category              purpose                                                                                pattern replacement
#>   <chr>                         <chr>                 <chr>                                                                                  <chr>   <chr>      
#> 1 uniform_quotation_marks       harmonize_punctuation "use typewriter double quotes (`\"`) as quotation marks"                               "[“”„‟… "\""       
#> 2 uniform_apostrophes           harmonize_punctuation "use typewriter single quotes (`'`) as apostrophes"                                    "[’‘‚‛… "'"        
#> 3 no_break_percentages          prettify_punctuation  "use narrow non-breaking space between numbers and percentage signs"                   "\\b(\… "\\1 \\2"  
#> 4 no_break_abbreviations_german prettify_punctuation  "use narrow non-breaking space between characters of common German abbreviations"      "(?i)\… "\\1 \\2"  
#> 5 no_break_abbreviations_german prettify_punctuation  "use narrow non-breaking space between characters of common German abbreviations"      "(?i)\… "\\1 \\2"  
#> 6 no_break_abbreviations_german prettify_punctuation  "use narrow non-breaking space between characters of common German abbreviations"      "(?i)\… "\\1 \\2"  
#> 7 no_break_abbreviations_german prettify_punctuation  "use narrow non-breaking space between characters of common German abbreviations"      "(?i)\… "\\1 \\2"  
#> 8 no_break_equals_sign          prettify_punctuation  "use narrow non-breaking space before and after certain assignments and equality comp… "(?<= … " \\1 "    
#> 9 en_dash_value_ranges          prettify_punctuation  "use [en dash](https://www.thepunctuationguide.com/en-dash.html) instead of hyphen in… "(?<!-… "\\1–\\2"
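
# A minimal sketch of how the pattern/replacement pairs could be applied to text.
# Assumptions: `apply_rules()` is a hypothetical helper (not part of yay), the
# patterns are meant for stringr's ICU regex engine, and rules run in row order.
apply_rules <- function(text, rules) {
  for (i in seq_len(nrow(rules))) {
    text <- stringr::str_replace_all(string = text,
                                     pattern = rules$pattern[i],
                                     replacement = rules$replacement[i])
  }
  text
}

# stand-in rules with the same `pattern`/`replacement` columns
# (illustrative only, these are not the package's actual patterns)
demo_rules <- tibble::tibble(pattern = c("[“”]", "[’‘]"),
                             replacement = c("\"", "'"))
apply_rules("“It’s fine”", demo_rules)

# the same helper on the unnested package data, if yay is installed
if (requireNamespace("yay", quietly = TRUE)) {
  rules <- tidyr::unnest_longer(data = yay::regex_text_normalization,
                                col = pattern)
  apply_rules("“Growth was 5 %”", rules)
}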
