Repository: tidyverse/stringr Branch: main Commit: ae054b1d28f6 Files: 163 Total size: 377.5 KB Directory structure: gitextract_1kgwvzj7/ ├── .Rbuildignore ├── .covrignore ├── .github/ │ ├── .gitignore │ ├── CODE_OF_CONDUCT.md │ └── workflows/ │ ├── R-CMD-check.yaml │ ├── pkgdown.yaml │ ├── pr-commands.yaml │ └── test-coverage.yaml ├── .gitignore ├── .vscode/ │ ├── extensions.json │ └── settings.json ├── DESCRIPTION ├── LICENSE ├── LICENSE.md ├── NAMESPACE ├── NEWS.md ├── R/ │ ├── c.R │ ├── case.R │ ├── compat-obj-type.R │ ├── compat-purrr.R │ ├── compat-types-check.R │ ├── conv.R │ ├── count.R │ ├── data.R │ ├── detect.R │ ├── dup.R │ ├── equal.R │ ├── escape.R │ ├── extract.R │ ├── flatten.R │ ├── glue.R │ ├── interp.R │ ├── length.R │ ├── locate.R │ ├── match.R │ ├── modifiers.R │ ├── pad.R │ ├── remove.R │ ├── replace.R │ ├── sort.R │ ├── split.R │ ├── stringr-package.R │ ├── sub.R │ ├── subset.R │ ├── trim.R │ ├── trunc.R │ ├── unique.R │ ├── utils.R │ ├── view.R │ ├── word.R │ └── wrap.R ├── README.Rmd ├── README.md ├── _pkgdown.yml ├── air.toml ├── codecov.yml ├── cran-comments.md ├── data/ │ ├── fruit.rda │ ├── sentences.rda │ └── words.rda ├── data-raw/ │ ├── harvard-sentences.txt │ └── samples.R ├── inst/ │ └── htmlwidgets/ │ ├── lib/ │ │ └── str_view.css │ ├── str_view.js │ └── str_view.yaml ├── man/ │ ├── case.Rd │ ├── invert_match.Rd │ ├── modifiers.Rd │ ├── pipe.Rd │ ├── str_c.Rd │ ├── str_conv.Rd │ ├── str_count.Rd │ ├── str_detect.Rd │ ├── str_dup.Rd │ ├── str_equal.Rd │ ├── str_escape.Rd │ ├── str_extract.Rd │ ├── str_flatten.Rd │ ├── str_glue.Rd │ ├── str_interp.Rd │ ├── str_length.Rd │ ├── str_like.Rd │ ├── str_locate.Rd │ ├── str_match.Rd │ ├── str_order.Rd │ ├── str_pad.Rd │ ├── str_remove.Rd │ ├── str_replace.Rd │ ├── str_replace_na.Rd │ ├── str_split.Rd │ ├── str_starts.Rd │ ├── str_sub.Rd │ ├── str_subset.Rd │ ├── str_to_camel.Rd │ ├── str_trim.Rd │ ├── str_trunc.Rd │ ├── str_unique.Rd │ ├── str_view.Rd │ ├── str_which.Rd │ ├── str_wrap.Rd │ 
├── stringr-data.Rd │ ├── stringr-package.Rd │ └── word.Rd ├── po/ │ ├── R-es.po │ └── R-stringr.pot ├── revdep/ │ ├── .gitignore │ ├── README.md │ ├── cran.md │ ├── email.yml │ ├── failures.md │ └── problems.md ├── stringr.Rproj ├── tests/ │ ├── testthat/ │ │ ├── _snaps/ │ │ │ ├── c.md │ │ │ ├── conv.md │ │ │ ├── detect.md │ │ │ ├── dup.md │ │ │ ├── equal.md │ │ │ ├── flatten.md │ │ │ ├── interp.md │ │ │ ├── match.md │ │ │ ├── modifiers.md │ │ │ ├── replace.md │ │ │ ├── split.md │ │ │ ├── sub.md │ │ │ ├── subset.md │ │ │ ├── trunc.md │ │ │ └── view.md │ │ ├── test-c.R │ │ ├── test-case.R │ │ ├── test-conv.R │ │ ├── test-count.R │ │ ├── test-detect.R │ │ ├── test-dup.R │ │ ├── test-equal.R │ │ ├── test-escape.R │ │ ├── test-extract.R │ │ ├── test-flatten.R │ │ ├── test-glue.R │ │ ├── test-interp.R │ │ ├── test-length.R │ │ ├── test-locate.R │ │ ├── test-match.R │ │ ├── test-modifiers.R │ │ ├── test-pad.R │ │ ├── test-remove.R │ │ ├── test-replace.R │ │ ├── test-sort.R │ │ ├── test-split.R │ │ ├── test-sub.R │ │ ├── test-subset.R │ │ ├── test-trim.R │ │ ├── test-trunc.R │ │ ├── test-unique.R │ │ ├── test-utils.R │ │ ├── test-view.R │ │ ├── test-word.R │ │ └── test-wrap.R │ └── testthat.R └── vignettes/ ├── .gitignore ├── from-base.Rmd ├── locale-sensitive.Rmd ├── regular-expressions.Rmd └── stringr.Rmd ================================================ FILE CONTENTS ================================================ ================================================ FILE: .Rbuildignore ================================================ ^pkgdown$ ^\.covrignore$ ^.*\.Rproj$ ^\.Rproj\.user$ ^packrat/ ^\.Rprofile$ ^\.travis\.yml$ ^revdep$ ^cran-comments\.md$ ^data-raw$ ^codecov\.yml$ ^\.httr-oauth$ ^_pkgdown\.yml$ ^doc$ ^docs$ ^Meta$ ^README\.Rmd$ ^README-.*\.png$ ^appveyor\.yml$ ^CRAN-RELEASE$ ^LICENSE\.md$ ^\.github$ ^CRAN-SUBMISSION$ ^[.]?air[.]toml$ ^\.vscode$ ================================================ FILE: .covrignore ================================================ 
R/deprec-*.R R/compat-*.R ================================================ FILE: .github/.gitignore ================================================ *.html ================================================ FILE: .github/CODE_OF_CONDUCT.md ================================================ # Contributor Covenant Code of Conduct ## Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. ## Our Standards Examples of behavior that contributes to a positive environment for our community include: * Demonstrating empathy and kindness toward other people * Being respectful of differing opinions, viewpoints, and experiences * Giving and gracefully accepting constructive feedback * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience * Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: * The use of sexualized language or imagery, and sexual attention or advances of any kind * Trolling, insulting or derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or email address, without their explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action 
in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. ## Scope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at codeofconduct@posit.co. All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning **Community Impact**: A violation through a single incident or series of actions. **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. 
This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. Permanent Ban **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. **Consequence**: A permanent ban from any sort of public interaction within the community. ## Attribution This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/inclusion). For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations. [homepage]: https://www.contributor-covenant.org ================================================ FILE: .github/workflows/R-CMD-check.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help # # NOTE: This workflow is overkill for most R packages and # check-standard.yaml is likely a better choice. # usethis::use_github_action("check-standard") will install it.
on: push: branches: [main, master] pull_request: branches: [main, master] name: R-CMD-check.yaml permissions: read-all jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} name: ${{ matrix.config.os }} (${{ matrix.config.r }}) strategy: fail-fast: false matrix: config: - {os: macos-latest, r: 'release'} - {os: windows-latest, r: 'release'} # use 4.0 or 4.1 to check with rtools40's older compiler - {os: windows-latest, r: 'oldrel-4'} - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} - {os: ubuntu-latest, r: 'oldrel-1'} - {os: ubuntu-latest, r: 'oldrel-2'} - {os: ubuntu-latest, r: 'oldrel-3'} - {os: ubuntu-latest, r: 'oldrel-4'} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} R_KEEP_PKG_SOURCE: yes steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: r-version: ${{ matrix.config.r }} http-user-agent: ${{ matrix.config.http-user-agent }} use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::rcmdcheck needs: check - uses: r-lib/actions/check-r-package@v2 with: upload-snapshots: true build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' ================================================ FILE: .github/workflows/pkgdown.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] release: types: [published] workflow_dispatch: name: pkgdown.yaml permissions: read-all jobs: pkgdown: runs-on: ubuntu-latest # Only restrict concurrency for non-PR jobs concurrency: group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} permissions: contents: write steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::pkgdown, local::. needs: website - name: Build site run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) shell: Rscript {0} - name: Deploy to GitHub pages 🚀 if: github.event_name != 'pull_request' uses: JamesIves/github-pages-deploy-action@v4.5.0 with: clean: false branch: gh-pages folder: docs ================================================ FILE: .github/workflows/pr-commands.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: issue_comment: types: [created] name: pr-commands.yaml permissions: read-all jobs: document: if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/document') }} name: document runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} permissions: contents: write steps: - uses: actions/checkout@v4 - uses: r-lib/actions/pr-fetch@v2 with: repo-token: ${{ secrets.GITHUB_TOKEN }} - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::roxygen2 needs: pr-document - name: Document run: roxygen2::roxygenise() shell: Rscript {0} - name: commit run: | git config --local user.name "$GITHUB_ACTOR" git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" git add man/\* NAMESPACE git commit -m 'Document' - uses: r-lib/actions/pr-push@v2 with: repo-token: ${{ secrets.GITHUB_TOKEN }} style: if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/style') }} name: style runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} permissions: contents: write steps: - uses: actions/checkout@v4 - uses: r-lib/actions/pr-fetch@v2 with: repo-token: ${{ secrets.GITHUB_TOKEN }} - uses: r-lib/actions/setup-r@v2 - name: Install dependencies run: install.packages("styler") shell: Rscript {0} - name: Style run: styler::style_pkg() shell: Rscript {0} - name: commit run: | git config --local user.name "$GITHUB_ACTOR" git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" git add \*.R git commit -m 'Style' - uses: r-lib/actions/pr-push@v2 with: repo-token: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: 
.github/workflows/test-coverage.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] name: test-coverage.yaml permissions: read-all jobs: test-coverage: runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::covr, any::xml2 needs: coverage - name: Test coverage run: | cov <- covr::package_coverage( quiet = FALSE, clean = FALSE, install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package") ) covr::to_cobertura(cov) shell: Rscript {0} - uses: codecov/codecov-action@v4 with: fail_ci_if_error: ${{ github.event_name != 'pull_request' && true || false }} file: ./cobertura.xml plugin: noop disable_search: true token: ${{ secrets.CODECOV_TOKEN }} - name: Show testthat output if: always() run: | ## -------------------------------------------------------------------- find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true shell: bash - name: Upload test results if: failure() uses: actions/upload-artifact@v4 with: name: coverage-test-failures path: ${{ runner.temp }}/package ================================================ FILE: .gitignore ================================================ docs .Rproj.user .Rhistory .RData packrat/lib*/ packrat/src inst/doc .httr-oauth revdep/checks revdep/library revdep/checks.noindex revdep/library.noindex revdep/data.sqlite /doc/ /Meta/ ================================================ FILE: .vscode/extensions.json ================================================ { "recommendations": [ "Posit.air-vscode" ] } ================================================ FILE: 
.vscode/settings.json ================================================ { "[r]": { "editor.formatOnSave": true, "editor.defaultFormatter": "Posit.air-vscode" } } ================================================ FILE: DESCRIPTION ================================================ Package: stringr Title: Simple, Consistent Wrappers for Common String Operations Version: 1.6.0.9000 Authors@R: c( person("Hadley", "Wickham", , "hadley@posit.co", role = c("aut", "cre", "cph")), person("Posit Software, PBC", role = c("cph", "fnd")) ) Description: A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another. License: MIT + file LICENSE URL: https://stringr.tidyverse.org, https://github.com/tidyverse/stringr BugReports: https://github.com/tidyverse/stringr/issues Depends: R (>= 3.6) Imports: cli, glue (>= 1.6.1), lifecycle (>= 1.0.3), magrittr, rlang (>= 1.0.0), stringi (>= 1.5.3), vctrs (>= 0.4.0) Suggests: covr, dplyr, gt, htmltools, htmlwidgets, knitr, rmarkdown, testthat (>= 3.0.0), tibble VignetteBuilder: knitr Config/Needs/website: tidyverse/tidytemplate Config/potools/style: explicit Config/testthat/edition: 3 Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.3 ================================================ FILE: LICENSE ================================================ YEAR: 2023 COPYRIGHT HOLDER: stringr authors ================================================ FILE: LICENSE.md ================================================ # MIT License Copyright (c) 2023 stringr authors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights 
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: NAMESPACE ================================================ # Generated by roxygen2: do not edit by hand S3method("[",stringr_pattern) S3method("[",stringr_view) S3method("[[",stringr_pattern) S3method(print,stringr_view) S3method(type,character) S3method(type,default) S3method(type,stringr_boundary) S3method(type,stringr_coll) S3method(type,stringr_fixed) S3method(type,stringr_regex) export("%>%") export("str_sub<-") export(boundary) export(coll) export(fixed) export(invert_match) export(regex) export(str_c) export(str_conv) export(str_count) export(str_detect) export(str_dup) export(str_ends) export(str_equal) export(str_escape) export(str_extract) export(str_extract_all) export(str_flatten) export(str_flatten_comma) export(str_glue) export(str_glue_data) export(str_ilike) export(str_interp) export(str_length) export(str_like) export(str_locate) export(str_locate_all) export(str_match) export(str_match_all) export(str_order) export(str_pad) export(str_rank) export(str_remove) export(str_remove_all) export(str_replace) export(str_replace_all) export(str_replace_na) export(str_sort) export(str_split) export(str_split_1) 
export(str_split_fixed) export(str_split_i) export(str_squish) export(str_starts) export(str_sub) export(str_sub_all) export(str_subset) export(str_to_camel) export(str_to_kebab) export(str_to_lower) export(str_to_sentence) export(str_to_snake) export(str_to_title) export(str_to_upper) export(str_trim) export(str_trunc) export(str_unique) export(str_view) export(str_view_all) export(str_which) export(str_width) export(str_wrap) export(word) import(rlang) import(stringi) importFrom(glue,glue) importFrom(lifecycle,deprecated) importFrom(magrittr,"%>%") ================================================ FILE: NEWS.md ================================================ # stringr (development version) # stringr 1.6.0 ## Breaking changes * All relevant stringr functions now preserve names (@jonovik, #575). * `str_like(ignore_case)` is deprecated, with `str_like()` now always case-sensitive to better follow the conventions of the SQL LIKE operator (@edward-burn, #543). * In `str_replace_all()`, a `replacement` function now receives all values in a single vector. This radically improves performance at the cost of breaking some existing uses (#462). ## New features * New `vignette("locale-sensitive")` about locale-sensitive functions (@kylieainslie, #404). * New `str_ilike()` that follows the conventions of the SQL ILIKE operator (@edward-burn, #543). * New `str_to_camel()`, `str_to_snake()`, and `str_to_kebab()` for changing "programming" case (@librill, #573 + @arnaudgallou, #593). ## Minor bug fixes and improvements * `str_*` now errors if `pattern` includes any `NA`s (@nash-delcamp-slp, #546). * `str_dup()` gains a `sep` argument so you can add a separator between every repeated value (@edward-burn, #564). * `str_sub<-` now gives a more informative error if `value` is not the correct length. * `str_view()` displays a message when called with a zero-length character vector (@LouisMPenrod, #497).
* New `[[.stringr_pattern` method to match existing `[.stringr_pattern` (@edward-burn, #569). # stringr 1.5.2 * `R CMD check` fixes # stringr 1.5.1 * Some minor documentation improvements. * `str_trunc()` now correctly truncates strings when `side` is `"left"` or `"center"` (@UchidaMizuki, #512). # stringr 1.5.0 ## Breaking changes * stringr functions now consistently implement the tidyverse recycling rules (#372). There are two main changes: * Only vectors of length 1 are recycled. Previously, (e.g.) `str_detect(letters, c("x", "y"))` worked, but it now errors. * `str_c()` ignores `NULLs`, rather than treating them as length 0 vectors. Additionally, many more arguments now throw errors, rather than warnings, if supplied the wrong type of input. * `regex()` and friends now generate class names with `stringr_` prefix (#384). * `str_detect()`, `str_starts()`, `str_ends()` and `str_subset()` now error when used with either an empty string (`""`) or a `boundary()`. These operations didn't really make sense (`str_detect(x, "")` returned `TRUE` for all non-empty strings) and made it easy to make mistakes when programming. ## New features * Many tweaks to the documentation to make it more useful and consistent. * New `vignette("from-base")` by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It's designed to help you move to stringr if you're already familiar with base R string functions (#266). * New `str_escape()` escapes regular expression metacharacters, providing an alternative to `fixed()` if you want to compose a pattern from user supplied strings (#408). * New `str_equal()` compares two character vectors using unicode rules, optionally ignoring case (#381). * `str_extract()` can now optionally extract a capturing group instead of the complete match (#420). 
* New `str_flatten_comma()` is a special case of `str_flatten()` designed for comma-separated flattening and can correctly apply the Oxford comma when there are only two elements (#444). * New `str_split_1()` is tailored for the special case of splitting up a single string (#409). * New `str_split_i()` extracts a single piece from a string (#278, @bfgray3). * New `str_like()` allows the use of SQL wildcards (#280, @rjpat). * New `str_rank()` to complete the set of order/rank/sort functions (#353). * New `str_sub_all()` to extract multiple substrings from each string. * New `str_unique()` is a wrapper around `stri_unique()` and returns unique string values in a character vector (#249, @seasmith). * `str_view()` uses ANSI colouring rather than an HTML widget (#370). This works in more places and requires fewer dependencies. It includes a number of other small improvements: * It no longer requires a pattern so you can use it to display strings with special characters. * It highlights unusual whitespace characters. * It's vectorised over both `string` and `pattern` (#407). * It defaults to displaying all matches, making `str_view_all()` redundant (and hence deprecated) (#455). * New `str_width()` returns the display width of a string (#380). * stringr is now licensed as MIT (#351). ## Minor improvements and bug fixes * Better error message if you supply a non-string pattern (#378). * A new data source for `sentences` has fixed many small errors. * `str_extract()` and `str_extract_all()` now work correctly when `pattern` is a `boundary()`. * `str_flatten()` gains a `last` argument that optionally overrides the final separator (#377). It gains a `na.rm` argument to remove missing values (since it's a summary function) (#439). * `str_pad()` gains a `use_width` argument to control whether to use the total code point width or the number of code points as the "width" of a string (#190).
* `str_replace()` and `str_replace_all()` can use standard tidyverse formula shorthand for the `replacement` function (#331). * `str_starts()` and `str_ends()` now correctly respect regex operator precedence (@carlganz). * `str_wrap()` breaks only at whitespace by default; set `whitespace_only = FALSE` to return to the previous behaviour (#335, @rjpat). * `word()` now returns the whole sentence when using a negative `start` parameter whose magnitude is greater than or equal to the number of words (@pdelboca, #245). # stringr 1.4.1 Hot patch release to resolve R CMD check failures. # stringr 1.4.0 * `str_interp()` now renders lists consistently, independently of the presence of additional placeholders (@amhrasmussen). * New `str_starts()` and `str_ends()` functions to detect patterns at the beginning or end of strings (@jonthegeek, #258). * `str_subset()`, `str_detect()`, and `str_which()` gain a `negate` argument, which is useful when you want the elements that do NOT match (#259, @yutannihilation). * New `str_to_sentence()` function to capitalize with sentence case (@jonthegeek, #202). # stringr 1.3.1 * `str_replace_all()` with a named vector now respects modifier functions (#207). * `str_trunc()` is once again vectorised correctly (#203, @austin3dickey). * `str_view()` handles `NA` values more gracefully (#217). I've also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232). # stringr 1.3.0 ## API changes * During package build, you may see `Error : object ‘ignore.case’ is not exported by 'namespace:stringr'`. This is because the long-deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed. ## New features * `str_glue()` and `str_glue_data()` provide convenient wrappers around `glue()` and `glue_data()` from the [glue](https://glue.tidyverse.org/) package (#157).
* `str_flatten()` is a wrapper around `stri_flatten()` and clearly conveys flattening a character vector into a single string (#186). * New `str_remove()` and `str_remove_all()` functions. These wrap `str_replace()` and `str_replace_all()` to remove patterns from strings (@Shians, #178). * `str_squish()` removes spaces from both the left and right side of strings, and also converts multiple spaces (or space-like characters) to a single space within strings (@stephlocke, #197). * `str_sub()` gains an `omit_na` argument for ignoring `NA`. Accordingly, `str_replace()` now ignores `NA`s and keeps the original strings (@yutannihilation, #164). ## Bug fixes and minor improvements * `str_trunc()` now preserves `NA`s (@ClaytonJY, #162). * `str_trunc()` now throws an error when `width` is shorter than `ellipsis` (@ClaytonJY, #163). * Long-deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed. # stringr 1.2.0 ## API changes * `str_match_all()` now returns `NA` if an optional group doesn't match (previously it returned ""). This is more consistent with `str_match()` and other match failures (#134). ## New features * In `str_replace()`, `replacement` can now be a function that is called once for each match and whose return value is used to replace the match. * New `str_which()` mimics `grep()` (#129). * A new vignette (`vignette("regular-expressions")`) describes the details of the regular expressions supported by stringr. The main vignette (`vignette("stringr")`) has been updated to give a high-level overview of the package. ## Minor improvements and bug fixes * `str_order()` and `str_sort()` gain an explicit `numeric` argument for sorting mixed numbers and strings. * `str_replace_all()` now throws an error if `replacement` is not a character vector. If `replacement` is `NA_character_` it replaces the complete string with `NA` (#124). * All functions that take a locale (e.g.
`str_to_lower()` and `str_sort()`) default to "en" (English) to ensure that the default is consistent across platforms. # stringr 1.1.0 * Add sample datasets: `fruit`, `words` and `sentences`. * `fixed()`, `regex()`, and `coll()` now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for `perl()` is `regex()` not `regexp()` (#61). `boundary()` has improved defaults when splitting on non-word boundaries (#58, @lmullen). * `str_detect()` can now detect boundaries (by checking for a `str_count()` > 0) (#120). `str_subset()` works similarly. * `str_extract()` and `str_extract_all()` now work with `boundary()`. This is particularly useful if you want to extract logical constructs like words or sentences. `str_extract_all()` respects the `simplify` argument when used with `fixed()` matches. * `str_subset()` now respects custom options for `fixed()` patterns (#79, @gagolews). * `str_replace()` and `str_replace_all()` now behave correctly when a replacement string contains `$`s, `\\\\1`, etc. (#83, #99). * `str_split()` gains a `simplify` argument to match `str_extract_all()` etc. * `str_view()` and `str_view_all()` create HTML widgets that display regular expression matches (#96). * `word()` returns `NA` for indexes greater than the number of words (#112). # stringr 1.0.0 * stringr is now powered by [stringi](https://github.com/gagolews/stringi) instead of base R regular expressions. This improves unicode and locale support, and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail. * stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal. * `str_c()` now returns a zero-length vector if any of its inputs are zero-length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using `str_c("x", NA)` now yields `NA`.
If you want `"xNA"`, use `str_replace_na()` on the inputs. * `str_replace_all()` gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:

```R
input <- c("abc", "def")
str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
```

* `str_match()` now returns `NA` if an optional group doesn't match (previously it returned ""). This is more consistent with `str_extract()` and other match failures. * New `str_subset()` keeps values that match a pattern. It's a convenient wrapper for `x[str_detect(x, pattern)]` (#21, @jiho). * New `str_order()` and `str_sort()` allow you to sort and order strings in a specified locale. * New `str_conv()` to convert strings from a specified encoding to UTF-8. * New modifier `boundary()` allows you to count, locate and split by character, word, line and sentence boundaries. * The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need. * `ignore.case(x)` has been deprecated in favour of `fixed|regex|coll(x, ignore_case = TRUE)`; `perl(x)` has been deprecated in favour of `regex(x)`. * `str_join()` is deprecated, please use `str_c()` instead. # stringr 0.6.2 * fixed path in `str_wrap` example so it works for more R installations.
* remove dependency on plyr # stringr 0.6.1 * Zero input to `str_split_fixed` returns 0 row matrix with `n` columns * Export `str_join` # stringr 0.6 * new modifier `perl` that switches to Perl regular expressions * `str_match` now uses new base function `regmatches` to extract matches - this should hopefully be faster than my previous pure R algorithm # stringr 0.5 * new `str_wrap` function which gives `strwrap` output in a more convenient format * new `word` function extracts words from a string given a user-defined separator (thanks to suggestion by David Cooper) * `str_locate` now returns consistent type when matching empty string (thanks to Stavros Macrakis) * new `str_count` counts number of matches in a string. * `str_pad` and `str_trim` receive performance tweaks - for large vectors this should give at least a two-order-of-magnitude speed up * str_length returns NA for invalid multibyte strings * fix small bug in internal `recyclable` function # stringr 0.4 * all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters * fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors. * new ignore.case() modifier tells stringr functions to ignore case of pattern. * str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions. * new str_sub<- function (analogous to substring<-) for substring replacement * str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end.
* str_pad side argument can be left, right, or both (instead of center) * str_trim gains side argument to better match str_pad * stringr now has a namespace and imports plyr (rather than requiring it) # stringr 0.3 * fixed() now also escapes | * str_join() renamed to str_c() * all functions more carefully check input and return informative error messages if not as expected. * add invert_match() function to convert a matrix of locations of matches to locations of non-matches * add fixed() function to allow matching of fixed strings. # stringr 0.2 * str_length now returns correct results when used with factors * str_sub now correctly replaces Inf in end argument with length of string * new function str_split_fixed returns fixed number of splits in a character matrix * str_split no longer uses strsplit to preserve trailing breaks ================================================ FILE: R/c.R ================================================ #' Join multiple strings into one string #' #' @description #' `str_c()` combines multiple character vectors into a single character #' vector. It's very similar to [paste0()] but uses tidyverse recycling and #' `NA` rules. #' #' One way to understand how `str_c()` works is to picture a 2d matrix of strings, #' where each argument forms a column. `sep` is inserted between each column, #' and then each row is combined together into a single string. If `collapse` #' is set, it's inserted between each row, and then the result is again #' combined, this time into a single string. #' #' @param ... One or more character vectors. #' #' `NULL`s are removed; scalar inputs (vectors of length 1) are recycled to #' the common length of vector inputs. #' #' Like most other R functions, missing values are "infectious": whenever #' a missing value is combined with another string the result will always #' be missing. Use [dplyr::coalesce()] or [str_replace_na()] to convert to #' the desired value. #' @param sep String to insert between input vectors.
#' @param collapse Optional string used to combine output into a single #' string. Generally better to use [str_flatten()] if you need this #' behaviour. #' @return If `collapse = NULL` (the default) a character vector with #' length equal to the longest input. If `collapse` is a string, a character #' vector of length 1. #' @export #' @examples #' str_c("Letter: ", letters) #' str_c("Letter", letters, sep = ": ") #' str_c(letters, " is for", "...") #' str_c(letters[-26], " comes before ", letters[-1]) #' #' str_c(letters, collapse = "") #' str_c(letters, collapse = ", ") #' #' # Differences from paste() ---------------------- #' # Missing inputs give missing outputs #' str_c(c("a", NA, "b"), "-d") #' paste0(c("a", NA, "b"), "-d") #' # Use str_replace_na() to display literal NAs: #' str_c(str_replace_na(c("a", NA, "b")), "-d") #' #' # Uses tidyverse recycling rules #' \dontrun{str_c(1:2, 1:3)} # errors #' paste0(1:2, 1:3) #' #' str_c("x", character()) #' paste0("x", character()) str_c <- function(..., sep = "", collapse = NULL) { check_string(sep) check_string(collapse, allow_null = TRUE) dots <- list(...) dots <- dots[!map_lgl(dots, is.null)] vctrs::vec_size_common(!!!dots) inject(stri_c(!!!dots, sep = sep, collapse = collapse)) } ================================================ FILE: R/case.R ================================================ #' Convert string to upper case, lower case, title case, or sentence case #' #' * `str_to_upper()` converts to upper case. #' * `str_to_lower()` converts to lower case. #' * `str_to_title()` converts to title case, where only the first letter of #' each word is capitalized. #' * `str_to_sentence()` converts to sentence case, where only the first letter #' of the sentence is capitalized. #' #' @inheritParams str_detect #' @inheritParams coll #' @return A character vector the same length as `string`.
#' @examples #' dog <- "The quick brown dog" #' str_to_upper(dog) #' str_to_lower(dog) #' str_to_title(dog) #' str_to_sentence("the quick brown dog") #' #' # Locale matters! #' str_to_upper("i") # English #' str_to_upper("i", "tr") # Turkish #' @name case NULL #' @export #' @rdname case str_to_upper <- function(string, locale = "en") { check_string(locale) copy_names(string, stri_trans_toupper(string, locale = locale)) } #' @export #' @rdname case str_to_lower <- function(string, locale = "en") { check_string(locale) copy_names(string, stri_trans_tolower(string, locale = locale)) } #' @export #' @rdname case str_to_title <- function(string, locale = "en") { check_string(locale) out <- stri_trans_totitle( string, opts_brkiter = stri_opts_brkiter(locale = locale) ) copy_names(string, out) } #' @export #' @rdname case str_to_sentence <- function(string, locale = "en") { check_string(locale) out <- stri_trans_totitle( string, opts_brkiter = stri_opts_brkiter(type = "sentence", locale = locale) ) copy_names(string, out) } #' Convert between different types of programming case #' #' @description #' * `str_to_camel()` converts to camel case, where the first letter of #' each word is capitalized, with no separation between words. By default #' the first letter of the first word is not capitalized. #' #' * `str_to_kebab()` converts to kebab case, where words are converted to #' lower case and separated by dashes (`-`). #' #' * `str_to_snake()` converts to snake case, where words are converted to #' lower case and separated by underscores (`_`). #' @inheritParams str_to_lower #' @export #' @param first_upper Logical. Should the first letter be capitalized? 
#' @examples #' str_to_camel("my-variable") #' str_to_camel("my-variable", first_upper = TRUE) #' #' str_to_snake("MyVariable") #' str_to_kebab("MyVariable") str_to_camel <- function(string, first_upper = FALSE) { check_character(string) check_bool(first_upper) string <- string |> to_words() |> str_to_title() |> str_remove_all(pattern = fixed(" ")) if (!first_upper) { str_sub(string, 1, 1) <- str_to_lower(str_sub(string, 1, 1)) } string } #' @export #' @rdname str_to_camel str_to_snake <- function(string) { check_character(string) to_separated_case(string, sep = "_") } #' @export #' @rdname str_to_camel str_to_kebab <- function(string) { check_character(string) to_separated_case(string, sep = "-") } to_separated_case <- function(string, sep) { out <- to_words(string) str_replace_all(out, fixed(" "), sep) } to_words <- function(string) { breakpoints <- paste( # non-word characters "[^\\p{L}\\p{N}]+", # lowercase followed by uppercase "(?<=\\p{Ll})(?=\\p{Lu})", # letter followed by number "(?<=\\p{L})(?=\\p{N})", # number followed by letter "(?<=\\p{N})(?=\\p{L})", # uppercase followed by uppercase then lowercase (i.e. end of acronym) "(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})", sep = "|" ) out <- str_replace_all(string, breakpoints, " ") out <- str_to_lower(out) str_trim(out) } ================================================ FILE: R/compat-obj-type.R ================================================ # nocov start --- r-lib/rlang compat-obj-type # # Changelog # ========= # # 2022-10-04: # - `obj_type_friendly(value = TRUE)` now shows numeric scalars # literally. # - `stop_friendly_type()` now takes `show_value`, passed to # `obj_type_friendly()` as the `value` argument. # # 2022-10-03: # - Added `allow_na` and `allow_null` arguments. # - `NULL` is now backticked. # - Better friendly type for infinities and `NaN`. # # 2022-09-16: # - Unprefixed usage of rlang functions with `rlang::` to # avoid onLoad issues when called from rlang (#1482).
# # 2022-08-11: # - Prefixed usage of rlang functions with `rlang::`. # # 2022-06-22: # - `friendly_type_of()` is now `obj_type_friendly()`. # - Added `obj_type_oo()`. # # 2021-12-20: # - Added support for scalar values and empty vectors. # - Added `stop_input_type()` # # 2021-06-30: # - Added support for missing arguments. # # 2021-04-19: # - Added support for matrices and arrays (#141). # - Added documentation. # - Added changelog. #' Return English-friendly type #' @param x Any R object. #' @param value Whether to describe the value of `x`. Special values #' like `NA` or `""` are always described. #' @param length Whether to mention the length of vectors and lists. #' @return A string describing the type. Starts with an indefinite #' article, e.g. "an integer vector". #' @noRd obj_type_friendly <- function(x, value = TRUE) { if (is_missing(x)) { return("absent") } if (is.object(x)) { if (inherits(x, "quosure")) { type <- "quosure" } else { type <- paste(class(x), collapse = "/") } return(sprintf("a <%s> object", type)) } if (!is_vector(x)) { return(.rlang_as_friendly_type(typeof(x))) } n_dim <- length(dim(x)) if (!n_dim) { if (!is_list(x) && length(x) == 1) { if (is_na(x)) { return(switch( typeof(x), logical = "`NA`", integer = "an integer `NA`", double = if (is.nan(x)) { "`NaN`" } else { "a numeric `NA`" }, complex = "a complex `NA`", character = "a character `NA`", .rlang_stop_unexpected_typeof(x) )) } show_infinites <- function(x) { if (x > 0) { "`Inf`" } else { "`-Inf`" } } str_encode <- function(x, width = 30, ...) { if (nchar(x) > width) { x <- substr(x, 1, width - 3) x <- paste0(x, "...") } encodeString(x, ...) 
} if (value) { if (is.numeric(x) && is.infinite(x)) { return(show_infinites(x)) } if (is.numeric(x) || is.complex(x)) { number <- as.character(round(x, 2)) what <- if (is.complex(x)) "the complex number" else "the number" return(paste(what, number)) } return(switch( typeof(x), logical = if (x) "`TRUE`" else "`FALSE`", character = { what <- if (nzchar(x)) "the string" else "the empty string" paste(what, str_encode(x, quote = "\"")) }, raw = paste("the raw value", as.character(x)), .rlang_stop_unexpected_typeof(x) )) } return(switch( typeof(x), logical = "a logical value", integer = "an integer", double = if (is.infinite(x)) show_infinites(x) else "a number", complex = "a complex number", character = if (nzchar(x)) "a string" else "\"\"", raw = "a raw value", .rlang_stop_unexpected_typeof(x) )) } if (length(x) == 0) { return(switch( typeof(x), logical = "an empty logical vector", integer = "an empty integer vector", double = "an empty numeric vector", complex = "an empty complex vector", character = "an empty character vector", raw = "an empty raw vector", list = "an empty list", .rlang_stop_unexpected_typeof(x) )) } } vec_type_friendly(x) } vec_type_friendly <- function(x, length = FALSE) { if (!is_vector(x)) { abort("`x` must be a vector.") } type <- typeof(x) n_dim <- length(dim(x)) add_length <- function(type) { if (length && !n_dim) { paste0(type, sprintf(" of length %s", length(x))) } else { type } } if (type == "list") { if (n_dim < 2) { return(add_length("a list")) } else if (is.data.frame(x)) { return("a data frame") } else if (n_dim == 2) { return("a list matrix") } else { return("a list array") } } type <- switch( type, logical = "a logical %s", integer = "an integer %s", numeric = , double = "a double %s", complex = "a complex %s", character = "a character %s", raw = "a raw %s", type = paste0("a ", type, " %s") ) if (n_dim < 2) { kind <- "vector" } else if (n_dim == 2) { kind <- "matrix" } else { kind <- "array" } out <- sprintf(type, kind) if (n_dim >= 
2) { out } else { add_length(out) } } .rlang_as_friendly_type <- function(type) { switch( type, list = "a list", NULL = "`NULL`", environment = "an environment", externalptr = "a pointer", weakref = "a weak reference", S4 = "an S4 object", name = , symbol = "a symbol", language = "a call", pairlist = "a pairlist node", expression = "an expression vector", char = "an internal string", promise = "an internal promise", ... = "an internal dots object", any = "an internal `any` object", bytecode = "an internal bytecode object", primitive = , builtin = , special = "a primitive function", closure = "a function", type ) } .rlang_stop_unexpected_typeof <- function(x, call = caller_env()) { abort( sprintf("Unexpected type <%s>.", typeof(x)), call = call ) } #' Return OO type #' @param x Any R object. #' @return One of `"bare"` (for non-OO objects), `"S3"`, `"S4"`, #' `"R6"`, or `"R7"`. #' @noRd obj_type_oo <- function(x) { if (!is.object(x)) { return("bare") } class <- inherits(x, c("R6", "R7_object"), which = TRUE) if (class[[1]]) { "R6" } else if (class[[2]]) { "R7" } else if (isS4(x)) { "S4" } else { "S3" } } #' @param x The object type which does not conform to `what`. Its #' `obj_type_friendly()` is taken and mentioned in the error message. #' @param what The friendly expected type as a string. Can be a #' character vector of expected types, in which case the error #' message mentions all of them in an "or" enumeration. #' @param show_value Passed to `value` argument of `obj_type_friendly()`. #' @param ... Arguments passed to [abort()]. 
#' @inheritParams args_error_context #' @noRd stop_input_type <- function( x, what, ..., allow_na = FALSE, allow_null = FALSE, show_value = TRUE, arg = caller_arg(x), call = caller_env() ) { # From compat-cli.R cli <- env_get_list( nms = c("format_arg", "format_code"), last = topenv(), default = function(x) sprintf("`%s`", x), inherit = TRUE ) if (allow_na) { what <- c(what, cli$format_code("NA")) } if (allow_null) { what <- c(what, cli$format_code("NULL")) } if (length(what)) { what <- oxford_comma(what) } message <- sprintf( "%s must be %s, not %s.", cli$format_arg(arg), what, obj_type_friendly(x, value = show_value) ) abort(message, ..., call = call, arg = arg) } oxford_comma <- function(chr, sep = ", ", final = "or") { n <- length(chr) if (n < 2) { return(chr) } head <- chr[seq_len(n - 1)] last <- chr[n] head <- paste(head, collapse = sep) # Write a or b. But a, b, or c. if (n > 2) { paste0(head, sep, final, " ", last) } else { paste0(head, " ", final, " ", last) } } # nocov end ================================================ FILE: R/compat-purrr.R ================================================ # nocov start - compat-purrr (last updated: rlang 0.3.2.9000) # This file serves as a reference for compatibility functions for # purrr. They are not drop-in replacements but allow a similar style # of programming. This is useful in cases where purrr is too heavy a # package to depend on. Please find the most recent version in rlang's # repository. map <- function(.x, .f, ...) { lapply(.x, .f, ...) } map_mold <- function(.x, .f, .mold, ...) { out <- vapply(.x, .f, .mold, ..., USE.NAMES = FALSE) names(out) <- names(.x) out } map_lgl <- function(.x, .f, ...) { map_mold(.x, .f, logical(1), ...) } map_int <- function(.x, .f, ...) { map_mold(.x, .f, integer(1), ...) } map_dbl <- function(.x, .f, ...) { map_mold(.x, .f, double(1), ...) } map_chr <- function(.x, .f, ...) { map_mold(.x, .f, character(1), ...) } map_cpl <- function(.x, .f, ...) 
{ map_mold(.x, .f, complex(1), ...) } walk <- function(.x, .f, ...) { map(.x, .f, ...) invisible(.x) } pluck <- function(.x, .f) { map(.x, `[[`, .f) } pluck_lgl <- function(.x, .f) { map_lgl(.x, `[[`, .f) } pluck_int <- function(.x, .f) { map_int(.x, `[[`, .f) } pluck_dbl <- function(.x, .f) { map_dbl(.x, `[[`, .f) } pluck_chr <- function(.x, .f) { map_chr(.x, `[[`, .f) } pluck_cpl <- function(.x, .f) { map_cpl(.x, `[[`, .f) } map2 <- function(.x, .y, .f, ...) { out <- mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE) if (length(out) == length(.x)) { set_names(out, names(.x)) } else { set_names(out, NULL) } } map2_lgl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "logical") } map2_int <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "integer") } map2_dbl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "double") } map2_chr <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "character") } map2_cpl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "complex") } args_recycle <- function(args) { lengths <- map_int(args, length) n <- max(lengths) stopifnot(all(lengths == 1L | lengths == n)) to_recycle <- lengths == 1L args[to_recycle] <- map(args[to_recycle], function(x) rep.int(x, n)) args } pmap <- function(.l, .f, ...) { args <- args_recycle(.l) do.call( "mapply", c( FUN = list(quote(.f)), args, MoreArgs = quote(list(...)), SIMPLIFY = FALSE, USE.NAMES = FALSE ) ) } probe <- function(.x, .p, ...) { if (is_logical(.p)) { stopifnot(length(.p) == length(.x)) .p } else { map_lgl(.x, .p, ...) } } keep <- function(.x, .f, ...) { .x[probe(.x, .f, ...)] } discard <- function(.x, .p, ...) { sel <- probe(.x, .p, ...) .x[is.na(sel) | !sel] } map_if <- function(.x, .p, .f, ...) { matches <- probe(.x, .p) .x[matches] <- map(.x[matches], .f, ...) 
.x } compact <- function(.x) { Filter(length, .x) } transpose <- function(.l) { inner_names <- names(.l[[1]]) if (is.null(inner_names)) { fields <- seq_along(.l[[1]]) } else { fields <- set_names(inner_names) } map(fields, function(i) { map(.l, .subset2, i) }) } every <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (!rlang::is_true(.p(.x[[i]], ...))) return(FALSE) } TRUE } some <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (rlang::is_true(.p(.x[[i]], ...))) return(TRUE) } FALSE } negate <- function(.p) { function(...) !.p(...) } reduce <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init) } reduce_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE) } accumulate <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init, accumulate = TRUE) } accumulate_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE, accumulate = TRUE) } detect <- function(.x, .f, ..., .right = FALSE, .p = is_true) { for (i in index(.x, .right)) { if (.p(.f(.x[[i]], ...))) { return(.x[[i]]) } } NULL } detect_index <- function(.x, .f, ..., .right = FALSE, .p = is_true) { for (i in index(.x, .right)) { if (.p(.f(.x[[i]], ...))) { return(i) } } 0L } index <- function(x, right = FALSE) { idx <- seq_along(x) if (right) { idx <- rev(idx) } idx } imap <- function(.x, .f, ...) { map2(.x, vec_index(.x), .f, ...) } vec_index <- function(x) { names(x) %||% seq_along(x) } # nocov end ================================================ FILE: R/compat-types-check.R ================================================ # nocov start --- r-lib/rlang compat-types-check # # Dependencies # ============ # # - compat-obj-type.R # # Changelog # ========= # # 2022-10-04: # - Added `check_name()` that forbids the empty string. # `check_string()` allows the empty string by default. 
# # 2022-09-28: # - Removed `what` arguments. # - Added `allow_na` and `allow_null` arguments. # - Added `allow_decimal` and `allow_infinite` arguments. # - Improved errors with absent arguments. # # # 2022-09-16: # - Unprefixed usage of rlang functions with `rlang::` to # avoid onLoad issues when called from rlang (#1482). # # 2022-08-11: # - Added changelog. # Scalars ----------------------------------------------------------------- check_bool <- function( x, ..., allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_bool(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } if (allow_na && identical(x, NA)) { return(invisible(NULL)) } } stop_input_type( x, c("`TRUE`", "`FALSE`"), ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } check_string <- function( x, ..., allow_empty = TRUE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { is_string <- .rlang_check_is_string( x, allow_empty = allow_empty, allow_na = allow_na, allow_null = allow_null ) if (is_string) { return(invisible(NULL)) } } stop_input_type( x, "a single string", ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } .rlang_check_is_string <- function(x, allow_empty, allow_na, allow_null) { if (is_string(x)) { if (allow_empty || !is_string(x, "")) { return(TRUE) } } if (allow_null && is_null(x)) { return(TRUE) } if (allow_na && (identical(x, NA) || identical(x, na_chr))) { return(TRUE) } FALSE } check_name <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { is_string <- .rlang_check_is_string( x, allow_empty = FALSE, allow_na = FALSE, allow_null = allow_null ) if (is_string) { return(invisible(NULL)) } } stop_input_type( x, "a valid name", ..., allow_na = FALSE, allow_null = allow_null, arg = arg, call = call ) } check_number_decimal <- function( x, ..., min 
= -Inf, max = Inf, allow_infinite = TRUE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { .rlang_types_check_number( x, ..., min = min, max = max, allow_decimal = TRUE, allow_infinite = allow_infinite, allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } check_number_whole <- function( x, ..., min = -Inf, max = Inf, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { .rlang_types_check_number( x, ..., min = min, max = max, allow_decimal = FALSE, allow_infinite = FALSE, allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } .rlang_types_check_number <- function( x, ..., min = -Inf, max = Inf, allow_decimal = FALSE, allow_infinite = FALSE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (allow_decimal) { what <- "a number" } else { what <- "a whole number" } .stop <- function(x, what, ...) { stop_input_type( x, what, ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } if (!missing(x)) { is_number <- is_number( x, allow_decimal = allow_decimal, allow_infinite = allow_infinite ) if (is_number) { if (min > -Inf && max < Inf) { what <- sprintf("a number between %s and %s", min, max) } else { what <- NULL } if (x < min) { what <- what %||% sprintf("a number larger than %s", min) .stop(x, what, ...) } if (x > max) { what <- what %||% sprintf("a number smaller than %s", max) .stop(x, what, ...) } return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } if ( allow_na && (identical(x, NA) || identical(x, na_dbl) || identical(x, na_int)) ) { return(invisible(NULL)) } } .stop(x, what, ...) 
} is_number <- function(x, allow_decimal = FALSE, allow_infinite = FALSE) { if (!typeof(x) %in% c("integer", "double")) { return(FALSE) } if (length(x) != 1) { return(FALSE) } if (is.na(x)) { return(FALSE) } if (!allow_decimal && !is_integerish(x)) { return(FALSE) } if (!allow_infinite && is.infinite(x)) { return(FALSE) } TRUE } check_symbol <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_symbol(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a symbol", ..., allow_null = allow_null, arg = arg, call = call ) } check_arg <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_symbol(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an argument name", ..., allow_null = allow_null, arg = arg, call = call ) } check_call <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_call(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a defused call", ..., allow_null = allow_null, arg = arg, call = call ) } check_environment <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_environment(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an environment", ..., allow_null = allow_null, arg = arg, call = call ) } check_function <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_function(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a function", ..., allow_null = allow_null, arg = arg, call = call ) } check_closure <- function( x, ..., allow_null = FALSE, arg = 
caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_closure(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an R function", ..., allow_null = allow_null, arg = arg, call = call ) } check_formula <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_formula(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a formula", ..., allow_null = allow_null, arg = arg, call = call ) } # Vectors ----------------------------------------------------------------- check_character <- function( x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env() ) { if (!missing(x)) { if (is_character(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a character vector", ..., allow_null = allow_null, arg = arg, call = call ) } # nocov end ================================================ FILE: R/conv.R ================================================ #' Specify the encoding of a string #' #' This is a convenient way to override the current encoding of a string. #' #' @inheritParams str_detect #' @param encoding Name of encoding. See [stringi::stri_enc_list()] #' for a complete list. #' @export #' @examples #' # Example from encoding?stringi::stringi #' x <- rawToChar(as.raw(177)) #' x #' str_conv(x, "ISO-8859-2") # Polish "a with ogonek" #' str_conv(x, "ISO-8859-1") # Plus-minus str_conv <- function(string, encoding) { check_string(encoding) copy_names(string, stri_conv(string, encoding, "UTF-8")) } ================================================ FILE: R/count.R ================================================ #' Count number of matches #' #' Counts the number of times `pattern` is found within each element #' of `string`. #' #' @inheritParams str_detect #' @param pattern Pattern to look for.
#' #' The default interpretation is a regular expression, as described in #' `vignette("regular-expressions")`. Use [regex()] for finer control of the #' matching behaviour. #' #' Match a fixed string (i.e. by comparing only bytes), using #' [fixed()]. This is fast, but approximate. Generally, #' for matching human text, you'll want [coll()] which #' respects character matching rules for the specified locale. #' #' Match character, word, line and sentence boundaries with #' [boundary()]. The empty string, `""`, is equivalent to #' `boundary("character")`. #' @return An integer vector the same length as `string`/`pattern`. #' @seealso [stringi::stri_count()] which this function wraps. #' #' [str_locate()]/[str_locate_all()] to locate position #' of matches #' #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_count(fruit, "a") #' str_count(fruit, "p") #' str_count(fruit, "e") #' str_count(fruit, c("a", "b", "p", "p")) #' #' str_count(c("a.", "...", ".a.a"), ".") #' str_count(c("a.", "...", ".a.a"), fixed(".")) str_count <- function(string, pattern = "") { check_lengths(string, pattern) out <- switch( type(pattern), empty = , bound = stri_count_boundaries(string, opts_brkiter = opts(pattern)), fixed = stri_count_fixed(string, pattern, opts_fixed = opts(pattern)), coll = stri_count_coll(string, pattern, opts_collator = opts(pattern)), regex = stri_count_regex(string, pattern, opts_regex = opts(pattern)) ) preserve_names_if_possible(string, pattern, out) } ================================================ FILE: R/data.R ================================================ #' Sample character vectors for practicing string manipulations #' #' `fruit` and `words` come from the `rcorpora` package #' written by Gabor Csardi; the data was collected by Darius Kazemi #' and made available at \url{https://github.com/dariusk/corpora}. #' `sentences` is a collection of "Harvard sentences" used for #' standardised testing of voice.
#' #' @format Character vectors. #' @name stringr-data #' @examples #' length(sentences) #' sentences[1:5] #' #' length(fruit) #' fruit[1:5] #' #' length(words) #' words[1:5] NULL #' @rdname stringr-data #' @format NULL "sentences" #' @rdname stringr-data #' @format NULL "fruit" #' @rdname stringr-data #' @format NULL "words" ================================================ FILE: R/detect.R ================================================ #' Detect the presence/absence of a match #' #' `str_detect()` returns a logical vector with `TRUE` for each element of #' `string` that matches `pattern` and `FALSE` otherwise. It's equivalent to #' `grepl(pattern, string)`. #' #' @param string Input vector. Either a character vector, or something #' coercible to one. #' @param pattern Pattern to look for. #' #' The default interpretation is a regular expression, as described in #' `vignette("regular-expressions")`. Use [regex()] for finer control of the #' matching behaviour. #' #' Match a fixed string (i.e. by comparing only bytes), using #' [fixed()]. This is fast, but approximate. Generally, #' for matching human text, you'll want [coll()] which #' respects character matching rules for the specified locale. #' #' You cannot match boundaries, including `""`, with this function. #' #' @param negate If `TRUE`, inverts the resulting boolean vector. #' @return A logical vector the same length as `string`/`pattern`.
#' @seealso [stringi::stri_detect()] which this function wraps, #' [str_subset()] for a convenient wrapper around #' `x[str_detect(x, pattern)]` #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_detect(fruit, "a") #' str_detect(fruit, "^a") #' str_detect(fruit, "a$") #' str_detect(fruit, "b") #' str_detect(fruit, "[aeiou]") #' #' # Also vectorised over pattern #' str_detect("aecfg", letters) #' #' # Returns TRUE if the pattern does NOT match #' str_detect(fruit, "^p", negate = TRUE) str_detect <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) out <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_detect_fixed( string, pattern, negate = negate, opts_fixed = opts(pattern) ), coll = stri_detect_coll( string, pattern, negate = negate, opts_collator = opts(pattern) ), regex = stri_detect_regex( string, pattern, negate = negate, opts_regex = opts(pattern) ) ) preserve_names_if_possible(string, pattern, out) } #' Detect the presence/absence of a match at the start/end #' #' `str_starts()` and `str_ends()` are special cases of [str_detect()] that #' only match at the beginning or end of a string, respectively. #' #' @inheritParams str_detect #' @param pattern Pattern with which the string starts or ends. #' #' The default interpretation is a regular expression, as described in #' [stringi::about_search_regex]. Control options with [regex()]. #' #' Match a fixed string (i.e. by comparing only bytes), using [fixed()]. This #' is fast, but approximate. Generally, for matching human text, you'll want #' [coll()] which respects character matching rules for the specified locale. #' #' @return A logical vector.
#' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_starts(fruit, "p") #' str_starts(fruit, "p", negate = TRUE) #' str_ends(fruit, "e") #' str_ends(fruit, "e", negate = TRUE) str_starts <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) out <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_startswith_fixed( string, pattern, negate = negate, opts_fixed = opts(pattern) ), coll = stri_startswith_coll( string, pattern, negate = negate, opts_collator = opts(pattern) ), regex = { pattern2 <- paste0("^(", pattern, ")") stri_detect_regex( string, pattern2, negate = negate, opts_regex = opts(pattern) ) } ) preserve_names_if_possible(string, pattern, out) } #' @rdname str_starts #' @export str_ends <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) out <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_endswith_fixed( string, pattern, negate = negate, opts_fixed = opts(pattern) ), coll = stri_endswith_coll( string, pattern, negate = negate, opts_collator = opts(pattern) ), regex = { pattern2 <- paste0("(", pattern, ")$") stri_detect_regex( string, pattern2, negate = negate, opts_regex = opts(pattern) ) } ) preserve_names_if_possible(string, pattern, out) } #' Detect a pattern in the same way as `SQL`'s `LIKE` and `ILIKE` operators #' #' @description #' `str_like()` and `str_ilike()` follow the conventions of the SQL `LIKE` #' and `ILIKE` operators, namely: #' #' * Must match the entire string. #' * `_` matches a single character (like `.`). #' * `%` matches any number of characters (like `.*`). #' * `\%` and `\_` match literal `%` and `_`. #' #' The difference between the two functions is their case-sensitivity: #' `str_like()` is case sensitive and `str_ilike()` is not. #' #' @note #' Prior to stringr 1.6.0, `str_like()` was incorrectly case-insensitive.
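The LIKE-to-regex translation described above (the extracted body of the internal `like_to_regex()` helper is incomplete in this extract) can be sketched as follows. `like_to_regex_sketch()` is a hypothetical stand-in built on base R's `gsub()`, not the package's internal helper:

```r
# Hypothetical sketch (not the package internal) of the SQL LIKE ->
# regex translation: `%` becomes `.*`, `_` becomes `.`, and the result
# is anchored with ^...$ so the pattern must match the entire string.
# Backslash-escaped `\%`/`\_` pass through and still match literally.
like_to_regex_sketch <- function(pattern) {
  out <- gsub("(?<!\\\\)%", ".*", pattern, perl = TRUE)  # % -> .*
  out <- gsub("(?<!\\\\)_", ".", out, perl = TRUE)       # _ -> .
  paste0("^", out, "$")                                  # whole-string match
}

like_to_regex_sketch("app%")    # "^app.*$"
like_to_regex_sketch("ba_ana")  # "^ba.ana$"
grepl(like_to_regex_sketch("%apple"), c("apple", "pineapple"))  # TRUE TRUE
```

The negative lookbehind `(?<!\\)` keeps escaped wildcards intact, which is why the resulting regex still matches a literal `%` or `_` for `\%`/`\_`.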
#' #' @inheritParams str_detect #' @param pattern A character vector containing a SQL "like" pattern. #' See above for details. #' @param ignore_case `r lifecycle::badge("deprecated")` #' @return A logical vector the same length as `string`. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_like(fruit, "app") #' str_like(fruit, "app%") #' str_like(fruit, "APP%") #' str_like(fruit, "ba_ana") #' str_like(fruit, "%apple") #' #' str_ilike(fruit, "app") #' str_ilike(fruit, "app%") #' str_ilike(fruit, "APP%") #' str_ilike(fruit, "ba_ana") #' str_ilike(fruit, "%apple") str_like <- function(string, pattern, ignore_case = deprecated()) { check_lengths(string, pattern) check_character(pattern) if (inherits(pattern, "stringr_pattern")) { cli::cli_abort( "{.arg pattern} must be a plain string, not a stringr modifier." ) } if (lifecycle::is_present(ignore_case)) { lifecycle::deprecate_warn( when = "1.6.0", what = "str_like(ignore_case)", details = c( "`str_like()` is always case sensitive.", "Use `str_ilike()` for case insensitive string matching." ) ) check_bool(ignore_case) if (ignore_case) { return(str_ilike(string, pattern)) } } pattern <- regex(like_to_regex(pattern), ignore_case = FALSE) out <- stri_detect_regex(string, pattern, opts_regex = opts(pattern)) preserve_names_if_possible(string, pattern, out) } #' @export #' @rdname str_like str_ilike <- function(string, pattern) { check_lengths(string, pattern) check_character(pattern) if (inherits(pattern, "stringr_pattern")) { cli::cli_abort(tr_( "{.arg pattern} must be a plain string, not a stringr modifier." 
)) } pattern <- regex(like_to_regex(pattern), ignore_case = TRUE) out <- stri_detect_regex(string, pattern, opts_regex = opts(pattern)) preserve_names_if_possible(string, pattern, out) } like_to_regex <- function(pattern) { converted <- stri_replace_all_regex( pattern, "(?= 2) { string <- c( string[seq2(1, n - 2)], stringi::stri_c(string[[n - 1]], last, string[[n]]) ) } stri_flatten(string, collapse = collapse) } #' @export #' @rdname str_flatten str_flatten_comma <- function(string, last = NULL, na.rm = FALSE) { check_string(last, allow_null = TRUE) check_bool(na.rm) # Remove comma if exactly two elements, and last uses Oxford comma if (length(string) == 2 && !is.null(last) && str_detect(last, "^,")) { last <- str_replace(last, "^,", "") } str_flatten(string, ", ", last = last, na.rm = na.rm) } ================================================ FILE: R/glue.R ================================================ #' Interpolation with glue #' #' @description #' These functions are wrappers around [glue::glue()] and [glue::glue_data()], #' which provide a powerful and elegant syntax for interpolating strings #' with `{}`. #' #' These wrappers provide a small set of the full options. Use `glue()` and #' `glue_data()` directly from glue for more control. #' #' @inheritParams glue::glue #' @return A character vector with same length as the longest input. #' @export #' @examples #' name <- "Fred" #' age <- 50 #' anniversary <- as.Date("1991-10-12") #' str_glue( #' "My name is {name}, ", #' "my age next year is {age + 1}, ", #' "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." 
#' ) #' #' # single braces can be inserted by doubling them #' str_glue("My name is {name}, not {{name}}.") #' #' # You can also use named arguments #' str_glue( #' "My name is {name}, ", #' "and my age next year is {age + 1}.", #' name = "Joe", #' age = 40 #' ) #' #' # `str_glue_data()` is useful in data pipelines #' mtcars %>% str_glue_data("{rownames(.)} has {hp} hp") str_glue <- function(..., .sep = "", .envir = parent.frame(), .trim = TRUE) { glue::glue(..., .sep = .sep, .envir = .envir, .trim = .trim) } #' @export #' @rdname str_glue str_glue_data <- function( .x, ..., .sep = "", .envir = parent.frame(), .na = "NA" ) { glue::glue_data( .x, ..., .sep = .sep, .envir = .envir, .na = .na ) } ================================================ FILE: R/interp.R ================================================ #' String interpolation #' #' @description #' `r lifecycle::badge("superseded")` #' #' `str_interp()` is superseded in favour of [str_glue()]. #' #' String interpolation is a useful way of specifying a character string which #' depends on values in a certain environment. It allows for string creation #' which is easier to read and write when compared to using e.g. #' [paste()] or [sprintf()]. The (template) string can #' include expression placeholders of the form `${expression}` or #' `$[format]{expression}`, where expressions are valid R expressions that #' can be evaluated in the given environment, and `format` is a format #' specification valid for use with [sprintf()]. #' #' @param string A template character string. This function is not vectorised: #' a character vector will be collapsed into a single string. #' @param env The environment in which to evaluate the expressions. #' @seealso [str_glue()] and [str_glue_data()] for alternative approaches to #' the same problem. #' @keywords internal #' @return An interpolated character string.
#' @author Stefan Milton Bache #' @export #' @examples #' #' # Using values from the environment, and some formats #' user_name <- "smbache" #' amount <- 6.656 #' account <- 1337 #' str_interp("User ${user_name} (account $[08d]{account}) has $$[.2f]{amount}.") #' #' # Nested brace pairs work inside expressions too, and any braces can be #' # placed outside the expressions. #' str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}") #' #' # Values can also come from a list #' str_interp( #' "One value, ${value1}, and then another, ${value2*2}.", #' list(value1 = 10, value2 = 20) #' ) #' #' # Or a data frame #' str_interp( #' "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", #' iris #' ) #' #' # Use a vector when the string is long: #' max_char <- 80 #' str_interp(c( #' "This particular line is so long that it is hard to write ", #' "without breaking the ${max_char}-char barrier!" #' )) str_interp <- function(string, env = parent.frame()) { check_character(string) string <- str_c(string, collapse = "") # Find expression placeholders matches <- interp_placeholders(string) # Determine if any placeholders were found. if (matches$indices[1] <= 0) { string } else { # Evaluate them to get the replacement strings. replacements <- eval_interp_matches(matches$matches, env) # Replace the expressions by their values and return. `regmatches<-`(string, list(matches$indices), FALSE, list(replacements)) } } #' Match String Interpolation Placeholders #' #' Given a character string a set of expression placeholders are matched. They #' are of the form \code{${...}} or optionally \code{$[f]{...}} where `f` #' is a valid format for [sprintf()]. #' #' @param string character: The string to be interpolated. #' #' @return list containing `indices` (regex match data) and `matches`, #' the string representations of matched expressions. 
#' #' @noRd #' @author Stefan Milton Bache interp_placeholders <- function(string, error_call = caller_env()) { # Find starting position of ${} or $[]{} placeholders. starts <- gregexpr("\\$(\\[.*?\\])?\\{", string)[[1]] # Return immediately if no matches are found. if (starts[1] <= 0) { return(list(indices = starts)) } # Break up the string in parts parts <- substr( rep(string, length(starts)), start = starts, stop = c(starts[-1L] - 1L, nchar(string)) ) # If there are nested placeholders, each part will not contain a full # placeholder in which case we report invalid string interpolation template. if (any(!grepl("\\$(\\[.*?\\])?\\{.+\\}", parts))) { cli::cli_abort( tr_("Invalid template string for interpolation."), call = error_call ) } # For each part, find the opening and closing braces. opens <- lapply(strsplit(parts, ""), function(v) which(v == "{")) closes <- lapply(strsplit(parts, ""), function(v) which(v == "}")) # Identify the positions within the parts of the matching closing braces. # These are the lengths of the placeholder matches. lengths <- mapply(match_brace, opens, closes) # Update the `starts` match data with the placeholder lengths. attr(starts, "match.length") <- lengths # Return both the indices (regex match data) and the actual placeholder # matches (as strings). list( indices = starts, matches = mapply(substr, starts, starts + lengths - 1, x = string) ) } #' Evaluate String Interpolation Matches #' #' The expression part of string interpolation matches are evaluated in a #' specified environment and formatted for replacement in the original string. #' Used internally by [str_interp()]. #' #' @param matches Match data #' #' @param env The environment in which to evaluate the expressions. #' #' @return A character vector of replacement strings.
#' #' @noRd #' @author Stefan Milton Bache eval_interp_matches <- function(matches, env, error_call = caller_env()) { # Extract expressions from the matches expressions <- extract_expressions(matches, error_call = error_call) # Evaluate them in the given environment values <- lapply( expressions, eval, envir = env, enclos = if (is.environment(env)) env else environment(env) ) # Find the formats to be used formats <- extract_formats(matches) # Format the values and return. mapply(sprintf, formats, values, SIMPLIFY = FALSE) } #' Extract Expression Objects from String Interpolation Matches #' #' An interpolation match object will contain both its wrapping \code{${ }} part #' and possibly a format. This extracts the expression parts and parses them to #' prepare them for evaluation. #' #' @param matches Match data #' #' @return list of R expressions #' #' @noRd #' @author Stefan Milton Bache extract_expressions <- function(matches, error_call = caller_env()) { # Parse function for text argument as first argument. parse_text <- function(text) { withCallingHandlers( parse(text = text), error = function(e) { cli::cli_abort( tr_("Failed to parse input {.str {text}}"), parent = e, call = error_call ) } ) } # string representation of the expressions (without the possible formats). strings <- gsub("\\$(\\[.+?\\])?\\{", "", matches) # Remove the trailing closing brace and parse. lapply(substr(strings, 1L, nchar(strings) - 1), parse_text) } #' Extract String Interpolation Formats from Matched Placeholders #' #' An expression placeholder for string interpolation may optionally contain a #' format valid for [sprintf()]. This function will extract such a format, #' defaulting to "s", the format for strings. #' #' @param matches Match data #' #' @return A character vector of format specifiers. #' #' @noRd #' @author Stefan Milton Bache extract_formats <- function(matches) { # Extract the optional format parts.
formats <- gsub("\\$(\\[(.+?)\\])?.*", "\\2", matches) # Use the string format "s" as default when not specified. paste0("%", ifelse(formats == "", "s", formats)) } #' Utility Function for Matching a Closing Brace #' #' Given positions of opening and closing braces `match_brace` identifies #' the closing brace matching the first opening brace. #' #' @param opening integer: Vector with positions of opening braces. #' #' @param closing integer: Vector with positions of closing braces. #' #' @return Integer with the position of the matching brace. #' #' @noRd #' @author Stefan Milton Bache match_brace <- function(opening, closing) { # maximum index for the matching closing brace max_close <- max(closing) # "path" for mapping opening and closing braces path <- numeric(max_close) # Set openings to 1, and closings to -1 path[opening[opening < max_close]] <- 1 path[closing] <- -1 # Cumulate the path ... cumpath <- cumsum(path) # ... and the first 0 after the first opening identifies the match. min(which(1:max_close > min(which(cumpath == 1)) & cumpath == 0)) } ================================================ FILE: R/length.R ================================================ #' Compute the length/width #' #' @description #' `str_length()` returns the number of codepoints in a string. These are #' the individual elements (which are often, but not always, letters) that #' can be extracted with [str_sub()]. #' #' `str_width()` returns how much space the string will occupy when printed #' in a fixed width font (i.e. when printed in the console). #' #' @inheritParams str_detect #' @return A numeric vector the same length as `string`. #' @seealso [stringi::stri_length()] which this function wraps.
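The cumulative-sum scan in `match_brace()` above can be exercised standalone. This is a re-derivation of the same idea for illustration (it does not call the internal helper):

```r
# Re-derivation of the match_brace() idea: walk the string, adding +1
# at each "{" and -1 at each "}". The first position where the running
# sum returns to 0 at a "}" closes the first opening brace, even with
# nesting in between.
chars <- strsplit("{a{b}c}d", "")[[1]]
path <- ifelse(chars == "{", 1L, ifelse(chars == "}", -1L, 0L))
cumpath <- cumsum(path)
min(which(path == -1L & cumpath == 0L))  # 7: the "}" matching the first "{"
```

For `"{a{b}c}d"` the running sums are 1, 1, 2, 2, 1, 1, 0, 0, so the scan correctly skips the inner `{b}` pair and reports position 7.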
#' @export #' @examples #' str_length(letters) #' str_length(NA) #' str_length(factor("abc")) #' str_length(c("i", "like", "programming", NA)) #' #' # Some characters, like emoji and Chinese characters (hanzi), are square #' # which means they take up the width of two Latin characters #' x <- c("\u6c49\u5b57", "\U0001f60a") #' str_view(x) #' str_width(x) #' str_length(x) #' #' # There are two ways of representing a u with an umlaut #' u <- c("\u00fc", "u\u0308") #' # They have the same width #' str_width(u) #' # But a different length #' str_length(u) #' # Because the second element is made up of a u + an accent #' str_sub(u, 1, 1) str_length <- function(string) { copy_names(string, stri_length(string)) } #' @export #' @rdname str_length str_width <- function(string) { copy_names(string, stri_width(string)) } ================================================ FILE: R/locate.R ================================================ #' Find location of match #' #' @description #' `str_locate()` returns the `start` and `end` position of the first match; #' `str_locate_all()` returns the `start` and `end` position of each match. #' #' Because the `start` and `end` values are inclusive, zero-length matches #' (e.g. `$`, `^`, `\\b`) will have an `end` that is smaller than `start`. #' #' @inheritParams str_count #' @returns #' * `str_locate()` returns an integer matrix with two columns and #' one row for each element of `string`. The first column, `start`, #' gives the position at the start of the match, and the second column, `end`, #' gives the position of the end. #' #' * `str_locate_all()` returns a list of integer matrices with the same #' length as `string`/`pattern`. The matrices have columns `start` and `end` #' as above, and one row for each match. #' @seealso #' [str_extract()] for a convenient way of extracting matches, #' [stringi::stri_locate()] for the underlying implementation.
#' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_locate(fruit, "$") #' str_locate(fruit, "a") #' str_locate(fruit, "e") #' str_locate(fruit, c("a", "b", "p", "p")) #' #' str_locate_all(fruit, "a") #' str_locate_all(fruit, "e") #' str_locate_all(fruit, c("a", "b", "p", "p")) #' #' # Find location of every character #' str_locate_all(fruit, "") str_locate <- function(string, pattern) { check_lengths(string, pattern) out <- switch( type(pattern), empty = , bound = stri_locate_first_boundaries(string, opts_brkiter = opts(pattern)), fixed = stri_locate_first_fixed( string, pattern, opts_fixed = opts(pattern) ), coll = stri_locate_first_coll( string, pattern, opts_collator = opts(pattern) ), regex = stri_locate_first_regex(string, pattern, opts_regex = opts(pattern)) ) preserve_names_if_possible(string, pattern, out) } #' @rdname str_locate #' @export str_locate_all <- function(string, pattern) { check_lengths(string, pattern) opts <- opts(pattern) out <- switch( type(pattern), empty = , bound = stri_locate_all_boundaries( string, omit_no_match = TRUE, opts_brkiter = opts ), fixed = stri_locate_all_fixed( string, pattern, omit_no_match = TRUE, opts_fixed = opts ), regex = stri_locate_all_regex( string, pattern, omit_no_match = TRUE, opts_regex = opts ), coll = stri_locate_all_coll( string, pattern, omit_no_match = TRUE, opts_collator = opts ) ) preserve_names_if_possible(string, pattern, out) } #' Switch location of matches to location of non-matches #' #' Invert a matrix of match locations to match the opposite of what was #' previously matched. 
#' #' @param loc matrix of match locations, as from [str_locate_all()] #' @return A numeric matrix giving locations of non-matches #' @export #' @examples #' numbers <- "1 and 2 and 4 and 456" #' num_loc <- str_locate_all(numbers, "[0-9]+")[[1]] #' str_sub(numbers, num_loc[, "start"], num_loc[, "end"]) #' #' text_loc <- invert_match(num_loc) #' str_sub(numbers, text_loc[, "start"], text_loc[, "end"]) invert_match <- function(loc) { cbind( start = c(0L, loc[, "end"] + 1L), end = c(loc[, "start"] - 1L, -1L) ) } ================================================ FILE: R/match.R ================================================ #' Extract components (capturing groups) from a match #' #' @description #' Extract any number of matches defined by unnamed, `(pattern)`, and #' named, `(?<name>pattern)` capture groups. #' #' Use a non-capturing group, `(?:pattern)`, if you need to override default #' operator precedence but don't want to capture the result. #' #' @inheritParams str_detect #' @param pattern Unlike other stringr functions, `str_match()` only supports #' regular expressions, as described in `vignette("regular-expressions")`. #' The pattern should contain at least one capturing group. #' @return #' * `str_match()`: a character matrix with the same number of rows as the #' length of `string`/`pattern`. The first column is the complete match, #' followed by one column for each capture group. The columns will be named #' if you use named capture groups, i.e. `(?<name>pattern)`. #' #' * `str_match_all()`: a list of the same length as `string`/`pattern` #' containing character matrices. Each matrix has columns as described above #' and one row for each match. #' #' @seealso [str_extract()] to extract the complete match, #' [stringi::stri_match()] for the underlying implementation.
#' @export #' @examples #' strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", #' "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", #' "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", #' "Home: 543.355.3679") #' phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" #' #' str_extract(strings, phone) #' str_match(strings, phone) #' #' # Extract/match all #' str_extract_all(strings, phone) #' str_match_all(strings, phone) #' #' # You can also name the groups to make further manipulation easier #' phone <- "(?<area>[2-9][0-9]{2})[- .](?<phone>[0-9]{3}[- .][0-9]{4})" #' str_match(strings, phone) #' #' x <- c("<a> <b>", "<a> <>", "<a>", "", NA) #' str_match(x, "<(.*?)> <(.*?)>") #' str_match_all(x, "<(.*?)>") #' #' str_extract(x, "<.*?>") #' str_extract_all(x, "<.*?>") str_match <- function(string, pattern) { check_lengths(string, pattern) if (type(pattern) != "regex") { cli::cli_abort(tr_("{.arg pattern} must be a regular expression.")) } out <- stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) preserve_names_if_possible(string, pattern, out) } #' @rdname str_match #' @export str_match_all <- function(string, pattern) { check_lengths(string, pattern) if (type(pattern) != "regex") { cli::cli_abort(tr_("{.arg pattern} must be a regular expression.")) } out <- stri_match_all_regex( string, pattern, omit_no_match = TRUE, opts_regex = opts(pattern) ) preserve_names_if_possible(string, pattern, out) } ================================================ FILE: R/modifiers.R ================================================ #' Control matching behaviour with modifier functions #' #' @description #' Modifier functions control the meaning of the `pattern` argument to #' stringr functions: #' #' * `boundary()`: Match boundaries between things. #' * `coll()`: Compare strings using standard Unicode collation rules. #' * `fixed()`: Compare literal bytes. #' * `regex()` (the default): Uses ICU regular expressions.
#' #' @param pattern Pattern to modify behaviour. #' @param ignore_case Should case differences be ignored in the match? #' For `fixed()`, this uses a simple algorithm which assumes a #' one-to-one mapping between upper and lower case letters. #' @return A stringr modifier object, i.e. a character vector with #' parent S3 class `stringr_pattern`. #' @name modifiers #' @examples #' pattern <- "a.b" #' strings <- c("abb", "a.b") #' str_detect(strings, pattern) #' str_detect(strings, fixed(pattern)) #' str_detect(strings, coll(pattern)) #' #' # coll() is useful for locale-aware case-insensitive matching #' i <- c("I", "\u0130", "i") #' i #' str_detect(i, fixed("i", TRUE)) #' str_detect(i, coll("i", TRUE)) #' str_detect(i, coll("i", TRUE, locale = "tr")) #' #' # Word boundaries #' words <- c("These are some words.") #' str_count(words, boundary("word")) #' str_split(words, " ")[[1]] #' str_split(words, boundary("word"))[[1]] #' #' # Regular expression variations #' str_extract_all("The Cat in the Hat", "[a-z]+") #' str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE)) #' #' str_extract_all("a\nb\nc", "^.") #' str_extract_all("a\nb\nc", regex("^.", multiline = TRUE)) #' #' str_extract_all("a\nb\nc", "a.") #' str_extract_all("a\nb\nc", regex("a.", dotall = TRUE)) NULL #' @export #' @rdname modifiers fixed <- function(pattern, ignore_case = FALSE) { pattern <- as_bare_character(pattern) check_bool(ignore_case) options <- stri_opts_fixed(case_insensitive = ignore_case) structure( pattern, options = options, class = c("stringr_fixed", "stringr_pattern", "character") ) } #' @export #' @rdname modifiers #' @param locale Locale to use for comparisons. See #' [stringi::stri_locale_list()] for all possible options. #' Defaults to "en" (English) to ensure that default behaviour is #' consistent across platforms. #' @param ... 
Other less frequently used arguments passed on to #' [stringi::stri_opts_collator()], #' [stringi::stri_opts_regex()], or #' [stringi::stri_opts_brkiter()] coll <- function(pattern, ignore_case = FALSE, locale = "en", ...) { pattern <- as_bare_character(pattern) check_bool(ignore_case) check_string(locale) options <- str_opts_collator( ignore_case = ignore_case, locale = locale, ... ) structure( pattern, options = options, class = c("stringr_coll", "stringr_pattern", "character") ) } str_opts_collator <- function( locale = "en", ignore_case = FALSE, strength = NULL, ... ) { strength <- strength %||% if (ignore_case) 2L else 3L stri_opts_collator( strength = strength, locale = locale, ... ) } # used for testing turkish_I <- function() { coll("I", ignore_case = TRUE, locale = "tr") } #' @export #' @rdname modifiers #' @param multiline If `TRUE`, `$` and `^` match #' the beginning and end of each line. If `FALSE`, the #' default, only match the start and end of the input. #' @param comments If `TRUE`, white space and comments beginning with #' `#` are ignored. Escape literal spaces with `\\ `. #' @param dotall If `TRUE`, `.` will also match line terminators. regex <- function( pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ... ) { pattern <- as_bare_character(pattern) check_bool(ignore_case) check_bool(multiline) check_bool(comments) check_bool(dotall) options <- stri_opts_regex( case_insensitive = ignore_case, multiline = multiline, comments = comments, dotall = dotall, ... ) structure( pattern, options = options, class = c("stringr_regex", "stringr_pattern", "character") ) } #' @param type Boundary type to detect. 
#' \describe{ #' \item{`character`}{Every character is a boundary.} #' \item{`line_break`}{Boundaries are places where it is acceptable to have #' a line break in the current locale.} #' \item{`sentence`}{The beginnings and ends of sentences are boundaries, #' using intelligent rules to avoid counting abbreviations #' ([details](https://www.unicode.org/reports/tr29/#Sentence_Boundaries)).} #' \item{`word`}{The beginnings and ends of words are boundaries.} #' } #' @param skip_word_none Ignore "words" that don't contain any characters #' or numbers - i.e. punctuation. Default `NA` will skip such "words" #' only when splitting on `word` boundaries. #' @export #' @rdname modifiers boundary <- function( type = c("character", "line_break", "sentence", "word"), skip_word_none = NA, ... ) { type <- arg_match(type) check_bool(skip_word_none, allow_na = TRUE) if (identical(skip_word_none, NA)) { skip_word_none <- type == "word" } options <- stri_opts_brkiter( type = type, skip_word_none = skip_word_none, ... 
) structure( NA_character_, options = options, class = c("stringr_boundary", "stringr_pattern", "character") ) } opts <- function(x) { if (identical(x, "")) { stri_opts_brkiter(type = "character") } else { attr(x, "options") } } type <- function(x, error_call = caller_env()) { UseMethod("type") } #' @export type.stringr_boundary <- function(x, error_call = caller_env()) { "bound" } #' @export type.stringr_regex <- function(x, error_call = caller_env()) { "regex" } #' @export type.stringr_coll <- function(x, error_call = caller_env()) { "coll" } #' @export type.stringr_fixed <- function(x, error_call = caller_env()) { "fixed" } #' @export type.character <- function(x, error_call = caller_env()) { if (any(is.na(x))) { cli::cli_abort( tr_("{.arg pattern} can not contain NAs."), call = error_call ) } if (identical(x, "")) "empty" else "regex" } #' @export type.default <- function(x, error_call = caller_env()) { if (inherits(x, "regex")) { # Fallback for rex return("regex") } cli::cli_abort( tr_( "{.arg pattern} must be a character vector, not {.obj_type_friendly {x}}." ), call = error_call ) } #' @export `[.stringr_pattern` <- function(x, i) { structure( NextMethod(), options = attr(x, "options"), class = class(x) ) } #' @export `[[.stringr_pattern` <- function(x, i) { structure( NextMethod(), options = attr(x, "options"), class = class(x) ) } as_bare_character <- function(x, call = caller_env()) { if (is.character(x) && !is.object(x)) { # All OK! return(x) } warn("Coercing `pattern` to a plain character vector.", call = call) as.character(x) } ================================================ FILE: R/pad.R ================================================ #' Pad a string to minimum width #' #' Pad a string to a fixed width, so that #' `str_length(str_pad(x, n))` is always greater than or equal to `n`. #' #' @inheritParams str_detect #' @param width Minimum width of padded strings. #' @param side Side on which padding character is added (left, right or both). 
#' @param pad Single padding character (default is a space). #' @param use_width If `FALSE`, use the length of the string instead of the #' width; see [str_width()]/[str_length()] for the difference. #' @return A character vector the same length as `string`/`width`/`pad`. #' @seealso [str_trim()] to remove whitespace; #' [str_trunc()] to decrease the maximum width of a string. #' @export #' @examples #' rbind( #' str_pad("hadley", 30, "left"), #' str_pad("hadley", 30, "right"), #' str_pad("hadley", 30, "both") #' ) #' #' # All arguments are vectorised except side #' str_pad(c("a", "abc", "abcdef"), 10) #' str_pad("a", c(5, 10, 20)) #' str_pad("a", 10, pad = c("-", "_", " ")) #' #' # Longer strings are returned unchanged #' str_pad("hadley", 3) str_pad <- function( string, width, side = c("left", "right", "both"), pad = " ", use_width = TRUE ) { vctrs::vec_size_common(string = string, width = width, pad = pad) side <- arg_match(side) check_bool(use_width) out <- switch( side, left = stri_pad_left(string, width, pad = pad, use_length = !use_width), right = stri_pad_right(string, width, pad = pad, use_length = !use_width), both = stri_pad_both(string, width, pad = pad, use_length = !use_width) ) # Preserve names unless `string` is recycled if (length(out) == length(string)) copy_names(string, out) else out } ================================================ FILE: R/remove.R ================================================ #' Remove matched patterns #' #' Remove matches, i.e. replace them with `""`. #' #' @inheritParams str_detect #' @return A character vector the same length as `string`/`pattern`. #' @seealso [str_replace()] for the underlying implementation.
#' @export #' @examples #' fruits <- c("one apple", "two pears", "three bananas") #' str_remove(fruits, "[aeiou]") #' str_remove_all(fruits, "[aeiou]") str_remove <- function(string, pattern) { str_replace(string, pattern, "") } #' @export #' @rdname str_remove str_remove_all <- function(string, pattern) { str_replace_all(string, pattern, "") } ================================================ FILE: R/replace.R ================================================ #' Replace matches with new text #' #' `str_replace()` replaces the first match; `str_replace_all()` replaces #' all matches. #' #' @inheritParams str_detect #' @param pattern Pattern to look for. #' #' The default interpretation is a regular expression, as described #' in [stringi::about_search_regex]. Control options with #' [regex()]. #' #' For `str_replace_all()` this can also be a named vector #' (`c(pattern1 = replacement1)`), in order to perform multiple replacements #' in each element of `string`. #' #' Match a fixed string (i.e. by comparing only bytes), using #' [fixed()]. This is fast, but approximate. Generally, #' for matching human text, you'll want [coll()] which #' respects character matching rules for the specified locale. #' #' You cannot match boundaries, including `""`, with this function. #' @param replacement The replacement value, usually a single string, #' but it can be a vector the same length as `string` or `pattern`. #' References of the form `\1`, `\2`, etc. will be replaced with #' the contents of the respective matched group (created by `()`). #' #' Alternatively, supply a function (or formula): it will be passed a single #' character vector and should return a character vector of the same length. #' #' To replace the complete string with `NA`, use #' `replacement = NA_character_`. #' @return A character vector the same length as #' `string`/`pattern`/`replacement`.
#' @seealso [str_replace_na()] to turn missing values into "NA"; #' [stringi::stri_replace()] for the underlying implementation. #' @export #' @examples #' fruits <- c("one apple", "two pears", "three bananas") #' str_replace(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", toupper) #' str_replace_all(fruits, "b", NA_character_) #' #' str_replace(fruits, "([aeiou])", "") #' str_replace(fruits, "([aeiou])", "\\1\\1") #' #' # Note that str_replace() is vectorised along text, pattern, and replacement #' str_replace(fruits, "[aeiou]", c("1", "2", "3")) #' str_replace(fruits, c("a", "e", "i"), "-") #' #' # If you want to apply multiple patterns and replacements to the same #' # string, pass a named vector to pattern. #' fruits %>% #' str_c(collapse = "---") %>% #' str_replace_all(c("one" = "1", "two" = "2", "three" = "3")) #' #' # Use a function for more sophisticated replacement. This example #' # replaces colour names with their hex values. 
#' colours <- str_c("\\b", colors(), "\\b", collapse="|") #' col2hex <- function(col) { #' rgb <- col2rgb(col) #' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255) #' } #' #' x <- c( #' "Roses are red, violets are blue", #' "My favourite colour is green" #' ) #' str_replace_all(x, colours, col2hex) str_replace <- function(string, pattern, replacement) { if (!missing(replacement) && is_replacement_fun(replacement)) { replacement <- as_function(replacement) return(str_transform(string, pattern, replacement)) } check_lengths(string, pattern, replacement) out <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_replace_first_fixed( string, pattern, replacement, opts_fixed = opts(pattern) ), coll = stri_replace_first_coll( string, pattern, replacement, opts_collator = opts(pattern) ), regex = stri_replace_first_regex( string, pattern, fix_replacement(replacement), opts_regex = opts(pattern) ) ) preserve_names_if_possible(string, pattern, out) } #' @export #' @rdname str_replace str_replace_all <- function(string, pattern, replacement) { if (!missing(replacement) && is_replacement_fun(replacement)) { replacement <- as_function(replacement) return(str_transform_all(string, pattern, replacement)) } if (!is.null(names(pattern))) { vec <- FALSE replacement <- unname(pattern) pattern[] <- names(pattern) } else { check_lengths(string, pattern, replacement) vec <- TRUE } out <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_replace_all_fixed( string, pattern, replacement, vectorize_all = vec, opts_fixed = opts(pattern) ), coll = stri_replace_all_coll( string, pattern, replacement, vectorize_all = vec, opts_collator = opts(pattern) ), regex = stri_replace_all_regex( string, pattern, fix_replacement(replacement), vectorize_all = vec, opts_regex = opts(pattern) ) ) preserve_names_if_possible(string, pattern, out) } is_replacement_fun <- function(x) { is.function(x) || is_formula(x) } fix_replacement 
<- function(x, error_call = caller_env()) { check_character(x, arg = "replacement", call = error_call) vapply(x, fix_replacement_one, character(1), USE.NAMES = FALSE) } fix_replacement_one <- function(x) { if (is.na(x)) { return(x) } chars <- str_split(x, "")[[1]] out <- character(length(chars)) escaped <- logical(length(chars)) in_escape <- FALSE for (i in seq_along(chars)) { escaped[[i]] <- in_escape char <- chars[[i]] if (in_escape) { # Escape character not printed previously so must include here if (char == "$") { out[[i]] <- "\\\\$" } else if (char >= "0" && char <= "9") { out[[i]] <- paste0("$", char) } else { out[[i]] <- paste0("\\", char) } in_escape <- FALSE } else { if (char == "$") { out[[i]] <- "\\$" } else if (char == "\\") { in_escape <- TRUE } else { out[[i]] <- char } } } # tibble::tibble(chars, out, escaped) paste0(out, collapse = "") } #' Turn NA into "NA" #' #' @inheritParams str_replace #' @param replacement A single string. #' @export #' @examples #' str_replace_na(c(NA, "abc", "def")) str_replace_na <- function(string, replacement = "NA") { check_string(replacement) copy_names(string, stri_replace_na(string, replacement)) } str_transform <- function(string, pattern, replacement) { loc <- str_locate(string, pattern) new <- replacement(str_sub(string, loc)) str_sub(string, loc, omit_na = TRUE) <- new string } str_transform_all <- function( string, pattern, replacement, error_call = caller_env() ) { locs <- str_locate_all(string, pattern) old <- str_sub_all(string, locs) # unchop list into a vector, apply replacement(), and then rechop back into # a list old_flat <- vctrs::list_unchop(old) if (length(old_flat) == 0) { # minor optimisation to avoid problems with the many replacement # functions that use paste new_flat <- character() } else { withCallingHandlers( new_flat <- replacement(old_flat), error = function(cnd) { cli::cli_abort( c( tr_("Failed to apply {.arg replacement} function."), i = tr_("It must accept a character vector of any 
length.") ), parent = cnd, call = error_call ) } ) } if (!is.character(new_flat)) { cli::cli_abort( tr_( "{.arg replacement} function must return a character vector, not {.obj_type_friendly {new_flat}}." ), call = error_call ) } if (length(new_flat) != length(old_flat)) { cli::cli_abort( tr_( "{.arg replacement} function must return a vector the same length as the input ({length(old_flat)}), not length {length(new_flat)}." ), call = error_call ) } idx <- chop_index(old) new <- vctrs::vec_chop(new_flat, idx) stringi::stri_sub_all(string, locs) <- new string } chop_index <- function(x) { ls <- lengths(x) start <- cumsum(c(1L, ls[-length(ls)])) end <- start + ls - 1L lapply(seq_along(ls), function(i) seq2(start[[i]], end[[i]])) } ================================================ FILE: R/sort.R ================================================ #' Order, rank, or sort a character vector #' #' * `str_sort()` returns the sorted vector. #' * `str_order()` returns an integer vector that returns the desired #' order when used for subsetting, i.e. `x[str_order(x)]` is the same #' as `str_sort()` #' * `str_rank()` returns the ranks of the values, i.e. #' `arrange(df, str_rank(x))` is the same as `str_sort(df$x)`. #' #' @param x A character vector to sort. #' @param decreasing A boolean. If `FALSE`, the default, sorts from #' lowest to highest; if `TRUE` sorts from highest to lowest. #' @param na_last Where should `NA` go? `TRUE` at the end, #' `FALSE` at the beginning, `NA` dropped. #' @param numeric If `TRUE`, will sort digits numerically, instead #' of as strings. #' @param ... Other options used to control collation. Passed on to #' [stringi::stri_opts_collator()]. #' @inheritParams coll #' @return A character vector the same length as `string`. #' @seealso [stringi::stri_order()] for the underlying implementation. 
#' @export #' @examples #' x <- c("apple", "car", "happy", "char") #' str_sort(x) #' #' str_order(x) #' x[str_order(x)] #' #' str_rank(x) #' #' # In Czech, ch is a digraph that sorts after h #' str_sort(x, locale = "cs") #' #' # Use numeric = TRUE to sort numbers in strings #' x <- c("100a10", "100a5", "2b", "2a") #' str_sort(x) #' str_sort(x, numeric = TRUE) str_order <- function( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) { check_bool(decreasing) check_bool(na_last, allow_na = TRUE) check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) stri_order( x, decreasing = decreasing, na_last = na_last, opts_collator = opts ) } #' @export #' @rdname str_order str_rank <- function(x, locale = "en", numeric = FALSE, ...) { check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) stri_rank(x, opts_collator = opts) } #' @export #' @rdname str_order str_sort <- function( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) { check_bool(decreasing) check_bool(na_last, allow_na = TRUE) check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) idx <- stri_order( x, decreasing = decreasing, na_last = na_last, opts_collator = opts ) x[idx] } ================================================ FILE: R/split.R ================================================ #' Split up a string into pieces #' #' @description #' This family of functions provides various ways of splitting a string up #' into pieces. These two functions return a character vector: #' #' * `str_split_1()` takes a single string and splits it into pieces, #' returning a single character vector. #' * `str_split_i()` splits each string in a character vector into pieces and #' extracts the `i`th value, returning a character vector. 
#' #' These two functions return a more complex object: #' #' * `str_split()` splits each string in a character vector into a varying #' number of pieces, returning a list of character vectors. #' * `str_split_fixed()` splits each string in a character vector into a #' fixed number of pieces, returning a character matrix. #' #' @inheritParams str_extract #' @param n Maximum number of pieces to return. Default (Inf) uses all #' possible split positions. #' #' For `str_split()`, this determines the maximum length of each element #' of the output. For `str_split_fixed()`, this determines the number of #' columns in the output; if an input is too short, the result will be padded #' with `""`. #' @return #' * `str_split_1()`: a character vector. #' * `str_split()`: a list the same length as `string`/`pattern` containing #' character vectors. #' * `str_split_fixed()`: a character matrix with `n` columns and the same #' number of rows as the length of `string`/`pattern`. #' * `str_split_i()`: a character vector the same length as `string`/`pattern`. #' @seealso [stringi::stri_split()] for the underlying implementation. 
#' @export #' @examples #' fruits <- c( #' "apples and oranges and pears and bananas", #' "pineapples and mangos and guavas" #' ) #' #' str_split(fruits, " and ") #' str_split(fruits, " and ", simplify = TRUE) #' #' # If you want to split a single string, use `str_split_1` #' str_split_1(fruits[[1]], " and ") #' #' # Specify n to restrict the number of possible matches #' str_split(fruits, " and ", n = 3) #' str_split(fruits, " and ", n = 2) #' # If n greater than number of pieces, no padding occurs #' str_split(fruits, " and ", n = 5) #' #' # Use fixed to return a character matrix #' str_split_fixed(fruits, " and ", 3) #' str_split_fixed(fruits, " and ", 4) #' #' # str_split_i extracts only a single piece from a string #' str_split_i(fruits, " and ", 1) #' str_split_i(fruits, " and ", 4) #' # use a negative number to select from the end #' str_split_i(fruits, " and ", -1) str_split <- function(string, pattern, n = Inf, simplify = FALSE) { check_lengths(string, pattern) check_positive_integer(n) check_bool(simplify, allow_na = TRUE) if (identical(n, Inf)) { n <- -1L } out <- switch( type(pattern), empty = stri_split_boundaries( string, n = n, simplify = simplify, opts_brkiter = opts(pattern) ), bound = stri_split_boundaries( string, n = n, simplify = simplify, opts_brkiter = opts(pattern) ), fixed = stri_split_fixed( string, pattern, n = n, simplify = simplify, opts_fixed = opts(pattern) ), regex = stri_split_regex( string, pattern, n = n, simplify = simplify, opts_regex = opts(pattern) ), coll = stri_split_coll( string, pattern, n = n, simplify = simplify, opts_collator = opts(pattern) ) ) preserve_names_if_possible(string, pattern, out) } #' @export #' @rdname str_split str_split_1 <- function(string, pattern) { check_string(string) str_split(string, pattern)[[1]] } #' @export #' @rdname str_split str_split_fixed <- function(string, pattern, n) { check_lengths(string, pattern) check_positive_integer(n) str_split(string, pattern, n = n, simplify = TRUE) } #' 
@export #' @rdname str_split #' @param i Element to return. Use a negative value to count from the #' right hand side. str_split_i <- function(string, pattern, i) { check_number_whole(i) if (i > 0) { out <- str_split(string, pattern, simplify = NA, n = i + 1) col <- out[, i] if (keep_names(string, pattern)) copy_names(string, col) else col } else if (i < 0) { i <- abs(i) pieces <- str_split(string, pattern) last <- function(x) { n <- length(x) if (i > n) { NA_character_ } else { x[[n + 1 - i]] } } out <- map_chr(pieces, last) preserve_names_if_possible(string, pattern, out) } else { cli::cli_abort(tr_("{.arg i} must not be 0.")) } } check_positive_integer <- function( x, arg = caller_arg(x), call = caller_env() ) { if (!identical(x, Inf)) { check_number_whole(x, min = 1, arg = arg, call = call) } } ================================================ FILE: R/stringr-package.R ================================================ #' @keywords internal "_PACKAGE" ## usethis namespace: start #' @import stringi #' @import rlang #' @importFrom glue glue #' @importFrom lifecycle deprecated ## usethis namespace: end NULL ================================================ FILE: R/sub.R ================================================ #' Get and set substrings using their positions #' #' `str_sub()` extracts or replaces the elements at a single position in each #' string. `str_sub_all()` allows you to extract strings at multiple elements #' in every string. #' #' @inheritParams str_detect #' @param start,end A pair of integer vectors defining the range of characters #' to extract (inclusive). Positive values count from the left of the string, #' and negative values count from the right. In other words, if `string` is #' `"abcdef"` then 1 refers to `"a"` and -1 refers to `"f"`. #' #' Alternatively, instead of a pair of vectors, you can pass a matrix to #' `start`. The matrix should have two columns, either labelled `start` #' and `end`, or `start` and `length`. 
This makes `str_sub()` work directly #' with the output from [str_locate()] and friends. #' #' @param omit_na Single logical value. If `TRUE`, missing values in any of the #' arguments provided will result in an unchanged input. #' @param value Replacement string. #' @return #' * `str_sub()`: A character vector the same length as `string`/`start`/`end`. #' * `str_sub_all()`: A list the same length as `string`. Each element is #' a character vector the same length as `start`/`end`. #' #' If `end` comes before `start` or `start` is outside the range of `string` #' then the corresponding output will be the empty string. #' @seealso The underlying implementation in [stringi::stri_sub()] #' @export #' @examples #' hw <- "Hadley Wickham" #' #' str_sub(hw, 1, 6) #' str_sub(hw, end = 6) #' str_sub(hw, 8, 14) #' str_sub(hw, 8) #' #' # Negative values index from end of string #' str_sub(hw, -1) #' str_sub(hw, -7) #' str_sub(hw, end = -7) #' #' # str_sub() is vectorised by both string and position #' str_sub(hw, c(1, 8), c(6, 14)) #' #' # if you want to extract multiple positions from multiple strings, #' # use str_sub_all() #' x <- c("abcde", "ghifgh") #' str_sub(x, c(1, 2), c(2, 4)) #' str_sub_all(x, start = c(1, 2), end = c(2, 4)) #' #' # Alternatively, you can pass in a two column matrix, as in the #' # output from str_locate_all #' pos <- str_locate_all(hw, "[aeio]")[[1]] #' pos #' str_sub(hw, pos) #' #' # You can also use `str_sub()` to modify strings: #' x <- "BBCDEF" #' str_sub(x, 1, 1) <- "A"; x #' str_sub(x, -1, -1) <- "K"; x #' str_sub(x, -2, -2) <- "GHIJ"; x #' str_sub(x, 2, -2) <- ""; x str_sub <- function(string, start = 1L, end = -1L) { vctrs::vec_size_common(string = string, start = start, end = end) out <- if (is.matrix(start)) { stri_sub(string, from = start) } else { stri_sub(string, from = start, to = end) } # Preserve names unless `string` is recycled if (length(out) == length(string)) copy_names(string, out) else out } #' @export #' @rdname str_sub 
"str_sub<-" <- function(string, start = 1L, end = -1L, omit_na = FALSE, value) { vctrs::vec_size_common( string = string, start = start, end = end, value = value ) if (is.matrix(start)) { stri_sub(string, from = start, omit_na = omit_na) <- value } else { stri_sub(string, from = start, to = end, omit_na = omit_na) <- value } string } #' @export #' @rdname str_sub str_sub_all <- function(string, start = 1L, end = -1L) { out <- if (is.matrix(start)) { stri_sub_all(string, from = start) } else { stri_sub_all(string, from = start, to = end) } copy_names(string, out) } ================================================ FILE: R/subset.R ================================================ #' Find matching elements #' #' @description #' `str_subset()` returns all elements of `string` where there's at least #' one match to `pattern`. It's a wrapper around `x[str_detect(x, pattern)]`, #' and is equivalent to `grep(pattern, x, value = TRUE)`. #' #' Use [str_extract()] to find the text of the match _within_ each string. #' #' @inheritParams str_detect #' @return A character vector, usually smaller than `string`. #' @seealso [grep()] with argument `value = TRUE`, #' [stringi::stri_subset()] for the underlying implementation.
#' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_subset(fruit, "a") #' #' str_subset(fruit, "^a") #' str_subset(fruit, "a$") #' str_subset(fruit, "b") #' str_subset(fruit, "[aeiou]") #' #' # Elements that don't match #' str_subset(fruit, "^p", negate = TRUE) #' #' # Missings never match #' str_subset(c("a", NA, "b"), ".") str_subset <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) idx <- switch( type(pattern), empty = no_empty(), bound = no_boundary(), fixed = str_detect(string, pattern, negate = negate), coll = str_detect(string, pattern, negate = negate), regex = str_detect(string, pattern, negate = negate) ) idx[is.na(idx)] <- FALSE string[idx] } #' Find matching indices #' #' `str_which()` returns the indices of `string` where there's at least #' one match to `pattern`. It's a wrapper around #' `which(str_detect(x, pattern))`, and is equivalent to `grep(pattern, x)`. #' #' @inheritParams str_detect #' @return An integer vector, usually smaller than `string`. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_which(fruit, "a") #' #' # Elements that don't match #' str_which(fruit, "^p", negate = TRUE) #' #' # Missings never match #' str_which(c("a", NA, "b"), ".") str_which <- function(string, pattern, negate = FALSE) { which(str_detect(string, pattern, negate = negate)) } ================================================ FILE: R/trim.R ================================================ #' Remove whitespace #' #' `str_trim()` removes whitespace from start and end of string; `str_squish()` #' removes whitespace at the start and end, and replaces all internal whitespace #' with a single space. #' #' @inheritParams str_detect #' @param side Side on which to remove whitespace: "left", "right", or #' "both", the default. #' @return A character vector the same length as `string`. 
#' @export #' @seealso [str_pad()] to add whitespace #' @examples #' str_trim("  String with trailing and leading white space\t") #' str_trim("\n\nString with trailing and leading white space\n\n") #' #' str_squish("  String with trailing, middle, and leading white space\t") #' str_squish("\n\nString with excess, trailing and leading white space\n\n") str_trim <- function(string, side = c("both", "left", "right")) { side <- arg_match(side) out <- switch( side, left = stri_trim_left(string), right = stri_trim_right(string), both = stri_trim_both(string) ) copy_names(string, out) } #' @export #' @rdname str_trim str_squish <- function(string) { copy_names(string, stri_trim_both(str_replace_all(string, "\\s+", " "))) } ================================================ FILE: R/trunc.R ================================================ #' Truncate a string to maximum width #' #' Truncate a string to a fixed number of characters, so that #' `str_length(str_trunc(x, n))` is always less than or equal to `n`. #' #' @inheritParams str_detect #' @param width Maximum width of string. #' @param side,ellipsis Location and content of ellipsis that indicates #' content has been removed. #' @return A character vector the same length as `string`. #' @seealso [str_pad()] to increase the minimum width of a string. #' @export #' @examples #' x <- "This string is moderately long" #' rbind( #' str_trunc(x, 20, "right"), #' str_trunc(x, 20, "left"), #' str_trunc(x, 20, "center") #' ) str_trunc <- function( string, width, side = c("right", "left", "center"), ellipsis = "..." ) { check_number_whole(width) side <- arg_match(side) check_string(ellipsis) len <- str_length(string) too_long <- !is.na(string) & len > width width... <- width - str_length(ellipsis) if (width... < 0) { cli::cli_abort( tr_( "`width` ({width}) is shorter than `ellipsis` ({str_length(ellipsis)}).
) ) } string[too_long] <- switch( side, right = str_c(str_sub(string[too_long], 1, width...), ellipsis), left = str_c( ellipsis, str_sub(string[too_long], len[too_long] - width... + 1, -1) ), center = str_c( str_sub(string[too_long], 1, ceiling(width... / 2)), ellipsis, str_sub(string[too_long], len[too_long] - floor(width... / 2) + 1, -1) ) ) string } ================================================ FILE: R/unique.R ================================================ #' Remove duplicated strings #' #' `str_unique()` removes duplicated values, with optional control over #' how duplication is measured. #' #' @inheritParams str_detect #' @inheritParams str_equal #' @return A character vector, usually shorter than `string`. #' @seealso [unique()], [stringi::stri_unique()] which this function wraps. #' @examples #' str_unique(c("a", "b", "c", "b", "a")) #' #' str_unique(c("a", "b", "c", "B", "A")) #' str_unique(c("a", "b", "c", "B", "A"), ignore_case = TRUE) #' #' # Use ... to pass additional arguments to stri_unique() #' str_unique(c("motley", "mötley", "pinguino", "pingüino")) #' str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1) #' @export str_unique <- function(string, locale = "en", ignore_case = FALSE, ...) { check_string(locale) check_bool(ignore_case) opts <- str_opts_collator( locale = locale, ignore_case = ignore_case, ... 
) keep <- !stringi::stri_duplicated(string, opts_collator = opts) string[keep] } ================================================ FILE: R/utils.R ================================================ #' Pipe operator #' #' @name %>% #' @rdname pipe #' @keywords internal #' @export #' @importFrom magrittr %>% #' @usage lhs \%>\% rhs NULL check_lengths <- function( string, pattern, replacement = NULL, error_call = caller_env() ) { # stringi already correctly recycles vectors of length 0 and 1 # we just want more stringent vctrs checks for other lengths vctrs::vec_size_common( string = string, pattern = pattern, replacement = replacement, .call = error_call ) } no_boundary <- function(call = caller_env()) { cli::cli_abort(tr_("{.arg pattern} can't be a boundary."), call = call) } no_empty <- function(call = caller_env()) { cli::cli_abort( tr_("{.arg pattern} can't be the empty string ({.code \"\"})."), call = call ) } tr_ <- function(...) { enc2utf8(gettext(paste0(...), domain = "R-stringr")) } # copy names from `string` to output, regardless of output type copy_names <- function(from, to) { nm <- names(from) if (is.null(nm)) { return(to) } if (is.matrix(to)) { rownames(to) <- nm to } else { set_names(to, nm) } } # keep names if pattern is scalar (i.e. vectorised) or same length as string. keep_names <- function(string, pattern) { length(pattern) == 1L || length(pattern) == length(string) } preserve_names_if_possible <- function(string, pattern, out) { if (keep_names(string, pattern)) { copy_names(string, out) } else { out } } ================================================ FILE: R/view.R ================================================ #' View strings and matches #' #' @description #' `str_view()` is used to print the underlying representation of a string and #' to see how a `pattern` matches. #' #' Matches are surrounded by `<>` and unusual whitespace (i.e. all whitespace #' apart from `" "` and `"\n"`) are surrounded by `{}` and escaped. 
Where #' possible, matches and unusual whitespace are coloured blue and `NA`s red. #' #' @inheritParams str_detect #' @param match If `pattern` is supplied, which elements should be shown? #' #' * `TRUE`, the default, shows only elements that match the pattern. #' * `NA` shows all elements. #' * `FALSE` shows only elements that don't match the pattern. #' #' If `pattern` is not supplied, all elements are always shown. #' @param html Use HTML output? If `TRUE` will create an HTML widget; if `FALSE` #' will style using ANSI escapes. #' @param use_escapes If `TRUE`, all non-ASCII characters will be rendered #' with unicode escapes. This is useful to see exactly what underlying #' values are stored in the string. #' @export #' @examples #' # Show special characters #' str_view(c("\"\\", "\\\\\\", "fgh", NA, "NA")) #' #' # A non-breaking space looks like a regular space: #' nbsp <- "Hi\u00A0you" #' nbsp #' # But it doesn't behave like one: #' str_detect(nbsp, " ") #' # So str_view() brings it to your attention with a blue background #' str_view(nbsp) #' #' # You can also use escapes to see all non-ASCII characters #' str_view(nbsp, use_escapes = TRUE) #' #' # Supply a pattern to see where it matches #' str_view(c("abc", "def", "fghi"), "[aeiou]") #' str_view(c("abc", "def", "fghi"), "^") #' str_view(c("abc", "def", "fghi"), "..") #' #' # By default, only matching strings will be shown #' str_view(c("abc", "def", "fghi"), "e") #' # but you can show all: #' str_view(c("abc", "def", "fghi"), "e", match = NA) #' # or just those that don't match: #' str_view(c("abc", "def", "fghi"), "e", match = FALSE) str_view <- function( string, pattern = NULL, match = TRUE, html = FALSE, use_escapes = FALSE ) { rec <- vctrs::vec_recycle_common(string = string, pattern = pattern) string <- rec$string pattern <- rec$pattern check_bool(match, allow_na = TRUE) check_bool(html) check_bool(use_escapes) filter <- str_view_filter(string, pattern, match) out <- string[filter] pattern <- 
pattern[filter] if (!is.null(pattern)) { out <- str_replace_all(out, pattern, str_view_highlighter(html)) } if (use_escapes) { out <- stri_escape_unicode(out) out <- str_replace_all(out, fixed("\\u001b"), "\u001b") } else { out <- str_view_special(out, html = html) } str_view_print(out, filter, html = html) } #' @rdname str_view #' @usage NULL #' @export str_view_all <- function( string, pattern = NULL, match = NA, html = FALSE, use_escapes = FALSE ) { lifecycle::deprecate_warn("1.5.0", "str_view_all()", "str_view()") str_view( string = string, pattern = pattern, match = match, html = html, use_escapes = use_escapes ) } str_view_filter <- function(x, pattern, match) { if (is.null(pattern) || inherits(pattern, "stringr_boundary")) { rep(TRUE, length(x)) } else { if (identical(match, TRUE)) { str_detect(x, pattern) & !is.na(x) } else if (identical(match, FALSE)) { !str_detect(x, pattern) | is.na(x) } else { rep(TRUE, length(x)) } } } # Helpers ----------------------------------------------------------------- str_view_highlighter <- function(html = TRUE) { if (html) { function(x) str_c("<span class='match'>", x, "</span>") } else { function(x) { out <- cli::col_cyan("<", x, ">") # Ensure styling starts and ends within each line out <- cli::ansi_strsplit(out, "\n", fixed = TRUE) out <- map_chr(out, str_flatten, "\n") out } } } str_view_special <- function(x, html = TRUE) { if (html) { replace <- function(x) str_c("<span class='special'>", x, "</span>") } else { replace <- function(x) { if (length(x) == 0) { return(character()) } cli::col_cyan("{", stri_escape_unicode(x), "}") } } # Highlight any non-standard whitespace characters str_replace_all(x, "[\\p{Whitespace}-- \n]+", replace) } str_view_print <- function(x, filter, html = TRUE) { if (html) { str_view_widget(x) } else { structure(x, id = which(filter), class = "stringr_view") } } str_view_widget <- function(lines) { check_installed(c("htmltools", "htmlwidgets")) lines <- str_replace_na(lines) bullets <- str_c( "
<ul>\n", str_c("
  <li>", lines, "
</li>", collapse = "\n"), "\n</ul>
" ) html <- htmltools::HTML(bullets) size <- htmlwidgets::sizingPolicy( knitr.figure = FALSE, defaultHeight = pmin(10 * length(lines), 300), knitr.defaultHeight = "100%" ) htmlwidgets::createWidget( "str_view", list(html = html), sizingPolicy = size, package = "stringr" ) } #' @export print.stringr_view <- function(x, ..., n = getOption("stringr.view_n", 20)) { n_extra <- length(x) - n if (n_extra > 0) { x <- x[seq_len(n)] } if (length(x) == 0) { cli::cli_inform(c(x = "Empty `string` provided.\n")) return(invisible(x)) } bar <- if (cli::is_utf8_output()) "\u2502" else "|" id <- format(paste0("[", attr(x, "id"), "] "), justify = "right") indent <- paste0(cli::col_grey(id, bar), " ") exdent <- paste0(strrep(" ", nchar(id[[1]])), cli::col_grey(bar), " ") x[is.na(x)] <- cli::col_red("NA") x <- paste0(indent, x) x <- str_replace_all(x, "\n", paste0("\n", exdent)) cat(x, sep = "\n") if (n_extra > 0) { cat("... and ", n_extra, " more\n", sep = "") } invisible(x) } #' @export `[.stringr_view` <- function(x, i, ...) { structure(NextMethod(), id = attr(x, "id")[i], class = "stringr_view") } ================================================ FILE: R/word.R ================================================ #' Extract words from a sentence #' #' @inheritParams str_detect #' @param start,end Pair of integer vectors giving range of words (inclusive) #' to extract. If negative, counts backwards from the last word. #' #' The default values select the first word. #' @param sep Separator between words. Defaults to a single space. #' @return A character vector with the same length as `string`/`start`/`end`.
#' @export #' @examples #' sentences <- c("Jane saw a cat", "Jane sat down") #' word(sentences, 1) #' word(sentences, 2) #' word(sentences, -1) #' word(sentences, 2, -1) #' #' # Also vectorised over start and end #' word(sentences[1], 1:3, -1) #' word(sentences[1], 1, 1:4) #' #' # Can define words by other separators #' str <- 'abc.def..123.4568.999' #' word(str, 1, sep = fixed('..')) #' word(str, 2, sep = fixed('..')) word <- function(string, start = 1L, end = start, sep = fixed(" ")) { args <- vctrs::vec_recycle_common(string = string, start = start, end = end) string <- args$string start <- args$start end <- args$end breaks <- str_locate_all(string, sep) words <- lapply(breaks, invert_match) # Convert negative values into actual positions len <- vapply(words, nrow, integer(1)) neg_start <- !is.na(start) & start < 0L start[neg_start] <- start[neg_start] + len[neg_start] + 1L neg_end <- !is.na(end) & end < 0L end[neg_end] <- end[neg_end] + len[neg_end] + 1L # Replace indexes past end with NA start[start > len] <- NA end[end > len] <- NA # To return all words when trying to extract more words than available start[start < 1L] <- 1 # Extract locations starts <- mapply(function(word, loc) word[loc, "start"], words, start) ends <- mapply(function(word, loc) word[loc, "end"], words, end) copy_names(string, str_sub(string, starts, ends)) } ================================================ FILE: R/wrap.R ================================================ #' Wrap words into nicely formatted paragraphs #' #' Wrap words into paragraphs, minimizing the "raggedness" of the lines #' (i.e. the variation in line length) using the Knuth-Plass algorithm. #' #' @inheritParams str_detect #' @param width Positive integer giving target line width (in number of #' characters). A width less than or equal to 1 will put each word on its #' own line. #' @param indent,exdent A non-negative integer giving the indent for the #' first line (`indent`) and all subsequent lines (`exdent`).
#' @param whitespace_only A boolean. #' * If `TRUE` (the default) wrapping will only occur at whitespace. #' * If `FALSE`, can break on any non-word character (e.g. `/`, `-`). #' @return A character vector the same length as `string`. #' @seealso [stringi::stri_wrap()] for the underlying implementation. #' @export #' @examples #' thanks_path <- file.path(R.home("doc"), "THANKS") #' thanks <- str_c(readLines(thanks_path), collapse = "\n") #' thanks <- word(thanks, 1, 3, fixed("\n\n")) #' cat(str_wrap(thanks), "\n") #' cat(str_wrap(thanks, width = 40), "\n") #' cat(str_wrap(thanks, width = 60, indent = 2), "\n") #' cat(str_wrap(thanks, width = 60, exdent = 2), "\n") #' cat(str_wrap(thanks, width = 0, exdent = 2), "\n") str_wrap <- function( string, width = 80, indent = 0, exdent = 0, whitespace_only = TRUE ) { check_number_decimal(width) if (width <= 0) { width <- 1 } check_number_whole(indent) check_number_whole(exdent) check_bool(whitespace_only) out <- stri_wrap( string, width = width, indent = indent, exdent = exdent, whitespace_only = whitespace_only, simplify = FALSE ) out <- vapply(out, str_c, collapse = "\n", character(1)) copy_names(string, out) } ================================================ FILE: README.Rmd ================================================ --- output: github_document --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" ) library(stringr) ``` # stringr
[![CRAN status](https://www.r-pkg.org/badges/version/stringr)](https://cran.r-project.org/package=stringr) [![R-CMD-check](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml) [![Codecov test coverage](https://codecov.io/gh/tidyverse/stringr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/stringr?branch=main) [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) ## Overview Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you're not familiar with strings, the best place to start is the [chapter on strings](https://r4ds.hadley.nz/strings) in R for Data Science. stringr is built on top of [stringi](https://github.com/gagolews/stringi), which uses the [ICU](https://icu.unicode.org) C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you've mastered stringr, you should find stringi similarly easy to use. 
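As a concrete example of that overlap, here is a stringr verb next to a roughly equivalent stringi call (an illustrative chunk, not taken from the package documentation; both functions are real exports of their packages):

```{r}
library(stringr)

# a stringr verb
str_detect("banana", "an")

# a roughly equivalent call using stringi directly
stringi::stri_detect_regex("banana", "an")
```

Both calls return `TRUE`; stringr mostly wraps stringi functions like this behind shorter, consistent names.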
## Installation ```r # The easiest way to get stringr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just stringr: install.packages("stringr") ``` ## Cheatsheet ## Usage All functions in stringr start with `str_` and take a vector of strings as the first argument: ```{r} x <- c("why", "video", "cross", "extra", "deal", "authority") str_length(x) str_c(x, collapse = ", ") str_sub(x, 1, 2) ``` Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression `"[aeiou]"` matches any single character that is a vowel: ```{r} str_subset(x, "[aeiou]") str_count(x, "[aeiou]") ``` There are eight main verbs that work with patterns: * `str_detect(x, pattern)` tells you if there's any match to the pattern: ```{r} str_detect(x, "[aeiou]") ``` * `str_count(x, pattern)` counts the number of matches: ```{r} str_count(x, "[aeiou]") ``` * `str_subset(x, pattern)` extracts the matching components: ```{r} str_subset(x, "[aeiou]") ``` * `str_locate(x, pattern)` gives the position of the match: ```{r} str_locate(x, "[aeiou]") ``` * `str_extract(x, pattern)` extracts the text of the match: ```{r} str_extract(x, "[aeiou]") ``` * `str_match(x, pattern)` extracts parts of the match defined by parentheses: ```{r} # extract the characters on either side of the vowel str_match(x, "(.)[aeiou](.)") ``` * `str_replace(x, pattern, replacement)` replaces the matches with new text: ```{r} str_replace(x, "[aeiou]", "?") ``` * `str_split(x, pattern)` splits up a string into multiple pieces: ```{r} str_split(c("a,b", "c,d,e"), ",") ``` As well as regular expressions (the default), there are three other pattern matching engines: * `fixed()`: match exact bytes * `coll()`: match human letters * `boundary()`: match boundaries ## RStudio Addin The [RegExplain RStudio addin](https://www.garrickadenbuie.com/project/regexplain/) provides a friendly interface for working with regular
expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions. This addin can easily be installed with devtools: ```r # install.packages("devtools") devtools::install_github("gadenbuie/regexplain") ``` ## Compared to base R R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. * Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe: ```{r} letters %>% .[1:10] %>% str_pad(3, "right") %>% str_c(letters[2:11]) ``` * Simplifies string operations by eliminating options that you don't need 95% of the time. * Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.
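These guarantees are easy to check interactively (an illustrative chunk, not part of the original README):

```{r}
library(stringr)

# missing inputs produce missing outputs
str_to_upper(c("a", NA))

# zero-length inputs produce zero-length outputs
str_length(character())
```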
Learn more in `vignette("from-base")` ================================================ FILE: README.md ================================================ # stringr [![CRAN status](https://www.r-pkg.org/badges/version/stringr)](https://cran.r-project.org/package=stringr) [![R-CMD-check](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml) [![Codecov test coverage](https://codecov.io/gh/tidyverse/stringr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/stringr?branch=main) [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) ## Overview Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the [chapter on strings](https://r4ds.hadley.nz/strings) in R for Data Science. stringr is built on top of [stringi](https://github.com/gagolews/stringi), which uses the [ICU](https://icu.unicode.org) C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use. 
## Installation ``` r # The easiest way to get stringr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just stringr: install.packages("stringr") ``` ## Cheatsheet ## Usage All functions in stringr start with `str_` and take a vector of strings as the first argument: ``` r x <- c("why", "video", "cross", "extra", "deal", "authority") str_length(x) #> [1] 3 5 5 5 4 9 str_c(x, collapse = ", ") #> [1] "why, video, cross, extra, deal, authority" str_sub(x, 1, 2) #> [1] "wh" "vi" "cr" "ex" "de" "au" ``` Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression `"[aeiou]"` matches any single character that is a vowel: ``` r str_subset(x, "[aeiou]") #> [1] "video" "cross" "extra" "deal" "authority" str_count(x, "[aeiou]") #> [1] 0 3 1 2 2 4 ``` There are eight main verbs that work with patterns: - `str_detect(x, pattern)` tells you if there’s any match to the pattern: ``` r str_detect(x, "[aeiou]") #> [1] FALSE TRUE TRUE TRUE TRUE TRUE ``` - `str_count(x, pattern)` counts the number of matches: ``` r str_count(x, "[aeiou]") #> [1] 0 3 1 2 2 4 ``` - `str_subset(x, pattern)` extracts the matching components: ``` r str_subset(x, "[aeiou]") #> [1] "video" "cross" "extra" "deal" "authority" ``` - `str_locate(x, pattern)` gives the position of the match: ``` r str_locate(x, "[aeiou]") #> start end #> [1,] NA NA #> [2,] 2 2 #> [3,] 3 3 #> [4,] 1 1 #> [5,] 2 2 #> [6,] 1 1 ``` - `str_extract(x, pattern)` extracts the text of the match: ``` r str_extract(x, "[aeiou]") #> [1] NA "i" "o" "e" "e" "a" ``` - `str_match(x, pattern)` extracts parts of the match defined by parentheses: ``` r # extract the characters on either side of the vowel str_match(x, "(.)[aeiou](.)") #> [,1] [,2] [,3] #> [1,] NA NA NA #> [2,] "vid" "v" "d" #> [3,] "ros" "r" "s" #> [4,] NA NA NA #> [5,] "dea" "d" "a" #> [6,] "aut" "a" "t" ``` - `str_replace(x, pattern, replacement)` replaces the
matches with new text: ``` r str_replace(x, "[aeiou]", "?") #> [1] "why" "v?deo" "cr?ss" "?xtra" "d?al" "?uthority" ``` - `str_split(x, pattern)` splits up a string into multiple pieces: ``` r str_split(c("a,b", "c,d,e"), ",") #> [[1]] #> [1] "a" "b" #> #> [[2]] #> [1] "c" "d" "e" ``` As well as regular expressions (the default), there are three other pattern matching engines: - `fixed()`: match exact bytes - `coll()`: match human letters - `boundary()`: match boundaries ## RStudio Addin The [RegExplain RStudio addin](https://www.garrickadenbuie.com/project/regexplain/) provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions. This addin can easily be installed with devtools: ``` r # install.packages("devtools") devtools::install_github("gadenbuie/regexplain") ``` ## Compared to base R R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. - Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe: ``` r letters %>% .[1:10] %>% str_pad(3, "right") %>% str_c(letters[2:11]) #> [1] "a b" "b c" "c d" "d e" "e f" "f g" "g h" "h i" "i j" "j k" ``` - Simplifies string operations by eliminating options that you don’t need 95% of the time. - Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.
Learn more in `vignette("from-base")` ================================================ FILE: _pkgdown.yml ================================================ url: https://stringr.tidyverse.org development: mode: auto template: package: tidytemplate bootstrap: 5 includes: in_header: | home: links: - text: Learn more at R4DS href: http://r4ds.hadley.nz/strings.html reference: - title: Pattern matching - subtitle: String contents: - str_count - str_detect - str_escape - str_extract - str_locate - str_match - str_replace - str_remove - str_split - str_starts - modifiers - subtitle: Vector desc: > Unlike other pattern matching functions, these functions operate on the original character vector, not the individual matches. contents: - str_subset - str_which - title: Combining strings contents: - str_c - str_flatten - str_glue - title: Character based contents: - str_dup - str_length - str_pad - str_sub - str_trim - str_trunc - str_wrap - title: Locale aware contents: - str_order - str_equal - case - str_unique - title: Other helpers contents: - invert_match - str_conv - str_like - str_replace_na - str_to_camel - str_view - word - title: Bundled data contents: - "`stringr-data`" news: releases: - text: "Version 1.6.0" href: https://tidyverse.org/blog/2025/11/stringr-1-6-0/ - text: "Version 1.5.0" href: https://www.tidyverse.org/blog/2022/12/stringr-1-5-0/ - text: "Version 1.4.0" href: https://www.tidyverse.org/articles/2019/02/stringr-1-4-0/ - text: "Version 1.3.0" href: https://www.tidyverse.org/articles/2018/02/stringr-1-3-0/ - text: "Version 1.2.0" href: https://blog.rstudio.com/2017/04/12/tidyverse-updates/ - text: "Version 1.1.0" href: https://blog.rstudio.com/2016/08/24/stringr-1-1-0/ - text: "Version 1.0.0" href: https://blog.rstudio.com/2015/05/05/stringr-1-0-0/ ================================================ FILE: air.toml ================================================ ================================================ FILE: codecov.yml 
================================================ comment: false coverage: status: project: default: target: auto threshold: 1% informational: true patch: default: target: auto threshold: 1% informational: true ================================================ FILE: cran-comments.md ================================================ ## R CMD check results 0 errors | 0 warnings | 0 notes ## revdepcheck results We checked 2390 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package. * We saw 9 new problems * We failed to check 2 packages We've been working with maintainers for over a month to get fixes to CRAN in a timely manner. You can track our efforts at . ================================================ FILE: data-raw/harvard-sentences.txt ================================================ The birch canoe slid on the smooth planks. Glue the sheet to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of steady work faced us. A large size in stockings is hard to sell. The boy was there when the sun rose. A rod is used to catch pink salmon. The source of the huge river is the clear spring. Kick the ball straight and follow through. Help the woman get back to her feet. A pot of tea helps to pass the evening. Smoky fires lack flame and heat. The soft cushion broke the man's fall. The salt breeze came across from the sea. The girl at the booth sold fifty bonds. The small pup gnawed a hole in the sock. The fish twisted and turned on the bent hook. Press the pants and sew a button on the vest. The swan dive was far short of perfect. The beauty of the view stunned the young boy. Two blue fish swam in the tank. Her purse was full of useless trash. The colt reared and threw the tall rider.
It snowed, rained, and hailed the same morning. Read verse out loud for pleasure. Hoist the load to your left shoulder. Take the winding path to reach the lake. Note closely the size of the gas tank. Wipe the grease off his dirty face. Mend the coat before you go out. The wrist was badly strained and hung limp. The stray cat gave birth to kittens. The young girl gave no clear response. The meal was cooked before the bell rang. What joy there is in living. A king ruled the state in the early days. The ship was torn apart on the sharp reef. Sickness kept him home the third week. The wide road shimmered in the hot sun. The lazy cow lay in the cool grass. Lift the square stone over the fence. The rope will bind the seven books at once. Hop over the fence and plunge in. The friendly gang left the drug store. Mesh wire keeps chicks inside. The frosty air passed through the coat. The crooked maze failed to fool the mouse. Adding fast leads to wrong sums. The show was a flop from the very start. A saw is a tool used for making boards. The wagon moved on well oiled wheels. March the soldiers past the next hill. A cup of sugar makes sweet fudge. Place a rosebush near the porch steps. Both lost their lives in the raging storm. We talked of the side show in the circus. Use a pencil to write the first draft. He ran half way to the hardware store. The clock struck to mark the third period. A small creek cut across the field. Cars and busses stalled in snow drifts. The set of china hit the floor with a crash. This is a grand season for hikes on the road. The dune rose from the edge of the water. Those words were the cue for the actor to leave. A yacht slid around the point into the bay. The two met while playing on the sand. The ink stain dried on the finished page. The walled town was seized without a fight. The lease ran out in sixteen weeks. A tame squirrel makes a nice pet. The horn of the car woke the sleeping cop. The heart beat strongly and with firm strokes. 
The pearl was worn in a thin silver ring. The fruit peel was cut in thick slices. The Navy attacked the big task force. See the cat glaring at the scared mouse. There are more than two factors here. The hat brim was wide and too droopy. The lawyer tried to lose his case. The grass curled around the fence post. Cut the pie into large parts. Men strive but seldom get rich. Always close the barn door tight. He lay prone and hardly moved a limb. The slush lay deep along the street. A wisp of cloud hung in the blue air. A pound of sugar costs more than eggs. The fin was sharp and cut the clear water. The play seems dull and quite stupid. Bail the boat to stop it from sinking. The term ended in late june that year. A Tusk is used to make costly gifts. Ten pins were set in order. The bill was paid every third week. Oak is strong and also gives shade. Cats and Dogs each hate the other. The pipe began to rust while new. Open the crate but don't break the glass. Add the sum to the product of these three. Thieves who rob friends deserve jail. The ripe taste of cheese improves with age. Act on these orders with great speed. The hog crawled under the high fence. Move the vat over the hot fire. The bark of the pine tree was shiny and dark. Leaves turn brown and yellow in the fall. The pennant waved when the wind blew. Split the log with a quick, sharp blow. Burn peat after the logs give out. He ordered peach pie with ice cream. Weave the carpet on the right hand side. Hemp is a weed found in parts of the tropics. A lame back kept his score low. We find joy in the simplest things. Type out three lists of orders. The harder he tried the less he got done. The boss ran the show with a watchful eye. The cup cracked and spilled its contents. Paste can cleanse the most dirty brass. The slang word for raw whiskey is booze. It caught its hind paw in a rusty trap. The wharf could be seen at the farther shore. Feel the heat of the weak dying flame. The tiny girl took off her hat. 
A cramp is no small danger on a swim. He said the same phrase thirty times. Pluck the bright rose without leaves. Two plus seven is less than ten. The glow deepened in the eyes of the sweet girl. Bring your problems to the wise chief. Write a fond note to the friend you cherish. Clothes and lodging are free to new men. We frown when events take a bad turn. Port is a strong wine with a smoky taste. The young kid jumped the rusty gate. Guess the result from the first scores. A salt pickle tastes fine with ham. The just claim got the right verdict. Those thistles bend in a high wind. Pure bred poodles have curls. The tree top waved in a graceful way. The spot on the blotter was made by green ink. Mud was spattered on the front of his white shirt. The cigar burned a hole in the desk top. The empty flask stood on the tin tray. A speedy man can beat this track mark. He broke a new shoelace that day. The coffee stand is too high for the couch. The urge to write short stories is rare. The pencils have all been used. The pirates seized the crew of the lost ship. We tried to replace the coin but failed. She sewed the torn coat quite neatly. The sofa cushion is red and of light weight. The jacket hung on the back of the wide chair. At that high level the air is pure. Drop the two when you add the figures. A filing case is now hard to buy. An abrupt start does not win the prize. Wood is best for making toys and blocks. The office paint was a dull, sad tan. He knew the skill of the great young actress. A rag will soak up spilled water. A shower of dirt fell from the hot pipes. Steam hissed from the broken valve. The child almost hurt the small dog. There was a sound of dry leaves outside. The sky that morning was clear and bright blue. Torn scraps littered the stone floor. Sunday is the best part of the week. The doctor cured him with these pills. The new girl was fired today at noon. They felt gay when the ship arrived in port. Add the store's account to the last cent. 
Acid burns holes in wool cloth. Fairy tales should be fun to write. Eight miles of woodland burned to waste. The third act was dull and tired the players. A young child should not suffer fright. Add the column and put the sum here. We admire and love a good cook. There the flood mark is ten inches. He carved a head from the round block of marble. She has a smart way of wearing clothes. The fruit of a fig tree is apple shaped. Corn cobs can be used to kindle a fire. Where were they when the noise started. The paper box is full of thumb tacks. Sell your gift to a buyer at a good gain. The tongs lay beside the ice pail. The petals fall with the next puff of wind. Bring your best compass to the third class. They could laugh although they were sad. Farmers came in to thresh the oat crop. The brown house was on fire to the attic. The lure is used to catch trout and flounder. Float the soap on top of the bath water. A blue crane is a tall wading bird. A fresh start will work such wonders. The club rented the rink for the fifth night. After the dance, they went straight home. The hostess taught the new maid to serve. He wrote his last novel there at the inn. Even the worst will beat his low score. The cement had dried when he moved it. The loss of the second ship was hard to take. The fly made its way along the wall. Do that with a wooden stick. Live wires should be kept covered. The large house had hot water taps. It is hard to erase blue or red ink. Write at once or you may forget it. The doorknob was made of bright clean brass. The wreck occurred by the bank on Main Street. A pencil with black lead writes best. Coax a young calf to drink from a bucket. Schools for ladies teach charm and grace. The lamp shone with a steady green flame. They took the axe and the saw to the forest. The ancient coin was quite dull and worn. The shaky barn fell with a loud crash. Jazz and swing fans like fast music. Rake the rubbish up and then burn it. 
Slash the gold cloth into fine ribbons. Try to have the court decide the case. They are pushed back each time they attack. He broke his ties with groups of former friends. They floated on the raft to sun their white backs. The map had an X that meant nothing. Whitings are small fish caught in nets. Some ads serve to cheat buyers. Jerk the rope and the bell rings weakly. A waxed floor makes us lose balance. Madam, this is the best brand of corn. On the islands the sea breeze is soft and mild. The play began as soon as we sat down. This will lead the world to more sound and fury. Add salt before you fry the egg. The rush for funds reached its peak Tuesday. The birch looked stark white and lonesome. The box is held by a bright red snapper. To make pure ice, you freeze water. The first worm gets snapped early. Jump the fence and hurry up the bank. Yell and clap as the curtain slides back. They are men who walk the middle of the road. Both brothers wear the same size. In some form or other we need fun. The prince ordered his head chopped off. The houses are built of red clay bricks. Ducks fly north but lack a compass. Fruit flavors are used in fizz drinks. These pills do less good than others. Canned pears lack full flavor. The dark pot hung in the front closet. Carry the pail to the wall and spill it there. The train brought our hero to the big town. We are sure that one war is enough. Gray paint stretched for miles around. The rude laugh filled the empty room. High seats are best for football fans. Tea served from the brown jug is tasty. A dash of pepper spoils beef stew. A zestful food is the hot-cross bun. The horse trotted around the field at a brisk pace. Find the twin who stole the pearl necklace. Cut the cord that binds the box tightly. The red tape bound the smuggled food. Look in the corner to find the tan shirt. The cold drizzle will halt the bond drive. Nine men were hired to dig the ruins. The junk yard had a mouldy smell. 
The flint sputtered and lit a pine torch. Soak the cloth and drown the sharp odor. The shelves were bare of both jam or crackers. A joy to every child is the swan boat. All sat frozen and watched the screen. A cloud of dust stung his tender eyes. To reach the end he needs much courage. Shape the clay gently into block form. A ridge on a smooth surface is a bump or flaw. Hedge apples may stain your hands green. Quench your thirst, then eat the crackers. Tight curls get limp on rainy days. The mute muffled the high tones of the horn. The gold ring fits only a pierced ear. The old pan was covered with hard fudge. Watch the log float in the wide river. The node on the stalk of wheat grew daily. The heap of fallen leaves was set on fire. Write fast if you want to finish early. His shirt was clean but one button was gone. The barrel of beer was a brew of malt and hops. Tin cans are absent from store shelves. Slide the box into that empty space. The plant grew large and green in the window. The beam dropped down on the workman's head. Pink clouds floated with the breeze. She danced like a swan, tall and graceful. The tube was blown and the tire flat and useless. It is late morning on the old wall clock. Let's all join as we sing the last chorus. The last switch cannot be turned off. The fight will end in just six minutes. The store walls were lined with colored frocks. The peace league met to discuss their plans. The rise to fame of a person takes luck. Paper is scarce, so write with much care. The quick fox jumped on the sleeping cat. The nozzle of the fire hose was bright brass. Screw the round cap on as tight as needed. Time brings us many changes. The purple tie was ten years old. Men think and plan and sometimes act. Fill the ink jar with sticky glue. He smoke a big pipe with strong contents. We need grain to keep our mules healthy. Pack the records in a neat thin case. The crunch of feet in the snow was the only sound. The copper bowl shone in the sun's rays. 
Boards will warp unless kept dry. The plush chair leaned against the wall. Glass will clink when struck by metal. Bathe and relax in the cool green grass. Nine rows of soldiers stood in a line. The beach is dry and shallow at low tide. The idea is to sew both edges straight. The kitten chased the dog down the street. Pages bound in cloth make a book. Try to trace the fine lines of the painting. Women form less than half of the group. The zones merge in the central part of town. A gem in the rough needs work to polish. Code is used when secrets are sent. Most of the news is easy for us to hear. He used the lathe to make brass objects. The vane on top of the pole revolved in the wind. Mince pie is a dish served to children. The clan gathered on each dull night. Let it burn, it gives us warmth and comfort. A castle built from sand fails to endure. A child's wit saved the day for us. Tack the strip of carpet to the worn floor. Next Tuesday we must vote. Pour the stew from the pot into the plate. Each penny shone like new. The man went to the woods to gather sticks. The dirt piles were lines along the road. The logs fell and tumbled into the clear stream. Just hoist it up and take it away. A ripe plum is fit for a king's palate. Our plans right now are hazy. Brass rings are sold by these natives. It takes a good trap to capture a bear. Feed the white mouse some flower seeds. The thaw came early and freed the stream. He took the lead and kept it the whole distance. The key you designed will fit the lock. Plead to the council to free the poor thief. Better hash is made of rare beef. This plank was made for walking on . The lake sparkled in the red hot sun. He crawled with care along the ledge. Tend the sheep while the dog wanders. It takes a lot of help to finish these. Mark the spot with a sign painted red. Take two shares as a fair profit. The fur of cats goes by many names. North winds bring colds and fevers. He asks no person to vouch for him. 
Go now and come here later. A sash of gold silk will trim her dress. Soap can wash most dirt away. That move means the game is over. He wrote down a long list of items. A siege will crack the strong defense. Grape juice and water mix well. Roads are paved with sticky tar. Fake stones shine but cost little. The drip of the rain made a pleasant sound. Smoke poured out of every crack. Serve the hot rum to the tired heroes. Much of the story makes good sense. The sun came up to light the eastern sky. Heave the line over the port side. A lathe cuts and trims any wood. It's a dense crowd in two distinct ways. His hip struck the knee of the next player. The stale smell of old beer lingers. The desk was firm on the shaky floor. It takes heat to bring out the odor. Beef is scarcer than some lamb. Raise the sail and steer the ship northward. A cone costs five cents on Mondays. A pod is what peas always grow in. Jerk that dart from the cork target. No cement will hold hard wood. We now have a new base for shipping. A list of names is carved around the base. The sheep were led home by a dog. Three for a dime, the young peddler cried. The sense of smell is better than that of touch. No hardship seemed to make him sad. Grace makes up for lack of beauty. Nudge gently but wake her now. The news struck doubt into restless minds. Once we stood beside the shore. A chink in the wall allowed a draft to blow. Fasten two pins on each side. A cold dip restores health and zest. He takes the oath of office each March. The sand drifts over the sills of the old house. The point of the steel pen was bent and twisted. There is a lag between thought and act. Seed is needed to plant the spring corn. Draw the chart with heavy black lines. The boy owed his pal thirty cents. The chap slipped into the crowd and was lost. Hats are worn to tea and not to dinner. The ramp led up to the wide highway. Beat the dust from the rug onto the lawn. Say it slowly but make it ring clear. 
The straw nest housed five robins. Screen the porch with woven straw mats. This horse will nose his way to the finish. The dry wax protects the deep scratch. He picked up the dice for a second roll. These coins will be needed to pay his debt. The nag pulled the frail cart along. Twist the valve and release hot steam. The vamp of the shoe had a gold buckle. The smell of burned rags itches my nose. New pants lack cuffs and pockets. The marsh will freeze when cold enough. They slice the sausage thin with a knife. The bloom of the rose lasts a few days. A gray mare walked before the colt. Breakfast buns are fine with a hot drink. Bottles hold four kinds of rum. The man wore a feather in his felt hat. He wheeled the bike past the winding road. Drop the ashes on the worn old rug. The desk and both chairs were painted tan. Throw out the used paper cup and plate. A clean neck means a neat collar. The couch cover and hall drapes were blue. The stems of the tall glasses cracked and broke. The wall phone rang loud and often. The clothes dried on a thin wooden rack. Turn out the lantern which gives us light. The cleat sank deeply into the soft turf. The bills were mailed promptly on the tenth of the month. To have is better than to wait and hope. The price is fair for a good antique clock. The music played on while they talked. Dispense with a vest on a day like this. The bunch of grapes was pressed into wine. He sent the figs, but kept the ripe cherries. The hinge on the door creaked with old age. The screen before the fire kept in the sparks. Fly by night and you waste little time. Thick glasses helped him read the print. Birth and death marks the limits of life. The chair looked strong but had no bottom. The kite flew wildly in the high wind. A fur muff is stylish once more. The tin box held priceless stones. We need an end of all such matter. The case was puzzling to the old and wise. The bright lanterns were gay on the dark lawn. We don't get much money but we have fun. 
The youth drove with zest, but little skill. Five years he lived with a shaggy dog. A fence cuts through the corner lot. The way to save money is not to spend much. Shut the hatch before the waves push it in. The odor of spring makes young hearts jump. Crack the walnut with your sharp side teeth. He offered proof in the form of a large chart. Send the stuff in a thick paper bag. A quart of milk is water for the most part. They told wild tales to frighten him. The three story house was built of stone. In the rear of the ground floor was a large passage. A man in a blue sweater sat at the desk. Oats are a food eaten by horse and man. Their eyelids droop for want of sleep. A sip of tea revives his tired friend. There are many ways to do these things. Tuck the sheet under the edge of the mat. A force equal to that would move the earth. We like to see clear weather. The work of the tailor is seen on each side. Take a chance and win a china doll. Shake the dust from your shoes, stranger. She was kind to sick old people. The square wooden crate was packed to be shipped. The dusty bench stood by the stone wall. We dress to suit the weather of most days. Smile when you say nasty words. A bowl of rice is free with chicken stew. The water in this well is a source of good health. Take shelter in this tent, but keep still. That guy is the writer of a few banned books. The little tales they tell are false. The door was barred, locked, and bolted as well. Ripe pears are fit for a queen's table. A big wet stain was on the round carpet. The kite dipped and swayed, but stayed aloft. The pleasant hours fly by much too soon. The room was crowded with a wild mob. This strong arm shall shield your honor. She blushed when he gave her a white orchid. The beetle droned in the hot June sun. Press the pedal with your left foot. Neat plans fail without luck. The black trunk fell from the landing. The bank pressed for payment of the debt. The theft of the pearl pin was kept secret. 
Shake hands with this friendly child. The vast space stretched into the far distance. A rich farm is rare in this sandy waste. His wide grin earned many friends. Flax makes a fine brand of paper. Hurdle the pit with the aid of a long pole. A strong bid may scare your partner stiff. Even a just cause needs power to win. Peep under the tent and see the clowns. The leaf drifts along with a slow spin. Cheap clothes are flashy but don't last. A thing of small note can cause despair. Flood the mails with requests for this book. A thick coat of black paint covered all. The pencil was cut to be sharp at both ends. Those last words were a strong statement. He wrote his name boldly at the top of the sheet. Dill pickles are sour but taste fine. Down that road is the way to the grain farmer. Either mud or dust are found at all times. The best method is to fix it in place with clips. If you mumble your speech will be lost. At night the alarm roused him from a deep sleep. Read just what the meter says. Fill your pack with bright trinkets for the poor. The small red neon lamp went out. Clams are small, round, soft, and tasty. The fan whirled its round blades softly. The line where the edges join was clean. Breathe deep and smell the piny air. It matters not if he reads these words or those. A brown leather bag hung from its strap. A toad and a frog are hard to tell apart. A white silk jacket goes with any shoes. A break in the dam almost caused a flood. Paint the sockets in the wall dull green. The child crawled into the dense grass. Bribes fail where honest men work. Trample the spark, else the flames will spread. The hilt of the sword was carved with fine designs. A round hole was drilled through the thin board. Footprints showed the path he took up the beach. She was waiting at my front lawn. A vent near the edge brought in fresh air. Prod the old mule with a crooked stick. It is a band of steel three inches wide. The pipe ran almost the length of the ditch. 
It was hidden from sight by a mass of leaves and shrubs. The weight of the package was seen on the high scale. Wake and rise, and step into the green outdoors. The green light in the brown box flickered. The brass tube circled the high wall. The lobes of her ears were pierced to hold rings. Hold the hammer near the end to drive the nail. Next Sunday is the twelfth of the month. Every word and phrase he speaks is true. He put his last cartridge into the gun and fired. They took their kids from the public school. Drive the screw straight into the wood. Keep the hatch tight and the watch constant. Sever the twine with a quick snip of the knife. Paper will dry out when wet. Slide the catch back and open the desk. Help the weak to preserve their strength. A sullen smile gets few friends. Stop whistling and watch the boys march. Jerk the cord, and out tumbles the gold. Slide the tray across the glass top. The cloud moved in a stately way and was gone. Light maple makes for a swell room. Set the piece here and say nothing. Dull stories make her laugh. A stiff cord will do to fasten your shoe. Get the trust fund to the bank early. Choose between the high road and the low. A plea for funds seems to come again. He lent his coat to the tall gaunt stranger. There is a strong chance it will happen once more. The duke left the park in a silver coach. Greet the new guests and leave quickly. When the frost has come it is time for turkey. Sweet words work better than fierce. A thin stripe runs down the middle. A six comes up more often than a ten. Lush ferns grow on the lofty rocks. The ram scared the school children off. The team with the best timing looks good. The farmer swapped his horse for a brown ox. Sit on the perch and tell the others what to do. A steep trail is painful for our feet. The early phase of life moves fast. Green moss grows on the northern side. Tea in thin china has a sweet taste. Pitch the straw through the door of the stable. 
The latch on the back gate needed a nail. The goose was brought straight from the old market. The sink is the thing in which we pile dishes. A whiff of it will cure the most stubborn cold. The facts don't always show who is right. She flaps her cape as she parades the street. The loss of the cruiser was a blow to the fleet. Loop the braid to the left and then over. Plead with the lawyer to drop the lost cause. Calves thrive on tender spring grass. Post no bills on this office wall. Tear a thin sheet from the yellow pad. A cruise in warm waters in a sleek yacht is fun. A streak of color ran down the left edge. It was done before the boy could see it. Crouch before you jump or miss the mark. Pack the kits and don't forget the salt. The square peg will settle in the round hole. Fine soap saves tender skin. Poached eggs and tea must suffice. Bad nerves are jangled by a door slam. Ship maps are different from those for planes. Dimes showered down from all sides. They sang the same tunes at each party. The sky in the west is tinged with orange red. The pods of peas ferment in bare fields. The horse balked and threw the tall rider. The hitch between the horse and cart broke. Pile the coal high in the shed corner. A gold vase is both rare and costly. The knife was hung inside its bright sheath. The rarest spice comes from the far East. The roof should be tilted at a sharp slant. A smatter of French is worse than none. The mule trod the treadmill day and night. The aim of the contest is to raise a great fund. To send it now in large amounts is bad. There is a fine hard tang in salty air. Cod is the main business of the north shore. The slab was hewn from heavy blocks of slate. Dunk the stale biscuits into strong drink. Hang tinsel from both branches. Cap the jar with a tight brass cover. The poor boy missed the boat again. Be sure to set that lamp firmly in the hole. Pick a card and slip it under the pack. A round mat will cover the dull spot. 
The first part of the plan needs changing. A good book informs of what we ought to know. The mail comes in three batches per day. You cannot brew tea in a cold pot. Dots of light betrayed the black cat. Put the chart on the mantel and tack it down. The night shift men rate extra pay. The red paper brightened the dim stage. See the player scoot to third base. Slide the bill between the two leaves. Many hands help get the job done. We don't like to admit our small faults. No doubt about the way the wind blows. Dig deep in the earth for pirate's gold. The steady drip is worse than a drenching rain. A flat pack takes less luggage space. Green ice frosted the punch bowl. A stuffed chair slipped from the moving van. The stitch will serve but needs to be shortened. A thin book fits in the side pocket. The gloss on top made it unfit to read. The hail pattered on the burnt brown grass. Seven seals were stamped on great sheets. Our troops are set to strike heavy blows. The store was jammed before the sale could start. It was a bad error on the part of the new judge. One step more and the board will collapse. Take the match and strike it against your shoe. The pot boiled but the contents failed to jell. The baby puts his right foot in his mouth. The bombs left most of the town in ruins. Stop and stare at the hard working man. The streets are narrow and full of sharp turns. The pup jerked the leash as he saw a feline shape. Open your book to the first page. Fish evade the net and swim off. Dip the pail once and let it settle. Will you please answer that phone. The big red apple fell to the ground. The curtain rose and the show was on. The young prince became heir to the throne. He sent the boy on a short errand. Leave now and you will arrive on time. The corner store was robbed last night. A gold ring will please most any girl. The long journey home took a year. She saw a cat in the neighbor's house. A pink shell was found on the sandy beach. Small children came to see him. 
The grass and bushes were wet with dew. The blind man counted his old coins. A severe storm tore down the barn. She called his name many times. When you hear the bell, come quickly.
================================================
FILE: data-raw/samples.R
================================================
library(rvest) # provides read_html(), html_elements(), html_text() and re-exports the pipe

words <- rcorpora::corpora("words/common")$commonWords
fruit <- rcorpora::corpora("foods/fruits")$fruits

html <- read_html("https://harvardsentences.com")
html %>%
  html_elements("li") %>%
  html_text() %>%
  iconv(to = "ASCII//translit") %>%
  writeLines("data-raw/harvard-sentences.txt")
sentences <- readr::read_lines("data-raw/harvard-sentences.txt")

usethis::use_data(words, overwrite = TRUE)
usethis::use_data(fruit, overwrite = TRUE)
usethis::use_data(sentences, overwrite = TRUE)
================================================
FILE: inst/htmlwidgets/lib/str_view.css
================================================
.str_view ul {
  font-size: 16px;
}

.str_view ul,
.str_view li {
  list-style: none;
  padding: 0;
  margin: 0.5em 0;
}

.str_view .match {
  border: 1px solid #ccc;
  background-color: #eee;
  border-color: #ccc;
  border-radius: 3px;
}

.str_view .special {
  background-color: red;
}
================================================
FILE: inst/htmlwidgets/str_view.js
================================================
HTMLWidgets.widget({
  name: 'str_view',
  type: 'output',

  initialize: function(el, width, height) {
  },

  renderValue: function(el, x, instance) {
    el.innerHTML = x.html;
  },

  resize: function(el, width, height, instance) {
  }
});
================================================
FILE: inst/htmlwidgets/str_view.yaml
================================================
dependencies:
  - name: str_view
    version: 0.1.0
    src: htmlwidgets/lib/
    stylesheet: str_view.css
================================================
FILE: man/case.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/case.R
\name{case}
\alias{case}
\alias{str_to_upper}
\alias{str_to_lower}
\alias{str_to_title}
\alias{str_to_sentence}
\title{Convert string to upper case, lower case, title case, or sentence case}
\usage{
str_to_upper(string, locale = "en")

str_to_lower(string, locale = "en")

str_to_title(string, locale = "en")

str_to_sentence(string, locale = "en")
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{locale}{Locale to use for comparisons. See
\code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options.
Defaults to "en" (English) to ensure that default behaviour is consistent
across platforms.}
}
\value{
A character vector the same length as \code{string}.
}
\description{
\itemize{
\item \code{str_to_upper()} converts to upper case.
\item \code{str_to_lower()} converts to lower case.
\item \code{str_to_title()} converts to title case, where only the first letter of
each word is capitalized.
\item \code{str_to_sentence()} converts to sentence case, where only the first
letter of the sentence is capitalized.
}
}
\examples{
dog <- "The quick brown dog"
str_to_upper(dog)
str_to_lower(dog)
str_to_title(dog)
str_to_sentence("the quick brown dog")

# Locale matters!
str_to_upper("i") # English
str_to_upper("i", "tr") # Turkish
}
================================================
FILE: man/invert_match.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/locate.R
\name{invert_match}
\alias{invert_match}
\title{Switch location of matches to location of non-matches}
\usage{
invert_match(loc)
}
\arguments{
\item{loc}{matrix of match locations, as from \code{\link[=str_locate_all]{str_locate_all()}}}
}
\value{
A numeric matrix giving the locations of the non-matches
}
\description{
Invert a matrix of match locations to match the opposite of what was
previously matched.
}
\examples{
numbers <- "1 and 2 and 4 and 456"
num_loc <- str_locate_all(numbers, "[0-9]+")[[1]]
str_sub(numbers, num_loc[, "start"], num_loc[, "end"])

text_loc <- invert_match(num_loc)
str_sub(numbers, text_loc[, "start"], text_loc[, "end"])
}
================================================
FILE: man/modifiers.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/modifiers.R
\name{modifiers}
\alias{modifiers}
\alias{fixed}
\alias{coll}
\alias{regex}
\alias{boundary}
\title{Control matching behaviour with modifier functions}
\usage{
fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(
  pattern,
  ignore_case = FALSE,
  multiline = FALSE,
  comments = FALSE,
  dotall = FALSE,
  ...
)

boundary(
  type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA,
  ...
)
}
\arguments{
\item{pattern}{Pattern to modify behaviour.}

\item{ignore_case}{Should case differences be ignored in the match?
For \code{fixed()}, this uses a simple algorithm which assumes a
one-to-one mapping between upper and lower case letters.}

\item{locale}{Locale to use for comparisons. See
\code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options.
Defaults to "en" (English) to ensure that default behaviour is consistent
across platforms.}

\item{...}{Other less frequently used arguments passed on to
\code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}},
\code{\link[stringi:stri_opts_regex]{stringi::stri_opts_regex()}}, or
\code{\link[stringi:stri_opts_brkiter]{stringi::stri_opts_brkiter()}}}

\item{multiline}{If \code{TRUE}, \code{$} and \code{^} match the beginning and end of
each line. If \code{FALSE}, the default, only match the start and end of the
input.}

\item{comments}{If \code{TRUE}, white space and comments beginning with \verb{#} are
ignored.
Escape literal spaces with \verb{\\\\ }.}

\item{dotall}{If \code{TRUE}, \code{.} will also match line terminators.}

\item{type}{Boundary type to detect.
\describe{
\item{\code{character}}{Every character is a boundary.}
\item{\code{line_break}}{Boundaries are places where it is acceptable to have
a line break in the current locale.}
\item{\code{sentence}}{The beginnings and ends of sentences are boundaries,
using intelligent rules to avoid counting abbreviations
(\href{https://www.unicode.org/reports/tr29/#Sentence_Boundaries}{details}).}
\item{\code{word}}{The beginnings and ends of words are boundaries.}
}}

\item{skip_word_none}{Ignore "words" that don't contain any characters or
numbers - i.e. punctuation. Default \code{NA} will skip such "words" only when
splitting on \code{word} boundaries.}
}
\value{
A stringr modifier object, i.e. a character vector with parent S3 class
\code{stringr_pattern}.
}
\description{
Modifier functions control the meaning of the \code{pattern} argument to
stringr functions:
\itemize{
\item \code{boundary()}: Match boundaries between things.
\item \code{coll()}: Compare strings using standard Unicode collation rules.
\item \code{fixed()}: Compare literal bytes.
\item \code{regex()} (the default): Uses ICU regular expressions.
}
}
\examples{
pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
str_detect(strings, fixed(pattern))
str_detect(strings, coll(pattern))

# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
str_detect(i, fixed("i", TRUE))
str_detect(i, coll("i", TRUE))
str_detect(i, coll("i", TRUE, locale = "tr"))

# Word boundaries
words <- c("These are some words.")
str_count(words, boundary("word"))
str_split(words, " ")[[1]]
str_split(words, boundary("word"))[[1]]

# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
str_extract_all("a\nb\nc", "^.")
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
str_extract_all("a\nb\nc", "a.")
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
}
================================================
FILE: man/pipe.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\description{
Pipe operator
}
\keyword{internal}
================================================
FILE: man/str_c.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/c.R
\name{str_c}
\alias{str_c}
\title{Join multiple strings into one string}
\usage{
str_c(..., sep = "", collapse = NULL)
}
\arguments{
\item{...}{One or more character vectors.
\code{NULL}s are removed; scalar inputs (vectors of length 1) are recycled to
the common length of vector inputs.

Like most other R functions, missing values are "infectious": whenever a
missing value is combined with another string the result will always be
missing.
Use \code{\link[dplyr:coalesce]{dplyr::coalesce()}} or
\code{\link[=str_replace_na]{str_replace_na()}} to convert to the desired
value.}

\item{sep}{String to insert between input vectors.}

\item{collapse}{Optional string used to combine the output into a single
string. Generally better to use \code{\link[=str_flatten]{str_flatten()}} if
you need this behaviour.}
}
\value{
If \code{collapse = NULL} (the default) a character vector with length equal
to the longest input. If \code{collapse} is a string, a character vector of
length 1.
}
\description{
\code{str_c()} combines multiple character vectors into a single character
vector. It's very similar to \code{\link[=paste0]{paste0()}} but uses
tidyverse recycling and \code{NA} rules.

One way to understand how \code{str_c()} works is to picture a 2d matrix of
strings, where each argument forms a column. \code{sep} is inserted between
each column, and then each row is combined together into a single string.
If \code{collapse} is set, it's inserted between each row, and then the
result is again combined, this time into a single string.
}
\examples{
str_c("Letter: ", letters)
str_c("Letter", letters, sep = ": ")
str_c(letters, " is for", "...")
str_c(letters[-26], " comes before ", letters[-1])

str_c(letters, collapse = "")
str_c(letters, collapse = ", ")

# Differences from paste() ----------------------
# Missing inputs give missing outputs
str_c(c("a", NA, "b"), "-d")
paste0(c("a", NA, "b"), "-d")
# Use str_replace_na() to display literal NAs:
str_c(str_replace_na(c("a", NA, "b")), "-d")

# Uses tidyverse recycling rules
\dontrun{str_c(1:2, 1:3)} # errors
paste0(1:2, 1:3)

str_c("x", character())
paste0("x", character())
}
================================================
FILE: man/str_conv.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/conv.R
\name{str_conv}
\alias{str_conv}
\title{Specify the encoding of a string}
\usage{
str_conv(string, encoding)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{encoding}{Name of encoding. See \code{\link[stringi:stri_enc_list]{stringi::stri_enc_list()}}
for a complete list.}
}
\description{
This is a convenient way to override the current encoding of a string.
}
\examples{
# Example from encoding?stringi::stringi
x <- rawToChar(as.raw(177))
x
str_conv(x, "ISO-8859-2") # Polish "a with ogonek"
str_conv(x, "ISO-8859-1") # Plus-minus
}
================================================
FILE: man/str_count.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/count.R
\name{str_count}
\alias{str_count}
\title{Count number of matches}
\usage{
str_count(string, pattern = "")
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{pattern}{Pattern to look for.

The default interpretation is a regular expression, as described in
\code{vignette("regular-expressions")}.
Use \code{\link[=regex]{regex()}} for finer control of the matching
behaviour.

Match a fixed string (i.e. by comparing only bytes), using
\code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally,
for matching human text, you'll want \code{\link[=coll]{coll()}} which
respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with
\code{\link[=boundary]{boundary()}}. The empty string, \verb{""}, is
equivalent to \code{boundary("character")}.}
}
\value{
An integer vector the same length as \code{string}/\code{pattern}.
}
\description{
Counts the number of times \code{pattern} is found within each element of
\code{string}.
}
\examples{
fruit <- c("apple", "banana", "pear", "pineapple")
str_count(fruit, "a")
str_count(fruit, "p")
str_count(fruit, "e")
str_count(fruit, c("a", "b", "p", "p"))

str_count(c("a.", "...", ".a.a"), ".")
str_count(c("a.", "...", ".a.a"), fixed("."))
}
\seealso{
\code{\link[stringi:stri_count]{stringi::stri_count()}} which this function wraps.

\code{\link[=str_locate]{str_locate()}}/\code{\link[=str_locate_all]{str_locate_all()}}
to locate the position of matches
}
================================================
FILE: man/str_detect.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/detect.R
\name{str_detect}
\alias{str_detect}
\title{Detect the presence/absence of a match}
\usage{
str_detect(string, pattern, negate = FALSE)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{pattern}{Pattern to look for.

The default interpretation is a regular expression, as described in
\code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}}
for finer control of the matching behaviour.

Match a fixed string (i.e. by comparing only bytes), using
\code{\link[=fixed]{fixed()}}. This is fast, but approximate.
Generally, for matching human text, you'll want
\code{\link[=coll]{coll()}} which respects character matching rules for
the specified locale.

You cannot match boundaries, including \code{""}, with this function.}

\item{negate}{If \code{TRUE}, inverts the resulting boolean vector.}
}
\value{
A logical vector the same length as \code{string}/\code{pattern}.
}
\description{
\code{str_detect()} returns a logical vector with \code{TRUE} for each element of
\code{string} that matches \code{pattern} and \code{FALSE} otherwise. It's equivalent
to \code{grepl(pattern, string)}.
}
\examples{
fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, "a")
str_detect(fruit, "^a")
str_detect(fruit, "a$")
str_detect(fruit, "b")
str_detect(fruit, "[aeiou]")

# Also vectorised over pattern
str_detect("aecfg", letters)

# Returns TRUE if the pattern does NOT match
str_detect(fruit, "^p", negate = TRUE)
}
\seealso{
\code{\link[stringi:stri_detect]{stringi::stri_detect()}} which this function wraps,
\code{\link[=str_subset]{str_subset()}} for a convenient wrapper around
\code{x[str_detect(x, pattern)]}
}
================================================
FILE: man/str_dup.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dup.R
\name{str_dup}
\alias{str_dup}
\title{Duplicate a string}
\usage{
str_dup(string, times, sep = NULL)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{times}{Number of times to duplicate each string.}

\item{sep}{String to insert between each duplicate.}
}
\value{
A character vector the same length as \code{string}/\code{times}.
}
\description{
\code{str_dup()} duplicates the characters within a string, e.g.
\code{str_dup("xy", 3)} returns \code{"xyxyxy"}.
}
\examples{
fruit <- c("apple", "pear", "banana")
str_dup(fruit, 2)
str_dup(fruit, 2, sep = " ")
str_dup(fruit, 1:3)
str_c("ba", str_dup("na", 0:5))
}
================================================
FILE: man/str_equal.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/equal.R
\name{str_equal}
\alias{str_equal}
\title{Determine if two strings are equivalent}
\usage{
str_equal(x, y, locale = "en", ignore_case = FALSE, ...)
}
\arguments{
\item{x, y}{A pair of character vectors.}

\item{locale}{Locale to use for comparisons. See
\code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options.
Defaults to "en" (English) to ensure that default behaviour is consistent
across platforms.}

\item{ignore_case}{Ignore case when comparing strings?}

\item{...}{Other options used to control collation. Passed on to
\code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.}
}
\value{
A logical vector the same length as \code{x}/\code{y}.
}
\description{
This uses Unicode canonicalisation rules, and optionally ignores case.
}
\examples{
# These two strings encode "a" with an accent in two different ways
a1 <- "\u00e1"
a2 <- "a\u0301"
c(a1, a2)
a1 == a2
str_equal(a1, a2)

# ohm and omega use different code points but should always be treated
# as equal
ohm <- "\u2126"
omega <- "\u03A9"
c(ohm, omega)
ohm == omega
str_equal(ohm, omega)
}
\seealso{
\code{\link[stringi:stri_compare]{stringi::stri_cmp_equiv()}} for the underlying
implementation.
}
================================================
FILE: man/str_escape.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/escape.R
\name{str_escape}
\alias{str_escape}
\title{Escape regular expression metacharacters}
\usage{
str_escape(string)
}
\arguments{
\item{string}{Input vector.
Either a character vector, or something coercible to one.}
}
\value{
A character vector the same length as \code{string}.
}
\description{
This function escapes metacharacters, the characters that have special
meaning to the regular expression engine. In most cases you are better off
using \code{\link[=fixed]{fixed()}} since it is faster, but \code{str_escape()} is
useful if you are composing user-provided strings into a pattern.
}
\examples{
str_detect(c("a", "."), ".")
str_detect(c("a", "."), str_escape("."))
}
================================================
FILE: man/str_extract.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract.R
\name{str_extract}
\alias{str_extract}
\alias{str_extract_all}
\title{Extract the complete match}
\usage{
str_extract(string, pattern, group = NULL)

str_extract_all(string, pattern, simplify = FALSE)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{pattern}{Pattern to look for.

The default interpretation is a regular expression, as described in
\code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}}
for finer control of the matching behaviour.

Match a fixed string (i.e. by comparing only bytes), using
\code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally,
for matching human text, you'll want \code{\link[=coll]{coll()}} which
respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with
\code{\link[=boundary]{boundary()}}. The empty string, \verb{""}, is
equivalent to \code{boundary("character")}.}

\item{group}{If supplied, instead of returning the complete match, will
return the matched text from the specified capturing group.}

\item{simplify}{A boolean.
\itemize{
\item \code{FALSE} (the default): returns a list of character vectors.
\item \code{TRUE}: returns a character matrix.
}}
}
\value{
\itemize{
\item \code{str_extract()}: a character vector the same length as
\code{string}/\code{pattern}.
\item \code{str_extract_all()}: a list of character vectors the same length as
\code{string}/\code{pattern}.
}
}
\description{
\code{str_extract()} extracts the first complete match from each string,
\code{str_extract_all()} extracts all matches from each string.
}
\examples{
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\\\d")
str_extract(shopping_list, "[a-z]+")
str_extract(shopping_list, "[a-z]{1,4}")
str_extract(shopping_list, "\\\\b[a-z]{1,4}\\\\b")

str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)

# Extract all matches
str_extract_all(shopping_list, "[a-z]+")
str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b")
str_extract_all(shopping_list, "\\\\d")

# Simplify results into character matrix
str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b", simplify = TRUE)
str_extract_all(shopping_list, "\\\\d", simplify = TRUE)

# Extract all words
str_extract_all("This is, surprisingly, a sentence.", boundary("word"))
}
\seealso{
\code{\link[=str_match]{str_match()}} to extract matched groups;
\code{\link[stringi:stri_extract]{stringi::stri_extract()}} for the underlying
implementation.
}
================================================
FILE: man/str_flatten.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/flatten.R
\name{str_flatten}
\alias{str_flatten}
\alias{str_flatten_comma}
\title{Flatten a string}
\usage{
str_flatten(string, collapse = "", last = NULL, na.rm = FALSE)

str_flatten_comma(string, last = NULL, na.rm = FALSE)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{collapse}{String to insert between each piece.
Defaults to \code{""}.}

\item{last}{Optional string to use in place of the final separator.}

\item{na.rm}{Remove missing values? If \code{FALSE} (the default), the result
will be \code{NA} if any element of \code{string} is \code{NA}.}
}
\value{
A string, i.e. a character vector of length 1.
}
\description{
\code{str_flatten()} reduces a character vector to a single string. This is a
summary function because regardless of the length of the input \code{string},
it always returns a single string.

\code{str_flatten_comma()} is a variation designed specifically for flattening
with commas. It automatically recognises if \code{last} uses the Oxford comma
and handles the special case of 2 elements.
}
\examples{
str_flatten(letters)
str_flatten(letters, "-")

str_flatten(letters[1:3], ", ")

# Use last to customise the last component
str_flatten(letters[1:3], ", ", " and ")

# this almost works if you want an Oxford (aka serial) comma
str_flatten(letters[1:3], ", ", ", and ")
# but it will always add a comma, even when not necessary
str_flatten(letters[1:2], ", ", ", and ")

# str_flatten_comma knows how to handle the Oxford comma
str_flatten_comma(letters[1:3], ", and ")
str_flatten_comma(letters[1:2], ", and ")
}
================================================
FILE: man/str_glue.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/glue.R
\name{str_glue}
\alias{str_glue}
\alias{str_glue_data}
\title{Interpolation with glue}
\usage{
str_glue(..., .sep = "", .envir = parent.frame(), .trim = TRUE)

str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA")
}
\arguments{
\item{...}{[\code{expressions}]\cr Unnamed arguments are taken to be expression
string(s) to format. Multiple inputs are concatenated together before
formatting. Named arguments are taken to be temporary variables available
for substitution.
For \code{glue_data()}, elements in \code{...} override the values in \code{.x}.} \item{.sep}{[\code{character(1)}: \sQuote{""}]\cr Separator used to separate elements.} \item{.envir}{[\code{environment}: \code{parent.frame()}]\cr Environment to evaluate each expression in. Expressions are evaluated from left to right. If \code{.x} is an environment, the expressions are evaluated in that environment and \code{.envir} is ignored. If \code{NULL} is passed, it is equivalent to \code{\link[=emptyenv]{emptyenv()}}.} \item{.trim}{[\code{logical(1)}: \sQuote{TRUE}]\cr Whether to trim the input template with \code{\link[glue:trim]{trim()}} or not.} \item{.x}{[\code{listish}]\cr An environment, list, or data frame used to lookup values.} \item{.na}{[\code{character(1)}: \sQuote{NA}]\cr Value to replace \code{NA} values with. If \code{NULL} missing values are propagated, that is an \code{NA} result will cause \code{NA} output. Otherwise the value is replaced by the value of \code{.na}.} } \value{ A character vector with same length as the longest input. } \description{ These functions are wrappers around \code{\link[glue:glue]{glue::glue()}} and \code{\link[glue:glue]{glue::glue_data()}}, which provide a powerful and elegant syntax for interpolating strings with \code{{}}. These wrappers provide a small set of the full options. Use \code{glue()} and \code{glue_data()} directly from glue for more control. } \examples{ name <- "Fred" age <- 50 anniversary <- as.Date("1991-10-12") str_glue( "My name is {name}, ", "my age next year is {age + 1}, ", "and my anniversary is {format(anniversary, '\%A, \%B \%d, \%Y')}." 
) # single braces can be inserted by doubling them str_glue("My name is {name}, not {{name}}.") # You can also use named arguments str_glue( "My name is {name}, ", "and my age next year is {age + 1}.", name = "Joe", age = 40 ) # `str_glue_data()` is useful in data pipelines mtcars \%>\% str_glue_data("{rownames(.)} has {hp} hp") } ================================================ FILE: man/str_interp.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/interp.R \name{str_interp} \alias{str_interp} \title{String interpolation} \usage{ str_interp(string, env = parent.frame()) } \arguments{ \item{string}{A template character string. This function is not vectorised: a character vector will be collapsed into a single string.} \item{env}{The environment in which to evaluate the expressions.} } \value{ An interpolated character string. } \description{ \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}} \code{str_interp()} is superseded in favour of \code{\link[=str_glue]{str_glue()}}. String interpolation is a useful way of specifying a character string which depends on values in a certain environment. It allows for string creation which is easier to read and write when compared to using e.g. \code{\link[=paste]{paste()}} or \code{\link[=sprintf]{sprintf()}}. The (template) string can include expression placeholders of the form \verb{$\{expression\}} or \verb{$[format]\{expression\}}, where expressions are valid R expressions that can be evaluated in the given environment, and \code{format} is a format specification valid for use with \code{\link[=sprintf]{sprintf()}}.
} \examples{ # Using values from the environment, and some formats user_name <- "smbache" amount <- 6.656 account <- 1337 str_interp("User ${user_name} (account $[08d]{account}) has $$[.2f]{amount}.") # Nested brace pairs work inside expressions too, and any braces can be # placed outside the expressions. str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}") # Values can also come from a list str_interp( "One value, ${value1}, and then another, ${value2*2}.", list(value1 = 10, value2 = 20) ) # Or a data frame str_interp( "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", iris ) # Use a vector when the string is long: max_char <- 80 str_interp(c( "This particular line is so long that it is hard to write ", "without breaking the ${max_char}-char barrier!" )) } \seealso{ \code{\link[=str_glue]{str_glue()}} and \code{\link[=str_glue_data]{str_glue_data()}} for alternative approaches to the same problem. } \author{ Stefan Milton Bache } \keyword{internal} ================================================ FILE: man/str_length.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/length.R \name{str_length} \alias{str_length} \alias{str_width} \title{Compute the length/width} \usage{ str_length(string) str_width(string) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} } \value{ A numeric vector the same length as \code{string}. } \description{ \code{str_length()} returns the number of codepoints in a string. These are the individual elements (which are often, but not always, letters) that can be extracted with \code{\link[=str_sub]{str_sub()}}. \code{str_width()} returns how much space the string will occupy when printed in a fixed width font (i.e. when printed in the console).
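Base R's \code{nchar()} draws the same codepoint-versus-column distinction, which makes the difference easy to check without stringr: \code{type = "chars"} counts codepoints (like \code{str_length()}) and \code{type = "width"} counts display columns (like \code{str_width()}).

```r
# Precomposed u-umlaut vs. u followed by a combining accent: the two
# render the same but contain a different number of codepoints.
u <- c("\u00fc", "u\u0308")
nchar(u, type = "chars")  # 1 2  (codepoints, like str_length())
nchar(u, type = "width")  #      (display columns, like str_width())
```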
} \examples{ str_length(letters) str_length(NA) str_length(factor("abc")) str_length(c("i", "like", "programming", NA)) # Some characters, like emoji and Chinese characters (hanzi), are square # which means they take up the width of two Latin characters x <- c("\u6c49\u5b57", "\U0001f60a") str_view(x) str_width(x) str_length(x) # There are two ways of representing a u with an umlaut u <- c("\u00fc", "u\u0308") # They have the same width str_width(u) # But a different length str_length(u) # Because the second element is made up of a u + an accent str_sub(u, 1, 1) } \seealso{ \code{\link[stringi:stri_length]{stringi::stri_length()}} which this function wraps. } ================================================ FILE: man/str_like.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/detect.R \name{str_like} \alias{str_like} \alias{str_ilike} \title{Detect a pattern in the same way as \code{SQL}'s \code{LIKE} and \code{ILIKE} operators} \usage{ str_like(string, pattern, ignore_case = deprecated()) str_ilike(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{A character vector containing a SQL "like" pattern. See above for details.} \item{ignore_case}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}} } \value{ A logical vector the same length as \code{string}. } \description{ \code{str_like()} and \code{str_ilike()} follow the conventions of the SQL \code{LIKE} and \code{ILIKE} operators, namely: \itemize{ \item Must match the entire string. \item \verb{_} matches a single character (like \code{.}). \item \verb{\%} matches any number of characters (like \verb{.*}). \item \verb{\\\%} and \verb{\\_} match literal \verb{\%} and \verb{_}.
} The difference between the two functions is their case-sensitivity: \code{str_like()} is case sensitive and \code{str_ilike()} is not. } \note{ Prior to stringr 1.6.0, \code{str_like()} was incorrectly case-insensitive. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_like(fruit, "app") str_like(fruit, "app\%") str_like(fruit, "APP\%") str_like(fruit, "ba_ana") str_like(fruit, "\%apple") str_ilike(fruit, "app") str_ilike(fruit, "app\%") str_ilike(fruit, "APP\%") str_ilike(fruit, "ba_ana") str_ilike(fruit, "\%apple") } ================================================ FILE: man/str_locate.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/locate.R \name{str_locate} \alias{str_locate} \alias{str_locate_all} \title{Find location of match} \usage{ str_locate(string, pattern) str_locate_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. The empty string, \code{""}, is equivalent to \code{boundary("character")}.} } \value{ \itemize{ \item \code{str_locate()} returns an integer matrix with two columns and one row for each element of \code{string}. The first column, \code{start}, gives the position at the start of the match, and the second column, \code{end}, gives the position of the end.
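The LIKE rules described for \code{str_like()} can be sketched in base R by translating the pattern to an anchored regular expression. This is an illustrative helper, not part of stringr, and it deliberately ignores the escaped-wildcard forms: escape regex metacharacters, then map the percent and underscore wildcards to their regex equivalents.

```r
# Hypothetical LIKE -> regex translator: escape metacharacters, map the
# wildcards, and anchor so the pattern must match the entire string.
like_to_regex <- function(pattern) {
  rx <- gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", pattern)  # escape metacharacters
  rx <- gsub("%", ".*", rx, fixed = TRUE)  # percent: any number of characters
  rx <- gsub("_", ".", rx, fixed = TRUE)   # underscore: a single character
  paste0("^", rx, "$")                     # must match the whole string
}

fruit <- c("apple", "banana", "pear", "pineapple")
grepl(like_to_regex("app%"), fruit)    # TRUE FALSE FALSE FALSE
grepl(like_to_regex("ba_ana"), fruit)  # FALSE TRUE FALSE FALSE
```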
\item \code{str_locate_all()} returns a list of integer matrices with the same length as \code{string}/\code{pattern}. The matrices have columns \code{start} and \code{end} as above, and one row for each match. } } \description{ \code{str_locate()} returns the \code{start} and \code{end} position of the first match; \code{str_locate_all()} returns the \code{start} and \code{end} position of each match. Because the \code{start} and \code{end} values are inclusive, zero-length matches (e.g. \code{$}, \code{^}, \verb{\\\\b}) will have an \code{end} that is smaller than \code{start}. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_locate(fruit, "$") str_locate(fruit, "a") str_locate(fruit, "e") str_locate(fruit, c("a", "b", "p", "p")) str_locate_all(fruit, "a") str_locate_all(fruit, "e") str_locate_all(fruit, c("a", "b", "p", "p")) # Find location of every character str_locate_all(fruit, "") } \seealso{ \code{\link[=str_extract]{str_extract()}} for a convenient way of extracting matches, \code{\link[stringi:stri_locate]{stringi::stri_locate()}} for the underlying implementation. } ================================================ FILE: man/str_match.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/match.R \name{str_match} \alias{str_match} \alias{str_match_all} \title{Extract components (capturing groups) from a match} \usage{ str_match(string, pattern) str_match_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Unlike other stringr functions, \code{str_match()} only supports regular expressions, as described in \code{vignette("regular-expressions")}. The pattern should contain at least one capturing group.} } \value{ \itemize{ \item \code{str_match()}: a character matrix with the same number of rows as the length of \code{string}/\code{pattern}.
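The inclusive start/end convention described above means \code{substr(x, start, end)} recovers the match. Base R's \code{regexpr()} reports a start position plus a match length instead, so converting between the two conventions is a one-liner:

```r
# regexpr() gives start + length; str_locate()-style inclusive positions
# are start and start + length - 1.
x <- "banana"
m <- regexpr("an", x)
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L  # inclusive end
substr(x, start, end)  # "an"
```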
The first column is the complete match, followed by one column for each capture group. The columns will be named if you used "named capture groups", i.e. \verb{(?<name>pattern)}. \item \code{str_match_all()}: a list of the same length as \code{string}/\code{pattern} containing character matrices. Each matrix has columns as described above and one row for each match. } } \description{ Extract any number of matches defined by unnamed, \code{(pattern)}, and named, \verb{(?<name>pattern)}, capture groups. Use a non-capturing group, \verb{(?:pattern)}, if you need to override default operator precedence but don't want to capture the result. } \examples{ strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", "Home: 543.355.3679") phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" str_extract(strings, phone) str_match(strings, phone) # Extract/match all str_extract_all(strings, phone) str_match_all(strings, phone) # You can also name the groups to make further manipulation easier phone <- "(?<area>[2-9][0-9]{2})[- .](?<rest>[0-9]{3}[- .][0-9]{4})" str_match(strings, phone) x <- c("<a> <b>", "<a> <>", "<a>", "", NA) str_match(x, "<(.*?)> <(.*?)>") str_match_all(x, "<(.*?)>") str_extract(x, "<.*?>") str_extract_all(x, "<.*?>") } \seealso{ \code{\link[=str_extract]{str_extract()}} to extract the complete match, \code{\link[stringi:stri_match]{stringi::stri_match()}} for the underlying implementation. } ================================================ FILE: man/str_order.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/sort.R \name{str_order} \alias{str_order} \alias{str_rank} \alias{str_sort} \title{Order, rank, or sort a character vector} \usage{ str_order( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) str_rank(x, locale = "en", numeric = FALSE, ...)
str_sort( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) } \arguments{ \item{x}{A character vector to sort.} \item{decreasing}{A boolean. If \code{FALSE}, the default, sorts from lowest to highest; if \code{TRUE} sorts from highest to lowest.} \item{na_last}{Where should \code{NA} go? \code{TRUE} at the end, \code{FALSE} at the beginning, \code{NA} dropped.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{numeric}{If \code{TRUE}, will sort digits numerically, instead of as strings.} \item{...}{Other options used to control collation. Passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.} } \value{ A character vector the same length as \code{x}. } \description{ \itemize{ \item \code{str_sort()} returns the sorted vector. \item \code{str_order()} returns an integer vector that returns the desired order when used for subsetting, i.e. \code{x[str_order(x)]} is the same as \code{str_sort(x)}. \item \code{str_rank()} returns the ranks of the values, i.e. \code{arrange(df, str_rank(x))} is the same as \code{str_sort(df$x)}. } } \examples{ x <- c("apple", "car", "happy", "char") str_sort(x) str_order(x) x[str_order(x)] str_rank(x) # In Czech, ch is a digraph that sorts after h str_sort(x, locale = "cs") # Use numeric = TRUE to sort numbers in strings x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) } \seealso{ \code{\link[stringi:stri_order]{stringi::stri_order()}} for the underlying implementation.
} ================================================ FILE: man/str_pad.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pad.R \name{str_pad} \alias{str_pad} \title{Pad a string to minimum width} \usage{ str_pad( string, width, side = c("left", "right", "both"), pad = " ", use_width = TRUE ) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{width}{Minimum width of padded strings.} \item{side}{Side on which padding character is added (left, right or both).} \item{pad}{Single padding character (default is a space).} \item{use_width}{If \code{FALSE}, use the length of the string instead of the width; see \code{\link[=str_width]{str_width()}}/\code{\link[=str_length]{str_length()}} for the difference.} } \value{ A character vector the same length as \code{string}/\code{width}/\code{pad}. } \description{ Pad a string to a minimum width, so that \code{str_length(str_pad(x, n))} is always greater than or equal to \code{n}. } \examples{ rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) # All arguments are vectorised except side str_pad(c("a", "abc", "abcdef"), 10) str_pad("a", c(5, 10, 20)) str_pad("a", 10, pad = c("-", "_", " ")) # Longer strings are returned unchanged str_pad("hadley", 3) } \seealso{ \code{\link[=str_trim]{str_trim()}} to remove whitespace; \code{\link[=str_trunc]{str_trunc()}} to decrease the maximum width of a string. } ================================================ FILE: man/str_remove.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/remove.R \name{str_remove} \alias{str_remove} \alias{str_remove_all} \title{Remove matched patterns} \usage{ str_remove(string, pattern) str_remove_all(string, pattern) } \arguments{ \item{string}{Input vector.
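The common left/right cases of padding to a minimum width can be approximated in base R with \code{formatC()}. This sketch counts characters rather than display width and only pads with spaces, so it is a rough analogue, not a substitute for \code{str_pad()}:

```r
# formatC() right-justifies by default (pad on the left); flag = "-"
# left-justifies (pad on the right). width is a minimum, as in str_pad().
formatC("hadley", width = 10)               # "    hadley"
formatC("hadley", width = 10, flag = "-")   # "hadley    "
```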
Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. You can not match boundaries, including \code{""}, with this function.} } \value{ A character vector the same length as \code{string}/\code{pattern}. } \description{ Remove matches, i.e. replace them with \code{""}. } \examples{ fruits <- c("one apple", "two pears", "three bananas") str_remove(fruits, "[aeiou]") str_remove_all(fruits, "[aeiou]") } \seealso{ \code{\link[=str_replace]{str_replace()}} for the underlying implementation. } ================================================ FILE: man/str_replace.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/replace.R \name{str_replace} \alias{str_replace} \alias{str_replace_all} \title{Replace matches with new text} \usage{ str_replace(string, pattern, replacement) str_replace_all(string, pattern, replacement) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi:about_search_regex]{stringi::about_search_regex}. Control options with \code{\link[=regex]{regex()}}. For \code{str_replace_all()} this can also be a named vector (\code{c(pattern1 = replacement1)}), in order to perform multiple replacements in each element of \code{string}. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. 
This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. You can not match boundaries, including \code{""}, with this function.} \item{replacement}{The replacement value, usually a single string, but it can be a vector the same length as \code{string} or \code{pattern}. References of the form \verb{\\1}, \verb{\\2}, etc will be replaced with the contents of the respective matched group (created by \verb{()}). Alternatively, supply a function (or formula): it will be passed a single character vector and should return a character vector of the same length. To replace the complete string with \code{NA}, use \code{replacement = NA_character_}.} } \value{ A character vector the same length as \code{string}/\code{pattern}/\code{replacement}. } \description{ \code{str_replace()} replaces the first match; \code{str_replace_all()} replaces all matches. } \examples{ fruits <- c("one apple", "two pears", "three bananas") str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", toupper) str_replace_all(fruits, "b", NA_character_) str_replace(fruits, "([aeiou])", "") str_replace(fruits, "([aeiou])", "\\\\1\\\\1") # Note that str_replace() is vectorised along string, pattern, and replacement str_replace(fruits, "[aeiou]", c("1", "2", "3")) str_replace(fruits, c("a", "e", "i"), "-") # If you want to apply multiple patterns and replacements to the same # string, pass a named vector to pattern. fruits \%>\% str_c(collapse = "---") \%>\% str_replace_all(c("one" = "1", "two" = "2", "three" = "3")) # Use a function for more sophisticated replacement. This example # replaces colour names with their hex values.
colours <- str_c("\\\\b", colors(), "\\\\b", collapse="|") col2hex <- function(col) { rgb <- col2rgb(col) rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255) } x <- c( "Roses are red, violets are blue", "My favourite colour is green" ) str_replace_all(x, colours, col2hex) } \seealso{ \code{\link[=str_replace_na]{str_replace_na()}} to turn missing values into "NA"; \code{\link[stringi:stri_replace]{stringi::stri_replace()}} for the underlying implementation. } ================================================ FILE: man/str_replace_na.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/replace.R \name{str_replace_na} \alias{str_replace_na} \title{Turn NA into "NA"} \usage{ str_replace_na(string, replacement = "NA") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{replacement}{A single string.} } \description{ Turn NA into "NA" } \examples{ str_replace_na(c(NA, "abc", "def")) } ================================================ FILE: man/str_split.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/split.R \name{str_split} \alias{str_split} \alias{str_split_1} \alias{str_split_fixed} \alias{str_split_i} \title{Split up a string into pieces} \usage{ str_split(string, pattern, n = Inf, simplify = FALSE) str_split_1(string, pattern) str_split_fixed(string, pattern, n) str_split_i(string, pattern, i) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. 
Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. The empty string, \code{""}, is equivalent to \code{boundary("character")}.} \item{n}{Maximum number of pieces to return. Default (Inf) uses all possible split positions. For \code{str_split()}, this determines the maximum length of each element of the output. For \code{str_split_fixed()}, this determines the number of columns in the output; if an input is too short, the result will be padded with \code{""}.} \item{simplify}{A boolean. \itemize{ \item \code{FALSE} (the default): returns a list of character vectors. \item \code{TRUE}: returns a character matrix. }} \item{i}{Element to return. Use a negative value to count from the right hand side.} } \value{ \itemize{ \item \code{str_split_1()}: a character vector. \item \code{str_split()}: a list the same length as \code{string}/\code{pattern} containing character vectors. \item \code{str_split_fixed()}: a character matrix with \code{n} columns and the same number of rows as the length of \code{string}/\code{pattern}. \item \code{str_split_i()}: a character vector the same length as \code{string}/\code{pattern}. } } \description{ This family of functions provides various ways of splitting a string up into pieces. These two functions return a character vector: \itemize{ \item \code{str_split_1()} takes a single string and splits it into pieces, returning a single character vector. \item \code{str_split_i()} splits each string in a character vector into pieces and extracts the \code{i}th value, returning a character vector. } These two functions return a more complex object: \itemize{ \item \code{str_split()} splits each string in a character vector into a varying number of pieces, returning a list of character vectors.
\item \code{str_split_fixed()} splits each string in a character vector into a fixed number of pieces, returning a character matrix. } } \examples{ fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) str_split(fruits, " and ") str_split(fruits, " and ", simplify = TRUE) # If you want to split a single string, use `str_split_1` str_split_1(fruits[[1]], " and ") # Specify n to restrict the number of possible matches str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) # If n is greater than the number of pieces, no padding occurs str_split(fruits, " and ", n = 5) # Use fixed to return a character matrix str_split_fixed(fruits, " and ", 3) str_split_fixed(fruits, " and ", 4) # str_split_i extracts only a single piece from a string str_split_i(fruits, " and ", 1) str_split_i(fruits, " and ", 4) # use a negative number to select from the end str_split_i(fruits, " and ", -1) } \seealso{ \code{\link[stringi:stri_split]{stringi::stri_split()}} for the underlying implementation. } ================================================ FILE: man/str_starts.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/detect.R \name{str_starts} \alias{str_starts} \alias{str_ends} \title{Detect the presence/absence of a match at the start/end} \usage{ str_starts(string, pattern, negate = FALSE) str_ends(string, pattern, negate = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern with which the string starts or ends. The default interpretation is a regular expression, as described in \link[stringi:about_search_regex]{stringi::about_search_regex}. Control options with \code{\link[=regex]{regex()}}. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate.
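The list-of-character-vectors shape that \code{str_split()} returns has a direct base-R analogue in \code{strsplit()}, which also produces one character vector per input element (\code{fixed = TRUE} requests literal rather than regex matching):

```r
# strsplit() mirrors str_split()'s return shape: a list with one
# character vector per element of the input.
fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)
strsplit(fruits, " and ", fixed = TRUE)
```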
Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale.} \item{negate}{If \code{TRUE}, inverts the resulting boolean vector.} } \value{ A logical vector. } \description{ \code{str_starts()} and \code{str_ends()} are special cases of \code{\link[=str_detect]{str_detect()}} that only match at the beginning or end of a string, respectively. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_starts(fruit, "p") str_starts(fruit, "p", negate = TRUE) str_ends(fruit, "e") str_ends(fruit, "e", negate = TRUE) } ================================================ FILE: man/str_sub.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/sub.R \name{str_sub} \alias{str_sub} \alias{str_sub<-} \alias{str_sub_all} \title{Get and set substrings using their positions} \usage{ str_sub(string, start = 1L, end = -1L) str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value str_sub_all(string, start = 1L, end = -1L) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{start, end}{A pair of integer vectors defining the range of characters to extract (inclusive). Positive values count from the left of the string, and negative values count from the right. In other words, if \code{string} is \code{"abcdef"} then 1 refers to \code{"a"} and -1 refers to \code{"f"}. Alternatively, instead of a pair of vectors, you can pass a matrix to \code{start}. The matrix should have two columns, either labelled \code{start} and \code{end}, or \code{start} and \code{length}. This makes \code{str_sub()} work directly with the output from \code{\link[=str_locate]{str_locate()}} and friends.} \item{omit_na}{Single logical value. 
If \code{TRUE}, missing values in any of the arguments provided will result in an unchanged input.} \item{value}{Replacement string.} } \value{ \itemize{ \item \code{str_sub()}: A character vector the same length as \code{string}/\code{start}/\code{end}. \item \code{str_sub_all()}: A list the same length as \code{string}. Each element is a character vector the same length as \code{start}/\code{end}. } If \code{end} comes before \code{start} or \code{start} is outside the range of \code{string} then the corresponding output will be the empty string. } \description{ \code{str_sub()} extracts or replaces the elements at a single position in each string. \code{str_sub_all()} allows you to extract strings at multiple elements in every string. } \examples{ hw <- "Hadley Wickham" str_sub(hw, 1, 6) str_sub(hw, end = 6) str_sub(hw, 8, 14) str_sub(hw, 8) # Negative values index from end of string str_sub(hw, -1) str_sub(hw, -7) str_sub(hw, end = -7) # str_sub() is vectorised by both string and position str_sub(hw, c(1, 8), c(6, 14)) # if you want to extract multiple positions from multiple strings, # use str_sub_all() x <- c("abcde", "ghifgh") str_sub(x, c(1, 2), c(2, 4)) str_sub_all(x, start = c(1, 2), end = c(2, 4)) # Alternatively, you can pass in a two column matrix, as in the # output from str_locate_all pos <- str_locate_all(hw, "[aeio]")[[1]] pos str_sub(hw, pos) # You can also use `str_sub()` to modify strings: x <- "BBCDEF" str_sub(x, 1, 1) <- "A"; x str_sub(x, -1, -1) <- "K"; x str_sub(x, -2, -2) <- "GHIJ"; x str_sub(x, 2, -2) <- ""; x } \seealso{ The underlying implementation in \code{\link[stringi:stri_sub]{stringi::stri_sub()}} } ================================================ FILE: man/str_subset.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/subset.R \name{str_subset} \alias{str_subset} \title{Find matching elements} \usage{ str_subset(string, pattern, negate = FALSE) } 
\arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. You can not match boundaries, including \code{""}, with this function.} \item{negate}{If \code{TRUE}, inverts the resulting boolean vector.} } \value{ A character vector, usually smaller than \code{string}. } \description{ \code{str_subset()} returns all elements of \code{string} where there's at least one match to \code{pattern}. It's a wrapper around \code{x[str_detect(x, pattern)]}, and is equivalent to \code{grep(pattern, x, value = TRUE)}. Use \code{\link[=str_extract]{str_extract()}} to extract the matched part of each string. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_subset(fruit, "a") str_subset(fruit, "^a") str_subset(fruit, "a$") str_subset(fruit, "b") str_subset(fruit, "[aeiou]") # Elements that don't match str_subset(fruit, "^p", negate = TRUE) # Missings never match str_subset(c("a", NA, "b"), ".") } \seealso{ \code{\link[=grep]{grep()}} with argument \code{value = TRUE}, \code{\link[stringi:stri_subset]{stringi::stri_subset()}} for the underlying implementation.
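The equivalence stated for \code{str_subset()} can be verified directly in base R: keeping the matching elements is the same whether you use \code{grep(pattern, x, value = TRUE)} or logical subsetting with \code{grepl()}:

```r
# Two base-R spellings of the operation str_subset() wraps.
fruit <- c("apple", "banana", "pear", "pineapple")
grep("^p", fruit, value = TRUE)  # "pear" "pineapple"
fruit[grepl("^p", fruit)]        # same result
```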
} ================================================ FILE: man/str_to_camel.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/case.R \name{str_to_camel} \alias{str_to_camel} \alias{str_to_snake} \alias{str_to_kebab} \title{Convert between different types of programming case} \usage{ str_to_camel(string, first_upper = FALSE) str_to_snake(string) str_to_kebab(string) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{first_upper}{Logical. Should the first letter be capitalized?} } \description{ \itemize{ \item \code{str_to_camel()} converts to camel case, where the first letter of each word is capitalized, with no separation between words. By default the first letter of the first word is not capitalized. \item \code{str_to_kebab()} converts to kebab case, where words are converted to lower case and separated by dashes (\code{-}). \item \code{str_to_snake()} converts to snake case, where words are converted to lower case and separated by underscores (\verb{_}). } } \examples{ str_to_camel("my-variable") str_to_camel("my-variable", first_upper = TRUE) str_to_snake("MyVariable") str_to_kebab("MyVariable") } ================================================ FILE: man/str_trim.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/trim.R \name{str_trim} \alias{str_trim} \alias{str_squish} \title{Remove whitespace} \usage{ str_trim(string, side = c("both", "left", "right")) str_squish(string) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{side}{Side on which to remove whitespace: "left", "right", or "both", the default.} } \value{ A character vector the same length as \code{string}. 
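The CamelCase-to-snake_case rule described for \code{str_to_snake()} can be sketched in base R. This is a hypothetical helper for illustration, not stringr's implementation: insert an underscore before each interior capital, then lowercase everything.

```r
# Sketch of snake-case conversion: a lookbehind ensures only interior
# capitals (not the first character) gain an underscore.
to_snake <- function(x) {
  tolower(gsub("(?<=.)([A-Z])", "_\\1", x, perl = TRUE))
}

to_snake("MyVariable")  # "my_variable"
```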
} \description{ \code{str_trim()} removes whitespace from start and end of string; \code{str_squish()} removes whitespace at the start and end, and replaces all internal whitespace with a single space. } \examples{ str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") } \seealso{ \code{\link[=str_pad]{str_pad()}} to add whitespace } ================================================ FILE: man/str_trunc.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/trunc.R \name{str_trunc} \alias{str_trunc} \title{Truncate a string to maximum width} \usage{ str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{width}{Maximum width of string.} \item{side, ellipsis}{Location and content of ellipsis that indicates content has been removed.} } \value{ A character vector the same length as \code{string}. } \description{ Truncate a string to a fixed number of characters, so that \code{str_length(str_trunc(x, n))} is always less than or equal to \code{n}. } \examples{ x <- "This string is moderately long" rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) } \seealso{ \code{\link[=str_pad]{str_pad()}} to increase the minimum width of a string. } ================================================ FILE: man/str_unique.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/unique.R \name{str_unique} \alias{str_unique} \title{Remove duplicated strings} \usage{ str_unique(string, locale = "en", ignore_case = FALSE, ...) } \arguments{ \item{string}{Input vector. 
Either a character vector, or something coercible to one.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{ignore_case}{Ignore case when comparing strings?} \item{...}{Other options used to control collation. Passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.} } \value{ A character vector, usually shorter than \code{string}. } \description{ \code{str_unique()} removes duplicated values, with optional control over how duplication is measured. } \examples{ str_unique(c("a", "b", "c", "b", "a")) str_unique(c("a", "b", "c", "B", "A")) str_unique(c("a", "b", "c", "B", "A"), ignore_case = TRUE) # Use ... to pass additional arguments to stri_unique() str_unique(c("motley", "mötley", "pinguino", "pingüino")) str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1) } \seealso{ \code{\link[=unique]{unique()}}, \code{\link[stringi:stri_unique]{stringi::stri_unique()}} which this function wraps. } ================================================ FILE: man/str_view.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/view.R \name{str_view} \alias{str_view} \alias{str_view_all} \title{View strings and matches} \usage{ str_view( string, pattern = NULL, match = TRUE, html = FALSE, use_escapes = FALSE ) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. 
Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. You can not match boundaries, including \code{""}, with this function.} \item{match}{If \code{pattern} is supplied, which elements should be shown? \itemize{ \item \code{TRUE}, the default, shows only elements that match the pattern. \item \code{NA} shows all elements. \item \code{FALSE} shows only elements that don't match the pattern. } If \code{pattern} is not supplied, all elements are always shown.} \item{html}{Use HTML output? If \code{TRUE} will create an HTML widget; if \code{FALSE} will style using ANSI escapes.} \item{use_escapes}{If \code{TRUE}, all non-ASCII characters will be rendered with Unicode escapes. This is useful to see exactly what underlying values are stored in the string.} } \description{ \code{str_view()} is used to print the underlying representation of a string and to see how a \code{pattern} matches. Matches are surrounded by \verb{<>} and unusual whitespace (i.e. all whitespace apart from \code{" "} and \code{"\\n"}) is surrounded by \code{{}} and escaped. Where possible, matches and unusual whitespace are coloured blue and \code{NA}s red. 
} \examples{ # Show special characters str_view(c("\"\\\\", "\\\\\\\\\\\\", "fgh", NA, "NA")) # A non-breaking space looks like a regular space: nbsp <- "Hi\u00A0you" nbsp # But it doesn't behave like one: str_detect(nbsp, " ") # So str_view() brings it to your attention with a blue background str_view(nbsp) # You can also use escapes to see all non-ASCII characters str_view(nbsp, use_escapes = TRUE) # Supply a pattern to see where it matches str_view(c("abc", "def", "fghi"), "[aeiou]") str_view(c("abc", "def", "fghi"), "^") str_view(c("abc", "def", "fghi"), "..") # By default, only matching strings will be shown str_view(c("abc", "def", "fghi"), "e") # but you can show all: str_view(c("abc", "def", "fghi"), "e", match = NA) # or just those that don't match: str_view(c("abc", "def", "fghi"), "e", match = FALSE) } ================================================ FILE: man/str_which.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/subset.R \name{str_which} \alias{str_which} \title{Find matching indices} \usage{ str_which(string, pattern, negate = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. You can not match boundaries, including \code{""}, with this function.} \item{negate}{If \code{TRUE}, inverts the resulting boolean vector.} } \value{ An integer vector, usually smaller than \code{string}. 
} \description{ \code{str_which()} returns the indices of \code{string} where there's at least one match to \code{pattern}. It's a wrapper around \code{which(str_detect(x, pattern))}, and is equivalent to \code{grep(pattern, x)}. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_which(fruit, "a") # Elements that don't match str_which(fruit, "^p", negate = TRUE) # Missings never match str_which(c("a", NA, "b"), ".") } ================================================ FILE: man/str_wrap.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/wrap.R \name{str_wrap} \alias{str_wrap} \title{Wrap words into nicely formatted paragraphs} \usage{ str_wrap(string, width = 80, indent = 0, exdent = 0, whitespace_only = TRUE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{width}{Positive integer giving target line width (in number of characters). A width less than or equal to 1 will put each word on its own line.} \item{indent, exdent}{A non-negative integer giving the indent for the first line (\code{indent}) and all subsequent lines (\code{exdent}).} \item{whitespace_only}{A boolean. \itemize{ \item If \code{TRUE} (the default) wrapping will only occur at whitespace. \item If \code{FALSE}, can break on any non-word character (e.g. \code{/}, \code{-}). }} } \value{ A character vector the same length as \code{string}. } \description{ Wrap words into paragraphs, minimizing the "raggedness" of the lines (i.e. the variation in line length) using the Knuth-Plass algorithm. 
} \examples{ thanks_path <- file.path(R.home("doc"), "THANKS") thanks <- str_c(readLines(thanks_path), collapse = "\n") thanks <- word(thanks, 1, 3, fixed("\n\n")) cat(str_wrap(thanks), "\n") cat(str_wrap(thanks, width = 40), "\n") cat(str_wrap(thanks, width = 60, indent = 2), "\n") cat(str_wrap(thanks, width = 60, exdent = 2), "\n") cat(str_wrap(thanks, width = 0, exdent = 2), "\n") } \seealso{ \code{\link[stringi:stri_wrap]{stringi::stri_wrap()}} for the underlying implementation. } ================================================ FILE: man/stringr-data.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{stringr-data} \alias{stringr-data} \alias{sentences} \alias{fruit} \alias{words} \title{Sample character vectors for practicing string manipulations} \format{ Character vectors. } \usage{ sentences fruit words } \description{ \code{fruit} and \code{words} come from the \code{rcorpora} package written by Gabor Csardi; the data was collected by Darius Kazemi and made available at \url{https://github.com/dariusk/corpora}. \code{sentences} is a collection of "Harvard sentences" used for standardised testing of voice. } \examples{ length(sentences) sentences[1:5] length(fruit) fruit[1:5] length(words) words[1:5] } \keyword{datasets} ================================================ FILE: man/stringr-package.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/stringr-package.R \docType{package} \name{stringr-package} \alias{stringr} \alias{stringr-package} \title{stringr: Simple, Consistent Wrappers for Common String Operations} \description{ \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. 
All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another. } \seealso{ Useful links: \itemize{ \item \url{https://stringr.tidyverse.org} \item \url{https://github.com/tidyverse/stringr} \item Report bugs at \url{https://github.com/tidyverse/stringr/issues} } } \author{ \strong{Maintainer}: Hadley Wickham \email{hadley@posit.co} [copyright holder] Other contributors: \itemize{ \item Posit Software, PBC [copyright holder, funder] } } \keyword{internal} ================================================ FILE: man/word.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/word.R \name{word} \alias{word} \title{Extract words from a sentence} \usage{ word(string, start = 1L, end = start, sep = fixed(" ")) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{start, end}{Pair of integer vectors giving range of words (inclusive) to extract. If negative, counts backwards from the last word. The default values select the first word.} \item{sep}{Separator between words. Defaults to a single space.} } \value{ A character vector with the same length as \code{string}/\code{start}/\code{end}. 
} \description{ Extract words from a sentence } \examples{ sentences <- c("Jane saw a cat", "Jane sat down") word(sentences, 1) word(sentences, 2) word(sentences, -1) word(sentences, 2, -1) # Also vectorised over start and end word(sentences[1], 1:3, -1) word(sentences[1], 1, 1:4) # Can define words by other separators str <- 'abc.def..123.4568.999' word(str, 1, sep = fixed('..')) word(str, 2, sep = fixed('..')) } ================================================ FILE: po/R-es.po ================================================ msgid "" msgstr "" "Project-Id-Version: stringr 1.5.1.9000\n" "POT-Creation-Date: 2024-07-17 11:07-0500\n" "PO-Revision-Date: 2024-07-17 11:07-0500\n" "Last-Translator: Automatically generated\n" "Language-Team: none\n" "Language: es\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" "Plural-Forms: nplurals=2; plural=(n != 1);\n" #: detect.R:141 msgid "{.arg pattern} must be a plain string, not a stringr modifier." msgstr "{.arg pattern} debe ser una cadena de caracteres, no un modificador de stringr" #: interp.R:105 msgid "Invalid template string for interpolation." msgstr "Plantilla de cadenas invalida para interpolación." #: interp.R:176 msgid "Failed to parse input {.str {text}}" msgstr "Fallo en segmentar el input {.str {text}}" #: match.R:54 match.R:68 msgid "{.arg pattern} must be a regular expression." msgstr "{.arg pattern} debe ser una expresión regular." #: modifiers.R:216 msgid "{.arg pattern} must be a string, not {.obj_type_friendly {x}}." msgstr "{.arg pattern} debe ser una cadena de caracteres, no {.obj_type_friendly {x}}." #: replace.R:208 msgid "Failed to apply {.arg replacement} function." msgstr "Fallo en aplicar la función {.arg replacement}." #: replace.R:209 msgid "It must accept a character vector of any length." msgstr "Debe aceptar un vector de caracteres de cualquier longitud." 
#: replace.R:220 msgid "" "{.arg replacement} function must return a character vector, not {." "obj_type_friendly {new_flat}}." msgstr "" "La función {.arg replacement} debe devolver un vector de caracteres, no {." "obj_type_friendly {new_flat}}." #: replace.R:226 msgid "" "{.arg replacement} function must return a vector the same length as the input " "({length(old_flat)}), not length {length(new_flat)}." msgstr "" "La función {.arg replacement} debe devolver un vector del mismo largo que el input " "({length(old_flat)}), no de {length(new_flat)} de largo." #: split.R:122 msgid "{.arg i} must not be 0." msgstr "{.arg i} no debe ser igual a 0." #: trunc.R:32 msgid "`width` ({width}) is shorter than `ellipsis` ({str_length(ellipsis)})." msgstr "`width` ({width}) es más corto que `ellipsis` ({str_length(ellipsis)})." #: utils.R:23 msgid "{.arg pattern} can't be a boundary." msgstr "{.arg pattern} no puede ser un límite." #: utils.R:26 msgid "{.arg pattern} can't be the empty string ({.code \"\"})." msgstr "{.arg pattern} no puede ser una cadena de caracteres vacía ({.code \"\"})." ================================================ FILE: po/R-stringr.pot ================================================ msgid "" msgstr "" "Project-Id-Version: stringr 1.5.1.9000\n" "POT-Creation-Date: 2024-08-15 10:19-0700\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "Language: \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" #: detect.R:141 msgid "{.arg pattern} must be a plain string, not a stringr modifier." msgstr "" #: interp.R:105 msgid "Invalid template string for interpolation." msgstr "" #: interp.R:176 msgid "Failed to parse input {.str {text}}" msgstr "" #: match.R:54 match.R:68 msgid "{.arg pattern} must be a regular expression." msgstr "" #: modifiers.R:216 msgid "{.arg pattern} must be a string, not {.obj_type_friendly {x}}." 
msgstr "" #: replace.R:208 msgid "Failed to apply {.arg replacement} function." msgstr "" #: replace.R:209 msgid "It must accept a character vector of any length." msgstr "" #: replace.R:220 msgid "" "{.arg replacement} function must return a character vector, not {." "obj_type_friendly {new_flat}}." msgstr "" #: replace.R:226 msgid "" "{.arg replacement} function must return a vector the same length as the " "input ({length(old_flat)}), not length {length(new_flat)}." msgstr "" #: split.R:122 msgid "{.arg i} must not be 0." msgstr "" #: trunc.R:32 msgid "`width` ({width}) is shorter than `ellipsis` ({str_length(ellipsis)})." msgstr "" #: utils.R:23 msgid "{.arg pattern} can't be a boundary." msgstr "" #: utils.R:26 msgid "{.arg pattern} can't be the empty string ({.code \"\"})." msgstr "" ================================================ FILE: revdep/.gitignore ================================================ checks library checks.noindex library.noindex data.sqlite *.html cloud.noindex ================================================ FILE: revdep/README.md ================================================ # Revdeps ## Failed to check (2) |package |version |error |warning |note | |:-------------------|:-------|:-----|:-------|:----| |DSMolgenisArmadillo |? 
| | | | |multinma |0.8.1 |1 | | | ## New problems (9) |package |version |error |warning |note | |:---------|:-------|:------|:-------|:----| |[huxtable](problems.md#huxtable)|5.7.0 |__+2__ | |1 | |[latex2exp](problems.md#latex2exp)|0.9.6 |__+2__ | | | |[NMsim](problems.md#nmsim)|0.2.5 |__+1__ | | | |[nrlR](problems.md#nrlr)|0.1.1 |__+1__ | | | |[phenofit](problems.md#phenofit)|0.3.10 |__+2__ | | | |[psycModel](problems.md#psycmodel)|0.5.0 |__+1__ | |1 | |[salty](problems.md#salty)|0.1.1 |__+2__ | | | |[sdbuildR](problems.md#sdbuildr)|1.0.7 |__+1__ | | | |[zipangu](problems.md#zipangu)|0.3.3 |__+1__ | |1 | ================================================ FILE: revdep/cran.md ================================================ ## revdepcheck results We checked 2390 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package. * We saw 9 new problems * We failed to check 2 packages Issues with CRAN packages are summarised below. ### New problems (This reports the first line of each new failure) * huxtable checking examples ... ERROR checking tests ... ERROR * latex2exp checking tests ... ERROR checking re-building of vignette outputs ... ERROR * NMsim checking tests ... ERROR * nrlR checking examples ... ERROR * phenofit checking examples ... ERROR checking tests ... ERROR * psycModel checking tests ... ERROR * salty checking examples ... ERROR checking tests ... ERROR * sdbuildR checking tests ... ERROR * zipangu checking tests ... ERROR ### Failed to check * DSMolgenisArmadillo (NA) * multinma (NA) ================================================ FILE: revdep/email.yml ================================================ release_date: ??? rel_release_date: ??? my_news_url: ??? release_version: ??? release_details: ??? 
================================================ FILE: revdep/failures.md ================================================ # DSMolgenisArmadillo (3.0.1) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "DSMolgenisArmadillo")` for more info ## Error before installation ### Devel ``` * using log directory ‘/tmp/workdir/DSMolgenisArmadillo/new/DSMolgenisArmadillo.Rcheck’ * using R version 4.5.1 (2025-06-13) * using platform: x86_64-pc-linux-gnu * R was compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 GNU Fortran (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 * running under: Ubuntu 24.04.3 LTS * using session charset: UTF-8 * using option ‘--no-manual’ * checking for file ‘DSMolgenisArmadillo/DESCRIPTION’ ... OK ... * checking files in ‘vignettes’ ... OK * checking examples ... OK * checking for unstated dependencies in ‘tests’ ... OK * checking tests ... OK Running ‘testthat.R’ * checking for unstated dependencies in vignettes ... OK * checking package vignettes ... OK * checking re-building of vignette outputs ... OK * DONE Status: OK ``` ### CRAN ``` * using log directory ‘/tmp/workdir/DSMolgenisArmadillo/old/DSMolgenisArmadillo.Rcheck’ * using R version 4.5.1 (2025-06-13) * using platform: x86_64-pc-linux-gnu * R was compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 GNU Fortran (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 * running under: Ubuntu 24.04.3 LTS * using session charset: UTF-8 * using option ‘--no-manual’ * checking for file ‘DSMolgenisArmadillo/DESCRIPTION’ ... OK ... * checking files in ‘vignettes’ ... OK * checking examples ... OK * checking for unstated dependencies in ‘tests’ ... OK * checking tests ... OK Running ‘testthat.R’ * checking for unstated dependencies in vignettes ... OK * checking package vignettes ... OK * checking re-building of vignette outputs ... 
OK * DONE Status: OK ``` # multinma (0.8.1) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "multinma")` for more info ## In both * checking whether package ‘multinma’ can be installed ... ERROR ``` Installation failed. See ‘/tmp/workdir/multinma/new/multinma.Rcheck/00install.out’ for details. ``` ## Installation ### Devel ``` * installing *source* package ‘multinma’ ... ** this is package ‘multinma’ version ‘0.8.1’ ** package ‘multinma’ successfully unpacked and MD5 sums checked ** using staged installation ** libs using C++ compiler: ‘g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0’ using C++17 g++ -std=gnu++17 -I"/opt/R/4.5.1/lib/R/include" -DNDEBUG -I"../inst/include" -I"/usr/local/lib/R/site-library/StanHeaders/include/src" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -DBOOST_MATH_OVERFLOW_ERROR_POLICY=errno_on_error -DUSE_STANC3 -D_HAS_AUTO_PTR_ETC=0 -I'/usr/local/lib/R/site-library/BH/include' -I'/usr/local/lib/R/site-library/Rcpp/include' -I'/usr/local/lib/R/site-library/RcppEigen/include' -I'/usr/local/lib/R/site-library/RcppParallel/include' -I'/usr/local/lib/R/site-library/rstan/include' -I'/usr/local/lib/R/site-library/StanHeaders/include' -I/usr/local/include -I'/usr/local/lib/R/site-library/RcppParallel/include' -D_REENTRANT -DSTAN_THREADS -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o ... 
/usr/local/lib/R/site-library/StanHeaders/include/src/stan/mcmc/hmc/hamiltonians/dense_e_metric.hpp:22:0: required from ‘double stan::mcmc::dense_e_metric::T(stan::mcmc::dense_e_point&) [with Model = model_survival_param_namespace::model_survival_param; BaseRNG = boost::random::additive_combine_engine, boost::random::linear_congruential_engine >]’ /usr/local/lib/R/site-library/StanHeaders/include/src/stan/mcmc/hmc/hamiltonians/dense_e_metric.hpp:21:0: required from here /usr/local/lib/R/site-library/RcppEigen/include/Eigen/src/Core/DenseCoeffsBase.h:654:74: warning: ignoring attributes on template argument ‘Eigen::internal::packet_traits::type’ {aka ‘__m128d’} [-Wignored-attributes] 654 | return internal::first_aligned::alignment),Derived>(m); | ^~~~~~~~~ g++: fatal error: Killed signal terminated program cc1plus compilation terminated. make: *** [/opt/R/4.5.1/lib/R/etc/Makeconf:209: stanExports_survival_param.o] Error 1 ERROR: compilation failed for package ‘multinma’ * removing ‘/tmp/workdir/multinma/new/multinma.Rcheck/multinma’ ``` ### CRAN ``` * installing *source* package ‘multinma’ ... 
** this is package ‘multinma’ version ‘0.8.1’ ** package ‘multinma’ successfully unpacked and MD5 sums checked ** using staged installation ** libs using C++ compiler: ‘g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0’ using C++17 g++ -std=gnu++17 -I"/opt/R/4.5.1/lib/R/include" -DNDEBUG -I"../inst/include" -I"/usr/local/lib/R/site-library/StanHeaders/include/src" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -DBOOST_MATH_OVERFLOW_ERROR_POLICY=errno_on_error -DUSE_STANC3 -D_HAS_AUTO_PTR_ETC=0 -I'/usr/local/lib/R/site-library/BH/include' -I'/usr/local/lib/R/site-library/Rcpp/include' -I'/usr/local/lib/R/site-library/RcppEigen/include' -I'/usr/local/lib/R/site-library/RcppParallel/include' -I'/usr/local/lib/R/site-library/rstan/include' -I'/usr/local/lib/R/site-library/StanHeaders/include' -I/usr/local/include -I'/usr/local/lib/R/site-library/RcppParallel/include' -D_REENTRANT -DSTAN_THREADS -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o ... /usr/local/lib/R/site-library/StanHeaders/include/src/stan/mcmc/hmc/hamiltonians/dense_e_metric.hpp:22:0: required from ‘double stan::mcmc::dense_e_metric::T(stan::mcmc::dense_e_point&) [with Model = model_survival_param_namespace::model_survival_param; BaseRNG = boost::random::additive_combine_engine, boost::random::linear_congruential_engine >]’ /usr/local/lib/R/site-library/StanHeaders/include/src/stan/mcmc/hmc/hamiltonians/dense_e_metric.hpp:21:0: required from here /usr/local/lib/R/site-library/RcppEigen/include/Eigen/src/Core/DenseCoeffsBase.h:654:74: warning: ignoring attributes on template argument ‘Eigen::internal::packet_traits::type’ {aka ‘__m128d’} [-Wignored-attributes] 654 | return internal::first_aligned::alignment),Derived>(m); | ^~~~~~~~~ g++: fatal error: Killed signal terminated program cc1plus compilation terminated. 
make: *** [/opt/R/4.5.1/lib/R/etc/Makeconf:209: stanExports_survival_param.o] Error 1 ERROR: compilation failed for package ‘multinma’ * removing ‘/tmp/workdir/multinma/old/multinma.Rcheck/multinma’ ``` ================================================ FILE: revdep/problems.md ================================================ # huxtable (5.7.0) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "huxtable")` for more info ## Newly broken * checking examples ... ERROR ``` ... 2. ├─huxtable:::print.huxtable(x) 3. │ └─huxtable (local) meth(x, ...) 4. │ ├─base::cat(to_screen(ht, ...)) 5. │ └─huxtable::to_screen(ht, ...) 6. │ └─huxtable:::generate_table_display(...) 7. │ └─huxtable:::create_character_matrix(...) 8. │ └─huxtable:::character_matrix(...) 9. │ └─huxtable:::prepare_cell_display_data(ht, markdown) 10. │ └─huxtable:::clean_contents(ht, output_type = if (markdown) "markdown" else "screen") 11. │ └─huxtable:::format_numbers_matrix(contents, ht) 12. │ └─base::vapply(...) 13. │ └─huxtable (local) FUN(X[[i]], ...) 14. │ └─base::vapply(...) 15. │ └─huxtable (local) FUN(X[[i]], ...) 16. │ └─huxtable:::format_numbers(cell, nf[[row, col]]) 17. │ └─stringr::str_replace_all(string, number_regex(), format_numeral) 18. │ └─stringr:::str_transform_all(string, pattern, replacement) 19. │ ├─base::withCallingHandlers(...) 20. │ └─huxtable (local) replacement(old_flat) 21. │ └─numeral_formatter(num_fmt)(num) 22. └─base::.handleSimpleError(...) 23. └─stringr (local) h(simpleError(msg, call)) 24. └─cli::cli_abort(...) 25. └─rlang::abort(...) Execution halted ``` * checking tests ... ERROR ``` ... 
• x86_64-w64-mingw32/x64/validate-outputs/dimensions.rtf • x86_64-w64-mingw32/x64/validate-outputs/dimensions.tex • x86_64-w64-mingw32/x64/validate-outputs/dimensions.txt • x86_64-w64-mingw32/x64/validate-outputs/table_caption_tests.html • x86_64-w64-mingw32/x64/validate-outputs/table_caption_tests.rtf • x86_64-w64-mingw32/x64/validate-outputs/table_caption_tests.tex • x86_64-w64-mingw32/x64/validate-outputs/table_caption_tests.txt • x86_64-w64-mingw32/x64/validate-outputs/table_width_tests.html • x86_64-w64-mingw32/x64/validate-outputs/table_width_tests.rtf • x86_64-w64-mingw32/x64/validate-outputs/table_width_tests.tex • x86_64-w64-mingw32/x64/validate-outputs/table_width_tests.txt • x86_64-w64-mingw32/x64/validate-outputs/text_alignment.html • x86_64-w64-mingw32/x64/validate-outputs/text_alignment.rtf • x86_64-w64-mingw32/x64/validate-outputs/text_alignment.tex • x86_64-w64-mingw32/x64/validate-outputs/text_alignment.txt • x86_64-w64-mingw32/x64/validate-outputs/text_effects.html • x86_64-w64-mingw32/x64/validate-outputs/text_effects.rtf • x86_64-w64-mingw32/x64/validate-outputs/text_effects.tex • x86_64-w64-mingw32/x64/validate-outputs/text_effects.txt • x86_64-w64-mingw32/x64/validate-outputs/text_properties.html • x86_64-w64-mingw32/x64/validate-outputs/text_properties.rtf • x86_64-w64-mingw32/x64/validate-outputs/text_properties.tex • x86_64-w64-mingw32/x64/validate-outputs/text_properties.txt Error: Test failures Execution halted ``` ## In both * checking dependencies in R code ... NOTE ``` Namespaces in Imports field not imported from: ‘R6’ ‘xml2’ All declared Imports should be used. ``` # latex2exp (0.9.6) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "latex2exp")` for more info ## Newly broken * checking tests ... ERROR ``` ... (2), not length 1. Backtrace: ▆ 1. ├─latex2exp:::expect_renders_same(...) at test_simple.R:166:3 2. 
│ └─latex2exp:::.expect_renders(object, expected_expression, negate = FALSE) at tests/testthat/setup.R:30:3 3. │ └─latex2exp::TeX(act$val) at tests/testthat/setup.R:65:5 4. │ └─latex2exp:::parse_latex(input) 5. │ └─... %>% ... 6. ├─stringr::str_replace_all(., "([^\\\\]?)\\\\\\s", "\\1\\\\@SPACE2{}") 7. │ └─stringr:::check_lengths(string, pattern, replacement) 8. │ └─vctrs::vec_size_common(...) 9. ├─stringr::str_replace_all(., "([^\\\\]?)\\\\;", "\\1\\\\@SPACE2{}") 10. │ └─stringr:::check_lengths(string, pattern, replacement) 11. │ └─vctrs::vec_size_common(...) 12. ├─stringr::str_replace_all(., "([^\\\\]?)\\\\,", "\\1\\\\@SPACE1{}") 13. │ └─stringr:::check_lengths(string, pattern, replacement) 14. │ └─vctrs::vec_size_common(...) 15. └─stringr::str_replace_all(...) 16. └─stringr:::str_transform_all(string, pattern, replacement) 17. └─cli::cli_abort(...) 18. └─rlang::abort(...) [ FAIL 1 | WARN 1 | SKIP 0 | PASS 100 ] Error: Test failures Execution halted ``` * checking re-building of vignette outputs ... ERROR ``` Error(s) in re-building vignettes: --- re-building ‘supported-commands.Rmd’ using rmarkdown --- finished re-building ‘supported-commands.Rmd’ --- re-building ‘using-latex2exp.Rmd’ using rmarkdown ``` # NMsim (0.2.5) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "NMsim")` for more info ## Newly broken * checking tests ... ERROR ``` ... Running the tests in ‘tests/testthat.R’ failed. Complete output: > library(testthat) > library(NMsim) NMsim 0.2.5. Browse NMsim documentation at https://NMautoverse.github.io/NMsim/ > > test_check("NMsim") [ FAIL 1 | WARN 0 | SKIP 0 | PASS 168 ] ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error ('test_NMsim_VarCov.R:62:5'): Basic ─────────────────────────────────── Error in `stringr::str_replace_all(mod$THETA, "\\d+\\.\\d+", function(x) round(as.numeric(x), digits = 3))`: `replacement` function must return a character vector, not a double vector. Backtrace: ▆ 1. 
└─stringr::str_replace_all(...) at test_NMsim_VarCov.R:62:5 2. └─stringr:::str_transform_all(string, pattern, replacement) 3. └─cli::cli_abort(...) 4. └─rlang::abort(...) [ FAIL 1 | WARN 0 | SKIP 0 | PASS 168 ] Error: Test failures Execution halted ``` # nrlR (0.1.1) * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "nrlR")` for more info ## Newly broken * checking examples ... ERROR ``` ... > ### Name: fetch_lineups > ### Title: Fetch NRL Team Lineups > ### Aliases: fetch_lineups > > ### ** Examples > > fetch_lineups(url = "https://www.nrl.com/news/2024/05/07/nrl-team-lists-round-10/") Fetching team lineups from https://www.nrl.com/news/2024/05/07/nrl-team-lists-round-10/ Error in `stringr::str_replace()`: ! `pattern` can not contain NAs. Backtrace: ▆ 1. └─nrlR::fetch_lineups(url = "https://www.nrl.com/news/2024/05/07/nrl-team-lists-round-10/") 2. ├─stringr::str_squish(...) 3. │ └─stringr:::copy_names(...) 4. ├─stringr::str_replace(...) 5. │ └─stringr:::check_lengths(string, pattern, replacement) 6. │ └─vctrs::vec_size_common(...) 7. └─stringr::str_replace(rvest::html_text2(home_node), home_role_full, "") 8. ├─stringr:::type(pattern) 9. └─stringr:::type.character(pattern) 10. └─cli::cli_abort(tr_("{.arg pattern} can not contain NAs."), call = error_call) 11. └─rlang::abort(...) Execution halted ``` # phenofit (0.3.10) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "phenofit")` for more info ## Newly broken * checking examples ... ERROR ``` ... 3. │ └─... %>% set_names(dt$flag) 4. ├─dplyr::group_map(...) 5. ├─dplyr:::group_map.data.frame(...) 6. │ └─dplyr:::map2(chunks, group_keys, .f, ...) 7. │ └─base::mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE) 8. │ └─phenofit (local) ``(dots[[1L]][[1L]], dots[[2L]][[1L]]) 9. │ └─phenofit:::PhenoDeriv.default(values, t, der1, IsPlot = FALSE) 10. │ └─phenofit::findpeaks(...) 11. │ └─xc %<>% str_replace_midzero() 12. ├─phenofit:::str_replace_midzero(.) 13. 
│ └─str_replace_all(x, "\\++0\\++", . %>% replace("+")) %>% ... 14. ├─stringr::str_replace_all(., "-+0-+", . %>% replace("-")) 15. │ └─stringr:::str_transform_all(string, pattern, replacement) 16. │ ├─base::withCallingHandlers(...) 17. │ └─magrittr (local) replacement(old_flat) 18. │ └─magrittr::freduce(value, `_function_list`) 19. │ ├─base::withVisible(function_list[[k]](value)) 20. │ └─function_list[[k]](value) 21. │ └─phenofit (local) replace(., "-") 22. │ └─base::paste(rep(replacement, nchar(x)), collapse = "") 23. └─base::.handleSimpleError(...) 24. └─stringr (local) h(simpleError(msg, call)) 25. └─cli::cli_abort(...) 26. └─rlang::abort(...) Execution halted ``` * checking tests ... ERROR ``` ... 8. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo)) 9. ├─base::do.call(season, param) 10. ├─phenofit (local) ``(...) 11. │ └─phenofit:::findpeaks_season(...) 12. │ └─phenofit::findpeaks(...) 13. │ └─xc %<>% str_replace_midzero() 14. ├─phenofit:::str_replace_midzero(.) 15. │ └─str_replace_all(x, "\\++0\\++", . %>% replace("+")) %>% ... 16. ├─stringr::str_replace_all(., "-+0-+", . %>% replace("-")) 17. │ └─stringr:::str_transform_all(string, pattern, replacement) 18. │ ├─base::withCallingHandlers(...) 19. │ └─magrittr (local) replacement(old_flat) 20. │ └─magrittr::freduce(value, `_function_list`) 21. │ ├─base::withVisible(function_list[[k]](value)) 22. │ └─function_list[[k]](value) 23. │ └─phenofit (local) replace(., "-") 24. │ └─base::paste(rep(replacement, nchar(x)), collapse = "") 25. └─base::.handleSimpleError(...) 26. └─stringr (local) h(simpleError(msg, call)) 27. └─cli::cli_abort(...) 28. └─rlang::abort(...) [ FAIL 2 | WARN 2 | SKIP 0 | PASS 66 ] Error: Test failures Execution halted ``` # psycModel (0.5.0) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "psycModel")` for more info ## Newly broken * checking tests ... ERROR ``` ... 
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 68 ] ══ Failed tests ════════════════════════════════════════════════════════════════ ── Failure ('test-model-table.R:15:3'): model_table: linear regression ───────── `lm_1_check` (`actual`) not equal to model_summary[[2]] (`expected`). `names(actual)`: "(Intercept)" "Sepal.Length" `names(expected)`: "" "" ── Failure ('test-model-table.R:16:3'): model_table: linear regression ───────── `lm_2_check` (`actual`) not equal to model_summary[[3]] (`expected`). `names(actual)`: "(Intercept)" "Petal.Length" `names(expected)`: "" "" [ FAIL 2 | WARN 0 | SKIP 0 | PASS 68 ] Error: Test failures Execution halted ``` ## In both * checking dependencies in R code ... NOTE ``` Namespaces in Imports field not imported from: ‘lifecycle’ ‘patchwork’ All declared Imports should be used. ``` # salty (0.1.1) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "salty")` for more info ## Newly broken * checking examples ... ERROR ``` ... > x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", + "Nunc finibus tortor a elit eleifend interdum.", + "Maecenas aliquam augue sit amet ultricies placerat.") > > salt_replace(x, replacement_shaker$capitalization, p = 0.5, rep_p = 0.2) Error in `purrr::map2_chr()`: ℹ In index: 1. Caused by error in `stringr::str_replace_all()`: ! `replacement` function must return a vector the same length as the input (47), not length 1. Backtrace: ▆ 1. └─salty::salt_replace(...) 2. └─purrr::map2_chr(...) 3. └─purrr:::map2_("character", .x, .y, .f, ..., .progress = .progress) 4. ├─purrr:::with_indexed_errors(...) 5. │ └─base::withCallingHandlers(...) 6. ├─purrr:::call_with_cleanup(...) 7. └─salty (local) .f(.x[[i]], .y[[i]], ...) 8. └─salty:::selective_replacement(xc, replacements(i = si), rep_p) 9. └─stringr::str_replace_all(x, pattern = patterns, replacement = repfun) 10. └─stringr:::str_transform_all(string, pattern, replacement) 11. └─cli::cli_abort(...) 12. └─rlang::abort(...) 
Execution halted ``` * checking tests ... ERROR ``` ... 9. │ └─purrr::map2_chr(...) 10. │ └─purrr:::map2_("character", .x, .y, .f, ..., .progress = .progress) 11. │ ├─purrr:::with_indexed_errors(...) 12. │ │ └─base::withCallingHandlers(...) 13. │ ├─purrr:::call_with_cleanup(...) 14. │ └─salty (local) .f(.x[[i]], .y[[i]], ...) 15. │ └─salty:::selective_replacement(xc, replacements(i = si), rep_p) 16. │ └─stringr::str_replace_all(x, pattern = patterns, replacement = repfun) 17. │ └─stringr:::str_transform_all(string, pattern, replacement) 18. │ └─cli::cli_abort(...) 19. │ └─rlang::abort(...) 20. │ └─rlang:::signal_abort(cnd, .file) 21. │ └─base::signalCondition(cnd) 22. ├─purrr (local) ``(``) 23. │ └─cli::cli_abort(...) 24. │ └─rlang::abort(...) 25. │ └─rlang:::signal_abort(cnd, .file) 26. │ └─base::signalCondition(cnd) 27. └─purrr (local) ``(``) 28. └─cli::cli_abort(...) 29. └─rlang::abort(...) [ FAIL 5 | WARN 0 | SKIP 0 | PASS 755 ] Error: Test failures Execution halted ``` # sdbuildR (1.0.7) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "sdbuildR")` for more info ## Newly broken * checking tests ... ERROR ``` ... 12. └─cli::cli_abort(...) 13. └─rlang::abort(...) ── Error ('test-conv_julia.R:723:3'): adding scientific notation ─────────────── Error in `stringr::str_replace_all(eqn, pattern = pattern, replacement = reformat_scientific)`: Failed to apply `replacement` function. i It must accept a character vector of any length. Caused by error in `if (nchar(format(num, scientific = FALSE)) > digits_max) ...`: ! the condition has length > 1 Backtrace: ▆ 1. ├─testthat::expect_equal(...) at test-conv_julia.R:723:3 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object") 3. │ └─rlang::eval_bare(expr, quo_get_env(quo)) 4. ├─sdbuildR:::scientific_notation("hiding 1e+23", task = "add") 5. │ └─stringr::str_replace_all(eqn, pattern = pattern, replacement = reformat_scientific) 6. 
│ └─stringr:::str_transform_all(string, pattern, replacement) 7. │ ├─base::withCallingHandlers(...) 8. │ └─sdbuildR (local) replacement(old_flat) 9. └─base::.handleSimpleError(...) 10. └─stringr (local) h(simpleError(msg, call)) 11. └─cli::cli_abort(...) 12. └─rlang::abort(...) [ FAIL 4 | WARN 0 | SKIP 30 | PASS 915 ] Error: Test failures Execution halted ``` # zipangu (0.3.3) * GitHub: * Email: * GitHub mirror: Run `revdepcheck::cloud_details(, "zipangu")` for more info ## Newly broken * checking tests ... ERROR ``` ... res <- res %>% purrr::list_merge(city = split_pref[2] %>% dplyr::if_else(is_address_block(.), stringr::str_remove(., "((土地区画|街区).+)") %>% stringr::str_remove("土地区画|街区"), .) %>% stringr::str_replace("(.市)(.+町.+)", "\\1") %>% stringr::str_replace(city_name_regex, replacement = "\\1")) } else { res <- res %>% purrr::list_merge(city = split_pref[2] %>% dplyr::if_else(is_address_block(.), stringr::str_remove(., "((土地区画|街区).+)") %>% stringr::str_remove("土地区画|街区"), .) %>% stringr::str_replace(paste0(city_name_regex, "(.+)"), replacement = "\\1")) } res <- res %>% purrr::list_merge(street = split_pref[2] %>% stringr::str_remove(res %>% purrr::pluck("city"))) res %>% purrr::map(~dplyr::if_else(.x == "", NA_character_, .x)) })`: ℹ In index: 1. Caused by error in `str_replace()`: ! `pattern` can not contain NAs. [ FAIL 1 | WARN 0 | SKIP 2 | PASS 143 ] Error: Test failures Execution halted ``` ## In both * checking DESCRIPTION meta-information ... NOTE ``` Missing dependency on R >= 4.1.0 because package code uses the pipe |> or function shorthand \(...) syntax added in R 4.1.0. 
File(s) using such syntax: ‘convert-jyear-legacy.R’ ``` ================================================ FILE: stringr.Rproj ================================================ Version: 1.0 RestoreWorkspace: Default SaveWorkspace: Default AlwaysSaveHistory: Default EnableCodeIndexing: Yes UseSpacesForTab: Yes NumSpacesForTab: 2 Encoding: UTF-8 RnwWeave: Sweave LaTeX: pdfLaTeX AutoAppendNewline: Yes StripTrailingWhitespace: Yes BuildType: Package PackageUseDevtools: Yes PackageInstallArgs: --no-multiarch --with-keep.source PackageRoxygenize: rd,collate,namespace ================================================ FILE: tests/testthat/_snaps/c.md ================================================ # obeys tidyverse recycling rules Code str_c(c("x", "y"), character()) Condition Error in `str_c()`: ! Can't recycle `..1` (size 2) to match `..2` (size 0). # vectorised arguments error Code str_c(letters, sep = c("a", "b")) Condition Error in `str_c()`: ! `sep` must be a single string, not a character vector. Code str_c(letters, collapse = c("a", "b")) Condition Error in `str_c()`: ! `collapse` must be a single string or `NULL`, not a character vector. ================================================ FILE: tests/testthat/_snaps/conv.md ================================================ # check encoding argument Code str_conv("A", c("ISO-8859-1", "ISO-8859-2")) Condition Error in `str_conv()`: ! `encoding` must be a single string, not a character vector. ================================================ FILE: tests/testthat/_snaps/detect.md ================================================ # can't empty/boundary Code str_detect("x", "") Condition Error in `str_detect()`: ! `pattern` can't be the empty string (`""`). Code str_starts("x", "") Condition Error in `str_starts()`: ! `pattern` can't be the empty string (`""`). Code str_ends("x", "") Condition Error in `str_ends()`: ! `pattern` can't be the empty string (`""`). 
# functions use tidyverse recycling rules Code str_detect(1:2, 1:3) Condition Error in `str_detect()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_starts(1:2, 1:3) Condition Error in `str_starts()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_ends(1:2, 1:3) Condition Error in `str_ends()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_like(1:2, c("a", "b", "c")) Condition Error in `str_like()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). # str_like is case sensitive Code str_like("abc", regex("x")) Condition Error in `str_like()`: ! `pattern` must be a plain string, not a stringr modifier. # ignore_case is deprecated but still respected Code out <- str_like("abc", "AB%", ignore_case = TRUE) Condition Warning: The `ignore_case` argument of `str_like()` is deprecated as of stringr 1.6.0. i `str_like()` is always case sensitive. i Use `str_ilike()` for case insensitive string matching. # str_ilike works Code str_ilike("abc", regex("x")) Condition Error in `str_ilike()`: ! `pattern` must be a plain string, not a stringr modifier. ================================================ FILE: tests/testthat/_snaps/dup.md ================================================ # separator must be a single string Code str_dup("a", 3, sep = 1) Condition Error in `str_dup()`: ! `sep` must be a single string or `NULL`, not the number 1. Code str_dup("a", 3, sep = c("-", ";")) Condition Error in `str_dup()`: ! `sep` must be a single string or `NULL`, not a character vector. ================================================ FILE: tests/testthat/_snaps/equal.md ================================================ # vectorised using TRR Code str_equal(letters[1:3], c("a", "b")) Condition Error in `str_equal()`: ! Can't recycle `x` (size 3) to match `y` (size 2). 
================================================ FILE: tests/testthat/_snaps/flatten.md ================================================ # collapse must be single string Code str_flatten("A", c("a", "b")) Condition Error in `str_flatten()`: ! `collapse` must be a single string, not a character vector. ================================================ FILE: tests/testthat/_snaps/interp.md ================================================ # str_interp fails when encountering nested placeholders Code str_interp("${${msg}}") Condition Error in `str_interp()`: ! Invalid template string for interpolation. Code str_interp("$[.2f]{${msg}}") Condition Error in `str_interp()`: ! Invalid template string for interpolation. # str_interp fails when input is not a character string Code str_interp(3L) Condition Error in `str_interp()`: ! `string` must be a character vector, not the number 3. # str_interp wraps parsing errors Code str_interp("This is a ${1 +}") Condition Error in `str_interp()`: ! Failed to parse input "1 +" Caused by error in `parse()`: ! :2:0: unexpected end of input 1: 1 + ^ ================================================ FILE: tests/testthat/_snaps/match.md ================================================ # match and match_all fail when pattern is not a regex Code str_match(phones, fixed("3")) Condition Error in `str_match()`: ! `pattern` must be a regular expression. Code str_match_all(phones, coll("9")) Condition Error in `str_match_all()`: ! `pattern` must be a regular expression. # match can't use other modifiers Code str_match("x", coll("y")) Condition Error in `str_match()`: ! `pattern` must be a regular expression. Code str_match_all("x", coll("y")) Condition Error in `str_match_all()`: ! `pattern` must be a regular expression. ================================================ FILE: tests/testthat/_snaps/modifiers.md ================================================ # patterns coerced to character Code . 
<- regex(x) Condition Warning in `regex()`: Coercing `pattern` to a plain character vector. Code . <- coll(x) Condition Warning in `coll()`: Coercing `pattern` to a plain character vector. Code . <- fixed(x) Condition Warning in `fixed()`: Coercing `pattern` to a plain character vector. # useful error message for bad type Code type(1:3) Condition Error: ! `pattern` must be a character vector, not an integer vector. # useful errors for NAs Code type(NA) Condition Error: ! `pattern` must be a character vector, not `NA`. Code type(c("a", "b", NA_character_, "c")) Condition Error: ! `pattern` can not contain NAs. ================================================ FILE: tests/testthat/_snaps/replace.md ================================================ # replacement must be a string Code str_replace("x", "x", 1) Condition Error in `str_replace()`: ! `replacement` must be a character vector, not the number 1. # can't replace empty/boundary Code str_replace("x", "", "") Condition Error in `str_replace()`: ! `pattern` can't be the empty string (`""`). Code str_replace("x", boundary("word"), "") Condition Error in `str_replace()`: ! `pattern` can't be a boundary. Code str_replace_all("x", "", "") Condition Error in `str_replace_all()`: ! `pattern` can't be the empty string (`""`). Code str_replace_all("x", boundary("word"), "") Condition Error in `str_replace_all()`: ! `pattern` can't be a boundary. # useful error if not vectorised correctly Code str_replace_all(x, "a|c", ~ if (length(x) > 1) stop("Bad")) Condition Error in `str_replace_all()`: ! Failed to apply `replacement` function. i It must accept a character vector of any length. Caused by error in `replacement()`: ! Bad # replacement function must return correct type/length Code str_replace_all("x", "x", ~1) Condition Error in `str_replace_all()`: ! `replacement` function must return a character vector, not a number. Code str_replace_all("x", "x", ~ c("a", "b")) Condition Error in `str_replace_all()`: ! 
`replacement` function must return a vector the same length as the input (1), not length 2. # backrefs are correctly translated Code str_replace_all("abcde", "(b)(c)(d)", "\\4") Condition Error in `stri_replace_all_regex()`: ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR) ================================================ FILE: tests/testthat/_snaps/split.md ================================================ # str_split() checks its inputs Code str_split(letters[1:3], letters[1:2]) Condition Error in `str_split()`: ! Can't recycle `string` (size 3) to match `pattern` (size 2). Code str_split("x", 1) Condition Error in `str_split()`: ! `pattern` must be a character vector, not a number. Code str_split("x", "x", n = 0) Condition Error in `str_split()`: ! `n` must be a number larger than 1, not the number 0. # str_split_1 takes string and returns character vector `string` must be a single string, not a character vector. # str_split_fixed check its inputs Code str_split_fixed("x", "x", 0) Condition Error in `str_split_fixed()`: ! `n` must be a number larger than 1, not the number 0. # str_split_i check its inputs Code str_split_i("x", "x", 0) Condition Error in `str_split_i()`: ! `i` must not be 0. Code str_split_i("x", "x", 0.5) Condition Error in `str_split_i()`: ! `i` must be a whole number, not the number 0.5. ================================================ FILE: tests/testthat/_snaps/sub.md ================================================ # bad vectorisation gives informative error Code str_sub(x, 1:2, 1:3) Condition Error in `str_sub()`: ! Can't recycle `string` (size 2) to match `end` (size 3). Code str_sub(x, 1:2, 1:2) <- 1:3 Condition Error in `str_sub<-`: ! Can't recycle `string` (size 2) to match `value` (size 3). 
================================================ FILE: tests/testthat/_snaps/subset.md ================================================ # can't use boundaries Code str_subset(c("a", "b c"), "") Condition Error in `str_subset()`: ! `pattern` can't be the empty string (`""`). Code str_subset(c("a", "b c"), boundary()) Condition Error in `str_subset()`: ! `pattern` can't be a boundary. ================================================ FILE: tests/testthat/_snaps/trunc.md ================================================ # does not truncate to a length shorter than elipsis Code str_trunc("foobar", 2) Condition Error in `str_trunc()`: ! `width` (2) is shorter than `ellipsis` (3). Code str_trunc("foobar", 3, ellipsis = "....") Condition Error in `str_trunc()`: ! `width` (3) is shorter than `ellipsis` (4). ================================================ FILE: tests/testthat/_snaps/view.md ================================================ # results are truncated Code str_view(words) Output [1] | a [2] | able [3] | about [4] | absolute [5] | accept [6] | account [7] | achieve [8] | across [9] | act [10] | active [11] | actual [12] | add [13] | address [14] | admit [15] | advertise [16] | affect [17] | afford [18] | after [19] | afternoon [20] | again ... and 960 more --- Code str_view(words) Output [1] | a [2] | able [3] | about [4] | absolute [5] | accept ... and 975 more # indices come from original vector Code str_view(letters, "a|z", match = TRUE) Output [1] | [26] | # view highlights all matches Code str_view(x, "[aeiou]") Output [1] | bc [2] | df Code str_view(x, "d|e") Output [2] | f # view highlights whitespace (except a space/nl) Code str_view(x) Output [1] | [2] | {\u00a0} [3] | | [4] | {\t} Code # or can instead use escapes str_view(x, use_escapes = TRUE) Output [1] | [2] | \u00a0 [3] | \n [4] | \t # view displays message for empty vectors Code str_view(character()) Message x Empty `string` provided. 
# can match across lines Code str_view("a\nb\nbbb\nc", "(b|\n)+") Output [1] | a< | b | bbb | >c # str_view_all() is deprecated Code str_view_all("abc", "a|b") Condition Warning: `str_view_all()` was deprecated in stringr 1.5.0. i Please use `str_view()` instead. Output [1] | c # html mode continues to work Code str_view(x, "[aeiou]", html = TRUE)$x$html Output
  • abc
  • def
Code str_view(x, "d|e", html = TRUE)$x$html Output
  • def
--- Code str_view(x, html = TRUE, use_escapes = TRUE)$x$html Output
  •  
  • \u00a0
  • \n
================================================ FILE: tests/testthat/test-c.R ================================================ test_that("basic case works", { test <- c("a", "b", "c") expect_equal(str_c(test), test) expect_equal(str_c(test, sep = " "), test) expect_equal(str_c(test, collapse = ""), "abc") }) test_that("obeys tidyverse recycling rules", { expect_equal(str_c(), character()) expect_equal(str_c("x", character()), character()) expect_equal(str_c("x", NULL), "x") expect_snapshot(str_c(c("x", "y"), character()), error = TRUE) expect_equal(str_c(c("x", "y"), NULL), c("x", "y")) }) test_that("vectorised arguments error", { expect_snapshot(error = TRUE, { str_c(letters, sep = c("a", "b")) str_c(letters, collapse = c("a", "b")) }) }) ================================================ FILE: tests/testthat/test-case.R ================================================ test_that("to_upper and to_lower have equivalent base versions", { x <- "This is a sentence." expect_identical(str_to_upper(x), toupper(x)) expect_identical(str_to_lower(x), tolower(x)) }) test_that("to_title creates one capital letter per word", { x <- "This is a sentence." 
expect_equal(str_count(x, "\\W+"), str_count(str_to_title(x), "[[:upper:]]")) }) test_that("to_sentence capitalizes just the first letter", { expect_identical(str_to_sentence("a Test"), "A test") }) test_that("case conversions preserve names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_to_lower(x)), names(x)) expect_equal(names(str_to_upper(x)), names(x)) expect_equal(names(str_to_title(x)), names(x)) }) # programming cases ----------------------------------------------------------- test_that("to_camel can control case of first argument", { expect_equal(str_to_camel("my_variable"), "myVariable") expect_equal(str_to_camel("my$variable"), "myVariable") expect_equal(str_to_camel(" my variable "), "myVariable") expect_equal(str_to_camel("my_variable", first_upper = TRUE), "MyVariable") }) test_that("to_kebab converts to kebab case", { expect_equal(str_to_kebab("myVariable"), "my-variable") expect_equal(str_to_kebab("MyVariable"), "my-variable") expect_equal(str_to_kebab("1MyVariable1"), "1-my-variable-1") expect_equal(str_to_kebab("My$Variable"), "my-variable") expect_equal(str_to_kebab(" My Variable "), "my-variable") expect_equal(str_to_kebab("testABCTest"), "test-abc-test") expect_equal(str_to_kebab("IlÉtaitUneFois"), "il-était-une-fois") }) test_that("to_snake converts to snake case", { expect_equal(str_to_snake("myVariable"), "my_variable") expect_equal(str_to_snake("MyVariable"), "my_variable") expect_equal(str_to_snake("1MyVariable1"), "1_my_variable_1") expect_equal(str_to_snake("My$Variable"), "my_variable") expect_equal(str_to_snake(" My Variable "), "my_variable") expect_equal(str_to_snake("testABCTest"), "test_abc_test") expect_equal(str_to_snake("IlÉtaitUneFois"), "il_était_une_fois") }) test_that("to_words handles common compound cases", { expect_equal(to_words("a_b"), "a b") expect_equal(to_words("a-b"), "a b") expect_equal(to_words("aB"), "a b") expect_equal(to_words("a123b"), "a 123 b") expect_equal(to_words("HTML"), "html") }) 
================================================ FILE: tests/testthat/test-conv.R ================================================ test_that("encoding conversion works", { skip_on_os("windows") x <- rawToChar(as.raw(177)) expect_equal(str_conv(x, "latin1"), "±") }) test_that("check encoding argument", { expect_snapshot(str_conv("A", c("ISO-8859-1", "ISO-8859-2")), error = TRUE) }) test_that("str_conv() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_conv(x, "UTF-8")), names(x)) }) ================================================ FILE: tests/testthat/test-count.R ================================================ test_that("counts are as expected", { fruit <- c("apple", "banana", "pear", "pineapple") expect_equal(str_count(fruit, "a"), c(1, 3, 1, 1)) expect_equal(str_count(fruit, "p"), c(2, 0, 1, 3)) expect_equal(str_count(fruit, "e"), c(1, 0, 1, 2)) expect_equal(str_count(fruit, c("a", "b", "p", "n")), c(1, 1, 1, 1)) }) test_that("uses tidyverse recycling rules", { expect_error(str_count(1:2, 1:3), class = "vctrs_error_incompatible_size") }) test_that("can use fixed() and coll()", { expect_equal(str_count("x.", fixed(".")), 1) expect_equal(str_count("\u0131", turkish_I()), 1) }) test_that("can count boundaries", { # str_count(x, boundary()) == lengths(str_split(x, boundary())) expect_equal(str_count("a b c", ""), 5) expect_equal(str_count("a b c", boundary("word")), 3) }) test_that("str_count() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_count(x, ".")), names(x)) }) test_that("str_count() drops names when pattern is vector and string is scalar", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(names(str_count(x1, p2))) }) test_that("str_count() preserves names when pattern and string have same length", { x2 <- c(A = "ab", B = "cd") p2 <- c("a", "c") expect_equal(names(str_count(x2, p2)), names(x2)) }) ================================================ FILE: tests/testthat/test-detect.R 
================================================ test_that("special cases are correct", { expect_equal(str_detect(NA, "x"), NA) expect_equal(str_detect(character(), "x"), logical()) }) test_that("vectorised patterns work", { expect_equal(str_detect("ab", c("a", "b", "c")), c(T, T, F)) expect_equal(str_detect(c("ca", "ab"), c("a", "c")), c(T, F)) # negation works expect_equal(str_detect("ab", c("a", "b", "c"), negate = TRUE), c(F, F, T)) }) test_that("str_starts() and str_ends() match expected strings", { expect_equal(str_starts(c("ab", "ba"), "a"), c(TRUE, FALSE)) expect_equal(str_ends(c("ab", "ba"), "a"), c(FALSE, TRUE)) # negation expect_equal(str_starts(c("ab", "ba"), "a", negate = TRUE), c(FALSE, TRUE)) expect_equal(str_ends(c("ab", "ba"), "a", negate = TRUE), c(TRUE, FALSE)) # correct precedence expect_equal(str_starts(c("ab", "ba", "cb"), "a|b"), c(TRUE, TRUE, FALSE)) expect_equal(str_ends(c("ab", "ba", "bc"), "a|b"), c(TRUE, TRUE, FALSE)) }) test_that("can use fixed() and coll()", { expect_equal(str_detect("X", fixed(".")), FALSE) expect_equal(str_starts("X", fixed(".")), FALSE) expect_equal(str_ends("X", fixed(".")), FALSE) expect_equal(str_detect("\u0131", turkish_I()), TRUE) expect_equal(str_starts("\u0131", turkish_I()), TRUE) expect_equal(str_ends("\u0131", turkish_I()), TRUE) }) test_that("can't empty/boundary", { expect_snapshot(error = TRUE, { str_detect("x", "") str_starts("x", "") str_ends("x", "") }) }) test_that("functions use tidyverse recycling rules", { expect_snapshot(error = TRUE, { str_detect(1:2, 1:3) str_starts(1:2, 1:3) str_ends(1:2, 1:3) str_like(1:2, c("a", "b", "c")) }) }) test_that("detection functions preserve names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_detect(x, "[123]")), names(x)) expect_equal(names(str_starts(x, "1")), names(x)) expect_equal(names(str_ends(x, "1")), names(x)) expect_equal(names(str_like(x, "%")), names(x)) expect_equal(names(str_ilike(x, "%")), names(x)) }) test_that("detection drops names 
when pattern is vector and string is scalar", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(names(str_detect(x1, p2))) expect_null(names(str_starts(x1, p2))) expect_null(names(str_ends(x1, p2))) expect_null(names(str_like(x1, p2))) expect_null(names(str_ilike(x1, p2))) }) test_that("detection preserves names when pattern and string have same length", { x2 <- c(A = "ab", B = "cd") p2 <- c("a", "c") expect_equal(names(str_detect(x2, p2)), names(x2)) expect_equal(names(str_starts(x2, p2)), names(x2)) expect_equal(names(str_ends(x2, p2)), names(x2)) expect_equal(names(str_like(x2, p2)), names(x2)) expect_equal(names(str_ilike(x2, p2)), names(x2)) }) # str_like ---------------------------------------------------------------- test_that("str_like is case sensitive", { expect_true(str_like("abc", "ab%")) expect_false(str_like("abc", "AB%")) expect_snapshot(str_like("abc", regex("x")), error = TRUE) }) test_that("ignore_case is deprecated but still respected", { expect_snapshot(out <- str_like("abc", "AB%", ignore_case = TRUE)) expect_equal(out, TRUE) expect_warning(out <- str_like("abc", "AB%", ignore_case = FALSE)) expect_equal(out, FALSE) }) test_that("str_ilike works", { expect_true(str_ilike("abc", "ab%")) expect_true(str_ilike("abc", "AB%")) expect_snapshot(str_ilike("abc", regex("x")), error = TRUE) }) test_that("like_to_regex generates expected regexps", { expect_equal(like_to_regex("ab%"), "^ab.*$") expect_equal(like_to_regex("ab_"), "^ab.$") # escaping expect_equal(like_to_regex("ab\\%"), "^ab\\%$") expect_equal(like_to_regex("ab[%]"), "^ab[%]$") }) ================================================ FILE: tests/testthat/test-dup.R ================================================ test_that("basic duplication works", { expect_equal(str_dup("a", 3), "aaa") expect_equal(str_dup("abc", 2), "abcabc") expect_equal(str_dup(c("a", "b"), 2), c("aa", "bb")) expect_equal(str_dup(c("a", "b"), c(2, 3)), c("aa", "bbb")) }) test_that("0 duplicates equals empty string", { 
expect_equal(str_dup("a", 0), "") expect_equal(str_dup(c("a", "b"), 0), rep("", 2)) }) test_that("uses tidyverse recycling rules", { expect_error(str_dup(1:2, 1:3), class = "vctrs_error_incompatible_size") }) test_that("uses sep argument", { expect_equal(str_dup("abc", 1, sep = "-"), "abc") expect_equal(str_dup("abc", 2, sep = "-"), "abc-abc") expect_equal(str_dup(c("a", "b"), 2, sep = "-"), c("a-a", "b-b")) expect_equal(str_dup(c("a", "b"), c(1, 2), sep = "-"), c("a", "b-b")) expect_equal(str_dup(character(), 1, sep = "-"), character()) expect_equal(str_dup(character(), 2, sep = "-"), character()) }) test_that("separator must be a single string", { expect_snapshot(error = TRUE, { str_dup("a", 3, sep = 1) str_dup("a", 3, sep = c("-", ";")) }) }) test_that("str_dup() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_dup(x, 2)), names(x)) }) ================================================ FILE: tests/testthat/test-equal.R ================================================ test_that("vectorised using TRR", { expect_equal(str_equal("a", character()), logical()) expect_equal(str_equal("a", "b"), FALSE) expect_equal(str_equal("a", c("a", "b")), c(TRUE, FALSE)) expect_snapshot(str_equal(letters[1:3], c("a", "b")), error = TRUE) }) test_that("can ignore case", { expect_equal(str_equal("a", "A"), FALSE) expect_equal(str_equal("a", "A", ignore_case = TRUE), TRUE) }) ================================================ FILE: tests/testthat/test-escape.R ================================================ test_that("str_escape() escapes all regex metacharacters", { expect_equal( str_escape(".^$|*+?{}[]()"), "\\.\\^\\$\\|\\*\\+\\?\\{\\}\\[\\]\\(\\)" ) expect_equal(str_escape("\\"), "\\\\") }) test_that("str_escape() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_escape(x)), names(x)) }) ================================================ FILE: tests/testthat/test-extract.R ================================================ test_that("single pattern extracted
correctly", { test <- c("one two three", "a b c") expect_equal( str_extract_all(test, "[a-z]+"), list(c("one", "two", "three"), c("a", "b", "c")) ) expect_equal( str_extract_all(test, "[a-z]{3,}"), list(c("one", "two", "three"), character()) ) }) test_that("uses tidyverse recycling rules", { expect_error( str_extract(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) expect_error( str_extract_all(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) test_that("no match yields empty vector", { expect_equal(str_extract_all("a", "b")[[1]], character()) }) test_that("str_extract extracts first match if found, NA otherwise", { shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2") word_1_to_4 <- str_extract(shopping_list, "\\b[a-z]{1,4}\\b") expect_length(word_1_to_4, length(shopping_list)) expect_equal(word_1_to_4[1], NA_character_) }) test_that("can extract a group", { expect_equal(str_extract("abc", "(.).(.)", group = 1), "a") expect_equal(str_extract("abc", "(.).(.)", group = 2), "c") }) test_that("can use fixed() and coll()", { expect_equal(str_extract("x.x", fixed(".")), ".") expect_equal(str_extract_all("x.x.", fixed(".")), list(c(".", "."))) expect_equal(str_extract("\u0131", turkish_I()), "\u0131") expect_equal(str_extract_all("\u0131I", turkish_I()), list(c("\u0131", "I"))) }) test_that("can extract boundaries", { expect_equal(str_extract("a b c", ""), "a") expect_equal( str_extract_all("a b c", ""), list(c("a", " ", "b", " ", "c")) ) expect_equal(str_extract("a b c", boundary("word")), "a") expect_equal( str_extract_all("a b c", boundary("word")), list(c("a", "b", "c")) ) }) test_that("str_extract() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_extract(x, "[0-9]")), names(x)) }) test_that("str_extract_all() preserves names on outer structure", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_extract_all(x, "[0-9]")), names(x)) }) test_that("str_extract 
and extract_all handle vectorised patterns and names", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(names(str_extract(x1, p2))) expect_null(names(str_extract_all(x1, p2))) x2 <- c(A = "ab", B = "cd") expect_equal(names(str_extract(x2, p2)), names(x2)) expect_equal(names(str_extract_all(x2, p2)), names(x2)) }) ================================================ FILE: tests/testthat/test-flatten.R ================================================ test_that("equivalent to paste with collapse", { expect_equal(str_flatten(letters), paste0(letters, collapse = "")) }) test_that("collapse must be single string", { expect_snapshot(str_flatten("A", c("a", "b")), error = TRUE) }) test_that("last optionally used instead of final separator", { expect_equal(str_flatten(letters[1:3], ", ", ", and "), "a, b, and c") expect_equal(str_flatten(letters[1:2], ", ", ", and "), "a, and b") expect_equal(str_flatten(letters[1], ", ", ", and "), "a") }) test_that("can remove missing values", { expect_equal(str_flatten(c("a", NA)), NA_character_) expect_equal(str_flatten(c("a", NA), na.rm = TRUE), "a") }) test_that("str_flatten_comma removes comma when unnecessary", { expect_equal(str_flatten_comma(letters[1:2], ", or "), "a or b") expect_equal(str_flatten_comma(letters[1:3], ", or "), "a, b, or c") expect_equal(str_flatten_comma(letters[1:3], " or "), "a, b or c") expect_equal(str_flatten_comma(letters[1:3]), "a, b, c") }) ================================================ FILE: tests/testthat/test-glue.R ================================================ test_that("verify wrapper is functional", { expect_equal(as.character(str_glue("a {b}", b = "b")), "a b") df <- data.frame(b = "b") expect_equal(as.character(str_glue_data(df, "a {b}", b = "b")), "a b") }) test_that("verify trim is functional", { expect_equal(as.character(str_glue("L1\t \n \tL2")), "L1\t \nL2") expect_equal( as.character(str_glue("L1\t \n \tL2", .trim = FALSE)), "L1\t \n \tL2" ) }) 
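A minimal interactive sketch of the `str_glue()` behaviour exercised in the tests above (assuming stringr is attached; `as.character()` drops the glue class, exactly as the tests do):

```r
library(stringr)

# str_glue() interpolates {expressions}; the result carries the "glue"
# class, so comparisons go through as.character()
out <- str_glue("a {b}", b = "b")
stopifnot(as.character(out) == "a b")

# .trim = TRUE (the default) strips leading line indentation;
# .trim = FALSE keeps the template verbatim
stopifnot(as.character(str_glue("L1\t \n \tL2")) == "L1\t \nL2")
stopifnot(as.character(str_glue("L1\t \n \tL2", .trim = FALSE)) == "L1\t \n \tL2")
```

All expected values mirror the expectations in test-glue.R.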
================================================ FILE: tests/testthat/test-interp.R ================================================ test_that("str_interp works with default env", { subject <- "statistics" number <- 7 floating <- 6.656 expect_equal( str_interp("A ${subject}. B $[d]{number}. C $[.2f]{floating}."), "A statistics. B 7. C 6.66." ) expect_equal( str_interp("Pi is approximately $[.5f]{pi}"), "Pi is approximately 3.14159" ) }) test_that("str_interp works with lists and data frames", { expect_equal( str_interp( "One value, ${value1}, and then another, ${value2*2}.", list(value1 = 10, value2 = 20) ), "One value, 10, and then another, 40." ) expect_equal( str_interp( "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", iris ), "Values are 4.40 and 2.00." ) }) test_that("str_interp works with nested expressions", { amount <- 1337 expect_equal( str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}"), "Works with } nested { braces too: 5348.00" ) }) test_that("str_interp works in the absence of placeholders", { expect_equal( str_interp("A quite static string here."), "A quite static string here."
) }) test_that("str_interp fails when encountering nested placeholders", { msg <- "This will never see the light of day" num <- 1.2345 expect_snapshot(error = TRUE, { str_interp("${${msg}}") str_interp("$[.2f]{${msg}}") }) }) test_that("str_interp fails when input is not a character string", { expect_snapshot(str_interp(3L), error = TRUE) }) test_that("str_interp wraps parsing errors", { expect_snapshot(str_interp("This is a ${1 +}"), error = TRUE) }) test_that("str_interp formats list independently of other placeholders", { a_list <- c("item1", "item2", "item3") other <- "1" extract <- function(text) regmatches(text, regexpr("xx[^x]+xx", text)) from_list <- extract(str_interp("list: xx${a_list}xx")) from_both <- extract(str_interp("list: xx${a_list}xx, and another ${other}")) expect_equal(from_list, from_both) }) ================================================ FILE: tests/testthat/test-length.R ================================================ test_that("str_length is number of characters", { expect_equal(str_length("a"), 1) expect_equal(str_length("ab"), 2) expect_equal(str_length("abc"), 3) }) test_that("str_length of missing string is missing", { expect_equal(str_length(NA), NA_integer_) expect_equal(str_length(c(NA, 1)), c(NA, 1)) expect_equal(str_length("NA"), 2) }) test_that("str_length of factor is length of level", { expect_equal(str_length(factor("a")), 1) expect_equal(str_length(factor("ab")), 2) expect_equal(str_length(factor("abc")), 3) }) test_that("str_width returns display width", { x <- c("\u0308", "x", "\U0001f60a") expect_equal(str_width(x), c(0, 1, 2)) }) test_that("length/width preserve names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_length(x)), names(x)) expect_equal(names(str_width(x)), names(x)) }) ================================================ FILE: tests/testthat/test-locate.R ================================================ test_that("basic location matching works", { expect_equal(str_locate("abc", "a")[1, ], c(start = 
1, end = 1)) expect_equal(str_locate("abc", "b")[1, ], c(start = 2, end = 2)) expect_equal(str_locate("abc", "c")[1, ], c(start = 3, end = 3)) expect_equal(str_locate("abc", ".+")[1, ], c(start = 1, end = 3)) }) test_that("uses tidyverse recycling rules", { expect_error(str_locate(1:2, 1:3), class = "vctrs_error_incompatible_size") expect_error( str_locate_all(1:2, 1:3), class = "vctrs_error_incompatible_size" ) }) test_that("locations are integers", { strings <- c("a b c", "d e f") expect_true(is.integer(str_locate(strings, "[a-z]"))) res <- str_locate_all(strings, "[a-z]")[[1]] expect_true(is.integer(res)) expect_true(is.integer(invert_match(res))) }) test_that("both string and patterns are vectorised", { strings <- c("abc", "def") locs <- str_locate(strings, "a") expect_equal(locs[, "start"], c(1, NA)) locs <- str_locate(strings, c("a", "d")) expect_equal(locs[, "start"], c(1, 1)) expect_equal(locs[, "end"], c(1, 1)) locs <- str_locate_all(c("abab"), c("a", "b")) expect_equal(locs[[1]][, "start"], c(1, 3)) expect_equal(locs[[2]][, "start"], c(2, 4)) }) test_that("can use fixed() and coll()", { expect_equal(str_locate("x.x", fixed(".")), cbind(start = 2, end = 2)) expect_equal( str_locate_all("x.x.", fixed(".")), list(cbind(start = c(2, 4), end = c(2, 4))) ) expect_equal(str_locate("\u0131", turkish_I()), cbind(start = 1, end = 1)) expect_equal( str_locate_all("\u0131I", turkish_I()), list(cbind(start = 1:2, end = 1:2)) ) }) test_that("can use boundaries", { expect_equal( str_locate(" x y", ""), cbind(start = 1, end = 1) ) expect_equal( str_locate_all("abc", ""), list(cbind(start = 1:3, end = 1:3)) ) expect_equal( str_locate(" xy", boundary("word")), cbind(start = 2, end = 3) ) expect_equal( str_locate_all(" ab cd", boundary("word")), list(cbind(start = c(2, 6), end = c(3, 7))) ) }) test_that("str_locate() preserves row names when 1:1 with input", { x <- c(C = "3", B = "2", A = "1") expect_equal(rownames(str_locate(x, "[0-9]")), names(x)) }) 
test_that("str_locate_all() preserves names on outer structure", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_locate_all(x, "[0-9]")), names(x)) }) test_that("locate handles vectorised patterns and names", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(rownames(str_locate(x1, p2))) expect_null(names(str_locate_all(x1, p2))) x2 <- c(A = "ab", B = "cd") expect_equal(rownames(str_locate(x2, p2)), names(x2)) expect_equal(names(str_locate_all(x2, p2)), names(x2)) }) ================================================ FILE: tests/testthat/test-match.R ================================================ set.seed(1410) num <- matrix(sample(9, 10 * 10, replace = T), ncol = 10) num_flat <- apply(num, 1, str_c, collapse = "") phones <- str_c( "(", num[, 1], num[, 2], num[, 3], ") ", num[, 4], num[, 5], num[, 6], " ", num[, 7], num[, 8], num[, 9], num[, 10] ) test_that("empty strings return correct matrix of correct size", { skip_if_not_installed("stringi", "1.2.2") expect_equal(str_match(NA, "(a)"), matrix(NA_character_, 1, 2)) expect_equal(str_match(character(), "(a)"), matrix(character(), 0, 2)) }) test_that("no matching cases returns 1 column matrix", { res <- str_match(c("a", "b"), ".") expect_equal(nrow(res), 2) expect_equal(ncol(res), 1) expect_equal(res[, 1], c("a", "b")) }) test_that("single match works when all match", { matches <- str_match(phones, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_equal(nrow(matches), length(phones)) expect_equal(ncol(matches), 4) expect_equal(matches[, 1], phones) matches_flat <- apply(matches[, -1], 1, str_c, collapse = "") expect_equal(matches_flat, num_flat) }) test_that("match returns NA when some inputs don't match", { matches <- str_match( c(phones, "blah", NA), "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})" ) expect_equal(nrow(matches), length(phones) + 2) expect_equal(ncol(matches), 4) expect_equal(matches[11, ], rep(NA_character_, 4)) expect_equal(matches[12, ], rep(NA_character_, 4)) }) test_that("match returns NA 
when optional group doesn't match", { expect_equal(str_match(c("ab", "a"), "(a)(b)?")[, 3], c("b", NA)) }) test_that("match_all returns NA when optional group doesn't match", { expect_equal(str_match_all("a", "(a)(b)?")[[1]][1, ], c("a", "a", NA)) }) test_that("multiple match works", { phones_one <- str_c(phones, collapse = " ") multi_match <- str_match_all( phones_one, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})" ) single_matches <- str_match(phones, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_equal(multi_match[[1]], single_matches) }) test_that("match and match_all fail when pattern is not a regex", { expect_snapshot(error = TRUE, { str_match(phones, fixed("3")) str_match_all(phones, coll("9")) }) }) test_that("uses tidyverse recycling rules", { expect_error( str_match(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) expect_error( str_match_all(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) test_that("match can't use other modifiers", { expect_snapshot(error = TRUE, { str_match("x", coll("y")) str_match_all("x", coll("y")) }) }) test_that("str_match() preserves row names when 1:1 with input", { x <- c(C = "3", B = "2", A = "1") expect_equal(rownames(str_match(x, "([0-9])")), names(x)) }) test_that("str_match_all() preserves names on outer structure", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_match_all(x, "([0-9])")), names(x)) }) test_that("match handles vectorised patterns and names", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(rownames(str_match(x1, p2))) expect_null(names(str_match_all(x1, p2))) x2 <- c(A = "ab", B = "cd") expect_equal(rownames(str_match(x2, p2)), names(x2)) expect_equal(names(str_match_all(x2, p2)), names(x2)) }) ================================================ FILE: tests/testthat/test-modifiers.R ================================================ test_that("patterns coerced to character", { x <- factor("a") expect_snapshot({ . <- regex(x) . <- coll(x) . 
<- fixed(x) }) }) test_that("useful error message for bad type", { expect_snapshot(error = TRUE, { type(1:3) }) }) test_that("fallback for regex (#433)", { expect_equal(type(structure("x", class = "regex")), "regex") }) test_that("ignore_case sets strength, but can override manually", { x1 <- coll("x", strength = 1) x2 <- coll("x", ignore_case = TRUE) x3 <- coll("x") expect_equal(attr(x1, "options")$strength, 1) expect_equal(attr(x2, "options")$strength, 2) expect_equal(attr(x3, "options")$strength, 3) }) test_that("boundary has length 1", { expect_length(boundary(), 1) }) test_that("subsetting preserves class and options", { x <- regex("a", multiline = TRUE) expect_equal(x[], x) }) test_that("useful errors for NAs", { expect_snapshot(error = TRUE, { type(NA) type(c("a", "b", NA_character_, "c")) }) }) test_that("stringr_pattern methods", { ex <- coll(c("foo", "bar")) expect_true(inherits(ex[1], "stringr_pattern")) expect_true(inherits(ex[[1]], "stringr_pattern")) }) ================================================ FILE: tests/testthat/test-pad.R ================================================ test_that("long strings are unchanged", { lengths <- sample(40:100, 10) strings <- vapply( lengths, function(x) { str_c(letters[sample(26, x, replace = TRUE)], collapse = "") }, character(1) ) padded <- str_pad(strings, width = 30) expect_equal(str_length(padded), str_length(strings)) }) test_that("directions work for simple case", { pad <- function(direction) str_pad("had", direction, width = 10) expect_equal(pad("right"), "had       ") expect_equal(pad("left"), "       had") expect_equal(pad("both"), "   had    ") }) test_that("padding based on width works", { # \u4e2d is a 2-character-wide Chinese character pad <- function(...) 
str_pad("\u4e2d", ..., side = "both") expect_equal(pad(width = 6), "  \u4e2d  ") expect_equal(pad(width = 5, use_width = FALSE), "  \u4e2d  ") }) test_that("uses tidyverse recycling rules", { expect_error( str_pad(c("a", "b"), 1:3), class = "vctrs_error_incompatible_size" ) expect_error( str_pad(c("a", "b"), 10, pad = c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) test_that("str_pad() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_pad(x, 2, side = "left")), names(x)) }) ================================================ FILE: tests/testthat/test-remove.R ================================================ test_that("successfully wraps str_replace_all", { expect_equal(str_remove_all("abababa", "ba"), "a") expect_equal(str_remove("abababa", "ba"), "ababa") }) test_that("str_remove() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_remove(x, "[0-9]")), names(x)) }) ================================================ FILE: tests/testthat/test-replace.R ================================================ test_that("basic replacement works", { expect_equal(str_replace_all("abababa", "ba", "BA"), "aBABABA") expect_equal(str_replace("abababa", "ba", "BA"), "aBAbaba") }) test_that("can replace multiple matches", { x <- c("a1", "b2") y <- str_replace_all(x, c("a" = "1", "b" = "2")) expect_equal(y, c("11", "22")) }) test_that("even when lengths differ", { x <- c("a1", "b2", "c3") y <- str_replace_all(x, c("a" = "1", "b" = "2")) expect_equal(y, c("11", "22", "c3")) }) test_that("multiple matches respects class", { x <- c("x", "y") y <- str_replace_all(x, regex(c("X" = "a"), ignore_case = TRUE)) expect_equal(y, c("a", "y")) }) test_that("replacement must be a string", { expect_snapshot(str_replace("x", "x", 1), error = TRUE) }) test_that("replacement can be NA", { expect_equal(str_replace("xyz", "x", NA_character_), NA_character_) }) test_that("can replace all types of NA values", { 
expect_equal(str_replace_na(NA), "NA") expect_equal(str_replace_na(NA_character_), "NA") expect_equal(str_replace_na(NA_complex_), "NA") expect_equal(str_replace_na(NA_integer_), "NA") expect_equal(str_replace_na(NA_real_), "NA") }) test_that("can use fixed() and coll()", { expect_equal(str_replace("x.x", fixed("."), "Y"), "xYx") expect_equal(str_replace_all("x.x.", fixed("."), "Y"), "xYxY") expect_equal(str_replace("\u0131", turkish_I(), "Y"), "Y") expect_equal(str_replace_all("\u0131I", turkish_I(), "Y"), "YY") }) test_that("can't replace empty/boundary", { expect_snapshot(error = TRUE, { str_replace("x", "", "") str_replace("x", boundary("word"), "") str_replace_all("x", "", "") str_replace_all("x", boundary("word"), "") }) }) # functions --------------------------------------------------------------- test_that("can replace multiple values", { expect_equal(str_replace("abc", "a|c", toupper), "Abc") expect_equal(str_replace_all("abc", "a|c", toupper), "AbC") }) test_that("can use formula", { expect_equal(str_replace("abc", "b", ~"x"), "axc") expect_equal(str_replace_all("abc", "b", ~"x"), "axc") }) test_that("replacement can be different length", { double <- function(x) str_dup(x, 2) expect_equal(str_replace_all("abc", "a|c", double), "aabcc") }) test_that("replacement is vectorised", { x <- c("", "a", "b", "ab", "abc", "cba") expect_equal( str_replace_all(x, "a|c", ~ toupper(str_dup(.x, 2))), c("", "AA", "b", "AAb", "AAbCC", "CCbAA") ) }) test_that("is forgiving of 0 matches with paste", { x <- c("a", "b", "c") expect_equal(str_replace_all(x, "d", ~ paste("x", .x)), x) }) test_that("useful error if not vectorised correctly", { x <- c("a", "b", "c") expect_snapshot( str_replace_all(x, "a|c", ~ if (length(x) > 1) stop("Bad")), error = TRUE ) }) test_that("works with no match", { expect_equal(str_replace("abc", "z", toupper), "abc") }) test_that("works with zero length match", { expect_equal(str_replace("abc", "$", toupper), "abc") 
expect_equal(str_replace_all("abc", "$|^", ~ rep("X", length(.x))), "XabcX") }) test_that("replacement function must return correct type/length", { expect_snapshot(error = TRUE, { str_replace_all("x", "x", ~1) str_replace_all("x", "x", ~ c("a", "b")) }) }) # fix_replacement --------------------------------------------------------- test_that("backrefs are correctly translated", { expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\1"), "abe") expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\2"), "ace") expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\3"), "ade") # gsub("(b)(c)(d)", "\\0", "abcde", perl=TRUE) gives a0e, # in ICU regex $0 refers to the whole pattern match expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\0"), "abcde") # gsub("(b)(c)(d)", "\\4", "abcde", perl=TRUE) is legal, # in ICU regex this gives an U_INDEX_OUTOFBOUNDS_ERROR expect_snapshot(str_replace_all("abcde", "(b)(c)(d)", "\\4"), error = TRUE) expect_equal(str_replace_all("abcde", "bcd", "\\\\1"), "a\\1e") expect_equal(str_replace_all("a!1!2!b", "!", "$"), "a$1$2$b") expect_equal(str_replace("aba", "b", "$"), "a$a") expect_equal(str_replace("aba", "b", "$$$"), "a$$$a") expect_equal(str_replace("aba", "(b)", "\\1$\\1$\\1"), "ab$b$ba") expect_equal(str_replace("aba", "(b)", "\\1$\\\\1$\\1"), "ab$\\1$ba") expect_equal(str_replace("aba", "(b)", "\\\\1$\\1$\\\\1"), "a\\1$b$\\1a") }) test_that("$ are escaped", { expect_equal(fix_replacement("$"), "\\$") expect_equal(fix_replacement("\\$"), "\\\\$") }) test_that("\1 converted to $1 etc", { expect_equal(fix_replacement("\\1"), "$1") expect_equal(fix_replacement("\\9"), "$9") }) test_that("\\1 left as is", { expect_equal(fix_replacement("\\\\1"), "\\\\1") }) test_that("replace functions preserve names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_replace(x, "[0-9]", "x")), names(x)) expect_equal(names(str_replace_all(x, "[0-9]", "x")), names(x)) }) test_that("replace functions handle vectorised patterns and names", { x1 
<- c(A = "ab") p2 <- c("a", "b") expect_null(names(str_replace(x1, p2, "x"))) expect_null(names(str_replace_all(x1, p2, "x"))) x2 <- c(A = "ab", B = "cd") expect_equal(names(str_replace(x2, p2, "x")), names(x2)) expect_equal(names(str_replace_all(x2, p2, "x")), names(x2)) }) test_that("str_replace_na() preserves names", { y <- c(A = NA, B = "x") expect_equal(names(str_replace_na(y)), names(y)) }) ================================================ FILE: tests/testthat/test-sort.R ================================================ test_that("digits can be sorted/ordered as strings or numbers", { x <- c("2", "1", "10") expect_equal(str_sort(x, numeric = FALSE), c("1", "10", "2")) expect_equal(str_sort(x, numeric = TRUE), c("1", "2", "10")) expect_equal(str_order(x, numeric = FALSE), c(2, 3, 1)) expect_equal(str_order(x, numeric = TRUE), c(2, 1, 3)) expect_equal(str_rank(x, numeric = FALSE), c(3, 1, 2)) expect_equal(str_rank(x, numeric = TRUE), c(2, 1, 3)) }) test_that("NA can be at beginning or end", { x <- c("2", "1", NA, "10") na_end <- str_sort(x, numeric = TRUE, na_last = TRUE) expect_equal(tail(na_end, 1), NA_character_) na_start <- str_sort(x, numeric = TRUE, na_last = FALSE) expect_equal(head(na_start, 1), NA_character_) }) test_that("str_sort() preserves names", { x <- c(C = "3", B = "2", A = "1") out <- str_sort(x) expect_equal(names(out), c("A", "B", "C")) }) ================================================ FILE: tests/testthat/test-split.R ================================================ test_that("special cases are correct", { expect_equal(str_split(NA, "")[[1]], NA_character_) expect_equal(str_split(character(), ""), list()) }) test_that("str_split functions as expected", { expect_equal( str_split(c("bab", "cac", "dadad"), "a"), list(c("b", "b"), c("c", "c"), c("d", "d", "d")) ) }) test_that("str_split() can split by special patterns", { expect_equal(str_split("ab", ""), list(c("a", "b"))) expect_equal( str_split("this that.", boundary("word")), 
list(c("this", "that")) ) expect_equal(str_split("a-b", fixed("-")), list(c("a", "b"))) expect_equal( str_split("aXb", coll("X", ignore_case = TRUE)), list(c("a", "b")) ) }) test_that("boundary() can be recycled", { expect_equal(str_split(c("x", "y"), boundary()), list("x", "y")) }) test_that("str_split() can control maximum number of splits", { expect_equal( str_split(c("a", "a-b"), n = 1, "-"), list("a", "a-b") ) expect_equal( str_split(c("a", "a-b"), n = 3, "-"), list("a", c("a", "b")) ) }) test_that("str_split() checks its inputs", { expect_snapshot(error = TRUE, { str_split(letters[1:3], letters[1:2]) str_split("x", 1) str_split("x", "x", n = 0) }) }) test_that("str_split_1 takes string and returns character vector", { expect_equal(str_split_1("abc", ""), c("a", "b", "c")) expect_snapshot_error(str_split_1(letters, "")) }) test_that("str_split_fixed pads with empty string", { expect_equal( str_split_fixed(c("a", "a-b"), "-", 1), cbind(c("a", "a-b")) ) expect_equal( str_split_fixed(c("a", "a-b"), "-", 2), cbind(c("a", "a"), c("", "b")) ) expect_equal( str_split_fixed(c("a", "a-b"), "-", 3), cbind(c("a", "a"), c("", "b"), c("", "")) ) }) test_that("str_split_fixed checks its inputs", { expect_snapshot(str_split_fixed("x", "x", 0), error = TRUE) }) # str_split_i ------------------------------------------------------------- test_that("str_split_i can extract from LHS or RHS", { expect_equal(str_split_i(c("1-2-3", "4-5"), "-", 1), c("1", "4")) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -1), c("3", "5")) }) test_that("str_split_i returns NA for absent components", { expect_equal(str_split_i(c("a", "b-c"), "-", 2), c(NA, "c")) expect_equal(str_split_i(c("a", "b-c"), "-", 3), c(NA_character_, NA)) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -3), c("1", NA)) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -4), c(NA_character_, NA)) }) test_that("str_split_i checks its inputs", { expect_snapshot(error = TRUE, { str_split_i("x", "x", 0) str_split_i("x", "x", 
0.5) }) }) test_that("split functions preserve names on outer structures", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_split(x, "")), names(x)) expect_equal(rownames(str_split(x, "", simplify = TRUE)), names(x)) expect_equal(rownames(str_split_fixed(x, "", 1)), names(x)) }) test_that("str_split_i() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_split_i(x, " ", 1)), names(x)) }) test_that("split handles vectorised patterns and names", { x1 <- c(A = "ab") p2 <- c("a", "b") expect_null(names(str_split(x1, p2))) expect_null(rownames(str_split(x1, p2, simplify = TRUE))) expect_null(rownames(str_split_fixed(x1, p2, 1))) x2 <- c(A = "ab", B = "cd") expect_equal(names(str_split(x2, p2)), names(x2)) expect_equal(rownames(str_split(x2, p2, simplify = TRUE)), names(x2)) expect_equal(rownames(str_split_fixed(x2, p2, 1)), names(x2)) }) ================================================ FILE: tests/testthat/test-sub.R ================================================ test_that("correct substring extracted", { alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, 1, 3), "abc") expect_equal(str_sub(alphabet, 24, 26), "xyz") }) test_that("can extract multiple substrings", { expect_equal( str_sub_all(c("abc", "def"), list(c(1, 2), 1), list(c(1, 2), 2)), list(c("a", "b"), "de") ) }) test_that("arguments expanded to longest", { alphabet <- str_c(letters, collapse = "") expect_equal( str_sub(alphabet, c(1, 24), c(3, 26)), c("abc", "xyz") ) expect_equal( str_sub(c("abc", "xyz"), 2, 2), c("b", "y") ) }) test_that("can supply start and end/length as a matrix", { x <- c("abc", "def") expect_equal(str_sub(x, cbind(1, end = 1)), c("a", "d")) expect_equal(str_sub(x, cbind(1, length = 2)), c("ab", "de")) expect_equal( str_sub_all(x, cbind(c(1, 2), end = c(2, 3))), list(c("ab", "bc"), c("de", "ef")) ) str_sub(x, cbind(1, end = 1)) <- c("A", "D") expect_equal(x, c("Abc", "Def")) }) test_that("specifying only end subsets from start", { 
alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, end = 3), "abc") }) test_that("specifying only start subsets to end", { alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, 24), "xyz") }) test_that("specifying -1 as end selects entire string", { expect_equal( str_sub("ABCDEF", c(4, 5), c(5, -1)), c("DE", "EF") ) expect_equal( str_sub("ABCDEF", c(4, 5), c(-1, -1)), c("DEF", "EF") ) }) test_that("negative values select from end", { expect_equal(str_sub("ABCDEF", 1, -4), "ABC") expect_equal(str_sub("ABCDEF", -3), "DEF") }) test_that("missing arguments give missing results", { expect_equal(str_sub(NA), NA_character_) expect_equal(str_sub(NA, 1, 3), NA_character_) expect_equal(str_sub(c(NA, "NA"), 1, 3), c(NA, "NA")) expect_equal(str_sub("test", NA, NA), NA_character_) expect_equal(str_sub(c(NA, "test"), NA, NA), rep(NA_character_, 2)) }) test_that("negative length or out of range gives empty string", { expect_equal(str_sub("abc", 2, 1), "") expect_equal(str_sub("abc", 4, 5), "") }) test_that("replacement works", { x <- "BBCDEF" str_sub(x, 1, 1) <- "A" expect_equal(x, "ABCDEF") str_sub(x, -1, -1) <- "K" expect_equal(x, "ABCDEK") str_sub(x, -2, -1) <- "EFGH" expect_equal(x, "ABCDEFGH") str_sub(x, 2, -2) <- "" expect_equal(x, "AH") }) test_that("replacement with NA works", { x <- "BBCDEF" str_sub(x, NA) <- "A" expect_equal(x, NA_character_) x <- "BBCDEF" str_sub(x, NA, omit_na = TRUE) <- "A" str_sub(x, 1, 1, omit_na = TRUE) <- NA expect_equal(x, "BBCDEF") }) test_that("bad vectorisation gives informative error", { x <- "a" expect_snapshot(error = TRUE, { str_sub(x, 1:2, 1:3) str_sub(x, 1:2, 1:2) <- 1:3 }) }) test_that("str_sub() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_sub(x, 1, 1)), names(x)) }) test_that("str_sub_all() preserves names on outer structure", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_sub_all(x, 1, 1)), names(x)) }) 
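The `str_sub()` indexing conventions tested above can be summarised in a short sketch (assuming stringr is attached); every value mirrors an expectation from test-sub.R:

```r
library(stringr)

x <- "ABCDEF"
# negative positions count back from the end of the string
stopifnot(str_sub(x, 1, -4) == "ABC")   # drop the last three characters
stopifnot(str_sub(x, -3) == "DEF")      # keep the last three characters

# inverted or out-of-range positions give "", not an error
stopifnot(str_sub("abc", 2, 1) == "")
stopifnot(str_sub("abc", 4, 5) == "")

# str_sub<- replaces the selected range in place
y <- "BBCDEF"
str_sub(y, 1, 1) <- "A"
stopifnot(y == "ABCDEF")
```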
================================================ FILE: tests/testthat/test-subset.R ================================================ test_that("can subset with regexps", { x <- c("a", "b", "c") expect_equal(str_subset(x, "a|c"), c("a", "c")) expect_equal(str_subset(x, "a|c", negate = TRUE), "b") }) test_that("can subset with fixed patterns", { expect_equal(str_subset(c("i", "I"), fixed("i")), "i") expect_equal( str_subset(c("i", "I"), fixed("i", ignore_case = TRUE)), c("i", "I") ) # negation works expect_equal(str_subset(c("i", "I"), fixed("i"), negate = TRUE), "I") }) test_that("str_which is equivalent to grep", { expect_equal( str_which(head(letters), "[aeiou]"), grep("[aeiou]", head(letters)) ) # negation works expect_equal( str_which(head(letters), "[aeiou]", negate = TRUE), grep("[aeiou]", head(letters), invert = TRUE) ) }) test_that("can use fixed() and coll()", { expect_equal(str_subset(c("x", "."), fixed(".")), ".") expect_equal(str_subset(c("i", "\u0131"), turkish_I()), "\u0131") }) test_that("can't use boundaries", { expect_snapshot(error = TRUE, { str_subset(c("a", "b c"), "") str_subset(c("a", "b c"), boundary()) }) }) test_that("keep names", { fruit <- c(A = "apple", B = "banana", C = "pear", D = "pineapple") expect_identical(names(str_subset(fruit, "b")), "B") expect_identical(names(str_subset(fruit, "p")), c("A", "C", "D")) expect_identical(names(str_subset(fruit, "x")), as.character()) }) test_that("str_subset() preserves names of retained elements", { x <- c(C = "3", B = "2", A = "1") out <- str_subset(x, "[12]") expect_equal(names(out), c("B", "A")) }) test_that("str_subset() never matches missing values", { expect_equal(str_subset(c("a", NA, "b"), "."), c("a", "b")) expect_identical(str_subset(NA_character_, "."), character(0)) }) ================================================ FILE: tests/testthat/test-trim.R ================================================ test_that("trimming removes spaces", { expect_equal(str_trim("abc "), "abc") 
expect_equal(str_trim(" abc"), "abc") expect_equal(str_trim(" abc "), "abc") }) test_that("trimming removes tabs", { expect_equal(str_trim("abc\t"), "abc") expect_equal(str_trim("\tabc"), "abc") expect_equal(str_trim("\tabc\t"), "abc") }) test_that("side argument restricts trimming", { expect_equal(str_trim(" abc ", "left"), "abc ") expect_equal(str_trim(" abc ", "right"), " abc") }) test_that("str_squish removes excess spaces from all parts of string", { expect_equal(str_squish("ab\t\tc\t"), "ab c") expect_equal(str_squish("\ta bc"), "a bc") expect_equal(str_squish("\ta\t bc\t"), "a bc") }) test_that("trimming functions preserve names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_trim(x)), names(x)) }) ================================================ FILE: tests/testthat/test-trunc.R ================================================ test_that("NA values in input pass through unchanged", { expect_equal( str_trunc(NA_character_, width = 5), NA_character_ ) expect_equal( str_trunc(c("foobar", NA), 5), c("fo...", NA) ) }) test_that("truncations work for all elements of a vector", { expect_equal( str_trunc(c("abcd", "abcde", "abcdef"), width = 5), c("abcd", "abcde", "ab...") ) }) test_that("truncations work for all sides", { trunc <- function(direction, width) { str_trunc( "This string is moderately long", direction, width = width ) } expect_equal(trunc("right", 20), "This string is mo...") expect_equal(trunc("left", 20), "...s moderately long") expect_equal(trunc("center", 20), "This stri...ely long") expect_equal(trunc("right", 3), "...") expect_equal(trunc("left", 3), "...") expect_equal(trunc("center", 3), "...") expect_equal(trunc("right", 4), "T...") expect_equal(trunc("left", 4), "...g") expect_equal(trunc("center", 4), "T...") expect_equal(trunc("right", 5), "Th...") expect_equal(trunc("left", 5), "...ng") expect_equal(trunc("center", 5), "T...g") }) test_that("does not truncate to a length shorter than ellipsis", { expect_snapshot(error = TRUE, { 
str_trunc("foobar", 2) str_trunc("foobar", 3, ellipsis = "....") }) }) test_that("str_trunc correctly snips rhs-of-ellipsis for truncated strings", { trunc <- function(width, side) { str_trunc( c("", "a", "aa", "aaa", "aaaa", "aaaaaaa"), width, side, ellipsis = ".." ) } expect_equal(trunc(4, "right"), c("", "a", "aa", "aaa", "aaaa", "aa..")) expect_equal(trunc(4, "left"), c("", "a", "aa", "aaa", "aaaa", "..aa")) expect_equal(trunc(4, "center"), c("", "a", "aa", "aaa", "aaaa", "a..a")) expect_equal(trunc(3, "right"), c("", "a", "aa", "aaa", "a..", "a..")) expect_equal(trunc(3, "left"), c("", "a", "aa", "aaa", "..a", "..a")) expect_equal(trunc(3, "center"), c("", "a", "aa", "aaa", "a..", "a..")) expect_equal(trunc(2, "right"), c("", "a", "aa", "..", "..", "..")) expect_equal(trunc(2, "left"), c("", "a", "aa", "..", "..", "..")) expect_equal(trunc(2, "center"), c("", "a", "aa", "..", "..", "..")) }) test_that("str_trunc() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_trunc(x, 3)), names(x)) }) ================================================ FILE: tests/testthat/test-unique.R ================================================ test_that("unique values returned for strings with duplicate values", { expect_equal(str_unique(c("a", "a", "a")), "a") expect_equal(str_unique(c(NA_character_, NA_character_)), NA_character_) }) test_that("can ignore case", { expect_equal(str_unique(c("a", "A"), ignore_case = TRUE), "a") }) test_that("str_unique() preserves names of first occurrences", { y <- c(A = "a", A2 = "a", B = "b") out <- str_unique(y) expect_equal(names(out), c("A", "B")) }) ================================================ FILE: tests/testthat/test-utils.R ================================================ test_that("keep_names() returns logical flag based on inputs", { expect_true(keep_names("a", "x")) expect_false(keep_names("a", c("x", "y"))) expect_true(keep_names(c("a", "b"), "x")) expect_true(keep_names(c("a", "b"), c("x", "y"))) }) 
test_that("copy_names() applies names to vectors if present", { expect_equal( copy_names(c(A = "a", B = "b"), c("x", "y")), c(A = "x", B = "y") ) expect_equal( copy_names(c("a", "b"), c("x", "y")), c("x", "y") ) }) test_that("copy_names() applies rownames to matrices if present", { from <- c(A = "a", B = "b") to <- matrix(c("x", "y"), nrow = 2) expected <- to rownames(expected) <- names(from) expect_equal(copy_names(from, to), expected) expect_equal(copy_names(c("a", "b"), to), to) }) ================================================ FILE: tests/testthat/test-view.R ================================================ test_that("results are truncated", { expect_snapshot(str_view(words)) # and can control with option local_options(stringr.view_n = 5) expect_snapshot(str_view(words)) }) test_that("indices come from original vector", { expect_snapshot(str_view(letters, "a|z", match = TRUE)) }) test_that("view highlights all matches", { x <- c("abc", "def", "fgh") expect_snapshot({ str_view(x, "[aeiou]") str_view(x, "d|e") }) }) test_that("view highlights whitespace (except a space/nl)", { x <- c(" ", "\u00A0", "\n", "\t") expect_snapshot({ str_view(x) "or can instead use escapes" str_view(x, use_escapes = TRUE) }) }) test_that("view displays message for empty vectors", { expect_snapshot(str_view(character())) }) test_that("match argument controls what is shown", { x <- c("abc", "def", "fgh", NA) out <- str_view(x, "d|e", match = NA) expect_length(out, 4) out <- str_view(x, "d|e", match = TRUE) expect_length(out, 1) out <- str_view(x, "d|e", match = FALSE) expect_length(out, 3) }) test_that("can match across lines", { local_reproducible_output(crayon = TRUE) expect_snapshot(str_view("a\nb\nbbb\nc", "(b|\n)+")) }) test_that("vectorised over pattern", { x <- str_view("a", c("a", "b"), match = NA) expect_equal(length(x), 2) }) test_that("[ preserves class", { x <- str_view(letters) expect_s3_class(x[], "stringr_view") }) test_that("str_view_all() is deprecated", { 
expect_snapshot(str_view_all("abc", "a|b")) }) test_that("html mode continues to work", { skip_if_not_installed("htmltools") skip_if_not_installed("htmlwidgets") x <- c("abc", "def", "fgh") expect_snapshot({ str_view(x, "[aeiou]", html = TRUE)$x$html str_view(x, "d|e", html = TRUE)$x$html }) # can use escapes x <- c(" ", "\u00A0", "\n") expect_snapshot({ str_view(x, html = TRUE, use_escapes = TRUE)$x$html }) }) ================================================ FILE: tests/testthat/test-word.R ================================================ test_that("word extraction", { expect_equal("walk", word("walk the moon")) expect_equal("walk", word("walk the moon", 1)) expect_equal("moon", word("walk the moon", 3)) expect_equal("the moon", word("walk the moon", 2, 3)) }) test_that("words past end return NA", { expect_equal(word("a b c", 4), NA_character_) }) test_that("negative parameters", { expect_equal("moon", word("walk the moon", -1, -1)) expect_equal("walk the moon", word("walk the moon", -3, -1)) expect_equal("walk the moon", word("walk the moon", -5, -1)) }) test_that("word() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(word(x, 1)), names(x)) }) ================================================ FILE: tests/testthat/test-wrap.R ================================================ test_that("wrapping removes spaces", { expect_equal(str_wrap(""), "") expect_equal(str_wrap(" "), "") expect_equal(str_wrap(" a "), "a") }) test_that("wrapping with width of 0 puts each word on own line", { n_returns <- letters %>% str_c(collapse = " ") %>% str_wrap(0) %>% str_count("\n") expect_equal(n_returns, length(letters) - 1) }) test_that("wrapping at whitespace break works", { expect_equal(str_wrap("a/b", width = 0, whitespace_only = TRUE), "a/b") expect_equal(str_wrap("a/b", width = 0, whitespace_only = FALSE), "a/\nb") }) test_that("str_wrap() preserves names", { x <- c(C = "3", B = "2", A = "1") expect_equal(names(str_wrap(x)), names(x)) }) 
================================================ FILE: tests/testthat.R ================================================ library(testthat) library(stringr) test_check("stringr") ================================================ FILE: vignettes/.gitignore ================================================ /.quarto/ ================================================ FILE: vignettes/from-base.Rmd ================================================ --- title: "From base R" author: "Sara Stoudt" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{From base R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| label: setup #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) library(magrittr) ``` This vignette compares stringr functions to their base R equivalents to help users transitioning from using base R to stringr. # Overall differences We'll begin with a lookup table between the most important stringr functions and their base R equivalents. 
```{r} #| label: stringr-base-r-diff #| echo: false data_stringr_base_diff <- tibble::tribble( ~stringr, ~base_r, "str_detect(string, pattern)", "grepl(pattern, x)", "str_dup(string, times)", "strrep(x, times)", "str_extract(string, pattern)", "regmatches(x, m = regexpr(pattern, text))", "str_extract_all(string, pattern)", "regmatches(x, m = gregexpr(pattern, text))", "str_length(string)", "nchar(x)", "str_locate(string, pattern)", "regexpr(pattern, text)", "str_locate_all(string, pattern)", "gregexpr(pattern, text)", "str_match(string, pattern)", "regmatches(x, m = regexec(pattern, text))", "str_order(string)", "order(...)", "str_replace(string, pattern, replacement)", "sub(pattern, replacement, x)", "str_replace_all(string, pattern, replacement)", "gsub(pattern, replacement, x)", "str_sort(string)", "sort(x)", "str_split(string, pattern)", "strsplit(x, split)", "str_sub(string, start, end)", "substr(x, start, stop)", "str_subset(string, pattern)", "grep(pattern, x, value = TRUE)", "str_to_lower(string)", "tolower(x)", "str_to_title(string)", "tools::toTitleCase(text)", "str_to_upper(string)", "toupper(x)", "str_trim(string)", "trimws(x)", "str_which(string, pattern)", "grep(pattern, x)", "str_wrap(string)", "strwrap(x)" ) # create MD table, arranged alphabetically by stringr fn name data_stringr_base_diff %>% dplyr::mutate(dplyr::across( .cols = everything(), .fns = ~ paste0("`", .x, "`")) ) %>% dplyr::arrange(stringr) %>% dplyr::rename(`base R` = base_r) %>% gt::gt() %>% gt::fmt_markdown(columns = everything()) %>% gt::tab_options(column_labels.font.weight = "bold") ``` Overall the main differences between base R and stringr are: 1. stringr functions start with the `str_` prefix; base R string functions have no consistent naming scheme. 1. The order of inputs is usually different between base R and stringr. In base R, the `pattern` to match usually comes first; in stringr, the `string` to manipulate always comes first.
This makes stringr easier to use in pipes, and with `lapply()` or `purrr::map()`. 1. Functions in stringr tend to do less, whereas many of the string processing functions in base R have multiple purposes. 1. The inputs and outputs of stringr functions have been carefully designed. For example, the output of `str_locate()` can be fed directly into `str_sub()`; the same is not true of `regexpr()` and `substr()`. 1. Base functions use arguments (like `perl`, `fixed`, and `ignore.case`) to control how the pattern is interpreted. To avoid dependence between arguments, stringr instead uses helper functions (like `fixed()`, `regex()`, and `coll()`). Next we'll walk through each of the functions, noting the similarities and important differences. These examples are adapted from the stringr documentation, and here they are contrasted with the analogous base R operations. # Detect matches ## `str_detect()`: Detect the presence or absence of a pattern in a string Suppose you want to know whether each word in a vector of fruit names contains an "a". ```{r} fruit <- c("apple", "banana", "pear", "pineapple") # base grepl(pattern = "a", x = fruit) # stringr str_detect(fruit, pattern = "a") ``` In base you would use `grepl()` (see the "l" and think logical) while in stringr you use `str_detect()` (see the verb "detect" and think of a yes/no action). ## `str_which()`: Find positions matching a pattern Now you want to identify the positions of the words in a vector of fruit names that contain an "a". ```{r} # base grep(pattern = "a", x = fruit) # stringr str_which(fruit, pattern = "a") ``` In base you would use `grep()` while in stringr you use `str_which()` (by analogy to `which()`). ## `str_count()`: Count the number of matches in a string How many "a"s are in each fruit?
```{r} # base loc <- gregexpr(pattern = "a", text = fruit, fixed = TRUE) sapply(loc, function(x) length(attr(x, "match.length"))) # stringr str_count(fruit, pattern = "a") ``` This information can be gleaned from `gregexpr()` in base, but you need to look at the `match.length` attribute, since no match is signalled by a length-1 vector containing `-1`. ## `str_locate()`: Locate the position of patterns in a string Within each fruit, where does the first "p" occur? Where are all of the "p"s? ```{r} fruit3 <- c("papaya", "lime", "apple") # base str(gregexpr(pattern = "p", text = fruit3)) # stringr str_locate(fruit3, pattern = "p") str_locate_all(fruit3, pattern = "p") ``` # Subset strings ## `str_sub()`: Extract and replace substrings from a character vector What if we want to grab part of a string? ```{r} hw <- "Hadley Wickham" # base substr(hw, start = 1, stop = 6) substring(hw, first = 1) # stringr str_sub(hw, start = 1, end = 6) str_sub(hw, start = 1) str_sub(hw, end = 6) ``` In base you could use `substr()` or `substring()`. The former requires both a start and stop of the substring while the latter assumes the stop will be the end of the string. The stringr version, `str_sub()`, has the same functionality, but also gives a default start value (the beginning of the string). Both the base and stringr functions have the same order of expected inputs. In stringr you can use negative numbers to index from the right-hand side of the string: -1 is the last letter, -2 is the second to last, and so on. ```{r} str_sub(hw, start = 1, end = -1) str_sub(hw, start = -5, end = -2) ``` Both the base R and stringr functions are vectorized over their parameters. This means you can either choose the same subset across multiple strings or specify different subsets for different strings.
```{r} al <- "Ada Lovelace" # base substr(c(hw,al), start = 1, stop = 6) substr(c(hw,al), start = c(1,1), stop = c(6,7)) # stringr str_sub(c(hw,al), start = 1, end = -1) str_sub(c(hw,al), start = c(1,1), end = c(-1,-2)) ``` stringr will automatically recycle the first argument to the same length as `start` and `stop`: ```{r} str_sub(hw, start = 1:5) ``` Whereas the base equivalent silently uses just the first value: ```{r} substr(hw, start = 1:5, stop = 15) ``` ## `str_sub() <- `: Subset assignment `substr()` behaves in a surprising way when you replace a substring with a different number of characters: ```{r} # base x <- "ABCDEF" substr(x, 1, 3) <- "x" x ``` `str_sub()` does what you would expect: ```{r} # stringr x <- "ABCDEF" str_sub(x, 1, 3) <- "x" x ``` ## `str_subset()`: Keep strings matching a pattern, or find positions We may want to retrieve strings that contain a pattern of interest: ```{r} # base grep(pattern = "g", x = fruit, value = TRUE) # stringr str_subset(fruit, pattern = "g") ``` ## `str_extract()`: Extract matching patterns from a string We may want to pick out certain patterns from a string, for example, the digits in a shopping list: ```{r} shopping_list <- c("apples x4", "bag of flour", "10", "milk x2") # base matches <- regexpr(pattern = "\\d+", text = shopping_list) # digits regmatches(shopping_list, m = matches) matches <- gregexpr(pattern = "[a-z]+", text = shopping_list) # words regmatches(shopping_list, m = matches) # stringr str_extract(shopping_list, pattern = "\\d+") str_extract_all(shopping_list, "[a-z]+") ``` Base R requires the combination of `regexpr()` with `regmatches()`; but note that the strings without matches are dropped from the output. stringr provides `str_extract()` and `str_extract_all()`, and the output is always the same length as the input. ## `str_match()`: Extract matched groups from a string We may also want to extract groups from a string. 
Here I'm going to use the scenario from Section 14.4.3 in [R for Data Science](https://r4ds.had.co.nz/strings.html). ```{r} head(sentences) noun <- "([Aa]|[Tt]he) ([^ ]+)" # base matches <- regexec(pattern = noun, text = head(sentences)) do.call("rbind", regmatches(x = head(sentences), m = matches)) # stringr str_match(head(sentences), pattern = noun) ``` As with extracting the full match, base R requires the combination of two functions, and inputs with no matches are dropped from the output. # Manage lengths ## `str_length()`: The length of a string To determine the length of a string, base R uses `nchar()` (not to be confused with `length()`, which gives the length of vectors, etc.) while stringr uses `str_length()`. ```{r} # base nchar(letters) # stringr str_length(letters) ``` There are some subtle differences between base and stringr here. `nchar()` requires a character vector, so it will return an error if used on a factor. `str_length()` can handle a factor input. ```{r} #| error: true # base nchar(factor("abc")) ``` ```{r} # stringr str_length(factor("abc")) ``` Note that "characters" is a poorly defined concept, and technically both `nchar()` and `str_length()` return the number of code points. This is usually the same as what you'd consider to be a character, but not always: ```{r} x <- c("\u00fc", "u\u0308") x nchar(x) str_length(x) ``` ## `str_pad()`: Pad a string To pad a string to a certain width, use stringr's `str_pad()`. In base R you could use `sprintf()`, but unlike `str_pad()`, `sprintf()` has many other functionalities. ```{r} # base sprintf("%30s", "hadley") sprintf("%-30s", "hadley") # "both" is not as straightforward # stringr rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) ``` ## `str_trunc()`: Truncate a character string The stringr package provides an easy way to truncate a character string: `str_trunc()`. Base R has no function to do this directly.
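Although base R has no direct equivalent, a rough approximation can be sketched with `nchar()`, `substr()`, and `paste0()`. This is a minimal illustration only; the helper name `trunc_right` is made up here, and it handles just the `side = "right"` case:

```{r}
# A minimal base R sketch of right-truncation,
# assuming width >= nchar(ellipsis)
trunc_right <- function(x, width, ellipsis = "...") {
  too_long <- !is.na(x) & nchar(x) > width
  x[too_long] <- paste0(
    substr(x[too_long], 1, width - nchar(ellipsis)),
    ellipsis
  )
  x
}
trunc_right(c("This string is moderately long", "short"), 20)
```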
```{r} x <- "This string is moderately long" # stringr rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) ``` ## `str_trim()`: Trim whitespace from a string Similarly, stringr provides `str_trim()` to trim whitespace from a string. This is analogous to base R's `trimws()` added in R 3.3.0. ```{r} # base trimws(" String with trailing and leading white space\t") trimws("\n\nString with trailing and leading white space\n\n") # stringr str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") ``` The stringr function `str_squish()` allows for extra whitespace within a string to be trimmed (in contrast to `str_trim()` which removes whitespace at the beginning and/or end of string). In base R, one might take advantage of `gsub()` to accomplish the same effect. ```{r} # stringr str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") ``` ## `str_wrap()`: Wrap strings into nicely formatted paragraphs `strwrap()` and `str_wrap()` use different algorithms. `str_wrap()` uses the famous [Knuth-Plass algorithm](http://litherum.blogspot.com/2015/07/knuth-plass-line-breaking-algorithm.html). ```{r} gettysburg <- "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal." # base cat(strwrap(gettysburg, width = 60), sep = "\n") # stringr cat(str_wrap(gettysburg, width = 60), "\n") ``` Note that `strwrap()` returns a character vector with one element for each line; `str_wrap()` returns a single string containing line breaks. # Mutate strings ## `str_replace()`: Replace matched patterns in a string To replace certain patterns within a string, stringr provides the functions `str_replace()` and `str_replace_all()`. The base R equivalents are `sub()` and `gsub()`. 
Note the difference in default input order again. ```{r} fruits <- c("apple", "banana", "pear", "pineapple") # base sub("[aeiou]", "-", fruits) gsub("[aeiou]", "-", fruits) # stringr str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") ``` ## case: Convert case of a string Both stringr and base R have functions to convert to upper and lower case. Title case is also provided in stringr. ```{r} dog <- "The quick brown dog" # base toupper(dog) tolower(dog) tools::toTitleCase(dog) # stringr str_to_upper(dog) str_to_lower(dog) str_to_title(dog) ``` In stringr we can control the locale, while in base R locale distinctions are controlled with global variables. Therefore, the output of your base R code may vary across different computers with different global settings. ```{r} # stringr str_to_upper("i") # English str_to_upper("i", locale = "tr") # Turkish ``` # Join and split ## `str_flatten()`: Flatten a string If we want to take elements of a string vector and collapse them to a single string we can use the `collapse` argument in `paste()` or use stringr's `str_flatten()`. ```{r} # base paste0(letters, collapse = "-") # stringr str_flatten(letters, collapse = "-") ``` The advantage of `str_flatten()` is that it always returns a single string (a character vector of length 1); to predict the return length of `paste()` you must carefully read all its arguments. ## `str_dup()`: Duplicate strings within a character vector To duplicate strings within a character vector use `strrep()` (in R 3.3.0 or greater) or `str_dup()`: ```{r} #| eval: !expr getRversion() >= "3.3.0" fruit <- c("apple", "pear", "banana") # base strrep(fruit, 2) strrep(fruit, 1:3) # stringr str_dup(fruit, 2) str_dup(fruit, 1:3) ``` ## `str_split()`: Split up a string into pieces To split a string into pieces with breaks based on a particular pattern match, stringr uses `str_split()` and base R uses `strsplit()`. Unlike most other base functions, `strsplit()` starts with the character vector to modify.
```{r} fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) # base strsplit(fruits, " and ") # stringr str_split(fruits, " and ") ``` The stringr package's `str_split()` allows for more control over the split, including restricting the number of possible matches. ```{r} # stringr str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) ``` ## `str_glue()`: Interpolate strings It's often useful to interpolate varying values into a fixed string. In base R, you can use `sprintf()` for this purpose; stringr provides a wrapper for the more general purpose [glue](https://glue.tidyverse.org) package. ```{r} name <- "Fred" age <- 50 anniversary <- as.Date("1991-10-12") # base sprintf( "My name is %s my age next year is %s and my anniversary is %s.", name, age + 1, format(anniversary, "%A, %B %d, %Y") ) # stringr str_glue( "My name is {name}, ", "my age next year is {age + 1}, ", "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." ) ``` # Order strings ## `str_order()`: Order or sort a character vector Both base R and stringr have separate functions to order and sort strings. ```{r} # base order(letters) sort(letters) # stringr str_order(letters) str_sort(letters) ``` Some options in `str_order()` and `str_sort()` don't have analogous base R options. For example, the stringr functions have a `locale` argument to control how to order or sort. In base R the locale is a global setting, so the outputs of `sort()` and `order()` may differ across different computers. For example, in the Norwegian alphabet, å comes after z: ```{r} x <- c("å", "a", "z") str_sort(x) str_sort(x, locale = "no") ``` The stringr functions also have a `numeric` argument to sort digits numerically instead of treating them as strings. 
```{r} # stringr x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) ``` ================================================ FILE: vignettes/locale-sensitive.Rmd ================================================ --- title: "Locale sensitive functions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Locale sensitive functions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| include: FALSE knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) ``` A locale is a set of parameters that define a user's language, region, and cultural preferences. It determines language-specific rules for text processing, including how to: - Convert between uppercase and lowercase letters - Sort text alphabetically - Format dates, numbers, and currency - Handle character encoding and display In stringr, you can control the locale using the `locale` argument, which takes language codes like "en" (English), "tr" (Turkish), or "es_MX" (Mexican Spanish). In general, a locale is a lower-case language abbreviation, optionally followed by an underscore (_) and an upper-case region identifier. You can see which locales are supported in stringr by running `stringi::stri_locale_list()`. This vignette describes locale-sensitive stringr functions, i.e. functions with a `locale` argument. These functions fall into two broad categories: 1. Case conversion 2. Sorting and ordering ## Case conversion `str_to_lower()`, `str_to_upper()`, `str_to_title()`, and `str_to_sentence()` all change the case of their inputs. But while most languages that use the Latin alphabet (like English) have upper and lower case, the rules for converting between the two aren't always the same. For example, Turkish has two forms of the letter "I": as well as "i" and "I", Turkish also has "ı", the dotless lowercase i, and "İ" is the dotted uppercase I. 
This means the rules for converting i to upper case and I to lower case are different from English: ```{r} # English str_to_upper("i") str_to_lower("I") # Turkish str_to_upper("i", locale = "tr") str_to_lower("I", locale = "tr") ``` Another example is Dutch, where "ij" is a digraph treated as a single letter. This means that `str_to_sentence()` will incorrectly capitalize "ij" at the start of a sentence unless you use a Dutch locale: ```{r} #| warning: false dutch_sentence <- "ijsland is een prachtig land in Noord-Europa." # Incorrect str_to_sentence(dutch_sentence) # Correct str_to_sentence(dutch_sentence, locale = "nl") ``` Case conversion also comes up in another situation: case-insensitive comparison. This is relevant in two contexts. First, `str_equal()` and `str_unique()` can optionally ignore case, so it's important to also supply a locale when working with non-English text. For example, imagine we're searching for a Turkish name, ignoring case: ```{r} turkish_names <- c("İpek", "Işık", "İbrahim") search_name <- "ipek" # incorrect str_equal(turkish_names, search_name, ignore_case = TRUE) # correct str_equal(turkish_names, search_name, ignore_case = TRUE, locale = "tr") ``` Second, case conversion also comes up in pattern matching functions like `str_detect()`. You might be accustomed to using `ignore_case = TRUE` with `regex()` or `fixed()`, but if you want to use locale-sensitive comparison you instead need to use `coll()`: ```{r} # incorrect str_detect(turkish_names, fixed(search_name, ignore_case = TRUE)) # correct str_detect(turkish_names, coll(search_name, ignore_case = TRUE, locale = "tr")) ``` ## Sorting and ordering `str_sort()`, `str_order()`, and `str_rank()` all rely on the alphabetical ordering of letters. But not every language uses the same ordering as English. For example, Lithuanian places 'y' between 'i' and 'k', and Czech treats "ch" as a single compound letter that sorts after all other words beginning with 'h'.
This means that to correctly sort words in these languages, you must provide the appropriate locale: ```{r} czech_words <- c("had", "chata", "hrad", "chůze") lithuanian_words <- c("ąžuolas", "ėglė", "šuo", "yra", "žuvis") # incorrect str_sort(czech_words) str_sort(lithuanian_words) # correct str_sort(czech_words, locale = "cs") str_sort(lithuanian_words, locale = "lt") ``` ================================================ FILE: vignettes/regular-expressions.Rmd ================================================ --- title: "Regular expressions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Regular expressions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| label = "setup", #| include = FALSE knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) ``` Regular expressions are a concise and flexible tool for describing patterns in strings. This vignette describes the key features of stringr's regular expressions, as implemented by [stringi](https://github.com/gagolews/stringi). It is not a tutorial, so if you're unfamiliar with regular expressions, I'd recommend starting at . If you want to master the details, I'd recommend reading the classic [_Mastering Regular Expressions_](https://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124) by Jeffrey E. F. Friedl. Regular expressions are the default pattern engine in stringr. That means when you use a pattern matching function with a bare string, it's equivalent to wrapping it in a call to `regex()`: ```{r} #| eval = FALSE # The regular call: str_extract(fruit, "nana") # Is shorthand for str_extract(fruit, regex("nana")) ``` You will need to use `regex()` explicitly if you want to override the default options, as you'll see in examples below.
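For instance, `regex()` also takes a `comments` argument; with `comments = TRUE`, whitespace and `#` comments inside the pattern are ignored, which lets you document a complex regular expression inline. A small sketch (the phone-number pattern is illustrative, not from this vignette):

```{r}
# With comments = TRUE, whitespace and # comments in the pattern are ignored
phone <- regex("
  \\(?     # optional opening parens
  (\\d{3}) # area code
  [)\\- ]? # optional closing parens, dash, or space
  (\\d{3}) # another three digits
  [ -]?    # optional space or dash
  (\\d{4}) # four more digits
", comments = TRUE)

str_match("514-791-8141", phone)
```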
## Basic matches The simplest patterns match exact strings: ```{r} x <- c("apple", "banana", "pear") str_extract(x, "an") ``` You can perform a case-insensitive match using `ignore_case = TRUE`: ```{r} bananas <- c("banana", "Banana", "BANANA") str_detect(bananas, "banana") str_detect(bananas, regex("banana", ignore_case = TRUE)) ``` The next step up in complexity is `.`, which matches any character except a newline: ```{r} str_extract(x, ".a.") ``` You can allow `.` to match everything, including `\n`, by setting `dotall = TRUE`: ```{r} str_detect("\nX\n", ".X.") str_detect("\nX\n", regex(".X.", dotall = TRUE)) ``` ## Escaping If "`.`" matches any character, how do you match a literal "`.`"? You need to use an "escape" to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, `\`, to escape special behaviour. So to match an `.`, you need the regexp `\.`. Unfortunately this creates a problem. We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings. So to create the regular expression `\.` we need the string `"\\."`. ```{r} # To create the regular expression, we need \\ dot <- "\\." # But the expression itself only contains one: writeLines(dot) # And this tells R to look for an explicit . str_extract(c("abc", "a.c", "bef"), "a\\.c") ``` If `\` is used as an escape character in regular expressions, how do you match a literal `\`? Well you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write `"\\\\"` --- you need four backslashes to match one! ```{r} x <- "a\\b" writeLines(x) str_extract(x, "\\\\") ``` In this vignette, I use `\.` to denote the regular expression, and `"\\."` to denote the string that represents the regular expression. 
An alternative quoting mechanism is `\Q...\E`: all the characters in `...` are treated as exact matches. This is useful if you want to exactly match user input as part of a regular expression. ```{r} x <- c("a.b.c.d", "aeb") starts_with <- "a.b" str_detect(x, paste0("^", starts_with)) str_detect(x, paste0("^\\Q", starts_with, "\\E")) ``` ## Special characters Escapes also allow you to specify individual characters that are otherwise hard to type. You can specify individual unicode characters in five ways, either as a variable number of hex digits (four is most common), or by name: * `\xhh`: 2 hex digits. * `\x{hhhh}`: 1-6 hex digits. * `\uhhhh`: 4 hex digits. * `\Uhhhhhhhh`: 8 hex digits. * `\N{name}`, e.g. `\N{grinning face}` matches the basic smiling emoji. Similarly, you can specify many common control characters: * `\a`: bell. * `\cX`: match a control-X character. * `\e`: escape (`\u001B`). * `\f`: form feed (`\u000C`). * `\n`: line feed (`\u000A`). * `\r`: carriage return (`\u000D`). * `\t`: horizontal tabulation (`\u0009`). * `\0ooo` match an octal character. 'ooo' is from one to three octal digits, from 000 to 0377. The leading zero is required. (Many of these are only of historical interest and are only included here for the sake of completeness.) ## Matching multiple characters There are a number of patterns that match more than one character. You've already seen `.`, which matches any character (except a newline). A closely related operator is `\X`, which matches a __grapheme cluster__, a set of individual elements that form a single symbol. For example, one way of representing "á" is as the letter "a" plus an accent: `.` will match the component "a", while `\X` will match the complete symbol: ```{r} x <- "a\u0301" str_extract(x, ".") str_extract(x, "\\X") ``` There are five other escaped pairs that match narrower classes of characters: * `\d`: matches any digit. The complement, `\D`, matches any character that is not a decimal digit. 
```{r} str_extract_all("1 + 2 = 3", "\\d+")[[1]] ``` Technically, `\d` includes any character in the Unicode Category of Nd ("Number, Decimal Digit"), which also includes numeric symbols from other languages: ```{r} # Some Khmer numerals str_detect("១២៣", "\\d") ``` * `\s`: matches any whitespace. This includes tabs, newlines, form feeds, and any character in the Unicode Z Category (which includes a variety of space characters and other separators). The complement, `\S`, matches any non-whitespace character. ```{r} (text <- "Some \t badly\n\t\tspaced \f text") str_replace_all(text, "\\s+", " ") ``` * `\p{property name}` matches any character with a specific Unicode property, like `\p{Uppercase}` or `\p{Diacritic}`. The complement, `\P{property name}`, matches all characters without the property. A complete list of unicode properties can be found at . ```{r} (text <- c('"Double quotes"', "«Guillemet»", "“Fancy quotes”")) str_replace_all(text, "\\p{quotation mark}", "'") ``` * `\w` matches any "word" character, which includes alphabetic characters, marks and decimal numbers. The complement, `\W`, matches any non-word character. ```{r} str_extract_all("Don't eat that!", "\\w+")[[1]] str_split("Don't eat that!", "\\W")[[1]] ``` Technically, `\w` also matches connector punctuation, `\u200c` (zero width connector), and `\u200d` (zero width joiner), but these are rarely seen in the wild. * `\b` matches word boundaries, the transition between word and non-word characters. `\B` matches the opposite: a position where both sides are word characters or both are non-word characters. ```{r} str_replace_all("The quick brown fox", "\\b", "_") str_replace_all("The quick brown fox", "\\B", "_") ``` You can also create your own __character classes__ using `[]`: * `[abc]`: matches a, b, or c. * `[a-z]`: matches every character between a and z (in Unicode code point order). * `[^abc]`: matches anything except a, b, or c. * `[\^\-]`: matches `^` or `-`.
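A quick sketch of these custom character classes in action:

```{r}
x <- c("abc", "a-c", "a^c")
str_extract(x, "[abc]+")   # one or more of a, b, or c
str_extract(x, "[^abc]")   # first character that is not a, b, or c
str_extract(x, "[\\^\\-]") # a literal ^ or -
```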
There are a number of pre-built classes that you can use inside `[]`: * `[:punct:]`: punctuation. * `[:alpha:]`: letters. * `[:lower:]`: lowercase letters. * `[:upper:]`: uppercase letters. * `[:digit:]`: digits. * `[:xdigit:]`: hex digits. * `[:alnum:]`: letters and numbers. * `[:cntrl:]`: control characters. * `[:graph:]`: letters, numbers, and punctuation. * `[:print:]`: letters, numbers, punctuation, and whitespace. * `[:space:]`: space characters (basically equivalent to `\s`). * `[:blank:]`: space and tab. These all go inside the `[]` for character classes, i.e. `[[:digit:]AX]` matches all digits, A, and X. You can also use Unicode properties, like `[\p{Letter}]`, and various set operations, like `[\p{Letter}--\p{script=latin}]`. See `?"stringi-search-charclass"` for details. ## Alternation `|` is the __alternation__ operator, which will pick between one or more possible matches. For example, `abc|def` will match `abc` or `def`: ```{r} str_detect(c("abc", "def", "ghi"), "abc|def") ``` Note that the precedence for `|` is low: `abc|def` is equivalent to `(abc)|(def)`, not `ab(c|d)ef`. ## Grouping You can use parentheses to override the default precedence rules: ```{r} str_extract(c("grey", "gray"), "gre|ay") str_extract(c("grey", "gray"), "gr(e|a)y") ``` Parentheses also define "groups" that you can refer to with __backreferences__, like `\1`, `\2`, etc., and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters: ```{r} pattern <- "(..)\\1" fruit %>% str_subset(pattern) fruit %>% str_subset(pattern) %>% str_match(pattern) ``` You can use `(?:...)`, the non-grouping parentheses, to control precedence but not capture the match in a group. This is slightly more efficient than capturing parentheses.
```{r}
str_match(c("grey", "gray"), "gr(e|a)y")
str_match(c("grey", "gray"), "gr(?:e|a)y")
```

This is most useful for more complex cases where you need to capture
matches and control precedence independently.

You can use `(?<name>...)`, the named capture group, to provide a reference
to the matched text. This is more readable and maintainable, especially
with complex regular expressions, because you can reference the matched
text by name instead of a potentially confusing numerical index.
*Note: `name` should not include an underscore, because underscores are not
supported in group names.*

```{r}
date_string <- "Today's date is 2025-09-19."
pattern <- "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
str_match(date_string, pattern)
```

You can then use `\k<name>` to backreference a previously captured named
group. It is an alternative to the standard numbered backreferences like
`\1` or `\2`.

```{r}
text <- "This is is a test test with duplicates duplicates"
pattern <- "(?<word>\\b\\w+\\b)\\s+\\k<word>"
str_subset(text, pattern)
str_match_all(text, pattern)
```

## Anchors

By default, regular expressions will match any part of a string. It's often
useful to __anchor__ the regular expression so that it matches from the
start or end of the string:

* `^` matches the start of the string.
* `$` matches the end of the string.

```{r}
x <- c("apple", "banana", "pear")
str_extract(x, "^a")
str_extract(x, "a$")
```

To match a literal "$" or "^", you need to escape them: `\$` and `\^`.

For multiline strings, you can use `regex(multiline = TRUE)`. This changes
the behaviour of `^` and `$`, and introduces three new operators:

* `^` now matches the start of each line.
* `$` now matches the end of each line.
* `\A` matches the start of the input.
* `\z` matches the end of the input.
* `\Z` matches the end of the input, but before the final line terminator,
  if it exists.
```{r}
x <- "Line 1\nLine 2\nLine 3\n"
str_extract_all(x, "^Line..")[[1]]
str_extract_all(x, regex("^Line..", multiline = TRUE))[[1]]
str_extract_all(x, regex("\\ALine..", multiline = TRUE))[[1]]
```

## Repetition

You can control how many times a pattern matches with the repetition
operators:

* `?`: 0 or 1.
* `+`: 1 or more.
* `*`: 0 or more.

```{r}
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"
str_extract(x, "CC?")
str_extract(x, "CC+")
str_extract(x, "C[LX]+")
```

Note that the precedence of these operators is high, so you can write
`colou?r` to match either American or British spellings. That means most
uses will need parentheses, like `bana(na)+`.

You can also specify the number of matches precisely:

* `{n}`: exactly n.
* `{n,}`: n or more.
* `{n,m}`: between n and m.

```{r}
str_extract(x, "C{2}")
str_extract(x, "C{2,}")
str_extract(x, "C{2,3}")
```

By default these matches are "greedy": they will match the longest string
possible. You can make them "lazy", matching the shortest string possible,
by putting a `?` after them:

* `??`: 0 or 1, prefer 0.
* `+?`: 1 or more, match as few times as possible.
* `*?`: 0 or more, match as few times as possible.
* `{n,}?`: n or more, match as few times as possible.
* `{n,m}?`: between n and m, match as few times as possible, but at least n.

```{r}
str_extract(x, c("C{2,3}", "C{2,3}?"))
str_extract(x, c("C[LX]+", "C[LX]+?"))
```

You can also make the matches possessive by putting a `+` after them, which
means that if later parts of the match fail, the repetition will not be
re-tried with a smaller number of characters. This is an advanced feature
used to improve performance in worst-case scenarios (called "catastrophic
backtracking").

* `?+`: 0 or 1, possessive.
* `++`: 1 or more, possessive.
* `*+`: 0 or more, possessive.
* `{n}+`: exactly n, possessive.
* `{n,}+`: n or more, possessive.
* `{n,m}+`: between n and m, possessive.

A related concept is the __atomic-match__ parenthesis, `(?>...)`.
If a later match fails and the engine needs to back-track, an atomic match
is kept as is: it succeeds or fails as a whole. Compare the following two
regular expressions:

```{r}
str_detect("ABC", "(?>A|.B)C")
str_detect("ABC", "(?:A|.B)C")
```

The atomic match fails because the group matches A, but the next character
is a B rather than the required C, and the atomic group is never re-tried.
The regular match succeeds because after the C fails to match, the engine
back-tracks into the group and tries the `.B` alternative instead.

## Look arounds

These assertions look ahead or behind the current match without "consuming"
any characters (i.e. changing the input position).

* `(?=...)`: positive look-ahead assertion. Matches if `...` matches at the
  current input.
* `(?!...)`: negative look-ahead assertion. Matches if `...` __does not__
  match at the current input.
* `(?<=...)`: positive look-behind assertion. Matches if `...` matches text
  preceding the current position, with the last character of the match
  being the character just before the current position. Length must be
  bounded (i.e. no `*` or `+`).
* `(?<!...)`: negative look-behind assertion. Matches if `...` __does not__
  match text preceding the current position. Length must be bounded (i.e.
  no `*` or `+`).

---
title: "Introduction to stringr"
vignette: >
  %\VignetteIndexEntry{Introduction to stringr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r}
#| include = FALSE
library(stringr)
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE
)
```

There are four main families of functions in stringr:

1.  Character manipulation: these functions allow you to manipulate
    individual characters within the strings in character vectors.

1.  Whitespace tools to add, remove, and manipulate whitespace.

1.  Locale sensitive operations whose results vary from locale to locale.

1.  Pattern matching functions. These recognise four engines of pattern
    description. The most common is regular expressions, but there are
    three other tools.

## Getting and setting individual characters

You can get the length of the string with `str_length()`:

```{r}
str_length("abc")
```

This is now equivalent to the base R function `nchar()`.
Previously, it was needed to work around issues with `nchar()`, such as the
fact that it returned 2 for `nchar(NA)`. This has been fixed as of R 3.3.0,
so it is no longer so important.

You can access individual characters using `str_sub()`. It takes three
arguments: a character vector, a `start` position, and an `end` position.
Either position can be a positive integer, which counts from the left, or a
negative integer, which counts from the right. The positions are inclusive,
and if longer than the string, will be silently truncated.

```{r}
x <- c("abcdef", "ghifjk")

# The 3rd letter
str_sub(x, 3, 3)

# The 2nd to 2nd-to-last character
str_sub(x, 2, -2)
```

You can also use `str_sub()` to modify strings:

```{r}
str_sub(x, 3, 3) <- "X"
x
```

To duplicate individual strings, you can use `str_dup()`:

```{r}
str_dup(x, c(2, 3))
```

## Whitespace

Three functions add, remove, or modify whitespace:

1.  `str_pad()` pads a string to a fixed length by adding extra whitespace
    on the left, right, or both sides.

    ```{r}
    x <- c("abc", "defghi")
    str_pad(x, 10) # default pads on left
    str_pad(x, 10, "both")
    ```

    (You can pad with other characters by using the `pad` argument.)

    `str_pad()` will never make a string shorter:

    ```{r}
    str_pad(x, 4)
    ```

    So if you want to ensure that all strings are the same length (often
    useful for print methods), combine `str_pad()` and `str_trunc()`:

    ```{r}
    x <- c("Short", "This is a long string")

    x %>%
      str_trunc(10) %>%
      str_pad(10, "right")
    ```

1.  The opposite of `str_pad()` is `str_trim()`, which removes leading and
    trailing whitespace:

    ```{r}
    x <- c(" a ", "b ", " c")
    str_trim(x)
    str_trim(x, "left")
    ```

1.  You can use `str_wrap()` to modify existing whitespace in order to wrap
    a paragraph of text, such that the length of each line is as similar as
    possible.

    ```{r}
    jabberwocky <- str_c(
      "`Twas brillig, and the slithy toves ",
      "did gyre and gimble in the wabe: ",
      "All mimsy were the borogoves, ",
      "and the mome raths outgrabe. "
    )
    cat(str_wrap(jabberwocky, width = 40))
    ```

## Locale sensitive

A handful of stringr functions are locale-sensitive: they will perform
differently in different regions of the world. These functions are case
transformation functions:

```{r}
x <- "I like horses."
str_to_upper(x)
str_to_title(x)

str_to_lower(x)
# Turkish has two sorts of i: with and without the dot
str_to_lower(x, "tr")
```

String ordering and sorting:

```{r}
x <- c("y", "i", "k")
str_order(x)
str_sort(x)

# In Lithuanian, y comes between i and k
str_sort(x, locale = "lt")
```

The locale always defaults to English to ensure that the default behaviour
is identical across systems. Locales always include a two letter ISO-639-1
language code (like "en" for English or "zh" for Chinese), and optionally
an ISO-3166 country code (like "en_UK" vs "en_US"). You can see a complete
list of available locales by running `stringi::stri_locale_list()`.

## Pattern matching

The vast majority of stringr functions work with patterns. These are
parameterised by the task they perform and the types of patterns they
match.

### Tasks

Each pattern matching function has the same first two arguments: a
character vector of `string`s to process and a single `pattern` to match.
stringr provides pattern matching functions to **detect**, **locate**,
**extract**, **match**, **replace**, and **split** strings. I'll illustrate
how they work with some strings and a regular expression designed to match
(US) phone numbers:

```{r}
strings <- c(
  "apple",
  "219 733 8965",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
```

-   `str_detect()` detects the presence or absence of a pattern and
    returns a logical vector (similar to `grepl()`). `str_subset()`
    returns the elements of a character vector that match a regular
    expression (similar to `grep()` with `value = TRUE`).

    ```{r}
    # Which strings contain phone numbers?
    str_detect(strings, phone)
    str_subset(strings, phone)
    ```

-   `str_count()` counts the number of matches:

    ```{r}
    # How many phone numbers in each string?
    str_count(strings, phone)
    ```

-   `str_locate()` locates the **first** position of a pattern and returns
    a numeric matrix with columns start and end. `str_locate_all()`
    locates all matches, returning a list of numeric matrices. Similar to
    `regexpr()` and `gregexpr()`.

    ```{r}
    # Where in the string is the phone number located?
    (loc <- str_locate(strings, phone))
    str_locate_all(strings, phone)
    ```

-   `str_extract()` extracts text corresponding to the **first** match,
    returning a character vector. `str_extract_all()` extracts all matches
    and returns a list of character vectors.

    ```{r}
    # What are the phone numbers?
    str_extract(strings, phone)
    str_extract_all(strings, phone)
    str_extract_all(strings, phone, simplify = TRUE)
    ```

-   `str_match()` extracts capture groups formed by `()` from the
    **first** match. It returns a character matrix with one column for the
    complete match and one column for each group. `str_match_all()`
    extracts capture groups from all matches and returns a list of
    character matrices. Similar to `regmatches()`.

    ```{r}
    # Pull out the three components of the match
    str_match(strings, phone)
    str_match_all(strings, phone)
    ```

-   `str_replace()` replaces the **first** matched pattern and returns a
    character vector. `str_replace_all()` replaces all matches. Similar to
    `sub()` and `gsub()`.

    ```{r}
    str_replace(strings, phone, "XXX-XXX-XXXX")
    str_replace_all(strings, phone, "XXX-XXX-XXXX")
    ```

-   `str_split_fixed()` splits a string into a **fixed** number of pieces
    based on a pattern and returns a character matrix. `str_split()`
    splits a string into a **variable** number of pieces and returns a
    list of character vectors.
```{r}
str_split("a-b-c", "-")
str_split_fixed("a-b-c", "-", n = 2)
```

### Engines

There are four main engines that stringr can use to describe patterns:

* Regular expressions, the default, as shown above, and described in
  `vignette("regular-expressions")`.
* Fixed bytewise matching, with `fixed()`.
* Locale-sensitive character matching, with `coll()`.
* Text boundary analysis, with `boundary()`.

#### Fixed matches

`fixed(x)` only matches the exact sequence of bytes specified by `x`. This
is a very limited "pattern", but the restriction can make matching much
faster. Beware using `fixed()` with non-English data. It is problematic
because there are often multiple ways of representing the same character.
For example, there are two ways to define "á": either as a single
character or as an "a" plus an accent:

```{r}
a1 <- "\u00e1"
a2 <- "a\u0301"
c(a1, a2)
a1 == a2
```

They render identically, but because they're defined differently, `fixed()`
doesn't find a match. Instead, you can use `coll()`, explained below, to
respect human character comparison rules:

```{r}
str_detect(a1, fixed(a2))
str_detect(a1, coll(a2))
```

#### Collation search

`coll(x)` looks for a match to `x` using human-language **coll**ation
rules, and is particularly important if you want to do case insensitive
matching. Collation rules differ around the world, so you'll also need to
supply a `locale` parameter.

```{r}
i <- c("I", "İ", "i", "ı")
i

str_subset(i, coll("i", ignore_case = TRUE))
str_subset(i, coll("i", ignore_case = TRUE, locale = "tr"))
```

The downside of `coll()` is speed. Because the rules for recognising which
characters are the same are complicated, `coll()` is relatively slow
compared to `regex()` and `fixed()`. Note that while both `fixed()` and
`regex()` have `ignore_case` arguments, they perform a much simpler
comparison than `coll()`.

#### Boundary

`boundary()` matches boundaries between characters, lines, sentences or
words.
It's most useful with `str_split()`, but can be used with all pattern
matching functions:

```{r}
x <- "This is a sentence."
str_split(x, boundary("word"))
str_count(x, boundary("word"))
str_extract_all(x, boundary("word"))
```

By convention, `""` is treated as `boundary("character")`:

```{r}
str_split(x, "")
str_count(x, "")
```
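The same engine works at other granularities too; for example, a quick sketch (with a made-up string) of splitting and counting at sentence boundaries:

```{r}
# boundary("sentence") uses locale-aware sentence-break rules
y <- "First sentence. Second sentence! A third?"
str_split(y, boundary("sentence"))
str_count(y, boundary("sentence"))
```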