[
  {
    "path": ".Rbuildignore",
    "content": "^.*\\.Rproj$\n^\\.Rproj\\.user$\n^\\.travis\\.yml$\n^README\\.Rmd$\n^README-.*\\.png$\n^CONDUCT\\.md$\n^CONTRIBUTING.md$\n^codecov\\.yml$\n^wordcountaddin\\.Rcheck$\n^wordcountaddin.*\\.tar\\.gz$\n^wordcountaddin.*\\.tgz$\n.github/\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: bug\nassignees: ''\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n**To Reproduce**\nPlease include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Session Info**\nOutput of `sessionInfo()` on your device so we can see what packages and version numbers you have\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**Please wait for some discussion of your report before making a Pull Request.**\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n**To Reproduce**\n\nPlease include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.\n\nDescribe the steps to reproduce the behavior:\n1. Go to '...'\n2. Click on '....'\n3. Scroll down to '....'\n4. See error\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Session Info**\nOutput of `sessionInfo()` on your device so we can see what packages and version numbers you have\n"
  },
  {
    "path": ".github/workflows/R-CMD-check.yaml",
    "content": "name: R CMD CHECK\n\non: [push]\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n\n      - uses: r-lib/actions/setup-r@v2\n\n      - name: Install system dependencies\n        run: |\n          sudo apt-get update\n          sudo apt-get install -y libcurl4-openssl-dev libfontconfig1-dev libharfbuzz-dev  libfribidi-dev  libtiff-dev libjpeg-dev libwebp-dev pkg-config \n        shell: bash\n\n      - name: Install dependencies\n        run: |\n          install.packages(c(\"pak\", \"devtools\", \"testthat\"))\n          pak::local_install_deps()\n        shell: Rscript {0}\n\n      - name: Run tests\n        run: devtools::test()\n        shell: Rscript {0}\n"
  },
  {
    "path": ".gitignore",
    "content": ".Rproj.user\n.Rhistory\n.RData\nwordcountaddin.Rcheck/\nwordcountaddin*.tar.gz\nwordcountaddin*.tgz\n"
  },
  {
    "path": ".travis.yml",
    "content": "# Sample .travis.yml for R projects\n\nlanguage: r\nwarnings_are_errors: false\nsudo: required\n\nr_github_packages:\n  - jimhester/covr\n\nafter_success:\n  - Rscript -e 'covr::codecov()'\n\n\n"
  },
  {
    "path": "CONDUCT.md",
    "content": "# Contributor Code of Conduct\n\nAs contributors and maintainers of this project, we pledge to respect all people who \ncontribute through reporting issues, posting feature requests, updating documentation,\nsubmitting pull requests or patches, and other activities.\n\nWe are committed to making participation in this project a harassment-free experience for\neveryone, regardless of level of experience, gender, gender identity and expression,\nsexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.\n\nExamples of unacceptable behavior by participants include the use of sexual language or\nimagery, derogatory comments or personal attacks, trolling, public or private harassment,\ninsults, or other unprofessional conduct.\n\nProject maintainers have the right and responsibility to remove, edit, or reject comments,\ncommits, code, wiki edits, issues, and other contributions that are not aligned to this \nCode of Conduct. Project maintainers who do not follow the Code of Conduct may be removed \nfrom the project team.\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be reported by \nopening an issue or contacting one or more of the project maintainers.\n\nThis Code of Conduct is adapted from the Contributor Covenant \n(http:contributor-covenant.org), version 1.0.0, available at \nhttp://contributor-covenant.org/version/1/0/0/\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing Guidelines\n\n## Pull requests\n\nRequirements for making a pull request:\n\n  * Some knowledge of [git]()\n* Some knowledge of [GitHub]()\n\nRead more about pull requests on GitHub at [https://help.github.com/articles/using-pull-requests/](https://help.github.com/articles/using-pull-requests/). If you haven't done this before, Hadley Wickham provides a nice overview of git (<http://r-pkgs.had.co.nz/git.html>), as well as best practices for submitting pull requests (<http://r-pkgs.had.co.nz/git.html#pr-make>).\n\nThen:\n\n* Fork the repo to your GitHub account\n* Clone the version on your account down to your machine from your account, e.g,. `git clone git@github.com:benmarwick/<package name>.git`\n* Make sure to track progress upstream (i.e., on our version of the package at `benmarwick/<package name>`) by doing `git remote add upstream git@github.com:benmarwick/<package name>.git`. Each time you go to make changes on your machine, be sure to pull changes in from upstream (aka the ropensci version) by doing either `git fetch upstream` then merge later or `git pull upstream` to fetch and merge in one step\n* Make your changes (we prefer if you make changes on a new branch)\n* Ideally included in your contributions:\n* Well documented code in roxygen docs\n* If you add new functions or change functionality, add one or more tests.\n* Make sure the package passes `R CMD CHECK` on your machine without errors/warnings\n* Push up to your account\n* Submit a pull request and participate in the discussion.\n\n## Documentation contributions\n\nDocumentation contributions are surely much needed in every project as each could surely use better instructions. If you are editing any files in the repo, follow the above instructions for pull requests to add contributions. However, if you are editing the wiki, then you can just edit the wiki and no need to do git, pull requests, etc.\n\nAll of the function documentation is generated automatically. 
Please do not edit any of the documentation files in man/ or the NAMESPACE. Instead, construct the appropriate roxygen2 documentation in the function files in R/ themselves. The documentation is then generated by running the document() function from the devtools package. Please consult the Advanced R programming guide if this workflow is unfamiliar to you. Note that functions should include examples in the documentation. Please use \\dontrun for examples that take more than a few seconds to execute or require an internet connection.\n\nLikewise, the README.md file in the base directory should not be edited directly. This file is created automatically from code that runs the examples shown, helping to ensure that they are functioning as advertised and consistent with the package README vignette. Instead, edit the README.Rmd source file in the base directory and knit it to rebuild README.md.\n\n## Repository structure\n\nThis repository is structured as a standard R package following the conventions outlined in the Writing R Extensions manual. A few additional files are provided that are not part of the built R package and are listed in .Rbuildignore, such as .travis.yml, which is used for continuous testing and integration.\n\n## Code\n\nAll code for this package is found in R/ (except compiled source code, if used, which is in src/). All functions should be thoroughly documented with roxygen2 notation; see Documentation.\n\nBug reports _must_ have a [reproducible example](http://adv-r.had.co.nz/Reproducibility.html) and include the output of `devtools::session_info()` (instead of `sessionInfo()`). We recommend using Hadley Wickham's style guide when writing code (<http://adv-r.had.co.nz/Style.html>).\n\n## Testing\n\nAny new feature or bug-fix should include a unit-test demonstrating the change. Unit tests follow the testthat framework with files in tests/testthat. Please make sure that the testing suite passes before issuing a pull request. 
This can be done by running check() from the devtools package, which will also check for consistent documentation, etc.\n\nThis package uses Travis CI and a GitHub Actions workflow to run the test suite on each push to GitHub. An icon at the top of the README.md indicates whether or not the tests are currently passing.\n\n## Questions or comments?\n\nDo not hesitate to open an issue in the issues tracker to raise any questions or comments about the package or these guidelines.\n"
  },
  {
    "path": "DESCRIPTION",
    "content": "Package: wordcountaddin\nType: Package\nTitle: Word counts and readability statistics in R markdown documents\nVersion: 0.3.0.9000\nAuthors@R: c(person(\"Ben\", \"Marwick\",\n                  email = \"benmarwick@gmail.com\",\n                  role = c(\"aut\", \"cre\")),\n            person(\"JooYoung\", \"Seo\",\n                  email = \"jooyoung@psu.edu\",\n                  role = \"ctb\", comment = c(ORCID = \"0000-0002-4064-6012\")),\n            person(\"Henrik\", \"Bengtsson\",\n                  email = \"henrik.bengtsson@gmail.com\",\n                  role = \"ctb\"),\n            person(\"Florian S.\", \"Schaffner\",\n                  email = \"florian.schaffner@outlook.com\",\n                  role = \"ctb\"),\n            person(\"Matthew T.\", \"Warkentin\",\n                   email = \"warkentin@lunenfeld.ca\",\n                   role = \"ctb\"),\n            person(\"Luke A.\", \"McGuinness\",\n                  email = \"luke.a.mcguinness@gmail.com\",\n                  role = \"ctb\",\n                  comment = c(ORCID = \"0000-0001-8730-9761\")))\nMaintainer: Ben Marwick <benmarwick@gmail.com>\nDescription: An addin for RStudio that will count the words and characters\n    in a plain text document. It is designed for use with RMarkdown\n    documents and will exclude YAML header content, code chunks and inline\n    code from the counts. It also computes readability statistics so you can\n    get an idea of how easy or difficult your text is to read.\nLicense: MIT + file LICENSE\nLazyData: TRUE\nImports:\n    fs,\n    knitr,\n    koRpus,\n    koRpus.lang.en,\n    miniUI (>= 0.1.1),\n    purrr,\n    rstudioapi (>= 0.5),\n    shiny (>= 0.13),\n    stringi,\n    sylly,\n    sylly.en\nEncoding: UTF-8\nRoxygenNote: 7.1.1\nSuggests:\n    covr,\n    testthat\n"
  },
  {
    "path": "LICENSE",
    "content": "YEAR: 2017\nCOPYRIGHT HOLDER: Ben Marwick\n"
  },
  {
    "path": "NAMESPACE",
    "content": "# Generated by roxygen2: do not edit by hand\n\nexport(readability)\nexport(readability_chr)\nexport(text_stats)\nexport(text_stats_chr)\nexport(text_stats_fn_)\nexport(word_count)\nimport(koRpus)\nimport(purrr)\nimport(stringi)\n"
  },
  {
    "path": "NEWS.md",
    "content": "# wordcountaddin 0.3.0\n\nNEW FEATURES\n\n* Count words from Rmd filename and get scalar as output (#20)\n\nMINOR IMPROVEMENTS\n\n* make the functions more DRY by adding some unexported fns\n* Expanded readme slightly\n* Added more tests\n\n# wordcountaddin 0.2.0\n\nNEW FEATURES\n\n* Count words from Rmd filename without using RStudio (#3)\n* Count words in active Rmd in RStudio without making text selection (#3)\n* Count words in character string from command line (without Rmd or RStudio) (#2)\n\nMINOR IMPROVEMENTS\n\n* Added a `NEWS.md` file to track changes to the package.\n* Expanded readme\n* Added more tests\n\nBUG FIXES\n\n* Fixed inaccurate count when <br> present (#1)\n\nDEPRECATED AND DEFUNCT\n\nNA\n\n# wordcountaddin 0.1.0\n\nInitial release\n\n\n"
  },
  {
    "path": "R/hello.R",
    "content": "#' wordcountaddin\n#'\n#' This packages is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.\n#'\n#' @name wordcountaddin\n#' @docType package\n#' @import purrr stringi koRpus\nNULL\n\n# global things\n\n md_file_ext_regex <- paste(\n    \"\\\\.markdown$\",\n    \"\\\\.mdown$\",\n    \"\\\\.mkdn$\",\n    \"\\\\.md$\",\n    \"\\\\.mkd$\",\n    \"\\\\.mdwn$\",\n    \"\\\\.mdtxt$\",\n    \"\\\\.mdtext$\",\n    \"\\\\.rmd$\",\n    \"\\\\.Rmd$\",\n    \"\\\\.RMD$\",\n    \"\\\\.Rmarkdown$\",\n    \"\\\\.qmd$\",\n  sep = \"|\")\n\n\n#-------------------------------------------------------------------\n# fns for working with selected text in an active Rmd\n\n#' Get text stats for selected text (excluding code chunks and inline code)\n#'\n#' Call this addin to get a word count and some other stats about the text\n#' @param filename Path to the file on which to compute text stats.\n#' Default is the current file (when working in RStudio) or the file being\n#' knit (when compiling with \\code{knitr}).\n#'\n#' @export\n#' @examples\n#' md <- system.file(package = \"wordcountaddin\", \"NEWS.md\")\n#' text_stats(md)\n#' word_count(md)\n#' \\dontrun{\n#' readability(md)\n#' }\ntext_stats <- function(filename = this_filename()) {\n\n  text_to_count_output <- text_to_count(filename)\n\n  text_stats_fn(text_to_count_output)\n}\n\n\n#' @rdname text_stats\n#' @description Get a word count as a single integer\n#' @export\nword_count <- function(filename = this_filename()){\n\n  text_to_count_output <- text_to_count(filename)\n\n  word_count_output <- text_stats_fn_(text_to_count_output)\n\n  word_count_output$n_words_korp\n}\n\n\n\n\n\n\n#' @rdname text_stats\n#' @description Get readability 
stats for selected text (excluding code chunks)\n#' @param quiet Logical. Should task be performed quietly?\n#'\n#' @details Call this addin to get readability stats about the text\n#'\n#' @export\nreadability <- function(filename = this_filename(), quiet = TRUE) {\n\n\n  text_to_count_output <- text_to_count(filename)\n\n  # pass the caller's choice through rather than forcing quiet = TRUE\n  readability_fn(text_to_count_output, quiet = quiet)\n}\n\n#---------------------------------------------------------------\n# directly work on a character string in the console\n\n\n#' @rdname text_stats\n#' @description Get text stats for selected text (excluding code chunks and inline code)\n#'\n#' @details Use this function with a character string as input\n#'\n#' @export\ntext_stats_chr <- function(text) {\n\n  text <- paste(text, collapse=\"\\n\")\n\n  text_stats_fn(text)\n\n}\n\n\n#' @rdname text_stats\n#' @description Get readability stats for selected text (excluding code chunks)\n#'\n#' @details Use this function with a character string as input\n#'\n#' @param text a character string of text, length of one\n#'\n#' @export\nreadability_chr <- function(text, quiet = TRUE) {\n\n  text <- paste(text, collapse = \"\\n\")\n\n  # pass the caller's choice through rather than forcing quiet = TRUE\n  readability_fn(text, quiet = quiet)\n\n}\n#-----------------------------------------------------------\n# helper fns, not exported\n\ntext_to_count <- function(filename){\n  # selected text takes precedence over the filename argument:\n  # if text is selected, it is used. 
Otherwise, the text in filename is used\n  if (rstudioapi::isAvailable()) {\n    context <- rstudioapi::getActiveDocumentContext()\n    selection_text <- unname(unlist(context$selection)[\"text\"])\n    text_is_selected <- nchar(selection_text) > 0\n  } else {\n    # if not running in RStudio, assume no text is selected\n    text_is_selected <- FALSE\n  }\n\n  if (text_is_selected) {\n    text <- selection_text\n  } else {\n    # if no text is selected, read text from \"filename\" as character vector\n    is_extension_invalid <- !grepl(md_file_ext_regex, filename)\n    if (is_extension_invalid) {\n      stop(paste(\"The supplied file has an extension which is not associated with markdown.\",\n                 \"This function only works with markdown or R markdown files.\", sep = \"\\n  \"))\n    }\n    text <- paste(scan(filename, 'character', quiet = TRUE), collapse = \" \")\n  }\n  text\n}\n\nprep_text <- function(text){\n\n  # remove lines starting with :::\n  # we do this before removing line breaks so $ matches end of line\n  text <- gsub(\"(?m)^:::.*$\", \"\", text, perl = TRUE)\n\n  # remove all line breaks, http://stackoverflow.com/a/21781150/1036500\n  text <- gsub(\"[\\r\\n]\", \" \", text)\n\n  # don't include yaml front matter\n  three_dashes <- unlist(gregexpr('---', text))\n  if (three_dashes[1]==1L) {\n    yaml_end <- three_dashes[2] + 2L\n    text <- substr(text, yaml_end + 1L, nchar(text))\n  } else {\n    text\n  }\n\n  # don't include text in code chunks: https://regex101.com/#python\n  text <- gsub(\"```\\\\{.+?\\\\}.+?```\", \"\", text)\n\n  # don't include text in in-line R code\n  text <- gsub(\"`r.+?`\", \"\", text)\n\n  # don't include HTML comments\n  text <- gsub(\"<!--.+?-->\", \"\", text)\n\n  # don't include LaTeX comments\n  # how to do this? 
%%\n\n  # don't include images with captions\n  text <- gsub(\"!\\\\[.+?\\\\]\\\\(.+?\\\\)\\\\{.+?\\\\}\", \"\", text)\n  text <- gsub(\"!\\\\[.+?\\\\]\\\\(.+?\\\\)\", \"\", text)\n\n  # don't include inline markdown URLs\n  text <- gsub(\"\\\\(http.+?\\\\)\", \"\", text)\n\n  # don't include # for headings\n  text <- gsub(\"#*\", \"\", text)\n\n  # don't include opening html tags\n  # (source: https://www.w3schools.com/TAGS/default.ASP)\n  # note: h[1-6] matches the heading tags h1 through h6\n\n  tags <- paste0(\"!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|\",\n                 \"audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|\",\n                 \"canvas|caption|center|cite|code|col|colgroup|data|datalist|\",\n                 \"dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|\",\n                 \"figcaption|figure|font|footer|form|frame|frameset|h[1-6]|\",\n                 \"head|header|hr|html|i|iframe|img|input|ins|kbd|label|legend|\",\n                 \"li|link|main|map|mark|meta|meter|nav|noframes|noscript|\",\n                 \"object|ol|optgroup|option|output|p|param|picture|pre|\",\n                 \"progress|q|rp|rt|ruby|s|samp|script|section|select|small|\",\n                 \"source|span|strike|strong|style|sub|summary|sup|svg|table|\",\n                 \"tbody|td|template|textarea|tfoot|th|thead|time|title|tr|\",\n                 \"track|tt|u|ul|var|video|wbr\")\n\n  text <- gsub(paste0(\"<\\\\s*(\",tags,\")[^>]*>\"),\"\", text)\n\n  # don't include closing html tags\n  text <- gsub(\"</.+?>\", \"\", text)\n\n  # don't include greater/less than signs because they trip up koRpus\n  text <- gsub(\"<|>\", \"\", text)\n\n  # don't include percent signs because they trip up stringi\n  text <- gsub(\"%\", \"\", text)\n\n  # don't include figures and tables inserted using plain LaTeX code\n  text <- gsub(\"\\\\\\\\begin\\\\{figure\\\\}(.*?)\\\\\\\\end\\\\{figure\\\\}\", \"\", text)\n  text <- gsub(\"\\\\\\\\begin\\\\{table\\\\}(.*?)\\\\\\\\end\\\\{table\\\\}\", 
\"\", text)\n\n  # don't count abbreviations as multiple words, but leave\n  # the period at the end in case it's the end of a sentence\n  text <- gsub(\"\\\\.(?=[a-z]+)\", \"\", text, perl = TRUE)\n\n  # don't include LaTeX \\eggs{ham}\n  # how to do? problem with capturing \\x\n\n  if(nchar(text) == 0){\n    stop(\"You have not selected any text. Please select some text with the mouse and try again\")\n  }\n\n  return(text)\n\n}\n\nprep_text_korpus <- function(text){\n  lengths <- unlist(strsplit(text, \" \"))\n  no_long_one <- paste0(ifelse(nchar(lengths) > 30, substr(lengths, 1, 10), lengths), collapse = \" \")\n  tokenize_safe <- purrr::safely(koRpus::tokenize)\n  k1 <- tokenize_safe(no_long_one, lang = 'en', format = 'obj')\n  k1 <- k1$result\n  return(k1)\n}\n\n\n# These functions do the actual work\n\n#' @rdname text_stats\n#' @export\ntext_stats_fn_ <- function(text){\n  # suppress warnings\n  oldw <- getOption(\"warn\")\n  options(warn = -1)\n\n  text <- prep_text(text)\n\n  require(\"koRpus.lang.en\", quietly = TRUE)\n\n  # stringi methods\n  n_char_tot <- sum(stri_stats_latex(text)[c(1,3)])\n  n_words_stri <- unname(stri_stats_latex(text)[4])\n\n  #korpus methods\n  k1 <- prep_text_korpus(text)\n  korpus_stats <- sylly::describe(k1)\n  k_nchr <- korpus_stats$all.chars\n  k_wc <- korpus_stats$words\n  k_sent <- korpus_stats$sentences\n  k_wps <- k_wc / k_sent\n\n  # reading time\n  # https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_comprehension\n  # assume 200 words per min\n  wpm <-  200\n  reading_time_korp <- paste0(round(k_wc / wpm, 1), \" minutes\")\n  reading_time_stri <- paste0(round(n_words_stri / wpm, 1), \" minutes\")\n\n  return(list(\n  # make the names more useful\n  n_char_tot_stri = n_char_tot,\n  n_char_tot_korp = k_nchr,\n  n_words_korp = k_wc,\n  n_words_stri = n_words_stri,\n  n_sentences_korp = k_sent,\n  words_per_sentence_korp = k_wps,\n  reading_time_korp = reading_time_korp,\n  reading_time_stri = reading_time_stri\n  
))\n\n  # resume warnings\n  options(warn = oldw)\n\n}\n\n\n\ntext_stats_fn <- function(text){\n\n  l <- text_stats_fn_(text)\n\n  results_df <- data.frame(Method = c(\"Word count\", \"Character count\", \"Sentence count\", \"Reading time\"),\n                           koRpus  = c(l$n_words_korp, l$n_char_tot_korp, l$n_sentences_korp, l$reading_time_korp),\n                           stringi = c(l$n_words_stri, l$n_char_tot_stri, \"Not available\", l$reading_time_stri)\n                           )\n\n  results_df_tab <- knitr::kable(results_df)\n  return(results_df_tab)\n\n}\n\n\nreadability_fn_ <- function(text, quiet = TRUE){\n\n  text <- prep_text(text)\n\n  oldw <- getOption(\"warn\")\n  options(warn = -1)\n\n  require(\"koRpus.lang.en\", quietly = TRUE)\n\n  # korpus methods\n  k1 <- prep_text_korpus(text)\n  k_readability <- koRpus::readability(k1, quiet = TRUE)\n\n  return(k_readability)\n\n  # resume warnings\n  options(warn = oldw)\n}\n\n\nreadability_fn <- function(text, quiet = TRUE){\n  # a more condensed overview of the results\n  k_readability <- readability_fn_(text, quiet = TRUE)\n  readability_summary_table <- knitr::kable(summary(k_readability))\n  return(readability_summary_table)\n\n}\n"
  },
  {
    "path": "R/utils.R",
    "content": "# Get the filename of the current file, or\n# the file being rendered\n\nthis_filename <- function() {\n  if (interactive()) {\n    filename <- rstudioapi::getSourceEditorContext()$path\n  } else {\n    filename <- knitr::current_input()\n  }\n  return(fs::path(filename))\n}\n"
  },
  {
    "path": "README.Rmd",
    "content": "---\noutput:\n  md_document:\n    variant: markdown_github\n---\n\n\n\n<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#>\",\n  fig.path = \"README-\"\n)\n```\n\n\n# wordcountaddin <img src=\"inst/logo.png\" align=\"right\" height=\"130\" />\n\n[![Last-changedate](https://img.shields.io/badge/last%20change-`r gsub('-', '--', Sys.Date())`-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master) \n[![minimal R version](https://img.shields.io/badge/R%3E%3D-`r as.character(getRversion())`-brightgreen.svg)](https://cran.r-project.org/)\n[![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/) \n[![Travis-CI Build Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin) \n[![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master) [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531) \n\n\n\n\nThis R package is an [RStudio addin](https://rstudio.github.io/rstudioaddins/) to count words and characters in text in an [R markdown](http://rmarkdown.rstudio.com/) document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read. 
\n\nYou can count words in your Rmd file in three ways:\n\n- In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin   \n- All the words in your active Rmd in RStudio, by using the Wordcount Addin  with no text selected\n- All the words in an Rmd file, directly using the `word_count` function from the console or command line (RStudio not required), and specifying the filename as an argument to the function (e.g. `wordcountaddin::word_count(\"my_file.Rmd\")`). This will give you a single integer result, rather than the Markdown table that the other functions return. \n\nIndependent of an Rmd file, you can also count words in a character vector from the console using the `text_stats_chr` function (and there is `readability_chr` for readability). \n\n## Word count\n\nWhen counting words in the text of your Rmd document, these things will be ignored:\n\n- YAML front matter    \n- code chunks and inline code\n- text in HTML comment tags: `<!-- text -->` \n- HTML tags in the text: `<br>`,  `</br>`\n- inline URLs in this format: `[text of link](url)`\n- images with captions in this format: `![this is the caption](/path/to/image.png)`\n- header level indicators such as `#` and `##`, etc.\n\nAnd because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things. \n\nThe word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX and raw HTML. \n\nIn general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about the word counting methods.\n\nFor this addin I've included two methods, mostly out of curiosity to see how they differ from each other. 
I use functions from the  [stringi](https://cran.r-project.org/web/packages/stringi/index.html) and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) packages. If you're curious, you can compare the results you get with this addin to an online tool such as <http://wordcounttools.com/>.\n\nThe output of the `Word count` function is a markdown table in your R console that might look like this:\n\n```\n|Method          |koRpus      |stringi       |\n|:---------------|:-----------|:-------------|\n|Word count      |107         |104           |\n|Character count |604         |603           |\n|Sentence count  |10          |Not available |\n|Reading time    |0.5 minutes |0.5 minutes   |\n```\n\nIf you want to reuse these results in other R functions, you can use an unexported function like this `wordcountaddin:::text_stats_fn_(text)`, where `text` is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.\n\n## Readability \n\nThe readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of [readability statistics](https://en.wikipedia.org/wiki/Readability_test).\n\nMost of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don't analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) package. 
\n\nThere are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of readability measurements that are returned by the readability function, run `?koRpus::readability`. That help page also shows the formulae and citations for each statistic (and an additional 20-odd other readability statistics not used here). \n\nReadability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read [_Style: Toward Clarity and Grace_](http://www.amazon.com/dp/0226899152).\n\n\nThe output of the `readability` function is a markdown table in your R console that might look like this:\n\n```\n\n|index                 |flavour     |raw   |grade |age  |\n|:---------------------|:-----------|:-----|:-----|:----|\n|ARI                   |            |      |2.31  |     |\n|Coleman-Liau          |            |66    |4.91  |     |\n|Danielson-Bryan DB1   |            |6.46  |      |     |\n|Danielson-Bryan DB2   |            |60.39 |6     |     |\n|Dickes-Steiwer        |            |53.07 |      |     |\n|ELF                   |            |1.83  |      |     |\n|Farr-Jenkins-Paterson |            |66.81 |8-9   |     |\n|Flesch                |en (Flesch) |69.57 |8-9   |     |\n|Flesch-Kincaid        |            |      |4.85  |9.8  |\n|FOG                   |            |      |7.84  |     |\n|FORCAST               |            |      |10.28 |15.3 |\n|Fucks                 |            |23.38 |4.83  |     |\n|Linsear-Write         |            |      |2.35  |     |\n|LIX                   |            |32.41 |< 5   |     |\n|nWS1                  |            |      |4.19  |     |\n|nWS2                  |            |      |4.72  |     |\n|nWS3                  |            |      
|4.14  |     |\n|nWS4                  |            |      |3.64  |     |\n|RIX                   |            |1.42  |5     |     |\n|SMOG                  |            |      |8.08  |13.1 |\n|Strain                |            |2.44  |      |     |\n|TRI                   |            |-94   |      |     |\n|Tuldava               |            |2.57  |      |     |\n|Wheeler-Smith         |            |18.33 |2     |     |\n```\n\nSimilar to the `word count` function, if you want to reuse these results in other R functions, you can use an unexported function like this `wordcountaddin:::readability_fn_(text)`, where `text` is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object with slightly more detail than the summary table above. \n\nInspiration for this addin came from [jadd](https://github.com/jennybc/jadd) and [WrapRmd](https://github.com/tjmahr/WrapRmd). \n\n## How to install\n\nInstall with `devtools::install_github(\"benmarwick/wordcountaddin\",  type = \"source\", dependencies = TRUE)`\n\nGo to `Tools > Addins` in RStudio to select and configure addins. \n\n## How to use\n\n1. Open an Rmd file in RStudio.  \n2. Select some text; it can include YAML, code chunks and inline code   \n3. Go to `Tools > Addins` in RStudio and click on `Word count` or `Readability`. Computing `Readability` may take a few moments on longer documents because it has to count syllables for some of the stats.\n4. Look in the console for the output   \n\n\n## Feedback, contributing, etc.\n\nPlease [open an issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you find something that doesn't work as expected. Note that this project is released with a [Guide to Contributing](CONTRIBUTING.md) and a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.\n"
  },
  {
    "path": "README.md",
    "content": "<!-- README.md is generated from README.Rmd. Please edit that file -->\nwordcountaddin <img src=\"inst/logo.png\" align=\"right\" height=\"130\" />\n=====================================================================\n\n[![Last-changedate](https://img.shields.io/badge/last%20change-2019--01--09-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master)\n[![minimal R\nversion](https://img.shields.io/badge/R%3E%3D-3.5.2-brightgreen.svg)](https://cran.r-project.org/)\n[![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/)\n[![Travis-CI Build\nStatus](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin)\n[![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master)\n[![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531)\n\nThis R package is an [RStudio\naddin](https://rstudio.github.io/rstudioaddins/) to count words and\ncharacters in text in an [R markdown](http://rmarkdown.rstudio.com/)\ndocument. It also has a function to compute readability statistics so\nyou can get an indication of how easy or difficult your document is to\nread.\n\nYou can count words in your Rmd file in three ways:\n\n-   In a selection of text in your active Rmd, by selecting some text\n    with your mouse in RStudio and using the Wordcount Addin  \n-   All the words in your active Rmd in RStudio, by using the Wordcount\n    Addin with no text selected\n-   All the words in an Rmd file, directly using the `word_count`\n    function from the console or command line (RStudio not required),\n    and specifiying the filename as an argument to the function (e.g.\n    `wordcountaddin::word_count(\"my_file.Rmd\")`). 
This will give you a\n    single integer result, rather than the Markdown table that the other\n    functions return.\n\nIndependent of an Rmd file, you can also count words in a character\nvector from the console using the `text_stats_chr` function (and there\nis `readability_chr` for readability).\n\nWord count\n----------\n\nWhen counting words in the text of your Rmd document, these things will\nbe ignored:\n\n-   YAML front matter  \n-   code chunks and inline code\n-   text in HTML comment tags: `<!-- text -->`\n-   HTML tags in the text: `<br>`, `</br>`\n-   inline URLs in this format: `[text of link](url)`\n-   images with captions in this format:\n    `![this is the caption](/path/to/image.png)`\n-   header level indicators such as `#` and `##`, etc.\n\nAnd because my regex is quite simple, the word count function may also\nignore parts of your actual text that resemble these things.\n\nThe word count will include text in headers, block quotations, verbatim\ncode blocks, tables, raw LaTeX and raw HTML.\n\nIn general, there are numerous ways to count words, with no widely\naccepted standard method. The variety of methods is due to differences\nin the definitions of a word and a sentence. Run\n`?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about\nthe word counting methods.\n\nFor this addin I’ve included two methods, mostly out of curiosity to see\nhow they differ from each other. I use functions from the\n[stringi](https://cran.r-project.org/web/packages/stringi/index.html)\nand [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)\npackages. 
If you’re curious, you can compare the results you get with\nthis addin to an online tool such as\n<a href=\"http://wordcounttools.com/\" class=\"uri\">http://wordcounttools.com/</a>.\n\nThe output of the `Word count` function is a markdown table in your R\nconsole that might look like this:\n\n    |Method          |koRpus      |stringi       |\n    |:---------------|:-----------|:-------------|\n    |Word count      |107         |104           |\n    |Character count |604         |603           |\n    |Sentence count  |10          |Not available |\n    |Reading time    |0.5 minutes |0.5 minutes   |\n\nIf you want to reuse these results in other R functions, you can use an\nunexported function like this `wordcountaddin:::text_stats_fn_(text)`,\nwhere `text` is a character vector of your text (with length one, i.e.\nall your text in a single character string). The output will be a list\nobject, and will include several other items not shown in the markdown\ntable.\n\nReadability\n-----------\n\nThe readability function ignores all the same parts of the text as the\nword count function, and then computes the values of a bunch of\n[readability\nstatistics](https://en.wikipedia.org/wiki/Readability_test).\n\nMost of these readability measurements aim to approximate the years of\neducation required to understand your text. They look at the number of\ncharacters and syllables per word, the number of words per sentence, and\nso on. They don’t analyse the meaning of the words. A score of around\n10-12 is roughly the reading level on completion of high school in the\nUS. These stats are computed by the\n[koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)\npackage.\n\nThere are about 27 measurements that this readability function returns\n(depending on how long your text is), including the Automated\nReadability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level,\nand the Simple Measure of Gobbledygook (SMOG). 
For the full list of\nreadability measurements that are returned by the readability function,\nrun `?koRpus::readability`. That help page also shows the formulae and\ncitations for each statistic (and an additional 20-odd other readability\nstatistics not used here).\n\nReadability stats are, of course, no substitute for critical\nself-reflection on the effectiveness of your writing at communicating\nideas and information. To help with that, read [*Style: Toward Clarity\nand Grace*](http://www.amazon.com/dp/0226899152).\n\nThe output of the `readability` function is a markdown table in your R\nconsole that might look like this:\n\n\n    |index                 |flavour     |raw   |grade |age  |\n    |:---------------------|:-----------|:-----|:-----|:----|\n    |ARI                   |            |      |2.31  |     |\n    |Coleman-Liau          |            |66    |4.91  |     |\n    |Danielson-Bryan DB1   |            |6.46  |      |     |\n    |Danielson-Bryan DB2   |            |60.39 |6     |     |\n    |Dickes-Steiwer        |            |53.07 |      |     |\n    |ELF                   |            |1.83  |      |     |\n    |Farr-Jenkins-Paterson |            |66.81 |8-9   |     |\n    |Flesch                |en (Flesch) |69.57 |8-9   |     |\n    |Flesch-Kincaid        |            |      |4.85  |9.8  |\n    |FOG                   |            |      |7.84  |     |\n    |FORCAST               |            |      |10.28 |15.3 |\n    |Fucks                 |            |23.38 |4.83  |     |\n    |Linsear-Write         |            |      |2.35  |     |\n    |LIX                   |            |32.41 |< 5   |     |\n    |nWS1                  |            |      |4.19  |     |\n    |nWS2                  |            |      |4.72  |     |\n    |nWS3                  |            |      |4.14  |     |\n    |nWS4                  |            |      |3.64  |     |\n    |RIX                   |            |1.42  |5     |     |\n    |SMOG                  |        
    |      |8.08  |13.1 |\n    |Strain                |            |2.44  |      |     |\n    |TRI                   |            |-94   |      |     |\n    |Tuldava               |            |2.57  |      |     |\n    |Wheeler-Smith         |            |18.33 |2     |     |\n\nSimilar to the `word count` function, if you want to reuse these results\nin other R functions, you can use an unexported function like this\n`wordcountaddin:::readability_fn_(text)`, where `text` is a character\nvector of your text (with length one, i.e. all your text in a single\ncharacter string). The output will be a list object with slightly more\ndetail than the summary table above.\n\nInspiration for this addin came from\n[jadd](https://github.com/jennybc/jadd) and\n[WrapRmd](https://github.com/tjmahr/WrapRmd).\n\nHow to install\n--------------\n\nInstall with\n`devtools::install_github(\"benmarwick/wordcountaddin\",  type = \"source\", dependencies = TRUE)`\n\nGo to `Tools > Addins` in RStudio to select and configure addins.\n\nHow to use\n----------\n\n1.  Open an Rmd file in RStudio.  \n2.  Select some text; it can include YAML, code chunks and inline code  \n3.  Go to `Tools > Addins` in RStudio and click on `Word count` or\n    `Readability`. Computing `Readability` may take a few moments on\n    longer documents because it has to count syllables for some of the\n    stats.\n4.  Look in the console for the output\n\nFeedback, contributing, etc.\n----------------------------\n\nPlease [open an\nissue](https://github.com/benmarwick/wordcountaddin/issues/new) if you\nfind something that doesn’t work as expected. Note that this project is\nreleased with a [Guide to Contributing](CONTRIBUTING.md) and a\n[Contributor Code of Conduct](CONDUCT.md). By participating in this\nproject you agree to abide by its terms.\n"
  },
  {
    "path": "codecov.yml",
    "content": "comment: false\n"
  },
  {
    "path": "inst/rstudio/addins.dcf",
    "content": "Name: Word count\nDescription: Counts words and characters (excluding code chunks, inline code, etc.)\nBinding: text_stats\nInteractive: true\n\nName: Readability\nDescription: Computes readability statistics (excluding code chunks, inline code, etc.)\nBinding: readability\nInteractive: true\n"
  },
  {
    "path": "man/text_stats.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/hello.R\n\\name{text_stats}\n\\alias{text_stats}\n\\alias{word_count}\n\\alias{readability}\n\\alias{text_stats_chr}\n\\alias{readability_chr}\n\\alias{text_stats_fn_}\n\\title{Get text stats for selected text (excluding code chunks and inline code)}\n\\usage{\ntext_stats(filename = this_filename())\n\nword_count(filename = this_filename())\n\nreadability(filename = this_filename(), quiet = TRUE)\n\ntext_stats_chr(text)\n\nreadability_chr(text, quiet = TRUE)\n\ntext_stats_fn_(text)\n}\n\\arguments{\n\\item{filename}{Path to the file on which to compute text stats.\nDefault is the current file (when working in RStudio) or the file being\nknit (when compiling with \\code{knitr}).}\n\n\\item{quiet}{Logical. Should task be performed quietly?}\n\n\\item{text}{a character string of text, length of one}\n}\n\\description{\nCall this addin to get a word count and some other stats about the text\n\nGet a word count as a single integer\n\nGet readability stats for selected text (excluding code chunks)\n\nGet text stats for selected text (excluding code chunks and inline code)\n\nGet readability stats for selected text (excluding code chunks)\n}\n\\details{\nCall this addin to get readbility stats about the text\n\nUse this function with a character string as input\n\nUse this function with a character string as input\n}\n\\examples{\nmd <- system.file(package = \"wordcountaddin\", \"NEWS.md\")\ntext_stats(md)\nword_count(md)\n\\dontrun{\nreadability(md)\n}\n}\n"
  },
  {
    "path": "man/wordcountaddin.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/hello.R\n\\docType{package}\n\\name{wordcountaddin}\n\\alias{wordcountaddin}\n\\title{wordcountaddin}\n\\description{\nThis packages is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.\n}\n"
  },
  {
    "path": "tests/testthat/test_wordcountaddin.R",
    "content": "library(wordcountaddin)\n\ncontext(\"Word count\")\n\ntest_that(\"Word count is correct for short simple sentence\", {\n  # short sentence\n  eleven_words <- \"here are exactly eleven words of fairly boring and unpunctuated text\"\n\n  short_stats <-  text_stats_fn_(eleven_words)\n  # qdap cannot manage without final punct.\n  n_words_stri_11 <-short_stats$n_words_stri\n  n_words_korp_11 <- short_stats$n_words_korp\n\n  n_char_tot_stri_11 <-  short_stats$n_char_tot_stri\n  n_char_tot_korp_11 <- short_stats$n_char_tot_korp\n\n  expect_equal(n_words_stri_11, 11)\n  expect_equal(n_words_korp_11, 11)\n  expect_equal(n_char_tot_stri_11, 68)\n  expect_equal(n_char_tot_korp_11, 69)\n})\n\ntest_that(\"Word count is correct for moderately complex sentences\", {\n  # Moderate: Harvard sentences, https://en.wikipedia.org/wiki/Harvard_sentences\n  moderately_complex <- \"The birch canoe slid on the smooth planks. Glue the sheet to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of steady work faced us. 
Large size in stockings is hard to sell.\"\n\n  moderately_complex_stats <- text_stats_fn_(moderately_complex)\n\n  n_char_tot_stri_mc <-  moderately_complex_stats$n_char_tot_stri\n  n_char_tot_korp_mc <- moderately_complex_stats$n_char_tot_korp\n\n  n_words_stri_mc <- moderately_complex_stats$n_words_stri\n  n_words_korp_mc <- moderately_complex_stats$n_words_korp\n\n  n_sentences_korp_mc <- moderately_complex_stats$n_sentences_korp\n\n  expect_equal(n_char_tot_stri_mc, 406)\n  expect_equal(n_char_tot_korp_mc, 407)\n  expect_equal(n_words_stri_mc, 80)  # MS Word says 79\n  expect_equal(n_words_korp_mc, 80)\n  expect_equal(n_sentences_korp_mc, 10)\n})\n\n\n\ntest_that(\"Word count is correct for complex sentences in filler text\", {\n  # Filler text with various punctuation\n  filler <- \"Lorem ipsum dolor sit amet, ea debet error sensibus vix, at esse decore vivendo vim, rebum aliquip an cum? His ea agam novum dissentiet! At mel audire liberavisse, mundi audiam quaeque sea ne. In eam error habemus delectus, audiam ocurreret ne sit, sit ei salutandi liberavisse! Ut vix case corpora.\n\nPosse malorum ponderum in qui, et eum dicam disputando, an vix quaestio scripserit. Falli veniam tamquam id mei. Modo sumo appetere cu mea, mutat possim rationibus ius id. Sed nominati antiopam cu, cu prima mandamus vim. Eos cu exerci consul!\n\nNam case atomorum suavitate cu? No quo inermis necessitatibus, eos ne essent scripta vivendum, ea euismod quaestio qui? Per minim tation accusamus eu, audire dolores nam an. Vel vocent inimicus ut, eu porro libris argumentum quo.\n\nVim no solet tempor, aperiam habemus assueverit ea usu: sea ut quodsi gloriatur! Eum te laudem aliquid inciderint, mollis prodesset mea ad. Dico definiebas efficiendi id usu. No bonorum suavitate adolescens per, ius oratio pericula ut, at mel porro vocibus scriptorem. Sea incorrupte definitiones necessitatibus in, cu ancillae conclusionemque duo. 
Ex vix dolore propriae principes, ius in augue ludus?\n\nSolet copiosae ea sed, at assum  - dolore delenit has, ex aperiam honestatis mei. No legere nemore nonumes mel. Eu ullum accusata nam, an sea wisi rebum. Ei homero equidem sea! Sed erat augue eripuit et, ea vim altera eirmod labores, ad noster veritus nec.\n\nUt porro sententiae vis, debet affert eligendi id eam! In, nominati, pertinacia has, sea admodum dissentiunt eu! Volumus appellantur ex eos. Ei duo movet scripta aliquid, ea blandit explicari consectetuer eos.\n\nNe cibo ornatus vituperata pri. Soleat populo fierent ne sed, vel congue consequat temporibus in. Pro eu nostro inermis sadipscing, ne pri possim lobortis! Sea sonet nihil accusata no. Mei virtute noluisse pericula ex, aliquid mandamus inimicus quo ex.\n\nEsse patrioque at qui, cum sanctus; consequuntur conclusionemque cu? Ut summo oportere  appellantur mel, ex per tale semper appellantur. Usu ea alia insolens sadipscing, eu aeterno persius vix. Agam prodesset interpretaris at ius, ne est malis signiferumque, illum soluta albucius mei an. Ex error tollit recusabo est, ut prompta consectetuer per. Dicam numquam eum id, brute mollis nam cu!\n\nEi vis discere interesset! Mutat 'option' qualisque ius te, sea deserunt lobortis voluptatum at. Qui et impedit accumsan atomorum, nam dicat possit ornatus an? Eu mei aperiri discere, sea veri homero ad, stet dolore putant mei in. Eu pri debet populo luptatum, eos te nominati concludaturque.\n\nTota veritus similique ne per, eam fastidii voluptatum eu. Sea tale mandamus suscipiantur ex. Ullum ullamcorper consequuntur et cum, aeque fuisset ut sea! Mea graecis pertinax explicari ne, pri tale hinc no? 
Eu vidisse nominati eum, et eam hendrerit voluptatum assueverit, qui ne munere recusabo democritum.\"\n\n  filler_stats <- text_stats_fn_(filler)\n\n  n_char_tot_stri_f <-  filler_stats$n_char_tot_stri\n  n_char_tot_korp_f <- filler_stats$n_char_tot_korp\n\n  n_words_stri_f <- filler_stats$n_words_stri\n  n_words_korp_f <- filler_stats$n_words_korp\n\n  n_sentences_korp_f <- filler_stats$n_sentences_korp\n\n  expect_equal(n_char_tot_stri_f, 2896)\n  expect_equal(n_char_tot_korp_f, 2897)\n  expect_equal(n_words_stri_f, 450)\n  expect_equal(n_words_korp_f, 450) # MS Word says 442\n  expect_equal(n_sentences_korp_f, 52)\n})\n\n\n\ntest_that(\"Word count is correct for rmd text\", {\n  # text with code chunks, etc.\n  rmd_text <- \"\n\n---\ntitle: 'Untitled'\noutput: html_document\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(echo = TRUE)\n```\n\n<!-- this is an HTML comment -->\n\n## Heading\n\nThis is an [R markdown](http://rmarkdown.rstudio.com/) document.\n\n```{r cars}\nsummary(cars)\n# Lines line this have caused problems -----------------------------------------\n```\n\n`r 2+2`\n\n`r nrow(cars)`\n\n##  Plots\n\nYou can also embed plots, for example:\n\n```{r pressure, echo=FALSE}\nplot(pressure)\n```\n\n![this is the caption](/path/to/image.png)\n\n\"\n\n  rmd_stats <- text_stats_fn_(rmd_text)\n\n  n_char_tot_stri_r <-  rmd_stats$n_char_tot_stri\n  n_char_tot_korp_r <- rmd_stats$n_char_tot_korp\n\n  n_words_stri_r <- rmd_stats$n_words_stri\n  n_words_korp_r <- rmd_stats$n_words_korp\n\n  n_sentences_korp_r <- rmd_stats$n_sentences_korp\n\n  expect_equal(n_char_tot_stri_r, 159)\n  expect_equal(n_char_tot_korp_r, 159)\n  expect_equal(n_words_stri_r, 20)\n  expect_equal(n_words_korp_r, 20)\n  expect_equal(n_sentences_korp_r, 4)\n})\n\ntest_that(\"we can ignore <br> and </br>\", {\n  #  test for <br>\n  string_with_br <- \"Hi, I have <br> in the </br> string\"\n\n  string_with_br_stats <- text_stats_fn_(string_with_br)\n\n  n_char_tot_stri_r <-  
string_with_br_stats$n_char_tot_stri\n  n_char_tot_korp_r <- string_with_br_stats$n_char_tot_korp\n\n  n_words_stri_r <- string_with_br_stats$n_words_stri\n  n_words_korp_r <- string_with_br_stats$n_words_korp\n\n  n_sentences_korp_r <- string_with_br_stats$n_sentences_korp\n\n  expect_equal(n_char_tot_stri_r, 26)\n  expect_equal(n_char_tot_korp_r, 27)\n  expect_equal(n_words_stri_r, 6)\n  expect_equal(n_words_korp_r, 6)\n  expect_equal(n_sentences_korp_r, 0)\n})\n\ntest_that(\"we can ignore HTML tags but keep greater/less\", {\n  string_gr_ls <- \"Hi, <br> I am <20 but >10 years old\"\n\n  expect_equal(prep_text(string_gr_ls),\n               \"Hi,  I am 20 but 10 years old\")\n})\n\ntest_that(\"Word count is correct for rmd file\", {\n  # test that we can word count on a file\n  the_rmd_file_stats <- text_stats(filename = test_path(\"test_wordcountaddin.Rmd\"))\n\n  # Values updated to reflect correct Quarto block handling\n  expect_equal(the_rmd_file_stats[3],\n               \"|Word count      |117         |118           |\")\n  expect_equal(the_rmd_file_stats[4],\n               \"|Character count |733         |732           |\")\n  expect_equal(the_rmd_file_stats[5],\n               \"|Sentence count  |27          |Not available |\")\n  expect_equal(the_rmd_file_stats[6],\n               \"|Reading time    |0.6 minutes |0.6 minutes   |\")\n})\n\n\ntest_that(\"Word count is correct for cmd line\", {\n  # command line fns\n  text_on_the_command_line <- \"here is some text\"\n  text_stats_chr_out <- text_stats_chr(text_on_the_command_line)\n\n  expect_equal(text_stats_chr_out[3],\n               \"|Word count      |4         |4             |\")\n  expect_equal(text_stats_chr_out[4],\n               \"|Character count |18        |17            |\")\n  expect_equal(text_stats_chr_out[5],\n               \"|Sentence count  |0         |Not available |\")\n  expect_equal(text_stats_chr_out[6],\n               \"|Reading time    |0 minutes |0 minutes     
|\")\n})\n\n\ntest_that(\"readability is correct for cmd line\", {\n  text_on_the_command_line <- \"here is some text\"\n  expect_output(\n    expect_warning(\n      readability_chr_out <- readability_chr(text_on_the_command_line)\n    )\n  )\n  expect_length(readability_chr_out, 27)\n})\n\ntest_that(\"Word count is correct for text with % sign\", {\n  # test for escaping the percent sign in plain text\n  text_with_percent_sign <- \"Here is some % text with percent % signs in it.\"\n\n  text_stats_percent_chr_out <- text_stats_chr(text_with_percent_sign)\n  expect_equal(text_stats_percent_chr_out[3],\n               \"|Word count      |9         |9             |\")\n})\n\n\ntest_that(\"Word count is correct for text with figures included using LaTeX code\", {\n  # test for escaping the percent sign in plain text\n  text_with_figures <- \"One \\\\begin{figure} \\\\caption{text} \\\\label{text} \\\\includegraphics[width=\\\\textwidth]{figure.png} \\\\end{figure} Two \\\\begin{figure} \\\\caption{text} \\\\label{text} \\\\includegraphics[width=\\\\textwidth]{figure.png} \\\\end{figure} Three\"\n\n  text_stats_percent_chr_out <- text_stats_chr(text_with_figures)\n  expect_equal(text_stats_percent_chr_out[3],\n               \"|Word count      |3         |3             |\")\n})\n\n\ntest_that(\"Word count is a single integer for a Rmd file when using word_count\", {\n  # test that we can word count on a file\n  the_rmd_word_count <- word_count(filename = test_path(\"test_wordcountaddin.Rmd\"))\n\n  expect_equal(the_rmd_word_count,\n               117L)\n})\n\ntest_that(\"We can handle very long strings, like citation keys\", {\n\n  expect_output(\n    expect_warning(\n  # test that we can word count on a file\n long_string_read <- readability_chr(\"it's a long string right at the end here because a tiny\n                                     fraction of the refreneces have crazy long keys. Why do they do\n                                     that? It's autogenerated. 
Why does this give so many warnings when\n                                     testing. It's a puzzle. [@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa].\",\n                                     quiet = TRUE)))\n\n expect_equal( attr(long_string_read, 'format'), \"pipe\")\n\n})\n\ntest_that(\"don't count abbreviations as multiple words\", {\n\n\n  # test that we can word count on a file\n  words_with_abbv <- \"zero .o.n.e .t.wo.\"\n  abbrev_count <- text_stats_chr(words_with_abbv)\n\n expect_equal( abbrev_count[3], \"|Word count      |3         |3             |\")\n\n})\n\ntest_that(\"text_to_count reads file contents as character vector\", {\n  contents <- text_to_count(test_path(\"test_wordcountaddin.Rmd\"))\n\n  expect_type(contents, \"character\")\n  expect_length(contents, 1)\n})\n\ntest_that(\"text_to_count raises an error for invalid file types\", {\n  expect_error(text_to_count(\"invalid.tif\"), regexp = \"works with markdown\")\n})\n\ntest_that(\"Quarto ::: blocks do not break word count\", {\n  quarto_text <- \"\nHere is some text which adds up to 10 words.\n\n::: {#fig-1}\nplot(iris$Sepal.Length)\n:::\n\nHere is more text which adds up to 10 words.\n\nHere is more text which adds up to 10 words.\n\"\n  stats <- text_stats_fn_(quarto_text)\n  # Total words should be around 30.\n  # Before fix, it will be around 10.\n  expect_gt(stats$n_words_stri, 25)\n  expect_gt(stats$n_words_korp, 25)\n})\n"
  },
  {
    "path": "tests/testthat/test_wordcountaddin.Rmd",
    "content": "---\ntitle: \"rmd_test_file.rmd\"\noutput:\n  word_document: default\n  html_document: default\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(echo = TRUE)\n```\n\n## R Markdown\n\nThis is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.\n\nWhen you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:\n\n```{r cars}\nsummary(cars)\n\n# Lines line this have caused problems -----------------------------------------\n```\n\n## Including Plots\n\nYou can also embed plots, for example:\n\n```{r pressure, echo=FALSE}\nplot(pressure)\n```\n\nNote that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.\n\n```{r}\n# context <- rstudioapi::getActiveDocumentContext()\n```\n\nThis Markdown file contains `r wordcountaddin::word_count()` words:\n\n```{r, message=FALSE, echo=FALSE, error=TRUE}\nwordcountaddin::text_stats()\n```\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n:::\n:::\n"
  },
  {
    "path": "tests/testthat.R",
    "content": "library(testthat)\nlibrary(wordcountaddin)\n\ntest_check(\"wordcountaddin\")\n"
  },
  {
    "path": "wordcountaddin.Rproj",
    "content": "Version: 1.0\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 2\nEncoding: UTF-8\n\nRnwWeave: knitr\nLaTeX: XeLaTeX\n\nAutoAppendNewline: Yes\nStripTrailingWhitespace: Yes\n\nBuildType: Package\nPackageUseDevtools: Yes\nPackageInstallArgs: --no-multiarch --with-keep.source\nPackageRoxygenize: rd,collate,namespace\n"
  }
]