Full Code of benmarwick/wordcountaddin for AI

Repository: benmarwick/wordcountaddin
Branch: master
Commit: 99e222cfd964
Files: 25
Total size: 53.8 KB

Directory structure:
gitextract_4xsqdf0n/

├── .Rbuildignore
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   └── bug_report.md
│   ├── ISSUE_TEMPLATE.md
│   └── workflows/
│       └── R-CMD-check.yaml
├── .gitignore
├── .travis.yml
├── CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── hello.R
│   └── utils.R
├── README.Rmd
├── README.md
├── codecov.yml
├── inst/
│   └── rstudio/
│       └── addins.dcf
├── man/
│   ├── text_stats.Rd
│   └── wordcountaddin.Rd
├── tests/
│   ├── testthat/
│   │   ├── test_wordcountaddin.R
│   │   ├── test_wordcountaddin.Rmd
│   │   └── test_wordcountaddin.docx
│   └── testthat.R
└── wordcountaddin.Rproj

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
^\.travis\.yml$
^README\.Rmd$
^README-.*\.png$
^CONDUCT\.md$
^CONTRIBUTING.md$
^codecov\.yml$
^wordcountaddin\.Rcheck$
^wordcountaddin.*\.tar\.gz$
^wordcountaddin.*\.tgz$
.github/


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Session Info**
Output of `sessionInfo()` on your device so we can see what packages and version numbers you have


================================================
FILE: .github/ISSUE_TEMPLATE.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Please wait for some discussion of your report before making a Pull Request.**

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.

Describe the steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Session Info**
Output of `sessionInfo()` on your device so we can see what packages and version numbers you have


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
name: R CMD CHECK

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2

      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libcurl4-openssl-dev libfontconfig1-dev libharfbuzz-dev  libfribidi-dev  libtiff-dev libjpeg-dev libwebp-dev pkg-config 
        shell: bash

      - name: Install dependencies
        run: |
          install.packages(c("pak", "devtools", "testthat"))
          pak::local_install_deps()
        shell: Rscript {0}

      - name: Run tests
        run: devtools::test()
        shell: Rscript {0}


================================================
FILE: .gitignore
================================================
.Rproj.user
.Rhistory
.RData
wordcountaddin.Rcheck/
wordcountaddin*.tar.gz
wordcountaddin*.tgz


================================================
FILE: .travis.yml
================================================
# Sample .travis.yml for R projects

language: r
warnings_are_errors: false
sudo: required

r_github_packages:
  - jimhester/covr

after_success:
  - Rscript -e 'covr::codecov()'




================================================
FILE: CONDUCT.md
================================================
# Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who 
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this 
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 
from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 
opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant 
(http://contributor-covenant.org), version 1.0.0, available at 
http://contributor-covenant.org/version/1/0/0/


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing Guidelines

## Pull requests

Requirements for making a pull request:

* Some knowledge of [git]()
* Some knowledge of [GitHub]()

Read more about pull requests on GitHub at [https://help.github.com/articles/using-pull-requests/](https://help.github.com/articles/using-pull-requests/). If you haven't done this before, Hadley Wickham provides a nice overview of git (<http://r-pkgs.had.co.nz/git.html>), as well as best practices for submitting pull requests (<http://r-pkgs.had.co.nz/git.html#pr-make>).

Then:

* Fork the repo to your GitHub account
* Clone the version on your account down to your machine, e.g., `git clone git@github.com:benmarwick/<package name>.git`
* Make sure to track progress upstream (i.e., on our version of the package at `benmarwick/<package name>`) by doing `git remote add upstream git@github.com:benmarwick/<package name>.git`. Each time you go to make changes on your machine, be sure to pull changes in from upstream (i.e., the benmarwick version) by doing either `git fetch upstream` and merging later, or `git pull upstream` to fetch and merge in one step
* Make your changes (we prefer if you make changes on a new branch)
* Ideally included in your contributions:
    * Well documented code in roxygen docs
    * If you add new functions or change functionality, add one or more tests.
* Make sure the package passes `R CMD CHECK` on your machine without errors/warnings
* Push up to your account
* Submit a pull request and participate in the discussion.

## Documentation contributions

Documentation contributions are welcome in every project, since most could use better instructions. If you are editing any files in the repo, follow the above instructions for pull requests. However, if you are editing the wiki, you can edit it directly without using git or pull requests.

All of the function documentation is generated automatically. Please do not edit any of the documentation files in man/ or the NAMESPACE. Instead, construct the appropriate roxygen2 documentation in the function files in R/ themselves. The documentation is then generated by running the document() function from the devtools package. Please consult the Advanced R programming guide if this workflow is unfamiliar to you. Note that functions should include examples in the documentation. Please use \dontrun for examples that take more than a few seconds to execute or require an internet connection.
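As a rough illustration of the roxygen2 layout described above, here is a minimal sketch. The function `count_tokens` is hypothetical and not part of this package; it only shows where the parameter, example, and `\dontrun` sections go.

```r
#' Count whitespace-separated tokens in a string
#'
#' Hypothetical example function, used only to illustrate the roxygen2
#' layout; it is not part of this package.
#'
#' @param text A character string of length one.
#' @return An integer, the number of whitespace-separated tokens.
#' @examples
#' count_tokens("one two three")
#' \dontrun{
#' # slow or network-dependent examples go inside \dontrun
#' count_tokens(paste(readLines("https://example.com/long.txt"), collapse = " "))
#' }
#' @export
count_tokens <- function(text) {
  # split on runs of whitespace and drop any empty tokens
  tokens <- strsplit(trimws(text), "\\s+")[[1]]
  length(tokens[nzchar(tokens)])
}
```

Running `devtools::document()` would then regenerate the matching file in man/ and the NAMESPACE entry, so neither needs to be edited by hand.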

Likewise, the README.md file in the base directory should not be edited directly. This file is created automatically from code that runs the examples shown, helping to ensure that they are functioning as advertised and consistent with the package README vignette. Instead, edit the README.Rmd source file in the base directory and knit it to rebuild README.md.

## Repository structure

This repository is structured as a standard R package following the conventions outlined in the Writing R extensions manual. A few additional files are provided that are not part of the built R package and are listed in .Rbuildignore, such as .travis.yml, which is used for continuous testing and integration.

## Code

All code for this package is found in R/, (except compiled source code, if used, which is in /src). All functions should be thoroughly documented with roxygen2 notation; see Documentation.

Bug reports _must_ have a [reproducible example](http://adv-r.had.co.nz/Reproducibility.html) and include the output of `devtools::session_info()` (instead of `sessionInfo()`). We recommend using Hadley Wickham's style guide when writing code (<http://adv-r.had.co.nz/Style.html>).

## Testing

Any new feature or bug-fix should include a unit-test demonstrating the change. Unit tests follow the testthat framework with files in tests/testthat. Please make sure that the testing suite passes before issuing a pull request. This can be done by running check() from the devtools package, which will also check for consistent documentation, etc.

This package uses the Travis continuous testing mechanism for R to ensure that the test suite is run on each push to GitHub. An icon at the top of the README.md indicates whether or not the tests are currently passing.

## Questions or comments?

Do not hesitate to open an issue in the issues tracker to raise any questions or comments about the package or these guidelines.


================================================
FILE: DESCRIPTION
================================================
Package: wordcountaddin
Type: Package
Title: Word counts and readability statistics in R markdown documents
Version: 0.3.0.9000
Authors@R: c(person("Ben", "Marwick",
                  email = "benmarwick@gmail.com",
                  role = c("aut", "cre")),
            person("JooYoung", "Seo",
                  email = "jooyoung@psu.edu",
                  role = "ctb", comment = c(ORCID = "0000-0002-4064-6012")),
            person("Henrik", "Bengtsson",
                  email = "henrik.bengtsson@gmail.com",
                  role = "ctb"),
            person("Florian S.", "Schaffner",
                  email = "florian.schaffner@outlook.com",
                  role = "ctb"),
            person("Matthew T.", "Warkentin",
                   email = "warkentin@lunenfeld.ca",
                   role = "ctb"),
            person("Luke A.", "McGuinness",
                  email = "luke.a.mcguinness@gmail.com",
                  role = "ctb",
                  comment = c(ORCID = "0000-0001-8730-9761")))
Maintainer: Ben Marwick <benmarwick@gmail.com>
Description: An addin for RStudio that will count the words and characters
    in a plain text document. It is designed for use with RMarkdown
    documents and will exclude YAML header content, code chunks and inline
    code from the counts. It also computes readability statistics so you can
    get an idea of how easy or difficult your text is to read.
License: MIT + file LICENSE
LazyData: TRUE
Imports:
    fs,
    knitr,
    koRpus,
    koRpus.lang.en,
    miniUI (>= 0.1.1),
    purrr,
    rstudioapi (>= 0.5),
    shiny (>= 0.13),
    stringi,
    sylly,
    sylly.en
Encoding: UTF-8
RoxygenNote: 7.1.1
Suggests:
    covr,
    testthat


================================================
FILE: LICENSE
================================================
YEAR: 2017
COPYRIGHT HOLDER: Ben Marwick


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

export(readability)
export(readability_chr)
export(text_stats)
export(text_stats_chr)
export(text_stats_fn_)
export(word_count)
import(koRpus)
import(purrr)
import(stringi)


================================================
FILE: NEWS.md
================================================
# wordcountaddin 0.3.0

NEW FEATURES

* Count words from Rmd filename and get scalar as output (#20)

MINOR IMPROVEMENTS

* make the functions more DRY by adding some unexported fns
* Expanded readme slightly
* Added more tests

# wordcountaddin 0.2.0

NEW FEATURES

* Count words from Rmd filename without using RStudio (#3)
* Count words in active Rmd in RStudio without making text selection (#3)
* Count words in character string from command line (without Rmd or RStudio) (#2)

MINOR IMPROVEMENTS

* Added a `NEWS.md` file to track changes to the package.
* Expanded readme
* Added more tests

BUG FIXES

* Fixed inaccurate count when <br> present (#1)

DEPRECATED AND DEFUNCT

NA

# wordcountaddin 0.1.0

Initial release




================================================
FILE: R/hello.R
================================================
#' wordcountaddin
#'
#' This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.
#'
#' @name wordcountaddin
#' @docType package
#' @import purrr stringi koRpus
NULL

# global things

 md_file_ext_regex <- paste(
    "\\.markdown$",
    "\\.mdown$",
    "\\.mkdn$",
    "\\.md$",
    "\\.mkd$",
    "\\.mdwn$",
    "\\.mdtxt$",
    "\\.mdtext$",
    "\\.rmd$",
    "\\.Rmd$",
    "\\.RMD$",
    "\\.Rmarkdown$",
    "\\.qmd$",
  sep = "|")


#-------------------------------------------------------------------
# fns for working with selected text in an active Rmd

#' Get text stats for selected text (excluding code chunks and inline code)
#'
#' Call this addin to get a word count and some other stats about the text
#' @param filename Path to the file on which to compute text stats.
#' Default is the current file (when working in RStudio) or the file being
#' knit (when compiling with \code{knitr}).
#'
#' @export
#' @examples
#' md <- system.file(package = "wordcountaddin", "NEWS.md")
#' text_stats(md)
#' word_count(md)
#' \dontrun{
#' readability(md)
#' }
text_stats <- function(filename = this_filename()) {

  text_to_count_output <- text_to_count(filename)

  text_stats_fn(text_to_count_output)
}


#' @rdname text_stats
#' @description Get a word count as a single integer
#' @export
word_count <- function(filename = this_filename()){

  text_to_count_output <- text_to_count(filename)

  word_count_output <- text_stats_fn_(text_to_count_output)

  word_count_output$n_words_korp
}






#' @rdname text_stats
#' @description Get readability stats for selected text (excluding code chunks)
#' @param quiet Logical. Should task be performed quietly?
#'
#' @details Call this addin to get readability stats about the text
#'
#' @export
readability <- function(filename = this_filename(), quiet = TRUE) {


  text_to_count_output <- text_to_count(filename)

  # pass the caller's choice through instead of always forcing quiet = TRUE
  readability_fn(text_to_count_output, quiet = quiet)
}

#---------------------------------------------------------------
# directly work on a character string in the console


#' @rdname text_stats
#' @description Get text stats for selected text (excluding code chunks and inline code)
#'
#' @details Use this function with a character string as input
#'
#' @export
text_stats_chr <- function(text) {

  text <- paste(text, collapse="\n")

  text_stats_fn(text)

}


#' @rdname text_stats
#' @description Get readability stats for selected text (excluding code chunks)
#'
#' @details Use this function with a character string as input
#'
#' @param text a character string of text, length of one
#'
#' @export
readability_chr <- function(text, quiet = TRUE) {

  text <- paste(text, collapse = "\n")

  # pass the caller's choice through instead of always forcing quiet = TRUE
  readability_fn(text, quiet = quiet)

}
#-----------------------------------------------------------
# helper fns, not exported

text_to_count <- function(filename){
  # selected text takes precedence over the filename argument:
  # if text is selected, it is used. Otherwise, the text in filename is used
  if (rstudioapi::isAvailable()) {
    context <- rstudioapi::getActiveDocumentContext()
    selection_text <- unname(unlist(context$selection)["text"])
    text_is_selected <- nchar(selection_text) > 0
  } else {
    # if not running in RStudio, assume no text is selected
    text_is_selected <- FALSE
  }

  if (text_is_selected) {
    text <- selection_text
  } else {
    # if no text is selected, read text from "filename" as character vector
    is_extension_invalid <- !grepl(md_file_ext_regex, filename)
    if (is_extension_invalid) {
      stop(paste("The supplied file has an extension which is not associated with markdown.",
                 "This function only works with markdown or R markdown files.", sep = "\n  "))
    }
    text <- paste(scan(filename, 'character', quiet = TRUE), collapse = " ")
  }
  text
}

prep_text <- function(text){

  # remove lines starting with :::
  # we do this before removing line breaks so $ matches end of line
  text <- gsub("(?m)^:::.*$", "", text, perl = TRUE)

  # remove all line breaks, http://stackoverflow.com/a/21781150/1036500
  text <- gsub("[\r\n]", " ", text)

  # don't include yaml front matter
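  three_dashes <- unlist(gregexpr('---', text))
  # only strip when the text starts with '---' and a closing '---' exists,
  # otherwise three_dashes[2] is NA and the text would be lost
  if (three_dashes[1] == 1L && length(three_dashes) > 1L) {
    yaml_end <- three_dashes[2] + 2L
    text <- substr(text, yaml_end + 1L, nchar(text))
  }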

  # don't include text in code chunks: https://regex101.com/#python
  text <- gsub("```\\{.+?\\}.+?```", "", text)

  # don't include text in in-line R code
  text <- gsub("`r.+?`", "", text)

  # don't include HTML comments
  text <- gsub("<!--.+?-->", "", text)

  # don't include LaTeX comments
  # how to do this? %%

  # don't include images with captions
  text <- gsub("!\\[.+?\\]\\(.+?\\)\\{.+?\\}", "", text)
  text <- gsub("!\\[.+?\\]\\(.+?\\)", "", text)

  # don't include inline markdown URLs
  text <- gsub("\\(http.+?\\)", "", text)

  # don't include # for headings
  text <- gsub("#*", "", text)

  # don't include opening html tags
  # (source: https://www.w3schools.com/TAGS/default.ASP)

  tags <- paste0("!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|",
                 "audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|",
                 "canvas|caption|center|cite|code|col|colgroup|data|datalist|",
                 "dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|",
                 "figcaption|figure|font|footer|form|frame|frameset|h[1-6]|",
                 "head|header|hr|html|i|iframe|img|input|ins|kbd|label|legend|",
                 "li|link|main|map|mark|meta|meter|nav|noframes|noscript|",
                 "object|ol|optgroup|option|output|p|param|picture|pre|",
                 "progress|q|rp|rt|ruby|s|samp|script|section|select|small|",
                 "source|span|strike|strong|style|sub|summary|sup|svg|table|",
                 "tbody|td|template|textarea|tfoot|th|thead|time|title|tr|",
                 "track|tt|u|ul|var|video|wbr")

  text <- gsub(paste0("<\\s*(",tags,")[^>]*>"),"", text)

  # don't include closing html tags
  text <- gsub("</.+?>", "", text)

  # don't include greater/less than signs because they trip up koRpus
  text <- gsub("<|>", "", text)

  # don't include percent signs because they trip up stringi
  text <- gsub("%", "", text)

  # don't include figures and tables inserted using plain LaTeX code
  text <- gsub("\\\\begin\\{figure\\}(.*?)\\\\end\\{figure\\}", "", text)
  text <- gsub("\\\\begin\\{table\\}(.*?)\\\\end\\{table\\}", "", text)

  # don't count abbreviations as multiple words, but leave
  # the period at the end in case it's the end of a sentence
  text <- gsub("\\.(?=[a-z]+)", "", text, perl = TRUE)

  # don't include LaTeX \eggs{ham}
  # how to do? problem with capturing \x

  if(nchar(text) == 0){
    stop("You have not selected any text. Please select some text with the mouse and try again")
  }

  return(text)

}

prep_text_korpus <- function(text){
  lengths <- unlist(strsplit(text, " "))
  no_long_one <- paste0(ifelse(nchar(lengths) > 30, substr(lengths, 1, 10), lengths), collapse = " ")
  tokenize_safe <- purrr::safely(koRpus::tokenize)
  k1 <- tokenize_safe(no_long_one, lang = 'en', format = 'obj')
  k1 <- k1$result
  return(k1)
}


# These functions do the actual work

#' @rdname text_stats
#' @export
text_stats_fn_ <- function(text){
  # suppress warnings, and restore the previous setting when the function exits
  # (code placed after return() would never run)
  oldw <- getOption("warn")
  options(warn = -1)
  on.exit(options(warn = oldw), add = TRUE)

  text <- prep_text(text)

  require("koRpus.lang.en", quietly = TRUE)

  # stringi methods
  n_char_tot <- sum(stri_stats_latex(text)[c(1,3)])
  n_words_stri <- unname(stri_stats_latex(text)[4])

  #korpus methods
  k1 <- prep_text_korpus(text)
  korpus_stats <- sylly::describe(k1)
  k_nchr <- korpus_stats$all.chars
  k_wc <- korpus_stats$words
  k_sent <- korpus_stats$sentences
  k_wps <- k_wc / k_sent

  # reading time
  # https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_comprehension
  # assume 200 words per min
  wpm <-  200
  reading_time_korp <- paste0(round(k_wc / wpm, 1), " minutes")
  reading_time_stri <- paste0(round(n_words_stri / wpm, 1), " minutes")

  return(list(
  # make the names more useful
  n_char_tot_stri = n_char_tot,
  n_char_tot_korp = k_nchr,
  n_words_korp = k_wc,
  n_words_stri = n_words_stri,
  n_sentences_korp = k_sent,
  words_per_sentence_korp = k_wps,
  reading_time_korp = reading_time_korp,
  reading_time_stri = reading_time_stri
  ))

}



text_stats_fn <- function(text){

  l <- text_stats_fn_(text)

  results_df <- data.frame(Method = c("Word count", "Character count", "Sentence count", "Reading time"),
                           koRpus  = c(l$n_words_korp, l$n_char_tot_korp, l$n_sentences_korp, l$reading_time_korp),
                           stringi = c(l$n_words_stri, l$n_char_tot_stri, "Not available", l$reading_time_stri)
                           )

  results_df_tab <- knitr::kable(results_df)
  return(results_df_tab)

}


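readability_fn_ <- function(text, quiet = TRUE){

  text <- prep_text(text)

  # suppress warnings, and restore the previous setting when the function exits
  # (code placed after return() would never run)
  oldw <- getOption("warn")
  options(warn = -1)
  on.exit(options(warn = oldw), add = TRUE)

  require("koRpus.lang.en", quietly = TRUE)

  # korpus methods; honour the quiet argument rather than hard-coding TRUE
  k1 <- prep_text_korpus(text)
  k_readability <- koRpus::readability(k1, quiet = quiet)

  return(k_readability)
}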


readability_fn <- function(text, quiet = TRUE){
  # a more condensed overview of the results
  k_readability <- readability_fn_(text, quiet = quiet)
  readability_summary_table <- knitr::kable(summary(k_readability))
  return(readability_summary_table)

}


================================================
FILE: R/utils.R
================================================
# Get the filename of the current file, or
# the file being rendered

this_filename <- function() {
  if (interactive()) {
    filename <- rstudioapi::getSourceEditorContext()$path
  } else {
    filename <- knitr::current_input()
  }
  return(fs::path(filename))
}


================================================
FILE: README.Rmd
================================================
---
output:
  md_document:
    variant: markdown_github
---



<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```


# wordcountaddin <img src="inst/logo.png" align="right" height="130" />

[![Last-changedate](https://img.shields.io/badge/last%20change-`r gsub('-', '--', Sys.Date())`-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master) 
[![minimal R version](https://img.shields.io/badge/R%3E%3D-`r as.character(getRversion())`-brightgreen.svg)](https://cran.r-project.org/)
[![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/) 
[![Travis-CI Build Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin) 
[![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master) [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531) 




This R package is an [RStudio addin](https://rstudio.github.io/rstudioaddins/) to count words and characters in text in an [R markdown](http://rmarkdown.rstudio.com/) document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read. 

You can count words in your Rmd file in three ways:

- In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin   
- All the words in your active Rmd in RStudio, by using the Wordcount Addin  with no text selected
- All the words in an Rmd file, directly using the `word_count` function from the console or command line (RStudio not required), and specifying the filename as an argument to the function (e.g. `wordcountaddin::word_count("my_file.Rmd")`). This will give you a single integer result, rather than the Markdown table that the other functions return. 

Independent of an Rmd file, you can also count words in a character vector from the console using the `text_stats_chr` function (and there is `readability_chr` for readability). 

## Word count

When counting words in the text of your Rmd document, these things will be ignored:

- YAML front matter    
- code chunks and inline code
- text in HTML comment tags: `<!-- text -->` 
- HTML tags in the text: `<br>`,  `</br>`
- inline URLs in this format: `[text of link](url)`
- images with captions in this format: `![this is the caption](/path/to/image.png)`
- header level indicators such as `#` and `##`, etc.

And because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things. 
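To give a feel for this kind of stripping, here is a simplified base R sketch. The patterns are adapted from `prep_text` in `R/hello.R`, but the helper name `strip_markdown`, the use of `perl = TRUE`, and the sample string are assumptions made for illustration; the package itself applies more rules than these four.

```r
# Simplified sketch, not the package's full prep_text():
# remove a few markdown constructs before counting words.
strip_markdown <- function(text) {
  text <- gsub("<!--.+?-->", "", text, perl = TRUE)          # HTML comments
  text <- gsub("`r.+?`", "", text, perl = TRUE)               # inline R code
  text <- gsub("!\\[.+?\\]\\(.+?\\)", "", text, perl = TRUE)  # images with captions
  text <- gsub("#+", "", text, perl = TRUE)                   # heading markers
  text
}

x <- "## Title <!-- note --> The mean is `r mean(1:3)` today ![cap](img.png)"
cleaned <- strip_markdown(x)

# naive whitespace-based count of what is left: Title, The, mean, is, today
n_words <- length(strsplit(trimws(cleaned), "\\s+")[[1]])
n_words
```

Because each pattern is non-greedy and purely textual, ordinary prose containing a backtick followed by `r`, or a literal `#`, would be stripped too, which is the caveat noted above.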

The word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX and raw HTML. 

In general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about the word counting methods.

For this addin I've included two methods, mostly out of curiosity to see how they differ from each other. I use functions from the  [stringi](https://cran.r-project.org/web/packages/stringi/index.html) and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) packages. If you're curious, you can compare the results you get with this addin to an online tool such as <http://wordcounttools.com/>.

The output of the `Word count` function is a markdown table in your R console that might look like this:

```
|Method          |koRpus      |stringi       |
|:---------------|:-----------|:-------------|
|Word count      |107         |104           |
|Character count |604         |603           |
|Sentence count  |10          |Not available |
|Reading time    |0.5 minutes |0.5 minutes   |
```

If you want to reuse these results in other R functions, you can use the `text_stats_fn_` function like this: `wordcountaddin::text_stats_fn_(text)`, where `text` is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.
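The reading time row in the table is derived from the word count, assuming 200 words per minute and rounding to one decimal place (as in `R/hello.R`). A minimal sketch of that arithmetic, where the helper name `reading_time` is hypothetical:

```r
# Reading time = words / words-per-minute, rounded to one decimal place.
# `reading_time` is a hypothetical helper, not a function in the package.
reading_time <- function(n_words, wpm = 200) {
  paste0(round(n_words / wpm, 1), " minutes")
}

reading_time(107)  # koRpus word count from the table above
reading_time(104)  # stringi word count from the table above
```

Both calls give `"0.5 minutes"`, matching the table: small differences in the word count rarely change the rounded reading time.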

## Readability 

The readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of [readability statistics](https://en.wikipedia.org/wiki/Readability_test).

Most of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don't analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) package. 

There are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of readability measurements that are returned by the readability function, run `?koRpus::readability`. That help page also shows the formulae and citations for each statistic (and an additional 20-odd other readability statistics not used here). 

Readability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read [_Style: Toward Clarity and Grace_](http://www.amazon.com/dp/0226899152).


The output of the `readability` function is a markdown table in your R console that might look like this:

```

|index                 |flavour     |raw   |grade |age  |
|:---------------------|:-----------|:-----|:-----|:----|
|ARI                   |            |      |2.31  |     |
|Coleman-Liau          |            |66    |4.91  |     |
|Danielson-Bryan DB1   |            |6.46  |      |     |
|Danielson-Bryan DB2   |            |60.39 |6     |     |
|Dickes-Steiwer        |            |53.07 |      |     |
|ELF                   |            |1.83  |      |     |
|Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
|Flesch                |en (Flesch) |69.57 |8-9   |     |
|Flesch-Kincaid        |            |      |4.85  |9.8  |
|FOG                   |            |      |7.84  |     |
|FORCAST               |            |      |10.28 |15.3 |
|Fucks                 |            |23.38 |4.83  |     |
|Linsear-Write         |            |      |2.35  |     |
|LIX                   |            |32.41 |< 5   |     |
|nWS1                  |            |      |4.19  |     |
|nWS2                  |            |      |4.72  |     |
|nWS3                  |            |      |4.14  |     |
|nWS4                  |            |      |3.64  |     |
|RIX                   |            |1.42  |5     |     |
|SMOG                  |            |      |8.08  |13.1 |
|Strain                |            |2.44  |      |     |
|TRI                   |            |-94   |      |     |
|Tuldava               |            |2.57  |      |     |
|Wheeler-Smith         |            |18.33 |2     |     |
```

Similar to the word count function, if you want to reuse these results in other R functions, you can use an unexported function like this: `wordcountaddin:::readability_fn_(text)`, where `text` is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object with slightly more detail than the summary table above. 

Inspiration for this addin came from [jadd](https://github.com/jennybc/jadd) and [WrapRmd](https://github.com/tjmahr/WrapRmd). 

## How to install

Install with `devtools::install_github("benmarwick/wordcountaddin",  type = "source", dependencies = TRUE)`

Go to `Tools > Addins` in RStudio to select and configure addins. 

## How to use

1. Open an Rmd file in RStudio.  
2. Select some text; it can include YAML, code chunks and inline code   
3. Go to `Tools > Addins` in RStudio and click on `Word count` or `Readability`. Computing `Readability` may take a few moments on longer documents because it has to count syllables for some of the stats.
4. Look in the console for the output   


## Feedback, contributing, etc.

Please [open an issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you find something that doesn't work as expected. Note that this project is released with a [Guide to Contributing](CONTRIBUTING.md) and a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.


================================================
FILE: README.md
================================================
<!-- README.md is generated from README.Rmd. Please edit that file -->
wordcountaddin <img src="inst/logo.png" align="right" height="130" />
=====================================================================

[![Last-changedate](https://img.shields.io/badge/last%20change-2019--01--09-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master)
[![minimal R
version](https://img.shields.io/badge/R%3E%3D-3.5.2-brightgreen.svg)](https://cran.r-project.org/)
[![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/)
[![Travis-CI Build
Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin)
[![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master)
[![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531)

This R package is an [RStudio
addin](https://rstudio.github.io/rstudioaddins/) to count words and
characters in text in an [R markdown](http://rmarkdown.rstudio.com/)
document. It also has a function to compute readability statistics so
you can get an indication of how easy or difficult your document is to
read.

You can count words in your Rmd file in three ways:

-   In a selection of text in your active Rmd, by selecting some text
    with your mouse in RStudio and using the Wordcount Addin  
-   All the words in your active Rmd in RStudio, by using the Wordcount
    Addin with no text selected
-   All the words in an Rmd file, directly using the `word_count`
    function from the console or command line (RStudio not required),
    and specifying the filename as an argument to the function (e.g.
    `wordcountaddin::word_count("my_file.Rmd")`). This will give you a
    single integer result, rather than the Markdown table that the other
    functions return.

Independent of an Rmd file, you can also count words in a character
vector from the console using the `text_stats_chr` function (and there
is `readability_chr` for readability).

Word count
----------

When counting words in the text of your Rmd document, these things will
be ignored:

-   YAML front matter  
-   code chunks and inline code
-   text in HTML comment tags: `<!-- text -->`
-   HTML tags in the text: `<br>`, `</br>`
-   inline URLs in this format: `[text of link](url)`
-   images with captions in this format:
    `![this is the caption](/path/to/image.png)`
-   header level indicators such as `#` and `##`, etc.

And because my regex is quite simple, the word count function may also
ignore parts of your actual text that resemble these things.
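
The exclusions listed above can be illustrated with a few base R regular expressions. This is only a minimal sketch under stated assumptions: `strip_rmd` is a hypothetical helper written for this example, not the package's own text-preparation code, and keeping the link text while dropping the URL is an assumption.

```r
# Hypothetical helper for illustration; not the package's own implementation.
strip_rmd <- function(text) {
  fence <- strrep("`", 3)  # three backticks, built here to keep the example readable
  # YAML front matter at the very start of the document
  text <- sub("(?s)^\\s*---\\n.*?\\n---\\n", "", text, perl = TRUE)
  # knitr code chunks: fenced blocks whose opening fence is followed by "{"
  text <- gsub(paste0("(?s)", fence, "\\{.*?", fence), "", text, perl = TRUE)
  # inline code
  text <- gsub("`[^`]*`", "", text, perl = TRUE)
  # HTML comments, then any remaining HTML tags such as <br>
  text <- gsub("(?s)<!--.*?-->", "", text, perl = TRUE)
  text <- gsub("</?[A-Za-z][^>]*>", "", text, perl = TRUE)
  # images (dropped entirely), then links (assumption: keep the link text)
  text <- gsub("!\\[[^\\]]*\\]\\([^)]*\\)", "", text, perl = TRUE)
  text <- gsub("\\[([^\\]]*)\\]\\([^)]*\\)", "\\1", text, perl = TRUE)
  # header level indicators at the start of a line
  text <- gsub("(?m)^#+\\s*", "", text, perl = TRUE)
  # collapse runs of whitespace into single spaces
  trimws(gsub("\\s+", " ", text, perl = TRUE))
}

fence <- strrep("`", 3)
rmd <- paste(
  "---",
  "title: 'Untitled'",
  "---",
  "",
  "## Heading",
  "",
  "Some <br> text with `r 2 + 2` inline code.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence,
  "",
  "<!-- a comment -->",
  "See [the docs](http://example.com) here.",
  sep = "\n"
)

cleaned <- strip_rmd(rmd)
print(cleaned)
n_words <- length(strsplit(cleaned, " ", fixed = TRUE)[[1]])
```

Because each pattern is deliberately simple, ordinary prose that happens to look like one of these constructs would be removed as well, which is the caveat described above.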

The word count will include text in headers, block quotations, verbatim
code blocks, tables, raw LaTeX and raw HTML.

In general, there are numerous ways to count words, with no widely
accepted standard method. The variety of methods is due to differences
in the definitions of a word and a sentence. Run
`?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about
the word counting methods.
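
To see why the definition matters, the sketch below counts the same sentence under two simple definitions of a word, using only base R. Neither definition is the one used by stringi or koRpus; both are assumptions chosen for illustration.

```r
text <- "It's a well-known fact: counts differ."

# Definition 1: a word is any run of non-whitespace characters.
words_whitespace <- strsplit(trimws(text), "[[:space:]]+")[[1]]

# Definition 2: a word is any run of letters, so apostrophes and
# hyphens split a token into several words.
words_letters <- regmatches(text, gregexpr("[[:alpha:]]+", text))[[1]]

n_whitespace <- length(words_whitespace)
n_letters <- length(words_letters)

cat("Whitespace-delimited words:", n_whitespace, "\n")
cat("Letter-run words:", n_letters, "\n")
```

The first definition treats `It's` and `well-known` as one word each, while the second splits them, so the two counts disagree on the same text.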

For this addin I’ve included two methods, mostly out of curiosity to see
how they differ from each other. I use functions from the
[stringi](https://cran.r-project.org/web/packages/stringi/index.html)
and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)
packages. If you’re curious, you can compare the results you get with
this addin to an online tool such as
<http://wordcounttools.com/>.

The output of the `Word count` function is a markdown table in your R
console that might look like this:

    |Method          |koRpus      |stringi       |
    |:---------------|:-----------|:-------------|
    |Word count      |107         |104           |
    |Character count |604         |603           |
    |Sentence count  |10          |Not available |
    |Reading time    |0.5 minutes |0.5 minutes   |
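
The reading times in this sample are consistent with dividing the word count by a reading speed of 200 words per minute and rounding to one decimal place. That rate is an assumption inferred from the sample values, and `reading_time` is a hypothetical helper rather than a function from the package:

```r
# Assumed reading speed; inferred from the sample table, not confirmed.
words_per_minute <- 200

# Hypothetical helper: format an estimated reading time in minutes.
reading_time <- function(n_words, wpm = words_per_minute) {
  minutes <- round(n_words / wpm, 1)
  paste(minutes, "minutes")
}

print(reading_time(107))
print(reading_time(104))
```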

If you want to reuse these results in other R functions, you can use an
unexported function like this `wordcountaddin:::text_stats_fn_(text)`,
where `text` is a character vector of your text (with length one, i.e.
all your text in a single character string). The output will be a list
object, and will include several other items not shown in the markdown
table.
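
If you only have the printed table, one alternative is to parse it back into values. The sketch below does this in base R for the sample table shown earlier; `split_row` is a hypothetical helper, and the approach assumes every row is a simple pipe-delimited line with no escaped pipes.

```r
# Table lines copied from the sample output
table_lines <- c(
  "|Method          |koRpus      |stringi       |",
  "|:---------------|:-----------|:-------------|",
  "|Word count      |107         |104           |",
  "|Character count |604         |603           |",
  "|Sentence count  |10          |Not available |",
  "|Reading time    |0.5 minutes |0.5 minutes   |"
)

# Hypothetical helper: split one pipe-delimited row into trimmed cells.
split_row <- function(line) {
  cells <- strsplit(line, "|", fixed = TRUE)[[1]]
  trimws(cells[-1])  # drop the empty string before the leading pipe
}

header <- split_row(table_lines[1])
# Skip the header row and the alignment row
rows <- lapply(table_lines[-(1:2)], split_row)
cells <- do.call(rbind, rows)  # character matrix, one row per statistic

# Use the first column as row names and the method names as column names
stats <- cells[, -1, drop = FALSE]
dimnames(stats) <- list(cells[, 1], header[-1])

n_words_korpus <- as.integer(stats["Word count", "koRpus"])
n_words_stringi <- as.integer(stats["Word count", "stringi"])
print(c(koRpus = n_words_korpus, stringi = n_words_stringi))
```

Working with the list returned by `text_stats_fn_` avoids this parsing step; the package's tests read fields such as `n_words_korp` and `n_words_stri` from it.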

Readability
-----------

The readability function ignores all the same parts of the text as the
word count function, and then computes the values of a bunch of
[readability
statistics](https://en.wikipedia.org/wiki/Readability_test).

Most of these readability measurements aim to approximate the years of
education required to understand your text. They look at the number of
characters and syllables per word, the number of words per sentence, and
so on. They don’t analyse the meaning of the words. A score of around
10-12 is roughly the reading level on completion of high school in the
US. These stats are computed by the
[koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)
package.
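
As an illustration of how such a measure is built from simple counts, here is a minimal base R sketch of the Automated Readability Index. The tokenising rules are crude assumptions made for this example, and `ari` is a hypothetical helper, so its values will not necessarily match what koRpus reports for the same text.

```r
# ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
ari <- function(text) {
  # Assumption: a word is a run of letters, digits or apostrophes
  words <- regmatches(text, gregexpr("[[:alnum:]']+", text))[[1]]
  # Assumption: a sentence ends at each run of ., ! or ?
  sentence_ends <- regmatches(text, gregexpr("[.!?]+", text))[[1]]
  n_words <- length(words)
  n_sentences <- length(sentence_ends)
  # Count letters and digits only, ignoring apostrophes
  n_chars <- sum(nchar(gsub("[^[:alnum:]]", "", words)))
  4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43
}

simple <- ari("The cat sat. The dog ran.")
dense <- ari("Comprehensive documentation facilitates reproducible computational research.")
print(round(c(simple = simple, dense = dense), 2))
```

Short words in short sentences push the score down (very simple text can even score below zero), while long words and long sentences push it up.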

There are about 27 measurements that this readability function returns
(depending on how long your text is), including the Automated
Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level,
and the Simple Measure of Gobbledygook (SMOG). For the full list of
readability measurements that are returned by the readability function,
run `?koRpus::readability`. That help page also shows the formulae and
citations for each statistic (and an additional 20-odd other readability
statistics not used here).

Readability stats are, of course, no substitute for critical
self-reflection on the effectiveness of your writing at communicating
ideas and information. To help with that, read [*Style: Toward Clarity
and Grace*](http://www.amazon.com/dp/0226899152).

The output of the `readability` function is a markdown table in your R
console that might look like this:


    |index                 |flavour     |raw   |grade |age  |
    |:---------------------|:-----------|:-----|:-----|:----|
    |ARI                   |            |      |2.31  |     |
    |Coleman-Liau          |            |66    |4.91  |     |
    |Danielson-Bryan DB1   |            |6.46  |      |     |
    |Danielson-Bryan DB2   |            |60.39 |6     |     |
    |Dickes-Steiwer        |            |53.07 |      |     |
    |ELF                   |            |1.83  |      |     |
    |Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
    |Flesch                |en (Flesch) |69.57 |8-9   |     |
    |Flesch-Kincaid        |            |      |4.85  |9.8  |
    |FOG                   |            |      |7.84  |     |
    |FORCAST               |            |      |10.28 |15.3 |
    |Fucks                 |            |23.38 |4.83  |     |
    |Linsear-Write         |            |      |2.35  |     |
    |LIX                   |            |32.41 |< 5   |     |
    |nWS1                  |            |      |4.19  |     |
    |nWS2                  |            |      |4.72  |     |
    |nWS3                  |            |      |4.14  |     |
    |nWS4                  |            |      |3.64  |     |
    |RIX                   |            |1.42  |5     |     |
    |SMOG                  |            |      |8.08  |13.1 |
    |Strain                |            |2.44  |      |     |
    |TRI                   |            |-94   |      |     |
    |Tuldava               |            |2.57  |      |     |
    |Wheeler-Smith         |            |18.33 |2     |     |

Similar to the `word count` function, if you want to reuse these results
in other R functions, you can use an unexported function like this
`wordcountaddin:::readability_fn_(text)`, where `text` is a character
vector of your text (with length one, i.e. all your text in a single
character string). The output will be a list object with slightly more
detail than the summary table above.

Inspiration for this addin came from
[jadd](https://github.com/jennybc/jadd) and
[WrapRmd](https://github.com/tjmahr/WrapRmd).

How to install
--------------

Install with
`devtools::install_github("benmarwick/wordcountaddin",  type = "source", dependencies = TRUE)`

Go to `Tools > Addins` in RStudio to select and configure addins.

How to use
----------

1.  Open an Rmd file in RStudio.  
2.  Select some text; it can include YAML, code chunks and inline code  
3.  Go to `Tools > Addins` in RStudio and click on `Word count` or
    `Readability`. Computing `Readability` may take a few moments on
    longer documents because it has to count syllables for some of the
    stats.
4.  Look in the console for the output

Feedback, contributing, etc.
----------------------------

Please [open an
issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you
find something that doesn’t work as expected. Note that this project is
released with a [Guide to Contributing](CONTRIBUTING.md) and a
[Contributor Code of Conduct](CONDUCT.md). By participating in this
project you agree to abide by its terms.


================================================
FILE: codecov.yml
================================================
comment: false


================================================
FILE: inst/rstudio/addins.dcf
================================================
Name: Word count
Description: Counts words and characters (excluding code chunks, inline code, etc.)
Binding: text_stats
Interactive: true

Name: Readability
Description: Computes readability statistics (excluding code chunks, inline code, etc.)
Binding: readability
Interactive: true


================================================
FILE: man/text_stats.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hello.R
\name{text_stats}
\alias{text_stats}
\alias{word_count}
\alias{readability}
\alias{text_stats_chr}
\alias{readability_chr}
\alias{text_stats_fn_}
\title{Get text stats for selected text (excluding code chunks and inline code)}
\usage{
text_stats(filename = this_filename())

word_count(filename = this_filename())

readability(filename = this_filename(), quiet = TRUE)

text_stats_chr(text)

readability_chr(text, quiet = TRUE)

text_stats_fn_(text)
}
\arguments{
\item{filename}{Path to the file on which to compute text stats.
Default is the current file (when working in RStudio) or the file being
knit (when compiling with \code{knitr}).}

\item{quiet}{Logical. Should task be performed quietly?}

\item{text}{a character string of text, length of one}
}
\description{
Call this addin to get a word count and some other stats about the text

Get a word count as a single integer

Get readability stats for selected text (excluding code chunks)

Get text stats for selected text (excluding code chunks and inline code)

Get readability stats for selected text (excluding code chunks)
}
\details{
Call this addin to get readability stats about the text

Use this function with a character string as input

Use this function with a character string as input
}
\examples{
md <- system.file(package = "wordcountaddin", "NEWS.md")
text_stats(md)
word_count(md)
\dontrun{
readability(md)
}
}


================================================
FILE: man/wordcountaddin.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hello.R
\docType{package}
\name{wordcountaddin}
\alias{wordcountaddin}
\title{wordcountaddin}
\description{
This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.
}


================================================
FILE: tests/testthat/test_wordcountaddin.R
================================================
library(wordcountaddin)

context("Word count")

test_that("Word count is correct for short simple sentence", {
  # short sentence
  eleven_words <- "here are exactly eleven words of fairly boring and unpunctuated text"

  short_stats <-  text_stats_fn_(eleven_words)
  # qdap cannot manage without final punct.
  n_words_stri_11 <- short_stats$n_words_stri
  n_words_korp_11 <- short_stats$n_words_korp

  n_char_tot_stri_11 <-  short_stats$n_char_tot_stri
  n_char_tot_korp_11 <- short_stats$n_char_tot_korp

  expect_equal(n_words_stri_11, 11)
  expect_equal(n_words_korp_11, 11)
  expect_equal(n_char_tot_stri_11, 68)
  expect_equal(n_char_tot_korp_11, 69)
})

test_that("Word count is correct for moderately complex sentences", {
  # Moderate: Harvard sentences, https://en.wikipedia.org/wiki/Harvard_sentences
  moderately_complex <- "The birch canoe slid on the smooth planks. Glue the sheet to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of steady work faced us. Large size in stockings is hard to sell."

  moderately_complex_stats <- text_stats_fn_(moderately_complex)

  n_char_tot_stri_mc <-  moderately_complex_stats$n_char_tot_stri
  n_char_tot_korp_mc <- moderately_complex_stats$n_char_tot_korp

  n_words_stri_mc <- moderately_complex_stats$n_words_stri
  n_words_korp_mc <- moderately_complex_stats$n_words_korp

  n_sentences_korp_mc <- moderately_complex_stats$n_sentences_korp

  expect_equal(n_char_tot_stri_mc, 406)
  expect_equal(n_char_tot_korp_mc, 407)
  expect_equal(n_words_stri_mc, 80)  # MS Word says 79
  expect_equal(n_words_korp_mc, 80)
  expect_equal(n_sentences_korp_mc, 10)
})



test_that("Word count is correct for complex sentences in filler text", {
  # Filler text with various punctuation
  filler <- "Lorem ipsum dolor sit amet, ea debet error sensibus vix, at esse decore vivendo vim, rebum aliquip an cum? His ea agam novum dissentiet! At mel audire liberavisse, mundi audiam quaeque sea ne. In eam error habemus delectus, audiam ocurreret ne sit, sit ei salutandi liberavisse! Ut vix case corpora.

Posse malorum ponderum in qui, et eum dicam disputando, an vix quaestio scripserit. Falli veniam tamquam id mei. Modo sumo appetere cu mea, mutat possim rationibus ius id. Sed nominati antiopam cu, cu prima mandamus vim. Eos cu exerci consul!

Nam case atomorum suavitate cu? No quo inermis necessitatibus, eos ne essent scripta vivendum, ea euismod quaestio qui? Per minim tation accusamus eu, audire dolores nam an. Vel vocent inimicus ut, eu porro libris argumentum quo.

Vim no solet tempor, aperiam habemus assueverit ea usu: sea ut quodsi gloriatur! Eum te laudem aliquid inciderint, mollis prodesset mea ad. Dico definiebas efficiendi id usu. No bonorum suavitate adolescens per, ius oratio pericula ut, at mel porro vocibus scriptorem. Sea incorrupte definitiones necessitatibus in, cu ancillae conclusionemque duo. Ex vix dolore propriae principes, ius in augue ludus?

Solet copiosae ea sed, at assum  - dolore delenit has, ex aperiam honestatis mei. No legere nemore nonumes mel. Eu ullum accusata nam, an sea wisi rebum. Ei homero equidem sea! Sed erat augue eripuit et, ea vim altera eirmod labores, ad noster veritus nec.

Ut porro sententiae vis, debet affert eligendi id eam! In, nominati, pertinacia has, sea admodum dissentiunt eu! Volumus appellantur ex eos. Ei duo movet scripta aliquid, ea blandit explicari consectetuer eos.

Ne cibo ornatus vituperata pri. Soleat populo fierent ne sed, vel congue consequat temporibus in. Pro eu nostro inermis sadipscing, ne pri possim lobortis! Sea sonet nihil accusata no. Mei virtute noluisse pericula ex, aliquid mandamus inimicus quo ex.

Esse patrioque at qui, cum sanctus; consequuntur conclusionemque cu? Ut summo oportere  appellantur mel, ex per tale semper appellantur. Usu ea alia insolens sadipscing, eu aeterno persius vix. Agam prodesset interpretaris at ius, ne est malis signiferumque, illum soluta albucius mei an. Ex error tollit recusabo est, ut prompta consectetuer per. Dicam numquam eum id, brute mollis nam cu!

Ei vis discere interesset! Mutat 'option' qualisque ius te, sea deserunt lobortis voluptatum at. Qui et impedit accumsan atomorum, nam dicat possit ornatus an? Eu mei aperiri discere, sea veri homero ad, stet dolore putant mei in. Eu pri debet populo luptatum, eos te nominati concludaturque.

Tota veritus similique ne per, eam fastidii voluptatum eu. Sea tale mandamus suscipiantur ex. Ullum ullamcorper consequuntur et cum, aeque fuisset ut sea! Mea graecis pertinax explicari ne, pri tale hinc no? Eu vidisse nominati eum, et eam hendrerit voluptatum assueverit, qui ne munere recusabo democritum."

  filler_stats <- text_stats_fn_(filler)

  n_char_tot_stri_f <-  filler_stats$n_char_tot_stri
  n_char_tot_korp_f <- filler_stats$n_char_tot_korp

  n_words_stri_f <- filler_stats$n_words_stri
  n_words_korp_f <- filler_stats$n_words_korp

  n_sentences_korp_f <- filler_stats$n_sentences_korp

  expect_equal(n_char_tot_stri_f, 2896)
  expect_equal(n_char_tot_korp_f, 2897)
  expect_equal(n_words_stri_f, 450)
  expect_equal(n_words_korp_f, 450) # MS Word says 442
  expect_equal(n_sentences_korp_f, 52)
})



test_that("Word count is correct for rmd text", {
  # text with code chunks, etc.
  rmd_text <- "

---
title: 'Untitled'
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

<!-- this is an HTML comment -->

## Heading

This is an [R markdown](http://rmarkdown.rstudio.com/) document.

```{r cars}
summary(cars)
# Lines line this have caused problems -----------------------------------------
```

`r 2+2`

`r nrow(cars)`

##  Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

![this is the caption](/path/to/image.png)

"

  rmd_stats <- text_stats_fn_(rmd_text)

  n_char_tot_stri_r <-  rmd_stats$n_char_tot_stri
  n_char_tot_korp_r <- rmd_stats$n_char_tot_korp

  n_words_stri_r <- rmd_stats$n_words_stri
  n_words_korp_r <- rmd_stats$n_words_korp

  n_sentences_korp_r <- rmd_stats$n_sentences_korp

  expect_equal(n_char_tot_stri_r, 159)
  expect_equal(n_char_tot_korp_r, 159)
  expect_equal(n_words_stri_r, 20)
  expect_equal(n_words_korp_r, 20)
  expect_equal(n_sentences_korp_r, 4)
})

test_that("we can ignore <br> and </br>", {
  #  test for <br>
  string_with_br <- "Hi, I have <br> in the </br> string"

  string_with_br_stats <- text_stats_fn_(string_with_br)

  n_char_tot_stri_r <-  string_with_br_stats$n_char_tot_stri
  n_char_tot_korp_r <- string_with_br_stats$n_char_tot_korp

  n_words_stri_r <- string_with_br_stats$n_words_stri
  n_words_korp_r <- string_with_br_stats$n_words_korp

  n_sentences_korp_r <- string_with_br_stats$n_sentences_korp

  expect_equal(n_char_tot_stri_r, 26)
  expect_equal(n_char_tot_korp_r, 27)
  expect_equal(n_words_stri_r, 6)
  expect_equal(n_words_korp_r, 6)
  expect_equal(n_sentences_korp_r, 0)
})

test_that("we can ignore HTML tags but keep greater/less", {
  string_gr_ls <- "Hi, <br> I am <20 but >10 years old"

  expect_equal(prep_text(string_gr_ls),
               "Hi,  I am 20 but 10 years old")
})

test_that("Word count is correct for rmd file", {
  # test that we can word count on a file
  the_rmd_file_stats <- text_stats(filename = test_path("test_wordcountaddin.Rmd"))

  # Values updated to reflect correct Quarto block handling
  expect_equal(the_rmd_file_stats[3],
               "|Word count      |117         |118           |")
  expect_equal(the_rmd_file_stats[4],
               "|Character count |733         |732           |")
  expect_equal(the_rmd_file_stats[5],
               "|Sentence count  |27          |Not available |")
  expect_equal(the_rmd_file_stats[6],
               "|Reading time    |0.6 minutes |0.6 minutes   |")
})


test_that("Word count is correct for cmd line", {
  # command line fns
  text_on_the_command_line <- "here is some text"
  text_stats_chr_out <- text_stats_chr(text_on_the_command_line)

  expect_equal(text_stats_chr_out[3],
               "|Word count      |4         |4             |")
  expect_equal(text_stats_chr_out[4],
               "|Character count |18        |17            |")
  expect_equal(text_stats_chr_out[5],
               "|Sentence count  |0         |Not available |")
  expect_equal(text_stats_chr_out[6],
               "|Reading time    |0 minutes |0 minutes     |")
})


test_that("readability is correct for cmd line", {
  text_on_the_command_line <- "here is some text"
  expect_output(
    expect_warning(
      readability_chr_out <- readability_chr(text_on_the_command_line)
    )
  )
  expect_length(readability_chr_out, 27)
})

test_that("Word count is correct for text with % sign", {
  # test for escaping the percent sign in plain text
  text_with_percent_sign <- "Here is some % text with percent % signs in it."

  text_stats_percent_chr_out <- text_stats_chr(text_with_percent_sign)
  expect_equal(text_stats_percent_chr_out[3],
               "|Word count      |9         |9             |")
})


test_that("Word count is correct for text with figures included using LaTeX code", {
  # test that LaTeX figure environments are excluded from the word count
  text_with_figures <- "One \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Two \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Three"

  text_stats_percent_chr_out <- text_stats_chr(text_with_figures)
  expect_equal(text_stats_percent_chr_out[3],
               "|Word count      |3         |3             |")
})


test_that("Word count is a single integer for a Rmd file when using word_count", {
  # test that we can word count on a file
  the_rmd_word_count <- word_count(filename = test_path("test_wordcountaddin.Rmd"))

  expect_equal(the_rmd_word_count,
               117L)
})

test_that("We can handle very long strings, like citation keys", {

  expect_output(
    expect_warning(
  # test that readability stats are still returned when the text has a very long token
 long_string_read <- readability_chr("it's a long string right at the end here because a tiny
                                     fraction of the refreneces have crazy long keys. Why do they do
                                     that? It's autogenerated. Why does this give so many warnings when
                                     testing. It's a puzzle. [@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa].",
                                     quiet = TRUE)))

 expect_equal( attr(long_string_read, 'format'), "pipe")

})

test_that("don't count abbreviations as multiple words", {


  # test that dotted abbreviations are each counted as a single word
  words_with_abbv <- "zero .o.n.e .t.wo."
  abbrev_count <- text_stats_chr(words_with_abbv)

 expect_equal( abbrev_count[3], "|Word count      |3         |3             |")

})

test_that("text_to_count reads file contents as character vector", {
  contents <- text_to_count(test_path("test_wordcountaddin.Rmd"))

  expect_type(contents, "character")
  expect_length(contents, 1)
})

test_that("text_to_count raises an error for invalid file types", {
  expect_error(text_to_count("invalid.tif"), regexp = "works with markdown")
})

test_that("Quarto ::: blocks do not break word count", {
  quarto_text <- "
Here is some text which adds up to 10 words.

::: {#fig-1}
plot(iris$Sepal.Length)
:::

Here is more text which adds up to 10 words.

Here is more text which adds up to 10 words.
"
  stats <- text_stats_fn_(quarto_text)
  # Total words should be around 30.
  # Before fix, it will be around 10.
  expect_gt(stats$n_words_stri, 25)
  expect_gt(stats$n_words_korp, 25)
})


================================================
FILE: tests/testthat/test_wordcountaddin.Rmd
================================================
---
title: "rmd_test_file.rmd"
output:
  word_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)

# Lines line this have caused problems -----------------------------------------
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

```{r}
# context <- rstudioapi::getActiveDocumentContext()
```

This Markdown file contains `r wordcountaddin::word_count()` words:

```{r, message=FALSE, echo=FALSE, error=TRUE}
wordcountaddin::text_stats()
```


::: {.cell layout-align="center"}

:::

::: {.cell layout-align="center"}
::: {.cell-output-display}
:::
:::


================================================
FILE: tests/testthat.R
================================================
library(testthat)
library(wordcountaddin)

test_check("wordcountaddin")


================================================
FILE: wordcountaddin.Rproj
================================================
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: knitr
LaTeX: XeLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
