Repository: CornellLabofOrnithology/auk
Branch: main
Commit: e2fc3cda779c
Files: 278
Total size: 5.1 MB

Directory structure:
auk/
├── .Rbuildignore
├── .github/
│   ├── .gitignore
│   └── workflows/
│       └── R-CMD-check.yaml
├── .gitignore
├── CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── auk-bbox.R
│   ├── auk-bcr.R
│   ├── auk-breeding.R
│   ├── auk-clean.R
│   ├── auk-complete.R
│   ├── auk-country.R
│   ├── auk-county.R
│   ├── auk-date.R
│   ├── auk-distance.R
│   ├── auk-duration.R
│   ├── auk-ebd-version.R
│   ├── auk-ebd.R
│   ├── auk-exotic.R
│   ├── auk-filter.R
│   ├── auk-get-awk-path.R
│   ├── auk-get-ebd-path.R
│   ├── auk-last-edited.R
│   ├── auk-observer.R
│   ├── auk-package.R
│   ├── auk-project.R
│   ├── auk-protocol.R
│   ├── auk-rollup.R
│   ├── auk-sampling.R
│   ├── auk-select.R
│   ├── auk-set-awk-path.R
│   ├── auk-set-ebd-path.R
│   ├── auk-species.R
│   ├── auk-split.R
│   ├── auk-state.R
│   ├── auk-time.R
│   ├── auk-unique.R
│   ├── auk-version.R
│   ├── auk-year.R
│   ├── auk-zerofill.R
│   ├── data.R
│   ├── ebird-species.R
│   ├── filter-repeat-visits.R
│   ├── format-unmarked-occu.R
│   ├── get-ebird-taxonomy.R
│   ├── process_barcharts.R
│   ├── read.R
│   ├── utils.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── auk.Rproj
├── cran-comments.md
├── data/
│   ├── bcr_codes.rda
│   ├── ebird_states.rda
│   ├── ebird_taxonomy.rda
│   └── valid_protocols.rda
├── data-raw/
│   ├── BCRCodes.txt
│   ├── barchart.R
│   ├── bcr-codes.r
│   ├── ebd-samples.r
│   ├── ebird-state.r
│   ├── ebird-taxonomy.csv
│   ├── ebird-taxonomy.r
│   └── valid-protocols.r
├── docs/
│   ├── 404.html
│   ├── 404.md
│   ├── CONDUCT.html
│   ├── CONDUCT.md
│   ├── CONTRIBUTING.html
│   ├── CONTRIBUTING.md
│   ├── LICENSE.html
│   ├── LICENSE.md
│   ├── articles/
│   │   ├── auk.html
│   │   ├── auk.md
│   │   ├── development.html
│   │   ├── development.md
│   │   ├── index.html
│   │   └── index.md
│   ├── authors.html
│   ├── authors.md
│   ├── deps/
│   │   ├── data-deps.txt
│   │   ├── font-awesome-6.5.2/
│   │   │   └── css/
│   │   │       ├── all.css
│   │   │       └── v4-shims.css
│   │   └── jquery-3.6.0/
│   │       └── jquery-3.6.0.js
│   ├── index.html
│   ├── index.md
│   ├── katex-auto.js
│   ├── lightswitch.js
│   ├── llms.txt
│   ├── news/
│   │   ├── index.html
│   │   └── index.md
│   ├── pkgdown.js
│   ├── pkgdown.yml
│   ├── reference/
│   │   ├── auk-package.html
│   │   ├── auk-package.md
│   │   ├── auk.html
│   │   ├── auk_bbox.html
│   │   ├── auk_bbox.md
│   │   ├── auk_bcr.html
│   │   ├── auk_bcr.md
│   │   ├── auk_breeding.html
│   │   ├── auk_breeding.md
│   │   ├── auk_clean.html
│   │   ├── auk_clean.md
│   │   ├── auk_complete.html
│   │   ├── auk_complete.md
│   │   ├── auk_country.html
│   │   ├── auk_country.md
│   │   ├── auk_county.html
│   │   ├── auk_county.md
│   │   ├── auk_date.html
│   │   ├── auk_date.md
│   │   ├── auk_distance.html
│   │   ├── auk_distance.md
│   │   ├── auk_duration.html
│   │   ├── auk_duration.md
│   │   ├── auk_ebd.html
│   │   ├── auk_ebd.md
│   │   ├── auk_ebd_version.html
│   │   ├── auk_ebd_version.md
│   │   ├── auk_exotic.html
│   │   ├── auk_exotic.md
│   │   ├── auk_extent.html
│   │   ├── auk_extent.md
│   │   ├── auk_filter.auk_ebd.html
│   │   ├── auk_filter.auk_sampling.html
│   │   ├── auk_filter.html
│   │   ├── auk_filter.md
│   │   ├── auk_get_awk_path.html
│   │   ├── auk_get_awk_path.md
│   │   ├── auk_get_ebd_path.html
│   │   ├── auk_get_ebd_path.md
│   │   ├── auk_last_edited.html
│   │   ├── auk_last_edited.md
│   │   ├── auk_observer.html
│   │   ├── auk_observer.md
│   │   ├── auk_project.html
│   │   ├── auk_project.md
│   │   ├── auk_protocol.html
│   │   ├── auk_protocol.md
│   │   ├── auk_rollup.html
│   │   ├── auk_rollup.md
│   │   ├── auk_sampling.html
│   │   ├── auk_sampling.md
│   │   ├── auk_select.html
│   │   ├── auk_select.md
│   │   ├── auk_set_awk_path.html
│   │   ├── auk_set_awk_path.md
│   │   ├── auk_set_ebd_path.html
│   │   ├── auk_set_ebd_path.md
│   │   ├── auk_species.html
│   │   ├── auk_species.md
│   │   ├── auk_split.html
│   │   ├── auk_split.md
│   │   ├── auk_state.html
│   │   ├── auk_state.md
│   │   ├── auk_time.html
│   │   ├── auk_time.md
│   │   ├── auk_unique.html
│   │   ├── auk_unique.md
│   │   ├── auk_version.html
│   │   ├── auk_version.md
│   │   ├── auk_year.html
│   │   ├── auk_year.md
│   │   ├── auk_zerofill.auk_ebd.html
│   │   ├── auk_zerofill.character.html
│   │   ├── auk_zerofill.data.frame.html
│   │   ├── auk_zerofill.html
│   │   ├── auk_zerofill.md
│   │   ├── bcr_codes.html
│   │   ├── bcr_codes.md
│   │   ├── collapse_zerofill.html
│   │   ├── ebird_species.html
│   │   ├── ebird_species.md
│   │   ├── ebird_states.html
│   │   ├── ebird_states.md
│   │   ├── ebird_taxonomy.html
│   │   ├── ebird_taxonomy.md
│   │   ├── filter_repeat_visits.html
│   │   ├── filter_repeat_visits.md
│   │   ├── format_unmarked_occu.html
│   │   ├── format_unmarked_occu.md
│   │   ├── get_ebird_taxonomy.html
│   │   ├── get_ebird_taxonomy.md
│   │   ├── index.html
│   │   ├── index.md
│   │   ├── process_barcharts.html
│   │   ├── process_barcharts.md
│   │   ├── read_ebd.auk_ebd.html
│   │   ├── read_ebd.character.html
│   │   ├── read_ebd.html
│   │   ├── read_ebd.md
│   │   ├── read_sampling.auk_ebd.html
│   │   ├── read_sampling.auk_sampling.html
│   │   ├── read_sampling.character.html
│   │   ├── read_sampling.html
│   │   ├── valid_protocols.html
│   │   └── valid_protocols.md
│   ├── search.json
│   ├── site.webmanifest
│   └── sitemap.xml
├── inst/
│   └── extdata/
│       ├── barchart-sample.txt
│       ├── ebd-rollup-ex.txt
│       ├── ebd-sample.txt
│       ├── zerofill-ex_ebd.txt
│       └── zerofill-ex_sampling.txt
├── makefile.R
├── man/
│   ├── auk-package.Rd
│   ├── auk_bbox.Rd
│   ├── auk_bcr.Rd
│   ├── auk_breeding.Rd
│   ├── auk_clean.Rd
│   ├── auk_complete.Rd
│   ├── auk_country.Rd
│   ├── auk_county.Rd
│   ├── auk_date.Rd
│   ├── auk_distance.Rd
│   ├── auk_duration.Rd
│   ├── auk_ebd.Rd
│   ├── auk_ebd_version.Rd
│   ├── auk_exotic.Rd
│   ├── auk_extent.Rd
│   ├── auk_filter.Rd
│   ├── auk_get_awk_path.Rd
│   ├── auk_get_ebd_path.Rd
│   ├── auk_last_edited.Rd
│   ├── auk_observer.Rd
│   ├── auk_project.Rd
│   ├── auk_protocol.Rd
│   ├── auk_rollup.Rd
│   ├── auk_sampling.Rd
│   ├── auk_select.Rd
│   ├── auk_set_awk_path.Rd
│   ├── auk_set_ebd_path.Rd
│   ├── auk_species.Rd
│   ├── auk_split.Rd
│   ├── auk_state.Rd
│   ├── auk_time.Rd
│   ├── auk_unique.Rd
│   ├── auk_version.Rd
│   ├── auk_year.Rd
│   ├── auk_zerofill.Rd
│   ├── bcr_codes.Rd
│   ├── ebird_species.Rd
│   ├── ebird_states.Rd
│   ├── ebird_taxonomy.Rd
│   ├── filter_repeat_visits.Rd
│   ├── format_unmarked_occu.Rd
│   ├── get_ebird_taxonomy.Rd
│   ├── process_barcharts.Rd
│   ├── read_ebd.Rd
│   └── valid_protocols.Rd
├── tests/
│   ├── testthat/
│   │   ├── setup.R
│   │   ├── test_auk-ebd-version.r
│   │   ├── test_auk-ebd.r
│   │   ├── test_auk-filter.r
│   │   ├── test_auk-keep-drop.r
│   │   ├── test_auk-rollup.r
│   │   ├── test_auk-select.r
│   │   ├── test_auk-split.r
│   │   ├── test_auk-unique.r
│   │   ├── test_auk-zerofill.r
│   │   ├── test_ebird-species.r
│   │   ├── test_filters.r
│   │   ├── test_filters_sampling.r
│   │   ├── test_get-ebird-taxonomy.r
│   │   ├── test_read.r
│   │   ├── test_set-env.R
│   │   └── test_unmarked.r
│   └── testthat.R
└── vignettes/
    ├── auk.Rmd
    └── development.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^CRAN-RELEASE$
^Meta$
^doc$
^pkgdown$
^.*\.Rproj$
^\.Rproj\.user$
^data-raw$
^README\.Rmd$
^README\.md$
^README-.*\.png$
^docs$
^_pkgdown\.yml$
^cran-comments\.md$
^CONDUCT\.md$
^CONTRIBUTING\.md$
^LICENSE\.md$
^makefile\.r
^hex-logo$
^logo\.png
^cheatsheet
^\.github$
^CRAN-SUBMISSION$


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: R-CMD-check

permissions: read-all

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest,   r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest,   r: 'release'}
          - {os: ubuntu-latest,   r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'


================================================
FILE: .gitignore
================================================
Meta
doc
.Rproj.user
.Rhistory
.RData
.Ruserdata
.DS_Store
inst/doc
pkgdown/
/doc/
/Meta/


================================================
FILE: CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at mes335@cornell.edu. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]

[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/


================================================
FILE: CONTRIBUTING.md
================================================
# CONTRIBUTING

## Please contribute!

We love collaboration.

## Bugs?

- Submit an issue on the [Issues page](https://github.com/CornellLabofOrnithology/auk/issues)

## Code contributions

- Fork this repo to your GitHub account
- Clone your fork to your machine, e.g., `git clone https://github.com/<yourgithubusername>/auk.git`
- Track the main repository (`CornellLabofOrnithology/auk`) as a remote named `upstream` by running `git remote add upstream https://github.com/CornellLabofOrnithology/auk.git`. Before making changes, pull in the latest upstream changes, either with `git fetch upstream` followed later by `git merge upstream/main`, or with `git pull upstream main` to fetch and merge in one step
- Make your changes (bonus points for making them on a new branch)
- If you change package functionality (i.e., the code itself, not just the documentation), please write tests that cover the new behavior
- Push your changes to your fork
- Open a pull request against the main repository at `CornellLabofOrnithology/auk`

### Thanks for contributing!


================================================
FILE: DESCRIPTION
================================================
Package: auk
Title: eBird Data Extraction and Processing in R
Version: 0.9.2
Authors@R: 
    c(person(given = "Matthew",
             family = "Strimas-Mackey",
             role = c("aut", "cre"),
             email = "mes335@cornell.edu",
             comment = c(ORCID = "0000-0001-8929-7776")),
      person(given = "Eliot",
             family = "Miller",
             role = "aut"),
      person(given = "Wesley",
             family = "Hochachka",
             role = "aut"),
      person(given = "Cornell Lab of Ornithology",
             role = "cph"))
Description: Extract and process bird sightings records from
    eBird (<http://ebird.org>), an online tool for recording bird
    observations.  Public access to the full eBird database is via the
    eBird Basic Dataset (EBD; see <http://ebird.org/ebird/data/download>
    for access), a downloadable text file. This package is an interface to
    AWK for extracting data from the EBD based on taxonomic, spatial, or
    temporal filters, to produce a manageable file size that can be
    imported into R.
License: GPL-3
URL: https://cornelllabofornithology.github.io/auk/
BugReports: 
    https://github.com/CornellLabofOrnithology/auk/issues
Depends: 
    R (>= 4.1.0)
Imports: 
    assertthat,
    countrycode (>= 1.0.0),
    dplyr (>= 0.7.8),
    httr2,
    readr (>= 2.0.0),
    rlang (>= 0.3.0),
    stringi,
    stringr,
    tidyr (>= 0.8.0)
Suggests: 
    covr,
    knitr,
    rmarkdown,
    sf,
    testthat,
    unmarked,
    withr
VignetteBuilder: 
    knitr
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3


================================================
FILE: LICENSE.md
================================================
GNU General Public License
==========================

_Version 3, 29 June 2007_  
_Copyright © 2007 Free Software Foundation, Inc. &lt;<http://fsf.org/>&gt;_

Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.

## Preamble

The GNU General Public License is a free, copyleft license for software and other
kinds of works.

The licenses for most software and other practical works are designed to take away
your freedom to share and change the works. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change all versions of a
program--to make sure it remains free software for all its users. We, the Free
Software Foundation, use the GNU General Public License for most of our software; it
applies also to any other work released this way by its authors. You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General
Public Licenses are designed to make sure that you have the freedom to distribute
copies of free software (and charge for them if you wish), that you receive source
code or can get it if you want it, that you can change the software or use pieces of
it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or
asking you to surrender the rights. Therefore, you have certain responsibilities if
you distribute copies of the software, or if you modify it: responsibilities to
respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee,
you must pass on to the recipients the same freedoms that you received. You must make
sure that they, too, receive or can get the source code. And you must show them these
terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: **(1)** assert
copyright on the software, and **(2)** offer you this License giving you legal permission
to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is
no warranty for this free software. For both users' and authors' sake, the GPL
requires that modified versions be marked as changed, so that their problems will not
be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of
the software inside them, although the manufacturer can do so. This is fundamentally
incompatible with the aim of protecting users' freedom to change the software. The
systematic pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we have designed
this version of the GPL to prohibit the practice for those products. If such problems
arise substantially in other domains, we stand ready to extend this provision to
those domains in future versions of the GPL, as needed to protect the freedom of
users.

Finally, every program is threatened constantly by software patents. States should
not allow patents to restrict development and use of software on general-purpose
computers, but in those that do, we wish to avoid the special danger that patents
applied to a free program could make it effectively proprietary. To prevent this, the
GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

## TERMS AND CONDITIONS

### 0. Definitions

“This License” refers to version 3 of the GNU General Public License.

“Copyright” also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

“The Program” refers to any copyrightable work licensed under this
License. Each licensee is addressed as “you”. “Licensees” and
“recipients” may be individuals or organizations.

To “modify” a work means to copy from or adapt all or part of the work in
a fashion requiring copyright permission, other than the making of an exact copy. The
resulting work is called a “modified version” of the earlier work or a
work “based on” the earlier work.

A “covered work” means either the unmodified Program or a work based on
the Program.

To “propagate” a work means to do anything with it that, without
permission, would make you directly or secondarily liable for infringement under
applicable copyright law, except executing it on a computer or modifying a private
copy. Propagation includes copying, distribution (with or without modification),
making available to the public, and in some countries other activities as well.

To “convey” a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through a computer
network, with no transfer of a copy, is not conveying.

An interactive user interface displays “Appropriate Legal Notices” to the
extent that it includes a convenient and prominently visible feature that **(1)**
displays an appropriate copyright notice, and **(2)** tells the user that there is no
warranty for the work (except to the extent that warranties are provided), that
licensees may convey the work under this License, and how to view a copy of this
License. If the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

### 1. Source Code

The “source code” for a work means the preferred form of the work for
making modifications to it. “Object code” means any non-source form of a
work.

A “Standard Interface” means an interface that either is an official
standard defined by a recognized standards body, or, in the case of interfaces
specified for a particular programming language, one that is widely used among
developers working in that language.

The “System Libraries” of an executable work include anything, other than
the work as a whole, that **(a)** is included in the normal form of packaging a Major
Component, but which is not part of that Major Component, and **(b)** serves only to
enable use of the work with that Major Component, or to implement a Standard
Interface for which an implementation is available to the public in source code form.
A “Major Component”, in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system (if any) on which
the executable work runs, or a compiler used to produce the work, or an object code
interpreter used to run it.

The “Corresponding Source” for a work in object code form means all the
source code needed to generate, install, and (for an executable work) run the object
code and to modify the work, including scripts to control those activities. However,
it does not include the work's System Libraries, or general-purpose tools or
generally available free programs which are used unmodified in performing those
activities but which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for the work, and
the source code for shared libraries and dynamically linked subprograms that the work
is specifically designed to require, such as by intimate data communication or
control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate
automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

### 2. Basic Permissions

All rights granted under this License are granted for the term of copyright on the
Program, and are irrevocable provided the stated conditions are met. This License
explicitly affirms your unlimited permission to run the unmodified Program. The
output from running a covered work is covered by this License only if the output,
given its content, constitutes a covered work. This License acknowledges your rights
of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without
conditions so long as your license otherwise remains in force. You may convey covered
works to others for the sole purpose of having them make modifications exclusively
for you, or provide you with facilities for running those works, provided that you
comply with the terms of this License in conveying all material for which you do not
control copyright. Those thus making or running the covered works for you must do so
exclusively on your behalf, under your direction and control, on terms that prohibit
them from making any copies of your copyrighted material outside their relationship
with you.

Conveying under any other circumstances is permitted solely under the conditions
stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

### 3. Protecting Users' Legal Rights From Anti-Circumvention Law

No covered work shall be deemed part of an effective technological measure under any
applicable law fulfilling obligations under article 11 of the WIPO copyright treaty
adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention
of such measures.

When you convey a covered work, you waive any legal power to forbid circumvention of
technological measures to the extent such circumvention is effected by exercising
rights under this License with respect to the covered work, and you disclaim any
intention to limit operation or modification of the work as a means of enforcing,
against the work's users, your or third parties' legal rights to forbid circumvention
of technological measures.

### 4. Conveying Verbatim Copies

You may convey verbatim copies of the Program's source code as you receive it, in any
medium, provided that you conspicuously and appropriately publish on each copy an
appropriate copyright notice; keep intact all notices stating that this License and
any non-permissive terms added in accord with section 7 apply to the code; keep
intact all notices of the absence of any warranty; and give all recipients a copy of
this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer
support or warranty protection for a fee.

### 5. Conveying Modified Source Versions

You may convey a work based on the Program, or the modifications to produce it from
the Program, in the form of source code under the terms of section 4, provided that
you also meet all of these conditions:

* **a)** The work must carry prominent notices stating that you modified it, and giving a
relevant date.
* **b)** The work must carry prominent notices stating that it is released under this
License and any conditions added under section 7. This requirement modifies the
requirement in section 4 to “keep intact all notices”.
* **c)** You must license the entire work, as a whole, under this License to anyone who
comes into possession of a copy. This License will therefore apply, along with any
applicable section 7 additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no permission to license the
work in any other way, but it does not invalidate such permission if you have
separately received it.
* **d)** If the work has interactive user interfaces, each must display Appropriate Legal
Notices; however, if the Program has interactive interfaces that do not display
Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are
not by their nature extensions of the covered work, and which are not combined with
it such as to form a larger program, in or on a volume of a storage or distribution
medium, is called an “aggregate” if the compilation and its resulting
copyright are not used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work in an aggregate
does not cause this License to apply to the other parts of the aggregate.

### 6. Conveying Non-Source Forms

You may convey a covered work in object code form under the terms of sections 4 and
5, provided that you also convey the machine-readable Corresponding Source under the
terms of this License, in one of these ways:

* **a)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by the Corresponding Source fixed on a
durable physical medium customarily used for software interchange.
* **b)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by a written offer, valid for at least
three years and valid for as long as you offer spare parts or customer support for
that product model, to give anyone who possesses the object code either **(1)** a copy of
the Corresponding Source for all the software in the product that is covered by this
License, on a durable physical medium customarily used for software interchange, for
a price no more than your reasonable cost of physically performing this conveying of
source, or **(2)** access to copy the Corresponding Source from a network server at no
charge.
* **c)** Convey individual copies of the object code with a copy of the written offer to
provide the Corresponding Source. This alternative is allowed only occasionally and
noncommercially, and only if you received the object code with such an offer, in
accord with subsection 6b.
* **d)** Convey the object code by offering access from a designated place (gratis or for
a charge), and offer equivalent access to the Corresponding Source in the same way
through the same place at no further charge. You need not require recipients to copy
the Corresponding Source along with the object code. If the place to copy the object
code is a network server, the Corresponding Source may be on a different server
(operated by you or a third party) that supports equivalent copying facilities,
provided you maintain clear directions next to the object code saying where to find
the Corresponding Source. Regardless of what server hosts the Corresponding Source,
you remain obligated to ensure that it is available for as long as needed to satisfy
these requirements.
* **e)** Convey the object code using peer-to-peer transmission, provided you inform
other peers where the object code and Corresponding Source of the work are being
offered to the general public at no charge under subsection 6d.

A separable portion of the object code, whose source code is excluded from the
Corresponding Source as a System Library, need not be included in conveying the
object code work.

A “User Product” is either **(1)** a “consumer product”, which
means any tangible personal property which is normally used for personal, family, or
household purposes, or **(2)** anything designed or sold for incorporation into a
dwelling. In determining whether a product is a consumer product, doubtful cases
shall be resolved in favor of coverage. For a particular product received by a
particular user, “normally used” refers to a typical or common use of
that class of product, regardless of the status of the particular user or of the way
in which the particular user actually uses, or expects or is expected to use, the
product. A product is a consumer product regardless of whether the product has
substantial commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

“Installation Information” for a User Product means any methods,
procedures, authorization keys, or other information required to install and execute
modified versions of a covered work in that User Product from a modified version of
its Corresponding Source. The information must suffice to ensure that the continued
functioning of the modified object code is in no case prevented or interfered with
solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for
use in, a User Product, and the conveying occurs as part of a transaction in which
the right of possession and use of the User Product is transferred to the recipient
in perpetuity or for a fixed term (regardless of how the transaction is
characterized), the Corresponding Source conveyed under this section must be
accompanied by the Installation Information. But this requirement does not apply if
neither you nor any third party retains the ability to install modified object code
on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to
continue to provide support service, warranty, or updates for a work that has been
modified or installed by the recipient, or for the User Product in which it has been
modified or installed. Access to a network may be denied when the modification itself
materially and adversely affects the operation of the network or violates the rules
and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with
this section must be in a format that is publicly documented (and with an
implementation available to the public in source code form), and must require no
special password or key for unpacking, reading or copying.

### 7. Additional Terms

“Additional permissions” are terms that supplement the terms of this
License by making exceptions from one or more of its conditions. Additional
permissions that are applicable to the entire Program shall be treated as though they
were included in this License, to the extent that they are valid under applicable
law. If additional permissions apply only to part of the Program, that part may be
used separately under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any
additional permissions from that copy, or from any part of it. (Additional
permissions may be written to require their own removal in certain cases when you
modify the work.) You may place additional permissions on material, added by you to a
covered work, for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you add to a
covered work, you may (if authorized by the copyright holders of that material)
supplement the terms of this License with terms:

* **a)** Disclaiming warranty or limiting liability differently from the terms of
sections 15 and 16 of this License; or
* **b)** Requiring preservation of specified reasonable legal notices or author
attributions in that material or in the Appropriate Legal Notices displayed by works
containing it; or
* **c)** Prohibiting misrepresentation of the origin of that material, or requiring that
modified versions of such material be marked in reasonable ways as different from the
original version; or
* **d)** Limiting the use for publicity purposes of names of licensors or authors of the
material; or
* **e)** Declining to grant rights under trademark law for use of some trade names,
trademarks, or service marks; or
* **f)** Requiring indemnification of licensors and authors of that material by anyone
who conveys the material (or modified versions of it) with contractual assumptions of
liability to the recipient, for any liability that these contractual assumptions
directly impose on those licensors and authors.

All other non-permissive additional terms are considered “further
restrictions” within the meaning of section 10. If the Program as you received
it, or any part of it, contains a notice stating that it is governed by this License
along with a term that is a further restriction, you may remove that term. If a
license document contains a further restriction but permits relicensing or conveying
under this License, you may add to a covered work material governed by the terms of
that license document, provided that the further restriction does not survive such
relicensing or conveying.

If you add terms to a covered work in accord with this section, you must place, in
the relevant source files, a statement of the additional terms that apply to those
files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a
separately written license, or stated as exceptions; the above requirements apply
either way.

### 8. Termination

You may not propagate or modify a covered work except as expressly provided under
this License. Any attempt otherwise to propagate or modify it is void, and will
automatically terminate your rights under this License (including any patent licenses
granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a
particular copyright holder is reinstated **(a)** provisionally, unless and until the
copyright holder explicitly and finally terminates your license, and **(b)** permanently,
if the copyright holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently
if the copyright holder notifies you of the violation by some reasonable means, this
is the first time you have received notice of violation of this License (for any
work) from that copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of
parties who have received copies or rights from you under this License. If your
rights have been terminated and not permanently reinstated, you do not qualify to
receive new licenses for the same material under section 10.

### 9. Acceptance Not Required for Having Copies

You are not required to accept this License in order to receive or run a copy of the
Program. Ancillary propagation of a covered work occurring solely as a consequence of
using peer-to-peer transmission to receive a copy likewise does not require
acceptance. However, nothing other than this License grants you permission to
propagate or modify any covered work. These actions infringe copyright if you do not
accept this License. Therefore, by modifying or propagating a covered work, you
indicate your acceptance of this License to do so.

### 10. Automatic Licensing of Downstream Recipients

Each time you convey a covered work, the recipient automatically receives a license
from the original licensors, to run, modify and propagate that work, subject to this
License. You are not responsible for enforcing compliance by third parties with this
License.

An “entity transaction” is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an organization, or
merging organizations. If propagation of a covered work results from an entity
transaction, each party to that transaction who receives a copy of the work also
receives whatever licenses to the work the party's predecessor in interest had or
could give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if the predecessor
has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or
affirmed under this License. For example, you may not impose a license fee, royalty,
or other charge for exercise of rights granted under this License, and you may not
initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging
that any patent claim is infringed by making, using, selling, offering for sale, or
importing the Program or any portion of it.

### 11. Patents

A “contributor” is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The work thus
licensed is called the contributor's “contributor version”.

A contributor's “essential patent claims” are all patent claims owned or
controlled by the contributor, whether already acquired or hereafter acquired, that
would be infringed by some manner, permitted by this License, of making, using, or
selling its contributor version, but do not include claims that would be infringed
only as a consequence of further modification of the contributor version. For
purposes of this definition, “control” includes the right to grant patent
sublicenses in a manner consistent with the requirements of this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free patent license
under the contributor's essential patent claims, to make, use, sell, offer for sale,
import and otherwise run, modify and propagate the contents of its contributor
version.

In the following three paragraphs, a “patent license” is any express
agreement or commitment, however denominated, not to enforce a patent (such as an
express permission to practice a patent or covenant not to sue for patent
infringement). To “grant” such a patent license to a party means to make
such an agreement or commitment not to enforce a patent against the party.

If you convey a covered work, knowingly relying on a patent license, and the
Corresponding Source of the work is not available for anyone to copy, free of charge
and under the terms of this License, through a publicly available network server or
other readily accessible means, then you must either **(1)** cause the Corresponding
Source to be so available, or **(2)** arrange to deprive yourself of the benefit of the
patent license for this particular work, or **(3)** arrange, in a manner consistent with
the requirements of this License, to extend the patent license to downstream
recipients. “Knowingly relying” means you have actual knowledge that, but
for the patent license, your conveying the covered work in a country, or your
recipient's use of the covered work in a country, would infringe one or more
identifiable patents in that country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or arrangement, you
convey, or propagate by procuring conveyance of, a covered work, and grant a patent
license to some of the parties receiving the covered work authorizing them to use,
propagate, modify or convey a specific copy of the covered work, then the patent
license you grant is automatically extended to all recipients of the covered work and
works based on it.

A patent license is “discriminatory” if it does not include within the
scope of its coverage, prohibits the exercise of, or is conditioned on the
non-exercise of one or more of the rights that are specifically granted under this
License. You may not convey a covered work if you are a party to an arrangement with
a third party that is in the business of distributing software, under which you make
payment to the third party based on the extent of your activity of conveying the
work, and under which the third party grants, to any of the parties who would receive
the covered work from you, a discriminatory patent license **(a)** in connection with
copies of the covered work conveyed by you (or copies made from those copies), or **(b)**
primarily for and in connection with specific products or compilations that contain
the covered work, unless you entered into that arrangement, or that patent license
was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting any implied
license or other defenses to infringement that may otherwise be available to you
under applicable patent law.

### 12. No Surrender of Others' Freedom

If conditions are imposed on you (whether by court order, agreement or otherwise)
that contradict the conditions of this License, they do not excuse you from the
conditions of this License. If you cannot convey a covered work so as to satisfy
simultaneously your obligations under this License and any other pertinent
obligations, then as a consequence you may not convey it at all. For example, if you
agree to terms that obligate you to collect a royalty for further conveying from
those to whom you convey the Program, the only way you could satisfy both those terms
and this License would be to refrain entirely from conveying the Program.

### 13. Use with the GNU Affero General Public License

Notwithstanding any other provision of this License, you have permission to link or
combine any covered work with a work licensed under version 3 of the GNU Affero
General Public License into a single combined work, and to convey the resulting work.
The terms of this License will continue to apply to the part which is the covered
work, but the special requirements of the GNU Affero General Public License, section
13, concerning interaction through a network will apply to the combination as such.

### 14. Revised Versions of this License

The Free Software Foundation may publish revised and/or new versions of the GNU
General Public License from time to time. Such new versions will be similar in spirit
to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies that
a certain numbered version of the GNU General Public License “or any later
version” applies to it, you have the option of following the terms and
conditions either of that numbered version or of any later version published by the
Free Software Foundation. If the Program does not specify a version number of the GNU
General Public License, you may choose any version ever published by the Free
Software Foundation.

If the Program specifies that a proxy can decide which future versions of the GNU
General Public License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the Program.

Later license versions may give you additional or different permissions. However, no
additional obligations are imposed on any author or copyright holder as a result of
your choosing to follow a later version.

### 15. Disclaimer of Warranty

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

### 16. Limitation of Liability

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY
COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS
PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

### 17. Interpretation of Sections 15 and 16

If the disclaimer of warranty and limitation of liability provided above cannot be
given local legal effect according to their terms, reviewing courts shall apply local
law that most closely approximates an absolute waiver of all civil liability in
connection with the Program, unless a warranty or assumption of liability accompanies
a copy of the Program in return for a fee.

_END OF TERMS AND CONDITIONS_

## How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to
the public, the best way to achieve this is to make it free software which everyone
can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them
to the start of each source file to most effectively state the exclusion of warranty;
and each file should have at least the “copyright” line and a pointer to
where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) 2018 Matt Strimas-Mackey

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this
when it starts in an interactive mode:

    auk Copyright (C) 2018 Matt Strimas-Mackey
    This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type 'show c' for details.

The hypothetical commands `show w` and `show c` should show the appropriate parts of
the General Public License. Of course, your program's commands might be different;
for a GUI interface, you would use an “about box”.

You should also get your employer (if you work as a programmer) or school, if any, to
sign a “copyright disclaimer” for the program, if necessary. For more
information on this, and how to apply and follow the GNU GPL, see
&lt;<http://www.gnu.org/licenses/>&gt;.

The GNU General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may consider it
more useful to permit linking proprietary applications with the library. If this is
what you want to do, use the GNU Lesser General Public License instead of this
License. But first, please read
&lt;<http://www.gnu.org/philosophy/why-not-lgpl.html>&gt;.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

S3method(auk_bbox,auk_ebd)
S3method(auk_bbox,auk_sampling)
S3method(auk_bcr,auk_ebd)
S3method(auk_bcr,auk_sampling)
S3method(auk_breeding,auk_ebd)
S3method(auk_complete,auk_ebd)
S3method(auk_complete,auk_sampling)
S3method(auk_country,auk_ebd)
S3method(auk_country,auk_sampling)
S3method(auk_county,auk_ebd)
S3method(auk_county,auk_sampling)
S3method(auk_date,auk_ebd)
S3method(auk_date,auk_sampling)
S3method(auk_distance,auk_ebd)
S3method(auk_distance,auk_sampling)
S3method(auk_duration,auk_ebd)
S3method(auk_duration,auk_sampling)
S3method(auk_ebd_version,auk_ebd)
S3method(auk_ebd_version,auk_sampling)
S3method(auk_ebd_version,character)
S3method(auk_exotic,auk_ebd)
S3method(auk_filter,auk_ebd)
S3method(auk_filter,auk_sampling)
S3method(auk_last_edited,auk_ebd)
S3method(auk_last_edited,auk_sampling)
S3method(auk_observer,auk_ebd)
S3method(auk_observer,auk_sampling)
S3method(auk_project,auk_ebd)
S3method(auk_project,auk_sampling)
S3method(auk_protocol,auk_ebd)
S3method(auk_protocol,auk_sampling)
S3method(auk_select,auk_ebd)
S3method(auk_select,auk_sampling)
S3method(auk_species,auk_ebd)
S3method(auk_state,auk_ebd)
S3method(auk_state,auk_sampling)
S3method(auk_time,auk_ebd)
S3method(auk_time,auk_sampling)
S3method(auk_year,auk_ebd)
S3method(auk_year,auk_sampling)
S3method(auk_zerofill,auk_ebd)
S3method(auk_zerofill,character)
S3method(auk_zerofill,data.frame)
S3method(collapse_zerofill,auk_zerofill)
S3method(print,auk_ebd)
S3method(print,auk_sampling)
S3method(print,auk_zerofill)
S3method(read_ebd,auk_ebd)
S3method(read_ebd,character)
S3method(read_sampling,auk_ebd)
S3method(read_sampling,auk_sampling)
S3method(read_sampling,character)
export(auk_bbox)
export(auk_bcr)
export(auk_breeding)
export(auk_clean)
export(auk_complete)
export(auk_country)
export(auk_county)
export(auk_date)
export(auk_distance)
export(auk_duration)
export(auk_ebd)
export(auk_ebd_version)
export(auk_exotic)
export(auk_extent)
export(auk_filter)
export(auk_get_awk_path)
export(auk_get_ebd_path)
export(auk_last_edited)
export(auk_observer)
export(auk_project)
export(auk_protocol)
export(auk_rollup)
export(auk_sampling)
export(auk_select)
export(auk_set_awk_path)
export(auk_set_ebd_path)
export(auk_species)
export(auk_split)
export(auk_state)
export(auk_time)
export(auk_unique)
export(auk_version)
export(auk_year)
export(auk_zerofill)
export(collapse_zerofill)
export(ebird_species)
export(filter_repeat_visits)
export(format_unmarked_occu)
export(get_ebird_taxonomy)
export(process_barcharts)
export(read_ebd)
export(read_sampling)
importFrom(rlang,.data)
importFrom(stringr,str_interp)


================================================
FILE: NEWS.md
================================================
# auk 0.9.2

- update to v2 of the eBird API and httr2 (PR #97)
- drop magrittr pipe re-export

# auk 0.9.1

- ensure `taxon_concept_id` behaves correctly in `auk_rollup()` (issue #94)
- update EBD example files to the latest format (e.g., add `taxon_concept_id`)

# auk 0.9.0

- update to align with the 2025 taxonomy update

# auk 0.8.2

- handle changes to project names resulting from release of eBird Projects

# auk 0.8.1

- allow `ebird_species()` to search for species codes in addition to scientific and common names
- handle changes to EBD column names resulting from release of eBird Projects (issue #91)

# auk 0.8.0

- update for 2024 taxonomy
- added `process_barcharts()`, a helper function for processing eBird bar chart data

# auk 0.7.0

- update for 2023 eBird taxonomy
- no need to restart after setting AWK and EBD paths
- retain breeding codes in `auk_zerofill()`
- changes to conform with deprecation of `.data$` in tidyselect expressions
- changes to package-level documentation in roxygen2
- removed non-ASCII characters from datasets

# auk 0.6.0

- update for 2022 eBird taxonomy

# auk 0.5.2

- added an `extinct` column to taxonomy

# auk 0.5.1

- drop `data.table` dependency, no longer needed with `readr` speed improvements
- fix bug arising from 'breeding bird atlas code' being renamed to 'breeding code' (issue #58)

# auk 0.5.0

- update to align with 2021 eBird taxonomy

# auk 0.4.4

- updates to align with readr 2.0

# auk 0.4.3

- `get_ebird_taxonomy()` now fails gracefully when eBird API is not accessible, fixing the CRAN check errors https://cran.r-project.org/web/checks/check_results_auk.html

# auk 0.4.2

- new `auk_county()` filter
- new `auk_year()` filter
- Drop taxonomy warnings since there was no taxonomy update this year

# auk 0.4.1

- Family common names now included in eBird taxonomy
- `auk_select()` now requires certain columns to be kept
- Better handling of file paths with `prefix` argument in `auk_split()`
- Fixed bug causing undescribed species to be dropped by `auk_rollup()`
- Add an `ll_digits` argument to `filter_repeat_visits()` to round lat/lng prior to identifying sites
- Change of default parameters to `filter_repeat_visits()`
- `auk_bbox()` now takes sf/raster spatial objects and grabs bbox from them

# auk 0.4.0

- Updated to 2019 eBird taxonomy
- `auk_observer()` filter added
- `tidyr::complete_()` deprecated, stopped using

# auk 0.3.3

- Dates can now wrap in `auk_date()`, e.g. use `date = c("*-12-01", "*-01-31")` for records from December or January
- Fixed bug preventing dropping of `age/sex` column
- Allow for a wider variety of protocols in `auk_protocol()`
- Addressing some deprecated rlang functions
- Fixed bug causing `auk_set_awk_path()` to fail

# auk 0.3.2

- Workaround for a bug in `system2()` in some R versions: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17508
- Adding a filter for PROALAS checklists to `auk_protocol()`

# auk 0.3.1

- `rlang::UQ()` and `rlang::UQS()` deprecated, switching to `!!` and `!!!`
- `auk_unique()` now keeps track of all sampling event and observer IDs that comprise a group checklist

# auk 0.3.0

- Updated to 2018 taxonomy; new function `get_ebird_taxonomy()` to get taxonomy via the eBird API
- Better handling of taxonomy versions, many functions now take a `taxonomy_version` argument and use the eBird API to get the taxonomy
- `auk_getpath()` renamed `auk_get_awk_path()`, and added `auk_set_awk_path()`
- Added `auk_set_ebd_path()` and `auk_get_ebd_path()` to set and get the 
`EBD_PATH` environment variable. Now users only need to set this once and just 
refer to the file name, rather than specifying the full path every time.
- Functions to prepare data for occupancy modeling: `filter_repeat_visits()` and `format_unmarked_occu()`
- New `auk_bcr()` function to extract data from BCRs
- Added `bcr_codes` data frame to look up BCR names and codes
- "Area" protocol added to `auk_protocol()` filter.
- `auk_extent()` renamed `auk_bbox()`; `auk_extent()` deprecated and redirects to `auk_bbox()`
- `auk_zerofill()` now checks for complete checklists and gives option to not rollup
- `auk_rollup()` now gives the option of keeping higher taxa via `drop_higher` argument
- `auk_clean()` deprecated
- Fixed package load error when `EBD_PATH` is invalid
- Fixed bug when reading files with a blank column using `readr`

# auk 0.2.2

- Updated to work with EBD version 1.9
- Modified tests to be more general to all sample data
- `ebird_species()` now returns 6-letter species codes
- Fixed bug causing auk to fail on files downloaded via custom download form
- Fixed bug with `normalizePath()` use on Windows
- Fixed bug with `system2()` on Windows

# auk 0.2.1

- Patch release fixing a couple of bugs
- Removed all non-ASCII characters from example files, closes [issue #14](https://github.com/CornellLabofOrnithology/auk/issues/14)
- Fixed a bug preventing state filtering from working, closes [issue #16](https://github.com/CornellLabofOrnithology/auk/issues/16)

# auk 0.2.0

- New function, `auk_split()`, splits EBD up into multiple files by species
- New object, `auk_sampling`, and associated methods for working with the sampling data only
- New function, `auk_select()`, for selecting a subset of columns
- `auk_date()` now allows filtering date ranges across years using wildcards, e.g. `date = c("*-05-01", "*-06-30")` for observations from May and June of any year
- New function, `auk_state()` for filtering by state
- Now using AWK arrays to speed up country and species filtering; ~20% speed up when filtering on many species/countries
- Allow selection of a subset of columns when filtering
- Remove free text columns in `auk_clean()` to decrease file size
- Updated to work with Feb 2018 version of EBD
- Fixed broken dependency on `countrycode` package

# auk 0.1.0

- eBird taxonomy updated to the August 2017 version; users should download the most recent EBD to ensure the taxonomy is in sync with the new package
- AWK path can be set manually with the environment variable `AWK_PATH` in the `.Renviron` file
- `auk_distance`, `auk_breeding`, `auk_protocol`, and `auk_project` filters added
- Users can now specify a subset of columns to return when calling `auk_filter()` using the `keep` and `drop` arguments
- Many changes suggested by rOpenSci package peer review process, see https://github.com/ropensci/onboarding/issues/136 for details
- New vignette added to aid those wanting to contribute to package development

# auk 0.0.2

- Patch release converting ebird_taxonomy to ASCII to pass CRAN checks

# auk 0.0.1

- First CRAN release

================================================
FILE: R/auk-bbox.R
================================================
#' Filter the eBird data by spatial bounding box
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on spatial bounding
#' box. This function only defines the filter and, once all filters have been
#' defined, [auk_filter()] should be used to call AWK and perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param bbox numeric or `sf` or `Raster*` object; spatial bounding box
#'   expressed as the range of latitudes and longitudes in decimal degrees:
#'   `c(lng_min, lat_min, lng_max, lat_max)`. Note that longitudes in the
#'   Western Hemisphere and latitudes south of the equator should be given as
#'   negative numbers. Alternatively, a spatial object from either the `sf` or 
#'   `raster` packages can be provided and the bounding box will be extracted 
#'   from this object.
#' 
#' @details This function can also be applied to an `auk_sampling` object if
#'   the user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the class of `x`.
#' @export
#' @family filter
#' @examples
#' # filter to locations roughly in the Pacific Northwest
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_bbox(bbox = c(-125, 37, -120, 52))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_bbox(ebd, bbox = c(-125, 37, -120, 52))
auk_bbox <- function(x, bbox)  {
  UseMethod("auk_bbox")
}

#' @export
auk_bbox.auk_ebd <- function(x, bbox) {
  # process spatial objects
  if (inherits(bbox, c("sf", "sfc", "Raster"))) {
    if (requireNamespace("sf", quietly = TRUE)) {
      bb <- sf::st_as_sfc(sf::st_bbox(bbox))
      bb <- sf::st_set_crs(bb, value = sf::st_crs(bbox))
      bb <- sf::st_bbox(sf::st_transform(bb, crs = 4326))
      bbox <- c(bb["xmin"], bb["ymin"], bb["xmax"], bb["ymax"]) 
    } else {
      stop("To use sf or raster objects as bbox, install the sf package.")
    } 
  }
  # checks
  assertthat::assert_that(
    is.numeric(bbox),
    length(bbox) == 4,
    bbox[1] < bbox[3],
    bbox[2] < bbox[4],
    bbox[1] >= -180, bbox[1] <= 180,
    bbox[3] >= -180, bbox[3] <= 180,
    bbox[2] >= -90, bbox[2] <= 90,
    bbox[4] >= -90, bbox[4] <= 90
  )

  # define filter
  x$filters$bbox <- bbox
  return(x)
}

#' @export
auk_bbox.auk_sampling <- function(x, bbox) {
  auk_bbox.auk_ebd(x, bbox)
}

#' Filter the eBird data by spatial extent
#' 
#' **Deprecated**, use [auk_bbox()] instead.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param extent numeric; spatial extent expressed as the range of latitudes and
#'   longitudes in decimal degrees: `c(lng_min, lat_min, lng_max, lat_max)`.
#'   Note that longitudes in the Western Hemisphere and latitudes south of the
#'   equator should be given as negative numbers.
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the class of `x`.
#' @export
#' @family filter
#' @examples
#' # filter to locations roughly in the Pacific Northwest
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_bbox(bbox = c(-125, 37, -120, 52))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_bbox(ebd, bbox = c(-125, 37, -120, 52))
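#' 
#' # auk_extent() is kept only for backwards compatibility: it warns that it
#' # is deprecated and forwards `extent` to auk_bbox(), so both calls should
#' # define the same filter
#' ebd_old <- suppressWarnings(auk_extent(ebd, extent = c(-125, 37, -120, 52)))
#' ebd_new <- auk_bbox(ebd, bbox = c(-125, 37, -120, 52))
#' identical(ebd_old$filters$bbox, ebd_new$filters$bbox)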
auk_extent <- function(x, extent) {
  .Deprecated("auk_bbox")
  auk_bbox(x, bbox = extent)
}

================================================
FILE: R/auk-bcr.R
================================================
#' Filter the eBird data by Bird Conservation Region
#'
#' Define a filter for the eBird Basic Dataset (EBD) to extract data for a set
#' of [Bird Conservation
#' Regions](https://nabci-us.org/resources/bird-conservation-regions/) (BCRs).
#' BCRs are ecologically distinct regions in North America with similar bird
#' communities, habitats, and resource management issues. This function only
#' defines the filter and, once all filters have been defined, [auk_filter()]
#' should be used to call AWK and perform the filtering.
#' 
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param bcr integer; BCRs to filter by. BCRs are identified by an integer, 
#'   from 1 to 66, that can be looked up in the [bcr_codes] table.
#' @param replace logical; multiple calls to `auk_bcr()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of BCRs to
#'   filter by will be removed and replaced by that in the current call.
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#' user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # bcr codes can be looked up in bcr_codes
#' dplyr::filter(bcr_codes, bcr_name == "Central Hardwoods")
#' 
#' # filter to bcr 24
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_bcr(bcr = 24)
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_bcr(ebd, bcr = 24)
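#' 
#' # repeated calls add to the set of BCRs; use replace = TRUE to start over
#' ebd_bcr <- auk_bcr(ebd, bcr = 24)
#' ebd_bcr <- auk_bcr(ebd_bcr, bcr = 13)
#' # both BCRs are now stored: 13 24
#' ebd_bcr$filters$bcr
#' # only BCR 13 remains after replacing
#' auk_bcr(ebd_bcr, bcr = 13, replace = TRUE)$filters$bcr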
auk_bcr <- function(x, bcr, replace = FALSE)  {
  UseMethod("auk_bcr")
}

#' @export
auk_bcr.auk_ebd <- function(x, bcr, replace = FALSE) {
  # checks
  assertthat::assert_that(
    all(is_integer(bcr)),
    all(bcr %in% auk::bcr_codes$bcr_code),
    assertthat::is.flag(replace)
  )
  bcr <- as.integer(bcr)
  
  # check for bcr column
  if (!"bcr" %in% x$col_idx$id) {
    stop("BCR column missing from EBD")
  }
  if (!is.null(x$col_idx_sampling) && !"bcr" %in% x$col_idx_sampling$id) {
    stop("BCR column missing from sampling event data")
  }
  
  # set filter list
  if (replace) {
    x$filters$bcr <- bcr
  } else {
    x$filters$bcr <- c(x$filters$bcr, bcr)
  }
  x$filters$bcr <- sort(unique(x$filters$bcr))
  return(x)
}

#' @export
auk_bcr.auk_sampling <- function(x, bcr, replace = FALSE) {
  auk_bcr.auk_ebd(x, bcr, replace)
}


================================================
FILE: R/auk-breeding.R
================================================
#' Filter to only include observations with breeding codes
#'
#' eBird users have the option of specifying breeding bird atlas codes for their
#' observations, for example, if nest-building behaviour is observed. Use
#' this filter to select only those observations with an associated breeding
#' code. This function only defines the filter and, once all filters have been
#' defined, [auk_filter()] should be used to call AWK and perform the filtering.
#'
#' @param x `auk_ebd` object; reference to basic dataset file created by
#'   [auk_ebd()].
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_breeding()
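#' 
#' # filters can be chained, e.g. to keep only records with breeding codes
#' # from complete checklists; auk_filter() is still needed to run AWK
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_breeding() |>
#'   auk_complete()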
auk_breeding <- function(x)  {
  UseMethod("auk_breeding")
}

#' @export
auk_breeding.auk_ebd <- function(x) {
  # check for breeding code column
  if (!"breeding" %in% x$col_idx$id) {
    stop("Breeding code column missing from EBD")
  }
  
  # define filter
  x$filters$breeding <- TRUE
  return(x)
}


================================================
FILE: R/auk-clean.R
================================================
#' Clean an eBird data file (Deprecated)
#'
#' This function is no longer required by current versions of the eBird Basic 
#' Dataset (EBD).
#'
#' @param f_in character; input file. If file is not found as specified, it will 
#'   be looked for in the directory specified by the `EBD_PATH` environment 
#'   variable.
#' @param f_out character; output file.
#' @param sep character; the input field separator, the basic dataset is tab
#'   separated by default. Must only be a single character and space delimited
#'   is not allowed since spaces appear in many of the fields.
#' @param remove_text logical; whether all free text entry columns should be
#'   removed. These columns include comments, location names, and observer
#'   names. These columns cause import errors due to special characters and
#'   increase the file size, yet are rarely valuable for analytical
#'   applications, so may be removed. Setting this argument to `TRUE` can lead
#'   to a significant reduction in file size.
#' @param overwrite logical; overwrite output file if it already exists.
#'
#' @return If AWK ran without errors, the output filename is returned; if an
#'   error was encountered, the exit code is returned.
#' @export
#' @family text
#' @examples
#' \dontrun{
#' # get the path to the example data included in the package
#' f <- system.file("extdata/ebd-sample.txt", package = "auk")
#' # output to a temp file for example
#' # in practice, provide path to output file
#' # e.g. f_out <- "output/ebd_clean.txt"
#' f_out <- tempfile()
#'
#' # clean file to remove problem rows
#' # note: this function is deprecated and no longer does anything
#' auk_clean(f, f_out)
#' }
auk_clean <- function(f_in, f_out, sep = "\t", remove_text = FALSE, 
                      overwrite = FALSE) {
  .Deprecated()
  # checks
  awk_path <- auk_get_awk_path()
  if (is.na(awk_path)) {
    stop("auk_clean() requires a valid AWK install.")
  }
  assertthat::assert_that(
    assertthat::is.string(sep), nchar(sep) == 1, sep != " ",
    assertthat::is.flag(remove_text),
    assertthat::is.flag(overwrite)
  )
  f_in <- ebd_file(f_in)
  # check output file
  if (!dir.exists(dirname(f_out))) {
    stop("Output directory doesn't exist.")
  }
  if (!overwrite && file.exists(f_out)) {
    stop("Output file already exists, use overwrite = TRUE.")
  }
  f_out <- normalizePath(f_out, winslash = "/", mustWork = FALSE)

  # determine number of columns
  # read header row
  header <- get_header(f_in, sep)
  if (header[length(header)] == "") {
    header <- header[-length(header)]
  }
  ncols <- length(header)
  if (ncols < 30) {
    stop(sprintf("There is an error in your EBD file, only %i columns detected.",
                 ncols))
  }
  
  # columns to drop
  if (remove_text) {
    text_cols <- c("locality", 
                   "first name", "last name", 
                   "trip comments", 
                   "species comments")
    keep_cols <- which(!tolower(header) %in% text_cols)
    print_cols <- paste0("$", keep_cols, collapse = ",")
  } else {
    print_cols <- "$0"
  }
  

  # construct awk command
  awk <- str_interp(awk_clean, 
                    list(sep = sep, ncols = ncols, print_cols = print_cols))

  # run command
  exit_code <- system2(awk_path,
                       args = paste0("'", awk, "' ", f_in),
                       stdout = f_out, stderr = FALSE)
  
  if (exit_code == 0) {
    f_out
  } else {
    exit_code
  }
}

# awk script template
awk_clean <- "
BEGIN {
  FS = \"${sep}\"
  OFS = \"${sep}\"
}
{
  # remove end of line tab
  sub(/\t$/, \"\", $0)
  # only keep the header row and rows with the correct number of fields
  if (NF == ${ncols} || NR == 1) {
    print ${print_cols}
  }
}
"


================================================
FILE: R/auk-complete.R
================================================
#' Filter out incomplete checklists from the eBird data
#'
#' Define a filter for the eBird Basic Dataset (EBD) to only keep complete
#' checklists, i.e. those for which all birds seen or heard were recorded. These
#' checklists are the most valuable for scientific uses since they provide
#' presence and absence data. This function only defines the filter and, once all
#' filters have been defined, [auk_filter()] should be used to call AWK and
#' perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_complete()
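#' 
#' # the same filter can be applied to the sampling event data alone; this
#' # sketch uses the example sampling file shipped with the package
#' system.file("extdata/zerofill-ex_sampling.txt", package = "auk") |>
#'   auk_sampling() |>
#'   auk_complete()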
auk_complete <- function(x)  {
  UseMethod("auk_complete")
}

#' @export
auk_complete.auk_ebd <- function(x) {
  # define filter
  x$filters$complete <- TRUE
  return(x)
}

#' @export
auk_complete.auk_sampling <- function(x) {
  auk_complete.auk_ebd(x)
}


================================================
FILE: R/auk-country.R
================================================
#' Filter the eBird data by country
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a set of
#' countries. This function only defines the filter and, once all filters have
#' been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param country character; countries to filter by. Countries can either be
#'   expressed as English names or
#'   [ISO 2-letter country codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2).
#'   English names are matched via regular expressions using
#'   [countrycode][countrycode::countrycode()], so there is some flexibility in names.
#' @param replace logical; multiple calls to `auk_country()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of countries to
#'   filter by will be removed and replaced by that in the current call.
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # country names and ISO2 codes can be mixed
#' # not case sensitive
#' country <- c("CA", "United States", "mexico")
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_country(country)
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_country(ebd, country)
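#' 
#' # country names are stored as ISO 2-letter codes; calls are additive
#' # unless replace = TRUE
#' ebd_cc <- auk_country(ebd, "Canada")
#' ebd_cc <- auk_country(ebd_cc, "US")
#' ebd_cc$filters$country
#' auk_country(ebd_cc, "Mexico", replace = TRUE)$filters$country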
auk_country <- function(x, country, replace = FALSE)  {
  UseMethod("auk_country")
}

#' @export
auk_country.auk_ebd <- function(x, country, replace = FALSE) {
  # checks
  assertthat::assert_that(
    is.character(country),
    assertthat::is.flag(replace)
  )

  # convert country names to codes
  name_codes <- countrycode::countrycode(country,
                                         origin = "country.name",
                                         destination = "iso2c",
                                         warn = FALSE)
  # lookup codes
  code_codes <- countrycode::countrycode(country,
                                         origin = "iso2c",
                                         destination = "iso2c",
                                         warn = FALSE)
  # combine, preference to codes
  country_codes <- dplyr::coalesce(code_codes, name_codes)
  
  # some codes don't match to countrycodes package, treat separately
  no_code <- is.na(country_codes)
  country_codes[no_code] <- missing_countries(country[no_code])

  # check codes are valid
  valid_codes <- !is.na(country_codes)
  if (!all(valid_codes)) {
    m <- paste0("The following countries are not valid: \n\t",
                paste(country[!valid_codes], collapse =", "))
    stop(m)
  }

  # add countries to filter list
  if (replace) {
    x$filters$country <- country_codes
  } else {
    x$filters$country <- c(x$filters$country, country_codes)
  }
  x$filters$country <- sort(unique(x$filters$country))
  x$filters$state <- character()
  x$filters$county <- character()
  return(x)
}

#' @export
auk_country.auk_sampling <- function(x, country, replace = FALSE) {
  auk_country.auk_ebd(x, country, replace)
}

missing_countries <- function(x) {
  cc <- structure(c("AC", "CP", "CS", "XX", "XK", "FM"), 
                  .Names = c("ashmore and cartier islands", 
                             "clipperton island", 
                             "coral sea islands", "high seas", 
                             "kosovo", "micronesia"))
  # convert country names to codes
  name_codes <- cc[tolower(x)]
  # lookup codes
  code_codes <- cc[match(toupper(x), cc)]
  # combine, preference to codes
  out <- dplyr::coalesce(code_codes, name_codes)
  names(out) <- NULL
  out
}

================================================
FILE: R/auk-county.R
================================================
#' Filter the eBird data by county
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a set of
#' counties. This function only defines the filter and, once all filters have
#' been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param county character; counties to filter by. eBird uses county codes
#'   consisting of three parts, the 2-letter ISO country code, a 1-3 character
#'   state code, and a county code, all separated by a dash. For example,
#'   `"US-NY-109"` corresponds to Tompkins, NY, US. The easiest way to find a
#'   county code is to find the corresponding [explore
#'   region](https://ebird.org/explore) page and look at the URL.
#' @param replace logical; multiple calls to `auk_county()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of counties to
#'   filter by will be removed and replaced by that in the current call.
#' 
#' @details It is not possible to filter by county and by country or state at
#'   the same time, so calling `auk_county()` will reset these filters to all
#'   countries and states, and vice versa.
#' 
#' This function can also work with an `auk_sampling` object if the user only 
#' wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # choose tompkins county, ny, united states
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_county("US-NY-109")
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_county(ebd, "US-NY-109")
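#' 
#' # a county filter clears any country or state filters defined earlier, so
#' # the country filter below ends up empty
#' ebd_cty <- ebd |>
#'   auk_country("US") |>
#'   auk_county("US-NY-109")
#' ebd_cty$filters$country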
auk_county <- function(x, county, replace = FALSE)  {
  UseMethod("auk_county")
}

#' @export
auk_county.auk_ebd <- function(x, county, replace = FALSE) {
  # checks
  assertthat::assert_that(
    is.character(county),
    assertthat::is.flag(replace)
  )
  county <- toupper(county)
  
  # add county to filter list
  if (replace) {
    x$filters$county <- county
  } else {
    x$filters$county <- c(x$filters$county, county)
  }
  x$filters$county <- sort(unique(x$filters$county))
  x$filters$state <- character()
  x$filters$country <- character()
  return(x)
}

#' @export
auk_county.auk_sampling <- function(x, county, replace = FALSE) {
  auk_county.auk_ebd(x, county, replace)
}

================================================
FILE: R/auk-date.R
================================================
#' Filter the eBird data by date
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a range of dates.
#' This function only defines the filter and, once all filters have been
#' defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param date character or date; date range to filter by, provided either as a
#'   character vector in the format `"2015-12-31"` or a vector of Date objects. 
#'   To filter on a range of dates, regardless of year, use `"*"` in place of 
#'   the year.
#' 
#' @details To select observations from a range of dates, regardless of year, 
#' the wildcard `"*"` can be used in place of the year. For example, using 
#' `date = c("*-05-01", "*-06-30")` will return observations from May and June 
#' of *any year*. When using wildcards, dates can wrap around the year end.
#' 
#' This function can also work with an `auk_sampling` object if the user only 
#' wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_date(date = c("2010-01-01", "2010-12-31"))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_date(ebd, date = c("2010-01-01", "2010-12-31"))
#' 
#' # the * wildcard can be used in place of year to select dates from all years
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   # may-june records from all years
#'   auk_date(date = c("*-05-01", "*-06-30"))
#'   
#' # dates can also wrap around the end of the year
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   # dec-jan records from all years
#'   auk_date(date = c("*-12-01", "*-01-31"))
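#' 
#' # Date objects can be used in place of character strings
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_date(ebd, date = as.Date(c("2010-01-01", "2010-12-31")))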
auk_date <- function(x, date)  {
  UseMethod("auk_date")
}

#' @export
auk_date.auk_ebd <- function(x, date) {
  # checks
  assertthat::assert_that(
    length(date) == 2,
    is.character(date) || assertthat::is.date(date)
  )
  
  # check for wildcard in year
  has_wildcard <- stringr::str_detect(date, "^\\*-[0-9]{1,2}-[0-9]{1,2}")
  if (all(has_wildcard)) {
    # temporarily replace wildcard with 2016
    date <- stringr::str_replace(date, "^\\*", "2016")
  } else if (all(!has_wildcard)) {
    assertthat::assert_that(date[1] <= date[2])
  } else {
    stop("Cannot mix wildcard dates with non-wildcard dates.")
  }

  # convert to date object, then format as ISO standard date format
  date <- format(as.Date(date), "%Y-%m-%d")
  
  assertthat::assert_that(
    all(!is.na(date)),
    date[1] >= "1850-01-01",
    date[2] >= "1850-01-01"
  )

  # define filter
  if (all(has_wildcard)) {
    x$filters$date <- stringr::str_replace(date, "^2016", "*")
    attr(x$filters$date, "wildcard") <- TRUE
    attr(x$filters$date, "wrap") <- (date[1] > date[2])
  } else {
    x$filters$date <- date
    attr(x$filters$date, "wildcard") <- FALSE
    attr(x$filters$date, "wrap") <- FALSE
  }
  
  return(x)
}

#' @export
auk_date.auk_sampling <- function(x, date) {
  auk_date.auk_ebd(x, date)
}


================================================
FILE: R/auk-distance.R
================================================
#' Filter eBird data by distance travelled
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on the distance
#' travelled on the checklist. This function only defines the filter and, once
#' all filters have been defined, [auk_filter()] should be used to call AWK and
#' perform the filtering. Note that stationary checklists (i.e. point counts) 
#' have no distance associated with them; however, since these checklists can 
#' be assumed to have 0 distance, they will be kept if 0 is in the range 
#' defined by `distance`.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param distance numeric; 2 element vector specifying the range of distances
#'   to filter by. The default is to accept distances in kilometers, use 
#'   `distance_units = "miles"` for miles.
#' @param distance_units character; whether distances are provided in kilometers 
#'   (the default) or miles.
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # only keep checklists that are less than 10 km long
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_distance(distance = c(0, 10))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_distance(ebd, distance = c(0, 10))
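#' 
#' # distances can also be supplied in miles; they are converted to km when
#' # the filter is defined
#' auk_distance(ebd, distance = c(0, 5), distance_units = "miles")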
auk_distance <- function(x, distance, distance_units)  {
  UseMethod("auk_distance")
}

#' @export
auk_distance.auk_ebd <- function(x, distance, 
                                 distance_units = c("km", "miles")) {
  # checks
  assertthat::assert_that(
    length(distance) == 2,
    is.numeric(distance),
    distance[1] <= distance[2],
    all(distance >= 0)
  )
  
  # convert to kilometers
  distance_units <- match.arg(distance_units)
  if (distance_units == "miles") {
    distance <- 1.60934 * distance
  }
  
  # define filter
  x$filters$distance <- distance
  return(x)
}

#' @export
auk_distance.auk_sampling <- function(x, distance, 
                                      distance_units = c("km", "miles")) {
  auk_distance.auk_ebd(x, distance, distance_units)
}


================================================
FILE: R/auk-duration.R
================================================
#' Filter the eBird data by duration
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on the duration of
#' the checklist. This function only defines the filter and, once all filters
#' have been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering. Note that checklists with no effort, such as incidental 
#' observations, will be excluded if this filter is used since they have no 
#' associated duration information.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param duration integer; 2 element vector specifying the range of durations
#'   in minutes to filter by.
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # only keep checklists that are less than an hour long
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_duration(duration = c(0, 60))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_duration(ebd, duration = c(0, 60))
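#' 
#' # durations are rounded to whole minutes when the filter is defined,
#' # so this stores 0 90
#' auk_duration(ebd, duration = c(0, 90.4))$filters$duration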
auk_duration <- function(x, duration)  {
  UseMethod("auk_duration")
}

#' @export
auk_duration.auk_ebd <- function(x, duration) {
  # checks
  assertthat::assert_that(
    length(duration) == 2,
    is.numeric(duration),
    duration[1] <= duration[2],
    all(duration >= 0)
  )

  # define filter
  x$filters$duration <- as.integer(round(duration))
  return(x)
}

#' @export
auk_duration.auk_sampling <- function(x, duration) {
  auk_duration.auk_ebd(x, duration)
}


================================================
FILE: R/auk-ebd-version.R
================================================
#' Get the EBD version and associated taxonomy version
#' 
#' Based on the filename of eBird Basic Dataset (EBD) or sampling event data, 
#' determine the version (i.e. release date) of this EBD. Also determine the 
#' corresponding taxonomy version. The eBird taxonomy is updated annually in 
#' August.
#'
#' @param x filename of EBD or sampling event data file, `auk_ebd` object, or
#'   `auk_sampling` object.
#' @param check_exists logical; should the file be checked for existence before 
#'   processing. If `check_exists = TRUE` and the file does not exist, the 
#'   function will raise an error.
#'
#' @return A list with two elements:
#' 
#'   - `ebd_version`: a date object specifying the release date of the EBD.
#'   - `taxonomy_version`: the year of the taxonomy used in this EBD.
#'   
#'  Both elements will be NA if an EBD version cannot be extracted from the 
#'  filename.
#'   
#' @export
#' @family helpers
#' @examples
#' auk_ebd_version("ebd_relAug-2018.txt", check_exists = FALSE)
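#' 
#' # the taxonomy is updated in August, so releases from earlier in the year
#' # are matched to the previous year's taxonomy (2017 here)
#' auk_ebd_version("ebd_relFeb-2018.txt", check_exists = FALSE)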
auk_ebd_version <- function(x, check_exists = TRUE) {
  UseMethod("auk_ebd_version")
}

#' @export
auk_ebd_version.character <- function(x, check_exists = TRUE) {
  if (check_exists) {
    x <- ebd_file(x)
  }
  x <- basename(x)
  
  # get date from filename
  regex <- paste0("((", paste(month.abb, collapse = ")|("), "))-[0-9]{4}")
  ebd_date <- stringr::str_extract(x, regex)
  if (is.na(ebd_date)) {
    return(list(ebd_version = NA, taxonomy_version = NA))
  }
  ebd_date <- stringr::str_split(ebd_date, "-", n = 2)[[1]]
  mth <- match(ebd_date[1], month.abb)
  yr <- as.integer(ebd_date[2])
  ebd_date <- paste(yr, mth, "1", sep = "-")
  ebd_date <- as.Date(ebd_date, format = "%Y-%m-%d")
  if (is.na(ebd_date) || !inherits(ebd_date, "Date")) {
    return(list(ebd_version = NA, taxonomy_version = NA))
  }
  
  # determine taxonomy version
  if (mth < 8) {
    tax <- yr - 1
  } else {
    tax <- yr
  }
  return(list(ebd_version = ebd_date, taxonomy_version = tax))
}

#' @export
auk_ebd_version.auk_ebd <- function(x, check_exists = TRUE) {
  auk_ebd_version(x$file, check_exists = check_exists)
}

#' @export
auk_ebd_version.auk_sampling <- function(x, check_exists = TRUE) {
  auk_ebd_version(x$file, check_exists = check_exists)
}


================================================
FILE: R/auk-ebd.R
================================================
#' Reference to eBird data file
#'
#' Create a reference to an eBird Basic Dataset (EBD) file in preparation for
#' filtering using AWK.
#'
#' @param file character; input file. If file is not found as specified, it will
#'   be looked for in the directory specified by the `EBD_PATH` environment
#'   variable.
#' @param file_sampling character; optional input sampling event data (i.e.
#'   checklists) file, required if you intend to zero-fill the data to produce a
#'   presence-absence data set. This file consists of just effort information
#'   for every eBird checklist. Any species not appearing in the EBD for a given
#'   checklist is implicitly considered to have a count of 0. This file should
#'   be downloaded at the same time as the basic dataset to ensure they are in
#'   sync. If file is not found as specified, it will be looked for in the
#'   directory specified by the `EBD_PATH` environment variable.
#' @param sep character; the input field separator, the eBird data are tab
#'   separated so this should generally not be modified. Must only be a single
#'   character and space delimited is not allowed since spaces appear in many of
#'   the fields.
#'
#' @details 
#' eBird data can be downloaded as a tab-separated text file from the 
#' [eBird website](http://ebird.org/ebird/data/download) after submitting a 
#' request for access. As of February 2017, this file is nearly 150 GB, making
#' it challenging to work with. If you're only interested in a single species
#' or a small region, it is possible to submit a custom download request. This
#' approach is suggested to speed up processing time.
#'
#' There are two potential pathways for preparing eBird data. Users wishing to
#' produce presence-only data should download the 
#' [eBird Basic Dataset](http://ebird.org/ebird/data/download/) and reference 
#' this file when calling `auk_ebd()`. Users wishing to produce zero-filled,
#' presence-absence data should additionally download the sampling event data
#' file associated with the basic dataset. This file contains only checklist
#' information and can be used to infer absences. The sampling event data file
#' should be provided to `auk_ebd()` via the `file_sampling` argument. For
#' further details consult the vignettes.
#'
#' @return An `auk_ebd` object storing the file reference and the desired
#'   filters once created with other package functions.
#' @export
#' @family objects
#' @examples
#' # get the path to the example data included in the package
#' # in practice, provide path to ebd, e.g. f <- "data/ebd_relFeb-2018.txt"
#' f <- system.file("extdata/ebd-sample.txt", package = "auk")
#' auk_ebd(f)
#' # to produce zero-filled data, provide a checklist file
#' f_ebd <- system.file("extdata/zerofill-ex_ebd.txt", package = "auk")
#' f_cl <- system.file("extdata/zerofill-ex_sampling.txt", package = "auk")
#' auk_ebd(f_ebd, file_sampling = f_cl)
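#' 
#' # the returned object records which columns were identified for filtering;
#' # inspecting col_idx is a quick way to check the header was parsed as expected
#' ebd <- auk_ebd(f)
#' ebd$col_idx[!is.na(ebd$col_idx$id), ]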
auk_ebd <- function(file, file_sampling, sep = "\t") {
  # checks
  assertthat::assert_that(
    assertthat::is.string(sep), nchar(sep) == 1, sep != " "
  )
  file <- ebd_file(file)
  # read header rows
  header <- tolower(get_header(file, sep))
  header <- stringr::str_replace_all(header, "[^a-z0-9]+", " ")
  # fix for custom download
  header[header == "state province"] <- "state"
  header[header == "subnational1 code"] <- "state code"
  col_idx <- data.frame(id = NA_character_, 
                        name = header, 
                        index = seq_along(header),
                        stringsAsFactors = FALSE)
  
  # check column name for protocol column
  protocol_col_name <- "protocol name"
  if (!protocol_col_name %in% header) {
    protocol_col_name <- "protocol type"
  }
  # check column name for project column
  project_col_name <- "project names"
  if (!project_col_name %in% header) {
    project_col_name <- "project code"
  }
  
  # ensure key columns are present
  mandatory <- c("scientific name",
                 "country code", "state code",
                 "latitude", "longitude",
                 "observation date", "time observations started",
                 protocol_col_name,
                 "exotic code",
                 "duration minutes", "effort distance km",
                 "all species reported",
                 "observer id",
                 "sampling event identifier", "group identifier")
  col_miss <- mandatory[!(mandatory %in% header)]
  if (length(col_miss) > 0) {
    m <- sprintf("Required columns missing from the EBD file:\n\t%s",
                 paste(col_miss, collapse = "\n\t"))
    stop(m)
  }
  
  # identify columns required for filtering
  filter_cols <- data.frame(
    id = c("species",
           "country", "state", "county", "bcr",
           "lat", "lng", 
           "date", "time", "last_edited",
           "protocol", "project", 
           "duration", "distance", 
           "breeding", "exotic", 
           "complete",
           "observer"),
    name = c("scientific name",
             "country code", "state code", "county code", "bcr code", 
             "latitude", "longitude",
             "observation date", "time observations started",
             "last edited date", 
             protocol_col_name, project_col_name,
             "duration minutes", "effort distance km",
             "breeding code",
             "exotic code",
             "all species reported",
             "observer id"),
    stringsAsFactors = FALSE)
  filter_cols <- filter_cols[filter_cols$name %in% col_idx$name, ]
  col_idx$id[match(filter_cols$name, col_idx$name)] <- filter_cols$id
  
  # process sampling data header
  if (!missing(file_sampling)) {
    file_sampling <- ebd_file(file_sampling)
    # variables not in sampling data
    not_in_sampling <- c("species", "breeding", "exotic")
    filter_cols_sampling <- filter_cols[!filter_cols$id %in% not_in_sampling, ]
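    # read header rows, normalizing names the same way as the EBD header
    header_sampling <- tolower(get_header(file_sampling, sep))
    header_sampling <- stringr::str_replace_all(header_sampling,
                                                "[^a-z0-9]+", " ")
    header_sampling[header_sampling == "state province"] <- "state"
    header_sampling[header_sampling == "subnational1 code"] <- "state code"
    # ensure key columns are present in the sampling file's own header;
    # species-level columns only appear in the EBD
    mandatory_sampl <- setdiff(mandatory, c("scientific name", "exotic code"))
    col_miss <- mandatory_sampl[!(mandatory_sampl %in% header_sampling)]
    if (length(col_miss) > 0) {
      m <- sprintf("Required columns missing from the sampling file:\n\t%s",
                   paste(col_miss, collapse = "\n\t"))
      stop(m)
    }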
    # identify column locations
    col_idx_sampling <- data.frame(id = NA_character_, 
                                   name = header_sampling, 
                                   index = seq_along(header_sampling),
                                   stringsAsFactors = FALSE)
    col_found <- filter_cols_sampling$name %in% col_idx_sampling$name
    filter_cols_sampling <- filter_cols_sampling[col_found, ]
    mtch <- match(filter_cols_sampling$name, col_idx_sampling$name)
    col_idx_sampling$id[mtch] <- filter_cols_sampling$id
  } else {
    file_sampling <- NULL
    col_idx_sampling <- NULL
  }

  # output
  structure(
    list(
      file = file,
      file_sampling = file_sampling,
      output = NULL,
      output_sampling = NULL,
      col_idx = col_idx,
      col_idx_sampling = col_idx_sampling,
      filters = list(
        species = character(),
        country = character(),
        state = character(),
        county = character(),
        bcr = integer(),
        bbox = numeric(),
        year = integer(),
        date = character(),
        time = character(),
        last_edited = character(),
        protocol = character(), 
        project = character(),
        duration = numeric(),
        distance = numeric(),
        breeding = FALSE,
        exotic = character(),
        complete = FALSE,
        observer = character()
      )
    ),
    class = "auk_ebd"
  )
}

#' @export
print.auk_ebd <- function(x, ...) {
  cat("Input \n")
  cat(paste("  EBD:", x$file, "\n"))
  if (!is.null(x$file_sampling)) {
    cat(paste("  Sampling events:", x$file_sampling, "\n"))
  }
  cat("\n")

  cat("Output \n")
  if (is.null(x$output)) {
    cat("  Filters not executed\n")
  } else {
    cat(paste("  EBD:", x$output, "\n"))
    if (!is.null(x$output_sampling)) {
      cat(paste("  Sampling events:", x$output_sampling, "\n"))
    }
  }
  cat("\n")

  cat("Filters \n")
  # species filter
  cat("  Species: ")
  if (length(x$filters$species) == 0) {
    cat("all")
  } else if (length(x$filters$species) <= 10) {
    cat(paste(x$filters$species, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$species), " species"))
  }
  cat("\n")
  # country filter
  cat("  Countries: ")
  if (length(x$filters$country) == 0) {
    cat("all")
  } else if (length(x$filters$country) <= 10) {
    cat(paste(x$filters$country, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$country), " countries"))
  }
  cat("\n")
  # state filter
  cat("  States: ")
  if (length(x$filters$state) == 0) {
    cat("all")
  } else if (length(x$filters$state) <= 10) {
    cat(paste(x$filters$state, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$state), " states"))
  }
  cat("\n")
  # county filter
  cat("  Counties: ")
  if (length(x$filters$county) == 0) {
    cat("all")
  } else if (length(x$filters$county) <= 10) {
    cat(paste(x$filters$county, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$county), " counties"))
  }
  cat("\n")
  # bcr filter
  cat("  BCRs: ")
  if (length(x$filters$bcr) == 0) {
    cat("all")
  } else if (length(x$filters$bcr) <= 10) {
    cat(paste(x$filters$bcr, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$bcr), " BCRs"))
  }
  cat("\n")
  # bbox filter
  cat("  Bounding box: ")
  e <- round(x$filters$bbox, 1)
  if (length(e) == 0) {
    cat("full extent")
  } else {
    cat(paste0("Lon ", e[1], " - ", e[3], "; "))
    cat(paste0("Lat ", e[2], " - ", e[4]))
  }
  cat("\n")
  # year filter
  cat("  Years: ")
  if (length(x$filters$year) == 0) {
    cat("all")
  } else if (length(x$filters$year) <= 10) {
    cat(paste(x$filters$year, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$year), " years"))
  }
  cat("\n")
  # date filter
  cat("  Date: ")
  if (length(x$filters$date) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$date[1], " - ", x$filters$date[2]))
  }
  cat("\n")
  # time filter
  cat("  Start time: ")
  if (length(x$filters$time) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$time[1], "-", x$filters$time[2]))
  }
  cat("\n")
  # last edited date filter
  cat("  Last edited date: ")
  if (length(x$filters$last_edited) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$last_edited[1], " - ", x$filters$last_edited[2]))
  }
  cat("\n")
  # protocol filter
  cat("  Protocol: ")
  if (length(x$filters$protocol) == 0) {
    cat("all")
  } else {
    cat(paste(x$filters$protocol, collapse = ", "))
  }
  cat("\n")
  # project filter
  cat("  Project code: ")
  if (length(x$filters$project) == 0) {
    cat("all")
  } else {
    cat(paste(x$filters$project, collapse = ", "))
  }
  cat("\n")
  # duration filter
  cat("  Duration: ")
  if (length(x$filters$duration) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$duration[1], "-", x$filters$duration[2], " minutes"))
  }
  cat("\n")
  # distance filter
  cat("  Distance travelled: ")
  if (length(x$filters$distance) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$distance[1], "-", x$filters$distance[2], " km"))
  }
  cat("\n")
  # breeding codes
  cat("  Records with breeding codes only: ")
  if (x$filters$breeding) {
    cat("yes")
  } else {
    cat("no")
  }
  cat("\n")
  # exotic code
  cat("  Exotic Codes: ")
  if (length(x$filters$exotic) %in% c(0, 4)) {
    cat("all")
  } else {
    ex_codes <- dplyr::recode(x$filters$exotic,
                              "N" = "Naturalized",
                              "P" = "Provisional",
                              "X" = "Escapee")
    ex_codes <- ifelse(ex_codes == "", "Native", ex_codes)
    cat(paste(ex_codes, collapse = ", "))
  }
  cat("\n")
  # complete checklists only
  cat("  Complete checklists only: ")
  if (x$filters$complete) {
    cat("yes")
  } else {
    cat("no")
  }
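  cat("\n")
  # observer filter
  cat("  Observers: ")
  if (length(x$filters$observer) == 0) {
    cat("all")
  } else if (length(x$filters$observer) <= 10) {
    cat(paste(x$filters$observer, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$observer), " observers"))
  }
  cat("\n")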
  return(invisible(x))
}


================================================
FILE: R/auk-exotic.R
================================================
#' Filter the eBird data by exotic code
#'
#' Exotic codes are applied to eBird observations when the species is believed to
#' be non-native to the given location. This function defines a filter for the
#' eBird Basic Dataset (EBD) to subset observations to one or more of the exotic
#' codes: "" (i.e. no code, meaning it is a native species), "N" (naturalized),
#' "P" (provisional), or "X" (escapee). This function only defines the filter
#' and, once all filters have been defined, [auk_filter()] should be used to
#' call AWK and perform the filtering.
#' 
#' @param x `auk_ebd` object; reference to file created by [auk_ebd()]. Exotic
#'   codes are not present in the sampling event data.
#' @param exotic_code character; exotic codes to filter by. Note that an empty
#'   string (""), meaning no exotic code, is used for native species.
#' @param replace logical; multiple calls to `auk_exotic()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of exotic codes
#'   to filter by will be removed and replaced by that in the current call.
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # filter to only native observations
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_exotic(ebd, exotic_code = "")
#' 
#' # filter to native and naturalized observations
#' auk_exotic(ebd, exotic_code = c("", "N"))
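#' 
#' # calls are additive by default; use replace = TRUE to discard the codes
#' # set by previous calls and keep only those in the current call
#' ebd_nat <- auk_exotic(ebd, exotic_code = "N")
#' auk_exotic(ebd_nat, exotic_code = "X", replace = TRUE)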
auk_exotic <- function(x, exotic_code, replace = FALSE)  {
  UseMethod("auk_exotic")
}

#' @export
auk_exotic.auk_ebd <- function(x, exotic_code, replace = FALSE) {
  # checks
  assertthat::assert_that(
    all(exotic_code %in% c("", "N", "P", "X")),
    assertthat::is.flag(replace)
  )
  
  # check for exotic code column
  if (!"exotic" %in% x$col_idx$id) {
    stop("Exotic code column missing from EBD")
  }
  
  # set filter list
  if (replace) {
    x$filters$exotic <- exotic_code
  } else {
    x$filters$exotic <- c(x$filters$exotic, exotic_code)
  }
  x$filters$exotic <- sort(unique(x$filters$exotic))
  return(x)
}

================================================
FILE: R/auk-filter.R
================================================
#' Filter the eBird file using AWK
#'
#' Convert the filters defined in an `auk_ebd` object into an AWK script and run
#' this script to produce a filtered eBird Basic Dataset (EBD). The initial
#' creation of the `auk_ebd` object should be done with [auk_ebd()] and filters
#' can be defined using the various other functions in this package, e.g.
#' [auk_species()] or [auk_country()]. **Note that this function typically takes
#' at least a couple of hours to run on the full dataset.**
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param file character; output file.
#' @param file_sampling character; output file for the sampling event data.
#'   Required if the `auk_ebd` object references a sampling file and
#'   `filter_sampling = TRUE`.
#' @param keep character; a character vector specifying the names of the columns
#'   to keep in the output file. Columns should be as they appear in the header
#'   of the EBD; however, names are not case sensitive and spaces may be
#'   replaced by underscores, e.g. `"COMMON NAME"`, `"common name"`, and
#'   `"common_NAME"` are all valid.
#' @param drop character; a character vector of columns to drop in the same
#'   format as `keep`. Ignored if `keep` is supplied.
#' @param awk_file character; output file to optionally save the awk script to.
#' @param sep character; the input field separator. eBird files are tab
#'   separated, so the default is a tab. Must be a single character; a space
#'   is not allowed because spaces appear within many of the fields.
#' @param filter_sampling logical; whether the sampling event data should also
#'   be filtered.
#' @param execute logical; whether to execute the awk script, or output it to a
#'   file for manual execution. If this flag is `FALSE`, `awk_file` must be
#'   provided.
#' @param overwrite logical; overwrite the output files if they already exist.
#' @param ... arguments passed on to methods.
#'
#' @details
#' If a sampling file is provided in the [auk_ebd][auk_ebd()] object, this
#' function will filter both the eBird Basic Dataset and the sampling data using
#' the same set of filters. This ensures that the files are in sync, i.e. that
#' they contain data on the same set of checklists.
#'
#' The AWK script can be saved for future reference by providing an output
#' filename to `awk_file`. By default, this function generates and runs the AWK
#' script; however, setting `execute = FALSE` generates the AWK script without
#' running it. In this case, `file` is ignored, `awk_file` must be specified,
#' and only the script for the EBD is written (the sampling data are not
#' filtered).
#'
#' Calling this function requires that the command line utility AWK is
#' installed. Linux and Mac machines should have AWK by default; Windows users
#' will likely need to install [Cygwin](https://www.cygwin.com).
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the input, with the
#'   output files set. If `execute = FALSE`, the path to the AWK script is
#'   returned instead.
#' @export
#' @family filter
#' @examples
#' # get the path to the example data included in the package
#' # in practice, provide path to ebd, e.g. f <- "data/ebd_relFeb-2018.txt"
#' f <- system.file("extdata/ebd-sample.txt", package = "auk")
#' # define filters
#' filters <- auk_ebd(f) |>
#'   auk_species(species = c("Canada Jay", "Blue Jay")) |>
#'   auk_country(country = c("US", "Canada")) |>
#'   auk_bbox(bbox = c(-100, 37, -80, 52)) |>
#'   auk_date(date = c("2012-01-01", "2012-12-31")) |>
#'   auk_time(start_time = c("06:00", "09:00")) |>
#'   auk_duration(duration = c(0, 60)) |>
#'   auk_complete()
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' filters <- auk_species(ebd, species = c("Canada Jay", "Blue Jay"))
#' filters <- auk_country(filters, country = c("US", "Canada"))
#' filters <- auk_bbox(filters, bbox = c(-100, 37, -80, 52))
#' filters <- auk_date(filters, date = c("2012-01-01", "2012-12-31"))
#' filters <- auk_time(filters, start_time = c("06:00", "09:00"))
#' filters <- auk_duration(filters, duration = c(0, 60))
#' filters <- auk_complete(filters)
#' 
#' # apply filters
#' \dontrun{
#' # output to a temp file for example
#' # in practice, provide path to output file
#' # e.g. f_out <- "output/ebd_filtered.txt"
#' f_out <- tempfile()
#' filtered <- auk_filter(filters, file = f_out)
#' str(read_ebd(filtered))
#' }
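#' 
#' # generate the AWK script without running it, e.g. to inspect it or to run
#' # AWK manually; AWK does not need to be installed for this step
#' awk_script <- auk_filter(filters, awk_file = tempfile(fileext = ".awk"),
#'                          execute = FALSE)
#' writeLines(readLines(awk_script))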
auk_filter <- function(x, file, ...) {
  UseMethod("auk_filter")
}

#' @export
#' @describeIn auk_filter `auk_ebd` object
auk_filter.auk_ebd <- function(x, file, file_sampling, keep, drop, awk_file,
                               sep = "\t", filter_sampling = TRUE, 
                               execute = TRUE, overwrite = FALSE, ...) {
  # checks
  awk_path <- auk_get_awk_path()
  if (execute && is.na(awk_path)) {
    stop("auk_filter() requires a valid AWK install, unless execute = FALSE.")
  }
  assertthat::assert_that(
    file.exists(x$file),
    is.null(x$file_sampling) || file.exists(x$file_sampling),
    assertthat::is.flag(execute),
    !execute || assertthat::is.string(file),
    missing(awk_file) || assertthat::is.string(awk_file),
    assertthat::is.string(sep), nchar(sep) == 1, sep != " ",
    missing(keep) || is.character(keep),
    missing(drop) || is.character(drop),
    assertthat::is.flag(filter_sampling),
    assertthat::is.flag(overwrite)
  )
  if (!execute && missing(awk_file)) {
    stop("awk_file must be set when execute is FALSE.")
  }
  
  # check output file
  if (!missing(file)) {
    if (!dir.exists(dirname(file))) {
      stop("Output directory doesn't exist.")
    }
    if (!overwrite && file.exists(file) && execute) {
      stop("Output file already exists, use overwrite = TRUE.")
    }
    file <- normalizePath(file, winslash = "/", mustWork = FALSE)
  }
  # check output awk file
  if (!missing(awk_file) && !dir.exists(dirname(awk_file))) {
    stop("Output directory for awk file doesn't exist.")
  }
  # check output sampling file
  if (is.null(x$file_sampling) || !execute || !filter_sampling) {
    filter_sampling <- FALSE
  }
  if (filter_sampling && missing(file_sampling)) {
    stop("An output file for the sampling data must be provided, ",
         "unless filter_sampling is FALSE.")
  } else if (filter_sampling) {
    if (!dir.exists(dirname(file_sampling))) {
      stop("Output directory for sampling file doesn't exist.")
    }
    if (!overwrite && file.exists(file_sampling) && execute) {
      stop("Output sampling file already exists, use overwrite = TRUE.")
    }
    file_sampling <- normalizePath(file_sampling, winslash = "/", 
                                   mustWork = FALSE)
  }
  # zero-filling requires complete checklists
  if (filter_sampling && !x$filters$complete) {
    w <- paste0("Sampling event data file provided, but filters have not been ",
                "set to only return complete checklists. Complete checklists ",
                "are required for zero-filling. You may want to use ",
                "auk_complete(), or manually filter out incomplete checklists.")
    warning(w)
  }
  
  # pick columns to retain
  must_keep <- c("group identifier", "sampling event identifier",
                 "observer id",
                 "scientific name", "observation count")
  if (!missing(keep)) {
    keep <- tolower(keep)
    keep <- stringr::str_replace_all(keep, "_", " ")
    stopifnot(all(keep %in% x$col_idx$name))
    if (!all(must_keep %in% keep)) {
      m <- paste("The following columns must be retained:",
                 paste(setdiff(must_keep, keep), collapse = ", "))
      stop(m)
    }
    idx <- x$col_idx$index[x$col_idx$name %in% keep]
    select_cols <- paste0("$", idx, collapse = ", ")
  } else if (!missing(drop)) {
    drop <- tolower(drop)
    drop <- stringr::str_replace_all(drop, "_", " ")
    drop <- stringr::str_replace_all(drop, "/", " ")
    stopifnot(all(drop %in% x$col_idx$name))
    if (any(must_keep %in% drop)) {
      m <- paste("The following columns must be retained:",
                 paste(intersect(must_keep, drop), collapse = ", "))
      stop(m)
    }
    idx <- x$col_idx$index[!x$col_idx$name %in% drop]
    select_cols <- paste0("$", idx, collapse = ", ")
  } else {
    select_cols <- "$0"
  }
  
  # create awk script for the ebd
  awk_script <- awk_translate(filters = x$filters,
                              col_idx = x$col_idx,
                              sep = sep,
                              select = select_cols)
  # create awk script for the ebd sampling data
  if (filter_sampling) {
    # pick columns to retain
    if (!missing(keep)) {
      keep <- tolower(keep)
      keep <- stringr::str_replace_all(keep, "_", " ")
      idx <- x$col_idx_sampling$index[x$col_idx_sampling$name %in% keep]
      select_cols <- paste0("$", idx, collapse = ", ")
    } else if (!missing(drop)) {
      drop <- tolower(drop)
      drop <- stringr::str_replace_all(drop, "_", " ")
      idx <- x$col_idx_sampling$index[!x$col_idx_sampling$name %in% drop]
      select_cols <- paste0("$", idx, collapse = ", ")
    } else {
      select_cols <- "$0"
    }
    
    # remove species-level filters, which don't apply to the sampling data
    s_filters <- x$filters
    s_filters$species <- character()
    s_filters$breeding <- FALSE
    s_filters$exotic <- character()
    # observer ids are matched with an "obs" prefix, rather than "obsr", in
    # the sampling data
    s_filters$observer <- stringr::str_replace(s_filters$observer, 
                                               "^obsr", "obs")
    awk_script_sampling <- awk_translate(filters = s_filters,
                                         col_idx = x$col_idx_sampling,
                                         sep = sep,
                                         select = select_cols)
  }
  
  # output awk file
  if (!missing(awk_file)) {
    writeLines(awk_script, awk_file)
    if (!execute) {
      return(normalizePath(awk_file, winslash = "/", mustWork = FALSE))
    }
  }
  
  # run awk
  # sampling event data
  if (filter_sampling) {
    exit_code <- system2(awk_path,
                         args = paste0("'", awk_script_sampling, "' '",
                                       x$file_sampling, "'"),
                         stdout = file_sampling, stderr = FALSE)
    if (exit_code != 0) {
      stop("Error running AWK command.")
    } else {
      x$output_sampling <- normalizePath(file_sampling, winslash = "/")
    }
  }
  
  # ebd
  exit_code <- system2(awk_path,
                       args = paste0("'", awk_script, "' '", x$file, "'"),
                       stdout = file, stderr = FALSE)
  if (exit_code != 0) {
    stop("Error running AWK command.")
  } else {
    x$output <- normalizePath(file, winslash = "/")
  }
  return(x)
}

#' @export
#' @describeIn auk_filter `auk_sampling` object
auk_filter.auk_sampling <- function(x, file, keep, drop, awk_file,
                                    sep = "\t", execute = TRUE, 
                                    overwrite = FALSE, ...) {
  # checks
  awk_path <- auk_get_awk_path()
  if (execute && is.na(awk_path)) {
    stop("auk_filter() requires a valid AWK install, unless execute = FALSE.")
  }
  assertthat::assert_that(
    file.exists(x$file),
    assertthat::is.flag(execute),
    !execute || assertthat::is.string(file),
    missing(awk_file) || assertthat::is.string(awk_file),
    assertthat::is.string(sep), nchar(sep) == 1, sep != " ",
    missing(keep) || is.character(keep),
    missing(drop) || is.character(drop),
    assertthat::is.flag(overwrite)
  )
  if (!execute && missing(awk_file)) {
    stop("awk_file must be set when execute is FALSE.")
  }
  
  # check output file
  if (!missing(file)) {
    if (!dir.exists(dirname(file))) {
      stop("Output directory doesn't exist.")
    }
    if (!overwrite && file.exists(file) && execute) {
      stop("Output file already exists, use overwrite = TRUE.")
    }
    file <- normalizePath(file, winslash = "/", mustWork = FALSE)
  }
  # check output awk file
  if (!missing(awk_file) && !dir.exists(dirname(awk_file))) {
    stop("Output directory for awk file doesn't exist.")
  }
  
  # pick columns to retain
  must_keep <- c("group identifier", "sampling event identifier", "observer id")
  if (!missing(keep)) {
    keep <- tolower(keep)
    keep <- stringr::str_replace_all(keep, "_", " ")
    stopifnot(all(keep %in% x$col_idx$name))
    if (!all(must_keep %in% keep)) {
      m <- paste("The following columns must be retained:",
                 paste(setdiff(must_keep, keep), collapse = ", "))
      stop(m)
    }
    idx <- x$col_idx$index[x$col_idx$name %in% keep]
    select_cols <- paste0("$", idx, collapse = ", ")
  } else if (!missing(drop)) {
    drop <- tolower(drop)
    drop <- stringr::str_replace_all(drop, "_", " ")
    stopifnot(all(drop %in% x$col_idx$name))
    if (any(must_keep %in% drop)) {
      m <- paste("The following columns must be retained:",
                 paste(intersect(must_keep, drop), collapse = ", "))
      stop(m)
    }
    idx <- x$col_idx$index[!x$col_idx$name %in% drop]
    select_cols <- paste0("$", idx, collapse = ", ")
  } else {
    select_cols <- "$0"
  }
  
  # fix observer filter
  x$filters$observer <- stringr::str_replace(x$filters$observer, 
                                             "^obsr", "obs")
  
  # create awk script for the sampling event file
  awk_script <- awk_translate(filters = x$filters,
                              col_idx = x$col_idx,
                              sep = sep,
                              select = select_cols)
  
  # output awk file
  if (!missing(awk_file)) {
    writeLines(awk_script, awk_file)
    if (!execute) {
      return(normalizePath(awk_file, winslash = "/"))
    }
  }
  
  # run awk
  # sampling event data
  exit_code <- system2(awk_path,
                       args = paste0("'", awk_script, "' '", x$file, "'"),
                       stdout = file, stderr = FALSE)
  if (exit_code != 0) {
    stop("Error running AWK command.")
  } else {
    x$output <- normalizePath(file, winslash = "/")
  }
  return(x)
}

awk_translate <- function(filters, col_idx, sep, select) {
  if (missing(select)) {
    select <- "$0"
  }
  # only keep filter columns
  col_idx <- col_idx[!is.na(col_idx$id), ]
  # set up filters
  filter_strings <- list(sep = sep, select = select)
  # species filter
  if (!"species" %in% names(filters) || length(filters$species) == 0) {
    filter_strings$species_array <- ""
    filter_strings$species <- ""
  } else {
    # generate list
    species_list <- paste(filters$species, collapse = "\t")
    species_array <- "
    split(\"%s\", speciesValues, \"\t\")
    for (i in speciesValues) species[speciesValues[i]] = 1"
    filter_strings$species_array <- sprintf(species_array, species_list)
    
    # check in list
    idx <- col_idx$index[col_idx$id == "species"]
    condition <- paste0("$", idx, " in species")
    filter_strings$species <- str_interp(awk_if, list(condition = condition))
  }
  # country filter
  if (length(filters$country) == 0) {
    filter_strings$country_array <- ""
    filter_strings$country <- ""
  } else {
    # generate list
    country_list <- paste(filters$country, collapse = "\t")
    country_array <- "
    split(\"%s\", countryValues, \"\t\")
    for (i in countryValues) countries[countryValues[i]] = 1"
    filter_strings$country_array <- sprintf(country_array, country_list)
    
    # check in list
    idx <- col_idx$index[col_idx$id == "country"]
    condition <- paste0("$", idx, " in countries")
    filter_strings$country <- str_interp(awk_if, list(condition = condition))
  }
  # state filter
  if (length(filters$state) == 0) {
    filter_strings$state_array <- ""
    filter_strings$state <- ""
  } else {
    # generate list
    state_list <- paste(filters$state, collapse = "\t")
    state_array <- "
    split(\"%s\", stateValues, \"\t\")
    for (i in stateValues) states[stateValues[i]] = 1"
    filter_strings$state_array <- sprintf(state_array, state_list)
    
    # check in list
    idx <- col_idx$index[col_idx$id == "state"]
    condition <- paste0("$", idx, " in states")
    filter_strings$state <- str_interp(awk_if, list(condition = condition))
  }
  # county filter
  if (length(filters$county) == 0) {
    filter_strings$county_array <- ""
    filter_strings$county <- ""
  } else {
    # generate list
    county_list <- paste(filters$county, collapse = "\t")
    county_array <- "
    split(\"%s\", countyValues, \"\t\")
    for (i in countyValues) county[countyValues[i]] = 1"
    filter_strings$county_array <- sprintf(county_array, county_list)
    
    # check in list
    idx <- col_idx$index[col_idx$id == "county"]
    condition <- paste0("$", idx, " in county")
    filter_strings$county <- str_interp(awk_if, list(condition = condition))
  }
  # bcr filter
  if (length(filters$bcr) == 0) {
    filter_strings$bcr <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "bcr"]
    condition <- paste0("$", idx, " == \"", filters$bcr, "\"",
                        collapse = " || ")
    filter_strings$bcr <- str_interp(awk_if, list(condition = condition))
  }
  # bbox filter
  if (length(filters$bbox) == 0) {
    filter_strings$bbox <- ""
  } else {
    lat_idx <- col_idx$index[col_idx$id == "lat"]
    lng_idx <- col_idx$index[col_idx$id == "lng"]
    condition_tmpl <- paste0("$${lng_idx} >= ${xmn} && ",
                             "$${lng_idx} <= ${xmx} && ",
                             "$${lat_idx} >= ${ymn} && ",
                             "$${lat_idx} <= ${ymx}")
    condition <- str_interp(condition_tmpl,
                            list(lat_idx = lat_idx, lng_idx = lng_idx,
                                 xmn = filters$bbox[1], xmx = filters$bbox[3],
                                 ymn = filters$bbox[2], ymx = filters$bbox[4]))
    filter_strings$bbox <- str_interp(awk_if, list(condition = condition))
  }
  # year filter
  if (length(filters$year) == 0) {
    filter_strings$year_substr <- ""
    filter_strings$year <- ""
  } else {
    # extract just the year with awk
    idx <- col_idx$index[col_idx$id == "date"]
    filter_strings$year_substr <- sprintf("yr = substr($%i, 1, 4)", idx)
    # subset to set of years
    condition <- paste0("yr == ", filters$year, collapse = " || ")
    filter_strings$year <- str_interp(awk_if, list(condition = condition))
  }
  # date filter
  if (length(filters$date) == 0) {
    filter_strings$date_substr <- ""
    filter_strings$date <- ""
  } else if (isTRUE(attr(filters$date, "wildcard"))) {
    # extract just the month and day with awk
    idx <- col_idx$index[col_idx$id == "date"]
    filter_strings$date_substr <- sprintf("monthday = substr($%i, 6, 5)", idx)
    # remove the wildcard part of date
    dates <- stringr::str_replace(filters$date, "^\\*-", "")
    lo_wrap <- if (attr(filters$date, "wrap")) {
      "||"
    } else {
      "&&" 
    }
    condition <- str_interp("monthday >= \"${mn}\" ${lo} monthday <= \"${mx}\"",
                            list(mn = dates[1], mx = dates[2],
                                 lo = lo_wrap))
    filter_strings$date <- str_interp(awk_if, list(condition = condition))
  } else {
    filter_strings$date_substr <- ""
    idx <- col_idx$index[col_idx$id == "date"]
    condition <- str_interp("$${idx} >= \"${mn}\" && $${idx} <= \"${mx}\"",
                            list(idx = idx,
                                 mn = filters$date[1],
                                 mx = filters$date[2]))
    filter_strings$date <- str_interp(awk_if, list(condition = condition))
  }
  # time filter
  if (length(filters$time) == 0) {
    filter_strings$time <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "time"]
    condition <- str_interp("$${idx} >= \"${mn}\" && $${idx} <= \"${mx}\"",
                            list(idx = idx,
                                 mn = filters$time[1],
                                 mx = filters$time[2]))
    filter_strings$time <- str_interp(awk_if, list(condition = condition))
  }
  # last edited date filter
  if (length(filters$last_edited) == 0) {
    filter_strings$last_edited <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "last_edited"]
    condition <- str_interp("$${idx} >= \"${mn}\" && $${idx} <= \"${mx}\"",
                            list(idx = idx,
                                 mn = filters$last_edited[1],
                                 mx = filters$last_edited[2]))
    filter_strings$last_edited <- str_interp(awk_if,
                                             list(condition = condition))
  }
  # project filter
  if (length(filters$project) == 0) {
    filter_strings$project <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "project"]
    condition <- paste0("$", idx, " ~ \"", filters$project, "\"",
                        collapse = " || ")
    filter_strings$project <- str_interp(awk_if, list(condition = condition))
  }
  # protocol filter
  if (length(filters$protocol) == 0) {
    filter_strings$protocol <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "protocol"]
    condition <- paste0("$", idx, " == \"", filters$protocol, "\"",
                        collapse = " || ")
    filter_strings$protocol <- str_interp(awk_if, list(condition = condition))
  }
  # duration filter
  if (length(filters$duration) == 0) {
    filter_strings$duration <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "duration"]
    condition <- str_interp("$${idx} >= ${mn} && $${idx} <= ${mx}",
                            list(idx = idx,
                                 mn = filters$duration[1],
                                 mx = filters$duration[2]))
    filter_strings$duration <- str_interp(awk_if, list(condition = condition))
  }
  # distance filter
  if (length(filters$distance) == 0) {
    filter_strings$distance <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "distance"]
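    # stationary checklists have no effort distance, so include them whenever
    # the lower bound of the distance range is (effectively) zero
    if (filters$distance[1] <= 0.0001) {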
      p_idx <- col_idx$index[col_idx$id == "protocol"]
      inc_stat <- str_interp("$${idx} == \"Stationary\"",
                             list(idx = p_idx))
      condition <- str_interp(
        "${inc} || ($${idx} >= ${mn} && $${idx} <= ${mx})",
        list(idx = idx,
             mn = filters$distance[1],
             mx = filters$distance[2],
             inc = inc_stat))
    } else {
      condition <- str_interp("$${idx} >= ${mn} && $${idx} <= ${mx}",
                              list(idx = idx,
                                   mn = filters$distance[1],
                                   mx = filters$distance[2]))
    }
    filter_strings$distance <- str_interp(awk_if, list(condition = condition))
  }
  # breeding records only
  if ("breeding" %in% names(filters) && filters$breeding) {
    idx <- col_idx$index[col_idx$id == "breeding"]
    condition <- str_interp("$${idx} != \"\"", list(idx = idx))
    filter_strings$breeding <- str_interp(awk_if, list(condition = condition))
  } else {
    filter_strings$breeding <- ""
  }
  # exotic codes
  if (length(filters$exotic) == 0) {
    filter_strings$exotic <- ""
  } else {
    idx <- col_idx$index[col_idx$id == "exotic"]
    condition <- paste0("$", idx, " == \"", filters$exotic, "\"",
                        collapse = " || ")
    filter_strings$exotic <- str_interp(awk_if, list(condition = condition))
  }
  # complete checklists only
  if (filters$complete) {
    idx <- col_idx$index[col_idx$id == "complete"]
    condition <- str_interp("$${idx} == 1", list(idx = idx))
    filter_strings$complete <- str_interp(awk_if, list(condition = condition))
  } else {
    filter_strings$complete <- ""
  }
  # observer filter
  if (length(filters$observer) == 0) {
    filter_strings$observer_array <- ""
    filter_strings$observer <- ""
  } else {
    # generate list
    observer_list <- paste(filters$observer, collapse = "\t")
    observer_array <- "
    split(\"%s\", observerValues, \"\t\")
    for (i in observerValues) observers[observerValues[i]] = 1"
    filter_strings$observer_array <- sprintf(observer_array, observer_list)
    
    # check in list
    idx <- col_idx$index[col_idx$id == "observer"]
    condition <- paste0("$", idx, " in observers")
    filter_strings$observer <- str_interp(awk_if, list(condition = condition))
  }
  
  # generate awk script
  str_interp(awk_filter, filter_strings)
}

# awk script template
awk_filter <- "
BEGIN {
  FS = OFS = \"${sep}\"

  ${species_array}
  ${country_array}
  ${state_array}
  ${county_array}
  ${observer_array}
}
{
  keep = 1

  # filters
  ${species}
  ${country}
  ${state}
  ${county}
  ${bcr}
  ${bbox}
  ${year_substr}
  ${year}
  ${date_substr}
  ${date}
  ${time}
  ${last_edited}
  ${protocol}
  ${project}
  ${duration}
  ${distance}
  ${breeding}
  ${exotic}
  ${complete}
  ${observer}

  # keeps header
  if (NR == 1) {
    keep = 1
  }

  if (keep == 1) {
    print ${select}
  }
}
"

awk_if <- "
  if (keep == 1 && (${condition})) {
    keep = 1
  } else {
    keep = 0
  }
"
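
# Sketch (hypothetical helpers, not part of auk): how a multi-value filter,
# such as the exotic-code filter above, becomes an AWK condition. Each value
# is compared to field `$idx` and the comparisons are OR-ed together. The
# condition is then placed in an `awk_if`-style block, so a row is only kept
# if every earlier filter also passed (`keep == 1 && ...`). Base R sprintf()
# stands in for stringr::str_interp() here, and the generated AWK is written
# on one line for readability.
sketch_awk_condition <- function(idx, values) {
  # e.g. idx = 5, values = c("N", "P") gives: $5 == "N" || $5 == "P"
  paste0("$", idx, " == \"", values, "\"", collapse = " || ")
}

sketch_awk_if <- function(condition) {
  sprintf("if (keep == 1 && (%s)) { keep = 1 } else { keep = 0 }", condition)
}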


================================================
FILE: R/auk-get-awk-path.R
================================================
#' OS specific path to AWK executable
#'
#' Return the OS-specific path to AWK (e.g. `"C:/cygwin64/bin/gawk.exe"` or
#' `"/usr/bin/awk"`), or `NA` if AWK isn't installed. To manually set the
#' path to AWK, set the `AWK_PATH` environment variable in your `.Renviron`
#' file, which can be accomplished with the helper function
#' `auk_set_awk_path(path)`.
#'
#' @return Path to AWK or `NA` if AWK wasn't found.
#' @export
#' @family paths
#' @examples
#' auk_get_awk_path()
auk_get_awk_path <- function() {
  sysname <- tolower(Sys.info()[["sysname"]])

  # manually specified path
  if (Sys.getenv("AWK_PATH") != "") {
    awk <- Sys.getenv("AWK_PATH")
    awk_test <- tryCatch(
      list(result = system(paste(awk, "--version"),
                           intern = TRUE, ignore.stderr = TRUE)),
      error = function(e) list(result = NULL),
      warning = function(e) list(result = NULL)
    )
  } else if (sysname %in% c("darwin", "linux")) {
    # mac or linux
    # test and find path
    awk_test <- tryCatch(
      list(result = system("which awk", intern = TRUE, ignore.stderr = TRUE)),
      error = function(e) list(result = NULL),
      warning = function(e) list(result = NULL)
    )
    # set path
    awk <- awk_test$result
  } else if (sysname == "windows") {
    # cygwin or cygwin64?
    if (file.exists("C:/cygwin64/bin/gawk.exe")) {
      awk <- "C:/cygwin64/bin/gawk.exe"
    } else if (file.exists("C:/cygwin/bin/gawk.exe")) {
      awk <- "C:/cygwin/bin/gawk.exe"
    } else {
      return(NA_character_)
    }
    # test
    awk_test <- tryCatch(
      list(result = system(paste(awk, "--version"),
                           intern = TRUE, ignore.stderr = TRUE)),
      error = function(e) list(result = NULL),
      warning = function(e) list(result = NULL)
    )
  } else {
    return(NA_character_)
  }

  if (!is.null(awk_test$result)) {
    return(awk)
  } else {
    return(NA_character_)
  }
}
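
# Sketch (hypothetical, not the package implementation): the lookup order
# used by auk_get_awk_path(), simplified with base R's Sys.which() instead of
# calling `which awk` through system(). An explicit AWK_PATH always wins.
# Otherwise the first candidate found on the PATH is returned, or NA if none
# is found. Unlike auk_get_awk_path(), this sketch does not run
# `awk --version` to confirm the executable works and does not check the
# default Cygwin locations on Windows.
sketch_find_awk <- function(candidates = "awk") {
  manual <- Sys.getenv("AWK_PATH")
  if (nzchar(manual)) {
    return(manual)
  }
  # Sys.which() returns "" for programs that are not on the PATH
  found <- Sys.which(candidates)
  found <- found[nzchar(found)]
  if (length(found) == 0) NA_character_ else unname(found[[1]])
}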


================================================
FILE: R/auk-get-ebd-path.R
================================================
#' Return EBD data path
#' 
#' Returns the environment variable `EBD_PATH`, which users are encouraged to 
#' set to the directory that stores the eBird Basic Dataset (EBD) text files.
#'
#' @return The path stored in the `EBD_PATH` environment variable, or `NA` if
#'   it isn't set. A warning is raised if the directory doesn't exist.
#' @export
#' @family paths
#' @examples
#' auk_get_ebd_path()
auk_get_ebd_path <- function() {
  p <- Sys.getenv("EBD_PATH")
  if (p == "") {
    return(NA_character_)
  } else if (!dir.exists(p)) {
    warning("Directory specified by EBD_PATH does not exist.")
  }
  return(p)
}

================================================
FILE: R/auk-last-edited.R
================================================
#' Filter the eBird data by last edited date
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a range of last
#' edited dates. Last edited date is typically used to extract just new or
#' recently edited data. This function only defines the filter and, once all
#' filters have been defined, [auk_filter()] should be used to call AWK and
#' perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param date character or date; date range to filter by, provided either as a
#'   character vector in the format `"2015-12-31"` or a vector of Date objects.
#' 
#' @details This function can also work on an `auk_sampling` object if the
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_last_edited(date = c("2010-01-01", "2010-12-31"))
auk_last_edited <- function(x, date)  {
  UseMethod("auk_last_edited")
}

#' @export
auk_last_edited.auk_ebd <- function(x, date) {
  # checks
  assertthat::assert_that(
    length(date) == 2,
    is.character(date) || assertthat::is.date(date),
    date[1] <= date[2]
  )
  
  # check for last edit date column
  if (!"last_edited" %in% x$col_idx$id) {
    stop("Last edited date column missing from EBD")
  }
  if (!is.null(x$col_idx_sampling) && 
      !"last_edited" %in% x$col_idx_sampling$id) {
    stop("Last edited date column missing from sampling event data")
  }

  # convert to date object, then format as ISO standard date format
  date <- as.Date(date)
  date <- format(date, "%Y-%m-%d")

  assertthat::assert_that(
    all(!is.na(date)),
    date[1] <= date[2],
    date[1] >= "1850-01-01",
    date[2] >= "1850-01-01"
  )

  # define filter
  x$filters$last_edited <- date
  return(x)
}

#' @export
auk_last_edited.auk_sampling <- function(x, date) {
  auk_last_edited.auk_ebd(x, date)
}
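
# Sketch (hypothetical helper, not part of auk): the date normalization used
# by auk_last_edited(). Character or Date input is converted to ISO 8601
# strings ("YYYY-MM-DD"). In that format lexicographic (string) comparison
# matches chronological order, which is why plain string comparisons such as
# date[1] <= date[2] are valid checks on the formatted dates.
sketch_normalize_dates <- function(date) {
  date <- format(as.Date(date), "%Y-%m-%d")
  if (length(date) != 2 || anyNA(date) || date[1] > date[2]) {
    stop("date must be two valid dates in increasing order")
  }
  date
}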


================================================
FILE: R/auk-observer.R
================================================
#' Filter the eBird data by observer
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a set of
#' observer IDs. This function only defines the filter and, once all filters
#' have been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param observer_id character or integer; observers to filter by. Observer IDs
#'   can be provided either as integer (e.g. 12345) or character with the "obsr" 
#'   prefix as they appear in the EBD (e.g. "obsr12345").
#'
#' @return An `auk_ebd` or `auk_sampling` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_observer("obsr313215")
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_observer(ebd, observer_id = 313215)
auk_observer <- function(x, observer_id)  {
  UseMethod("auk_observer")
}

#' @export
auk_observer.auk_ebd <- function(x, observer_id) {
  if (is.character(observer_id)) {
    # lower-case first so IDs such as "OBSR12345" pass validation
    observer_id <- tolower(observer_id)
    if (!all(stringr::str_detect(observer_id, "^obsr[0-9]+$"))) {
      stop("Invalid observer IDs detected, must be of form 'obsr12345'")
    }
  } else if (is_integer(observer_id)) {
    observer_id <- paste0("obsr", observer_id)
  } else {
    stop("observer_id must be a character or integer vector of valid IDs.")
  }
  observer_id <- tolower(observer_id)
  
  # add observer to filter list
  x$filters$observer <- observer_id
  return(x)
}

#' @export
auk_observer.auk_sampling <- function(x, observer_id) {
  auk_observer.auk_ebd(x, observer_id)
}
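
# Sketch (hypothetical helper, not part of auk): normalizing observer IDs in
# the way auk_observer() describes, using base R only. Numeric IDs get the
# "obsr" prefix; format(scientific = FALSE) guards against IDs such as 100000
# being written as "1e+05" by paste0(). IDs are lower-cased before being
# validated against the "obsr12345" pattern.
sketch_normalize_observer <- function(observer_id) {
  if (is.numeric(observer_id)) {
    if (any(observer_id != round(observer_id))) {
      stop("numeric observer IDs must be whole numbers")
    }
    observer_id <- paste0("obsr",
                          format(observer_id, scientific = FALSE, trim = TRUE))
  }
  observer_id <- tolower(observer_id)
  if (!all(grepl("^obsr[0-9]+$", observer_id))) {
    stop("Invalid observer IDs detected, must be of form 'obsr12345'")
  }
  observer_id
}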

================================================
FILE: R/auk-package.R
================================================
#' `auk`: eBird Data Extraction and Processing in R
#'
#' Tools for extracting and processing eBird data from the eBird Basic Dataset 
#' (EBD).
#'
#' @keywords internal
"_PACKAGE"

## usethis namespace: start
#' @importFrom rlang .data
#' @importFrom stringr str_interp
## usethis namespace: end
NULL


================================================
FILE: R/auk-project.R
================================================
#' Filter the eBird data by project code
#'
#' Some eBird records are collected as part of a particular project (e.g. the
#' Virginia Breeding Bird Atlas) and have an associated project code in the
#' eBird dataset (e.g. EBIRD_ATL_VA). This function only defines the filter and,
#' once all filters have been defined, [auk_filter()] should be used to call AWK
#' and perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param project character; project code to filter by (e.g. `"EBIRD_MEX"`).
#'   Multiple codes are accepted.
#' 
#' @details This function can also work on an `auk_sampling` object if the
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_project("EBIRD_MEX")
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_project(ebd, "EBIRD_MEX")
auk_project <- function(x, project)  {
  UseMethod("auk_project")
}

#' @export
auk_project.auk_ebd <- function(x, project) {
  # checks
  assertthat::assert_that(
    is.character(project),
    all(nchar(project) > 0)
  )
  
  # check for project column
  if (!"project" %in% x$col_idx$id) {
    stop("Project column missing from EBD")
  }
  if (!is.null(x$col_idx_sampling) && !"project" %in% x$col_idx_sampling$id) {
    stop("Project column missing from sampling event data")
  }
  
  # set filter list
  x$filters$project <- project
  return(x)
}

#' @export
auk_project.auk_sampling <- function(x, project) {
  auk_project.auk_ebd(x, project)
}


================================================
FILE: R/auk-protocol.R
================================================
#' Filter the eBird data by protocol
#'
#' Filter to just data collected following specific search protocols, e.g.
#' stationary, traveling, area, or incidental. This function only defines the
#' filter and, once all filters have been defined, [auk_filter()] should be used
#' to call AWK and perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param protocol character; protocols to filter by. Many protocols exist in
#'   the database; however, the most commonly used are:
#'   
#'   - Stationary
#'   - Traveling
#'   - Area
#'   - Incidental
#'   
#'   A complete list of valid protocols is contained within the vector 
#'   `valid_protocols` within this package. Multiple protocols are allowed at 
#'   the same time.
#' 
#' @details This function can also work on an `auk_sampling` object if the
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object.
#' @export
#' @family filter
#' @examples
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_protocol("Stationary")
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_protocol(ebd, "Stationary")
auk_protocol <- function(x, protocol)  {
  UseMethod("auk_protocol")
}

#' @export
auk_protocol.auk_ebd <- function(x, protocol) {
  assertthat::assert_that(
    all(protocol %in% auk::valid_protocols)
  )
  
  # set filter list
  x$filters$protocol <- protocol
  return(x)
}

#' @export
auk_protocol.auk_sampling <- function(x, protocol) {
  auk_protocol.auk_ebd(x, protocol)
}
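
# Sketch (hypothetical helper, not part of auk): a protocol check that names
# the offending values, as an alternative to the bare assert_that() call in
# auk_protocol.auk_ebd(). In the package the allowed values come from the
# `valid_protocols` data set; here they are passed in explicitly.
sketch_check_protocol <- function(protocol, valid) {
  bad <- setdiff(protocol, valid)
  if (length(bad) > 0) {
    stop("Invalid protocol(s): ", paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(protocol)
}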

================================================
FILE: R/auk-rollup.R
================================================
#' Roll up eBird taxonomy to species
#' 
#' The eBird Basic Dataset (EBD) includes both true species and every other
#' field-identifiable taxon that could be relevant for birders to report. This 
#' includes taxa not identifiable to a species (e.g. hybrids) and taxa reported
#' below the species level (e.g. subspecies). This function produces a list of 
#' observations of true species, by removing the former and rolling the latter 
#' up to the species level. In the resulting EBD data.frame, 
#' `category` will be `"species"` for all records and the subspecies fields will 
#' be dropped. By default, [read_ebd()] calls `auk_rollup()` when importing an
#' eBird data file.
#'
#' @param x data.frame; data frame of eBird data, typically as imported by
#'   [read_ebd()]
#' @param drop_higher logical; whether to remove taxa above species during the 
#'   rollup process, e.g. "spuhs" like "duck sp.".
#'   
#' @details When rolling observations up to species level the observed counts
#'   are summed across any taxa that resolve to the same species. However, if
#'   any of these taxa have a count of "X" (i.e. the observer did not enter a
#'   count), then the rolled up record will get an "X" as well. For example, if 
#'   an observer saw 3 Myrtle and 2 Audubon's Warblers, this will roll up to 5 
#'   Yellow-rumped Warblers. However, if an "X" was entered for Myrtle, this 
#'   would roll up to "X" for Yellow-rumped Warbler.
#'   
#'   The eBird taxonomy groups taxa into eight different categories. These
#'   categories, and the way they are treated by [auk_rollup()], are as follows:
#'   
#'   - **Species:** e.g., Mallard. Combined with lower level taxa if present on 
#'   the same checklist.
#'   - **ISSF or Identifiable Sub-specific Group:** Identifiable subspecies or
#'   group of subspecies, e.g., Mallard (Mexican). Rolled-up to species level.
#'   - **Intergrade:** Hybrid between two ISSFs (subspecies or subspecies
#'   groups), e.g., Mallard (Mexican intergrade). Rolled-up to species level.
#'   - **Form:** Miscellaneous other taxa, including recently-described species
#'   yet to be accepted or distinctive forms that are not universally accepted
#'   (Red-tailed Hawk (Northern), Upland Goose (Bar-breasted)). If the checklist
#'   contains multiple taxa corresponding to the same species, the lower level
#'   taxa are rolled up, otherwise these records are left as is.
#'   - **Spuh:** Genus or identification at a broad level, e.g., duck sp. or
#'   dabbling duck sp. Dropped by `auk_rollup()` unless `drop_higher = FALSE`.
#'   - **Slash:** Identification to a species pair, e.g., American Black
#'   Duck/Mallard. Dropped by `auk_rollup()` unless `drop_higher = FALSE`.
#'   - **Hybrid:** Hybrid between two species, e.g., American Black Duck x
#'   Mallard (hybrid). Dropped by `auk_rollup()` unless `drop_higher = FALSE`.
#'   - **Domestic:** Distinctly-plumaged domesticated varieties that may be
#'   free-flying (these do not count on personal lists), e.g., Mallard (Domestic
#'   type). Always dropped by `auk_rollup()`.
#'   
#'   The rollup process is based on the eBird taxonomy, which is updated once a
#'   year in August. The `auk` package includes a copy of the eBird taxonomy,
#'   current at the time of release. If the EBD and `auk` versions are not
#'   aligned, taxa that can't be matched to the bundled taxonomy are removed
#'   with a warning; updating `auk` to a version built with the same taxonomy
#'   as the EBD should resolve this.
#'   
#' @return A data frame of the eBird data with taxonomic rollup applied.
#' @references Consult the [eBird taxonomy](https://ebird.org/science/use-ebird-data/the-ebird-taxonomy) 
#'   page for further details.
#' @export
#' @family pre
#' @examples
#' # get the path to the example data included in the package
#' # in practice, provide path to ebd, e.g. f <- "data/ebd_relFeb-2018.txt"
#' f <- system.file("extdata/ebd-rollup-ex.txt", package = "auk")
#' # read in data without rolling up
#' ebd <- read_ebd(f, rollup = FALSE)
#' # rollup
#' ebd_ru <- auk_rollup(ebd)
#' # keep higher taxa
#' ebd_higher <- auk_rollup(ebd, drop_higher = FALSE)
#' 
#' # all taxa not identifiable to species are dropped
#' unique(ebd$category)
#' unique(ebd_ru$category)
#' unique(ebd_higher$category)
#' 
#' # Yellow-rumped Warbler subspecies rollup
#' library(dplyr)
#' # without rollup, there are multiple observations per checklist
#' ebd |>
#'   filter(common_name == "Yellow-rumped Warbler") |>
#'   select(checklist_id, category, common_name, subspecies_common_name, 
#'          observation_count)
#' # with rollup, they have been combined
#' ebd_ru |>
#'   filter(common_name == "Yellow-rumped Warbler") |>
#'   select(checklist_id, category, common_name, observation_count)
auk_rollup <- function(x, drop_higher = TRUE) {
  assertthat::assert_that(
    is.data.frame(x),
    "scientific_name" %in% names(x)
  )
  
  # return as is if already run
  if (isTRUE(attr(x, "rollup"))) {
    return(x)
  }
  
  # has auk_unique been applied?
  if ("checklist_id" %in% names(x)) {
    cid <- "checklist_id"
  } else {
    cid <- "sampling_event_identifier"
  }
  
  # use the copy of the eBird taxonomy bundled with the package
  tax_full <- auk::ebird_taxonomy
  
  # remove anything not identifiable to a species
  if (drop_higher) {
    include <- "species"
  } else {
    include <- c("species", "slash", "spuh", "hybrid")
  }
  # include forms that don't roll up to a species
  # these are mostly undescribed species
  undesc <- dplyr::filter(tax_full,
                          .data$category == "form",
                          is.na(.data$report_as))
  tax <- dplyr::filter(tax_full, .data$category %in% include)
  tax <- rbind(tax, undesc)
  tax <- dplyr::select(tax, "scientific_name",
                       taxonomic_order_tax = "taxonomic_order",
                       taxon_concept_id_tax = "taxon_concept_id")
  # store taxa before filtering
  species_prefilter <- unique(x$scientific_name)
  x <- dplyr::inner_join(x, tax, by = "scientific_name")

  # identify which taxa were removed
  species_after <- unique(x$scientific_name)
  removed_species <- setdiff(species_prefilter, species_after)
  # exclude taxa that were intentionally removed, e.g. because they can't be
  # identified to species
  tax_dropped <- dplyr::filter(tax_full, !.data$category %in% include)
  removed_species <- setdiff(removed_species, tax_dropped$scientific_name)

  # if taxa were removed, print a warning
  if (length(removed_species) > 0) {
    warning_message <- paste(
      "Removed the following species due to invalid taxonomy:\n",
      paste(removed_species, collapse = ", "),
      "\n\nIf taxonomy was recently updated, try updating the package:",
      "\n- Run this command in R: install.packages('auk')",
      "\n or install the latest version from GitHub: remotes::install_github('CornellLabofOrnithology/auk')"
    )

    warning(warning_message, call. = FALSE)
  } 
  
  if (nrow(x) == 0) {
    # if all species were removed, return an empty table
    if ("subspecies_common_name" %in% names(x)) {
      x$subspecies_common_name <- NULL
    }
    if ("subspecies_scientific_name" %in% names(x)) {
      x$subspecies_scientific_name <- NULL
    }
    attr(x, "rollup") <- TRUE
    return(dplyr::as_tibble(x))
  }
  
  # summarize species for cases where multiple subspecies reported on same list
  sp <- dplyr::select(x, dplyr::all_of(c(cid, "scientific_name", "observation_count")))
  suppressWarnings({
    sp$count <- as.integer(sp$observation_count)
  })
  sp <- dplyr::group_by(sp, 
                        dplyr::across(dplyr::all_of(c(cid, "scientific_name"))))
  sp <- dplyr::summarise(sp, count = sum(.data$count), .groups = "drop")
  sp <- dplyr::mutate(sp,
                      count = as.character(.data$count),
                      count = dplyr::coalesce(.data$count, "X"))
  
  # drop any duplicate species records
  x <- dplyr::group_by(x, 
                       dplyr::across(dplyr::all_of(c(cid, "scientific_name"))))
  # give precedence to true species records
  x <- dplyr::slice_min(x, n = 1, order_by = .data$taxonomic_order_tax, 
                        with_ties = FALSE)
  x <- dplyr::ungroup(x)
  
  # update counts with summary
  x <- dplyr::inner_join(x, sp, by = c(cid, "scientific_name"))
  x <- dplyr::mutate(x,
                     observation_count = .data$count,
                     taxonomic_order = .data$taxonomic_order_tax,
                     taxon_concept_id = .data$taxon_concept_id_tax)
  x <- dplyr::select(x, -"count", -"taxonomic_order_tax", -"taxon_concept_id_tax")
  
  # drop subspecies fields, set category to species
  if ("category" %in% names(x)) {
    x$category <- ifelse(x$category %in% include, x$category, "species")
  }
  if ("subspecies_common_name" %in% names(x)) {
    x$subspecies_common_name <- NULL
  }
  if ("subspecies_scientific_name" %in% names(x)) {
    x$subspecies_scientific_name <- NULL
  }
  
  # attribute flag
  attr(x, "rollup") <- TRUE
  dplyr::as_tibble(x)
}
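
# Sketch (hypothetical helper, not part of auk): the count rule described in
# the auk_rollup() details, in base R. Counts for records that resolve to the
# same species on the same checklist are summed, but a single "X" (no count
# entered) makes the rolled-up count "X". This covers only the counting step;
# the taxonomy join and category handling above are not reproduced.
sketch_rollup_counts <- function(df) {
  # "X" becomes NA when coerced to integer
  count <- suppressWarnings(as.integer(df$observation_count))
  # aggregate() passes NAs through to sum(), so any "X" gives an NA total
  out <- aggregate(list(count = count),
                   by = list(checklist_id = df$checklist_id,
                             scientific_name = df$scientific_name),
                   FUN = sum)
  out$observation_count <- ifelse(is.na(out$count), "X",
                                  as.character(out$count))
  out$count <- NULL
  out
}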


================================================
FILE: R/auk-sampling.R
================================================
#' Reference to eBird sampling event file
#'
#' Create a reference to an eBird sampling event file in preparation for
#' filtering using AWK. For working with the sightings data, use `auk_ebd()`;
#' only use `auk_sampling()` if you intend to work exclusively with
#' checklist-level data.
#'
#' @param file character; input sampling event data file, which contains 
#'   checklist data from eBird.
#' @param sep character; the input field separator, the eBird data are tab
#'   separated so this should generally not be modified. Must only be a single
#'   character and space delimited is not allowed since spaces appear in many of
#'   the fields.
#'
#' @details eBird data can be downloaded as a tab-separated text file from the
#'   [eBird website](http://ebird.org/ebird/data/download) after submitting a
#'   request for access. In the eBird Basic Dataset (EBD) each row corresponds 
#'   to an observation of a single bird species on a single checklist, while the
#'   sampling event data file contains a single row for every checklist. This 
#'   function creates an R object to reference only the sampling data.
#'
#' @return An `auk_sampling` object storing the file reference and the desired
#'   filters once created with other package functions.
#' @export
#' @family objects
#' @examples
#' # get the path to the example data included in the package
#' # in practice, provide path to the sampling event data
#' # e.g. f <- "data/ebd_sampling_relFeb-2018.txt"
#' f <- system.file("extdata/zerofill-ex_sampling.txt", package = "auk")
#' auk_sampling(f)
auk_sampling <- function(file, sep = "\t") {
  # checks
  assertthat::assert_that(
    assertthat::is.string(sep), nchar(sep) == 1, sep != " "
  )
  file <- ebd_file(file)
  # read header rows
  header <- tolower(get_header(file, sep))
  header <- stringr::str_replace_all(header, "_", " ")
  col_idx <- data.frame(id = NA_character_, 
                        name = header, 
                        index = seq_along(header),
                        stringsAsFactors = FALSE)
  
  # check column name for protocol column
  protocol_col_name <- "protocol name"
  if (!protocol_col_name %in% header) {
    protocol_col_name <- "protocol type"
  }
  # check column name for project column
  project_col_name <- "project names"
  if (!project_col_name %in% header) {
    project_col_name <- "project code"
  }
  
  # ensure key columns are present
  mandatory <- c("country code", "state code",
                 "latitude", "longitude",
                 "observation date", "time observations started",
                 protocol_col_name,
                 "duration minutes", "effort distance km",
                 "all species reported",
                 "sampling event identifier", "group identifier")
  col_miss <- mandatory[!(mandatory %in% header)]
  if (length(col_miss) > 0) {
    m <- sprintf("Required columns missing from the sampling file:\n\t%s",
                 paste(col_miss, collapse = "\n\t"))
    stop(m)
  }
  
  # identify columns required for filtering
  filter_cols <- data.frame(
    id = c("country", "state", "county", "bcr", 
           "lat", "lng",
           "date", "time", "last_edited",
           "protocol", "project", 
           "duration", "distance", 
           "complete",
           "observer"),
    name = c("country code", "state code", "county code", "bcr code",
             "latitude", "longitude",
             "observation date", "time observations started",
             "last edited date", 
             protocol_col_name, project_col_name,
             "duration minutes", "effort distance km",
             "all species reported",
             "observer id"),
    stringsAsFactors = FALSE)
  filter_cols <- filter_cols[filter_cols$name %in% col_idx$name, ]
  col_idx$id[match(filter_cols$name, col_idx$name)] <- filter_cols$id
  
  # output
  structure(
    list(
      file = normalizePath(file),
      output = NULL,
      col_idx = col_idx,
      filters = list(
        country = character(),
        state = character(),
        county = character(),
        bbox = numeric(),
        year = integer(),
        date = character(),
        time = character(),
        last_edited = character(),
        protocol = character(), 
        project = character(),
        duration = numeric(),
        distance = numeric(),
        complete = FALSE,
        observer = character()
      )
    ),
    class = "auk_sampling"
  )
}
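
# Sketch (hypothetical helper, not part of auk): how auk_sampling() maps
# header names to the internal filter ids. Headers are lower-cased and
# underscores replaced by spaces, so "OBSERVATION DATE" and
# "observation_date" are treated the same. Filter columns that are absent from
# the file (e.g. a missing "bcr code" column) are simply left unmapped.
sketch_col_index <- function(header, filter_cols) {
  header <- gsub("_", " ", tolower(header), fixed = TRUE)
  col_idx <- data.frame(id = NA_character_, name = header,
                        index = seq_along(header), stringsAsFactors = FALSE)
  filter_cols <- filter_cols[filter_cols$name %in% col_idx$name, , drop = FALSE]
  col_idx$id[match(filter_cols$name, col_idx$name)] <- filter_cols$id
  col_idx
}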

#' @export
print.auk_sampling <- function(x, ...) {
  cat("Input \n")
  cat(paste("  Sampling events:", x$file, "\n"))
  cat("\n")
  
  cat("Output \n")
  if (is.null(x$output)) {
    cat("  Filters not executed\n")
  } else {
    cat(paste("  Sampling events:", x$output, "\n"))
  }
  cat("\n")
  
  cat("Filters \n")
  # country filter
  cat("  Countries: ")
  if (length(x$filters$country) == 0) {
    cat("all")
  } else if (length(x$filters$country) <= 10) {
    cat(paste(x$filters$country, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$country), " countries"))
  }
  cat("\n")
  # state filter
  cat("  States: ")
  if (length(x$filters$state) == 0) {
    cat("all")
  } else if (length(x$filters$state) <= 10) {
    cat(paste(x$filters$state, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$state), " states"))
  }
  cat("\n")
  # county filter
  cat("  Counties: ")
  if (length(x$filters$county) == 0) {
    cat("all")
  } else if (length(x$filters$county) <= 10) {
    cat(paste(x$filters$county, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$county), " counties"))
  }
  cat("\n")
  # bbox filter
  cat("  Bounding box: ")
  e <- round(x$filters$bbox, 1)
  if (length(e) == 0) {
    cat("full extent")
  } else {
    cat(paste0("Lon ", e[1], " - ", e[3], "; "))
    cat(paste0("Lat ", e[2], " - ", e[4]))
  }
  cat("\n")
  # year filter
  cat("  Years: ")
  if (length(x$filters$year) == 0) {
    cat("all")
  } else if (length(x$filters$year) <= 10) {
    cat(paste(x$filters$year, collapse = ", "))
  } else {
    cat(paste0(length(x$filters$year), " years"))
  }
  cat("\n")
  # date filter
  cat("  Date: ")
  if (length(x$filters$date) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$date[1], " - ", x$filters$date[2]))
  }
  cat("\n")
  # time filter
  cat("  Start time: ")
  if (length(x$filters$time) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$time[1], "-", x$filters$time[2]))
  }
  cat("\n")
  # last edited date filter
  cat("  Last edited date: ")
  if (length(x$filters$last_edited) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$last_edited[1], " - ", x$filters$last_edited[2]))
  }
  cat("\n")
  # protocol filter
  cat("  Protocol: ")
  if (length(x$filters$protocol) == 0) {
    cat("all")
  } else {
    cat(paste(x$filters$protocol, collapse = ", "))
  }
  cat("\n")
  # project filter
  cat("  Project code: ")
  if (length(x$filters$project) == 0) {
    cat("all")
  } else {
    cat(paste(x$filters$project, collapse = ", "))
  }
  cat("\n")
  # duration filter
  cat("  Duration: ")
  if (length(x$filters$duration) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$duration[1], "-", x$filters$duration[2], " minutes"))
  }
  cat("\n")
  # distance filter
  cat("  Distance travelled: ")
  if (length(x$filters$distance) == 0) {
    cat("all")
  } else {
    cat(paste0(x$filters$distance[1], "-", x$filters$distance[2], " km"))
  }
  cat("\n")
  # complete checklists only
  cat("  Complete checklists only: ")
  if (x$filters$complete) {
    cat("yes")
  } else {
    cat("no")
  }
  cat("\n")
  return(invisible(x))
}
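
# Sketch (hypothetical helper, not part of auk): the summarization pattern
# that print.auk_sampling() repeats for countries, states, counties, and
# years, factored into one function. An empty filter prints "all", up to
# `max_show` values are listed, and longer filters are reported as a count.
# It could be used as, e.g.,
#   cat("  States: ", sketch_filter_summary(x$filters$state, "states"), "\n",
#       sep = "")
sketch_filter_summary <- function(values, noun, max_show = 10) {
  if (length(values) == 0) {
    return("all")
  }
  if (length(values) <= max_show) {
    return(paste(values, collapse = ", "))
  }
  paste0(length(values), " ", noun)
}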


================================================
FILE: R/auk-select.R
================================================
#' Select a subset of columns
#' 
#' Select a subset of columns from the eBird Basic Dataset (EBD) or the sampling 
#' events file. Subsetting the columns can significantly decrease file size.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param select character; a character vector specifying the names of the
#'   columns to select. Columns should be as they appear in the header of the
#'   EBD; however, names are not case sensitive and spaces may be replaced by
#'   underscores, e.g. `"COMMON NAME"`, `"common name"`, and `"common_NAME"` are
#'   all valid.
#' @param file character; output file.
#' @param sep character; the input field separator. The eBird files are tab
#'   separated by default. Must only be a single character and space delimited
#'   is not allowed since spaces appear in many of the fields.
#' @param overwrite logical; overwrite output file if it already exists.
#'
#' @return Invisibly returns the filename of the output file.
#' @export
#' @family text
#' @examples
#' \dontrun{
#' # select a minimal set of columns
#' out_file <- tempfile()
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' cols <- c("latitude", "longitude",
#'           "group identifier", "sampling event identifier", 
#'           "scientific name", "observation count",
#'           "observer_id")
#' selected <- auk_select(ebd, select = cols, file = out_file)
#' str(read_ebd(selected))
#' }
auk_select <- function(x, select, file, sep = "\t", overwrite = FALSE) {
  UseMethod("auk_select")
}

#' @export
auk_select.auk_ebd <- function(x, select, file, sep = "\t", overwrite = FALSE) {
  # checks
  awk_path <- auk_get_awk_path()
  if (is.na(awk_path)) {
    stop("auk_select() requires a valid AWK install, unable to locate AWK.")
  }
  assertthat::assert_that(
    is.character(select),
    assertthat::is.string(file)
  )
  if (!dir.exists(dirname(file))) {
    stop("Output directory doesn't exist.")
  }
  if (!overwrite && file.exists(file)) {
    stop("Output file already exists, use overwrite = TRUE.")
  }
  file <- normalizePath(file, winslash = "/", mustWork = FALSE)
  # selected columns
  select <- tolower(select)
  select <- stringr::str_replace_all(select, "_", " ")
  found <- select %in% x$col_idx$name
  if (!all(found)) {
    col_miss <- paste(select[!found], collapse = ", ")
    stop("Selected variable not found in header: \n\t ", col_miss)
  }
  # certain columns must be kept
  must_keep <- c("group identifier", "sampling event identifier",
                 "scientific name", "observation count")
  must_keep <- intersect(must_keep, x$col_idx$name)
  if (!all(must_keep %in% select)) {
    m <- paste("The following columns must be retained:",
               paste(setdiff(must_keep, select), collapse = ", "))
    stop(m)
  }
  # find column numbers
  idx <- x$col_idx$index[x$col_idx$name %in% select]
  select_cols <- paste0("$", idx, collapse = ", ")
  # generate awk script
  awk_script <- stringr::str_interp(awk_select, 
                                    list(sep = sep, select = select_cols))
  # run
  exit_code <- system2(awk_path,
                       args = paste0("'", awk_script, "' '", x$file, "'"),
                       stdout = file, stderr = FALSE)
  if (exit_code != 0) {
    stop("Error running AWK command.")
  }
  return(invisible(file))
}

#' @export
auk_select.auk_sampling <- function(x, select, file, sep = "\t",
                                    overwrite = FALSE) {
  return(auk_select.auk_ebd(x, select, file, sep, overwrite))
}

# awk script template
awk_select <- "
BEGIN {
FS = \"${sep}\"
OFS = \"${sep}\"
}
{
  print ${select}
}
"


================================================
FILE: R/auk-set-awk-path.R
================================================
#' Set a custom path to AWK executable
#' 
#' If AWK has been installed in a non-standard location, the environment
#' variable `AWK_PATH` must be set to specify the location of the executable.
#' Use this function to set `AWK_PATH` in your .Renviron file. **Most users
#' should NOT set `AWK_PATH`; only do so if you have installed AWK in a
#' non-standard location and `auk` cannot find it.** This function first looks
#' for an .Renviron location defined by `R_ENVIRON_USER`, then defaults to
#' ~/.Renviron.
#'
#' @param path character; path to the AWK executable on your system, e.g. 
#'   `"C:/cygwin64/bin/gawk.exe"` or `"/usr/bin/awk"`.
#' @param overwrite logical; should the existing `AWK_PATH` be overwritten if it
#'   has already been set in .Renviron.
#'
#' @return Edits .Renviron, sets `AWK_PATH` for the current session, then
#'   returns the AWK path invisibly.
#' @export
#' @family paths
#' @examples
#' \dontrun{
#' auk_set_awk_path("/usr/bin/awk")
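#' # confirm which AWK executable auk will now use
#' auk_get_awk_path()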
#' }
auk_set_awk_path <- function(path, overwrite = FALSE) {
  assertthat::assert_that(
    assertthat::is.string(path),
    file.exists(path)
  )
  path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # make sure awk executable is there
  awk_test <- tryCatch(
    list(result = system(paste(path, "--version"),
                         intern = TRUE, ignore.stderr = TRUE)),
    error = function(e) list(result = NULL),
    warning = function(e) list(result = NULL)
  )
  if (is.null(awk_test$result)) {
    stop("Specified AWK_PATH doesn't contain a valid AWK executable.")
  }
  
  # find .Renviron
  renv_path <- renv_file_path()
  renv_lines <- readLines(renv_path)
  
  # look for existing entry, remove if overwrite = TRUE
  renv_exists <- grepl("^AWK_PATH[[:space:]]*=.*", renv_lines)
  if (any(renv_exists)) {
    if (overwrite) {
      # drop existing
      writeLines(renv_lines[!renv_exists], renv_path)
    } else {
      stop(
        "AWK_PATH already set, use overwrite = TRUE to overwrite existing path."
      )
    }
  }
  # set path in .Renviron
  write(paste0("AWK_PATH='", path, "'\n"), renv_path, append = TRUE)
  message(paste("AWK_PATH set to", path))
  # set AWK_PATH for this session, so user doesn't have to reload
  Sys.setenv(AWK_PATH = path)
  invisible(path)
}

================================================
FILE: R/auk-set-ebd-path.R
================================================
#' Set the path to EBD text files
#' 
#' Users of `auk` are encouraged to set the path to the directory containing the
#' eBird Basic Dataset (EBD) text files in the `EBD_PATH` environment variable.
#' All functions referencing the EBD or sampling event data files will check in
#' this directory to find the files, thus avoiding the need to specify the full
#' path every time. This will increase the portability of your code. Use this
#' function to set `EBD_PATH` in your .Renviron file; it is also possible to
#' manually edit the file. This function first looks for an .Renviron
#' location defined by `R_ENVIRON_USER`, then defaults to ~/.Renviron.
#'
#' @param path character; directory where the EBD text files are stored, e.g. 
#'   `"/home/matt/ebd"`.
#' @param overwrite logical; should the existing `EBD_PATH` be overwritten if it
#'   has already been set in .Renviron.
#'
#' @return Edits .Renviron, sets `EBD_PATH` for the current session, then
#'   returns the EBD path invisibly.
#' @export
#' @family paths
#' @examples
#' \dontrun{
#' auk_set_ebd_path("/home/matt/ebd")
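#' # confirm the directory auk will now search for EBD files
#' auk_get_ebd_path()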
#' }
auk_set_ebd_path <- function(path, overwrite = FALSE) {
  assertthat::assert_that(
    assertthat::is.string(path),
    dir.exists(path)
  )
  path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  
  # find .Renviron
  renv_path <- renv_file_path()
  renv_lines <- readLines(renv_path)
  
  # look for existing entry, remove if overwrite = TRUE
  renv_exists <- grepl("^EBD_PATH[[:space:]]*=.*", renv_lines)
  if (any(renv_exists)) {
    if (overwrite) {
      # drop existing
      writeLines(renv_lines[!renv_exists], renv_path)
    } else {
      stop(
        "EBD_PATH already set, use overwrite = TRUE to overwrite existing path."
      )
    }
  }
  # set path in .Renviron
  write(paste0("EBD_PATH='", path, "'\n"), renv_path, append = TRUE)
  message(paste("EBD_PATH set to", path))
  # set EBD_PATH for this session, so user doesn't have to reload
  Sys.setenv(EBD_PATH = path)
  invisible(path)
}

renv_file_path <- function() {
  stored_path <- Sys.getenv("R_ENVIRON_USER")
  if (stored_path != "") {
    renv <- stored_path
  } else {
    renv <- path.expand(file.path("~", ".Renviron"))
  }
  
  if (!file.exists(renv)) {
    file.create(renv)
  }
  return(renv)
}

================================================
FILE: R/auk-species.R
================================================
#' Filter the eBird data by species
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on species. This
#' function only defines the filter and, once all filters have been defined,
#' [auk_filter()] should be used to call AWK and perform the filtering.
#'
#' @param x `auk_ebd` object; reference to object created by [auk_ebd()].
#' @param species character; species to filter by, provided as scientific or
#'   English common names, or a mixture of both. These names must match the
#'   official eBird Taxonomy ([ebird_taxonomy]).
#' @param taxonomy_version integer; the version (i.e. year) of the taxonomy. In
#'   most cases, this should be left empty to use the version of the taxonomy
#'   included in the package. See [get_ebird_taxonomy()].
#' @param replace logical; multiple calls to `auk_species()` are additive, 
#'   unless `replace = TRUE`, in which case the previous list of species to 
#'   filter by will be removed and replaced by that in the current call.
#'   
#' @details The list of species is checked against the eBird taxonomy for
#'   validity. This taxonomy is updated once a year in August. The `auk` package 
#'   includes a copy of the eBird taxonomy, current at the time of release; 
#'   however, if the EBD and `auk` versions are not aligned, you may need to 
#'   explicitly specify which version of the taxonomy to use, in which case 
#'   the eBird API will be queried to get the correct version of the taxonomy. 
#'
#' @return An `auk_ebd` object.
#' @export
#' @family filter
#' @examples
#' # common and scientific names can be mixed
#' species <- c("Canada Jay", "Pluvialis squatarola")
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_species(species)
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_species(ebd, species)
auk_species <- function(x, species, taxonomy_version, replace = FALSE)  {
  UseMethod("auk_species")
}

#' @export
auk_species.auk_ebd <- function(x, species, taxonomy_version, replace = FALSE) {
  # checks
  assertthat::assert_that(
    is.character(species),
    assertthat::is.flag(replace)
  )
  if (missing(taxonomy_version)) {
    taxonomy_version <- auk_version()$taxonomy_version
  } else {
    stopifnot(is_integer(taxonomy_version), length(taxonomy_version) == 1)
  }
  v <- auk_ebd_version(x, check_exists = FALSE)$taxonomy_version
  if (!is.na(v) && (taxonomy_version != v || v == 2020)) {
    m <- paste0("Based on the EBD filename, it appears you should use ",
                "taxonomy_version = %i")
    warning(sprintf(m, v))
  }
  species_lookup <- ebird_species(species, type = "all", 
                                  taxonomy_version = taxonomy_version)

  # check all species names are valid
  species_clean <- species_lookup$scientific_name
  if (any(is.na(species_clean))) {
    stop(
      paste0("The following species were not found in the eBird taxonomy: \n\t",
             paste(species[is.na(species_clean)], collapse = ", "))
    )
  }
  
  # check for taxa identified below the species level
  sub_spp <- species_lookup$category %in% c("issf", "form", "intergrade")
  if (any(sub_spp)) {
    stop(
      paste0("Cannot extract taxa identified below species.\n\t",
             "Remove the following taxa or replace with species: \n\t",
             paste(species[sub_spp], collapse = ", "))
    )
  }
  
  # add species to filter list
  if (replace) {
    x$filters$species <- species_clean
  } else {
    x$filters$species <- c(x$filters$species, species_clean)
  }
  x$filters$species <- sort(unique(x$filters$species))
  return(x)
}


================================================
FILE: R/auk-split.R
================================================
#' Split an eBird data file by species
#' 
#' Given an eBird Basic Dataset (EBD) and a list of species, split the file into 
#' multiple text files, one for each species. This function is typically used 
#' after [auk_filter()] has been applied if the resulting file is too large to 
#' be read in all at once.
#'
#' @param file character; input file.
#' @param species character; species to filter and split by, provided as
#'   scientific or English common names, or a mixture of both. These names must
#'   match the official eBird Taxonomy ([ebird_taxonomy]).
#' @param prefix character; a file and directory prefix. For example, if 
#'   splitting by species "A" and "B" and `prefix = "data/ebd_"`, the resulting 
#'   files will be "data/ebd_A.txt" and "data/ebd_B.txt".
#' @param taxonomy_version integer; the version (i.e. year) of the taxonomy. In
#'   most cases, this should be left empty to use the version of the taxonomy
#'   included in the package. See [get_ebird_taxonomy()].
#' @param sep character; the input field separator, the eBird file is tab
#'   separated by default. Must be a single character; space delimited files
#'   are not allowed since spaces appear in many of the fields.
#' @param overwrite logical; overwrite output files if they already exist.
#'   
#' @details The list of species is checked against the eBird taxonomy for
#'   validity. This taxonomy is updated once a year in August. The `auk` package 
#'   includes a copy of the eBird taxonomy, current at the time of release; 
#'   however, if the EBD and `auk` versions are not aligned, you may need to 
#'   explicitly specify which version of the taxonomy to use, in which case 
#'   the eBird API will be queried to get the correct version of the taxonomy.
#'
#' @return A vector of output filenames, one for each species.
#' @export
#' @family text
#' @examples
#' \dontrun{
#' species <- c("Canada Jay", "Cyanocitta stelleri")
#' # get the path to the example data included in the package
#' # in practice, provide path to a filtered ebd file
#' # e.g. f <- "data/ebd_filtered.txt"
#' f <- system.file("extdata/ebd-sample.txt", package = "auk")
#' # output to a temporary directory for example
#' # in practice, provide the path to the output location
#' # e.g. prefix <- "output/ebd_"
#' prefix <- file.path(tempdir(), "ebd_")
#' species_files <- auk_split(f, species = species, prefix = prefix)
#' }
auk_split <- function(file, species, prefix, taxonomy_version, 
                      sep = "\t",
                      overwrite = FALSE) {
  awk_path <- auk_get_awk_path()
  if (is.na(awk_path)) {
    stop("auk_split() requires a valid AWK install.")
  }
  assertthat::assert_that(
    file.exists(file),
    is.character(species),
    missing(prefix) || assertthat::is.string(prefix),
    assertthat::is.string(sep), nchar(sep) == 1, sep != " ",
    assertthat::is.flag(overwrite)
  )
  file <- normalizePath(file, winslash = "/")
  
  # check all species names are valid and convert to scientific
  species_clean <- ebird_species(species, taxonomy_version = taxonomy_version)
  if (any(is.na(species_clean))) {
    stop(
      paste0("The following species were not found in the eBird taxonomy: \n\t",
             paste(species[is.na(species_clean)], collapse = ", "))
    )
  }
  if (length(species_clean) < 1) {
    stop("Provide at least 1 species to split on.")
  }
  
  # check output files
  if (missing(prefix)) {
    save_dir <- getwd()
    file_name <- ""
  } else if (grepl("/$", prefix)) {
    save_dir <- prefix
    file_name <- ""
  } else {
    save_dir <- dirname(prefix)
    file_name <- basename(prefix)
  }
  if (!dir.exists(save_dir)) {
    stop("Output directory doesn't exist.")
  }
  save_dir <- normalizePath(save_dir, winslash = "/", mustWork = FALSE)
  prefix <- file.path(save_dir, file_name)
  f_sp <- paste0(prefix,
                 stringr::str_replace_all(species_clean, "[^a-zA-Z]", "_"),
                 ".txt")
  for (f in f_sp) {
    if (file.exists(f)) {
      if (overwrite) {
        unlink(f_sp)
      } else {
        stop("Output file already exists, use overwrite = TRUE.")
      }
    }
  }
  
  # determine species column number
  header <- tolower(get_header(file, sep))
  sp_col <- which(header == "scientific name")
  stopifnot(length(sp_col) == 1)
  
  # copy in header rows
  header_row <- readLines(file, 1)
  for (f in f_sp) {
    writeLines(header_row, f)
  }
  
  # set up species filter
  sp_condition <- paste0("$", sp_col, " == \"", species_clean, "\"",
                         collapse = " || ")
  
  # construct awk command
  awk <- stringr::str_interp(awk_split,
                             list(sep = sep, col = sp_col,
                                  condition = sp_condition, prefix = prefix))
  
  # run command
  exit_code <- system2(awk_path, args = paste0("'", awk, "' '", file, "'"), 
                       stderr = FALSE)
  if (exit_code == 0) {
    f_sp
  } else {
    exit_code
  }
}

awk_split <- "
BEGIN {
  FS = \"${sep}\"
  OFS = \"${sep}\"
}
{
  if (${condition}) {
    species = $${col}
    gsub(/[^a-zA-Z]/, \"_\", species)
    species = \"${prefix}\"species\".txt\"
    print >> species
    close (species)
  }
}
"

================================================
FILE: R/auk-state.R
================================================
#' Filter the eBird data by state
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a set of
#' states. This function only defines the filter and, once all filters have
#' been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param state character; states to filter by. eBird uses 4 to 6 character 
#'   state codes consisting of two parts, the 2-letter ISO country code and a 
#'   1-3 character state code, separated by a dash. For example, `"US-NY"` 
#'   corresponds to New York State in the United States. Refer to the data frame 
#'   [ebird_states] to look up state codes.
#' @param replace logical; multiple calls to `auk_state()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of states to
#'   filter by will be removed and replaced by that in the current call.
#' 
#' @details It is not possible to filter by both country and state, so calling 
#' `auk_state()` will reset the country filter to all countries, and vice versa.
#' 
#' This function can also work with an `auk_sampling` object if the user only 
#' wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the class of `x`.
#' @export
#' @family filter
#' @examples
#' # state codes for a given country can be looked up in ebird_states
#' dplyr::filter(ebird_states, country == "Costa Rica")
#' # choose Texas, United States and Puntarenas, Costa Rica
#' states <- c("US-TX", "CR-P")
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_state(states)
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_state(ebd, states)
auk_state <- function(x, state, replace = FALSE)  {
  UseMethod("auk_state")
}

#' @export
auk_state.auk_ebd <- function(x, state, replace = FALSE) {
  # checks
  assertthat::assert_that(
    is.character(state),
    assertthat::is.flag(replace)
  )
  state <- toupper(state)
  
  # check codes are valid
  valid_codes <- state %in% auk::ebird_states$state_code
  if (!all(valid_codes)) {
    m <- paste0("The following state codes are not valid: \n\t",
                paste(state[!valid_codes], collapse =", "))
    stop(m)
  }
  
  # add states to filter list
  if (replace) {
    x$filters$state <- state
  } else {
    x$filters$state <- c(x$filters$state, state)
  }
  x$filters$state <- sort(unique(x$filters$state))
  x$filters$country <- character()
  x$filters$county <- character()
  return(x)
}

#' @export
auk_state.auk_sampling <- function(x, state, replace = FALSE) {
  auk_state.auk_ebd(x, state, replace)
}

================================================
FILE: R/auk-time.R
================================================
#' Filter the eBird data by checklist start time
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a range of start
#' times for the checklist. This function only defines the filter and, once all
#' filters have been defined, [auk_filter()] should be used to call AWK and
#' perform the filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param start_time character; 2 element character vector giving the range of 
#'   times in 24 hour format, e.g. `"06:30"` or `"16:22"`.
#' 
#' @details This function can also work with an `auk_sampling` object if the 
#'   user only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the class of `x`.
#' @export
#' @family filter
#' @examples
#' # only keep checklists started between 6 and 8 in the morning
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_time(start_time = c("06:00", "08:00"))
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_time(ebd, start_time = c("06:00", "08:00"))
auk_time <- function(x, start_time)  {
  UseMethod("auk_time")
}

#' @export
auk_time.auk_ebd <- function(x, start_time) {
  # checks
  assertthat::assert_that(
    length(start_time) == 2,
    is.character(start_time)
  )
  # check for valid times
  if (!all(stringr::str_detect(start_time, 
                               "^([01]?\\d|2[0-3]):([0-5]\\d)$"))) {
    stop("Invalid time format.")
  }

  # pad single digit hours with a leading zero, e.g. "6:30" becomes "06:30"
  start_time <- paste0(ifelse(nchar(start_time) == 4, "0", ""), start_time)

  # check ordering of times makes sense
  assertthat::assert_that(start_time[1] <= start_time[2])

  # define filter
  x$filters$time <- start_time
  return(x)
}

#' @export
auk_time.auk_sampling <- function(x, start_time) {
  auk_time.auk_ebd(x, start_time)
}


================================================
FILE: R/auk-unique.R
================================================
#' Remove duplicate group checklists
#'
#' eBird checklists can be shared among a group of multiple observers, in which
#' case observations will be duplicated in the database. This function removes
#' these duplicates from the eBird Basic Dataset (EBD) or the EBD sampling event
#' data (with `checklists_only = TRUE`), creating a set of unique bird
#' observations. This function is called automatically by [read_ebd()] and
#' [read_sampling()].
#'
#' @param x data.frame; the EBD data frame, typically as imported by
#'   [read_ebd()].
#' @param group_id character; the name of the group ID column.
#' @param checklist_id character; the name of the checklist ID column, each
#'   checklist within a group will get a unique value for this field. The record
#'   with the lowest `checklist_id` will be picked as the unique record within
#'   each group. In the output dataset, this field will be updated to have a 
#'   full list of the checklist IDs that went into this group checklist.
#' @param species_id character; the name of the column identifying species
#'   uniquely. This is required to ensure that removing duplicates is done
#'   independently for each species. Note that this will not treat sub-species
#'   independently and, if that behavior is desired, the user will have to
#'   generate a column uniquely identifying species and subspecies and pass that
#'   column's name to this argument.
#' @param observer_id character; the name of the column identifying the owner 
#'   of this instance of the group checklist. In the output dataset, the full 
#'   list of observer IDs will be stored (comma separated) in the new 
#'   `observer_id` field. The order of these IDs will match the order of the 
#'   comma separated checklist IDs.
#' @param checklists_only logical; whether the dataset provided only contains
#'   checklist information as with the sampling event data file. If this
#'   argument is `TRUE`, then the `species_id` argument is ignored and removing
#'   of duplicated records is done at the checklist level not the species level.
#'
#' @details This function chooses the checklist within each group that has the
#'   lowest value for the field specified by `checklist_id`. A new column is
#'   also created, `checklist_id`, whose value is taken from the field
#'   specified in the `checklist_id` parameter for non-group checklists and from
#'   the field specified by the `group_id` parameter for grouped checklists.
#'   
#'   All the checklist and observer IDs for the checklists that comprise a given
#'   group checklist will be retained as a comma separated string ordered by 
#'   checklist ID.
#'
#' @return A data frame with unique observations, and an additional field,
#'   `checklist_id`, which is a combination of the sampling event and group IDs.
#' @export
#' @family pre
#' @examples
#' # read in an ebd file and don't automatically remove duplicates
#' f <- system.file("extdata/ebd-sample.txt", package = "auk")
#' ebd <- read_ebd(f, unique = FALSE)
#' # remove duplicates
#' ebd_unique <- auk_unique(ebd)
#' nrow(ebd)
#' nrow(ebd_unique)
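#' 
#' # a minimal toy example with hypothetical IDs: two observers share group
#' # checklist "G1", which is collapsed to a single record whose
#' # sampling_event_identifier and observer_id fields list both members
#' toy <- data.frame(group_identifier = c("G1", "G1", ""),
#'                   sampling_event_identifier = c("S2", "S1", "S3"),
#'                   scientific_name = "Cyanocitta stelleri",
#'                   observer_id = c("obsr2", "obsr1", "obsr3"),
#'                   stringsAsFactors = FALSE)
#' auk_unique(toy)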
auk_unique <- function(x,
                       group_id = "group_identifier",
                       checklist_id = "sampling_event_identifier",
                       species_id = "scientific_name",
                       observer_id = "observer_id",
                       checklists_only = FALSE) {
  # checks
  assertthat::assert_that(
    is.data.frame(x),
    assertthat::is.flag(checklists_only),
    assertthat::is.string(group_id),
    group_id %in% names(x),
    assertthat::is.string(checklist_id),
    checklist_id %in% names(x),
    assertthat::is.string(species_id),
    checklists_only || species_id %in% names(x),
    assertthat::is.string(observer_id),
    observer_id %in% names(x),
    # all id columns should be character vectors
    is.character(x[[group_id]]),
    is.character(x[[checklist_id]]),
    is.character(x[[observer_id]]),
    checklists_only || is.character(x[[species_id]]))
  
  # return as is if already run
  if (isTRUE(attr(x, "unique"))) {
    return(x)
  }
  
  # convert empty string group_id to NA
  x[[group_id]][x[[group_id]] == ""] <- NA_character_
  
  # identify and separate non-group records
  grouped <- !is.na(x[[group_id]])
  x_grouped <- x[grouped, ]

  # sort by sampling event id
  x_grouped <- x_grouped[order(x_grouped[[checklist_id]]), ]

  # identify grouping variables
  if (checklists_only) {
    cols <- group_id
  } else {
    cols <- c(species_id, group_id)
  }
  
  # generate list of checklist and observer ids
  ids <- dplyr::select(x_grouped, 
                       dplyr::all_of(c(cols, checklist_id, observer_id)))
  ids <- dplyr::group_by_at(ids, cols)
  ids <- dplyr::arrange_at(ids, checklist_id)
  ids <- dplyr::summarize(ids, 
                          .cid = paste(.data[[checklist_id]], collapse = ","),
                          .oid = paste(.data[[observer_id]], collapse = ","))
  ids <- dplyr::ungroup(ids)
  
  # add the collapsed ids
  x_grouped <- dplyr::inner_join(x_grouped, ids, by = cols)
  x_grouped[[checklist_id]] <- x_grouped$.cid
  x_grouped[[observer_id]] <- x_grouped$.oid
  x_grouped$.cid <- NULL
  x_grouped$.oid <- NULL
  
  # remove duplicated records, ensuring different species treated independently
  x_grouped <- x_grouped[!duplicated(x_grouped[, cols]), ]

  # set id field
  x$checklist_id <- x[[checklist_id]]
  x_grouped$checklist_id <- x_grouped[[group_id]]

  # only keep non-group or non-duplicated records
  x <- rbind(x[!grouped, ], x_grouped)

  # move id field to front
  x <- dplyr::select(x, "checklist_id", dplyr::everything())

  # attribute flag
  attr(x, "unique") <- TRUE
  
  dplyr::as_tibble(x)
}


================================================
FILE: R/auk-version.R
================================================
#' Versions of auk, the EBD, and the eBird taxonomy
#'
#' This package depends on the version of the EBD and on the eBird taxonomy. Use
#' this function to determine the currently installed version of `auk`, the 
#' version of the EBD that this `auk` version works with, and the version of the 
#' eBird taxonomy included in the package. The EBD is updated quarterly, in
#' March, June, September, and December, while the taxonomy is updated annually 
#' in August or September. To ensure proper functioning, always use the latest 
#' version of the auk package and the EBD.
#'
#' @return A list with three elements:
#'   
#'   - `auk_version`: the version of `auk`, e.g. `"auk 0.4.1"`.
#'   - `ebd_version`: a date object specifying the release date of the EBD 
#'   version that this `auk` version is designed to work with.
#'   - `taxonomy_version`: the year of the taxonomy built in to this version of 
#'   `auk`, i.e. the one stored in [ebird_taxonomy].
#'   
#' @export
#' @family helpers
#' @examples
#' auk_version()
auk_version <- function() {
  list(auk_version = "auk 0.9.0",
       ebd_version = as.Date("2025-10-28", "%Y-%m-%d"), 
       taxonomy_version = 2025)
}


================================================
FILE: R/auk-year.R
================================================
#' Filter the eBird data to a set of years
#'
#' Define a filter for the eBird Basic Dataset (EBD) based on a set of
#' years. This function only defines the filter and, once all filters have
#' been defined, [auk_filter()] should be used to call AWK and perform the
#' filtering.
#'
#' @param x `auk_ebd` or `auk_sampling` object; reference to file created by 
#'   [auk_ebd()] or [auk_sampling()].
#' @param year integer; years to filter to.
#' @param replace logical; multiple calls to `auk_year()` are additive,
#'   unless `replace = TRUE`, in which case the previous list of years to
#'   filter by will be removed and replaced by that in the current call.
#' 
#' @details For filtering to a range of dates use `auk_date()`; however,
#'   sometimes the goal is to extract data for a given year or set of years, in
#'   which case `auk_year()` is simpler. In addition, `auk_year()` can be used
#'   to get data from discontiguous sets of years (e.g. 2010 and 2012, but not
#'   2011), which is not possible with `auk_date()`. Finally, `auk_year()` can
#'   be used in conjunction with `auk_date()` to extract data from a given range
#'   of dates within a set of years (see example below).
#'   
#'   This function can also work with an `auk_sampling` object if the user
#'   only wishes to filter the sampling event data.
#'
#' @return An `auk_ebd` or `auk_sampling` object, matching the class of `x`.
#' @export
#' @family filter
#' @examples
#' # years to filter to
#' years <- c(2010, 2012)
#' # set up filter
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_year(year = years)
#'   
#' # alternatively, without pipes
#' ebd <- auk_ebd(system.file("extdata/ebd-sample.txt", package = "auk"))
#' auk_year(ebd, years)
#' 
#' # filter to may and june of 2010 and 2012
#' system.file("extdata/ebd-sample.txt", package = "auk") |>
#'   auk_ebd() |>
#'   auk_year(year = c(2010, 2012)) |> 
#'   auk_date(date = c("*-05-01", "*-06-30"))
auk_year <- function(x, year, replace = FALSE)  {
  UseMethod("auk_year")
}

#' @export
auk_year.auk_ebd <- function(x, year, replace = FALSE) {
  # checks
  assertthat::assert_that(
    is_integer(year),
    all(year %in% 1800:2100)
  )
  
  # add years to filter list
  if (replace) {
    x$filters$year <- year
  } else {
    x$filters$year <- c(x$filters$year, year)
  }
  x$filters$year <- sort(unique(x$filters$year))
  return(x)
}

#' @export
auk_year.auk_sampling <- function(x, year, replace = FALSE) {
  auk_year.auk_ebd(x, year, replace)
}

================================================
FILE: R/auk-zerofill.R
================================================
#' Read and zero-fill an eBird data file
#'
#' Read an eBird Basic Dataset (EBD) file, and associated sampling event data
#' file, to produce a zero-filled, presence-absence dataset. The EBD contains
#' bird sightings and the sampling event data is a set of all checklists; they
#' can be combined to infer absences by assuming any species not reported on a
#' complete checklist had a count of zero.
#'
#' @param x filename, `data.frame` of eBird observations, or `auk_ebd` object
#'   with associated output files as created by [auk_filter()]. If a filename is
#'   provided, it must point to the EBD and the `sampling_events` argument must
#'   point to the sampling event data file. If a `data.frame` is provided it
#'   should have been imported with [read_ebd()], to ensure the variable names
#'   have been set correctly, and it must have been passed through
#'   [auk_unique()] to ensure duplicate group checklists have been removed.
#' @param sampling_events character or `data.frame`; filename for the sampling
#'   event data or a `data.frame` of the same data. If a `data.frame` is
#'   provided it should have been imported with [read_sampling()], to ensure the
#'   variable names have been set correctly, and it must have been passed
#'   through [auk_unique()] to ensure duplicate group checklists have been
#'   removed.
#' @param species character; species to include in zero-filled dataset, provided
#'   as scientific or English common names, or a mixture of both. These names
#'   must match the official eBird Taxonomy ([ebird_taxonomy]). To include all
#'   species, leave this argument blank.
#' @param taxonomy_version integer; the version (i.e. year) of the taxonomy. In
#'   most cases, this should be left empty to use the version of the taxonomy
#'   included in the package. See [get_ebird_taxonomy()].
#' @param collapse logical; whether to call `collapse_zerofill()` to return a
#'   data frame rather than an `auk_zerofill` object.
#' @param unique logical; should [auk_unique()] be run on the input data if it
#'   hasn't already.
#' @param rollup logical; should [auk_rollup()] be run on the input data if it
#'   hasn't already.
#' @param drop_higher logical; whether to remove taxa above species during the 
#'   rollup process, e.g. "spuhs" like "duck sp.". See [auk_rollup()].
#' @param complete logical; if `TRUE` (the default) all checklists are required 
#'   to be complete prior to zero-filling.
#' @param sep character; single character used to separate fields within a row.
#' @param ... additional arguments passed to methods.
#'
#' @details
#' `auk_zerofill()` generates an `auk_zerofill` object consisting of a list with
#' elements `observations` and `sampling_events`. `observations` is a data frame
#' giving counts and binary presence/absence data for each species.
#' `sampling_events` is a data frame with checklist level information. The two
#' data frames can be connected via the `checklist_id` field. This format is
#' efficient for storage since the checklist columns are not duplicated for each
#' species; however, working with the data often requires joining the two data
#' frames together.
#'
#' To return a data frame, set `collapse = TRUE`. Alternatively,
#' `collapse_zerofill()` generates a data frame from an `auk_zerofill` object,
#' by joining the two data frames together to produce a single data frame in
#' which each row provides both checklist and species information for a
#' sighting.
#' 
#' The list of species is checked against the eBird taxonomy for validity. This
#' taxonomy is updated once a year in August. The `auk` package includes a copy
#' of the eBird taxonomy, current at the time of release; however, if the EBD
#' and `auk` versions are not aligned, you may need to explicitly specify which
#' version of the taxonomy to use, in which case the eBird API will be queried
#' to get the correct version of the taxonomy.
#'
#' @return By default, an `auk_zerofill` object, or a data frame if `collapse =
#'   TRUE`.
#' @export
#' @family import
#' @examples
#' # read and zero-fill the ebd data
#' f_ebd <- system.file("extdata/zerofill-ex_ebd.txt", package = "auk")
#' f_smpl <- system.file("extdata/zerofill-ex_sampling.txt", package = "auk")
#' auk_zerofill(x = f_ebd, sampling_events = f_smpl)
#'
#' # use the species argument to only include a subset of species
#' auk_zerofill(x = f_ebd, sampling_events = f_smpl,
#'              species = "Collared Kingfisher")
#'
#' # to return a data frame use collapse = TRUE
#' ebd_df <- auk_zerofill(x = f_ebd, sampling_events = f_smpl, collapse = TRUE)
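#'
#' # equivalently, zero-fill first and collapse afterwards with
#' # collapse_zerofill(); species_observed holds the presence/absence data
#' zf <- auk_zerofill(x = f_ebd, sampling_events = f_smpl)
#' zf_df <- collapse_zerofill(zf)
#' table(zf_df$species_observed)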
auk_zerofill <- function(x, ...) {
  UseMethod("auk_zerofill")
}

#' @export
#' @describeIn auk_zerofill EBD data frame.
auk_zerofill.data.frame <- function(x, sampling_events, 
                                    species, taxonomy_version,
                                    collapse = FALSE, unique = TRUE, 
                                    rollup = TRUE, drop_higher = TRUE,
                                    complete = TRUE, ...) {
  # checks
  assertthat::assert_that(
    is.data.frame(sampling_events),
    missing(species) || is.character(species),
    assertthat::is.flag(unique))
  
  # process species names
  if (!missing(species)) {
    # convert common names and species codes to scientific names
    species_clean <- ebird_species(species, taxonomy_version = taxonomy_version)
    # check all species names are valid
    if (any(is.na(species_clean))) {
      stop(
        paste0("The following species were not found in the eBird taxonomy:",
               "\n\t",
               paste(species[is.na(species_clean)], collapse =", "))
      )
    }
  }
  
  # check that we only have complete checklists
  if (!all(sampling_events$all_species_reported)) {
    e <- paste0("Some checklists in sampling event data are not complete.\n",
                "Complete checklists are required for zero-filling.\n",
                "Try calling auk_complete() when filtering.")
    if (complete) {
      stop(e)
    } else {
      warning(e)
    }
  }
  
  # check that auk_unique has been run
  if (!isTRUE(attr(x, "unique"))) {
    if (!unique){
      stop("The EBD doesn't appear to have been run through auk_unique(). ",
           "Set unique = TRUE.")
    } else {
      x <- auk_unique(x)
    }
  }
  if (!isTRUE(attr(sampling_events, "unique"))) {
    if (!unique){
      stop(paste("The sampling events data doesn't appear to have been run",
                 "through auk_unique(). Set unique = TRUE."))
    } else {
      sampling_events <- auk_unique(sampling_events, checklists_only = TRUE)
    }
  }
  
  # check that auk_rollup has been run
  if (rollup && !isTRUE(attr(x, "rollup"))) {
    x <- auk_rollup(x, drop_higher = drop_higher)
  }
  
  # subset ebd to remove checklist level fields
  optional_fields <- c("breeding_code", "breeding_category", 
                       "behavior_code", "age_sex")
  species_cols <- c("checklist_id", "scientific_name", 
                    intersect(optional_fields, names(x)),
                    "observation_count")
  if (any(!species_cols %in% names(x))) {
    stop(
      paste0("The following fields must appear in the EBD: \n\t",
             paste(species_cols, collapse =", "))
    )
  }
  x <- dplyr::select(x, dplyr::all_of(species_cols))
  
  # ensure all checklists in the ebd are in the sampling event data
  if (!all(x$checklist_id %in% sampling_events$checklist_id)) {
    stop("Some checklists in EBD are missing from sampling event data.")
  }
  
  # subset ebd by species
  if (!missing(species)) {
    in_ebd <- (species_clean %in% x$scientific_name)
    if (all(!in_ebd)) {
      stop("None of the provided species appear in the EBD.")
    } else if (any(!in_ebd)) {
      warning(
        paste0("The following species were not found in the EBD: \n\t",
               paste(species[!in_ebd], collapse =", "))
      )
    }
    species_clean <- species_clean[in_ebd]
    x <- x[x$scientific_name %in% species_clean, ]
  }
  
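  # add a logical presence/absence column; an "X" count indicates the species
  # was reported as present without a count, so it is treated as 1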
  x$species_observed <- x$observation_count
  x$species_observed[x$species_observed == "X"] <- "1"
  x$species_observed <- (as.numeric(x$species_observed) >= 1)
  
  # remove absences that may have sneaked through
  # there shouldn't be any of these, but just in case...
  x <- x[x$species_observed, ]
  
  # fill in implicit missing values
  x <- tidyr::complete(
    x,
    checklist_id = sampling_events$checklist_id, .data$scientific_name,
    fill = list(observation_count = "0", species_observed = FALSE)
  )
  
  out <- structure(
    list(observations = dplyr::as_tibble(x),
         sampling_events = dplyr::as_tibble(sampling_events)),
    class = "auk_zerofill"
  )
  # return a data frame?
  if (collapse) {
    return(collapse_zerofill(out))
  } else {
    return(out)
  }
}

#' @export
#' @describeIn auk_zerofill Filename of EBD.
auk_zerofill.character <- function(x, sampling_events, 
                                   species, taxonomy_version,
                                   collapse = FALSE, unique = TRUE, 
                                   rollup = TRUE,  drop_higher = TRUE,
                                   complete = TRUE, sep = "\t", ...) {
  # checks
  assertthat::assert_that(
    assertthat::is.string(x), file.exists(x),
    assertthat::is.string(sampling_events), file.exists(sampling_events),
    missing(species) || is.character(species),
    assertthat::is.string(sep), nchar(sep) == 1, sep != " ")
  
  # read in the two files
  ebd <- read_ebd(x = x, sep = sep, unique = FALSE, rollup = FALSE)
  sed <- read_sampling(x = sampling_events, sep = sep, unique = FALSE)
  
  # pass on to df method
  auk_zerofill(x = ebd, sampling_events = sed, species = species, 
               taxonomy_version = taxonomy_version,
               collapse = collapse, unique = unique, complete = complete,
               rollup = rollup, drop_higher = drop_higher)
}

#' @export
#' @describeIn auk_zerofill `auk_ebd` object output from [auk_filter()]. Must
#'   have had a sampling event data file set in the original call to
#'   [auk_ebd()].
auk_zerofill.auk_ebd <- function(x, species, taxonomy_version,
                                 collapse = FALSE, unique = TRUE, 
                                 rollup = TRUE,  drop_higher = TRUE,
                                 complete = TRUE, sep = "\t", ...) {
  # check that output files defined
  if (is.null(x$output)) {
    stop("No output EBD file in this auk_ebd object, try calling auk_filter().")
  }
  if (is.null(x$output_sampling)) {
    stop("No output sampling event data file in this auk_ebd object.")
  }
  
  # pass on to file method
  auk_zerofill(x = x$output, sampling_events = x$output_sampling,
               species = species, taxonomy_version = taxonomy_version,
               collapse = collapse, unique = unique, complete = complete,
               rollup = rollup, drop_higher = drop_higher, sep = sep)
}

#' @rdname auk_zerofill
#' @export
collapse_zerofill <- function(x) {
  UseMethod("collapse_zerofill")
}

#' @export
collapse_zerofill.auk_zerofill <- function(x) {
  out <- dplyr::inner_join(x$sampling_events, x$observations, 
                           by = "checklist_id")
  dplyr::as_tibble(out)
}

#' @export
print.auk_zerofill <- function(x, ...) {
  checklists <- nrow(x$sampling_events)
  species <- length(unique(x$observations$scientific_name))
  cat(
    paste0(
      "Zero-filled EBD: ",
      format(checklists, big.mark = ","), " unique checklists, ",
      "for ", format(species, big.mark = ","), " species.\n"
    )
  )
  return(invisible(x))
}


================================================
FILE: R/data.R
================================================
#' eBird Taxonomy
#'
#' A simplified version of the taxonomy used by eBird. Includes proper species
#' as well as various other categories such as `spuh` (e.g. *duck sp.*) and
#' *slash* (e.g. *American Black Duck/Mallard*). This taxonomy is based on the
#' Clements Checklist, which is updated annually, typically in the late summer. 
#' Non-ASCII characters (e.g. those with accents) have been converted to ASCII 
#' equivalents in this data frame.
#'
#' @format A data frame with eleven variables and 16,248 rows:
#'   - `species_code`: a unique alphanumeric code identifying each species.
#'   - `taxon_concept_id`: a unique alphanumeric code identifying each species 
#'   in the Avibase taxonomy.
#'   - `scientific_name`: scientific name.
#'   - `common_name`: common name, defaults to English, but different languages 
#'   can be selected using the `locale` parameter.
#'   - `order`: the scientific name of the order that the species belongs to.
#'   - `family`: the scientific name of the family that the species belongs to.
#'   - `family_common`: the common name of the family that the species belongs
#'   to.
#'   - `category`: whether the entry is for a species or another 
#'   field-identifiable taxon, such as `spuh`, `slash`, `hybrid`, etc.
#'   - `taxonomic_order`: integer value used to sort rows in taxonomic order.
#'   - `report_as`: for taxa that can be resolved to true species (i.e. species,
#'   subspecies, and recognizable forms), this field links to the corresponding
#'   species code. For taxa that can't be resolved, this field is `NA`.
#'   - `extinct`: logical variable indicating whether the species is listed as 
#'   extinct in the eBird taxonomy.
#'
#' For further details, see \url{https://support.ebird.org/support/solutions/articles/48000837816-the-ebird-taxonomy}
#' @family data
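#' @examples
#' # count the number of taxa in each category (species, spuh, slash, etc.)
#' table(ebird_taxonomy$category)
#'
#' # subset to true species only, dropping spuhs, slashes, hybrids, etc.
#' sp <- ebird_taxonomy[ebird_taxonomy$category == "species", ]
#' nrow(sp)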
"ebird_taxonomy"

#' eBird States
#'
#' A data frame of state codes used by eBird. These codes are 4 to 6 characters, 
#' consisting of two parts, the 2-letter ISO country code and a 1-3 character 
#' state code, separated by a dash. For example, `"US-NY"` corresponds to New 
#' York State in the United States. These state codes are required to filter by 
#' state using [auk_state()].
#' 
#' 
#' Note that some countries are not broken into states in eBird and therefore do 
#' not appear in this data frame.
#' 
#' @format A data frame with four variables and 3,145 rows:
#' - `country`: short form of English country name.
#' - `country_code`: 2-letter ISO country code.
#' - `state`: state name.
#' - `state_code`: 4 to 6 character state code.
#' @family data
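#' @examples
#' # look up the state code for New York State, for use with auk_state()
#' ebird_states$state_code[ebird_states$state == "New York"]
#'
#' # list all state codes for Canada
#' ebird_states$state_code[ebird_states$country_code == "CA"]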
"ebird_states"

#' BCR Codes
#'
#' A data frame of Bird Conservation Region (BCR) codes. BCRs are ecologically
#' distinct regions in North America with similar bird communities, habitats,
#' and resource management issues. These codes are required to filter by BCR
#' using [auk_bcr()].
#' 
#' @format A data frame with two variables and 66 rows:
#' - `bcr_code`: integer code from 1 to 66.
#' - `bcr_name`: name of BCR.
#' @family data
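#' @examples
#' # look up the names of a set of BCRs before filtering with auk_bcr()
#' bcr_codes[bcr_codes$bcr_code %in% c(13, 23), ]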
"bcr_codes"

#' Valid Protocols
#' 
#' A vector of valid protocol names, e.g. "Traveling", "Stationary", etc.
#' 
#' @format A vector with 42 elements.
#' @family data
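#' @examples
#' # check that a protocol name is valid before filtering with auk_protocol()
#' "Traveling" %in% valid_protocols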
"valid_protocols"

================================================
FILE: R/ebird-species.R
================================================
#' Lookup species in eBird taxonomy
#'
#' Given a list of common or scientific names, or species codes, check that they
#' appear in the official eBird taxonomy and convert them all to scientific
#' names, common names, or species codes. Un-matched species are returned as
#' `NA`.
#'
#' @param x character; species to look up, provided as scientific names, English
#'   common names, species codes, or a mixture of all three. Case insensitive.
#' @param type character; whether to return scientific names (`scientific`),
#'   English common names (`common`), or 6-letter eBird species codes (`code`). 
#'   Alternatively, use `all` to return a data frame with all the taxonomy 
#'   information.
#' @param taxonomy_version integer; the version (i.e. year) of the taxonomy.
#'   Leave empty to use the version of the taxonomy included in the package.
#'   See [get_ebird_taxonomy()].
#'
#' @return Character vector of species identified by scientific name, common 
#'   name, or species code. If `type = "all"` a data frame of the taxonomy of 
#'   the requested species is returned.
#' @export
#' @family helpers
#' @examples
#' # mix common and scientific names, case-insensitive
#' species <- c("Blackburnian Warbler", "Poecile atricapillus",
#'              "american dipper", "Caribou", "hudgod")
#' # note that species not in the ebird taxonomy return NA
#' ebird_species(species)
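#' 
#' # return 6-letter eBird species codes, or a data frame of the full taxonomy
#' ebird_species(species, type = "code")
#' ebird_species(species, type = "all")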
#' 
#' # use taxonomy_version to query older taxonomy versions
#' \dontrun{
#' ebird_species("Cordillera Azul Antbird")
#' ebird_species("Cordillera Azul Antbird", taxonomy_version = 2017)
#' }
ebird_species <- function(x, type = c("scientific", "common", "code", "all"),
                          taxonomy_version) {
  assertthat::assert_that(is.character(x))
  type <- match.arg(type)
  
  # get the correct ebird taxonomy version
  if (missing(taxonomy_version) || 
      taxonomy_version == auk_version()$taxonomy_version) {
    tax <- auk::ebird_taxonomy
  } else {
    stopifnot(is_integer(taxonomy_version), length(taxonomy_version) == 1)
    tax <- get_ebird_taxonomy(version = taxonomy_version)
  }
  
  # deal with case issues
  lookup_species <- x
  x <- tolower(trimws(x))
  # convert to ascii
  x <- stringi::stri_trans_general(x, "latin-ascii")
  
  # first check for scientific names
  sci <- match(x, tolower(tax$scientific_name))
  # then for common names
  com <- match(x, tolower(tax$common_name))
  # finally for species codes
  sc <- match(x, tolower(tax$species_code))
  # combine
  idx <- ifelse(is.na(sci), ifelse(is.na(com), sc, com), sci)
  # convert to output format, default scientific
  if (identical(type, "scientific")) {
    return(tax$scientific_name[idx])
  } else if (identical(type, "common")) {
    return(tax$common_name[idx])
  } else if (identical(type, "code")) {
    return(tax$species_code[idx])
  } else {
    ret <- dplyr::as_tibble(tax[idx, ])
    ret$lookup_species <- lookup_species
    return(ret)
  }
}


================================================
FILE: R/filter-repeat-visits.R
================================================
#' Filter observations to repeat visits for hierarchical modeling
#' 
#' Hierarchical modeling of abundance and occurrence requires repeat visits to
#' sites to estimate detectability. These visits should all be within a
#' period of closure, i.e. when the population can be assumed to be closed.
#' eBird data, and many other data sources, do not explicitly follow this
#' protocol; however, subsets of the data can be extracted to produce data
#' suitable for hierarchical modeling. This function extracts a subset of
#' observation data that have a desired number of repeat visits within a period
#' of closure.
#'
#' @param x `data.frame`; observation data, e.g. data from the eBird Basic 
#'   Dataset (EBD) zero-filled with [auk_zerofill()]. This function will also 
#'   work with an `auk_zerofill` object, in which case it will be converted to 
#'   a data frame with [collapse_zerofill()].
#'   **Note that these data must be for a single species**.
#' @param min_obs integer; minimum number of observations required for each
#'   site.
#' @param max_obs integer; maximum number of observations allo

SYMBOL INDEX (86 symbols across 3 files)

FILE: docs/deps/jquery-3.6.0/jquery-3.6.0.js
  function DOMEval (line 107) | function DOMEval( code, node, doc ) {
  function toType (line 137) | function toType( obj ) {
  function isArrayLike (line 507) | function isArrayLike( obj ) {
  function Sizzle (line 759) | function Sizzle( selector, context, results, seed ) {
  function createCache (line 907) | function createCache() {
  function markFunction (line 927) | function markFunction( fn ) {
  function assert (line 936) | function assert( fn ) {
  function addHandle (line 960) | function addHandle( attrs, handler ) {
  function siblingCheck (line 975) | function siblingCheck( a, b ) {
  function createInputPseudo (line 1001) | function createInputPseudo( type ) {
  function createButtonPseudo (line 1012) | function createButtonPseudo( type ) {
  function createDisabledPseudo (line 1023) | function createDisabledPseudo( disabled ) {
  function createPositionalPseudo (line 1079) | function createPositionalPseudo( fn ) {
  function testContext (line 1102) | function testContext( context ) {
  function setFilters (line 2313) | function setFilters() {}
  function toSelector (line 2387) | function toSelector( tokens ) {
  function addCombinator (line 2397) | function addCombinator( matcher, combinator, base ) {
  function elementMatcher (line 2464) | function elementMatcher( matchers ) {
  function multipleContexts (line 2478) | function multipleContexts( selector, contexts, results ) {
  function condense (line 2487) | function condense( unmatched, map, filter, context, xml ) {
  function setMatcher (line 2508) | function setMatcher( preFilter, selector, matcher, postFilter, postFinde...
  function matcherFromTokens (line 2608) | function matcherFromTokens( tokens ) {
  function matcherFromGroupMatchers (line 2671) | function matcherFromGroupMatchers( elementMatchers, setMatchers ) {
  function nodeName (line 3029) | function nodeName( elem, name ) {
  function winnow (line 3039) | function winnow( elements, qualifier, not ) {
  function sibling (line 3334) | function sibling( cur, dir ) {
  function createOptions (line 3427) | function createOptions( options ) {
  function Identity (line 3652) | function Identity( v ) {
  function Thrower (line 3655) | function Thrower( ex ) {
  function adoptValue (line 3659) | function adoptValue( value, resolve, reject, noValue ) {
  function resolve (line 3752) | function resolve( depth, deferred, handler, special ) {
  function completed (line 4117) | function completed() {
  function fcamelCase (line 4212) | function fcamelCase( _all, letter ) {
  function camelCase (line 4219) | function camelCase( string ) {
  function Data (line 4236) | function Data() {
  function getData (line 4405) | function getData( data ) {
  function dataAttr (line 4430) | function dataAttr( elem, key, data ) {
  function adjustCSS (line 4742) | function adjustCSS( elem, prop, valueParts, tween ) {
  function getDefaultDisplay (line 4810) | function getDefaultDisplay( elem ) {
  function showHide (line 4833) | function showHide( elements, show ) {
  function getAll (line 4965) | function getAll( context, tag ) {
  function setGlobalEval (line 4990) | function setGlobalEval( elems, refElements ) {
  function buildFragment (line 5006) | function buildFragment( elems, context, scripts, selection, ignored ) {
  function returnTrue (line 5098) | function returnTrue() {
  function returnFalse (line 5102) | function returnFalse() {
  function expectSync (line 5112) | function expectSync( elem, type ) {
  function safeActiveElement (line 5119) | function safeActiveElement() {
  function on (line 5125) | function on( elem, types, selector, data, fn, one ) {
  function leverageNative (line 5613) | function leverageNative( el, type, expectSync ) {
  function manipulationTarget (line 5962) | function manipulationTarget( elem, content ) {
  function disableScript (line 5973) | function disableScript( elem ) {
  function restoreScript (line 5977) | function restoreScript( elem ) {
  function cloneCopyEvent (line 5987) | function cloneCopyEvent( src, dest ) {
  function fixInput (line 6020) | function fixInput( src, dest ) {
  function domManip (line 6033) | function domManip( collection, args, callback, ignored ) {
  function remove (line 6125) | function remove( elem, selector, keepData ) {
  function computeStyleTests (line 6439) | function computeStyleTests() {
  function roundPixelMeasures (line 6483) | function roundPixelMeasures( measure ) {
  function curCSS (line 6576) | function curCSS( elem, name, computed ) {
  function addGetHookIf (line 6629) | function addGetHookIf( conditionFn, hookFn ) {
  function vendorPropName (line 6654) | function vendorPropName( name ) {
  function finalPropName (line 6669) | function finalPropName( name ) {
  function setPositiveNumber (line 6695) | function setPositiveNumber( _elem, value, subtract ) {
  function boxModelAdjustment (line 6707) | function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, ...
  function getWidthOrHeight (line 6775) | function getWidthOrHeight( elem, dimension, extra ) {
  function Tween (line 7151) | function Tween( elem, options, prop, end, easing ) {
  function schedule (line 7274) | function schedule() {
  function createFxNow (line 7287) | function createFxNow() {
  function genFx (line 7295) | function genFx( type, includeWidth ) {
  function createTween (line 7315) | function createTween( value, prop, animation ) {
  function defaultPrefilter (line 7329) | function defaultPrefilter( elem, props, opts ) {
  function propFilter (line 7501) | function propFilter( props, specialEasing ) {
  function Animation (line 7538) | function Animation( elem, properties, options ) {
  function stripAndCollapse (line 8254) | function stripAndCollapse( value ) {
  function getClass (line 8260) | function getClass( elem ) {
  function classesToArray (line 8264) | function classesToArray( value ) {
  function buildParams (line 8894) | function buildParams( prefix, obj, traditional, add ) {
  function addToPrefiltersOrTransports (line 9047) | function addToPrefiltersOrTransports( structure ) {
  function inspectPrefiltersOrTransports (line 9081) | function inspectPrefiltersOrTransports( structure, options, originalOpti...
  function ajaxExtend (line 9110) | function ajaxExtend( target, src ) {
  function ajaxHandleResponses (line 9130) | function ajaxHandleResponses( s, jqXHR, responses ) {
  function ajaxConvert (line 9188) | function ajaxConvert( s, response, jqXHR, isSuccess ) {
  function done (line 9704) | function done( status, nativeStatusText, responses, headers ) {

FILE: docs/lightswitch.js
  function bsSetupThemeToggle (line 29) | function bsSetupThemeToggle() {

FILE: docs/pkgdown.js
  function changeTooltipMessage (line 32) | function changeTooltipMessage(element, msg) {
  function searchFuse (line 116) | async function searchFuse(query, callback) {

Condensed preview — 278 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (5,643K chars).

[
  {
    "path": ".Rbuildignore",
    "chars": 287,
    "preview": "^CRAN-RELEASE$\n^Meta$\n^doc$\n^pkgdown$\n^.*\\.Rproj$\n^\\.Rproj\\.user$\n^data-raw$\n^README\\.Rmd$\n^README\\.md$\n^README-.*\\.png$"
  },
  {
    "path": ".github/.gitignore",
    "chars": 7,
    "preview": "*.html\n"
  },
  {
    "path": ".github/workflows/R-CMD-check.yaml",
    "chars": 1420,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".gitignore",
    "chars": 90,
    "preview": "Meta\ndoc\n.Rproj.user\n.Rhistory\n.RData\n.Ruserdata\n.DS_Store\ninst/doc\npkgdown/\n/doc/\n/Meta/\n"
  },
  {
    "path": "CONDUCT.md",
    "chars": 3215,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, w"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1090,
    "preview": "# CONTRIBUTING\n\n## Please contribute!\n\nWe love collaboration.\n\n## Bugs?\n\n- Submit an issue on the Issues page [here](htt"
  },
  {
    "path": "DESCRIPTION",
    "chars": 1615,
    "preview": "Package: auk\nTitle: eBird Data Extraction and Processing in R\nVersion: 0.9.2\nAuthors@R: \n    c(person(given = \"Matthew\","
  },
  {
    "path": "LICENSE.md",
    "chars": 34732,
    "preview": "GNU General Public License\n==========================\n\n_Version 3, 29 June 2007_  \n_Copyright © 2007 Free Software Found"
  },
  {
    "path": "NAMESPACE",
    "chars": 2642,
    "preview": "# Generated by roxygen2: do not edit by hand\n\nS3method(auk_bbox,auk_ebd)\nS3method(auk_bbox,auk_sampling)\nS3method(auk_bc"
  },
  {
    "path": "NEWS.md",
    "chars": 6580,
    "preview": "# auk 0.9.2\n\n- update to v2 of the eBird API and httr2 (PR #97)\n- drop magrittr pipe re-export\n\n# auk 0.9.1\n\n- ensure ta"
  },
  {
    "path": "R/auk-bbox.R",
    "chars": 3447,
    "preview": "#' Filter the eBird data by spatial bounding box\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on spatia"
  },
  {
    "path": "R/auk-bcr.R",
    "chars": 2409,
    "preview": "#' Filter the eBird data by Bird Conservation Region\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) to extract "
  },
  {
    "path": "R/auk-breeding.R",
    "chars": 1030,
    "preview": "#' Filter to only include observations with breeding codes\n#'\n#' eBird users have the option of specifying breeding bird"
  },
  {
    "path": "R/auk-clean.R",
    "chars": 3832,
    "preview": "#' Clean an eBird data file (Deprecated)\r\n#'\r\n#' This function is no longer required by current versions of the eBird Ba"
  },
  {
    "path": "R/auk-complete.R",
    "chars": 1165,
    "preview": "#' Filter out incomplete checklists from the eBird data\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) to only "
  },
  {
    "path": "R/auk-country.R",
    "chars": 3816,
    "preview": "#' Filter the eBird data by country\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a set of\n#' countri"
  },
  {
    "path": "R/auk-county.R",
    "chars": 2435,
    "preview": "#' Filter the eBird data by county\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a set of\n#' counties"
  },
  {
    "path": "R/auk-date.R",
    "chars": 3235,
    "preview": "#' Filter the eBird data by date\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a range of dates.\n#' T"
  },
  {
    "path": "R/auk-distance.R",
    "chars": 2288,
    "preview": "#' Filter eBird data by distance travelled\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on the distance"
  },
  {
    "path": "R/auk-duration.R",
    "chars": 1708,
    "preview": "#' Filter the eBird data by duration\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on the duration of\n#'"
  },
  {
    "path": "R/auk-ebd-version.R",
    "chars": 2250,
    "preview": "#' Get the EBD version and associated taxonomy version\n#' \n#' Based on the filename of eBird Basic Dataset (EBD) or samp"
  },
  {
    "path": "R/auk-ebd.R",
    "chars": 12126,
    "preview": "#' Reference to eBird data file\n#'\n#' Create a reference to an eBird Basic Dataset (EBD) file in preparation for\n#' filt"
  },
  {
    "path": "R/auk-exotic.R",
    "chars": 1998,
    "preview": "#' Filter the eBird data by exotic code\n#'\n#' Exotic codes are applied to eBird observations when the species is believe"
  },
  {
    "path": "R/auk-filter.R",
    "chars": 24732,
    "preview": "#' Filter the eBird file using AWK\n#'\n#' Convert the filters defined in an `auk_ebd` object into an AWK script and run\n#"
  },
  {
    "path": "R/auk-get-awk-path.R",
    "chars": 1933,
    "preview": "#' OS specific path to AWK executable\n#'\n#' Return the OS specific path to AWK (e.g. `\"C:/cygwin64/bin/gawk.exe\"` or\n#' "
  },
  {
    "path": "R/auk-get-ebd-path.R",
    "chars": 535,
    "preview": "#' Return EBD data path\n#' \n#' Returns the environment variable `EBD_PATH`, which users are encouraged to \n#' set to the"
  },
  {
    "path": "R/auk-last-edited.R",
    "chars": 2020,
    "preview": "#' Filter the eBird data by last edited date\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a range of"
  },
  {
    "path": "R/auk-observer.R",
    "chars": 1708,
    "preview": "#' Filter the eBird data by observer\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a set of\n#' observ"
  },
  {
    "path": "R/auk-package.R",
    "chars": 302,
    "preview": "#' `auk`: eBird Data Extraction and Processing in R\n#'\n#' Tools for extracting and processing eBird data from the eBird "
  },
  {
    "path": "R/auk-project.R",
    "chars": 1747,
    "preview": "#' Filter the eBird data by project code\n#'\n#' Some eBird records are collected as part of a particular project (e.g. th"
  },
  {
    "path": "R/auk-protocol.R",
    "chars": 1636,
    "preview": "#' Filter the eBird data by protocol\n#'\n#' Filter to just data collected following a specific search protocol:\n#' statio"
  },
  {
    "path": "R/auk-rollup.R",
    "chars": 8920,
    "preview": "#' Roll up eBird taxonomy to species\n#' \n#' The eBird Basic Dataset (EBD) includes both true species and every other\n#' "
  },
  {
    "path": "R/auk-sampling.R",
    "chars": 7584,
    "preview": "#' Reference to eBird sampling event file\n#'\n#' Create a reference to an eBird sampling event file in preparation for\n#'"
  },
  {
    "path": "R/auk-select.R",
    "chars": 3601,
    "preview": "#' Select a subset of columns\n#' \n#' Select a subset of columns from the eBird Basic Dataset (EBD) or the sampling \n#' e"
  },
  {
    "path": "R/auk-set-awk-path.R",
    "chars": 2287,
    "preview": "#' Set a custom path to AWK executable\n#' \n#' If AWK has been installed in a non-standard location, the environment\n#' v"
  },
  {
    "path": "R/auk-set-ebd-path.R",
    "chars": 2284,
    "preview": "#' Set the path to EBD text files\n#' \n#' Users of `auk` are encouraged to set the path to the directory containing the\n#"
  },
  {
    "path": "R/auk-species.R",
    "chars": 3639,
    "preview": "#' Filter the eBird data by species\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on species. This\n#' fu"
  },
  {
    "path": "R/auk-split.R",
    "chars": 5172,
    "preview": "#' Split an eBird data file by species\n#' \n#' Given an eBird Basic Dataset (EBD) and a list of species, split the file i"
  },
  {
    "path": "R/auk-state.R",
    "chars": 2728,
    "preview": "#' Filter the eBird data by state\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a set of\n#' states. T"
  },
  {
    "path": "R/auk-time.R",
    "chars": 1914,
    "preview": "#' Filter the eBird data by checklist start time\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a rang"
  },
  {
    "path": "R/auk-unique.R",
    "chars": 5717,
    "preview": "#' Remove duplicate group checklists\n#'\n#' eBird checklists can be shared among a group of multiple observers, in which\n"
  },
  {
    "path": "R/auk-version.R",
    "chars": 1180,
    "preview": "#' Versions of auk, the EBD, and the eBird taxonomy\n#'\n#' This package depends on the version of the EBD and on the eBir"
  },
  {
    "path": "R/auk-year.R",
    "chars": 2487,
    "preview": "#' Filter the eBird data to a set of years\n#'\n#' Define a filter for the eBird Basic Dataset (EBD) based on a set of\n#' "
  },
  {
    "path": "R/auk-zerofill.R",
    "chars": 11464,
    "preview": "#' Read and zero-fill an eBird data file\n#'\n#' Read an eBird Basic Dataset (EBD) file, and associated sampling event dat"
  },
  {
    "path": "R/data.R",
    "chars": 3213,
    "preview": "#' eBird Taxonomy\n#'\n#' A simplified version of the taxonomy used by eBird. Includes proper species\n#' as well as variou"
  },
  {
    "path": "R/ebird-species.R",
    "chars": 2944,
    "preview": "#' Lookup species in eBird taxonomy\n#'\n#' Given a list of common or scientific names, or species codes, check that they\n"
  },
  {
    "path": "R/filter-repeat-visits.R",
    "chars": 7661,
    "preview": "#' Filter observations to repeat visits for hierarchical modeling\n#' \n#' Hierarchical modeling of abundance and occurren"
  },
  {
    "path": "R/format-unmarked-occu.R",
    "chars": 7749,
    "preview": "#' Format EBD data for occupancy modeling with `unmarked`\n#' \n#' Prepare a data frame of species observations for ingest"
  },
  {
    "path": "R/get-ebird-taxonomy.R",
    "chars": 3001,
    "preview": "#' Get eBird taxonomy via the eBird API\n#' \n#' Get the taxonomy used in eBird via the eBird API. \n#'\n#' @param version i"
  },
  {
    "path": "R/process_barcharts.R",
    "chars": 5132,
    "preview": "#' Process eBird bar chart data\n#' \n#' eBird bar charts show the frequency of detection for each week for all\n#' species"
  },
  {
    "path": "R/read.R",
    "chars": 4882,
    "preview": "#' Read an EBD file\n#'\n#' Read an eBird Basic Dataset file using [readr::read_delim()]. `read_ebd()`\n#' reads the EBD it"
  },
  {
    "path": "R/utils.R",
    "chars": 2838,
    "preview": "is_integer <- function(x) {\n  is.integer(x) || (is.numeric(x) && all(x == as.integer(x)))\n}\n\nget_header <- function(x, s"
  },
  {
    "path": "R/zzz.R",
    "chars": 476,
    "preview": ".onAttach <- function(libname, pkgname) {\n  m <- paste0(\"%s is designed for EBD files downloaded after %s.\")\n  p <- auk_"
  },
  {
    "path": "README.Rmd",
    "chars": 12902,
    "preview": "---\noutput: md_document\neditor_options: \n  chunk_output_type: console\n---\n\n<!-- README.md is generated from README.Rmd. "
  },
  {
    "path": "README.md",
    "chars": 18433,
    "preview": "<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n# auk: eBird Data Extraction and Processing in R"
  },
  {
    "path": "_pkgdown.yml",
    "chars": 1087,
    "preview": "url: https://cornelllabofornithology.github.io/auk/\n\nnavbar:\n  structure:\n    left:  [intro, reference, articles, cheats"
  },
  {
    "path": "auk.Rproj",
    "chars": 351,
    "preview": "Version: 1.0\nProjectId: 75beeb8f-7a35-4a34-aa86-8e874912f531\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSav"
  },
  {
    "path": "cran-comments.md",
    "chars": 600,
    "preview": "# auk 0.9.1\n\n- ensure taxon_concept_id behaves correctly in auk_rollup() (issue #94)\n- update EBD example files to get l"
  },
  {
    "path": "data-raw/BCRCodes.txt",
    "chars": 1960,
    "preview": "BCR_CODE\tBCR_NAME\r\n1\tALEUTIAN/BERING_SEA_ISLANDS\r\n2\tWESTERN_ALASKA\r\n3\tARCTIC_PLAINS_AND_MOUNTAINS\r\n4\tNORTHWESTERN_INTERI"
  },
  {
    "path": "data-raw/barchart.R",
    "chars": 147,
    "preview": "f_src <- \"~/Downloads/ebird_SJ__2015_2025_1_12_barchart.txt\"\nf_dst <- \"inst/extdata/barchart-sample.txt\"\nfile.copy(f_src"
  },
  {
    "path": "data-raw/bcr-codes.r",
    "chars": 259,
    "preview": "library(tidyverse)\nlibrary(janitor)\n\nbcr_codes <- read_tsv(\"data-raw/BCRCodes.txt\") |> \n  clean_names() |> \n  mutate(bcr"
  },
  {
    "path": "data-raw/ebd-samples.r",
    "chars": 3987,
    "preview": "library(auk)\nlibrary(fs)\nlibrary(glue)\nlibrary(stringi)\nlibrary(tidyverse)\n\nebird_dir <- \"~/data/ebird/auk/\"\n\n# US obser"
  },
  {
    "path": "data-raw/ebird-state.r",
    "chars": 780,
    "preview": "library(tidyverse)\nlibrary(stringi)\nlibrary(stringr)\nlibrary(countrycode)\nlibrary(janitor)\ndir <- \"/Users/mes335/data/eb"
  },
  {
    "path": "data-raw/ebird-taxonomy.csv",
    "chars": 2460875,
    "preview": "species_code,taxon_concept_id,scientific_name,common_name,order,family,family_common,category,taxonomic_order,report_as,"
  },
  {
    "path": "data-raw/ebird-taxonomy.r",
    "chars": 1615,
    "preview": "library(tidyverse)\nlibrary(stringi)\nlibrary(readxl)\nlibrary(auk)\n\nextract_family <- function(x) {\n  str_match(x, \"\\\\((.*"
  },
  {
    "path": "data-raw/valid-protocols.r",
    "chars": 446,
    "preview": "valid_protocols <-c(\n  \"Incidental\", \"Stationary\", \"Traveling\", \"Historical\",\n  \"Banding\", \"eBird Pelagic Protocol\", \"No"
  },
  {
    "path": "docs/404.html",
    "chars": 5763,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type"
  },
  {
    "path": "docs/404.md",
    "chars": 75,
    "preview": "Content not found. Please use links in the navbar.\n\n# Page not found (404)\n"
  },
  {
    "path": "docs/CONDUCT.html",
    "chars": 9068,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/CONDUCT.md",
    "chars": 3184,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, w"
  },
  {
    "path": "docs/CONTRIBUTING.html",
    "chars": 6646,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/CONTRIBUTING.md",
    "chars": 1110,
    "preview": "# CONTRIBUTING\n\n## Please contribute!\n\nWe love collaboration.\n\n## Bugs?\n\n- Submit an issue on the Issues page\n  [here](h"
  },
  {
    "path": "docs/LICENSE.html",
    "chars": 46008,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/LICENSE.md",
    "chars": 34807,
    "preview": "# GNU General Public License\n\n*Version 3, 29 June 2007*  \n*Copyright © 2007 Free Software Foundation, Inc. \\<<http://fsf"
  },
  {
    "path": "docs/articles/auk.html",
    "chars": 122603,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type"
  },
  {
    "path": "docs/articles/auk.md",
    "chars": 42041,
    "preview": "# Introduction to auk\n\n[eBird](http://www.ebird.org) is an online tool for recording bird\nobservations. Since its incept"
  },
  {
    "path": "docs/articles/development.html",
    "chars": 27550,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type"
  },
  {
    "path": "docs/articles/development.md",
    "chars": 15127,
    "preview": "# auk development\n\nThis vignette describes the process of updating and extending `auk`.\nThree topics are covered: updati"
  },
  {
    "path": "docs/articles/index.html",
    "chars": 4850,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/articles/index.md",
    "chars": 215,
    "preview": "# Articles\n\n### All vignettes\n\n- [Introduction to\n  auk](https://cornelllabofornithology.github.io/auk/articles/auk.md):"
  },
  {
    "path": "docs/authors.html",
    "chars": 6344,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/authors.md",
    "chars": 916,
    "preview": "# Authors and Citation\n\n## Authors\n\n- **[Matthew Strimas-Mackey](http://strimas.com)**. Author, maintainer.\n  [](https:/"
  },
  {
    "path": "docs/deps/data-deps.txt",
    "chars": 885,
    "preview": "<script src=\"deps/jquery-3.6.0/jquery-3.6.0.min.js\"></script>\n<meta name=\"viewport\" content=\"width=device-width, initial"
  },
  {
    "path": "docs/deps/font-awesome-6.5.2/css/all.css",
    "chars": 140418,
    "preview": "/*!\n * Font Awesome Free 6.5.2 by @fontawesome - https://fontawesome.com\n * License - https://fontawesome.com/license/fr"
  },
  {
    "path": "docs/deps/font-awesome-6.5.2/css/v4-shims.css",
    "chars": 41574,
    "preview": "/*!\n * Font Awesome Free 6.5.2 by @fontawesome - https://fontawesome.com\n * License - https://fontawesome.com/license/fr"
  },
  {
    "path": "docs/deps/jquery-3.6.0/jquery-3.6.0.js",
    "chars": 288580,
    "preview": "/*!\n * jQuery JavaScript Library v3.6.0\n * https://jquery.com/\n *\n * Includes Sizzle.js\n * https://sizzlejs.com/\n *\n * C"
  },
  {
    "path": "docs/index.html",
    "chars": 44547,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type"
  },
  {
    "path": "docs/index.md",
    "chars": 18599,
    "preview": "# auk: eBird Data Extraction and Processing in R\n\n## Overview\n\n[eBird](http://www.ebird.org) is an online tool for recor"
  },
  {
    "path": "docs/katex-auto.js",
    "chars": 621,
    "preview": "// https://github.com/jgm/pandoc/blob/29fa97ab96b8e2d62d48326e1b949a71dc41f47a/src/Text/Pandoc/Writers/HTML.hs#L332-L345"
  },
  {
    "path": "docs/lightswitch.js",
    "chars": 2498,
    "preview": "\n/*!\n * Color mode toggler for Bootstrap's docs (https://getbootstrap.com/)\n * Copyright 2011-2023 The Bootstrap Authors"
  },
  {
    "path": "docs/llms.txt",
    "chars": 25028,
    "preview": "# auk: eBird Data Extraction and Processing in R\n\n## Overview\n\n[eBird](http://www.ebird.org) is an online tool for recor"
  },
  {
    "path": "docs/news/index.html",
    "chars": 21400,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/news/index.md",
    "chars": 10992,
    "preview": "# Changelog\n\n## auk 0.9.2\n\n- update to v2 of the eBird API and httr2 (PR\n  [\\#97](https://github.com/CornellLabofOrnitho"
  },
  {
    "path": "docs/pkgdown.js",
    "chars": 4961,
    "preview": "/* http://gregfranko.com/blog/jquery-best-practices/ */\n(function ($) {\n  $(function () {\n\n    $('nav.navbar').headroom("
  },
  {
    "path": "docs/pkgdown.yml",
    "chars": 273,
    "preview": "pandoc: 3.6.3\npkgdown: 2.2.0\npkgdown_sha: ~\narticles:\n  auk: auk.html\n  development: development.html\nlast_built: 2026-0"
  },
  {
    "path": "docs/reference/auk-package.html",
    "chars": 6516,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk-package.md",
    "chars": 528,
    "preview": "# `auk`: eBird Data Extraction and Processing in R\n\nTools for extracting and processing eBird data from the eBird Basic\n"
  },
  {
    "path": "docs/reference/auk.html",
    "chars": 308,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_bbox.html",
    "chars": 15564,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_bbox.md",
    "chars": 4573,
    "preview": "# Filter the eBird data by spatial bounding box\n\nDefine a filter for the eBird Basic Dataset (EBD) based on spatial\nboun"
  },
  {
    "path": "docs/reference/auk_bcr.html",
    "chars": 16532,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_bcr.md",
    "chars": 4851,
    "preview": "# Filter the eBird data by Bird Conservation Region\n\nDefine a filter for the eBird Basic Dataset (EBD) to extract data f"
  },
  {
    "path": "docs/reference/auk_breeding.html",
    "chars": 11497,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_breeding.md",
    "chars": 3181,
    "preview": "# Filter to only include observations with breeding codes\n\neBird users have the option of specifying breeding bird atlas"
  },
  {
    "path": "docs/reference/auk_clean.html",
    "chars": 10160,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_clean.md",
    "chars": 1893,
    "preview": "# Clean an eBird data file (Deprecated)\n\nThis function is no longer required by current versions of the eBird\nBasic Data"
  },
  {
    "path": "docs/reference/auk_complete.html",
    "chars": 11892,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_complete.md",
    "chars": 3430,
    "preview": "# Filter out incomplete checklists from the eBird data\n\nDefine a filter for the eBird Basic Dataset (EBD) to only keep c"
  },
  {
    "path": "docs/reference/auk_country.html",
    "chars": 15667,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_country.md",
    "chars": 4708,
    "preview": "# Filter the eBird data by country\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a set of\ncountries. This "
  },
  {
    "path": "docs/reference/auk_county.html",
    "chars": 15420,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_county.md",
    "chars": 4842,
    "preview": "# Filter the eBird data by county\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a set of\ncounties This fun"
  },
  {
    "path": "docs/reference/auk_date.html",
    "chars": 21633,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_date.md",
    "chars": 6222,
    "preview": "# Filter the eBird data by date\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a range of\ndates. This funct"
  },
  {
    "path": "docs/reference/auk_distance.html",
    "chars": 16010,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_distance.md",
    "chars": 4613,
    "preview": "# Filter eBird data by distance travelled\n\nDefine a filter for the eBird Basic Dataset (EBD) based on the distance\ntrave"
  },
  {
    "path": "docs/reference/auk_duration.html",
    "chars": 15449,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_duration.md",
    "chars": 4349,
    "preview": "# Filter the eBird data by duration\n\nDefine a filter for the eBird Basic Dataset (EBD) based on the duration\nof the chec"
  },
  {
    "path": "docs/reference/auk_ebd.html",
    "chars": 15446,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_ebd.md",
    "chars": 4274,
    "preview": "# Reference to eBird data file\n\nCreate a reference to an eBird Basic Dataset (EBD) file in preparation\nfor filtering usi"
  },
  {
    "path": "docs/reference/auk_ebd_version.html",
    "chars": 8843,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_ebd_version.md",
    "chars": 1485,
    "preview": "# Get the EBD version and associated taxonomy version\n\nBased on the filename of eBird Basic Dataset (EBD) or sampling ev"
  },
  {
    "path": "docs/reference/auk_exotic.html",
    "chars": 15237,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_exotic.md",
    "chars": 4516,
    "preview": "# Filter the eBird data by exotic code\n\nExotic codes are applied to eBird observations when the species is\nbelieve to be"
  },
  {
    "path": "docs/reference/auk_extent.html",
    "chars": 14553,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_extent.md",
    "chars": 4062,
    "preview": "# Filter the eBird data by spatial extent\n\n**Deprecated**, use\n[`auk_bbox()`](https://cornelllabofornithology.github.io/"
  },
  {
    "path": "docs/reference/auk_filter.auk_ebd.html",
    "chars": 306,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_filter.auk_sampling.html",
    "chars": 306,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_filter.html",
    "chars": 23723,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_filter.md",
    "chars": 6742,
    "preview": "# Filter the eBird file using AWK\n\nConvert the filters defined in an `auk_ebd` object into an AWK script\nand run this sc"
  },
  {
    "path": "docs/reference/auk_get_awk_path.html",
    "chars": 7566,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_get_awk_path.md",
    "chars": 829,
    "preview": "# OS specific path to AWK executable\n\nReturn the OS specific path to AWK (e.g. `\"C:/cygwin64/bin/gawk.exe\"` or\n`\"/usr/bi"
  },
  {
    "path": "docs/reference/auk_get_ebd_path.html",
    "chars": 7018,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_get_ebd_path.md",
    "chars": 687,
    "preview": "# Return EBD data path\n\nReturns the environment variable `EBD_PATH`, which users are encouraged\nto set to the directory "
  },
  {
    "path": "docs/reference/auk_last_edited.html",
    "chars": 12188,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_last_edited.md",
    "chars": 3557,
    "preview": "# Filter the eBird data by last edited date\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a range of\nlast "
  },
  {
    "path": "docs/reference/auk_observer.html",
    "chars": 14298,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_observer.md",
    "chars": 4093,
    "preview": "# Filter the eBird data by observer\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a set of\nobserver IDs Th"
  },
  {
    "path": "docs/reference/auk_project.html",
    "chars": 14749,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_project.md",
    "chars": 4208,
    "preview": "# Filter the eBird data by project code\n\nSome eBird records are collected as part of a particular project (e.g.\nthe Virg"
  },
  {
    "path": "docs/reference/auk_protocol.html",
    "chars": 14773,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_protocol.md",
    "chars": 4345,
    "preview": "# Filter the eBird data by protocol\n\nFilter to just data collected following a specific search protocol:\nstationary, tra"
  },
  {
    "path": "docs/reference/auk_rollup.html",
    "chars": 21112,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_rollup.md",
    "chars": 6232,
    "preview": "# Roll up eBird taxonomy to species\n\nThe eBird Basic Dataset (EBD) includes both true species and every other\nfield-iden"
  },
  {
    "path": "docs/reference/auk_sampling.html",
    "chars": 10924,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_sampling.md",
    "chars": 2205,
    "preview": "# Reference to eBird sampling event file\n\nCreate a reference to an eBird sampling event file in preparation for\nfilterin"
  },
  {
    "path": "docs/reference/auk_select.html",
    "chars": 10840,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_select.md",
    "chars": 1857,
    "preview": "# Select a subset of columns\n\nSelect a subset of columns from the eBird Basic Dataset (EBD) or the\nsampling events file."
  },
  {
    "path": "docs/reference/auk_set_awk_path.html",
    "chars": 8987,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_set_awk_path.md",
    "chars": 1327,
    "preview": "# Set a custom path to AWK executable\n\nIf AWK has been installed in a non-standard location, the environment\nvariable `A"
  },
  {
    "path": "docs/reference/auk_set_ebd_path.html",
    "chars": 9366,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_set_ebd_path.md",
    "chars": 1445,
    "preview": "# Set the path to EBD text files\n\nUsers of `auk` are encouraged to set the path to the directory\ncontaining the eBird Ba"
  },
  {
    "path": "docs/reference/auk_species.html",
    "chars": 16022,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_species.md",
    "chars": 5138,
    "preview": "# Filter the eBird data by species\n\nDefine a filter for the eBird Basic Dataset (EBD) based on species. This\nfunction on"
  },
  {
    "path": "docs/reference/auk_split.html",
    "chars": 12189,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_split.md",
    "chars": 2833,
    "preview": "# Split an eBird data file by species\n\nGiven an eBird Basic Dataset (EBD) and a list of species, split the file\ninto mul"
  },
  {
    "path": "docs/reference/auk_state.html",
    "chars": 18326,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_state.md",
    "chars": 5501,
    "preview": "# Filter the eBird data by state\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a set of\nstates. This funct"
  },
  {
    "path": "docs/reference/auk_time.html",
    "chars": 15096,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_time.md",
    "chars": 4268,
    "preview": "# Filter the eBird data by checklist start time\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a range of\ns"
  },
  {
    "path": "docs/reference/auk_unique.html",
    "chars": 12798,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_unique.md",
    "chars": 3484,
    "preview": "# Remove duplicate group checklists\n\neBird checklists can be shared among a group of multiple observers, in\nwhich case o"
  },
  {
    "path": "docs/reference/auk_version.html",
    "chars": 9158,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_version.md",
    "chars": 1599,
    "preview": "# Versions of auk, the EBD, and the eBird taxonomy\n\nThis package depends on the version of the EBD and on the eBird\ntaxo"
  },
  {
    "path": "docs/reference/auk_year.html",
    "chars": 19316,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_year.md",
    "chars": 5819,
    "preview": "# Filter the eBird data to a set of years\n\nDefine a filter for the eBird Basic Dataset (EBD) based on a set of\nyears. Th"
  },
  {
    "path": "docs/reference/auk_zerofill.auk_ebd.html",
    "chars": 310,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_zerofill.character.html",
    "chars": 310,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_zerofill.data.frame.html",
    "chars": 310,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/auk_zerofill.html",
    "chars": 18461,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/auk_zerofill.md",
    "chars": 6466,
    "preview": "# Read and zero-fill an eBird data file\n\nRead an eBird Basic Dataset (EBD) file, and associated sampling event\ndata file"
  },
  {
    "path": "docs/reference/bcr_codes.html",
    "chars": 6824,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/bcr_codes.md",
    "chars": 793,
    "preview": "# BCR Codes\n\nA data frame of Bird Conservation Region (BCR) codes. BCRs are\necologically distinct regions in North Ameri"
  },
  {
    "path": "docs/reference/collapse_zerofill.html",
    "chars": 310,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/au"
  },
  {
    "path": "docs/reference/ebird_species.html",
    "chars": 10798,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/ebird_species.md",
    "chars": 2255,
    "preview": "# Lookup species in eBird taxonomy\n\nGiven a list of common or scientific names, or species codes, check that\nthey appear"
  },
  {
    "path": "docs/reference/ebird_states.html",
    "chars": 7516,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/ebird_states.md",
    "chars": 1105,
    "preview": "# eBird States\n\nA data frame of state codes used by eBird. These codes are 4 to 6\ncharacters, consisting of two parts, t"
  },
  {
    "path": "docs/reference/ebird_taxonomy.html",
    "chars": 8864,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/ebird_taxonomy.md",
    "chars": 2043,
    "preview": "# eBird Taxonomy\n\nA simplified version of the taxonomy used by eBird. Includes proper\nspecies as well as various other c"
  },
  {
    "path": "docs/reference/filter_repeat_visits.html",
    "chars": 18983,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/filter_repeat_visits.md",
    "chars": 6268,
    "preview": "# Filter observations to repeat visits for hierarchical modeling\n\nHierarchical modeling of abundance and occurrence requ"
  },
  {
    "path": "docs/reference/format_unmarked_occu.html",
    "chars": 26739,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/format_unmarked_occu.md",
    "chars": 7834,
    "preview": "# Format EBD data for occupancy modeling with `unmarked`\n\nPrepare a data frame of species observations for ingestion int"
  },
  {
    "path": "docs/reference/get_ebird_taxonomy.html",
    "chars": 8888,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/get_ebird_taxonomy.md",
    "chars": 1912,
    "preview": "# Get eBird taxonomy via the eBird API\n\nGet the taxonomy used in eBird via the eBird API.\n\n## Usage\n\n``` r\nget_ebird_tax"
  },
  {
    "path": "docs/reference/index.html",
    "chars": 13394,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/index.md",
    "chars": 6212,
    "preview": "# Package index\n\n## EBD Objects\n\n- [`auk_ebd()`](https://cornelllabofornithology.github.io/auk/reference/auk_ebd.md)\n  :"
  },
  {
    "path": "docs/reference/process_barcharts.html",
    "chars": 18846,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/process_barcharts.md",
    "chars": 4687,
    "preview": "# Process eBird bar chart data\n\neBird bar charts show the frequency of detection for each week for all\nspecies within a "
  },
  {
    "path": "docs/reference/read_ebd.auk_ebd.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/read_ebd.character.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/read_ebd.html",
    "chars": 17633,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/read_ebd.md",
    "chars": 5047,
    "preview": "# Read an EBD file\n\nRead an eBird Basic Dataset file using\n[`readr::read_delim()`](https://readr.tidyverse.org/reference"
  },
  {
    "path": "docs/reference/read_sampling.auk_ebd.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/read_sampling.auk_sampling.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/read_sampling.character.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/read_sampling.html",
    "chars": 302,
    "preview": "<html>\n  <head>\n    <meta http-equiv=\"refresh\" content=\"0;URL=https://cornelllabofornithology.github.io/auk/reference/re"
  },
  {
    "path": "docs/reference/valid_protocols.html",
    "chars": 6161,
    "preview": "<!DOCTYPE html>\n<!-- Generated by pkgdown: do not edit by hand --><html lang=\"en\"><head><meta http-equiv=\"Content-Type\" "
  },
  {
    "path": "docs/reference/valid_protocols.md",
    "chars": 462,
    "preview": "# Valid Protocols\n\nA vector of valid protocol names, e.g. \"Traveling\", \"Stationary\", etc.\n\n## Usage\n\n``` r\nvalid_protoco"
  }
]

// ... and 78 more files

About this extraction

This page contains the source code of the CornellLabofOrnithology/auk GitHub repository, extracted and formatted as plain text. The extraction includes 278 files (5.1 MB), approximately 1.4M tokens, and a symbol index with 86 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract.
