Full Code of mrcaseb/nflfastR for AI

Repository: mrcaseb/nflfastR
Branch: master
Commit: 0489133d85c5
Files: 131
Total size: 852.0 KB

Directory structure:
gitextract_h9htwsm9/

├── .Rbuildignore
├── .git-blame-ignore-revs
├── .github/
│   ├── .gitignore
│   └── workflows/
│       ├── R-CMD-check.yaml
│       ├── format-suggest.yaml
│       ├── pkgdown.yaml
│       ├── revdepcheck.yaml
│       └── rhub.yaml
├── .gitignore
├── .vscode/
│   ├── extensions.json
│   └── settings.json
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── aggregate_game_stats.R
│   ├── aggregate_game_stats_def.R
│   ├── aggregate_game_stats_kicking.R
│   ├── build_nflfastR_pbp.R
│   ├── build_playstats.R
│   ├── calculate_series_conversion_rates.R
│   ├── calculate_standings.R
│   ├── calculate_stats.R
│   ├── data_documentation.R
│   ├── database.R
│   ├── ep_wp_calculators.R
│   ├── helper_add_cp_cpoe.R
│   ├── helper_add_ep_wp.R
│   ├── helper_add_fixed_drives.R
│   ├── helper_add_game_data.R
│   ├── helper_add_nflscrapr_mutations.R
│   ├── helper_add_series_data.R
│   ├── helper_add_xpass.R
│   ├── helper_add_xyac.R
│   ├── helper_additional_functions.R
│   ├── helper_database_functions.R
│   ├── helper_decode_player_ids.R
│   ├── helper_get_scheds_and_rosters.R
│   ├── helper_scrape_gc.R
│   ├── helper_scrape_nfl.R
│   ├── helper_tidy_play_stats.R
│   ├── helper_variable_selector.R
│   ├── nflfastR-package.R
│   ├── report.R
│   ├── save_raw_pbp.R
│   ├── sysdata.rda
│   ├── top-level_scraper.R
│   └── utils.R
├── README.Rmd
├── README.md
├── air.toml
├── cran-comments.md
├── data/
│   ├── field_descriptions.rda
│   ├── nfl_stats_variables.rda
│   ├── stat_ids.rda
│   └── teams_colors_logos.rda
├── data-raw/
│   ├── MODELS.R
│   ├── Scrambles 1999-2004 UPDATE for NFLfastR.xlsx
│   ├── Scrambles.1999-2003.FURTHER.UPDATE.for.NFLfastR.xlsx
│   ├── _tune_spread_wp.R
│   ├── build_scramble_fix.R
│   ├── build_stat_id_df.R
│   ├── compare_dfs.R
│   ├── create_field_descriptions.R
│   ├── default_play.R
│   ├── nfl_stats_variables.R
│   ├── nfl_stats_variables.json
│   ├── pbp_datatypes.csv
│   ├── pbp_defaultplay.rds
│   ├── replace_models.R
│   ├── scramble_fix.rds
│   ├── scrambles_2005.xlsx
│   ├── teams_colors_logos.R
│   ├── tidy_play_stats_row.R
│   ├── variable_explanation.xlsx
│   ├── variable_list.txt
│   └── wordmarks.R
├── man/
│   ├── add_qb_epa.Rd
│   ├── add_xpass.Rd
│   ├── add_xyac.Rd
│   ├── build_nflfastR_pbp.Rd
│   ├── calculate_expected_points.Rd
│   ├── calculate_player_stats.Rd
│   ├── calculate_player_stats_def.Rd
│   ├── calculate_player_stats_kicking.Rd
│   ├── calculate_series_conversion_rates.Rd
│   ├── calculate_standings.Rd
│   ├── calculate_stats.Rd
│   ├── calculate_win_probability.Rd
│   ├── clean_pbp.Rd
│   ├── decode_player_ids.Rd
│   ├── fast_scraper.Rd
│   ├── fast_scraper_roster.Rd
│   ├── fast_scraper_schedules.Rd
│   ├── field_descriptions.Rd
│   ├── missing_raw_pbp.Rd
│   ├── nfl_stats_variables.Rd
│   ├── nflfastR-package.Rd
│   ├── reexports.Rd
│   ├── report.Rd
│   ├── save_raw_pbp.Rd
│   ├── stat_ids.Rd
│   ├── teams_colors_logos.Rd
│   ├── update_db.Rd
│   └── update_pbp_db.Rd
├── nflfastR.Rproj
├── pkgdown/
│   ├── _pkgdown.yml
│   └── extra.css
├── tests/
│   ├── testthat/
│   │   ├── 2019/
│   │   │   └── 2019_01_GB_CHI.rds
│   │   ├── 2025/
│   │   │   └── 2025_01_KC_LAC.rds
│   │   ├── _snaps/
│   │   │   ├── build_nflfastR_pbp.md
│   │   │   └── stats/
│   │   │       └── calculate_stats.md
│   │   ├── expected_ep.rds
│   │   ├── expected_pbp.rds
│   │   ├── expected_sc.rds
│   │   ├── expected_sc_weekly.rds
│   │   ├── expected_wp.rds
│   │   ├── games.rds
│   │   ├── helpers.R
│   │   ├── test-build_nflfastR_pbp.R
│   │   ├── test-calculate_series_conversion_rates.R
│   │   ├── test-calculate_stats.R
│   │   └── test-ep_wp_calculators.R
│   └── testthat.R
├── tools/
│   └── check.env
└── vignettes/
    ├── .gitignore
    ├── beginners_guide.Rmd
    ├── field_descriptions.Rmd
    ├── nflfastR.Rmd
    └── stats_variables.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^data-raw$
^README\.Rmd$
^.*\.pdf$
^.github$
^_pkgdown\.yml$
^docs$
^pkgdown$
^vignettes/articles$
^\.github$
^vignettes/nflfastR-models\.Rmd$
^vignettes$
^\.travis\.yml$
^man/figures/card\.png$
^man/figures/header_github\.png$
^man/figures/header_twitter\.png$
^man/figures/nflfastR_logo_fillsize\.png$
^cran-comments\.md$
^CRAN-RELEASE$
^man/figures/readme-cp-model-1\.png$
^man/figures/readme-epa-model-1\.png$
^revdep$
^CRAN-SUBMISSION$
^[.]?air[.]toml$
^\.vscode$
^\.git-blame-ignore-revs$


================================================
FILE: .git-blame-ignore-revs
================================================
# This file lists revisions of large-scale formatting/style changes so that
# they can be excluded from git blame results.
#
# To set this file as the default ignore file for git blame, run:
#   $ git config blame.ignoreRevsFile .git-blame-ignore-revs

# Format whole project with air format . (#47)
66de9ebe6d53415a770de224c1f0f442ef22358c


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
  workflow_dispatch:

name: R-CMD-check.yaml

permissions: read-all

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest,   r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest,   r: 'release'}
          - {os: ubuntu-latest,   r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            any::rcmdcheck
            nflverse/fastrmodels
            nflverse/nflreadr
            nflverse/nflseedR
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'


================================================
FILE: .github/workflows/format-suggest.yaml
================================================
# Workflow derived from https://github.com/posit-dev/setup-air/tree/main/examples

on:
  # Using `pull_request_target` over `pull_request` for elevated `GITHUB_TOKEN`
  # privileges, otherwise we can't set `pull-requests: write` when the pull
  # request comes from a fork, which is our main use case (external contributors).
  #
  # `pull_request_target` runs in the context of the target branch (`main`, usually),
  # rather than in the context of the pull request like `pull_request` does. Due
  # to this, we must explicitly checkout `ref: ${{ github.event.pull_request.head.sha }}`.
  # This is typically frowned upon by GitHub, as it exposes you to potentially running
  # untrusted code in a context where you have elevated privileges, but they explicitly
  # call out the use case of reformatting and committing back / commenting on the PR
  # as a situation that should be safe (because we aren't actually running the untrusted
  # code, we are just treating it as passive data).
  # https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/
  pull_request_target:

name: format-suggest.yaml

jobs:
  format-suggest:
    name: format-suggest
    runs-on: ubuntu-latest

    permissions:
      # Required to push suggestion comments to the PR
      pull-requests: write

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}

      - name: Install
        uses: posit-dev/setup-air@v1

      - name: Format
        run: air format .

      - name: Suggest
        uses: reviewdog/action-suggester@v1
        with:
          level: error
          fail_level: error
          tool_name: air


================================================
FILE: .github/workflows/pkgdown.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  release:
    types: [published]
  workflow_dispatch:

name: pkgdown

jobs:
  pkgdown:
    runs-on: ubuntu-latest
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
      NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
      isPush: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true


      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            r-lib/pkgdown
            nflverse/fastrmodels
            nflverse/nflplotR
            nflverse/nflreadr
            any::tidyverse
            any::ggrepel
            any::knitr
            any::tictoc
            any::ragg
            any::DT
            local::.
          needs: website

      - name: Build site
        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
        shell: Rscript {0}

      - name: Deploy to GitHub pages 🚀
        if: github.event_name != 'pull_request'
        uses: JamesIves/github-pages-deploy-action@v4.5.0
        with:
          clean: false
          branch: gh-pages
          folder: docs

      - name: Deploy to Netlify
        if: contains(env.isPush, 'false')
        id: netlify-deploy
        uses: nwtgck/actions-netlify@v1.1
        with:
          publish-dir: './docs'
          production-branch: master
          github-token: ${{ secrets.GITHUB_TOKEN }}
          overwrites-pull-request-comment: true
          deploy-message:
            'Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})'
        timeout-minutes: 1


================================================
FILE: .github/workflows/revdepcheck.yaml
================================================
# Workflow derived from https://github.com/r-devel/recheck?tab=readme-ov-file#how-to-use-with-github-actions
on:
  workflow_dispatch:
    inputs:
      which:
        type: choice
        description: Which dependents to check
        options:
        - strong
        - most

name: Reverse dependency check

jobs:
  revdep_check:
    name: Reverse check ${{ inputs.which }} dependents
    uses: r-devel/recheck/.github/workflows/recheck.yml@v1
    with:
      which: ${{ inputs.which }}


================================================
FILE: .github/workflows/rhub.yaml
================================================
# R-hub's generic GitHub Actions workflow file. Its canonical location is at
# https://github.com/r-hub/actions/blob/v1/workflows/rhub.yaml
# You can update this file to a newer version using the rhub2 package:
#
# rhub::rhub_setup()
#
# It is unlikely that you need to modify this file manually.

name: R-hub
run-name: "${{ github.event.inputs.id }}: ${{ github.event.inputs.name || format('Manually run by {0}', github.triggering_actor) }}"

on:
  workflow_dispatch:
    inputs:
      config:
        description: 'A comma separated list of R-hub platforms to use.'
        type: string
        default: 'linux,windows,macos'
      name:
        description: 'Run name. You can leave this empty now.'
        type: string
      id:
        description: 'Unique ID. You can leave this empty now.'
        type: string

jobs:

  setup:
    runs-on: ubuntu-latest
    outputs:
      containers: ${{ steps.rhub-setup.outputs.containers }}
      platforms: ${{ steps.rhub-setup.outputs.platforms }}

    steps:
    # NO NEED TO CHECKOUT HERE
    - uses: r-hub/actions/setup@v1
      with:
        config: ${{ github.event.inputs.config }}
      id: rhub-setup

  linux-containers:
    needs: setup
    if: ${{ needs.setup.outputs.containers != '[]' }}
    runs-on: ubuntu-latest
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.containers) }}
    container:
      image: ${{ matrix.config.container }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/run-check@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}

  other-platforms:
    needs: setup
    if: ${{ needs.setup.outputs.platforms != '[]' }}
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.platforms) }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/setup-r@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/run-check@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}


================================================
FILE: .gitignore
================================================
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# User-specific files
.Ruserdata
# Example code in package build process
*-Ex.R
# Output files from R CMD build
/*.tar.gz
# Output files from R CMD check
/*.Rcheck/
# RStudio files
.Rproj.user/
# produced vignettes
vignettes/*.html
vignettes/*.pdf
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
# knitr and R markdown default cache directories
*_cache/
/cache/
# Temporary files created by R markdown
*.utf8.md
*.knit.md
# R Environment Variables
.Renviron
.DS_Store
docs
inst/doc
revdep


================================================
FILE: .vscode/extensions.json
================================================
{
    "recommendations": [
        "Posit.air-vscode"
    ]
}


================================================
FILE: .vscode/settings.json
================================================
{
    "[r]": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "Posit.air-vscode"
    },
    "[quarto]": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "quarto.quarto"
    }
}


================================================
FILE: DESCRIPTION
================================================
Type: Package
Package: nflfastR
Title: Functions to Efficiently Access NFL Play by Play Data
Version: 5.2.0.9012
Authors@R: c(
    person("Sebastian", "Carl", , "mrcaseb@gmail.com", role = "aut"),
    person("Ben", "Baldwin", , "bbaldwin206@gmail.com", role = c("cre", "aut")),
    person("Lee", "Sharpe", role = "ctb"),
    person("Maksim", "Horowitz", , "maksim.horowitz@gmail.com", role = "ctb"),
    person("Ron", "Yurko", , "ryurko@stat.cmu.edu", role = "ctb"),
    person("Samuel", "Ventura", , "samventura22@gmail.com", role = "ctb"),
    person("Tan", "Ho", role = "ctb"),
    person("John", "Edwards", , "edwards1860@gmail.com", role = "ctb")
  )
Description: A set of functions to access National Football League
    play-by-play data from <https://www.nfl.com/>.
License: MIT + file LICENSE
URL: https://nflfastr.com/, https://github.com/nflverse/nflfastR
BugReports: https://github.com/nflverse/nflfastR/issues
Depends: 
    R (>= 4.1.0)
Imports: 
    cli (>= 3.0.0),
    curl,
    data.table (>= 1.15.0),
    dplyr (>= 1.0.0),
    fastrmodels (>= 2.1.0),
    furrr,
    future,
    glue,
    janitor,
    lifecycle,
    mgcv,
    nflreadr (>= 1.2.0),
    progressr (>= 0.6.0),
    rlang (>= 0.4.7),
    stringr (>= 1.4.0),
    tibble (>= 3.0),
    tidyr (>= 1.0.0),
    xgboost (>= 1.1)
Suggests: 
    DBI,
    duckdb,
    gsisdecoder,
    nflseedR (>= 2.0.0),
    purrr (>= 0.3.0),
    rmarkdown,
    RSQLite,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3


================================================
FILE: LICENSE
================================================
YEAR: 2020
COPYRIGHT HOLDER: Sebastian Carl; Ben Baldwin


================================================
FILE: LICENSE.md
================================================
# MIT License

Copyright (c) 2020 Sebastian Carl; Ben Baldwin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

export(add_qb_epa)
export(add_xpass)
export(add_xyac)
export(build_nflfastR_pbp)
export(calculate_expected_points)
export(calculate_player_stats)
export(calculate_player_stats_def)
export(calculate_player_stats_kicking)
export(calculate_series_conversion_rates)
export(calculate_standings)
export(calculate_stats)
export(calculate_win_probability)
export(clean_pbp)
export(decode_player_ids)
export(fast_scraper)
export(fast_scraper_roster)
export(fast_scraper_schedules)
export(load_pbp)
export(load_player_stats)
export(load_rosters)
export(load_schedules)
export(load_team_stats)
export(missing_raw_pbp)
export(most_recent_season)
export(nflverse_sitrep)
export(report)
export(save_raw_pbp)
export(update_db)
export(update_pbp_db)
import(dplyr)
import(fastrmodels)
importFrom(data.table,"%between%")
importFrom(data.table,"%chin%")
importFrom(nflreadr,load_pbp)
importFrom(nflreadr,load_player_stats)
importFrom(nflreadr,load_rosters)
importFrom(nflreadr,load_schedules)
importFrom(nflreadr,load_team_stats)
importFrom(nflreadr,most_recent_season)
importFrom(nflreadr,nflverse_sitrep)
importFrom(rlang,"%||%")
importFrom(rlang,":=")
importFrom(rlang,.data)
importFrom(rlang,.env)
importFrom(xgboost,getinfo)


================================================
FILE: NEWS.md
================================================
# nflfastR (development version)

- Added new function `update_pbp_db()`, a fresh approach to the database helper. (#544)
- Added `"game_id"` to the output of `calculate_stats()` if `summary_level == "week"`. (#566)
- Fixed a bug where `fixed_drive` did not increment after a muffed blocked field goal attempt. Yes this happened in `"2025_10_NO_CAR"`, play id 2504. (#567)
- nflfastR stopped supporting the 1999 and 2000 seasons because of inconsistent data sources. Data is still available through `load_pbp()` but we will not fix any issues related to those old seasons anymore. It's possible to install nflfastR v5.2.0 (with `pak::pak("nflverse/nflfastR@v5.2.0")`) to parse those seasons if necessary. (#568)
- Implemented a fresh approach to compute `play_type` based on `play_type_nfl` for faster and more consistent output. (#568)
- Fixed a bug where nflfastR overwrote the kickoff_attempt variable in the event of a penalty on a kickoff. (#569)
- Added various definitions of 'explosive' plays to the output of `calculate_stats()`. It counts passes, runs, and receptions with 10+, 20+, 40+ yards gained as well as 12+ yard runs and 16+ yard passes. (#573)
- Added several punting stats to the output of `calculate_stats()`. (#574)
- Added overall fumble counters to the output of `calculate_stats()` because it was missing some edge case fumbles on offense. (#575)
- The `play_type` variable can now show `"pass"` or `"run"` on 2 point conversion plays with a post-snap penalty enforced between downs. This differs from `play_type_nfl`, which shows `"PENALTY"` in these cases. (#579)
- Fixed bug where `calculate_stats()` counted fumble recoveries in `fumble_recovery_yards_own` and `fumble_recovery_yards_opp` instead of the corresponding yards. (#584)
- Fixed bug where `calculate_stats()` counted some blocked punts as punt attempts that officially do not count as punt attempts. (#584)
- Fixed bug where `calculate_stats()` overcounted first downs in some edge cases. (#587)
- nflfastR now loads raw play-by-play data from season based releases in the `nflverse/nflverse-pbp` GitHub repository. The legacy repository `nflverse/nflfastR-raw` is deprecated and won't update in future seasons. This means that previous nflfastR versions won't be able to download 2026+ seasons! (#589)
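
The explosive play thresholds listed above (10+, 20+, 40+ yards, plus 12+ yard runs and 16+ yard passes) can be illustrated with a minimal base R sketch. The toy data frame, its column names, and the helper `count_explosive()` are assumptions made for illustration; they are not the variable names or the implementation used by `calculate_stats()`.

```r
# Toy data; column names are illustrative and not nflfastR output names.
plays <- data.frame(
  type = c("pass", "run", "pass", "run", "pass"),
  yards = c(9, 12, 16, 45, 22)
)

# Hypothetical helper: count plays of one type that gained at least `min_yards`.
count_explosive <- function(df, play_type, min_yards) {
  sum(df$type == play_type & df$yards >= min_yards)
}

# Shared thresholds from the changelog entry: 10+, 20+, 40+ yards.
thresholds <- c(10, 20, 40)
runs <- vapply(thresholds, function(t) count_explosive(plays, "run", t), integer(1))
passes <- vapply(thresholds, function(t) count_explosive(plays, "pass", t), integer(1))

# Type-specific thresholds: 12+ yard runs and 16+ yard passes.
runs_12 <- count_explosive(plays, "run", 12)
passes_16 <- count_explosive(plays, "pass", 16)
```

Each count uses an inclusive lower bound, so a 12 yard run counts toward both the 10+ and the 12+ definitions.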

# nflfastR 5.2.0

- Bump required fastrmodels version to 2.0 for better compatibility with xgboost.
- Fixed an issue with duplicated play IDs in some 2000 games. (#521)
- Added the argument `pbp` to `calculate_stats()` to allow stats calculation based on subsets of nflverse play-by-play data. (#524)
- Fixed a bug where `calculate_stats()` didn't count 60 yard field goal attempts in `"fg_made_60_"` and `"fg_missed_60_"`. (#531)
- Fixed a bug where `clean_pbp()` did not provide a passer on plays where scrambles were manually adjusted based on data from Aaron Schatz. (#536)
- nflfastR now directly reexports nflreadr's `load_pbp()`, `load_player_stats()`, and `load_team_stats()`. This means that the functions can be called normally via nflfastR, but are no longer available in the documentation (whether in the R Help or on the pkgdown website). Instead, only links to nflreadr are included. This ensures that the documentation is always up to date. (#538)
- `fast_scraper_roster()` and `fast_scraper_schedules()` are officially deprecated and will be removed in a future update. Please use `load_rosters()` and `load_schedules()`. (#539)
- `report()` is deprecated and will be removed in a future update. Please use `nflverse_sitrep()`. (#540)
- Fixed incompatibility with xgboost v3 model outputs. (#553)
- Added `"Kickoff Out of Bounds"` (introduced in the 2024 season) to the `penalty_type` variable in play-by-play. (#560)
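
The 60 yard field goal fix above concerns a bucket boundary. As a hedged illustration of how distance buckets with an open-ended top bucket can be built so that exactly 60 yards is included, here is a base R sketch. The helper `fg_bucket()` is hypothetical, the bucket labels are assumed to mirror the suffixes of the `fg_made_*` and `fg_missed_*` names, and the sketch makes no claim about what caused the original bug or how nflfastR implements the buckets.

```r
# Hypothetical helper: assign a kick distance (in yards) to a distance bucket.
# Labels are assumed suffixes, e.g. "60_" as in `fg_made_60_`.
fg_bucket <- function(kick_distance) {
  cut(
    kick_distance,
    breaks = c(0, 20, 30, 40, 50, 60, Inf),
    labels = c("0_19", "20_29", "30_39", "40_49", "50_59", "60_"),
    right = FALSE # intervals are [lower, upper), so exactly 60 lands in "60_"
  )
}

kicks <- c(19, 20, 59, 60, 66)
buckets <- as.character(fg_bucket(kicks))
```

With `right = FALSE`, each interval includes its lower bound, which is what places a 60 yard attempt in the top bucket rather than leaving it uncounted.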

Thank you to &#x0040;Doug-Analytics, &#x0040;isaactpetersen, &#x0040;jeleff1000, &#x0040;JoeMarino2021, &#x0040;kbannon77, &#x0040;lancejames35, &#x0040;LinkedInMindset, &#x0040;manbradcalf, &#x0040;mrcaseb, &#x0040;thedfszone, &#x0040;TheMathNinja, and &#x0040;zaynpatel for their questions, feedback, and contributions towards this release.

# nflfastR 5.1.0

- The function `calculate_standings()` has been deprecated. Please use `nflseedR::nfl_standings()` in nflseedR v2.0 instead. (#510)
- nflfastR now requires R 4.1 to allow the package to use R's native pipe `|>` operator. This follows the [Tidyverse R version support rules](https://tidyverse.org/blog/2019/04/r-version-support/). (#511)
- Fixed a bug where `calculate_stats()` incorrectly counted `receiving_air_yards`. (#500)
- Fixed a bug where `vegas_wp` variables were broken when `spread_line` data was missing. (#503)
- Fixed a bug where `calculate_stats()` incorrectly calculated `target_share` and `air_yards_share` when `summary_level = "season"`. (#505)
- Fixed a bug where `calculate_stats()` incorrectly counted `fumbles`. (#514)
- Compatibility improvements with xgboost. (#517)

Thank you to &#x0040;ak47twq, &#x0040;isaactpetersen, &#x0040;jacobakaye, &#x0040;johnpholden, &#x0040;marvin3FF, &#x0040;mrcaseb, and &#x0040;tanho63 for their questions, feedback, and contributions towards this release.

# nflfastR 5.0.0

## Major Changes

- Added new function `calculate_stats()` that combines the output of all `calculate_player_stats*()` functions with a more robust and faster approach. The `calculate_player_stats*()` functions will be deprecated in a future release. (#470)
- Added new exported dataframe `nfl_stats_variables`. It lists and explains all variables returned by `calculate_stats()`. A searchable table is available at <https://nflfastr.com/articles/stats_variables.html>. (#470)

## Bug Fixes and Minor Changes

- Drop `{crayon}`, `{DT}`, `{httr}`, `{jsonlite}`, `{qs}` dependencies. (#453)
- The function `calculate_player_stats_def` now returns `season_type` if argument `weekly` is set to `TRUE` for consistency with the other player stats functions. (#455)
- The function `missing_raw_pbp()` now allows filtering by season. (#457)
- More robust handling of player IDs in `decode_player_ids()`. (#458)
- Fixed rare cases where the value of the `yrdln` variable didn't equal `"MID 50"` at midfield. (#459)
- Fixed rare cases where `drive_start_yard_line` missed the blank space between team name and yard line number. (#459)
- Fixed play description in some 1999 and 2000 games where the string "D.Holland" replaced the kick distance. (#459)
- Fixed a problem where the `goal_to_go` variable was `FALSE` in actual goal to go situations. (#460)
- Fixed a bug in `fixed_drive` and `fixed_drive_result` where the second weather delay in `2023_13_ARI_PIT` wasn't identified correctly. (#461)
- `punter_player_id` and `punter_player_name` are now filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown (#466)
- Added identification of scrambles from 1999 through 2004 with thanks to Aaron Schatz. (#468, #489)
- Updated the dataframe `stat_ids` with some IDs that were previously missing. (#470)
- nflfastR tried to fix bugs in the underlying pbp data of JAX home games prior to the 2016 season. An update of the raw pbp data resolved those bugs so nflfastR needs to remove the hard coded adjustments. This means that nflfastR <= v4.6.1 will return incorrect pbp data for all Jacksonville home games prior to the 2016 season! (#478)
- Fixed a problem where `clean_pbp()` returned `pass = 1` in actual rush plays in very rare cases. (#479)
- Removed extra lines for injury timeouts that were breaking `fixed_drive` (#482)
- The variable `penalty_type` now correctly lists the penalty "Kickoff Short of Landing Zone" introduced in the 2024 season. (#486)
- Fixed a bug where `ep` was incorrect on PAT attempts preceded by a timeout and then a penalty (extremely rare). This bug also caused the variables `total_home_epa` and `total_away_epa` to be incorrect for all subsequent plays in the same game. (#493)

Thank you to &#x0040;ahmed-cheema, &#x0040;andrewtek, &#x0040;guga31bb, &#x0040;isaactpetersen, &#x0040;JoeMarino2021, &#x0040;john-b-edwards, &#x0040;marcusSasser, &#x0040;mlounsberry, &#x0040;morganandrew, &#x0040;mrcaseb, &#x0040;mscoop16, &#x0040;parsnipz, &#x0040;rjthompson2, and &#x0040;Useight for their questions, feedback, and contributions towards this release.

# nflfastR 4.6.1

- The function `calculate_series_conversion_rates()` now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
- Adjusted test behavior at CRAN's request. 

Thank you to
&#x0040;andrewtek, &#x0040;gregalvi86, &#x0040;Ic4ru5Wing, &#x0040;JoeMarino2021, &#x0040;jreddy1990, &#x0040;marvin3FF, &#x0040;mrcaseb, &#x0040;RicShern, &#x0040;SPNE, and &#x0040;trivialfis for their questions, feedback, and contributions towards this release.

# nflfastR 4.6.0

## New Features

- nflfastR now fully supports loading raw pbp data from local file system. The best way to use this feature is to set `options("nflfastR.raw_directory" = {"your/local/directory"})`. Alternatively, both `build_nflfastR_pbp()` and `fast_scraper()` support the argument `dir` which defaults to the above option. (#423)
- Added the new function `save_raw_pbp()` which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to setup the system for faster play-by-play parsing via the above functionality. (#423)
- Added the new function `missing_raw_pbp()` that computes a vector of game IDs missing in the local raw play-by-play directory. (#423)
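
The three features above fit together as a small workflow: set the option, find which games are missing locally, then download only those. The following is a minimal sketch under stated assumptions: the wrapper `sync_raw_pbp()` is hypothetical, the `seasons` argument of `missing_raw_pbp()` is assumed to be available (season filtering was added in a later release), and calling the wrapper requires the nflfastR package and network access.

```r
# Hypothetical wrapper, not part of nflfastR.
# Calling it needs nflfastR installed and a network connection.
sync_raw_pbp <- function(raw_dir, seasons = NULL) {
  dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

  # Point nflfastR at the local raw play-by-play directory.
  options("nflfastR.raw_directory" = raw_dir)

  # Game IDs that are not yet present in the local directory.
  # The `seasons` argument is an assumption about the installed version.
  missing_ids <- nflfastR::missing_raw_pbp(dir = raw_dir, seasons = seasons)

  # Download only what is missing.
  if (length(missing_ids) > 0) {
    nflfastR::save_raw_pbp(missing_ids, dir = raw_dir)
  }

  invisible(missing_ids)
}

# Example call (not run here):
# sync_raw_pbp("your/local/directory", seasons = 2023)
```

Once the directory is populated, `build_nflfastR_pbp()` and `fast_scraper()` pick it up through their `dir` argument, which defaults to the option set above.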

## Minor Improvements and Bugfixes

- The internal function `get_pbp_nfl()` now uses `ifelse()` instead of `dplyr::if_else()` to handle some null-checking, which fixes a bug found in the `2022_21_CIN_KC` game.
- The function `calculate_player_stats()` now summarises target share and air yards share correctly when called with argument `weekly = FALSE` (#413)
- The function `calculate_player_stats()` now returns the opponent team when called with argument `weekly = TRUE` (#414)
- The function `calculate_player_stats_def()` no longer errors when small subsets of pbp data are missing stats. (#415)
- The function `calculate_series_conversion_rates()` no longer returns `NA` values if a small subset of pbp data is missing series on offense or defense. (#417)
- `fixed_drive` now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
- nflfastR now fixes missing drive number counts in raw pbp data in order to provide accurate drive information. (#420)
- nflfastR now returns correct `kick_distance` on all punts and kickoffs. (#422)
- Decode player IDs in 2023 pbp. (#425)
- Drop the pseudo plays TV Timeout and Two-Minute Warning. (#426)
- Fix posteam on kickoffs and PATs following a defensive TD in 2023+ pbp. (#427)
- `calculate_player_stats()` no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
- The variables `passer`, `receiver`, and `rusher` no longer return `NA` on "abnormal" plays (like direct snaps, aborted snaps, laterals etc.) that resulted in a penalty. (#435)

Thank you to
&#x0040;903124, &#x0040;ak47twq, &#x0040;andrewtek, &#x0040;darkhark, &#x0040;dennisbrookner, &#x0040;marvin3FF, &#x0040;mistakia, &#x0040;mrcaseb, &#x0040;nicholasmendoza22, &#x0040;rickstarblazer, &#x0040;RileyJohnson22, and &#x0040;tanho63 for their questions, feedback, and contributions towards this release.

# nflfastR 4.5.1

* New implementation of tests to be able to identify breaking changes in reverse dependencies (#396, #406)
* `calculate_standings()` no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
* Fixed a bug that caused problems with upcoming dplyr and tidyselect updates that weren't backward compatible.
* Significant performance improvements of internal functions. (#402)
* Wrap examples in `try()` to avoid CRAN problems. (#404)
* Fixed a bug where `calculate_standings()` wasn't able to handle nflverse pbp data. (#404)

# nflfastR 4.5.0

## New (experimental) functions
* Added new function `calculate_player_stats_def()` that aggregates defensive player stats either at game level or overall. (#288)
* Added the situation report `nflverse_sitrep`, which is an alias of the already available `report()`.
* Added new function `calculate_player_stats_kicking()` that aggregates player stats for field goals and extra points at game level or overall. (#381)
* Added new function `calculate_series_conversion_rates()` that computes series conversion and series result rates at a game level or season level. (#393)

## Bugfixes and Minor Improvements

* Internal change to `calculate_player_stats()` that reflects new nflverse data infrastructure.
* `calculate_player_stats()` now unifies player names and joins the following player information via `nflreadr::load_players()`:
  - `player_display_name` - Full name of the player
  - `position` - Position of the player
  - `position_group` - Position group of the player
  - `headshot_url` - URL to a player headshot image
* Make data work in 2022 (hopefully)
* Fix Amon-Ra St. Brown breaking the name parser
* Add gsis_id patch to `clean_pbp()`.
* Fixed a bug where `calculate_player_stats_def()` failed in situations where play-by-play data is missing certain stats. (#382)
* Spot-fixing `calculate_player_stats()` for `NA` names.

# nflfastR 4.4.0

## New Functions, Options, Data

* Added new function `calculate_standings()` that computes regular season division standings and playoff seeds from nflverse data.
* The database function `update_db()` now supports the option "nflfastR.dbdirectory" which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
* The embedded data frame `?teams_colors_logos` has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo urls to the conference and league logos. (#290)
* The embedded data frame `?teams_colors_logos` has been updated with the Washington Commanders. (#312)
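
The `nflfastR.dbdirectory` option mentioned above can be set with base R's `options()`. The following is a minimal sketch, assuming a hypothetical directory path; the call to `update_db()` is left commented out because it downloads data.

```r
# Hypothetical directory for the pbp database (an assumption for illustration)
db_dir <- file.path(tempdir(), "nflfastR_db")

# Set the option globally, e.g. in an .Rprofile
options(nflfastR.dbdirectory = db_dir)

# Read the option back to confirm which directory is configured
configured_dir <- getOption("nflfastR.dbdirectory")
print(configured_dir)

# With the option set, the database function can be called without a directory:
# nflfastR::update_db()
```

Setting the option once keeps the database location independent of the current working directory.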

## Deprecation

* The argument `qs` in the functions `load_pbp()` and `load_player_stats()` has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely. 

## Bugfixes and Minor Improvements

* Fixed bug where a player could be duplicated in `calculate_player_stats()` in very rare cases caused by plays with laterals. (#289)
* Fixed a bug where the function `add_xpass()` failed when called with an empty data frame. (#296)
* Fixed a bug where `play_type` showed `no_play` on plays with penalties that don't result in a replay of the down. (#277, #281)
* Fixed a bug in the variable descriptions of `total_home_score` and `total_away_score`. (#300)
* `fast_scraper_rosters()` and `fast_scraper_schedules()` now call `nflreadr::load_rosters()` and `nflreadr::load_schedules()` under the hood (#304)
* Fixed a bug causing missing EPA on game-ending turnovers in overtime
* Bump minimum nflreadr version to 1.2.0 for data repository change
* Fix a bug affecting yardline for a very small number of plays in the 2000 season (#323)
* `update_db()` now uses a default play to predefine column types for all db drivers. (#324)
* Fix a bug that resulted in incorrect `xyac_mean_yardage` on 4th downs (#327)
* Fix a bug that resulted in missing `xyac` information for plays involving J.O'Shaughnessy (#329)
* Fix a bug that resulted in missing `epa` on the last play of some games involving NE and BUF (#331)
* `fast_scraper()` and `build_nflfastR_pbp()` now return data frames of class `nflverse_data` to be consistent with `nflreadr`.
* Fix behavior of EP model in neutral site games (treats both teams as away teams)

# nflfastR 4.3.0

## Minor Changes

* Add [nflreadr](https://nflreadr.nflverse.com/) to dependencies and drop the lubridate and magrittr dependencies
* The functions `load_pbp()` and `load_player_stats()` now call `nflreadr::load_pbp()` and `nflreadr::load_player_stats()` respectively. Therefore the argument `qs` has been deprecated in both functions. It will be removed in a future release. Running `load_player_stats()` without any argument will now return player stats of the current season only (the default in `nflreadr`).
* The deprecated arguments `source` and `pp` in the functions `fast_scraper_*()` and `build_nflfastR_pbp()` have been removed
* Added the variables `racr` ("Receiver Air Conversion Ratio"), `target_share`, `air_yards_share`, `wopr` ("Weighted Opportunity Rating") and `pacr` ("Passing Air Conversion Ratio") to the output of `calculate_player_stats()`
* Added the function `report()` which will be used by the maintainers to help users debug their problems (#274).
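
The ratio and share metrics added to `calculate_player_stats()` can be illustrated with a few lines of base R. This is a sketch using the commonly cited definitions (RACR and PACR as yards divided by air yards, WOPR as `1.5 * target_share + 0.7 * air_yards_share`) and made-up inputs; the package's own computation may handle edge cases such as zero air yards differently.

```r
# Toy inputs for one receiver and one passer (made-up numbers)
receiving_yards     <- 90
receiving_air_yards <- 75
target_share        <- 0.25  # share of team targets
air_yards_share     <- 0.30  # share of team air yards
passing_yards       <- 280
passing_air_yards   <- 350

# Receiver Air Conversion Ratio: receiving yards per air yard
racr <- receiving_yards / receiving_air_yards

# Passing Air Conversion Ratio: passing yards per air yard
pacr <- passing_yards / passing_air_yards

# Weighted Opportunity Rating: weighted sum of the two shares
wopr <- 1.5 * target_share + 0.7 * air_yards_share

print(c(racr = racr, pacr = pacr, wopr = wopr))
```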

## Bug Fixes

* Fixed a minor bug in the console output of `update_db()`
* Fix for a handful of missing `receiver` names (#270)
* Fixed bug with missing `return_team` on interception return touchdowns (#275)
* Fixed a rare bug where an internal object wasn't predefined (#272)

# nflfastR 4.2.0

* All `wpa` variables are `NA` on end game line
* All `wp` variables are 0, 0.5, 1, or `NA` on end game line
* Fix bug where win prob on PATs assumed a PAT placed at 15 yard line, even in older seasons
* The function `decode_player_ids()` now really decodes the new variable `fantasy_id` (#229)
* Fixed a bug that caused slightly differing `wp` values depending on the first game in the data set (#183)
* Edited GitHub references to point to nflverse
* Added the variables `sack_yards`, `sack_fumbles`, `rushing_fumbles` and `receiving_fumbles` to the output of the function `calculate_player_stats()`, thanks to Mike Filicicchia (@TheMathNinja). (#239)
* Fixed a bug where `calculate_player_stats()` falsely counted lost fumbles on aborted snaps (#238)
* Added the variable `season_type` to the output of `calculate_player_stats()` and `load_player_stats()` in preparation of the extended Regular Season starting in 2021 (#240)
* Updated `season_type` definitions in preparation of the extended Regular Season starting in 2021 (#242)
* Fix for `fixed_drive` where it wasn't incrementing when there was a muffed punt followed by timeout (#244)
* Fix for `fixed_drive` where it wasn't incrementing following an interception with the intercepting player then losing a fumble (#247)
* Fix for more issues with missing play info in 2018_01_ATL_PHI (#246)
* Added the variables `safety_player_name` and `safety_player_id` to the play-by-play data (#252)
* Dropped the dependency `usethis`

# nflfastR 4.1.0

## Breaking changes

### Functions

* Added the function `calculate_player_stats()` that aggregates official passing, rushing, and receiving stats either at game level or overall
* Added the function `load_player_stats()` that loads weekly player stats from 1999 to the most recent season
* The performance of the functions `add_xyac()` and `clean_pbp()` has been significantly improved

### New Variables

* Added the new columns `td_player_name` and `td_player_id` to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
* The function `calculate_player_stats()` now adds the variable `dakota`, the `epa` + `cpoe` composite, for players with minimum 5 pass attempts.
* Added column `home_opening_kickoff` to `clean_pbp()`
* Added the variables `sack_player_id`, `sack_player_name`, `half_sack_1_player_id`, `half_sack_1_player_name`, `half_sack_2_player_id` and `half_sack_2_player_name` which identify players that recorded sacks (or half sacks). Also updated the description of the variables `qb_hit_1_player_id`, `qb_hit_1_player_name`, `qb_hit_2_player_id` and `qb_hit_2_player_name` to make clearer that they did not record a sack. (#180)

## Minor improvements and fixes

* The variable `qb_scramble` was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go). Plays nullified by penalty are not included.
* Change `name`, `id`, `rusher`, and `rusher_id` to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
* The function `clean_pbp()` now standardizes the team name columns `tackle_with_assist_*_team`
* Fix bug in `drive` that was causing incorrect overtime win probabilities (#194)
* Fixed a bug where `posteam` was not `NA` on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for `fixed_drive`, `fixed_drive_result`, `series` and `series_result`
* Fixed a bug where `fixed_drive` and `series` were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
* Fixed a bug where `fixed_drive` and `series` were falsely incrementing on muffed punts recovered by the punting team for a touchdown
* Fixed a bug where `add_xpass()` crashed when run on data already including xpass variables.
* Fixed a bug in `epa` when a safety is scored by the team beginning the play in possession of the ball (#186)
* Fix some bugs related to David and Duke Johnson on the Texans in 2020 (#163)
* Fix yet another bug related to correctly identifying possession team on kickoffs nullified by penalty (#199)
* Fixed a bug where `calculate_player_stats()` forgot to clean player names by using their IDs
* Fixed a bug where special teams touchdowns were missing in the output of `calculate_player_stats()` (#203)
* Fix for some old Jaguars games where the wrong team was awarded points for safeties and kickoff return TDs (#209)
* The function `update_db()` no longer falsely closes a database connection provided by the argument `db_connection` (#210)
* Fixed a bug where `yards_gained` was missing yardage on plays with laterals. (#216)
* Fixed a bug where there were stats wrongly given on a play with penalty (#218)
* `fixed_drive` now increments properly on onside kick recoveries (#215)
* `fixed_drive` no longer counts a muffed kickoff as a one-play drive on its own (#217)
* `fixed_drive` now properly increments after a safety (#219)
* Improved parser for `penalty_type` and updated the description of the variable to make clearer that it's the first penalty that happened on a play. (#223)
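
The guess rule for 2005 scrambles in weeks 17-21 (first item of this list) can be written as a simple logical filter. This is a sketch on made-up data, not the code used to build the fix; the column names mirror play-by-play names but are assumptions here.

```r
# Toy QB runs (made-up data)
qb_runs <- data.frame(
  yards_gained = c(7, -2, 0, 3, 12),
  down         = c(1, 2, 1, 3, 4),
  ydstogo      = c(10, 8, 10, 1, 5)
)

# Exclusion c): 3rd or 4th down with 1-2 yards to go
short_yardage <- qb_runs$down %in% c(3, 4) & qb_runs$ydstogo <= 2

# Exclusions a) and b): losses and no gain, i.e. keep only positive gains
qb_runs$scramble_guess <- as.integer(qb_runs$yards_gained > 0 & !short_yardage)

print(qb_runs)
```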

# nflfastR 4.0.0

## Breaking changes

### Changed Functions

* Deprecated the arguments `source` and `pp` all across the package. Using them will cause a 
warning. Parallel processing has to be activated by choosing an appropriate `future::plan()` before
calling the relevant functions. For more information please see [the package documentation](https://nflfastr.com/reference/nflfastR-package.html).
* The function `build_nflfastR_pbp()` will now run `decode_player_ids()` by default (can be deactivated with the argument `decode = FALSE`). 
* The function `build_nflfastR_pbp()` will now run `add_xpass()` by default and add the new variables `xpass` and `pass_oe`.
* The functions `fast_scraper()` and `build_nflfastR_pbp()` now allow the output of `fast_scraper_schedules()` directly as input so it's not necessary anymore to pull the `game_id` first.
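
Since parallel processing is now controlled through `future::plan()`, a call might be wrapped as sketched below. The wrapper `build_pbp_parallel()` and its `workers` argument are hypothetical names introduced for illustration only; they are not part of nflfastR.

```r
# Hypothetical wrapper (not part of nflfastR): builds pbp under a parallel plan
build_pbp_parallel <- function(game_ids, workers = 2) {
  # Activate parallel processing; plan() returns the previous plan invisibly
  old_plan <- future::plan(future::multisession, workers = workers)
  # Restore the previous plan when the function exits, even on error
  on.exit(future::plan(old_plan), add = TRUE)
  nflfastR::build_nflfastR_pbp(game_ids)
}

# Example call (downloads data, so it is left commented out):
# pbp <- build_pbp_parallel(c("2019_01_GB_CHI"))
```

Restoring the previous plan on exit keeps the choice of parallel backend from leaking into the rest of the session.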

### New Functions and Variables

* Added the new function `load_pbp()` that loads complete seasons into memory for fast access of the play-by-play data.
* Added the new variables `rushing_yards`, `lateral_rushing_yards`, `passing_yards`, `receiving_yards`, `lateral_receiving_yards` to fix an old bug where `yards_gained` gets overwritten on plays with laterals (#115).
* Added columns `vegas_wpa` and `vegas_home_wpa` which contain Win Probability Added from the spread-adjusted WP model
* Added column `out_of_bounds`
* Added columns `fantasy`, `fantasy_id`, `fantasy_player_name`, and `fantasy_player_id` that indicate the rusher or receiver on the play
* Added columns `tackle_with_assist`, `tackle_with_assist_1_player_id`, `tackle_with_assist_1_player_name`, `tackle_with_assist_1_team`, `tackle_with_assist_2_player_id`, `tackle_with_assist_2_player_name`, `tackle_with_assist_2_team`

### Models and Miscellaneous

* Tuned spread-adjusted win probability model one final (?) time. Expected points is now no longer 
required for `calculate_win_probability()`
* Added field descriptions `vignette("field_descriptions")` with a searchable list of all nflfastR variables
* Switched data source for 2001-2010 to what is used for 2011 and on
* All models have been moved to the [fastrmodels](https://cran.r-project.org/package=fastrmodels) package
* Added the data frames `?field_descriptions` and `?stat_ids` to the package

## Minor improvements and fixes

* Fix bug where `fixed_drive` and `series` weren't updating after muffed punt (#144)
* Fix bug induced by fixing the above (#149)
* Fix bug where some special teams plays were incorrectly being labeled as pass plays (#125)
* Fix bug where points for safeties were given to the `defteam` instead of the `posteam` (#152)
* Fix bug where a muffed punt TD was given to the wrong team in a 2011 Jaguars game (#154)
* Win probability is now calculated prior to PAT attempts rather than using WP on the ensuing kickoff
* Improved performance of internal functions that speed up the rebuilding process in `update_db()`
(added `qs` and `curl` to dependencies)
* Fixed a bug where `calculate_expected_points()` and `calculate_win_probability()` duplicated some existing variables instead of replacing them (#170)
* Fixed a bug where `penalty_type` wasn't `"no_play"` although it should have been (#172)
* Fixed a bug where `penalty_team` could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
* Fixed a bug related to the calculation of `epa` on plays before a failed pass interference challenge in a few 2019 games (#175)
* Fixed a bug related to lots of fields with `NA` on offsetting penalties (#44)
* Fixed a bug in `epa` when possession team changes at end of 1st or 3rd quarter (#182)
* Fixed a bug where various functions have left open connections
* `vegas_wp` is now `NA` on final line since there is no possession team


# nflfastR 3.2.0

## Models

* Performance update for win probability model with point spread (`vegas_wp`)
* Added `yardline_100` as an input to both win probability models (not having it included was an oversight)

## Minor improvements and fixes

* Fixed a bug where `series` was increased on PATs
* Fixed a bug affecting the week 10 Raiders-Broncos game
* Added the column `team_wordmark` - which contains URLs to the team's wordmarks - to the included data frame `?teams_colors_logos`

# nflfastR 3.1.1

## New features

### Database Function `update_db()`

* The argument `force_rebuild` of the function `update_db()` is now of hybrid 
type. It can rebuild the play by play data table either for the whole nflfastR 
era (with `force_rebuild = TRUE`) or just for specified seasons 
(e.g. `force_rebuild = 2019:2020`).
The latter is intended for ongoing seasons because the NFL fixes bugs
in the play by play data during the week, and we recommend rebuilding the current 
season every Thursday.
* Fixed a bug where `update_db()` disconnected the connection to a database provided 
by the argument `db_connection` (#102)
* Fixed a bug where `update_db()` didn't build a fresh database without providing
the argument `force_rebuild`
* `update_db()` no longer removes the complete data table when a numeric argument 
`force_rebuild` is passed but only removes the rows within the table (#109)
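
The hybrid `force_rebuild` argument accepts either a logical or a numeric vector of seasons. The sketch below shows one way such an argument could be dispatched; `describe_rebuild()` is a hypothetical helper for illustration and does not reflect the internals of `update_db()`.

```r
# Hypothetical helper: report what a given force_rebuild value would trigger
describe_rebuild <- function(force_rebuild = FALSE) {
  if (isTRUE(force_rebuild)) {
    # Logical TRUE: rebuild the whole table
    "rebuild all seasons"
  } else if (is.numeric(force_rebuild)) {
    # Numeric vector: rebuild only the listed seasons
    paste("rebuild seasons:", paste(force_rebuild, collapse = ", "))
  } else {
    "no rebuild"
  }
}

describe_rebuild(TRUE)
describe_rebuild(2019:2020)
describe_rebuild()
```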

### New Functions

* Added the new function `build_nflfastR_pbp()`, a convenient wrapper around 
multiple nflfastR functions for an easy creation of the nflfastR play-by-play data set
* Added a function that applies our experimental expected pass model, `add_xpass()`,
that creates columns `xpass` and `pass_oe`

## Minor improvements and fixes

* More fixes for `fixed_drive` which was not incrementing properly on drives
that began following a timeout
* Fixed more bugs in EPA and win probability on PATs and kickoffs with penalties
* Fixed a bug where scoring probabilities weren't adding to 1 on field goal 
attempts near the end of a half
* Messages to the user are now created with the new dependency `usethis`
* Fixed bug where plays with "backward pass" in play description were counted as 
pass plays (`pass` = 1)
* Fixed missing kick distance on touchbacks and blocked punts (#53)
* Added the option `fast` (either `TRUE` or `FALSE`) to the function 
`decode_player_ids()` to activate the highly efficient C++ decoder of the package 
[`gsisdecoder`](https://cran.r-project.org/package=gsisdecoder)

# nflfastR 3.0.0

## Breaking changes

* `fast_scraper_roster()` is finally back! It loads the NFL roster of a given season.
* Added the function `decode_player_ids()` to decode all player IDs to the 
commonly known GSIS ID format (00-00xxxxx)
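
To illustrate what decoding to the GSIS format means, the sketch below converts a new-format ID (the Alex Smith example from the 2.1.1 notes) by reading part of its hex string as ASCII. `decode_id_sketch()` is a hypothetical base R helper, and the character positions are an assumption that happens to hold for this example; it is not the implementation used by `decode_player_ids()` or `gsisdecoder`.

```r
# Hypothetical helper: read hex characters 5-24 of the ID as ASCII bytes
decode_id_sketch <- function(uuid) {
  # Drop the hyphens to get one continuous hex string
  hex <- gsub("-", "", uuid, fixed = TRUE)
  # Keep the 20 hex characters (10 bytes) assumed to hold the GSIS ID
  core <- substr(hex, 5, 24)
  # Split into two-character hex pairs
  pairs <- substring(core, seq(1, 19, by = 2), seq(2, 20, by = 2))
  # Convert each pair to a byte and the bytes to a string
  rawToChar(as.raw(strtoi(pairs, base = 16L)))
}

decoded <- decode_id_sketch("32013030-2d30-3032-3334-3336b638d37d")
print(decoded)

# Check that the result matches the 00-00xxxxx pattern
grepl("^00-00[0-9]{5}$", decoded)
```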

## New features

* Add option `source = "old"` to `fast_scraper()` to enable scraping of old source.
This is mostly useless as it doesn't work for 2020 and provides less info
* Added new option `db_connection` to `update_db()` to allow advanced users to
use other DBI drivers, such as `RMariaDB::MariaDB()`, `RPostgres::Postgres()` or 
`odbc::odbc()` (please see [dbplyr](https://dbplyr.tidyverse.org/articles/dbplyr.html)
for more information)

## Minor improvements and fixes

* `clean_pbp()` now fixes some bugs in jersey numbers
* `clean_pbp()`, `add_qb_epa()` and `add_xyac()` can now handle empty data frames
* Fix empty line causing `fast_scraper()` to fail (affects multiple games of the 2020 season)
* Fix bug in `fixed_drive` that counted PAT after defensive TD as its own drive
* Fixed a bug which caused an inflated number of tackles in special cases
* Fixed a bug where CPOE was NA when targeting players with apostrophe in last name

# nflfastR 2.2.1

* Fix `add_xyac()` breaking with some old packages
* Fix `add_xyac()` and `add_qb_epa()` calculations being wrong for some failed 4th downs
* Updated Readme with ep and cp model plots
* Updated `vignette("examples")` with the new `add_xyac()` function
* Added xYAC model to `vignette("nflfastR-models")`
* Added variables `fixed_drive` and `fixed_drive_result` to the output of 
`fast_scraper()` because the NFL-provided drive info is extremely buggy
* Added variable `series_result`
* `clean_pbp()` now adds 4 new variables `passer_jersey_number`, 
`rusher_jersey_number`, `receiver_jersey_number` and `jersey_number`. These can 
be used to join rosters. 
* Fixed incorrect `timeout_team`, `return_team`, `fumble_recovery_1_team` for JAX
games from 2011-2015
* Re-trained EPA model with `fixed_drive` and corrections to `timeout_team`

# nflfastR 2.2.0

* New function `add_xyac()` which adds the following columns associated with expected yards after
the catch (xYAC): `xyac_epa`, `xyac_success`, `xyac_fd`, `xyac_mean_yardage`, `xyac_median_yardage`

# nflfastR 2.1.3

* Fixed a bug in `series_success` caused by bad `drive` information provided by NFL

# nflfastR 2.1.2

* Added the following columns that are available 2011 and later: `special_teams_play`, `st_play_type`, `time_of_day`, and `order_sequence`
* Added `old_game_id` column (useful for merging to external data that still uses this ID: format is YYYYMMDDxx)
* The `clean_pbp()` function now adds an `aborted_play` column
* Fixed a bug where pass plays with a penalty at end of play were classified as `play_type` = `no_play` rather than `pass`
* Fixed bug where EPA on defensive 2 point return was -0.95 instead of -2.95
* Fixed some remaining failed challenge plays that incorrectly had 0 for EPA
* Updated the included dataframe `teams_colors_logos` for the interim name of 
the 'Washington Football Team' and the corresponding logo urls.
* Some internal code improvements causing the required `tidyselect` version
to be >= 1.1.0
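
The `old_game_id` format `YYYYMMDDxx` can be split into a game date and a two-digit suffix with base R. This is a small sketch; the ID below is an illustrative value in that format.

```r
# Illustrative ID in the YYYYMMDDxx format
old_game_id <- "2020091000"

# First eight characters: the game date
game_date <- as.Date(substr(old_game_id, 1, 8), format = "%Y%m%d")

# Last two characters: the suffix distinguishing games on the same date
game_suffix <- substr(old_game_id, 9, 10)

print(game_date)
print(game_suffix)
```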

# nflfastR 2.1.1

### Functions

* `clean_pbp()` now standardizes player IDs across the old (1999-2010) and new 
(2011+) data sources. Player IDs once again uniquely identify players, and each 
unique player has one unique ID (as they did before the NFL data source change):
    * For players whose careers finished before 2011, their IDs remain the same
    * For players who played in both eras or only in the new era, their ID is 
    the new ID
    * For example, Akili Smith (ID: 00-0015082) and Alex Smith 
    (ID: 32013030-2d30-3032-3334-3336b638d37d) are both abbreviated as "A.Smith" 
    but can be distinguished by their IDs, with Akili showing what the old 
    format ID looks like, and Smith the new one
    * Standardization is realized by using an ID map
    available in the data repo
    
* `clean_pbp()` now removes all variables it is about to create to make sure 
nothing unexpected can happen

### Miscellaneous

* Added minimum version requirements to some package dependencies because 
installation broke for some users with outdated packages

* Made a minor bug fix to catch more out-of-order plays and fixed a bug where some
plays were being incorrectly dropped in older seasons

* Standardized team names (e.g. `SD` --> `LAC`) in some columns we had missed

# nflfastR 2.1.0

### Models

* Removed `week` from Expected Points models along with an update of
`vignette("nflfastR-models")` and `vignette("examples")`

### Functions

* Added function `update_db()` which adds all completed games to a SQLite database
* Added function `calculate_win_probability()` 
* Added new examples to `vignette("examples")` demonstrating the usage of the
above mentioned functions

### Bugs

* Fixed a problem with inconsistent data types of the variable
`drive_real_start_time` pre and post 2011
* Fixed a problem where some `game_id`s were overwritten during the play by play parsing
* Fix some more WP bugs on kickoffs with penalties and rare play description

### Miscellaneous

* `fast_scraper()` now loads the raw game data from a separate raw data repo
* Completely overhauled the entire code base to directly implement
[tidy evaluation](https://dplyr.tidyverse.org/articles/programming.html) using 
`.data` from the [rlang](https://rlang.r-lib.org/) package (this is a major 
code change that takes some getting used to but we need it in preparation of 
a future release)

# nflfastR 2.0.6

* Fixed a problem where defensive two point conversions were not counted
* Kneels on kickoffs are no longer counted as qb kneels
* Variable `yards_gained` more precisely defined
* Bugfixes for more games with out-of-order plays
* Fix bug related to EPA on plays with a failed pass interference challenge
* Added new example to `vignette("examples")` to demonstrate Expected Points 
calculator `calculate_expected_points()`
* Fix for WP on 2-pt conversion negated by penalty
* Add more variables (containing team names) to team standardization in `clean_pbp()`
* Fix WP for onside kicks

# nflfastR 2.0.5

* Fix yet another bug caused by NFL providing plays out of order
* Fix bugs related to penalties on PATs and kickoffs
* Fix bugs related to NFL providing wrong scoring team on defensive touchdowns 
in older games involving the Jaguars
* Fix some minor issues related to wrong `first_down_rush` and `return_touchdown`
* Improved error handling of `fast_scraper()` for not yet played games
* Improved variable documentation and prepared for new website
* Improved performance for dplyr v1.0.0
* Rebuilt EP and WP models due to bugfixes in the underlying data in the versions
2.0.3, 2.0.4 and 2.0.5

# nflfastR 2.0.4

* Fix another bug with out of order plays
* Fix bug affecting cumulative totals for WPA, air_WPA and yac_WPA 
* Fix bug affecting cumulative totals for air_EPA and yac_EPA

# nflfastR 2.0.3

* Fix for NFL providing plays out of order
* Fix for series not incrementing following defensive TD

# nflfastR 2.0.2

* Fixed a bug in the series and series success calculations caused by timeouts
following a possession change
* Fixed win probability on PATs

# nflfastR 2.0.1

* Added minimum version requirement on `xgboost` (>= 1.1) as the recent `xgboost` update 
caused a breaking change leading to failure in adding model results to data

# nflfastR 2.0.0

### Models
* Added new models for Expected Points, Win Probability and Completion Probability 
and removed `nflscrapR` dependency. This is a **major** change as we are stepping away 
from the well established `nflscrapR` models. But we believe it is a good step forward.
See `data-raw/MODEL-README.md` for detailed model information.

* Added internal functions for `EPA` and `WPA` to `helper_add_ep_wp.R`.

* Added new function `calculate_expected_points()` usable for the end user.

### Functions
* Completely overhauled `fast_scraper()` to make it work with the NFL's new server 
backend. The option `source` is still available but will be deprecated since there
is only one source now. There are some changes in the output as well (please see below).

* `fast_scraper()` now adds game data to the play by play data set courtesy of Lee Sharpe. 
Game data include:
away_score, home_score, location, result, total, spread_line, total_line, div_game, 
roof, surface, temp, wind, home_coach, away_coach, stadium, stadium_id, gameday

* `fast_scraper_schedules()` now incorporates Lee Sharpe's `games.rds`.

* The functions `fast_scraper_clips()` and `fast_scraper_roster()` are deactivated 
due to the missing data source. They might be reactivated or completely dropped 
in future versions.

* The function `fix_fumbles()` has been renamed to `add_qb_epa()` as the new name
much better describes what the function is actually doing.

### Miscellaneous

* Added progress information using the `progressr` package and removed the 
`furrr` progress bars.

* `clean_pbp()` now adds the column `id` which is the id of the player in the column `name`. 
Because we have to piece together different data to cover the full span of years,
**player IDs are not consistent between the early (1999-2010) and recent (2011 onward)
periods**.

* Added a `NEWS.md` file to track changes to the package.

* Fixed several bugs inherited from `nflscrapR`, including one where EPA was missing 
when a play was followed by two timeouts (for example, a two-minute warning followed by a timeout),
and another where `play_type` was incorrect on plays with declined penalties.

* Fixed a bug where `receiver_player_name` and `receiver` didn't name the correct
players on plays with lateral passes.

### Play-by-Play Output
The output has changed a little bit. 

#### The following variables were dropped

| Dropped Variables          | Description                                                                                                                                                                       |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| game_key                   | RS feed game identifier.                                                                                                                                                          |
| game_time_local            | Kickoff time in local time zone.                                                                                                                                                  |
| iso_time                   | Kickoff time according to ISO 8601.                                                                                                                                               |
| game_type                  | One of 'REG', 'WC', 'DIV', 'CON', 'SB' indicating if a game was a regular season game or one of the playoff rounds.                                                               |
| site_id                    | RS feed id for game site.                                                                                                                                                         |
| site_city                  | Game site city.                                                                                                                                                                   |
| site_state                 | Game site state.                                                                                                                                                                  |
| drive_possession_team_abbr | Abbreviation of the possession team in a given drive.                                                                                                                             |
| scoring_team_abbr          | Abbreviation of the scoring team if the play was a scoring play.                                                                                                                  |
| scoring_type               | String indicating the scoring type. One of 'FG', 'TD', 'PAT', 'SFTY', 'PAT2'.                                                                                                     |
| alert_play_type            | String describing the play type of a play the NFL has listed as alert play. For most of those plays there are highlight clips available through fast_scraper_clips. |
| time_of_day                | Local time at the beginning of the play.                                                                                                                                          |
| yards                      | Analogous to yards_gained but with the kicking team being the possession team (which means that there are many yards gained through kickoffs and punts).                          |
| end_yardline_number        | Yardline number within the above given side at the end of the given play.                                                                                                         |
| end_yardline_side          | String indicating the side of the field at the end of the given play.                                                                                                             |

#### The following variables were renamed

| Renamed Variables                             | Description                                                                                                                                               |
|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| game_time_eastern -> start_time               | Kickoff time in eastern time zone.                                                                                                                        |
| site_fullname -> stadium                      | Game site name.                                                                                                                                           |
| drive_how_started -> drive_start_transition   | String indicating how the offense got the ball.                                                                                                           |
| drive_how_ended -> drive_end_transition       | String indicating how the offense lost the ball.                                                                                                          |
| drive_start_time -> drive_game_clock_start    | Game time at the beginning of a given drive.                                                                                                              |
| drive_end_time -> drive_game_clock_end        | Game time at the end of a given drive.                                                                                                                    |
| drive_start_yardline -> drive_start_yard_line | String indicating where a given drive started consisting of team half and yard line number.                                                               |
| drive_end_yardline -> drive_end_yard_line     | String indicating where a given drive ended consisting of team half and yard line number.                                                                 |
| roof_type -> roof                             | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)            |
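
Code written against the old column names can be adapted with a small renaming step. The following is a minimal base R sketch, assuming a data frame that still carries some of the old names; `renamed_cols` and `rename_legacy_columns()` are hypothetical helpers for illustration and are not part of nflfastR.

```r
# Hypothetical lookup (not part of nflfastR): new name = old name,
# taken from the table of renamed variables above.
renamed_cols <- c(
  start_time = "game_time_eastern",
  stadium = "site_fullname",
  drive_start_transition = "drive_how_started",
  drive_end_transition = "drive_how_ended",
  drive_game_clock_start = "drive_start_time",
  drive_game_clock_end = "drive_end_time",
  drive_start_yard_line = "drive_start_yardline",
  drive_end_yard_line = "drive_end_yardline",
  roof = "roof_type"
)

# Rename only those old columns that are actually present in `df`;
# all other columns are left untouched.
rename_legacy_columns <- function(df) {
  present <- renamed_cols[renamed_cols %in% names(df)]
  idx <- match(present, names(df))
  names(df)[idx] <- names(present)
  df
}

# Toy data frame with two old names and one unaffected column
old <- data.frame(
  game_time_eastern = "13:00:00",
  roof_type = "dome",
  play_id = 1L
)
new <- rename_legacy_columns(old)
names(new)
```

The lookup is kept as a plain named vector so the mapping stays easy to compare against the table.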

#### The following variables were added

| Added Variables        | Description                                                                                                                                                                                                          |
|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| vegas_wp               | Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.                                                                             |
| vegas_home_wp          | Estimated win probability for the home team incorporating pre-game Vegas line.                                                                                                                                       |
| weather                | String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!                                                                                        |
| nfl_api_id             | UUID of the game in the new NFL API.                                                                                                                                                                                 |
| play_clock             | Time on the playclock when the ball was snapped.                                                                                                                                                                     |
| play_deleted           | Binary indicator for deleted plays.                                                                                                                                                                                  |
| end_clock_time         | Game time at the end of a given play.                                                                                                                                                                                |
| end_yard_line          | String indicating the yardline at the end of the given play consisting of team half and yard line number.                                                                                                            |
| drive_real_start_time  | Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').                                                                                                                     |
| drive_ended_with_score | Binary indicator for whether the drive ended with a score.                                                                                                                                                           |
| drive_quarter_start    | Numeric value indicating in which quarter the given drive has started.                                                                                                                                               |
| drive_quarter_end      | Numeric value indicating in which quarter the given drive has ended.                                                                                                                                                 |
| drive_play_id_started  | Play_id of the first play in the given drive.                                                                                                                                                                        |
| drive_play_id_ended    | Play_id of the last play in the given drive.                                                                                                                                                                         |
| away_score             | Total points scored by the away team.                                                                                                                                                                                |
| home_score             | Total points scored by the home team.                                                                                                                                                                                |
| location               | Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.                                                                                                                          |
| result                 | Equals home_score - away_score and means the game outcome from the perspective of the home team.                                                                                                                     |
| total                  | Equals home_score + away_score and means the total points scored in the given game.                                                                                                                                  |
| spread_line            | The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference) |
| total_line             | The closing total line for the game. (Source: Pro-Football-Reference)                                                                                                                                                |
| div_game               | Binary indicator for whether the given game was a division game.                                                                                                                                                     |
| roof                   | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                       |
| surface                | What type of ground the game was played on. (Source: Pro-Football-Reference)                                                                                                                                         |
| temp                   | The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)                                                                                                              |
| wind                   | The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)                                                                                                         |
| home_coach             | First and last name of the home team coach. (Source: Pro-Football-Reference)                                                                                                                                         |
| away_coach             | First and last name of the away team coach. (Source: Pro-Football-Reference)                                                                                                                                         |
| stadium_id             | ID of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                                                                                           |
| game_stadium           | Name of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                                                                                         |
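
The relationship between `result`, `total` and `spread_line` can be illustrated with a small worked example. This is a sketch on invented toy scores, not real game data. The `home_cover` column is a hypothetical helper that is not part of nflfastR, and it assumes the home team covers when its margin exceeds the spread, with exact ties treated as a push (`NA`).

```r
# Toy data, invented for illustration only
games <- data.frame(
  home_score = c(27, 17, 24),
  away_score = c(20, 23, 21),
  spread_line = c(3.5, -2.5, 3)
)

# As documented above: result and total are derived from the final scores
games$result <- games$home_score - games$away_score
games$total <- games$home_score + games$away_score

# Assumption: a positive spread_line means the home team was favored, so the
# home team covers when its margin (result) is larger than spread_line.
# An exact tie with the spread is a push and is set to NA here.
games$home_cover <- ifelse(
  games$result == games$spread_line,
  NA,
  games$result > games$spread_line
)

games
```

In the first row the home team wins by 7 as a 3.5 point favorite and covers. In the second row it loses by 6 as a 2.5 point underdog and does not cover. The third row lands exactly on the spread.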


================================================
FILE: R/aggregate_game_stats.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl
# Styleguide: styler::tidyverse_style()
################################################################################

#' Get Official Game Stats
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' @param pbp A data frame of NFL play-by-play data typically loaded with
#' [load_pbp()] or [build_nflfastR_pbp()]. If the data doesn't include the variable
#' `qb_epa`, the function `add_qb_epa()` will be called to add it.
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats
#' for the entire data frame.
#' @description Build columns that aggregate official passing, rushing, and receiving stats
#' either at the game level or at the level of the entire data frame passed.
#' @return A data frame including the following columns (all ID columns are
#' decoded to the gsis ID format):
#' \describe{
#' \item{player_id}{ID of the player. Use this to join to other sources.}
#' \item{player_name}{Name of the player}
#' \item{player_display_name}{Full name of the player}
#' \item{position}{Position of the player}
#' \item{position_group}{Position group of the player}
#' \item{headshot_url}{URL to a player headshot image}
#' \item{games}{The number of games where the player recorded passing, rushing or receiving stats.}
#' \item{recent_team}{Most recent team player appears in `pbp` with.}
#' \item{season}{Season if `weekly` is `TRUE`}
#' \item{week}{Week if `weekly` is `TRUE`}
#' \item{season_type}{`REG` or `POST` if `weekly` is `TRUE`}
#' \item{opponent_team}{The player's opponent team if `weekly` is `TRUE`}
#' \item{completions}{The number of completed passes.}
#' \item{attempts}{The number of pass attempts as defined by the NFL.}
#' \item{passing_yards}{Yards gained on pass plays.}
#' \item{passing_tds}{The number of passing touchdowns.}
#' \item{interceptions}{The number of interceptions thrown.}
#' \item{sacks}{The number of times sacked.}
#' \item{sack_yards}{Yards lost on sack plays.}
#' \item{sack_fumbles}{The number of sacks with a fumble.}
#' \item{sack_fumbles_lost}{The number of sacks with a lost fumble.}
#' \item{passing_air_yards}{Passing air yards (includes incomplete passes).}
#' \item{passing_yards_after_catch}{Yards after the catch gained on plays in
#' which player was the passer (this is an unofficial stat and may differ slightly
#' between different sources).}
#' \item{passing_first_downs}{First downs on pass attempts.}
#' \item{passing_epa}{Total expected points added on pass attempts and sacks.
#' NOTE: this uses the variable `qb_epa`, which gives QB credit for EPA for up
#' to the point where a receiver lost a fumble after a completed catch and makes
#' EPA work more like passing yards on plays with fumbles.}
#' \item{passing_2pt_conversions}{Two-point conversion passes.}
#' \item{pacr}{Passing Air Conversion Ratio. PACR = `passing_yards` / `passing_air_yards`}
#' \item{dakota}{Adjusted EPA + CPOE composite based on coefficients which best predict adjusted EPA/play in the following year.}
#' \item{carries}{The number of official rush attempts (incl. scrambles and kneel downs).
#' Rushes after a lateral reception don't count as a carry.}
#' \item{rushing_yards}{Yards gained when rushing with the ball (incl. scrambles and kneel downs).
#' Also includes yards gained after obtaining a lateral on a play that started
#' with a rushing attempt.}
#' \item{rushing_tds}{The number of rushing touchdowns (incl. scrambles).
#' Also includes touchdowns after obtaining a lateral on a play that started
#' with a rushing attempt.}
#' \item{rushing_fumbles}{The number of rushes with a fumble.}
#' \item{rushing_fumbles_lost}{The number of rushes with a lost fumble.}
#' \item{rushing_first_downs}{First downs on rush attempts (incl. scrambles).}
#' \item{rushing_epa}{Expected points added on rush attempts (incl. scrambles and kneel downs).}
#' \item{rushing_2pt_conversions}{Two-point conversion rushes}
#' \item{receptions}{The number of pass receptions. Lateral receptions officially
#' don't count as a reception.}
#' \item{targets}{The number of pass plays where the player was the targeted receiver.}
#' \item{receiving_yards}{Yards gained after a pass reception. Includes yards
#' gained after receiving a lateral on a play that started as a pass play.}
#' \item{receiving_tds}{The number of touchdowns following a pass reception.
#' Also includes touchdowns after receiving a lateral on a play that started
#' as a pass play.}
#' \item{receiving_air_yards}{Receiving air yards (incl. incomplete passes).}
#' \item{receiving_yards_after_catch}{Yards after the catch gained on plays in
#' which player was receiver (this is an unofficial stat and may differ slightly
#' between different sources).}
#' \item{receiving_fumbles}{The number of fumbles after a pass reception.}
#' \item{receiving_fumbles_lost}{The number of fumbles lost after a pass reception.}
#' \item{receiving_2pt_conversions}{Two-point conversion receptions}
#' \item{racr}{Receiver Air Conversion Ratio. RACR = `receiving_yards` / `receiving_air_yards`}
#' \item{target_share}{The share of targets of the player in all targets of his team}
#' \item{air_yards_share}{The share of receiving_air_yards of the player in all air_yards of his team}
#' \item{wopr}{Weighted Opportunity Rating. WOPR = 1.5 × `target_share` + 0.7 × `air_yards_share`}
#' \item{fantasy_points}{Standard fantasy points.}
#' \item{fantasy_points_ppr}{PPR fantasy points.}
#' }
#' @export
#' @keywords internal
#' @seealso The function [load_player_stats()] and the corresponding examples
#' on [the nflfastR website](https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats)
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' # pbp <- nflfastR::load_pbp(2020)
#'
#' # weekly <- calculate_player_stats(pbp, weekly = TRUE)
#' # dplyr::glimpse(weekly)
#'
#' # overall <- calculate_player_stats(pbp, weekly = FALSE)
#' # dplyr::glimpse(overall)
#' })
#' }
calculate_player_stats <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)", "to join player information.")

  # Prepare data ------------------------------------------------------------

  # load plays with multiple laterals
  mult_lats <- nflreadr::rds_from_url(
    "https://github.com/nflverse/nflverse-data/releases/download/misc/multiple_lateral_yards.rds"
  ) |>
    dplyr::mutate(
      season = substr(.data$game_id, 1, 4) |> as.integer(),
      week = substr(.data$game_id, 6, 7) |> as.integer()
    ) |>
    dplyr::filter(.data$yards != 0) |>
    # the list includes all plays with multiple laterals
    # and all receivers. Since the last one already is in the
    # pbp data, we have to drop him here so the entry isn't duplicated
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::slice(seq_len(dplyr::n() - 1)) |>
    dplyr::ungroup() |>
    # there are some very rare cases where a player collects lateral yards
    # multiple times in the same play. We need to aggregate here to make sure
    # this doesn't mess up joins (#289)
    dplyr::group_by(
      .data$season,
      .data$week,
      .data$type,
      .data$gsis_player_id
    ) |>
    dplyr::summarise(yards = sum(.data$yards)) |>
    dplyr::ungroup()

  # filter down to the 2 dfs we need
  suppressMessages({
    # 1. for "normal" plays: get plays that count in official stats
    data <- pbp |>
      dplyr::filter(
        !is.na(.data$down),
        .data$play_type %in% c("pass", "qb_kneel", "qb_spike", "run")
      ) |>
      decode_player_ids()

    if (!"qb_epa" %in% names(data)) {
      data <- add_qb_epa(data)
    }

    # 2. for 2pt conversions only, get those plays
    two_points <- pbp |>
      dplyr::filter(.data$two_point_conv_result == "success") |>
      dplyr::select(
        "week",
        "season",
        "posteam",
        "defteam",
        "pass_attempt",
        "rush_attempt",
        "passer_player_name",
        "passer_player_id",
        "rusher_player_name",
        "rusher_player_id",
        "lateral_rusher_player_name",
        "lateral_rusher_player_id",
        "receiver_player_name",
        "receiver_player_id",
        "lateral_receiver_player_name",
        "lateral_receiver_player_id"
      ) |>
      decode_player_ids()
  })

  if (!"special" %in% names(pbp)) {
    # we need this column for the special teams tds
    pbp <- pbp |>
      dplyr::mutate(
        special = dplyr::if_else(
          .data$play_type %in%
            c("extra_point", "field_goal", "kickoff", "punt"),
          1,
          0
        )
      )
  }

  s_type <- pbp |>
    dplyr::select("season", "season_type", "week") |>
    dplyr::distinct()

  # we'll join some player information like position or full name later
  # so we load it here to be able to use it for racr ids as well
  player_info <- nflreadr::load_players() |>
    dplyr::select(
      "player_id" = "gsis_id",
      "player_display_name" = "display_name",
      "player_name" = "short_name",
      "position",
      "position_group",
      "headshot_url" = "headshot"
    )

  # load gsis_ids of RBs, FBs and HBs for RACR
  racr_ids <- player_info |>
    dplyr::filter(.data$position %in% c("RB", "FB", "HB")) |>
    dplyr::select("gsis_id" = "player_id")

  # Passing stats -----------------------------------------------------------

  # get passing stats
  pass_df <- data |>
    dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) |>
    dplyr::group_by(.data$passer_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      passing_yards_after_catch = sum(
        (.data$passing_yards - .data$air_yards) * .data$complete_pass,
        na.rm = TRUE
      ),
      name_pass = dplyr::first(.data$passer_player_name),
      team_pass = dplyr::first(.data$posteam),
      opp_pass = dplyr::first(.data$defteam),
      passing_yards = sum(.data$passing_yards, na.rm = TRUE),
      passing_tds = sum(
        .data$touchdown == 1 &
          .data$td_team == .data$posteam &
          .data$complete_pass == 1
      ),
      interceptions = sum(.data$interception),
      attempts = sum(
        .data$complete_pass == 1 |
          .data$incomplete_pass == 1 |
          .data$interception == 1
      ),
      completions = sum(.data$complete_pass == 1),
      sack_fumbles = sum(
        .data$fumble == 1 & .data$fumbled_1_player_id == .data$passer_player_id
      ),
      sack_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$passer_player_id &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      passing_air_yards = sum(.data$air_yards, na.rm = TRUE),
      sacks = sum(.data$sack),
      sack_yards = -1 * sum(.data$yards_gained * .data$sack),
      passing_first_downs = sum(.data$first_down_pass),
      passing_epa = sum(.data$qb_epa, na.rm = TRUE),
      pacr = .data$passing_yards / .data$passing_air_yards,
      pacr = dplyr::case_when(
        is.nan(.data$pacr) ~ NA_real_,
        .data$passing_air_yards <= 0 ~ 0,
        TRUE ~ .data$pacr
      ),
    ) |>
    dplyr::rename("player_id" = "passer_player_id") |>
    dplyr::ungroup()

  if (isTRUE(weekly)) {
    pass_df <- add_dakota(pass_df, pbp = pbp, weekly = weekly)
  }

  pass_two_points <- two_points |>
    dplyr::filter(.data$pass_attempt == 1) |>
    dplyr::group_by(.data$passer_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_pass and team_pass here for the full join in the next pipe
      name_pass = custom_mode(.data$passer_player_name),
      team_pass = custom_mode(.data$posteam),
      opp_pass = custom_mode(.data$defteam),
      passing_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "passer_player_id") |>
    dplyr::ungroup()

  pass_df <- pass_df |>
    # need a full join because players without passing stats that recorded
    # a passing two point (e.g. WRs) are dropped in any other join
    dplyr::full_join(
      pass_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_pass",
        "team_pass",
        "opp_pass"
      )
    ) |>
    dplyr::mutate(
      passing_2pt_conversions = dplyr::if_else(
        is.na(.data$passing_2pt_conversions),
        0L,
        .data$passing_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id))

  pass_df_nas <- is.na(pass_df)
  epa_index <- which(
    dimnames(pass_df_nas)[[2]] %in% c("passing_epa", "dakota", "pacr")
  )
  pass_df_nas[, epa_index] <- c(FALSE)

  pass_df[pass_df_nas] <- 0

  # Rushing stats -----------------------------------------------------------

  # rush df 1: primary rusher
  rushes <- data |>
    dplyr::filter(.data$play_type %in% c("run", "qb_kneel")) |>
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      name_rush = dplyr::first(.data$rusher_player_name),
      team_rush = dplyr::first(.data$posteam),
      opp_rush = dplyr::first(.data$defteam),
      yards = sum(.data$rushing_yards, na.rm = TRUE),
      tds = sum(.data$td_player_id == .data$rusher_player_id, na.rm = TRUE),
      carries = dplyr::n(),
      rushing_fumbles = sum(
        .data$fumble == 1 &
          .data$fumbled_1_player_id == .data$rusher_player_id &
          is.na(.data$lateral_rusher_player_id)
      ),
      rushing_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$rusher_player_id &
          is.na(.data$lateral_rusher_player_id) &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      rushing_first_downs = sum(
        .data$first_down_rush & is.na(.data$lateral_rusher_player_id)
      ),
      rushing_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # rush df 2: lateral
  laterals <- data |>
    dplyr::filter(!is.na(.data$lateral_rusher_player_id)) |>
    dplyr::group_by(.data$lateral_rusher_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      lateral_yards = sum(.data$lateral_rushing_yards, na.rm = TRUE),
      lateral_fds = sum(.data$first_down_rush, na.rm = TRUE),
      lateral_tds = sum(
        .data$td_player_id == .data$lateral_rusher_player_id,
        na.rm = TRUE
      ),
      lateral_att = dplyr::n(),
      lateral_fumbles = sum(.data$fumble, na.rm = TRUE),
      lateral_fumbles_lost = sum(.data$fumble_lost, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::rename("rusher_player_id" = "lateral_rusher_player_id") |>
    dplyr::bind_rows(
      mult_lats |>
        dplyr::filter(
          .data$type == "lateral_rushing" &
            .data$season %in% data$season &
            .data$week %in% data$week
        ) |>
        dplyr::select(
          "season",
          "week",
          "rusher_player_id" = "gsis_player_id",
          "lateral_yards" = "yards"
        ) |>
        dplyr::mutate(lateral_tds = 0L, lateral_att = 1L)
    ) |>
    # at this stage it is possible that a player is duplicated because he
    # has lateral yards both in the regular pbp and in the multiple laterals file.
    # This can happen when a player was the last lateral player in one play and
    # not the last lateral player in another play in the same game (wow absurd)
    # We summarise all columns to make sure there is only one row per player
    # per game. See (#289)
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarise_all(.funs = sum, na.rm = TRUE) |>
    dplyr::ungroup()

  # rush df: join
  rush_df <- rushes |>
    dplyr::left_join(laterals, by = c("rusher_player_id", "week", "season")) |>
    dplyr::mutate(
      lateral_yards = dplyr::if_else(
        is.na(.data$lateral_yards),
        0,
        .data$lateral_yards
      ),
      lateral_tds = dplyr::if_else(
        is.na(.data$lateral_tds),
        0L,
        .data$lateral_tds
      ),
      lateral_fumbles = dplyr::if_else(
        is.na(.data$lateral_fumbles),
        0,
        .data$lateral_fumbles
      ),
      lateral_fumbles_lost = dplyr::if_else(
        is.na(.data$lateral_fumbles_lost),
        0,
        .data$lateral_fumbles_lost
      ),
      lateral_fds = dplyr::if_else(
        is.na(.data$lateral_fds),
        0,
        .data$lateral_fds
      )
    ) |>
    dplyr::mutate(
      rushing_yards = .data$yards + .data$lateral_yards,
      rushing_tds = .data$tds + .data$lateral_tds,
      rushing_first_downs = .data$rushing_first_downs + .data$lateral_fds,
      rushing_fumbles = .data$rushing_fumbles + .data$lateral_fumbles,
      rushing_fumbles_lost = .data$rushing_fumbles_lost +
        .data$lateral_fumbles_lost
    ) |>
    dplyr::rename("player_id" = "rusher_player_id") |>
    dplyr::select(
      "player_id",
      "week",
      "season",
      "name_rush",
      "team_rush",
      "opp_rush",
      "rushing_yards",
      "carries",
      "rushing_tds",
      "rushing_fumbles",
      "rushing_fumbles_lost",
      "rushing_first_downs",
      "rushing_epa"
    ) |>
    dplyr::ungroup()

  rush_two_points <- two_points |>
    dplyr::filter(.data$rush_attempt == 1) |>
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_rush and team_rush here for the full join in the next pipe
      name_rush = custom_mode(.data$rusher_player_name),
      team_rush = custom_mode(.data$posteam),
      opp_rush = custom_mode(.data$defteam),
      rushing_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "rusher_player_id") |>
    dplyr::ungroup()

  rush_df <- rush_df |>
    # need a full join because players without rushing stats that recorded
    # a rushing two point (mostly QBs) are dropped in any other join
    dplyr::full_join(
      rush_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_rush",
        "team_rush",
        "opp_rush"
      )
    ) |>
    dplyr::mutate(
      rushing_2pt_conversions = dplyr::if_else(
        is.na(.data$rushing_2pt_conversions),
        0L,
        .data$rushing_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id))

  rush_df_nas <- is.na(rush_df)
  epa_index <- which(dimnames(rush_df_nas)[[2]] == "rushing_epa")
  rush_df_nas[, epa_index] <- c(FALSE)

  rush_df[rush_df_nas] <- 0

  # Receiving stats ---------------------------------------------------------

  # receiver df 1: primary receiver
  rec <- data |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      name_receiver = dplyr::first(.data$receiver_player_name),
      team_receiver = dplyr::first(.data$posteam),
      opp_receiver = dplyr::first(.data$defteam),
      yards = sum(.data$receiving_yards, na.rm = TRUE),
      receptions = sum(.data$complete_pass == 1),
      targets = dplyr::n(),
      tds = sum(.data$td_player_id == .data$receiver_player_id, na.rm = TRUE),
      receiving_fumbles = sum(
        .data$fumble == 1 &
          .data$fumbled_1_player_id == .data$receiver_player_id &
          is.na(.data$lateral_receiver_player_id)
      ),
      receiving_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$receiver_player_id &
          is.na(.data$lateral_receiver_player_id) &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      receiving_air_yards = sum(.data$air_yards, na.rm = TRUE),
      receiving_yards_after_catch = sum(.data$yards_after_catch, na.rm = TRUE),
      receiving_first_downs = sum(
        .data$first_down_pass & is.na(.data$lateral_receiver_player_id)
      ),
      receiving_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # receiver df 2: lateral
  laterals <- data |>
    dplyr::filter(!is.na(.data$lateral_receiver_player_id)) |>
    dplyr::group_by(
      .data$lateral_receiver_player_id,
      .data$week,
      .data$season
    ) |>
    dplyr::summarize(
      lateral_yards = sum(.data$lateral_receiving_yards, na.rm = TRUE),
      lateral_tds = sum(
        .data$td_player_id == .data$lateral_receiver_player_id,
        na.rm = TRUE
      ),
      lateral_att = dplyr::n(),
      lateral_fds = sum(.data$first_down_pass, na.rm = TRUE),
      lateral_fumbles = sum(.data$fumble, na.rm = TRUE),
      lateral_fumbles_lost = sum(.data$fumble_lost, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::rename("receiver_player_id" = "lateral_receiver_player_id") |>
    dplyr::bind_rows(
      mult_lats |>
        dplyr::filter(
          .data$type == "lateral_receiving" &
            .data$season %in% data$season &
            .data$week %in% data$week
        ) |>
        dplyr::select(
          "season",
          "week",
          "receiver_player_id" = "gsis_player_id",
          "lateral_yards" = "yards"
        ) |>
        dplyr::mutate(lateral_tds = 0L, lateral_att = 1L)
    ) |>
    # at this stage it is possible that a player is duplicated because he
    # has lateral yards both in the regular pbp and in the multiple laterals file.
    # This can happen when a player was the last lateral player in one play and
    # not the last lateral player in another play in the same game (wow absurd)
    # We summarise all columns to make sure there is only one row per player
    # per game. See (#289)
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarise_all(.funs = sum, na.rm = TRUE) |>
    dplyr::ungroup()

  # receiver df 3: team receiving for WOPR
  rec_team <- data |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::group_by(.data$posteam, .data$week, .data$season) |>
    dplyr::summarize(
      team_targets = dplyr::n(),
      team_air_yards = sum(.data$air_yards, na.rm = TRUE),
    ) |>
    dplyr::ungroup()

  # rec df: join
  rec_df <- rec |>
    dplyr::left_join(
      laterals,
      by = c("receiver_player_id", "week", "season")
    ) |>
    dplyr::left_join(
      rec_team,
      by = c("team_receiver" = "posteam", "week", "season")
    ) |>
    dplyr::mutate(
      lateral_yards = dplyr::if_else(
        is.na(.data$lateral_yards),
        0,
        .data$lateral_yards
      ),
      lateral_tds = dplyr::if_else(
        is.na(.data$lateral_tds),
        0L,
        .data$lateral_tds
      ),
      lateral_fumbles = dplyr::if_else(
        is.na(.data$lateral_fumbles),
        0,
        .data$lateral_fumbles
      ),
      lateral_fumbles_lost = dplyr::if_else(
        is.na(.data$lateral_fumbles_lost),
        0,
        .data$lateral_fumbles_lost
      ),
      lateral_fds = dplyr::if_else(
        is.na(.data$lateral_fds),
        0,
        .data$lateral_fds
      )
    ) |>
    dplyr::mutate(
      receiving_yards = .data$yards + .data$lateral_yards,
      receiving_tds = .data$tds + .data$lateral_tds,
      receiving_yards_after_catch = .data$receiving_yards_after_catch +
        .data$lateral_yards,
      receiving_first_downs = .data$receiving_first_downs + .data$lateral_fds,
      receiving_fumbles = .data$receiving_fumbles + .data$lateral_fumbles,
      receiving_fumbles_lost = .data$receiving_fumbles_lost +
        .data$lateral_fumbles_lost,
      racr = .data$receiving_yards / .data$receiving_air_yards,
      racr = dplyr::case_when(
        is.nan(.data$racr) ~ NA_real_,
        .data$receiving_air_yards == 0 ~ 0,
        # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (and FBs) and is set to
        # 0 for Receivers. The list "racr_ids" includes all known RB and FB gsis_ids
        .data$receiving_air_yards < 0 &
          !.data$receiver_player_id %in% racr_ids$gsis_id ~ 0,
        TRUE ~ .data$racr
      ),
      target_share = .data$targets / .data$team_targets,
      air_yards_share = .data$receiving_air_yards / .data$team_air_yards,
      wopr = 1.5 * .data$target_share + 0.7 * .data$air_yards_share
    ) |>
    dplyr::rename("player_id" = "receiver_player_id") |>
    dplyr::select(
      "player_id",
      "week",
      "season",
      "name_receiver",
      "team_receiver",
      "opp_receiver",
      "receiving_yards",
      "receiving_air_yards",
      "receiving_yards_after_catch",
      "receptions",
      "targets",
      "receiving_tds",
      "receiving_fumbles",
      "receiving_fumbles_lost",
      "receiving_first_downs",
      "receiving_epa",
      "racr",
      "target_share",
      "air_yards_share",
      "wopr"
    )

  rec_two_points <- two_points |>
    dplyr::filter(.data$pass_attempt == 1) |>
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_receiver and team_receiver here for the full join in the next pipe
      name_receiver = custom_mode(.data$receiver_player_name),
      team_receiver = custom_mode(.data$posteam),
      opp_receiver = custom_mode(.data$defteam),
      receiving_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "receiver_player_id") |>
    dplyr::ungroup()

  rec_df <- rec_df |>
    # need a full join because players without receiving stats that recorded
    # a receiving two point are dropped in any other join
    dplyr::full_join(
      rec_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_receiver",
        "team_receiver",
        "opp_receiver"
      )
    ) |>
    dplyr::mutate(
      receiving_2pt_conversions = dplyr::if_else(
        is.na(.data$receiving_2pt_conversions),
        0L,
        .data$receiving_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id), !is.na(.data$name_receiver))

  rec_df_nas <- is.na(rec_df)
  epa_index <- which(
    dimnames(rec_df_nas)[[2]] %in%
      c("receiving_epa", "racr", "target_share", "air_yards_share", "wopr")
  )
  rec_df_nas[, epa_index] <- c(FALSE)

  rec_df[rec_df_nas] <- 0

  # Special Teams -----------------------------------------------------------

  st_tds <- pbp |>
    dplyr::filter(.data$special == 1 & !is.na(.data$td_player_id)) |>
    dplyr::group_by(.data$td_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      name_st = custom_mode(.data$td_player_name),
      team_st = custom_mode(.data$td_team),
      opp_st = custom_mode(.data$defteam),
      special_teams_tds = sum(.data$touchdown, na.rm = TRUE)
    ) |>
    dplyr::rename("player_id" = "td_player_id")

  # Combine all stats -------------------------------------------------------

  # combine all the stats together
  player_df <- pass_df |>
    dplyr::full_join(rush_df, by = c("player_id", "week", "season")) |>
    dplyr::full_join(rec_df, by = c("player_id", "week", "season")) |>
    dplyr::full_join(st_tds, by = c("player_id", "week", "season")) |>
    dplyr::left_join(s_type, by = c("season", "week")) |>
    dplyr::mutate(
      player_name = dplyr::case_when(
        !is.na(.data$name_pass) ~ .data$name_pass,
        !is.na(.data$name_rush) ~ .data$name_rush,
        !is.na(.data$name_receiver) ~ .data$name_receiver,
        TRUE ~ .data$name_st
      ),
      recent_team = dplyr::case_when(
        !is.na(.data$team_pass) ~ .data$team_pass,
        !is.na(.data$team_rush) ~ .data$team_rush,
        !is.na(.data$team_receiver) ~ .data$team_receiver,
        TRUE ~ .data$team_st
      ),
      opponent_team = dplyr::case_when(
        !is.na(.data$opp_pass) ~ .data$opp_pass,
        !is.na(.data$opp_rush) ~ .data$opp_rush,
        !is.na(.data$opp_receiver) ~ .data$opp_receiver,
        TRUE ~ .data$opp_st
      )
    ) |>
    dplyr::select(dplyr::any_of(c(
      # id information
      "player_id",
      "player_name",
      "recent_team",
      "season",
      "week",
      "season_type",
      "opponent_team",

      # passing stats
      "completions",
      "attempts",
      "passing_yards",
      "passing_tds",
      "interceptions",
      "sacks",
      "sack_yards",
      "sack_fumbles",
      "sack_fumbles_lost",
      "passing_air_yards",
      "passing_yards_after_catch",
      "passing_first_downs",
      "passing_epa",
      "passing_2pt_conversions",
      "pacr",
      "dakota",

      # rushing stats
      "carries",
      "rushing_yards",
      "rushing_tds",
      "rushing_fumbles",
      "rushing_fumbles_lost",
      "rushing_first_downs",
      "rushing_epa",
      "rushing_2pt_conversions",

      # receiving stats
      "receptions",
      "targets",
      "receiving_yards",
      "receiving_tds",
      "receiving_fumbles",
      "receiving_fumbles_lost",
      "receiving_air_yards",
      "receiving_yards_after_catch",
      "receiving_first_downs",
      "receiving_epa",
      "receiving_2pt_conversions",
      "racr",
      "target_share",
      "air_yards_share",
      "wopr",

      # special teams
      "special_teams_tds"
    ))) |>
    dplyr::filter(!is.na(.data$player_id), !is.na(.data$player_name))

  player_df_nas <- is.na(player_df)
  epa_index <- which(
    dimnames(player_df_nas)[[2]] %in%
      c(
        "passing_epa",
        "rushing_epa",
        "receiving_epa",
        "dakota",
        "racr",
        "target_share",
        "air_yards_share",
        "wopr",
        "pacr"
      )
  )
  player_df_nas[, epa_index] <- c(FALSE)

  player_df[player_df_nas] <- 0

  player_df <- player_df |>
    dplyr::mutate(
      fantasy_points = 1 /
        25 *
        .data$passing_yards +
        4 * .data$passing_tds +
        -2 * .data$interceptions +
        1 / 10 * (.data$rushing_yards + .data$receiving_yards) +
        6 *
          (.data$rushing_tds + .data$receiving_tds + .data$special_teams_tds) +
        2 *
          (.data$passing_2pt_conversions +
            .data$rushing_2pt_conversions +
            .data$receiving_2pt_conversions) +
        -2 *
          (.data$sack_fumbles_lost +
            .data$rushing_fumbles_lost +
            .data$receiving_fumbles_lost),

      fantasy_points_ppr = .data$fantasy_points + .data$receptions
    ) |>
    dplyr::arrange(.data$player_id, .data$season, .data$week)

  # if user doesn't want week-by-week output, aggregate the whole df
  if (isFALSE(weekly)) {
    player_df <- player_df |>
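      # helper variables to summarise target share and air yards share:
      # `targets` and `receiving_air_yards` are replaced by their sums earlier
      # in the summarise() call below, so the weekly values are kept under
      # separate names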
      dplyr::mutate(
        tgts = .data$targets,
        rec_air_yds = .data$receiving_air_yards
      ) |>
      dplyr::group_by(.data$player_id) |>
      dplyr::summarise(
        player_name = custom_mode(.data$player_name),
        games = dplyr::n(),
        recent_team = dplyr::last(.data$recent_team),
        # passing
        completions = sum(.data$completions),
        attempts = sum(.data$attempts),
        passing_yards = sum(.data$passing_yards),
        passing_tds = sum(.data$passing_tds),
        interceptions = sum(.data$interceptions),
        sacks = sum(.data$sacks),
        sack_yards = sum(.data$sack_yards),
        sack_fumbles = sum(.data$sack_fumbles),
        sack_fumbles_lost = sum(.data$sack_fumbles_lost),
        passing_air_yards = sum(.data$passing_air_yards),
        passing_yards_after_catch = sum(.data$passing_yards_after_catch),
        passing_first_downs = sum(.data$passing_first_downs),
        passing_epa = dplyr::if_else(
          all(is.na(.data$passing_epa)),
          NA_real_,
          sum(.data$passing_epa, na.rm = TRUE)
        ),
        passing_2pt_conversions = sum(.data$passing_2pt_conversions),
        pacr = .data$passing_yards / .data$passing_air_yards,

        # rushing
        carries = sum(.data$carries),
        rushing_yards = sum(.data$rushing_yards),
        rushing_tds = sum(.data$rushing_tds),
        rushing_fumbles = sum(.data$rushing_fumbles),
        rushing_fumbles_lost = sum(.data$rushing_fumbles_lost),
        rushing_first_downs = sum(.data$rushing_first_downs),
        rushing_epa = dplyr::if_else(
          all(is.na(.data$rushing_epa)),
          NA_real_,
          sum(.data$rushing_epa, na.rm = TRUE)
        ),
        rushing_2pt_conversions = sum(.data$rushing_2pt_conversions),

        # receiving
        receptions = sum(.data$receptions),
        targets = sum(.data$targets),
        receiving_yards = sum(.data$receiving_yards),
        receiving_tds = sum(.data$receiving_tds),
        receiving_fumbles = sum(.data$receiving_fumbles),
        receiving_fumbles_lost = sum(.data$receiving_fumbles_lost),
        receiving_air_yards = sum(.data$receiving_air_yards),
        receiving_yards_after_catch = sum(.data$receiving_yards_after_catch),
        receiving_first_downs = sum(.data$receiving_first_downs),
        receiving_epa = dplyr::if_else(
          all(is.na(.data$receiving_epa)),
          NA_real_,
          sum(.data$receiving_epa, na.rm = TRUE)
        ),
        receiving_2pt_conversions = sum(.data$receiving_2pt_conversions),
        racr = .data$receiving_yards / .data$receiving_air_yards,
        target_share = dplyr::if_else(
          all(is.na(.data$target_share)),
          NA_real_,
          sum(.data$tgts, na.rm = TRUE) /
            sum(.data$tgts / .data$target_share, na.rm = TRUE)
        ),
        air_yards_share = dplyr::if_else(
          all(is.na(.data$air_yards_share)),
          NA_real_,
          sum(.data$rec_air_yds, na.rm = TRUE) /
            sum(.data$rec_air_yds / .data$air_yards_share, na.rm = TRUE)
        ),
        wopr = 1.5 * .data$target_share + 0.7 * .data$air_yards_share,

        # special teams
        special_teams_tds = sum(.data$special_teams_tds),

        # fantasy
        fantasy_points = sum(.data$fantasy_points),
        fantasy_points_ppr = sum(.data$fantasy_points_ppr)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(
        racr = dplyr::case_when(
          is.nan(.data$racr) ~ NA_real_,
          .data$receiving_air_yards == 0 ~ 0,
          # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (and FBs) and is set to
          # 0 for Receivers. The list "racr_ids" includes all known RB and FB gsis_ids
          .data$receiving_air_yards < 0 &
            !.data$player_id %in% racr_ids$gsis_id ~ 0,
          TRUE ~ .data$racr
        ),
        pacr = dplyr::case_when(
          is.nan(.data$pacr) ~ NA_real_,
          .data$passing_air_yards <= 0 ~ 0,
          TRUE ~ .data$pacr
        )
      ) |>
      add_dakota(pbp = pbp, weekly = weekly) |>
      dplyr::select(
        "player_id":"pacr",
        dplyr::any_of("dakota"),
        dplyr::everything()
      )
  }

  # data is missing position and player name can be messed up in pbp
  # so we join player information next
  player_df <- player_df |>
    dplyr::select(-"player_name") |>
    dplyr::left_join(player_info, by = "player_id") |>
    dplyr::select(
      "player_id",
      "player_name",
      "player_display_name",
      "position",
      "position_group",
      "headshot_url",
      dplyr::everything()
    )

  return(player_df)
}
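
# ------------------------------------------------------------------------------
# Illustrative sketch (not part of the package API): how the season-level
# `target_share` in the `weekly = FALSE` branch above recovers a ratio of totals
# from weekly shares. Dividing weekly targets by the weekly share gives back the
# weekly team targets, so the aggregate equals
# sum(player targets) / sum(team targets). All numbers are made up, and the
# object names are hypothetical. The block is wrapped in `if (FALSE)` so that
# nothing runs when the package is built; run the body interactively to try it.
if (FALSE) {
  tgts <- c(8, 5, 10) # hypothetical weekly player targets
  team_targets <- c(40, 25, 50) # hypothetical weekly team targets
  target_share <- tgts / team_targets # weekly shares, as computed in rec_df

  # same expression as in the summarise() call above
  season_share <- sum(tgts, na.rm = TRUE) /
    sum(tgts / target_share, na.rm = TRUE)

  # identical to the ratio of season totals
  stopifnot(isTRUE(all.equal(season_share, sum(tgts) / sum(team_targets))))

  # WOPR then combines both shares with the weights used above
  air_yards_share <- 0.25 # hypothetical season air yards share
  wopr <- 1.5 * season_share + 0.7 * air_yards_share
}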

add_dakota <- function(add_to_this, pbp, weekly) {
  dakota_model <- NULL
  con <- url(
    "https://github.com/nflverse/nflfastR-data/blob/master/models/dakota_model.Rdata?raw=true"
  )
  try(load(con), silent = TRUE)
  close(con)

  if (is.null(dakota_model)) {
    user_message(
      "This function needs to download the model data from GitHub. Please check your Internet connection and try again!",
      "oops"
    )
    return(add_to_this)
  }

  if (!"id" %in% names(pbp)) {
    pbp <- clean_pbp(pbp)
  }
  if (!"qb_epa" %in% names(pbp)) {
    pbp <- add_qb_epa(pbp)
  }

  suppressMessages({
    df <- pbp |>
      dplyr::filter(.data$pass == 1 | .data$rush == 1) |>
      dplyr::filter(
        !is.na(.data$posteam) &
          !is.na(.data$qb_epa) &
          !is.na(.data$id) &
          !is.na(.data$down)
      ) |>
      dplyr::mutate(
        epa = dplyr::if_else(.data$qb_epa < -4.5, -4.5, .data$qb_epa)
      ) |>
      decode_player_ids()
  })

  if (isTRUE(weekly)) {
    relevant_players <- add_to_this |>
      dplyr::filter(.data$attempts >= 5) |>
      dplyr::mutate(
        filter_id = paste(.data$player_id, .data$season, .data$week, sep = "_")
      ) |>
      dplyr::pull(.data$filter_id)

    model_data <- df |>
      dplyr::group_by(.data$id, .data$week, .data$season) |>
      dplyr::summarize(
        n_plays = dplyr::n(),
        epa_per_play = sum(.data$epa) / .data$n_plays,
        cpoe = mean(.data$cpoe, na.rm = TRUE)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(cpoe = dplyr::if_else(is.na(.data$cpoe), 0, .data$cpoe)) |>
      dplyr::rename("player_id" = "id") |>
      dplyr::mutate(
        filter_id = paste(.data$player_id, .data$season, .data$week, sep = "_")
      ) |>
      dplyr::filter(.data$filter_id %in% relevant_players)

    model_data$dakota <- mgcv::predict.gam(dakota_model, model_data) |>
      as.vector()

    out <- add_to_this |>
      dplyr::left_join(
        model_data |>
          dplyr::select("player_id", "week", "season", "dakota"),
        by = c("player_id", "week", "season")
      )
  } else if (isFALSE(weekly)) {
    relevant_players <- add_to_this |>
      dplyr::filter(.data$attempts >= 5) |>
      dplyr::pull(.data$player_id)

    model_data <- df |>
      dplyr::group_by(.data$id) |>
      dplyr::summarize(
        n_plays = dplyr::n(),
        epa_per_play = sum(.data$epa) / .data$n_plays,
        cpoe = mean(.data$cpoe, na.rm = TRUE)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(cpoe = dplyr::if_else(is.na(.data$cpoe), 0, .data$cpoe)) |>
      dplyr::rename("player_id" = "id") |>
      dplyr::filter(.data$player_id %in% relevant_players)

    model_data$dakota <- mgcv::predict.gam(dakota_model, model_data) |>
      as.vector()

    out <- add_to_this |>
      dplyr::left_join(
        model_data |>
          dplyr::select("player_id", "dakota"),
        by = "player_id"
      )
  }
  return(out)
}
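
# ------------------------------------------------------------------------------
# Illustrative sketch (not part of the package API): the two inputs that
# add_dakota() above passes to the model, computed for a single hypothetical
# quarterback. `qb_epa` is floored at -4.5 before averaging, and a missing mean
# CPOE is replaced with 0. The vectors below are made-up values. The block is
# wrapped in `if (FALSE)` so that nothing runs when the package is built.
if (FALSE) {
  qb_epa <- c(-6.2, 1.3, 0.4, -4.5, 2.0) # hypothetical per-play QB EPA
  cpoe <- c(NA, 5.1, -2.3, NA, 8.0) # hypothetical per-play CPOE

  # same floor as in add_dakota(): very negative plays count as -4.5
  epa <- ifelse(qb_epa < -4.5, -4.5, qb_epa)

  n_plays <- length(epa)
  epa_per_play <- sum(epa) / n_plays

  # mean() returns NaN if every value is NA; is.na(NaN) is TRUE, so the
  # fallback to 0 covers that case as well
  mean_cpoe <- mean(cpoe, na.rm = TRUE)
  mean_cpoe <- ifelse(is.na(mean_cpoe), 0, mean_cpoe)

  stopifnot(min(epa) >= -4.5, n_plays == 5)
}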


================================================
FILE: R/aggregate_game_stats_def.R
================================================
################################################################################
# Author: Christian Lohr, Sebastian Carl, Tan Ho
# Styleguide: styler::tidyverse_style()
################################################################################

#' Get Official Game Stats on Defense
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' @param pbp A data frame of NFL play-by-play data typically loaded with
#'   [load_pbp()] or [build_nflfastR_pbp()]. If the data doesn't include the variable
#'   `qb_epa`, the function `add_qb_epa()` will be called to add it.
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats
#'   for the entire data frame.
#' @description Build columns that aggregate official defense stats
#'   either at the game level or at the level of the entire data frame passed.
#' @return A data frame of defensive player stats. See dictionary (# TODO)
#' @export
#' @keywords internal
#' @seealso The function [load_player_stats()] and the corresponding examples
#' on [the nflfastR website](https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats)
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   # pbp <- nflfastR::load_pbp(2020)
#'
#'   # weekly <- calculate_player_stats_def(pbp, weekly = TRUE)
#'   # dplyr::glimpse(weekly)
#'
#'   # overall <- calculate_player_stats_def(pbp, weekly = FALSE)
#'   # dplyr::glimpse(overall)
#' })
#' }
#'

#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# what do we need:
#
# Solo Tackles --> done
# Tackles With Assist --> done
# Assisted Tackles --> done
# Tackles for Loss --> done
# TFL Yards --> done
# Sacks --> done
# Sack Yards --> done
# QB Hits --> done
# Passes Defensed --> done
# Interceptions --> done
# Interception Yards --> done
# Interception Return TDs ///// --> only "TD" for defense
# Forced Fumbles --> done
# Opp Fumble Recoveries --> done
# Opp Fumble Recovery Yards --> done
# Opp Fumble Recovery TDs ///// --> only "TD" for defense
# Safeties --> done
# Penalties --> done
# Penalty Yards --> done
# Fumbles --> done
# Own Fumble Recoveries --> done
# Own Fumble Recovery Yards --> done
# Own Fumble Recovery TDs ///// --> only "TD" for defense

calculate_player_stats_def <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats_def()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)")

  # Prepare data ------------------------------------------------------------

  suppressMessages({
    # 1. for "normal" plays: get plays that count in official stats
    # we exclude special teams and 2pts here for now
    data <- pbp |>
      dplyr::filter(
        !is.na(.data$down),
        .data$play_type %in% c("pass", "qb_kneel", "qb_spike", "run")
      ) |>
      nflfastR::decode_player_ids()

    # 2. filter penalty plays for penalty stats
    penalty_data <- pbp |>
      dplyr::filter(.data$penalty == 1) |>
      nflfastR::decode_player_ids()
  })

  stype <- data |>
    dplyr::select("season", "week", "season_type") |>
    dplyr::distinct()

  # Tackling stats -----------------------------------------------------------

  tackle_vars <- c(
    "solo_tackle_1_player_id",
    "tackle_for_loss_1_player_id",
    "assist_tackle_1_player_id",
    "tackle_with_assist_1_player_id",
    "solo_tackle_2_player_id",
    "forced_fumble_player_1_player_id",
    "assist_tackle_2_player_id",
    "forced_fumble_player_2_player_id"
  )

  # get tackling stats
  tackle_df <- data |>
    dplyr::select("season", "week", "defteam", dplyr::any_of(tackle_vars)) |>
    tidyr::pivot_longer(
      cols = dplyr::any_of(tackle_vars),
      names_to = "desc",
      values_to = "tackle_player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::count(
      .data$tackle_player_id,
      .data$defteam,
      .data$season,
      .data$week,
      .data$desc
    ) |>
    dplyr::mutate(
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]")
    ) |>
    tidyr::pivot_wider(
      # use column names as strings: `.data` in tidyselect is deprecated
      names_from = "desc",
      values_from = "n",
      values_fill = 0L,
      values_fn = sum
    ) |>
    add_column_if_missing(
      "solo_tackle",
      "tackle_with_assist",
      "tackle_for_loss",
      "assist_tackle",
      "forced_fumble_player"
    ) |>
    dplyr::mutate(
      tackles = .data$solo_tackle + .data$tackle_with_assist
    ) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "player_id" = "tackle_player_id",
      "tackles",
      "tackles_solo" = "solo_tackle",
      "tackles_with_assist" = "tackle_with_assist",
      "tackle_assists" = "assist_tackle",
      "forced_fumbles" = "forced_fumble_player",
      "tackles_for_loss" = "tackle_for_loss"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      tackles = sum(.data$tackles, na.rm = TRUE),
      tackles_solo = sum(.data$tackles_solo, na.rm = TRUE),
      tackles_with_assist = sum(.data$tackles_with_assist, na.rm = TRUE),
      tackle_assists = sum(.data$tackle_assists, na.rm = TRUE),
      forced_fumbles = sum(.data$forced_fumbles, na.rm = TRUE),
      tackles_for_loss = sum(.data$tackles_for_loss, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get tackle for loss yards
  tackle_yards_df <- data |>
    dplyr::filter(
      .data$tackled_for_loss == 1,
      .data$fumble == 0,
      .data$sack == 0
    ) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "tackle_for_loss_1_player_id",
      "tackle_for_loss_2_player_id",
      "yards_gained"
    ) |>
    tidyr::pivot_longer(
      cols = c("tackle_for_loss_1_player_id", "tackle_for_loss_2_player_id"),
      names_to = "desc",
      values_to = "player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      tfl_yards = sum(-.data$yards_gained, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Sack and QB Hits stats -----------------------------------------------------------

  # get sack and pressure stats
  pressure_df <- data |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      dplyr::contains("sack_"),
      "yards_gained",
      dplyr::starts_with("qb_hit_"),
      -dplyr::contains("_name")
    ) |>
    tidyr::pivot_longer(
      cols = c(
        dplyr::contains("sack_"),
        dplyr::starts_with("qb_hit_")
      ),
      names_to = "desc",
      names_prefix = "sk_",
      values_to = "player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(
      n = dplyr::case_when(
        .data$desc %in%
          c("half_sack_1_player_id", "half_sack_2_player_id") ~ 0.5,
        TRUE ~ 1
      ),
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]") |>
        stringr::str_remove("half_")
    ) |>
    dplyr::mutate(
      sack_yards = .data$n * .data$yards_gained * -1
    ) |>
    tidyr::pivot_wider(
      # use column names as strings: `.data` in tidyselect is deprecated
      names_from = "desc",
      values_from = c("n", "sack_yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing("n_sack", "n_qb_hit", "sack_yards_sack") |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id",
      "sacks" = "n_sack",
      "qb_hit" = "n_qb_hit",
      "sack_yards" = "sack_yards_sack"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      sacks = sum(.data$sacks, na.rm = TRUE),
      qb_hit = sum(.data$qb_hit, na.rm = TRUE),
      sack_yards = sum(.data$sack_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()
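
  # Illustrative sketch (never executed, guarded by `if (FALSE)`): how the
  # pressure_df pipeline above weights half sacks and collapses column names
  # into stat names. Base R `gsub()`/`sub()` stand in for the stringr calls
  # used above, and all values are made up.
  if (FALSE) {
    desc <- c(
      "sack_player_id",
      "half_sack_1_player_id",
      "half_sack_2_player_id",
      "qb_hit_1_player_id"
    )
    yards_gained <- -8 # hypothetical loss on the play

    # half sacks count 0.5, everything else counts 1
    n <- ifelse(
      desc %in% c("half_sack_1_player_id", "half_sack_2_player_id"),
      0.5,
      1
    )
    # yards are split with the same weight and flipped to a positive number
    sack_yards <- n * yards_gained * -1

    # strip "_player_id", the numeric suffix and the "half_" prefix
    stat <- sub("half_", "", gsub("_[0-9]", "", gsub("_player_id", "", desc)))

    stopifnot(identical(stat, c("sack", "sack", "sack", "qb_hit")))
    stopifnot(identical(n, c(1, 0.5, 0.5, 1)))
  }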

  # Interception and Deflection stats ---------------------------------------------------------

  # get int and def stats
  int_df <- data |>
    dplyr::select(
      "season",
      "week",
      "return_yards",
      "team" = "defteam",
      dplyr::starts_with("interception_"),
      dplyr::starts_with("pass_defense_"),
      -dplyr::contains("_name")
    ) |>
    tidyr::pivot_longer(
      cols = c(
        dplyr::starts_with("interception_"),
        dplyr::starts_with("pass_defense_")
      ),
      names_to = "desc",
      names_prefix = "int_",
      values_to = "db_player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]")
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = c("n", "return_yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing(
      "n_interception",
      "n_pass_defense",
      "return_yards_interception"
    ) |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id" = "db_player_id",
      "int" = "n_interception",
      "pass_defended" = "n_pass_defense",
      "int_yards" = "return_yards_interception"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      int = sum(.data$int, na.rm = TRUE),
      pass_defended = sum(.data$pass_defended, na.rm = TRUE),
      int_yards = sum(.data$int_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Safety stats -----------------------------------------------------------

  safety_df <- data |>
    dplyr::filter(.data$safety == 1, !is.na(.data$safety_player_id)) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "player_id" = "safety_player_id"
    ) |>
    dplyr::count(
      .data$season,
      .data$week,
      .data$team,
      .data$player_id,
      name = "safety"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      safety = sum(.data$safety, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Fumble stats -----------------------------------------------------------

  # get fumble stats for fumbles and own fumble recoveries
  fumble_df_own <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumbled_1_team |
        .data$defteam == .data$fumbled_2_team
    ) |>
    dplyr::mutate(
      fumbled_1_player_id = dplyr::if_else(
        .data$defteam == .data$fumbled_1_team,
        .data$fumbled_1_player_id,
        NA_character_,
        NA_character_
      )
    ) |>
    dplyr::select(
      "season",
      "week",
      dplyr::matches("^fumble.+team"),
      dplyr::matches("^fumble.+player_id")
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("fumble"),
      names_pattern = "(.+)_(team|player_id)",
      names_to = c("desc", ".value")
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_[0-9]")
    ) |>
    tidyr::pivot_wider(
      # use column names as strings: `.data` in tidyselect is deprecated
      names_from = "desc",
      values_from = "n",
      values_fn = sum,
      values_fill = 0L
    ) |>
    # Renaming fails if the columns don't exist. So we row bind a dummy tibble
    # including the relevant columns. The row will be filtered after renaming
    dplyr::bind_rows(
      tibble::tibble(
        player_id = NA_character_,
        fumbled = numeric(),
        fumble_recovery = numeric()
      )
    ) |>
    dplyr::rename(
      "fumble" = "fumbled",
      "fumble_recovery_own" = "fumble_recovery"
    ) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      fumble = sum(.data$fumble, na.rm = TRUE),
      fumble_recovery_own = sum(.data$fumble_recovery_own, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get fumble stats for opponent recoveries
  fumble_df_opp <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumble_recovery_1_team |
        .data$defteam == .data$fumble_recovery_2_team
    ) |>
    dplyr::mutate(
      # use data.table fifelse because base ifelse changed data type to logical
      # if there are 0 rows
      fumble_recovery_1_player_id = data.table::fifelse(
        .data$defteam != .data$fumbled_1_team,
        .data$fumble_recovery_1_player_id,
        NA_character_
      ),
      fumble_recovery_2_player_id = data.table::fifelse(
        .data$defteam != .data$fumbled_2_team,
        .data$fumble_recovery_2_player_id,
        NA_character_
      )
    ) |>
    dplyr::select(
      "season",
      "week",
      dplyr::matches("^fumble_recovery.+team"),
      dplyr::matches("^fumble_recovery.+player_id")
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("fumble"),
      names_pattern = "(.+)_(team|player_id)",
      names_to = c("desc", ".value")
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_[0-9]")
    ) |>
    tidyr::pivot_wider(
      # use column names as strings: `.data` in tidyselect is deprecated
      names_from = "desc",
      values_from = "n",
      values_fn = sum,
      values_fill = 0L
    ) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    add_column_if_missing("fumble_recovery") |>
    dplyr::rename("fumble_recovery_opp" = "fumble_recovery") |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      fumble_recovery_opp = sum(.data$fumble_recovery_opp, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get fumble yards for own recoveries
  fumble_yds_own_data <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumbled_1_team |
        .data$defteam == .data$fumbled_2_team
    )

  fumble_yds_own_df <- fumble_yds_own_data |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$fumble_recovery_1_team,
      "player_id" = .data$fumble_recovery_1_player_id
    ) |>
    dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_1_yards)) |>
    # player_id is NA when a fumble goes out of bounds: no one gets yards
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::bind_rows(
      fumble_yds_own_data |>
        dplyr::group_by(
          .data$season,
          .data$week,
          "team" = .data$fumble_recovery_2_team,
          "player_id" = .data$fumble_recovery_2_player_id
        ) |>
        dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_2_yards)) |>
        dplyr::filter(!is.na(.data$player_id))
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(fumble_recovery_yards_own = sum(.data$recovery_yards)) |>
    dplyr::ungroup()

  # get fumble yards for opp recoveries
  fumble_yds_opp_data <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumble_recovery_1_team,
      .data$defteam != .data$fumbled_1_team
    )

  fumble_yds_opp_df <- fumble_yds_opp_data |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$fumble_recovery_1_team,
      "player_id" = .data$fumble_recovery_1_player_id
    ) |>
    dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_1_yards)) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::bind_rows(
      fumble_yds_opp_data |>
        dplyr::group_by(
          .data$season,
          .data$week,
          "team" = .data$fumble_recovery_2_team,
          "player_id" = .data$fumble_recovery_2_player_id
        ) |>
        dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_2_yards)) |>
        dplyr::filter(!is.na(.data$player_id))
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(fumble_recovery_yards_opp = sum(.data$recovery_yards)) |>
    dplyr::ungroup()

  # Penalty stats -----------------------------------------------------------

  # get penalty stats
  penalty_df <- penalty_data |>
    dplyr::filter(
      !is.na(.data$penalty_player_id),
      .data$defteam == .data$penalty_team
    ) |>
    dplyr::select(
      "season",
      "week",
      "penalty_yards",
      "penalty_team",
      "penalty_player_id"
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("penalty"),
      names_pattern = "(.+)_(team|player_id|yards)",
      names_to = c("desc", ".value"),
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(n = 1) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = c("n", "yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing("n_penalty", "yards_penalty") |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id",
      "penalty" = "n_penalty",
      "penalty_yards" = "yards_penalty"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      penalty = sum(.data$penalty, na.rm = TRUE),
      penalty_yards = sum(.data$penalty_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Touchdown stats -----------------------------------------------------------

  # get defensive touchdowns
  touchdown_df <- data |>
    dplyr::filter(.data$touchdown == 1) |>
    dplyr::filter(.data$defteam == .data$td_team) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$td_team,
      "player_id" = .data$td_player_id
    ) |>
    dplyr::summarise(td = sum(.data$touchdown)) |>
    dplyr::ungroup()

  # Combine all stats -------------------------------------------------------

  # combine all the stats together

  player_df <- tackle_df |>
    dplyr::full_join(
      tackle_yards_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      pressure_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(int_df, by = c("season", "week", "player_id", "team")) |>
    dplyr::full_join(
      safety_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_df_own,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_df_opp,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_yds_own_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_yds_opp_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      penalty_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      touchdown_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::mutate_if(is.numeric, tidyr::replace_na, 0) |>
    dplyr::left_join(
      nflreadr::load_players() |>
        dplyr::select(
          "player_id" = "gsis_id",
          "player_display_name" = "display_name",
          "player_name" = "short_name",
          "position",
          "position_group",
          "headshot_url" = "headshot"
        ),
      by = "player_id"
    ) |>
    dplyr::left_join(stype, by = c("season", "week")) |>
    dplyr::select(dplyr::any_of(c(
      # game information
      "season",
      "week",
      "season_type",

      # id information
      "player_id",
      "player_name",
      "player_display_name",
      "position",
      "position_group",
      "headshot_url",
      "team",

      # tackle stats
      "def_tackles" = "tackles",
      "def_tackles_solo" = "tackles_solo",
      "def_tackles_with_assist" = "tackles_with_assist",
      "def_tackle_assists" = "tackle_assists",
      "def_tackles_for_loss" = "tackles_for_loss",
      "def_tackles_for_loss_yards" = "tfl_yards",
      "def_fumbles_forced" = "forced_fumbles",

      # pressure stats
      "def_sacks" = "sacks",
      "def_sack_yards" = "sack_yards",
      "def_qb_hits" = "qb_hit",

      # coverage stats
      "def_interceptions" = "int",
      "def_interception_yards" = "int_yards",
      "def_pass_defended" = "pass_defended",

      # misc stats
      "def_tds" = "td",
      "def_fumbles" = "fumble",
      "def_fumble_recovery_own" = "fumble_recovery_own",
      "def_fumble_recovery_yards_own" = "fumble_recovery_yards_own",
      "def_fumble_recovery_opp" = "fumble_recovery_opp",
      "def_fumble_recovery_yards_opp" = "fumble_recovery_yards_opp",
      "def_safety" = "safety",
      "def_penalty" = "penalty",
      "def_penalty_yards" = "penalty_yards"
    ))) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::arrange(.data$player_id, .data$season, .data$week)

  # if the user doesn't want week-by-week output, aggregate the whole df
  if (isFALSE(weekly)) {
    player_df <- player_df |>
      dplyr::group_by(.data$player_id, .data$team) |>
      dplyr::summarise(
        player_name = custom_mode(.data$player_name),
        player_display_name = custom_mode(.data$player_display_name),
        games = dplyr::n(),
        position = custom_mode(.data$position),
        position_group = custom_mode(.data$position_group),
        headshot_url = custom_mode(.data$headshot_url),
        def_tackles = sum(.data$def_tackles),
        def_tackles_solo = sum(.data$def_tackles_solo),
        def_tackles_with_assist = sum(.data$def_tackles_with_assist),
        def_tackle_assists = sum(.data$def_tackle_assists),
        def_tackles_for_loss = sum(.data$def_tackles_for_loss),
        def_tackles_for_loss_yards = sum(.data$def_tackles_for_loss_yards),
        def_fumbles_forced = sum(.data$def_fumbles_forced),
        def_sacks = sum(.data$def_sacks),
        def_sack_yards = sum(.data$def_sack_yards),
        def_qb_hits = sum(.data$def_qb_hits),
        def_interceptions = sum(.data$def_interceptions),
        def_interception_yards = sum(.data$def_interception_yards),
        def_pass_defended = sum(.data$def_pass_defended),
        def_tds = sum(.data$def_tds),
        def_fumbles = sum(.data$def_fumbles),
        def_fumble_recovery_own = sum(.data$def_fumble_recovery_own),
        def_fumble_recovery_yards_own = sum(
          .data$def_fumble_recovery_yards_own
        ),
        def_fumble_recovery_opp = sum(.data$def_fumble_recovery_opp),
        def_fumble_recovery_yards_opp = sum(
          .data$def_fumble_recovery_yards_opp
        ),
        def_safety = sum(.data$def_safety),
        def_penalty = sum(.data$def_penalty),
        def_penalty_yards = sum(.data$def_penalty_yards)
      ) |>
      dplyr::ungroup() |>
      dplyr::select(
        "player_id",
        "player_name",
        "player_display_name",
        "games",
        "position",
        "position_group",
        "headshot_url",
        "team",
        dplyr::everything()
      )
  }

  player_df
}

# This function checks whether the variables in ... exist as column
# names in the argument .data. If not, it adds those columns and assigns
# them the value given in the argument value
add_column_if_missing <- function(.data, ..., value = 0L) {
  dots <- rlang::list2(...)
  new_cols <- dots[!dots %in% names(.data)]
  .data[, unlist(new_cols)] <- value
  .data
}
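
# Illustrative sketch (not part of the package API): a base-R variant of the
# helper above, assuming a plain data.frame input. The name
# `add_column_if_missing_sketch` and the toy data in the usage note are
# hypothetical. It shows the intended behaviour: existing columns are left
# alone, missing ones are created and filled with `value`.
add_column_if_missing_sketch <- function(df, ..., value = 0L) {
  wanted <- c(...)
  # keep only the requested names that are not yet columns of df
  new_cols <- setdiff(wanted, names(df))
  for (col in new_cols) {
    df[[col]] <- rep(value, nrow(df))
  }
  df
}

# Usage: "n_penalty" already exists and is kept, "yards_penalty" is added
# add_column_if_missing_sketch(
#   data.frame(n_penalty = c(1, 2)),
#   "n_penalty", "yards_penalty"
# )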


================================================
FILE: R/aggregate_game_stats_kicking.R
================================================
#' Summarize Kicking Stats
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' Build columns that aggregate kicking stats at the game level.
#'
#' @param pbp A data frame of NFL play-by-play data typically loaded with
#' [load_pbp()] or [build_nflfastR_pbp()].
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats for
#' the entire data frame in argument `pbp`.
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'     # pbp <- nflreadr::load_pbp(2021)
#'     # weekly <- calculate_player_stats_kicking(pbp, weekly = TRUE)
#'     # dplyr::glimpse(weekly)
#'
#'     # overall <- calculate_player_stats_kicking(pbp, weekly = FALSE)
#'     # dplyr::glimpse(overall)
#' })
#' }
#'
#' @return A data frame of kicking stats
#' @seealso <https://nflreadr.nflverse.com/reference/load_player_stats.html> for the nflreadr function to download this from repo (`stat_type = "kicking"`)
#' @export
#' @keywords internal
calculate_player_stats_kicking <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats_kicking()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)")

  # First, create a grouping variable object that depends on the weekly argument
  grp_vars <- if (isTRUE(weekly)) {
    list("season", "week", "season_type", "player_id", "team")
  } else if (isFALSE(weekly)) {
    list("player_id", "team")
  }
  grp_vars <- lapply(grp_vars, as.symbol)

  # Filtering down / creating a base dataset
  df_fg_or_pat <- pbp |>
    dplyr::group_by(.data$game_id, .data$posteam) |>
    dplyr::filter(
      .data$field_goal_attempt == 1 |
        .data$extra_point_attempt == 1 |
        .data$fixed_drive == max(.data$fixed_drive, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::filter(!is.na(.data$kicker_player_id)) |>
    dplyr::select(
      "game_id",
      "season",
      "week",
      "season_type",
      "team" = "posteam",
      "player_name" = "kicker_player_name",
      "player_id" = "kicker_player_id",
      "dist" = "kick_distance",
      "field_goal_attempt",
      "fg_res" = "field_goal_result",
      "extra_point_attempt",
      "pat_res" = "extra_point_result",
      "fixed_drive",
      "score_differential"
    )

  # Field-goal relevant columns
  df_field_goals <- df_fg_or_pat |>
    dplyr::filter(.data$field_goal_attempt == 1) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::mutate(
      temp_made_idx = .data$fg_res == "made",
      temp_miss_idx = .data$fg_res == "missed",
      temp_block_idx = .data$fg_res == "blocked"
    ) |>
    dplyr::summarise(
      games_fg = list(unique(.data$game_id)),
      fg_made = sum(.data$temp_made_idx, na.rm = TRUE),
      fg_att = sum(.data$field_goal_attempt, na.rm = TRUE),
      fg_missed = sum(.data$temp_miss_idx, na.rm = TRUE),
      fg_blocked = sum(.data$temp_block_idx, na.rm = TRUE),
      fg_long = if (any(.data$temp_made_idx, na.rm = TRUE)) {
        max(.data$dist[.data$temp_made_idx], na.rm = TRUE)
      } else {
        NA_real_
      },
      fg_pct = round(.data$fg_made / .data$fg_att, 3L),
      fg_made_0_19 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 0, 19),
        na.rm = TRUE
      ),
      fg_made_20_29 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 20, 29),
        na.rm = TRUE
      ),
      fg_made_30_39 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 30, 39),
        na.rm = TRUE
      ),
      fg_made_40_49 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 40, 49),
        na.rm = TRUE
      ),
      fg_made_50_59 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 50, 59),
        na.rm = TRUE
      ),
      fg_made_60_ = sum(.data$dist[.data$temp_made_idx] >= 60, na.rm = TRUE),
      fg_missed_0_19 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 0, 19),
        na.rm = TRUE
      ),
      fg_missed_20_29 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 20, 29),
        na.rm = TRUE
      ),
      fg_missed_30_39 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 30, 39),
        na.rm = TRUE
      ),
      fg_missed_40_49 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 40, 49),
        na.rm = TRUE
      ),
      fg_missed_50_59 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 50, 59),
        na.rm = TRUE
      ),
      fg_missed_60_ = sum(.data$dist[.data$temp_miss_idx] >= 60, na.rm = TRUE),
      fg_made_list = paste(
        stats::na.omit(.data$dist[.data$temp_made_idx]),
        collapse = ";"
      ),
      fg_missed_list = paste(
        stats::na.omit(.data$dist[.data$temp_miss_idx]),
        collapse = ";"
      ),
      fg_blocked_list = paste(
        stats::na.omit(.data$dist[.data$temp_block_idx]),
        collapse = ";"
      ),
      fg_made_distance = sum(.data$dist[.data$temp_made_idx], na.rm = TRUE),
      fg_missed_distance = sum(.data$dist[.data$temp_miss_idx], na.rm = TRUE),
      fg_blocked_distance = sum(.data$dist[.data$temp_block_idx], na.rm = TRUE),
      .groups = "drop"
    )

  # Extra points
  df_pat <- df_fg_or_pat |>
    dplyr::filter(.data$extra_point_attempt == 1) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      games_pat = list(unique(.data$game_id)),
      pat_made = sum(.data$pat_res == "good", na.rm = TRUE),
      pat_att = sum(.data$extra_point_attempt, na.rm = TRUE),
      pat_missed = sum(.data$pat_res == "failed", na.rm = TRUE),
      pat_blocked = sum(.data$pat_res == "blocked", na.rm = TRUE),
      pat_pct = round(.data$pat_made / .data$pat_att, 3L),
      .groups = "drop"
    )

  # The game-winning kick distance holds at most one value at the weekly level
  # but can hold multiple values across a season. This is one way to account
  # for that. The downside is that the column name differs between weekly and
  # seasonal output.
  if (weekly) {
    gw_dist_name <- "gwfg_distance"
  } else {
    gw_dist_name <- "gwfg_distance_list"
  }

  # See the note above. I wonder if this should also include field goals that tie
  # the game, but I kept the filter dplyr::between(score_differential, -2, 0) the
  # way that it was previously. If you do include field goals that send the game
  # into OT, then you'll probably need to include both the gwfg_distance AND
  # gwfg_distance_list columns in the weekly data.
  game_winners <- df_fg_or_pat |>
    dplyr::group_by(.data$game_id, .data$team) |>
    dplyr::filter(.data$fixed_drive == max(.data$fixed_drive, na.rm = TRUE)) |>
    dplyr::ungroup() |>
    dplyr::filter(
      .data$field_goal_attempt == 1,
      dplyr::between(.data$score_differential, -2, 0)
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      games_gwfg = list(unique(.data$game_id)),
      gwfg_att = dplyr::n(),
      !!gw_dist_name := if (weekly) {
        .data$dist
      } else {
        paste(stats::na.omit(.data$dist), collapse = ";")
      },
      gwfg_made = sum(.data$fg_res == "made", na.rm = TRUE),
      gwfg_missed = sum(.data$fg_res == "missed", na.rm = TRUE),
      gwfg_blocked = sum(.data$fg_res == "blocked", na.rm = TRUE),
      .groups = "drop"
    )

  # Prepping data to merge-in player names
  df_player_names <- nflreadr::load_players() |>
    dplyr::select(
      "player_id" = "gsis_id",
      "player_display_name" = "display_name",
      "player_name" = "short_name",
      "position",
      "position_group",
      "headshot_url" = "headshot"
    )

  # Joining all the data together and organizing the first few columns.
  full_kicks <- df_field_goals |>
    dplyr::full_join(df_pat, as.character(grp_vars)) |>
    dplyr::full_join(game_winners, as.character(grp_vars)) |>
    dplyr::left_join(df_player_names, "player_id") |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::mutate(
      games = length(unique(unlist(c(
        .data$games_fg,
        .data$games_pat,
        .data$games_gwfg
      ))))
    ) |>
    dplyr::ungroup() |>
    dplyr::select(
      dplyr::any_of(c("season", "week", "season_type")),
      "player_id",
      "team",
      "player_name",
      "player_display_name",
      "games",
      "position",
      "position_group",
      "headshot_url",
      dplyr::everything(),
      -c("games_fg", "games_pat", "games_gwfg")
    ) |>
    # replace "" with NA
    dplyr::mutate_all(~ replace(.x, nchar(.x) == 0 | is.nan(.x), NA)) |>
    # replace NA in attempt columns with 0
    dplyr::mutate_at(
      c("fg_att", "pat_att", "gwfg_att"),
      ~ tidyr::replace_na(.x, 0)
    )

  if (weekly) {
    full_kicks |>
      dplyr::select(-"games") |>
      dplyr::arrange(.data$player_id, .data$season, .data$week)
  } else {
    full_kicks |>
      dplyr::arrange(.data$player_id)
  }
}
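
# Illustrative sketch (not part of the package API) of how the distance
# buckets above partition made field goals. The helper name
# `fg_bucket_counts_sketch` and the sample data in the usage note are
# hypothetical. Base R comparisons stand in for dplyr::between() here to keep
# the sketch dependency-free; both are inclusive on both ends.
fg_bucket_counts_sketch <- function(dist, made) {
  # distances of made kicks only; an NA in `made` is treated as not made
  d <- dist[!is.na(made) & made]
  c(
    fg_made_0_19 = sum(d >= 0 & d <= 19, na.rm = TRUE),
    fg_made_20_29 = sum(d >= 20 & d <= 29, na.rm = TRUE),
    fg_made_30_39 = sum(d >= 30 & d <= 39, na.rm = TRUE),
    fg_made_40_49 = sum(d >= 40 & d <= 49, na.rm = TRUE),
    fg_made_50_59 = sum(d >= 50 & d <= 59, na.rm = TRUE),
    fg_made_60_ = sum(d >= 60, na.rm = TRUE)
  )
}

# Usage: every made kick with a known distance falls into exactly one bucket
# fg_bucket_counts_sketch(
#   dist = c(18, 25, 33, 47, 52, 61, 45),
#   made = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA)
# )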


================================================
FILE: R/build_nflfastR_pbp.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Wrapper around multiple nflfastR functions
# Code Style Guide: styler::tidyverse_style()
################################################################################

# The idea for this wrapper as well as some helper functions and the documentation
# style are heavily borrowed from the r-lib package pkgdown (https://github.com/r-lib/pkgdown/blob/master/R/build.r)

#' Build a Complete nflfastR Data Set
#'
#' @description
#' `build_nflfastR_pbp` is a convenient wrapper around 6 nflfastR functions:
#'
#' \itemize{
#'  \item{[fast_scraper()]}
#'  \item{[clean_pbp()]}
#'  \item{[add_qb_epa()]}
#'  \item{[add_xyac()]}
#'  \item{[add_xpass()]}
#'  \item{[decode_player_ids()]}
#' }
#'
#' Please see either the documentation of each function or
#' [the nflfastR Field Descriptions website](https://nflfastr.com/articles/field_descriptions.html)
#' to learn about the output.
#'
#' @inheritParams fast_scraper
#' @param decode If `TRUE`, the function [decode_player_ids()] will be executed.
#' @param rules If `FALSE`, printing of the header and footer in the console output will be suppressed.
#' @return An nflfastR play-by-play data frame like the one that can be loaded from <https://github.com/nflverse/nflverse-data>.
#' @details To load valid game_ids please use the package function [fast_scraper_schedules()].
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @export
#' @examples
#' \donttest{
#' # Build nflfastR pbp for the 2018 and 2019 Super Bowls
#' try({# to avoid CRAN test problems
#' build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
#' })
#'
#' # It is also possible to directly use the
#' # output of `load_schedules` as input
#' try({# to avoid CRAN test problems
#' nflreadr::load_schedules(2025) |>
#'   dplyr::slice_tail(n = 3) |>
#'   build_nflfastR_pbp()
#' })
#'
#' \dontshow{
#' # Close open connections for R CMD Check
#' future::plan("sequential")
#' }
#' }
build_nflfastR_pbp <- function(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...,
  decode = TRUE,
  rules = TRUE
) {
  if (!is.vector(game_ids) && is.data.frame(game_ids)) {
    game_ids <- game_ids$game_id
  }

  if (!is.vector(game_ids)) {
    cli::cli_abort("Param {.arg game_ids} is not a valid vector!")
  }

  if (isTRUE(decode) && !is_installed("gsisdecoder")) {
    cli::cli_abort(
      "Package {.pkg gsisdecoder} required for decoding. Please install it with {.code install.packages(\"gsisdecoder\")}."
    )
  }

  if (isTRUE(rules)) {
    rule_header("Build nflfastR Play-by-Play Data")
  }

  # nflfastR v6 stopped supporting the 1999 and 2000 seasons because of
  # inconsistent data sources. Data is still available through load_pbp
  # but we will not fix any issues.
  # It's possible to install nflfastR v5.2.0 to parse those seasons.
  # try pak::pak("nflverse/nflfastR@v5.2.0")
  game_ids <- check_for_dropped_seasons(game_ids)

  game_count <- ifelse(is.vector(game_ids), length(game_ids), nrow(game_ids))
  builder <- TRUE

  cli::cli_ul("{my_time()} | Start download of {game_count} game{?s}...")

  ret <- fast_scraper(
    game_ids = game_ids,
    dir = dir,
    ...,
    in_builder = builder
  ) |>
    clean_pbp(in_builder = builder) |>
    add_qb_epa(in_builder = builder) |>
    add_xyac(in_builder = builder) |>
    add_xpass(in_builder = builder)

  if (isTRUE(decode)) {
    ret <- decode_player_ids(ret, in_builder = builder)
  }

  if (isTRUE(rules)) {
    rule_footer("DONE")
  }

  make_nflverse_data(ret)
}
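
# Illustrative sketch (not part of the package API) of the input handling at
# the top of build_nflfastR_pbp(): a data frame is reduced to its `game_id`
# column, and anything that is not a plain vector afterwards is rejected.
# The helper name `normalize_game_ids_sketch` is hypothetical, and base
# stop() stands in for cli::cli_abort().
normalize_game_ids_sketch <- function(game_ids) {
  if (is.data.frame(game_ids)) {
    # e.g. the output of a schedule loader that carries a game_id column
    game_ids <- game_ids$game_id
  }
  if (!is.vector(game_ids)) {
    stop("`game_ids` is not a valid vector!", call. = FALSE)
  }
  game_ids
}

# Usage: both calls return the same character vector
# normalize_game_ids_sketch(c("2018_21_NE_LA", "2019_21_SF_KC"))
# normalize_game_ids_sketch(
#   data.frame(game_id = c("2018_21_NE_LA", "2019_21_SF_KC"))
# )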


================================================
FILE: R/build_playstats.R
================================================
build_playstats <- function(
  seasons = nflreadr::most_recent_season(),
  stat_ids = 1:1000,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  skip_local = FALSE
) {
  if (is_sequential()) {
    cli::cli_alert_info(
      "It is recommended to use parallel processing when using this function. \\
        Please consider running {.code future::plan(\"multisession\")}! \\
        Will go on sequentially...",
      wrap = TRUE
    )
  }

  games <- nflreadr::load_schedules(seasons = seasons) |>
    dplyr::filter(!is.na(.data$result)) |>
    dplyr::pull(.data$game_id)

  p <- progressr::progressor(along = games)

  l <- furrr::future_map(
    games,
    function(id, p = NULL, dir, skip_local) {
      if (id %in% c("2000_03_SD_KC", "2000_06_BUF_MIA", "1999_01_BAL_STL")) {
        cli::cli_alert_warning(
          "We are missing raw game data of {.val {id}}. Skipping."
        )
        return(data.frame())
      }
      # convert to integer so the comparison below is numeric, not string-based
      season <- as.integer(substr(id, 1, 4))
      raw_data <- load_raw_game(id, dir = dir, skip_local = skip_local)
      if (season <= 2000) {
        drives <- raw_data[[1]][["drives"]] |>
          purrr::keep(is.list)
        out <- tibble::tibble(d = drives) |>
          tidyr::unnest_wider("d") |>
          tidyr::unnest_longer("plays") |>
          tidyr::unnest_wider("plays", names_sep = "_") |>
          dplyr::select("playId" = "plays_id", "playStats" = "plays_players") |>
          dplyr::mutate(
            playId = uniquify_ids(.data$playId)
          ) |>
          tidyr::unnest_longer("playStats") |>
          tidyr::unnest_longer("playStats") |>
          tidyr::unnest_wider("playStats") |>
          dplyr::mutate(
            playId = as.integer(.data$playId),
            statId = as.integer(.data$statId),
            yards = as.integer(.data$yards),
            team.id = NA_character_
          ) |>
          dplyr::select(-"sequence") |>
          dplyr::rename(
            team.abbreviation = "clubcode",
            gsis.Player.id = "playStats_id"
          ) |>
          tidyr::nest(
            playStats = c(
              "statId",
              "yards",
              "playerName",
              "team.id",
              "team.abbreviation",
              "gsis.Player.id"
            )
          )
      } else {
        out <- raw_data$data$viewer$gameDetail$plays[, c("playId", "playStats")]
      }
      out$game_id <- as.character(id)
      p(sprintf("ID=%s", as.character(id)))
      out
    },
    p = p,
    dir = dir,
    skip_local = skip_local
  )

  out <- data.table::rbindlist(l) |>
    tidyr::unnest(cols = c("playStats")) |>
    janitor::clean_names() |>
    dplyr::filter(.data$stat_id %in% stat_ids) |>
    dplyr::mutate(
      season = as.integer(substr(.data$game_id, 1, 4)),
      week = as.integer(substr(.data$game_id, 6, 7))
    ) |>
    decode_player_ids() |>
    dplyr::select(
      "game_id",
      "season",
      "week",
      "play_id",
      "stat_id",
      "yards",
      "team_abbr" = "team_abbreviation",
      "player_name",
      "gsis_player_id"
    ) |>
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    )
  out
}
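
# Illustrative sketch (not part of the package API) of how season and week are
# derived from a game_id above. It assumes the "<season>_<week>_<away>_<home>"
# layout with a 4-digit season and a zero-padded 2-digit week, as in the ids
# used in this file. The helper name `parse_game_id_sketch` is hypothetical.
parse_game_id_sketch <- function(game_id) {
  data.frame(
    game_id = game_id,
    # characters 1-4 hold the season, characters 6-7 the week
    season = as.integer(substr(game_id, 1, 4)),
    week = as.integer(substr(game_id, 6, 7)),
    stringsAsFactors = FALSE
  )
}

# Usage:
# parse_game_id_sketch(c("2019_21_SF_KC", "2000_03_SD_KC"))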


================================================
FILE: R/calculate_series_conversion_rates.R
================================================
#' Compute Series Conversion Information from Play by Play
#'
#' @description A "Series" begins on a 1st and 10 and each team attempts to either earn
#'   a new 1st down (on offense) or prevent the offense from converting a new
#'   1st down (on defense). Series conversion rate represents how many series
#'   have been either converted to a new 1st down or ended in a touchdown.
#'   This function computes series conversion rates on offense and defense from
#'   nflverse play-by-play data along with other series results.
#'   The function automatically removes series that ended in a QB kneel down.
#'
#' @param pbp Play-by-play data as returned by [`load_pbp()`], [`build_nflfastR_pbp()`], or
#'   [`fast_scraper()`].
#' @param weekly If `TRUE`, returns week-by-week stats; otherwise, returns
#'   season-by-season stats for the data in argument `pbp`.
#'
#' @return A data frame of series information including the following columns:
#' \describe{
#' \item{season}{The NFL season}
#' \item{team}{NFL team abbreviation}
#' \item{week}{Week if `weekly` is `TRUE`}
#' \item{off_n}{The number of series the offense played (excludes QB kneel
#' downs, kickoffs, extra point/two point conversion attempts, non-plays, and
#' plays that do not list a "posteam")}
#' \item{off_scr}{The rate at which a series ended in either new 1st down or
#' touchdown while the offense was on the field}
#' \item{off_scr_1st}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 1st down}
#' \item{off_scr_2nd}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 2nd down}
#' \item{off_scr_3rd}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 3rd down}
#' \item{off_scr_4th}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 4th down}
#' \item{off_1st}{The rate of series that ended in a new 1st down while the
#' offense was on the field (does not include offensive touchdown)}
#' \item{off_td}{The rate of series that ended in an offensive touchdown while the
#' offense was on the field}
#' \item{off_fg}{The rate of series that ended in a field goal attempt while the
#' offense was on the field}
#' \item{off_punt}{The rate of series that ended in a punt while the
#' offense was on the field}
#' \item{off_to}{The rate of series that ended in a turnover (including on downs), in an
#' opponent score, or at the end of half (or game) while the
#' offense was on the field}
#' \item{def_n}{The number of series the defense played (excludes QB kneel
#' downs, kickoffs, extra point/two point conversion attempts, non-plays, and
#' plays that do not list a "posteam")}
#' \item{def_scr}{The rate at which a series ended in either new 1st down or
#' touchdown while the defense was on the field}
#' \item{def_scr_1st}{The rate at which a defense allowed a
#' 1st down or touchdown on 1st down}
#' \item{def_scr_2nd}{The rate at which a defense allowed a
#' 1st down or touchdown on 2nd down}
#' \item{def_scr_3rd}{The rate at which a defense allowed a
#' 1st down or touchdown on 3rd down}
#' \item{def_scr_4th}{The rate at which a defense allowed a
#' 1st down or touchdown on 4th down}
#' \item{def_1st}{The rate of series that ended in a new 1st down while the
#' defense was on the field (does not include offensive touchdown)}
#' \item{def_td}{The rate of series that ended in an offensive touchdown while the
#' defense was on the field}
#' \item{def_fg}{The rate of series that ended in a field goal attempt while the
#' defense was on the field}
#' \item{def_punt}{The rate of series that ended in a punt while the
#' defense was on the field}
#' \item{def_to}{The rate of series that ended in a turnover (including on downs), in an
#' opponent score, or at the end of half (or game) while the
#' defense was on the field}
#' }
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   pbp <- nflfastR::load_pbp(2021)
#'
#'   weekly <- calculate_series_conversion_rates(pbp, weekly = TRUE)
#'   dplyr::glimpse(weekly)
#'
#'   overall <- calculate_series_conversion_rates(pbp, weekly = FALSE)
#'   dplyr::glimpse(overall)
#' })
#' }
calculate_series_conversion_rates <- function(pbp, weekly = FALSE) {
  if (isTRUE(weekly)) {
    grp <- c("season", "team", "week")
  } else if (isFALSE(weekly)) {
    grp <- c("season", "team")
  }
  grp_vars <- lapply(grp, as.symbol)

  # Offense -----------------------------------------------------------------

  off_series <- pbp |>
    dplyr::filter(
      !is.na(.data$down),
      .data$series_result != "QB kneel"
      # .data$rush == 1 | .data$pass == 1
    ) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      team = .data$posteam,
      .data$series
    ) |>
    dplyr::summarise(
      conversion = dplyr::first(.data$series_success),
      result = dplyr::first(.data$series_result),
      last_down = dplyr::last(.data$down),
      .groups = "drop"
    )

  offense <- off_series |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      off_n = dplyr::n(),
      off_scr = mean(.data$conversion),
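      # `*` binds tighter than `==`, so each line below compares last_down with
      # k * conversion: TRUE only for converted series that ended on down k
      # (assumes conversion is coded 0/1; k * 0 = 0 never matches a down)
      off_scr_1st = mean(.data$last_down == 1 * .data$conversion),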
      off_scr_2nd = mean(.data$last_down == 2 * .data$conversion),
      off_scr_3rd = mean(.data$last_down == 3 * .data$conversion),
      off_scr_4th = mean(.data$last_down == 4 * .data$conversion),
      off_1st = mean(.data$result == "First down"),
      off_td = mean(.data$result == "Touchdown"),
      off_fg = mean(.data$result %in% c("Field goal", "Missed field goal")),
      off_punt = mean(.data$result == "Punt"),
      off_to = mean(
        .data$result %in%
          c(
            "Turnover on downs",
            "Turnover",
            "Opp touchdown",
            "Safety",
            "End of half"
          )
      ),
      .groups = "drop"
    )

  # Defense -----------------------------------------------------------------

  def_series <- pbp |>
    dplyr::filter(
      !is.na(.data$down),
      .data$series_result != "QB kneel"
      # .data$rush == 1 | .data$pass == 1
    ) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      team = .data$defteam,
      .data$series
    ) |>
    dplyr::summarise(
      conversion = dplyr::first(.data$series_success),
      result = dplyr::first(.data$series_result),
      last_down = dplyr::last(.data$down),
      .groups = "drop"
    )

  defense <- def_series |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      def_n = dplyr::n(),
      def_scr = mean(.data$conversion),
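      # `*` binds tighter than `==`: last_down is compared with k * conversion,
      # which is TRUE only for converted series that ended on down k
      def_scr_1st = mean(.data$last_down == 1 * .data$conversion),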
      def_scr_2nd = mean(.data$last_down == 2 * .data$conversion),
      def_scr_3rd = mean(.data$last_down == 3 * .data$conversion),
      def_scr_4th = mean(.data$last_down == 4 * .data$conversion),
      def_1st = mean(.data$result == "First down"),
      def_td = mean(.data$result == "Touchdown"),
      def_fg = mean(.data$result %in% c("Field goal", "Missed field goal")),
      def_punt = mean(.data$result == "Punt"),
      def_to = mean(
        .data$result %in%
          c(
            "Turnover on downs",
            "Turnover",
            "Opp touchdown",
            "Safety",
            "End of half"
          )
      ),
      .groups = "drop"
    )

  # Offense + Defense -------------------------------------------------------

  combined <- dplyr::full_join(offense, defense, by = grp)

  combined
}
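
# Illustrative sketch (not part of the package API) of the per-down rates
# computed above. The helper name `scr_by_down_sketch` and the toy vectors in
# the usage note are hypothetical, and `conversion` is assumed to be coded 0/1.
# Because `*` is evaluated before `==`, `last_down == k * conversion` is TRUE
# only when a series converted (conversion == 1) and its last play was on
# down k.
scr_by_down_sketch <- function(last_down, conversion, k) {
  # same expression shape as in the summarise() calls above
  mean(last_down == k * conversion)
}

# Usage: five series, three of them converted (on downs 1, 3 and 2)
# last_down  <- c(1, 3, 3, 4, 2)
# conversion <- c(1, 1, 0, 0, 1)
# scr_by_down_sketch(last_down, conversion, k = 3)
# gives the same value as mean(last_down == 3 & conversion == 1)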


================================================
FILE: R/calculate_standings.R
================================================
#' Compute Division Standings and Conference Seeds from Play by Play
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated and replaced by [nflseedR::nfl_standings()].
#'
#' This function calculates division standings as well as playoff
#'   seeds per conference based on either nflverse play-by-play data or nflverse
#'   schedule data.
#'
#' @param nflverse_object Data object of class `nflverse_data`. Either schedules
#'   as returned by [`fast_scraper_schedules()`] or [`nflreadr::load_schedules()`],
#'   or play-by-play data as returned by [`load_pbp()`], [`build_nflfastR_pbp()`], or
#'   [`fast_scraper()`].
#' @param playoff_seeds Number of playoff teams per conference. If `NULL` (the
#'   default), the function will try to split `nflverse_object` into seasons prior
#'   to 2020 (6 seeds) and seasons from 2020 onward (7 seeds). If set to a numeric,
#'   it will be used for all seasons in `nflverse_object`!
#' @inheritParams nflseedR::compute_conference_seeds
#'
#' @keywords internal
#' @return A tibble with NFL regular season standings
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   # load nflverse data both schedules and pbp
#'   # scheds <- fast_scraper_schedules(2014)
#'   # pbp <- load_pbp(c(2018, 2021))
#'
#'   # calculate standings based on pbp
#'   # calculate_standings(pbp)
#'
#'   # calculate standings based on schedules
#'   # calculate_standings(scheds)
#' })
#' }
calculate_standings <- function(
  nflverse_object,
  tiebreaker_depth = 3,
  playoff_seeds = NULL
) {
  lifecycle::deprecate_warn(
    "5.1.0",
    "calculate_standings()",
    "nflseedR::nfl_standings()"
  )

  if (!inherits(nflverse_object, "nflverse_data")) {
    cli::cli_abort(
      "The function argument {.arg nflverse_object} has to be
                   of class {.cls nflverse_data}"
    )
  }

  rlang::check_installed(
    "nflseedR",
    "to compute standings.",
    compare = ">=",
    version = "1.0.2"
  )

  type <- attr(nflverse_object, "nflverse_type")

  if (type == "play by play data") {
    .standings_from_pbp(
      nflverse_object,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  } else if (type == "games and schedules") {
    .standings_from_games(
      nflverse_object,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  } else {
    cli::cli_abort(
      "Can only handle nflverse_type {.val play by play data} or
                   {.val games and schedules} and not {.val {type}}"
    )
  }
}

.standings_from_pbp <- function(pbp, tiebreaker_depth, playoff_seeds) {
  g <- pbp |>
    dplyr::filter(.data$season_type == "REG") |>
    dplyr::group_by(.data$game_id) |>
    dplyr::summarise(
      sim = dplyr::first(.data$season),
      game_type = dplyr::first(.data$season_type),
      week = dplyr::first(.data$week),
      away_team = dplyr::first(.data$away_team),
      home_team = dplyr::first(.data$home_team),
      result = dplyr::last(.data$home_score) - dplyr::last(.data$away_score)
    ) |>
    dplyr::ungroup() |>
    dplyr::select(-"game_id")

  if (is.null(playoff_seeds)) {
    g6 <- g |>
      dplyr::filter(.data$sim %in% 1999:2019)
    g7 <- g |>
      dplyr::filter(.data$sim >= 2020)
    dplyr::bind_rows(
      .compute_standings(
        g6,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 6
      ),
      .compute_standings(
        g7,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 7
      )
    )
  } else {
    .compute_standings(
      g,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  }
}

.standings_from_games <- function(games, tiebreaker_depth, playoff_seeds) {
  g <- games |>
    dplyr::filter(.data$game_type == "REG", !is.na(.data$result)) |>
    dplyr::select(
      "sim" = "season",
      "game_type",
      "week",
      "away_team",
      "home_team",
      "result"
    )

  if (is.null(playoff_seeds)) {
    g6 <- g |>
      dplyr::filter(.data$sim %in% 1999:2019)
    g7 <- g |>
      dplyr::filter(.data$sim >= 2020)
    dplyr::bind_rows(
      .compute_standings(
        g6,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 6
      ),
      .compute_standings(
        g7,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 7
      )
    )
  } else {
    .compute_standings(
      g,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  }
}

.compute_standings <- function(games, tiebreaker_depth, playoff_seeds) {
  if (nrow(games) == 0) {
    return(data.frame())
  }
  suppressMessages({
    div <- nflseedR::compute_division_ranks(
      games,
      tiebreaker_depth = tiebreaker_depth
    )
    conf <- nflseedR::compute_conference_seeds(
      div,
      h2h = div$h2h,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  })
  conf$standings |>
    dplyr::select(-"exit", -"wins") |>
    dplyr::select("sim":"division", "div_rank", "seed", dplyr::everything()) |>
    dplyr::rename("season" = "sim", "wins" = "true_wins") |>
    dplyr::arrange(.data$season, .data$division, .data$div_rank, .data$seed) |>
    tibble::as_tibble()
}


================================================
FILE: R/calculate_stats.R
================================================
################################################################################
# Author: Sebastian Carl
################################################################################

#' Calculate NFL Stats
#'
#' Compute various NFL stats based on nflverse Play-by-Play data.
#'
#' @param seasons A numeric vector of 4-digit years associated with given NFL
#'  seasons - defaults to latest season. If set to TRUE, returns all available
#'  data since 1999. Ignored if argument `pbp` is not `NULL`.
#' @param summary_level Summarize stats by `"season"` or `"week"`.
#' @param stat_type Calculate `"player"` level stats or `"team"` level stats.
#' @param season_type One of `"REG"`, `"POST"`, or `"REG+POST"`. Filters
#'  data to regular season ("REG"), post season ("POST") or keeps all data.
#'  Only applied if `summary_level` == `"season"`.
#' @param pbp This argument allows passing a subset of nflverse play-by-play
#'  data, created with [build_nflfastR_pbp()] or loaded with [load_pbp()].
#'  Stats are then calculated based on the `game_id`s and `play_id`s in this
#'  subset of play-by-play data, rather than using the seasons specified in the
#'  `seasons` argument. The function will error if required variables are
#'  missing from the subset, but lists which variables are missing.
#'  If `pbp = NULL` (the default), all available games and plays from the
#'  `seasons` argument are used to calculate stats.
#'  Please use this responsibly, because the output is structurally identical
#'  to full seasons, even if plays have been filtered out. It may then appear
#'  as if the stats are incorrect. If `pbp` is not `NULL`, the function will add
#'  the attribute `"custom_pbp" = TRUE` to the function output to help identify
#'  stats that are possibly based on play-by-play subsets.
#'
#' @return A tibble of player/team stats summarized by season/week.
#' @seealso [nfl_stats_variables] for a description of all variables.
#' @seealso <https://nflfastr.com/articles/stats_variables.html> for a searchable
#' table of the stats variable descriptions.
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' stats <- calculate_stats(2023, "season", "player")
#' dplyr::glimpse(stats)
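#'
#' # Sketch: stats from a custom play-by-play subset (here: weeks 1-4 only).
#' # The week filter is just an illustration; any subset that keeps the
#' # required variables should work.
#' pbp_subset <- dplyr::filter(nflreadr::load_pbp(2023), week <= 4)
#' stats_subset <- calculate_stats(
#'   summary_level = "week",
#'   stat_type = "team",
#'   pbp = pbp_subset
#' )
#' # the output is flagged as being based on custom play-by-play data
#' attr(stats_subset, "custom_pbp")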
#' })
#' }
calculate_stats <- function(
  seasons = nflreadr::most_recent_season(),
  summary_level = c("season", "week"),
  stat_type = c("player", "team"),
  season_type = c("REG", "POST", "REG+POST"),
  pbp = NULL
) {
  summary_level <- rlang::arg_match(summary_level)
  stat_type <- rlang::arg_match(stat_type)
  season_type <- rlang::arg_match(season_type)
  custom_pbp <- !is.null(pbp)

  if (!custom_pbp) {
    pbp <- nflreadr::load_pbp(seasons = seasons)
  }

  # make sure (custom) pbp includes all required variables.
  # stats_validate_pbp will return all unique seasons in pbp.
  # We'll use this to download playstats for all seasons listed in pbp.
  seasons_in_pbp <- stats_validate_pbp(pbp)

  # we don't want groups to mess up something or slow us down.
  # this is only relevant if a user supplies grouped pbp data
  pbp <- dplyr::ungroup(pbp)

  if (season_type %in% c("REG", "POST") && summary_level == "season") {
    pbp <- dplyr::filter(pbp, .data$season_type == .env$season_type)
    if (nrow(pbp) == 0) {
      cli::cli_alert_warning(
        "Filtering {.val {seasons}} data to {.arg season_type} == \\
        {.val {season_type}} resulted in 0 rows. Returning empty tibble."
      )
      return(tibble::tibble())
    }
  }

  # defensive stats require knowledge of which team is on defense
  # special teams stats require knowledge of which plays were special teams plays
  playinfo <- pbp |>
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::summarise(
      off = .data$posteam,
      def = .data$defteam,
      special = as.integer(.data$special == 1)
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate_at(
      .vars = dplyr::vars("off", "def"),
      .funs = team_name_fn
    )

  season_type_from_pbp <- pbp |>
    dplyr::select("game_id", "season_type") |>
    dplyr::distinct()
  s_type_vctr <- season_type_from_pbp$season_type |>
    rlang::set_names(season_type_from_pbp$game_id)

  gwfg_attempts_from_pbp <- pbp |>
    dplyr::mutate(
      # final_posteam_score = data.table::fifelse(.data$posteam_type == "home", .data$home_score, .data$away_score),
      final_defteam_score = data.table::fifelse(
        .data$posteam_type == "home",
        .data$away_score,
        .data$home_score
      ),
      identifier = paste(.data$game_id, .data$play_id, sep = "_")
    ) |>
    dplyr::group_by(.data$game_id, .data$posteam) |>
    dplyr::mutate(
      # A game winning field goal attempt is
      # - a field goal attempt,
      # - in the posteam's final drive,
      # - where the posteam was tied with or trailed the defteam by 2 points or fewer prior to the kick,
      # - and the defteam did not score afterwards
      is_gwfg_attempt = dplyr::case_when(
        .data$field_goal_attempt == 1 &
          .data$fixed_drive == max(.data$fixed_drive) &
          dplyr::between(.data$score_differential, -2, 0) &
          .data$defteam_score == .data$final_defteam_score ~ 1L,
        TRUE ~ 0L
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::filter(
      is_gwfg_attempt == 1L
    ) |>
    dplyr::select("identifier", "is_gwfg_attempt")
  gwfg_vctr <- gwfg_attempts_from_pbp$is_gwfg_attempt |>
    rlang::set_names(gwfg_attempts_from_pbp$identifier)

  # load_playstats defined below
  # more_stats = all stat IDs of one player in a single play
  # team_stats = all stat IDs of one team in a single play
  # all_stats = all stat IDs of a play, regardless of team (we need this for punting)
  # we need those to identify things like fumbles depending on playtype or
  # first downs depending on playtype
  playstats <- load_playstats(seasons = seasons_in_pbp) |>
    # apply filtering on play stats so that it matches only plays included
    # in pbp in case it was provided manually
    dplyr::semi_join(pbp, by = c("game_id", "play_id")) |>
    dplyr::rename("player_id" = "gsis_player_id", "team" = "team_abbr") |>
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$player_id) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      more_stats = paste0(";", paste(stat_id, collapse = ";"), ";")
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$team) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      team_stats = paste0(";", paste(stat_id, collapse = ";"), ";"),
      team_play_air_yards = sum((stat_id %in% 111:112) * yards)
    ) |>
    # need to group by game and play here to avoid mixing of play IDs of different
    # games in the same week
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      all_stats = paste0(";", paste(stat_id, collapse = ";"), ";"),
      play_punt_return_yards = sum((stat_id %in% 33:36) * yards)
    ) |>
    # compute team targets and team air yards for calculation of target share
    # and air yard share. Since it's relative, we need to be careful with the groups
    # depending on summary level
    dplyr::group_by(
      !!!rlang::data_syms(
        if (summary_level == "season") {
          c("season", "team")
        } else {
          c("season", "week", "team")
        }
      )
    ) |>
    dplyr::mutate(
      team_targets = sum(stat_id == 115),
      team_air_yards = sum((stat_id %in% 111:112) * yards)
    ) |>
    dplyr::ungroup() |>
    dplyr::left_join(
      playinfo,
      by = c("game_id", "play_id")
    ) |>
    dplyr::mutate(
      season_type = unname(s_type_vctr[.data$game_id]),
      is_gwfg_attempt = unname(gwfg_vctr[paste(
        .data$game_id,
        .data$play_id,
        sep = "_"
      )]) %ifna%
        0L
    )

  # Check combination of summary_level and stat_type to set a helper that is
  # used to create the grouping variables
  grp_id <- data.table::fcase(
    summary_level == "season" && stat_type == "player" , "10" ,
    summary_level == "season" && stat_type == "team"   , "20" ,
    summary_level == "week" && stat_type == "player"   , "30" ,
    summary_level == "week" && stat_type == "team"     , "40"
  )
  # grp_vctr is used as character vector for joining pbp stats
  grp_vctr <- switch(
    grp_id,
    "10" = c("season", "player_id"),
    "20" = c("season", "team"),
    "30" = c("season", "week", "player_id"),
    "40" = c("season", "week", "team")
  )
  # grp_vars is used as grouping variables
  grp_vars <- rlang::data_syms(grp_vctr)

  # Stats from PBP #####################
  # we want passing epa, rushing epa, and receiving epa
  # since these depend on different player id variables and filters,
  # we create separate dfs for these stats
  passing_stats_from_pbp <- pbp |>
    dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "passer_player_id",
      "qb_epa",
      "cpoe"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      passing_epa = sum(.data$qb_epa, na.rm = TRUE),
      # mean(x, na.rm = TRUE) returns NaN if all values are NA, so we return NA explicitly
      passing_cpoe = if (any(!is.na(.data$cpoe))) {
        mean(.data$cpoe, na.rm = TRUE)
      } else {
        NA_real_
      }
    ) |>
    dplyr::ungroup()

  rushing_stats_from_pbp <- pbp |>
    dplyr::filter(.data$play_type %in% c("run", "qb_kneel")) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "rusher_player_id",
      "epa"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      rushing_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  receiving_stats_from_pbp <- pbp |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "receiver_player_id",
      "epa"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      receiving_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  stats <- playstats |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      player_name = if (.env$stat_type == "player") {
        custom_mode(.data$player_name, na.rm = TRUE)
      } else {
        NULL
      },
      # Season Type #####################
      # if summary level is week, then we have to use the season type variable
      # from playstats as it could be REG or POST depending on the value of
      # the argument season_type
      # if summary level is season, then we collapse the values of season_type
      # this will make sure that season_type is only REG+POST if the user asked
      # for it AND if postseason data is available
      season_type = if (.env$summary_level == "week") {
        dplyr::first(.data$season_type)
      } else {
        paste(unique(.data$season_type), collapse = "+")
      },

      # Game ID #####################
      # it's not strictly necessary to output game_id because we have
      # season, week, team, and opponent information but it is convenient
      # to add this here
      # Only makes sense in case of weekly stats of course
      game_id = if (.env$summary_level == "week") {
        dplyr::first(.data$game_id)
      } else {
        NULL
      },

      # Team Info #####################
      # recent_team if we do a season summary of player stats
      # team if we do a week summary of player stats
      recent_team = if (.env$grp_id == "10") dplyr::last(.data$team) else NULL,
      team = if (.env$grp_id == "30") dplyr::first(.data$team) else NULL,
      # opponent team if we do week summaries
      opponent_team = if (.env$summary_level == "week") {
        data.table::fifelse(
          dplyr::first(.data$team) == dplyr::first(.data$off),
          dplyr::first(.data$def),
          dplyr::first(.data$off)
        )
      } else {
        NULL
      },

      # number of games is only relevant if we summarise the season
      games = if (.env$summary_level == "season") {
        dplyr::n_distinct(.data$game_id)
      } else {
        NULL
      },

      # Offense #####################
      completions = sum(stat_id %in% 15:16),
      attempts = sum(stat_id %in% c(14:16, 19)),
      passing_yards = sum((stat_id %in% 15:16) * yards),
      passing_tds = sum(stat_id == 16),
      passing_interceptions = sum(stat_id == 19),
      sacks_suffered = sum(stat_id == 20),
      sack_yards_lost = sum((stat_id == 20) * yards),
      sack_fumbles = sum(stat_id == 20 & has_id(52:54, more_stats)),
      sack_fumbles_lost = sum(stat_id == 20 & has_id(106, more_stats)),
      # includes incompletions (111 = complete, 112 = incomplete)
      passing_air_yards = sum((stat_id %in% 111:112) * yards),
      # passing yac equals passing yards - air yards on completed passes
      passing_yards_after_catch = .data$passing_yards -
        sum((stat_id == 111) * yards),
      passing_first_downs = sum((stat_id %in% 15:16) & has_id(4, team_stats)),
      passing_2pt_conversions = sum(stat_id == 77),
      # this is a player stat and we skip it in team stats
      pacr = if (.env$stat_type == "player") {
        .data$passing_yards / .data$passing_air_yards
      } else {
        NULL
      },
      # "Explosives" (see #550 for discussion about the definition)
      passing_10 = sum((stat_id %in% 15:16) * (yards >= 10)),
      passing_16 = sum((stat_id %in% 15:16) * (yards >= 16)),
      passing_20 = sum((stat_id %in% 15:16) * (yards >= 20)),
      passing_40 = sum((stat_id %in% 15:16) * (yards >= 40)),
      # dakota = requires pbp,

      carries = sum(stat_id %in% 10:11),
      rushing_yards = sum((stat_id %in% 10:13) * yards),
      rushing_tds = sum(stat_id %in% c(11, 13)),
      rushing_fumbles = sum((stat_id %in% 10:11) & has_id(52:54, more_stats)),
      rushing_fumbles_lost = sum(
        (stat_id %in% 10:11) & has_id(106, more_stats)
      ),
      rushing_first_downs = sum((stat_id %in% 10:11) & has_id(3, team_stats)),
      rushing_2pt_conversions = sum(stat_id == 75),
      # "Explosives" (see #550 for discussion about the definition)
      rushing_10 = sum((stat_id %in% 10:13) * (yards >= 10)),
      rushing_12 = sum((stat_id %in% 10:13) * (yards >= 12)),
      rushing_20 = sum((stat_id %in% 10:13) * (yards >= 20)),
      rushing_40 = sum((stat_id %in% 10:13) * (yards >= 40)),

      receptions = sum(stat_id %in% 21:22),
      targets = sum(stat_id == 115),
      receiving_yards = sum((stat_id %in% 21:24) * yards),
      receiving_tds = sum(stat_id %in% c(22, 24)),
      receiving_fumbles = sum((stat_id %in% 21:22) & has_id(52:54, more_stats)),
      receiving_fumbles_lost = sum(
        (stat_id %in% 21:22) & has_id(106, more_stats)
      ),
      # air_yards are counted in 111:112 but it is a passer stat not a receiver stat
      # so we count team air yards when a player accounted for a reception
      # team air yards will always equal the correct air yards as 111 and 112
      # cannot appear more than once per play.
      # If this ever changes, we can use pbp instead.
      receiving_air_yards = if (.env$stat_type == "player") {
        sum((stat_id == 115) * .data$team_play_air_yards)
      } else {
        .data$passing_air_yards
      },
      receiving_yards_after_catch = sum((stat_id == 113) * yards),
      receiving_first_downs = sum((stat_id %in% 21:22) & has_id(4, team_stats)),
      receiving_2pt_conversions = sum(stat_id == 104),
      # "Explosives" (see #550 for discussion about the definition)
      receiving_10 = sum((stat_id %in% 21:24) * (yards >= 10)),
      receiving_16 = sum((stat_id %in% 21:24) * (yards >= 16)),
      receiving_20 = sum((stat_id %in% 21:24) * (yards >= 20)),
      receiving_40 = sum((stat_id %in% 21:24) * (yards >= 40)),
      # these are player stats and we skip them in team stats
      racr = if (.env$stat_type == "player") {
        .data$receiving_yards / .data$receiving_air_yards
      } else {
        NULL
      },
      target_share = if (.env$stat_type == "player") {
        .data$targets / dplyr::first(.data$team_targets)
      } else {
        NULL
      },
      air_yards_share = if (.env$stat_type == "player") {
        .data$receiving_air_yards / dplyr::first(.data$team_air_yards)
      } else {
        NULL
      },
      wopr = if (.env$stat_type == "player") {
        1.5 * .data$target_share + 0.7 * .data$air_yards_share
      } else {
        NULL
      },

      special_teams_tds = sum((special == 1) & stat_id %in% td_ids()),

      # Defense #####################
      # def_tackles = ,
      def_tackles_solo = sum(stat_id == 79),
      def_tackles_with_assist = sum(stat_id == 80),
      def_tackle_assists = sum(stat_id == 82),
      def_tackles_for_loss = sum(stat_id == 402),
      def_tackles_for_loss_yards = sum((stat_id == 402) * yards),
      def_fumbles_forced = sum(stat_id == 91),
      def_sacks = sum(stat_id == 83) + 1 / 2 * sum(stat_id == 84),
      def_sack_yards = sum((stat_id == 83) * -yards) +
        1 / 2 * sum((stat_id == 84) * -yards),
      def_qb_hits = sum(stat_id == 110),
      def_interceptions = sum(stat_id %in% 25:26),
      def_interception_yards = sum((stat_id %in% 25:28) * yards),
      def_pass_defended = sum(stat_id == 85),
      def_tds = sum(team == def & special != 1 & stat_id %in% td_ids()),
      # stat ID 54 is a fumble out of bounds. It's never counted alone,
      # always in combination with 52 or 53.
      def_fumbles = sum((team == def) & stat_id %in% 52:53),
      def_safeties = sum(stat_id == 89),

      # Misc #####################
      # mostly yards gained after blocked punts or fgs
      misc_yards = sum((stat_id %in% 63:64) * yards),
      fumble_recovery_own = sum(stat_id %in% 55:56),
      # 57, 58 don't count as recovery because player received a
      # lateral after recovery by other player
      fumble_recovery_yards_own = sum((stat_id %in% 55:58) * yards),
      fumble_recovery_opp = sum(stat_id %in% 59:60),
      # 61, 62 don't count as recovery because player received a
      # lateral after recovery by other player
      fumble_recovery_yards_opp = sum((stat_id %in% 59:62) * yards),
      fumble_recovery_tds = sum(stat_id %in% c(56, 58, 60, 62)),
      penalties = sum(stat_id == 93),
      penalty_yards = sum((stat_id == 93) * yards),
      timeouts = if (.env$stat_type == "team") sum(stat_id == 68) else NULL,
      # we are missing some fumbles on offense (see 515) so we just add
      # totals here. These fumble stats count all fumbles regardless of
      # the unit the player was on. This means that all above fumble stats
      # are included here but we make sure not to lose any fumbles, esp. on offense
      fumbles_forced_by_opp = sum(stat_id == 52),
      fumbles_not_forced = sum(stat_id == 53),
      fumbles_out_of_bounds = sum(stat_id == 54),
      # we could tell users to just add the above three stats but fumbles are
      # a bit confusing overall so it is ok to add a total counter that doesn't
      # miss any fumbles.
      # stat ID 54 is a fumble out of bounds. It's never counted alone,
      # always in combination with 52 or 53. So we cannot add it to the total.
      fumbles_total = sum(stat_id %in% 52:53),
      fumbles_lost_total = sum(stat_id == 106),

      # Returning #####################
      punt_returns = sum(stat_id %in% 33:34),
      punt_return_yards = sum((stat_id %in% 33:36) * yards),
      # punt return tds are counted in special teams tds atm
      # punt_return_tds = sum(stat_id %in% c(34, 36)),
      kickoff_returns = sum(stat_id %in% 45:46),
      kickoff_return_yards = sum((stat_id %in% 45:48) * yards),
      # kickoff return tds are counted in special teams tds atm
      # kickoff_return_tds = sum(stat_id %in% c(46, 48)),

      # Kicking #####################
      fg_made = sum(stat_id == 70),
      fg_att = sum(stat_id %in% 69:71),
      fg_missed = sum(stat_id == 69),
      fg_blocked = sum(stat_id == 71),
      fg_long = max((stat_id == 70) * yards) %0% NA_integer_,
      # avoid 0/0 = NaN
      fg_pct = if (.data$fg_att > 0) .data$fg_made / .data$fg_att else NA_real_,
      fg_made_0_19 = sum((stat_id == 70) * (yards %between% c(0, 19))),
      fg_made_20_29 = sum((stat_id == 70) * (yards %between% c(20, 29))),
      fg_made_30_39 = sum((stat_id == 70) * (yards %between% c(30, 39))),
      fg_made_40_49 = sum((stat_id == 70) * (yards %between% c(40, 49))),
      fg_made_50_59 = sum((stat_id == 70) * (yards %between% c(50, 59))),
      fg_made_60_ = sum((stat_id == 70) * (yards >= 60)),
      fg_missed_0_19 = sum((stat_id == 69) * (yards %between% c(0, 19))),
      fg_missed_20_29 = sum((stat_id == 69) * (yards %between% c(20, 29))),
      fg_missed_30_39 = sum((stat_id == 69) * (yards %between% c(30, 39))),
      fg_missed_40_49 = sum((stat_id == 69) * (yards %between% c(40, 49))),
      fg_missed_50_59 = sum((stat_id == 69) * (yards %between% c(50, 59))),
      fg_missed_60_ = sum((stat_id == 69) * (yards >= 60)),
      fg_made_list = fg_list(stat_id, yards, collapse_id = 70),
      fg_missed_list = fg_list(stat_id, yards, collapse_id = 69),
      fg_blocked_list = fg_list(stat_id, yards, collapse_id = 71),
      fg_made_distance = sum((stat_id == 70) * yards),
      fg_missed_distance = sum((stat_id == 69) * yards),
      fg_blocked_distance = sum((stat_id == 71) * yards),
      pat_made = sum(stat_id == 72),
      pat_att = sum(stat_id %in% 72:74),
      pat_missed = sum(stat_id == 73),
      pat_blocked = sum(stat_id == 74),
      # avoid 0/0 = NaN
      pat_pct = if (.data$pat_att > 0) {
        .data$pat_made / .data$pat_att
      } else {
        NA_real_
      },
      gwfg_made = sum((stat_id == 70) * is_gwfg_attempt),
      gwfg_att = sum((stat_id %in% 69:71) * is_gwfg_attempt),
      gwfg_missed = sum((stat_id == 69) * is_gwfg_attempt),
      gwfg_blocked = sum((stat_id == 71) * is_gwfg_attempt),
      gwfg_distance = if (.env$summary_level == "week") {
        sum((stat_id %in% 69:71) * is_gwfg_attempt * yards)
      } else {
        NULL
      },
      gwfg_distance_list = if (.env$summary_level == "season") {
        fg_list(stat_id, yards, collapse_id = 69:71, gwfg = is_gwfg_attempt)
      } else {
        NULL
      },

      # Punts #####################
      # stat ID 2 counts blocked punts that do not count as punt
      pt_att = sum(stat_id %in% c(29, 31, 32)), # 31 probably unnecessary
      pt_blocked = sum(stat_id == 2),
      pt_long = max(stat_id %in% c(29, 32) * yards) %0% NA_integer_,
      pt_yards = sum(stat_id %in% c(29, 32) * yards),
      pt_inside_20 = sum(stat_id == 30),
      # the following stats are a bit special as we need opponent team stats
      # to get the counts right. That's what 'all_stats' is for
      # stat IDs 37, 38, 39 (punts oob, downed, fair caught) are assigned to
      # the receiving team (or to the receiver in case of 39).
      # The same applies to the number of returns, return TDs, and return yardage
      pt_out_of_bounds = sum((stat_id == 29) & has_id(37, all_stats)),
      pt_downed = sum((stat_id == 29) & has_id(38, all_stats)),
      pt_touchback = sum(stat_id == 32),
      pt_fair_caught = sum((stat_id == 29) & has_id(39, all_stats)),
      pt_returned = sum(
        (stat_id %in% c(2, 29, 31, 32)) & has_id(33:34, all_stats)
      ),
      pt_return_yards = sum(
        (stat_id %in% c(2, 29, 31, 32)) * .data$play_punt_return_yards
      ),
      pt_return_tds = sum(
        (stat_id %in% c(2, 29, 31, 32)) & has_id(c(34, 36), all_stats)
      ),
      pt_net_yards = .data$pt_yards -
        .data$pt_return_yards -
        .data$pt_touchback * 20L
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    ) |>
    # Join PBP Stats #####################
    dplyr::left_join(passing_stats_from_pbp, by = grp_vctr) |>
    dplyr::left_join(rushing_stats_from_pbp, by = grp_vctr) |>
    dplyr::left_join(receiving_stats_from_pbp, by = grp_vctr) |>
    # relocate epa variables. This could be done with dplyr::relocate
    # but we want to be compatible with older dplyr versions
    dplyr::select(
      "season":"passing_first_downs",
      "passing_epa",
      "passing_cpoe",
      "passing_2pt_conversions":"rushing_first_downs",
      "rushing_epa",
      "rushing_2pt_conversions":"receiving_first_downs",
      "receiving_epa",
      dplyr::everything()
    ) |>
    dplyr::arrange(!!!grp_vars)

  # Apply Player Modifications #####################
  if (stat_type == "player") {
    # need newer version of nflreadr to use load_players
    rlang::check_installed("nflreadr (>= 1.3.0)", "to join player information.")

    player_info <- nflreadr::load_players() |>
      dplyr::select(
        "player_id" = "gsis_id",
        "player_display_name" = "display_name",
        # "player_name" = "short_name",
        "position",
        "position_group",
        "headshot_url" = "headshot"
      )

    # load gsis_ids of RBs, FBs and HBs for RACR
    racr_ids <- player_info |>
      dplyr::filter(.data$position %in% c("RB", "FB", "HB")) |>
      dplyr::pull("player_id")

    stats <- stats |>
      dplyr::mutate(
        pacr = dplyr::case_when(
          is.nan(.data$pacr) ~ NA_real_,
          .data$passing_air_yards <= 0 ~ 0,
          TRUE ~ .data$pacr
        ),
        racr = dplyr::case_when(
          is.nan(.data$racr) ~ NA_real_,
          .data$receiving_air_yards == 0 ~ 0,
          # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (incl. FBs and HBs)
          # and is set to 0 for receivers. "racr_ids" includes all known RB, FB, and HB gsis_ids
          .data$receiving_air_yards < 0 & !.data$player_id %in% racr_ids ~ 0,
          TRUE ~ .data$racr
        ),
        # Fantasy #####################
        fantasy_points = 1 /
          25 *
          .data$passing_yards +
          4 * .data$passing_tds +
          -2 * .data$passing_interceptions +
          1 / 10 * (.data$rushing_yards + .data$receiving_yards) +
          6 *
            (.data$rushing_tds +
              .data$receiving_tds +
              .data$special_teams_tds) +
          2 *
            (.data$passing_2pt_conversions +
              .data$rushing_2pt_conversions +
              .data$receiving_2pt_conversions) +
          -2 *
            (.data$sack_fumbles_lost +
              .data$rushing_fumbles_lost +
              .data$receiving_fumbles_lost),

        fantasy_points_ppr = .data$fantasy_points + .data$receptions
      ) |>
      dplyr::left_join(player_info, by = "player_id") |>
      dplyr::select(
        "player_id",
        "player_name",
        "player_display_name",
        "position",
        "position_group",
        "headshot_url",
        dplyr::everything()
      )
  }

  if (custom_pbp) {
    attr(stats, "custom_pbp") <- TRUE
  }

  stats
}
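
# Worked example (illustration only; wrapped in `if (FALSE)` so it never runs
# when the package is built): the fantasy scoring used in calculate_stats()
# applied to a made-up stat line. All values in `demo` are hypothetical.
if (FALSE) {
  demo <- list(
    passing_yards = 250,
    passing_tds = 2,
    passing_interceptions = 1,
    rushing_yards = 30,
    receiving_yards = 0,
    tds = 1, # rushing + receiving + special teams TDs
    two_pt = 0, # passing + rushing + receiving 2pt conversions
    fumbles_lost = 0, # sack + rushing + receiving fumbles lost
    receptions = 0
  )
  pts <- 1 / 25 * demo$passing_yards +
    4 * demo$passing_tds +
    -2 * demo$passing_interceptions +
    1 / 10 * (demo$rushing_yards + demo$receiving_yards) +
    6 * demo$tds +
    2 * demo$two_pt +
    -2 * demo$fumbles_lost
  # 10 + 8 - 2 + 3 + 6 + 0 - 0
  all.equal(pts, 25) # TRUE
  # PPR scoring adds one point per reception
  pts_ppr <- pts + demo$receptions
}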

# Silence global vars NOTE
# We do this differently here because it's only a bunch of variables
# and the code is more readable
utils::globalVariables(c(
  "stat_id",
  "yards",
  "more_stats",
  "team_stats",
  "team",
  "def",
  "off",
  "special",
  "is_gwfg_attempt"
))

load_playstats <- function(seasons = nflreadr::most_recent_season()) {
  if (isTRUE(seasons)) {
    seasons <- seq(1999, nflreadr::most_recent_season())
  }

  stopifnot(
    is.numeric(seasons),
    seasons >= 1999,
    seasons <= nflreadr::most_recent_season()
  )

  urls <- paste0(
    "https://github.com/nflverse/nflverse-pbp/releases/download/playstats/play_stats_",
    seasons,
    ".rds"
  )

  out <- nflreadr::load_from_url(urls, seasons = TRUE, nflverse = FALSE)

  out
}

fg_list <- function(stat_ids, yards, collapse_id, gwfg = NULL) {
  if (is.null(gwfg)) {
    paste(
      yards[stat_ids == collapse_id],
      collapse = ";"
    )
  } else {
    paste(
      yards[stat_ids %in% collapse_id & gwfg == 1L],
      collapse = ";"
    )
  }
}

`%0%` <- function(lhs, rhs) if (lhs != 0) lhs else rhs

`%ifna%` <- function(lhs, rhs) data.table::fifelse(is.na(lhs), rhs, lhs)

has_id <- function(id, all_ids) {
  stringr::str_detect(all_ids, paste0(";", id, ";", collapse = "|"))
}

td_ids <- function() {
  c(
    11,
    13,
    16,
    18,
    22,
    24,
    26,
    28,
    34,
    36,
    46,
    48,
    # 56, 58, 60, 62, # 56-62 are separately counted in fumble_recovery_tds
    64,
    108
  )
}

stats_validate_pbp <- function(pbp) {
  required_names <- c(
    "season",
    "game_id",
    "play_id",
    "posteam",
    "defteam",
    "special",
    "season_type",
    "away_score",
    "home_score",
    "field_goal_attempt",
    "fixed_drive",
    "score_differential",
    "play_type",
    "week",
    "passer_player_id",
    "qb_epa",
    "cpoe",
    "rusher_player_id",
    "epa",
    "receiver_player_id"
  )
  available_names <- names(pbp)
  missing <- required_names[!required_names %in% available_names]
  if (length(missing) > 0) {
    cli::cli_abort(
      "You have passed custom pbp to the argument {.arg pbp} but \\
      it is missing the following required variable{?s}: {.val {missing}}",
      call = rlang::caller_env()
    )
  }
  unique(pbp$season) |>
    stats::na.omit() |>
    as.vector()
}


================================================
FILE: R/data_documentation.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Documenting Data Files
# Code Style Guide: styler::tidyverse_style()
################################################################################

#' NFL Team names, colors and logo urls.
#'
#' @docType data
#' @format A data frame with 36 rows and 15 variables containing NFL team level
#' information, including franchises in multiple cities:
#' \describe{
#'   \item{team_abbr}{Team abbreviation}
#'   \item{team_name}{Complete Team name}
#'   \item{team_id}{Team id used in the roster function}
#'   \item{team_nick}{Nickname}
#'   \item{team_conf}{Conference}
#'   \item{team_division}{Division}
#'   \item{team_color}{Primary color}
#'   \item{team_color2}{Secondary color}
#'   \item{team_color3}{Tertiary color}
#'   \item{team_color4}{Quaternary color}
#'   \item{team_logo_wikipedia}{URL to team logo on Wikipedia}
#'   \item{team_logo_espn}{URL to higher quality logo on ESPN}
#'   \item{team_wordmark}{URL to team wordmarks}
#'   \item{team_conference_logo}{URL to AFC and NFC logos}
#'   \item{team_league_logo}{URL to NFL logo}
#' }
#' The primary and secondary colors have been taken from nfl.com with some modifications
#' for better team distinction and most recent team color themes.
#' The tertiary and quaternary colors are taken from Lee Sharpe's teamcolors.csv,
#' who has taken them from the `teamcolors` package created by Ben Baumer and
#' Gregory Matthews. The Wikipedia logo URLs are taken from Lee Sharpe's logos.csv.
#' Team wordmarks are taken from nfl.com.
#' @examples
#' \donttest{
#' teams_colors_logos
#' }
"teams_colors_logos"

#' nflfastR Field Descriptions
#'
#' @docType data
#' @format A data frame including names and descriptions of all variables in
#' an nflfastR dataset.
#' @seealso The searchable table on the
#' [nflfastR website](https://nflfastr.com/articles/field_descriptions.html)
#' @examples
#' \donttest{
#' field_descriptions
#' }
"field_descriptions"

#' NFL Stat IDs and their Meanings
#'
#' @docType data
#' @format A data frame including NFL stat IDs, names and descriptions used in
#' an nflfastR dataset.
#' @source \url{http://www.nflgsis.com/gsis/Documentation/Partners/StatIDs.html}
#' @examples
#' \donttest{
#' stat_ids
#' }
"stat_ids"

#' NFL Stats Variables
#'
#' @docType data
#' @format A data frame explaining all variables returned by the function
#' [calculate_stats()].
#' @examples
#' \donttest{
#' nfl_stats_variables
#' }
"nfl_stats_variables"


================================================
FILE: R/database.R
================================================
#' Update or Create a nflverse Play-by-Play Data Table in a Connected Database
#'
#' @description
#' The nflfastR play-by-play era dates back to 1999. To analyze all the data
#' efficiently, there is practically no alternative to working with a database.
#'
#' This function helps to create and maintain a table containing all
#' play-by-play data of the nflfastR era in a connected database.
#' Primarily, the preprocessed data from [load_pbp] is written to the database
#' and, if necessary, supplemented with the latest games using
#' [build_nflfastR_pbp].
#'
#' @param conn A `DBIConnection` object, as returned by [DBI::dbConnect()]
#' @inheritParams rlang::args_dots_empty
#' @inheritParams DBI::dbExistsTable
#' @param seasons Hybrid argument (logical or numeric) to update either parts
#' of the play-by-play table within the database or the complete table.
#'
#' It can update the play-by-play data table either for the whole nflfastR era
#' (with `seasons = TRUE`) or just for specified seasons
#' (e.g. `seasons = 2024:2025`).
#'
#' Defaults to [most_recent_season]. Please see details for further information.
#'
#' @details
#' ## The `seasons` argument
#'
#' The `seasons` argument controls how the table in the connected database is
#' handled.
#'
#' With `seasons = TRUE`, the table in argument `name` will be removed completely
#' (by calling [DBI::dbRemoveTable]) and all seasons of the nflfastR era will be
#' added to a fresh table. This is helpful when new columns are added during the
#' offseason.
#'
#' With a numerical vector, e.g. `seasons = 2024:2025`, the table in argument
#' `name` will be preserved and only rows from the given seasons will be deleted
#' and re-added (by calling [DBI::dbAppendTable]). This is intended to be used
#' for ongoing seasons because the NFL fixes bugs in the underlying data during
#' the week. We recommend rebuilding the current season every Thursday during
#' the season.
#'
#' The default behavior is `seasons = most_recent_season()`, which means that
#' only the most recent season is updated or added.
#'
#' To keep the table, and thus also the schema, but update all play-by-play
#' data of the nflfastR era, set
#' ```
#' seasons = seq(1999, most_recent_season())
#' ```
#'
#' If `seasons` contains multiple seasons, it is possible to control whether the
#' seasons are loaded individually and written to the database, or whether
#' multiple seasons should be processed in chunks. The latter is more efficient
#' because fewer write operations are required, but at the same time, the data
#' must first be stored in memory. The option `"nflfastR.db_chunk_size"` can
#' be used to control how many seasons are loaded together in a chunk and
#' written to the database. With the following option, for example, 5 seasons
#' are always loaded together and written to the database.
#' ```
#' options("nflfastR.db_chunk_size" = 5L)
#' ```
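#'
#' How the seasons are grouped internally is not shown here. As a rough sketch,
#' assuming the seasons are simply grouped in their given order, a chunk size
#' of 5 would split 12 seasons into chunks of 5, 5, and 2 seasons. The names
#' `chunk_size`, `chunk_id`, and `chunks` are hypothetical and only used for
#' illustration.
#' ```r
#' # Illustration only: an assumed grouping, not the package internals
#' options("nflfastR.db_chunk_size" = 5L)
#' chunk_size <- getOption("nflfastR.db_chunk_size")
#' seasons <- 1999:2010
#' # Assign consecutive seasons to chunk ids 1, 1, 1, 1, 1, 2, ...
#' chunk_id <- ceiling(seq_along(seasons) / chunk_size)
#' # One list element per chunk, each holding the seasons loaded together
#' chunks <- split(seasons, chunk_id)
#' lengths(chunks)
#' ```
#' A larger chunk size means fewer write operations but more seasons held in
#' memory at once.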
#'
#' @returns Always returns the database connection invisibly.
#' @export
#'
#' @examples
#' \donttest{
#' con <- DBI::dbConnect(duckdb::duckdb())
#' try({# to avoid CRAN test problems
#' update_pbp_db(con, seasons = 2024)
#' })
#' }
update_pbp_db <- function(
  conn,
  ...,
  name = "nflverse_pbp",
  seasons = most_recent_season()
) {
  rlang::check_installed("DBI", "to communicate with databases")
  rlang::check_dots_empty()

  # Validate connection and table name --------------------------------------

  if (!DBI::dbIsValid(conn)) {
    cli::cli_abort(
      "The connection in argument {.arg conn} is invalid. \\
      Do you need to run {.fun DBI::dbConnect}?"
    )
  }

  rule_header("Update nflverse Play-by-Play Data in Connected Database")

  # msg_name is the table name used in cli messages. We need it because `name`
  # could be a call to DBI::SQL() or DBI::Id()
  # I don't want to evaluate name in every subsequent function call, so I do it
  # here once and pass it around
  msg_name <- DBI::dbQuoteIdentifier(conn = conn, x = name) |>
    as.character()

  initiated <- FALSE
  if (!DBI::dbExistsTable(conn = conn, name = name)) {
    do_it <- confirm(
      "Table {.val {msg_name}} does not yet exist in your connected database.
      Do you wish to create it? (Y/n)"
    )
    if (do_it) {
      initiated <- db_initiate_pbp(
        conn = conn,
        name = name,
        msg_name = msg_name
      )
    } else {
      rule_footer("ABORTED")
      return(invisible(conn))
    }
  }

  # Validate seasons --------------------------------------------------------

  if (is.numeric(seasons)) {
    invalid <- setdiff(seasons, valid_seasons())
    if (length(invalid) > 0) {
      cli::cli_abort(
        "The following {cli::qty(length(invalid))} season{?s} {?is/are} \\
        invalid: {.val {invalid}}"
      )
    }
    ret <- db_drop_seasons(
      conn = conn,
      name = name,
      seasons = seasons,
      msg_name = msg_name
    )
  } else if (isTRUE(seasons)) {
    # We need this block inside if (isTRUE(seasons)) to make sure we run
    # the else block in the right conditions
    if (isFALSE(initiated)) {
      do_it <- confirm(
        "Purge table {.val {msg_name}} in your connected database? (Y/n)"
      )
      if (do_it) {
        ret <- DBI::dbRemoveTable(conn = conn, name = name)
        cli_message("Removed {.val {msg_name}}")
        initiated <- db_initiate_pbp(
          conn = conn,
          name = name,
          msg_name = msg_name
        )
      } else {
        rule_footer("ABORTED")
        return(invisible(conn))
      }
    }
  } else {
    cli::cli_abort(
      "Argument {.arg seasons} must be either a vector of valid \\
      seasons or scalar TRUE"
    )
  }

  seasons <- if (isTRUE(seasons)) valid_seasons() else seasons

  # Append seasons ----------------------------------------------------------
  ret <- db_write_pbp_seasons(
    conn = conn,
    name = name,
    seasons = seasons,
    msg_name = msg_name
  )

  # Process missing games ---------------------------------------------------
  db_games <- db_query_game_ids(conn = conn, name = name, seasons = seasons)
  completed_games <- completed_game_ids(seasons = seasons)
  missing_games <- setdiff(completed_games, db_games)

  # Some completed games can be missing in load_pbp. This can happen on game
  # days, so this block is only relevant then
  if (length(missing_games) > 0) {
    vec <- cli::cli_vec(missing_games, list("vec-trunc" = 5L))
    cli_message(
      "The following {cli::no(length(missing_games))} game{?s} {?is/are} not \\
      yet available via {.fun load_pbp} and {?is/are} therefore parsed directly \\
      with {.fun build_nflfastR_pbp} and appended to table {.val {msg_name}}: \\
      {.val {vec}}"
    )
    # build pbp of missing games. If raw pbp isn't ready, the function will
    # return an empty dataframe for that game
    new_pbp <- build_nflfastR_pbp(missing_games, rules = FALSE)
    ret <- DBI::dbAppendTable(
      conn = conn,
      name = name,
      value = new_pbp
    )
    # Check how many new games have been added
    new_ids <- unique(new_pbp[["game_id"]])
    cli_message(
      "Appended {cli::no(length(new_ids))} game{?s} to table {.val {msg_name}}",
      .cli_fct = cli::cli_alert_success
    )
    # Let user know if some games are still missing
    still_missing <- setdiff(missing_games, new_ids)
    if (length(still_missing) > 0) {
      vec <- cli::cli_vec(still_missing, list("vec-trunc" = 5L))
      cli_message(
        "Raw pbp data for the following {cli::no(length(still_missing))} game{?s} \\
        still missing: {.val {vec}}. Please try again in about 10 minutes.",
        .cli_fct = cli::cli_alert_warning
      )
    }
  }

  # Remove Dummy ------------------------------------------------------------
  ret <- db_remove_dummy(conn = conn, name = name)

  # Finish ------------------------------------------------------------------
  cli_message(
    "Database update completed",
    .cli_fct = cli::cli_alert_success
  )
  rule_footer("DONE")
Condensed preview: 131 files, each showing path, character count, and a content snippet.
[
  {
    "path": ".Rbuildignore",
    "chars": 537,
    "preview": "^.*\\.Rproj$\n^\\.Rproj\\.user$\n^LICENSE\\.md$\n^data-raw$\n^README\\.Rmd$\n^.*\\.pdf$\n^.github$\n^_pkgdown\\.yml$\n^docs$\n^pkgdown$\n"
  },
  {
    "path": ".git-blame-ignore-revs",
    "chars": 341,
    "preview": "# This file lists revisions of large-scale formatting/style changes so that\n# they can be excluded from git blame result"
  },
  {
    "path": ".github/.gitignore",
    "chars": 7,
    "preview": "*.html\n"
  },
  {
    "path": ".github/workflows/R-CMD-check.yaml",
    "chars": 1524,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".github/workflows/format-suggest.yaml",
    "chars": 1683,
    "preview": "# Workflow derived from https://github.com/posit-dev/setup-air/tree/main/examples\n\non:\n  # Using `pull_request_target` o"
  },
  {
    "path": ".github/workflows/pkgdown.yaml",
    "chars": 2236,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".github/workflows/revdepcheck.yaml",
    "chars": 488,
    "preview": "# Workflow derived from https://github.com/r-devel/recheck?tab=readme-ov-file#how-to-use-with-github-actions\non:\n  workf"
  },
  {
    "path": ".github/workflows/rhub.yaml",
    "chars": 2942,
    "preview": "# R-hub's generic GitHub Actions workflow file. It's canonical location is at\n# https://github.com/r-hub/actions/blob/v1"
  },
  {
    "path": ".gitignore",
    "chars": 590,
    "preview": "# History files\n.Rhistory\n.Rapp.history\n# Session Data files\n.RData\n# User-specific files\n.Ruserdata\n# Example code in p"
  },
  {
    "path": ".vscode/extensions.json",
    "chars": 62,
    "preview": "{\n    \"recommendations\": [\n        \"Posit.air-vscode\"\n    ]\n}\n"
  },
  {
    "path": ".vscode/settings.json",
    "chars": 227,
    "preview": "{\n    \"[r]\": {\n        \"editor.formatOnSave\": true,\n        \"editor.defaultFormatter\": \"Posit.air-vscode\"\n    },\n    \"[q"
  },
  {
    "path": "DESCRIPTION",
    "chars": 1556,
    "preview": "Type: Package\nPackage: nflfastR\nTitle: Functions to Efficiently Access NFL Play by Play Data\nVersion: 5.2.0.9012\nAuthors"
  },
  {
    "path": "LICENSE",
    "chars": 57,
    "preview": "YEAR: 2020\nCOPYRIGHT HOLDER: Sebastian Carl; Ben Baldwin\n"
  },
  {
    "path": "LICENSE.md",
    "chars": 1086,
    "preview": "# MIT License\n\nCopyright (c) 2020 Sebastian Carl; Ben Baldwin\n\nPermission is hereby granted, free of charge, to any pers"
  },
  {
    "path": "NAMESPACE",
    "chars": 1257,
    "preview": "# Generated by roxygen2: do not edit by hand\n\nexport(add_qb_epa)\nexport(add_xpass)\nexport(add_xyac)\nexport(build_nflfast"
  },
  {
    "path": "NEWS.md",
    "chars": 52363,
    "preview": "# nflfastR (development version)\n\n- Added new function `update_pbp_db()`, a fresh approach to the database helper. (#544"
  },
  {
    "path": "R/aggregate_game_stats.R",
    "chars": 38118,
    "preview": "################################################################################\n# Author: Ben Baldwin, Sebastian Carl\n#"
  },
  {
    "path": "R/aggregate_game_stats_def.R",
    "chars": 23219,
    "preview": "################################################################################\n# Author: Christian Lohr, Sebastian Car"
  },
  {
    "path": "R/aggregate_game_stats_kicking.R",
    "chars": 8945,
    "preview": "#' Summarize Kicking Stats\n#'\n#' @description\n#' `r lifecycle::badge(\"deprecated\")`\n#'\n#' This function was deprecated b"
  },
  {
    "path": "R/build_nflfastR_pbp.R",
    "chars": 3633,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Wra"
  },
  {
    "path": "R/build_playstats.R",
    "chars": 3170,
    "preview": "build_playstats <- function(\n  seasons = nflreadr::most_recent_season(),\n  stat_ids = 1:1000,\n  dir = getOption(\"nflfast"
  },
  {
    "path": "R/calculate_series_conversion_rates.R",
    "chars": 7402,
    "preview": "#' Compute Series Conversion Information from Play by Play\n#'\n#' @description A \"Series\" begins on a 1st and 10 and each"
  },
  {
    "path": "R/calculate_standings.R",
    "chars": 5240,
    "preview": "#' Compute Division Standings and Conference Seeds from Play by Play\n#'\n#' @description\n#' `r lifecycle::badge(\"deprecat"
  },
  {
    "path": "R/calculate_stats.R",
    "chars": 29526,
    "preview": "################################################################################\n# Author: Sebastian Carl\n##############"
  },
  {
    "path": "R/data_documentation.R",
    "chars": 2531,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Doc"
  },
  {
    "path": "R/database.R",
    "chars": 10508,
    "preview": "#' Update or Create a nflverse Play-by-Play Data Table in a Connected Database\n#'\n#' @description\n#' The nflfastR play-b"
  },
  {
    "path": "R/ep_wp_calculators.R",
    "chars": 6422,
    "preview": "#' Compute expected points\n#'\n#' for provided plays. Returns the data with\n#' probabilities of each scoring event and EP"
  },
  {
    "path": "R/helper_add_cp_cpoe.R",
    "chars": 2893,
    "preview": "################################################################################\n# Author: Ben Baldwin, Sebastian Carl\n#"
  },
  {
    "path": "R/helper_add_ep_wp.R",
    "chars": 63513,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Fun"
  },
  {
    "path": "R/helper_add_fixed_drives.R",
    "chars": 7621,
    "preview": "################################################################################\n# Author: Sebastian Carl, Ben Baldwin\n#"
  },
  {
    "path": "R/helper_add_game_data.R",
    "chars": 1972,
    "preview": "################################################################################\n# Author: Ben Baldwin\n# Purpose: Functi"
  },
  {
    "path": "R/helper_add_nflscrapr_mutations.R",
    "chars": 31885,
    "preview": "################################################################################\n# Author: Sebastian Carl, Ben Baldwin ("
  },
  {
    "path": "R/helper_add_series_data.R",
    "chars": 4019,
    "preview": "################################################################################\n# Author: Sebastian Carl, Ben Baldwin\n#"
  },
  {
    "path": "R/helper_add_xpass.R",
    "chars": 3288,
    "preview": "################################################################################\n# Author: Ben Baldwin\n# Stlyeguide: sty"
  },
  {
    "path": "R/helper_add_xyac.R",
    "chars": 10122,
    "preview": "################################################################################\n# Author: Ben Baldwin, Sebastian Carl\n#"
  },
  {
    "path": "R/helper_additional_functions.R",
    "chars": 24163,
    "preview": "################################################################################\n# Author: Ben Baldwin, Sebastian Carl, "
  },
  {
    "path": "R/helper_database_functions.R",
    "chars": 9381,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Cre"
  },
  {
    "path": "R/helper_decode_player_ids.R",
    "chars": 4437,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Fun"
  },
  {
    "path": "R/helper_get_scheds_and_rosters.R",
    "chars": 544,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Fun"
  },
  {
    "path": "R/helper_scrape_gc.R",
    "chars": 10356,
    "preview": "################################################################################\n# Author: Ben Baldwin\n# Stlyeguide: sty"
  },
  {
    "path": "R/helper_scrape_nfl.R",
    "chars": 14906,
    "preview": "################################################################################\n# Author: Sebastian Carl, Ben Baldwin\n#"
  },
  {
    "path": "R/helper_tidy_play_stats.R",
    "chars": 53238,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Cre"
  },
  {
    "path": "R/helper_variable_selector.R",
    "chars": 9820,
    "preview": "################################################################################\n# Author: Ben Baldwin, Sebastian Carl\n#"
  },
  {
    "path": "R/nflfastR-package.R",
    "chars": 5421,
    "preview": "#' @details # Parallel Processing and Progress Updates in nflfastR\n#'\n#' ## Preface\n#'\n#' Prior to nflfastR v4.0, parall"
  },
  {
    "path": "R/report.R",
    "chars": 1199,
    "preview": "#' Get a Situation Report on System, nflverse Package Versions and Dependencies\n#'\n#' @description\n#'\n#' `r lifecycle::b"
  },
  {
    "path": "R/save_raw_pbp.R",
    "chars": 6012,
    "preview": "#' Download Raw PBP Data to Local Filesystem\n#'\n#' The functions [build_nflfastR_pbp()] and [fast_scraper()] support loa"
  },
  {
    "path": "R/top-level_scraper.R",
    "chars": 39937,
    "preview": "################################################################################\n# Author: Sebastian Carl\n# Purpose: Top"
  },
  {
    "path": "R/utils.R",
    "chars": 9296,
    "preview": "# The function `message_completed` to create the green \"...completed\" message\n# only exists to hide the option `in_build"
  },
  {
    "path": "README.Rmd",
    "chars": 9257,
    "preview": "---\noutput: github_document\n---\n\n<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n```{r, include "
  },
  {
    "path": "README.md",
    "chars": 6342,
    "preview": "\n<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n# **nflfastR** <img src=\"man/figures/logo.png\" "
  },
  {
    "path": "air.toml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cran-comments.md",
    "chars": 348,
    "preview": "## Release summary\n\nThis is a minor release that \n* deprecates old functions, and\n* fixes bugs.\n\n## R CMD check results\n"
  },
  {
    "path": "data-raw/MODELS.R",
    "chars": 7486,
    "preview": "################################################################################\n# Author: Ben Baldwin\n# Purpose: Estima"
  },
  {
    "path": "data-raw/_tune_spread_wp.R",
    "chars": 3971,
    "preview": "library(tidyverse)\nlibrary(tidymodels)\nsource('R/helper_add_ep_wp.R')\nsource('R/helper_add_nflscrapr_mutations.R')\n\nset."
  },
  {
    "path": "data-raw/build_scramble_fix.R",
    "chars": 3470,
    "preview": "library(tidyverse)\n\npbp <- nflfastR::load_pbp(1999:2005) |>\n  # plays that could plausibly be scramble\n  filter(\n    !is"
  },
  {
    "path": "data-raw/build_stat_id_df.R",
    "chars": 599,
    "preview": "stat_ids <- \"https://www.nflgsis.com/gsis/Documentation/Partners/StatIDs_files/sheet001.html\" |>\n  xml2::read_html() |>\n"
  },
  {
    "path": "data-raw/compare_dfs.R",
    "chars": 2517,
    "preview": "library(tidyverse)\nfuture::plan(\"multisession\")\n\n# function for comparing revisions against data in repo\n# make sure to "
  },
  {
    "path": "data-raw/create_field_descriptions.R",
    "chars": 373,
    "preview": "library(dplyr)\nlibrary(tidyr)\nlibrary(stringr)\nlibrary(usethis)\n\nx <- readLines(\"data-raw/variable_list.txt\")\n\nfield_des"
  },
  {
    "path": "data-raw/default_play.R",
    "chars": 1694,
    "preview": "### Create datatype dataframe\n### This is a db that is stored on Seb's local machine\nconnection <- DBI::dbConnect(duckdb"
  },
  {
    "path": "data-raw/nfl_stats_variables.R",
    "chars": 2197,
    "preview": "s1 <- calculate_stats(2023, \"season\", \"player\")\ns2 <- calculate_stats(2023, \"week\", \"player\")\ns3 <- calculate_stats(2023"
  },
  {
    "path": "data-raw/nfl_stats_variables.json",
    "chars": 19077,
    "preview": "[\n  {\n    \"variable\": \"player_id\",\n    \"description\": \"GSIS player ID. Available if stat_type = 'player'.\"\n  },\n  {\n    "
  },
  {
    "path": "data-raw/pbp_datatypes.csv",
    "chars": 43086,
    "preview": "skim_type,skim_variable,n_missing,complete_rate,character.min,character.max,character.empty,character.n_unique,character"
  },
  {
    "path": "data-raw/replace_models.R",
    "chars": 432,
    "preview": "# Helper function to replace the internal calls to the models\n# with a call to the fastrmodels package\nmodels <- c(\n  \"e"
  },
  {
    "path": "data-raw/teams_colors_logos.R",
    "chars": 93,
    "preview": "teams_colors_logos <- nflreadr::load_teams()\n\nuse_data(teams_colors_logos, overwrite = TRUE)\n"
  },
  {
    "path": "data-raw/tidy_play_stats_row.R",
    "chars": 8384,
    "preview": "# Script to create the tidy_play_stats_row tibble that is used in\n# the internal function `sum_play_stats`\n\nlibrary(tidy"
  },
  {
    "path": "data-raw/variable_list.txt",
    "chars": 36854,
    "preview": "#' \\item{play_id}{Numeric play id that when used with game_id and drive provides the unique identifier for a single play"
  },
  {
    "path": "data-raw/wordmarks.R",
    "chars": 984,
    "preview": "library(dplyr)\n\nteams <- nflfastR::teams_colors_logos |>\n  dplyr::filter(!team_abbr %in% c(\"LAR\", \"OAK\", \"SD\", \"STL\"))\n\n"
  },
  {
    "path": "man/add_qb_epa.Rd",
    "chars": 644,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_additional_functions.R\n\\name{add_qb"
  },
  {
    "path": "man/add_xpass.Rd",
    "chars": 1083,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_add_xpass.R\n\\name{add_xpass}\n\\alias"
  },
  {
    "path": "man/add_xyac.Rd",
    "chars": 1244,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_add_xyac.R\n\\name{add_xyac}\n\\alias{a"
  },
  {
    "path": "man/build_nflfastR_pbp.Rd",
    "chars": 2416,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/build_nflfastR_pbp.R\n\\name{build_nflfastR_"
  },
  {
    "path": "man/calculate_expected_points.Rd",
    "chars": 2152,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ep_wp_calculators.R\n\\name{calculate_expect"
  },
  {
    "path": "man/calculate_player_stats.Rd",
    "chars": 6214,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/aggregate_game_stats.R\n\\name{calculate_pla"
  },
  {
    "path": "man/calculate_player_stats_def.Rd",
    "chars": 1730,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/aggregate_game_stats_def.R\n\\name{calculate"
  },
  {
    "path": "man/calculate_player_stats_kicking.Rd",
    "chars": 1502,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/aggregate_game_stats_kicking.R\n\\name{calcu"
  },
  {
    "path": "man/calculate_series_conversion_rates.Rd",
    "chars": 4248,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/calculate_series_conversion_rates.R\n\\name{"
  },
  {
    "path": "man/calculate_standings.Rd",
    "chars": 2444,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/calculate_standings.R\n\\name{calculate_stan"
  },
  {
    "path": "man/calculate_stats.Rd",
    "chars": 2465,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/calculate_stats.R\n\\name{calculate_stats}\n\\"
  },
  {
    "path": "man/calculate_win_probability.Rd",
    "chars": 2043,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ep_wp_calculators.R\n\\name{calculate_win_pr"
  },
  {
    "path": "man/clean_pbp.Rd",
    "chars": 3232,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_additional_functions.R\n\\name{clean_"
  },
  {
    "path": "man/decode_player_ids.Rd",
    "chars": 2073,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_decode_player_ids.R\n\\name{decode_pl"
  },
  {
    "path": "man/fast_scraper.Rd",
    "chars": 35183,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/top-level_scraper.R\n\\name{fast_scraper}\n\\a"
  },
  {
    "path": "man/fast_scraper_roster.Rd",
    "chars": 1390,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/top-level_scraper.R\n\\name{fast_scraper_ros"
  },
  {
    "path": "man/fast_scraper_schedules.Rd",
    "chars": 1201,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/top-level_scraper.R\n\\name{fast_scraper_sch"
  },
  {
    "path": "man/field_descriptions.Rd",
    "chars": 557,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_documentation.R\n\\docType{data}\n\\name{"
  },
  {
    "path": "man/missing_raw_pbp.Rd",
    "chars": 1254,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/save_raw_pbp.R\n\\name{missing_raw_pbp}\n\\ali"
  },
  {
    "path": "man/nfl_stats_variables.Rd",
    "chars": 453,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_documentation.R\n\\docType{data}\n\\name{"
  },
  {
    "path": "man/nflfastR-package.Rd",
    "chars": 6221,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/nflfastR-package.R\n\\docType{package}\n\\name"
  },
  {
    "path": "man/reexports.Rd",
    "chars": 824,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/nflfastR-package.R\n\\docType{import}\n\\name{"
  },
  {
    "path": "man/report.Rd",
    "chars": 2201,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/report.R\n\\name{report}\n\\alias{report}\n\\tit"
  },
  {
    "path": "man/save_raw_pbp.Rd",
    "chars": 2487,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/save_raw_pbp.R\n\\name{save_raw_pbp}\n\\alias{"
  },
  {
    "path": "man/stat_ids.Rd",
    "chars": 489,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_documentation.R\n\\docType{data}\n\\name{"
  },
  {
    "path": "man/teams_colors_logos.Rd",
    "chars": 1531,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_documentation.R\n\\docType{data}\n\\name{"
  },
  {
    "path": "man/update_db.Rd",
    "chars": 2779,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/helper_database_functions.R\n\\name{update_d"
  },
  {
    "path": "man/update_pbp_db.Rd",
    "chars": 3984,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/database.R\n\\name{update_pbp_db}\n\\alias{upd"
  },
  {
    "path": "nflfastR.Rproj",
    "chars": 472,
    "preview": "Version: 1.0\nProjectId: e1e14382-386c-49b3-9b3f-206a4cc98503\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSav"
  },
  {
    "path": "pkgdown/_pkgdown.yml",
    "chars": 4233,
    "preview": "url: https://nflfastr.com/\n\ntemplate:\n  bootstrap: 5\n  light-switch: true\n  bslib:\n    font_scale: 1.1\n    base_font: {g"
  },
  {
    "path": "pkgdown/extra.css",
    "chars": 864,
    "preview": "/*\nCheck: https://www.w3schools.com/css/css_rwd_mediaqueries.asp\nfor Responsive Web Design - Media Queries\n*/\n.row > mai"
  },
  {
    "path": "tests/testthat/_snaps/build_nflfastR_pbp.md",
    "chars": 12164,
    "preview": "# default_play is synced with build_nflfastR_pbp\n\n    {\n      \"type\": \"character\",\n      \"attributes\": {\n        \"names\""
  },
  {
    "path": "tests/testthat/_snaps/stats/calculate_stats.md",
    "chars": 21024,
    "preview": "# calculate_stats works\n\n    {\n      \"type\": \"character\",\n      \"attributes\": {\n        \"names\": {\n          \"type\": \"ch"
  },
  {
    "path": "tests/testthat/helpers.R",
    "chars": 2631,
    "preview": "# sample games we'll use to check with\ngame_ids <- c(\"2025_01_KC_LAC\", \"2019_01_GB_CHI\")\n\ntest_dir <- getwd()\n\npbp_cache"
  },
  {
    "path": "tests/testthat/test-build_nflfastR_pbp.R",
    "chars": 1784,
    "preview": "test_that(\"build_nflfastR_pbp works (local data)\", {\n  # This test used to run on CRAN but their changes to env vars whi"
  },
  {
    "path": "tests/testthat/test-calculate_series_conversion_rates.R",
    "chars": 675,
    "preview": "test_that(\"calculate_series_conversion_rates works\", {\n  # This test used to run on CRAN but their changes to env vars w"
  },
  {
    "path": "tests/testthat/test-calculate_stats.R",
    "chars": 2633,
    "preview": "test_that(\"calculate_stats works\", {\n  skip_on_cran()\n  skip_if_offline(\"github.com\")\n\n  s1 <- calculate_stats(\n    seas"
  },
  {
    "path": "tests/testthat/test-ep_wp_calculators.R",
    "chars": 1347,
    "preview": "test_that(\"calculate_expected_points works\", {\n  # This test used to run on CRAN but their changes to env vars which cau"
  },
  {
    "path": "tests/testthat.R",
    "chars": 376,
    "preview": "# This file is part of the standard setup for testthat.\n# It is recommended that you do not modify it.\n#\n# Where should "
  },
  {
    "path": "tools/check.env",
    "chars": 346,
    "preview": "# Check for usage of more than two cores. We really need to do this\n# because CRAN kept rejecting nflfastR\n# It is not s"
  },
  {
    "path": "vignettes/.gitignore",
    "chars": 18,
    "preview": "*.html\n*.R\npbp_db\n"
  },
  {
    "path": "vignettes/beginners_guide.Rmd",
    "chars": 38092,
    "preview": "---\ntitle: \"A beginner's guide to nflfastR\"\nauthor: \"Ben Baldwin\"\n---\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  "
  },
  {
    "path": "vignettes/field_descriptions.Rmd",
    "chars": 418,
    "preview": "---\ntitle: \"Field Descriptions\"\n---\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  echo = FALSE,\n  comment = \"#>\"\n)\n\n"
  },
  {
    "path": "vignettes/nflfastR.Rmd",
    "chars": 22680,
    "preview": "---\ntitle: \"Get started with nflfastR\"\nauthor: \"Ben Baldwin & Sebastian Carl\"\n---\n\n```{r, include = FALSE}\nknitr::opts_c"
  },
  {
    "path": "vignettes/stats_variables.Rmd",
    "chars": 1820,
    "preview": "---\ntitle: \"NFL Stats Variables\"\n---\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  echo = FALSE,\n  comment = \"#>\"\n)\n"
  }
]

// ... and 19 more files (download for full content)

