Repository: mrcaseb/nflfastR
Branch: master
Commit: 0489133d85c5
Files: 131
Total size: 852.0 KB

Directory structure:
gitextract_h9htwsm9/

├── .Rbuildignore
├── .git-blame-ignore-revs
├── .github/
│   ├── .gitignore
│   └── workflows/
│       ├── R-CMD-check.yaml
│       ├── format-suggest.yaml
│       ├── pkgdown.yaml
│       ├── revdepcheck.yaml
│       └── rhub.yaml
├── .gitignore
├── .vscode/
│   ├── extensions.json
│   └── settings.json
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── aggregate_game_stats.R
│   ├── aggregate_game_stats_def.R
│   ├── aggregate_game_stats_kicking.R
│   ├── build_nflfastR_pbp.R
│   ├── build_playstats.R
│   ├── calculate_series_conversion_rates.R
│   ├── calculate_standings.R
│   ├── calculate_stats.R
│   ├── data_documentation.R
│   ├── database.R
│   ├── ep_wp_calculators.R
│   ├── helper_add_cp_cpoe.R
│   ├── helper_add_ep_wp.R
│   ├── helper_add_fixed_drives.R
│   ├── helper_add_game_data.R
│   ├── helper_add_nflscrapr_mutations.R
│   ├── helper_add_series_data.R
│   ├── helper_add_xpass.R
│   ├── helper_add_xyac.R
│   ├── helper_additional_functions.R
│   ├── helper_database_functions.R
│   ├── helper_decode_player_ids.R
│   ├── helper_get_scheds_and_rosters.R
│   ├── helper_scrape_gc.R
│   ├── helper_scrape_nfl.R
│   ├── helper_tidy_play_stats.R
│   ├── helper_variable_selector.R
│   ├── nflfastR-package.R
│   ├── report.R
│   ├── save_raw_pbp.R
│   ├── sysdata.rda
│   ├── top-level_scraper.R
│   └── utils.R
├── README.Rmd
├── README.md
├── air.toml
├── cran-comments.md
├── data/
│   ├── field_descriptions.rda
│   ├── nfl_stats_variables.rda
│   ├── stat_ids.rda
│   └── teams_colors_logos.rda
├── data-raw/
│   ├── MODELS.R
│   ├── Scrambles 1999-2004 UPDATE for NFLfastR.xlsx
│   ├── Scrambles.1999-2003.FURTHER.UPDATE.for.NFLfastR.xlsx
│   ├── _tune_spread_wp.R
│   ├── build_scramble_fix.R
│   ├── build_stat_id_df.R
│   ├── compare_dfs.R
│   ├── create_field_descriptions.R
│   ├── default_play.R
│   ├── nfl_stats_variables.R
│   ├── nfl_stats_variables.json
│   ├── pbp_datatypes.csv
│   ├── pbp_defaultplay.rds
│   ├── replace_models.R
│   ├── scramble_fix.rds
│   ├── scrambles_2005.xlsx
│   ├── teams_colors_logos.R
│   ├── tidy_play_stats_row.R
│   ├── variable_explanation.xlsx
│   ├── variable_list.txt
│   └── wordmarks.R
├── man/
│   ├── add_qb_epa.Rd
│   ├── add_xpass.Rd
│   ├── add_xyac.Rd
│   ├── build_nflfastR_pbp.Rd
│   ├── calculate_expected_points.Rd
│   ├── calculate_player_stats.Rd
│   ├── calculate_player_stats_def.Rd
│   ├── calculate_player_stats_kicking.Rd
│   ├── calculate_series_conversion_rates.Rd
│   ├── calculate_standings.Rd
│   ├── calculate_stats.Rd
│   ├── calculate_win_probability.Rd
│   ├── clean_pbp.Rd
│   ├── decode_player_ids.Rd
│   ├── fast_scraper.Rd
│   ├── fast_scraper_roster.Rd
│   ├── fast_scraper_schedules.Rd
│   ├── field_descriptions.Rd
│   ├── missing_raw_pbp.Rd
│   ├── nfl_stats_variables.Rd
│   ├── nflfastR-package.Rd
│   ├── reexports.Rd
│   ├── report.Rd
│   ├── save_raw_pbp.Rd
│   ├── stat_ids.Rd
│   ├── teams_colors_logos.Rd
│   ├── update_db.Rd
│   └── update_pbp_db.Rd
├── nflfastR.Rproj
├── pkgdown/
│   ├── _pkgdown.yml
│   └── extra.css
├── tests/
│   ├── testthat/
│   │   ├── 2019/
│   │   │   └── 2019_01_GB_CHI.rds
│   │   ├── 2025/
│   │   │   └── 2025_01_KC_LAC.rds
│   │   ├── _snaps/
│   │   │   ├── build_nflfastR_pbp.md
│   │   │   └── stats/
│   │   │       └── calculate_stats.md
│   │   ├── expected_ep.rds
│   │   ├── expected_pbp.rds
│   │   ├── expected_sc.rds
│   │   ├── expected_sc_weekly.rds
│   │   ├── expected_wp.rds
│   │   ├── games.rds
│   │   ├── helpers.R
│   │   ├── test-build_nflfastR_pbp.R
│   │   ├── test-calculate_series_conversion_rates.R
│   │   ├── test-calculate_stats.R
│   │   └── test-ep_wp_calculators.R
│   └── testthat.R
├── tools/
│   └── check.env
└── vignettes/
    ├── .gitignore
    ├── beginners_guide.Rmd
    ├── field_descriptions.Rmd
    ├── nflfastR.Rmd
    └── stats_variables.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^data-raw$
^README\.Rmd$
^.*\.pdf$
^.github$
^_pkgdown\.yml$
^docs$
^pkgdown$
^vignettes/articles$
^\.github$
^vignettes/nflfastR-models\.Rmd$
^vignettes$
^\.travis\.yml$
^man/figures/card\.png$
^man/figures/header_github\.png$
^man/figures/header_twitter\.png$
^man/figures/nflfastR_logo_fillsize\.png$
^cran-comments\.md$
^CRAN-RELEASE$
^man/figures/readme-cp-model-1\.png$
^man/figures/readme-epa-model-1\.png$
^revdep$
^CRAN-SUBMISSION$
^[.]?air[.]toml$
^\.vscode$
^\.git-blame-ignore-revs$


================================================
FILE: .git-blame-ignore-revs
================================================
# This file lists revisions of large-scale formatting/style changes so that
# they can be excluded from git blame results.
#
# To set this file as the default ignore file for git blame, run:
#   $ git config blame.ignoreRevsFile .git-blame-ignore-revs

# Format whole project with air format . (#47)
66de9ebe6d53415a770de224c1f0f442ef22358c


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
  workflow_dispatch:

name: R-CMD-check.yaml

permissions: read-all

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest,   r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest,   r: 'release'}
          - {os: ubuntu-latest,   r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            any::rcmdcheck
            nflverse/fastrmodels
            nflverse/nflreadr
            nflverse/nflseedR
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'


================================================
FILE: .github/workflows/format-suggest.yaml
================================================
# Workflow derived from https://github.com/posit-dev/setup-air/tree/main/examples

on:
  # Using `pull_request_target` over `pull_request` for elevated `GITHUB_TOKEN`
  # privileges, otherwise we can't set `pull-requests: write` when the pull
  # request comes from a fork, which is our main use case (external contributors).
  #
  # `pull_request_target` runs in the context of the target branch (`main`, usually),
  # rather than in the context of the pull request like `pull_request` does. Due
  # to this, we must explicitly checkout `ref: ${{ github.event.pull_request.head.sha }}`.
  # This is typically frowned upon by GitHub, as it exposes you to potentially running
  # untrusted code in a context where you have elevated privileges, but they explicitly
  # call out the use case of reformatting and committing back / commenting on the PR
  # as a situation that should be safe (because we aren't actually running the untrusted
  # code, we are just treating it as passive data).
  # https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/
  pull_request_target:

name: format-suggest.yaml

jobs:
  format-suggest:
    name: format-suggest
    runs-on: ubuntu-latest

    permissions:
      # Required to push suggestion comments to the PR
      pull-requests: write

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}

      - name: Install
        uses: posit-dev/setup-air@v1

      - name: Format
        run: air format .

      - name: Suggest
        uses: reviewdog/action-suggester@v1
        with:
          level: error
          fail_level: error
          tool_name: air


================================================
FILE: .github/workflows/pkgdown.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  release:
    types: [published]
  workflow_dispatch:

name: pkgdown

jobs:
  pkgdown:
    runs-on: ubuntu-latest
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
      NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
      isPush: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true


      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            r-lib/pkgdown
            nflverse/fastrmodels
            nflverse/nflplotR
            nflverse/nflreadr
            any::tidyverse
            any::ggrepel
            any::knitr
            any::tictoc
            any::ragg
            any::DT
            local::.
          needs: website

      - name: Build site
        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
        shell: Rscript {0}

      - name: Deploy to GitHub pages 🚀
        if: github.event_name != 'pull_request'
        uses: JamesIves/github-pages-deploy-action@v4.5.0
        with:
          clean: false
          branch: gh-pages
          folder: docs

      - name: Deploy to Netlify
        if: contains(env.isPush, 'false')
        id: netlify-deploy
        uses: nwtgck/actions-netlify@v1.1
        with:
          publish-dir: './docs'
          production-branch: master
          github-token: ${{ secrets.GITHUB_TOKEN }}
          overwrites-pull-request-comment: true
          deploy-message:
            'Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})'
        timeout-minutes: 1


================================================
FILE: .github/workflows/revdepcheck.yaml
================================================
# Workflow derived from https://github.com/r-devel/recheck?tab=readme-ov-file#how-to-use-with-github-actions
on:
  workflow_dispatch:
    inputs:
      which:
        type: choice
        description: Which dependents to check
        options:
        - strong
        - most

name: Reverse dependency check

jobs:
  revdep_check:
    name: Reverse check ${{ inputs.which }} dependents
    uses: r-devel/recheck/.github/workflows/recheck.yml@v1
    with:
      which: ${{ inputs.which }}


================================================
FILE: .github/workflows/rhub.yaml
================================================
# R-hub's generic GitHub Actions workflow file. Its canonical location is at
# https://github.com/r-hub/actions/blob/v1/workflows/rhub.yaml
# You can update this file to a newer version using the rhub2 package:
#
# rhub::rhub_setup()
#
# It is unlikely that you need to modify this file manually.

name: R-hub
run-name: "${{ github.event.inputs.id }}: ${{ github.event.inputs.name || format('Manually run by {0}', github.triggering_actor) }}"

on:
  workflow_dispatch:
    inputs:
      config:
        description: 'A comma separated list of R-hub platforms to use.'
        type: string
        default: 'linux,windows,macos'
      name:
        description: 'Run name. You can leave this empty now.'
        type: string
      id:
        description: 'Unique ID. You can leave this empty now.'
        type: string

jobs:

  setup:
    runs-on: ubuntu-latest
    outputs:
      containers: ${{ steps.rhub-setup.outputs.containers }}
      platforms: ${{ steps.rhub-setup.outputs.platforms }}

    steps:
    # NO NEED TO CHECKOUT HERE
    - uses: r-hub/actions/setup@v1
      with:
        config: ${{ github.event.inputs.config }}
      id: rhub-setup

  linux-containers:
    needs: setup
    if: ${{ needs.setup.outputs.containers != '[]' }}
    runs-on: ubuntu-latest
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.containers) }}
    container:
      image: ${{ matrix.config.container }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/run-check@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}

  other-platforms:
    needs: setup
    if: ${{ needs.setup.outputs.platforms != '[]' }}
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.label }}
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.platforms) }}

    steps:
      - uses: r-hub/actions/checkout@v1
      - uses: r-hub/actions/setup-r@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/platform-info@v1
        with:
          token: ${{ secrets.RHUB_TOKEN }}
          job-config: ${{ matrix.config.job-config }}
      - uses: r-hub/actions/setup-deps@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}
      - uses: r-hub/actions/run-check@v1
        with:
          job-config: ${{ matrix.config.job-config }}
          token: ${{ secrets.RHUB_TOKEN }}


================================================
FILE: .gitignore
================================================
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# User-specific files
.Ruserdata
# Example code in package build process
*-Ex.R
# Output files from R CMD build
/*.tar.gz
# Output files from R CMD check
/*.Rcheck/
# RStudio files
.Rproj.user/
# produced vignettes
vignettes/*.html
vignettes/*.pdf
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
# knitr and R markdown default cache directories
*_cache/
/cache/
# Temporary files created by R markdown
*.utf8.md
*.knit.md
# R Environment Variables
.Renviron
.DS_Store
docs
inst/doc
revdep


================================================
FILE: .vscode/extensions.json
================================================
{
    "recommendations": [
        "Posit.air-vscode"
    ]
}


================================================
FILE: .vscode/settings.json
================================================
{
    "[r]": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "Posit.air-vscode"
    },
    "[quarto]": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "quarto.quarto"
    }
}


================================================
FILE: DESCRIPTION
================================================
Type: Package
Package: nflfastR
Title: Functions to Efficiently Access NFL Play by Play Data
Version: 5.2.0.9012
Authors@R: c(
    person("Sebastian", "Carl", , "mrcaseb@gmail.com", role = "aut"),
    person("Ben", "Baldwin", , "bbaldwin206@gmail.com", role = c("cre", "aut")),
    person("Lee", "Sharpe", role = "ctb"),
    person("Maksim", "Horowitz", , "maksim.horowitz@gmail.com", role = "ctb"),
    person("Ron", "Yurko", , "ryurko@stat.cmu.edu", role = "ctb"),
    person("Samuel", "Ventura", , "samventura22@gmail.com", role = "ctb"),
    person("Tan", "Ho", role = "ctb"),
    person("John", "Edwards", , "edwards1860@gmail.com", role = "ctb")
  )
Description: A set of functions to access National Football League
    play-by-play data from <https://www.nfl.com/>.
License: MIT + file LICENSE
URL: https://nflfastr.com/, https://github.com/nflverse/nflfastR
BugReports: https://github.com/nflverse/nflfastR/issues
Depends: 
    R (>= 4.1.0)
Imports: 
    cli (>= 3.0.0),
    curl,
    data.table (>= 1.15.0),
    dplyr (>= 1.0.0),
    fastrmodels (>= 2.1.0),
    furrr,
    future,
    glue,
    janitor,
    lifecycle,
    mgcv,
    nflreadr (>= 1.2.0),
    progressr (>= 0.6.0),
    rlang (>= 0.4.7),
    stringr (>= 1.4.0),
    tibble (>= 3.0),
    tidyr (>= 1.0.0),
    xgboost (>= 1.1)
Suggests: 
    DBI,
    duckdb,
    gsisdecoder,
    nflseedR (>= 2.0.0),
    purrr (>= 0.3.0),
    rmarkdown,
    RSQLite,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3


================================================
FILE: LICENSE
================================================
YEAR: 2020
COPYRIGHT HOLDER: Sebastian Carl; Ben Baldwin


================================================
FILE: LICENSE.md
================================================
# MIT License

Copyright (c) 2020 Sebastian Carl; Ben Baldwin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

export(add_qb_epa)
export(add_xpass)
export(add_xyac)
export(build_nflfastR_pbp)
export(calculate_expected_points)
export(calculate_player_stats)
export(calculate_player_stats_def)
export(calculate_player_stats_kicking)
export(calculate_series_conversion_rates)
export(calculate_standings)
export(calculate_stats)
export(calculate_win_probability)
export(clean_pbp)
export(decode_player_ids)
export(fast_scraper)
export(fast_scraper_roster)
export(fast_scraper_schedules)
export(load_pbp)
export(load_player_stats)
export(load_rosters)
export(load_schedules)
export(load_team_stats)
export(missing_raw_pbp)
export(most_recent_season)
export(nflverse_sitrep)
export(report)
export(save_raw_pbp)
export(update_db)
export(update_pbp_db)
import(dplyr)
import(fastrmodels)
importFrom(data.table,"%between%")
importFrom(data.table,"%chin%")
importFrom(nflreadr,load_pbp)
importFrom(nflreadr,load_player_stats)
importFrom(nflreadr,load_rosters)
importFrom(nflreadr,load_schedules)
importFrom(nflreadr,load_team_stats)
importFrom(nflreadr,most_recent_season)
importFrom(nflreadr,nflverse_sitrep)
importFrom(rlang,"%||%")
importFrom(rlang,":=")
importFrom(rlang,.data)
importFrom(rlang,.env)
importFrom(xgboost,getinfo)


================================================
FILE: NEWS.md
================================================
# nflfastR (development version)

- Added new function `update_pbp_db()`, a fresh approach to the database helper. (#544)
- Added `"game_id"` to the output of `calculate_stats()` if `summary_level == "week"`. (#566)
- Fixed a bug where `fixed_drive` did not increment after a muffed blocked field goal attempt. Yes, this happened in `"2025_10_NO_CAR"`, play id 2504. (#567)
- nflfastR stopped supporting the 1999 and 2000 seasons because of inconsistent data sources. Data is still available through `load_pbp()` but we will not fix any issues related to those old seasons anymore. It's possible to install nflfastR v5.2.0 (with `pak::pak("nflverse/nflfastR@v5.2.0")`) to parse those seasons if necessary. (#568)
- Implemented a fresh approach to compute `play_type` based on `play_type_nfl` for faster and more consistent output. (#568)
- Fixed a bug where nflfastR overwrote the kickoff_attempt variable in the event of a penalty on a kickoff. (#569)
- Added various definitions of 'explosive' plays to the output of `calculate_stats()`. It counts passes, runs, and receptions with 10+, 20+, 40+ yards gained as well as 12+ yard runs and 16+ yard passes. (#573)
- Added several punting stats to the output of `calculate_stats()`. (#574)
- Added overall fumble counters to the output of `calculate_stats()` because it was missing some edge case fumbles on offense. (#575)
- The `play_type` variable may now show `"pass"` or `"run"` on 2-point conversion plays with a post-snap penalty enforced between downs. This differs from `play_type_nfl`, which shows `"PENALTY"` in these cases. (#579)
- Fixed bug where `calculate_stats()` counted fumble recoveries in `fumble_recovery_yards_own` and `fumble_recovery_yards_opp` instead of the corresponding yards. (#584)
- Fixed bug where `calculate_stats()` counted some blocked punts as punt attempts that officially do not count as punt attempts. (#584)
- Fixed bug where `calculate_stats()` overcounted first downs in some edge cases. (#587)
- nflfastR now loads raw play-by-play data from season based releases in the `nflverse/nflverse-pbp` GitHub repository. The legacy repository `nflverse/nflfastR-raw` is deprecated and won't update in future seasons. This means that previous nflfastR versions won't be able to download 2026+ seasons! (#589)

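The explosive-play thresholds listed in the development notes above can be illustrated with a small base R sketch. The toy data frame, its column names (`play_type`, `yards_gained`), and the helper `count_explosive()` are hypothetical and are not nflfastR internals; `calculate_stats()` may define and name these counters differently, and receptions are left out here for brevity.

```r
# Toy play-level data; columns and values are made up for illustration.
plays <- data.frame(
  play_type    = c("pass", "run", "pass", "run", "pass", "run"),
  yards_gained = c(8, 12, 16, 25, 41, 3)
)

# Hypothetical helper: count plays of one type that gain at least
# `min_yards` yards.
count_explosive <- function(df, type, min_yards) {
  sum(df$play_type == type & df$yards_gained >= min_yards)
}

# Thresholds named in the changelog entry: 10+, 20+, 40+ yards for both
# play types, plus 16+ yard passes and 12+ yard runs.
explosive <- c(
  pass_10 = count_explosive(plays, "pass", 10),
  pass_16 = count_explosive(plays, "pass", 16),
  pass_20 = count_explosive(plays, "pass", 20),
  pass_40 = count_explosive(plays, "pass", 40),
  run_10  = count_explosive(plays, "run", 10),
  run_12  = count_explosive(plays, "run", 12),
  run_20  = count_explosive(plays, "run", 20),
  run_40  = count_explosive(plays, "run", 40)
)
print(explosive)
```

Because the thresholds are nested, a single long play contributes to every counter at or below its yardage, so the counters are not mutually exclusive.
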
# nflfastR 5.2.0

- Bump required fastrmodels version to 2.0 for better compatibility with xgboost.
- Fixed an issue with duplicated play IDs in some 2000 games. (#521)
- Added the argument `pbp` to `calculate_stats()` to allow stats calculation based on subsets of nflverse play-by-play data. (#524)
- Fixed a bug where `calculate_stats()` didn't count 60 yard field goal attempts in `"fg_made_60_"` and `"fg_missed_60_"`. (#531)
- Fixed a bug where `clean_pbp()` did not provide a passer on plays where scrambles were manually adjusted based on data from Aaron Schatz. (#536)
- nflfastR now directly reexports nflreadr's `load_pbp()`, `load_player_stats()`, and `load_team_stats()`. This means that the functions can be called normally via nflfastR, but are no longer available in the documentation (whether in the R Help or on the pkgdown website). Instead, only links to nflreadr are included. This ensures that the documentation is always up to date. (#538)
- `fast_scraper_roster()` and `fast_scraper_schedules()` are officially deprecated and will be removed in a future update. Please use `load_rosters()` and `load_schedules()`. (#539)
- `report()` is deprecated and will be removed in a future update. Please use `nflverse_sitrep()`. (#540)
- Fixed incompatibility with xgboost v3 model outputs. (#553)
- Added `"Kickoff Out of Bounds"` (introduced in the 2024 season) to the `penalty_type` variable in play-by-play. (#560)

Thank you to &#x0040;Doug-Analytics, &#x0040;isaactpetersen, &#x0040;jeleff1000, &#x0040;JoeMarino2021, &#x0040;kbannon77, &#x0040;lancejames35, &#x0040;LinkedInMindset, &#x0040;manbradcalf, &#x0040;mrcaseb, &#x0040;thedfszone, &#x0040;TheMathNinja, and &#x0040;zaynpatel for their questions, feedback, and contributions towards this release.

# nflfastR 5.1.0

- The function `calculate_standings()` has been deprecated. Please use `nflseedR::nfl_standings()` in nflseedR v2.0 instead. (#510)
- nflfastR now requires R 4.1 to allow the package to use R's native pipe `|>` operator. This follows the [Tidyverse R version support rules](https://tidyverse.org/blog/2019/04/r-version-support/). (#511)
- Fixed a bug where `calculate_stats()` incorrectly counted `receiving_air_yards`. (#500)
- Fixed a bug where `vegas_wp` variables were broken when `spread_line` data was missing. (#503)
- Fixed a bug where `calculate_stats()` incorrectly calculated `target_share` and `air_yards_share` when `summary_level = "season"`. (#505)
- Fixed a bug where `calculate_stats()` incorrectly counted `fumbles`. (#514)
- Compatibility improvements with xgboost. (#517)

Thank you to &#x0040;ak47twq, &#x0040;isaactpetersen, &#x0040;jacobakaye, &#x0040;johnpholden, &#x0040;marvin3FF, &#x0040;mrcaseb, and &#x0040;tanho63 for their questions, feedback, and contributions towards this release.

# nflfastR 5.0.0

## Major Changes

- Added new function `calculate_stats()` that combines the output of all `calculate_player_stats*()` functions with a more robust and faster approach. The `calculate_player_stats*()` functions will be deprecated in a future release. (#470)
- Added new exported dataframe `nfl_stats_variables`. It lists and explains all variables returned by `calculate_stats()`. A searchable table is available at <https://nflfastr.com/articles/stats_variables.html>. (#470)

## Bug Fixes and Minor Changes

- Drop `{crayon}`, `{DT}`, `{httr}`, `{jsonlite}`, `{qs}` dependencies. (#453)
- The function `calculate_player_stats_def` now returns `season_type` if argument `weekly` is set to `TRUE` for consistency with the other player stats functions. (#455)
- The function `missing_raw_pbp()` now allows filtering by season. (#457)
- More robust handling of player IDs in `decode_player_ids()`. (#458)
- Fixed rare cases where the value of the `yrdln` variable didn't equal `"MID 50"` at midfield. (#459)
- Fixed rare cases where `drive_start_yard_line` missed the blank space between team name and yard line number. (#459)
- Fixed play description in some 1999 and 2000 games where the string "D.Holland" replaced the kick distance. (#459)
- Fixed a problem where the `goal_to_go` variable was `FALSE` in actual goal to go situations. (#460)
- Fixed a bug in `fixed_drive` and `fixed_drive_result` where the second weather delay in `2023_13_ARI_PIT` wasn't identified correctly. (#461)
- `punter_player_id` and `punter_player_name` are filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown. (#466)
- Added identification of scrambles from 1999 through 2004 with thanks to Aaron Schatz. (#468, #489)
- Updated the dataframe `stat_ids` with some IDs that were previously missing. (#470)
- nflfastR tried to fix bugs in the underlying pbp data of JAX home games prior to the 2016 season. An update of the raw pbp data resolved those bugs so nflfastR needs to remove the hard coded adjustments. This means that nflfastR <= v4.6.1 will return incorrect pbp data for all Jacksonville home games prior to the 2016 season! (#478)
- Fixed a problem where `clean_pbp()` returned `pass = 1` in actual rush plays in very rare cases. (#479)
- Removed extra lines for injury timeouts that were breaking `fixed_drive` (#482)
- The variable `penalty_type` now correctly lists the penalty "Kickoff Short of Landing Zone" introduced in the 2024 season. (#486)
- Fixed a bug where `ep` was incorrect on PAT attempts preceded by a timeout and then a penalty (extremely rare). This bug also caused the variables `total_home_epa` and `total_away_epa` to be incorrect for all subsequent plays in the same game. (#493)

Thank you to &#x0040;ahmed-cheema, &#x0040;andrewtek, &#x0040;guga31bb, &#x0040;isaactpetersen, &#x0040;JoeMarino2021, &#x0040;john-b-edwards, &#x0040;marcusSasser, &#x0040;mlounsberry, &#x0040;morganandrew, &#x0040;mrcaseb, &#x0040;mscoop16, &#x0040;parsnipz, &#x0040;rjthompson2, and &#x0040;Useight for their questions, feedback, and contributions towards this release.

# nflfastR 4.6.1

- The function `calculate_series_conversion_rates()` now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
- Adjusted test behavior at CRAN's request. 

Thank you to
&#x0040;andrewtek, &#x0040;gregalvi86, &#x0040;Ic4ru5Wing, &#x0040;JoeMarino2021, &#x0040;jreddy1990, &#x0040;marvin3FF, &#x0040;mrcaseb, &#x0040;RicShern, &#x0040;SPNE, and &#x0040;trivialfis for their questions, feedback, and contributions towards this release.

# nflfastR 4.6.0

## New Features

- nflfastR now fully supports loading raw pbp data from local file system. The best way to use this feature is to set `options("nflfastR.raw_directory" = {"your/local/directory"})`. Alternatively, both `build_nflfastR_pbp()` and `fast_scraper()` support the argument `dir` which defaults to the above option. (#423)
- Added the new function `save_raw_pbp()` which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to setup the system for faster play-by-play parsing via the above functionality. (#423)
- Added the new function `missing_raw_pbp()` that computes a vector of game IDs missing in the local raw play-by-play directory. (#423)

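The local raw play-by-play workflow described in the New Features list above might be wired together roughly as follows. This is a sketch, not the package's documented recipe: the wrapper name `setup_local_raw_pbp()` and the example directory are hypothetical, and it assumes that `save_raw_pbp()`, `missing_raw_pbp()`, and `build_nflfastR_pbp()` accept the `dir` argument mentioned in the changelog. Calling the wrapper requires the nflfastR package and network access.

```r
# Hypothetical wrapper around the functions named in the changelog.
# Defining it is harmless; calling it needs nflfastR and a network connection.
setup_local_raw_pbp <- function(game_ids, dir) {
  # Make sure the target directory exists and register it as the default
  # raw directory for this R session.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  options(nflfastR.raw_directory = dir)

  # Download the raw play-by-play files for the requested games.
  nflfastR::save_raw_pbp(game_ids, dir = dir)

  # Game IDs that are still not available in the local directory.
  missing <- nflfastR::missing_raw_pbp(dir = dir)

  # Parse play-by-play data, reading raw files from the local directory.
  pbp <- nflfastR::build_nflfastR_pbp(game_ids, dir = dir)

  list(pbp = pbp, missing = missing)
}

# Example call (not run here; the game ID is only an illustration):
# res <- setup_local_raw_pbp("2021_20_BUF_KC", file.path(tempdir(), "raw_pbp"))
```

Setting the option once means later calls can omit `dir`, since the changelog states that the argument defaults to that option.
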
## Minor Improvements and Bugfixes

- The internal function `get_pbp_nfl()` now uses `ifelse()` instead of `dplyr::if_else()` to handle some null-checking, which fixes a bug found in the `2022_21_CIN_KC` game.
- The function `calculate_player_stats()` now summarises target share and air yards share correctly when called with argument `weekly = FALSE` (#413)
- The function `calculate_player_stats()` now returns the opponent team when called with argument `weekly = TRUE` (#414)
- The function `calculate_player_stats_def()` no longer errors when small subsets of pbp data are missing stats. (#415)
- The function `calculate_series_conversion_rates()` no longer returns `NA` values if a small subset of pbp data is missing series on offense or defense. (#417)
- `fixed_drive` now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
- nflfastR now fixes missing drive number counts in raw pbp data in order to provide accurate drive information. (#420)
- nflfastR now returns correct `kick_distance` on all punts and kickoffs. (#422)
- Decode player IDs in 2023 pbp. (#425)
- Drop the pseudo plays TV Timeout and Two-Minute Warning. (#426)
- Fix posteam on kickoffs and PATs following a defensive TD in 2023+ pbp. (#427)
- `calculate_player_stats()` no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
- The variables `passer`, `receiver`, and `rusher` no longer return `NA` on "abnormal" plays - like direct snaps, aborted snaps, laterals etc. - that resulted in a penalty. (#435)

Thank you to
&#x0040;903124, &#x0040;ak47twq, &#x0040;andrewtek, &#x0040;darkhark, &#x0040;dennisbrookner, &#x0040;marvin3FF, &#x0040;mistakia, &#x0040;mrcaseb, &#x0040;nicholasmendoza22, &#x0040;rickstarblazer, &#x0040;RileyJohnson22, and &#x0040;tanho63 for their questions, feedback, and contributions towards this release.

# nflfastR 4.5.1

* New implementation of tests to be able to identify breaking changes in reverse dependencies (#396, #406)
* `calculate_standings()` no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
* Fixed a bug that caused problems with upcoming dplyr and tidyselect updates that weren't backward compatible.
* Significant performance improvements of internal functions. (#402)
* Wrap examples in `try()` to avoid CRAN problems. (#404)
* Fixed a bug where `calculate_standings()` wasn't able to handle nflverse pbp data. (#404)

# nflfastR 4.5.0

## New (experimental) functions
* Added new function `calculate_player_stats_def()` that aggregates defensive player stats either at game level or overall. (#288)
* Added the situation report `nflverse_sitrep`, which is an alias of the already available `report()`.
* Added new function `calculate_player_stats_kicking()` that aggregates player stats for field goals and extra points at game level or overall. (#381)
* Added new function `calculate_series_conversion_rates()` that computes series conversion and series result rates at a game level or season level. (#393)

## Bugfixes and Minor Improvements

* Internal change to `calculate_player_stats()` that reflects new nflverse data infrastructure.
* `calculate_player_stats()` now unifies player names and joins the following player information via `nflreadr::load_players()`:
  - `player_display_name` - Full name of the player
  - `position` - Position of the player
  - `position_group` - Position group of the player
  - `headshot_url` - URL to a player headshot image
* Make data work in 2022 (hopefully)
* Fix Amon-Ra St. Brown breaking the name parser
* Add gsis_id patch to `clean_pbp()`.
* Fixed a bug where `calculate_player_stats_def()` failed in situations where play-by-play data is missing certain stats. (#382)
* Spot-fixing `calculate_player_stats()` for `NA` names.

# nflfastR 4.4.0

## New Functions, Options, Data

* Added new function `calculate_standings()` that computes regular season division standings and playoff seeds from nflverse data.
* The database function `update_db()` now supports the option "nflfastR.dbdirectory" which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
* The embedded data frame `?teams_colors_logos` has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo urls to the conference and league logos. (#290)
* The embedded data frame `?teams_colors_logos` has been updated with the Washington Commanders. (#312)
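A minimal sketch of how the "nflfastR.dbdirectory" option could be set for a session. The directory used here is only an example location, and the `update_db()` call is left commented out because it needs the package and an internet connection.

```r
# Assumption: this temporary directory is only an example location.
db_dir <- file.path(tempdir(), "nflfastR_db")
dir.create(db_dir, showWarnings = FALSE)

# Set the option once per session (or in an .Rprofile file).
options(nflfastR.dbdirectory = db_dir)

# Read it back; "." is used here as a fallback when the option is unset.
current_dir <- getOption("nflfastR.dbdirectory", default = ".")
print(current_dir)

# With the option set, a plain call is expected to use that directory:
# nflfastR::update_db()
```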

## Deprecation

* The argument `qs` in the functions `load_pbp()` and `load_player_stats()` has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely. 

## Bugfixes and Minor Improvements

* Fixed bug where a player could be duplicated in `calculate_player_stats()` in very rare cases caused by plays with laterals. (#289)
* Fixed a bug where the function `add_xpass()` failed when called with an empty data frame. (#296)
* Fixed a bug where `play_type` showed `no_play` on plays with penalties that don't result in a replay of the down. (#277, #281)
* Fixed a bug in the variable descriptions of `total_home_score` and `total_away_score`. (#300)
* `fast_scraper_rosters()` and `fast_scraper_schedules()` now call `nflreadr::load_rosters()` and `nflreadr::load_schedules()` under the hood (#304)
* Fixed a bug causing missing EPA on game-ending turnovers in overtime
* Bump minimum nflreadr version to 1.2.0 for data repository change
* Fix a bug affecting yardline for a very small number of plays in the 2000 season (#323)
* `update_db()` now uses a default play to predefine column types for all db drivers. (#324)
* Fix a bug that resulted in incorrect `xyac_mean_yardage` on 4th downs (#327)
* Fix a bug that resulted in missing `xyac` information for plays involving J.O'Shaughnessy (#329)
* Fix a bug that resulted in missing `epa` on the last play of some games involving NE and BUF (#331)
* `fast_scraper()` and `build_nflfastR_pbp()` now return data frames of class `nflverse_data` to be consistent with `nflreadr`.
* Fix behavior of EP model in neutral site games (treats both teams as away teams)

# nflfastR 4.3.0

## Minor Changes

* Add [nflreadr](https://nflreadr.nflverse.com/) to dependencies and drop the lubridate and magrittr dependencies
* The functions `load_pbp()` and `load_player_stats()` now call `nflreadr::load_pbp()` and `nflreadr::load_player_stats()` respectively. Therefore the argument `qs` has been deprecated in both functions. It will be removed in a future release. Running `load_player_stats()` without any argument will now return player stats of the current season only (the default in `nflreadr`).
* The deprecated arguments `source` and `pp` in the functions `fast_scraper_*()` and `build_nflfastR_pbp()` have been removed
* Added the variables `racr` ("Receiver Air Conversion Ratio"), `target_share`, `air_yards_share`, `wopr` ("Weighted Opportunity Rating") and `pacr` ("Passing Air Conversion Ratio") to the output of `calculate_player_stats()`
* Added the function `report()` which will be used by the maintainers to help users debug their problems (#274).
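The ratio and share variables listed above can be illustrated on toy numbers. This sketch uses the commonly cited definitions (RACR as receiving yards per air yard, WOPR as 1.5 times target share plus 0.7 times air yards share); that `calculate_player_stats()` computes them in exactly this way is an assumption, and the data frame and its column names are invented for the example.

```r
# Hypothetical per-player totals for one team (made-up numbers).
stats <- data.frame(
  player = c("WR A", "WR B"),
  targets = c(10, 5),
  receiving_yards = c(120, 40),
  receiving_air_yards = c(100, 80),
  team_targets = c(30, 30),
  team_air_yards = c(300, 300)
)

# Share of the team's targets and air yards.
stats$target_share <- stats$targets / stats$team_targets
stats$air_yards_share <- stats$receiving_air_yards / stats$team_air_yards

# Receiver Air Conversion Ratio: receiving yards per air yard.
stats$racr <- stats$receiving_yards / stats$receiving_air_yards

# Weighted Opportunity Rating: weighted sum of both shares.
stats$wopr <- 1.5 * stats$target_share + 0.7 * stats$air_yards_share

print(stats[, c("player", "target_share", "air_yards_share", "racr", "wopr")])
```

PACR follows the same pattern as RACR on the passing side: passing yards divided by passing air yards.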

## Bug Fixes

* Fixed a minor bug in the console output of `update_db()`
* Fix for a handful of missing `receiver` names (#270)
* Fixed bug with missing `return_team` on interception return touchdowns (#275)
* Fixed a rare bug where an internal object wasn't predefined (#272)

# nflfastR 4.2.0

* All `wpa` variables are `NA` on end game line
* All `wp` variables are 0, 0.5, 1, or `NA` on end game line
* Fix bug where win prob on PATs assumed a PAT placed at 15 yard line, even in older seasons
* The function `decode_player_ids()` now really decodes the new variable `fantasy_id` (#229)
* Fixed a bug that caused slightly differing `wp` values depending on the first game in the data set (#183)
* Edited GitHub references to point to nflverse
* Added the variables `sack_yards`, `sack_fumbles`, `rushing_fumbles` and `receiving_fumbles` to the output of the function `calculate_player_stats()`, thanks to Mike Filicicchia (@TheMathNinja). (#239)
* Fixed a bug where `calculate_player_stats()` falsely counted lost fumbles on aborted snaps (#238)
* Added the variable `season_type` to the output of `calculate_player_stats()` and `load_player_stats()` in preparation of the extended Regular Season starting in 2021 (#240)
* Updated `season_type` definitions in preparation of the extended Regular Season starting in 2021 (#242)
* Fix for `fixed_drive` where it wasn't incrementing when there was a muffed punt followed by timeout (#244)
* Fix for `fixed_drive` where it wasn't incrementing following an interception with the intercepting player then losing a fumble (#247)
* Fix for more issues with missing play info in 2018_01_ATL_PHI (#246)
* Added the variables `safety_player_name` and `safety_player_id` to the play-by-play data (#252)
* Dropped the dependency `usethis`

# nflfastR 4.1.0

## Breaking changes

### Functions

* Added the function `calculate_player_stats()` that aggregates official passing, rushing, and receiving stats either at game level or overall
* Added the function `load_player_stats()` that loads weekly player stats from 1999 to the most recent season
* The performance of the functions `add_xyac()` and `clean_pbp()` has been significantly improved

### New Variables

* Added the new columns `td_player_name` and `td_player_id` to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
* The function `calculate_player_stats()` now adds the variable `dakota`, the `epa` + `cpoe` composite, for players with minimum 5 pass attempts.
* Added column `home_opening_kickoff` to `clean_pbp()`
* Added the variables `sack_player_id`, `sack_player_name`, `half_sack_1_player_id`, `half_sack_1_player_name`, `half_sack_2_player_id` and `half_sack_2_player_name` which identify players that recorded sacks (or half sacks). Also updated the description of the variables `qb_hit_1_player_id`, `qb_hit_1_player_name`, `qb_hit_2_player_id` and `qb_hit_2_player_name` to make clearer that they did not record a sack. (#180)

## Minor improvements and fixes

* The variable `qb_scramble` was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go). Plays nullified by penalty are not included.
* Change `name`, `id`, `rusher`, and `rusher_id` to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
* The function `clean_pbp()` now standardizes the team name columns `tackle_with_assist_*_team`
* Fix bug in `drive` that was causing incorrect overtime win probabilities (#194)
* Fixed a bug where `posteam` was not `NA` on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for `fixed_drive`, `fixed_drive_result`, `series` and `series_result`
* Fixed a bug where `fixed_drive` and `series` were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
* Fixed a bug where `fixed_drive` and `series` were falsely incrementing on muffed punts recovered by the punting team for a touchdown
* Fixed a bug where `add_xpass()` crashed when run with data already including xpass variables.
* Fixed a bug in `epa` when a safety is scored by the team beginning the play in possession of the ball (#186)
* Fix some bugs related to David and Duke Johnson on the Texans in 2020 (#163)
* Fix yet another bug related to correctly identifying possession team on kickoffs nullified by penalty (#199)
* Fixed a bug where `calculate_player_stats()` forgot to clean player names by using their IDs
* Fixed a bug where special teams touchdowns were missing in the output of `calculate_player_stats()` (#203)
* Fix for some old Jaguars games where the wrong team was awarded points for safeties and kickoff return TDs (#209)
* The function `update_db()` no longer falsely closes a database connection provided by the argument `db_connection` (#210)
* Fixed a bug where `yards_gained` was missing yardage on plays with laterals. (#216)
* Fixed a bug where there were stats wrongly given on a play with penalty (#218)
* `fixed_drive` now increments properly on onside kick recoveries (#215)
* `fixed_drive` no longer counts a muffed kickoff as a one-play drive on its own (#217)
* `fixed_drive` now properly increments after a safety (#219)
* Improved parser for `penalty_type` and updated the description of the variable to make clearer that it's the first penalty that happened on a play. (#223)

# nflfastR 4.0.0

## Breaking changes

### Changed Functions

* Deprecated the arguments `source` and `pp` all across the package. Using them will cause a 
warning. Parallel processing has to be activated by choosing an appropriate `future::plan()` before
calling the relevant functions. For more information please see [the package documentation](https://nflfastr.com/reference/nflfastR-package.html).
* The function `build_nflfastR_pbp()` will now run `decode_player_ids()` by default (can be deactivated with the argument `decode = FALSE`). 
* The function `build_nflfastR_pbp()` will now run `add_xpass()` by default and add the new variables `xpass` and `pass_oe`.
* The functions `fast_scraper()` and `build_nflfastR_pbp()` now allow the output of `fast_scraper_schedules()` directly as input so it's not necessary anymore to pull the `game_id` first.
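Choosing a `future::plan()` before calling the relevant functions could look like the following sketch. It assumes nflfastR and future are installed and an internet connection is available; the helper name `build_pbp_in_parallel()` and its `workers` argument are hypothetical and not part of the package.

```r
# Hypothetical helper: activates parallel processing, builds play-by-play
# data, and restores the previous plan afterwards.
build_pbp_in_parallel <- function(game_ids, workers = 2) {
  # Pick the parallel strategy first; plan() returns the previous plan.
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  # `game_ids` may be a vector of game IDs or, per the notes above,
  # the output of fast_scraper_schedules().
  nflfastR::build_nflfastR_pbp(game_ids)
}

# Example call (needs network access, so it is not run here):
# pbp <- build_pbp_in_parallel(c("2019_01_GB_CHI", "2019_01_ATL_MIN"))
```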

### New Functions and Variables

* Added the new function `load_pbp()` that loads complete seasons into memory for fast access of the play-by-play data.
* Added the new variables `rushing_yards`, `lateral_rushing_yards`, `passing_yards`, `receiving_yards`, `lateral_receiving_yards` to fix an old bug where `yards_gained` gets overwritten on plays with laterals (#115).
* Added columns `vegas_wpa` and `vegas_home_wpa` which contain Win Probability Added from the spread-adjusted WP model
* Added column `out_of_bounds`
* Added columns `fantasy`, `fantasy_id`, `fantasy_player_name`, and `fantasy_player_id` that indicate the rusher or receiver on the play
* Added columns `tackle_with_assist`, `tackle_with_assist_1_player_id`, `tackle_with_assist_1_player_name`, `tackle_with_assist_1_team`, `tackle_with_assist_2_player_id`, `tackle_with_assist_2_player_name`, `tackle_with_assist_2_team`

### Models and Miscellaneous

* Tuned spread-adjusted win probability model one final (?) time. Expected points is no longer 
required for `calculate_win_probability()`
* Added field descriptions `vignette("field_descriptions")` with a searchable list of all nflfastR variables
* Switched data source for 2001-2010 to what is used for 2011 and on
* All models have been moved to the [fastrmodels](https://cran.r-project.org/package=fastrmodels) package
* Added the data frames `?field_descriptions` and `?stat_ids` to the package

## Minor improvements and fixes

* Fix bug where `fixed_drive` and `series` weren't updating after muffed punt (#144)
* Fix bug induced by fixing the above (#149)
* Fix bug where some special teams plays were incorrectly being labeled as pass plays (#125)
* Fix bug where points for safeties were given to the `defteam` instead of the `posteam` (#152)
* Fix bug where a muffed punt TD was given to the wrong team in a 2011 Jaguars game (#154)
* Win probability is now calculated prior to PAT attempts rather than using WP on the ensuing kickoff
* Improved performance of internal functions that speed up the rebuilding process in `update_db()`
(added `qs` and `curl` to dependencies)
* Fixed a bug where `calculate_expected_points()` and `calculate_win_probability()` duplicated some existing variables instead of replacing them (#170)
* Fixed a bug where `penalty_type` wasn't `"no_play"` although it should have been (#172)
* Fixed a bug where `penalty_team` could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
* Fixed a bug related to the calculation of `epa` on plays before a failed pass interference challenge in a few 2019 games (#175)
* Fixed a bug related to lots of fields with `NA` on offsetting penalties (#44)
* Fixed a bug in `epa` when possession team changes at end of 1st or 3rd quarter (#182)
* Fixed a bug where various functions have left open connections
* `vegas_wp` is now `NA` on final line since there is no possession team


# nflfastR 3.2.0

## Models

* Performance update for win probability model with point spread (`vegas_wp`)
* Added `yardline_100` as an input to both win probability models (not having it included was an oversight)

## Minor improvements and fixes

* Fixed a bug where `series` was increased on PATs
* Fixed a bug affecting the week 10 Raiders-Broncos game
* Added the column `team_wordmark` - which contains URLs to the team's wordmarks - to the included data frame `?teams_colors_logos`

# nflfastR 3.1.1

## New features

### Database Function `update_db()`

* The argument `force_rebuild` of the function `update_db()` is now of hybrid 
type. It can rebuild the play by play data table either for the whole nflfastR 
era (with `force_rebuild = TRUE`) or just for specified seasons 
(e.g. `force_rebuild = 2019:2020`).
The latter is intended to be used for ongoing seasons because the NFL fixes bugs
in the play by play data during the week and we recommend rebuilding the current 
season every Thursday.
* Fixed a bug where `update_db()` disconnected the connection to a database provided 
by the argument `db_connection` (#102)
* Fixed a bug where `update_db()` didn't build a fresh database without providing
the argument `force_rebuild`
* `update_db()` no longer removes the complete data table when a numeric argument 
`force_rebuild` is passed but only removes the rows within the table (#109)
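The two forms of `force_rebuild` described above can be sketched as follows. The wrapper name `refresh_seasons()` is hypothetical, the calls assume nflfastR plus the DBI and RSQLite packages and an internet connection, and the temporary directory is only an example location.

```r
# Hypothetical wrapper around update_db() that accepts either TRUE
# (rebuild everything) or a numeric vector of seasons (rebuild only those).
refresh_seasons <- function(seasons, dbdir = tempdir()) {
  stopifnot(isTRUE(seasons) || is.numeric(seasons))
  nflfastR::update_db(dbdir = dbdir, force_rebuild = seasons)
}

# Example calls (need network access, so they are not run here):
# refresh_seasons(2019:2020)  # rebuild only the 2019 and 2020 seasons
# refresh_seasons(TRUE)       # rebuild the whole play-by-play table
```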

### New Functions

* Added the new function `build_nflfastR_pbp()`, a convenient wrapper around 
multiple nflfastR functions for an easy creation of the nflfastR play-by-play data set
* Added a function that applies our experimental expected pass model, `add_xpass()`,
that creates columns `xpass` and `pass_oe`

## Minor improvements and fixes

* More fixes for `fixed_drive` which was not incrementing properly on drives
that began following a timeout
* Fixed more bugs in EPA and win probability on PATs and kickoffs with penalties
* Fixed a bug where scoring probabilities weren't adding to 1 on field goal 
attempts near the end of a half
* Messages to the user are now created with the new dependency `usethis`
* Fixed bug where plays with "backward pass" in play description were counted as 
pass plays (`pass` = 1)
* Fixed missing kick distance on touchbacks and blocked punts (#53)
* Added the option `fast` (either `TRUE` or `FALSE`) to the function 
`decode_player_ids()` to activate the highly efficient C++ decoder of the package 
[`gsisdecoder`](https://cran.r-project.org/package=gsisdecoder)

# nflfastR 3.0.0

## Breaking changes

* `fast_scraper_roster()` is finally back! It loads the NFL roster of a given season.
* Added the function `decode_player_ids()` to decode all player IDs to the 
commonly known GSIS ID format (00-00xxxxx)
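As an illustration of what decoding means here, the new-format ID quoted in the 2.1.1 notes of this file can be read as hex-encoded text that contains a GSIS ID. The base R sketch below is not the implementation of `decode_player_ids()`; the byte positions 3 to 12 are an assumption read off this single example.

```r
# New-format player ID as quoted in the 2.1.1 notes of this file.
encoded_id <- "32013030-2d30-3032-3334-3336b638d37d"

# Drop the dashes and split the remaining string into two-character hex bytes.
hex <- gsub("-", "", encoded_id, fixed = TRUE)
starts <- seq(1, nchar(hex), by = 2)
bytes <- strtoi(substring(hex, starts, starts + 1), base = 16L)

# Assumption: bytes 3 to 12 hold the ASCII characters of the GSIS ID.
gsis_id <- rawToChar(as.raw(bytes[3:12]))
print(gsis_id)  # "00-0023436"
```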

## New features

* Add option `source = "old"` to `fast_scraper()` to enable scraping of old source.
This is mostly useless as it doesn't work for 2020 and provides less info
* Added new option `db_connection` to `update_db()` to allow advanced users to
use other DBI drivers, such as `RMariaDB::MariaDB()`, `RPostgres::Postgres()` or 
`odbc::odbc()` (please see [dbplyr](https://dbplyr.tidyverse.org/articles/dbplyr.html)
for more information)

## Minor improvements and fixes

* `clean_pbp()` now fixes some bugs in jersey numbers
* `clean_pbp()`, `add_qb_epa()` and `add_xyac()` can now handle empty data frames
* Fix empty line causing `fast_scraper()` to fail (affects multiple games of the 2020 season)
* Fix bug in `fixed_drive` that counted PAT after defensive TD as its own drive
* Fixed a bug which caused too high a number of tackles in special cases
* Fixed a bug where CPOE was NA when targeting players with apostrophe in last name

# nflfastR 2.2.1

* Fix `add_xyac()` breaking with some old packages
* Fix `add_xyac()` and `add_qb_epa()` calculations being wrong for some failed 4th downs
* Updated Readme with ep and cp model plots
* Updated `vignette("examples")` with the new `add_xyac()` function
* Added xYAC model to `vignette("nflfastR-models")`
* Added variables `fixed_drive` and `fixed_drive_result` to the output of 
`fast_scraper()` because the NFL-provided drive info is extremely buggy
* Added variable `series_result`
* `clean_pbp()` now adds 4 new variables `passer_jersey_number`, 
`rusher_jersey_number`, `receiver_jersey_number` and `jersey_number`. These can 
be used to join rosters. 
* Fixed incorrect `timeout_team`, `return_team`, `fumble_recovery_1_team` for JAX
games from 2011-2015
* Re-trained EPA model with `fixed_drive` and corrections to `timeout_team`

# nflfastR 2.2.0

* New function `add_xyac()` which adds the following columns associated with expected yards after
the catch (xYAC): `xyac_epa`, `xyac_success`, `xyac_fd`, `xyac_mean_yardage`, `xyac_median_yardage`

# nflfastR 2.1.3

* Fixed a bug in `series_success` caused by bad `drive` information provided by NFL

# nflfastR 2.1.2

* Added the following columns that are available 2011 and later: `special_teams_play`, `st_play_type`, `time_of_day`, and `order_sequence`
* Added `old_game_id` column (useful for merging to external data that still uses this ID: format is YYYYMMDDxx)
* The `clean_pbp()` function now adds an `aborted_play` column
* Fixed a bug where pass plays with a penalty at end of play were classified as `play_type` = `no_play` rather than `pass`
* Fixed bug where EPA on defensive 2 point return was -0.95 instead of -2.95
* Fixed some remaining failed challenge plays that incorrectly had 0 for EPA
* Updated the included dataframe `teams_colors_logos` for the interim name of 
the 'Washington Football Team' and the corresponding logo urls.
* Some internal code improvements causing the required `tidyselect` version
to be >= 1.1.0

# nflfastR 2.1.1

### Functions

* `clean_pbp()` now standardizes player IDs across the old (1999-2010) and new 
(2011+) data sources. Player IDs once again uniquely identify players, and each 
unique player has one unique ID (as they did before the NFL data source change):
    * For players whose careers finished before 2011, their IDs remain the same
    * For players who played in both eras or only in the new era, their ID is 
    the new ID
    * For example, Akili Smith (ID: 00-0015082) and Alex Smith 
    (ID: 32013030-2d30-3032-3334-3336b638d37d) are both abbreviated as "A.Smith" 
    but can be distinguished by their IDs, with Akili's showing what the old 
    format ID looks like, and Alex's the new one
    * Standardization is realized by using an ID map
    available in the data repo
    
* `clean_pbp()` now removes all variables it is about to create to make sure 
nothing unexpected can happen

### Miscellaneous

* Added minimum version requirements to some package dependencies because 
installation broke for some users with outdated packages

* Made a minor bug fix to catch more out-of-order plays and fixed a bug where some
plays were being incorrectly dropped in older seasons

* Standardized team names (e.g. `SD` --> `LAC`) in some columns we had missed

# nflfastR 2.1.0

### Models

* Removed `week` from Expected Points models along with an update of
`vignette("nflfastR-models")` and `vignette("examples")`

### Functions

* Added function `update_db()` which adds all completed games to a SQLite database
* Added function `calculate_win_probability()` 
* Added new examples to `vignette("examples")` demonstrating the usage of the
above mentioned functions

### Bugs

* Fixed a problem with inconsistent data types of the variable
`drive_real_start_time` pre and post 2011
* Fixed a problem where some `game_id`s were overwritten during the play by play parsing
* Fix some more WP bugs on kickoffs with penalties and rare play descriptions

### Miscellaneous

* `fast_scraper()` now loads the raw game data from a separate raw data repo
* Completely overhauled the entire code base to directly implement
[tidy evaluation](https://dplyr.tidyverse.org/articles/programming.html) using 
`.data` from the [rlang](https://rlang.r-lib.org/) package (this is a major 
code change that takes some getting used to but we need it in preparation of 
a future release)
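For readers unfamiliar with the `.data` pronoun, the following sketch shows the pattern on a toy data frame. It assumes dplyr is installed; the data and the helper `mean_of()` are invented for illustration and are not taken from the package code.

```r
library(dplyr)

# Toy play-by-play data (made-up values).
pbp <- data.frame(
  posteam = c("GB", "CHI", "GB", "CHI"),
  epa = c(0.5, -0.2, 1.1, 0.3)
)

# `.data$epa` refers unambiguously to the column `epa` of the data frame,
# never to a variable of the same name in the calling environment.
positive_plays <- filter(pbp, .data$epa > 0)

# `.data[[column]]` selects a column whose name is only known at run time.
mean_of <- function(data, column) {
  summarise(data, mean_value = mean(.data[[column]]))
}
overall_epa <- mean_of(pbp, "epa")

print(positive_plays)
print(overall_epa)
```

Inside a package, `.data` is typically imported from rlang so that `R CMD check` does not flag the column names as undefined global variables.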

# nflfastR 2.0.6

* Fixed a problem where defensive two point conversions were not counted
* Kneels on kickoffs are no longer counted as qb kneels
* Variable `yards_gained` more precisely defined
* Bugfixes for more games with out-of-order plays
* Fix bug related to EPA on plays with a failed pass interference challenge
* Added new example to `vignette("examples")` to demonstrate Expected Points 
calculator `calculate_expected_points()`
* Fix for WP on 2-pt conversion negated by penalty
* Add more variables (containing team names) to team standardization in `clean_pbp()`
* Fix WP for onside kicks

# nflfastR 2.0.5

* Fix yet another bug caused by NFL providing plays out of order
* Fix bugs related to penalties on PATs and kickoffs
* Fix bugs related to NFL providing wrong scoring team on defensive touchdowns 
in older games involving the Jaguars
* Fix some minor issues related to wrong `first_down_rush` and `return_touchdown`
* Improved error handling of `fast_scraper()` for not yet played games
* Improved variable documentation and prepared for new website
* Improved performance for dplyr v1.0.0
* Rebuilt EP and WP models due to bugfixes in the underlying data in the versions
2.0.3, 2.0.4 and 2.0.5

# nflfastR 2.0.4

* Fix another bug with out of order plays
* Fix bug affecting cumulative totals for WPA, air_WPA and yac_WPA 
* Fix bug affecting cumulative totals for air_EPA and yac_EPA

# nflfastR 2.0.3

* Fix for NFL providing plays out of order
* Fix for series not incrementing following defensive TD

# nflfastR 2.0.2

* Fixed a bug in the series and series success calculations caused by timeouts
following a possession change
* Fixed win probability on PATs

# nflfastR 2.0.1

* Added minimum version requirement on `xgboost` (>= 1.1) as the recent `xgboost` update 
caused a breaking change leading to failure in adding model results to data

# nflfastR 2.0.0

### Models
* Added new models for Expected Points, Win Probability and Completion Probability 
and removed `nflscrapR` dependency. This is a **major** change as we are stepping away 
from the well established `nflscrapR` models. But we believe it is a good step forward.
See `data-raw/MODEL-README.md` for detailed model information.

* Added internal functions for `EPA` and `WPA` to `helper_add_ep_wp.R`.

* Added new function `calculate_expected_points()` usable for the end user.

### Functions
* Completely overhauled `fast_scraper()` to make it work with the NFL's new server 
backend. The option `source` is still available but will be deprecated since there
is only one source now. There are some changes in the output as well (please see below).

* `fast_scraper()` now adds game data to the play by play data set courtesy of Lee Sharpe. 
Game data include:
away_score, home_score, location, result, total, spread_line, total_line, div_game, 
roof, surface, temp, wind, home_coach, away_coach, stadium, stadium_id, gameday

* `fast_scraper_schedules()` now incorporates Lee Sharpe's `games.rds`.

* The functions `fast_scraper_clips()` and `fast_scraper_roster()` are deactivated 
due to the missing data source. They might be reactivated or completely dropped 
in future versions.

* The function `fix_fumbles()` has been renamed to `add_qb_epa()` as the new name
much better describes what the function is actually doing.

### Miscellaneous

* Added progress information using the `progressr` package and removed the 
`furrr` progress bars.

* `clean_pbp()` now adds the column `id` which is the id of the player in the column `name`. 
Because we have to piece together different data to cover the full span of years,
**player IDs are not consistent between the early (1999-2010) and recent (2011 onward)
periods**.

* Added a `NEWS.md` file to track changes to the package.

* Fixed several bugs inherited from `nflscrapR`, including one where EPA was missing 
when a play was followed by two timeouts (for example, a two-minute warning followed by a timeout),
and another where `play_type` was incorrect on plays with declined penalties.

* Fixed a bug where `receiver_player_name` and `receiver` didn't name the correct
players on plays with lateral passes.

### Play-by-Play Output
The output has changed a little bit. 

#### The following variables were dropped

| Dropped Variables          | Description                                                                                                                                                                       |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| game_key                   | RS feed game identifier.                                                                                                                                                          |
| game_time_local            | Kickoff time in local time zone.                                                                                                                                                  |
| iso_time                   | Kickoff time according to ISO 8601.                                                                                                                                               |
| game_type                  | One of 'REG', 'WC', 'DIV', 'CON', 'SB' indicating if a game was a regular season game or one of the playoff rounds.                                                               |
| site_id                    | RS feed id for game site.                                                                                                                                                         |
| site_city                  | Game site city.                                                                                                                                                                   |
| site_state                 | Game site state.                                                                                                                                                                  |
| drive_possession_team_abbr | Abbreviation of the possession team in a given drive.                                                                                                                             |
| scoring_team_abbr          | Abbreviation of the scoring team if the play was a scoring play.                                                                                                                  |
| scoring_type               | String indicating the scoring type. One of 'FG', 'TD', 'PAT', 'SFTY', 'PAT2'.                                                                                                     |
| alert_play_type            | String describing the play type of a play the NFL has listed as alert play. For most of those plays there are highlight clips available through fast_scraper_clips. |
| time_of_day                | Local time at the beginning of the play.                                                                                                                                          |
| yards                      | Analogous to yards_gained but with the kicking team being the possession team (which means that there are many yards gained through kickoffs and punts).                          |
| end_yardline_number        | Yardline number within the above given side at the end of the given play.                                                                                                         |
| end_yardline_side          | String indicating the side of the field at the end of the given play.                                                                                                             |

#### The following variables were renamed

| Renamed Variables                             | Description                                                                                                                                               |
|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| game_time_eastern -> start_time               | Kickoff time in eastern time zone.                                                                                                                        |
| site_fullname -> stadium                      | Game site name.                                                                                                                                           |
| drive_how_started -> drive_start_transition   | String indicating how the offense got the ball.                                                                                                           |
| drive_how_ended -> drive_end_transition       | String indicating how the offense lost the ball.                                                                                                          |
| drive_start_time -> drive_game_clock_start    | Game time at the beginning of a given drive.                                                                                                              |
| drive_end_time -> drive_game_clock_end        | Game time at the end of a given drive.                                                                                                                    |
| drive_start_yardline -> drive_start_yard_line | String indicating where a given drive started consisting of team half and yard line number.                                                               |
| drive_end_yardline -> drive_end_yard_line     | String indicating where a given drive ended consisting of team half and yard line number.                                                                 |
| roof_type -> roof                             | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)            |

#### The following variables were added

| Added Variables        | Description                                                                                                                                                                                                          |
|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| vegas_wp               | Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.                                                                             |
| vegas_home_wp          | Estimated win probability for the home team incorporating pre-game Vegas line.                                                                                                                                       |
| weather                | String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!                                                                                        |
| nfl_api_id             | UUID of the game in the new NFL API.                                                                                                                                                                                 |
| play_clock             | Time on the playclock when the ball was snapped.                                                                                                                                                                     |
| play_deleted           | Binary indicator for deleted plays.                                                                                                                                                                                  |
| end_clock_time         | Game time at the end of a given play.                                                                                                                                                                                |
| end_yard_line          | String indicating the yardline at the end of the given play consisting of team half and yard line number.                                                                                                            |
| drive_real_start_time  | Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').                                                                                                                     |
| drive_ended_with_score | Binary indicator for whether the drive ended with a score.                                                                                                                                                           |
| drive_quarter_start    | Numeric value indicating in which quarter the given drive has started.                                                                                                                                               |
| drive_quarter_end      | Numeric value indicating in which quarter the given drive has ended.                                                                                                                                                 |
| drive_play_id_started  | Play_id of the first play in the given drive.                                                                                                                                                                        |
| drive_play_id_ended    | Play_id of the last play in the given drive.                                                                                                                                                                         |
| away_score             | Total points scored by the away team.                                                                                                                                                                                |
| home_score             | Total points scored by the home team.                                                                                                                                                                                |
| location               | Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.                                                                                                                          |
| result                 | Equals home_score - away_score and means the game outcome from the perspective of the home team.                                                                                                                     |
| total                  | Equals home_score + away_score and means the total points scored in the given game.                                                                                                                                  |
| spread_line            | The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference) |
| total_line             | The closing total line for the game. (Source: Pro-Football-Reference)                                                                                                                                                |
| div_game               | Binary indicator for if the given game was a division game.                                                                                                                                                          |
| roof                   | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                       |
| surface                | What type of ground the game was played on. (Source: Pro-Football-Reference)                                                                                                                                         |
| temp                   | The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)                                                                                                              |
| wind                   | The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)                                                                                                         |
| home_coach             | First and last name of the home team coach. (Source: Pro-Football-Reference)                                                                                                                                         |
| away_coach             | First and last name of the away team coach. (Source: Pro-Football-Reference)                                                                                                                                         |
| stadium_id             | ID of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                                                                                           |
| game_stadium           | Name of the stadium the game was played in. (Source: Pro-Football-Reference)                                                                                                                                         |
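
The relationship between `home_score`, `away_score`, `result`, `total` and `spread_line` described in the table above can be illustrated with a minimal base R sketch. The `games` data frame and its values are made up for illustration, and `home_covered` is a hypothetical helper column that is not part of the nflfastR output.

```r
# Hypothetical toy data; only the column names follow the table above.
games <- data.frame(
  home_score = c(27, 17, 20),
  away_score = c(20, 24, 20),
  spread_line = c(3.5, -2.5, 1)
)

# result: game outcome from the perspective of the home team
games$result <- games$home_score - games$away_score

# total: total points scored in the game
games$total <- games$home_score + games$away_score

# A positive spread_line means the home team was favored by that many points,
# so (as an assumption for this sketch) the home team "covered" when its
# margin of victory exceeded the spread line.
games$home_covered <- games$result > games$spread_line

print(games)
```

Comparing `result` directly against `spread_line` works here because both are expressed from the home team's perspective.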


================================================
FILE: R/aggregate_game_stats.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl
# Styleguide: styler::tidyverse_style()
################################################################################

#' Get Official Game Stats
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' @param pbp A data frame of NFL play-by-play data typically loaded with
#' [load_pbp()] or [build_nflfastR_pbp()]. If the data doesn't include the variable
#' `qb_epa`, the function `add_qb_epa()` will be called to add it.
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats
#' for the entire data frame.
#' @description Build columns that aggregate official passing, rushing, and receiving stats
#' either at the game level or at the level of the entire data frame passed.
#' @return A data frame including the following columns (all ID columns are
#' decoded to the gsis ID format):
#' \describe{
#' \item{player_id}{ID of the player. Use this to join to other sources.}
#' \item{player_name}{Name of the player}
#' \item{player_display_name}{Full name of the player}
#' \item{position}{Position of the player}
#' \item{position_group}{Position group of the player}
#' \item{headshot_url}{URL to a player headshot image}
#' \item{games}{The number of games where the player recorded passing, rushing or receiving stats.}
#' \item{recent_team}{Most recent team the player appears with in `pbp`.}
#' \item{season}{Season if `weekly` is `TRUE`}
#' \item{week}{Week if `weekly` is `TRUE`}
#' \item{season_type}{`REG` or `POST` if `weekly` is `TRUE`}
#' \item{opponent_team}{The player's opponent team if `weekly` is `TRUE`}
#' \item{completions}{The number of completed passes.}
#' \item{attempts}{The number of pass attempts as defined by the NFL.}
#' \item{passing_yards}{Yards gained on pass plays.}
#' \item{passing_tds}{The number of passing touchdowns.}
#' \item{interceptions}{The number of interceptions thrown.}
#' \item{sacks}{The number of times sacked.}
#' \item{sack_yards}{Yards lost on sack plays.}
#' \item{sack_fumbles}{The number of sacks with a fumble.}
#' \item{sack_fumbles_lost}{The number of sacks with a lost fumble.}
#' \item{passing_air_yards}{Passing air yards (includes incomplete passes).}
#' \item{passing_yards_after_catch}{Yards after the catch gained on plays in
#' which player was the passer (this is an unofficial stat and may differ slightly
#' between different sources).}
#' \item{passing_first_downs}{First downs on pass attempts.}
#' \item{passing_epa}{Total expected points added on pass attempts and sacks.
#' NOTE: this uses the variable `qb_epa`, which gives QB credit for EPA for up
#' to the point where a receiver lost a fumble after a completed catch and makes
#' EPA work more like passing yards on plays with fumbles.}
#' \item{passing_2pt_conversions}{Two-point conversion passes.}
#' \item{pacr}{Passing Air Conversion Ratio. PACR = `passing_yards` / `passing_air_yards`}
#' \item{dakota}{Adjusted EPA + CPOE composite based on coefficients which best predict adjusted EPA/play in the following year.}
#' \item{carries}{The number of official rush attempts (incl. scrambles and kneel downs).
#' Rushes after a lateral reception don't count as a carry.}
#' \item{rushing_yards}{Yards gained when rushing with the ball (incl. scrambles and kneel downs).
#' Also includes yards gained after obtaining a lateral on a play that started
#' with a rushing attempt.}
#' \item{rushing_tds}{The number of rushing touchdowns (incl. scrambles).
#' Also includes touchdowns after obtaining a lateral on a play that started
#' with a rushing attempt.}
#' \item{rushing_fumbles}{The number of rushes with a fumble.}
#' \item{rushing_fumbles_lost}{The number of rushes with a lost fumble.}
#' \item{rushing_first_downs}{First downs on rush attempts (incl. scrambles).}
#' \item{rushing_epa}{Expected points added on rush attempts (incl. scrambles and kneel downs).}
#' \item{rushing_2pt_conversions}{Two-point conversion rushes}
#' \item{receptions}{The number of pass receptions. Lateral receptions officially
#' don't count as a reception.}
#' \item{targets}{The number of pass plays where the player was the targeted receiver.}
#' \item{receiving_yards}{Yards gained after a pass reception. Includes yards
#' gained after receiving a lateral on a play that started as a pass play.}
#' \item{receiving_tds}{The number of touchdowns following a pass reception.
#' Also includes touchdowns after receiving a lateral on a play that started
#' as a pass play.}
#' \item{receiving_air_yards}{Receiving air yards (incl. incomplete passes).}
#' \item{receiving_yards_after_catch}{Yards after the catch gained on plays in
#' which player was receiver (this is an unofficial stat and may differ slightly
#' between different sources).}
#' \item{receiving_fumbles}{The number of fumbles after a pass reception.}
#' \item{receiving_fumbles_lost}{The number of fumbles lost after a pass reception.}
#' \item{receiving_2pt_conversions}{Two-point conversion receptions}
#' \item{racr}{Receiver Air Conversion Ratio. RACR = `receiving_yards` / `receiving_air_yards`}
#' \item{target_share}{The share of targets of the player in all targets of his team}
#' \item{air_yards_share}{The share of receiving_air_yards of the player in all air_yards of his team}
#' \item{wopr}{Weighted Opportunity Rating. WOPR = 1.5 × `target_share` + 0.7 × `air_yards_share`}
#' \item{fantasy_points}{Standard fantasy points.}
#' \item{fantasy_points_ppr}{PPR fantasy points.}
#' }
#' @export
#' @keywords internal
#' @seealso The function [load_player_stats()] and the corresponding examples
#' on [the nflfastR website](https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats)
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' # pbp <- nflfastR::load_pbp(2020)
#'
#' # weekly <- calculate_player_stats(pbp, weekly = TRUE)
#' # dplyr::glimpse(weekly)
#'
#' # overall <- calculate_player_stats(pbp, weekly = FALSE)
#' # dplyr::glimpse(overall)
#' })
#' }
calculate_player_stats <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)", "to join player information.")

  # Prepare data ------------------------------------------------------------

  # load plays with multiple laterals
  mult_lats <- nflreadr::rds_from_url(
    "https://github.com/nflverse/nflverse-data/releases/download/misc/multiple_lateral_yards.rds"
  ) |>
    dplyr::mutate(
      season = substr(.data$game_id, 1, 4) |> as.integer(),
      week = substr(.data$game_id, 6, 7) |> as.integer()
    ) |>
    dplyr::filter(.data$yards != 0) |>
    # the list includes all plays with multiple laterals
    # and all receivers. Since the last one already is in the
    # pbp data, we have to drop him here so the entry isn't duplicated
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::slice(seq_len(dplyr::n() - 1)) |>
    dplyr::ungroup() |>
    # there are some very rare cases where a player collects lateral yards
    # multiple times in the same play. We need to aggregate here to make sure
    # this doesn't mess up joins (#289)
    dplyr::group_by(
      .data$season,
      .data$week,
      .data$type,
      .data$gsis_player_id
    ) |>
    dplyr::summarise(yards = sum(.data$yards)) |>
    dplyr::ungroup()

  # filter down to the 2 dfs we need
  suppressMessages({
    # 1. for "normal" plays: get plays that count in official stats
    data <- pbp |>
      dplyr::filter(
        !is.na(.data$down),
        .data$play_type %in% c("pass", "qb_kneel", "qb_spike", "run")
      ) |>
      decode_player_ids()

    if (!"qb_epa" %in% names(data)) {
      data <- add_qb_epa(data)
    }

    # 2. for 2pt conversions only, get those plays
    two_points <- pbp |>
      dplyr::filter(.data$two_point_conv_result == "success") |>
      dplyr::select(
        "week",
        "season",
        "posteam",
        "defteam",
        "pass_attempt",
        "rush_attempt",
        "passer_player_name",
        "passer_player_id",
        "rusher_player_name",
        "rusher_player_id",
        "lateral_rusher_player_name",
        "lateral_rusher_player_id",
        "receiver_player_name",
        "receiver_player_id",
        "lateral_receiver_player_name",
        "lateral_receiver_player_id"
      ) |>
      decode_player_ids()
  })

  if (!"special" %in% names(pbp)) {
    # we need this column for the special teams tds
    pbp <- pbp |>
      dplyr::mutate(
        special = dplyr::if_else(
          .data$play_type %in%
            c("extra_point", "field_goal", "kickoff", "punt"),
          1,
          0
        )
      )
  }

  s_type <- pbp |>
    dplyr::select("season", "season_type", "week") |>
    dplyr::distinct()

  # we'll join some player information like position or full name later
  # so we load it here to be able to use it for racr ids as well
  player_info <- nflreadr::load_players() |>
    dplyr::select(
      "player_id" = "gsis_id",
      "player_display_name" = "display_name",
      "player_name" = "short_name",
      "position",
      "position_group",
      "headshot_url" = "headshot"
    )

  # load gsis_ids of RBs, FBs and HBs for RACR
  racr_ids <- player_info |>
    dplyr::filter(.data$position %in% c("RB", "FB", "HB")) |>
    dplyr::select("gsis_id" = "player_id")

  # Passing stats -----------------------------------------------------------

  # get passing stats
  pass_df <- data |>
    dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) |>
    dplyr::group_by(.data$passer_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      passing_yards_after_catch = sum(
        (.data$passing_yards - .data$air_yards) * .data$complete_pass,
        na.rm = TRUE
      ),
      name_pass = dplyr::first(.data$passer_player_name),
      team_pass = dplyr::first(.data$posteam),
      opp_pass = dplyr::first(.data$defteam),
      passing_yards = sum(.data$passing_yards, na.rm = TRUE),
      passing_tds = sum(
        .data$touchdown == 1 &
          .data$td_team == .data$posteam &
          .data$complete_pass == 1
      ),
      interceptions = sum(.data$interception),
      attempts = sum(
        .data$complete_pass == 1 |
          .data$incomplete_pass == 1 |
          .data$interception == 1
      ),
      completions = sum(.data$complete_pass == 1),
      sack_fumbles = sum(
        .data$fumble == 1 & .data$fumbled_1_player_id == .data$passer_player_id
      ),
      sack_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$passer_player_id &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      passing_air_yards = sum(.data$air_yards, na.rm = TRUE),
      sacks = sum(.data$sack),
      sack_yards = -1 * sum(.data$yards_gained * .data$sack),
      passing_first_downs = sum(.data$first_down_pass),
      passing_epa = sum(.data$qb_epa, na.rm = TRUE),
      pacr = .data$passing_yards / .data$passing_air_yards,
      pacr = dplyr::case_when(
        is.nan(.data$pacr) ~ NA_real_,
        .data$passing_air_yards <= 0 ~ 0,
        TRUE ~ .data$pacr
      ),
    ) |>
    dplyr::rename("player_id" = "passer_player_id") |>
    dplyr::ungroup()

  if (isTRUE(weekly)) {
    pass_df <- add_dakota(pass_df, pbp = pbp, weekly = weekly)
  }

  pass_two_points <- two_points |>
    dplyr::filter(.data$pass_attempt == 1) |>
    dplyr::group_by(.data$passer_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_pass and team_pass here for the full join in the next pipe
      name_pass = custom_mode(.data$passer_player_name),
      team_pass = custom_mode(.data$posteam),
      opp_pass = custom_mode(.data$defteam),
      passing_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "passer_player_id") |>
    dplyr::ungroup()

  pass_df <- pass_df |>
    # need a full join because players without passing stats that recorded
    # a passing two point (e.g. WRs) are dropped in any other join
    dplyr::full_join(
      pass_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_pass",
        "team_pass",
        "opp_pass"
      )
    ) |>
    dplyr::mutate(
      passing_2pt_conversions = dplyr::if_else(
        is.na(.data$passing_2pt_conversions),
        0L,
        .data$passing_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id))

  pass_df_nas <- is.na(pass_df)
  epa_index <- which(
    dimnames(pass_df_nas)[[2]] %in% c("passing_epa", "dakota", "pacr")
  )
  pass_df_nas[, epa_index] <- c(FALSE)

  pass_df[pass_df_nas] <- 0

  # Rushing stats -----------------------------------------------------------

  # rush df 1: primary rusher
  rushes <- data |>
    dplyr::filter(.data$play_type %in% c("run", "qb_kneel")) |>
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      name_rush = dplyr::first(.data$rusher_player_name),
      team_rush = dplyr::first(.data$posteam),
      opp_rush = dplyr::first(.data$defteam),
      yards = sum(.data$rushing_yards, na.rm = TRUE),
      tds = sum(.data$td_player_id == .data$rusher_player_id, na.rm = TRUE),
      carries = dplyr::n(),
      rushing_fumbles = sum(
        .data$fumble == 1 &
          .data$fumbled_1_player_id == .data$rusher_player_id &
          is.na(.data$lateral_rusher_player_id)
      ),
      rushing_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$rusher_player_id &
          is.na(.data$lateral_rusher_player_id) &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      rushing_first_downs = sum(
        .data$first_down_rush & is.na(.data$lateral_rusher_player_id)
      ),
      rushing_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # rush df 2: lateral
  laterals <- data |>
    dplyr::filter(!is.na(.data$lateral_rusher_player_id)) |>
    dplyr::group_by(.data$lateral_rusher_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      lateral_yards = sum(.data$lateral_rushing_yards, na.rm = TRUE),
      lateral_fds = sum(.data$first_down_rush, na.rm = TRUE),
      lateral_tds = sum(
        .data$td_player_id == .data$lateral_rusher_player_id,
        na.rm = TRUE
      ),
      lateral_att = dplyr::n(),
      lateral_fumbles = sum(.data$fumble, na.rm = TRUE),
      lateral_fumbles_lost = sum(.data$fumble_lost, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::rename("rusher_player_id" = "lateral_rusher_player_id") |>
    dplyr::bind_rows(
      mult_lats |>
        dplyr::filter(
          .data$type == "lateral_rushing" &
            .data$season %in% data$season &
            .data$week %in% data$week
        ) |>
        dplyr::select(
          "season",
          "week",
          "rusher_player_id" = "gsis_player_id",
          "lateral_yards" = "yards"
        ) |>
        dplyr::mutate(lateral_tds = 0L, lateral_att = 1L)
    ) |>
    # at this stage it is possible that a player is duplicated because he
    # has lateral yards both in the regular pbp and in the multiple laterals file.
    # This can happen when a player was the last lateral player in one play and
    # not the last lateral player in another play in the same game (wow absurd)
    # We summarise all columns to make sure there is only one row per player
    # per game. See (#289)
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarise_all(.funs = sum, na.rm = TRUE) |>
    dplyr::ungroup()

  # rush df: join
  rush_df <- rushes |>
    dplyr::left_join(laterals, by = c("rusher_player_id", "week", "season")) |>
    dplyr::mutate(
      lateral_yards = dplyr::if_else(
        is.na(.data$lateral_yards),
        0,
        .data$lateral_yards
      ),
      lateral_tds = dplyr::if_else(
        is.na(.data$lateral_tds),
        0L,
        .data$lateral_tds
      ),
      lateral_fumbles = dplyr::if_else(
        is.na(.data$lateral_fumbles),
        0,
        .data$lateral_fumbles
      ),
      lateral_fumbles_lost = dplyr::if_else(
        is.na(.data$lateral_fumbles_lost),
        0,
        .data$lateral_fumbles_lost
      ),
      lateral_fds = dplyr::if_else(
        is.na(.data$lateral_fds),
        0,
        .data$lateral_fds
      )
    ) |>
    dplyr::mutate(
      rushing_yards = .data$yards + .data$lateral_yards,
      rushing_tds = .data$tds + .data$lateral_tds,
      rushing_first_downs = .data$rushing_first_downs + .data$lateral_fds,
      rushing_fumbles = .data$rushing_fumbles + .data$lateral_fumbles,
      rushing_fumbles_lost = .data$rushing_fumbles_lost +
        .data$lateral_fumbles_lost
    ) |>
    dplyr::rename("player_id" = "rusher_player_id") |>
    dplyr::select(
      "player_id",
      "week",
      "season",
      "name_rush",
      "team_rush",
      "opp_rush",
      "rushing_yards",
      "carries",
      "rushing_tds",
      "rushing_fumbles",
      "rushing_fumbles_lost",
      "rushing_first_downs",
      "rushing_epa"
    ) |>
    dplyr::ungroup()

  rush_two_points <- two_points |>
    dplyr::filter(.data$rush_attempt == 1) |>
    dplyr::group_by(.data$rusher_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_rush and team_rush here for the full join in the next pipe
      name_rush = custom_mode(.data$rusher_player_name),
      team_rush = custom_mode(.data$posteam),
      opp_rush = custom_mode(.data$defteam),
      rushing_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "rusher_player_id") |>
    dplyr::ungroup()

  rush_df <- rush_df |>
    # need a full join because players without rushing stats that recorded
    # a rushing two point (mostly QBs) are dropped in any other join
    dplyr::full_join(
      rush_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_rush",
        "team_rush",
        "opp_rush"
      )
    ) |>
    dplyr::mutate(
      rushing_2pt_conversions = dplyr::if_else(
        is.na(.data$rushing_2pt_conversions),
        0L,
        .data$rushing_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id))

  rush_df_nas <- is.na(rush_df)
  epa_index <- which(dimnames(rush_df_nas)[[2]] == "rushing_epa")
  rush_df_nas[, epa_index] <- c(FALSE)

  rush_df[rush_df_nas] <- 0

  # Receiving stats ---------------------------------------------------------

  # receiver df 1: primary receiver
  rec <- data |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarize(
      name_receiver = dplyr::first(.data$receiver_player_name),
      team_receiver = dplyr::first(.data$posteam),
      opp_receiver = dplyr::first(.data$defteam),
      yards = sum(.data$receiving_yards, na.rm = TRUE),
      receptions = sum(.data$complete_pass == 1),
      targets = dplyr::n(),
      tds = sum(.data$td_player_id == .data$receiver_player_id, na.rm = TRUE),
      receiving_fumbles = sum(
        .data$fumble == 1 &
          .data$fumbled_1_player_id == .data$receiver_player_id &
          is.na(.data$lateral_receiver_player_id)
      ),
      receiving_fumbles_lost = sum(
        .data$fumble_lost == 1 &
          .data$fumbled_1_player_id == .data$receiver_player_id &
          is.na(.data$lateral_receiver_player_id) &
          .data$fumble_recovery_1_team != .data$posteam
      ),
      receiving_air_yards = sum(.data$air_yards, na.rm = TRUE),
      receiving_yards_after_catch = sum(.data$yards_after_catch, na.rm = TRUE),
      receiving_first_downs = sum(
        .data$first_down_pass & is.na(.data$lateral_receiver_player_id)
      ),
      receiving_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # receiver df 2: lateral
  laterals <- data |>
    dplyr::filter(!is.na(.data$lateral_receiver_player_id)) |>
    dplyr::group_by(
      .data$lateral_receiver_player_id,
      .data$week,
      .data$season
    ) |>
    dplyr::summarize(
      lateral_yards = sum(.data$lateral_receiving_yards, na.rm = TRUE),
      lateral_tds = sum(
        .data$td_player_id == .data$lateral_receiver_player_id,
        na.rm = TRUE
      ),
      lateral_att = dplyr::n(),
      lateral_fds = sum(.data$first_down_pass, na.rm = TRUE),
      lateral_fumbles = sum(.data$fumble, na.rm = TRUE),
      lateral_fumbles_lost = sum(.data$fumble_lost, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::rename("receiver_player_id" = "lateral_receiver_player_id") |>
    dplyr::bind_rows(
      mult_lats |>
        dplyr::filter(
          .data$type == "lateral_receiving" &
            .data$season %in% data$season &
            .data$week %in% data$week
        ) |>
        dplyr::select(
          "season",
          "week",
          "receiver_player_id" = "gsis_player_id",
          "lateral_yards" = "yards"
        ) |>
        dplyr::mutate(lateral_tds = 0L, lateral_att = 1L)
    ) |>
    # at this stage it is possible that a player is duplicated because he
    # has lateral yards both in the regular pbp and in the multiple laterals file.
    # This can happen when a player was the last lateral player in one play and
    # not the last lateral player in another play in the same game (wow absurd)
    # We summarise all columns to make sure there is only one row per player
    # per game. See (#289)
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarise_all(.funs = sum, na.rm = TRUE) |>
    dplyr::ungroup()

  # receiver df 3: team receiving for WOPR
  rec_team <- data |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::group_by(.data$posteam, .data$week, .data$season) |>
    dplyr::summarize(
      team_targets = dplyr::n(),
      team_air_yards = sum(.data$air_yards, na.rm = TRUE),
    ) |>
    dplyr::ungroup()

  # rec df: join
  rec_df <- rec |>
    dplyr::left_join(
      laterals,
      by = c("receiver_player_id", "week", "season")
    ) |>
    dplyr::left_join(
      rec_team,
      by = c("team_receiver" = "posteam", "week", "season")
    ) |>
    dplyr::mutate(
      lateral_yards = dplyr::if_else(
        is.na(.data$lateral_yards),
        0,
        .data$lateral_yards
      ),
      lateral_tds = dplyr::if_else(
        is.na(.data$lateral_tds),
        0L,
        .data$lateral_tds
      ),
      lateral_fumbles = dplyr::if_else(
        is.na(.data$lateral_fumbles),
        0,
        .data$lateral_fumbles
      ),
      lateral_fumbles_lost = dplyr::if_else(
        is.na(.data$lateral_fumbles_lost),
        0,
        .data$lateral_fumbles_lost
      ),
      lateral_fds = dplyr::if_else(
        is.na(.data$lateral_fds),
        0,
        .data$lateral_fds
      )
    ) |>
    dplyr::mutate(
      receiving_yards = .data$yards + .data$lateral_yards,
      receiving_tds = .data$tds + .data$lateral_tds,
      receiving_yards_after_catch = .data$receiving_yards_after_catch +
        .data$lateral_yards,
      receiving_first_downs = .data$receiving_first_downs + .data$lateral_fds,
      receiving_fumbles = .data$receiving_fumbles + .data$lateral_fumbles,
      receiving_fumbles_lost = .data$receiving_fumbles_lost +
        .data$lateral_fumbles_lost,
      racr = .data$receiving_yards / .data$receiving_air_yards,
      racr = dplyr::case_when(
        is.nan(.data$racr) ~ NA_real_,
        .data$receiving_air_yards == 0 ~ 0,
        # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (and FBs) and is set to
        # 0 for Receivers. The list "racr_ids" includes all known RB and FB gsis_ids
        .data$receiving_air_yards < 0 &
          !.data$receiver_player_id %in% racr_ids$gsis_id ~ 0,
        TRUE ~ .data$racr
      ),
      target_share = .data$targets / .data$team_targets,
      air_yards_share = .data$receiving_air_yards / .data$team_air_yards,
      wopr = 1.5 * .data$target_share + 0.7 * .data$air_yards_share
    ) |>
    dplyr::rename("player_id" = "receiver_player_id") |>
    dplyr::select(
      "player_id",
      "week",
      "season",
      "name_receiver",
      "team_receiver",
      "opp_receiver",
      "receiving_yards",
      "receiving_air_yards",
      "receiving_yards_after_catch",
      "receptions",
      "targets",
      "receiving_tds",
      "receiving_fumbles",
      "receiving_fumbles_lost",
      "receiving_first_downs",
      "receiving_epa",
      "racr",
      "target_share",
      "air_yards_share",
      "wopr"
    )

  rec_two_points <- two_points |>
    dplyr::filter(.data$pass_attempt == 1) |>
    dplyr::group_by(.data$receiver_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      # need name_receiver and team_receiver here for the full join in the next pipe
      name_receiver = custom_mode(.data$receiver_player_name),
      team_receiver = custom_mode(.data$posteam),
      opp_receiver = custom_mode(.data$defteam),
      receiving_2pt_conversions = dplyr::n()
    ) |>
    dplyr::rename("player_id" = "receiver_player_id") |>
    dplyr::ungroup()

  rec_df <- rec_df |>
    # need a full join because players without receiving stats who recorded
    # a receiving two-point conversion would be dropped by a left or inner join
    dplyr::full_join(
      rec_two_points,
      by = c(
        "player_id",
        "week",
        "season",
        "name_receiver",
        "team_receiver",
        "opp_receiver"
      )
    ) |>
    dplyr::mutate(
      receiving_2pt_conversions = dplyr::if_else(
        is.na(.data$receiving_2pt_conversions),
        0L,
        .data$receiving_2pt_conversions
      )
    ) |>
    dplyr::filter(!is.na(.data$player_id), !is.na(.data$name_receiver))

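  # replace NAs with 0 in every column except the EPA and rate stats listed
  # below, where NA means "not available" rather than zero
  rec_df_nas <- is.na(rec_df)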
  epa_index <- which(
    dimnames(rec_df_nas)[[2]] %in%
      c("receiving_epa", "racr", "target_share", "air_yards_share", "wopr")
  )
  rec_df_nas[, epa_index] <- c(FALSE)

  rec_df[rec_df_nas] <- 0

  # Special Teams -----------------------------------------------------------

  st_tds <- pbp |>
    dplyr::filter(.data$special == 1 & !is.na(.data$td_player_id)) |>
    dplyr::group_by(.data$td_player_id, .data$week, .data$season) |>
    dplyr::summarise(
      name_st = custom_mode(.data$td_player_name),
      team_st = custom_mode(.data$td_team),
      opp_st = custom_mode(.data$defteam),
      special_teams_tds = sum(.data$touchdown, na.rm = TRUE)
    ) |>
    dplyr::rename("player_id" = "td_player_id")

  # Combine all stats -------------------------------------------------------

  # combine all the stats together
  player_df <- pass_df |>
    dplyr::full_join(rush_df, by = c("player_id", "week", "season")) |>
    dplyr::full_join(rec_df, by = c("player_id", "week", "season")) |>
    dplyr::full_join(st_tds, by = c("player_id", "week", "season")) |>
    dplyr::left_join(s_type, by = c("season", "week")) |>
    dplyr::mutate(
      player_name = dplyr::case_when(
        !is.na(.data$name_pass) ~ .data$name_pass,
        !is.na(.data$name_rush) ~ .data$name_rush,
        !is.na(.data$name_receiver) ~ .data$name_receiver,
        TRUE ~ .data$name_st
      ),
      recent_team = dplyr::case_when(
        !is.na(.data$team_pass) ~ .data$team_pass,
        !is.na(.data$team_rush) ~ .data$team_rush,
        !is.na(.data$team_receiver) ~ .data$team_receiver,
        TRUE ~ .data$team_st
      ),
      opponent_team = dplyr::case_when(
        !is.na(.data$opp_pass) ~ .data$opp_pass,
        !is.na(.data$opp_rush) ~ .data$opp_rush,
        !is.na(.data$opp_receiver) ~ .data$opp_receiver,
        TRUE ~ .data$opp_st
      )
    ) |>
    dplyr::select(dplyr::any_of(c(
      # id information
      "player_id",
      "player_name",
      "recent_team",
      "season",
      "week",
      "season_type",
      "opponent_team",

      # passing stats
      "completions",
      "attempts",
      "passing_yards",
      "passing_tds",
      "interceptions",
      "sacks",
      "sack_yards",
      "sack_fumbles",
      "sack_fumbles_lost",
      "passing_air_yards",
      "passing_yards_after_catch",
      "passing_first_downs",
      "passing_epa",
      "passing_2pt_conversions",
      "pacr",
      "dakota",

      # rushing stats
      "carries",
      "rushing_yards",
      "rushing_tds",
      "rushing_fumbles",
      "rushing_fumbles_lost",
      "rushing_first_downs",
      "rushing_epa",
      "rushing_2pt_conversions",

      # receiving stats
      "receptions",
      "targets",
      "receiving_yards",
      "receiving_tds",
      "receiving_fumbles",
      "receiving_fumbles_lost",
      "receiving_air_yards",
      "receiving_yards_after_catch",
      "receiving_first_downs",
      "receiving_epa",
      "receiving_2pt_conversions",
      "racr",
      "target_share",
      "air_yards_share",
      "wopr",

      # special teams
      "special_teams_tds"
    ))) |>
    dplyr::filter(!is.na(.data$player_id), !is.na(.data$player_name))

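  # replace NAs with 0 in every column except the EPA and rate stats listed
  # below, where NA means "not available" rather than zero
  player_df_nas <- is.na(player_df)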
  epa_index <- which(
    dimnames(player_df_nas)[[2]] %in%
      c(
        "passing_epa",
        "rushing_epa",
        "receiving_epa",
        "dakota",
        "racr",
        "target_share",
        "air_yards_share",
        "wopr",
        "pacr"
      )
  )
  player_df_nas[, epa_index] <- c(FALSE)

  player_df[player_df_nas] <- 0

  player_df <- player_df |>
    dplyr::mutate(
      fantasy_points = 1 /
        25 *
        .data$passing_yards +
        4 * .data$passing_tds +
        -2 * .data$interceptions +
        1 / 10 * (.data$rushing_yards + .data$receiving_yards) +
        6 *
          (.data$rushing_tds + .data$receiving_tds + .data$special_teams_tds) +
        2 *
          (.data$passing_2pt_conversions +
            .data$rushing_2pt_conversions +
            .data$receiving_2pt_conversions) +
        -2 *
          (.data$sack_fumbles_lost +
            .data$rushing_fumbles_lost +
            .data$receiving_fumbles_lost),

      fantasy_points_ppr = .data$fantasy_points + .data$receptions
    ) |>
    dplyr::arrange(.data$player_id, .data$season, .data$week)

  # if user doesn't want week-by-week output, aggregate the whole df
  if (isFALSE(weekly)) {
    player_df <- player_df |>
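      # helper copies of the weekly targets and receiving air yards: `targets`
      # and `receiving_air_yards` are overwritten by their sums earlier in
      # summarise(), so the share calculations below need the weekly values.
      # `tgts / target_share` recovers the weekly team targets (same idea for
      # air yards), which makes the aggregated shares player total / team total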
      dplyr::mutate(
        tgts = .data$targets,
        rec_air_yds = .data$receiving_air_yards
      ) |>
      dplyr::group_by(.data$player_id) |>
      dplyr::summarise(
        player_name = custom_mode(.data$player_name),
        games = dplyr::n(),
        recent_team = dplyr::last(.data$recent_team),
        # passing
        completions = sum(.data$completions),
        attempts = sum(.data$attempts),
        passing_yards = sum(.data$passing_yards),
        passing_tds = sum(.data$passing_tds),
        interceptions = sum(.data$interceptions),
        sacks = sum(.data$sacks),
        sack_yards = sum(.data$sack_yards),
        sack_fumbles = sum(.data$sack_fumbles),
        sack_fumbles_lost = sum(.data$sack_fumbles_lost),
        passing_air_yards = sum(.data$passing_air_yards),
        passing_yards_after_catch = sum(.data$passing_yards_after_catch),
        passing_first_downs = sum(.data$passing_first_downs),
        passing_epa = dplyr::if_else(
          all(is.na(.data$passing_epa)),
          NA_real_,
          sum(.data$passing_epa, na.rm = TRUE)
        ),
        passing_2pt_conversions = sum(.data$passing_2pt_conversions),
        pacr = .data$passing_yards / .data$passing_air_yards,

        # rushing
        carries = sum(.data$carries),
        rushing_yards = sum(.data$rushing_yards),
        rushing_tds = sum(.data$rushing_tds),
        rushing_fumbles = sum(.data$rushing_fumbles),
        rushing_fumbles_lost = sum(.data$rushing_fumbles_lost),
        rushing_first_downs = sum(.data$rushing_first_downs),
        rushing_epa = dplyr::if_else(
          all(is.na(.data$rushing_epa)),
          NA_real_,
          sum(.data$rushing_epa, na.rm = TRUE)
        ),
        rushing_2pt_conversions = sum(.data$rushing_2pt_conversions),

        # receiving
        receptions = sum(.data$receptions),
        targets = sum(.data$targets),
        receiving_yards = sum(.data$receiving_yards),
        receiving_tds = sum(.data$receiving_tds),
        receiving_fumbles = sum(.data$receiving_fumbles),
        receiving_fumbles_lost = sum(.data$receiving_fumbles_lost),
        receiving_air_yards = sum(.data$receiving_air_yards),
        receiving_yards_after_catch = sum(.data$receiving_yards_after_catch),
        receiving_first_downs = sum(.data$receiving_first_downs),
        receiving_epa = dplyr::if_else(
          all(is.na(.data$receiving_epa)),
          NA_real_,
          sum(.data$receiving_epa, na.rm = TRUE)
        ),
        receiving_2pt_conversions = sum(.data$receiving_2pt_conversions),
        racr = .data$receiving_yards / .data$receiving_air_yards,
        target_share = dplyr::if_else(
          all(is.na(.data$target_share)),
          NA_real_,
          sum(.data$tgts, na.rm = TRUE) /
            sum(.data$tgts / .data$target_share, na.rm = TRUE)
        ),
        air_yards_share = dplyr::if_else(
          all(is.na(.data$air_yards_share)),
          NA_real_,
          sum(.data$rec_air_yds, na.rm = TRUE) /
            sum(.data$rec_air_yds / .data$air_yards_share, na.rm = TRUE)
        ),
        wopr = 1.5 * .data$target_share + 0.7 * .data$air_yards_share,

        # special teams
        special_teams_tds = sum(.data$special_teams_tds),

        # fantasy
        fantasy_points = sum(.data$fantasy_points),
        fantasy_points_ppr = sum(.data$fantasy_points_ppr)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(
        racr = dplyr::case_when(
          is.nan(.data$racr) ~ NA_real_,
          .data$receiving_air_yards == 0 ~ 0,
          # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (and FBs) and is set to
          # 0 for Receivers. The list "racr_ids" includes all known RB and FB gsis_ids
          .data$receiving_air_yards < 0 &
            !.data$player_id %in% racr_ids$gsis_id ~ 0,
          TRUE ~ .data$racr
        ),
        pacr = dplyr::case_when(
          is.nan(.data$pacr) ~ NA_real_,
          .data$passing_air_yards <= 0 ~ 0,
          TRUE ~ .data$pacr
        )
      ) |>
      add_dakota(pbp = pbp, weekly = weekly) |>
      dplyr::select(
        "player_id":"pacr",
        dplyr::any_of("dakota"),
        dplyr::everything()
      )
  }

  # data is missing position and player name can be messed up in pbp
  # so we join player information next
  player_df <- player_df |>
    dplyr::select(-"player_name") |>
    dplyr::left_join(player_info, by = "player_id") |>
    dplyr::select(
      "player_id",
      "player_name",
      "player_display_name",
      "position",
      "position_group",
      "headshot_url",
      dplyr::everything()
    )

  return(player_df)
}

add_dakota <- function(add_to_this, pbp, weekly) {
  dakota_model <- NULL
  con <- url(
    "https://github.com/nflverse/nflfastR-data/blob/master/models/dakota_model.Rdata?raw=true"
  )
  try(load(con), silent = TRUE)
  close(con)

  if (is.null(dakota_model)) {
    user_message(
      "This function needs to download the model data from GitHub. Please check your Internet connection and try again!",
      "oops"
    )
    return(add_to_this)
  }

  if (!"id" %in% names(pbp)) {
    pbp <- clean_pbp(pbp)
  }
  if (!"qb_epa" %in% names(pbp)) {
    pbp <- add_qb_epa(pbp)
  }

  suppressMessages({
    df <- pbp |>
      dplyr::filter(.data$pass == 1 | .data$rush == 1) |>
      dplyr::filter(
        !is.na(.data$posteam) &
          !is.na(.data$qb_epa) &
          !is.na(.data$id) &
          !is.na(.data$down)
      ) |>
      dplyr::mutate(
        epa = dplyr::if_else(.data$qb_epa < -4.5, -4.5, .data$qb_epa)
      ) |>
      decode_player_ids()
  })

  if (isTRUE(weekly)) {
    relevant_players <- add_to_this |>
      dplyr::filter(.data$attempts >= 5) |>
      dplyr::mutate(
        filter_id = paste(.data$player_id, .data$season, .data$week, sep = "_")
      ) |>
      dplyr::pull(.data$filter_id)

    model_data <- df |>
      dplyr::group_by(.data$id, .data$week, .data$season) |>
      dplyr::summarize(
        n_plays = dplyr::n(),
        epa_per_play = sum(.data$epa) / .data$n_plays,
        cpoe = mean(.data$cpoe, na.rm = TRUE)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(cpoe = dplyr::if_else(is.na(.data$cpoe), 0, .data$cpoe)) |>
      dplyr::rename("player_id" = "id") |>
      dplyr::mutate(
        filter_id = paste(.data$player_id, .data$season, .data$week, sep = "_")
      ) |>
      dplyr::filter(.data$filter_id %in% relevant_players)

    model_data$dakota <- mgcv::predict.gam(dakota_model, model_data) |>
      as.vector()

    out <- add_to_this |>
      dplyr::left_join(
        model_data |>
          dplyr::select("player_id", "week", "season", "dakota"),
        by = c("player_id", "week", "season")
      )
  } else if (isFALSE(weekly)) {
    relevant_players <- add_to_this |>
      dplyr::filter(.data$attempts >= 5) |>
      dplyr::pull(.data$player_id)

    model_data <- df |>
      dplyr::group_by(.data$id) |>
      dplyr::summarize(
        n_plays = dplyr::n(),
        epa_per_play = sum(.data$epa) / .data$n_plays,
        cpoe = mean(.data$cpoe, na.rm = TRUE)
      ) |>
      dplyr::ungroup() |>
      dplyr::mutate(cpoe = dplyr::if_else(is.na(.data$cpoe), 0, .data$cpoe)) |>
      dplyr::rename("player_id" = "id") |>
      dplyr::filter(.data$player_id %in% relevant_players)

    model_data$dakota <- mgcv::predict.gam(dakota_model, model_data) |>
      as.vector()

    out <- add_to_this |>
      dplyr::left_join(
        model_data |>
          dplyr::select("player_id", "dakota"),
        by = "player_id"
      )
  }
  return(out)
}


================================================
FILE: R/aggregate_game_stats_def.R
================================================
################################################################################
# Author: Christian Lohr, Sebastian Carl, Tan Ho
# Styleguide: styler::tidyverse_style()
################################################################################

#' Get Official Game Stats on Defense
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' @param pbp A data frame of NFL play-by-play data typically loaded with
#'   [load_pbp()] or [build_nflfastR_pbp()]. If the data doesn't include the variable
#'   `qb_epa`, the function `add_qb_epa()` will be called to add it.
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats
#'   for the entire data frame.
#' @description Build columns that aggregate official defense stats
#'   either at the game level or at the level of the entire data frame passed.
#' @return A data frame of defensive player stats. See dictionary (# TODO)
#' @export
#' @keywords internal
#' @seealso The function [load_player_stats()] and the corresponding examples
#' on [the nflfastR website](https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats)
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   # pbp <- nflfastR::load_pbp(2020)
#'
#'   # weekly <- calculate_player_stats_def(pbp, weekly = TRUE)
#'   # dplyr::glimpse(weekly)
#'
#'   # overall <- calculate_player_stats_def(pbp, weekly = FALSE)
#'   # dplyr::glimpse(overall)
#' })
#' }
#'

#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# what do we need:
#
# Solo Tackles --> done
# Tackles With Assist --> done
# Assisted Tackles --> done
# Tackles for Loss --> done
# TFL Yards --> done
# Sacks --> done
# Sack Yards --> done
# QB Hits --> done
# Passes Defensed --> done
# Interceptions --> done
# Interception Yards --> done
# Interception Return TDs ///// --> only "TD" for defense
# Forced Fumbles --> done
# Opp Fumble Recoveries --> done
# Opp Fumble Recovery Yards --> done
# Opp Fumble Recovery TDs ///// --> only "TD" for defense
# Safeties --> done
# Penalties --> done
# Penalty Yards --> done
# Fumbles --> done
# Own Fumble Recoveries --> done
# Own Fumble Recovery Yards --> done
# Own Fumble Recovery TDs ///// --> only "TD" for defense

calculate_player_stats_def <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats_def()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)")

  # Prepare data ------------------------------------------------------------

  suppressMessages({
    # 1. for "normal" plays: get plays that count in official stats
    # we exclude special teams and 2pts here for now
    data <- pbp |>
      dplyr::filter(
        !is.na(.data$down),
        .data$play_type %in% c("pass", "qb_kneel", "qb_spike", "run")
      ) |>
      nflfastR::decode_player_ids()

    # 2. filter penalty plays for penalty stats
    penalty_data <- pbp |>
      dplyr::filter(.data$penalty == 1) |>
      nflfastR::decode_player_ids()
  })

  stype <- data |>
    dplyr::select("season", "week", "season_type") |>
    dplyr::distinct()

  # Tackling stats -----------------------------------------------------------

  tackle_vars <- c(
    "solo_tackle_1_player_id",
    "tackle_for_loss_1_player_id",
    "assist_tackle_1_player_id",
    "tackle_with_assist_1_player_id",
    "solo_tackle_2_player_id",
    "forced_fumble_player_1_player_id",
    "assist_tackle_2_player_id",
    "forced_fumble_player_2_player_id"
  )

  # get tackling stats
  tackle_df <- data |>
    dplyr::select("season", "week", "defteam", dplyr::any_of(tackle_vars)) |>
    tidyr::pivot_longer(
      cols = dplyr::any_of(tackle_vars),
      names_to = "desc",
      values_to = "tackle_player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::count(
      .data$tackle_player_id,
      .data$defteam,
      .data$season,
      .data$week,
      .data$desc
    ) |>
    dplyr::mutate(
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]")
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = "n",
      values_fill = 0L,
      values_fn = sum
    ) |>
    add_column_if_missing(
      "solo_tackle",
      "tackle_with_assist",
      "tackle_for_loss",
      "assist_tackle",
      "forced_fumble_player"
    ) |>
    dplyr::mutate(
      tackles = .data$solo_tackle + .data$tackle_with_assist
    ) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "player_id" = "tackle_player_id",
      "tackles",
      "tackles_solo" = "solo_tackle",
      "tackles_with_assist" = "tackle_with_assist",
      "tackle_assists" = "assist_tackle",
      "forced_fumbles" = "forced_fumble_player",
      "tackles_for_loss" = "tackle_for_loss"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      tackles = sum(.data$tackles, na.rm = TRUE),
      tackles_solo = sum(.data$tackles_solo, na.rm = TRUE),
      tackles_with_assist = sum(.data$tackles_with_assist, na.rm = TRUE),
      tackle_assists = sum(.data$tackle_assists, na.rm = TRUE),
      forced_fumbles = sum(.data$forced_fumbles, na.rm = TRUE),
      tackles_for_loss = sum(.data$tackles_for_loss, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get tackle for loss yards
  tackle_yards_df <- data |>
    dplyr::filter(
      .data$tackled_for_loss == 1,
      .data$fumble == 0,
      .data$sack == 0
    ) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "tackle_for_loss_1_player_id",
      "tackle_for_loss_2_player_id",
      "yards_gained"
    ) |>
    tidyr::pivot_longer(
      cols = c("tackle_for_loss_1_player_id", "tackle_for_loss_2_player_id"),
      names_to = "desc",
      values_to = "player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      tfl_yards = sum(-.data$yards_gained, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Sack and QB Hits stats -----------------------------------------------------------

  # get sack and pressure stats
  pressure_df <- data |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      dplyr::contains("sack_"),
      "yards_gained",
      dplyr::starts_with("qb_hit_"),
      -dplyr::contains("_name")
    ) |>
    tidyr::pivot_longer(
      cols = c(
        dplyr::contains("sack_"),
        dplyr::starts_with("qb_hit_")
      ),
      names_to = "desc",
      names_prefix = "sk_",
      values_to = "player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(
      n = dplyr::case_when(
        .data$desc %in%
          c("half_sack_1_player_id", "half_sack_2_player_id") ~ 0.5,
        TRUE ~ 1
      ),
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]") |>
        stringr::str_remove("half_")
    ) |>
    dplyr::mutate(
      sack_yards = .data$n * .data$yards_gained * -1
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = c("n", "sack_yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing("n_sack", "n_qb_hit", "sack_yards_sack") |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id",
      "sacks" = "n_sack",
      "qb_hit" = "n_qb_hit",
      "sack_yards" = "sack_yards_sack"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      sacks = sum(.data$sacks, na.rm = TRUE),
      qb_hit = sum(.data$qb_hit, na.rm = TRUE),
      sack_yards = sum(.data$sack_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Interception and Deflection stats ---------------------------------------------------------

  # get int and def stats
  int_df <- data |>
    dplyr::select(
      "season",
      "week",
      "return_yards",
      "team" = "defteam",
      dplyr::starts_with("interception_"),
      dplyr::starts_with("pass_defense_"),
      -dplyr::contains("_name")
    ) |>
    tidyr::pivot_longer(
      cols = c(
        dplyr::starts_with("interception_"),
        dplyr::starts_with("pass_defense_")
      ),
      names_to = "desc",
      names_prefix = "int_",
      values_to = "db_player_id",
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_player_id") |>
        stringr::str_remove_all("_[0-9]")
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = c("n", "return_yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing(
      "n_interception",
      "n_pass_defense",
      "return_yards_interception"
    ) |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id" = "db_player_id",
      "int" = "n_interception",
      "pass_defended" = "n_pass_defense",
      "int_yards" = "return_yards_interception"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      int = sum(.data$int, na.rm = TRUE),
      pass_defended = sum(.data$pass_defended, na.rm = TRUE),
      int_yards = sum(.data$int_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Safety stats -----------------------------------------------------------

  safety_df <- data |>
    dplyr::filter(.data$safety == 1, !is.na(.data$safety_player_id)) |>
    dplyr::select(
      "season",
      "week",
      "team" = "defteam",
      "player_id" = "safety_player_id"
    ) |>
    dplyr::count(
      .data$season,
      .data$week,
      .data$team,
      .data$player_id,
      name = "safety"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      safety = sum(.data$safety, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Fumble stats -----------------------------------------------------------

  # get fumble stats for fumbles and own fumble recoveries
  fumble_df_own <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumbled_1_team |
        .data$defteam == .data$fumbled_2_team
    ) |>
    dplyr::mutate(
      fumbled_1_player_id = dplyr::if_else(
        .data$defteam == .data$fumbled_1_team,
        .data$fumbled_1_player_id,
        NA_character_,
        NA_character_
      )
    ) |>
    dplyr::select(
      "season",
      "week",
      dplyr::matches("^fumble.+team"),
      dplyr::matches("^fumble.+player_id")
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("fumble"),
      names_pattern = "(.+)_(team|player_id)",
      names_to = c("desc", ".value")
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_[0-9]")
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = "n",
      values_fn = sum,
      values_fill = 0L
    ) |>
    # Renaming fails if the columns don't exist, so we row-bind a dummy tibble
    # that includes the relevant columns. Rows with a missing player_id are
    # filtered out after renaming
    dplyr::bind_rows(
      tibble::tibble(
        player_id = NA_character_,
        fumbled = numeric(),
        fumble_recovery = numeric()
      )
    ) |>
    dplyr::rename(
      "fumble" = "fumbled",
      "fumble_recovery_own" = "fumble_recovery"
    ) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      fumble = sum(.data$fumble, na.rm = TRUE),
      fumble_recovery_own = sum(.data$fumble_recovery_own, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get fumble stats for opponent recoveries
  fumble_df_opp <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumble_recovery_1_team |
        .data$defteam == .data$fumble_recovery_2_team
    ) |>
    dplyr::mutate(
      # use data.table fifelse because base ifelse changed data type to logical
      # if there are 0 rows
      fumble_recovery_1_player_id = data.table::fifelse(
        .data$defteam != .data$fumbled_1_team,
        .data$fumble_recovery_1_player_id,
        NA_character_
      ),
      fumble_recovery_2_player_id = data.table::fifelse(
        .data$defteam != .data$fumbled_2_team,
        .data$fumble_recovery_2_player_id,
        NA_character_
      )
    ) |>
    dplyr::select(
      "season",
      "week",
      dplyr::matches("^fumble_recovery.+team"),
      dplyr::matches("^fumble_recovery.+player_id")
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("fumble"),
      names_pattern = "(.+)_(team|player_id)",
      names_to = c("desc", ".value")
    ) |>
    dplyr::mutate(
      n = 1,
      desc = stringr::str_remove_all(.data$desc, "_[0-9]")
    ) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = "n",
      values_fn = sum,
      values_fill = 0L
    ) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    add_column_if_missing("fumble_recovery") |>
    dplyr::rename("fumble_recovery_opp" = "fumble_recovery") |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      fumble_recovery_opp = sum(.data$fumble_recovery_opp, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # get fumble yards for own recoveries
  fumble_yds_own_data <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumbled_1_team |
        .data$defteam == .data$fumbled_2_team
    )

  fumble_yds_own_df <- fumble_yds_own_data |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$fumble_recovery_1_team,
      "player_id" = .data$fumble_recovery_1_player_id
    ) |>
    dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_1_yards)) |>
    dplyr::filter(!is.na(.data$player_id)) |> ### this happens when a fumble goes out of bounds. No one gets yards --> NA/NA
    dplyr::bind_rows(
      fumble_yds_own_data |>
        dplyr::group_by(
          .data$season,
          .data$week,
          "team" = .data$fumble_recovery_2_team,
          "player_id" = .data$fumble_recovery_2_player_id
        ) |>
        dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_2_yards)) |>
        dplyr::filter(!is.na(.data$player_id))
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(fumble_recovery_yards_own = sum(.data$recovery_yards)) |>
    dplyr::ungroup()

  # get fumble yards for opp recoveries
  fumble_yds_opp_data <- data |>
    dplyr::filter(.data$fumble == 1 | .data$fumble_lost == 1) |>
    dplyr::filter(
      .data$defteam == .data$fumble_recovery_1_team,
      .data$defteam != .data$fumbled_1_team
    )

  fumble_yds_opp_df <- fumble_yds_opp_data |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$fumble_recovery_1_team,
      "player_id" = .data$fumble_recovery_1_player_id
    ) |>
    dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_1_yards)) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::bind_rows(
      fumble_yds_opp_data |>
        dplyr::group_by(
          .data$season,
          .data$week,
          "team" = .data$fumble_recovery_2_team,
          "player_id" = .data$fumble_recovery_2_player_id
        ) |>
        dplyr::summarise(recovery_yards = sum(.data$fumble_recovery_2_yards)) |>
        dplyr::filter(!is.na(.data$player_id))
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(fumble_recovery_yards_opp = sum(.data$recovery_yards)) |>
    dplyr::ungroup()

  # Penalty stats -----------------------------------------------------------

  # get penalty stats
  penalty_df <- penalty_data |>
    dplyr::filter(
      !is.na(.data$penalty_player_id),
      .data$defteam == .data$penalty_team
    ) |>
    dplyr::select(
      "season",
      "week",
      "penalty_yards",
      "penalty_team",
      "penalty_player_id"
    ) |>
    tidyr::pivot_longer(
      cols = dplyr::contains("penalty"),
      names_pattern = "(.+)_(team|player_id|yards)",
      names_to = c("desc", ".value"),
      values_drop_na = TRUE
    ) |>
    dplyr::mutate(n = 1) |>
    tidyr::pivot_wider(
      names_from = "desc",
      values_from = c("n", "yards"),
      values_fn = sum,
      values_fill = 0L
    ) |>
    add_column_if_missing("n_penalty", "yards_penalty") |>
    dplyr::select(
      "season",
      "week",
      "team",
      "player_id",
      "penalty" = "n_penalty",
      "penalty_yards" = "yards_penalty"
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$team, .data$player_id) |>
    dplyr::summarise(
      penalty = sum(.data$penalty, na.rm = TRUE),
      penalty_yards = sum(.data$penalty_yards, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # Touchdown stats -----------------------------------------------------------

  # get defensive touchdowns
  touchdown_df <- data |>
    dplyr::filter(.data$touchdown == 1) |>
    dplyr::filter(.data$defteam == .data$td_team) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      "team" = .data$td_team,
      "player_id" = .data$td_player_id
    ) |>
    dplyr::summarise(td = sum(.data$touchdown)) |>
    dplyr::ungroup()

  # Combine all stats -------------------------------------------------------

  # combine all the stats together

  player_df <- tackle_df |>
    dplyr::full_join(
      tackle_yards_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      pressure_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(int_df, by = c("season", "week", "player_id", "team")) |>
    dplyr::full_join(
      safety_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_df_own,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_df_opp,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_yds_own_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      fumble_yds_opp_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      penalty_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::full_join(
      touchdown_df,
      by = c("season", "week", "player_id", "team")
    ) |>
    dplyr::mutate_if(is.numeric, tidyr::replace_na, 0) |>
    dplyr::left_join(
      nflreadr::load_players() |>
        dplyr::select(
          "player_id" = "gsis_id",
          "player_display_name" = "display_name",
          "player_name" = "short_name",
          "position",
          "position_group",
          "headshot_url" = "headshot"
        ),
      by = "player_id"
    ) |>
    dplyr::left_join(stype, by = c("season", "week")) |>
    dplyr::select(dplyr::any_of(c(
      # game information
      "season",
      "week",
      "season_type",

      # id information
      "player_id",
      "player_name",
      "player_display_name",
      "position",
      "position_group",
      "headshot_url",
      "team",

      # tackle stats
      "def_tackles" = "tackles",
      "def_tackles_solo" = "tackles_solo",
      "def_tackles_with_assist" = "tackles_with_assist",
      "def_tackle_assists" = "tackle_assists",
      "def_tackles_for_loss" = "tackles_for_loss",
      "def_tackles_for_loss_yards" = "tfl_yards",
      "def_fumbles_forced" = "forced_fumbles",

      # pressure stats
      "def_sacks" = "sacks",
      "def_sack_yards" = "sack_yards",
      "def_qb_hits" = "qb_hit",

      # coverage stats
      "def_interceptions" = "int",
      "def_interception_yards" = "int_yards",
      "def_pass_defended" = "pass_defended",

      # misc stats
      "def_tds" = "td",
      "def_fumbles" = "fumble",
      "def_fumble_recovery_own" = "fumble_recovery_own",
      "def_fumble_recovery_yards_own" = "fumble_recovery_yards_own",
      "def_fumble_recovery_opp" = "fumble_recovery_opp",
      "def_fumble_recovery_yards_opp" = "fumble_recovery_yards_opp",
      "def_safety" = "safety",
      "def_penalty" = "penalty",
      "def_penalty_yards" = "penalty_yards"
    ))) |>
    dplyr::filter(!is.na(.data$player_id)) |>
    dplyr::arrange(.data$player_id, .data$season, .data$week)

  # if user doesn't want week-by-week output, aggregate the whole df
  if (isFALSE(weekly)) {
    player_df <- player_df |>
      dplyr::group_by(.data$player_id, .data$team) |>
      dplyr::summarise(
        player_name = custom_mode(.data$player_name),
        player_display_name = custom_mode(.data$player_display_name),
        games = dplyr::n(),
        position = custom_mode(.data$position),
        position_group = custom_mode(.data$position_group),
        headshot_url = custom_mode(.data$headshot_url),
        def_tackles = sum(.data$def_tackles),
        def_tackles_solo = sum(.data$def_tackles_solo),
        def_tackles_with_assist = sum(.data$def_tackles_with_assist),
        def_tackle_assists = sum(.data$def_tackle_assists),
        def_tackles_for_loss = sum(.data$def_tackles_for_loss),
        def_tackles_for_loss_yards = sum(.data$def_tackles_for_loss_yards),
        def_fumbles_forced = sum(.data$def_fumbles_forced),
        def_sacks = sum(.data$def_sacks),
        def_sack_yards = sum(.data$def_sack_yards),
        def_qb_hits = sum(.data$def_qb_hits),
        def_interceptions = sum(.data$def_interceptions),
        def_interception_yards = sum(.data$def_interception_yards),
        def_pass_defended = sum(.data$def_pass_defended),
        def_tds = sum(.data$def_tds),
        def_fumbles = sum(.data$def_fumbles),
        def_fumble_recovery_own = sum(.data$def_fumble_recovery_own),
        def_fumble_recovery_yards_own = sum(
          .data$def_fumble_recovery_yards_own
        ),
        def_fumble_recovery_opp = sum(.data$def_fumble_recovery_opp),
        def_fumble_recovery_yards_opp = sum(
          .data$def_fumble_recovery_yards_opp
        ),
        def_safety = sum(.data$def_safety),
        def_penalty = sum(.data$def_penalty),
        def_penalty_yards = sum(.data$def_penalty_yards)
      ) |>
      dplyr::ungroup() |>
      dplyr::select(
        "player_id",
        "player_name",
        "player_display_name",
        "games",
        "position",
        "position_group",
        "headshot_url",
        "team",
        dplyr::everything()
      )
  }

  player_df
}

# This function checks whether the variables in ... exist as column
# names in the argument .data. If not, it adds those columns and assigns
# them the value in the argument value
add_column_if_missing <- function(.data, ..., value = 0L) {
  dots <- rlang::list2(...)
  new_cols <- dots[!dots %in% names(.data)]
  .data[, unlist(new_cols)] <- value
  .data
}
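
# Usage sketch for add_column_if_missing() (illustrative only; `df` is a
# hypothetical data frame, not an object used anywhere in the package):
#
#   df <- data.frame(n_penalty = 2L)
#   out <- add_column_if_missing(df, "n_penalty", "yards_penalty")
#   names(out)
#   # "n_penalty" is kept as is and "yards_penalty" is added, filled with 0L
#
# This is why the penalty pipeline above can always select "n_penalty" and
# "yards_penalty", even if pivot_wider() did not create them.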


================================================
FILE: R/aggregate_game_stats_kicking.R
================================================
#' Summarize Kicking Stats
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated because we have a new, much better and
#' harmonized approach in [`calculate_stats()`].
#'
#' Build columns that aggregate kicking stats at the game level.
#'
#' @param pbp A data frame of NFL play-by-play data, typically loaded with
#' [load_pbp()] or [build_nflfastR_pbp()].
#' @param weekly If `TRUE`, returns week-by-week stats, otherwise, stats for
#' the entire data frame in argument `pbp`.
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'     # pbp <- nflreadr::load_pbp(2021)
#'     # weekly <- calculate_player_stats_kicking(pbp, weekly = TRUE)
#'     # dplyr::glimpse(weekly)
#'
#'     # overall <- calculate_player_stats_kicking(pbp, weekly = FALSE)
#'     # dplyr::glimpse(overall)
#' })
#' }
#'
#' @return A data frame of kicking stats
#' @seealso <https://nflreadr.nflverse.com/reference/load_player_stats.html> for the nflreadr function to download this from repo (`stat_type = "kicking"`)
#' @export
#' @keywords internal
calculate_player_stats_kicking <- function(pbp, weekly = FALSE) {
  lifecycle::deprecate_warn(
    "5.0",
    "calculate_player_stats_kicking()",
    "calculate_stats()"
  )

  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)")

  # First, create the grouping variables, which depend on the weekly argument
  grp_vars <- if (isTRUE(weekly)) {
    list("season", "week", "season_type", "player_id", "team")
  } else if (isFALSE(weekly)) {
    list("player_id", "team")
  }
  grp_vars <- lapply(grp_vars, as.symbol)
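  # Sketch of how these symbols are used below (illustrative only; `df` is a
  # hypothetical data frame): splicing the list with `!!!` is equivalent to
  # writing the grouping columns out by hand, e.g. with weekly = FALSE
  #
  #   dplyr::group_by(df, !!!grp_vars)
  #   # groups by the same columns as dplyr::group_by(df, player_id, team)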

  # Filtering down / creating a base dataset
  df_fg_or_pat <- pbp |>
    dplyr::group_by(.data$game_id, .data$posteam) |>
    dplyr::filter(
      .data$field_goal_attempt == 1 |
        .data$extra_point_attempt == 1 |
        .data$fixed_drive == max(.data$fixed_drive, na.rm = TRUE)
    ) |>
    dplyr::ungroup() |>
    dplyr::filter(!is.na(.data$kicker_player_id)) |>
    dplyr::select(
      "game_id",
      "season",
      "week",
      "season_type",
      "team" = "posteam",
      "player_name" = "kicker_player_name",
      "player_id" = "kicker_player_id",
      "dist" = "kick_distance",
      "field_goal_attempt",
      "fg_res" = "field_goal_result",
      "extra_point_attempt",
      "pat_res" = "extra_point_result",
      "fixed_drive",
      "score_differential"
    )

  # Field-goal relevant columns
  df_field_goals <- df_fg_or_pat |>
    dplyr::filter(.data$field_goal_attempt == 1) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::mutate(
      temp_made_idx = .data$fg_res == "made",
      temp_miss_idx = .data$fg_res == "missed",
      temp_block_idx = .data$fg_res == "blocked"
    ) |>
    dplyr::summarise(
      games_fg = list(unique(.data$game_id)),
      fg_made = sum(.data$temp_made_idx, na.rm = TRUE),
      fg_att = sum(.data$field_goal_attempt, na.rm = TRUE),
      fg_missed = sum(.data$temp_miss_idx, na.rm = TRUE),
      fg_blocked = sum(.data$temp_block_idx, na.rm = TRUE),
      fg_long = if (any(.data$temp_made_idx, na.rm = TRUE)) {
        max(.data$dist[.data$temp_made_idx], na.rm = TRUE)
      } else {
        NA_real_
      },
      fg_pct = round(.data$fg_made / .data$fg_att, 3L),
      fg_made_0_19 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 0, 19),
        na.rm = TRUE
      ),
      fg_made_20_29 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 20, 29),
        na.rm = TRUE
      ),
      fg_made_30_39 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 30, 39),
        na.rm = TRUE
      ),
      fg_made_40_49 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 40, 49),
        na.rm = TRUE
      ),
      fg_made_50_59 = sum(
        dplyr::between(.data$dist[.data$temp_made_idx], 50, 59),
        na.rm = TRUE
      ),
      fg_made_60_ = sum(.data$dist[.data$temp_made_idx] >= 60, na.rm = TRUE),
      fg_missed_0_19 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 0, 19),
        na.rm = TRUE
      ),
      fg_missed_20_29 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 20, 29),
        na.rm = TRUE
      ),
      fg_missed_30_39 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 30, 39),
        na.rm = TRUE
      ),
      fg_missed_40_49 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 40, 49),
        na.rm = TRUE
      ),
      fg_missed_50_59 = sum(
        dplyr::between(.data$dist[.data$temp_miss_idx], 50, 59),
        na.rm = TRUE
      ),
      fg_missed_60_ = sum(.data$dist[.data$temp_miss_idx] >= 60, na.rm = TRUE),
      fg_made_list = paste(
        stats::na.omit(.data$dist[.data$temp_made_idx]),
        collapse = ";"
      ),
      fg_missed_list = paste(
        stats::na.omit(.data$dist[.data$temp_miss_idx]),
        collapse = ";"
      ),
      fg_blocked_list = paste(
        stats::na.omit(.data$dist[.data$temp_block_idx]),
        collapse = ";"
      ),
      fg_made_distance = sum(.data$dist[.data$temp_made_idx], na.rm = TRUE),
      fg_missed_distance = sum(.data$dist[.data$temp_miss_idx], na.rm = TRUE),
      fg_blocked_distance = sum(.data$dist[.data$temp_block_idx], na.rm = TRUE),
      .groups = "drop"
    )

  # Extra points
  df_pat <- df_fg_or_pat |>
    dplyr::filter(.data$extra_point_attempt == 1) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      games_pat = list(unique(.data$game_id)),
      pat_made = sum(.data$pat_res == "good", na.rm = TRUE),
      pat_att = sum(.data$extra_point_attempt, na.rm = TRUE),
      pat_missed = sum(.data$pat_res == "failed", na.rm = TRUE),
      pat_blocked = sum(.data$pat_res == "blocked", na.rm = TRUE),
      pat_pct = round(.data$pat_made / .data$pat_att, 3L),
      .groups = "drop"
    )

  # The game-winning kick distance includes at most one value at the weekly level
  # but can include multiple values across the season. Using a different column
  # name for each case is one way to account for that. The downside is that the
  # column name changes depending on whether the output is weekly or seasonal.
  if (weekly) {
    gw_dist_name <- "gwfg_distance"
  } else {
    gw_dist_name <- "gwfg_distance_list"
  }

  # See the note above. I wonder if this should also include field goals that tie
  # the game, but I kept the filter dplyr::between(score_differential, -2, 0) the way
  # that it was previously. If you do include field goals that send the game into OT,
  # then you'll probably need to include the gwfg_distance AND gwfg_distance_list columns
  # in the weekly data.
  game_winners <- df_fg_or_pat |>
    dplyr::group_by(.data$game_id, .data$team) |>
    dplyr::filter(.data$fixed_drive == max(.data$fixed_drive, na.rm = TRUE)) |>
    dplyr::ungroup() |>
    dplyr::filter(
      .data$field_goal_attempt == 1,
      dplyr::between(.data$score_differential, -2, 0)
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      games_gwfg = list(unique(.data$game_id)),
      gwfg_att = dplyr::n(),
      !!gw_dist_name := if (weekly) {
        .data$dist
      } else {
        paste(stats::na.omit(.data$dist), collapse = ";")
      },
      gwfg_made = sum(.data$fg_res == "made", na.rm = TRUE),
      gwfg_missed = sum(.data$fg_res == "missed", na.rm = TRUE),
      gwfg_blocked = sum(.data$fg_res == "blocked", na.rm = TRUE),
      .groups = "drop"
    )

  # Prepping data to merge-in player names
  df_player_names <- nflreadr::load_players() |>
    dplyr::select(
      "player_id" = "gsis_id",
      "player_display_name" = "display_name",
      "player_name" = "short_name",
      "position",
      "position_group",
      "headshot_url" = "headshot"
    )

  # Joining all the data together and organizing the first few columns.
  full_kicks <- df_field_goals |>
    dplyr::full_join(df_pat, as.character(grp_vars)) |>
    dplyr::full_join(game_winners, as.character(grp_vars)) |>
    dplyr::left_join(df_player_names, "player_id") |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::mutate(
      games = length(unique(unlist(c(
        .data$games_fg,
        .data$games_pat,
        .data$games_gwfg
      ))))
    ) |>
    dplyr::ungroup() |>
    dplyr::select(
      dplyr::any_of(c("season", "week", "season_type")),
      "player_id",
      "team",
      "player_name",
      "player_display_name",
      "games",
      "position",
      "position_group",
      "headshot_url",
      dplyr::everything(),
      -c("games_fg", "games_pat", "games_gwfg")
    ) |>
    # replace "" with NA
    dplyr::mutate_all(~ replace(.x, nchar(.x) == 0 | is.nan(.x), NA)) |>
    # replace NA in attempt columns with 0
    dplyr::mutate_at(
      c("fg_att", "pat_att", "gwfg_att"),
      ~ tidyr::replace_na(.x, 0)
    )

  if (weekly) {
    full_kicks |>
      dplyr::select(-"games") |>
      dplyr::arrange(.data$player_id, .data$season, .data$week)
  } else {
    full_kicks |>
      dplyr::arrange(.data$player_id)
  }
}


================================================
FILE: R/build_nflfastR_pbp.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Wrapper around multiple nflfastR functions
# Code Style Guide: styler::tidyverse_style()
################################################################################

# The idea for this wrapper as well as some helper functions and the documentation
# style are heavily borrowed from the r-lib package pkgdown (https://github.com/r-lib/pkgdown/blob/master/R/build.r)

#' Build a Complete nflfastR Data Set
#'
#' @description
#' `build_nflfastR_pbp` is a convenient wrapper around 6 nflfastR functions:
#'
#' \itemize{
#'  \item{[fast_scraper()]}
#'  \item{[clean_pbp()]}
#'  \item{[add_qb_epa()]}
#'  \item{[add_xyac()]}
#'  \item{[add_xpass()]}
#'  \item{[decode_player_ids()]}
#' }
#'
#' Please see either the documentation of each function or
#' [the nflfastR Field Descriptions website](https://nflfastr.com/articles/field_descriptions.html)
#' to learn about the output.
#'
#' @inheritParams fast_scraper
#' @param decode If `TRUE`, the function [decode_player_ids()] will be executed.
#' @param rules If `FALSE`, printing of the header and footer in the console output will be suppressed.
#' @return An nflfastR play-by-play data frame like the ones that can be loaded from <https://github.com/nflverse/nflverse-data>.
#' @details To load valid game_ids please use the package function [fast_scraper_schedules()].
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @export
#' @examples
#' \donttest{
#' # Build nflfastR pbp for the 2018 and 2019 Super Bowls
#' try({# to avoid CRAN test problems
#' build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
#' })
#'
#' # It is also possible to directly use the
#' # output of `load_schedules` as input
#' try({# to avoid CRAN test problems
#' nflreadr::load_schedules(2025) |>
#'   dplyr::slice_tail(n = 3) |>
#'   build_nflfastR_pbp()
#' })
#'
#' \dontshow{
#' # Close open connections for R CMD Check
#' future::plan("sequential")
#' }
#' }
build_nflfastR_pbp <- function(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...,
  decode = TRUE,
  rules = TRUE
) {
  if (!is.vector(game_ids) && is.data.frame(game_ids)) {
    game_ids <- game_ids$game_id
  }

  if (!is.vector(game_ids)) {
    cli::cli_abort("Param {.arg game_ids} is not a valid vector!")
  }

  if (isTRUE(decode) && !is_installed("gsisdecoder")) {
    cli::cli_abort(
      "Package {.pkg gsisdecoder} required for decoding. Please install it with {.code install.packages(\"gsisdecoder\")}."
    )
  }

  if (isTRUE(rules)) {
    rule_header("Build nflfastR Play-by-Play Data")
  }

  # nflfastR v6 stopped supporting the 1999 and 2000 seasons because of
  # inconsistent data sources. Data is still available through load_pbp
  # but we will not fix any issues.
  # It's possible to install nflfastR v5.2.0 to parse those seasons.
  # try pak::pak("nflverse/nflfastR@v5.2.0")
  game_ids <- check_for_dropped_seasons(game_ids)

  game_count <- ifelse(is.vector(game_ids), length(game_ids), nrow(game_ids))
  builder <- TRUE

  cli::cli_ul("{my_time()} | Start download of {game_count} game{?s}...")

  ret <- fast_scraper(
    game_ids = game_ids,
    dir = dir,
    ...,
    in_builder = builder
  ) |>
    clean_pbp(in_builder = builder) |>
    add_qb_epa(in_builder = builder) |>
    add_xyac(in_builder = builder) |>
    add_xpass(in_builder = builder)

  if (isTRUE(decode)) {
    ret <- decode_player_ids(ret, in_builder = builder)
  }

  if (isTRUE(rules)) {
    rule_footer("DONE")
  }

  make_nflverse_data(ret)
}


================================================
FILE: R/build_playstats.R
================================================
build_playstats <- function(
  seasons = nflreadr::most_recent_season(),
  stat_ids = 1:1000,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  skip_local = FALSE
) {
  if (is_sequential()) {
    cli::cli_alert_info(
      "It is recommended to use parallel processing when using this function. \\
        Please consider running {.code future::plan(\"multisession\")}! \\
        Will go on sequentially...",
      wrap = TRUE
    )
  }

  games <- nflreadr::load_schedules(seasons = seasons) |>
    dplyr::filter(!is.na(.data$result)) |>
    dplyr::pull("game_id")

  p <- progressr::progressor(along = games)

  l <- furrr::future_map(
    games,
    function(id, p = NULL, dir, skip_local) {
      if (id %in% c("2000_03_SD_KC", "2000_06_BUF_MIA", "1999_01_BAL_STL")) {
        cli::cli_alert_warning(
          "We are missing raw game data of {.val {id}}. Skipping."
        )
        return(data.frame())
      }
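      # convert to integer so that the comparison below is numeric rather
      # than a string comparison
      season <- as.integer(substr(id, 1, 4))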
      raw_data <- load_raw_game(id, dir = dir, skip_local = skip_local)
      if (season <= 2000) {
        drives <- raw_data[[1]][["drives"]] |>
          purrr::keep(is.list)
        out <- tibble::tibble(d = drives) |>
          tidyr::unnest_wider("d") |>
          tidyr::unnest_longer("plays") |>
          tidyr::unnest_wider("plays", names_sep = "_") |>
          dplyr::select("playId" = "plays_id", "playStats" = "plays_players") |>
          dplyr::mutate(
            playId = uniquify_ids(.data$playId)
          ) |>
          tidyr::unnest_longer("playStats") |>
          tidyr::unnest_longer("playStats") |>
          tidyr::unnest_wider("playStats") |>
          dplyr::mutate(
            playId = as.integer(.data$playId),
            statId = as.integer(.data$statId),
            yards = as.integer(.data$yards),
            team.id = NA_character_
          ) |>
          dplyr::select(-"sequence") |>
          dplyr::rename(
            team.abbreviation = "clubcode",
            gsis.Player.id = "playStats_id"
          ) |>
          tidyr::nest(
            playStats = c(
              "statId",
              "yards",
              "playerName",
              "team.id",
              "team.abbreviation",
              "gsis.Player.id"
            )
          )
      } else {
        out <- raw_data$data$viewer$gameDetail$plays[, c("playId", "playStats")]
      }
      out$game_id <- as.character(id)
      p(sprintf("ID=%s", as.character(id)))
      out
    },
    p = p,
    dir = dir,
    skip_local = skip_local
  )

  out <- data.table::rbindlist(l) |>
    tidyr::unnest(cols = c("playStats")) |>
    janitor::clean_names() |>
    dplyr::filter(.data$stat_id %in% stat_ids) |>
    dplyr::mutate(
      season = as.integer(substr(.data$game_id, 1, 4)),
      week = as.integer(substr(.data$game_id, 6, 7))
    ) |>
    decode_player_ids() |>
    dplyr::select(
      "game_id",
      "season",
      "week",
      "play_id",
      "stat_id",
      "yards",
      "team_abbr" = "team_abbreviation",
      "player_name",
      "gsis_player_id"
    ) |>
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    )
  out
}


================================================
FILE: R/calculate_series_conversion_rates.R
================================================
#' Compute Series Conversion Information from Play by Play
#'
#' @description A "Series" begins on a 1st and 10 and each team attempts to either earn
#'   a new 1st down (on offense) or prevent the offense from converting a new
#'   1st down (on defense). Series conversion rate represents how many series
#'   have been either converted to a new 1st down or ended in a touchdown.
#'   This function computes series conversion rates on offense and defense from
#'   nflverse play-by-play data along with other series results.
#'   The function automatically removes series that ended in a QB kneel down.
#'
#' @param pbp Play-by-play data as returned by [`load_pbp()`], [`build_nflfastR_pbp()`], or
#'   [`fast_scraper()`].
#' @param weekly If `TRUE`, returns week-by-week stats; otherwise, returns
#'   season-by-season stats for the data in argument `pbp`.
#'
#' @return A data frame of series information including the following columns:
#' \describe{
#' \item{season}{The NFL season}
#' \item{team}{NFL team abbreviation}
#' \item{week}{Week if `weekly` is `TRUE`}
#' \item{off_n}{The number of series the offense played (excludes QB kneel
#' downs, kickoffs, extra point/two point conversion attempts, non-plays, and
#' plays that do not list a "posteam")}
#' \item{off_scr}{The rate at which a series ended in either new 1st down or
#' touchdown while the offense was on the field}
#' \item{off_scr_1st}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 1st down}
#' \item{off_scr_2nd}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 2nd down}
#' \item{off_scr_3rd}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 3rd down}
#' \item{off_scr_4th}{The rate at which an offense earned a 1st down
#' or scored a touchdown on 4th down}
#' \item{off_1st}{The rate of series that ended in a new 1st down while the
#' offense was on the field (does not include offensive touchdown)}
#' \item{off_td}{The rate of series that ended in an offensive touchdown while the
#' offense was on the field}
#' \item{off_fg}{The rate of series that ended in a field goal attempt while the
#' offense was on the field}
#' \item{off_punt}{The rate of series that ended in a punt while the
#' offense was on the field}
#' \item{off_to}{The rate of series that ended in a turnover (including on downs), in an
#' opponent score, or at the end of half (or game) while the
#' offense was on the field}
#' \item{def_n}{The number of series the defense played (excludes QB kneel
#' downs, kickoffs, extra point/two point conversion attempts, non-plays, and
#' plays that do not list a "posteam")}
#' \item{def_scr}{The rate at which a series ended in either new 1st down or
#' touchdown while the defense was on the field}
#' \item{def_scr_1st}{The rate at which a defense allowed a
#' 1st down or touchdown on 1st down}
#' \item{def_scr_2nd}{The rate at which a defense allowed a
#' 1st down or touchdown on 2nd down}
#' \item{def_scr_3rd}{The rate at which a defense allowed a
#' 1st down or touchdown on 3rd down}
#' \item{def_scr_4th}{The rate at which a defense allowed a
#' 1st down or touchdown on 4th down}
#' \item{def_1st}{The rate of series that ended in a new 1st down while the
#' defense was on the field (does not include offensive touchdown)}
#' \item{def_td}{The rate of series that ended in an offensive touchdown while the
#' defense was on the field}
#' \item{def_fg}{The rate of series that ended in a field goal attempt while the
#' defense was on the field}
#' \item{def_punt}{The rate of series that ended in a punt while the
#' defense was on the field}
#' \item{def_to}{The rate of series that ended in a turnover (including on downs), in an
#' opponent score, or at the end of half (or game) while the
#' defense was on the field}
#' }
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   pbp <- nflfastR::load_pbp(2021)
#'
#'   weekly <- calculate_series_conversion_rates(pbp, weekly = TRUE)
#'   dplyr::glimpse(weekly)
#'
#'   overall <- calculate_series_conversion_rates(pbp, weekly = FALSE)
#'   dplyr::glimpse(overall)
#' })
#' }
calculate_series_conversion_rates <- function(pbp, weekly = FALSE) {
  if (isTRUE(weekly)) {
    grp <- c("season", "team", "week")
  } else if (isFALSE(weekly)) {
    grp <- c("season", "team")
  }
  grp_vars <- lapply(grp, as.symbol)

  # Offense -----------------------------------------------------------------

  off_series <- pbp |>
    dplyr::filter(
      !is.na(.data$down),
      .data$series_result != "QB kneel"
      # .data$rush == 1 | .data$pass == 1
    ) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      team = .data$posteam,
      .data$series
    ) |>
    dplyr::summarise(
      conversion = dplyr::first(.data$series_success),
      result = dplyr::first(.data$series_result),
      last_down = dplyr::last(.data$down),
      .groups = "drop"
    )

  offense <- off_series |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      off_n = dplyr::n(),
      off_scr = mean(.data$conversion),
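      # `*` binds tighter than `==`, so the parentheses make the intent explicit:
      # the series was converted AND ended on the given down
      off_scr_1st = mean((.data$last_down == 1) * .data$conversion),
      off_scr_2nd = mean((.data$last_down == 2) * .data$conversion),
      off_scr_3rd = mean((.data$last_down == 3) * .data$conversion),
      off_scr_4th = mean((.data$last_down == 4) * .data$conversion),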
      off_1st = mean(.data$result == "First down"),
      off_td = mean(.data$result == "Touchdown"),
      off_fg = mean(.data$result %in% c("Field goal", "Missed field goal")),
      off_punt = mean(.data$result == "Punt"),
      off_to = mean(
        .data$result %in%
          c(
            "Turnover on downs",
            "Turnover",
            "Opp touchdown",
            "Safety",
            "End of half"
          )
      ),
      .groups = "drop"
    )

  # Defense -----------------------------------------------------------------

  def_series <- pbp |>
    dplyr::filter(
      !is.na(.data$down),
      .data$series_result != "QB kneel"
      # .data$rush == 1 | .data$pass == 1
    ) |>
    dplyr::group_by(
      .data$season,
      .data$week,
      team = .data$defteam,
      .data$series
    ) |>
    dplyr::summarise(
      conversion = dplyr::first(.data$series_success),
      result = dplyr::first(.data$series_result),
      last_down = dplyr::last(.data$down),
      .groups = "drop"
    )

  defense <- def_series |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      def_n = dplyr::n(),
      def_scr = mean(.data$conversion),
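      # `*` binds tighter than `==`, so the parentheses make the intent explicit:
      # the series was converted AND ended on the given down
      def_scr_1st = mean((.data$last_down == 1) * .data$conversion),
      def_scr_2nd = mean((.data$last_down == 2) * .data$conversion),
      def_scr_3rd = mean((.data$last_down == 3) * .data$conversion),
      def_scr_4th = mean((.data$last_down == 4) * .data$conversion),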
      def_1st = mean(.data$result == "First down"),
      def_td = mean(.data$result == "Touchdown"),
      def_fg = mean(.data$result %in% c("Field goal", "Missed field goal")),
      def_punt = mean(.data$result == "Punt"),
      def_to = mean(
        .data$result %in%
          c(
            "Turnover on downs",
            "Turnover",
            "Opp touchdown",
            "Safety",
            "End of half"
          )
      ),
      .groups = "drop"
    )

  # Offense + Defense -------------------------------------------------------

  combined <- dplyr::full_join(offense, defense, by = grp)

  combined
}


================================================
FILE: R/calculate_standings.R
================================================
#' Compute Division Standings and Conference Seeds from Play by Play
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated and replaced by [nflseedR::nfl_standings()].
#'
#' This function calculates division standings as well as playoff
#'   seeds per conference based on either nflverse play-by-play data or nflverse
#'   schedule data.
#'
#' @param nflverse_object Data object of class `nflverse_data`. Either schedules
#'   as returned by [`fast_scraper_schedules()`] or [`nflreadr::load_schedules()`].
#'   Or play-by-play data as returned by [`load_pbp()`], [`build_nflfastR_pbp()`], or
#'  [`fast_scraper()`].
#' @param playoff_seeds Number of playoff teams per conference. If `NULL` (the
#'   default), the function will try to split `nflverse_object` into seasons prior
#'   to 2020 (6 seeds) and seasons from 2020 onward (7 seeds). If set to a
#'   numeric, it will be used for all seasons in `nflverse_object`!
#' @inheritParams nflseedR::compute_conference_seeds
#'
#' @keywords internal
#' @return A tibble with NFL regular season standings
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#'   # load nflverse data, both schedules and pbp
#'   # scheds <- fast_scraper_schedules(2014)
#'   # pbp <- load_pbp(c(2018, 2021))
#'
#'   # calculate standings based on pbp
#'   # calculate_standings(pbp)
#'
#'   # calculate standings based on schedules
#'   # calculate_standings(scheds)
#' })
#' }
calculate_standings <- function(
  nflverse_object,
  tiebreaker_depth = 3,
  playoff_seeds = NULL
) {
  lifecycle::deprecate_warn(
    "5.1.0",
    "calculate_standings()",
    "nflseedR::nfl_standings()"
  )

  if (!inherits(nflverse_object, "nflverse_data")) {
    cli::cli_abort(
      "The function argument {.arg nflverse_object} has to be
                   of class {.cls nflverse_data}"
    )
  }

  rlang::check_installed(
    "nflseedR",
    "to compute standings.",
    compare = ">=",
    version = "1.0.2"
  )

  type <- attr(nflverse_object, "nflverse_type")

  if (type == "play by play data") {
    .standings_from_pbp(
      nflverse_object,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  } else if (type == "games and schedules") {
    .standings_from_games(
      nflverse_object,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  } else {
    cli::cli_abort(
      "Can only handle nflverse_type {.val play by play data} or
                   {.val games and schedules} and not {.val {type}}"
    )
  }
}

.standings_from_pbp <- function(pbp, tiebreaker_depth, playoff_seeds) {
  g <- pbp |>
    dplyr::filter(.data$season_type == "REG") |>
    dplyr::group_by(.data$game_id) |>
    dplyr::summarise(
      sim = dplyr::first(.data$season),
      game_type = dplyr::first(.data$season_type),
      week = dplyr::first(.data$week),
      away_team = dplyr::first(.data$away_team),
      home_team = dplyr::first(.data$home_team),
      result = dplyr::last(.data$home_score) - dplyr::last(.data$away_score)
    ) |>
    dplyr::ungroup() |>
    dplyr::select(-"game_id")

  if (is.null(playoff_seeds)) {
    g6 <- g |>
      dplyr::filter(.data$sim %in% 1999:2019)
    g7 <- g |>
      dplyr::filter(.data$sim >= 2020)
    dplyr::bind_rows(
      .compute_standings(
        g6,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 6
      ),
      .compute_standings(
        g7,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 7
      )
    )
  } else {
    .compute_standings(
      g,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  }
}

.standings_from_games <- function(games, tiebreaker_depth, playoff_seeds) {
  g <- games |>
    dplyr::filter(.data$game_type == "REG", !is.na(.data$result)) |>
    dplyr::select(
      "sim" = "season",
      "game_type",
      "week",
      "away_team",
      "home_team",
      "result"
    )

  if (is.null(playoff_seeds)) {
    g6 <- g |>
      dplyr::filter(.data$sim %in% 1999:2019)
    g7 <- g |>
      dplyr::filter(.data$sim >= 2020)
    dplyr::bind_rows(
      .compute_standings(
        g6,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 6
      ),
      .compute_standings(
        g7,
        tiebreaker_depth = tiebreaker_depth,
        playoff_seeds = 7
      )
    )
  } else {
    .compute_standings(
      g,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  }
}

.compute_standings <- function(games, tiebreaker_depth, playoff_seeds) {
  if (nrow(games) == 0) {
    return(data.frame())
  }
  suppressMessages({
    div <- nflseedR::compute_division_ranks(
      games,
      tiebreaker_depth = tiebreaker_depth
    )
    conf <- nflseedR::compute_conference_seeds(
      div,
      h2h = div$h2h,
      tiebreaker_depth = tiebreaker_depth,
      playoff_seeds = playoff_seeds
    )
  })
  conf$standings |>
    dplyr::select(-"exit", -"wins") |>
    dplyr::select("sim":"division", "div_rank", "seed", dplyr::everything()) |>
    dplyr::rename("season" = "sim", "wins" = "true_wins") |>
    dplyr::arrange(.data$season, .data$division, .data$div_rank, .data$seed) |>
    tibble::as_tibble()
}


================================================
FILE: R/calculate_stats.R
================================================
################################################################################
# Author: Sebastian Carl
################################################################################

#' Calculate NFL Stats
#'
#' Compute various NFL stats based on nflverse play-by-play data.
#'
#' @param seasons A numeric vector of 4-digit years associated with given NFL
#'  seasons - defaults to latest season. If set to TRUE, returns all available
#'  data since 1999. Ignored if argument `pbp` is not `NULL`.
#' @param summary_level Summarize stats by `"season"` or `"week"`.
#' @param stat_type Calculate `"player"` level stats or `"team"` level stats.
#' @param season_type One of `"REG"`, `"POST"`, or `"REG+POST"`. Filters
#'  data to regular season ("REG"), post season ("POST") or keeps all data.
#'  Only applied if `summary_level` == `"season"`.
#' @param pbp This argument allows passing a subset of nflverse play-by-play
#'  data, created with [build_nflfastR_pbp()] or loaded with [load_pbp()].
#'  Stats are then calculated based on the `game_id`s and `play_id`s in this
#'  subset of play-by-play data, rather than using the seasons specified in the
#'  `seasons` argument. The function will error if required variables are
#'  missing from the subset, but lists which variables are missing.
#'  If `pbp = NULL` (the default), all available games and plays from the
#'  `seasons` argument are used to calculate stats.
#'  Please use this responsibly, because the output is structurally identical
#'  to full seasons, even if plays have been filtered out. It may then appear
#'  as if the stats are incorrect. If `pbp` is not `NULL`, the function will add
#'  the attribute `"custom_pbp" = TRUE` to the function output to help identify
#'  stats that are possibly based on play-by-play subsets.
#'
#' @return A tibble of player/team stats summarized by season/week.
#' @seealso [nfl_stats_variables] for a description of all variables.
#' @seealso <https://nflfastr.com/articles/stats_variables.html> for a searchable
#' table of the stats variable descriptions.
#' @export
#'
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' stats <- calculate_stats(2023, "season", "player")
#' dplyr::glimpse(stats)
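#'
#' # Sketch: stats from a custom play-by-play subset via the `pbp` argument.
#' # The week filter is only an illustration; any subset of plays works.
#' # pbp_sub <- dplyr::filter(load_pbp(2023), week <= 4)
#' # stats_sub <- calculate_stats(summary_level = "week", pbp = pbp_sub)
#' # attr(stats_sub, "custom_pbp") # TRUE, flags stats based on custom pbp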
#' })
#' }
calculate_stats <- function(
  seasons = nflreadr::most_recent_season(),
  summary_level = c("season", "week"),
  stat_type = c("player", "team"),
  season_type = c("REG", "POST", "REG+POST"),
  pbp = NULL
) {
  summary_level <- rlang::arg_match(summary_level)
  stat_type <- rlang::arg_match(stat_type)
  season_type <- rlang::arg_match(season_type)
  custom_pbp <- !is.null(pbp)

  if (!custom_pbp) {
    pbp <- nflreadr::load_pbp(seasons = seasons)
  }

  # make sure (custom) pbp includes all required variables.
  # stats_validate_pbp will return all unique seasons in pbp.
  # We'll use this to download playstats for all seasons listed in pbp.
  seasons_in_pbp <- stats_validate_pbp(pbp)

  # we don't want groups to mess up something or slow us down.
  # this is only relevant if a user supplies grouped pbp data
  pbp <- dplyr::ungroup(pbp)

  if (season_type %in% c("REG", "POST") && summary_level == "season") {
    pbp <- dplyr::filter(pbp, .data$season_type == .env$season_type)
    if (nrow(pbp) == 0) {
      cli::cli_alert_warning(
        "Filtering {.val {seasons_in_pbp}} data to {.arg season_type} == \\
        {.val {season_type}} resulted in 0 rows. Returning empty tibble."
      )
      return(tibble::tibble())
    }
  }

  # defensive stats require knowledge of which team is on defense
  # special teams stats require knowledge of which plays were special teams plays
  playinfo <- pbp |>
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::summarise(
      off = .data$posteam,
      def = .data$defteam,
      special = as.integer(.data$special == 1)
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate_at(
      .vars = dplyr::vars("off", "def"),
      .funs = team_name_fn
    )

  season_type_from_pbp <- pbp |>
    dplyr::select("game_id", "season_type") |>
    dplyr::distinct()
  s_type_vctr <- season_type_from_pbp$season_type |>
    rlang::set_names(season_type_from_pbp$game_id)

  gwfg_attempts_from_pbp <- pbp |>
    dplyr::mutate(
      # final_posteam_score = data.table::fifelse(.data$posteam_type == "home", .data$home_score, .data$away_score),
      final_defteam_score = data.table::fifelse(
        .data$posteam_type == "home",
        .data$away_score,
        .data$home_score
      ),
      identifier = paste(.data$game_id, .data$play_id, sep = "_")
    ) |>
    dplyr::group_by(.data$game_id, .data$posteam) |>
    dplyr::mutate(
      # A game winning field goal attempt is
      # - a field goal attempt,
      # - in the posteam's final drive,
      # - where the posteam was tied or trailed the defteam by 2 points or fewer prior to the kick,
      # - and the defteam did not score afterwards
      is_gwfg_attempt = dplyr::case_when(
        .data$field_goal_attempt == 1 &
          .data$fixed_drive == max(.data$fixed_drive) &
          dplyr::between(.data$score_differential, -2, 0) &
          .data$defteam_score == .data$final_defteam_score ~ 1L,
        TRUE ~ 0L
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::filter(
      is_gwfg_attempt == 1L
    ) |>
    dplyr::select("identifier", "is_gwfg_attempt")
  gwfg_vctr <- gwfg_attempts_from_pbp$is_gwfg_attempt |>
    rlang::set_names(gwfg_attempts_from_pbp$identifier)

  # load_playstats defined below
  # more_stats = all stat IDs of one player in a single play
  # team_stats = all stat IDs of one team in a single play
  # all_stats = all stat IDs of a play, regardless of team (we need this for punting)
  # we need those to identify things like fumbles depending on playtype or
  # first downs depending on playtype
  playstats <- load_playstats(seasons = seasons_in_pbp) |>
    # apply filtering on play stats so that it matches only plays included
    # in pbp in case it was provided manually
    dplyr::semi_join(pbp, by = c("game_id", "play_id")) |>
    dplyr::rename("player_id" = "gsis_player_id", "team" = "team_abbr") |>
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$player_id) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      more_stats = paste0(";", paste(stat_id, collapse = ";"), ";")
    ) |>
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$team) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      team_stats = paste0(";", paste(stat_id, collapse = ";"), ";"),
      team_play_air_yards = sum((stat_id %in% 111:112) * yards)
    ) |>
    # need to group by game and play here to avoid mixing of play IDs of different
    # games in the same week
    dplyr::group_by(.data$game_id, .data$play_id) |>
    dplyr::mutate(
      # we wrap the collapsed string in ";" in order to search for the pattern
      # ";stat_id;" to avoid matching 1 with 10, 11, 21, etc.
      all_stats = paste0(";", paste(stat_id, collapse = ";"), ";"),
      play_punt_return_yards = sum((stat_id %in% 33:36) * yards)
    ) |>
    # compute team targets and team air yards for calculation of target share
    # and air yard share. Since it's relative, we need to be careful with the groups
    # depending on summary level
    dplyr::group_by(
      !!!rlang::data_syms(
        if (summary_level == "season") {
          c("season", "team")
        } else {
          c("season", "week", "team")
        }
      )
    ) |>
    dplyr::mutate(
      team_targets = sum(stat_id == 115),
      team_air_yards = sum((stat_id %in% 111:112) * yards)
    ) |>
    dplyr::ungroup() |>
    dplyr::left_join(
      playinfo,
      by = c("game_id", "play_id")
    ) |>
    dplyr::mutate(
      season_type = unname(s_type_vctr[.data$game_id]),
      is_gwfg_attempt = unname(gwfg_vctr[paste(
        .data$game_id,
        .data$play_id,
        sep = "_"
      )]) %ifna%
        0L
    )

  # Check combination of summary_level and stat_type to set a helper that is
  # used to create the grouping variables
  grp_id <- data.table::fcase(
    summary_level == "season" && stat_type == "player" , "10" ,
    summary_level == "season" && stat_type == "team"   , "20" ,
    summary_level == "week" && stat_type == "player"   , "30" ,
    summary_level == "week" && stat_type == "team"     , "40"
  )
  # grp_vctr is used as character vector for joining pbp stats
  grp_vctr <- switch(
    grp_id,
    "10" = c("season", "player_id"),
    "20" = c("season", "team"),
    "30" = c("season", "week", "player_id"),
    "40" = c("season", "week", "team")
  )
  # grp_vars is used as grouping variables
  grp_vars <- rlang::data_syms(grp_vctr)

  # Stats from PBP #####################
  # we want passing epa, rushing epa, and receiving epa
  # since these depend on different player id variables and filters,
  # we create separate dfs for these stats
  passing_stats_from_pbp <- pbp |>
    dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "passer_player_id",
      "qb_epa",
      "cpoe"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      passing_epa = sum(.data$qb_epa, na.rm = TRUE),
      # mean will return NaN if all values are NA, because we remove NA
      passing_cpoe = if (any(!is.na(.data$cpoe))) {
        mean(.data$cpoe, na.rm = TRUE)
      } else {
        NA_real_
      }
    ) |>
    dplyr::ungroup()

  rushing_stats_from_pbp <- pbp |>
    dplyr::filter(.data$play_type %in% c("run", "qb_kneel")) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "rusher_player_id",
      "epa"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      rushing_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  receiving_stats_from_pbp <- pbp |>
    dplyr::filter(!is.na(.data$receiver_player_id)) |>
    dplyr::select(
      "season",
      "week",
      "team" = "posteam",
      "player_id" = "receiver_player_id",
      "epa"
    ) |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      receiving_epa = sum(.data$epa, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  stats <- playstats |>
    dplyr::group_by(!!!grp_vars) |>
    dplyr::summarise(
      player_name = if (.env$stat_type == "player") {
        custom_mode(.data$player_name, na.rm = TRUE)
      } else {
        NULL
      },
      # Season Type #####################
      # if summary level is week, then we have to use the season type variable
      # from playstats as it could be REG or POST depending on the value of
      # the argument season_type
      # if summary level is season, then we collapse the values of season_type
      # this will make sure that season_type is only REG+POST if the user asked
      # for it AND if postseason data is available
      season_type = if (.env$summary_level == "week") {
        dplyr::first(.data$season_type)
      } else {
        paste(unique(.data$season_type), collapse = "+")
      },

      # Game ID #####################
      # it's not strictly necessary to output game_id because we have
      # season, week, team, and opponent information but it is convenient
      # to add this here
      # Only makes sense in case of weekly stats of course
      game_id = if (.env$summary_level == "week") {
        dplyr::first(.data$game_id)
      } else {
        NULL
      },

      # Team Info #####################
      # recent_team if we do a season summary of player stats
      # team if we do a week summary of player stats
      recent_team = if (.env$grp_id == "10") dplyr::last(.data$team) else NULL,
      team = if (.env$grp_id == "30") dplyr::first(.data$team) else NULL,
      # opponent team if we do week summaries
      opponent_team = if (.env$summary_level == "week") {
        data.table::fifelse(
          dplyr::first(.data$team) == dplyr::first(.data$off),
          dplyr::first(.data$def),
          dplyr::first(.data$off)
        )
      } else {
        NULL
      },

      # number of games is only relevant if we summarise the season
      games = if (.env$summary_level == "season") {
        dplyr::n_distinct(.data$game_id)
      } else {
        NULL
      },

      # Offense #####################
      completions = sum(stat_id %in% 15:16),
      attempts = sum(stat_id %in% c(14:16, 19)),
      passing_yards = sum((stat_id %in% 15:16) * yards),
      passing_tds = sum(stat_id == 16),
      passing_interceptions = sum(stat_id == 19),
      sacks_suffered = sum(stat_id == 20),
      sack_yards_lost = sum((stat_id == 20) * yards),
      sack_fumbles = sum(stat_id == 20 & has_id(52:54, more_stats)),
      sack_fumbles_lost = sum(stat_id == 20 & has_id(106, more_stats)),
      # includes incompletions (111 = complete, 112 = incomplete)
      passing_air_yards = sum((stat_id %in% 111:112) * yards),
      # passing yac equals passing yards - air yards on completed passes
      passing_yards_after_catch = .data$passing_yards -
        sum((stat_id == 111) * yards),
      passing_first_downs = sum((stat_id %in% 15:16) & has_id(4, team_stats)),
      passing_2pt_conversions = sum(stat_id == 77),
      # this is a player stat and we skip it in team stats
      pacr = if (.env$stat_type == "player") {
        .data$passing_yards / .data$passing_air_yards
      } else {
        NULL
      },
      # "Explosives" (see #550 for discussion about the definition)
      passing_10 = sum((stat_id %in% 15:16) * (yards >= 10)),
      passing_16 = sum((stat_id %in% 15:16) * (yards >= 16)),
      passing_20 = sum((stat_id %in% 15:16) * (yards >= 20)),
      passing_40 = sum((stat_id %in% 15:16) * (yards >= 40)),
      # dakota = requires pbp,

      carries = sum(stat_id %in% 10:11),
      rushing_yards = sum((stat_id %in% 10:13) * yards),
      rushing_tds = sum(stat_id %in% c(11, 13)),
      rushing_fumbles = sum((stat_id %in% 10:11) & has_id(52:54, more_stats)),
      rushing_fumbles_lost = sum(
        (stat_id %in% 10:11) & has_id(106, more_stats)
      ),
      rushing_first_downs = sum((stat_id %in% 10:11) & has_id(3, team_stats)),
      rushing_2pt_conversions = sum(stat_id == 75),
      # "Explosives" (see #550 for discussion about the definition)
      rushing_10 = sum((stat_id %in% 10:13) * (yards >= 10)),
      rushing_12 = sum((stat_id %in% 10:13) * (yards >= 12)),
      rushing_20 = sum((stat_id %in% 10:13) * (yards >= 20)),
      rushing_40 = sum((stat_id %in% 10:13) * (yards >= 40)),

      receptions = sum(stat_id %in% 21:22),
      targets = sum(stat_id == 115),
      receiving_yards = sum((stat_id %in% 21:24) * yards),
      receiving_tds = sum(stat_id %in% c(22, 24)),
      receiving_fumbles = sum((stat_id %in% 21:22) & has_id(52:54, more_stats)),
      receiving_fumbles_lost = sum(
        (stat_id %in% 21:22) & has_id(106, more_stats)
      ),
      # air_yards are counted in 111:112 but it is a passer stat not a receiver stat
      # so we count team air yards on every play where a player was targeted (115)
      # team air yards will always equal the correct air yards as 111 and 112
      # cannot appear more than once per play.
      # If this ever changes, we can use pbp instead.
      receiving_air_yards = if (.env$stat_type == "player") {
        sum((stat_id == 115) * .data$team_play_air_yards)
      } else {
        .data$passing_air_yards
      },
      receiving_yards_after_catch = sum((stat_id == 113) * yards),
      receiving_first_downs = sum((stat_id %in% 21:22) & has_id(4, team_stats)),
      receiving_2pt_conversions = sum(stat_id == 104),
      # "Explosives" (see #550 for discussion about the definition)
      receiving_10 = sum((stat_id %in% 21:24) * (yards >= 10)),
      receiving_16 = sum((stat_id %in% 21:24) * (yards >= 16)),
      receiving_20 = sum((stat_id %in% 21:24) * (yards >= 20)),
      receiving_40 = sum((stat_id %in% 21:24) * (yards >= 40)),
      # these are player stats and we skip them in team stats
      racr = if (.env$stat_type == "player") {
        .data$receiving_yards / .data$receiving_air_yards
      } else {
        NULL
      },
      target_share = if (.env$stat_type == "player") {
        .data$targets / dplyr::first(.data$team_targets)
      } else {
        NULL
      },
      air_yards_share = if (.env$stat_type == "player") {
        .data$receiving_air_yards / dplyr::first(.data$team_air_yards)
      } else {
        NULL
      },
      wopr = if (.env$stat_type == "player") {
        1.5 * .data$target_share + 0.7 * .data$air_yards_share
      } else {
        NULL
      },

      special_teams_tds = sum((special == 1) & stat_id %in% td_ids()),

      # Defense #####################
      # def_tackles = ,
      def_tackles_solo = sum(stat_id == 79),
      def_tackles_with_assist = sum(stat_id == 80),
      def_tackle_assists = sum(stat_id == 82),
      def_tackles_for_loss = sum(stat_id == 402),
      def_tackles_for_loss_yards = sum((stat_id == 402) * yards),
      def_fumbles_forced = sum(stat_id == 91),
      def_sacks = sum(stat_id == 83) + 1 / 2 * sum(stat_id == 84),
      def_sack_yards = sum((stat_id == 83) * -yards) +
        1 / 2 * sum((stat_id == 84) * -yards),
      def_qb_hits = sum(stat_id == 110),
      def_interceptions = sum(stat_id %in% 25:26),
      def_interception_yards = sum((stat_id %in% 25:28) * yards),
      def_pass_defended = sum(stat_id == 85),
      def_tds = sum(team == def & special != 1 & stat_id %in% td_ids()),
      # stat ID 54 is a fumble out of bounds. It's never counted alone,
      # always in combination with 52 or 53.
      def_fumbles = sum((team == def) & stat_id %in% 52:53),
      def_safeties = sum(stat_id == 89),

      # Misc #####################
      # mostly yards gained after blocked punts or fgs
      misc_yards = sum((stat_id %in% 63:64) * yards),
      fumble_recovery_own = sum(stat_id %in% 55:56),
      # 57, 58 don't count as recovery because player received a
      # lateral after recovery by other player
      fumble_recovery_yards_own = sum((stat_id %in% 55:58) * yards),
      fumble_recovery_opp = sum(stat_id %in% 59:60),
      # 61, 62 don't count as recovery because player received a
      # lateral after recovery by other player
      fumble_recovery_yards_opp = sum((stat_id %in% 59:62) * yards),
      fumble_recovery_tds = sum(stat_id %in% c(56, 58, 60, 62)),
      penalties = sum(stat_id == 93),
      penalty_yards = sum((stat_id == 93) * yards),
      timeouts = if (.env$stat_type == "team") sum(stat_id == 68) else NULL,
      # we are missing some fumbles on offense (see 515) so we just add
      # totals here. These fumble stats count all fumbles regardless of
      # the unit the player was on. This means that all above fumble stats
      # are included here but we make sure not to lose any fumbles, esp. on offense
      fumbles_forced_by_opp = sum(stat_id == 52),
      fumbles_not_forced = sum(stat_id == 53),
      fumbles_out_of_bounds = sum(stat_id == 54),
      # we could tell users to just add the above three stats but fumbles are
      # a bit confusing overall so it is ok to add a total counter that doesn't
      # miss any fumbles.
      # stat ID 54 is a fumble out of bounds. It's never counted alone,
      # always in combination with 52 or 53. So we cannot add it to the total.
      fumbles_total = sum(stat_id %in% 52:53),
      fumbles_lost_total = sum(stat_id == 106),

      # Returning #####################
      punt_returns = sum(stat_id %in% 33:34),
      punt_return_yards = sum((stat_id %in% 33:36) * yards),
      # punt return tds are counted in special teams tds atm
      # punt_return_tds = sum(stat_id %in% c(34, 36)),
      kickoff_returns = sum(stat_id %in% 45:46),
      kickoff_return_yards = sum((stat_id %in% 45:48) * yards),
      # kickoff return tds are counted in special teams tds atm
      # kickoff_return_tds = sum(stat_id %in% c(46, 48)),

      # Kicking #####################
      fg_made = sum(stat_id == 70),
      fg_att = sum(stat_id %in% 69:71),
      fg_missed = sum(stat_id == 69),
      fg_blocked = sum(stat_id == 71),
      fg_long = max((stat_id == 70) * yards) %0% NA_integer_,
      # avoid 0/0 = NaN
      fg_pct = if (.data$fg_att > 0) .data$fg_made / .data$fg_att else NA_real_,
      fg_made_0_19 = sum((stat_id == 70) * (yards %between% c(0, 19))),
      fg_made_20_29 = sum((stat_id == 70) * (yards %between% c(20, 29))),
      fg_made_30_39 = sum((stat_id == 70) * (yards %between% c(30, 39))),
      fg_made_40_49 = sum((stat_id == 70) * (yards %between% c(40, 49))),
      fg_made_50_59 = sum((stat_id == 70) * (yards %between% c(50, 59))),
      fg_made_60_ = sum((stat_id == 70) * (yards >= 60)),
      fg_missed_0_19 = sum((stat_id == 69) * (yards %between% c(0, 19))),
      fg_missed_20_29 = sum((stat_id == 69) * (yards %between% c(20, 29))),
      fg_missed_30_39 = sum((stat_id == 69) * (yards %between% c(30, 39))),
      fg_missed_40_49 = sum((stat_id == 69) * (yards %between% c(40, 49))),
      fg_missed_50_59 = sum((stat_id == 69) * (yards %between% c(50, 59))),
      fg_missed_60_ = sum((stat_id == 69) * (yards >= 60)),
      fg_made_list = fg_list(stat_id, yards, collapse_id = 70),
      fg_missed_list = fg_list(stat_id, yards, collapse_id = 69),
      fg_blocked_list = fg_list(stat_id, yards, collapse_id = 71),
      fg_made_distance = sum((stat_id == 70) * yards),
      fg_missed_distance = sum((stat_id == 69) * yards),
      fg_blocked_distance = sum((stat_id == 71) * yards),
      pat_made = sum(stat_id == 72),
      pat_att = sum(stat_id %in% 72:74),
      pat_missed = sum(stat_id == 73),
      pat_blocked = sum(stat_id == 74),
      # avoid 0/0 = NaN
      pat_pct = if (.data$pat_att > 0) {
        .data$pat_made / .data$pat_att
      } else {
        NA_real_
      },
      gwfg_made = sum((stat_id == 70) * is_gwfg_attempt),
      gwfg_att = sum((stat_id %in% 69:71) * is_gwfg_attempt),
      gwfg_missed = sum((stat_id == 69) * is_gwfg_attempt),
      gwfg_blocked = sum((stat_id == 71) * is_gwfg_attempt),
      gwfg_distance = if (.env$summary_level == "week") {
        sum((stat_id %in% 69:71) * is_gwfg_attempt * yards)
      } else {
        NULL
      },
      gwfg_distance_list = if (.env$summary_level == "season") {
        fg_list(stat_id, yards, collapse_id = 69:71, gwfg = is_gwfg_attempt)
      } else {
        NULL
      },

      # Punts #####################
      # stat ID 2 counts blocked punts that do not count as punt
      pt_att = sum(stat_id %in% c(29, 31, 32)), # 31 probably unnecessary
      pt_blocked = sum(stat_id == 2),
      pt_long = max((stat_id %in% c(29, 32)) * yards) %0% NA_integer_,
      pt_yards = sum((stat_id %in% c(29, 32)) * yards),
      pt_inside_20 = sum(stat_id == 30),
      # the following stats are a bit special as we need opponent team stats
      # to get the counts right. That's what 'all_stats' is for.
      # Stat IDs 37, 38, 39 (punts oob, downed, fair caught) are assigned to
      # the receiving team (or to the receiver in case of 39), as are
      # the number of returns, return TDs, and the return yardage.
      pt_out_of_bounds = sum((stat_id == 29) & has_id(37, all_stats)),
      pt_downed = sum((stat_id == 29) & has_id(38, all_stats)),
      pt_touchback = sum(stat_id == 32),
      pt_fair_caught = sum((stat_id == 29) & has_id(39, all_stats)),
      pt_returned = sum(
        (stat_id %in% c(2, 29, 31, 32)) & has_id(33:34, all_stats)
      ),
      pt_return_yards = sum(
        (stat_id %in% c(2, 29, 31, 32)) * .data$play_punt_return_yards
      ),
      pt_return_tds = sum(
        (stat_id %in% c(2, 29, 31, 32)) & has_id(c(34, 36), all_stats)
      ),
      pt_net_yards = .data$pt_yards -
        .data$pt_return_yards -
        .data$pt_touchback * 20L
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    ) |>
    # Join PBP Stats #####################
    dplyr::left_join(passing_stats_from_pbp, by = grp_vctr) |>
    dplyr::left_join(rushing_stats_from_pbp, by = grp_vctr) |>
    dplyr::left_join(receiving_stats_from_pbp, by = grp_vctr) |>
    # relocate epa variables. This could be done with dplyr::relocate
    # but we want to be compatible with older dplyr versions
    dplyr::select(
      "season":"passing_first_downs",
      "passing_epa",
      "passing_cpoe",
      "passing_2pt_conversions":"rushing_first_downs",
      "rushing_epa",
      "rushing_2pt_conversions":"receiving_first_downs",
      "receiving_epa",
      dplyr::everything()
    ) |>
    dplyr::arrange(!!!grp_vars)

  # Apply Player Modifications #####################
  if (stat_type == "player") {
    # need newer version of nflreadr to use load_players
    rlang::check_installed("nflreadr (>= 1.3.0)", "to join player information.")

    player_info <- nflreadr::load_players() |>
      dplyr::select(
        "player_id" = "gsis_id",
        "player_display_name" = "display_name",
        # "player_name" = "short_name",
        "position",
        "position_group",
        "headshot_url" = "headshot"
      )

    # load gsis_ids of RBs, FBs and HBs for RACR
    racr_ids <- player_info |>
      dplyr::filter(.data$position %in% c("RB", "FB", "HB")) |>
      dplyr::pull("player_id")

    stats <- stats |>
      dplyr::mutate(
        pacr = dplyr::case_when(
          is.nan(.data$pacr) ~ NA_real_,
          .data$passing_air_yards <= 0 ~ 0,
          TRUE ~ .data$pacr
        ),
        racr = dplyr::case_when(
          is.nan(.data$racr) ~ NA_real_,
          .data$receiving_air_yards == 0 ~ 0,
          # following Josh Hermsmeyer's definition, RACR stays < 0 for RBs (and FBs, HBs) and is set to
          # 0 for receivers. The vector "racr_ids" includes all known RB, FB, and HB gsis_ids
          .data$receiving_air_yards < 0 & !.data$player_id %in% racr_ids ~ 0,
          TRUE ~ .data$racr
        ),
        # Fantasy #####################
        fantasy_points = 1 /
          25 *
          .data$passing_yards +
          4 * .data$passing_tds +
          -2 * .data$passing_interceptions +
          1 / 10 * (.data$rushing_yards + .data$receiving_yards) +
          6 *
            (.data$rushing_tds +
              .data$receiving_tds +
              .data$special_teams_tds) +
          2 *
            (.data$passing_2pt_conversions +
              .data$rushing_2pt_conversions +
              .data$receiving_2pt_conversions) +
          -2 *
            (.data$sack_fumbles_lost +
              .data$rushing_fumbles_lost +
              .data$receiving_fumbles_lost),

        fantasy_points_ppr = .data$fantasy_points + .data$receptions
      ) |>
      dplyr::left_join(player_info, by = "player_id") |>
      dplyr::select(
        "player_id",
        "player_name",
        "player_display_name",
        "position",
        "position_group",
        "headshot_url",
        dplyr::everything()
      )
  }

  if (custom_pbp) {
    attr(stats, "custom_pbp") <- TRUE
  }

  stats
}

# Silence global vars NOTE
# We do this differently here because it's only a bunch of variables
# and the code is more readable
utils::globalVariables(c(
  "stat_id",
  "yards",
  "more_stats",
  "team_stats",
  "team",
  "def",
  "off",
  "special",
  "is_gwfg_attempt"
))

load_playstats <- function(seasons = nflreadr::most_recent_season()) {
  if (isTRUE(seasons)) {
    seasons <- seq(1999, nflreadr::most_recent_season())
  }

  stopifnot(
    is.numeric(seasons),
    seasons >= 1999,
    seasons <= nflreadr::most_recent_season()
  )

  urls <- paste0(
    "https://github.com/nflverse/nflverse-pbp/releases/download/playstats/play_stats_",
    seasons,
    ".rds"
  )

  out <- nflreadr::load_from_url(urls, seasons = TRUE, nflverse = FALSE)

  out
}

fg_list <- function(stat_ids, yards, collapse_id, gwfg = NULL) {
  if (is.null(gwfg)) {
    paste(
      yards[stat_ids == collapse_id],
      collapse = ";"
    )
  } else {
    paste(
      yards[stat_ids %in% collapse_id & gwfg == 1L],
      collapse = ";"
    )
  }
}
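
# Illustrative sketch of fg_list() with hypothetical inputs (kept as comments so
# nothing runs at build time). Distances of the matching kicks are collapsed
# into a single ";"-separated string:
#   fg_list(c(70, 69, 70), c(45, 52, 33), collapse_id = 70)
#   # "45;33"
#   fg_list(c(70, 69, 71), c(45, 52, 33), collapse_id = 69:71, gwfg = c(0L, 1L, 0L))
#   # "52"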

`%0%` <- function(lhs, rhs) if (lhs != 0) lhs else rhs

`%ifna%` <- function(lhs, rhs) data.table::fifelse(is.na(lhs), rhs, lhs)

has_id <- function(id, all_ids) {
  stringr::str_detect(all_ids, paste0(";", id, ";", collapse = "|"))
}

td_ids <- function() {
  c(
    11,
    13,
    16,
    18,
    22,
    24,
    26,
    28,
    34,
    36,
    46,
    48,
    # 56, 58, 60, 62, # 56-62 are separately counted in fumble_recovery_tds
    64,
    108
  )
}

stats_validate_pbp <- function(pbp) {
  required_names <- c(
    "season",
    "game_id",
    "play_id",
    "posteam",
    "defteam",
    "special",
    "season_type",
    "away_score",
    "home_score",
    "field_goal_attempt",
    "fixed_drive",
    "score_differential",
    "play_type",
    "week",
    "passer_player_id",
    "qb_epa",
    "cpoe",
    "rusher_player_id",
    "epa",
    "receiver_player_id"
  )
  available_names <- names(pbp)
  missing <- required_names[!required_names %in% available_names]
  if (length(missing) > 0) {
    cli::cli_abort(
      "You have passed custom pbp to the argument {.arg pbp} but \\
      it is missing the following required variable{?s}: {.val {missing}}",
      call = rlang::caller_env()
    )
  }
  unique(pbp$season) |>
    stats::na.omit() |>
    as.vector()
}


================================================
FILE: R/data_documentation.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Documenting Data Files
# Code Style Guide: styler::tidyverse_style()
################################################################################

#' NFL Team names, colors and logo urls.
#'
#' @docType data
#' @format A data frame with 36 rows and 10 variables containing NFL team level
#' information, including franchises in multiple cities:
#' \describe{
#'   \item{team_abbr}{Team abbreviation}
#'   \item{team_name}{Complete Team name}
#'   \item{team_id}{Team id used in the roster function}
#'   \item{team_nick}{Nickname}
#'   \item{team_conf}{Conference}
#'   \item{team_division}{Division}
#'   \item{team_color}{Primary color}
#'   \item{team_color2}{Secondary color}
#'   \item{team_color3}{Tertiary color}
#'   \item{team_color4}{Quaternary color}
#'   \item{team_logo_wikipedia}{Url to Team logo on wikipedia}
#'   \item{team_logo_espn}{Url to higher quality logo on espn}
#'   \item{team_wordmark}{Url to team wordmarks}
#'   \item{team_conference_logo}{Url to AFC and NFC logos}
#'   \item{team_league_logo}{Url to NFL logo}
#' }
#' The primary and secondary colors have been taken from nfl.com with some modifications
#' for better team distinction and most recent team color themes.
#' The tertiary and quaternary colors are taken from Lee Sharpe's teamcolors.csv,
#' which in turn takes them from the `teamcolors` package created by Ben Baumer and
#' Gregory Matthews. The Wikipedia logo urls are taken from Lee Sharpe's logos.csv.
#' Team wordmarks are taken from nfl.com.
#' @examples
#' \donttest{
#' teams_colors_logos
#' }
"teams_colors_logos"

#' nflfastR Field Descriptions
#'
#' @docType data
#' @format A data frame including names and descriptions of all variables in
#' an nflfastR dataset.
#' @seealso The searchable table on the
#' [nflfastR website](https://nflfastr.com/articles/field_descriptions.html)
#' @examples
#' \donttest{
#' field_descriptions
#' }
"field_descriptions"

#' NFL Stat IDs and their Meanings
#'
#' @docType data
#' @format A data frame including NFL stat IDs, names and descriptions used in
#' an nflfastR dataset.
#' @source \url{http://www.nflgsis.com/gsis/Documentation/Partners/StatIDs.html}
#' @examples
#' \donttest{
#' stat_ids
#' }
"stat_ids"

#' NFL Stats Variables
#'
#' @docType data
#' @format A data frame explaining all variables returned by the function
#' [calculate_stats()].
#' @examples
#' \donttest{
#' nfl_stats_variables
#' }
"nfl_stats_variables"


================================================
FILE: R/database.R
================================================
#' Update or Create a nflverse Play-by-Play Data Table in a Connected Database
#'
#' @description
#' The nflfastR play-by-play era dates back to 1999. To analyze all the data
#' efficiently, there is practically no alternative to working with a database.
#'
#' This function helps to create and maintain a table containing all
#' play-by-play data of the nflfastR era in a connected database.
#' Primarily, the preprocessed data from [load_pbp] is written to the database
#' and, if necessary, supplemented with the latest games using
#' [build_nflfastR_pbp].
#'
#' @param conn A `DBIConnection` object, as returned by [DBI::dbConnect()]
#' @inheritParams rlang::args_dots_empty
#' @inheritParams DBI::dbExistsTable
#' @param seasons Hybrid argument (logical or numeric) to update parts
#' of or the complete play by play table within the database.
#'
#' It can update the play by play data table either for the whole nflfastR era
#' (with `seasons = TRUE`) or just for specified seasons
#' (e.g. `seasons = 2024:2025`).
#'
#' Defaults to [most_recent_season]. Please see details for further information.
#'
#' @details
#' ## The `seasons` argument
#'
#' The `seasons` argument controls how the table in the connected database is
#' handled.
#'
#' With `seasons = TRUE`, the table in argument `name` will be removed completely
#' (by calling [DBI::dbRemoveTable]) and all seasons of the nflfastR era will be
#' added to a fresh table. This is helpful when new columns are added during the
#' offseason.
#'
#' With a numerical vector, e.g. `seasons = 2024:2025`, the table in argument
#' `name` will be preserved and only rows from the given seasons will be deleted
#' and re-added (by calling [DBI::dbAppendTable]). This is intended to be used
#' for ongoing seasons because the NFL fixes bugs in the underlying data during
#' the week and we recommend rebuilding the current season every Thursday during
#' the season.
#'
#' The default behavior is `seasons = most_recent_season()`, which means that
#' only the most recent season is updated or added.
#'
#' To keep the table, and thus also the schema, but update all play-by-play
#' data of the nflfastR era, set
#' ```
#' seasons = seq(1999, most_recent_season())
#' ```
#'
#' If `seasons` contains multiple seasons, it is possible to control whether the
#' seasons are loaded individually and written to the database, or whether
#' multiple seasons should be processed in chunks. The latter is more efficient
#' because fewer write operations are required, but at the same time, the data
#' must first be stored in memory. The option `"nflfastR.db_chunk_size"` can
#' be used to control how many seasons are loaded together in a chunk and
#' written to the database. With the following option, for example, 5 seasons
#' are always loaded together and written to the database.
#' ```
#' options("nflfastR.db_chunk_size" = 5L)
#' ```
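#'
#' As a rough illustration of the chunking (a sketch in base R that mirrors
#' the internal splitting logic; the season range is only an example), a chunk
#' size of 5 groups ten seasons into two chunks, so only two write operations
#' are needed:
#' ```
#' seasons <- 2015:2024
#' chunk_size <- 5L
#' # assign every season to a chunk index and split accordingly
#' chunks <- split(seasons, ceiling(seq_along(seasons) / chunk_size))
#' length(chunks)
#' ```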
#'
#' @returns Always returns the database connection invisibly.
#' @export
#'
#' @examples
#' \donttest{
#' con <- DBI::dbConnect(duckdb::duckdb())
#' try({# to avoid CRAN test problems
#' update_pbp_db(con, seasons = 2024)
#' })
#' }
update_pbp_db <- function(
  conn,
  ...,
  name = "nflverse_pbp",
  seasons = most_recent_season()
) {
  rlang::check_installed("DBI", "to communicate with databases")
  rlang::check_dots_empty()

  # Validate connection and table name --------------------------------------

  if (!DBI::dbIsValid(conn)) {
    cli::cli_abort(
      "The connection in argument {.arg conn} is invalid. \\
      Do you need to run {.fun DBI::dbConnect}?"
    )
  }

  rule_header("Update nflverse Play-by-Play Data in Connected Database")

  # msg_name is the table name used in cli messages. We need it because `name`
  # could be a call to DBI::SQL() or DBI::Id()
  # I don't want to evaluate name in every subsequent function call, so I do it
  # here once and pass it around
  msg_name <- DBI::dbQuoteIdentifier(conn = conn, x = name) |>
    as.character()

  initiated <- FALSE
  if (!DBI::dbExistsTable(conn = conn, name = name)) {
    do_it <- confirm(
      "Table {.val {msg_name}} does not yet exist in your connected database.
      Do you wish to create it? (Y/n)"
    )
    if (do_it) {
      initiated <- db_initiate_pbp(
        conn = conn,
        name = name,
        msg_name = msg_name
      )
    } else {
      rule_footer("ABORTED")
      return(invisible(conn))
    }
  }

  # Validate seasons --------------------------------------------------------

  if (is.numeric(seasons)) {
    invalid <- setdiff(seasons, valid_seasons())
    if (length(invalid) > 0) {
      cli::cli_abort(
        "The following {cli::qty(length(invalid))} season{?s} {?is/are} \\
        invalid: {.val {invalid}}"
      )
    }
    ret <- db_drop_seasons(
      conn = conn,
      name = name,
      seasons = seasons,
      msg_name = msg_name
    )
  } else if (isTRUE(seasons)) {
    # We need this block inside if (isTRUE(seasons)) to make sure we run
    # the else block in the right conditions
    if (isFALSE(initiated)) {
      do_it <- confirm(
        "Purge table {.val {msg_name}} in your connected database? (Y/n)"
      )
      if (do_it) {
        ret <- DBI::dbRemoveTable(conn = conn, name = name)
        cli_message("Removed {.val {msg_name}}")
        initiated <- db_initiate_pbp(
          conn = conn,
          name = name,
          msg_name = msg_name
        )
      } else {
        rule_footer("ABORTED")
        return(invisible(conn))
      }
    }
  } else {
    cli::cli_abort(
      "Argument {.arg seasons} must be either a vector of valid \\
      seasons or scalar TRUE"
    )
  }

  seasons <- if (isTRUE(seasons)) valid_seasons() else seasons

  # Append seasons ----------------------------------------------------------
  ret <- db_write_pbp_seasons(
    conn = conn,
    name = name,
    seasons = seasons,
    msg_name = msg_name
  )

  # Process missing games ---------------------------------------------------
  db_games <- db_query_game_ids(conn = conn, name = name, seasons = seasons)
  completed_games <- completed_game_ids(seasons = seasons)
  missing_games <- setdiff(completed_games, db_games)

  # This block is only relevant on game days, when some completed games may
  # not yet be available via load_pbp
  if (length(missing_games) > 0) {
    vec <- cli::cli_vec(missing_games, list("vec-trunc" = 5L))
    cli_message(
      "The following {cli::no(length(missing_games))} game{?s} {?is/are} not \\
      yet available via {.fun load_pbp} and {?is/are} therefore parsed directly \\
      with {.fun build_nflfastR_pbp} and appended to table {.val {msg_name}}: \\
      {.val {vec}}"
    )
    # build pbp of missing games. If raw pbp isn't ready, the function will
    # return an empty dataframe for that game
    new_pbp <- build_nflfastR_pbp(missing_games, rules = FALSE)
    ret <- DBI::dbAppendTable(
      conn = conn,
      name = name,
      value = new_pbp
    )
    # Check how many new games have been added
    new_ids <- unique(new_pbp[["game_id"]])
    cli_message(
      "Appended {cli::no(length(new_ids))} game{?s} to table {.val {msg_name}}",
      .cli_fct = cli::cli_alert_success
    )
    # Let user know if some games are still missing
    still_missing <- setdiff(missing_games, new_ids)
    if (length(still_missing) > 0) {
      vec <- cli::cli_vec(still_missing, list("vec-trunc" = 5L))
      cli_message(
        "Raw pbp data for the following {cli::no(length(still_missing))} \\
        game{?s} {?is/are} still missing: {.val {vec}}. Please try again in \\
        about 10 minutes.",
        .cli_fct = cli::cli_alert_warning
      )
    }
  }

  # Remove Dummy ------------------------------------------------------------
  ret <- db_remove_dummy(conn = conn, name = name)

  # Finish ------------------------------------------------------------------
  cli_message(
    "Database update completed",
    .cli_fct = cli::cli_alert_success
  )
  rule_footer("DONE")

  invisible(conn)
}

db_query_game_ids <- function(conn, name, seasons) {
  res <- DBI::dbGetQuery(
    conn = conn,
    statement = glue::glue_sql(
      "SELECT DISTINCT game_id FROM {`name`} WHERE season IN ({seasons*});",
      .con = conn
    )
  )
  res[["game_id"]]
}

db_remove_dummy <- function(conn, name) {
  n_drops <- DBI::dbExecute(
    conn = conn,
    statement = glue::glue_sql(
      "DELETE FROM {`name`} WHERE game_id IN ({vals*})",
      vals = "9999_99_DEF_TYP",
      .con = conn
    )
  )
  invisible(TRUE)
}

db_write_pbp_seasons <- function(conn, name, seasons, msg_name) {
  vec <- cli::cli_vec(seasons, list("vec-trunc" = 5L))
  cli_message(
    "Append {.val {vec}} {cli::qty(length(seasons))}\\
    season{?s} to table {.val {msg_name}}"
  )
  chunks <- compute_chunks(
    seasons,
    chunk_size = getOption("nflfastR.db_chunk_size", 1L)
  )
  p <- progressr::progressor(along = chunks)
  for (chunk in chunks) {
    ret <- DBI::dbAppendTable(
      conn = conn,
      name = name,
      value = load_pbp(seasons = chunk)
    )
    p("Appending...")
  }
  invisible(TRUE)
}

db_drop_seasons <- function(conn, name, seasons, msg_name) {
  vec <- cli::cli_vec(seasons, list("vec-trunc" = 5L))
  cli_message(
    "Drop {.val {vec}} {cli::qty(length(seasons))}\\
    season{?s} from table {.val {msg_name}}"
  )
  n_drops <- DBI::dbExecute(
    conn = conn,
    statement = glue::glue_sql(
      "DELETE FROM {`name`} WHERE season IN ({vals*})",
      vals = seasons,
      .con = conn
    )
  )
  invisible(TRUE)
}

db_initiate_pbp <- function(conn, name, msg_name) {
  cli_message(
    "Initiate table {.val {msg_name}} with nflverse pbp schema"
  )
  ret <- DBI::dbCreateTable(
    conn = conn,
    name = name,
    fields = default_play
  )
  invisible(TRUE)
}

completed_game_ids <- function(seasons) {
  scheds <- nflreadr::load_schedules(seasons = seasons)
  scheds <- data.table::setDT(scheds)
  scheds[
    !is.na(result) & !game_id %in% missing_raw_games,
    game_id
  ]
}
utils::globalVariables(c("result", "game_id"), add = TRUE)

missing_raw_games <- c("1999_01_BAL_STL", "2000_06_BUF_MIA", "2000_03_SD_KC")

valid_seasons <- function() {
  seq(1999, nflreadr::most_recent_season())
}

# https://stackoverflow.com/a/3321659
compute_chunks <- function(x, chunk_size = 4) {
  split(x, ceiling(seq_along(x) / chunk_size))
}

confirm <- function(msg, ..., .envir = parent.frame()) {
  cli::cli_alert_info(
    text = msg,
    wrap = FALSE,
    .envir = .envir
  )
  ans <- readline()
  tolower(ans) %in% c("", "y", "yes", "yeah", "yep")
}


================================================
FILE: R/ep_wp_calculators.R
================================================
#' Compute expected points
#'
#' Computes expected points for provided plays. Returns the data with
#' probabilities of each scoring event and EP added. The following columns
#' must be present: season, home_team, posteam, roof (coded as 'outdoors',
#' 'dome', or 'open'/'closed'/NA for retractable roofs), half_seconds_remaining,
#' yardline_100, down, ydstogo, posteam_timeouts_remaining,
#' defteam_timeouts_remaining
#'
#' @param pbp_data Play-by-play dataset to estimate expected points for.
#' @details Computes expected points for provided plays. Returns the data with
#' probabilities of each scoring event and EP added. The following columns
#' must be present:
#' \itemize{
#' \item{season}
#' \item{home_team}
#' \item{posteam}
#' \item{roof (coded as 'outdoors', 'dome', or 'open'/'closed'/NA (retractable))}
#' \item{half_seconds_remaining}
#' \item{yardline_100}
#' \item{down}
#' \item{ydstogo}
#' \item{posteam_timeouts_remaining}
#' \item{defteam_timeouts_remaining}
#' }
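#'
#' The expected points value is the probability-weighted sum of the point
#' values of the scoring events (7 for a touchdown, 3 for a field goal, 2 for
#' a safety, with negative signs for the opponent). A minimal sketch of that
#' weighting, using made-up probabilities that are not model output:
#' ```
#' # hypothetical next-score probabilities (they sum to 1)
#' probs <- c(td = 0.35, opp_td = 0.15, fg = 0.25, opp_fg = 0.10,
#'            safety = 0.005, opp_safety = 0.005, no_score = 0.14)
#' # point value of each event from the possession team's point of view
#' points <- c(td = 7, opp_td = -7, fg = 3, opp_fg = -3,
#'             safety = 2, opp_safety = -2, no_score = 0)
#' sum(probs * points[names(probs)])
#' ```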
#' @return The original pbp_data with the following columns appended to it:
#' \describe{
#' \item{ep}{expected points.}
#' \item{no_score_prob}{probability of no more scoring this half.}
#' \item{opp_fg_prob}{probability that the next score this half is an opponent field goal.}
#' \item{opp_safety_prob}{probability that the next score this half is an opponent safety.}
#' \item{opp_td_prob}{probability that the next score this half is an opponent touchdown.}
#' \item{fg_prob}{probability that the next score this half is a field goal.}
#' \item{safety_prob}{probability that the next score this half is a safety.}
#' \item{td_prob}{probability that the next score this half is a touchdown.}
#' }
#' @export
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' library(dplyr)
#' data <- tibble::tibble(
#' "season" = 1999:2019,
#' "home_team" = "SEA",
#' "posteam" = "SEA",
#' "roof" = "outdoors",
#' "half_seconds_remaining" = 1800,
#' "yardline_100" = c(rep(80, 17), rep(75, 4)),
#' "down" = 1,
#' "ydstogo" = 10,
#' "posteam_timeouts_remaining" = 3,
#' "defteam_timeouts_remaining" = 3
#' )
#'
#' nflfastR::calculate_expected_points(data) |>
#'   dplyr::select(season, yardline_100, td_prob, ep)
#' })
#' }
calculate_expected_points <- function(pbp_data) {
  # drop existing values of ep and the probs before making new ones
  pbp_data <- pbp_data |> dplyr::select(-dplyr::any_of(drop.cols))

  suppressWarnings(
    model_data <- pbp_data |>
      make_model_mutations() |>
      ep_model_select()
  )

  preds <- stats::predict(load_model("ep"), as.matrix(model_data))

  # xgboost v3 returns a matrix of predictions instead of a vector as returned
  # by xgboost v1.
  if (is.vector(preds)) {
    preds <- preds |>
      matrix(ncol = 7, byrow = TRUE) |>
      as.data.frame()
  } else if (is.matrix(preds)) {
    preds <- as.data.frame(preds)
  }

  colnames(preds) <- c(
    "td_prob",
    "opp_td_prob",
    "fg_prob",
    "opp_fg_prob",
    "safety_prob",
    "opp_safety_prob",
    "no_score_prob"
  )

  preds <- preds |>
    dplyr::mutate(
      ep = (-3 * .data$opp_fg_prob) +
        (-2 * .data$opp_safety_prob) +
        (-7 * .data$opp_td_prob) +
        (3 * .data$fg_prob) +
        (2 * .data$safety_prob) +
        (7 * .data$td_prob)
    ) |>
    dplyr::bind_cols(pbp_data)

  return(preds)
}

# helper column for ep calculator
drop.cols <- c(
  "ep",
  "td_prob",
  "opp_td_prob",
  "fg_prob",
  "opp_fg_prob",
  "safety_prob",
  "opp_safety_prob",
  "no_score_prob"
)


#' Compute win probability
#'
#' Computes win probability for provided plays. Returns the data with
#' probabilities of winning the game. The following columns
#' must be present: receive_2h_ko (1 if game is in 1st half and possession
#' team will receive 2nd half kickoff, 0 otherwise), score_differential,
#' home_team, posteam, half_seconds_remaining, game_seconds_remaining,
#' spread_line (how many points home team was favored by), down, ydstogo,
#' yardline_100, posteam_timeouts_remaining, defteam_timeouts_remaining
#'
#' @param pbp_data Play-by-play dataset to estimate win probability for.
#' @details Computes win probability for provided plays. Returns the data with
#' spread and non-spread-adjusted win probabilities. The following columns
#' must be present:
#' \itemize{
#' \item{receive_2h_ko (1 if game is in 1st half and possession team will receive 2nd half kickoff, 0 otherwise)}
#' \item{score_differential}
#' \item{home_team}
#' \item{posteam}
#' \item{half_seconds_remaining}
#' \item{game_seconds_remaining}
#' \item{spread_line (how many points home team was favored by)}
#' \item{down}
#' \item{ydstogo}
#' \item{yardline_100}
#' \item{posteam_timeouts_remaining}
#' \item{defteam_timeouts_remaining}
#' }
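#'
#' Before the models are called, the spread and the score differential are
#' rescaled by the share of the game that has already elapsed. A minimal
#' sketch of that feature construction for a single play, with illustrative
#' input values only:
#' ```
#' # hypothetical play: home team has the ball, leads by 7, 15 minutes left
#' game_seconds_remaining <- 900
#' spread_line <- 3
#' score_differential <- 7
#' home <- 1
#' # spread from the possession team's point of view
#' posteam_spread <- if (home == 1) spread_line else -1 * spread_line
#' elapsed_share <- (3600 - game_seconds_remaining) / 3600
#' # the influence of the spread decays as the game progresses
#' spread_time <- posteam_spread * exp(-4 * elapsed_share)
#' # the weight of the score differential grows as the game progresses
#' Diff_Time_Ratio <- score_differential / exp(-4 * elapsed_share)
#' ```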
#' @return The original pbp_data with the following columns appended to it:
#' \describe{
#' \item{wp}{win probability.}
#' \item{vegas_wp}{win probability taking into account pre-game spread.}
#' }
#' @export
#' @examples
#' \donttest{
#' try({# to avoid CRAN test problems
#' library(dplyr)
#' data <- tibble::tibble(
#' "receive_2h_ko" = 0,
#' "home_team" = "SEA",
#' "posteam" = "SEA",
#' "score_differential" = 0,
#' "half_seconds_remaining" = 1800,
#' "game_seconds_remaining" = 3600,
#' "spread_line" = c(1, 3, 4, 7, 14),
#' "down" = 1,
#' "ydstogo" = 10,
#' "yardline_100" = 75,
#' "posteam_timeouts_remaining" = 3,
#' "defteam_timeouts_remaining" = 3
#' )
#'
#' nflfastR::calculate_win_probability(data) |>
#'   dplyr::select(spread_line, wp, vegas_wp)
#' })
#' }
calculate_win_probability <- function(pbp_data) {
  # drop existing values of wp and vegas_wp before making new ones
  pbp_data <- pbp_data |> dplyr::select(-dplyr::any_of(drop.cols.wp))

  suppressWarnings(
    model_data <- pbp_data |>
      dplyr::mutate(
        home = dplyr::if_else(.data$posteam == .data$home_team, 1, 0),
        posteam_spread = dplyr::if_else(
          .data$home == 1,
          .data$spread_line,
          -1 * .data$spread_line
        ),
        elapsed_share = (3600 - .data$game_seconds_remaining) / 3600,
        spread_time = .data$posteam_spread * exp(-4 * .data$elapsed_share),
        Diff_Time_Ratio = .data$score_differential /
          (exp(-4 * .data$elapsed_share))
      )
  )

  wp <- get_preds_wp(model_data) |>
    tibble::as_tibble() |>
    dplyr::rename(wp = "value")
  wp_spread <- get_preds_wp_spread(model_data) |>
    tibble::as_tibble() |>
    dplyr::rename(vegas_wp = "value")

  preds <- dplyr::bind_cols(
    pbp_data,
    wp,
    wp_spread
  )

  return(preds)
}

# helper column for wp calculator
drop.cols.wp <- c(
  "wp",
  "vegas_wp"
)


================================================
FILE: R/helper_add_cp_cpoe.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl
# Purpose: Function to add cp and cpoe variables.
# CP model created by Zach Feldman: https://github.com/z-feldman/Expected_Completion_NFL
# Code Style Guide: styler::tidyverse_style()
################################################################################

add_cp <- function(pbp) {
  # testing only
  # pbp <- g

  passes <- prepare_cp_data(pbp)

  if (nrow(passes |> dplyr::filter(.data$valid_pass == 1)) > 0) {
    pred <- stats::predict(
      load_model("cp"),
      as.matrix(passes |> dplyr::select(-"complete_pass", -"valid_pass"))
    ) |>
      tibble::as_tibble() |>
      dplyr::rename(cp = "value") |>
      dplyr::bind_cols(passes) |>
      dplyr::select("cp", "valid_pass")

    pbp <- pbp |>
      dplyr::bind_cols(pred) |>
      dplyr::mutate(
        cp = dplyr::if_else(
          .data$valid_pass == 1,
          .data$cp,
          NA_real_
        ),
        cpoe = dplyr::if_else(
          !is.na(.data$cp),
          100 * (.data$complete_pass - .data$cp),
          NA_real_
        )
      ) |>
      dplyr::select(-"valid_pass")

    user_message("added cp and cpoe", "done")
  } else {
    pbp <- pbp |>
      dplyr::mutate(
        cp = NA_real_,
        cpoe = NA_real_
      )
    user_message(
      "No non-NA values for cp calculation detected. cp and cpoe set to NA",
      "info"
    )
  }

  return(pbp)
}


### helper function for getting the data ready
prepare_cp_data <- function(pbp) {
  # valid pass play: at least -15 air yards, less than 70 air yards, has intended receiver, has pass location
  passes <- pbp |>
    dplyr::mutate(
      receiver_player_name = stringr::str_extract(
        .data$desc,
        "(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})[A-Z][A-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?"
      ),
      pass_middle = dplyr::if_else(.data$pass_location == "middle", 1, 0),
      air_is_zero = dplyr::if_else(.data$air_yards == 0, 1, 0),
      distance_to_sticks = .data$air_yards - .data$ydstogo,
      valid_pass = dplyr::if_else(
        (.data$complete_pass == 1 |
          .data$incomplete_pass == 1 |
          .data$interception == 1) &
          !is.na(.data$air_yards) &
          .data$air_yards >= -15 &
          .data$air_yards < 70 &
          (!is.na(.data$receiver_player_name) |
            !is.na(.data$receiver_player_id)) &
          !is.na(.data$pass_location),
        1,
        0
      )
    ) |>
    dplyr::select(
      "complete_pass",
      "air_yards",
      "yardline_100",
      "ydstogo",
      "down1",
      "down2",
      "down3",
      "down4",
      "air_is_zero",
      "pass_middle",
      "era2",
      "era3",
      "era4",
      "qb_hit",
      "home",
      "outdoors",
      "retractable",
      "dome",
      "distance_to_sticks",
      "valid_pass"
    )
}


================================================
FILE: R/helper_add_ep_wp.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Functions to add ep(a) and wp(a) variables
# Code Style Guide: styler::tidyverse_style()
################################################################################

add_ep <- function(pbp) {
  out <- pbp |> add_ep_variables()
  user_message("added ep variables", "done")
  return(out)
}

add_air_yac_ep <- function(pbp) {
  if (nrow(pbp |> dplyr::filter(!is.na(.data$air_yards))) == 0) {
    out <- pbp |>
      dplyr::mutate(
        air_epa = NA_real_,
        yac_epa = NA_real_,
        comp_air_epa = NA_real_,
        comp_yac_epa = NA_real_,
        home_team_comp_air_epa = NA_real_,
        away_team_comp_air_epa = NA_real_,
        home_team_comp_yac_epa = NA_real_,
        away_team_comp_yac_epa = NA_real_,
        total_home_comp_air_epa = NA_real_,
        total_away_comp_air_epa = NA_real_,
        total_home_comp_yac_epa = NA_real_,
        total_away_comp_yac_epa = NA_real_,
        home_team_raw_air_epa = NA_real_,
        away_team_raw_air_epa = NA_real_,
        home_team_raw_yac_epa = NA_real_,
        away_team_raw_yac_epa = NA_real_,
        total_home_raw_air_epa = NA_real_,
        total_away_raw_air_epa = NA_real_,
        total_home_raw_yac_epa = NA_real_,
        total_away_raw_yac_epa = NA_real_
      )
    user_message(
      "No non-NA air_yards detected. air_yac_ep variables set to NA",
      "info"
    )
  } else {
    out <- pbp |> add_air_yac_ep_variables()
    user_message("added air_yac_ep variables", "done")
  }
  return(out)
}

add_wp <- function(pbp) {
  out <- pbp |> add_wp_variables()
  user_message("added wp variables", "done")
  return(out)
}

add_air_yac_wp <- function(pbp) {
  if (nrow(pbp |> dplyr::filter(!is.na(.data$air_yards))) == 0) {
    out <- pbp |>
      dplyr::mutate(
        air_wpa = NA_real_,
        yac_wpa = NA_real_,
        comp_air_wpa = NA_real_,
        comp_yac_wpa = NA_real_,
        home_team_comp_air_wpa = NA_real_,
        away_team_comp_air_wpa = NA_real_,
        home_team_comp_yac_wpa = NA_real_,
        away_team_comp_yac_wpa = NA_real_,
        total_home_comp_air_wpa = NA_real_,
        total_away_comp_air_wpa = NA_real_,
        total_home_comp_yac_wpa = NA_real_,
        total_away_comp_yac_wpa = NA_real_,
        home_team_raw_air_wpa = NA_real_,
        away_team_raw_air_wpa = NA_real_,
        home_team_raw_yac_wpa = NA_real_,
        away_team_raw_yac_wpa = NA_real_,
        total_home_raw_air_wpa = NA_real_,
        total_away_raw_air_wpa = NA_real_,
        total_home_raw_yac_wpa = NA_real_,
        total_away_raw_yac_wpa = NA_real_
      )
    user_message(
      "No non-NA air_yards detected. air_yac_wp variables set to NA",
      "info"
    )
  } else {
    out <- pbp |> add_air_yac_wp_variables()
    user_message("added air_yac_wp variables", "done")
  }
  return(out)
}

#get predictions for a set of pbp data
#for predict stage of EP
get_preds <- function(pbp) {
  if ("location" %in% names(pbp)) {
    pbp <- pbp |>
      dplyr::mutate(
        home = dplyr::if_else(.data$location == "Neutral", 0, .data$home)
      )
  }

  preds <- stats::predict(
    load_model("ep"),
    pbp |> ep_model_select() |> as.matrix()
  )

  # xgboost v3 returns a matrix of predictions instead of a vector as returned
  # by xgboost v1.
  if (is.vector(preds)) {
    preds <- preds |>
      matrix(ncol = 7, byrow = TRUE) |>
      as.data.frame()
  } else if (is.matrix(preds)) {
    preds <- as.data.frame(preds)
  }

  colnames(preds) <- c(
    "Touchdown",
    "Opp_Touchdown",
    "Field_Goal",
    "Opp_Field_Goal",
    "Safety",
    "Opp_Safety",
    "No_Score"
  )

  return(preds)
}

#get predictions for a set of pbp data
#for predict stage
get_preds_wp <- function(pbp) {
  preds <- stats::predict(load_model("wp"), as.matrix(pbp |> wp_model_select()))

  return(preds)
}

#get predictions for a set of pbp data
#for predict stage
get_preds_wp_spread <- function(pbp) {
  preds <- stats::predict(
    load_model("wp_spread"),
    as.matrix(pbp |> wp_spread_model_select())
  )

  return(preds)
}


#get the columns needed for ep predictions
#making sure they're in the right order
ep_model_select <- function(pbp) {
  pbp <- pbp |>
    dplyr::select(
      "half_seconds_remaining",
      "yardline_100",
      "home",
      "retractable",
      "dome",
      "outdoors",
      "ydstogo",
      "era0",
      "era1",
      "era2",
      "era3",
      "era4",
      "down1",
      "down2",
      "down3",
      "down4",
      "posteam_timeouts_remaining",
      "defteam_timeouts_remaining"
    )

  return(pbp)
}

#get the columns needed for wp predictions
#making sure they're in the right order
wp_model_select <- function(pbp) {
  pbp <- pbp |>
    dplyr::select(
      "receive_2h_ko",
      "home",
      "half_seconds_remaining",
      "game_seconds_remaining",
      "Diff_Time_Ratio",
      "score_differential",
      "down",
      "ydstogo",
      "yardline_100",
      "posteam_timeouts_remaining",
      "defteam_timeouts_remaining"
    )

  return(pbp)
}

#get the columns needed for wp predictions
#making sure they're in the right order
wp_spread_model_select <- function(pbp) {
  pbp <- pbp |>
    dplyr::select(
      "receive_2h_ko",
      "spread_time",
      "home",
      "half_seconds_remaining",
      "game_seconds_remaining",
      "Diff_Time_Ratio",
      "score_differential",
      "down",
      "ydstogo",
      "yardline_100",
      "posteam_timeouts_remaining",
      "defteam_timeouts_remaining"
    )

  return(pbp)
}

#add the derived features needed for wp predictions
prepare_wp_data <- function(pbp) {
  if (any(is.na(pbp$spread_line))) {
    broken_games <- pbp |>
      dplyr::filter(is.na(.data$spread_line)) |>
      dplyr::pull(.data$game_id) |>
      unique() |>
      sort()
    cli::cli_alert_danger(
      "The following game{?s} {?is/are} missing valid spread lines: {.val {broken_games}}."
    )
    cli::cli_alert_warning(
      "nflfastR will manually set the spread for the home team to {.val 1.5} points!"
    )
    cli::cli_alert_warning(
      "If you see this, please reach out to the package maintainers {.url https://github.com/nflverse/nflfastR/issues}"
    )
    pbp$spread_line[is.na(pbp$spread_line)] <- 1.5
  }

  pbp <- pbp |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      receive_2h_ko = dplyr::if_else(
        .data$qtr <= 2 &
          .data$posteam == dplyr::first(stats::na.omit(.data$defteam)),
        1,
        0
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate(
      posteam_spread = dplyr::if_else(
        .data$home == 1,
        .data$spread_line,
        -1 * .data$spread_line
      ),
      elapsed_share = (3600 - .data$game_seconds_remaining) / 3600,
      spread_time = .data$posteam_spread * exp(-4 * .data$elapsed_share),
      Diff_Time_Ratio = .data$score_differential /
        (exp(-4 * .data$elapsed_share))
    )

  return(pbp)
}


#add ep variables
#All of these are heavily borrowed from nflscrapR (Maksim Horowitz, Ronald Yurko, and Samuel Ventura)
add_ep_variables <- function(pbp_data) {
  #testing
  #pbp_data <- g

  #this function is below
  base_ep_preds <- get_preds(pbp_data)

  # ----------------------------------------------------------------------------
  # ---- special case: deal with FG attempts
  # Now make another dataset to get the EP probabilities after a missed FG:
  missed_fg_data <- pbp_data
  # Subtract 5.065401 from half_seconds_remaining:
  missed_fg_data$half_seconds_remaining <- missed_fg_data$half_seconds_remaining -
    5.065401

  # Correct the yardline_100 (field flipped, 8 yards behind the previous spot):
  missed_fg_data$yardline_100 <- 100 - (missed_fg_data$yardline_100 + 8)
  # Now first down:
  missed_fg_data$down1 <- rep(1, nrow(pbp_data))
  missed_fg_data$down2 <- rep(0, nrow(pbp_data))
  missed_fg_data$down3 <- rep(0, nrow(pbp_data))
  missed_fg_data$down4 <- rep(0, nrow(pbp_data))
  # 10 ydstogo:
  missed_fg_data$ydstogo <- rep(10, nrow(pbp_data))

  # Get the new predicted probabilities:
  missed_fg_ep_preds <- get_preds(missed_fg_data)

  # Find the rows where half_seconds_remaining became 0 or negative and set all the probabilities to 0:
  end_game_i <- which(missed_fg_data$half_seconds_remaining <= 0)
  missed_fg_ep_preds[end_game_i, ] <- rep(0, ncol(missed_fg_ep_preds))

  # if the half ends, no one scored
  missed_fg_ep_preds[end_game_i, "No_Score"] <- 1

  # Get the probability of making the field goal:
  make_fg_prob <- as.numeric(mgcv::predict.bam(
    fastrmodels::fg_model,
    newdata = pbp_data,
    type = "response"
  ))

  # Multiply each value of the missed_fg_ep_preds by the 1 - make_fg_prob
  missed_fg_ep_preds <- missed_fg_ep_preds * (1 - make_fg_prob)
  # Find the FG attempts:
  fg_attempt_i <- which(pbp_data$play_type == "field_goal")

  # Now update the probabilities for the FG attempts (also includes Opp_Field_Goal probability from missed_fg_ep_preds)
  base_ep_preds[fg_attempt_i, "Field_Goal"] <- make_fg_prob[fg_attempt_i] +
    missed_fg_ep_preds[fg_attempt_i, "Opp_Field_Goal"]
  # Update the other columns based on the opposite possession:
  base_ep_preds[fg_attempt_i, "Touchdown"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "Opp_Touchdown"
  ]
  base_ep_preds[fg_attempt_i, "Opp_Field_Goal"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "Field_Goal"
  ]
  base_ep_preds[fg_attempt_i, "Opp_Touchdown"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "Touchdown"
  ]
  base_ep_preds[fg_attempt_i, "Safety"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "Opp_Safety"
  ]
  base_ep_preds[fg_attempt_i, "Opp_Safety"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "Safety"
  ]
  base_ep_preds[fg_attempt_i, "No_Score"] <- missed_fg_ep_preds[
    fg_attempt_i,
    "No_Score"
  ]

  # ----------------------------------------------------------------------------------
  # ---- special case: deal with kickoffs
  # Calculate the EP for receiving a touchback (from the point of view of the receiving team)
  # and update the columns for kickoff plays:
  kickoff_data <- pbp_data

  # Set the yard line to 80 for seasons before 2016 and 75 from 2016 onward
  # (a January 2016 game still belongs to the 2015 season):
  kickoff_data$yardline_100 <- with(kickoff_data, ifelse(season < 2016, 80, 75))
  # Now first down:
  kickoff_data$down1 <- rep(1, nrow(pbp_data))
  kickoff_data$down2 <- rep(0, nrow(pbp_data))
  kickoff_data$down3 <- rep(0, nrow(pbp_data))
  kickoff_data$down4 <- rep(0, nrow(pbp_data))
  # 10 ydstogo:
  kickoff_data$ydstogo <- rep(10, nrow(pbp_data))

  # Get the new predicted probabilities:
  kickoff_preds <- get_preds(kickoff_data)

  # Find the kickoffs:
  kickoff_i <- which(
    pbp_data$play_type == "kickoff" | pbp_data$kickoff_attempt == 1
  )

  # Now update the probabilities:
  base_ep_preds[kickoff_i, "Field_Goal"] <- kickoff_preds[
    kickoff_i,
    "Field_Goal"
  ]
  base_ep_preds[kickoff_i, "Touchdown"] <- kickoff_preds[kickoff_i, "Touchdown"]
  base_ep_preds[kickoff_i, "Opp_Field_Goal"] <- kickoff_preds[
    kickoff_i,
    "Opp_Field_Goal"
  ]
  base_ep_preds[kickoff_i, "Opp_Touchdown"] <- kickoff_preds[
    kickoff_i,
    "Opp_Touchdown"
  ]
  base_ep_preds[kickoff_i, "Safety"] <- kickoff_preds[kickoff_i, "Safety"]
  base_ep_preds[kickoff_i, "Opp_Safety"] <- kickoff_preds[
    kickoff_i,
    "Opp_Safety"
  ]
  base_ep_preds[kickoff_i, "No_Score"] <- kickoff_preds[kickoff_i, "No_Score"]

  # ----------------------------------------------------------------------------------
  # Insert probabilities of 0 for everything but No_Score for QB Kneels that
  # occur on the possession team's side of the field:
  # Find these QB Kneels:
  qb_kneels_i <- which(
    pbp_data$play_type == "qb_kneel" & pbp_data$yardline_100 > 50
  )

  # Now update the probabilities:
  base_ep_preds[qb_kneels_i, "Field_Goal"] <- 0
  base_ep_preds[qb_kneels_i, "Touchdown"] <- 0
  base_ep_preds[qb_kneels_i, "Opp_Field_Goal"] <- 0
  base_ep_preds[qb_kneels_i, "Opp_Touchdown"] <- 0
  base_ep_preds[qb_kneels_i, "Safety"] <- 0
  base_ep_preds[qb_kneels_i, "Opp_Safety"] <- 0
  base_ep_preds[qb_kneels_i, "No_Score"] <- 1

  # ----------------------------------------------------------------------------------
  # Create two new columns, ExPoint_Prob and TwoPoint_Prob, for the PAT events:
  base_ep_preds$ExPoint_Prob <- 0
  base_ep_preds$TwoPoint_Prob <- 0

  # Find the indices for these types of plays:
  extrapoint_i <- which(
    (pbp_data$play_type == "extra_point" |
      pbp_data$play_type_nfl == "XP_KICK") &
      (is.na(pbp_data$play_type_nfl) | pbp_data$play_type_nfl != "PAT2")
  )
  twopoint_i <- which(pbp_data$two_point_attempt == 1)

  #new: special case for PAT or kickoff with penalty
  #for inserting NAs
  st_penalty_i_1 <- which(
    # pat: prior play was TD or PAT or Timeout and next play is PAT and this play isn't a td and it's not a regular down
    (pbp_data$touchdown == 0 &
      is.na(pbp_data$down) &
      (dplyr::lag(pbp_data$touchdown) == 1 |
        dplyr::lag(pbp_data$play_type_nfl) == "XP_KICK" |
        dplyr::lag(pbp_data$timeout) == 1) &
      (dplyr::lead(pbp_data$two_point_attempt) == 1 |
        dplyr::lead(pbp_data$extra_point_attempt) == 1 |
        dplyr::lead(pbp_data$play_type_nfl) == "XP_KICK")) |
      #kickoff: prior play was PAT and next play is kickoff
      ((dplyr::lag(pbp_data$two_point_attempt) == 1 |
        dplyr::lag(pbp_data$extra_point_attempt) == 1) &
        dplyr::lead(pbp_data$kickoff_attempt) == 1)
  )

  st_penalty_i_2 <- which(
    is.na(dplyr::lead(pbp_data$down)) &
      # has a key term in desc
      (((stringr::str_detect(pbp_data$desc, 'Kick formation') &
        is.na(pbp_data$down) &
        pbp_data$play_type == 'no_play') |
        (stringr::str_detect(pbp_data$desc, 'Pass formation') &
          is.na(pbp_data$down) &
          pbp_data$play_type == 'no_play') |
        (stringr::str_detect(pbp_data$desc, 'kicks onside') &
          is.na(pbp_data$down) &
          pbp_data$play_type == 'no_play') |
        (stringr::str_detect(pbp_data$desc, 'Offside on Free Kick') &
          is.na(pbp_data$down) &
          pbp_data$play_type == 'no_play') |
        (stringr::str_detect(pbp_data$desc, 'TWO-POINT CONVERSION')) &
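          # the conditions below apply only to the TWO-POINT CONVERSION case
          # (`&` binds tighter than `|`): down is NA, play type is no_play,
          # and the next play isn't a kickoff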
          is.na(pbp_data$down) &
          pbp_data$play_type == 'no_play' &
          dplyr::lead(pbp_data$kickoff_attempt) == 0))
  )

  # Assign the make_fg_probs of the extra-point PATs:
  base_ep_preds$ExPoint_Prob[extrapoint_i] <- make_fg_prob[extrapoint_i]

  # Assign the TwoPoint_Prob with the historical success rate:
  base_ep_preds$TwoPoint_Prob[twopoint_i] <- 0.4735

  # ----------------------------------------------------------------------------------
  # Insert NAs for timeouts and end of play rows:
  missing_i <- which(
    (pbp_data$timeout == 1 &
      pbp_data$play_type == "no_play" &
      !stringr::str_detect(pbp_data$desc, ' pass ') &
      !stringr::str_detect(pbp_data$desc, ' sacked ') &
      !stringr::str_detect(pbp_data$desc, ' scramble ') &
      !stringr::str_detect(pbp_data$desc, ' punts ') &
      !stringr::str_detect(pbp_data$desc, ' up the middle ') &
      !stringr::str_detect(pbp_data$desc, ' left end ') &
      !stringr::str_detect(pbp_data$desc, ' left guard ') &
      !stringr::str_detect(pbp_data$desc, ' left tackle ') &
      !stringr::str_detect(pbp_data$desc, ' right end ') &
      !stringr::str_detect(pbp_data$desc, ' right guard ') &
      !stringr::str_detect(pbp_data$desc, ' right tackle ')) |
      is.na(pbp_data$play_type)
  )

  # Now update the probabilities for missing and PATs:
  base_ep_preds$Field_Goal[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$Touchdown[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$Opp_Field_Goal[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$Opp_Touchdown[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$Safety[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$Opp_Safety[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0
  base_ep_preds$No_Score[c(
    missing_i,
    extrapoint_i,
    twopoint_i,
    st_penalty_i_1,
    st_penalty_i_2
  )] <- 0

  # Rename the events to all have _Prob at the end of them:
  base_ep_preds <- dplyr::rename(
    base_ep_preds,
    Field_Goal_Prob = "Field_Goal",
    Touchdown_Prob = "Touchdown",
    Opp_Field_Goal_Prob = "Opp_Field_Goal",
    Opp_Touchdown_Prob = "Opp_Touchdown",
    Safety_Prob = "Safety",
    Opp_Safety_Prob = "Opp_Safety",
    No_Score_Prob = "No_Score"
  )

  # Join them together:
  pbp_data <- cbind(pbp_data, base_ep_preds)

  # Calculate the ExpPts:
  pbp_data_ep <- dplyr::mutate(
    pbp_data,
    ExpPts = (0 * .data$No_Score_Prob) +
      (-3 * .data$Opp_Field_Goal_Prob) +
      (-2 * .data$Opp_Safety_Prob) +
      (-7 * .data$Opp_Touchdown_Prob) +
      (3 * .data$Field_Goal_Prob) +
      (2 * .data$Safety_Prob) +
      (7 * .data$Touchdown_Prob) +
      (1 * .data$ExPoint_Prob) +
      (2 * .data$TwoPoint_Prob)
  )

  # Set these to NA because we have no way of calculating EPA for them
  if (length(st_penalty_i_1) > 0) {
    pbp_data_ep$ExpPts[st_penalty_i_1] <- NA_real_
  }

  if (length(st_penalty_i_2) > 0) {
    pbp_data_ep$ExpPts[st_penalty_i_2] <- NA_real_
  }

  pbp_data_ep$ExpPts[missing_i] <- NA_real_

  #################################################################
  # Calculate EPA:

  ### Adding Expected Points Added (EPA) column

  # Create multiple types of EPA columns
  # for each of the possible cases,
  # grouping by GameID (will then just use
  # an ifelse statement to decide which one
  # to use as the final EPA):
  pbp_data_ep |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      # Keep working copies of EP and the possession team; their NAs are
      # filled upward from the next available row below:
      ep = .data$ExpPts,
      tmp_posteam = .data$posteam
    ) |>
    # column names are passed as strings because `.data` is deprecated in
    # tidyselect contexts
    tidyr::fill(
      "ep",
      .direction = "up"
    ) |>
    tidyr::fill(
      "tmp_posteam",
      .direction = "up"
    ) |>
    dplyr::mutate(
      # get epa for non-scoring plays
      home_ep = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$ep,
        -.data$ep
      ),
      home_epa = dplyr::lead(.data$home_ep) - .data$home_ep,
      epa = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$home_epa,
        -.data$home_epa
      ),

      # td
      epa = dplyr::if_else(
        !is.na(.data$td_team),
        dplyr::if_else(
          .data$td_team == .data$posteam,
          7 - .data$ep,
          -7 - .data$ep
        ),
        .data$epa
      ),
      # Offense field goal:
      epa = dplyr::if_else(
        is.na(.data$td_team) & .data$field_goal_made == 1,
        3 - .data$ep,
        .data$epa,
        missing = .data$epa
      ),
      # Offense extra-point:
      epa = dplyr::if_else(
        is.na(.data$td_team) &
          .data$field_goal_made == 0 &
          .data$extra_point_good == 1,
        1 - .data$ep,
        .data$epa,
        missing = .data$epa
      ),
      # Offense two-point conversion:
      epa = dplyr::if_else(
        is.na(.data$td_team) &
          .data$field_goal_made == 0 &
          .data$extra_point_good == 0 &
          (.data$two_point_rush_good == 1 |
            .data$two_point_pass_good == 1 |
            .data$two_point_pass_reception_good == 1),
        2 - .data$ep,
        .data$epa,
        missing = .data$epa
      ),
      # Failed PAT (both 1 and 2):
      epa = dplyr::if_else(
        is.na(.data$td_team) &
          .data$field_goal_made == 0 &
          .data$extra_point_good == 0 &
          ((.data$extra_point_failed == 1 |
            .data$extra_point_blocked == 1 |
            .data$extra_point_aborted == 1) |
            (.data$two_point_rush_failed == 1 |
              .data$two_point_pass_failed == 1 |
              .data$two_point_pass_reception_failed == 1)),
        0 - .data$ep,
        .data$epa,
        missing = .data$epa
      ),
      # Opponent scores defensive 2 point:
      epa = dplyr::if_else(
        .data$defensive_two_point_conv == 1,
        -2 - .data$ep,
        .data$epa,
        missing = .data$epa
      ),
      # Safety:
      epa = dplyr::case_when(
        !is.na(.data$safety_team) & .data$safety_team == .data$posteam ~ 2 -
          .data$ep,
        !is.na(.data$safety_team) & .data$safety_team == .data$defteam ~ -2 -
          .data$ep,
        TRUE ~ .data$epa
      )
    ) |>
    # Now rename each of the expected points columns to match the style of
    # the updated code:
    dplyr::rename(
      no_score_prob = "No_Score_Prob",
      opp_fg_prob = "Opp_Field_Goal_Prob",
      opp_safety_prob = "Opp_Safety_Prob",
      opp_td_prob = "Opp_Touchdown_Prob",
      fg_prob = "Field_Goal_Prob",
      safety_prob = "Safety_Prob",
      td_prob = "Touchdown_Prob",
      extra_point_prob = "ExPoint_Prob",
      two_point_conversion_prob = "TwoPoint_Prob"
    ) |>
    # Create columns with cumulative epa totals for both teams:
    dplyr::mutate(
      # helper for end of game
      end_game = ifelse(
        stringr::str_detect(tolower(.data$desc), "(end of game)|(end game)"),
        1,
        0
      ),

      # Change epa for plays occurring at end of half with no scoring
      # plays to be just the difference between 0 and starting ep:
      epa = dplyr::if_else(
        ((.data$qtr == 2 &
          (dplyr::lead(.data$qtr) == 3 |
            dplyr::lead(.data$desc) == "END QUARTER 2")) |
          (.data$qtr == 4 &
            (dplyr::lead(.data$qtr) == 5 |
              dplyr::lead(.data$desc) == "END QUARTER 4" |
              dplyr::lead(.data$end_game) == 1))) &
          .data$sp == 0 &
          !is.na(.data$play_type),
        0 - .data$ep,
        .data$epa
      ),
      # last play of OT
      epa = dplyr::if_else(
        .data$qtr > 4 & dplyr::lead(.data$end_game) == 1 & .data$sp == 0,
        0 - .data$ep,
        .data$epa
      ),
      epa = dplyr::if_else(.data$desc == "END QUARTER 2", NA_real_, .data$epa),
      epa = dplyr::if_else(.data$end_game == 1, NA_real_, .data$epa),
      ep = dplyr::if_else(.data$desc == "END QUARTER 2", NA_real_, .data$ep),
      ep = dplyr::if_else(.data$end_game == 1, NA_real_, .data$ep),
      home_team_epa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$epa,
        -.data$epa
      ),
      away_team_epa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$epa,
        -.data$epa
      ),
      home_team_epa = dplyr::if_else(
        is.na(.data$home_team_epa),
        0,
        .data$home_team_epa
      ),
      away_team_epa = dplyr::if_else(
        is.na(.data$away_team_epa),
        0,
        .data$away_team_epa
      ),
      total_home_epa = cumsum(.data$home_team_epa),
      total_away_epa = cumsum(.data$away_team_epa),
      # Same thing but separating passing and rushing:
      home_team_rush_epa = dplyr::if_else(
        .data$play_type == "run",
        .data$home_team_epa,
        0
      ),
      away_team_rush_epa = dplyr::if_else(
        .data$play_type == "run",
        .data$away_team_epa,
        0
      ),
      home_team_rush_epa = dplyr::if_else(
        is.na(.data$home_team_rush_epa),
        0,
        .data$home_team_rush_epa
      ),
      away_team_rush_epa = dplyr::if_else(
        is.na(.data$away_team_rush_epa),
        0,
        .data$away_team_rush_epa
      ),
      total_home_rush_epa = cumsum(.data$home_team_rush_epa),
      total_away_rush_epa = cumsum(.data$away_team_rush_epa),
      home_team_pass_epa = dplyr::if_else(
        .data$play_type == "pass",
        .data$home_team_epa,
        0
      ),
      away_team_pass_epa = dplyr::if_else(
        .data$play_type == "pass",
        .data$away_team_epa,
        0
      ),
      home_team_pass_epa = dplyr::if_else(
        is.na(.data$home_team_pass_epa),
        0,
        .data$home_team_pass_epa
      ),
      away_team_pass_epa = dplyr::if_else(
        is.na(.data$away_team_pass_epa),
        0,
        .data$away_team_pass_epa
      ),
      total_home_pass_epa = cumsum(.data$home_team_pass_epa),
      total_away_pass_epa = cumsum(.data$away_team_pass_epa)
    ) |>
    dplyr::ungroup()
}
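
# Illustrative sketch, not run and not part of the package: the expected
# points value computed in add_ep_variables() is a probability-weighted sum of
# the point values of the next scoring event. The probabilities below are
# made up for the example; in the package they come from get_preds().
if (FALSE) {
  probs <- c(
    No_Score = 0.10,
    Opp_Field_Goal = 0.05,
    Opp_Safety = 0.00,
    Opp_Touchdown = 0.10,
    Field_Goal = 0.30,
    Safety = 0.00,
    Touchdown = 0.45
  )
  points <- c(
    No_Score = 0,
    Opp_Field_Goal = -3,
    Opp_Safety = -2,
    Opp_Touchdown = -7,
    Field_Goal = 3,
    Safety = 2,
    Touchdown = 7
  )

  # the seven outcomes are mutually exclusive, so they should sum to 1
  stopifnot(isTRUE(all.equal(sum(probs), 1)))

  # weighted sum: -0.15 - 0.70 + 0.90 + 3.15
  exp_pts <- sum(points[names(probs)] * probs)
  exp_pts # 3.2

  # EPA of a play that ends in a possession-team touchdown from this state
  7 - exp_pts # 3.8
}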


#################################################################
# Calculate WP and WPA:
add_wp_variables <- function(pbp_data) {
  #testing only
  # pbp_data <- g

  # Initialize the vectors that store predicted win probability
  OffWinProb <- rep(NA_real_, nrow(pbp_data))
  OffWinProb_spread <- rep(NA_real_, nrow(pbp_data))

  pbp_data <- pbp_data |>
    prepare_wp_data()

  # First check if there are any overtime plays:
  if (any(pbp_data$qtr > 4)) {
    # Find the rows that are overtime:
    overtime_i <- which(pbp_data$qtr > 4)

    # Separate the dataset into regular_df and overtime_df:
    overtime_df <- pbp_data[overtime_i, ]

    # Separate routine for overtime:

    # Create a column that is just the first drive of overtime repeated:
    overtime_df$First_Drive <- rep(
      min(overtime_df$drive, na.rm = TRUE),
      nrow(overtime_df)
    )

    # Calculate the difference in drive number
    overtime_df <- dplyr::mutate(
      overtime_df,
      Drive_Diff = .data$drive - .data$First_Drive
    )

    # Create an indicator column that means the posteam is losing by 3 and
    # it's the second drive of overtime:
    overtime_df$One_FG_Game <- ifelse(
      overtime_df$score_differential == -3 &
        overtime_df$Drive_Diff == 1,
      1,
      0
    )

    # Now create a copy of the dataset to then make the EP predictions for when
    # a field goal is scored and it's not sudden death:
    overtime_df_ko <- overtime_df

    overtime_df_ko$yardline_100 <- with(
      overtime_df_ko,
      ifelse(
        game_year < 2016 |
          (game_year == 2016 & game_month < 4),
        80,
        75
      )
    )

    # Now first down:
    overtime_df_ko$down1 <- rep(1, nrow(overtime_df_ko))
    overtime_df_ko$down2 <- rep(0, nrow(overtime_df_ko))
    overtime_df_ko$down3 <- rep(0, nrow(overtime_df_ko))
    overtime_df_ko$down4 <- rep(0, nrow(overtime_df_ko))
    # 10 ydstogo:
    overtime_df_ko$ydstogo <- rep(10, nrow(overtime_df_ko))

    # Get the predictions from the EP model and calculate the necessary probability:
    overtime_df_ko_preds <- get_preds(overtime_df_ko)

    overtime_df_ko_preds <- dplyr::mutate(
      overtime_df_ko_preds,
      Win_Back = .data$No_Score +
        .data$Opp_Field_Goal +
        .data$Opp_Safety +
        .data$Opp_Touchdown
    )

    # Calculate the two possible win probability types, Sudden Death and one Field Goal:
    overtime_df$Sudden_Death_WP <- overtime_df$fg_prob +
      overtime_df$td_prob +
      overtime_df$safety_prob
    overtime_df$One_FG_WP <- overtime_df$td_prob +
      (overtime_df$fg_prob * overtime_df_ko_preds$Win_Back)

    # Decide which win probability to use:
    OffWinProb[overtime_i] <- ifelse(
      overtime_df$game_year >= 2012 &
        (overtime_df$Drive_Diff == 0 |
          (overtime_df$Drive_Diff == 1 & overtime_df$One_FG_Game == 1)),
      overtime_df$One_FG_WP,
      overtime_df$Sudden_Death_WP
    )
    OffWinProb_spread[overtime_i] <- OffWinProb[overtime_i]
  }

  #regulation plays
  regular_i <- which(pbp_data$qtr <= 4)

  # df of just the regulation plays:
  regular_df <- pbp_data[regular_i, ]

  # do predictions for the regular df
  OffWinProb[regular_i] <- get_preds_wp(regular_df)
  OffWinProb_spread[regular_i] <- get_preds_wp_spread(regular_df)

  ## set WP to NA for plays where down is missing
  # for kickoffs and PATs, these will get overwritten by the fixes after this

  down_na <- which(is.na(pbp_data$down))
  OffWinProb[down_na] <- NA_real_
  OffWinProb_spread[down_na] <- NA_real_

  ## start PAT fix

  make_pat_prob <- as.numeric(
    mgcv::predict.bam(
      fastrmodels::fg_model,
      newdata = pbp_data |>
        dplyr::mutate(
          yardline_100 = ifelse(.data$season >= 2015, 15, 3)
        ),
      type = "response"
    )
  )

  # plays with 1 point PAT attempts
  pat_i <- which(
    (pbp_data$kickoff_attempt == 0 &
      !(stringr::str_detect(pbp_data$desc, 'Onside Kick')) &
      (stringr::str_detect(pbp_data$desc, 'Kick formation')) &
      is.na(pbp_data$down)) |
      # or has PAT indicators
      stringr::str_detect(pbp_data$desc, 'extra point') |
      !is.na(pbp_data$extra_point_result)
  )

  # plays with 2 point PAT attempts
  two_pt_i <- which(
    (pbp_data$kickoff_attempt == 0 &
      !(stringr::str_detect(pbp_data$desc, 'Onside Kick')) &
      (stringr::str_detect(pbp_data$desc, 'Pass formation')) &
      is.na(pbp_data$down)) |
      # or has PAT indicators
      stringr::str_detect(pbp_data$desc, 'TWO-POINT CONVERSION ATTEMPT') |
      !is.na(pbp_data$two_point_conv_result)
  )

  # some rare 2 point PAT attempts have duplicated matches in 1 point PAT attempts
  # so we remove them in the next line
  pat_i <- pat_i[!pat_i %in% two_pt_i]

  # make df of post-PAT plays
  pat_data <- pbp_data |>
    dplyr::mutate(
      # swap timeouts
      to_pos = .data$posteam_timeouts_remaining,
      to_def = .data$defteam_timeouts_remaining,
      posteam_timeouts_remaining = .data$to_def,
      defteam_timeouts_remaining = .data$to_pos,
      # swap score
      score_differential = -.data$score_differential,
      # 1st and 10
      down = 1,
      ydstogo = 10,
      # flip receive_2h_ko var
      receive_2h_ko = dplyr::case_when(
        .data$qtr <= 2 & .data$receive_2h_ko == 0 ~ 1,
        .data$qtr <= 2 & .data$receive_2h_ko == 1 ~ 0,
        TRUE ~ .data$receive_2h_ko
      ),
      # switch posteam
      posteam = dplyr::if_else(
        .data$home_team == .data$posteam,
        .data$away_team,
        .data$home_team
      ),
      yardline_100 = 75
    ) |>
    dplyr::mutate(
      home = dplyr::case_when(
        .data$home == 0 ~ 1,
        .data$home == 1 ~ 0
      ),
      posteam_spread = dplyr::if_else(
        .data$home == 1,
        .data$spread_line,
        -1 * .data$spread_line
      ),
      elapsed_share = (3600 - .data$game_seconds_remaining) / 3600,
      spread_time = .data$posteam_spread * exp(-4 * .data$elapsed_share)
    )

  ## start with spread version
  # get the receiving team's WP if the try yields 0, 1, or 2 points
  pat_0 <- get_preds_wp_spread(pat_data |> add_esdtr())
  pat_1 <- get_preds_wp_spread(
    pat_data |>
      dplyr::mutate(score_differential = .data$score_differential - 1) |>
      add_esdtr()
  )
  pat_2 <- get_preds_wp_spread(
    pat_data |>
      dplyr::mutate(score_differential = .data$score_differential - 2) |>
      add_esdtr()
  )

  # Using nflscrapR version of 2pt make prob on 2nd line here
  pat_go_for_1 <- 1 - (make_pat_prob * pat_1 + (1 - make_pat_prob) * pat_0)
  pat_go_for_2 <- 1 - (0.4735 * pat_2 + (1 - 0.4735) * pat_0)

  OffWinProb_spread[two_pt_i] <- pat_go_for_2[two_pt_i]
  OffWinProb_spread[pat_i] <- pat_go_for_1[pat_i]

  ## repeat for non-spread version
  # get the receiving team's WP if the try yields 0, 1, or 2 points
  pat_0 <- get_preds_wp(pat_data |> add_esdtr())
  pat_1 <- get_preds_wp(
    pat_data |>
      dplyr::mutate(score_differential = .data$score_differential - 1) |>
      add_esdtr()
  )
  pat_2 <- get_preds_wp(
    pat_data |>
      dplyr::mutate(score_differential = .data$score_differential - 2) |>
      add_esdtr()
  )

  # Using nflscrapR version of 2pt make prob on 2nd line here
  pat_go_for_1 <- 1 - (make_pat_prob * pat_1 + (1 - make_pat_prob) * pat_0)
  pat_go_for_2 <- 1 - (0.4735 * pat_2 + (1 - 0.4735) * pat_0)

  OffWinProb[two_pt_i] <- pat_go_for_2[two_pt_i]
  OffWinProb[pat_i] <- pat_go_for_1[pat_i]

  ## end PAT fix

  ## now we need to fix WP on kickoffs, which will be WP associated with touchback
  kickoff_data <- pbp_data

  # Set the yard line to 80 for seasons before 2016 and 75 from 2016 onward
  kickoff_data$yardline_100 <- with(kickoff_data, ifelse(season < 2016, 80, 75))
  # Now first down:
  kickoff_data$down <- rep(1, nrow(pbp_data))
  kickoff_data$down1 <- rep(1, nrow(pbp_data))
  kickoff_data$down2 <- rep(0, nrow(pbp_data))
  kickoff_data$down3 <- rep(0, nrow(pbp_data))
  kickoff_data$down4 <- rep(0, nrow(pbp_data))
  # 10 ydstogo:
  kickoff_data$ydstogo <- rep(10, nrow(pbp_data))

  # Get the new predicted probabilities:
  kickoff_preds <- get_preds_wp(kickoff_data)
  kickoff_preds_spread <- get_preds_wp_spread(kickoff_data)

  # Find the kickoffs in regulation:
  kickoff_i <- which(
    (pbp_data$play_type == "kickoff" | pbp_data$kickoff_attempt == 1) &
      pbp_data$qtr <= 4
  )

  # Now update the probabilities:
  OffWinProb[kickoff_i] <- kickoff_preds[kickoff_i]
  OffWinProb_spread[kickoff_i] <- kickoff_preds_spread[kickoff_i]

  ## end fix for kickoffs

  # Now create the win probability columns and return:
  pbp_data <- pbp_data |>
    dplyr::mutate(
      wp = OffWinProb,
      vegas_wp = OffWinProb_spread,
      # for figuring out posteam on NA posteam lines
      tmp_posteam = .data$posteam
    ) |>
    # column names are passed as strings because `.data` is deprecated in
    # tidyselect contexts
    tidyr::fill(
      "wp",
      .direction = "up"
    ) |>
    tidyr::fill(
      "vegas_wp",
      .direction = "up"
    ) |>
    tidyr::fill(
      "tmp_posteam",
      .direction = "up"
    ) |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      #add columns for home WP
      home_wp = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$wp,
        1 - .data$wp
      ),
      vegas_home_wp = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$vegas_wp,
        1 - .data$vegas_wp
      ),

      # convenience to mark end of game
      end_game = ifelse(
        stringr::str_detect(tolower(.data$desc), "(end of game)|(end game)"),
        1,
        0
      ),

      # convenience for marking home win prob on last line
      final_value = dplyr::case_when(
        .data$home_score > .data$away_score ~ 1,
        .data$away_score > .data$home_score ~ 0,
        .data$home_score == .data$away_score ~ .5
      ),

      #make 1 or 0 the final win prob
      vegas_home_wp = dplyr::if_else(
        .data$end_game == 1,
        .data$final_value,
        .data$vegas_home_wp
      ),

      # can we make this and the above into a function? feels like a lot of repetition
      home_wp = dplyr::if_else(
        .data$end_game == 1,
        .data$final_value,
        .data$home_wp
      ),

      away_wp = 1 - .data$home_wp,

      # make wp of posteam on last line NA because there's no posteam
      vegas_wp = dplyr::if_else(
        .data$end_game == 1,
        NA_real_,
        .data$vegas_wp
      ),

      wp = dplyr::if_else(
        .data$end_game == 1,
        NA_real_,
        .data$wp
      ),

      def_wp = 1 - .data$wp,

      # make wpa
      vegas_home_wpa = dplyr::lead(.data$vegas_home_wp) - .data$vegas_home_wp,
      vegas_wpa = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$vegas_home_wpa,
        -.data$vegas_home_wpa
      ),
      vegas_wpa = dplyr::if_else(
        stringr::str_detect(
          tolower(.data$desc),
          "( kneels )|(end of game)|(end game)"
        ),
        NA_real_,
        .data$vegas_wpa
      ),

      # home wpa isn't saved but needed for next line
      home_wpa = dplyr::lead(.data$home_wp) - .data$home_wp,
      wpa = dplyr::if_else(
        .data$tmp_posteam == .data$home_team,
        .data$home_wpa,
        -.data$home_wpa
      ),
      wpa = dplyr::if_else(
        stringr::str_detect(
          tolower(.data$desc),
          "( kneels )|(end of game)|(end game)"
        ),
        NA_real_,
        .data$wpa
      )
    ) |>
    dplyr::ungroup()

  # Home and Away post:

  pbp_data$home_wp_post <- ifelse(
    pbp_data$posteam == pbp_data$home_team,
    pbp_data$home_wp + pbp_data$wpa,
    pbp_data$home_wp - pbp_data$wpa
  )
  pbp_data$away_wp_post <- ifelse(
    pbp_data$posteam == pbp_data$away_team,
    pbp_data$away_wp + pbp_data$wpa,
    pbp_data$away_wp - pbp_data$wpa
  )

  # If next thing is end of game, and post score differential is tied because it's
  # overtime then make both the home_wp_post and away_wp_post equal to 0:
  pbp_data <- pbp_data |>
    dplyr::mutate(
      home_wp_post = dplyr::if_else(
        .data$qtr == 5 &
          stringr::str_detect(
            tolower(dplyr::lead(.data$desc)),
            "(end of game)|(end game)"
          ) &
          .data$score_differential_post == 0,
        0,
        .data$home_wp_post
      ),
      away_wp_post = dplyr::if_else(
        .data$qtr == 5 &
          stringr::str_detect(
            tolower(dplyr::lead(.data$desc)),
            "(end of game)|(end game)"
          ) &
          .data$score_differential_post == 0,
        0,
        .data$away_wp_post
      )
    )

  # For plays with playtype of End of Game, use the previous play's WP_post columns
  # as the pre and post, since those are already set to be 1 and 0:

  pbp_data$home_wp_post <- with(
    pbp_data,
    ifelse(
      stringr::str_detect(tolower(desc), "(end of game)|(end game)"),
      dplyr::lag(home_wp_post),
      ifelse(
        dplyr::lag(play_type) == "no_play" & play_type == "no_play",
        dplyr::lag(home_wp_post),
        home_wp_post
      )
    )
  )

  pbp_data$away_wp_post <- with(
    pbp_data,
    ifelse(
      stringr::str_detect(tolower(desc), "(end of game)|(end game)"),
      dplyr::lag(away_wp_post),
      ifelse(
        dplyr::lag(play_type) == "no_play" & play_type == "no_play",
        dplyr::lag(away_wp_post),
        away_wp_post
      )
    )
  )

  # Finally, add per-play and cumulative rushing and passing WPA columns by game, and return:
  pbp_data |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      # Generate columns to keep track of cumulative rushing and
      # passing WPA values:
      home_team_wpa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$wpa,
        -.data$wpa
      ),
      away_team_wpa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$wpa,
        -.data$wpa
      ),
      home_team_wpa = dplyr::if_else(
        is.na(.data$home_team_wpa),
        0,
        .data$home_team_wpa
      ),
      away_team_wpa = dplyr::if_else(
        is.na(.data$away_team_wpa),
        0,
        .data$away_team_wpa
      ),
      # Same thing but separating passing and rushing:
      home_team_rush_wpa = dplyr::if_else(
        .data$play_type == "run",
        .data$home_team_wpa,
        0
      ),
      away_team_rush_wpa = dplyr::if_else(
        .data$play_type == "run",
        .data$away_team_wpa,
        0
      ),
      home_team_rush_wpa = dplyr::if_else(
        is.na(.data$home_team_rush_wpa),
        0,
        .data$home_team_rush_wpa
      ),
      away_team_rush_wpa = dplyr::if_else(
        is.na(.data$away_team_rush_wpa),
        0,
        .data$away_team_rush_wpa
      ),
      total_home_rush_wpa = cumsum(.data$home_team_rush_wpa),
      total_away_rush_wpa = cumsum(.data$away_team_rush_wpa),
      home_team_pass_wpa = dplyr::if_else(
        .data$play_type == "pass",
        .data$home_team_wpa,
        0
      ),
      away_team_pass_wpa = dplyr::if_else(
        .data$play_type == "pass",
        .data$away_team_wpa,
        0
      ),
      home_team_pass_wpa = dplyr::if_else(
        is.na(.data$home_team_pass_wpa),
        0,
        .data$home_team_pass_wpa
      ),
      away_team_pass_wpa = dplyr::if_else(
        is.na(.data$away_team_pass_wpa),
        0,
        .data$away_team_pass_wpa
      ),
      total_home_pass_wpa = cumsum(.data$home_team_pass_wpa),
      total_away_pass_wpa = cumsum(.data$away_team_pass_wpa)
    ) |>
    dplyr::ungroup()
}


# helper function to get expected score diff to time ratio
# needed after flipping teams in WP for getting PAT WP
add_esdtr <- function(data) {
  data |>
    dplyr::mutate(
      Diff_Time_Ratio = .data$score_differential /
        (exp(-4 * .data$elapsed_share))
    )
}
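
# Worked illustration of add_esdtr() (made-up values; assumes elapsed_share is
# the share of regulation time already played, between 0 and 1):
#   add_esdtr(data.frame(score_differential = 7, elapsed_share = c(0, 0.5)))
# gives Diff_Time_Ratio = 7 / exp(0) = 7 at kickoff and
# 7 / exp(-2), roughly 51.7, at halftime, so the same lead is weighted more
# heavily as the game progresses.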


#################################################################
# air and YAC EP:
# as with the rest, heavily borrowed from nflscrapR:
# https://github.com/maksimhorowitz/nflscrapR/blob/master/R/add_ep_wp_variables.R
add_air_yac_ep_variables <- function(pbp_data) {
  #testing
  #pbp_data <- g

  # Find all pass attempts that are not sacks:
  pass_plays_i <- which(
    !is.na(pbp_data$air_yards) & pbp_data$play_type == 'pass'
  )
  pass_pbp_data <- pbp_data[pass_plays_i, ]

  # Using the air_yards, we need to update the following:
  # - yardline_100
  # - half_seconds_remaining
  # - ydstogo
  # - down
  # - timeouts

  # Get everything set up for calculation
  pass_pbp_data <- pass_pbp_data |>
    dplyr::mutate(
      posteam_timeouts_pre = .data$posteam_timeouts_remaining,
      defeam_timeouts_pre = .data$defteam_timeouts_remaining
    ) |>
    # Rename the old columns to update for calculating the EP from the air:
    dplyr::rename(
      old_yrdline100 = "yardline_100",
      old_ydstogo = "ydstogo",
      old_TimeSecs_Remaining = "half_seconds_remaining",
      old_down = "down"
    ) |>
    dplyr::mutate(
      Turnover_Ind = dplyr::if_else(
        .data$old_down == 4 & .data$air_yards < .data$old_ydstogo,
        1,
        0
      ),
      yardline_100 = dplyr::if_else(
        .data$Turnover_Ind == 0,
        .data$old_yrdline100 - .data$air_yards,
        100 - (.data$old_yrdline100 - .data$air_yards)
      ),
      ydstogo = dplyr::if_else(
        .data$air_yards >= .data$old_ydstogo |
          .data$Turnover_Ind == 1,
        10,
        .data$old_ydstogo - .data$air_yards
      ),
      down = dplyr::if_else(
        .data$air_yards >= .data$old_ydstogo |
          .data$Turnover_Ind == 1,
        1,
        as.numeric(.data$old_down) + 1
      ),
      half_seconds_remaining = .data$old_TimeSecs_Remaining - 5.704673,
      down1 = dplyr::if_else(.data$down == 1, 1, 0),
      down2 = dplyr::if_else(.data$down == 2, 1, 0),
      down3 = dplyr::if_else(.data$down == 3, 1, 0),
      down4 = dplyr::if_else(.data$down == 4, 1, 0),
      posteam_timeouts_remaining = dplyr::if_else(
        .data$Turnover_Ind == 1,
        .data$defeam_timeouts_pre,
        .data$posteam_timeouts_pre
      ),
      defteam_timeouts_remaining = dplyr::if_else(
        .data$Turnover_Ind == 1,
        .data$posteam_timeouts_pre,
        .data$defeam_timeouts_pre
      )
    )
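
  # Worked illustration of the state update above (values are made up for
  # illustration, not taken from real play-by-play data):
  # - 3rd & 8 at yardline_100 = 60 with air_yards = 12:
  #     Turnover_Ind = 0, yardline_100 = 60 - 12 = 48, down = 1, ydstogo = 10
  # - 2nd & 10 at yardline_100 = 60 with air_yards = 4:
  #     Turnover_Ind = 0, yardline_100 = 60 - 4 = 56, down = 3,
  #     ydstogo = 10 - 4 = 6
  # - 4th & 5 at yardline_100 = 40 with air_yards = 3:
  #     Turnover_Ind = 1, yardline_100 = 100 - (40 - 3) = 63, down = 1,
  #     ydstogo = 10, and the posteam/defteam timeouts are swapped
  # In every case half_seconds_remaining is reduced by 5.704673 seconds.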

  #get EP predictions
  pass_pbp_data_preds <- get_preds(pass_pbp_data)

  # Convert to air EP:
  pass_pbp_data_preds <- dplyr::mutate(
    pass_pbp_data_preds,
    airEP = (.data$Opp_Safety * -2) +
      (.data$Opp_Field_Goal * -3) +
      (.data$Opp_Touchdown * -7) +
      (.data$Safety * 2) +
      (.data$Field_Goal * 3) +
      (.data$Touchdown * 7)
  )

  # Return back to the passing data:
  pass_pbp_data$airEP <- pass_pbp_data_preds$airEP

  # For the plays that have half_seconds_remaining 0 or less, set airEP to 0:
  pass_pbp_data$airEP[which(pass_pbp_data$half_seconds_remaining <= 0)] <- 0

  # Calculate the airEPA based on 4 scenarios:
  pass_pbp_data$airEPA <- with(
    pass_pbp_data,
    ifelse(
      old_yrdline100 - air_yards <= 0,
      7 - ep,
      ifelse(
        old_yrdline100 - air_yards > 99,
        -2 - ep,
        ifelse(Turnover_Ind == 1, (-1 * airEP) - ep, airEP - ep)
      )
    )
  )

  # If the play is a two-point conversion then change the airEPA to NA since
  # no air yards are provided:
  pass_pbp_data$airEPA <- with(
    pass_pbp_data,
    ifelse(two_point_attempt == 1, NA, airEPA)
  )
  # Calculate the yards after catch EPA:
  pass_pbp_data <- dplyr::mutate(
    pass_pbp_data,
    yacEPA = .data$epa - .data$airEPA
  )

  # if Yards after catch is 0 make yacEPA set to 0:
  pass_pbp_data$yacEPA <- ifelse(
    pass_pbp_data$penalty == 0 &
      pass_pbp_data$yards_after_catch == 0 &
      pass_pbp_data$complete_pass == 1,
    0,
    pass_pbp_data$yacEPA
  )

  # if Yards after catch is 0 make airEPA set to EPA:
  pass_pbp_data$airEPA <- ifelse(
    pass_pbp_data$penalty == 0 &
      pass_pbp_data$yards_after_catch == 0 &
      pass_pbp_data$complete_pass == 1,
    pass_pbp_data$epa,
    pass_pbp_data$airEPA
  )

  # Now add airEPA and yacEPA to the original dataset:
  pbp_data$airEPA <- NA
  pbp_data$yacEPA <- NA
  pbp_data$airEPA[pass_plays_i] <- pass_pbp_data$airEPA
  pbp_data$yacEPA[pass_plays_i] <- pass_pbp_data$yacEPA

  # Now change the names to be the right style, calculate the completion form
  # of the variables, as well as the cumulative totals and return:
  pbp_data |>
    dplyr::rename(air_epa = "airEPA", yac_epa = "yacEPA") |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      comp_air_epa = dplyr::if_else(.data$complete_pass == 1, .data$air_epa, 0),
      comp_yac_epa = dplyr::if_else(.data$complete_pass == 1, .data$yac_epa, 0),
      home_team_comp_air_epa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$comp_air_epa,
        -.data$comp_air_epa
      ),
      away_team_comp_air_epa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$comp_air_epa,
        -.data$comp_air_epa
      ),
      home_team_comp_yac_epa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$comp_yac_epa,
        -.data$comp_yac_epa
      ),
      away_team_comp_yac_epa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$comp_yac_epa,
        -.data$comp_yac_epa
      ),
      home_team_comp_air_epa = dplyr::if_else(
        is.na(.data$home_team_comp_air_epa),
        0,
        .data$home_team_comp_air_epa
      ),
      away_team_comp_air_epa = dplyr::if_else(
        is.na(.data$away_team_comp_air_epa),
        0,
        .data$away_team_comp_air_epa
      ),
      home_team_comp_yac_epa = dplyr::if_else(
        is.na(.data$home_team_comp_yac_epa),
        0,
        .data$home_team_comp_yac_epa
      ),
      away_team_comp_yac_epa = dplyr::if_else(
        is.na(.data$away_team_comp_yac_epa),
        0,
        .data$away_team_comp_yac_epa
      ),
      total_home_comp_air_epa = cumsum(.data$home_team_comp_air_epa),
      total_away_comp_air_epa = cumsum(.data$away_team_comp_air_epa),
      total_home_comp_yac_epa = cumsum(.data$home_team_comp_yac_epa),
      total_away_comp_yac_epa = cumsum(.data$away_team_comp_yac_epa),
      # Same but for raw - not just completions:
      home_team_raw_air_epa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$air_epa,
        -.data$air_epa
      ),
      away_team_raw_air_epa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$air_epa,
        -.data$air_epa
      ),
      home_team_raw_yac_epa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$yac_epa,
        -.data$yac_epa
      ),
      away_team_raw_yac_epa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$yac_epa,
        -.data$yac_epa
      ),
      home_team_raw_air_epa = dplyr::if_else(
        is.na(.data$home_team_raw_air_epa),
        0,
        .data$home_team_raw_air_epa
      ),
      away_team_raw_air_epa = dplyr::if_else(
        is.na(.data$away_team_raw_air_epa),
        0,
        .data$away_team_raw_air_epa
      ),
      home_team_raw_yac_epa = dplyr::if_else(
        is.na(.data$home_team_raw_yac_epa),
        0,
        .data$home_team_raw_yac_epa
      ),
      away_team_raw_yac_epa = dplyr::if_else(
        is.na(.data$away_team_raw_yac_epa),
        0,
        .data$away_team_raw_yac_epa
      ),
      total_home_raw_air_epa = cumsum(.data$home_team_raw_air_epa),
      total_away_raw_air_epa = cumsum(.data$away_team_raw_air_epa),
      total_home_raw_yac_epa = cumsum(.data$home_team_raw_yac_epa),
      total_away_raw_yac_epa = cumsum(.data$away_team_raw_yac_epa)
    ) |>
    dplyr::ungroup()
}


#################################################################
# air and YAC WP:
# as with the rest, heavily borrowed from nflscrapR:
# https://github.com/maksimhorowitz/nflscrapR/blob/master/R/add_ep_wp_variables.R
add_air_yac_wp_variables <- function(pbp_data) {
  #testing
  #pbp_data <- g

  # Store the pre-play timeout counts so they can be swapped on turnovers below:
  pbp_data <- pbp_data |>
    dplyr::mutate(
      posteam_timeouts_pre = .data$posteam_timeouts_remaining,
      defeam_timeouts_pre = .data$defteam_timeouts_remaining
    )

  # Find all pass attempts that are not sacks:
  pass_plays_i <- which(
    !is.na(pbp_data$air_yards) & pbp_data$play_type == 'pass'
  )
  pass_pbp_data <- pbp_data[pass_plays_i, ]

  pass_pbp_data <- pass_pbp_data |>
    dplyr::mutate(
      half_seconds_remaining = .data$half_seconds_remaining - 5.704673,
      game_seconds_remaining = .data$game_seconds_remaining - 5.704673,
      Diff_Time_Ratio = .data$score_differential /
        (exp(-4 * .data$elapsed_share)),
      Turnover_Ind = dplyr::if_else(
        .data$down == 4 & .data$air_yards < .data$ydstogo,
        1,
        0
      ),
      Diff_Time_Ratio = dplyr::if_else(
        .data$Turnover_Ind == 1,
        -1 * .data$Diff_Time_Ratio,
        .data$Diff_Time_Ratio
      ),
      posteam_timeouts_remaining = dplyr::if_else(
        .data$Turnover_Ind == 1,
        .data$defeam_timeouts_pre,
        .data$posteam_timeouts_pre
      ),
      defteam_timeouts_remaining = dplyr::if_else(
        .data$Turnover_Ind == 1,
        .data$posteam_timeouts_pre,
        .data$defeam_timeouts_pre
      )
    )

  # Calculate the airWP:
  pass_pbp_data$airWP <- get_preds_wp(pass_pbp_data)

  # Now for plays marked with Turnover_Ind, use 1 - airWP to flip back to the original
  # team with possession:
  pass_pbp_data$airWP <- ifelse(
    pass_pbp_data$Turnover_Ind == 1,
    1 - pass_pbp_data$airWP,
    pass_pbp_data$airWP
  )

  # For the plays that have half or game seconds remaining 0 or less, set airWP to 0:
  pass_pbp_data$airWP[which(pass_pbp_data$half_seconds_remaining <= 0)] <- 0
  pass_pbp_data$airWP[which(pass_pbp_data$game_seconds_remaining <= 0)] <- 0

  # Calculate the airWPA and yacWPA:
  pass_pbp_data <- dplyr::mutate(
    pass_pbp_data,
    airWPA = .data$airWP - .data$wp,
    yacWPA = .data$wpa - .data$airWPA
  )

  # If the play is a two-point conversion then change the airWPA to NA since
  # no air yards are provided:
  pass_pbp_data$airWPA <- with(
    pass_pbp_data,
    ifelse(two_point_attempt == 1, NA, airWPA)
  )
  pass_pbp_data$yacWPA <- with(
    pass_pbp_data,
    ifelse(two_point_attempt == 1, NA, yacWPA)
  )

  # Check to see if there are any overtime plays; if so, calculate airWP
  # by following essentially the same process as the airEP calculation and using
  # the resulting probabilities for overtime:

  # First check if there are any overtime plays:
  if (any(pass_pbp_data$qtr == 5 | pass_pbp_data$qtr == 6)) {
    # Find the rows that are overtime:
    pass_overtime_i <- which(pass_pbp_data$qtr == 5 | pass_pbp_data$qtr == 6)
    pass_overtime_df <- pass_pbp_data[pass_overtime_i, ]

    # Need to generate the same overtime scenario data as before in the wp function.
    # Find all rows (not only passes) that are overtime:
    overtime_i <- which(pbp_data$qtr == 5 | pbp_data$qtr == 6)

    overtime_df <- pbp_data[overtime_i, ]

    # Separate routine for overtime:

    # Create a column that is just the first drive of overtime repeated:
    overtime_df$First_Drive <- rep(
      min(overtime_df$drive, na.rm = TRUE),
      nrow(overtime_df)
    )

    # Calculate the difference in drive number
    overtime_df <- dplyr::mutate(
      overtime_df,
      Drive_Diff = .data$drive - .data$First_Drive
    )

    # Create an indicator column that means the posteam is losing by 3 and
    # it's the second drive of overtime:
    overtime_df$One_FG_Game <- ifelse(
      overtime_df$score_differential == -3 &
        overtime_df$Drive_Diff == 1,
      1,
      0
    )

    # Now create a copy of the dataset to then make the EP predictions for when
    # a field goal is scored and it's not sudden death:
    overtime_df_ko <- overtime_df

    overtime_df_ko$yardline_100 <- with(
      overtime_df_ko,
      ifelse(
        game_year < 2016 |
          (game_year == 2016 & game_month < 4),
        80,
        75
      )
    )

    # Now first down:
    overtime_df_ko$down1 <- rep(1, nrow(overtime_df_ko))
    overtime_df_ko$down2 <- rep(0, nrow(overtime_df_ko))
    overtime_df_ko$down3 <- rep(0, nrow(overtime_df_ko))
    overtime_df_ko$down4 <- rep(0, nrow(overtime_df_ko))
    # 10 ydstogo:
    overtime_df_ko$ydstogo <- rep(10, nrow(overtime_df_ko))

    # Get the predictions from the EP model and calculate the necessary probability:
    overtime_df_ko_preds <- get_preds(overtime_df_ko)

    overtime_df_ko_preds <- dplyr::mutate(
      overtime_df_ko_preds,
      Win_Back = .data$No_Score +
        .data$Opp_Field_Goal +
        .data$Opp_Safety +
        .data$Opp_Touchdown
    )

    # Calculate the two possible win probability types, Sudden Death and one Field Goal:
    overtime_df$Sudden_Death_WP <- overtime_df$fg_prob +
      overtime_df$td_prob +
      overtime_df$safety_prob
    overtime_df$One_FG_WP <- overtime_df$td_prob +
      (overtime_df$fg_prob * overtime_df_ko_preds$Win_Back)

    # Find all Pass Attempts that are also actual plays in overtime:
    overtime_pass_plays_i <- which(
      overtime_df$play_type == "pass" &
        !is.na(overtime_df$air_yards)
    )

    overtime_pass_df <- overtime_df[overtime_pass_plays_i, ]
    overtime_df_ko_preds_pass <- overtime_df_ko_preds[overtime_pass_plays_i, ]

    # Using air_yards, we need to update the following:
    # - yardline_100
    # - half_seconds_remaining
    # - ydstogo
    # - down

    # First rename the old columns to update for calculating the EP from the air:
    overtime_pass_df <- dplyr::rename(
      overtime_pass_df,
      old_yrdline100 = "yardline_100",
      old_ydstogo = "ydstogo",
      old_TimeSecs_Remaining = "half_seconds_remaining",
      old_down = "down"
    )

    # Create an indicator column for the air yards failing to convert the first down:
    overtime_pass_df$Turnover_Ind <- ifelse(
      overtime_pass_df$old_down == 4 &
        overtime_pass_df$air_yards < overtime_pass_df$old_ydstogo,
      1,
      0
    )
    # Adjust the field position variables:
    overtime_pass_df$yardline_100 <- ifelse(
      overtime_pass_df$Turnover_Ind == 0,
      overtime_pass_df$old_yrdline100 - overtime_pass_df$air_yards,
      100 - (overtime_pass_df$old_yrdline100 - overtime_pass_df$air_yards)
    )

    overtime_pass_df$ydstogo <- ifelse(
      overtime_pass_df$air_yards >= overtime_pass_df$old_ydstogo |
        overtime_pass_df$Turnover_Ind == 1,
      10,
      overtime_pass_df$old_ydstogo - overtime_pass_df$air_yards
    )

    overtime_pass_df$down <- ifelse(
      overtime_pass_df$air_yards >= overtime_pass_df$old_ydstogo |
        overtime_pass_df$Turnover_Ind == 1,
      1,
      as.numeric(overtime_pass_df$old_down) + 1
    )

    # Adjust the time with the average incomplete pass time:
    overtime_pass_df$half_seconds_remaining <- overtime_pass_df$old_TimeSecs_Remaining -
      5.704673

    overtime_pass_df <- overtime_pass_df |>
      dplyr::mutate(
        down1 = dplyr::if_else(.data$down == 1, 1, 0),
        down2 = dplyr::if_else(.data$down == 2, 1, 0),
        down3 = dplyr::if_else(.data$down == 3, 1, 0),
        down4 = dplyr::if_else(.data$down == 4, 1, 0)
      )

    # Get the predictions from the EP model and calculate the necessary probability:
    overtime_pass_data_preds <- get_preds(overtime_pass_df)

    # For the turnover plays flip the scoring probabilities:
    overtime_pass_data_preds <- dplyr::mutate(
      overtime_pass_data_preds,
      old_Opp_Field_Goal = .data$Opp_Field_Goal,
      old_Opp_Safety = .data$Opp_Safety,
      old_Opp_Touchdown = .data$Opp_Touchdown,
      old_Field_Goal = .data$Field_Goal,
      old_Safety = .data$Safety,
      old_Touchdown = .data$Touchdown
    )
    overtime_pass_data_preds$Opp_Field_Goal <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Field_Goal,
      overtime_pass_data_preds$Opp_Field_Goal
    )
    overtime_pass_data_preds$Opp_Safety <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Safety,
      overtime_pass_data_preds$Opp_Safety
    )
    overtime_pass_data_preds$Opp_Touchdown <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Touchdown,
      overtime_pass_data_preds$Opp_Touchdown
    )
    overtime_pass_data_preds$Field_Goal <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Opp_Field_Goal,
      overtime_pass_data_preds$Field_Goal
    )
    overtime_pass_data_preds$Safety <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Opp_Safety,
      overtime_pass_data_preds$Safety
    )
    overtime_pass_data_preds$Touchdown <- ifelse(
      overtime_pass_df$Turnover_Ind == 1,
      overtime_pass_data_preds$old_Opp_Touchdown,
      overtime_pass_data_preds$Touchdown
    )

    # Calculate the two possible win probability types, Sudden Death and one Field Goal:
    pass_overtime_df$Sudden_Death_airWP <- with(
      overtime_pass_data_preds,
      Field_Goal + Touchdown + Safety
    )
    pass_overtime_df$One_FG_airWP <- overtime_pass_data_preds$Touchdown +
      (overtime_pass_data_preds$Field_Goal * overtime_df_ko_preds_pass$Win_Back)

    # Decide which win probability to use:
    pass_overtime_df$airWP <- ifelse(
      overtime_pass_df$game_year >= 2012 &
        (overtime_pass_df$Drive_Diff == 0 |
          (overtime_pass_df$Drive_Diff == 1 &
            overtime_pass_df$One_FG_Game == 1)),
      pass_overtime_df$One_FG_airWP,
      pass_overtime_df$Sudden_Death_airWP
    )

    # For the plays that have half_seconds_remaining 0 or less, set airWP to 0:
    pass_overtime_df$airWP[which(
      overtime_pass_df$half_seconds_remaining <= 0
    )] <- 0

    # Calculate the airWPA and yacWPA:
    pass_overtime_df <- dplyr::mutate(
      pass_overtime_df,
      airWPA = .data$airWP - .data$wp,
      yacWPA = .data$wpa - .data$airWPA
    )

    # If the play is a two-point conversion then change the airWPA to NA since
    # no air yards are provided:
    pass_overtime_df$airWPA <- with(
      pass_overtime_df,
      ifelse(two_point_attempt == 1, NA, airWPA)
    )
    pass_overtime_df$yacWPA <- with(
      pass_overtime_df,
      ifelse(two_point_attempt == 1, NA, yacWPA)
    )

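    # NOTE: this re-subsets pass_pbp_data and thereby discards the
    # overtime-specific airWPA and yacWPA computed above, so the two
    # assignments below leave pass_pbp_data unchanged:
    pass_overtime_df <- pass_pbp_data[pass_overtime_i, ]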

    # Now update the overtime rows in the original pass_pbp_data for airWPA and yacWPA:
    pass_pbp_data$airWPA[pass_overtime_i] <- pass_overtime_df$airWPA
    pass_pbp_data$yacWPA[pass_overtime_i] <- pass_overtime_df$yacWPA
  }

  # if Yards after catch is 0 make yacWPA set to 0:
  pass_pbp_data$yacWPA <- ifelse(
    pass_pbp_data$penalty == 0 &
      pass_pbp_data$yards_after_catch == 0 &
      pass_pbp_data$complete_pass == 1,
    0,
    pass_pbp_data$yacWPA
  )
  # if Yards after catch is 0 make airWPA set to WPA:
  pass_pbp_data$airWPA <- ifelse(
    pass_pbp_data$penalty == 0 &
      pass_pbp_data$yards_after_catch == 0 &
      pass_pbp_data$complete_pass == 1,
    pass_pbp_data$wpa,
    pass_pbp_data$airWPA
  )

  # Now add airWPA and yacWPA to the original dataset:
  pbp_data$airWPA <- NA
  pbp_data$yacWPA <- NA
  pbp_data$airWPA[pass_plays_i] <- pass_pbp_data$airWPA
  pbp_data$yacWPA[pass_plays_i] <- pass_pbp_data$yacWPA

  # Now change the names to be the right style, calculate the completion form
  # of the variables, as well as the cumulative totals and return:
  pbp_data |>
    dplyr::rename(air_wpa = "airWPA", yac_wpa = "yacWPA") |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      comp_air_wpa = dplyr::if_else(.data$complete_pass == 1, .data$air_wpa, 0),
      comp_yac_wpa = dplyr::if_else(.data$complete_pass == 1, .data$yac_wpa, 0),
      home_team_comp_air_wpa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$comp_air_wpa,
        -.data$comp_air_wpa
      ),
      away_team_comp_air_wpa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$comp_air_wpa,
        -.data$comp_air_wpa
      ),
      home_team_comp_yac_wpa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$comp_yac_wpa,
        -.data$comp_yac_wpa
      ),
      away_team_comp_yac_wpa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$comp_yac_wpa,
        -.data$comp_yac_wpa
      ),
      home_team_comp_air_wpa = dplyr::if_else(
        is.na(.data$home_team_comp_air_wpa),
        0,
        .data$home_team_comp_air_wpa
      ),
      away_team_comp_air_wpa = dplyr::if_else(
        is.na(.data$away_team_comp_air_wpa),
        0,
        .data$away_team_comp_air_wpa
      ),
      home_team_comp_yac_wpa = dplyr::if_else(
        is.na(.data$home_team_comp_yac_wpa),
        0,
        .data$home_team_comp_yac_wpa
      ),
      away_team_comp_yac_wpa = dplyr::if_else(
        is.na(.data$away_team_comp_yac_wpa),
        0,
        .data$away_team_comp_yac_wpa
      ),
      total_home_comp_air_wpa = cumsum(.data$home_team_comp_air_wpa),
      total_away_comp_air_wpa = cumsum(.data$away_team_comp_air_wpa),
      total_home_comp_yac_wpa = cumsum(.data$home_team_comp_yac_wpa),
      total_away_comp_yac_wpa = cumsum(.data$away_team_comp_yac_wpa),
      # Same but for raw - not just completions:
      home_team_raw_air_wpa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$air_wpa,
        -.data$air_wpa
      ),
      away_team_raw_air_wpa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$air_wpa,
        -.data$air_wpa
      ),
      home_team_raw_yac_wpa = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$yac_wpa,
        -.data$yac_wpa
      ),
      away_team_raw_yac_wpa = dplyr::if_else(
        .data$posteam == .data$away_team,
        .data$yac_wpa,
        -.data$yac_wpa
      ),
      home_team_raw_air_wpa = dplyr::if_else(
        is.na(.data$home_team_raw_air_wpa),
        0,
        .data$home_team_raw_air_wpa
      ),
      away_team_raw_air_wpa = dplyr::if_else(
        is.na(.data$away_team_raw_air_wpa),
        0,
        .data$away_team_raw_air_wpa
      ),
      home_team_raw_yac_wpa = dplyr::if_else(
        is.na(.data$home_team_raw_yac_wpa),
        0,
        .data$home_team_raw_yac_wpa
      ),
      away_team_raw_yac_wpa = dplyr::if_else(
        is.na(.data$away_team_raw_yac_wpa),
        0,
        .data$away_team_raw_yac_wpa
      ),
      total_home_raw_air_wpa = cumsum(.data$home_team_raw_air_wpa),
      total_away_raw_air_wpa = cumsum(.data$away_team_raw_air_wpa),
      total_home_raw_yac_wpa = cumsum(.data$home_team_raw_yac_wpa),
      total_away_raw_yac_wpa = cumsum(.data$away_team_raw_yac_wpa)
    ) |>
    dplyr::ungroup()
}


================================================
FILE: R/helper_add_fixed_drives.R
================================================
################################################################################
# Author: Sebastian Carl, Ben Baldwin
# Purpose: Function to add drive variables
# Code Style Guide: styler::tidyverse_style()
################################################################################

## fixed_drive =
##  starts at 1, each new drive, numbers shared across both teams
## fixed_drive_result =
##  result of the given drive
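## Illustration (made-up values): the internal new_drive flags
##  1, 0, 0, 1, 0, 1 within one game give fixed_drive = 1, 1, 1, 2, 2, 3
##  (cumulative sum per game_id, so numbering continues across halves)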
add_drive_results <- function(d) {
  drive_df <- d |>
    dplyr::mutate(
      old_posteam = .data$posteam,
      posteam = dplyr::case_when(
        # on kickoffs the kicking team is the defteam but this should be swapped
        # in terms of this function if the kickoff is recovered
        .data$kickoff_attempt == 1 &
          (.data$own_kickoff_recovery == 1 |
            .data$fumble_lost == 1) ~ .data$defteam,
        # if a kickoff has to be replayed due to a penalty and is then recovered,
        # the prior (reversed) kickoff shouldn't be a new drive/series
        stringr::str_detect(.data$desc, kickoff_finder) &
          .data$own_kickoff_recovery == 0 &
          dplyr::lead(.data$own_kickoff_recovery == 1) ~ .data$defteam,
        TRUE ~ .data$posteam
      )
    ) |>
    dplyr::group_by(.data$game_id, .data$game_half) |>
    dplyr::mutate(
      row = 1:dplyr::n(),
      new_drive = dplyr::if_else(
        # change in posteam
        .data$posteam != dplyr::lag(.data$posteam) |
          # change in posteam in t-2 and na posteam in t-1
          (.data$posteam != dplyr::lag(.data$posteam, 2) &
            is.na(dplyr::lag(.data$posteam))) |
          # change in posteam in t-3 and na posteam in t-1 and t-2
          (.data$posteam != dplyr::lag(.data$posteam, 3) &
            is.na(dplyr::lag(.data$posteam, 2)) &
            is.na(dplyr::lag(.data$posteam))),
        1,
        0
      ),
      # PAT after defensive TD is not a new drive
      new_drive = dplyr::if_else(
        dplyr::lag(.data$touchdown == 1) &
          (dplyr::lag(.data$posteam) != dplyr::lag(.data$td_team)) &
          # this last part is needed because otherwise it was overwriting
          # the existing value of new_drive with NA on plays following timeouts
          !is.na(dplyr::lag(.data$posteam)),
        0,
        .data$new_drive
      ),
      # PAT after defensive TD is not a new drive even if a Timeout follows the TD
      new_drive = dplyr::if_else(
        dplyr::lag(stringr::str_detect(
          .data$desc,
          "(Timeout)|(Two-Minute Warning)"
        )) &
          dplyr::lag(.data$touchdown == 1, 2L) &
          (dplyr::lag(.data$posteam, 2L) != dplyr::lag(.data$td_team, 2L)),
        0,
        .data$new_drive,
        missing = .data$new_drive
      ),
      # PAT after defensive TD is not a new drive even if 2 Timeouts follow the TD
      new_drive = dplyr::if_else(
        dplyr::lag(stringr::str_detect(
          .data$desc,
          "(Timeout)|(Two-Minute Warning)"
        )) &
          dplyr::lag(
            stringr::str_detect(.data$desc, "(Timeout)|(Two-Minute Warning)"),
            2L
          ) &
          dplyr::lag(.data$touchdown == 1, 3L) &
          (dplyr::lag(.data$posteam, 3L) != dplyr::lag(.data$td_team, 3L)),
        0,
        .data$new_drive,
        missing = .data$new_drive
      ),
      # if same team has the ball as prior play, but prior play was a punt with lost fumble, it's a new drive
      # or if the prior play was a lost fumble or interception
      new_drive = dplyr::if_else(
        # this line is to prevent it from overwriting already-defined new drives with NA
        # when there's a timeout on prior line bc if_else is obnoxious like that
        (.data$new_drive != 1 | is.na(.data$new_drive)) &
          (
            # same team has ball after lost fumble on punt, fg, pass or rush
            (.data$posteam == dplyr::lag(.data$posteam) &
              dplyr::lag(.data$fumble_lost) == 1 &
              dplyr::lag(.data$play_type) %in%
                c("punt", "pass", "run", "field_goal") &
              # but not if the play resulted in a touchdown because otherwise the
              # following extra point or 2pt conversion will be new drives
              dplyr::lag(.data$touchdown) == 0) |

              # same team has ball after lost fumble on punt, fg, pass or rush 2 plays earlier with prior play missing posteam
              (is.na(dplyr::lag(.data$posteam)) &
                # posteam is same as posteam 2 plays ago
                .data$posteam == dplyr::lag(.data$posteam, 2) &
                # lost fumble 2 plays ago
                dplyr::lag(.data$fumble_lost, 2) == 1 &
                dplyr::lag(.data$play_type, 2) %in%
                  c("punt", "pass", "run", "field_goal") &
                # but not if the lost fumble 2 plays ago resulted in a touchdown because otherwise the
                # following extra point or 2pt conversion will be new drives
                dplyr::lag(.data$touchdown, 2) == 0)
          ),
        1,
        .data$new_drive
      ),
      # first observation of a half is also a new drive
      new_drive = dplyr::if_else(.data$row == 1, 1, .data$new_drive),

      # if you recovered an onside kick or muffed return, it's a new drive
      new_drive = dplyr::case_when(
        .data$play_type == "kickoff" &
          (.data$own_kickoff_recovery == 1 | .data$fumble_lost == 1) ~ 1,
        TRUE ~ .data$new_drive
      ),

      # if it's a kickoff and the prior play was a safety, it's a new drive
      new_drive = dplyr::case_when(
        # safety prior play
        .data$kickoff_attempt == 1 & dplyr::lag(.data$safety) == 1 ~ 1,
        # safety 2 plays ago and timeout on previous play
        .data$kickoff_attempt == 1 &
          dplyr::lag(.data$safety, 2) == 1 &
          (is.na(dplyr::lag(.data$play_type)) |
            dplyr::lag(.data$play_type) == "no_play") ~ 1,
        TRUE ~ .data$new_drive
      ),

      # if new_drive is still missing, make it not a new drive (0)
      new_drive = dplyr::if_else(is.na(.data$new_drive), 0, .data$new_drive)
    ) |>
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      fixed_drive = cumsum(.data$new_drive),
      tmp_result = dplyr::case_when(
        .data$touchdown == 1 & .data$posteam == .data$td_team ~ "Touchdown",
        .data$touchdown == 1 & .data$posteam != .data$td_team ~ "Opp touchdown",
        .data$field_goal_result == "made" ~ "Field goal",
        .data$field_goal_result %in%
          c("blocked", "missed") ~ "Missed field goal",
        .data$safety == 1 ~ "Safety",
        .data$play_type == "punt" | .data$punt_attempt == 1 ~ "Punt",
        .data$interception == 1 | .data$fumble_lost == 1 ~ "Turnover",
        .data$down == 4 &
          .data$yards_gained < .data$ydstogo &
          .data$play_type != "no_play" ~ "Turnover on downs",
        stringr::str_detect(
          .data$desc,
          "(END QUARTER 2)|(END QUARTER 4)|(END GAME)"
        ) ~ "End of half"
      )
    ) |>
    dplyr::group_by(.data$game_id, .data$fixed_drive) |>
    dplyr::mutate(
      fixed_drive_result = dplyr::if_else(
        # if it's end of half, take the first thing we see
        dplyr::last(stats::na.omit(.data$tmp_result)) == "End of half",
        dplyr::first(stats::na.omit(.data$tmp_result)),
        # otherwise take the last
        dplyr::last(stats::na.omit(.data$tmp_result))
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate(posteam = .data$old_posteam) |>
    dplyr::select(-"row", -"new_drive", -"tmp_result", -"old_posteam")

  user_message("added fixed drive variables", "done")
  return(drive_df)
}


================================================
FILE: R/helper_add_game_data.R
================================================
################################################################################
# Author: Ben Baldwin
# Purpose: Function to add Lee Sharpe's game data
# Code Style Guide: styler::tidyverse_style()
################################################################################

# Thanks Lee!
add_game_data <- function(pbp, games = NULL, ...) {
  out <- pbp
  warn <- 0
  tryCatch(
    expr = {
      # `games` can be used to pass an already loaded games file,
      # e.g. a locally stored one for unit tests
      if (is.null(games)) {
        games <- nflreadr::load_schedules()
      } else {
        stopifnot(
          inherits(games, "nflverse_data"),
          isTRUE(attr(games, "nflverse_type") == "games and schedules")
        )
      }

      out <- out |>
        dplyr::left_join(
          games |>
            dplyr::select(
              "game_id",
              "old_game_id",
              "away_score",
              "home_score",
              "location",
              "result",
              "total",
              "spread_line",
              "total_line",
              "div_game",
              "roof",
              "surface",
              "temp",
              "wind",
              "home_coach",
              "away_coach",
              "stadium",
              "stadium_id",
              "gameday"
            ) |>
            dplyr::rename(game_stadium = "stadium"),
          by = c("game_id")
        ) |>
        dplyr::mutate(
          game_date = .data$gameday
        )

      user_message("added game variables", "done")
    },
    error = function(e) {
      message("The following error has occurred:")
      message(e)
    },
    warning = function(w) {
      if (warn == 1) {
        message(
          "Warning: The data hosting servers are down, so we can't add game data at the moment!"
        )
      } else {
        message("The following warning has occurred:")
        message(w)
      }
    },
    finally = {}
  )
  return(out)
}


================================================
FILE: R/helper_add_nflscrapr_mutations.R
================================================
################################################################################
# Author: Sebastian Carl, Ben Baldwin (Code mostly extracted from nflscrapR)
# Purpose: Add variables mostly needed for ep(a) and wp(a) calculation
# Code Style Guide: styler::tidyverse_style()
################################################################################

add_nflscrapr_mutations <- function(pbp) {
  #testing only
  #pbp <- combined

  out <-
    pbp |>
    # row_number() is also safe for zero-row input, unlike 1:n()
    dplyr::mutate(index = dplyr::row_number()) |>
    # remove duplicate plays. can't do this with play_id because duplicate plays
    # sometimes have different play_ids
    dplyr::group_by(
      .data$game_id,
      .data$quarter,
      .data$time,
      .data$play_description,
      .data$down
    ) |>
    dplyr::slice(1) |>
    dplyr::ungroup() |>
    dplyr::mutate(
      # Modify the time column for the quarter end:
      time = dplyr::if_else(
        .data$quarter_end == 1 |
          (.data$play_description == "END GAME" & is.na(.data$time)),
        "00:00",
        .data$time
      ),
      time = dplyr::if_else(
        .data$play_description == 'GAME',
        "15:00",
        .data$time
      ),
      # Create a column with the time in seconds remaining for the quarter:
      quarter_seconds_remaining = time_to_seconds(.data$time),
      play_description = dplyr::case_when(
        stringr::str_detect(
          .data$play_description,
          "(?<=kicks )[:alpha:]{1,}.[:alpha:]{1,}(?= yards)"
        ) ~
          stringr::str_replace(
            .data$play_description,
            "(?<=kicks )[:alpha:]{1,}.[:alpha:]{1,}(?= yards)",
            as.character(.data$kick_distance)
          ),
        TRUE ~ .data$play_description
      )
    ) |>
    #put plays in the right order
    dplyr::group_by(.data$game_id) |>
    # the !is.na(drive), drive part is to make the initial GAME line show up first
    # https://stackoverflow.com/questions/43343590/how-to-sort-putting-nas-first-in-dplyr
    dplyr::arrange(
      .data$order_sequence,
      .data$quarter,
      !is.na(.data$quarter_seconds_remaining),
      -.data$quarter_seconds_remaining,
      !is.na(.data$drive),
      .data$drive,
      .data$index,
      .by_group = TRUE
    ) |>
    dplyr::mutate(
      # Using the various two point indicators, create a column denoting the result
      # outcome for two point conversions:
      two_point_conv_result = dplyr::if_else(
        (.data$two_point_rush_good == 1 |
          .data$two_point_pass_good == 1 |
          .data$two_point_pass_reception_good == 1) &
          .data$two_point_attempt == 1,
        "success",
        NA_character_
      ),
      two_point_conv_result = dplyr::if_else(
        (.data$two_point_rush_failed == 1 |
          .data$two_point_pass_failed == 1 |
          .data$two_point_pass_reception_failed == 1) &
          .data$two_point_attempt == 1,
        "failure",
        .data$two_point_conv_result
      ),
      two_point_conv_result = dplyr::if_else(
        (.data$two_point_rush_safety == 1 |
          .data$two_point_pass_safety == 1) &
          .data$two_point_attempt == 1,
        "safety",
        .data$two_point_conv_result
      ),
      two_point_conv_result = dplyr::if_else(
        .data$two_point_return == 1 &
          .data$two_point_attempt == 1,
        "return",
        .data$two_point_conv_result
      ),
      # If the result was a success, set yards_gained to 2:
      yards_gained = dplyr::if_else(
        !is.na(.data$two_point_conv_result) &
          .data$two_point_conv_result == "success",
        2,
        .data$yards_gained
      ),
      # Fix yards_gained for plays with laterals
      yards_gained = dplyr::case_when(
        !is.na(.data$passing_yards) &
          .data$yards_gained != .data$passing_yards &
          .data$penalty == 0 ~ .data$passing_yards,
        !is.na(.data$rushing_yards) &
          !is.na(.data$lateral_rushing_yards) &
          .data$yards_gained != .data$rushing_yards &
          .data$penalty == 0 ~ .data$rushing_yards +
          .data$lateral_rushing_yards,
        TRUE ~ .data$yards_gained
      ),
      # Extract the penalty type:
      penalty_type = dplyr::if_else(
        .data$penalty == 1,
        .data$play_description |>
          stringr::str_extract(
            "(?<=PENALTY on .{1,50}, ).{1,50}(?=, [0-9]{1,2} yard)"
          ) |>
          # Face Mask penalties include the yardage as string (either 5 Yards or 15 Yards)
          # We remove the 15 Yards part and just keep the additional info if it's a
          # 5 yard Face Mask penalty
          stringr::str_remove("\\([0-9]{2}+ Yards\\)") |>
          stringr::str_squish(),
        NA_character_
      ),
      # The new "dynamic Kickoff" in the 2024 season introduces new penalty types
      penalty_type = dplyr::if_else(
        .data$penalty == 1 &
          stringr::str_detect(
            tolower(.data$play_description),
            "kickoff short of landing zone"
          ),
        "Kickoff Short of Landing Zone",
        .data$penalty_type
      ),
      penalty_type = dplyr::if_else(
        .data$penalty == 1 &
          stringr::str_detect(
            tolower(.data$play_description),
            "kickoff out of bounds"
          ),
        "Kickoff Out of Bounds",
        .data$penalty_type
      ),
      # Set down to NA for plays marked with down == 0:
      down = dplyr::if_else(
        .data$down == 0,
        NA_real_,
        .data$down
      ),
      # Using the field goal indicators make a column with the field goal result:
      field_goal_result = dplyr::if_else(
        .data$field_goal_attempt == 1 &
          .data$field_goal_made == 1,
        "made",
        NA_character_
      ),
      field_goal_result = dplyr::if_else(
        .data$field_goal_attempt == 1 &
          .data$field_goal_missed == 1,
        "missed",
        .data$field_goal_result
      ),
      field_goal_result = dplyr::if_else(
        .data$field_goal_attempt == 1 &
          .data$field_goal_blocked == 1,
        "blocked",
        .data$field_goal_result
      ),

      # Using the indicators make a column with the extra point result:
      extra_point_result = dplyr::if_else(
        .data$extra_point_attempt == 1 &
          .data$extra_point_good == 1,
        "good",
        NA_character_
      ),
      extra_point_result = dplyr::if_else(
        .data$extra_point_attempt == 1 &
          .data$extra_point_failed == 1,
        "failed",
        .data$extra_point_result
      ),
      extra_point_result = dplyr::if_else(
        .data$extra_point_attempt == 1 &
          .data$extra_point_blocked == 1,
        "blocked",
        .data$extra_point_result
      ),
      extra_point_result = dplyr::if_else(
        .data$extra_point_attempt == 1 &
          .data$extra_point_safety == 1,
        "safety",
        .data$extra_point_result
      ),
      extra_point_result = dplyr::if_else(
        .data$extra_point_attempt == 1 &
          .data$extra_point_aborted == 1,
        "aborted",
        .data$extra_point_result
      ),

      # find kickoffs with penalty: a play where the next play is a kickoff
      # and the prior play wasn't a safety or PAT
      lead_ko = dplyr::case_when(
        dplyr::lead(.data$kickoff_attempt) == 1 &
          .data$game_id == dplyr::lead(.data$game_id) &
          !stringr::str_detect(
            tolower(.data$play_description),
            "(injured sf )|(tonight's attendance )|(injury update )|(end quarter)|(timeout)|( captains:)|( captains )|( captians:)|( humidity:)|(note - )|( deferred)|(game start )|( game has been suspended)"
          ) &
          !stringr::str_detect(.data$play_description, "GAME ") &
          !.data$play_description %in%
            c("GAME", "Two-Minute Warning", "The game has resumed.") &
          is.na(.data$two_point_conv_result) &
          is.na(.data$extra_point_result) &
          is.na(.data$field_goal_result) &
          (.data$safety == 0 | is.na(.data$safety)) &
          # because the data is too messy in earlier seasons
          .data$season > 2000 ~ 1,
        TRUE ~ 0
      ),

      # we overwrite kickoff_attempt for kickoffs with penalties because
      # those mess with ep/epa/wp/wpa. Since this is inconsistent compared to
      # all other *_attempt variables, we will restore kickoff_attempt after
      # models are applied. That's done with a temporary copy of kickoff_attempt.
      # See #556, #202, #199 for example
      copy_of_kickoff_attempt = .data$kickoff_attempt,
      kickoff_attempt = dplyr::if_else(
        .data$lead_ko == 1,
        1,
        .data$kickoff_attempt
      ),

      # https://github.com/nflverse/nflfastR/issues/199#issuecomment-792321171
      kickoff_attempt = dplyr::if_else(
        .data$game_id == "2014_02_ATL_CIN" & .data$play_id == 3498,
        1,
        .data$kickoff_attempt
      ),

      # Make the possession team for kickoffs be the return team, since that is
      # more intuitive from the EPA / WPA point of view:
      posteam = dplyr::case_when(
        # kickoff_finder is defined below
        (.data$lead_ko == 1 |
          .data$kickoff_attempt == 1 |
          stringr::str_detect(.data$play_description, kickoff_finder)) &
          .data$posteam == .data$home_team ~ .data$away_team,
        (.data$lead_ko == 1 |
          .data$kickoff_attempt == 1 |
          stringr::str_detect(.data$play_description, kickoff_finder)) &
          .data$posteam == .data$away_team ~ .data$home_team,
        TRUE ~ .data$posteam
      ),

      # Fill in the rows with missing posteam with the lead:
      posteam = dplyr::if_else(
        (.data$quarter_end == 1 | .data$posteam == ""),
        dplyr::lead(.data$posteam),
        .data$posteam
      ),
      posteam_id = dplyr::if_else(
        (.data$quarter_end == 1 | .data$posteam_id == ""),
        dplyr::lead(.data$posteam_id),
        .data$posteam_id
      ),

      # remove posteam from END Q2 plays or END Q4 plays (when the game goes to OT)
      # because it doesn't make sense and breaks fixed_drive and fixed_drive_result
      posteam = dplyr::if_else(
        stringr::str_detect(
          .data$play_description,
          "(END QUARTER 2)|(END QUARTER 4)"
        ),
        NA_character_,
        .data$posteam
      ),

      # Denote whether the home or away team has possession:
      posteam_type = dplyr::if_else(
        .data$posteam == .data$home_team,
        "home",
        "away"
      ),

      # manual posteam adjustments for rare plays with issues related to game
      # delays.
      posteam = dplyr::case_when(
        # 2025_01_CAR_JAX, 1317: Game resumed after weather delay
        # AND it was delayed right after a PAT.
        # Prior two plays were delay info that shouldn't have posteam in order
        # to get correct fixed drive results #529
        # https://github.com/nflverse/nflfastR/issues/529
        .data$game_id == "2025_01_CAR_JAX" &
          .data$play_id %in% c(1282, 1303) ~ NA_character_,
        TRUE ~ .data$posteam
      ),

      # Column denoting which team is on defense:
      defteam = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$away_team,
        .data$home_team
      ),

      yardline = dplyr::if_else(
        stringr::str_detect(.data$yardline, "50"),
        "MID 50",
        .data$yardline
      ),
      yardline = dplyr::if_else(
        nchar(.data$yardline) == 0 |
          is.null(.data$yardline) |
          .data$yardline == "NULL" |
          is.na(.data$yardline),
        dplyr::lead(.data$yardline),
        .data$yardline
      ),
      yardline_number = dplyr::if_else(
        .data$yardline == "MID 50",
        50,
        .data$yardline_number
      ),
      yardline_100 = dplyr::if_else(
        .data$yardline_side == .data$posteam | .data$yardline == "MID 50",
        100 - .data$yardline_number,
        .data$yardline_number
      ),
      # Set the kick_distance for extra points by adding 18 to the yardline_100:
      kick_distance = dplyr::if_else(
        .data$extra_point_attempt == 1,
        .data$yardline_100 + 18,
        .data$kick_distance
      ),
      # Create a column with the time in seconds remaining for each half:
      half_seconds_remaining = dplyr::if_else(
        .data$quarter %in% c(1, 3),
        .data$quarter_seconds_remaining + 900,
        .data$quarter_seconds_remaining
      ),
      # Create a column with the time in seconds remaining for the game:
      game_seconds_remaining = dplyr::if_else(
        .data$quarter %in% c(1, 2, 3, 4),
        .data$quarter_seconds_remaining +
          (900 * (4 - as.numeric(.data$quarter))),
        .data$quarter_seconds_remaining
      ),
      # Add column for replay or challenge:
      replay_or_challenge = stringr::str_detect(
        .data$play_description,
        "(Replay Official reviewed)|( challenge(d)? )|(Challenged)"
      ) |>
        as.numeric(),
      # Result of replay or challenge:
      replay_or_challenge_result = dplyr::if_else(
        .data$replay_or_challenge == 1,
        dplyr::if_else(
          stringr::str_detect(
            tolower(.data$play_description),
            "( upheld)|( reversed)|( confirmed)"
          ),
          stringr::str_extract(
            tolower(.data$play_description),
            "( upheld)|( reversed)|( confirmed)"
          ) |>
            stringr::str_trim(),
          "denied"
        ),
        NA_character_
      ),

      # Create the column denoting the categorical description of the pass length:
      pass_length = dplyr::if_else(
        .data$two_point_attempt == 0 &
          .data$sack == 0 &
          .data$pass_attempt == 1,
        .data$play_description |>
          stringr::str_extract("pass (incomplete )?(short|deep)") |>
          stringr::str_extract("short|deep"),
        NA_character_
      ),
      # Create the column denoting the categorical location of the pass:
      pass_location = dplyr::if_else(
        .data$two_point_attempt == 0 &
          .data$sack == 0 &
          .data$pass_attempt == 1,
        .data$play_description |>
          stringr::str_extract("(short|deep) (left|middle|right)") |>
          stringr::str_extract("left|middle|right"),
        NA_character_
      ),
      # Indicator columns for QB kneels, spikes, scrambles,
      # no huddle, and shotgun plays:
      qb_kneel = dplyr::if_else(
        stringr::str_detect(.data$play_description, " kneels ") &
          .data$kickoff_attempt != 1,
        1,
        0
      ),
      qb_spike = stringr::str_detect(.data$play_description, " spiked ") |>
        as.numeric(),
      qb_scramble = stringr::str_detect(
        .data$play_description,
        " scrambles "
      ) |>
        as.numeric(),
      shotgun = stringr::str_detect(.data$play_description, "Shotgun") |>
        as.numeric(),
      no_huddle = stringr::str_detect(.data$play_description, "No Huddle") |>
        as.numeric(),

      # Create a play type column: either pass, run, field_goal, extra_point,
      # kickoff, punt, qb_kneel, qb_spike, or no_play (which includes timeouts and
      # penalties):
      play_type = translate_play_type_nfl(
        .data$play_type_nfl,
        qb_spike = .data$qb_spike,
        qb_kneel = .data$qb_kneel,
        pass_attempt = .data$pass_attempt,
        rush_attempt = .data$rush_attempt,
        punt_attempt = .data$punt_attempt,
        field_goal_attempt = .data$field_goal_attempt,
        penalty = .data$penalty,
        is_penalty_enforced_between_downs = stringr::str_detect(
          tolower(.data$play_description),
          "enforced between downs"
        )
      ),

      # Indicator for QB dropbacks (exclude spikes and kneels):
      qb_dropback = dplyr::if_else(
        .data$play_type == "pass" |
          (.data$play_type == "run" &
            .data$qb_scramble == 1),
        1,
        0
      ),
      # Columns denoting the run location and gap:
      run_location = dplyr::if_else(
        .data$two_point_attempt == 0 &
          .data$rush_attempt == 1,
        .data$play_description |>
          stringr::str_extract(" (left|middle|right) ") |>
          stringr::str_trim(),
        NA_character_
      ),
      run_gap = dplyr::if_else(
        .data$two_point_attempt == 0 &
          .data$rush_attempt == 1,
        .data$play_description |>
          stringr::str_extract(" (guard|tackle|end) ") |>
          stringr::str_trim(),
        NA_character_
      ),
      game_half = dplyr::case_when(
        .data$quarter %in% c(1, 2) ~ "Half1",
        .data$quarter %in% c(3, 4) ~ "Half2",
        .data$quarter >= 5 ~ "Overtime",
        TRUE ~ NA_character_
      ),
      # Create columns to denote the timeouts remaining for each team, making
      # columns for both home/away and pos/def (this will involve creating
      # temporary columns that will not be included):
      # Initialize both home and away to have 3 timeouts for each
      # half except overtime where they have 2:

      # extract timeouts from failed challenges when it's not otherwise there
      tmp_timeout = stringr::str_extract(
        .data$play_description,
        "(?<=by\\s)[:upper:]{2,3}(?=\\s)"
      ),
      timeout_team = dplyr::if_else(
        .data$replay_or_challenge == 1 &
          .data$timeout == 1 &
          is.na(.data$timeout_team),
        .data$tmp_timeout,
        .data$timeout_team
      ),

      home_timeouts_remaining = dplyr::if_else(
        .data$quarter %in% c(1, 2, 3, 4),
        3,
        2
      ),
      away_timeouts_remaining = dplyr::if_else(
        .data$quarter %in% c(1, 2, 3, 4),
        3,
        2
      ),
      home_timeout_used = dplyr::if_else(
        .data$timeout == 1 &
          .data$timeout_team == .data$home_team,
        1,
        0
      ),
      away_timeout_used = dplyr::if_else(
        .data$timeout == 1 &
          .data$timeout_team == .data$away_team,
        1,
        0
      ),
      home_timeout_used = dplyr::if_else(
        is.na(.data$home_timeout_used),
        0,
        .data$home_timeout_used
      ),
      away_timeout_used = dplyr::if_else(
        is.na(.data$away_timeout_used),
        0,
        .data$away_timeout_used
      )
    ) |>
    # replace empty strings in yard line variables
    dplyr::mutate_at(
      .vars = c("yardline", "drive_start_yard_line", "drive_end_yard_line"),
      .funs = ~ dplyr::na_if(.x, "")
    ) |>
    # fix cases where a yardline variable misses the blank space between team name
    # and yard number. At the point of adding this, the only spot where this happened
    # was in the variable drive_start_yard_line in the games
    # "2000_01_CAR_WAS", "2000_02_NE_NYJ", and "2000_03_ATL_CAR"
    dplyr::mutate_at(
      .vars = c("yardline", "drive_start_yard_line", "drive_end_yard_line"),
      .funs = ~ dplyr::case_when(
        stringr::str_detect(.x, "[:upper:]{2,3}(?=[:digit:]{1,2})") ~
          stringr::str_c(
            stringr::str_extract(.x, "[:upper:]{2,3}"),
            stringr::str_extract(.x, "[:digit:]{1,2}"),
            sep = " "
          ),
        TRUE ~ .x
      )
    ) |>
    # Group by the game_half to then create cumulative timeouts used for both
    # the home and away teams:
    dplyr::group_by(.data$game_id, .data$game_half) |>
    dplyr::mutate(
      total_home_timeouts_used = dplyr::if_else(
        cumsum(.data$home_timeout_used) > 3,
        3,
        cumsum(.data$home_timeout_used)
      ),
      total_away_timeouts_used = dplyr::if_else(
        cumsum(.data$away_timeout_used) > 3,
        3,
        cumsum(.data$away_timeout_used)
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::group_by(.data$game_id) |>
    # Now just take the difference between the timeouts remaining
    # columns and the total timeouts used, and create the columns for both
    # the pos and def team timeouts remaining:
    dplyr::mutate(
      home_timeouts_remaining = .data$home_timeouts_remaining -
        .data$total_home_timeouts_used,
      away_timeouts_remaining = .data$away_timeouts_remaining -
        .data$total_away_timeouts_used,
      posteam_timeouts_remaining = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$home_timeouts_remaining,
        .data$away_timeouts_remaining
      ),
      defteam_timeouts_remaining = dplyr::if_else(
        .data$defteam == .data$home_team,
        .data$home_timeouts_remaining,
        .data$away_timeouts_remaining
      ),
      # Same type of logic to calculate the score for each team and the score
      # differential in the game. First create columns to track how many points
      # were scored on a particular play based on various scoring indicators for
      # both the home and away teams:
      home_points_scored = dplyr::if_else(
        .data$touchdown == 1 &
          .data$td_team == .data$home_team,
        6,
        0
      ),
      home_points_scored = dplyr::if_else(
        .data$posteam == .data$home_team &
          .data$field_goal_made == 1,
        3,
        .data$home_points_scored
      ),
      home_points_scored = dplyr::if_else(
        .data$posteam == .data$home_team &
          (.data$extra_point_good == 1 |
            .data$extra_point_safety == 1 |
            .data$two_point_rush_safety == 1 |
            .data$two_point_pass_safety == 1),
        1,
        .data$home_points_scored
      ),
      home_points_scored = dplyr::if_else(
        .data$posteam == .data$home_team &
          (.data$two_point_rush_good == 1 |
            .data$two_point_pass_good == 1 |
            .data$two_point_pass_reception_good == 1),
        2,
        .data$home_points_scored
      ),
      home_points_scored = dplyr::if_else(
        .data$defteam == .data$home_team &
          (.data$two_point_return == 1 | .data$defensive_two_point_conv == 1),
        2,
        .data$home_points_scored
      ),
      home_points_scored = dplyr::if_else(
        .data$safety_team == .data$home_team & .data$safety == 1,
        2,
        .data$home_points_scored
      ),
      away_points_scored = dplyr::if_else(
        .data$touchdown == 1 &
          .data$td_team == .data$away_team,
        6,
        0
      ),
      away_points_scored = dplyr::if_else(
        .data$posteam == .data$away_team &
          .data$field_goal_made == 1,
        3,
        .data$away_points_scored
      ),
      away_points_scored = dplyr::if_else(
        .data$posteam == .data$away_team &
          (.data$extra_point_good == 1 |
            .data$extra_point_safety == 1 |
            .data$two_point_rush_safety == 1 |
            .data$two_point_pass_safety == 1),
        1,
        .data$away_points_scored
      ),
      away_points_scored = dplyr::if_else(
        .data$posteam == .data$away_team &
          (.data$two_point_rush_good == 1 |
            .data$two_point_pass_good == 1 |
            .data$two_point_pass_reception_good == 1),
        2,
        .data$away_points_scored
      ),
      away_points_scored = dplyr::if_else(
        .data$defteam == .data$away_team &
          (.data$two_point_return == 1 | .data$defensive_two_point_conv == 1),
        2,
        .data$away_points_scored
      ),
      away_points_scored = dplyr::if_else(
        .data$safety_team == .data$away_team & .data$safety == 1,
        2,
        .data$away_points_scored
      ),
      home_points_scored = dplyr::if_else(
        is.na(.data$home_points_scored),
        0,
        .data$home_points_scored
      ),
      away_points_scored = dplyr::if_else(
        is.na(.data$away_points_scored),
        0,
        .data$away_points_scored
      ),
      # Now create cumulative totals:
      total_home_score = cumsum(.data$home_points_scored),
      total_away_score = cumsum(.data$away_points_scored),
      posteam_score = dplyr::if_else(
        .data$posteam == .data$home_team,
        dplyr::lag(.data$total_home_score),
        dplyr::lag(.data$total_away_score)
      ),
      defteam_score = dplyr::if_else(
        .data$defteam == .data$home_team,
        dplyr::lag(.data$total_home_score),
        dplyr::lag(.data$total_away_score)
      ),
      score_differential = .data$posteam_score - .data$defteam_score,
      abs_score_differential = abs(.data$score_differential),
      # Make post score differential columns to be used for the final
      # game indicators in the win probability calculations:
      posteam_score_post = dplyr::if_else(
        .data$posteam == .data$home_team,
        .data$total_home_score,
        .data$total_away_score
      ),
      defteam_score_post = dplyr::if_else(
        .data$defteam == .data$home_team,
        .data$total_home_score,
        .data$total_away_score
      ),
      score_differential_post = .data$posteam_score_post -
        .data$defteam_score_post,
      abs_score_differential_post = abs(
        .data$posteam_score_post - .data$defteam_score_post
      ),
      # Create a variable for whether or not a touchback occurred, this
      # will apply to any type of play:
      touchback = as.numeric(stringr::str_detect(
        tolower(.data$play_description),
        "touchback"
      )),
      # There are a few plays with air_yards prior to 2006 (most likely accidentally).
      # To not crash the air_yac ep and wp calculation they are set to NA
      air_yards = dplyr::if_else(.data$season < 2006, NA_real_, .data$air_yards)
    ) |>
    dplyr::rename(
      ydstogo = "yards_to_go",
      desc = "play_description",
      yrdln = "yardline",
      side_of_field = "yardline_side",
      qtr = "quarter"
    ) |>
    dplyr::filter(
      !is.na(.data$desc),
      .data$desc != "",
      !is.na(.data$qtr)
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate(
      game_id = as.character(.data$game_id),
      # kick distance is NA on kickoffs and punts that result in touchbacks
      # (unless the kick/punt was caught between the end zones)
      # we use yardline_100 to add it in those cases
      is_relevant_touchback = as.numeric(
        is.na(.data$kick_distance) &
          .data$touchback == 1 &
          .data$play_type %in% c("punt", "kickoff")
      ),
      kick_distance = dplyr::case_when(
        .data$is_relevant_touchback == 1 &
          .data$kickoff_attempt == 0 ~ .data$yardline_100,
        # gotta reverse yardline_100 on kickoffs
        .data$is_relevant_touchback == 1 & .data$kickoff_attempt == 1 ~ 100 -
          .data$yardline_100,
        TRUE ~ .data$kick_distance
      ),
      # drop helper variable
      is_relevant_touchback = NULL
    ) |>
    fix_scrambles() |>
    make_model_mutations()

  user_message("added nflscrapR variables", "done")
  return(out)
}

# to help find kickoffs on plays with penalties
# otherwise win prob breaks down the road
kickoff_finder <- "(Offside on Free Kick)|(Delay of Kickoff)|(Onside Kick formation)|(kicks onside)|( kicks [:digit:]+ yards from)"
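
# Illustrative sketch (not part of the package logic): a quick way to see
# what `kickoff_finder` is meant to catch. The description strings below are
# made up for demonstration and are not taken from real play-by-play data.
# Assuming stringr's default regex engine, the first two strings should be
# detected and the third should not.
#
#   stringr::str_detect(
#     c(
#       "J.Doe kicks 65 yards from KC 35 to end zone, Touchback.",
#       "J.Doe kicks onside 10 yards from KC 35 to KC 45.",
#       "J.Doe pass short left to A.Roe for 5 yards."
#     ),
#     kickoff_finder
#   )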


##some steps to prepare the data for the EP/WP/CP/FG models
make_model_mutations <- function(pbp) {
  pbp <- pbp |>
    dplyr::mutate(
      #for EP, CP, and WP model, xgb needs 0/1 for eras
      era0 = dplyr::if_else(.data$season <= 2001, 1, 0),
      era1 = dplyr::if_else(.data$season > 2001 & .data$season <= 2005, 1, 0),
      era2 = dplyr::if_else(.data$season > 2005 & .data$season <= 2013, 1, 0),
      era3 = dplyr::if_else(.data$season > 2013 & .data$season <= 2017, 1, 0),
      era4 = dplyr::if_else(.data$season > 2017, 1, 0),
      #for fg model, an era factor
      era = dplyr::case_when(
        .data$era0 == 1 ~ 0,
        .data$era1 == 1 ~ 1,
        .data$era2 == 1 ~ 2,
        .data$era3 == 1 | .data$era4 == 1 ~ 3
      ),
      era = as.factor(.data$era),
      down1 = dplyr::if_else(.data$down == 1, 1, 0),
      down2 = dplyr::if_else(.data$down == 2, 1, 0),
      down3 = dplyr::if_else(.data$down == 3, 1, 0),
      down4 = dplyr::if_else(.data$down == 4, 1, 0),
      home = dplyr::if_else(.data$posteam == .data$home_team, 1, 0),
      model_roof = dplyr::if_else(
        is.na(.data$roof) | .data$roof == 'open' | .data$roof == 'closed',
        as.character('retractable'),
        as.character(.data$roof)
      ),
      model_roof = as.factor(.data$model_roof),
      retractable = dplyr::if_else(.data$model_roof == 'retractable', 1, 0),
      dome = dplyr::if_else(.data$model_roof == 'dome', 1, 0),
      outdoors = dplyr::if_else(.data$model_roof == 'outdoors', 1, 0)
    )

  return(pbp)
}
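
# Illustrative sketch (toy input, not real data): the era dummies are
# mutually exclusive, the fg-model factor `era` merges era3 and era4, and
# `model_roof` folds "open", "closed", and missing roofs into "retractable".
# The data frame `toy` is hypothetical and only carries the columns this
# function reads.
#
#   toy <- data.frame(
#     season = c(2000, 2004, 2010, 2015, 2020),
#     down = c(1, 2, 3, 4, NA),
#     posteam = "KC",
#     home_team = "KC",
#     roof = c("dome", "outdoors", "open", "closed", NA)
#   )
#   out <- make_model_mutations(toy)
#   out$era         # levels 0, 1, 2, 3; seasons 2015 and 2020 share level 3
#   out$model_roof  # "open", "closed", and NA all become "retractable"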


fix_scrambles <- function(pbp) {
  # skip below code if <= 2005 is not in the data
  if (min(pbp$season) > 2005) {
    return(pbp)
  }

  pbp |>
    dplyr::mutate(
      scramble_id = paste0(.data$game_id, "_", .data$play_id),
      qb_scramble = dplyr::if_else(
        .data$scramble_id %in% scramble_fix,
        1,
        .data$qb_scramble
      )
    ) |>
    dplyr::select(-"scramble_id")

  # Some notes on the scramble_fix:
  # This marks scrambles in the 1999 - 2005 seasons using charting data
  # because the NFL did not put "scramble" in play descriptions during these seasons
  # Data from Aaron Schatz!
}

translate_play_type_nfl <- function(
  play_type_nfl,
  qb_spike,
  qb_kneel,
  pass_attempt,
  rush_attempt,
  punt_attempt,
  field_goal_attempt,
  penalty,
  is_penalty_enforced_between_downs
) {
  # I want the arg name to be descriptive, but I want a short variable name
  # for the code below
  x <- play_type_nfl

  out <- dplyr::case_when(
    x == "COMMENT" ~ NA_character_,
    x == "END_GAME" ~ NA_character_,
    x == "END_QUARTER" ~ NA_character_,
    x == "FIELD_GOAL" ~ "field_goal",
    x == "FREE_KICK" ~ "kickoff",
    x == "GAME_START" ~ NA_character_,
    x == "INTERCEPTION" ~ "pass",
    x == "KICK_OFF" ~ "kickoff",
    x == "PASS" ~ "pass",
    x == "PAT2" & pass_attempt == 1 ~ "pass",
    x == "PAT2" & rush_attempt == 1 ~ "run",
    x == "PENALTY" &
      pass_attempt == 1 &
      is_penalty_enforced_between_downs ~ "pass",
    x == "PENALTY" &
      rush_attempt == 1 &
      is_penalty_enforced_between_downs ~ "run",
    x == "PENALTY" ~ "no_play",
    x == "PUNT" ~ "punt",
    x == "RUSH" ~ "run",
    x == "SACK" ~ "pass",
    x == "TIMEOUT" ~ "no_play",
    x == "XP_KICK" ~ "extra_point",

    # UNSPECIFIED is a mix of all sorts of weird plays
    x == "UNSPECIFIED" & penalty == 1 ~ "no_play",

    # the following lines imply penalty == 0 because penalty == 1 triggers above
    x == "UNSPECIFIED" & pass_attempt == 1 ~ "pass",
    x == "UNSPECIFIED" & rush_attempt == 1 ~ "run",
    x == "UNSPECIFIED" & punt_attempt == 1 ~ "punt",
    x == "UNSPECIFIED" & field_goal_attempt == 1 ~ "field_goal",

    # most of the remaining UNSPECIFIED plays will be declined penalties
    # from punt or fg formation. These don't really count as plays so we define
    # them as no_play
    x == "UNSPECIFIED" ~ "no_play",

    # default
    TRUE ~ ""
  )

  # every play_type_nfl that we do not catch in the above cases
  # will be an empty string. We try to resolve these as well as we can.
  # We also need to replace passes and runs that were spikes and kneel downs
  dplyr::case_when(
    out == "" & penalty == 1 ~ "no_play",
    out == "" & pass_attempt == 1 ~ "pass",
    out == "" & rush_attempt == 1 ~ "run",
    out == "" & punt_attempt == 1 ~ "punt",
    out == "" & field_goal_attempt == 1 ~ "field_goal",
    qb_spike == 1 & out %in% c("pass", "run") ~ "qb_spike",
    qb_kneel == 1 & out %in% c("pass", "run") ~ "qb_kneel",
    TRUE ~ out
  )
}

# we overwrite kickoff_attempt for kickoffs with penalties because
# those mess with ep/epa/wp/wpa. Since this is inconsistent compared to
# all other *_attempt variables, we will restore kickoff_attempt after
# models are applied. That's done with a temporary copy of kickoff_attempt.
# See #556, #202, #199 for example
restore_kickoff_attempt <- function(pbp) {
  pbp |>
    dplyr::mutate(
      kickoff_attempt = .data$copy_of_kickoff_attempt,
      copy_of_kickoff_attempt = NULL
    )
}


================================================
FILE: R/helper_add_series_data.R
================================================
################################################################################
# Author: Sebastian Carl, Ben Baldwin
# Purpose: Function to add series variables analogous to Lee Sharpe's version
# Code Style Guide: styler::tidyverse_style()
################################################################################

## series =
##  starts at 1, each new first down increments, numbers shared across both teams
##  NA: kickoffs, extra point/two point conversion attempts, non-plays, no posteam
## series_success =
##  1: scored touchdown, gained enough yards for first down
##  0: everything else
add_series_data <- function(pbp) {
  out <-
    pbp |>
    dplyr::mutate(
      old_posteam = .data$posteam,
      posteam = dplyr::case_when(
        # on kickoffs the kicking team is the defteam but this should be swapped
        # in terms of this function if the kickoff is recovered
        .data$kickoff_attempt == 1 &
          (.data$own_kickoff_recovery == 1 |
            .data$fumble_lost == 1) ~ .data$defteam,
        # if a kickoff has to be replayed due to a penalty and is then recovered,
        # the prior (reversed) kickoff shouldn't be a new drive/series
        stringr::str_detect(.data$desc, kickoff_finder) &
          .data$own_kickoff_recovery == 0 &
          dplyr::lead(.data$own_kickoff_recovery == 1) ~ .data$defteam,
        TRUE ~ .data$posteam
      )
    ) |>
    dplyr::group_by(.data$game_id, .data$game_half) |>
    dplyr::mutate(
      row = 1:dplyr::n(),
      new_series = dplyr::if_else(
        # a new drive
        .data$fixed_drive != dplyr::lag(.data$fixed_drive) |
          # or a first down on the prior play except touchdown plays
          ((dplyr::lag(.data$first_down_rush) == 1 |
            dplyr::lag(.data$first_down_pass) == 1 |
            dplyr::lag(.data$first_down_penalty) == 1) &
            dplyr::lag(.data$touchdown) == 0) |
          # or the first play
          .data$row == 1,
        1,
        0
      ),
      new_series = dplyr::if_else(is.na(.data$new_series), 0, .data$new_series)
    ) |>
    # now compute series number with cumsum (for the calculation NA are being replaced with 0)
    dplyr::group_by(.data$game_id) |>
    dplyr::mutate(
      series = cumsum(.data$new_series),
      tmp_result = dplyr::case_when(
        (.data$first_down_penalty == 1 |
          .data$first_down_rush == 1 |
          .data$first_down_pass == 1) &
          .data$touchdown == 0 ~ "First down",
        .data$touchdown == 1 & .data$posteam == .data$td_team ~ "Touchdown",
        .data$touchdown == 1 & .data$posteam != .data$td_team ~ "Opp touchdown",
        .data$field_goal_result == "made" ~ "Field goal",
        .data$field_goal_result %in%
          c("blocked", "missed") ~ "Missed field goal",
        .data$safety == 1 ~ "Safety",
        .data$play_type == "punt" | .data$punt_attempt == 1 ~ "Punt",
        .data$interception == 1 | .data$fumble_lost == 1 ~ "Turnover",
        .data$down == 4 &
          .data$yards_gained < .data$ydstogo &
          .data$play_type != "no_play" ~ "Turnover on downs",
        .data$qb_kneel == 1 ~ "QB kneel",
        stringr::str_detect(
          .data$desc,
          "(END QUARTER 2)|(END QUARTER 4)|(END GAME)"
        ) ~ "End of half"
      )
    ) |>
    dplyr::group_by(.data$game_id, .data$series) |>
    dplyr::mutate(
      series_result = dplyr::if_else(
        # if it's end of half, take the first thing we see
        dplyr::last(stats::na.omit(.data$tmp_result)) == "End of half",
        dplyr::first(stats::na.omit(.data$tmp_result)),
        # otherwise take the last
        dplyr::last(stats::na.omit(.data$tmp_result))
      ),
      series_success = dplyr::if_else(
        .data$series_result %in% c("Touchdown", "First down"),
        1,
        0
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::mutate(posteam = .data$old_posteam) |>
    dplyr::select(-"row", -"tmp_result", -"new_series", -"old_posteam")

  user_message("added series variables", "done")
  return(out)
}


================================================
FILE: R/helper_add_xpass.R
================================================
################################################################################
# Author: Ben Baldwin
# Styleguide: styler::tidyverse_style()
################################################################################

#' Add expected pass columns
#'
#' @inheritParams clean_pbp
#' @description Build columns from the expected dropback model. Will return
#' `NA` on data prior to 2006 since that was before NFL started marking scrambles.
#' Must be run on a dataframe that has already had [clean_pbp()] run on it.
#' Note that the functions [build_nflfastR_pbp()] and
#' the database function [update_db()] already include this function.
#' @return The input Data Frame of the parameter `pbp` with the following columns
#' added:
#' \describe{
#' \item{xpass}{Probability of dropback scaled from 0 to 1.}
#' \item{pass_oe}{Dropback percent over expected on a given play, computed as `100 * (pass - xpass)` and therefore scaled from -100 to 100.}
#' }
#' @export
add_xpass <- function(pbp, ...) {
  if (nrow(pbp) == 0) {
    user_message("Nothing to do. Return passed data frame.", "info")
    return(pbp)
  }
  pbp <- pbp |> dplyr::select(-dplyr::any_of(c("xpass", "pass_oe")))
  plays <- prepare_xpass_data(pbp)

  if (nrow(plays |> dplyr::filter(.data$valid_play == 1)) > 0) {
    user_message("Computing xpass...", "todo")

    pred <- stats::predict(
      load_model("xpass"),
      as.matrix(plays |> dplyr::select(-"valid_play"))
    ) |>
      tibble::as_tibble() |>
      dplyr::rename(xpass = "value") |>
      dplyr::bind_cols(plays) |>
      dplyr::select("xpass", "valid_play")

    pbp <- pbp |>
      dplyr::bind_cols(pred) |>
      dplyr::mutate(
        xpass = dplyr::if_else(
          .data$valid_play == 1,
          .data$xpass,
          NA_real_
        ),
        pass_oe = dplyr::if_else(
          !is.na(.data$xpass),
          100 * (.data$pass - .data$xpass),
          NA_real_
        ),
        pass_oe = dplyr::if_else(
          .data$rush == 0 & .data$pass == 0,
          NA_real_,
          .data$pass_oe
        )
      ) |>
      dplyr::select(-"valid_play")

    message_completed("added xpass and pass_oe", ...)
  } else {
    pbp <- pbp |>
      dplyr::mutate(
        xpass = NA_real_,
        pass_oe = NA_real_
      )
    user_message(
      "No non-NA values for xpass calculation detected. xpass and pass_oe set to NA",
      "info"
    )
  }
  return(pbp)
}

prepare_xpass_data <- function(pbp) {
  plays <- pbp |>
    dplyr::mutate(
      valid_play = dplyr::if_else(
        .data$season >= 2006 &
          .data$play_type %in% c("no_play", "pass", "run") &
          !is.na(.data$posteam) &
          !is.na(.data$down) &
          !is.na(.data$defteam_timeouts_remaining) &
          !is.na(.data$posteam_timeouts_remaining) &
          !is.na(.data$yardline_100) &
          !is.na(.data$score_differential),
        1,
        0
      )
    ) |>
    make_model_mutations() |>
    dplyr::select(
      "valid_play",
      "down",
      "ydstogo",
      "yardline_100",
      "qtr",
      "wp",
      "vegas_wp",
      "era2",
      "era3",
      "era4",
      "score_differential",
      "home",
      "half_seconds_remaining",
      "posteam_timeouts_remaining",
      "defteam_timeouts_remaining",
      "outdoors",
      "retractable",
      "dome"
    )

  return(plays)
}


================================================
FILE: R/helper_add_xyac.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl
# Purpose: Function to add expected yac variables.
# Code Style Guide: styler::tidyverse_style()
################################################################################
#' Add expected yards after completion (xyac) variables
#'
#' @inheritParams clean_pbp
#' @details Build columns that capture what we should expect after the catch.
#' @return The input Data Frame of the parameter 'pbp' with the following columns
#' added:
#' \describe{
#' \item{xyac_epa}{Expected value of EPA gained after the catch, starting from where the catch was made. Zero yards after the catch would be listed as zero EPA.}
#' \item{xyac_success}{Probability play earns positive EPA (relative to where play started) based on where ball was caught.}
#' \item{xyac_fd}{Probability play earns a first down based on where the ball was caught.}
#' \item{xyac_mean_yardage}{Average expected yards after the catch based on where the ball was caught.}
#' \item{xyac_median_yardage}{Median expected yards after the catch based on where the ball was caught.}
#' }
#' @export
add_xyac <- function(pbp, ...) {
  if (nrow(pbp) == 0) {
    user_message("Nothing to do. Return passed data frame.", "info")
  } else {
    # testing only
    # pbp <- g

    pbp <- pbp |> dplyr::select(-dplyr::any_of(drop.cols.xyac))

    # for joining at the end
    pbp <- pbp |>
      dplyr::mutate(index = 1:dplyr::n())

    # prepare_xyac_data helper function shown below
    passes <- prepare_xyac_data(pbp) |>
      dplyr::filter(.data$valid_pass == 1, .data$distance_to_goal != 0)

    if (nrow(passes) > 0) {
      user_message("Computing xyac...", "todo")
      join_data <- passes |>
        dplyr::select(
          "index",
          "distance_to_goal",
          "season",
          "week",
          "home_team",
          "posteam",
          "roof",
          "half_seconds_remaining",
          "down",
          "ydstogo",
          "posteam_timeouts_remaining",
          "defteam_timeouts_remaining",
          "original_spot" = "yardline_100",
          "original_ep" = "ep",
          "air_epa",
          "air_yards"
        ) |>
        dplyr::mutate(
          down = as.integer(.data$down),
          ydstogo = as.integer(.data$ydstogo),
          original_ydstogo = .data$ydstogo
        ) |>
        dplyr::select(
          "index":"ydstogo",
          "original_ydstogo",
          dplyr::everything()
        )

      preds <- stats::predict(
        load_model("xyac"),
        as.matrix(passes |> xyac_model_select())
      )

      # xgboost v3 returns a matrix of predictions but the below code is designed
      # to work with a vector as returned by xgboost v1.
      # We switch back to the vector (transposing might be expensive) in order
      # to keep the rest of the code (for now).
      if (is.matrix(preds)) {
        preds <- preds |>
          t() |>
          as.vector("numeric")
      }

      xyac_vars <- preds |>
        tibble::as_tibble() |>
        dplyr::rename(prob = "value") |>
        dplyr::bind_cols(
          tibble::tibble(
            "yac" = rep_len(-5:70, length.out = nrow(passes) * 76),
            "index" = rep(
              passes$index,
              times = rep_len(76, length.out = nrow(passes))
            )
          ) |>
            dplyr::left_join(join_data, by = "index") |>
            dplyr::mutate(
              half_seconds_remaining = dplyr::if_else(
                .data$half_seconds_remaining <= 6,
                0,
                .data$half_seconds_remaining - 6
              )
            )
        ) |>
        dplyr::group_by(.data$index) |>
        dplyr::mutate(
          max_loss = dplyr::if_else(
            .data$distance_to_goal < 95,
            -5,
            .data$distance_to_goal - 99
          ),
          max_gain = dplyr::if_else(
            .data$distance_to_goal > 70,
            70,
            .data$distance_to_goal
          ),
          cum_prob = cumsum(.data$prob),
          prob = dplyr::case_when(
            # truncate probs at loss greater than max loss
            .data$yac == .data$max_loss ~ .data$cum_prob,
            # same for gains bigger than possible
            .data$yac == .data$max_gain ~ 1 - dplyr::lag(.data$cum_prob, 1),
            TRUE ~ .data$prob
          ),
          # get updated end result for each possibility
          yardline_100 = .data$distance_to_goal - .data$yac
        ) |>
        dplyr::filter(
          .data$yac >= .data$max_loss,
          .data$yac <= .data$max_gain
        ) |>
        dplyr::select(-"cum_prob") |>
        dplyr::mutate(
          posteam_timeouts_pre = .data$posteam_timeouts_remaining,
          defteam_timeouts_pre = .data$defteam_timeouts_remaining,
          gain = .data$original_spot - .data$yardline_100,
          turnover = dplyr::if_else(
            .data$down == 4 & .data$gain < .data$ydstogo,
            as.integer(1),
            as.integer(0)
          ),
          down = dplyr::if_else(.data$gain >= .data$ydstogo, 1, .data$down + 1),
          ydstogo = dplyr::if_else(
            .data$gain >= .data$ydstogo,
            10,
            .data$ydstogo - .data$gain
          ),
          # possession change if 4th down failed
          down = dplyr::if_else(
            .data$turnover == 1,
            as.integer(1),
            as.integer(.data$down)
          ),
          ydstogo = dplyr::if_else(
            .data$turnover == 1,
            as.integer(10),
            as.integer(.data$ydstogo)
          ),
          # save yardline_100 for yards gained calculation
          yardline_100_noflip = .data$yardline_100,
          # flip yardline_100 and timeouts for turnovers for EP calculation
          yardline_100 = dplyr::if_else(
            .data$turnover == 1,
            as.integer(100 - .data$yardline_100),
            as.integer(.data$yardline_100)
          ),
          posteam_timeouts_remaining = dplyr::if_else(
            .data$turnover == 1,
            .data$defteam_timeouts_pre,
            .data$posteam_timeouts_pre
          ),
          defteam_timeouts_remaining = dplyr::if_else(
            .data$turnover == 1,
            .data$posteam_timeouts_pre,
            .data$defteam_timeouts_pre
          ),
          # ydstogo can't be bigger than yardline
          ydstogo = dplyr::if_else(
            .data$ydstogo >= .data$yardline_100,
            as.integer(.data$yardline_100),
            as.integer(.data$ydstogo)
          )
        ) |>
        dplyr::ungroup() |>
        nflfastR::calculate_expected_points() |>
        dplyr::group_by(.data$index) |>
        dplyr::mutate(
          ep = dplyr::case_when(
            .data$yardline_100 == 0 ~ 7,
            .data$turnover == 1 ~ -1 * .data$ep,
            TRUE ~ .data$ep
          ),
          epa = .data$ep - .data$original_ep,
          wt_epa = .data$epa * .data$prob,
          wt_yardln = .data$yardline_100_noflip * .data$prob,
          med = dplyr::if_else(
            cumsum(.data$prob) > .5 & dplyr::lag(cumsum(.data$prob) < .5),
            .data$yac,
            as.integer(0)
          )
        ) |>
        dplyr::summarise(
          xyac_epa = sum(.data$wt_epa) - dplyr::first(.data$air_epa),
          xyac_mean_yardage = (dplyr::first(.data$original_spot) -
            dplyr::first(.data$air_yards)) -
            sum(.data$wt_yardln),
          xyac_median_yardage = max(.data$med),
          xyac_success = sum((.data$ep > .data$original_ep) * .data$prob),
          xyac_fd = sum((.data$gain >= .data$original_ydstogo) * .data$prob)
        ) |>
        dplyr::ungroup()

      pbp <- pbp |>
        dplyr::left_join(xyac_vars, by = "index") |>
        dplyr::select(-"index")

      message_completed("added xyac variables", ...)
    } else {
      # means no valid pass plays in the pbp
      pbp <- pbp |>
        dplyr::mutate(
          xyac_epa = NA_real_,
          xyac_mean_yardage = NA_real_,
          xyac_median_yardage = NA_real_,
          xyac_success = NA_real_,
          xyac_fd = NA_real_
        ) |>
        dplyr::select(-"index")
      user_message(
        "No non-NA values for xyac calculation detected. xyac variables set to NA",
        "info"
      )
    }
  }

  return(pbp)
}


### helper function for getting the data ready
prepare_xyac_data <- function(pbp) {
  # valid pass play: at least -15 air yards, less than 70 air yards, has intended receiver, has pass location
  passes <- pbp |>
    make_model_mutations() |>
    dplyr::mutate(
      receiver_player_name = stringr::str_extract(
        .data$desc,
        glue::glue('{receiver_finder}{big_parser}')
      ),
      pass_middle = dplyr::if_else(.data$pass_location == "middle", 1, 0),
      air_is_zero = dplyr::if_else(.data$air_yards == 0, 1, 0),
      distance_to_sticks = .data$air_yards - .data$ydstogo,
      distance_to_goal = .data$yardline_100 - .data$air_yards,
      valid_pass = dplyr::if_else(
        (.data$complete_pass == 1 |
          .data$incomplete_pass == 1 |
          .data$interception == 1) &
          !is.na(.data$air_yards) &
          .data$air_yards >= -15 &
          .data$air_yards < 70 &
          !is.na(.data$receiver_player_name) &
          !is.na(.data$pass_location),
        1,
        0
      )
    )
  return(passes)
}

### another helper function for getting the data ready
xyac_model_select <- function(pbp) {
  pbp |>
    dplyr::select(
      "air_yards",
      "yardline_100",
      "ydstogo",
      "distance_to_goal",
      "down1",
      "down2",
      "down3",
      "down4",
      "air_is_zero",
      "pass_middle",
      "era2",
      "era3",
      "era4",
      "qb_hit",
      "home",
      "outdoors",
      "retractable",
      "dome",
      "distance_to_sticks"
    )
}

# These columns are being generated by add_xyac and the function tries to drop
# them in case it is being used on a pbp dataset where the columns already exist
drop.cols.xyac <- c(
  "xyac_epa",
  "xyac_mean_yardage",
  "xyac_median_yardage",
  "xyac_success",
  "xyac_fd"
)


================================================
FILE: R/helper_additional_functions.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl, Tan Ho
# Styleguide: styler::tidyverse_style()
################################################################################

#' Clean Play by Play Data
#'
#' @param pbp A data frame of play-by-play data scraped using [fast_scraper()].
#' @param ... Additional arguments passed to a message function (for internal use).
#' @details Build columns that capture what happens on all plays, including
#' penalties, using string extraction from play description.
#' Loosely based on Ben's nflfastR guide (<https://nflfastr.com/articles/beginners_guide.html>)
#' but updated to work with the RS data, which has a different player format in
#' the play description; e.g. 24-M.Lynch instead of M.Lynch.
#' The function also standardizes team abbreviations so that, for example,
#' the Chargers are always represented by 'LAC' regardless of which year it was.
#' Starting in 2022, play-by-play data was missing gsis player IDs of rookies.
#' This function tries to fix as many as possible.
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @return The input Data Frame of the parameter 'pbp' with the following columns
#' added:
#' \describe{
#' \item{success}{Binary indicator whether epa > 0 in the given play.}
#' \item{passer}{Name of the dropback player (scrambles included) including plays with penalties.}
#' \item{passer_jersey_number}{Jersey number of the passer.}
#' \item{rusher}{Name of the rusher (no scrambles) including plays with penalties.}
#' \item{rusher_jersey_number}{Jersey number of the rusher.}
#' \item{receiver}{Name of the receiver including plays with penalties.}
#' \item{receiver_jersey_number}{Jersey number of the receiver.}
#' \item{pass}{Binary indicator if the play was a pass play (sacks and scrambles included).}
#' \item{rush}{Binary indicator if the play was a rushing play.}
#' \item{special}{Binary indicator if the play was a special teams play.}
#' \item{first_down}{Binary indicator if the play ended in a first down.}
#' \item{aborted_play}{Binary indicator if the play description indicates "Aborted".}
#' \item{play}{Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.}
#' \item{passer_id}{ID of the player in the 'passer' column.}
#' \item{rusher_id}{ID of the player in the 'rusher' column.}
#' \item{receiver_id}{ID of the player in the 'receiver' column.}
#' \item{name}{Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.}
#' \item{fantasy}{Name of the rusher on rush plays or receiver on pass plays.}
#' \item{fantasy_id}{ID of the rusher on rush plays or receiver on pass plays.}
#' \item{fantasy_player_name}{Name of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{fantasy_player_id}{ID of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{jersey_number}{Jersey number of the player listed in the 'name' column.}
#' \item{id}{ID of the player in the 'name' column.}
#' \item{out_of_bounds}{= 1 if play description contains "ran ob", "pushed ob", or "sacked ob"; = 0 otherwise.}
#' \item{home_opening_kickoff}{= 1 if the home team received the opening kickoff, 0 otherwise.}
#' }
#' @export
clean_pbp <- function(pbp, ...) {
  if (nrow(pbp) == 0) {
    user_message("Nothing to clean. Return passed data frame.", "info")
    r <- pbp
  } else {
    user_message("Cleaning up play-by-play...", "todo")

    # drop existing values of clean_pbp
    pbp <- pbp |> dplyr::select(-dplyr::any_of(drop.cols))

    r <- pbp |>
      dplyr::mutate(
        aborted_play = dplyr::if_else(
          stringr::str_detect(.data$desc, 'Aborted'),
          1,
          0
        ),
        #get rid of extraneous spaces that mess with player name finding
        #if there is a space or dash, and then a capital letter, and then a period, and then a space, take out the space
        desc = stringr::str_replace_all(
          .data$desc,
          "(((\\s)|(\\-))[A-Z]\\.)\\s+",
          "\\1"
        ),
        success = dplyr::if_else(
          is.na(.data$epa),
          NA_real_,
          dplyr::if_else(.data$epa > 0, 1, 0)
        ),
        passer = stringr::str_extract(
          .data$desc,
          glue::glue('{big_parser}{pass_finder}')
        ),
        passer_jersey_number = stringr::str_extract(
          stringr::str_extract(
            .data$desc,
            glue::glue('{number_parser}{big_parser}{pass_finder}')
          ),
          "[:digit:]*"
        ) |>
          as.integer(),
        rusher = stringr::str_extract(
          .data$desc,
          glue::glue('{big_parser}{rush_finder}')
        ),
        rusher_jersey_number = stringr::str_extract(
          stringr::str_extract(
            .data$desc,
            glue::glue('{number_parser}{big_parser}{rush_finder}')
          ),
          "[:digit:]*"
        ) |>
          as.integer(),
        #get rusher_player_name as a measure of last resort
        #finds things like aborted snaps and "F.Last to NYG 44."
        rusher = dplyr::if_else(
          is.na(.data$rusher) &
            is.na(.data$passer) &
            !is.na(.data$rusher_player_name),
          .data$rusher_player_name,
          .data$rusher
        ),
        receiver = stringr::str_extract(
          .data$desc,
          glue::glue('{receiver_finder}{big_parser}')
        ),
        receiver_jersey_number = stringr::str_extract(
          stringr::str_extract(
            .data$desc,
            glue::glue('{receiver_number}{big_parser}')
          ),
          "[:digit:]*"
        ) |>
          as.integer(),
        #overwrite all these weird plays messing with the parser
        receiver = dplyr::case_when(
          stringr::str_detect(.data$desc, glue::glue('{abnormal_play}')) &
            !is.na(.data$receiver_player_name) ~ .data$receiver_player_name,
          TRUE ~ .data$receiver
        ),
        rusher = dplyr::case_when(
          stringr::str_detect(.data$desc, glue::glue('{abnormal_play}')) &
            !is.na(.data$rusher_player_name) ~ .data$rusher_player_name,
          TRUE ~ .data$rusher
        ),
        passer = dplyr::case_when(
          stringr::str_detect(.data$desc, glue::glue('{abnormal_play}')) &
            !is.na(.data$passer_player_name) ~ .data$passer_player_name,
          TRUE ~ .data$passer
        ),
        # fix the plays where scramble was fixed using charting data from 1999 to 2005
        passer = dplyr::case_when(
          is.na(.data$passer) &
            .data$qb_scramble == 1 &
            !is.na(.data$rusher) &
            .data$season <= 2005 ~ .data$rusher,
          TRUE ~ .data$passer
        ),
        # finally, for rusher, if there was already a passer (eg from scramble), set rusher to NA
        rusher = dplyr::if_else(
          !is.na(.data$passer),
          NA_character_,
          .data$rusher
        ),
        # if no pass is thrown, there shouldn't be a receiver
        receiver = dplyr::if_else(
          stringr::str_detect(.data$desc, ' pass '),
          .data$receiver,
          NA_character_
        ),
        # if there's a pass, sack, or scramble, it's a pass play...
        pass = dplyr::if_else(
          stringr::str_detect(.data$desc, "( pass )|(sacked)|(scramble)") |
            .data$qb_scramble == 1,
          1,
          0
        ),
        # ...unless it says "backward(s) pass" or "lateral pass" and there's a rusher
        pass = dplyr::if_else(
          stringr::str_detect(
            stringr::str_to_lower(.data$desc),
            "(backward pass)|(backwards pass)|(lateral pass)"
          ) &
            !is.na(.data$rusher),
          0,
          .data$pass
        ),
        # and make sure there's no pass on a kickoff (sometimes there's forward pass on kickoff but that's not a pass play)
        pass = dplyr::case_when(
          .data$kickoff_attempt == 1 ~ 0,
          TRUE ~ .data$pass
        ),
        # in very rare cases, the pass logic can fail. We do a hard coded overwrite here because it's not worth the time
        # to overthink the logic to catch weird play descriptions.
        pass = fix_weird_pass_plays(.data$pass, .data$game_id, .data$play_id),
        #if there's a rusher and it wasn't a QB kneel or pass play, it's a run play
        rush = dplyr::if_else(
          !is.na(.data$rusher) & .data$qb_kneel == 0 & .data$pass == 0,
          1,
          0
        ),
        #fix some common QBs with inconsistent names
        passer = dplyr::case_when(
          passer == "Jos.Allen" ~ "J.Allen",
          passer == "Alex Smith" | passer == "Ale.Smith" ~ "A.Smith",
          passer == "Ryan" & .data$posteam == "ATL" ~ "M.Ryan",
          passer == "Tr.Brown" ~ "T.Brown",
          passer == "Sh.Hill" ~ "S.Hill",
          passer == "Matt.Moore" | passer == "Mat.Moore" ~ "M.Moore",
          passer == "Jo.Freeman" ~ "J.Freeman",
          passer == "G.Minshew" ~ "G.Minshew II",
          passer == "R.Griffin" ~ "R.Griffin III",
          passer == "Randel El" ~ "A.Randle El",
          passer == "Randle El" ~ "A.Randle El",
          season <= 2003 & passer == "Van Pelt" ~ "A.Van Pelt",
          season > 2003 & passer == "Van Pelt" ~ "B.Van Pelt",
          passer == "Dom.Davis" ~ "D.Davis",
          TRUE ~ .data$passer
        ),
        rusher = dplyr::case_when(
          rusher == "D.Johnson" &
            posteam == "HOU" &
            season == 2020 &
            rusher_jersey_number == 31 ~ "Da.Johnson",
          rusher == "D.Johnson" &
            posteam == "HOU" &
            season == 2020 &
            rusher_jersey_number == 25 ~ "Du.Johnson",
          rusher == "Jos.Allen" ~ "J.Allen",
          rusher == "Alex Smith" | rusher == "Ale.Smith" ~ "A.Smith",
          rusher == "Ryan" & .data$posteam == "ATL" ~ "M.Ryan",
          rusher == "Tr.Brown" ~ "T.Brown",
          rusher == "Sh.Hill" ~ "S.Hill",
          rusher == "Matt.Moore" | rusher == "Mat.Moore" ~ "M.Moore",
          rusher == "Jo.Freeman" ~ "J.Freeman",
          rusher == "G.Minshew" ~ "G.Minshew II",
          rusher == "R.Griffin" ~ "R.Griffin III",
          rusher == "Randel El" ~ "A.Randle El",
          rusher == "Randle El" ~ "A.Randle El",
          season <= 2003 & rusher == "Van Pelt" ~ "A.Van Pelt",
          season > 2003 & rusher == "Van Pelt" ~ "B.Van Pelt",
          rusher == "Dom.Davis" ~ "D.Davis",
          TRUE ~ rusher
        ),
        receiver = dplyr::case_when(
          receiver == "F.R" ~ "F.Jones",
          receiver_player_name == "D.Wells" &
            receiver_player_id == "00-0017421" ~ "D.Wells",
          receiver_player_name == "D.Hayes" &
            receiver_player_id == "00-0007144" ~ "D.Hayes",
          receiver_player_name == "DanielThomas" ~ "D.Thomas",
          receiver_player_name == "JulioJones" ~ "J.Jones",
          receiver_player_name == "Andre' Davis" ~ "A.Davis",
          receiver_player_name == "A.al-Jabbar" ~ "A.al-Jabbar",
          receiver_player_name == "A.St. Brown" ~ "A.St. Brown",
          TRUE ~ receiver
        ),
        first_down = dplyr::if_else(
          .data$first_down_rush == 1 |
            .data$first_down_pass == 1 |
            .data$first_down_penalty == 1,
          1,
          0
        ),
        # easy filter: special is 1 if a "special teams" play, or 0 otherwise
        # with thanks to Lee Sharpe for the code
        special = dplyr::if_else(
          .data$play_type %in%
            c("extra_point", "field_goal", "kickoff", "punt"),
          1,
          0
        ),
        # easy filter: play is 1 if a "normal" play (including penalties), or 0 otherwise
        # with thanks to Lee Sharpe for the code
        play = dplyr::if_else(
          !is.na(.data$epa) &
            !is.na(.data$posteam) &
            .data$desc != "*** play under review ***" &
            substr(.data$desc, 1, 8) != "Timeout " &
            .data$play_type %in% c("no_play", "pass", "run"),
          1,
          0
        )
      ) |>
      #standardize team names (eg Chargers are always LAC even when they were playing in SD)
      dplyr::mutate_at(
        dplyr::vars(
          "posteam",
          "defteam",
          "home_team",
          "away_team",
          "timeout_team",
          "td_team",
          "return_team",
          "penalty_team",
          "side_of_field",
          "forced_fumble_player_1_team",
          "forced_fumble_player_2_team",
          "solo_tackle_1_team",
          "solo_tackle_2_team",
          "assist_tackle_1_team",
          "assist_tackle_2_team",
          "assist_tackle_3_team",
          "assist_tackle_4_team",
          "tackle_with_assist_1_team",
          "tackle_with_assist_2_team",
          "fumbled_1_team",
          "fumbled_2_team",
          "fumble_recovery_1_team",
          "fumble_recovery_2_team",
          "yrdln",
          "end_yard_line",
          "drive_start_yard_line",
          "drive_end_yard_line"
        ),
        team_name_fn
      ) |>

      #Seb's stuff for fixing player ids
      dplyr::mutate(index = 1:dplyr::n()) |> # to re-sort after all the group_bys

      dplyr::group_by(.data$passer, .data$posteam, .data$season) |>
      dplyr::mutate(
        passer_id = dplyr::if_else(
          is.na(.data$passer),
          NA_character_,
          custom_mode(.data$passer_player_id)
        )
      ) |>

      dplyr::group_by(.data$passer_id) |>
      dplyr::mutate(
        passer = dplyr::if_else(
          is.na(.data$passer_id),
          NA_character_,
          custom_mode(.data$passer)
        )
      ) |>

      dplyr::group_by(.data$rusher, .data$posteam, .data$season) |>
      dplyr::mutate(
        rusher_id = dplyr::if_else(
          is.na(.data$rusher),
          NA_character_,
          custom_mode(.data$rusher_player_id)
        )
      ) |>

      dplyr::group_by(.data$rusher_id) |>
      dplyr::mutate(
        rusher = dplyr::if_else(
          is.na(.data$rusher_id),
          NA_character_,
          custom_mode(.data$rusher)
        )
      ) |>

      dplyr::group_by(.data$receiver, .data$posteam, .data$season) |>
      dplyr::mutate(
        receiver_id = dplyr::if_else(
          is.na(.data$receiver),
          NA_character_,
          custom_mode(.data$receiver_player_id)
        )
      ) |>

      dplyr::group_by(.data$receiver_id) |>
      dplyr::mutate(
        receiver = dplyr::if_else(
          is.na(.data$receiver_id),
          NA_character_,
          custom_mode(.data$receiver)
        )
      ) |>

      dplyr::ungroup() |>
      dplyr::mutate(
        # if there's an aborted snap and qb didn't get a pass off,
        # then charge it to whoever was charged with the fumble
        # this has to go after all the custom_mode stuff or it gets messed up
        rusher = dplyr::if_else(
          .data$aborted_play == 1 &
            is.na(.data$passer) &
            !is.na(.data$fumbled_1_player_name),
          .data$fumbled_1_player_name,
          .data$rusher
        ),
        rusher_id = dplyr::if_else(
          .data$aborted_play == 1 &
            is.na(.data$passer) &
            !is.na(.data$fumbled_1_player_id),
          .data$fumbled_1_player_id,
          .data$rusher_id
        ),

        name = dplyr::if_else(!is.na(.data$passer), .data$passer, .data$rusher),
        jersey_number = dplyr::if_else(
          !is.na(.data$passer_jersey_number),
          .data$passer_jersey_number,
          .data$rusher_jersey_number
        ),
        id = dplyr::if_else(
          !is.na(.data$passer_id),
          .data$passer_id,
          .data$rusher_id
        )
      ) |>
      dplyr::arrange(.data$index) |>
      dplyr::select(-"index") |>
      # add action player
      dplyr::mutate(
        fantasy_player_name = case_when(
          !is.na(.data$rusher_player_name) ~ .data$rusher_player_name,
          is.na(.data$rusher_player_name) &
            !is.na(.data$receiver_player_name) ~ .data$receiver_player_name,
          TRUE ~ NA_character_
        ),
        fantasy_player_id = case_when(
          !is.na(.data$rusher_player_id) ~ .data$rusher_player_id,
          is.na(.data$rusher_player_id) &
            !is.na(.data$receiver_player_id) ~ .data$receiver_player_id,
          TRUE ~ NA_character_
        ),
        fantasy = case_when(
          !is.na(.data$rusher) ~ .data$rusher,
          is.na(.data$rusher) & !is.na(.data$receiver) ~ .data$receiver,
          .data$qb_scramble == 1 ~ .data$passer,
          TRUE ~ NA_character_
        ),
        fantasy_id = case_when(
          !is.na(.data$rusher_id) ~ .data$rusher_id,
          is.na(.data$rusher_id) &
            !is.na(.data$receiver_id) ~ .data$receiver_id,
          .data$qb_scramble == 1 ~ .data$passer_id,
          TRUE ~ NA_character_
        ),
        out_of_bounds = dplyr::if_else(
          stringr::str_detect(.data$desc, "(ran ob)|(pushed ob)|(sacked ob)"),
          1,
          0
        )
      ) |>
      dplyr::group_by(.data$game_id) |>
      dplyr::mutate(
        home_opening_kickoff = dplyr::if_else(
          .data$home_team == dplyr::first(stats::na.omit(.data$posteam)),
          1,
          0
        )
      ) |>
      dplyr::ungroup()
  }

  message_completed("Cleaning completed", ...)

  return(r)
}

#these things are used in clean_pbp() above

# look for First[period or space]Last[maybe - or ' in last][maybe more letters in last][maybe Jr. or II or IV]
big_parser <- "(?<=)[A-Z][A-z]*+(\\.|\\s)+[A-Z][A-z]*+\\'*\\-*[A-Z]*+[a-z]*+(\\s((Jr.)|(Sr.)|I{2,3})|(IV))?"
# maybe some spaces and letters, and then a rush direction unless they fumbled
rush_finder <- "(?=\\s*[a-z]*+\\s*((FUMBLES) | (left end)|(left tackle)|(left guard)|(up the middle)|(right guard)|(right tackle)|(right end)))"
# maybe some spaces and letters, and then pass / sack / scramble
pass_finder <- "(?=\\s*[a-z]*+\\s*(( pass)|(sack)|(scramble)))"
# to or for, maybe a jersey number and a dash
receiver_finder <- "(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})"
# weird play finder
abnormal_play <- "(Lateral)|(lateral)|(pitches to)|(Direct snap to)|(New quarterback for)|(Aborted)|(backwards pass)|(Pass back to)|(Flea-flicker)"
# look for 1-2 numbers before a dash
number_parser <- "((?<=)[:digit:]{1,2}(-))?"
# special case for receivers
receiver_number <- "(?<=((to)|(for))\\s)[:digit:]{0,2}\\-{0,1}"

# These columns are being generated by clean_pbp and the function tries to drop
# them in case it is being used on a pbp dataset where the columns already exist
drop.cols <- c(
  "success",
  "passer",
  "rusher",
  "receiver",
  "pass",
  "rush",
  "special",
  "first_down",
  "play",
  "passer_id",
  "rusher_id",
  "receiver_id",
  "name",
  "id",
  "passer_jersey_number",
  "rusher_jersey_number",
  "receiver_jersey_number",
  "jersey_number",
  "aborted_play",
  "fantasy",
  "fantasy_id",
  "fantasy_player_name",
  "fantasy_player_id",
  "out_of_bounds"
)

# fixes team names on columns with yard line
# example: 'SD 49' --> 'LAC 49'
# thanks to awgymer for the contribution:
# https://github.com/nflverse/nflfastR/issues/29#issuecomment-654592195
team_name_fn <- function(var) {
  stringr::str_replace_all(
    var,
    c(
      "JAC" = "JAX",
      "STL" = "LA",
      "SL" = "LA",
      "LAR" = "LA",
      "ARZ" = "ARI",
      "BLT" = "BAL",
      "CLV" = "CLE",
      "HST" = "HOU",
      "SD" = "LAC",
      "OAK" = "LV"
    )
  )
}

#' Compute QB epa
#'
#' @inheritParams clean_pbp
#' @details Adds the variable 'qb_epa', which gives the QB credit for EPA only
#' up to the point where a receiver lost a fumble after a completed catch. This
#' makes EPA work more like passing yards on plays with fumbles.
#' @export
add_qb_epa <- function(pbp, ...) {
  if (nrow(pbp) == 0) {
    user_message("Nothing to do. Return passed data frame.", "info")
  } else {
    # drop existing values of qb_epa
    pbp <- pbp |> dplyr::select(-dplyr::any_of("qb_epa"))

    fumbles_df <- pbp |>
      dplyr::filter(
        .data$complete_pass == 1 &
          .data$fumble_lost == 1 &
          !is.na(.data$epa) &
          !is.na(.data$down)
      ) |>
      dplyr::mutate(
        half_seconds_remaining = dplyr::if_else(
          .data$half_seconds_remaining <= 6,
          0,
          .data$half_seconds_remaining - 6
        ),
        down = as.numeric(.data$down),
        # save old stuff for testing/checking
        posteam_timeouts_pre = .data$posteam_timeouts_remaining,
        defteam_timeouts_pre = .data$defteam_timeouts_remaining,
        down_old = .data$down,
        ydstogo_old = .data$ydstogo,
        epa_old = .data$epa,
        # update yard line, down, yards to go from play result
        yardline_100 = .data$yardline_100 - .data$yards_gained,
        down = dplyr::if_else(
          .data$yards_gained >= .data$ydstogo,
          1,
          .data$down + 1
        ),
        # if the fumble spot would have resulted in a turnover on downs, give the other team the ball
        change = dplyr::if_else(.data$down == 5, 1, 0),
        down = dplyr::if_else(.data$down == 5, 1, .data$down),
        # yards to go is 10 if it's a first down, update otherwise
        ydstogo = dplyr::if_else(
          .data$down == 1,
          10,
          .data$ydstogo - .data$yards_gained
        ),
        # 10 yards to go if possession change
        ydstogo = dplyr::if_else(.data$change == 1, 10, .data$ydstogo),
        # flip field and timeouts for possession change
        yardline_100 = dplyr::if_else(
          .data$change == 1,
          100 - .data$yardline_100,
          .data$yardline_100
        ),
        posteam_timeouts_remaining = dplyr::if_else(
          .data$change == 1,
          .data$defteam_timeouts_pre,
          .data$posteam_timeouts_pre
        ),
        defteam_timeouts_remaining = dplyr::if_else(
          .data$change == 1,
          .data$posteam_timeouts_pre,
          .data$defteam_timeouts_pre
        ),
        # fix yards to go for goal line (eg can't have 1st & 10 inside opponent 10 yard line)
        ydstogo = dplyr::if_else(
          .data$yardline_100 < .data$ydstogo,
          .data$yardline_100,
          .data$ydstogo
        ),
        ep_old = .data$ep
      ) |>
      dplyr::select(
        "game_id",
        "play_id",
        "season",
        "home_team",
        "posteam",
        "roof",
        "half_seconds_remaining",
        "yardline_100",
        "down",
        "ydstogo",
        "posteam_timeouts_remaining",
        "defteam_timeouts_remaining",
        "down_old",
        "ep_old",
        "change"
      )

    if (nrow(fumbles_df) > 0) {
      new_ep_df <- calculate_expected_points(fumbles_df) |>
        dplyr::mutate(
          ep = dplyr::if_else(.data$change == 1, -.data$ep, .data$ep),
          fixed_epa = .data$ep - .data$ep_old
        ) |>
        dplyr::select("game_id", "play_id", "fixed_epa")

      pbp <- pbp |>
        dplyr::left_join(new_ep_df, by = c("game_id", "play_id")) |>
        dplyr::mutate(
          qb_epa = dplyr::if_else(
            !is.na(.data$fixed_epa),
            .data$fixed_epa,
            .data$epa
          )
        ) |>
        dplyr::select(-"fixed_epa")
    } else {
      pbp <- pbp |> dplyr::mutate(qb_epa = .data$epa)
    }
  }

  message_completed("added qb_epa", ...)

  return(pbp)
}
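
# Illustrative sketch only, not part of the package: the down-and-distance
# update that add_qb_epa() applies at the spot of the fumble, rewritten in base
# R so the edge cases can be checked by hand. The name sketch_fumble_spot() and
# its arguments are hypothetical; the real function works on pbp columns and
# additionally shifts the clock and swaps timeouts.
sketch_fumble_spot <- function(yardline_100, down, ydstogo, yards_gained) {
  # move the ball to where the receiver was when he fumbled
  yardline_100 <- yardline_100 - yards_gained
  # first down if the catch reached the sticks, otherwise the next down
  new_down <- ifelse(yards_gained >= ydstogo, 1, down + 1)
  # a "5th down" means the play would have been a turnover on downs
  change <- as.numeric(new_down == 5)
  new_down <- ifelse(new_down == 5, 1, new_down)
  # fresh set of downs gets 10 to go, otherwise subtract the gain
  ydstogo <- ifelse(new_down == 1, 10, ydstogo - yards_gained)
  # flip the field when possession changes
  yardline_100 <- ifelse(change == 1, 100 - yardline_100, yardline_100)
  # goal-to-go: yards to go can't exceed the distance to the end zone
  ydstogo <- pmin(ydstogo, yardline_100)
  data.frame(yardline_100, down = new_down, ydstogo, change)
}
# Example: a 2-yard completion on 4th & 5 from the opponent's 30 would have
# been a turnover on downs, so sketch_fumble_spot(30, 4, 5, 2) describes the
# other team's 1st & 10 with 72 yards to go to the end zone.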

# Function that fixes false "pass" positives in some hard coded plays where
# the parser logic reached its limit
fix_weird_pass_plays <- function(pass, game_id, play_id) {
  combined_id <- paste(game_id, play_id, sep = "_")
  false_positives <- c(
    "1999_01_ARI_PHI_1611",
    "1999_01_SF_JAX_1788",
    "1999_01_SF_JAX_2081",
    "1999_11_ATL_TB_1740",
    "2001_09_MIN_PHI_1307",
    "2001_14_NE_BUF_452",
    "2002_16_PIT_TB_527",
    "2003_02_HOU_NO_3924",
    "2003_15_PIT_NYJ_873",
    "2004_05_BUF_NYJ_2555",
    "2005_07_SD_PHI_321",
    "2011_02_STL_NYG_1369",
    "2016_05_NE_CLE_912",
    "2016_06_CAR_NO_2690",
    "2020_10_BAL_NE_2013"
  )
  data.table::fifelse(combined_id %chin% false_positives, 0, pass, pass)
}


================================================
FILE: R/helper_database_functions.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Create and update database with nflfastR pbp data
# Code Style Guide: styler::tidyverse_style()
################################################################################

#' Update or Create a nflfastR Play-by-Play Database
#'
#' `update_db` updates or creates a database with `nflfastR`
#' play by play data of all completed games since 1999.
#'
#' @details This function creates and updates a data table with the name `tblname`
#' within a SQLite database (other drivers via `db_connection`) located in
#' `dbdir` and named `dbname`.
#' The data table combines all play by play data for every available game back
#' to the 1999 season and adds the most recent completed games as soon as they
#' are available for `nflfastR`.
#'
#' The argument `force_rebuild` is of hybrid type. It can rebuild the play
#' by play data table either for the whole nflfastR era (with `force_rebuild = TRUE`)
#' or just for specified seasons (e.g. `force_rebuild = c(2019, 2020)`).
#' Please note the following behavior:
#' * `force_rebuild = TRUE`: The data table with the name `tblname`
#'   will be removed completely and rebuilt from scratch. This is helpful when
#'   new columns are added during the off-season.
#' * `force_rebuild = c(2019, 2020)`: The data table with the name `tblname`
#'   will be preserved and only rows from the 2019 and 2020 seasons will be
#'   deleted and re-added. This is intended to be used for ongoing seasons because
#'   the NFL fixes bugs in the underlying data during the week and we recommend
#'   rebuilding the current season every Thursday during the season.
#'
#' The parameter `db_connection` is intended for advanced users who want
#' to use other DBI drivers, such as MariaDB, Postgres or odbc. Please note that
#' the arguments `dbdir` and `dbname` are dropped in case a `db_connection`
#' is provided but the argument `tblname` will still be used to write the
#' data table into the database.
#'
#' @param dbdir Directory in which the database is or shall be located. Can also
#'   be set globally with `options(nflfastR.dbdirectory)`
#' @param dbname File name of an existing or desired SQLite database within `dbdir`
#' @param tblname The name of the play by play data table within the database
#' @param force_rebuild Hybrid parameter (logical or numeric) to rebuild parts
#' of or the complete play by play data table within the database (please see details for further information)
#' @param db_connection A `DBIConnection` object, as returned by
#' [DBI::dbConnect()] (please see details for further information)
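#' @examples
#' \dontrun{
#' # A minimal usage sketch, assuming DBI, RSQLite and purrr are installed
#' # (plus dbplyr for the lazy query at the end). "path/to/db" is a placeholder
#' # directory. The first call downloads every season since 1999, so it can
#' # take a while.
#' update_db(dbdir = "path/to/db", dbname = "pbp_db", tblname = "nflfastR_pbp")
#'
#' # Re-download only selected seasons, e.g. during an ongoing season
#' update_db(dbdir = "path/to/db", force_rebuild = c(2019, 2020))
#'
#' # Query the table lazily, then disconnect
#' con <- DBI::dbConnect(RSQLite::SQLite(), file.path("path/to/db", "pbp_db"))
#' pbp <- dplyr::tbl(con, "nflfastR_pbp")
#' dplyr::count(pbp, season)
#' DBI::dbDisconnect(con)
#' }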
#' @export
update_db <- function(
  dbdir = getOption("nflfastR.dbdirectory", default = "."),
  dbname = "pbp_db",
  tblname = "nflfastR_pbp",
  force_rebuild = FALSE,
  db_connection = NULL
) {
  rule_header("Update nflfastR Play-by-Play Database")

  if (
    !is_installed("DBI") |
      !is_installed("purrr") |
      (!is_installed("RSQLite") & is.null(db_connection))
  ) {
    cli::cli_abort(
      "{my_time()} | Packages {.pkg DBI}, {.pkg RSQLite} and {.pkg purrr} required for database communication. Please install them."
    )
  }

  if (any(force_rebuild == "NEW")) {
    cli::cli_abort(
      "{my_time()} | The argument {.arg force_rebuild = NEW} is only for internal usage!"
    )
  }

  if (!(is.logical(force_rebuild) | is.numeric(force_rebuild))) {
    cli::cli_abort(
      "{my_time()} | The argument {.arg force_rebuild} has to be either logical or numeric!"
    )
  }

  if (!dir.exists(dbdir) & is.null(db_connection)) {
    cli::cli_alert_danger(
      "{my_time()} | Directory {.file {dbdir}} doesn't exist yet. Trying to create it..."
    )
    dir.create(dbdir)
  }

  if (is.null(db_connection)) {
    connection <- DBI::dbConnect(RSQLite::SQLite(), file.path(dbdir, dbname))
  } else {
    connection <- db_connection
  }

  # create db if it doesn't exist or user forces rebuild
  if (!DBI::dbExistsTable(connection, tblname)) {
    build_db(tblname, connection, rebuild = "NEW")
  } else if (
    DBI::dbExistsTable(connection, tblname) & all(force_rebuild != FALSE)
  ) {
    build_db(tblname, connection, rebuild = force_rebuild)
  }

  # get completed games using Lee's file (thanks Lee!)
  user_message("Checking for missing completed games...", "todo")
  completed_games <- nflreadr::load_schedules() |>
    # completed games since 1999, excluding the broken games
    dplyr::filter(
      .data$season >= 1999,
      !is.na(.data$result),
      !.data$game_id %in%
        c("1999_01_BAL_STL", "2000_06_BUF_MIA", "2000_03_SD_KC")
    ) |>
    dplyr::arrange(.data$gameday) |>
    dplyr::pull(.data$game_id)

  # function below
  missing <- get_missing_games(completed_games, connection, tblname)

  # rebuild db if number of missing games is too large
  if (length(missing) > 16) {
    # limit set to >16 to make sure this doesn't get triggered on gameday (e.g. week 17)
    build_db(
      tblname,
      connection,
      show_message = FALSE,
      rebuild = as.numeric(unique(stringr::str_sub(missing, 1, 4)))
    )
    missing <- get_missing_games(completed_games, connection, tblname)
  }

  # if there are missing games, scrape and write to db
  if (length(missing) > 0) {
    new_pbp <- build_nflfastR_pbp(missing, rules = FALSE)

    if (nrow(new_pbp) == 0) {
      user_message(
        "Raw data of new games are not yet ready. Please try again in about 10 minutes.",
        "oops"
      )
    } else {
      user_message("Appending new data to database...", "todo")
      DBI::dbWriteTable(connection, tblname, new_pbp, append = TRUE)
    }
  }

  # Remove default play which is just a helper to define columns correctly
  DBI::dbExecute(
    connection,
    glue::glue_sql(
      "DELETE FROM {`tblname`} WHERE game_id IN ({vals*})",
      vals = "9999_99_DEF_TYP",
      .con = connection
    )
  )

  message_completed("Database update completed", in_builder = TRUE)
  cli::cli_alert_info(
    "{my_time()} | Path to your db: {.file {DBI::dbGetInfo(connection)$dbname}}"
  )
  if (is.null(db_connection)) {
    DBI::dbDisconnect(connection)
  }
  rule_footer("DONE")
}

# this is a helper function to build the nflfastR database from scratch
build_db <- function(
  tblname = "nflfastR_pbp",
  db_conn,
  rebuild = FALSE,
  show_message = TRUE
) {
  valid_seasons <- nflreadr::load_schedules() |>
    dplyr::filter(.data$season >= 1999 & !is.na(.data$result)) |>
    dplyr::group_by(.data$season) |>
    dplyr::summarise() |>
    dplyr::ungroup()

  if (all(rebuild == TRUE)) {
    cli::cli_ul(
      "{my_time()} | Purging the complete data table {.val {tblname}}
                in your connected database..."
    )
    DBI::dbRemoveTable(db_conn, tblname)
    seasons <- valid_seasons |> dplyr::pull("season")
    cli::cli_ul(
      "{my_time()} | Starting download of {length(seasons)} seasons
                between {min(seasons)} and {max(seasons)}..."
    )
  } else if (is.numeric(rebuild) & all(rebuild %in% valid_seasons$season)) {
    # s <- glue::glue_collapse(rebuild, sep = ", ", last = ", and ")
    # string <- stringr::str_c(stringr::str_sub(s, 1, 11), "...", stringr::str_sub(s, -16, -1))
    if (show_message) {
      cli::cli_ul(
        "{my_time()} | Purging
                                  {cli::qty(length(rebuild))}season{?s} {rebuild}
                                  from the data table {.val {tblname}} in your
                                  connected database..."
      )
    }
    DBI::dbExecute(
      db_conn,
      glue::glue_sql(
        "DELETE FROM {`tblname`} WHERE season IN ({vals*})",
        vals = rebuild,
        .con = db_conn
      )
    )
    seasons <- valid_seasons |>
      dplyr::filter(.data$season %in% rebuild) |>
      dplyr::pull("season")
    cli::cli_ul(
      "{my_time()} | Starting download of the {length(rebuild)}
                season{?s} {rebuild}"
    )
  } else if (all(rebuild == "NEW")) {
    cli::cli_alert_info(
      "{my_time()} | Can't find the data table {.val {tblname}}
                        in your database. Will load the play by play data from
                        scratch."
    )
    seasons <- valid_seasons |> dplyr::pull("season")
    cli::cli_ul(
      "{my_time()} | Starting download of {length(seasons)} seasons
                between {min(seasons)} and {max(seasons)}..."
    )
  } else {
    seasons <- NULL
    cli::cli_alert_danger(
      "{my_time()} | At least one invalid value passed to argument {.arg force_rebuild}. Please try again with valid input."
    )
  }

  if (!is.null(seasons)) {
    # this function lives in R/utils.R
    write_pbp(seasons, dbConnection = db_conn, tablename = tblname)
  }
}

# this is a helper function to check a list of completed games
# against the games that exist in a database connection
get_missing_games <- function(completed_games, dbConnection, tablename) {
  db_ids <- dplyr::tbl(dbConnection, tablename) |>
    dplyr::select("game_id") |>
    dplyr::filter(.data$game_id != "9999_99_DEF_TYP") |>
    dplyr::distinct() |>
    dplyr::collect() |>
    dplyr::pull("game_id")

  need_scrape <- completed_games[
    !completed_games %in% c(db_ids, "9999_99_DEF_TYP")
  ]

  cli::cli_alert_info(
    "{my_time()} | You have {length(db_ids)} game{?s} and are missing {length(need_scrape)}."
  )
  return(need_scrape)
}


================================================
FILE: R/helper_decode_player_ids.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Function to decode play-by-play player IDs.
# Code Style Guide: styler::tidyverse_style()
################################################################################

#' Decode the player IDs in nflfastR play-by-play data
#'
#' @inheritParams clean_pbp
#' @param fast If `TRUE` the IDs will be decoded with the highly efficient
#' function [decode_ids][gsisdecoder::decode_ids]. If `FALSE` an nflfastR internal
#' function will be used for decoding (this is generally not recommended
#' unless there is a problem with [decode_ids][gsisdecoder::decode_ids],
#' which can take several days to fix on CRAN).
#'
#' @description Takes all columns ending with \code{'player_id'} as well as the
#' variables \code{'passer_id'}, \code{'rusher_id'}, \code{'fantasy_id'},
#' \code{'receiver_id'}, and \code{'id'} of an nflfastR play-by-play data set
#' and decodes the player IDs to the commonly known GSIS ID format 00-00xxxxx.
#'
#' By default, the function uses the highly efficient [decode_ids][gsisdecoder::decode_ids]
#' of the package [`gsisdecoder`](https://cran.r-project.org/package=gsisdecoder).
#' In the unlikely event that there is a problem with this function, an nflfastR
#' internal decoder can be used with the option `fast = FALSE`.
#'
#' The 2022 play by play data introduced new player IDs that can't be decoded
#' with gsisdecoder. In that case, IDs are joined through [nflreadr::load_players].
#'
#' @return The input data frame of the parameter `pbp` with decoded player IDs.
#' @export
#' @examples
#' \donttest{
#' # Decode data frame consisting of some names and ids
#' decode_player_ids(data.frame(
#'   name = c("P.Mahomes", "B.Baldwin", "P.Mahomes", "S.Carl", "J.Jones"),
#'   id = c(
#'     "32013030-2d30-3033-3338-3733fa30c4fa",
#'     NA_character_,
#'     "00-0033873",
#'     NA_character_,
#'     "32013030-2d30-3032-3739-3434d4d3846d"
#'   )
#' ))
#' }
decode_player_ids <- function(pbp, ..., fast = TRUE) {
  # need newer version of nflreadr to use load_players
  rlang::check_installed("nflreadr (>= 1.3.0)", "to decode player IDs.")

  if (isFALSE(fast)) {
    if (nrow(pbp) > 1000 && is_sequential()) {
      cli::cli_alert_info(c(
        "It is recommended to use parallel processing when trying to decode big data frames.",
        "Please consider running {.code future::plan(\"multisession\")}! ",
        "Will go on sequentially..."
      ))
    }
    decode_gsis <- decode_ids
  } else if (isTRUE(fast)) {
    rlang::check_installed("gsisdecoder", "to run fast decoding of player IDs.")
    decode_gsis <- gsisdecoder::decode_ids
  }

  user_message("Decode player ids...", "todo")

  players <- nflreadr::load_players()

  id_vector <- players$gsis_id
  names(id_vector) <- players$esb_id

  ret <- pbp |>
    dplyr::mutate_at(
      dplyr::vars(
        dplyr::any_of(c(
          "passer_id",
          "rusher_id",
          "receiver_id",
          "id",
          "fantasy_id"
        )),
        dplyr::ends_with("player_id")
      ),
      function(id, id_vec = id_vector) {
        chars <- nchar(id)
        dplyr::case_when(
          is.na(chars) ~ NA_character_,
          # this means it's gsis ID. 30 30 2d 30 30 translates to 00-00
          stringr::str_sub(id, 5, 16) == "3030-2d30-30" ~ decode_gsis(id),
          # if it's not gsis, it is likely elias. We drop names to avoid confusion
          nchar(id) == 36 ~ unname(id_vec[extract_elias(
            id,
            decoder = decode_gsis
          )]),
          TRUE ~ id
        )
      }
    )

  message_completed("Decoding of player ids completed", ...)

  ret
}

decode_ids <- function(var) {
  furrr::future_map_chr(var, convert_to_gsis_id)
}

convert_to_gsis_id <- function(new_id) {
  if (is.na(new_id) | stringr::str_length(new_id) != 36) {
    ret <- new_id
  } else {
    to_decode <- new_id |>
      stringr::str_sub(5, -9) |>
      stringr::str_replace_all("-", "")
    hex_raw <- sapply(seq(1, nchar(to_decode), by = 2), function(x) {
      substr(to_decode, x, x + 1)
    })
    ret <- rawToChar(as.raw(strtoi(hex_raw, 16L)))
  }
  return(ret)
}
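
# Illustrative sketch only, not part of the package: the same hex-to-ASCII
# decoding as convert_to_gsis_id(), written with base R for a single
# 36-character ID so each step can be followed. The name sketch_decode_gsis()
# is hypothetical.
sketch_decode_gsis <- function(new_id) {
  # keep characters 5 to 28 (drop the 4-character prefix and 8-character
  # suffix), then remove the dashes to get 20 hex digits
  hex <- gsub("-", "", substr(new_id, 5, nchar(new_id) - 8), fixed = TRUE)
  # split into pairs of hex digits, one pair per ASCII character
  starts <- seq(1, nchar(hex), by = 2)
  bytes <- strtoi(substring(hex, starts, starts + 1), base = 16L)
  rawToChar(as.raw(bytes))
}
# Example: for the first ID in the roxygen example above, the hex digits
# "30302d30303333383733" are the ASCII codes of "00-0033873".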

extract_elias <- function(smart_id, decoder) {
  name_abbr <- decoder(smart_id) |> substr(1, 3)
  id_no <- stringr::str_remove_all(smart_id, "-") |>
    stringr::str_sub(11, 16)
  elias_id <- paste0(name_abbr, id_no)
  elias_id
}


================================================
FILE: R/helper_get_scheds_and_rosters.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Function for loading schedules and rosters from nflfastR repos
# Code Style Guide: styler::tidyverse_style()
################################################################################

get_scheds_and_rosters <- function(season, type) {
  type <- match.arg(type, choices = c("schedule", "roster"))

  switch(
    type,
    "schedule" = nflreadr::load_schedules(season),
    "roster" = nflreadr::load_rosters(season)
  )
}


================================================
FILE: R/helper_scrape_gc.R
================================================
################################################################################
# Author: Ben Baldwin
# Code Style Guide: styler::tidyverse_style()
################################################################################

# Build a tidy version of scraped gamecenter data
# Data exist since 1999
#
# @param gameId Specifies the game

get_pbp_gc <- function(
  gameId,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...
) {
  # testing only
  # gameId = '2013120812'
  # gameId = '2019_01_GB_CHI'
  # gameId = '2009_18_NYJ_CIN'
  # gameId = '2007_01_ARI_SF'
  # gameId = '1999_01_BAL_STL'
  # gameId <- "2000_03_PIT_CLE"

  if (gameId %in% c("2000_03_SD_KC", "2000_06_BUF_MIA", "1999_01_BAL_STL")) {
    cli::cli_abort(
      "You asked for game ID {.val {gameId}}, which is broken. Skipping."
    )
  }

  season <- as.integer(substr(gameId, 1, 4))

  raw <- fetch_raw(game_id = gameId, dir = dir)

  game_json <- raw[[1]]

  date_parse <- names(raw)[1] |> stringr::str_extract(pattern = "[0-9]{8}")
  date_year <- stringr::str_sub(date_parse, 1, 4)
  date_month <- stringr::str_sub(date_parse, 5, 6)
  date_day <- stringr::str_sub(
    date_parse,
    nchar(date_parse) - 1,
    nchar(date_parse)
  )

  week <- as.integer(substr(gameId, 6, 7))
  if (week <= 17) {
    season_type <- "REG"
  } else {
    season_type <- "POST"
  }

  # date_year is a character string, so convert it before comparing numerically
  if (as.integer(date_year) < 1999) {
    cli::cli_abort(
      "You asked for a game from {date_year}, but data only goes back to 1999."
    )
  }

  # excluding last element since it's "crntdrv" and not an actual drive
  drives <- game_json$drives[-length(game_json$drives)]

  # list of plays
  # each play has "players" column which is a list of player stats from the play
  plays <- suppressWarnings(furrr::future_map_dfr(
    seq_along(drives),
    function(x) {
      cbind(
        "drive" = x,
        data.frame(do.call(
          rbind,
          drives[[x]]$plays
        ))[, c(1:11)]
      ) |>
        dplyr::mutate(
          play_id = names(drives[[x]]$plays),
          play_id = as.numeric(.data$play_id)
        )
    }
  ))

  # some 2000 games have play_ids like 2767.375 and 2767.703 which results in
  # duplicates that can be fixed. We save play IDs as numeric first and then
  # check whether or not there are duplicates when we convert them to integer
  # If there are duplicates, we multiply all play IDs by 10 and check again
  # If there are still duplicates, we multiply all play IDs by 100 and so on
  # As soon as play IDs are unique, we save them as integer and go on
  plays$play_id <- uniquify_ids(plays$play_id)
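
  # Illustrative sketch only (never called): one way the scaling described
  # above could be written. The real uniquify_ids() helper is defined elsewhere
  # in the package and may differ; sketch_uniquify_ids() is a hypothetical name.
  sketch_uniquify_ids <- function(ids) {
    # exact duplicates can never be separated by scaling, so fail early
    stopifnot(anyDuplicated(ids) == 0)
    # multiply by 10 until truncating to integer no longer creates duplicates
    while (anyDuplicated(as.integer(ids)) > 0) {
      ids <- ids * 10
    }
    as.integer(ids)
  }
  # e.g. 2767.375 and 2767.703 both truncate to 2767, but after one scaling
  # step they become the distinct integers 27673 and 27677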

  plays$quarter_end <- dplyr::if_else(
    stringr::str_detect(
      plays$desc,
      "(END QUARTER)|(END GAME)|(End of quarter)"
    ),
    1,
    0
  )
  plays$home_team <- game_json$home$abbr
  plays$away_team <- game_json$away$abbr

  # get df with 1 line per statId
  stats <- furrr::future_map_dfr(seq_along(plays$play_id), function(x) {
    dplyr::bind_rows(plays[x, ]$players[[1]], .id = "player_id") |>
      dplyr::mutate(play_id = plays[x, ]$play_id)
  }) |>
    dplyr::mutate(
      sequence = as.numeric(.data$sequence),
      statId = as.numeric(.data$statId),
      play_id = as.character(.data$play_id),
      yards = as.integer(.data$yards)
    ) |>
    dplyr::arrange(.data$play_id, .data$sequence) |>
    dplyr::rename(
      playId = "play_id",
      teamAbbr = "clubcode",
      player.esbId = "player_id",
      player.displayName = "playerName",
      playStatSeq = "sequence"
    )

  pbp_stats <- lapply(unique(stats$playId), sum_play_stats, stats)
  pbp_stats <- data.table::rbindlist(pbp_stats) |> tibble::as_tibble()

  # drive info
  d <- tibble::tibble(drives) |>
    tidyr::unnest_wider(drives) |>
    # dplyr::select(-plays) |>
    tidyr::unnest_wider("start", names_sep = "_") |>
    tidyr::unnest_wider("end", names_sep = "_") |>
    dplyr::mutate(drive = 1:dplyr::n()) |>
    dplyr::rename(
      drive_play_count = "numplays",
      drive_time_of_possession = "postime",
      drive_first_downs = "fds",
      drive_inside20 = "redzone",
      drive_quarter_start = "start_qtr",
      drive_quarter_end = "end_qtr",
      drive_end_transition = "result",
      drive_game_clock_start = "start_time",
      drive_game_clock_end = "end_time",
      drive_start_yard_line = "start_yrdln",
      drive_end_yard_line = "end_yrdln"
    ) |>
    dplyr::mutate(
      drive_inside20 = dplyr::if_else(.data$drive_inside20, 1, 0),
      drive_how_ended_description = .data$drive_end_transition,
      drive_ended_with_score = dplyr::if_else(
        .data$drive_how_ended_description == "Touchdown" |
          .data$drive_how_ended_description == "Field Goal",
        1,
        0
      ),
      drive_start_transition = dplyr::lag(.data$drive_how_ended_description, 1),
      drive_how_started_description = .data$drive_start_transition
    ) |>
    dplyr::select(
      "drive",
      "drive_play_count",
      "drive_time_of_possession",
      "drive_first_downs",
      "drive_inside20",
      "drive_ended_with_score",
      "drive_quarter_start",
      "drive_quarter_end",
      "drive_end_transition",
      "drive_how_ended_description",
      "drive_game_clock_start",
      "drive_game_clock_end",
      "drive_start_yard_line",
      "drive_end_yard_line",
      "drive_start_transition",
      "drive_how_started_description"
    )

  combined <- plays |>
    dplyr::left_join(pbp_stats, by = "play_id") |>
    dplyr::mutate_if(is.logical, as.numeric) |>
    dplyr::mutate_if(is.integer, as.numeric) |>
    dplyr::select(-"players", -"note") |>
    #Weirdly formatted and missing anyway
    dplyr::mutate(note = NA_character_) |>
    dplyr::rename(
      yardline = "yrdln",
      quarter = "qtr",
      play_description = "desc",
      yards_to_go = "ydstogo"
    ) |>
    tidyr::unnest(
      cols = c(
        "sp",
        "quarter",
        "down",
        "time",
        "yardline",
        "yards_to_go",
        "ydsnet",
        "posteam",
        "play_description",
        "note"
      )
    ) |>
    dplyr::left_join(d, by = "drive") |>
    dplyr::mutate(
      posteam_id = .data$posteam,
      game_id = gameId,
      game_year = as.integer(date_year),
      game_month = as.integer(date_month),
      game_date = as.Date(
        paste(date_month, date_day, date_year, sep = "/"),
        format = "%m/%d/%Y"
      ),
      season = season,

      # normalize yardline before parsing side and number (logic taken from nflscrapR)
      yardline = dplyr::if_else(
        .data$yardline == "50",
        "MID 50",
        .data$yardline
      ),
      # fall back to the previous play's yardline when it is empty or the string "NULL"
      yardline = dplyr::if_else(
        nchar(.data$yardline) == 0 |
          .data$yardline == "NULL",
        dplyr::lag(.data$yardline),
        .data$yardline
      ),

      # have to do all this nonsense to make goal_to_go and yardline_side for compatibility with later functions
      yardline_side = furrr::future_map_chr(
        stringr::str_split(.data$yardline, " "),
        function(x) x[1]
      ),
      yardline_number = as.numeric(furrr::future_map_chr(
        stringr::str_split(.data$yardline, " "),
        function(x) x[2]
      )),
      goal_to_go = dplyr::if_else(
        .data$yardline_side != .data$posteam &
          ((.data$yards_to_go == .data$yardline_number) |
            (.data$yards_to_go <= 1 & .data$yardline_number == 1)),
        1,
        0
      ),
      down = as.double(.data$down),
      quarter = as.double(.data$quarter),
      week = week,
      season_type = season_type,
      # missing from older gc data
      drive_real_start_time = NA_character_,
      start_time = NA_character_,
      stadium = NA_character_,
      weather = NA_character_,
      nfl_api_id = NA_character_,
      play_clock = NA_character_,
      play_deleted = NA_real_,
      play_type_nfl = NA_character_,
      drive_yards_penalized = NA_real_,
      end_clock_time = NA_character_,
      end_yard_line = NA_character_,
      order_sequence = NA_real_,
      time_of_day = NA_character_,
      special_teams_play = NA_real_,
      st_play_type = NA_character_,
      # there seems to be no easy way to find the safety scoring team. Will hard code the plays
      # as there are only 6 of them in the game center data
      safety_team = dplyr::case_when(
        .data$safety == 1 &
          .data$game_id == "1999_04_PHI_NYG" &
          .data$play_id == 827 ~ .data$posteam,
        .data$safety == 1 &
          .data$game_id == "2000_03_ATL_CAR" &
          .data$play_id == 3423 ~ .data$posteam,
        .data$safety == 1 &
          .data$game_id == "2000_16_OAK_SEA" &
          .data$play_id == 3590 ~ .data$posteam,
        .data$safety == 1 &
          .data$game_id == "2001_14_DAL_SEA" &
          .data$play_id == 2552 ~ .data$posteam,
        .data$safety == 1 &
          .data$game_id == "2003_03_NO_TEN" &
          .data$play_id == 416 ~ .data$posteam,
        .data$safety == 1 &
          .data$game_id == "2009_08_STL_DET" &
          .data$play_id == 987 ~ .data$posteam,
        .data$safety == 1 & .data$posteam == .data$home_team ~ .data$away_team,
        .data$safety == 1 & .data$posteam == .data$away_team ~ .data$home_team,
        TRUE ~ NA_character_
      )
    ) |>
    dplyr::group_by(.data$drive) |>
    dplyr::mutate(
      drive_play_id_started = min(.data$play_id, na.rm = TRUE),
      drive_play_seq_started = min(.data$play_id, na.rm = TRUE),
      drive_play_id_ended = max(.data$play_id, na.rm = TRUE),
      drive_play_seq_ended = max(.data$play_id, na.rm = TRUE)
    ) |>
    dplyr::ungroup()

  # missing space in side of field breaks parser
  if (gameId %in% c("2000_01_CAR_WAS", "2000_02_NE_NYJ", "2000_03_ATL_CAR")) {
    combined <- combined |>
      dplyr::mutate(
        yardline_number = dplyr::case_when(
          .data$yardline %in% c("WAS20", "NYJ20", "ATL20") ~ 20,
          TRUE ~ .data$yardline_number
        ),
        yardline = dplyr::case_when(
          .data$yardline == "WAS20" ~ "WAS 20",
          .data$yardline == "NYJ20" ~ "NYJ 20",
          .data$yardline == "ATL20" ~ "ATL 20",
          TRUE ~ .data$yardline
        ),
        yardline_side = dplyr::case_when(
          .data$yardline_side == "WAS20" ~ "WAS",
          .data$yardline_side == "NYJ20" ~ "NYJ",
          .data$yardline_side == "ATL20" ~ "ATL",
          TRUE ~ .data$yardline_side
        )
        )
      )
  }
  return(combined)
}


================================================
FILE: R/helper_scrape_nfl.R
================================================
################################################################################
# Author: Sebastian Carl, Ben Baldwin
# Purpose: Function for scraping pbp data from the new NFL web site
# Code Style Guide: styler::tidyverse_style()
################################################################################

# Build a tidy version of scraped NFL data
#
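# @param id Specifies the game
# @param dir Directory passed on to `fetch_raw()`. Defaults to the option
#   "nflfastR.raw_directory".
# @param ... Currently unused in this function.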
get_pbp_nfl <- function(
  id,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...
) {
  # testing
  # id = "2022_01_PHI_DET"
  # id = "2015_01_CAR_JAX"
  # id = "2011_01_NO_GB"

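  # parse the season as integer so the comparisons below are numeric
  # instead of relying on implicit string coercion
  season <- as.integer(substr(id, 1, 4))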
  week <- as.integer(substr(id, 6, 7))

  raw_data <- fetch_raw(game_id = id, dir = dir)

  season_type <- dplyr::case_when(
    season <= 2020 & week <= 17 ~ "REG",
    season >= 2021 & week <= 18 ~ "REG",
    TRUE ~ "POST"
  )

  game_id <- raw_data$data$viewer$gameDetail$id
  home_team <- raw_data$data$viewer$gameDetail$homeTeam$abbreviation
  away_team <- raw_data$data$viewer$gameDetail$visitorTeam$abbreviation
  home_team <- data.table::fcase(
    home_team == "JAC" , "JAX" ,
    home_team == "SD"  , "LAC" ,
    default = home_team
  )
  away_team <- data.table::fcase(
    away_team == "JAC" , "JAX" ,
    away_team == "SD"  , "LAC" ,
    default = away_team
  )

  # if home team and away team are the same, the game is messed up and needs fixing
  if (home_team == away_team) {
    # get correct home and away from the game ID
    id_parts <- stringr::str_split(id, "_")
    away_team <- id_parts[[1]][3]
    home_team <- id_parts[[1]][4]
    bad_game <- 1
  } else {
    bad_game <- 0
  }

  weather <- ifelse(
    is.null(raw_data$data$viewer$gameDetail$weather$shortDescription),
    NA_character_,
    raw_data$data$viewer$gameDetail$weather$shortDescription
  )
  stadium <- ifelse(
    is.null(raw_data$data$viewer$gameDetail$stadium),
    NA_character_,
    raw_data$data$viewer$gameDetail$stadium
  )
  start_time <- raw_data$data$viewer$gameDetail$startTime

  game_info <- tibble::tibble(
    game_id = as.character(game_id),
    home_team,
    away_team,
    weather,
    stadium,
    start_time
  )

  plays <- raw_data$data$viewer$gameDetail$plays |>
    dplyr::mutate(game_id = as.character(game_id))

  # We have this issue https://github.com/nflverse/nflfastR/issues/309 with 2013 postseason games
  # where the driveSequenceNumber in the plays df is NA for all plays. That prevents drive information
  # from being joined.
  # In this case, we compute our own driveSequenceNumber by incrementing a counter depending on the
  # value of driveTimeOfPossession.
  # driveTimeOfPossession will be a constant value during a drive so this should actually be accurate
  if (all(is.na(plays$driveSequenceNumber))) {
    plays <- plays |>
      dplyr::mutate(
        # First, create a trigger for cumsum
        drive_trigger = dplyr::case_when(
          # this is the first play of the first drive
          is.na(dplyr::lag(.data$driveTimeOfPossession)) &
            !is.na(.data$driveTimeOfPossession) ~ 1,
          # if driveTimeOfPossession changes, there is a new drive
          dplyr::lag(.data$driveTimeOfPossession) !=
            .data$driveTimeOfPossession ~ 1,
          TRUE ~ 0
        ),
        # Now create the drive number by accumulating triggers
        driveSequenceNumber = cumsum(.data$drive_trigger),
        # driveSequenceNumber should be NA on plays where driveTimeOfPossession is NA
        driveSequenceNumber = ifelse(
          is.na(.data$driveTimeOfPossession),
          NA_real_,
          .data$driveSequenceNumber
        ),
        # drop the helper
        drive_trigger = NULL
      )
  }

  drives <- raw_data$data$viewer$gameDetail$drives |>
    dplyr::mutate(ydsnet = .data$yards + .data$yardsPenalized) |>
    # these are already in plays
    dplyr::select(
      -"possessionTeam.abbreviation",
      -"possessionTeam.nickName",
      -"possessionTeam.franchise.currentLogo.url"
    ) |>
    janitor::clean_names()
  colnames(drives) <- paste0("drive_", colnames(drives))

  stats <- tidyr::unnest(
    plays |> dplyr::select(-"yards"),
    cols = c("playStats")
  ) |>
    dplyr::mutate(
      yards = as.integer(.data$yards),
      statId = as.numeric(.data$statId),
      team.abbreviation = as.character(.data$team.abbreviation)
    ) |>
    dplyr::rename(
      player.esbId = "gsisPlayer.id",
      player.displayName = "playerName",
      teamAbbr = "team.abbreviation"
    ) |>
    dplyr::select(
      "playId",
      "statId",
      "yards",
      "teamAbbr",
      "player.displayName",
      "player.esbId"
    )

  # there was a penalty on this play so these stat IDs shouldn't exist
  if (id == "2020_10_DEN_LV") {
    stats <- stats |>
      dplyr::filter(!(.data$playId == 979 & .data$statId %in% c(8, 10, 79)))
  }

  pbp_stats <- lapply(unique(stats$playId), sum_play_stats, stats)
  pbp_stats <- data.table::rbindlist(pbp_stats) |> tibble::as_tibble()

  combined <- game_info |>
    dplyr::bind_cols(plays |> dplyr::select(-"playStats", -"game_id")) |>
    dplyr::left_join(
      drives,
      by = c("driveSequenceNumber" = "drive_order_sequence")
    ) |>
    dplyr::left_join(pbp_stats, by = c("playId" = "play_id")) |>
    dplyr::mutate_if(is.logical, as.numeric) |>
    dplyr::mutate_if(is.integer, as.numeric) |>
    dplyr::mutate_if(is.factor, as.character) |>
    # The abbreviations SD <-> LAC and JAC <-> JAX are mixed up in the raw json data
    # to make sure team names match, we normalize the names here
    # We also remove new line characters esp. from desc
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ team_name_fn(.x) |>
        stringr::str_replace_all("[\r\n]", " ") |>
        stringr::str_squish()
    ) |>
    janitor::clean_names() |>
    dplyr::select(
      -"drive_play_count",
      -"drive_time_of_possession",
      -"next_play_type"
    ) |>
    dplyr::rename(
      time = "clock_time",
      play_type_nfl = "play_type",
      posteam = "possession_team_abbreviation",
      yardline = "yard_line",
      sp = "scoring_play",
      drive = "drive_sequence_number",
      nfl_api_id = "game_id",
      drive_play_count = "drive_play_count_2",
      drive_time_of_possession = "drive_time_of_possession_2",
      ydsnet = "drive_ydsnet"
    ) |>
    dplyr::mutate(
      posteam_id = .data$posteam,
      # have to do all this nonsense to make goal_to_go and yardline_side for compatibility with later functions
      yardline_side = str_split_and_extract(.data$yardline, " ", 1),
      yardline_number = as.numeric(str_split_and_extract(
        .data$yardline,
        " ",
        2
      )),
      quarter_end = dplyr::if_else(
        stringr::str_detect(.data$play_description, "END QUARTER"),
        1,
        0
      ),
      game_year = as.integer(season),
      season = as.integer(season),
      # this is only needed for epa and dropped later
      game_month = as.integer(11),
      game_id = id,
      play_description = .data$play_description_with_jersey_numbers,
      week = week,
      season_type = season_type,
      play_clock = as.character(.data$play_clock),
      st_play_type = as.character(.data$st_play_type),

      # fix muffed punt td in JAC game
      td_team = dplyr::if_else(
        id == "2011_14_TB_JAX" & .data$play_id == 1343 & .data$td_team != "JAX",
        'JAX',
        .data$td_team
      ),

      # kickoff return TDs in old JAC games
      td_team = dplyr::if_else(
        id == "2006_14_IND_JAX" &
          .data$play_id == 2078 &
          .data$td_team != "JAX",
        'JAX',
        .data$td_team
      ),
      td_team = dplyr::if_else(
        id == "2007_17_JAX_HOU" &
          .data$play_id %in% c(1907, 2042) &
          .data$td_team != "JAX",
        'HOU',
        .data$td_team
      ),
      td_team = dplyr::if_else(
        id == "2008_09_JAX_CIN" &
          .data$play_id == 3145 &
          .data$td_team != "JAX",
        'JAX',
        .data$td_team
      ),
      td_team = dplyr::if_else(
        id == "2009_15_IND_JAX" &
          .data$play_id == 1088 &
          .data$td_team != "JAX",
        'IND',
        .data$td_team
      ),
      td_team = dplyr::if_else(
        id == "2010_15_JAX_IND" &
          .data$play_id == 3848 &
          .data$td_team != "JAX",
        'IND',
        .data$td_team
      ),

      time = dplyr::case_when(
        id == '2012_04_NO_GB' & .data$play_id == 1085 ~ '3:34',
        id == '2012_16_BUF_MIA' & .data$play_id == 2571 ~ '8:31',
        TRUE ~ .data$time
      ),
      drive_real_start_time = as.character(.data$drive_real_start_time),
      # get the safety team to ensure the correct team gets the points
      # usage of base ifelse is important here for non-scoring games (i.e. early live games)
      safety_team = ifelse(
        .data$safety == 1,
        .data$scoring_team_abbreviation,
        NA_character_
      ),

      # can't trust the goal_to_go variable so we overwrite it here
      goal_to_go = as.numeric(stringr::str_detect(
        tolower(.data$pre_play_by_play),
        "goal"
      ))
    ) |>
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    ) |>
    # Data in 2023 pbp introduced separate "plays" for TV timeouts and two minute warnings
    # These mess up some of our logic. Since they are useless, we remove them here
    dplyr::filter(
      !(is.na(.data$timeout_team) &
        stringr::str_detect(
          tolower(.data$play_description),
          "timeout at|two-minute"
        ))
    ) |>
    # Data in 2024 pbp introduced separate "plays" for injury updates
    # These mess up some of our logic. Since they are useless, we remove them here
    dplyr::filter(
      !(is.na(.data$timeout_team) &
        stringr::str_starts(
          tolower(.data$play_description),
          "\\*\\* injury update:"
        ))
    ) |>
    fix_posteams()

  # fix for games where home_team == away_team and fields are messed up
  if (bad_game == 1) {
    combined <- combined |>
      fix_bad_games()
  }

  # nfl didn't fill in first downs on this game
  if (id == '2018_01_ATL_PHI') {
    combined <- combined |>
      dplyr::mutate(
        first_down_pass = dplyr::if_else(
          .data$pass_attempt == 1 & .data$first_down == 1,
          1,
          .data$first_down_pass
        ),
        first_down_rush = dplyr::if_else(
          .data$rush_attempt == 1 & .data$first_down == 1,
          1,
          .data$first_down_rush
        ),

        third_down_converted = dplyr::if_else(
          .data$first_down == 1 & .data$down == 3,
          1,
          .data$third_down_converted
        ),
        fourth_down_converted = dplyr::if_else(
          .data$first_down == 1 & .data$down == 4,
          1,
          .data$fourth_down_converted
        ),

        third_down_failed = dplyr::if_else(
          .data$first_down == 0 & .data$down == 3,
          1,
          .data$third_down_failed
        ),
        fourth_down_failed = dplyr::if_else(
          .data$first_down == 0 &
            .data$down == 4 &
            .data$play_type_nfl != "FIELD_GOAL" &
            .data$play_type_nfl != "PUNT" &
            .data$play_type_nfl != "PENALTY",
          1,
          .data$fourth_down_failed
        )
      )
  }

  return(combined)
}

# helper function to manually fill in fields for problematic games
fix_bad_games <- function(pbp) {
  fixed <- pbp |>
    dplyr::mutate(
      # if the team with the ball scored, make them the scoring team
      td_team = dplyr::if_else(
        .data$drive_how_ended_description == 'Touchdown' &
          !is.na(.data$td_team),
        .data$posteam,
        .data$td_team
      ),
      # if the defensive team scored, fill in the right team
      td_team = dplyr::if_else(
        # game involving the Jags
        # defensive TD
        .data$drive_how_ended_description != 'Touchdown' &
          !is.na(.data$td_team),
        # if home team has ball, then away team scored, otherwise home team scored
        dplyr::if_else(
          .data$posteam == .data$home_team,
          .data$away_team,
          .data$home_team
        ),
        .data$td_team
      ),
      # fill in return team
      return_team = dplyr::if_else(
        !is.na(.data$return_team),
        dplyr::if_else(
          # if the home team has the ball, return team is away team (this is before we flip posteam for kickoffs)
          .data$posteam == .data$home_team,
          .data$away_team,
          .data$home_team
        ),
        .data$return_team
      ),
      fumble_recovery_1_team = dplyr::if_else(
        !is.na(.data$fumble_recovery_1_team),
        # assign possession based on fumble_lost
        dplyr::case_when(
          .data$fumble_lost == 1 &
            .data$posteam == .data$home_team ~ .data$away_team,
          .data$fumble_lost == 1 &
            .data$posteam == .data$away_team ~ .data$home_team,
          .data$fumble_lost == 0 &
            .data$posteam == .data$home_team ~ .data$home_team,
          .data$fumble_lost == 0 &
            .data$posteam == .data$away_team ~ .data$away_team
        ),
        .data$fumble_recovery_1_team
      ),
      timeout_team = dplyr::if_else(
        # if there's a timeout in the affected seasons
        !is.na(.data$timeout_team),
        # extract from play description
        stringr::str_extract(
          .data$play_description,
          "(?<=Timeout #[1-3] by )[:upper:]+"
        ),
        .data$timeout_team
      )
    )

  return(fixed)
}

fix_posteams <- function(pbp) {
  # Data source switch in 2023 introduced new problems
  # 1. Definition of posteam on kickoffs changed to receiving team. That's our
  #    definition and we swap teams later.
  # 2. Posteam doesn't change on the PAT after defensive TD
  #
  # We adjust both things here
  # We need the variable pre_play_by_play which usually looks like "KC  1-10  NYJ 40"
  if ("pre_play_by_play" %in% names(pbp)) {
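    # Let's be as explicit as possible about what we want to extract from the string
    # It's really only the first valid team abbreviation followed by a blank space
    # (the pattern is an alternation like "^AAA |BBB |CCC(?=[:space:])", so str_extract()
    # returns the leftmost abbreviation and str_trim() drops the trailing space)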
    valid_team_abbrs <- paste(
      nflfastR::teams_colors_logos$team_abbr,
      collapse = " |"
    )
    posteam_regex <- paste0("^", valid_team_abbrs, "(?=[:space:])")

    pbp <- pbp |>
      dplyr::mutate(
        parsed_posteam = stringr::str_extract(
          .data$pre_play_by_play,
          posteam_regex
        ) |>
          stringr::str_trim(),
        posteam = dplyr::case_when(
          stringr::str_detect(
            .data$play_description,
            "^Timeout "
          ) ~ NA_character_,
          is.na(.data$parsed_posteam) ~ .data$posteam,
          .data$play_description == "GAME" ~ NA_character_,
          TRUE ~ .data$parsed_posteam
        ),
        # drop helper
        parsed_posteam = NULL
      )
  }

  pbp
}


================================================
FILE: R/helper_tidy_play_stats.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Create a single row with all play stats of a given play built in the
#          Scraper Functions
# Code Style Guide: styler::tidyverse_style()
################################################################################

# Build a single row for tidy data structure
#
# This is a sub-function for the get_pbp_nfl and get_pbp_gc functions.
#
# @param play_Id (integer) Specifies the play_Id for which the stats should be combined
# @param stats A dataframe including multiple rows for each play_Id holding
# gsis stat ids and stats
sum_play_stats <- function(play_Id, stats) {
  play_stats <- stats[stats$playId == play_Id, ]

  row <- c("play_id" = as.integer(play_Id), tidy_play_stats_row)

  for (index in seq_along(play_stats$playId)) {
    stat_id <- play_stats$statId[index]
    if (stat_id == 2) {
      row$punt_blocked <- 1
      row$punt_attempt <- 1
      row$kick_distance <- play_stats$yards[index]
      row$punter_player_id <- play_stats$player.esbId[index]
      row$punter_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 3) {
      row$first_down_rush <- 1
    } else if (stat_id == 4) {
      row$first_down_pass <- 1
    } else if (stat_id == 5) {
      row$first_down_penalty <- 1
    } else if (stat_id == 6) {
      row$third_down_converted <- 1
    } else if (stat_id == 7) {
      row$third_down_failed <- 1
    } else if (stat_id == 8) {
      row$fourth_down_converted <- 1
    } else if (stat_id == 9) {
      row$fourth_down_failed <- 1
    } else if (stat_id == 10) {
      row$rush_attempt <- 1
      row$rusher_player_id <- play_stats$player.esbId[index]
      row$rusher_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$rushing_yards <- play_stats$yards[index]
    } else if (stat_id == 11) {
      row$rush_attempt <- 1
      row$touchdown <- 1
      row$first_down_rush <- 1
      row$rush_touchdown <- 1
      row$rusher_player_id <- play_stats$player.esbId[index]
      row$rusher_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$rushing_yards <- play_stats$yards[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 12) {
      row$rush_attempt <- 1
      row$lateral_rush <- 1
      row$lateral_rusher_player_id <- play_stats$player.esbId[index]
      row$lateral_rusher_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$lateral_rushing_yards <- play_stats$yards[index]
    } else if (stat_id == 13) {
      row$rush_attempt <- 1
      row$touchdown <- 1
      row$rush_touchdown <- 1
      row$lateral_rush <- 1
      row$lateral_rusher_player_id <- play_stats$player.esbId[index]
      row$lateral_rusher_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$lateral_rushing_yards <- play_stats$yards[index]
    } else if (stat_id == 14) {
      row$incomplete_pass <- 1
      row$pass_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 15) {
      row$pass_attempt <- 1
      row$complete_pass <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$passing_yards <- play_stats$yards[index]
    } else if (stat_id == 16) {
      row$pass_attempt <- 1
      row$touchdown <- 1
      row$pass_touchdown <- 1
      row$complete_pass <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$passing_yards <- play_stats$yards[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 19) {
      row$interception <- 1
      row$pass_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 20) {
      row$pass_attempt <- 1
      row$sack <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
    } else if (stat_id == 21) {
      row$pass_attempt <- 1
      row$complete_pass <- 1
      row$receiver_player_id <- play_stats$player.esbId[index]
      row$receiver_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$receiving_yards <- play_stats$yards[index]
    } else if (stat_id == 22) {
      row$pass_attempt <- 1
      row$touchdown <- 1
      row$pass_touchdown <- 1
      row$complete_pass <- 1
      row$receiver_player_id <- play_stats$player.esbId[index]
      row$receiver_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$receiving_yards <- play_stats$yards[index]
    } else if (stat_id == 23) {
      row$pass_attempt <- 1
      row$complete_pass <- 1
      row$lateral_reception <- 1
      row$lateral_receiver_player_id <- play_stats$player.esbId[index]
      row$lateral_receiver_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$lateral_receiving_yards <- play_stats$yards[index]
    } else if (stat_id == 24) {
      row$pass_attempt <- 1
      row$touchdown <- 1
      row$pass_touchdown <- 1
      row$complete_pass <- 1
      row$lateral_reception <- 1
      row$lateral_receiver_player_id <- play_stats$player.esbId[index]
      row$lateral_receiver_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$yards_gained <- play_stats$yards[index]
      row$lateral_receiving_yards <- play_stats$yards[index]
    } else if (stat_id == 25) {
      row$pass_attempt <- 1
      row$interception_player_id <- play_stats$player.esbId[index]
      row$interception_player_name <- play_stats$player.displayName[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 26) {
      row$pass_attempt <- 1
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$interception_player_id <- play_stats$player.esbId[index]
      row$interception_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 27) {
      row$pass_attempt <- 1
      row$lateral_return <- 1
      row$lateral_interception_player_id <- play_stats$player.esbId[index]
      row$lateral_interception_player_name <- play_stats$player.displayName[
        index
      ]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 28) {
      row$pass_attempt <- 1
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$lateral_return <- 1
      row$lateral_interception_player_id <- play_stats$player.esbId[index]
      row$lateral_interception_player_name <- play_stats$player.displayName[
        index
      ]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 29) {
      row$punt_attempt <- 1
      row$punter_player_id <- play_stats$player.esbId[index]
      row$punter_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 30) {
      # yards always zero for stat_id 30 (punt inside 20) so we don't write kick_distance here
      row$punt_inside_twenty <- 1
      row$punt_attempt <- 1
      row$punter_player_id <- play_stats$player.esbId[index]
      row$punter_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 31) {
      row$punt_in_endzone <- 1
      row$punt_attempt <- 1
      row$punter_player_id <- play_stats$player.esbId[index]
      row$punter_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 32) {
      row$punt_attempt <- 1
      row$kick_distance <- play_stats$yards[index]
      row$punter_player_id <- play_stats$player.esbId[index]
      row$punter_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 33) {
      row$punt_attempt <- 1
      row$punt_returner_player_id <- play_stats$player.esbId[index]
      row$punt_returner_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 34) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$punt_attempt <- 1
      row$punt_returner_player_id <- play_stats$player.esbId[index]
      row$punt_returner_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 35) {
      row$punt_attempt <- 1
      row$lateral_return <- 1
      row$lateral_punt_returner_player_id <- play_stats$player.esbId[index]
      row$lateral_punt_returner_player_name <- play_stats$player.displayName[
        index
      ]
      row$return_yards <- play_stats$yards[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 36) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$punt_attempt <- 1
      row$lateral_return <- 1
      row$lateral_punt_returner_player_id <- play_stats$player.esbId[index]
      row$lateral_punt_returner_player_name <- play_stats$player.displayName[
        index
      ]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 37) {
      row$punt_out_of_bounds <- 1
      row$punt_attempt <- 1
      row$return_yards <- 0
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 38) {
      row$punt_downed <- 1
      row$punt_attempt <- 1
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 39) {
      row$punt_fair_catch <- 1
      row$punt_attempt <- 1
      row$punt_returner_player_id <- play_stats$player.esbId[index]
      row$punt_returner_player_name <- play_stats$player.displayName[index]
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 40) {
      row$punt_attempt <- 1
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 41) {
      row$kickoff_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 42) {
      # yards is always zero for stat_id 42, so kick_distance is not written here
      row$kickoff_inside_twenty <- 1
      row$kickoff_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 43) {
      row$kickoff_in_endzone <- 1
      row$kickoff_attempt <- 1
      row$kick_distance <- play_stats$yards[index]
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 44) {
      row$kickoff_attempt <- 1
      row$kick_distance <- play_stats$yards[index]
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 45) {
      row$kickoff_attempt <- 1
      row$kickoff_returner_player_id <- play_stats$player.esbId[index]
      row$kickoff_returner_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 46) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$kickoff_attempt <- 1
      row$kickoff_returner_player_id <- play_stats$player.esbId[index]
      row$kickoff_returner_player_name <- play_stats$player.displayName[index]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 47) {
      row$kickoff_attempt <- 1
      row$lateral_return <- 1
      row$lateral_kickoff_returner_player_id <- play_stats$player.esbId[index]
      row$lateral_kickoff_returner_player_name <- play_stats$player.displayName[
        index
      ]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 48) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$kickoff_attempt <- 1
      row$lateral_return <- 1
      row$lateral_kickoff_returner_player_id <- play_stats$player.esbId[index]
      row$lateral_kickoff_returner_player_name <- play_stats$player.displayName[
        index
      ]
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$return_yards <- play_stats$yards[index]
      row$return_team <- play_stats$teamAbbr[index]
      row$return_penalty_fix <- 1
    } else if (stat_id == 49) {
      row$kickoff_out_of_bounds <- 1
      row$kickoff_attempt <- 1
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 50) {
      row$kickoff_fair_catch <- 1
      row$kickoff_attempt <- 1
      row$kickoff_returner_player_id <- play_stats$player.esbId[index]
      row$kickoff_returner_player_name <- play_stats$player.displayName[index]
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 51) {
      row$kickoff_attempt <- 1
      row$return_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 52) {
      row$fumble_forced <- 1
      row$fumble <- 1
      row$fumbled_1_player_id <-
        if_else(
          is.na(row$fumbled_1_player_id),
          play_stats$player.esbId[index],
          row$fumbled_1_player_id
        )
      row$fumbled_1_player_name <-
        if_else(
          is.na(row$fumbled_1_player_name),
          play_stats$player.displayName[index],
          row$fumbled_1_player_name
        )
      row$fumbled_1_team <-
        if_else(
          is.na(row$fumbled_1_team),
          play_stats$teamAbbr[index],
          row$fumbled_1_team
        )
      row$fumbled_2_player_id <-
        if_else(
          is.na(row$fumbled_2_player_id) &
            row$fumbled_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumbled_2_player_id
        )
      row$fumbled_2_player_name <-
        if_else(
          is.na(row$fumbled_2_player_name) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumbled_2_player_name
        )
      row$fumbled_2_team <-
        if_else(
          is.na(row$fumbled_2_team) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          # Compare player names instead of teams: multiple players of the
          # same team are possible, so the team check below stays disabled.
          # row$fumbled_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumbled_2_team
        )
    } else if (stat_id == 53) {
      row$fumble_not_forced <- 1
      row$fumble <- 1
      row$fumbled_1_player_id <-
        if_else(
          is.na(row$fumbled_1_player_id),
          play_stats$player.esbId[index],
          row$fumbled_1_player_id
        )
      row$fumbled_1_player_name <-
        if_else(
          is.na(row$fumbled_1_player_name),
          play_stats$player.displayName[index],
          row$fumbled_1_player_name
        )
      row$fumbled_1_team <-
        if_else(
          is.na(row$fumbled_1_team),
          play_stats$teamAbbr[index],
          row$fumbled_1_team
        )
      row$fumbled_2_player_id <-
        if_else(
          is.na(row$fumbled_2_player_id) &
            row$fumbled_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumbled_2_player_id
        )
      row$fumbled_2_player_name <-
        if_else(
          is.na(row$fumbled_2_player_name) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumbled_2_player_name
        )
      row$fumbled_2_team <-
        if_else(
          is.na(row$fumbled_2_team) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          # row$fumbled_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumbled_2_team
        )
    } else if (stat_id == 54) {
      row$fumble_out_of_bounds <- 1
      row$fumble <- 1
      row$fumbled_1_player_id <-
        if_else(
          is.na(row$fumbled_1_player_id),
          play_stats$player.esbId[index],
          row$fumbled_1_player_id
        )
      row$fumbled_1_player_name <-
        if_else(
          is.na(row$fumbled_1_player_name),
          play_stats$player.displayName[index],
          row$fumbled_1_player_name
        )
      row$fumbled_1_team <-
        if_else(
          is.na(row$fumbled_1_team),
          play_stats$teamAbbr[index],
          row$fumbled_1_team
        )
      row$fumbled_2_player_id <-
        if_else(
          is.na(row$fumbled_2_player_id) &
            row$fumbled_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumbled_2_player_id
        )
      row$fumbled_2_player_name <-
        if_else(
          is.na(row$fumbled_2_player_name) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumbled_2_player_name
        )
      row$fumbled_2_team <-
        if_else(
          is.na(row$fumbled_2_team) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          # row$fumbled_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumbled_2_team
        )
    } else if (stat_id == 55) {
      row$fumble <- 1
      row$fumble_recovery_1_player_id <-
        if_else(
          is.na(row$fumble_recovery_1_player_id),
          play_stats$player.esbId[index],
          row$fumble_recovery_1_player_id
        )
      row$fumble_recovery_1_player_name <-
        if_else(
          is.na(row$fumble_recovery_1_player_name),
          play_stats$player.displayName[index],
          row$fumble_recovery_1_player_name
        )
      row$fumble_recovery_1_team <-
        if_else(
          is.na(row$fumble_recovery_1_team),
          play_stats$teamAbbr[index],
          row$fumble_recovery_1_team
        )
      row$fumble_recovery_1_yards <-
        if_else(
          is.na(row$fumble_recovery_1_yards),
          play_stats$yards[index],
          row$fumble_recovery_1_yards
        )
      row$fumble_recovery_2_player_id <-
        if_else(
          is.na(row$fumble_recovery_2_player_id) &
            row$fumble_recovery_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumble_recovery_2_player_id
        )
      row$fumble_recovery_2_player_name <-
        if_else(
          is.na(row$fumble_recovery_2_player_name) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumble_recovery_2_player_name
        )
      row$fumble_recovery_2_team <-
        if_else(
          is.na(row$fumble_recovery_2_team) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumble_recovery_2_team
        )
      row$fumble_recovery_2_yards <-
        if_else(
          is.na(row$fumble_recovery_2_yards) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_yards != play_stats$yards[index],
          play_stats$yards[index],
          row$fumble_recovery_2_yards
        )
    } else if (stat_id == 56) {
      row$touchdown <- 1
      row$fumble <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$fumble_recovery_1_player_id <-
        if_else(
          is.na(row$fumble_recovery_1_player_id),
          play_stats$player.esbId[index],
          row$fumble_recovery_1_player_id
        )
      row$fumble_recovery_1_player_name <-
        if_else(
          is.na(row$fumble_recovery_1_player_name),
          play_stats$player.displayName[index],
          row$fumble_recovery_1_player_name
        )
      row$fumble_recovery_1_team <-
        if_else(
          is.na(row$fumble_recovery_1_team),
          play_stats$teamAbbr[index],
          row$fumble_recovery_1_team
        )
      row$fumble_recovery_1_yards <-
        if_else(
          is.na(row$fumble_recovery_1_yards),
          play_stats$yards[index],
          row$fumble_recovery_1_yards
        )
      row$fumble_recovery_2_player_id <-
        if_else(
          is.na(row$fumble_recovery_2_player_id) &
            row$fumble_recovery_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumble_recovery_2_player_id
        )
      row$fumble_recovery_2_player_name <-
        if_else(
          is.na(row$fumble_recovery_2_player_name) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumble_recovery_2_player_name
        )
      row$fumble_recovery_2_team <-
        if_else(
          is.na(row$fumble_recovery_2_team) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumble_recovery_2_team
        )
      row$fumble_recovery_2_yards <-
        if_else(
          is.na(row$fumble_recovery_2_yards) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_yards != play_stats$yards[index],
          play_stats$yards[index],
          row$fumble_recovery_2_yards
        )
    } else if (stat_id == 57) {
      row$fumble <- 1
      row$lateral_recovery <- 1
    } else if (stat_id == 58) {
      row$touchdown <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$fumble <- 1
      row$lateral_recovery <- 1
    } else if (stat_id == 59) {
      row$fumble <- 1
      row$fumble_recovery_1_player_id <-
        if_else(
          is.na(row$fumble_recovery_1_player_id),
          play_stats$player.esbId[index],
          row$fumble_recovery_1_player_id
        )
      row$fumble_recovery_1_player_name <-
        if_else(
          is.na(row$fumble_recovery_1_player_name),
          play_stats$player.displayName[index],
          row$fumble_recovery_1_player_name
        )
      row$fumble_recovery_1_team <-
        if_else(
          is.na(row$fumble_recovery_1_team),
          play_stats$teamAbbr[index],
          row$fumble_recovery_1_team
        )
      row$fumble_recovery_1_yards <-
        if_else(
          is.na(row$fumble_recovery_1_yards),
          play_stats$yards[index],
          row$fumble_recovery_1_yards
        )
      row$fumble_recovery_2_player_id <-
        if_else(
          is.na(row$fumble_recovery_2_player_id) &
            row$fumble_recovery_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumble_recovery_2_player_id
        )
      row$fumble_recovery_2_player_name <-
        if_else(
          is.na(row$fumble_recovery_2_player_name) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumble_recovery_2_player_name
        )
      row$fumble_recovery_2_team <-
        if_else(
          is.na(row$fumble_recovery_2_team) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumble_recovery_2_team
        )
      row$fumble_recovery_2_yards <-
        if_else(
          is.na(row$fumble_recovery_2_yards) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_yards != play_stats$yards[index],
          play_stats$yards[index],
          row$fumble_recovery_2_yards
        )
    } else if (stat_id == 60) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$fumble <- 1
      row$fumble_recovery_1_player_id <-
        if_else(
          is.na(row$fumble_recovery_1_player_id),
          play_stats$player.esbId[index],
          row$fumble_recovery_1_player_id
        )
      row$fumble_recovery_1_player_name <-
        if_else(
          is.na(row$fumble_recovery_1_player_name),
          play_stats$player.displayName[index],
          row$fumble_recovery_1_player_name
        )
      row$fumble_recovery_1_team <-
        if_else(
          is.na(row$fumble_recovery_1_team),
          play_stats$teamAbbr[index],
          row$fumble_recovery_1_team
        )
      row$fumble_recovery_1_yards <-
        if_else(
          is.na(row$fumble_recovery_1_yards),
          play_stats$yards[index],
          row$fumble_recovery_1_yards
        )
      row$fumble_recovery_2_player_id <-
        if_else(
          is.na(row$fumble_recovery_2_player_id) &
            row$fumble_recovery_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumble_recovery_2_player_id
        )
      row$fumble_recovery_2_player_name <-
        if_else(
          is.na(row$fumble_recovery_2_player_name) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumble_recovery_2_player_name
        )
      row$fumble_recovery_2_team <-
        if_else(
          is.na(row$fumble_recovery_2_team) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumble_recovery_2_team
        )
      row$fumble_recovery_2_yards <-
        if_else(
          is.na(row$fumble_recovery_2_yards) &
            row$fumble_recovery_1_player_name !=
              play_stats$player.displayName[index],
          # row$fumble_recovery_1_yards != play_stats$yards[index],
          play_stats$yards[index],
          row$fumble_recovery_2_yards
        )
    } else if (stat_id == 61) {
      row$fumble <- 1
      row$lateral_recovery <- 1
    } else if (stat_id == 62) {
      row$touchdown <- 1
      row$return_touchdown <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$fumble <- 1
      row$lateral_recovery <- 1
    } else if (stat_id == 63) {
      # intentional no-op: stat_id 63 sets no fields on the row
      NULL
    } else if (stat_id == 64) {
      row$touchdown <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 68) {
      row$timeout <- 1
      row$timeout_team <- play_stats$teamAbbr[index]
    } else if (stat_id == 69) {
      row$field_goal_missed <- 1
      row$field_goal_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 70) {
      row$field_goal_made <- 1
      row$field_goal_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 71) {
      row$field_goal_blocked <- 1
      row$field_goal_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
      row$kick_distance <- play_stats$yards[index]
    } else if (stat_id == 72) {
      row$extra_point_good <- 1
      row$extra_point_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 73) {
      row$extra_point_failed <- 1
      row$extra_point_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 74) {
      row$extra_point_blocked <- 1
      row$extra_point_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 75) {
      row$two_point_rush_good <- 1
      row$rush_attempt <- 1
      row$two_point_attempt <- 1
      row$rusher_player_id <- play_stats$player.esbId[index]
      row$rusher_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 76) {
      row$two_point_rush_failed <- 1
      row$rush_attempt <- 1
      row$two_point_attempt <- 1
      row$rusher_player_id <- play_stats$player.esbId[index]
      row$rusher_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 77) {
      row$two_point_pass_good <- 1
      row$pass_attempt <- 1
      row$two_point_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 78) {
      row$two_point_pass_failed <- 1
      row$pass_attempt <- 1
      row$two_point_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 79) {
      row$solo_tackle <- 1
      row$solo_tackle_1_player_id <-
        if_else(
          is.na(row$solo_tackle_1_player_id),
          play_stats$player.esbId[index],
          row$solo_tackle_1_player_id
        )
      row$solo_tackle_1_player_name <-
        if_else(
          is.na(row$solo_tackle_1_player_name),
          play_stats$player.displayName[index],
          row$solo_tackle_1_player_name
        )
      row$solo_tackle_1_team <-
        if_else(
          is.na(row$solo_tackle_1_team),
          play_stats$teamAbbr[index],
          row$solo_tackle_1_team
        )
      row$solo_tackle_2_player_id <-
        if_else(
          is.na(row$solo_tackle_2_player_id) &
            row$solo_tackle_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$solo_tackle_2_player_id
        )
      row$solo_tackle_2_player_name <-
        if_else(
          is.na(row$solo_tackle_2_player_name) &
            row$solo_tackle_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$solo_tackle_2_player_name
        )
      row$solo_tackle_2_team <-
        if_else(
          is.na(row$solo_tackle_2_team) &
            row$solo_tackle_1_player_name !=
              play_stats$player.displayName[index],
          # row$solo_tackle_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$solo_tackle_2_team
        )
    } else if (stat_id == 80) {
      row$tackle_with_assist <- 1
      row$tackle_with_assist_1_player_id <-
        if_else(
          is.na(row$tackle_with_assist_1_player_id),
          play_stats$player.esbId[index],
          row$tackle_with_assist_1_player_id
        )
      row$tackle_with_assist_1_player_name <-
        if_else(
          is.na(row$tackle_with_assist_1_player_name),
          play_stats$player.displayName[index],
          row$tackle_with_assist_1_player_name
        )
      row$tackle_with_assist_1_team <-
        if_else(
          is.na(row$tackle_with_assist_1_team),
          play_stats$teamAbbr[index],
          row$tackle_with_assist_1_team
        )
      row$tackle_with_assist_2_player_id <-
        if_else(
          is.na(row$tackle_with_assist_2_player_id) &
            row$tackle_with_assist_1_player_id !=
              play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$tackle_with_assist_2_player_id
        )
      row$tackle_with_assist_2_player_name <-
        if_else(
          is.na(row$tackle_with_assist_2_player_name) &
            row$tackle_with_assist_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$tackle_with_assist_2_player_name
        )
      row$tackle_with_assist_2_team <-
        if_else(
          is.na(row$tackle_with_assist_2_team) &
            row$tackle_with_assist_1_player_name !=
              play_stats$player.displayName[index],
          # row$tackle_with_assist_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$tackle_with_assist_2_team
        )
    } else if (stat_id == 82) {
      # assist tackle (original note: "=81"; no separate stat_id 81 branch
      # precedes this one)
      row$assist_tackle <- 1
      row$assist_tackle_1_player_id <-
        if_else(
          is.na(row$assist_tackle_1_player_id),
          play_stats$player.esbId[index],
          row$assist_tackle_1_player_id
        )
      row$assist_tackle_1_player_name <-
        if_else(
          is.na(row$assist_tackle_1_player_name),
          play_stats$player.displayName[index],
          row$assist_tackle_1_player_name
        )
      row$assist_tackle_1_team <-
        if_else(
          is.na(row$assist_tackle_1_team),
          play_stats$teamAbbr[index],
          row$assist_tackle_1_team
        )
      row$assist_tackle_2_player_id <-
        if_else(
          is.na(row$assist_tackle_2_player_id) &
            row$assist_tackle_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$assist_tackle_2_player_id
        )
      row$assist_tackle_2_player_name <-
        if_else(
          is.na(row$assist_tackle_2_player_name) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$assist_tackle_2_player_name
        )
      row$assist_tackle_2_team <-
        if_else(
          is.na(row$assist_tackle_2_team) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index],
          # row$assist_tackle_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$assist_tackle_2_team
        )
      row$assist_tackle_3_player_id <-
        if_else(
          (is.na(row$assist_tackle_3_player_id) &
            row$assist_tackle_1_player_id != play_stats$player.esbId[index] &
            row$assist_tackle_2_player_id != play_stats$player.esbId[index]),
          play_stats$player.esbId[index],
          row$assist_tackle_3_player_id
        )
      row$assist_tackle_3_player_name <-
        if_else(
          (is.na(row$assist_tackle_3_player_name) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_2_player_name !=
              play_stats$player.displayName[index]),
          play_stats$player.displayName[index],
          row$assist_tackle_3_player_name
        )
      row$assist_tackle_3_team <-
        if_else(
          (is.na(row$assist_tackle_3_team) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_2_player_name !=
              play_stats$player.displayName[index]),
          # row$assist_tackle_1_team != play_stats$teamAbbr[index] &
          #  row$assist_tackle_2_team != play_stats$teamAbbr[index]),
          play_stats$teamAbbr[index],
          row$assist_tackle_3_team
        )
      row$assist_tackle_4_player_id <-
        if_else(
          (is.na(row$assist_tackle_4_player_id) &
            row$assist_tackle_1_player_id != play_stats$player.esbId[index] &
            row$assist_tackle_2_player_id != play_stats$player.esbId[index] &
            row$assist_tackle_3_player_id != play_stats$player.esbId[index]),
          play_stats$player.esbId[index],
          row$assist_tackle_4_player_id
        )
      row$assist_tackle_4_player_name <-
        if_else(
          (is.na(row$assist_tackle_4_player_name) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_2_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_3_player_name !=
              play_stats$player.displayName[index]),
          play_stats$player.displayName[index],
          row$assist_tackle_4_player_name
        )
      row$assist_tackle_4_team <-
        if_else(
          (is.na(row$assist_tackle_4_team) &
            row$assist_tackle_1_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_2_player_name !=
              play_stats$player.displayName[index] &
            row$assist_tackle_3_player_name !=
              play_stats$player.displayName[index]),
          # row$assist_tackle_1_team != play_stats$teamAbbr[index] &
          #  row$assist_tackle_2_team != play_stats$teamAbbr[index] &
          #  row$assist_tackle_3_team != play_stats$teamAbbr[index]),
          play_stats$teamAbbr[index],
          row$assist_tackle_4_team
        )
    } else if (stat_id == 83) {
      row$sack <- 1
      row$sack_player_id <- play_stats$player.esbId[index]
      row$sack_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 84) {
      row$sack <- 1
      row$assist_tackle <- 1
      row$half_sack_1_player_id <-
        if_else(
          is.na(row$half_sack_1_player_id),
          play_stats$player.esbId[index],
          row$half_sack_1_player_id
        )
      row$half_sack_1_player_name <-
        if_else(
          is.na(row$half_sack_1_player_name),
          play_stats$player.displayName[index],
          row$half_sack_1_player_name
        )
      row$half_sack_2_player_id <-
        if_else(
          is.na(row$half_sack_2_player_id) &
            row$half_sack_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$half_sack_2_player_id
        )
      row$half_sack_2_player_name <-
        if_else(
          is.na(row$half_sack_2_player_name) &
            row$half_sack_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$half_sack_2_player_name
        )
    } else if (stat_id == 85) {
      row$pass_defense_1_player_id <-
        if_else(
          is.na(row$pass_defense_1_player_id),
          play_stats$player.esbId[index],
          row$pass_defense_1_player_id
        )
      row$pass_defense_1_player_name <-
        if_else(
          is.na(row$pass_defense_1_player_name),
          play_stats$player.displayName[index],
          row$pass_defense_1_player_name
        )
      row$pass_defense_2_player_id <-
        if_else(
          is.na(row$pass_defense_2_player_id) &
            row$pass_defense_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$pass_defense_2_player_id
        )
      row$pass_defense_2_player_name <-
        if_else(
          is.na(row$pass_defense_2_player_name) &
            row$pass_defense_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$pass_defense_2_player_name
        )
    } else if (stat_id == 86) {
      row$punt_attempt <- 1
      row$blocked_player_id <- play_stats$player.esbId[index]
      row$blocked_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 87) {
      row$blocked_player_id <- play_stats$player.esbId[index]
      row$blocked_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 88) {
      row$field_goal_attempt <- 1
      row$blocked_player_id <- play_stats$player.esbId[index]
      row$blocked_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 89) {
      row$safety <- 1
      row$safety_player_id <- play_stats$player.esbId[index]
      row$safety_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 91) {
      row$fumble <- 1
      row$forced_fumble_player_1_player_id <-
        if_else(
          is.na(row$forced_fumble_player_1_player_id),
          play_stats$player.esbId[index],
          row$forced_fumble_player_1_player_id
        )
      row$forced_fumble_player_1_player_name <-
        if_else(
          is.na(row$forced_fumble_player_1_player_name),
          play_stats$player.displayName[index],
          row$forced_fumble_player_1_player_name
        )
      row$forced_fumble_player_1_team <-
        if_else(
          is.na(row$forced_fumble_player_1_team),
          play_stats$teamAbbr[index],
          row$forced_fumble_player_1_team
        )
      row$forced_fumble_player_2_player_id <-
        if_else(
          is.na(row$forced_fumble_player_2_player_id) &
            row$forced_fumble_player_1_player_id !=
              play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$forced_fumble_player_2_player_id
        )
      row$forced_fumble_player_2_player_name <-
        if_else(
          is.na(row$forced_fumble_player_2_player_name) &
            row$forced_fumble_player_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$forced_fumble_player_2_player_name
        )
      row$forced_fumble_player_2_team <-
        if_else(
          is.na(row$forced_fumble_player_2_team) &
            row$forced_fumble_player_1_player_name !=
              play_stats$player.displayName[index],
          # row$forced_fumble_player_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$forced_fumble_player_2_team
        )
    } else if (stat_id == 93) {
      row$penalty <- 1
      row$penalty_player_id <- play_stats$player.esbId[index]
      row$penalty_player_name <- play_stats$player.displayName[index]
      row$penalty_team <- play_stats$teamAbbr[index]
      row$penalty_yards <- play_stats$yards[index]
    } else if (stat_id == 95) {
      row$tackled_for_loss <- 1
    } else if (stat_id == 96) {
      row$extra_point_safety <- 1
      row$extra_point_attempt <- 1
    } else if (stat_id == 99) {
      row$two_point_rush_safety <- 1
      row$rush_attempt <- 1
      row$two_point_attempt <- 1
      row$rusher_player_id <- play_stats$player.esbId[index]
      row$rusher_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 100) {
      row$two_point_pass_safety <- 1
      row$pass_attempt <- 1
      row$two_point_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 102) {
      row$kickoff_downed <- 1
      row$kickoff_attempt <- 1
    } else if (stat_id == 103) {
      row$lateral_sack_player_id <- play_stats$player.esbId[index]
      row$lateral_sack_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 104) {
      row$two_point_pass_reception_good <- 1
      row$pass_attempt <- 1
      row$two_point_attempt <- 1
      row$receiver_player_id <- play_stats$player.esbId[index]
      row$receiver_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 105) {
      row$two_point_pass_reception_failed <- 1
      row$pass_attempt <- 1
      row$two_point_attempt <- 1
      row$receiver_player_id <- play_stats$player.esbId[index]
      row$receiver_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 106) {
      row$fumble_lost <- 1
      row$fumble <- 1
      row$fumbled_1_player_id <-
        if_else(
          is.na(row$fumbled_1_player_id),
          play_stats$player.esbId[index],
          row$fumbled_1_player_id
        )
      row$fumbled_1_player_name <-
        if_else(
          is.na(row$fumbled_1_player_name),
          play_stats$player.displayName[index],
          row$fumbled_1_player_name
        )
      row$fumbled_1_team <-
        if_else(
          is.na(row$fumbled_1_team),
          play_stats$teamAbbr[index],
          row$fumbled_1_team
        )
      row$fumbled_2_player_id <-
        if_else(
          is.na(row$fumbled_2_player_id) &
            row$fumbled_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$fumbled_2_player_id
        )
      row$fumbled_2_player_name <-
        if_else(
          is.na(row$fumbled_2_player_name) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$fumbled_2_player_name
        )
      row$fumbled_2_team <-
        if_else(
          is.na(row$fumbled_2_team) &
            row$fumbled_1_player_name != play_stats$player.displayName[index],
          # row$fumbled_1_team != play_stats$teamAbbr[index],
          play_stats$teamAbbr[index],
          row$fumbled_2_team
        )
    } else if (stat_id == 107) {
      row$own_kickoff_recovery <- 1
      row$kickoff_attempt <- 1
      row$own_kickoff_recovery_player_id <- play_stats$player.esbId[index]
      row$own_kickoff_recovery_player_name <- play_stats$player.displayName[
        index
      ]
    } else if (stat_id == 108) {
      row$own_kickoff_recovery_td <- 1
      row$touchdown <- 1
      row$td_team <- play_stats$teamAbbr[index]
      row$td_player_id <- play_stats$player.esbId[index]
      row$td_player_name <- play_stats$player.displayName[index]
      row$kickoff_attempt <- 1
      row$own_kickoff_recovery_player_id <- play_stats$player.esbId[index]
      row$own_kickoff_recovery_player_name <- play_stats$player.displayName[
        index
      ]
    } else if (stat_id == 110) {
      row$qb_hit <- 1
      row$qb_hit_1_player_id <-
        if_else(
          is.na(row$qb_hit_1_player_id),
          play_stats$player.esbId[index],
          row$qb_hit_1_player_id
        )
      row$qb_hit_1_player_name <-
        if_else(
          is.na(row$qb_hit_1_player_name),
          play_stats$player.displayName[index],
          row$qb_hit_1_player_name
        )
      row$qb_hit_2_player_id <-
        if_else(
          is.na(row$qb_hit_2_player_id) &
            row$qb_hit_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$qb_hit_2_player_id
        )
      row$qb_hit_2_player_name <-
        if_else(
          is.na(row$qb_hit_2_player_name) &
            row$qb_hit_1_player_name != play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$qb_hit_2_player_name
        )
    } else if (stat_id == 111) {
      row$pass_attempt <- 1
      row$complete_pass <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
      row$air_yards <- play_stats$yards[index]
    } else if (stat_id == 112) {
      row$pass_attempt <- 1
      row$passer_player_id <- play_stats$player.esbId[index]
      row$passer_player_name <- play_stats$player.displayName[index]
      row$air_yards <- play_stats$yards[index]
    } else if (stat_id == 113) {
      row$pass_attempt <- 1
      row$complete_pass <- 1
      if (is.na(row$receiver_player_id)) {
        row$receiver_player_id <- play_stats$player.esbId[index]
        row$receiver_player_name <- play_stats$player.displayName[index]
      }
      if (is.na(row$yards_after_catch)) {
        row$yards_after_catch <- play_stats$yards[index]
      }
    } else if (stat_id == 115) {
      row$pass_attempt <- 1
      row$receiver_player_id <- play_stats$player.esbId[index]
      row$receiver_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 120) {
      row$tackle_for_loss_1_player_id <-
        if_else(
          is.na(row$tackle_for_loss_1_player_id),
          play_stats$player.esbId[index],
          row$tackle_for_loss_1_player_id
        )
      row$tackle_for_loss_1_player_name <-
        if_else(
          is.na(row$tackle_for_loss_1_player_name),
          play_stats$player.displayName[index],
          row$tackle_for_loss_1_player_name
        )
      row$tackle_for_loss_2_player_id <-
        if_else(
          is.na(row$tackle_for_loss_2_player_id) &
            row$tackle_for_loss_1_player_id != play_stats$player.esbId[index],
          play_stats$player.esbId[index],
          row$tackle_for_loss_2_player_id
        )
      row$tackle_for_loss_2_player_name <-
        if_else(
          is.na(row$tackle_for_loss_2_player_name) &
            row$tackle_for_loss_1_player_name !=
              play_stats$player.displayName[index],
          play_stats$player.displayName[index],
          row$tackle_for_loss_2_player_name
        )
    } else if (stat_id == 301) {
      row$extra_point_aborted <- 1
      row$extra_point_attempt <- 1
    } else if (stat_id == 402) {
      # tackle for loss player information is recorded in stat id 120
      NULL
    } else if (stat_id == 403) {
      row$defensive_two_point_attempt <- 1
    } else if (stat_id == 404) {
      row$defensive_two_point_conv <- 1
    } else if (stat_id == 405) {
      row$defensive_extra_point_attempt <- 1
    } else if (stat_id == 406) {
      row$defensive_extra_point_conv <- 1
    } else if (stat_id == 410) {
      row$kickoff_attempt <- 1
      row$kicker_player_id <- play_stats$player.esbId[index]
      row$kicker_player_name <- play_stats$player.displayName[index]
    } else if (stat_id == 420) {
      row$two_point_return <- 1
      row$two_point_attempt <- 1
    } else {
      NULL
    }
  }
  return(row)
}


================================================
FILE: R/helper_variable_selector.R
================================================
################################################################################
# Author: Ben Baldwin, Sebastian Carl
# Purpose: Build the final output of the pbp functions
# Code Style Guide: styler::tidyverse_style()
################################################################################

select_variables <- function(pbp) {
  out <- suppressWarnings(
    pbp |>
      dplyr::select(
        dplyr::any_of(
          c(nflscrapr_cols, new_cols, api_cols)
        )
      )
  )

  return(out)
}

# columns that are not in gamecenter that we created
new_cols <- c(
  "season",
  "cp",
  "cpoe",
  "series",
  "series_success",
  "series_result"
)

# original nflscrapr columns
nflscrapr_cols <-
  c(
    "play_id",
    "game_id",
    "old_game_id",
    "home_team",
    "away_team",
    #added these to new gc scraper
    "season_type",
    "week",
    "posteam",
    "posteam_type",
    "defteam",
    "side_of_field",
    "yardline_100",
    "game_date",
    "quarter_seconds_remaining",
    "half_seconds_remaining",
    "game_seconds_remaining",
    "game_half",
    "quarter_end",
    "drive",
    "sp",
    "qtr",
    "down",
    "goal_to_go",
    "time",
    "yrdln",
    "ydstogo",
    "ydsnet",
    "desc",
    "play_type",
    "yards_gained",
    "shotgun",
    "no_huddle",
    "qb_dropback",
    "qb_kneel",
    "qb_spike",
    "qb_scramble",
    "pass_length",
    "pass_location",
    "air_yards",
    "yards_after_catch",
    "run_location",
    "run_gap",
    "field_goal_result",
    "kick_distance",
    "extra_point_result",
    "two_point_conv_result",
    "home_timeouts_remaining",
    "away_timeouts_remaining",
    "timeout",
    "timeout_team",
    "td_team",
    "td_player_name",
    "td_player_id",
    "posteam_timeouts_remaining",
    "defteam_timeouts_remaining",
    "total_home_score",
    "total_away_score",
    "posteam_score",
    "defteam_score",
    "score_differential",
    "posteam_score_post",
    "defteam_score_post",
    "score_differential_post",
    "no_score_prob",
    "opp_fg_prob",
    "opp_safety_prob",
    "opp_td_prob",
    "fg_prob",
    "safety_prob",
    "td_prob",
    "extra_point_prob",
    "two_point_conversion_prob",
    "ep",
    "epa",
    "total_home_epa",
    "total_away_epa",
    "total_home_rush_epa",
    "total_away_rush_epa",
    "total_home_pass_epa",
    "total_away_pass_epa",
    "air_epa",
    "yac_epa",
    "comp_air_epa",
    "comp_yac_epa",
    "total_home_comp_air_epa",
    "total_away_comp_air_epa",
    "total_home_comp_yac_epa",
    "total_away_comp_yac_epa",
    "total_home_raw_air_epa",
    "total_away_raw_air_epa",
    "total_home_raw_yac_epa",
    "total_away_raw_yac_epa",
    "wp",
    "def_wp",
    "home_wp",
    "away_wp",
    "wpa",
    "vegas_wpa",
    "vegas_home_wpa",
    "home_wp_post",
    "away_wp_post",
    "vegas_wp",
    "vegas_home_wp",
    "total_home_rush_wpa",
    "total_away_rush_wpa",
    "total_home_pass_wpa",
    "total_away_pass_wpa",
    "air_wpa",
    "yac_wpa",
    "comp_air_wpa",
    "comp_yac_wpa",
    "total_home_comp_air_wpa",
    "total_away_comp_air_wpa",
    "total_home_comp_yac_wpa",
    "total_away_comp_yac_wpa",
    "total_home_raw_air_wpa",
    "total_away_raw_air_wpa",
    "total_home_raw_yac_wpa",
    "total_away_raw_yac_wpa",
    "punt_blocked",
    "first_down_rush",
    "first_down_pass",
    "first_down_penalty",
    "third_down_converted",
    "third_down_failed",
    "fourth_down_converted",
    "fourth_down_failed",
    "incomplete_pass",
    "touchback",
    "interception",
    "punt_inside_twenty",
    "punt_in_endzone",
    "punt_out_of_bounds",
    "punt_downed",
    "punt_fair_catch",
    "kickoff_inside_twenty",
    "kickoff_in_endzone",
    "kickoff_out_of_bounds",
    "kickoff_downed",
    "kickoff_fair_catch",
    "fumble_forced",
    "fumble_not_forced",
    "fumble_out_of_bounds",
    "solo_tackle",
    "safety",
    "penalty",
    "tackled_for_loss",
    "fumble_lost",
    "own_kickoff_recovery",
    "own_kickoff_recovery_td",
    "qb_hit",
    "rush_attempt",
    "pass_attempt",
    "sack",
    "touchdown",
    "pass_touchdown",
    "rush_touchdown",
    "return_touchdown",
    "extra_point_attempt",
    "two_point_attempt",
    "field_goal_attempt",
    "kickoff_attempt",
    "punt_attempt",
    "fumble",
    "complete_pass",
    "assist_tackle",
    "lateral_reception",
    "lateral_rush",
    "lateral_return",
    "lateral_recovery",
    "passer_player_id",
    "passer_player_name",
    "passing_yards",
    "receiver_player_id",
    "receiver_player_name",
    "receiving_yards",
    "rusher_player_id",
    "rusher_player_name",
    "rushing_yards",
    "lateral_receiver_player_id",
    "lateral_receiver_player_name",
    "lateral_receiving_yards",
    "lateral_rusher_player_id",
    "lateral_rusher_player_name",
    "lateral_rushing_yards",
    "lateral_sack_player_id",
    "lateral_sack_player_name",
    "interception_player_id",
    "interception_player_name",
    "lateral_interception_player_id",
    "lateral_interception_player_name",
    "punt_returner_player_id",
    "punt_returner_player_name",
    "lateral_punt_returner_player_id",
    "lateral_punt_returner_player_name",
    "kickoff_returner_player_name",
    "kickoff_returner_player_id",
    "lateral_kickoff_returner_player_id",
    "lateral_kickoff_returner_player_name",
    "punter_player_id",
    "punter_player_name",
    "kicker_player_name",
    "kicker_player_id",
    "own_kickoff_recovery_player_id",
    "own_kickoff_recovery_player_name",
    "blocked_player_id",
    "blocked_player_name",
    "tackle_for_loss_1_player_id",
    "tackle_for_loss_1_player_name",
    "tackle_for_loss_2_player_id",
    "tackle_for_loss_2_player_name",
    "qb_hit_1_player_id",
    "qb_hit_1_player_name",
    "qb_hit_2_player_id",
    "qb_hit_2_player_name",
    "forced_fumble_player_1_team",
    "forced_fumble_player_1_player_id",
    "forced_fumble_player_1_player_name",
    "forced_fumble_player_2_team",
    "forced_fumble_player_2_player_id",
    "forced_fumble_player_2_player_name",
    "solo_tackle_1_team",
    "solo_tackle_2_team",
    "solo_tackle_1_player_id",
    "solo_tackle_2_player_id",
    "solo_tackle_1_player_name",
    "solo_tackle_2_player_name",
    "assist_tackle_1_player_id",
    "assist_tackle_1_player_name",
    "assist_tackle_1_team",
    "assist_tackle_2_player_id",
    "assist_tackle_2_player_name",
    "assist_tackle_2_team",
    "assist_tackle_3_player_id",
    "assist_tackle_3_player_name",
    "assist_tackle_3_team",
    "assist_tackle_4_player_id",
    "assist_tackle_4_player_name",
    "assist_tackle_4_team",
    #new in nflfastR v4.0
    "tackle_with_assist",
    "tackle_with_assist_1_player_id",
    "tackle_with_assist_1_player_name",
    "tackle_with_assist_1_team",
    "tackle_with_assist_2_player_id",
    "tackle_with_assist_2_player_name",
    "tackle_with_assist_2_team",
    "pass_defense_1_player_id",
    "pass_defense_1_player_name",
    "pass_defense_2_player_id",
    "pass_defense_2_player_name",
    "fumbled_1_team",
    "fumbled_1_player_id",
    "fumbled_1_player_name",
    "fumbled_2_player_id",
    "fumbled_2_player_name",
    "fumbled_2_team",
    "fumble_recovery_1_team",
    "fumble_recovery_1_yards",
    "fumble_recovery_1_player_id",
    "fumble_recovery_1_player_name",
    "fumble_recovery_2_team",
    "fumble_recovery_2_yards",
    "fumble_recovery_2_player_id",
    "fumble_recovery_2_player_name",
    #new in nflfastR v4.1
    "sack_player_id",
    "sack_player_name",
    "half_sack_1_player_id",
    "half_sack_1_player_name",
    "half_sack_2_player_id",
    "half_sack_2_player_name",
    "return_team",
    "return_yards",
    "penalty_team",
    "penalty_player_id",
    "penalty_player_name",
    "penalty_yards",
    "replay_or_challenge",
    "replay_or_challenge_result",
    "penalty_type",
    "defensive_two_point_attempt",
    "defensive_two_point_conv",
    "defensive_extra_point_attempt",
    "defensive_extra_point_conv",
    #new in nflfastR > v4.1
    "safety_player_name",
    "safety_player_id"
  )


# these are columns in the RS data that aren't in nflscrapR
rs_cols <- c(
  "season_type",
  "week",
  "game_key",
  "game_time_eastern",
  "game_time_local",
  "iso_time",
  "game_type",
  "site_id",
  "site_city",
  "site_fullname",
  "site_state",
  "roof_type",
  "drive_start_time",
  "drive_end_time",
  "drive_start_yardline",
  "drive_end_yardline",
  "drive_how_started",
  "drive_how_ended",
  "drive_play_count",
  "drive_yards_penalized",
  "drive_time_of_possession",
  "drive_inside20",
  "drive_first_downs",
  "drive_possession_team_abbr",
  "scoring_team_abbr",
  "scoring_type",
  "alert_play_type",
  "play_type_nfl",
  "time_of_day",
  "yards",
  "end_yardline_side",
  "end_yardline_number"
)


# these are columns in the new API that aren't in nflscrapR
api_cols <- c(
  "order_sequence",
  "start_time",
  "time_of_day",
  "stadium",
  "weather",
  "nfl_api_id",
  "play_clock",
  "play_deleted",
  "play_type_nfl",
  "special_teams_play",
  "st_play_type",
  "end_clock_time",
  "end_yard_line",

  "fixed_drive",
  "fixed_drive_result",
  "drive_real_start_time",

  "drive_play_count",
  "drive_time_of_possession",
  "drive_first_downs",
  "drive_inside20",
  "drive_ended_with_score",
  "drive_quarter_start",
  "drive_quarter_end",
  "drive_yards_penalized",

  "drive_start_transition",
  "drive_end_transition",

  "drive_game_clock_start",
  "drive_game_clock_end",
  "drive_start_yard_line",
  "drive_end_yard_line",
  "drive_play_id_started",
  "drive_play_id_ended",
  "away_score",
  "home_score",
  "location",
  "result",
  "total",
  "spread_line",
  "total_line",
  "div_game",
  "roof",
  "surface",
  "temp",
  "wind",
  "home_coach",
  "away_coach",
  "stadium_id",
  "game_stadium"
)


================================================
FILE: R/nflfastR-package.R
================================================
#' @details # Parallel Processing and Progress Updates in nflfastR
#'
#' ## Preface
#'
#' Prior to nflfastR v4.0, parallel processing could be activated with an
#' argument `pp` in the relevant functions and progress updates were always
#' shown. Both of these approaches are bad practice and were therefore removed
#' in nflfastR v4.0.
#'
#' The next sections describe how to run nflfastR in parallel processes
#' and how to show progress updates if the user wants them.
#'
#' ## More Speed Using Parallel Processing
#'
#' Nearly all nflfastR functions support parallel processing
#' using [furrr::future_map()] if it is enabled by a call to [future::plan()]
#' prior to the function call.
#' Please see the documentation of the functions for detailed information.
#'
#' As an example, the following code block resolves all function calls in the
#' current session using multiple sessions in the background. It loads
#' play-by-play data for the 2018 through 2020 seasons and freshly builds it
#' for the Super Bowls of the 2018 and 2019 seasons:
#' ```
#' future::plan("multisession")
#' load_pbp(2018:2020)
#' build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
#' ```
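#' If parallel processing is only wanted for part of a session, one possible
#' approach (a sketch, assuming that [future::plan()] returns the previously
#' active plan when a new one is set) is to store the old plan and restore it
#' afterwards:
#' ```
#' old_plan <- future::plan("multisession")
#' pbp <- load_pbp(2018:2020)
#' # restore whatever plan was active before
#' future::plan(old_plan)
#' ```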
#' We recommend choosing a default parallel processing method and saving it
#' as an environment variable in the R user profile to make sure all futures
#' will be resolved with the chosen method by default.
#' This can be done by following the steps below.
#'
#' First, run the following line, which should open the file `.Renviron` automatically.
#' If you haven't saved any environment variables yet, this will be an empty file.
#' ```
#' usethis::edit_r_environ()
#' ```
#' In the opened file `.Renviron` add the next line, then save the file and restart your R session.
#' Please note that this example sets "multisession" as default. For most users
#' this should be the appropriate plan but please make sure it truly is.
#' ```
#' R_FUTURE_PLAN="multisession"
#' ```
#' After the session is freshly restarted please check if the above method worked
#' by running the next line. If the output is `FALSE` you successfully set up a
#' default non-sequential [future::plan()]. If the output is `TRUE` all functions
#' will behave like they were called with [purrr::map()] and NOT in multisession.
#' ```
#' inherits(future::plan(), "sequential")
#' ```
#' For more information on possible plans please see
#' [the future package Readme](https://github.com/futureverse/future/blob/develop/README.md).
#'
#' For more information on `.Renviron` please see
#' [this book chapter](https://rstats.wtf/r-startup.html).
#'
#' ## Get Progress Updates while Functions are Running
#'
#' Most nflfastR functions are able to show progress updates
#' using [progressr::progressor()] if they are turned on before the function is
#' called. There are at least two basic ways to do this: either by activating
#' progress updates globally (for the current session) with
#' ```
#' progressr::handlers(global = TRUE)
#' ```
#' or by piping the function call into [progressr::with_progress()]:
#' ```
#' load_pbp(2018:2020) |>
#'   progressr::with_progress()
#' ```
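#'
#' Both features can be combined. The following sketch, which only reuses the
#' calls shown above, builds two games in parallel and reports progress for
#' that single call:
#' ```
#' future::plan("multisession")
#' build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC")) |>
#'   progressr::with_progress()
#' ```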
#'
#' Just like in the previous section, it is possible to activate global
#' progression handlers by default. This can be done by following the steps below.
#'
#' First, run the following line, which should open the file `.Rprofile` automatically.
#' If you haven't saved any code yet, this will be an empty file.
#' ```
#' usethis::edit_r_profile()
#' ```
#' In the opened file `.Rprofile` add the next line, then save the file and restart your R
#' session. All code in this file will be executed when a new R session starts.
#' The part `if (requireNamespace("progressr", quietly = TRUE))` makes sure
#' this will only run if the package progressr is installed to avoid crashing
#' R sessions.
#' ```
#' if (requireNamespace("progressr", quietly = TRUE)) progressr::handlers(global = TRUE)
#' ```
#'
#' After the session is freshly restarted please check if the above method worked
#' by running the next line. If the output is `TRUE` you successfully activated
#' global progression handlers for all sessions.
#' ```
#' progressr::handlers(global = NA)
#' ```
#'
#' For more information on how to work with progress handlers please see [progressr::progressr].
#'
#' For more information on `.Rprofile` please see
#' [this book chapter](https://rstats.wtf/r-startup.html).
#'
"_PACKAGE"

# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @import dplyr
#' @import fastrmodels
#' @importFrom data.table %between% %chin%
#' @importFrom rlang .data := .env %||%
# We have to import something from xgboost because it is listed as dependency to
# be able to apply models.
#' @importFrom xgboost getinfo
## usethis namespace: end
NULL


# Re-Exports --------------------------------------------------------------

#' @importFrom nflreadr load_pbp
#' @export
nflreadr::load_pbp

#' @importFrom nflreadr load_player_stats
#' @export
nflreadr::load_player_stats

#' @importFrom nflreadr load_team_stats
#' @export
nflreadr::load_team_stats

#' @importFrom nflreadr load_schedules
#' @export
nflreadr::load_schedules

#' @importFrom nflreadr load_rosters
#' @export
nflreadr::load_rosters

#' @importFrom nflreadr nflverse_sitrep
#' @export
nflreadr::nflverse_sitrep

#' @importFrom nflreadr most_recent_season
#' @export
nflreadr::most_recent_season


================================================
FILE: R/report.R
================================================
#' Get a Situation Report on System, nflverse Package Versions and Dependencies
#'
#' @description
#'
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated. Please use [`nflreadr::nflverse_sitrep`].
#'
#' This function gives a quick overview of the versions of R and
#'   the operating system as well as the versions of nflverse packages, options,
#'   and their dependencies. It's primarily designed to help you get a quick
#'   idea of what's going on when you're helping someone else debug a problem.
#' @details See [`nflreadr::nflverse_sitrep`] for details.
#' @inheritDotParams nflreadr::nflverse_sitrep
#' @inherit nflreadr::nflverse_sitrep
#' @keywords internal
#' @examples
#' \donttest{
#' \dontshow{
#' # set CRAN mirror to avoid failing checks in weird scenarios
#' old_ops <- options(repos = c("CRAN" = "https://cran.rstudio.com/"))
#' }
#'
#' # report(recursive = FALSE)
#' nflverse_sitrep(pkg = "nflreadr", recursive = TRUE)
#'
#' \dontshow{
#' # restore old options
#' options(old_ops)
#' }
#' }
#' @export
report <- function(...) {
  lifecycle::deprecate_warn(
    "5.2.0",
    "report()",
    "nflreadr::nflverse_sitrep()"
  )
  nflreadr::nflverse_sitrep(...)
}


================================================
FILE: R/save_raw_pbp.R
================================================
#' Download Raw PBP Data to Local Filesystem
#'
#' The functions [build_nflfastR_pbp()] and [fast_scraper()] support loading
#' raw pbp data from local file systems instead of GitHub servers.
#' This function is intended to help set this up. It loads raw pbp data
#' and saves it in the given directory, split by season into subdirectories.
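#'
#' A typical workflow, sketched here under assumptions rather than prescribed
#' by the package, is to point the option `nflfastR.raw_directory` at an
#' existing directory, ask [missing_raw_pbp()] which games are not yet saved
#' locally, and pass the result to this function. The directory name `raw_pbp`
#' below is hypothetical, and the call downloads every finished game of the
#' chosen season:
#' ```
#' raw_dir <- file.path(tempdir(), "raw_pbp") # hypothetical location
#' dir.create(raw_dir, showWarnings = FALSE)
#' options(nflfastR.raw_directory = raw_dir)
#' missing <- missing_raw_pbp(seasons = 2021)
#' # missing_raw_pbp() returns NULL if nothing is missing
#' if (!is.null(missing)) save_raw_pbp(missing)
#' ```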
#'
#' @param game_ids A vector of nflverse game IDs.
#' @param dir Path to local directory (defaults to option "nflfastR.raw_directory").
#'   nflfastR will download the raw game files into one subdirectory per
#'   season.
#'
#' @returns The function returns a data frame with one row for each downloaded file and
#' the following columns:
#'  - `success` if the HTTP request was successfully performed, regardless of the
#'  response status code. This is `FALSE` in case of a network error, or in case
#'  you tried to resume from a server that did not support this. A value of `NA`
#'  means the download was interrupted while in progress.
#'  - `status_code` the HTTP status code from the request. A successful download is
#'  usually `200` for full requests or `206` for resumed requests. Anything else
#'  could indicate that the downloaded file contains an error page instead of the
#'  requested content.
#'  - `resumefrom` the file size before the request, in case a download was resumed.
#'  - `url` final url (after redirects) of the request.
#'  - `destfile` downloaded file on disk.
#'  - `error` if `success == FALSE` this column contains an error message.
#'  - `type` the `Content-Type` response header value.
#'  - `modified` the `Last-Modified` response header value.
#'  - `time` total elapsed download time for this file in seconds.
#'  - `headers` vector with http response headers for the request.
#' @export
#'
#' @seealso [build_nflfastR_pbp()], [missing_raw_pbp()]
#'
#' @examples
#' \donttest{
#' # CREATE LOCAL TEMP DIRECTORY
#' local_dir <- tempdir()
#'
#' # LOAD AND SAVE A GAME TO TEMP DIRECTORY
#' save_raw_pbp("2021_20_BUF_KC", dir = local_dir)
#'
#' # REMOVE THE DIRECTORY (directories are only deleted with recursive = TRUE)
#' unlink(file.path(local_dir, 2021), recursive = TRUE)
#' }
save_raw_pbp <- function(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL)
) {
  verify_game_ids(game_ids = game_ids)
  if (is.null(dir)) {
    cli::cli_abort(
      "Invalid argument {.arg dir}. Do you need to set \\
                   {.code options(nflfastR.raw_directory)}?"
    )
  } else if (!dir.exists(dir)) {
    cli::cli_abort(
      "You've asked to save raw pbp to {.path {dir}} which \\
                   doesn't exist. Please create it."
    )
  }
  seasons <- substr(game_ids, 1, 4)
  season_folders <- file.path(dir, unique(seasons)) |> sort()
  missing_season_folders <- season_folders[!dir.exists(season_folders)]
  created_folders <- vapply(
    missing_season_folders,
    dir.create,
    FUN.VALUE = logical(1L)
  )
  to_load <- raw_pbp_urls(game_ids)
  save_to <- file.path(
    dir,
    seasons,
    paste0(game_ids, ".rds")
  )
  dl <- curl::multi_download(to_load, save_to)
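  # treat a missing status code (e.g. after a network error) as a failure
  failed <- is.na(dl$status_code) | dl$status_code != 200
  if (any(failed)) {
    cli::cli_alert_danger(
      "Failed to download: {.var {game_ids[failed]}}"
    )
    # only remove files that were actually written to disk
    file.remove(save_to[failed & file.exists(save_to)])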
  }
  dl
}

#' Compute Missing Raw PBP Data on Local Filesystem
#'
#' Uses [nflreadr::load_schedules()] to load game IDs of finished games and
#' compares these IDs to all files saved under `dir`.
#' This function is intended to serve as input for [save_raw_pbp()].
#'
#' @inheritParams save_raw_pbp
#' @inheritParams nflreadr::load_schedules
#' @param verbose If `TRUE`, will print number of missing game files as well as
#'   oldest and most recent missing ID to console.
#'
#' @return A character vector of missing game IDs. If no files are missing,
#'  returns `NULL` invisibly.
#' @export
#'
#' @seealso [save_raw_pbp()]
#'
#' @examples
#' \donttest{
#' try(
#' missing <- missing_raw_pbp(tempdir())
#' )
#' }
missing_raw_pbp <- function(
  dir = getOption("nflfastR.raw_directory", default = NULL),
  seasons = TRUE,
  verbose = TRUE
) {
  if (is.null(dir)) {
    cli::cli_abort(
      "Invalid argument {.arg dir}. Do you need to set \\
                   {.code options(nflfastR.raw_directory)}?"
    )
  } else if (!dir.exists(dir)) {
    cli::cli_abort(
      "You've asked to check raw pbp in {.path {dir}} which \\
                   doesn't exist. Please create it."
    )
  }
  local_games <- sapply(list.files(dir, full.names = TRUE), list.files) |>
    unlist(use.names = FALSE) |>
    tools::file_path_sans_ext()

  finished_games <- nflreadr::load_schedules(seasons = seasons) |>
    dplyr::filter(!is.na(.data$result)) |>
    dplyr::pull(.data$game_id)

  local_missing_games <- finished_games[!finished_games %in% local_games]

  if (length(local_missing_games) == 0) {
    cli::cli_alert_success("No missing games!")
    return(invisible(NULL))
  }

  if (isTRUE(verbose)) {
    cli::cli_alert_info(
      "You are missing {length(local_missing_games)} game file{?s}. \\
       The oldest missing game is {.val {local_missing_games[[1]]}}. \\
       The most recent missing game is \\
       {.val {local_missing_games[length(local_missing_games)]}}."
    )
  }

  local_missing_games
}


verify_game_ids <- function(game_ids) {
  # game_ids <- c(
  #   "2021_02_LAC_KC",
  #   "Hello World",
  #   "2028_01_LAC_JAX",
  #   "2022_27_LAC_BUF",
  #   "2021_02_LAC_KAC"
  # )
  season_check <- substr(game_ids, 1, 4) %in%
    seq.int(1999, as.integer(format(Sys.Date(), "%Y")) + 1, 1)
  week_check <- as.integer(substr(game_ids, 6, 7)) %in% seq_len(22)
  team_name_check <-
    vapply(
      stringr::str_extract_all(game_ids, "(?<=_)[:upper:]{2,3}"),
      function(t) all(t %in% nflfastR::teams_colors_logos$team_abbr),
      FUN.VALUE = logical(1L)
    )
  combined_check <- season_check & week_check & team_name_check

  if (any(!combined_check)) {
    cli::cli_abort(
      "The game IDs {.val {game_ids[!combined_check]}} seem to be invalid!"
    )
  }

  invisible(NULL)
}


================================================
FILE: R/top-level_scraper.R
================================================
################################################################################
# Author: Sebastian Carl
# Purpose: Top-Level functions which will be made available through the package
# Code Style Guide: styler::tidyverse_style()
################################################################################

# pbp ---------------------------------------------------------------------

#' Get NFL Play by Play Data
#'
#' @description Load and parse NFL play-by-play data and add all of the original
#'   nflfastR variables. As nflfastR now provides multiple functions which add
#'   information to the output of this function, it is recommended to use
#'   \code{\link{build_nflfastR_pbp}} instead.
#'
#' @param game_ids Vector of character ids or a data frame including the variable
#' `game_id` (see details for further information).
#' @param dir Path to local directory (defaults to option "nflfastR.raw_directory")
#'   where nflfastR searches for raw game play-by-play data.
#'   See [save_raw_pbp()] for additional information.
#' @param ... Additional arguments passed to the scraping functions (for internal use)
#' @param in_builder If \code{TRUE}, the final message will be suppressed (for usage inside of \code{\link{build_nflfastR_pbp}}).
#' @details To load valid game_ids, please use the package function
#' \code{\link{fast_scraper_schedules}}. This function can directly handle the
#' output of that function.
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @seealso [build_nflfastR_pbp()], [save_raw_pbp()]
#' @return Data frame where each individual row represents a single play for
#' all passed game_ids containing the following
#' detailed information (description partly extracted from nflscrapR):
#' \describe{
#' \item{play_id}{Numeric play id that when used with game_id and drive provides the unique identifier for a single play.}
#' \item{game_id}{Identifier for the NFL game in the format `season_week_away_home`, e.g. `2021_02_LAC_KC`.}
#' \item{old_game_id}{Legacy NFL game ID.}
#' \item{home_team}{String abbreviation for the home team.}
#' \item{away_team}{String abbreviation for the away team.}
#' \item{season_type}{'REG' or 'POST' indicating if the game belongs to regular or post season.}
#' \item{week}{Season week.}
#' \item{posteam}{String abbreviation for the team with possession.}
#' \item{posteam_type}{String indicating whether the posteam is home or away.}
#' \item{defteam}{String abbreviation for the team on defense.}
#' \item{side_of_field}{String abbreviation for which team's side of the field the team with possession is currently on.}
#' \item{yardline_100}{Numeric distance in the number of yards from the opponent's endzone for the posteam.}
#' \item{game_date}{Date of the game.}
#' \item{quarter_seconds_remaining}{Numeric seconds remaining in the quarter.}
#' \item{half_seconds_remaining}{Numeric seconds remaining in the half.}
#' \item{game_seconds_remaining}{Numeric seconds remaining in the game.}
#' \item{game_half}{String indicating which half the play is in, either Half1, Half2, or Overtime.}
#' \item{quarter_end}{Binary indicator for whether or not the row of the data is marking the end of a quarter.}
#' \item{drive}{Numeric drive number in the game.}
#' \item{sp}{Binary indicator for whether or not a score occurred on the play.}
#' \item{qtr}{Quarter of the game (5 is overtime).}
#' \item{down}{The down for the given play.}
#' \item{goal_to_go}{Binary indicator for whether or not the posteam is in a goal down situation.}
#' \item{time}{Time at start of play provided in string format as minutes:seconds remaining in the quarter.}
#' \item{yrdln}{String indicating the current field position for a given play.}
#' \item{ydstogo}{Numeric yards in distance from either the first down marker or the endzone in goal down situations.}
#' \item{ydsnet}{Numeric value for total yards gained on the given drive.}
#' \item{desc}{Detailed string description for the given play.}
#' \item{play_type}{String indicating the type of play: pass (includes sacks), run (includes scrambles), punt, field_goal, kickoff, extra_point, qb_kneel, qb_spike, no_play (timeouts and penalties), and missing for rows indicating end of play.}
#' \item{yards_gained}{Numeric yards gained (or lost) by the possessing team, excluding yards gained via fumble recoveries and laterals.}
#' \item{shotgun}{Binary indicator for whether or not the play was in shotgun formation.}
#' \item{no_huddle}{Binary indicator for whether or not the play was in no_huddle formation.}
#' \item{qb_dropback}{Binary indicator for whether or not the QB dropped back on the play (pass attempt, sack, or scramble).}
#' \item{qb_kneel}{Binary indicator for whether or not the QB took a knee.}
#' \item{qb_spike}{Binary indicator for whether or not the QB spiked the ball.}
#' \item{qb_scramble}{Binary indicator for whether or not the QB scrambled.}
#' \item{pass_length}{String indicator for pass length: short or deep.}
#' \item{pass_location}{String indicator for pass location: left, middle, or right.}
#' \item{air_yards}{Numeric value for distance in yards perpendicular to the line of scrimmage to the point where the targeted receiver either caught or didn't catch the ball.}
#' \item{yards_after_catch}{Numeric value for distance in yards perpendicular to the yard line where the receiver made the reception to where the play ended.}
#' \item{run_location}{String indicator for location of run: left, middle, or right.}
#' \item{run_gap}{String indicator for line gap of run: end, guard, or tackle.}
#' \item{field_goal_result}{String indicator for result of field goal attempt: made, missed, or blocked.}
#' \item{kick_distance}{Numeric distance in yards for kickoffs, field goals, and punts.}
#' \item{extra_point_result}{String indicator for the result of the extra point attempt: good, failed, blocked, safety (touchback in defensive endzone is 1 point apparently), or aborted.}
#' \item{two_point_conv_result}{String indicator for result of two point conversion attempt: success, failure, safety (touchback in defensive endzone is 1 point apparently), or return.}
#' \item{home_timeouts_remaining}{Numeric timeouts remaining in the half for the home team.}
#' \item{away_timeouts_remaining}{Numeric timeouts remaining in the half for the away team.}
#' \item{timeout}{Binary indicator for whether or not a timeout was called by either team.}
#' \item{timeout_team}{String abbreviation for which team called the timeout.}
#' \item{td_team}{String abbreviation for which team scored the touchdown.}
#' \item{td_player_name}{String name of the player who scored a touchdown.}
#' \item{td_player_id}{Unique identifier of the player who scored a touchdown.}
#' \item{posteam_timeouts_remaining}{Number of timeouts remaining for the possession team.}
#' \item{defteam_timeouts_remaining}{Number of timeouts remaining for the team on defense.}
#' \item{total_home_score}{Score for the home team at the end of the play.}
#' \item{total_away_score}{Score for the away team at the end of the play.}
#' \item{posteam_score}{Score for the posteam at the start of the play.}
#' \item{defteam_score}{Score for the defteam at the start of the play.}
#' \item{score_differential}{Score differential between the posteam and defteam at the start of the play.}
#' \item{posteam_score_post}{Score for the posteam at the end of the play.}
#' \item{defteam_score_post}{Score for the defteam at the end of the play.}
#' \item{score_differential_post}{Score differential between the posteam and defteam at the end of the play.}
#' \item{no_score_prob}{Predicted probability of no score occurring for the rest of the half based on the expected points model.}
#' \item{opp_fg_prob}{Predicted probability of the defteam scoring a FG next.}
#' \item{opp_safety_prob}{Predicted probability of the defteam scoring a safety next.}
#' \item{opp_td_prob}{Predicted probability of the defteam scoring a TD next.}
#' \item{fg_prob}{Predicted probability of the posteam scoring a FG next.}
#' \item{safety_prob}{Predicted probability of the posteam scoring a safety next.}
#' \item{td_prob}{Predicted probability of the posteam scoring a TD next.}
#' \item{extra_point_prob}{Predicted probability of the posteam scoring an extra point.}
#' \item{two_point_conversion_prob}{Predicted probability of the posteam scoring the two point conversion.}
#' \item{ep}{Using the scoring event probabilities, the estimated expected points with respect to the possession team for the given play.}
#' \item{epa}{Expected points added (EPA) by the posteam for the given play.}
#' \item{total_home_epa}{Cumulative total EPA for the home team in the game so far.}
#' \item{total_away_epa}{Cumulative total EPA for the away team in the game so far.}
#' \item{total_home_rush_epa}{Cumulative total rushing EPA for the home team in the game so far.}
#' \item{total_away_rush_epa}{Cumulative total rushing EPA for the away team in the game so far.}
#' \item{total_home_pass_epa}{Cumulative total passing EPA for the home team in the game so far.}
#' \item{total_away_pass_epa}{Cumulative total passing EPA for the away team in the game so far.}
#' \item{air_epa}{EPA from the air yards alone. For completions this represents the actual value provided through the air. For incompletions this represents the hypothetical value that could've been added through the air if the pass was completed.}
#' \item{yac_epa}{EPA from the yards after catch alone. For completions this represents the actual value provided after the catch. For incompletions this represents the difference between the hypothetical air_epa and the play's raw observed EPA (how much the incomplete pass cost the posteam).}
#' \item{comp_air_epa}{EPA from the air yards alone only for completions.}
#' \item{comp_yac_epa}{EPA from the yards after catch alone only for completions.}
#' \item{total_home_comp_air_epa}{Cumulative total completions air EPA for the home team in the game so far.}
#' \item{total_away_comp_air_epa}{Cumulative total completions air EPA for the away team in the game so far.}
#' \item{total_home_comp_yac_epa}{Cumulative total completions yac EPA for the home team in the game so far.}
#' \item{total_away_comp_yac_epa}{Cumulative total completions yac EPA for the away team in the game so far.}
#' \item{total_home_raw_air_epa}{Cumulative total raw air EPA for the home team in the game so far.}
#' \item{total_away_raw_air_epa}{Cumulative total raw air EPA for the away team in the game so far.}
#' \item{total_home_raw_yac_epa}{Cumulative total raw yac EPA for the home team in the game so far.}
#' \item{total_away_raw_yac_epa}{Cumulative total raw yac EPA for the away team in the game so far.}
#' \item{wp}{Estimated win probability for the posteam given the current situation at the start of the given play.}
#' \item{def_wp}{Estimated win probability for the defteam.}
#' \item{home_wp}{Estimated win probability for the home team.}
#' \item{away_wp}{Estimated win probability for the away team.}
#' \item{wpa}{Win probability added (WPA) for the posteam.}
#' \item{vegas_wpa}{Win probability added (WPA) for the posteam: spread_adjusted model.}
#' \item{vegas_home_wpa}{Win probability added (WPA) for the home team: spread_adjusted model.}
#' \item{home_wp_post}{Estimated win probability for the home team at the end of the play.}
#' \item{away_wp_post}{Estimated win probability for the away team at the end of the play.}
#' \item{vegas_wp}{Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.}
#' \item{vegas_home_wp}{Estimated win probability for the home team incorporating pre-game Vegas line.}
#' \item{total_home_rush_wpa}{Cumulative total rushing WPA for the home team in the game so far.}
#' \item{total_away_rush_wpa}{Cumulative total rushing WPA for the away team in the game so far.}
#' \item{total_home_pass_wpa}{Cumulative total passing WPA for the home team in the game so far.}
#' \item{total_away_pass_wpa}{Cumulative total passing WPA for the away team in the game so far.}
#' \item{air_wpa}{WPA through the air (same logic as air_epa).}
#' \item{yac_wpa}{WPA from yards after the catch (same logic as yac_epa).}
#' \item{comp_air_wpa}{The air_wpa for completions only.}
#' \item{comp_yac_wpa}{The yac_wpa for completions only.}
#' \item{total_home_comp_air_wpa}{Cumulative total completions air WPA for the home team in the game so far.}
#' \item{total_away_comp_air_wpa}{Cumulative total completions air WPA for the away team in the game so far.}
#' \item{total_home_comp_yac_wpa}{Cumulative total completions yac WPA for the home team in the game so far.}
#' \item{total_away_comp_yac_wpa}{Cumulative total completions yac WPA for the away team in the game so far.}
#' \item{total_home_raw_air_wpa}{Cumulative total raw air WPA for the home team in the game so far.}
#' \item{total_away_raw_air_wpa}{Cumulative total raw air WPA for the away team in the game so far.}
#' \item{total_home_raw_yac_wpa}{Cumulative total raw yac WPA for the home team in the game so far.}
#' \item{total_away_raw_yac_wpa}{Cumulative total raw yac WPA for the away team in the game so far.}
#' \item{punt_blocked}{Binary indicator for if the punt was blocked.}
#' \item{first_down_rush}{Binary indicator for if a running play converted the first down.}
#' \item{first_down_pass}{Binary indicator for if a passing play converted the first down.}
#' \item{first_down_penalty}{Binary indicator for if a penalty converted the first down.}
#' \item{third_down_converted}{Binary indicator for if the first down was converted on third down.}
#' \item{third_down_failed}{Binary indicator for if the posteam failed to convert first down on third down.}
#' \item{fourth_down_converted}{Binary indicator for if the first down was converted on fourth down.}
#' \item{fourth_down_failed}{Binary indicator for if the posteam failed to convert first down on fourth down.}
#' \item{incomplete_pass}{Binary indicator for if the pass was incomplete.}
#' \item{touchback}{Binary indicator for if a touchback occurred on the play.}
#' \item{interception}{Binary indicator for if the pass was intercepted.}
#' \item{punt_inside_twenty}{Binary indicator for if the punt ended inside the twenty yard line.}
#' \item{punt_in_endzone}{Binary indicator for if the punt was in the endzone.}
#' \item{punt_out_of_bounds}{Binary indicator for if the punt went out of bounds.}
#' \item{punt_downed}{Binary indicator for if the punt was downed.}
#' \item{punt_fair_catch}{Binary indicator for if the punt was caught with a fair catch.}
#' \item{kickoff_inside_twenty}{Binary indicator for if the kickoff ended inside the twenty yard line.}
#' \item{kickoff_in_endzone}{Binary indicator for if the kickoff was in the endzone.}
#' \item{kickoff_out_of_bounds}{Binary indicator for if the kickoff went out of bounds.}
#' \item{kickoff_downed}{Binary indicator for if the kickoff was downed.}
#' \item{kickoff_fair_catch}{Binary indicator for if the kickoff was caught with a fair catch.}
#' \item{fumble_forced}{Binary indicator for if the fumble was forced.}
#' \item{fumble_not_forced}{Binary indicator for if the fumble was not forced.}
#' \item{fumble_out_of_bounds}{Binary indicator for if the fumble went out of bounds.}
#' \item{solo_tackle}{Binary indicator if the play had a solo tackle (could be multiple due to fumbles).}
#' \item{safety}{Binary indicator for whether or not a safety occurred.}
#' \item{penalty}{Binary indicator for whether or not a penalty occurred.}
#' \item{tackled_for_loss}{Binary indicator for whether or not a tackle for loss on a run play occurred.}
#' \item{fumble_lost}{Binary indicator for if the fumble was lost.}
#' \item{own_kickoff_recovery}{Binary indicator for if the kicking team recovered the kickoff.}
#' \item{own_kickoff_recovery_td}{Binary indicator for if the kicking team recovered the kickoff and scored a TD.}
#' \item{qb_hit}{Binary indicator if the QB was hit on the play.}
#' \item{rush_attempt}{Binary indicator for if the play was a run.}
#' \item{pass_attempt}{Binary indicator for if the play was a pass attempt (includes sacks).}
#' \item{sack}{Binary indicator for if the play ended in a sack.}
#' \item{touchdown}{Binary indicator for if the play resulted in a TD.}
#' \item{pass_touchdown}{Binary indicator for if the play resulted in a passing TD.}
#' \item{rush_touchdown}{Binary indicator for if the play resulted in a rushing TD.}
#' \item{return_touchdown}{Binary indicator for if the play resulted in a return TD.}
#' \item{extra_point_attempt}{Binary indicator for extra point attempt.}
#' \item{two_point_attempt}{Binary indicator for two point conversion attempt.}
#' \item{field_goal_attempt}{Binary indicator for field goal attempt.}
#' \item{kickoff_attempt}{Binary indicator for kickoff.}
#' \item{punt_attempt}{Binary indicator for punts.}
#' \item{fumble}{Binary indicator for if a fumble occurred.}
#' \item{complete_pass}{Binary indicator for if the pass was completed.}
#' \item{assist_tackle}{Binary indicator for if an assist tackle occurred.}
#' \item{lateral_reception}{Binary indicator for if a lateral occurred on the reception.}
#' \item{lateral_rush}{Binary indicator for if a lateral occurred on a run.}
#' \item{lateral_return}{Binary indicator for if a lateral occurred on a return.}
#' \item{lateral_recovery}{Binary indicator for if a lateral occurred on a fumble recovery.}
#' \item{passer_player_id}{Unique identifier for the player that attempted the pass.}
#' \item{passer_player_name}{String name for the player that attempted the pass.}
#' \item{passing_yards}{Numeric yards by the passer_player_name, including yards gained in pass plays with laterals.
#' This should equal official passing statistics.}
#' \item{receiver_player_id}{Unique identifier for the receiver that was targeted on the pass.}
#' \item{receiver_player_name}{String name for the targeted receiver.}
#' \item{receiving_yards}{Numeric yards by the receiver_player_name, excluding yards gained in pass plays with laterals.
#' This should equal official receiving statistics but could miss yards gained in pass plays with laterals.
#' Please see the description of `lateral_receiver_player_name` for further information.}
#' \item{rusher_player_id}{Unique identifier for the player that attempted the run.}
#' \item{rusher_player_name}{String name for the player that attempted the run.}
#' \item{rushing_yards}{Numeric yards by the rusher_player_name, excluding yards gained in rush plays with laterals.
#' This should equal official rushing statistics but could miss yards gained in rush plays with laterals.
#' Please see the description of `lateral_rusher_player_name` for further information.}
#' \item{lateral_receiver_player_id}{Unique identifier for the player that received the last(!) lateral on a pass play.}
#' \item{lateral_receiver_player_name}{String name for the player that received the last(!) lateral on a pass play.
#' If there were multiple laterals in the same play, this will only be the last player who received a lateral.
#' Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
#' for a list of plays where multiple players recorded lateral receiving yards.}
#' \item{lateral_receiving_yards}{Numeric yards by the `lateral_receiver_player_name` in pass plays with laterals.
#' Please see the description of `lateral_receiver_player_name` for further information.}
#' \item{lateral_rusher_player_id}{Unique identifier for the player that received the last(!) lateral on a run play.}
#' \item{lateral_rusher_player_name}{String name for the player that received the last(!) lateral on a run play.
#' If there were multiple laterals in the same play, this will only be the last player who received a lateral.
#' Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
#' for a list of plays where multiple players recorded lateral rushing yards.}
#' \item{lateral_rushing_yards}{Numeric yards by the `lateral_rusher_player_name` in run plays with laterals.
#' Please see the description of `lateral_rusher_player_name` for further information.}
#' \item{lateral_sack_player_id}{Unique identifier for the player that received the lateral on a sack.}
#' \item{lateral_sack_player_name}{String name for the player that received the lateral on a sack.}
#' \item{interception_player_id}{Unique identifier for the player that intercepted the pass.}
#' \item{interception_player_name}{String name for the player that intercepted the pass.}
#' \item{lateral_interception_player_id}{Unique identifier for the player that received the lateral on an interception.}
#' \item{lateral_interception_player_name}{String name for the player that received the lateral on an interception.}
#' \item{punt_returner_player_id}{Unique identifier for the punt returner.}
#' \item{punt_returner_player_name}{String name for the punt returner.}
#' \item{lateral_punt_returner_player_id}{Unique identifier for the player that received the lateral on a punt return.}
#' \item{lateral_punt_returner_player_name}{String name for the player that received the lateral on a punt return.}
#' \item{kickoff_returner_player_name}{String name for the kickoff returner.}
#' \item{kickoff_returner_player_id}{Unique identifier for the kickoff returner.}
#' \item{lateral_kickoff_returner_player_id}{Unique identifier for the player that received the lateral on a kickoff return.}
#' \item{lateral_kickoff_returner_player_name}{String name for the player that received the lateral on a kickoff return.}
#' \item{punter_player_id}{Unique identifier for the punter.}
#' \item{punter_player_name}{String name for the punter.}
#' \item{kicker_player_name}{String name for the kicker on FG or kickoff.}
#' \item{kicker_player_id}{Unique identifier for the kicker on FG or kickoff.}
#' \item{own_kickoff_recovery_player_id}{Unique identifier for the player that recovered their own kickoff.}
#' \item{own_kickoff_recovery_player_name}{String name for the player that recovered their own kickoff.}
#' \item{blocked_player_id}{Unique identifier for the player that blocked the punt or FG.}
#' \item{blocked_player_name}{String name for the player that blocked the punt or FG.}
#' \item{tackle_for_loss_1_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_1_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{qb_hit_1_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_1_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{forced_fumble_player_1_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_name}{String name of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_name}{String name of one of the players with a forced fumble.}
#' \item{solo_tackle_1_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_2_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_name}{String name of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_name}{String name of one of the players with a solo tackle.}
#' \item{assist_tackle_1_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_1_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_1_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_2_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_3_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_4_team}{Team of one of the players with a tackle assist.}
#' \item{tackle_with_assist}{Binary indicator for if there has been a tackle with assist.}
#' \item{tackle_with_assist_1_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_team}{Team of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_team}{Team of one of the players with a tackle with assist.}
#' \item{pass_defense_1_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_1_player_name}{String name of one of the players with a pass defense.}
#' \item{pass_defense_2_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_2_player_name}{String name of one of the players with a pass defense.}
#' \item{fumbled_1_team}{Team of the first player who fumbled on the play.}
#' \item{fumbled_1_player_id}{Unique identifier of the first player who fumbled on the play.}
#' \item{fumbled_1_player_name}{String name of the first player who fumbled on the play.}
#' \item{fumbled_2_player_id}{Unique identifier of the second player who fumbled on the play.}
#' \item{fumbled_2_player_name}{String name of the second player who fumbled on the play.}
#' \item{fumbled_2_team}{Team of the second player who fumbled on the play.}
#' \item{fumble_recovery_1_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_name}{String name of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_name}{String name of one of the players with a fumble recovery.}
#' \item{sack_player_id}{Unique identifier of the player who recorded a solo sack.}
#' \item{sack_player_name}{String name of the player who recorded a solo sack.}
#' \item{half_sack_1_player_id}{Unique identifier of the first player who recorded half a sack.}
#' \item{half_sack_1_player_name}{String name of the first player who recorded half a sack.}
#' \item{half_sack_2_player_id}{Unique identifier of the second player who recorded half a sack.}
#' \item{half_sack_2_player_name}{String name of the second player who recorded half a sack.}
#' \item{return_team}{String abbreviation of the return team.}
#' \item{return_yards}{Yards gained by the return team.}
#' \item{penalty_team}{String abbreviation of the team with the penalty.}
#' \item{penalty_player_id}{Unique identifier for the player with the penalty.}
#' \item{penalty_player_name}{String name for the player with the penalty.}
#' \item{penalty_yards}{Yards gained (or lost) by the posteam from the penalty.}
#' \item{replay_or_challenge}{Binary indicator for whether or not a replay or challenge occurred.}
#' \item{replay_or_challenge_result}{String indicating the result of the replay or challenge.}
#' \item{penalty_type}{String indicating the penalty type of the first penalty in the given play. Will be `NA` if `desc` is missing the type.}
#' \item{defensive_two_point_attempt}{Binary indicator for whether or not the defense had an attempt on a two point conversion, which happens following a turnover.}
#' \item{defensive_two_point_conv}{Binary indicator whether or not the defense successfully scored on the two point conversion.}
#' \item{defensive_extra_point_attempt}{Binary indicator for whether or not the defense had an attempt on an extra point, which happens when the defense recovers the ball following a blocked attempt.}
#' \item{defensive_extra_point_conv}{Binary indicator whether or not the defense successfully scored on an extra point attempt.}
#' \item{safety_player_name}{String name for the player who scored a safety.}
#' \item{safety_player_id}{Unique identifier for the player who scored a safety.}
#' \item{season}{4 digit number indicating the season to which the game belongs.}
#' \item{cp}{Numeric value indicating the probability for a complete pass based on comparable game situations.}
#' \item{cpoe}{For a single pass play this is 1 - cp when the pass was completed or 0 - cp when the pass was incomplete. Aggregated over a whole game or season, it indicates how far a passer's completion percentage was above or below expectation.}
#' \item{series}{Starts at 1 and increments with each new first down; numbers are shared across both teams. NA for kickoffs, extra point/two point conversion attempts, non-plays, and plays with no posteam.}
#' \item{series_success}{1: scored touchdown, gained enough yards for first down.}
#' \item{series_result}{Possible values: First down, Touchdown, Opp touchdown, Field goal, Missed field goal, Safety, Turnover, Punt, Turnover on downs, QB kneel, End of half}
#' \item{start_time}{Kickoff time in eastern time zone.}
#' \item{order_sequence}{Column provided by NFL to fix out-of-order plays. Available 2011 and beyond with source "nfl".}
#' \item{time_of_day}{Time of day of play in UTC "HH:MM:SS" format. Available 2011 and beyond with source "nfl".}
#' \item{stadium}{Game site name.}
#' \item{weather}{String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!}
#' \item{nfl_api_id}{UUID of the game in the new NFL API.}
#' \item{play_clock}{Time on the playclock when the ball was snapped.}
#' \item{play_deleted}{Binary indicator for deleted plays.}
#' \item{play_type_nfl}{Play type as listed in the NFL source. Slightly different from the regular play_type variable.}
#' \item{special_teams_play}{Binary indicator for whether play is special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{st_play_type}{Type of special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{end_clock_time}{Game time at the end of a given play.}
#' \item{end_yard_line}{String indicating the yardline at the end of the given play consisting of team half and yard line number.}
#' \item{drive_real_start_time}{Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').}
#' \item{drive_play_count}{Numeric value of how many regular plays happened in a given drive.}
#' \item{drive_time_of_possession}{Time of possession in a given drive.}
#' \item{drive_first_downs}{Number of first downs in a given drive.}
#' \item{drive_inside20}{Binary indicator if the offense was able to get inside the opponents 20 yard line.}
#' \item{drive_ended_with_score}{Binary indicator for whether the drive ended with a score.}
#' \item{drive_quarter_start}{Numeric value indicating in which quarter the given drive has started.}
#' \item{drive_quarter_end}{Numeric value indicating in which quarter the given drive has ended.}
#' \item{drive_yards_penalized}{Numeric value of how many yards the offense gained or lost through penalties in the given drive.}
#' \item{drive_start_transition}{String indicating how the offense got the ball.}
#' \item{drive_end_transition}{String indicating how the offense lost the ball.}
#' \item{drive_game_clock_start}{Game time at the beginning of a given drive.}
#' \item{drive_game_clock_end}{Game time at the end of a given drive.}
#' \item{drive_start_yard_line}{String indicating where a given drive started consisting of team half and yard line number.}
#' \item{drive_end_yard_line}{String indicating where a given drive ended consisting of team half and yard line number.}
#' \item{drive_play_id_started}{Play_id of the first play in the given drive.}
#' \item{drive_play_id_ended}{Play_id of the last play in the given drive.}
#' \item{fixed_drive}{Manually created drive number in a game.}
#' \item{fixed_drive_result}{Manually created drive result.}
#' \item{away_score}{Total points scored by the away team.}
#' \item{home_score}{Total points scored by the home team.}
#' \item{location}{Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.}
#' \item{result}{Equals home_score - away_score and means the game outcome from the perspective of the home team.}
#' \item{total}{Equals home_score + away_score and means the total points scored in the given game.}
#' \item{spread_line}{The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference)}
#' \item{total_line}{The closing total line for the game. (Source: Pro-Football-Reference)}
#' \item{div_game}{Binary indicator for if the given game was a division game.}
#' \item{roof}{One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{surface}{What type of ground the game was played on. (Source: Pro-Football-Reference)}
#' \item{temp}{The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{wind}{The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{home_coach}{First and last name of the home team coach. (Source: Pro-Football-Reference)}
#' \item{away_coach}{First and last name of the away team coach. (Source: Pro-Football-Reference)}
#' \item{stadium_id}{ID of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{game_stadium}{Name of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' }
#' @export
#' @examples
#' \donttest{
#' # Get pbp data for two games
#' try({# to avoid CRAN test problems
#' fast_scraper(c("2019_01_GB_CHI", "2013_21_SEA_DEN"))
#' })
#'
#'
#' # It is also possible to directly use the
#' # output of `fast_scraper_schedules` as input
#' try({# to avoid CRAN test problems
#' library(dplyr, warn.conflicts = FALSE)
#' fast_scraper_schedules(2020) |>
#'   slice_tail(n = 3) |>
#'   fast_scraper()
#' })
#'
#' \dontshow{
#' # Close open connections for R CMD Check
#' future::plan("sequential")
#' }
#' }
fast_scraper <- function(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...,
  in_builder = FALSE
) {
  if (!is.vector(game_ids) && is.data.frame(game_ids)) {
    game_ids <- game_ids$game_id
  }

  if (!is.vector(game_ids)) {
    cli::cli_abort("Param {.code game_ids} is not a valid vector!")
  }

  if (length(game_ids) > 1 && is_sequential()) {
    cli::cli_alert_info(
      c(
        "It is recommended to use parallel processing when trying to load multiple games.",
        "Please consider running {.code future::plan(\"multisession\")}! ",
        "Will go on sequentially..."
      )
    )
  }

  # nflfastR v6 stopped supporting the 1999 and 2000 seasons because of
  # inconsistent data sources. Data is still available through load_pbp
  # but we will not fix any issues.
  # It's possible to install nflfastR v5.2.0 to parse those seasons.
  # try pak::pak("nflverse/nflfastR@v5.2.0")
  game_ids <- check_for_dropped_seasons(game_ids)

  suppressWarnings({
    p <- progressr::progressor(along = game_ids)
    pbp <- furrr::future_map_dfr(
      game_ids,
      function(x, p, dir, ...) {
        plays <- please_work(get_pbp_nfl)(x, dir = dir, ...)
        p(sprintf("ID=%s", as.character(x)))
        return(plays)
      },
      p,
      dir = dir,
      ...
    )

    if (length(pbp) != 0) {
      user_message("Download finished. Adding variables...", "done")
      pbp <- pbp |>
        add_game_data(...) |>
        add_nflscrapr_mutations() |>
        add_ep() |>
        add_air_yac_ep() |>
        add_wp() |>
        add_air_yac_wp() |>
        add_cp() |>
        add_drive_results() |>
        add_series_data() |>
        restore_kickoff_attempt() |>
        select_variables()
    }
  })

  if (!in_builder) {
    str <- paste0(my_time(), " | Procedure completed.")
    cli::cli_alert_success("{.field {str}}")
  }
  make_nflverse_data(pbp)
}


# roster ------------------------------------------------------------------

#' Load Team Rosters for Multiple Seasons
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated. Please use [`nflreadr::load_rosters`].
#'
#' @details See [`nflreadr::load_rosters`] for details.
#' @inheritDotParams nflreadr::load_rosters
#' @inherit nflreadr::load_rosters
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @keywords internal
#' @examples
#' \donttest{
#' # Roster of the 2019 and 2020 seasons
#' try({# to avoid CRAN test problems
#' # fast_scraper_roster(2019:2020)
#' })
#' }
#' @export
fast_scraper_roster <- function(...) {
  lifecycle::deprecate_warn(
    "5.2.0",
    "fast_scraper_roster()",
    "nflreadr::load_rosters()"
  )
  nflreadr::load_rosters(...)
}

# schedules ---------------------------------------------------------------

#' Load NFL Season Schedules
#'
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' This function was deprecated. Please use [`nflreadr::load_schedules`].
#'
#' @details See [`nflreadr::load_schedules`] for details.
#' @inheritDotParams nflreadr::load_schedules
#' @inherit nflreadr::load_schedules
#' @seealso For information on parallel processing and progress updates please
#' see [nflfastR].
#' @keywords internal
#' @examples
#'\donttest{
#' # Get schedules for the whole 2015 - 2018 seasons
#' try({# to avoid CRAN test problems
#' # fast_scraper_schedules(2015:2018)
#' })
#' }
#' @export
fast_scraper_schedules <- function(...) {
  lifecycle::deprecate_warn(
    "5.2.0",
    "fast_scraper_schedules()",
    "nflreadr::load_schedules()"
  )
  nflreadr::load_schedules(...)
}


================================================
FILE: R/utils.R
================================================
# The function `message_completed` to create the green "...completed" message
# only exists to hide the option `in_builder` in dots
message_completed <- function(x, in_builder = FALSE) {
  if (isFALSE(in_builder)) {
    str <- paste0(my_time(), " | ", x)
    cli::cli_alert_success("{.field {str}}")
  } else if (in_builder) {
    cli::cli_alert_success("{my_time()} | {x}")
  }
}

user_message <- function(x, type) {
  if (type == "done") {
    cli::cli_alert_success("{my_time()} | {x}")
  } else if (type == "todo") {
    cli::cli_ul("{my_time()} | {x}")
  } else if (type == "info") {
    cli::cli_alert_info("{my_time()} | {x}")
  } else if (type == "oops") {
    cli::cli_alert_danger("{my_time()} | {x}")
  }
}

cli_message <- function(
  msg,
  ...,
  .cli_fct = cli::cli_alert_info,
  .envir = parent.frame()
) {
  .cli_fct(c(my_time(), " | ", msg), ..., .envir = .envir)
}

my_time <- function() strftime(Sys.time(), format = "%H:%M:%S")

# custom mode function from https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode/8189441
custom_mode <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}

rule_header <- function(x) {
  print(cli::rule(
    left = cli::style_bold(x),
    right = paste("nflfastR version", utils::packageVersion("nflfastR"))
  ))
}

rule_footer <- function(x) {
  print(cli::rule(
    left = cli::style_bold(x)
  ))
}

# read rds that has been pre-fetched
read_raw_rds <- function(raw) {
  con <- gzcon(rawConnection(raw))
  # register the cleanup before reading so the connection is closed even if
  # readRDS() fails
  on.exit(close(con))
  ret <- readRDS(con)
  ret
}

# helper to make sure a game ID from the schedule scraper is not called
# 'invalid' when its source file simply does not exist yet
maybe_valid <- function(id) {
  all(
    length(id) == 1,
    is.character(id),
    substr(id, 1, 4) %in%
      seq.int(1999, as.integer(format(Sys.Date(), "%Y")) + 1, 1),
    as.integer(substr(id, 6, 7)) %in% seq_len(22),
    stringr::str_extract_all(id, "(?<=_)[:upper:]{2,3}")[[1]] %in%
      nflfastR::teams_colors_logos$team_abbr
  )
}

# some 2000 games have play_ids like 2767.375 and 2767.703 which results in
# duplicates that can be fixed. We save play IDs as numeric first and then
# check whether or not there are duplicates when we convert them to integer
# If there are duplicates, we multiply all play IDs by 10 and check again
# If there are still duplicates, we multiply all play IDs by 100 and so on
# As soon as play IDs are unique, we save them as integer and go on
uniquify_ids <- function(ids) {
  ids <- as.numeric(ids)
  int_ids <- as.integer(ids)
  mult <- 10
  while (anyDuplicated(int_ids) > 0) {
    int_ids <- as.integer(ids * mult)
    mult <- mult * 10
  }
  int_ids
}
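
# Worked example with made-up IDs (chosen to mirror the comment above, not
# taken from a real game file):
#   ids <- c("2767.375", "2767.703", "2800")
#   as.integer(as.numeric(ids))       # 2767 2767 2800    -> duplicates remain
#   as.integer(as.numeric(ids) * 10)  # 27673 27677 28000 -> unique, loop stops
#   uniquify_ids(ids)                 # 27673 27677 28000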

# check if a package is installed
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# load raw game files esp. for debugging
load_raw_game <- function(
  game_id,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  skip_local = FALSE
) {
  # game_id <- "2022_19_LAC_JAX"

  season <- substr(game_id, 1, 4)

  local_file <- file.path(
    dir,
    season,
    paste0(game_id, ".rds")
  )

  if (
    length(local_file) == 1 && file.exists(local_file) && isFALSE(skip_local)
  ) {
    # cli::cli_progress_step("Load locally from {.path {local_file}}")
    raw <- readRDS(local_file)
  } else {
    to_load <- raw_pbp_urls(game_id)
    raw <- nflreadr::rds_from_url(to_load)
  }

  raw
}

# Identify sessions with sequential future resolving
is_sequential <- function() inherits(future::plan(), "sequential")

# take a time string of the format "MM:SS" and convert it to seconds
time_to_seconds <- function(time) {
  as.numeric(strptime(time, format = "%M:%S")) -
    as.numeric(strptime("0", format = "%S"))
}
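
# Illustrative calls (the results follow from MM * 60 + SS):
#   time_to_seconds("15:00")  # 900
#   time_to_seconds("02:05")  # 125
#   time_to_seconds("00:00")  # 0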

# write season pbp to a connected db
write_pbp <- function(seasons, dbConnection, tablename) {
  p <- progressr::progressor(along = seasons)
  purrr::walk(
    seasons,
    function(x, p) {
      pbp <- nflreadr::load_pbp(x)
      if (!DBI::dbExistsTable(dbConnection, tablename)) {
        pbp <- dplyr::bind_rows(default_play, pbp)
      }
      DBI::dbWriteTable(dbConnection, tablename, pbp, append = TRUE)
      p("loading...")
    },
    p
  )
}

make_nflverse_data <- function(data, type = c("play by play")) {
  attr(data, "nflverse_timestamp") <- Sys.time()
  attr(data, "nflverse_type") <- type
  attr(data, "nflfastR_version") <- utils::packageVersion("nflfastR")
  class(data) <- c("nflverse_data", "tbl_df", "tbl", "data.table", "data.frame")
  data
}

str_split_and_extract <- function(string, pattern, i) {
  split_list <- stringr::str_split(string, pattern, simplify = TRUE, n = i + 1)
  split_list[, i]
}

# slightly modified version of purrr::possibly
please_work <- function(.f, otherwise = data.frame(), quiet = FALSE) {
  function(...) {
    tryCatch(
      expr = .f(...),
      error = function(e) {
        if (isFALSE(quiet)) {
          cli::cli_alert_warning(conditionMessage(e))
        }
        otherwise
      }
    )
  }
}
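
# Usage sketch with a hypothetical failing function `always_fails`
# (not part of the package):
#   always_fails <- function(...) stop("no data for this game")
#   safe <- please_work(always_fails)
#   safe("2019_01_GB_CHI")  # shows the error message as a cli warning and
#                           # returns the `otherwise` value, an empty data.frame()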

# THIS IS CALLED FROM INSIDE get_pbp_gc AND get_pbp_nfl
# MODIFY WITH CAUTION
fetch_raw <- function(
  game_id,
  dir = getOption("nflfastR.raw_directory", default = NULL)
) {
  season <- substr(game_id, 1, 4)

  if (is.null(dir)) {
    to_load <- raw_pbp_urls(game_id)

    fetched <- curl::curl_fetch_memory(to_load)

    if (fetched$status_code == 404 && maybe_valid(game_id)) {
      cli::cli_abort(
        "The requested GameID {.val {game_id}} is not loaded yet, please try again later!"
      )
    } else if (fetched$status_code == 500) {
      cli::cli_abort(
        "The data hosting servers are down, please try again later!"
      )
    } else if (fetched$status_code == 404) {
      cli::cli_abort("The requested GameID {.val {game_id}} is invalid!")
    }

    out <- read_raw_rds(fetched$content)
  } else {
    # build path to locally stored game files
    local_file <- file.path(
      dir,
      season,
      paste0(game_id, ".rds")
    )

    if (!file.exists(local_file)) {
      cli::cli_abort("File {.path {local_file}} doesn't exist!")
    }

    out <- readRDS(local_file)
  }

  out
}

release_bullets <- function() {
  c(
    '`devtools::check_mac_release()`',
    '`nflfastR:::my_rhub_check()`',
    '`pkgdown::check_pkgdown()`',
    '`nflfastR:::nflverse_thanks()`',
    NULL
  )
}

load_model <- function(name) {
  model <- switch(
    name,
    "ep" = fastrmodels::ep_model,
    "cp" = fastrmodels::cp_model,
    "wp" = fastrmodels::wp_model,
    "wp_spread" = fastrmodels::wp_model_spread,
    "fg" = fastrmodels::fg_model,
    "xpass" = fastrmodels::xpass_model,
    "xyac" = fastrmodels::xyac_model
  )

  # fastrmodels v2 introduced raw model vectors to make sure the models
  # are compatible with future xgboost versions
  out <- if (is.raw(model)) {
    xgboost::xgb.load.raw(model)
  } else {
    model
  }
  out
}

my_rhub_check <- function() {
  cli::cli_text("Please run the following code")
  cli::cli_text(
    "{.run rhub::rhub_check(platforms = nflfastR:::rhub_check_platforms())}"
  )
}

rhub_check_platforms <- function() {
  # plts created with
  # out <- paste0('"', rhub::rhub_platforms()$name, '"', collapse = ",\n")
  # cli::cli_code(paste0(
  #   "plts <- c(\n", out, "\n)"
  # ))

  plts <- c(
    "linux",
    "m1-san",
    "macos",
    "macos-arm64",
    "windows",
    "atlas",
    "c23",
    "clang-asan",
    "clang-ubsan",
    "clang16",
    "clang17",
    "clang18",
    "clang19",
    "clang20",
    "donttest",
    "gcc-asan",
    "gcc13",
    "gcc14",
    "gcc15",
    "intel",
    "mkl",
    "nold",
    "noremap",
    "nosuggests",
    "rchk",
    "ubuntu-clang",
    "ubuntu-gcc12",
    "ubuntu-next",
    "ubuntu-release",
    "valgrind"
  )
  exclude <- c("rchk", "nosuggests", "valgrind")
  plts[!plts %in% exclude]
}

nflverse_thanks <- function() {
  cli::cli_text("Run the following code and copy/paste its output to NEWS.md")

  cli::cli_code(
    '
    contributors <- usethis::use_tidy_thanks()
    paste(
      "Thank you to",
      glue::glue_collapse(
        paste0("&#x0040;", contributors), sep = ", ", last = ", and "
      ),
      "for their questions, feedback, and contributions towards this release."
    )'
  )
}

check_for_dropped_seasons <- function(game_ids) {
  dropped_support <- grep("1999|2000", game_ids, value = TRUE)
  if (length(dropped_support)) {
    seasons <- substr(dropped_support, 1, 4) |> unique() |> sort()
    cli::cli_alert_warning(
      "You have supplied game ID(s) of the {seasons} \\
      {cli::qty(length(seasons))}season{?s}. \\
      nflfastR v6 has discontinued support for the parser for {?this/these} \\
      season{?s} because of too many inconsistencies between the data sources. \\
      The data is still available for download, however. \\
      Please run {.run nflfastR::load_pbp(c({paste(seasons, collapse = ', ')}))}"
    )
    game_ids <- game_ids[!game_ids %in% dropped_support]
  }
  game_ids
}

raw_pbp_urls <- function(game_ids) {
  # pattern
  # https://github.com/nflverse/nflverse-pbp/releases/download/raw_pbp_{season}/{game_id}.rds
  file.path(
    "https://github.com/nflverse/nflverse-pbp/releases/download",
    paste0("raw_pbp_", substr(game_ids, 1, 4)),
    paste0(game_ids, ".rds"),
    fsep = "/"
  )
}
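
# Example with the game ID used in the debugging comment of load_raw_game():
#   raw_pbp_urls("2022_19_LAC_JAX")
#   # "https://github.com/nflverse/nflverse-pbp/releases/download/raw_pbp_2022/2022_19_LAC_JAX.rds"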


================================================
FILE: README.Rmd
================================================
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/readme-"
)
```

# **nflfastR** <img src="man/figures/logo.png" align="right" width="25%" min-width="120px"/>


<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version-last-release/nflfastR)](https://CRAN.R-project.org/package=nflfastR)
[![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/nflfastR)](https://CRAN.R-project.org/package=nflfastR)
[![Dev status](https://img.shields.io/github/r-package/v/nflverse/nflfastR/master?label=dev%20version&style=flat-square&logo=github)](https://nflfastr.com/)
[![R build status](https://img.shields.io/github/actions/workflow/status/nflverse/nflfastR/R-CMD-check.yaml?label=R%20check&style=flat-square&logo=github)](https://github.com/nflverse/nflfastR/actions)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![nflverse support](https://img.shields.io/discord/789805604076126219?color=7289da&label=nflverse%20support&logo=discord&logoColor=fff&style=flat-square)](https://discord.com/invite/5Er2FBnnQa)
<!-- [![Twitter Follow](https://img.shields.io/twitter/follow/nflfastR.svg?style=social)](https://twitter.com/nflfastR) -->
<!-- badges: end -->

`nflfastR` is a set of functions to efficiently scrape NFL play-by-play data. `nflfastR` expands upon the features of nflscrapR:
  
* The package contains NFL play-by-play data back to 1999
* As suggested by the package name, it obtains games **much** faster
* Includes completion probability (`cp`), completion percentage over expected (`cpoe`), and expected yards after the catch (`xyac_epa` and `xyac_mean_yardage`) in play-by-play going back to 2006
* Includes drive information, including drive starting position and drive result
* Includes series information, including series number and series success
* Hosts [a release of play-by-play data going back to 1999](https://github.com/nflverse/nflverse-data/releases/tag/pbp) for very quick access
* Features models for Expected Points, Win Probability, Completion Probability, and Yards After the Catch (see section below)
* Includes a function `update_db()` that creates and updates a database

We owe a debt of gratitude to the original [`nflscrapR`](https://github.com/maksimhorowitz/nflscrapR) team, Maksim Horowitz, Ronald Yurko, and Samuel Ventura, without whose contributions and inspiration this package would not exist.


## Installation

The easiest way to get nflfastR is to install it from [CRAN](https://cran.r-project.org/package=nflfastR) with:

```{r, eval=FALSE}
install.packages("nflfastR")
```

To get a bug fix or to use a feature from the development version, you can install the development version of nflfastR either from [GitHub](https://github.com/nflverse/nflfastR/) with:

``` {r eval = FALSE}
if (!require("pak")) install.packages("pak")
pak::pak("nflverse/nflfastR")
```

or prebuilt from the [development repo](https://nflverse.r-universe.dev) with:

```{r eval = FALSE}
install.packages("nflfastR", repos = c("https://nflverse.r-universe.dev", getOption("repos")))
```

## Usage

We have provided some application examples in the **[Getting Started](https://nflfastr.com/articles/nflfastR.html)** article. However, these require a basic knowledge of R. For this reason we have the **[nflfastR beginner's guide](https://nflfastr.com/articles/beginners_guide.html)**, which we recommend to all those who are looking for an introduction to nflfastR with R.

You can find column names and descriptions in the **[Field Descriptions](https://nflfastr.com/articles/field_descriptions.html)** article, or by accessing the `field_descriptions` dataframe from the package.
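
As a rough sketch of the second option, the lookup below filters `field_descriptions` for a few columns. It assumes the data frame stores the column names in `Field` and the explanations in `Description`; check `names(nflfastR::field_descriptions)` if your installed version differs.

```{r eval = FALSE}
# Column dictionary shipped with the package
fields <- nflfastR::field_descriptions

# Show the descriptions of a few completion probability and series columns
# (assumes the columns are called `Field` and `Description`)
wanted <- c("cp", "cpoe", "series", "series_success")
fields[fields$Field %in% wanted, c("Field", "Description")]
```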

## Data access

Even though `nflfastR` is very fast, **we recommend downloading the data from [here](https://github.com/nflverse/nflverse-data/releases/tag/pbp) or using the `nflreadr` package**. These data sets include play-by-play data of complete seasons going back to 1999 and are updated nightly during the season. The files contain both regular season and postseason data, and one can use game_type or week to figure out which games occurred in the postseason.
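
A minimal sketch of that workflow, assuming `nflreadr` and `dplyr` are installed and an internet connection is available. It uses the `season_type` column, which is assumed here to mark plays as `"REG"` or `"POST"`; the 2019 season is an arbitrary choice.

```{r eval = FALSE}
library(dplyr, warn.conflicts = FALSE)

# Download one full season of pre-built play-by-play data
pbp <- nflreadr::load_pbp(2019)

# Keep postseason plays only (assumes `season_type` is "REG" or "POST")
postseason <- pbp |>
  filter(season_type == "POST")

# Count the games per week to see how the postseason weeks are numbered
postseason |>
  distinct(game_id, week) |>
  count(week)
```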

## nflfastR models

`nflfastR` uses its own models for Expected Points, Win Probability, Completion Probability, and Expected Yards After the Catch. To read about the models, please see [this post on Open Source Football](https://opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/). For a more detailed description of the motivation for Expected Points models, we highly recommend this paper [from the nflscrapR team located here](https://arxiv.org/pdf/1802.00998). 

Here is a visualization of the Expected Points model by down and yardline.

``` {r epa-model, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600, echo=FALSE, eval = FALSE}

# This code was used to create the ep model image. Since we don't want to include 
# the resulting png file in the package for file size reasons it was uploaded to
# the nflfastR repo and embedded remotely with the next chunk

library(tidyverse)

df <- nflreadr::load_pbp(2014:2019) |>
        filter(!is.na(posteam) & !is.na(ep), !is.na(down)) |>
        select(ep, down, yardline_100, air_yards, pass_location, cp)

df |>
  ggplot(aes(x = yardline_100, y = ep, color = as.factor(down))) + 
  geom_smooth(linewidth = 2) +
  labs(x = "Yards from opponent's end zone",
       y = "Expected points value",
       color = "Down",
       title = "Expected Points by Yardline and Down") +
  theme_bw() + 
  scale_y_continuous(expand=c(0,0), breaks = scales::pretty_breaks(10)) + 
  scale_x_continuous(expand=c(0,0), breaks = seq(from = 5, to = 95, by = 10)) +
  theme(
    plot.title = element_text(size = 18, hjust = 0.5),
    plot.subtitle = element_text(size = 16, hjust = 0.5),
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.text = element_text(size = 16),
    legend.title = element_text(size = 16),
    legend.position = c(.90, .80)) +
    annotate("text", x = 14, y = -2.2, size = 3, label = "2014-2019 | Model: @nflfastR")
```

```{r echo=FALSE, fig.align='center', fig.cap='', out.width='100%'}
knitr::include_graphics('man/figures/readme-epa-model-1.png')
```

Here is a visualization of the Completion Probability model by air yards and pass direction.

``` {r cp-model, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600, echo=FALSE, eval = FALSE}

# This code was used to create the cp model image. Since we don't want to include 
# the resulting png file in the package for file size reasons it was uploaded to
# the nflfastR repo and embedded remotely with the next chunk

df |>
  filter(!is.na(cp), between(air_yards, -5, 45)) |>
  mutate(pass_middle = if_else(pass_location == "middle", "Yes", "No")) |>
  ggplot(aes(x = air_yards, y = cp, color = as.factor(pass_middle))) + 
  geom_smooth(linewidth = 2) +
  labs(x = "Air yards",
       y = "Expected completion %",
       color = "Pass middle",
       title = "Expected Completion % by Air Yards and Pass Direction") +
  theme_bw() + 
  scale_y_continuous(expand=c(0,0), breaks = scales::pretty_breaks(5)) + 
  scale_x_continuous(expand=c(0,0)) +
  theme(
    plot.title = element_text(size = 18, hjust = 0.5),
    plot.subtitle = element_text(size = 16, hjust = 0.5),
    axis.title = element_text(size = 18),
    axis.text = element_text(size = 16),
    legend.text = element_text(size = 16),
    legend.title = element_text(size = 16),
    legend.position = c(.80, .80)) +
    annotate("text", x = 2, y = .32, size = 3, label = "2014-2019 | Model: @nflfastR")
```

```{r echo=FALSE, fig.align='center', fig.cap='', out.width='100%'}
knitr::include_graphics('man/figures/readme-cp-model-1.png')
```

`nflfastR` includes two win probability models: one with and one without incorporating the pre-game spread.
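
One way to compare the two models is to put their outputs side by side for a single game. The sketch below assumes the play-by-play columns `wp` (model without the spread) and `vegas_wp` (model with the spread); the game ID is an arbitrary example and the code needs an internet connection.

```{r eval = FALSE}
library(dplyr, warn.conflicts = FALSE)

pbp <- nflreadr::load_pbp(2019)

# Win probability of the possession team under both models, play by play
wp_compare <- pbp |>
  filter(game_id == "2019_01_GB_CHI", !is.na(wp), !is.na(vegas_wp)) |>
  select(qtr, time, posteam, wp, vegas_wp) |>
  mutate(diff = vegas_wp - wp)

head(wp_compare, 10)
```

Large values of `diff` early in a game would be expected when the pre-game spread strongly favors one team, since only `vegas_wp` sees that information.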

## Special thanks

* To Nick Shoemaker for [finding and making available JSON-formatted NFL play-by-play back to 1999](https://github.com/CroppedClamp/nfl_pbps) (`nflfastR` uses this source for 1999 and 2000 and previously also used it for 2001-2010)
* To Lau Sze Yui for developing a scraping function to access JSON-formatted NFL play-by-play beginning in 2001
* To Aaron Schatz and FTN Fantasy for providing charting data to correctly mark scrambles in the 1999-2005 seasons
* To Lee Sharpe for curating a resource for game information
* To Timo Riske, Lau Sze Yui, Sean Clement, and Daniel Houston for many helpful discussions regarding the development of the new `nflfastR` models
* To Zach Feldman and Josh Hermsmeyer for many helpful discussions about CPOE models as well as Peter Owen for many helpful suggestions for the CP model
* To Florian Schmitt for the logo design
* To the many users who found and reported bugs in `nflfastR` 1.0
* And of course, the original [`nflscrapR`](https://github.com/maksimhorowitz/nflscrapR) team, Maksim Horowitz, Ronald Yurko, and Samuel Ventura, whose work represented a dramatic step forward for the state of public NFL research


================================================
FILE: README.md
================================================

<!-- README.md is generated from README.Rmd. Please edit that file -->

# **nflfastR** <img src="man/figures/logo.png" align="right" width="25%" min-width="120px"/>

<!-- badges: start -->

[![CRAN
status](https://www.r-pkg.org/badges/version-last-release/nflfastR)](https://CRAN.R-project.org/package=nflfastR)
[![CRAN
downloads](https://cranlogs.r-pkg.org/badges/grand-total/nflfastR)](https://CRAN.R-project.org/package=nflfastR)
[![Dev
status](https://img.shields.io/github/r-package/v/nflverse/nflfastR/master?label=dev%20version&style=flat-square&logo=github)](https://nflfastr.com/)
[![R build
status](https://img.shields.io/github/actions/workflow/status/nflverse/nflfastR/R-CMD-check.yaml?label=R%20check&style=flat-square&logo=github)](https://github.com/nflverse/nflfastR/actions)
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![nflverse
support](https://img.shields.io/discord/789805604076126219?color=7289da&label=nflverse%20support&logo=discord&logoColor=fff&style=flat-square)](https://discord.com/invite/5Er2FBnnQa)
<!-- [![Twitter Follow](https://img.shields.io/twitter/follow/nflfastR.svg?style=social)](https://twitter.com/nflfastR) -->
<!-- badges: end -->

`nflfastR` is a set of functions to efficiently scrape NFL play-by-play
data. `nflfastR` expands upon the features of nflscrapR:

- The package contains NFL play-by-play data back to 1999
- As suggested by the package name, it obtains games **much** faster
- Includes completion probability (`cp`), completion percentage over
  expected (`cpoe`), and expected yards after the catch (`xyac_epa` and
  `xyac_mean_yardage`) in play-by-play going back to 2006
- Includes drive information, including drive starting position and
  drive result
- Includes series information, including series number and series
  success
- Hosts [a release of play-by-play data going back to
  1999](https://github.com/nflverse/nflverse-data/releases/tag/pbp) for
  very quick access
- Features models for Expected Points, Win Probability, Completion
  Probability, and Yards After the Catch (see section below)
- Includes a function `update_db()` that creates and updates a database

We owe a debt of gratitude to the original
[`nflscrapR`](https://github.com/maksimhorowitz/nflscrapR) team, Maksim
Horowitz, Ronald Yurko, and Samuel Ventura, without whose contributions
and inspiration this package would not exist.

## Installation

The easiest way to get nflfastR is to install it from
[CRAN](https://cran.r-project.org/package=nflfastR) with:

``` r
install.packages("nflfastR")
```

To get a bug fix or to use a feature from the development version, you
can install the development version of nflfastR either from
[GitHub](https://github.com/nflverse/nflfastR/) with:

``` r
if (!require("pak")) install.packages("pak")
pak::pak("nflverse/nflfastR")
```

or prebuilt from the [development repo](https://nflverse.r-universe.dev)
with:

``` r
install.packages("nflfastR", repos = c("https://nflverse.r-universe.dev", getOption("repos")))
```

## Usage

We have provided some application examples in the **[Getting
Started](https://nflfastr.com/articles/nflfastR.html)** article.
However, these require a basic knowledge of R. For this reason we have
the **[nflfastR beginner’s
guide](https://nflfastr.com/articles/beginners_guide.html)**, which we
recommend to all those who are looking for an introduction to nflfastR
with R.

You can find column names and descriptions in the **[Field
Descriptions](https://nflfastr.com/articles/field_descriptions.html)**
article, or by accessing the `field_descriptions` dataframe from the
package.

## Data access

Even though `nflfastR` is very fast, **we recommend downloading the data
from [here](https://github.com/nflverse/nflverse-data/releases/tag/pbp)
or using the `nflreadr` package**. These data sets include play-by-play
data of complete seasons going back to 1999 and are updated nightly
during the season. The files contain both regular season and postseason
data, and one can use game_type or week to figure out which games
occurred in the postseason.

## nflfastR models

`nflfastR` uses its own models for Expected Points, Win Probability,
Completion Probability, and Expected Yards After the Catch. To read
about the models, please see [this post on Open Source
Football](https://opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/).
For a more detailed description of the motivation for Expected Points
models, we highly recommend this paper [from the nflscrapR team located
here](https://arxiv.org/pdf/1802.00998).

Here is a visualization of the Expected Points model by down and
yardline.

<img src="man/figures/readme-epa-model-1.png" alt="" width="100%" style="display: block; margin: auto;" />

Here is a visualization of the Completion Probability model by air yards
and pass direction.

<img src="man/figures/readme-cp-model-1.png" alt="" width="100%" style="display: block; margin: auto;" />

`nflfastR` includes two win probability models: one with and one without
incorporating the pre-game spread.

## Special thanks

- To Nick Shoemaker for [finding and making available JSON-formatted NFL
  play-by-play back to 1999](https://github.com/CroppedClamp/nfl_pbps)
  (`nflfastR` uses this source for 1999 and 2000 and previously also
  used it for 2001-2010)
- To Lau Sze Yui for developing a scraping function to access
  JSON-formatted NFL play-by-play beginning in 2001
- To Aaron Schatz and FTN Fantasy for providing charting data to
  correctly mark scrambles in the 1999-2005 seasons
- To Lee Sharpe for curating a resource for game information
- To Timo Riske, Lau Sze Yui, Sean Clement, and Daniel Houston for many
  helpful discussions regarding the development of the new `nflfastR`
  models
- To Zach Feldman and Josh Hermsmeyer for many helpful discussions about
  CPOE models as well as Peter Owen for many helpful suggestions for the
  CP model
- To Florian Schmitt for the logo design
- To the many users who found and reported bugs in `nflfastR` 1.0
- And of course, the original
  [`nflscrapR`](https://github.com/maksimhorowitz/nflscrapR) team,
  Maksim Horowitz, Ronald Yurko, and Samuel Ventura, whose work
  represented a dramatic step forward for the state of public NFL
  research


================================================
FILE: air.toml
================================================


================================================
FILE: cran-comments.md
================================================
## Release summary

This is a minor release that 
* deprecates old functions, and
* fixes bugs.

## R CMD check results

0 errors | 0 warnings | 0 notes

## revdepcheck results

We checked 3 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.

 * We saw 0 new problems
 * We failed to check 0 packages


================================================
FILE: data-raw/MODELS.R
================================================
################################################################################
# Author: Ben Baldwin
# Purpose: Estimate nflfastR models for EP, CP, Field Goals, and WP
################################################################################

library(tidyverse)
library(xgboost)
source('R/helper_add_ep_wp.R')
source('R/helper_add_cp_cpoe.R')
source('R/helper_add_nflscrapr_mutations.R')


################################################################################
# Estimate EP model
################################################################################

# from remote
# pbp_data <- readRDS(url('https://github.com/nflverse/nflfastR-data/blob/master/models/cal_data.rds?raw=true'))

# from local
pbp_data <- readRDS('../nflfastR-data/models/cal_data.rds')

#function in helper_add_ep_wp.R
model_data <- pbp_data |>
  make_model_mutations() |>
  mutate(
    label = case_when(
      Next_Score_Half == "Touchdown" ~ 0,
      Next_Score_Half == "Opp_Touchdown" ~ 1,
      Next_Score_Half == "Field_Goal" ~ 2,
      Next_Score_Half == "Opp_Field_Goal" ~ 3,
      Next_Score_Half == "Safety" ~ 4,
      Next_Score_Half == "Opp_Safety" ~ 5,
      Next_Score_Half == "No_Score" ~ 6
    ),
    label = as.factor(label),
    # Calculate the drive difference between the next score drive and the
    # current play drive:
    Drive_Score_Dist = Drive_Score_Half - drive,
    # Create a weight column based on difference in drives between play and next score:
    Drive_Score_Dist_W = (max(Drive_Score_Dist) - Drive_Score_Dist) /
      (max(Drive_Score_Dist) - min(Drive_Score_Dist)),
    # Create a weight column based on score differential:
    ScoreDiff_W = (max(abs(score_differential), na.rm = TRUE) -
      abs(score_differential)) /
      (max(abs(score_differential), na.rm = TRUE) -
        min(abs(score_differential), na.rm = TRUE)),
    # Add these weights together and scale again:
    Total_W = Drive_Score_Dist_W + ScoreDiff_W,
    Total_W_Scaled = (Total_W - min(Total_W, na.rm = TRUE)) /
      (max(Total_W, na.rm = TRUE) - min(Total_W, na.rm = TRUE))
  ) |>
  filter(
    !is.na(defteam_timeouts_remaining),
    !is.na(posteam_timeouts_remaining),
    !is.na(yardline_100)
  ) |>
  select(
    label,
    half_seconds_remaining,
    yardline_100,
    home,
    retractable,
    dome,
    outdoors,
    ydstogo,
    era0,
    era1,
    era2,
    era3,
    era4,
    down1,
    down2,
    down3,
    down4,
    posteam_timeouts_remaining,
    defteam_timeouts_remaining,
    Total_W_Scaled
  )

nrounds = 525
params <-
  list(
    booster = "gbtree",
    objective = "multi:softprob",
    eval_metric = c("mlogloss"),
    num_class = 7,
    eta = 0.025,
    gamma = 1,
    subsample = 0.8,
    colsample_bytree = 0.8,
    max_depth = 5,
    min_child_weight = 1
  )

model_data <- model_data |>
  mutate(label = as.numeric(label), label = label - 1)

full_train = xgboost::xgb.DMatrix(
  model.matrix(~ . + 0, data = model_data |> select(-label, -Total_W_Scaled)),
  label = model_data$label,
  weight = model_data$Total_W_Scaled
)

set.seed(2013) #GoHawks
ep_model <- xgboost::xgboost(
  params = params,
  data = full_train,
  nrounds = nrounds,
  verbose = 2
)

################################################################################
# Estimate FG model
################################################################################

fg_model_data <- pbp_data |>
  filter(
    play_type %in%
      c("field_goal", "extra_point", "run") &
      (!is.na(extra_point_result) | !is.na(field_goal_result))
  ) |>
  make_model_mutations()

#estimate model
fg_model <- mgcv::bam(
  sp ~ s(yardline_100, by = interaction(era, model_roof)) + model_roof + era,
  data = fg_model_data,
  family = "binomial"
)

################################################################################
# Estimate CP model
################################################################################

model_vars <- pbp_data |>
  filter(season >= 2006) |>
  make_model_mutations() |>
  prepare_cp_data() |>
  filter(valid_pass == 1) |>
  select(-valid_pass)

nrounds = 560
params <-
  list(
    booster = "gbtree",
    objective = "binary:logistic",
    eval_metric = c("logloss"),
    eta = 0.025,
    gamma = 5,
    subsample = 0.8,
    colsample_bytree = 0.8,
    max_depth = 4,
    min_child_weight = 6,
    base_score = mean(model_vars$complete_pass)
  )

full_train = xgboost::xgb.DMatrix(
  model.matrix(~ . + 0, data = model_vars |> dplyr::select(-complete_pass)),
  label = model_vars$complete_pass
)
set.seed(2013) #GoHawks
cp_model <- xgboost::xgboost(
  params = params,
  data = full_train,
  nrounds = nrounds,
  verbose = 2
)


################################################################################
# Estimate WP model: spread
################################################################################

model_data <-
  readRDS(url(
    'https://github.com/guga31bb/metrics/blob/master/wp_tuning/cal_data.rds?raw=true'
  )) |>
  filter(Winner != "TIE") |>
  make_model_mutations() |>
  prepare_wp_data() |>
  mutate(label = ifelse(posteam == Winner, 1, 0)) |>
  filter(
    qtr <= 4 &
      !is.na(ep) &
      !is.na(score_differential) &
      !is.na(play_type) &
      !is.na(label),
    !is.na(yardline_100)
  ) |>
  select(
    label,
    receive_2h_ko,
    spread_time,
    home,
    half_seconds_remaining,
    game_seconds_remaining,
    Diff_Time_Ratio,
    score_differential,
    down,
    ydstogo,
    yardline_100,
    posteam_timeouts_remaining,
    defteam_timeouts_remaining
  )


nrounds = 534
params <-
  list(
    booster = "gbtree",
    objective = "binary:logistic",
    eval_metric = c("logloss"),
    eta = 0.05,
    gamma = .79012017,
    subsample = 0.9224245,
    colsample_bytree = 5 / 12,
    max_depth = 5,
    min_child_weight = 7,
    monotone_constraints = "(0, 0, 0, 0, 0, 1, 1, -1, -1, -1, 1, -1)"
  )


full_train = xgboost::xgb.DMatrix(
  model.matrix(~ . + 0, data = model_data |> select(-label)),
  label = model_data$label
)
set.seed(2013) #GoHawks
wp_model_spread <- xgboost::xgboost(
  params = params,
  data = full_train,
  nrounds = nrounds,
  verbose = 2
)

# feature names come from the training matrix, not from the fitted model
importance <- xgboost::xgb.importance(
  feature_names = colnames(full_train),
  model = wp_model_spread
)
xgboost::xgb.ggplot.importance(importance_matrix = importance)

#xgboost::xgb.plot.tree(model = wp_model_spread, trees = 1, show_node_id = TRUE)

################################################################################
# Estimate WP model: no spread
################################################################################

model_data <- model_data |>
  select(
    -spread_time
  )

nrounds = 65
params <-
  list(
    booster = "gbtree",
    objective = "binary:logistic",
    eval_metric = c("logloss"),
    eta = 0.2,
    gamma = 0,
    subsample = 0.8,
    colsample_bytree = 0.8,
    max_depth = 4,
    min_child_weight = 1
  )


full_train = xgboost::xgb.DMatrix(
  model.matrix(~ . + 0, data = model_data |> select(-label)),
  label = model_data$label
)
set.seed(2013) #GoHawks
wp_model <- xgboost::xgboost(
  params = params,
  data = full_train,
  nrounds = nrounds,
  verbose = 2
)


################################################################################
# save models to use in package
################################################################################

usethis::use_data(
  ep_model,
  wp_model,
  wp_model_spread,
  fg_model,
  cp_model,
  internal = TRUE,
  overwrite = TRUE
)


================================================
FILE: data-raw/_tune_spread_wp.R
================================================
library(tidyverse)
library(tidymodels)
source('R/helper_add_ep_wp.R')
source('R/helper_add_nflscrapr_mutations.R')

set.seed(2013)

model_data <-
  # readRDS('data-raw/cal_data.rds') |>
  readRDS(url(
    'https://github.com/nflverse/nflfastR-data/blob/master/models/cal_data.rds?raw=true'
  )) |>
  filter(Winner != "TIE") |>
  make_model_mutations() |>
  prepare_wp_data() |>
  mutate(label = ifelse(posteam == Winner, 1, 0)) |>
  filter(
    !is.na(ep) &
      !is.na(score_differential) &
      !is.na(play_type) &
      !is.na(label) &
      !is.na(yardline_100),
    qtr <= 4
  ) |>
  select(
    label,
    receive_2h_ko,
    spread_time,
    home,
    half_seconds_remaining,
    game_seconds_remaining,
    ExpScoreDiff_Time_Ratio,
    score_differential,
    # ep,
    down,
    ydstogo,
    yardline_100,
    posteam_timeouts_remaining,
    defteam_timeouts_remaining,
    season
  )


folds <- map(0:9, function(x) {
  f <- which(model_data$season %in% c(2000 + x, 2010 + x))
  return(f)
})


full_train = xgboost::xgb.DMatrix(
  model.matrix(~ . + 0, data = model_data |> select(-label, -season)),
  label = model_data$label
)

#params
nrounds = 5000


# #################################################################################
# try tidymodels

grid <- grid_latin_hypercube(
  finalize(mtry(), model_data),
  min_n(),
  tree_depth(),
  learn_rate(),
  loss_reduction(),
  sample_size = sample_prop(),
  size = 20
)

grid <- grid |>
  mutate(
    # it was making dumb learn rates
    learn_rate = .1 * ((1:nrow(grid)) / nrow(grid)),
    # has to be between 0 and 1
    mtry = mtry / length(model_data)
  )

# bonus round at the end: do more searching after finding good ones
grid <- grid |>
  head(6) |>
  mutate(
    learn_rate = c(0.01, 0.02, .03, .04, .05, .06),
    min_n = 14,
    tree_depth = 5,
    mtry = 0.5714286,
    loss_reduction = 3.445502e-01,
    sample_size = 0.7204741
  )

grid

# function to search over hyperparameter grid
get_metrics <- function(df, row = 1) {
  # testing only
  # df <- grid |> dplyr::slice(1)

  params <-
    list(
      booster = "gbtree",
      objective = "binary:logistic",
      eval_metric = c("logloss"),
      eta = df$learn_rate,
      gamma = df$loss_reduction,
      subsample = df$sample_size,
      colsample_bytree = df$mtry,
      max_depth = df$tree_depth,
      min_child_weight = df$min_n
    )

  #train
  wp_cv_model <- xgboost::xgb.cv(
    data = full_train,
    params = params,
    nrounds = nrounds,
    folds = folds,
    metrics = list("logloss"),
    early_stopping_rounds = 10,
    print_every_n = 10
  )

  output <- params
  output$iter = wp_cv_model$best_iteration
  output$logloss = wp_cv_model$evaluation_log[output$iter]$test_logloss_mean
  output$error = wp_cv_model$evaluation_log[output$iter]$test_error_mean

  this_param <- bind_rows(output)

  if (row == 1) {
    saveRDS(this_param, "data-raw/modeling.rds")
  } else {
    prev <- readRDS("data-raw/modeling.rds")
    for_save <- bind_rows(prev, this_param)
    saveRDS(for_save, "data-raw/modeling.rds")
  }

  return(this_param)
}

# get results
results <- map_df(1:nrow(grid), function(x) {
  message(glue::glue("Row {x}"))
  get_metrics(grid |> dplyr::slice(x), row = x)
})

# plot
results |>
  select(
    logloss,
    eta,
    gamma,
    subsample,
    colsample_bytree,
    max_depth,
    min_child_weight
  ) |>
  pivot_longer(
    eta:min_child_weight,
    values_to = "value",
    names_to = "parameter"
  ) |>
  ggplot(aes(value, logloss, color = parameter)) +
  geom_point(alpha = 0.8, show.legend = FALSE, size = 3) +
  facet_wrap(~parameter, scales = "free_x") +
  labs(x = NULL, y = "logloss") +
  theme_minimal()

# final best model
#
# eta 0.02
# gamma 0.3445502
# subsample 0.7204741
# colsample_bytree 0.5714286
# max_depth 5
# min_child_weight 14
# iter 760
# logloss 0.4485878

# https://parsnip.tidymodels.org/reference/boost_tree.html
# https://xgboost.readthedocs.io/en/latest/parameter.html


================================================
FILE: data-raw/build_scramble_fix.R
================================================
library(tidyverse)

pbp <- nflfastR::load_pbp(1999:2005) |>
  # plays that could plausibly be scramble
  filter(
    !is.na(rusher_player_id) | penalty == 1,
    is.na(passer_player_id),
    is.na(receiver_player_id)
  ) |>
  select(
    season,
    game_id,
    play_id,
    week,
    away_team,
    home_team,
    posteam,
    qtr,
    down,
    ydstogo,
    time,
    desc
  ) |>
  # left-pad `time` with zeros to five characters so it can be joined
  # to the charting data below
  mutate(
    time = case_when(
      nchar(time) == 3 ~ paste0("00", time),
      nchar(time) == 4 ~ paste0("0", time),
      TRUE ~ time
    )
  )

# Thank you to Aaron Schatz and Football Outsiders
# For the charting data to fix scrambles in 2005
s <- readxl::read_xlsx("data-raw/scrambles_2005.xlsx") |>
  as_tibble() |>
  janitor::clean_names() |>
  select(
    season = year,
    week,
    qtr,
    away_team = away,
    home_team = home,
    posteam = offense,
    down,
    ydstogo = togo,
    date_time = time
  )

# Thank you to Aaron Schatz
# For the charting data to fix scrambles in 1999 - 2004
s2 <- readxl::read_xlsx(
  "data-raw/Scrambles 1999-2004 UPDATE for NFLfastR.xlsx",
  sheet = 1
) |>
  as_tibble() |>
  janitor::clean_names() |>
  filter(type %in% c("scramble", "assume scramble")) |>
  select(
    season = year,
    week,
    qtr,
    away_team = away,
    home_team = home,
    posteam = offense,
    down,
    ydstogo = togo,
    date_time = time,
    yards_gained = yards
  )
# s3 is a correction: the plays in s3 should be rushes and are therefore excluded from scramble_fix
# see #475
s3 <- readxl::read_xlsx(
  "data-raw/Scrambles.1999-2003.FURTHER.UPDATE.for.NFLfastR.xlsx",
  sheet = 1
) |>
  as_tibble() |>
  janitor::clean_names() |>
  select(
    season = year,
    week,
    qtr,
    away_team = away,
    home_team = home,
    posteam = offense,
    down,
    ydstogo = togo,
    date_time = time,
    yards_gained = yards
  ) |>
  mutate(
    time = paste0(
      formatC(lubridate::hour(date_time), width = 2, flag = "0"),
      ":",
      formatC(lubridate::minute(date_time), width = 2, flag = "0")
    )
  ) |>
  select(-date_time) |>
  mutate_at(vars(home_team, away_team, posteam), nflfastR:::team_name_fn) |>
  dplyr::left_join(
    pbp,
    by = c(
      "week",
      "away_team",
      "home_team",
      "posteam",
      "qtr",
      "down",
      "ydstogo",
      "time",
      "season"
    )
  ) |>
  mutate(no_scramble_id = paste0(game_id, "_", play_id))

dat <- bind_rows(
  s2,
  s
) |>
  mutate(
    time = paste0(
      formatC(lubridate::hour(date_time), width = 2, flag = "0"),
      ":",
      formatC(lubridate::minute(date_time), width = 2, flag = "0")
    )
  ) |>
  select(-date_time) |>
  mutate_at(vars(home_team, away_team, posteam), nflfastR:::team_name_fn)

d <- dat |>
  dplyr::left_join(
    pbp,
    by = c(
      "week",
      "away_team",
      "home_team",
      "posteam",
      "qtr",
      "down",
      "ydstogo",
      "time",
      "season"
    )
  ) |>
  mutate(scramble_id = paste0(game_id, "_", play_id)) |>
  filter(scramble_id != "2005_09_CIN_BAL_1725") |>
  filter(!scramble_id %in% s3$no_scramble_id)

# number non-matched by season
nrow(d)
d |> filter(is.na(desc)) |> group_by(season) |> summarise(n = n())

# get rid of non-match
d <- d |>
  filter(!is.na(desc))
d |> group_by(season) |> summarise(n = n())

scramble_fix <- d$scramble_id
scramble_fix <- scramble_fix |>
  unique()
length(scramble_fix)
saveRDS(scramble_fix, file = "data-raw/scramble_fix.rds")


================================================
FILE: data-raw/build_stat_id_df.R
================================================
stat_ids <- "https://www.nflgsis.com/gsis/Documentation/Partners/StatIDs_files/sheet001.html" |>
  xml2::read_html() |>
  rvest::html_table(fill = TRUE) |>
  as.data.frame() |>
  dplyr::rename("stat_id" = X1, "name" = X2, "comment" = X3) |>
  dplyr::select(1:3) |>
  dplyr::slice(-1) |>
  dplyr::mutate(stat_id = as.integer(stat_id)) |>
  dplyr::filter(!is.na(stat_id)) |>
  dplyr::group_by(stat_id, name) |>
  dplyr::summarise(comment = paste0(comment, collapse = " ")) |>
  dplyr::ungroup() |>
  dplyr::mutate(comment = stringr::str_squish(comment))

usethis::use_data(stat_ids, overwrite = TRUE)


================================================
FILE: data-raw/compare_dfs.R
================================================
library(tidyverse)
future::plan("multisession")

# function for comparing revisions against data in repo
# make sure to build package first
compare_pbp <- function(id, cols) {
  s <- substr(id[1], 1, 4) |> as.integer()
  # no idea why this is necessary
  games <- id

  new_pbp <- build_nflfastR_pbp(
    id
    # comment this out to use the "normal" way
    # , dir = "../nflfastR-raw/raw"
  ) |>
    filter(!stringr::str_detect(desc, "GAME")) |>
    select(all_of(cols)) |>
    # necessary to pass the equality checks
    mutate(
      ep = round(ep, 2),
      epa = round(epa, 2),
      vegas_home_wp = round(vegas_home_wp, 2),
      vegas_home_wpa = round(vegas_home_wpa, 2),
      home_wp = round(home_wp, 2)
    )

  repo_pbp <- readRDS(url(glue::glue(
    "https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_{s}.rds"
  ))) |>
    filter(game_id %in% games) |>
    filter(!stringr::str_detect(desc, "GAME")) |>
    select(all_of(cols)) |>
    mutate(
      ep = round(ep, 2),
      epa = round(epa, 2),
      vegas_home_wp = round(vegas_home_wp, 2),
      vegas_home_wpa = round(vegas_home_wpa, 2),
      home_wp = round(home_wp, 2)
    )

  sum <- arsenal::diffs(arsenal::comparedf(
    new_pbp |> select(-desc, -game_id, -play_id),
    repo_pbp |> select(-desc, -game_id, -play_id)
  ))
  dfs <- bind_cols(
    new_pbp |> select(-desc, -game_id, -play_id),
    repo_pbp |> select(-desc, -game_id, -play_id)
  )

  dfs$desc <- new_pbp$desc
  dfs$play_id <- new_pbp$play_id
  dfs$game_id <- new_pbp$game_id

  return(
    list(sum, dfs)
  )
}

cols <- c(
  # DO NOT REMOVE THESE ONES OR THE COMPARISON WILL BREAK
  "game_id",
  "play_id",
  "desc",
  "ep",
  "epa",
  "vegas_home_wp",
  "vegas_home_wpa",
  "home_wp"

  # here is stuff you can choose whether to include
  # , "posteam_timeouts_remaining", "defteam_timeouts_remaining"
)

id <- "2002_05_PHI_JAX"
id <- "2006_01_MIA_PIT"
id <- "2006_02_PIT_JAX"
id <- "2017_08_LAC_NE"
id <- "2006_04_JAX_WAS"
id <- "2019_01_SF_TB"
id <- "2017_12_JAX_ARI"

ids <- nflfastR::fast_scraper_schedules(2020) |>
  dplyr::slice(1:20) |>
  pull(game_id)

compared <- compare_pbp(
  id = ids,
  cols = cols
)

# summary table
compared[[1]]

# get row numbers of things with differences
obs <- compared[[1]]$..row.names.. |> unique()

# dfs
compared[[2]] |> arrange(play_id)

# dfs with differences
compared[[2]][obs, ] |> arrange(play_id)

# play description of plays with differences
compared[[2]][obs, ] |> arrange(play_id) |> select(desc)


================================================
FILE: data-raw/create_field_descriptions.R
================================================
library(dplyr)
library(tidyr)
library(stringr)
library(usethis)

x <- readLines("data-raw/variable_list.txt")

field_descriptions <- tibble(x = x) |>
  separate(x, "{", into = c(NA, "Field", "Description")) |>
  mutate_all(str_remove_all, "\\}")

usethis::use_data(field_descriptions, overwrite = TRUE)
# save(field_descriptions, file = "vignettes/field_descriptions.rda")


================================================
FILE: data-raw/default_play.R
================================================
### Create datatype dataframe
### This is a db that is stored on Seb's local machine
connection <- DBI::dbConnect(duckdb::duckdb(), "../_data_cache/pbp_db")
pbp_db <- dplyr::tbl(connection, "nflfastR_pbp")

### This is heavy, only run if necessary
sk <- pbp_db |>
  dplyr::collect() |>
  skimr::skim()

readr::write_csv(sk, "data-raw/pbp_datatypes.csv")

sk <- readr::read_csv("data-raw/pbp_datatypes.csv")

random_play <- pbp_db |>
  dplyr::filter(season == 1999) |>
  dplyr::collect() |>
  dplyr::slice_sample(n = 1) |>
  as.list()

default_play <-
  purrr::map(
    seq_along(random_play),
    function(i, play, sk) {
      val <- play[[i]]
      var <- names(play[i])
      if (is.character(val)) {
        max_char <- sk$character.max[sk$skim_variable == var]
        rnd_char <- ifelse(is.na(val), 0L, nchar(val))
        if (is.na(max_char)) {
          max_char <- switch(
            var,
            "lateral_sack_player_id" = 10L,
            "lateral_sack_player_name" = 20L,
            "tackle_with_assist_2_player_id" = 10L,
            "tackle_with_assist_2_player_name" = 20L,
            "tackle_with_assist_2_team" = 3L,
            "drive_real_start_time" = 20L
          )
        }
        if (max_char > rnd_char) {
          val <- strsplit(val, character(0L))[[1]] |>
            sample(size = max_char, replace = TRUE) |>
            paste0(collapse = "")
        }
      }
      val
    },
    play = random_play,
    sk = sk
  ) |>
  purrr::set_names(names(random_play)) |>
  tibble::as_tibble_row() |>
  dplyr::mutate(game_id = "9999_99_DEF_TYP")

readr::write_csv(default_play, "data-raw/pbp_defaultplay.csv")
saveRDS(default_play, "data-raw/pbp_defaultplay.rds")


================================================
FILE: data-raw/nfl_stats_variables.R
================================================
s1 <- calculate_stats(2023, "season", "player")
s2 <- calculate_stats(2023, "week", "player")
s3 <- calculate_stats(2023, "season", "team")
s4 <- calculate_stats(2023, "week", "team")

n1 <- names(s1)
n2 <- names(s2)
n3 <- names(s3)
n4 <- names(s4)

setdiff(n1, n2)
setdiff(n2, n1)

setdiff(n1, n3)
setdiff(n3, n1)

# tibble::tibble(
#   variable = c(n1, n2, n3, n4) |> unique(),
#   description = ""
# ) |>
#   jsonlite::write_json("data-raw/nfl_stats_variables.json", pretty = TRUE)

nfl_stats_variables <- jsonlite::fromJSON("data-raw/nfl_stats_variables.json")

usethis::use_data(nfl_stats_variables, overwrite = TRUE)

s_old_1 <- nflreadr::load_player_stats(2023, "offense")
s_old_2 <- nflreadr::load_player_stats(2023, "defense")
s_old_3 <- nflreadr::load_player_stats(2023, "kicking")
n_old_1 <- names(s_old_1)
n_old_2 <- names(s_old_2)
n_old_3 <- names(s_old_3)


# Differences to old offense stats ----------------------------------------

# recent_team -> team (recent team in weekly data never made sense)
# interceptions -> passing_interceptions (all passing stats have the passing prefix)
# sacks -> sacks_suffered (to make clear it's not on defensive side)
# sack_yards -> sack_yards_lost (to make clear it's not on defensive side)
# dakota -> not implemented at the moment
setdiff(n_old_1, n2)
setdiff(n2, n_old_1)

# Differences to old defense stats ----------------------------------------

# def_tackles -> there is def_tackles_solo and def_tackles_with_assist
# def_fumble_recovery_own -> fumble_recovery_own (it is not exclusive to defense)
# def_fumble_recovery_yards_own -> fumble_recovery_yards_own (it is not exclusive to defense)
# def_fumble_recovery_opp -> fumble_recovery_opp (it is not exclusive to defense)
# def_fumble_recovery_yards_opp -> fumble_recovery_yards_opp (it is not exclusive to defense)
# def_safety -> def_safeties (we use plural everywhere)
# def_penalty -> penalties (it is not exclusive to defense)
# def_penalty_yards -> penalty_yards (it is not exclusive to defense)
setdiff(n_old_2, n2)
setdiff(n2, n_old_2)


# Differences to old kicking stats ----------------------------------------

# No differences
setdiff(n_old_3, n2)
setdiff(n2, n_old_3)


================================================
FILE: data-raw/nfl_stats_variables.json
================================================
[
  {
    "variable": "player_id",
    "description": "GSIS player ID. Available if stat_type = 'player'."
  },
  {
    "variable": "player_name",
    "description": "Short player name as listed in play-by-play data. Please keep in mind that this name is not always unique for one player and can change from season to season and sometimes even within a season. Do not group by this variable. Available if stat_type = 'player'."
  },
  {
    "variable": "player_display_name",
    "description": "Full name of player. Available if stat_type = 'player'."
  },
  {
    "variable": "position",
    "description": "Position of player. Available if stat_type = 'player'."
  },
  {
    "variable": "position_group",
    "description": "Position group of player. Available if stat_type = 'player'."
  },
  {
    "variable": "headshot_url",
    "description": "URL to a player headshot image. Available if stat_type = 'player'."
  },
  {
    "variable": "season",
    "description": "The NFL season"
  },
  {
    "variable": "week",
    "description": "The NFL week. Available if summary_level = 'week'"
  },
  {
    "variable": "season_type",
    "description": "One of 'REG', 'POST', or 'REG+POST'"
  },
  {
    "variable": "game_id",
    "description": "The nflverse game id of the form '{season}_{week}_{away abbreviation}_{home abbreviation}'. Available if summary_level = 'week'"
  },
  {
    "variable": "recent_team",
    "description": "Most recent team player appears in data with. Available if stat_type = 'player' & summary_level = 'season'."
  },
  {
    "variable": "team",
    "description": "Team stats are counted for."
  },
  {
    "variable": "opponent_team",
    "description": "The opponent team in that week. Available if summary_level = 'week'."
  },
  {
    "variable": "games",
    "description": "The number of games where stats were counted. Available if summary_level = 'season'."
  },
  {
    "variable": "completions",
    "description": "The number of completed passes."
  },
  {
    "variable": "attempts",
    "description": "The number of pass attempts as defined by the NFL."
  },
  {
    "variable": "passing_yards",
    "description": "Yards gained on pass plays."
  },
  {
    "variable": "passing_tds",
    "description": "The number of passing touchdowns."
  },
  {
    "variable": "passing_interceptions",
    "description": "The number of interceptions thrown."
  },
  {
    "variable": "sacks_suffered",
    "description": "The number of times sacked."
  },
  {
    "variable": "sack_yards_lost",
    "description": "Yards lost on sack plays."
  },
  {
    "variable": "sack_fumbles",
    "description": "The number of sacks with a fumble."
  },
  {
    "variable": "sack_fumbles_lost",
    "description": "The number of sacks with a lost fumble."
  },
  {
    "variable": "passing_air_yards",
    "description": "Passing air yards (includes incomplete passes)."
  },
  {
    "variable": "passing_yards_after_catch",
    "description": "Yards after the catch gained on plays in which player was the passer (this is an unofficial stat and may differ slightly between different sources)."
  },
  {
    "variable": "passing_first_downs",
    "description": "First downs on pass attempts."
  },
  {
    "variable": "passing_epa",
    "description": "Total expected points added on pass attempts and sacks. NOTE: this uses the variable `qb_epa`, which gives QB credit for EPA for up to the point where a receiver lost a fumble after a completed catch and makes EPA work more like passing yards on plays with fumbles."
  },
  {
    "variable": "passing_cpoe",
    "description": "Completion percentage over expectation"
  },
  {
    "variable": "passing_2pt_conversions",
    "description": "Two-point conversion passes."
  },
  {
    "variable": "pacr",
    "description": "Passing Air Conversion Ratio. PACR = `passing_yards` / `passing_air_yards`. Available if stat_type = 'player'."
  },
  {
    "variable": "passing_10",
    "description": "The number of passes that gained 10 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "passing_16",
    "description": "The number of passes that gained 16 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "passing_20",
    "description": "The number of passes that gained 20 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "passing_40",
    "description": "The number of passes that gained 40 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "carries",
    "description": "The number of official rush attempts (incl. scrambles and kneel downs). Rushes after a lateral reception don't count as a carry."
  },
  {
    "variable": "rushing_yards",
    "description": "Yards gained when rushing with the ball (incl. scrambles and kneel downs). Also includes yards gained after obtaining a lateral on a play that started with a rushing attempt."
  },
  {
    "variable": "rushing_tds",
    "description": "The number of rushing touchdowns (incl. scrambles). Also includes touchdowns after obtaining a lateral on a play that started with a rushing attempt."
  },
  {
    "variable": "rushing_fumbles",
    "description": "The number of rushes with a fumble."
  },
  {
    "variable": "rushing_fumbles_lost",
    "description": "The number of rushes with a lost fumble."
  },
  {
    "variable": "rushing_first_downs",
    "description": "First downs on rush attempts (incl. scrambles)."
  },
  {
    "variable": "rushing_epa",
    "description": "Expected points added on rush attempts (incl. scrambles and kneel downs)."
  },
  {
    "variable": "rushing_2pt_conversions",
    "description": "Two-point conversion rushes."
  },
  {
    "variable": "rushing_10",
    "description": "The number of runs that gained 10 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "rushing_12",
    "description": "The number of runs that gained 12 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "rushing_20",
    "description": "The number of runs that gained 20 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "rushing_40",
    "description": "The number of runs that gained 40 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "receptions",
    "description": "The number of pass receptions. Lateral receptions officially don't count as a reception."
  },
  {
    "variable": "targets",
    "description": "The number of pass plays where the player was the targeted receiver."
  },
  {
    "variable": "receiving_yards",
    "description": "Yards gained after a pass reception. Includes yards gained after receiving a lateral on a play that started as a pass play."
  },
  {
    "variable": "receiving_tds",
    "description": "The number of touchdowns following a pass reception. Also includes touchdowns after receiving a lateral on a play that started as a pass play."
  },
  {
    "variable": "receiving_fumbles",
    "description": "The number of fumbles after a pass reception."
  },
  {
    "variable": "receiving_fumbles_lost",
    "description": "The number of fumbles lost after a pass reception."
  },
  {
    "variable": "receiving_air_yards",
    "description": "Receiving air yards (incl. incomplete passes)."
  },
  {
    "variable": "receiving_yards_after_catch",
    "description": "Yards after the catch gained on plays in which the player was the receiver (this is an unofficial stat and may differ slightly between different sources)."
  },
  {
    "variable": "receiving_first_downs",
    "description": "First downs on receptions."
  },
  {
    "variable": "receiving_epa",
    "description": "Expected points added on receptions."
  },
  {
    "variable": "receiving_2pt_conversions",
    "description": "Two-point conversion receptions."
  },
  {
    "variable": "receiving_10",
    "description": "The number of receptions that gained 10 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "receiving_16",
    "description": "The number of receptions that gained 16 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "receiving_20",
    "description": "The number of receptions that gained 20 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "receiving_40",
    "description": "The number of receptions that gained 40 or more yards. Some define this as an 'explosive' play."
  },
  {
    "variable": "racr",
    "description": "Receiver Air Conversion Ratio. RACR = `receiving_yards` / `receiving_air_yards`. Available if stat_type = 'player'."
  },
  {
    "variable": "target_share",
    "description": "The player's share of all targets of his team. Available if stat_type = 'player'."
  },
  {
    "variable": "air_yards_share",
    "description": "The player's `receiving_air_yards` as a share of all air yards of his team. Available if stat_type = 'player'."
  },
  {
    "variable": "wopr",
    "description": "Weighted Opportunity Rating. WOPR = 1.5 × `target_share` + 0.7 × `air_yards_share`. Available if stat_type = 'player'."
  },
  {
    "variable": "special_teams_tds",
    "description": "The number of touchdowns scored in special teams plays."
  },
  {
    "variable": "def_tackles_solo",
    "description": "Solo tackles."
  },
  {
    "variable": "def_tackles_with_assist",
    "description": "Tackles where another player assisted."
  },
  {
    "variable": "def_tackle_assists",
    "description": "Assists on another player's tackle."
  },
  {
    "variable": "def_tackles_for_loss",
    "description": "Tackles for loss."
  },
  {
    "variable": "def_tackles_for_loss_yards",
    "description": "Yards lost by the opposing team through a tackle."
  },
  {
    "variable": "def_fumbles_forced",
    "description": "Forced fumbles."
  },
  {
    "variable": "def_sacks",
    "description": "Number of sacks."
  },
  {
    "variable": "def_sack_yards",
    "description": "Yards lost by the opposing team through a sack."
  },
  {
    "variable": "def_qb_hits",
    "description": "Number of QB hits."
  },
  {
    "variable": "def_interceptions",
    "description": "Interceptions caught."
  },
  {
    "variable": "def_interception_yards",
    "description": "Yards gained after interceptions."
  },
  {
    "variable": "def_pass_defended",
    "description": "Number of defended passes."
  },
  {
    "variable": "def_tds",
    "description": "Defensive touchdowns."
  },
  {
    "variable": "def_fumbles",
    "description": "Number of fumbles while playing on defense."
  },
  {
    "variable": "def_safeties",
    "description": "Tackles that resulted in a safety."
  },
  {
    "variable": "misc_yards",
    "description": "Yardage gained/lost that doesn't fall into any other category. Examples are blocked field goals or blocked punts."
  },
  {
    "variable": "fumble_recovery_own",
    "description": "Recovered fumbles where the ball was fumbled by own team."
  },
  {
    "variable": "fumble_recovery_yards_own",
    "description": "Yardage gained/lost by a player after he recovered a fumble by his own team. Includes yardage gained/lost where a teammate recovered a fumble and lateraled the ball to the player."
  },
  {
    "variable": "fumble_recovery_opp",
    "description": "Recovered fumbles where the ball was fumbled by opposing team."
  },
  {
    "variable": "fumble_recovery_yards_opp",
    "description": "Yardage gained/lost by a player after he recovered a fumble by the opposing team. Includes yardage gained/lost where a teammate recovered a fumble and lateraled the ball to the player."
  },
  {
    "variable": "fumble_recovery_tds",
    "description": "Touchdowns scored after a fumble recovery. This can happen in any unit, and either the player's own team or the opposing team may have fumbled the ball initially. Includes touchdowns where a teammate recovered a fumble and lateraled the ball to the player."
  },
  {
    "variable": "penalties",
    "description": "Penalties caused."
  },
  {
    "variable": "penalty_yards",
    "description": "Yardage lost through penalties."
  },
  {
    "variable": "timeouts",
    "description": "Number of timeouts taken by team. Available if stat_type = 'team'."
  },
  {
    "variable": "fumbles_forced_by_opp",
    "description": "The number of fumbles by the player. The fumble was forced by the opponent."
  },
  {
    "variable": "fumbles_not_forced",
    "description": "The number of fumbles by the player. The fumble was NOT forced by the opponent."
  },
  {
    "variable": "fumbles_out_of_bounds",
    "description": "The number of fumbles by the player, and the ball went out of bounds. The fumble may or may not have been forced."
  },
  {
    "variable": "fumbles_total",
    "description": "The total number of fumbles by the player. Equals `fumbles_forced_by_opp` + `fumbles_not_forced`."
  },
  {
    "variable": "fumbles_lost_total",
    "description": "The total number of fumbles lost by the player."
  },
  {
    "variable": "punt_returns",
    "description": "Number of punts returned."
  },
  {
    "variable": "punt_return_yards",
    "description": "Yardage gained/lost by a player during a punt return."
  },
  {
    "variable": "kickoff_returns",
    "description": "Number of kickoffs returned."
  },
  {
    "variable": "kickoff_return_yards",
    "description": "Yardage gained/lost by a player during a kickoff return."
  },
  {
    "variable": "fg_made",
    "description": "Successful field goal attempts."
  },
  {
    "variable": "fg_att",
    "description": "Attempted field goals."
  },
  {
    "variable": "fg_missed",
    "description": "Missed field goals."
  },
  {
    "variable": "fg_blocked",
    "description": "Attempted field goals that were blocked."
  },
  {
    "variable": "fg_long",
    "description": "Distance of longest made field goal."
  },
  {
    "variable": "fg_pct",
    "description": "Percentage of successful field goal attempts."
  },
  {
    "variable": "fg_made_0_19",
    "description": "Successful field goal attempts where distance was between 0 and 19 yards."
  },
  {
    "variable": "fg_made_20_29",
    "description": "Successful field goal attempts where distance was between 20 and 29 yards."
  },
  {
    "variable": "fg_made_30_39",
    "description": "Successful field goal attempts where distance was between 30 and 39 yards."
  },
  {
    "variable": "fg_made_40_49",
    "description": "Successful field goal attempts where distance was between 40 and 49 yards."
  },
  {
    "variable": "fg_made_50_59",
    "description": "Successful field goal attempts where distance was between 50 and 59 yards."
  },
  {
    "variable": "fg_made_60_",
    "description": "Successful field goal attempts where distance was 60+ yards."
  },
  {
    "variable": "fg_missed_0_19",
    "description": "Missed field goal attempts where distance was between 0 and 19 yards."
  },
  {
    "variable": "fg_missed_20_29",
    "description": "Missed field goal attempts where distance was between 20 and 29 yards."
  },
  {
    "variable": "fg_missed_30_39",
    "description": "Missed field goal attempts where distance was between 30 and 39 yards."
  },
  {
    "variable": "fg_missed_40_49",
    "description": "Missed field goal attempts where distance was between 40 and 49 yards."
  },
  {
    "variable": "fg_missed_50_59",
    "description": "Missed field goal attempts where distance was between 50 and 59 yards."
  },
  {
    "variable": "fg_missed_60_",
    "description": "Missed field goal attempts where distance was 60+ yards."
  },
  {
    "variable": "fg_made_list",
    "description": "Distances of all successful field goal attempts."
  },
  {
    "variable": "fg_missed_list",
    "description": "Distances of all missed field goal attempts."
  },
  {
    "variable": "fg_blocked_list",
    "description": "Distances of all blocked field goal attempts."
  },
  {
    "variable": "fg_made_distance",
    "description": "Sum of distances of all made field goals."
  },
  {
    "variable": "fg_missed_distance",
    "description": "Sum of distances of all missed field goals."
  },
  {
    "variable": "fg_blocked_distance",
    "description": "Sum of distances of all blocked field goals."
  },
  {
    "variable": "pat_made",
    "description": "Successful extra point attempts."
  },
  {
    "variable": "pat_att",
    "description": "Attempted extra points."
  },
  {
    "variable": "pat_missed",
    "description": "Missed extra points."
  },
  {
    "variable": "pat_blocked",
    "description": "Extra points blocked by opponent."
  },
  {
    "variable": "pat_pct",
    "description": "Percentage of successful extra point attempts."
  },
  {
    "variable": "gwfg_made",
    "description": "Successful game winning field goal attempts."
  },
  {
    "variable": "gwfg_att",
    "description": "Attempted game winning field goals."
  },
  {
    "variable": "gwfg_missed",
    "description": "Missed game winning field goals."
  },
  {
    "variable": "gwfg_blocked",
    "description": "Game winning field goal attempts blocked by opponent."
  },
  {
    "variable": "gwfg_distance",
    "description": "Distance of game winning field goal attempt. Available if summary_level = 'week'."
  },
  {
    "variable": "gwfg_distance_list",
    "description": "Distances of game winning field goal attempts. Available if summary_level = 'season'."
  },
  {
    "variable": "pt_att",
    "description": "Kicked punts."
  },
  {
    "variable": "pt_blocked",
    "description": "Kicked punts that were blocked."
  },
  {
    "variable": "pt_long",
    "description": "Longest punt kicked."
  },
  {
    "variable": "pt_yards",
    "description": "Length of punts kicked."
  },
  {
    "variable": "pt_inside_20",
    "description": "The number of punts where the RETURN ended inside the opponent's 20-yard line."
  },
  {
    "variable": "pt_out_of_bounds",
    "description": "The number of punts that went out of bounds without return."
  },
  {
    "variable": "pt_downed",
    "description": "The number of punts downed without return."
  },
  {
    "variable": "pt_touchback",
    "description": "The number of punts that resulted in a touchback."
  },
  {
    "variable": "pt_fair_caught",
    "description": "The number of punts that resulted in a fair catch."
  },
  {
    "variable": "pt_returned",
    "description": "The number of punts that were returned by the opposing team."
  },
  {
    "variable": "pt_return_yards",
    "description": "The punt return yardage of the opposing team."
  },
  {
    "variable": "pt_return_tds",
    "description": "The number of punts that were returned for a touchdown by the opposing team."
  },
  {
    "variable": "pt_net_yards",
    "description": "Net punt yardage. Equals `pt_yards` - `pt_return_yards` - `pt_touchback` * `20`."
  },
  {
    "variable": "fantasy_points",
    "description": "Standard fantasy points."
  },
  {
    "variable": "fantasy_points_ppr",
    "description": "PPR fantasy points."
  }
]


================================================
FILE: data-raw/pbp_datatypes.csv
================================================
skim_type,skim_variable,n_missing,complete_rate,character.min,character.max,character.empty,character.n_unique,character.whitespace,numeric.mean,numeric.sd,numeric.p0,numeric.p25,numeric.p50,numeric.p75,numeric.p100,numeric.hist
character,game_id,0,1,13,15,0,6134,0,NA,NA,NA,NA,NA,NA,NA,NA
character,old_game_id,0,1,10,10,0,6134,0,NA,NA,NA,NA,NA,NA,NA,NA
character,home_team,0,1,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,away_team,0,1,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,season_type,0,1,3,4,0,2,0,NA,NA,NA,NA,NA,NA,NA,NA
character,posteam,74782,0.931927869867191,0,3,104,33,0,NA,NA,NA,NA,NA,NA,NA,NA
character,posteam_type,74782,0.931927869867191,4,4,0,2,0,NA,NA,NA,NA,NA,NA,NA,NA
character,defteam,74782,0.931927869867191,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,side_of_field,85173,0.9224692099729649,0,5,79,38,0,NA,NA,NA,NA,NA,NA,NA,NA
character,game_date,0,1,10,10,0,1169,0,NA,NA,NA,NA,NA,NA,NA,NA
character,game_half,0,1,5,8,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,time,1137,0.9989650181599716,0,5,62,1502,0,NA,NA,NA,NA,NA,NA,NA,NA
character,yrdln,7356,0.9933040225019798,0,6,2,1659,0,NA,NA,NA,NA,NA,NA,NA,NA
character,desc,0,1,4,1079,0,990769,0,NA,NA,NA,NA,NA,NA,NA,NA
character,play_type,50328,0.9541877167590596,3,11,0,9,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_length,806758,0.26562895400384134,4,5,0,2,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_location,806698,0.26568357045977953,4,6,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,run_location,778466,0.2913824335272217,4,6,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,run_gap,870143,0.20793121967648853,3,6,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,field_goal_result,1075311,0.02117206914443326,4,7,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,extra_point_result,1070162,0.025859071338194206,4,7,0,4,0,NA,NA,NA,NA,NA,NA,NA,NA
character,two_point_conv_result,1096605,0.0017886889319752575,7,7,0,2,0,NA,NA,NA,NA,NA,NA,NA,NA
character,timeout_team,1053072,0.04141565853791751,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,td_team,1068075,0.02775881373057698,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,td_player_name,1068075,0.02775881373057698,5,19,0,2989,0,NA,NA,NA,NA,NA,NA,NA,NA
character,td_player_id,1068076,0.02775790345631135,10,10,0,3264,0,NA,NA,NA,NA,NA,NA,NA,NA
character,passer_player_id,653500,0.40513576740671964,10,10,0,767,0,NA,NA,NA,NA,NA,NA,NA,NA
character,passer_player_name,653500,0.40513576740671964,5,17,0,802,0,NA,NA,NA,NA,NA,NA,NA,NA
character,receiver_player_id,715364,0.34882256023739955,10,10,0,3088,0,NA,NA,NA,NA,NA,NA,NA,NA
character,receiver_player_name,715364,0.34882256023739955,5,18,0,2971,0,NA,NA,NA,NA,NA,NA,NA,NA
character,rusher_player_id,764576,0.3040261430769091,10,10,0,2262,0,NA,NA,NA,NA,NA,NA,NA,NA
character,rusher_player_name,764576,0.3040261430769091,5,18,0,2206,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_receiver_player_id,1098347,2.0299116123689842e-4,10,10,0,181,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_receiver_player_name,1098347,2.0299116123689842e-4,6,16,0,181,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_rusher_player_id,1098531,3.550069635982478e-5,10,10,0,37,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_rusher_player_name,1098531,3.550069635982478e-5,6,13,0,37,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_sack_player_id,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_sack_player_name,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,interception_player_id,1086906,0.01061743903438106,10,10,0,1982,0,NA,NA,NA,NA,NA,NA,NA,NA
character,interception_player_name,1086906,0.01061743903438106,5,19,0,1904,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_interception_player_id,1098491,7.19116669852804e-5,10,10,0,70,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_interception_player_name,1098491,7.19116669852804e-5,5,14,0,70,0,NA,NA,NA,NA,NA,NA,NA,NA
character,punt_returner_player_id,1059306,0.03574100876594122,10,10,0,930,0,NA,NA,NA,NA,NA,NA,NA,NA
character,punt_returner_player_name,1059305,0.03574191904020685,5,19,0,910,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_punt_returner_player_id,1098509,5.552673020381427e-5,10,10,0,56,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_punt_returner_player_name,1098509,5.552673020381427e-5,5,19,0,56,0,NA,NA,NA,NA,NA,NA,NA,NA
character,kickoff_returner_player_name,1059285,0.03576012452551958,5,19,0,2072,0,NA,NA,NA,NA,NA,NA,NA,NA
character,kickoff_returner_player_id,1059286,0.035759214251253946,10,10,0,2222,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_kickoff_returner_player_id,1098419,1.374514141110339e-4,10,10,0,132,0,NA,NA,NA,NA,NA,NA,NA,NA
character,lateral_kickoff_returner_player_name,1098419,1.374514141110339e-4,6,16,0,132,0,NA,NA,NA,NA,NA,NA,NA,NA
character,punter_player_id,1041530,0.05192204411189094,10,10,0,222,0,NA,NA,NA,NA,NA,NA,NA,NA
character,punter_player_name,1041530,0.05192204411189094,5,16,0,243,0,NA,NA,NA,NA,NA,NA,NA,NA
character,kicker_player_name,985912,0.10254967821804706,3,14,0,334,0,NA,NA,NA,NA,NA,NA,NA,NA
character,kicker_player_id,985912,0.10254967821804706,10,10,0,310,0,NA,NA,NA,NA,NA,NA,NA,NA
character,own_kickoff_recovery_player_id,1098310,2.3667130906546152e-4,10,10,0,238,0,NA,NA,NA,NA,NA,NA,NA,NA
character,own_kickoff_recovery_player_name,1098310,2.3667130906546152e-4,5,19,0,236,0,NA,NA,NA,NA,NA,NA,NA,NA
character,blocked_player_id,1097590,8.920687803235516e-4,10,10,0,622,0,NA,NA,NA,NA,NA,NA,NA,NA
character,blocked_player_name,1097589,8.929790545891825e-4,4,19,0,603,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_for_loss_1_player_id,1048762,0.04533894062280963,10,10,0,3367,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_for_loss_1_player_name,1048699,0.04539628790154471,4,19,0,3226,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_for_loss_2_player_id,1098559,1.0013016922050255e-5,10,10,0,11,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_for_loss_2_player_name,1098559,1.0013016922050255e-5,6,12,0,11,0,NA,NA,NA,NA,NA,NA,NA,NA
character,qb_hit_1_player_id,1050497,0.0437596147719308,10,10,0,2982,0,NA,NA,NA,NA,NA,NA,NA,NA
character,qb_hit_1_player_name,1050490,0.043765986691790215,4,19,0,2829,0,NA,NA,NA,NA,NA,NA,NA,NA
character,qb_hit_2_player_id,1096391,0.0019834876248213673,10,10,0,1015,0,NA,NA,NA,NA,NA,NA,NA,NA
character,qb_hit_2_player_name,1096391,0.0019834876248213673,5,19,0,995,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_1_team,1087072,0.010466333506285452,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_1_player_id,1087105,0.010436294455519413,10,10,0,2977,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_1_player_name,1087079,0.010459961586426036,4,19,0,2720,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_2_team,1098504,6.0078101531968464e-5,2,3,0,26,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_2_player_id,1098504,6.0078101531968464e-5,10,10,0,65,0,NA,NA,NA,NA,NA,NA,NA,NA
character,forced_fumble_player_2_player_name,1098504,6.0078101531968464e-5,6,14,0,65,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_1_team,599256,0.45451268467189165,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_2_team,1092308,0.005700137451414067,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_1_player_id,599641,0.45416222907962167,10,10,0,8252,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_2_player_id,1092376,0.005638238801350837,10,10,0,3134,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_1_player_name,599306,0.4544671709586098,4,19,0,7320,0,NA,NA,NA,NA,NA,NA,NA,NA
character,solo_tackle_2_player_name,1092313,0.005695586080085913,4,18,0,2863,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_1_player_id,956715,0.12912695595182833,10,10,0,5669,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_1_player_name,956715,0.12912695595182833,5,19,0,5142,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_1_team,956715,0.12912695595182833,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_2_player_id,1039980,0.05333296922362707,10,10,0,4638,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_2_player_name,1039980,0.05333296922362707,5,19,0,4165,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_2_team,1039980,0.05333296922362707,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_3_player_id,1098567,2.730822796892518e-6,10,10,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_3_player_name,1098567,2.730822796892518e-6,8,9,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_3_team,1098567,2.730822796892518e-6,3,3,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_4_player_id,1098567,2.730822796892518e-6,10,10,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_4_player_name,1098567,2.730822796892518e-6,7,8,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,assist_tackle_4_team,1098567,2.730822796892518e-6,3,3,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_1_player_id,1015313,0.0757867045340761,10,10,0,4738,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_1_player_name,1015313,0.0757867045340761,5,19,0,4380,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_1_team,1015313,0.0757867045340761,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_2_player_id,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_2_player_name,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,tackle_with_assist_2_team,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_defense_1_player_id,1042237,0.05127848020608605,10,10,0,3379,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_defense_1_player_name,1042236,0.05127939048035168,4,19,0,3197,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_defense_2_player_id,1096864,0.0015529278971754268,10,10,0,996,0,NA,NA,NA,NA,NA,NA,NA,NA
character,pass_defense_2_player_name,1096864,0.0015529278971754268,5,19,0,963,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_1_team,1081551,0.01549195772686307,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_1_player_id,1081555,0.015488316629800547,10,10,0,2696,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_1_player_name,1081555,0.015488316629800547,5,19,0,2517,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_2_player_id,1098456,1.0377126628258182e-4,10,10,0,111,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_2_player_name,1098456,1.0377126628258182e-4,5,12,0,110,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumbled_2_team,1098456,1.0377126628258182e-4,2,3,0,30,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_1_team,1082877,0.014284934050629472,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_1_player_id,1082887,0.014275831307973053,10,10,0,5004,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_1_player_name,1082887,0.014275831307973053,5,19,0,4398,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_2_team,1098447,1.1196373467325937e-4,2,3,0,31,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_2_player_id,1098447,1.1196373467325937e-4,10,10,0,120,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fumble_recovery_2_player_name,1098447,1.1196373467325937e-4,5,13,0,118,0,NA,NA,NA,NA,NA,NA,NA,NA
character,sack_player_id,1073024,0.023253866389943312,10,10,0,2582,0,NA,NA,NA,NA,NA,NA,NA,NA
character,sack_player_name,1072977,0.023296649280428183,4,19,0,2488,0,NA,NA,NA,NA,NA,NA,NA,NA
character,half_sack_1_player_id,1096089,0.002258390453043546,10,10,0,1118,0,NA,NA,NA,NA,NA,NA,NA,NA
character,half_sack_1_player_name,1096089,0.002258390453043546,5,18,0,1084,0,NA,NA,NA,NA,NA,NA,NA,NA
character,half_sack_2_player_id,1096089,0.002258390453043546,10,10,0,1094,0,NA,NA,NA,NA,NA,NA,NA,NA
character,half_sack_2_player_name,1096089,0.002258390453043546,5,19,0,1063,0,NA,NA,NA,NA,NA,NA,NA,NA
character,return_team,971115,0.11601900652666652,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,penalty_team,1021292,0.07034417469983711,2,3,0,32,0,NA,NA,NA,NA,NA,NA,NA,NA
character,penalty_player_id,1024660,0.06727837097317424,10,47,0,7741,0,NA,NA,NA,NA,NA,NA,NA,NA
character,penalty_player_name,1024735,0.06721010040325148,5,19,0,6791,0,NA,NA,NA,NA,NA,NA,NA,NA
character,replay_or_challenge_result,1090418,0.007420555813466567,6,8,0,3,0,NA,NA,NA,NA,NA,NA,NA,NA
character,penalty_type,1021527,0.07013026024741253,7,38,0,69,0,NA,NA,NA,NA,NA,NA,NA,NA
character,safety_player_name,1098320,2.275685664090421e-4,4,15,0,217,0,NA,NA,NA,NA,NA,NA,NA,NA
character,safety_player_id,1098325,2.230171950808879e-4,10,47,0,224,0,NA,NA,NA,NA,NA,NA,NA,NA
character,series_result,248,0.9997742519821222,4,17,0,11,0,NA,NA,NA,NA,NA,NA,NA,NA
character,start_time,91628,0.916593389588283,7,8,0,48,0,NA,NA,NA,NA,NA,NA,NA,NA
character,time_of_day,157093,0.8570022847884068,7,8,0,62195,0,NA,NA,NA,NA,NA,NA,NA,NA
character,stadium,569653,0.48145953375752115,3,35,0,149,0,NA,NA,NA,NA,NA,NA,NA,NA
character,weather,563302,0.4872406856185769,22,123,0,2682,0,NA,NA,NA,NA,NA,NA,NA,NA
character,nfl_api_id,91628,0.916593389588283,36,36,0,5619,0,NA,NA,NA,NA,NA,NA,NA,NA
character,play_clock,409460,0.6272790991925867,1,2,0,47,0,NA,NA,NA,NA,NA,NA,NA,NA
character,play_type_nfl,91848,0.916393129249843,4,11,0,15,0,NA,NA,NA,NA,NA,NA,NA,NA
character,st_play_type,1097598,8.847865861983939e-4,7,7,0,1,0,NA,NA,NA,NA,NA,NA,NA,NA
character,end_clock_time,893921,0.1862867181881901,4,5,0,902,0,NA,NA,NA,NA,NA,NA,NA,NA
character,end_yard_line,249182,0.7731760379402314,2,7,0,2240,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fixed_drive_result,556,0.9994938875083063,4,17,0,9,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_real_start_time,1098570,0,NA,NA,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_time_of_possession,14081,0.9871824280655762,4,5,0,675,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_start_transition,18201,0.9834320980911548,4,19,0,29,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_end_transition,50801,0.9537571570314136,4,19,0,31,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_game_clock_start,14081,0.9871824280655762,4,5,0,1500,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_game_clock_end,14081,0.9871824280655762,4,5,0,1500,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_start_yard_line,16021,0.9854164959902418,0,6,241,1614,0,NA,NA,NA,NA,NA,NA,NA,NA
character,drive_end_yard_line,16021,0.9854164959902418,0,6,241,1634,0,NA,NA,NA,NA,NA,NA,NA,NA
character,location,0,1,4,7,0,2,0,NA,NA,NA,NA,NA,NA,NA,NA
character,roof,4387,0.9960066267966539,4,8,0,4,0,NA,NA,NA,NA,NA,NA,NA,NA
character,surface,0,1,5,10,0,9,0,NA,NA,NA,NA,NA,NA,NA,NA
character,home_coach,0,1,7,20,0,151,0,NA,NA,NA,NA,NA,NA,NA,NA
character,away_coach,0,1,7,20,0,151,0,NA,NA,NA,NA,NA,NA,NA,NA
character,stadium_id,0,1,5,5,0,59,0,NA,NA,NA,NA,NA,NA,NA,NA
character,game_stadium,0,1,8,35,0,94,0,NA,NA,NA,NA,NA,NA,NA,NA
character,passer,616172,0.4391144851943891,5,16,0,737,0,NA,NA,NA,NA,NA,NA,NA,NA
character,rusher,771805,0.2974457704106247,5,18,0,2248,0,NA,NA,NA,NA,NA,NA,NA,NA
character,receiver,669086,0.3909482327025132,4,18,0,2722,0,NA,NA,NA,NA,NA,NA,NA,NA
character,passer_id,616172,0.4391144851943891,10,10,0,765,0,NA,NA,NA,NA,NA,NA,NA,NA
character,rusher_id,771806,0.29744486013635907,10,10,0,2456,0,NA,NA,NA,NA,NA,NA,NA,NA
character,receiver_id,669086,0.3909482327025132,10,10,0,3088,0,NA,NA,NA,NA,NA,NA,NA,NA
character,name,289407,0.7365602556050138,5,18,0,2325,0,NA,NA,NA,NA,NA,NA,NA,NA
character,id,289408,0.7365593453307482,10,10,0,2538,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fantasy_player_name,381375,0.6528441519429804,5,18,0,3442,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fantasy_player_id,381375,0.6528441519429804,10,10,0,3587,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fantasy,326758,0.7025606015092347,4,18,0,3342,0,NA,NA,NA,NA,NA,NA,NA,NA
character,fantasy_id,326759,0.7025596912349691,10,10,0,3764,0,NA,NA,NA,NA,NA,NA,NA,NA
numeric,play_id,0,1,NA,NA,NA,NA,NA,2084.4356254039344,1219.3576690394557,1,1036,2066,3102,5921,▇▇▇▃▁
numeric,week,0,1,NA,NA,NA,NA,NA,9.522483774361215,5.284186738284365,1,5,10,14,22,▇▆▆▆▁
numeric,yardline_100,85487,0.922183383853555,NA,NA,NA,NA,NA,49.03448977033471,24.905220727748304,1,30,51,70,99,▅▆▆▇▃
numeric,quarter_seconds_remaining,1199,0.9989085811555022,NA,NA,NA,NA,NA,416.734325036838,282.64536464321304,0,150,400,663,900,▇▅▅▅▆
numeric,half_seconds_remaining,1199,0.9989085811555022,NA,NA,NA,NA,NA,815.5808409371124,558.844159835077,0,287,806,1296,1800,▇▅▅▅▅
numeric,game_seconds_remaining,1199,0.9989085811555022,NA,NA,NA,NA,NA,1714.1759377639833,1058.2081334837549,0,801,1800,2607,3600,▇▆▇▆▆
numeric,quarter_end,0,1,NA,NA,NA,NA,NA,0.01757557552090445,0.13140277920743262,0,0,0,0,1,▇▁▁▁▁
numeric,drive,14081,0.9871824280655762,NA,NA,NA,NA,NA,12.280475873890838,7.123816430268896,1,6,12,18,38,▇▇▇▂▁
numeric,sp,0,1,NA,NA,NA,NA,NA,0.07154209563341436,0.2577283155762031,0,0,0,0,1,▇▁▁▁▁
numeric,qtr,0,1,NA,NA,NA,NA,NA,2.5646176392947195,1.1313198255404997,1,2,3,4,6,▇▃▅▁▁
numeric,down,184365,0.8321772850159753,NA,NA,NA,NA,NA,2.0047494817901894,1.0067423782563265,1,1,2,3,4,▇▆▁▃▂
numeric,goal_to_go,13,0.9999881664345467,NA,NA,NA,NA,NA,0.04769165368751918,0.21311311831669968,0,0,0,0,1,▇▁▁▁▁
numeric,ydstogo,0,1,NA,NA,NA,NA,NA,7.136612141238155,4.9257457058179615,0,3,9,10,50,▇▁▁▁▁
numeric,ydsnet,14081,0.9871824280655762,NA,NA,NA,NA,NA,38.70946777698989,28.730318595751015,-39,12,37,65,99,▁▇▇▇▅
numeric,yards_gained,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.9359645816734976,7.807584532205186,-38,0,0,6,99,▁▇▁▁▁
numeric,shotgun,0,1,NA,NA,NA,NA,NA,0.2984989577359661,0.45759973839118473,0,0,0,1,1,▇▁▁▁▃
numeric,no_huddle,0,1,NA,NA,NA,NA,NA,0.043568457176147136,0.20413300724484748,0,0,0,0,1,▇▁▁▁▁
numeric,qb_dropback,50328,0.9541877167590596,NA,NA,NA,NA,NA,0.4362780731930223,0.4959231297944922,0,0,0,1,1,▇▁▁▁▆
numeric,qb_kneel,0,1,NA,NA,NA,NA,NA,0.00735046469501261,0.08541921332781825,0,0,0,0,1,▇▁▁▁▁
numeric,qb_spike,0,1,NA,NA,NA,NA,NA,0.0015210682978781499,0.03897122055563391,0,0,0,0,1,▇▁▁▁▁
numeric,qb_scramble,0,1,NA,NA,NA,NA,NA,0.014260356645457247,0.1185622691649317,0,0,0,0,1,▇▁▁▁▁
numeric,air_yards,804184,0.267971999963589,NA,NA,NA,NA,NA,8.308679760586442,10.100947914566389,-93,2,6,13,78,▁▁▇▃▁
numeric,yards_after_catch,915119,0.1669907243052332,NA,NA,NA,NA,NA,5.121536541092717,7.023980347723798,-72,1,3,7,91,▁▁▇▁▁
numeric,kick_distance,964499,0.1220413810681158,NA,NA,NA,NA,NA,41.25097895890983,15.058408894839559,-2,31,41,53,88,▁▇▇▆▁
numeric,home_timeouts_remaining,0,1,NA,NA,NA,NA,NA,2.5088369425707966,0.7915343388518676,-1,2,3,3,3,▁▁▁▃▇
numeric,away_timeouts_remaining,0,1,NA,NA,NA,NA,NA,2.4807158396824964,0.813320107324933,0,2,3,3,3,▁▁▁▃▇
numeric,timeout,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.04347332414161605,0.20392016552278353,0,0,0,0,1,▇▁▁▁▁
numeric,posteam_timeouts_remaining,74782,0.931927869867191,NA,NA,NA,NA,NA,2.5355298167198677,0.7704586202045379,-1,2,3,3,3,▁▁▁▂▇
numeric,defteam_timeouts_remaining,74782,0.931927869867191,NA,NA,NA,NA,NA,2.5542592802416126,0.7423333886351559,-1,2,3,3,3,▁▁▁▂▇
numeric,total_home_score,0,1,NA,NA,NA,NA,NA,11.808507423286635,10.161861231134086,0,3,10,17,62,▇▅▁▁▁
numeric,total_away_score,0,1,NA,NA,NA,NA,NA,10.394369043392777,9.571924926888174,0,3,7,17,59,▇▃▁▁▁
numeric,posteam_score,75540,0.9312378819738387,NA,NA,NA,NA,NA,10.229523083389541,9.55150831293642,0,0,7,17,62,▇▃▁▁▁
numeric,defteam_score,75540,0.9312378819738387,NA,NA,NA,NA,NA,11.41926434219915,10.011129519281944,0,3,10,17,62,▇▅▁▁▁
numeric,score_differential,75540,0.9312378819738387,NA,NA,NA,NA,NA,-1.1897412588096141,10.834589890553639,-59,-7,0,4,59,▁▂▇▁▁
numeric,posteam_score_post,74782,0.931927869867191,NA,NA,NA,NA,NA,10.472527515462184,9.629131505015522,0,3,7,17,62,▇▃▁▁▁
numeric,defteam_score_post,74782,0.931927869867191,NA,NA,NA,NA,NA,11.424630880612003,10.018617374348864,0,3,10,17,62,▇▅▁▁▁
numeric,score_differential_post,74782,0.931927869867191,NA,NA,NA,NA,NA,-0.9521033651498161,10.915129362273541,-59,-7,0,6,59,▁▂▇▁▁
numeric,no_score_prob,0,1,NA,NA,NA,NA,NA,0.1318170844272374,0.2041972997826963,0,0.006777436123229563,0.03241235762834549,0.17009594291448593,1,▇▁▁▁▁
numeric,opp_fg_prob,0,1,NA,NA,NA,NA,NA,0.0900442372636615,0.07039615565190616,0,0.02218738803640008,0.08674044534564018,0.14487330988049507,0.42820701003074646,▇▆▂▁▁
numeric,opp_safety_prob,0,1,NA,NA,NA,NA,NA,0.0024300688393332974,0.005656309098386118,0,4.499515416682698e-4,0.0013044169172644615,0.002646706940140575,0.3727701008319855,▇▁▁▁▁
numeric,opp_td_prob,0,1,NA,NA,NA,NA,NA,0.13686450993658264,0.11023031029440622,0,0.030053429771214724,0.12771396338939667,0.22834347561001778,0.5410013794898987,▇▅▅▁▁
numeric,fg_prob,0,1,NA,NA,NA,NA,NA,0.2320199917491022,0.16120988501708008,0,0.14781925082206726,0.21269237995147705,0.2992970049381256,0.9976794235436337,▇▇▂▁▁
numeric,safety_prob,0,1,NA,NA,NA,NA,NA,0.0026556686697281505,0.002310675572790794,0,0.001050723367370665,0.0024312720634043217,0.003875198948662728,0.3445877730846405,▇▁▁▁▁
numeric,td_prob,0,1,NA,NA,NA,NA,NA,0.2910022321016449,0.174859975956561,0,0.1884789690375328,0.3141609728336334,0.39814645051956177,0.9335877299308777,▅▇▅▁▁
numeric,extra_point_prob,0,1,NA,NA,NA,NA,NA,0.02504230941374404,0.1535238490465528,0,0,0,0,0.9963630685616458,▇▁▁▁▁
numeric,two_point_conversion_prob,0,1,NA,NA,NA,NA,NA,8.469442092902586e-4,0.02000777681298465,0,0,0,0,0.4735,▇▁▁▁▁
numeric,ep,17843,0.9837579762782527,NA,NA,NA,NA,NA,1.687473113671753,1.7057576445967129,-3.8036762471310794,0.4870520166296046,1.4019992728717625,2.8423898457549512,6.593665642023552,▁▃▇▅▁
numeric,epa,18018,0.9835986782817663,NA,NA,NA,NA,NA,-0.007954446230637966,1.2560468088794987,-13.58485902735265,-0.5379002675181255,-0,0.5221822524326853,9.579868719680235,▁▁▇▃▁
numeric,total_home_epa,0,1,NA,NA,NA,NA,NA,0.24846134966507072,12.107505734241665,-63.419816149475174,-6.520948685618038,0.1629172118846327,7.009262653625,65.34455354430793,▁▂▇▁▁
numeric,total_away_epa,0,1,NA,NA,NA,NA,NA,-0.22035254191216969,12.12565119448133,-65.34455354430793,-6.986512561861116,-0.1464349998728821,6.554438007757298,73.00163108506858,▁▂▇▁▁
numeric,total_home_rush_epa,0,1,NA,NA,NA,NA,NA,-0.06469372946176401,5.308144641341979,-32.110373332952676,-2.927859448711388,0,2.827283695190957,33.16040635136608,▁▁▇▁▁
numeric,total_away_rush_epa,0,1,NA,NA,NA,NA,NA,0.06469372946176401,5.308144641341979,-33.16040635136608,-2.827283695190957,0,2.927859448711388,32.110373332952676,▁▁▇▁▁
numeric,total_home_pass_epa,0,1,NA,NA,NA,NA,NA,-0.1637843757034343,10.596039297207732,-60.38258939230582,-6.080178061468196,0,5.74475710087931,53.981246434958145,▁▁▇▂▁
numeric,total_away_pass_epa,0,1,NA,NA,NA,NA,NA,0.16382928846033679,10.595990115688988,-53.981246434958145,-5.74475710087931,0,6.080178061468196,60.38258939230582,▁▂▇▁▁
numeric,air_epa,805471,0.26680047698371523,NA,NA,NA,NA,NA,0.5055065487066353,1.354761434024009,-11.79624876496382,-0.4898784961551428,0.2779173366725445,1.3612197960610501,7.454971208120696,▁▁▅▇▁
numeric,yac_epa,805500,0.2667740790300117,NA,NA,NA,NA,NA,-0.36660423146545995,1.924923089183378,-14,-0.8769923797808588,0,0.5234156415099278,9.848225327499676,▁▁▇▅▁
numeric,comp_air_epa,364010,0.6686510645657537,NA,NA,NA,NA,NA,0.05577522160824954,0.6080947516859125,-11.79624876496382,0,0,0,7.423614024184644,▁▁▁▇▁
numeric,comp_yac_epa,364019,0.668642872097363,NA,NA,NA,NA,NA,0.1588122908959473,0.5770241464224364,-10.914922855328768,0,0,0,9.848225327499676,▁▁▇▁▁
numeric,total_home_comp_air_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.1715972755615049,6.026821816064234,-34.2570441190619,-3.49196311540436,0,3.224452398380345,30.63045173761202,▁▁▇▂▁
numeric,total_away_comp_air_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.1715972755615049,6.026821816064234,-30.63045173761202,-3.224452398380345,0,3.49196311540436,34.2570441190619,▁▂▇▁▁
numeric,total_home_comp_yac_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.06711449665539697,6.386129449480004,-32.63186197145842,-3.6308591519482434,0,3.5630386369884945,35.91268990805838,▁▂▇▁▁
numeric,total_away_comp_yac_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.06711449665539697,6.386129449480004,-35.91268990805838,-3.5630386369884945,0,3.6308591519482434,32.63186197145842,▁▁▇▂▁
numeric,total_home_raw_air_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.5431016530492653,9.369514500698392,-67.52350558768376,-5.610673192248214,-0.13607840356417,4.56955827208003,62.42438889516052,▁▁▇▁▁
numeric,total_away_raw_air_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.5431016530492653,9.369514500698392,-62.42438889516052,-4.56955827208003,0.13607840356417,5.610673192248214,67.52350558768376,▁▁▇▁▁
numeric,total_home_raw_yac_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.28727892012474604,12.25002005575581,-66.52721119566374,-6.30992435511531,0,6.773971046050304,83.97783668081532,▁▃▇▁▁
numeric,total_away_raw_yac_epa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.28727892012474604,12.25002005575581,-83.97783668081532,-6.773971046050304,0,6.30992435511531,66.52721119566374,▁▁▇▃▁
numeric,wp,6298,0.994267092675023,NA,NA,NA,NA,NA,0.5092682336602672,0.3000892471142395,0,0.2650805339217186,0.5197814106941223,0.7546021823708784,0.999977398025294,▇▆▇▇▇
numeric,def_wp,6298,0.994267092675023,NA,NA,NA,NA,NA,0.49073176633973276,0.3000892471142395,2.2601974706049077e-5,0.24539781762912163,0.4802185893058777,0.7349194660782814,1,▇▇▇▆▇
numeric,home_wp,0,1,NA,NA,NA,NA,NA,0.5651212434687914,0.29463946094535143,0,0.34911757707595825,0.58470419049263,0.8092272281646729,1,▅▅▇▇▇
numeric,away_wp,0,1,NA,NA,NA,NA,NA,0.43487875653120855,0.29463946094535143,0,0.19077277183532715,0.41529580950737,0.6508824229240417,1,▇▇▇▅▅
numeric,wpa,14462,0.9868356135703688,NA,NA,NA,NA,NA,2.3231512244109806e-4,0.041179407416532865,-0.999494194984436,-0.01393437385559082,-0,0.01011665165424347,0.9997061546600889,▁▁▇▁▁
numeric,vegas_wpa,14462,0.9868356135703688,NA,NA,NA,NA,NA,5.26771687176859e-4,0.0399261106770947,-0.9998733997344971,-0.011666126549243927,0,0.008488729596138,0.9998747144678148,▁▁▇▁▁
numeric,vegas_home_wpa,6134,0.9944163776545873,NA,NA,NA,NA,NA,4.930193089403416e-6,0.039854870927841776,-0.9998747144678148,-0.010171317495405674,0,0.010242700576782227,0.9998733997344971,▁▁▇▁▁
numeric,home_wp_post,84757,0.9228478840674695,NA,NA,NA,NA,NA,0.5654515316738287,0.2932497903348318,0,0.3504820764064789,0.588352620601654,0.8072708249092102,1,▅▅▇▇▇
numeric,away_wp_post,84757,0.9228478840674695,NA,NA,NA,NA,NA,0.4345613652772663,0.293342597120003,-0.9994123093201779,0.19272546470165253,0.41165006160736084,0.6495179235935211,1.9977154731750488,▁▃▇▂▁
numeric,vegas_wp,6298,0.994267092675023,NA,NA,NA,NA,NA,0.5093007509900243,0.3234289117543515,0,0.22046128660440445,0.5167959928512573,0.799371525645256,0.9999985694885254,▇▆▆▆▇
numeric,vegas_home_wp,0,1,NA,NA,NA,NA,NA,0.5658029819469882,0.3181085215410204,0,0.2971372455358505,0.60519078373909,0.8528302535414696,1,▅▃▅▅▇
numeric,total_home_rush_wpa,0,1,NA,NA,NA,NA,NA,-0.005736035352490824,0.15295769414640503,-1.0790940578000532,-0.08720871806144714,0,0.08142374248139406,1.0827744475655492,▁▁▇▁▁
numeric,total_away_rush_wpa,0,1,NA,NA,NA,NA,NA,0.005736035352490824,0.15295769414640503,-1.0827744475655492,-0.08142374248139406,0,0.08720871806144714,1.0790940578000532,▁▁▇▁▁
numeric,total_home_pass_wpa,0,1,NA,NA,NA,NA,NA,-0.011704553292779056,0.2612775797505144,-1.8669247376815488,-0.1780981719493866,0.0036022216081619263,0.1692317584797332,1.7035376865755814,▁▁▇▁▁
numeric,total_away_pass_wpa,0,1,NA,NA,NA,NA,NA,0.011704553292779056,0.2612775797505144,-1.7035376865755814,-0.1692317584797332,-0.0036022216081619263,0.1780981719493866,1.8669247376815488,▁▁▇▁▁
numeric,air_wpa,805469,0.2668022975322465,NA,NA,NA,NA,NA,0.0031944892416476147,0.04235070909282636,-0.9980781078338623,0,0,0,0.9912653528153896,▁▁▇▁▁
numeric,yac_wpa,805469,0.2668022975322465,NA,NA,NA,NA,NA,3.550292191821682e-4,0.05655707094397412,-0.99085608497262,-0.015901625156402588,0,0.011798828840255737,1,▁▁▇▁▁
numeric,comp_air_wpa,364008,0.668652885114285,NA,NA,NA,NA,NA,0.001331281430309251,0.021105782500084776,-0.99611663818359375,0,0,0,0.9882552844937891,▁▁▇▁▁
numeric,comp_yac_wpa,364008,0.668652885114285,NA,NA,NA,NA,NA,0.004087812878937085,0.026691557398439505,-0.9883022714639083,0,0,0,1,▁▁▇▁▁
numeric,total_home_comp_air_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.006595088517105849,0.1518415145504115,-1.9444250103439105,-0.04968388275176994,0,0.04606897134688148,3.0175069247978037,▁▇▃▁▁
numeric,total_away_comp_air_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.006595088517105849,0.1518415145504115,-3.0175069247978037,-0.04606897134688148,0,0.04968388275176994,1.9444250103439105,▁▁▃▇▁
numeric,total_home_comp_yac_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-1.2000175624890671e-4,0.23046011134609773,-2.555464788224241,-0.11639988422393799,0,0.11682116985321045,2.106908860423353,▁▁▇▁▁
numeric,total_away_comp_yac_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,1.2000175624890671e-4,0.23046011134609773,-2.106908860423353,-0.11682116985321045,0,0.11639988422393799,2.555464788224241,▁▁▇▁▁
numeric,total_home_raw_air_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.011168386565439632,0.1927320921131443,-2.9340697821495256,-0.05357837677001953,0,0.04698159838693505,3.905425994589482,▁▁▇▁▁
numeric,total_away_raw_air_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.011168386565439632,0.1927320921131443,-3.905425994589482,-0.04698159838693505,0,0.05357837677001953,2.9340697821495256,▁▁▇▁▁
numeric,total_home_raw_yac_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,-0.0026903383881771385,0.30053654108944405,-3.7097175012884205,-0.1533583104610443,0,0.14303510306494469,3.265507598971233,▁▁▇▁▁
numeric,total_away_raw_yac_wpa,328090,0.7013481161874073,NA,NA,NA,NA,NA,0.0026903383881771385,0.30053654108944405,-3.265507598971233,-0.14303510306494469,0,0.1533583104610443,3.7097175012884205,▁▁▇▁▁
numeric,punt_blocked,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.024679283809378e-4,0.017388983007877862,0,0,0,0,1,▇▁▁▁▁
numeric,first_down_rush,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.07168966981379617,0.25797349583877194,0,0,0,0,1,▇▁▁▁▁
numeric,first_down_pass,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.13458868655448952,0.3412838747160773,0,0,0,0,1,▇▁▁▁▁
numeric,first_down_penalty,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.019788272450133332,0.13927209063948912,0,0,0,0,1,▇▁▁▁▁
numeric,third_down_converted,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.06017108044024827,0.23780364899861764,0,0,0,0,1,▇▁▁▁▁
numeric,third_down_failed,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.09510087830198129,0.2933543648216251,0,0,0,0,1,▇▁▁▁▁
numeric,fourth_down_converted,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.005742119851723924,0.07555893963498897,0,0,0,0,1,▇▁▁▁▁
numeric,fourth_down_failed,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.00593295135227972,0.07679685584657968,0,0,0,0,1,▇▁▁▁▁
numeric,incomplete_pass,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.14255017675767742,0.34961370180594376,0,0,0,0,1,▇▁▁▁▁
numeric,touchback,0,1,NA,NA,NA,NA,NA,0.023482345230618002,0.15142967201489163,0,0,0,0,1,▇▁▁▁▁
numeric,interception,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.01112929311241407,0.10490682746207507,0,0,0,0,1,▇▁▁▁▁
numeric,punt_inside_twenty,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.017976327352356058,0.132865329756134,0,0,0,0,1,▇▁▁▁▁
numeric,punt_in_endzone,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.339551259726445e-5,0.005778791326959958,0,0,0,0,1,▇▁▁▁▁
numeric,punt_out_of_bounds,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0051133300573925735,0.07132453131353966,0,0,0,0,1,▇▁▁▁▁
numeric,punt_downed,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.007430978631642728,0.08588228121016313,0,0,0,0,1,▇▁▁▁▁
numeric,punt_fair_catch,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.012493738341388016,0.11107500445965617,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_inside_twenty,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0064730044988526225,0.08019420707197665,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_in_endzone,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.011002390164544478,0.10431369976053685,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_out_of_bounds,50525,0.9540083927287292,NA,NA,NA,NA,NA,7.356554346425972e-4,0.0271130032851308,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_downed,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.8166300111159326e-5,0.006177773050218164,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_fair_catch,50525,0.9540083927287292,NA,NA,NA,NA,NA,9.636990778067718e-5,0.009816349248312638,0,0,0,0,1,▇▁▁▁▁
numeric,fumble_forced,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.01097281128195831,0.10417494444176967,0,0,0,0,1,▇▁▁▁▁
numeric,fumble_not_forced,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.005325153023009507,0.072779123533286,0,0,0,0,1,▇▁▁▁▁
numeric,fumble_out_of_bounds,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.001254717116154364,0.03539977396551467,0,0,0,0,1,▇▁▁▁▁
numeric,solo_tackle,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.4764241993425855,0.49944411061962346,0,0,0,1,1,▇▁▁▁▇
numeric,safety,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.730755835865827e-4,0.019311570470469743,0,0,0,0,1,▇▁▁▁▁
numeric,penalty,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.07373538349975431,0.2613398972362713,0,0,0,0,1,▇▁▁▁▁
numeric,tackled_for_loss,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.029865129836982186,0.17021525049721078,0,0,0,0,1,▇▁▁▁▁
numeric,fumble_lost,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.007655205644795784,0.08715853784738917,0,0,0,0,1,▇▁▁▁▁
numeric,own_kickoff_recovery,50525,0.9540083927287292,NA,NA,NA,NA,NA,2.4712679321975634e-4,0.015718331886929632,0,0,0,0,1,▇▁▁▁▁
numeric,own_kickoff_recovery_td,50525,0.9540083927287292,NA,NA,NA,NA,NA,9.54157502778982e-7,9.768098600950877e-4,0,0,0,0,1,▇▁▁▁▁
numeric,qb_hit,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.04588161767863027,0.20922843164238794,0,0,0,0,1,▇▁▁▁▁
numeric,rush_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.3186847892981695,0.4659667386623199,0,0,0,1,1,▇▁▁▁▃
numeric,pass_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.4246687880768479,0.49429287030579566,0,0,0,1,1,▇▁▁▁▆
numeric,sack,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.026963536871031302,0.16197694152013542,0,0,0,0,1,▇▁▁▁▁
numeric,touchdown,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0290970330472451,0.16807862050375538,0,0,0,0,1,▇▁▁▁▁
numeric,pass_touchdown,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.016936295674326952,0.1290328386472942,0,0,0,0,1,▇▁▁▁▁
numeric,rush_touchdown,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.00968946944072058,0.09795709662084069,0,0,0,0,1,▇▁▁▁▁
numeric,return_touchdown,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0022794822741389917,0.04768949994212801,0,0,0,0,1,▇▁▁▁▁
numeric,extra_point_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.027105706338945365,0.16239153975780735,0,0,0,0,1,▇▁▁▁▁
numeric,two_point_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0018749194929607034,0.04325974983135766,0,0,0,0,1,▇▁▁▁▁
numeric,field_goal_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.022192749357136384,0.14731005375929296,0,0,0,0,1,▇▁▁▁▁
numeric,kickoff_attempt,50498,0.9540329701339013,NA,NA,NA,NA,NA,0.05856372462960561,0.2348064466583969,0,0,0,0,1,▇▁▁▁▁
numeric,punt_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.0547314285169053,0.22745537719220404,0,0,0,0,1,▇▁▁▁▁
numeric,fumble,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.016238806539795515,0.12639273295859219,0,0,0,0,1,▇▁▁▁▁
numeric,complete_pass,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.24267374015428733,0.4286996283554495,0,0,0,0,1,▇▁▁▁▂
numeric,assist_tackle,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.13535201255671275,0.3420993377960261,0,0,0,0,1,▇▁▁▁▁
numeric,lateral_reception,50525,0.9540083927287292,NA,NA,NA,NA,NA,2.1277712311971333e-4,0.014585336883167023,0,0,0,0,1,▇▁▁▁▁
numeric,lateral_rush,50525,0.9540083927287292,NA,NA,NA,NA,NA,3.721214260838033e-5,0.00610006502996325,0,0,0,0,1,▇▁▁▁▁
numeric,lateral_return,50525,0.9540083927287292,NA,NA,NA,NA,NA,2.776598333086835e-4,0.016660822404177992,0,0,0,0,1,▇▁▁▁▁
numeric,lateral_recovery,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.001992280865802518,0.044590509975986924,0,0,0,0,1,▇▁▁▁▁
numeric,passing_yards,844237,0.23151278480206083,NA,NA,NA,NA,NA,11.454679494992785,10.155601152890332,-22,5,9,15,99,▁▇▁▁▁
numeric,receiving_yards,844237,0.23151278480206083,NA,NA,NA,NA,NA,11.449501244431513,10.148341932961369,-22,5,9,15,99,▁▇▁▁▁
numeric,rushing_yards,765120,0.30353095387640294,NA,NA,NA,NA,NA,4.177474883790673,6.277634572361271,-34,1,3,6,99,▁▇▁▁▁
numeric,lateral_receiving_yards,1098347,2.0299116123689842e-4,NA,NA,NA,NA,NA,5.941704035874439,10.321900567115344,-21,0,4,10,62,▁▇▂▁▁
numeric,lateral_rushing_yards,1098531,3.550069635982478e-5,NA,NA,NA,NA,NA,9.692307692307692,9.878615517086033,0,4,6,12,44,▇▃▁▁▁
numeric,tackle_with_assist,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.07944029120886999,0.2704248529974212,0,0,0,0,1,▇▁▁▁▁
numeric,fumble_recovery_1_yards,1082877,0.014284934050629472,NA,NA,NA,NA,NA,2.3068246989103423,9.281173192373814,-100,0,0,0,104,▁▁▇▁▁
numeric,fumble_recovery_2_yards,1098447,1.1196373467325937e-4,NA,NA,NA,NA,NA,3.926829268292683,12.171702422588814,-16,0,0,1.5,77,▇▂▁▁▁
numeric,return_yards,50525,0.9540083927287292,NA,NA,NA,NA,NA,1.215521280097707,5.810803247891786,-100,0,0,0,109,▁▁▇▁▁
numeric,penalty_yards,1021292,0.07034417469983711,NA,NA,NA,NA,NA,8.364502186909599,5.267519745457818,0,5,5,10,66,▇▂▁▁▁
numeric,replay_or_challenge,0,1,NA,NA,NA,NA,NA,0.0074205558134665915,0.08582247881242441,0,0,0,0,1,▇▁▁▁▁
numeric,defensive_two_point_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,4.7707875138949524e-5,0.006906927291951021,0,0,0,0,1,▇▁▁▁▁
numeric,defensive_two_point_conv,50525,0.9540083927287292,NA,NA,NA,NA,NA,1.0495732530568788e-5,0.0032396963414267495,0,0,0,0,1,▇▁▁▁▁
numeric,defensive_extra_point_attempt,50525,0.9540083927287292,NA,NA,NA,NA,NA,0,0,0,0,0,0,0,▁▁▇▁▁
numeric,defensive_extra_point_conv,50525,0.9540083927287292,NA,NA,NA,NA,NA,0,0,0,0,0,0,0,▁▁▇▁▁
numeric,season,0,1,NA,NA,NA,NA,NA,2010.1182082161356,6.627584365671061,1999,2004,2010,2016,2021,▇▆▇▆▇
numeric,cp,811614,0.2612086621699118,NA,NA,NA,NA,NA,0.6348154096544791,0.17004701657348936,0.09544014185667038,0.5279483497142792,0.6781390309333801,0.7639535367488861,0.926070511341095,▁▃▃▇▆
numeric,cpoe,811614,0.2612086621699118,NA,NA,NA,NA,NA,0.06548366550596109,44.90345874861157,-92.05909371376038,-44.26913559436798,20.371761918067932,32.09476172924042,85.26444137096405,▃▃▁▇▁
numeric,series,0,1,NA,NA,NA,NA,NA,29.29300545254285,17.06057955007624,1,15,29,44,82,▇▇▇▃▁
numeric,series_success,0,1,NA,NA,NA,NA,NA,0.550455592269951,0.49744794547782784,0,0,1,1,1,▆▁▁▁▇
numeric,order_sequence,91628,0.916593389588283,NA,NA,NA,NA,NA,2089.286688328847,1223.2124087724426,1,1038,2071,3110,5921,▇▇▇▃▁
numeric,play_deleted,91628,0.916593389588283,NA,NA,NA,NA,NA,0,0,0,0,0,0,0,▁▁▇▁▁
numeric,special_teams_play,91628,0.916593389588283,NA,NA,NA,NA,NA,0.15910747590228633,0.36577646163240873,0,0,0,0,1,▇▁▁▁▂
numeric,fixed_drive,0,1,NA,NA,NA,NA,NA,12.20017932403033,7.136661236285931,1,6,12,18,38,▇▆▆▁▁
numeric,drive_play_count,14081,0.9871824280655762,NA,NA,NA,NA,NA,7.189677350346567,3.602393232327467,0,4,7,10,24,▅▇▅▁▁
numeric,drive_first_downs,14081,0.9871824280655762,NA,NA,NA,NA,NA,2.3803994323593884,1.8755856899745,0,1,2,4,9,▇▇▅▁▁
numeric,drive_inside20,14081,0.9871824280655762,NA,NA,NA,NA,NA,0.37966636821581407,0.4853040636890067,0,0,0,1,1,▇▁▁▁▅
numeric,drive_ended_with_score,14081,0.9871824280655762,NA,NA,NA,NA,NA,0.4375959553301142,0.4960906793535729,0,0,0,1,1,▇▁▁▁▆
numeric,drive_quarter_start,14081,0.9871824280655762,NA,NA,NA,NA,NA,2.5190509078469216,1.130511492046773,1,2,3,4,5,▇▇▇▇▁
numeric,drive_quarter_end,14081,0.9871824280655762,NA,NA,NA,NA,NA,2.6349700181375746,1.1228361579111892,1,2,3,4,6,▇▃▅▁▁
numeric,drive_yards_penalized,105709,0.9037758176538591,NA,NA,NA,NA,NA,0.0910671282284217,7.611621086271443,-40,0,0,0,77,▁▇▁▁▁
numeric,drive_play_id_started,14081,0.9871824280655762,NA,NA,NA,NA,NA,1994.7812905432884,1209.712470152506,10,953,1986,3005,5764,▇▇▇▃▁
numeric,drive_play_id_ended,14081,0.9871824280655762,NA,NA,NA,NA,NA,2192.7340221984737,1213.3905594679325,34,1149,2164,3206,5921,▇▇▇▅▁
numeric,away_score,0,1,NA,NA,NA,NA,NA,21.140365202035373,10.101971087488094,0,14,21,28,59,▃▇▆▂▁
numeric,home_score,0,1,NA,NA,NA,NA,NA,23.40198530817335,10.330063582737433,0,16,23,30,62,▂▇▆▂▁
numeric,result,0,1,NA,NA,NA,NA,NA,2.2616201061379795,14.548871420505362,-49,-7,3,11,59,▁▃▇▂▁
numeric,total,0,1,NA,NA,NA,NA,NA,44.54235051020873,14.34748784077857,3,34,44,54,106,▁▇▆▂▁
numeric,spread_line,0,1,NA,NA,NA,NA,NA,2.339101286217539,5.952923025351251,-19,-3,3,6.5,27,▁▅▇▂▁
numeric,total_line,0,1,NA,NA,NA,NA,NA,43.481633851279376,5.017518021658727,30,40,43.5,47,63.5,▂▇▇▂▁
numeric,div_game,0,1,NA,NA,NA,NA,NA,0.380814149303185,0.4855872193867445,0,0,0,1,1,▇▁▁▁▅
numeric,temp,317103,0.7113492995439525,NA,NA,NA,NA,NA,58.22617333809361,16.860510369502123,-6,46,59,71,109,▁▃▇▇▁
numeric,wind,317103,0.7113492995439525,NA,NA,NA,NA,NA,8.494006784675488,5.383877047778842,0,5,8,12,71,▇▁▁▁▁
numeric,aborted_play,0,1,NA,NA,NA,NA,NA,0.002607935771047818,0.051001341255008464,0,0,0,0,1,▇▁▁▁▁
numeric,success,18018,0.9835986782817663,NA,NA,NA,NA,NA,0.41713494584249533,0.49308580166962673,0,0,0,1,1,▇▁▁▁▆
numeric,passer_jersey_number,656502,0.40240312406127965,NA,NA,NA,NA,NA,9.03425717310459,4.860778988279489,1,5,9,12,92,▇▁▁▁▁
numeric,rusher_jersey_number,794957,0.2763711006126146,NA,NA,NA,NA,NA,27.7503301900775,10.75144904832512,1,22,28,33,99,▂▇▁▁▁
numeric,receiver_jersey_number,702732,0.3603211447609165,NA,NA,NA,NA,NA,53.07991400522436,32.341174900030225,1,18,80,84,99,▅▃▁▁▇
numeric,pass,0,1,NA,NA,NA,NA,NA,0.43929380922471944,0.49630130225070546,0,0,0,1,1,▇▁▁▁▆
numeric,rush,0,1,NA,NA,NA,NA,NA,0.2900707283104399,0.4537949849222303,0,0,0,1,1,▇▁▁▁▃
numeric,first_down,50525,0.9540083927287292,NA,NA,NA,NA,NA,0.223826267001894,0.41680719159581336,0,0,0,0,1,▇▁▁▁▂
numeric,special,0,1,NA,NA,NA,NA,NA,0.15393466051321264,0.360886269285787,0,0,0,0,1,▇▁▁▁▂
numeric,play,0,1,NA,NA,NA,NA,NA,0.752477311413929,0.43157314184813556,0,1,1,1,1,▂▁▁▁▇
numeric,jersey_number,367370,0.6655925430332159,NA,NA,NA,NA,NA,16.811021608315098,12.101034042875968,1,8,12,26,99,▇▅▁▁▁
numeric,out_of_bounds,0,1,NA,NA,NA,NA,NA,0.06955041554020228,0.25438792059600834,0,0,0,0,1,▇▁▁▁▁
numeric,home_opening_kickoff,0,1,NA,NA,NA,NA,NA,0.48775408030439565,0.4998502424557511,0,0,0,1,1,▇▁▁▁▇
numeric,qb_epa,18018,0.9835986782817663,NA,NA,NA,NA,NA,-7.359714411560366e-4,1.2454625465492268,-13.58485902735265,-0.5353464668150991,0,0.5254294750135534,9.579868719680235,▁▁▇▃▁
numeric,xyac_epa,831471,0.24313334607717307,NA,NA,NA,NA,NA,0.6872020672438497,0.5040984516422667,-1.2820399739124755,0.30739913538713626,0.5661891744628638,0.920687893258089,13.028142577118448,▇▁▁▁▁
numeric,xyac_mean_yardage,831419,0.2431806803389861,NA,NA,NA,NA,NA,5.217610647958738,2.983742858931366,-77.34254090196919,3.6430345882508846,4.496286971581867,6.384836019860813,78.85111491640419,▁▁▇▁▁
numeric,xyac_median_yardage,831419,0.2431806803389861,NA,NA,NA,NA,NA,3.4332418744455384,2.3961516593295906,0,2,3,5,48,▇▁▁▁▁
numeric,xyac_success,831419,0.2431806803389861,NA,NA,NA,NA,NA,0.7963682448146061,0.24753272979377464,0.010269990365486592,0.5846049897518242,0.9878865564242005,1,1,▁▁▃▂▇
numeric,xyac_fd,831419,0.2431806803389861,NA,NA,NA,NA,NA,0.598139960266696,0.35654205947727374,0,0.24512977158883587,0.5134540184517391,0.9991877254215069,1,▂▆▂▁▇
numeric,xpass,522120,0.5247276004260084,NA,NA,NA,NA,NA,0.6137813038965859,0.2410066250760552,0.01067630760371685,0.44443003088235855,0.5740557610988617,0.8400976061820984,0.998187243938446,▁▃▇▅▆
numeric,pass_oe,538745,0.5095942907598059,NA,NA,NA,NA,NA,-0.26308551422534004,41.94502679344846,-99.23535585403442,-41.299596428871155,4.3928563594818115,34.751224517822266,97.72196300327778,▂▇▇▇▂


================================================
FILE: data-raw/replace_models.R
================================================
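# Helper script to replace the internal references to the models
# with references to the fastrmodels package.
# Each pattern ends with a comma, so only occurrences that are directly
# followed by a comma are replaced.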
models <- c(
  "ep_model,",
  "wp_model,",
  "wp_model_spread,",
  "fg_model,",
  "cp_model,",
  "xyac_model,",
  "xpass_model,"
)

purrr::walk(models, function(model) {
  xfun::gsub_dir(
    # paste0(model,"(?![:alpha:]+)"),
    model,
    paste0("fastrmodels::", model),
    dir = usethis::proj_path("R"),
    ext = "R"
  )
})
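
# Illustration only, not part of the original script: a minimal base-R sketch
# of the same substitution applied to one string instead of every file in R/.
# `example_line` is a made-up input. Because each pattern ends with a comma,
# only occurrences directly followed by a comma are rewritten. The pattern
# also matches text that is already prefixed, so applying the replacement
# twice would yield "fastrmodels::fastrmodels::ep_model,".
example_line <- "stats::predict(ep_model, newdata)"
patched_line <- gsub("ep_model,", "fastrmodels::ep_model,", example_line)
print(patched_line)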


================================================
FILE: data-raw/teams_colors_logos.R
================================================
teams_colors_logos <- nflreadr::load_teams()

usethis::use_data(teams_colors_logos, overwrite = TRUE)


================================================
FILE: data-raw/tidy_play_stats_row.R
================================================
# Script to create the tidy_play_stats_row tibble that is used in
# the internal function `sum_play_stats`

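library(tidyverse)

# NOTE: `pbp_stat_columns` and `indicator_stats` are defined in the
# "stats character vectors" section at the bottom of this script and
# must exist in the session before the code below is run.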

tidy_play_stats_row <-
  as_tibble_row(
    matrix(NA, ncol = length(pbp_stat_columns)),
    .name_repair = "minimal"
  ) |>
  set_names(pbp_stat_columns) |>
  modify_at(indicator_stats, function(x) 0) |>
  modify_if(is.na, function(x) NA_character_) |>
  modify_at(
    c(
      "air_yards",
      "yards_after_catch",
      "penalty_yards",
      "kick_distance",
      "fumble_recovery_1_yards",
      "fumble_recovery_2_yards",
      "rushing_yards",
      "lateral_rushing_yards",
      "passing_yards",
      "receiving_yards",
      "lateral_receiving_yards"
    ),
    as.integer
  )

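# The next line overwrites the tibble built above with the object stored in
# the installed package's internal data; skip it to save the rebuilt tibble.
tidy_play_stats_row <- nflfastR:::tidy_play_stats_row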
scramble_fix <- readRDS("data-raw/scramble_fix.rds")
default_play <- readRDS("data-raw/pbp_defaultplay.rds")
usethis::use_data(
  tidy_play_stats_row,
  scramble_fix,
  default_play,
  internal = TRUE,
  overwrite = TRUE
)

# stats character vectors -------------------------------------------------

pbp_stat_columns <-
  c(
    # "play_id",
    "punt_blocked",
    "first_down_rush",
    "first_down_pass",
    "first_down_penalty",
    "third_down_converted",
    "third_down_failed",
    "fourth_down_converted",
    "fourth_down_failed",
    "incomplete_pass",
    "interception",
    "punt_inside_twenty",
    "punt_in_endzone",
    "punt_out_of_bounds",
    "punt_downed",
    "punt_fair_catch",
    "kickoff_inside_twenty",
    "kickoff_in_endzone",
    "kickoff_out_of_bounds",
    "kickoff_fair_catch",
    "fumble_forced",
    "fumble_not_forced",
    "fumble_out_of_bounds",
    "timeout",
    "field_goal_missed",
    "field_goal_made",
    "field_goal_blocked",
    "extra_point_good",
    "extra_point_failed",
    "extra_point_blocked",
    "two_point_rush_good",
    "two_point_rush_failed",
    "two_point_pass_good",
    "two_point_pass_failed",
    "solo_tackle",
    "safety",
    "penalty",
    "tackled_for_loss",
    "extra_point_safety",
    "two_point_rush_safety",
    "two_point_pass_safety",
    "kickoff_downed",
    "two_point_pass_reception_good",
    "two_point_pass_reception_failed",
    "fumble_lost",
    "own_kickoff_recovery",
    "own_kickoff_recovery_td",
    "qb_hit",
    "extra_point_aborted",
    "two_point_return",
    "rush_attempt",
    "pass_attempt",
    "sack",
    "touchdown",
    "pass_touchdown",
    "rush_touchdown",
    "return_touchdown",
    "extra_point_attempt",
    "two_point_attempt",
    "field_goal_attempt",
    "kickoff_attempt",
    "punt_attempt",
    "fumble",
    "complete_pass",
    "assist_tackle",
    "lateral_reception",
    "lateral_rush",
    "lateral_return",
    "lateral_recovery",
    "passer_player_id",
    "passer_player_name",
    "receiver_player_id",
    "receiver_player_name",
    "rusher_player_id",
    "rusher_player_name",
    "lateral_receiver_player_id",
    "lateral_receiver_player_name",
    "lateral_rusher_player_id",
    "lateral_rusher_player_name",
    "lateral_sack_player_id",
    "lateral_sack_player_name",
    "interception_player_id",
    "interception_player_name",
    "lateral_interception_player_id",
    "lateral_interception_player_name",
    "punt_returner_player_id",
    "punt_returner_player_name",
    "lateral_punt_returner_player_id",
    "lateral_punt_returner_player_name",
    "kickoff_returner_player_name",
    "kickoff_returner_player_id",
    "lateral_kickoff_returner_player_id",
    "lateral_kickoff_returner_player_name",
    "punter_player_id",
    "punter_player_name",
    "kicker_player_name",
    "kicker_player_id",
    "own_kickoff_recovery_player_id",
    "own_kickoff_recovery_player_name",
    "blocked_player_id",
    "blocked_player_name",
    "tackle_for_loss_1_player_id",
    "tackle_for_loss_1_player_name",
    "tackle_for_loss_2_player_id",
    "tackle_for_loss_2_player_name",
    "qb_hit_1_player_id",
    "qb_hit_1_player_name",
    "qb_hit_2_player_id",
    "qb_hit_2_player_name",
    "forced_fumble_player_1_team",
    "forced_fumble_player_1_player_id",
    "forced_fumble_player_1_player_name",
    "forced_fumble_player_2_team",
    "forced_fumble_player_2_player_id",
    "forced_fumble_player_2_player_name",
    "solo_tackle_1_team",
    "solo_tackle_2_team",
    "solo_tackle_1_player_id",
    "solo_tackle_2_player_id",
    "solo_tackle_1_player_name",
    "solo_tackle_2_player_name",
    "assist_tackle_1_player_id",
    "assist_tackle_1_player_name",
    "assist_tackle_1_team",
    "assist_tackle_2_player_id",
    "assist_tackle_2_player_name",
    "assist_tackle_2_team",
    "assist_tackle_3_player_id",
    "assist_tackle_3_player_name",
    "assist_tackle_3_team",
    "assist_tackle_4_player_id",
    "assist_tackle_4_player_name",
    "assist_tackle_4_team",
    # new for stat ID 80 -> tackle_with_assist
    "tackle_with_assist",
    "tackle_with_assist_1_player_id",
    "tackle_with_assist_1_player_name",
    "tackle_with_assist_1_team",
    "tackle_with_assist_2_player_id",
    "tackle_with_assist_2_player_name",
    "tackle_with_assist_2_team",

    "pass_defense_1_player_id",
    "pass_defense_1_player_name",
    "pass_defense_2_player_id",
    "pass_defense_2_player_name",
    "fumbled_1_team",
    "fumbled_1_player_id",
    "fumbled_1_player_name",
    "fumbled_2_player_id",
    "fumbled_2_player_name",
    "fumbled_2_team",
    "fumble_recovery_1_team",
    "fumble_recovery_1_yards",
    "fumble_recovery_1_player_id",
    "fumble_recovery_1_player_name",
    "fumble_recovery_2_team",
    "fumble_recovery_2_yards",
    "fumble_recovery_2_player_id",
    "fumble_recovery_2_player_name",
    "td_team",
    "return_team",
    "timeout_team",
    "yards_gained",
    "return_yards",
    "air_yards",
    "yards_after_catch",
    "penalty_team",
    "penalty_player_id",
    "penalty_player_name",
    "penalty_yards",
    "kick_distance",
    "defensive_two_point_attempt",
    "defensive_two_point_conv",
    "defensive_extra_point_attempt",
    "defensive_extra_point_conv",
    "penalty_fix",
    "return_penalty_fix",
    # new in nflfastR v4.0
    "rushing_yards",
    "lateral_rushing_yards",
    "passing_yards",
    "receiving_yards",
    "lateral_receiving_yards",
    # new in nflfastR v4.1
    "td_player_id",
    "td_player_name",
    "sack_player_id",
    "sack_player_name",
    "half_sack_1_player_id",
    "half_sack_1_player_name",
    "half_sack_2_player_id",
    "half_sack_2_player_name",
    # new in nflfastR > v4.1
    "safety_player_name",
    "safety_player_id"
  )

indicator_stats <- c(
  "punt_blocked",
  "first_down_rush",
  "first_down_pass",
  "first_down_penalty",
  "third_down_converted",
  "third_down_failed",
  "fourth_down_converted",
  "fourth_down_failed",
  "incomplete_pass",
  "interception",
  "punt_inside_twenty",
  "punt_in_endzone",
  "punt_out_of_bounds",
  "punt_downed",
  "punt_fair_catch",
  "kickoff_inside_twenty",
  "kickoff_in_endzone",
  "kickoff_out_of_bounds",
  "kickoff_fair_catch",
  "fumble_forced",
  "fumble_not_forced",
  "fumble_out_of_bounds",
  "timeout",
  "field_goal_missed",
  "field_goal_made",
  "field_goal_blocked",
  "extra_point_good",
  "extra_point_failed",
  "extra_point_blocked",
  "two_point_rush_good",
  "two_point_rush_failed",
  "two_point_pass_good",
  "two_point_pass_failed",
  "solo_tackle",
  "safety",
  "penalty",
  "tackled_for_loss",
  "extra_point_safety",
  "two_point_rush_safety",
  "two_point_pass_safety",
  "kickoff_downed",
  "two_point_pass_reception_good",
  "two_point_pass_reception_failed",
  "fumble_lost",
  "own_kickoff_recovery",
  "own_kickoff_recovery_td",
  "qb_hit",
  "extra_point_aborted",
  "two_point_return",
  "defensive_two_point_attempt",
  "defensive_two_point_conv",
  "defensive_extra_point_attempt",
  "defensive_extra_point_conv",
  "rush_attempt",
  "pass_attempt",
  "sack",
  "touchdown",
  "pass_touchdown",
  "rush_touchdown",
  "return_touchdown",
  "extra_point_attempt",
  "two_point_attempt",
  "field_goal_attempt",
  "kickoff_attempt",
  "punt_attempt",
  "fumble",
  "complete_pass",
  "assist_tackle",
  # new for stat ID 80 -> tackle_with_assist
  "tackle_with_assist",

  "lateral_reception",
  "lateral_rush",
  "lateral_return",
  "lateral_recovery",
  "penalty_fix",
  "yards_gained",
  "return_yards",
  "return_penalty_fix"
)


================================================
FILE: data-raw/variable_list.txt
================================================
#' \item{play_id}{Numeric play id that when used with game_id and drive provides the unique identifier for a single play.}
#' \item{game_id}{Ten digit identifier for NFL game.}
#' \item{old_game_id}{Legacy NFL game ID.}
#' \item{home_team}{String abbreviation for the home team.}
#' \item{away_team}{String abbreviation for the away team.}
#' \item{season_type}{'REG' or 'POST' indicating if the game belongs to regular or post season.}
#' \item{week}{Season week.}
#' \item{posteam}{String abbreviation for the team with possession.}
#' \item{posteam_type}{String indicating whether the posteam team is home or away.}
#' \item{defteam}{String abbreviation for the team on defense.}
#' \item{side_of_field}{String abbreviation for which team's side of the field the team with possession is currently on.}
#' \item{yardline_100}{Numeric distance in the number of yards from the opponent's endzone for the posteam.}
#' \item{game_date}{Date of the game.}
#' \item{quarter_seconds_remaining}{Numeric seconds remaining in the quarter.}
#' \item{half_seconds_remaining}{Numeric seconds remaining in the half.}
#' \item{game_seconds_remaining}{Numeric seconds remaining in the game.}
#' \item{game_half}{String indicating which half the play is in, either Half1, Half2, or Overtime.}
#' \item{quarter_end}{Binary indicator for whether or not the row of the data is marking the end of a quarter.}
#' \item{drive}{Numeric drive number in the game.}
#' \item{sp}{Binary indicator for whether or not a score occurred on the play.}
#' \item{qtr}{Quarter of the game (5 is overtime).}
#' \item{down}{The down for the given play.}
#' \item{goal_to_go}{Binary indicator for whether or not the posteam is in a goal down situation.}
#' \item{time}{Time at start of play provided in string format as minutes:seconds remaining in the quarter.}
#' \item{yrdln}{String indicating the current field position for a given play.}
#' \item{ydstogo}{Numeric yards in distance from either the first down marker or the endzone in goal down situations.}
#' \item{ydsnet}{Numeric value for total yards gained on the given drive.}
#' \item{desc}{Detailed string description for the given play.}
#' \item{play_type}{String indicating the type of play: pass (includes sacks), run (includes scrambles), punt, field_goal, kickoff, extra_point, qb_kneel, qb_spike, no_play (timeouts and penalties), and missing for rows indicating end of play.}
#' \item{yards_gained}{Numeric yards gained (or lost) by the possessing team, excluding yards gained via fumble recoveries and laterals.}
#' \item{shotgun}{Binary indicator for whether or not the play was in shotgun formation.}
#' \item{no_huddle}{Binary indicator for whether or not the play was in no_huddle formation.}
#' \item{qb_dropback}{Binary indicator for whether or not the QB dropped back on the play (pass attempt, sack, or scrambled).}
#' \item{qb_kneel}{Binary indicator for whether or not the QB took a knee.}
#' \item{qb_spike}{Binary indicator for whether or not the QB spiked the ball.}
#' \item{qb_scramble}{Binary indicator for whether or not the QB scrambled.}
#' \item{pass_length}{String indicator for pass length: short or deep.}
#' \item{pass_location}{String indicator for pass location: left, middle, or right.}
#' \item{air_yards}{Numeric value for distance in yards perpendicular to the line of scrimmage at where the targeted receiver either caught or didn't catch the ball.}
#' \item{yards_after_catch}{Numeric value for distance in yards perpendicular to the yard line where the receiver made the reception to where the play ended.}
#' \item{run_location}{String indicator for location of run: left, middle, or right.}
#' \item{run_gap}{String indicator for line gap of run: end, guard, or tackle.}
#' \item{field_goal_result}{String indicator for result of field goal attempt: made, missed, or blocked.}
#' \item{kick_distance}{Numeric distance in yards for kickoffs, field goals, and punts.}
#' \item{extra_point_result}{String indicator for the result of the extra point attempt: good, failed, blocked, safety (touchback in defensive endzone is 1 point apparently), or aborted.}
#' \item{two_point_conv_result}{String indicator for result of two point conversion attempt: success, failure, safety (touchback in defensive endzone is 1 point apparently), or return.}
#' \item{home_timeouts_remaining}{Numeric timeouts remaining in the half for the home team.}
#' \item{away_timeouts_remaining}{Numeric timeouts remaining in the half for the away team.}
#' \item{timeout}{Binary indicator for whether or not a timeout was called by either team.}
#' \item{timeout_team}{String abbreviation for which team called the timeout.}
#' \item{td_team}{String abbreviation for which team scored the touchdown.}
#' \item{td_player_name}{String name of the player who scored a touchdown.}
#' \item{td_player_id}{Unique identifier of the player who scored a touchdown.}
#' \item{posteam_timeouts_remaining}{Number of timeouts remaining for the possession team.}
#' \item{defteam_timeouts_remaining}{Number of timeouts remaining for the team on defense.}
#' \item{total_home_score}{Score for the home team at the end of the play.}
#' \item{total_away_score}{Score for the away team at the end of the play.}
#' \item{posteam_score}{Score for the posteam at the start of the play.}
#' \item{defteam_score}{Score for the defteam at the start of the play.}
#' \item{score_differential}{Score differential between the posteam and defteam at the start of the play.}
#' \item{posteam_score_post}{Score for the posteam at the end of the play.}
#' \item{defteam_score_post}{Score for the defteam at the end of the play.}
#' \item{score_differential_post}{Score differential between the posteam and defteam at the end of the play.}
#' \item{no_score_prob}{Predicted probability of no score occurring for the rest of the half based on the expected points model.}
#' \item{opp_fg_prob}{Predicted probability of the defteam scoring a FG next.}
#' \item{opp_safety_prob}{Predicted probability of the defteam scoring a safety next.}
#' \item{opp_td_prob}{Predicted probability of the defteam scoring a TD next.}
#' \item{fg_prob}{Predicted probability of the posteam scoring a FG next.}
#' \item{safety_prob}{Predicted probability of the posteam scoring a safety next.}
#' \item{td_prob}{Predicted probability of the posteam scoring a TD next.}
#' \item{extra_point_prob}{Predicted probability of the posteam scoring an extra point.}
#' \item{two_point_conversion_prob}{Predicted probability of the posteam scoring the two point conversion.}
#' \item{ep}{Using the scoring event probabilities, the estimated expected points with respect to the possession team for the given play.}
#' \item{epa}{Expected points added (EPA) by the posteam for the given play.}
#' \item{total_home_epa}{Cumulative total EPA for the home team in the game so far.}
#' \item{total_away_epa}{Cumulative total EPA for the away team in the game so far.}
#' \item{total_home_rush_epa}{Cumulative total rushing EPA for the home team in the game so far.}
#' \item{total_away_rush_epa}{Cumulative total rushing EPA for the away team in the game so far.}
#' \item{total_home_pass_epa}{Cumulative total passing EPA for the home team in the game so far.}
#' \item{total_away_pass_epa}{Cumulative total passing EPA for the away team in the game so far.}
#' \item{air_epa}{EPA from the air yards alone. For completions this represents the actual value provided through the air. For incompletions this represents the hypothetical value that could've been added through the air if the pass was completed.}
#' \item{yac_epa}{EPA from the yards after catch alone. For completions this represents the actual value provided after the catch. For incompletions this represents the difference between the hypothetical air_epa and the play's raw observed EPA (how much the incomplete pass cost the posteam).}
#' \item{comp_air_epa}{EPA from the air yards alone only for completions.}
#' \item{comp_yac_epa}{EPA from the yards after catch alone only for completions.}
#' \item{total_home_comp_air_epa}{Cumulative total completions air EPA for the home team in the game so far.}
#' \item{total_away_comp_air_epa}{Cumulative total completions air EPA for the away team in the game so far.}
#' \item{total_home_comp_yac_epa}{Cumulative total completions yac EPA for the home team in the game so far.}
#' \item{total_away_comp_yac_epa}{Cumulative total completions yac EPA for the away team in the game so far.}
#' \item{total_home_raw_air_epa}{Cumulative total raw air EPA for the home team in the game so far.}
#' \item{total_away_raw_air_epa}{Cumulative total raw air EPA for the away team in the game so far.}
#' \item{total_home_raw_yac_epa}{Cumulative total raw yac EPA for the home team in the game so far.}
#' \item{total_away_raw_yac_epa}{Cumulative total raw yac EPA for the away team in the game so far.}
#' \item{wp}{Estimated win probability for the posteam given the current situation at the start of the given play.}
#' \item{def_wp}{Estimated win probability for the defteam.}
#' \item{home_wp}{Estimated win probability for the home team.}
#' \item{away_wp}{Estimated win probability for the away team.}
#' \item{wpa}{Win probability added (WPA) for the posteam.}
#' \item{vegas_wpa}{Win probability added (WPA) for the posteam: spread_adjusted model.}
#' \item{vegas_home_wpa}{Win probability added (WPA) for the home team: spread_adjusted model.}
#' \item{home_wp_post}{Estimated win probability for the home team at the end of the play.}
#' \item{away_wp_post}{Estimated win probability for the away team at the end of the play.}
#' \item{vegas_wp}{Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.}
#' \item{vegas_home_wp}{Estimated win probability for the home team incorporating pre-game Vegas line.}
#' \item{total_home_rush_wpa}{Cumulative total rushing WPA for the home team in the game so far.}
#' \item{total_away_rush_wpa}{Cumulative total rushing WPA for the away team in the game so far.}
#' \item{total_home_pass_wpa}{Cumulative total passing WPA for the home team in the game so far.}
#' \item{total_away_pass_wpa}{Cumulative total passing WPA for the away team in the game so far.}
#' \item{air_wpa}{WPA through the air (same logic as air_epa).}
#' \item{yac_wpa}{WPA from yards after the catch (same logic as yac_epa).}
#' \item{comp_air_wpa}{The air_wpa for completions only.}
#' \item{comp_yac_wpa}{The yac_wpa for completions only.}
#' \item{total_home_comp_air_wpa}{Cumulative total completions air WPA for the home team in the game so far.}
#' \item{total_away_comp_air_wpa}{Cumulative total completions air WPA for the away team in the game so far.}
#' \item{total_home_comp_yac_wpa}{Cumulative total completions yac WPA for the home team in the game so far.}
#' \item{total_away_comp_yac_wpa}{Cumulative total completions yac WPA for the away team in the game so far.}
#' \item{total_home_raw_air_wpa}{Cumulative total raw air WPA for the home team in the game so far.}
#' \item{total_away_raw_air_wpa}{Cumulative total raw air WPA for the away team in the game so far.}
#' \item{total_home_raw_yac_wpa}{Cumulative total raw yac WPA for the home team in the game so far.}
#' \item{total_away_raw_yac_wpa}{Cumulative total raw yac WPA for the away team in the game so far.}
#' \item{punt_blocked}{Binary indicator for if the punt was blocked.}
#' \item{first_down_rush}{Binary indicator for if a running play converted the first down.}
#' \item{first_down_pass}{Binary indicator for if a passing play converted the first down.}
#' \item{first_down_penalty}{Binary indicator for if a penalty converted the first down.}
#' \item{third_down_converted}{Binary indicator for if the first down was converted on third down.}
#' \item{third_down_failed}{Binary indicator for if the posteam failed to convert first down on third down.}
#' \item{fourth_down_converted}{Binary indicator for if the first down was converted on fourth down.}
#' \item{fourth_down_failed}{Binary indicator for if the posteam failed to convert first down on fourth down.}
#' \item{incomplete_pass}{Binary indicator for if the pass was incomplete.}
#' \item{touchback}{Binary indicator for if a touchback occurred on the play.}
#' \item{interception}{Binary indicator for if the pass was intercepted.}
#' \item{punt_inside_twenty}{Binary indicator for if the punt ended inside the twenty yard line.}
#' \item{punt_in_endzone}{Binary indicator for if the punt was in the endzone.}
#' \item{punt_out_of_bounds}{Binary indicator for if the punt went out of bounds.}
#' \item{punt_downed}{Binary indicator for if the punt was downed.}
#' \item{punt_fair_catch}{Binary indicator for if the punt was caught with a fair catch.}
#' \item{kickoff_inside_twenty}{Binary indicator for if the kickoff ended inside the twenty yard line.}
#' \item{kickoff_in_endzone}{Binary indicator for if the kickoff was in the endzone.}
#' \item{kickoff_out_of_bounds}{Binary indicator for if the kickoff went out of bounds.}
#' \item{kickoff_downed}{Binary indicator for if the kickoff was downed.}
#' \item{kickoff_fair_catch}{Binary indicator for if the kickoff was caught with a fair catch.}
#' \item{fumble_forced}{Binary indicator for if the fumble was forced.}
#' \item{fumble_not_forced}{Binary indicator for if the fumble was not forced.}
#' \item{fumble_out_of_bounds}{Binary indicator for if the fumble went out of bounds.}
#' \item{solo_tackle}{Binary indicator if the play had a solo tackle (could be multiple due to fumbles).}
#' \item{safety}{Binary indicator for whether or not a safety occurred.}
#' \item{penalty}{Binary indicator for whether or not a penalty occurred.}
#' \item{tackled_for_loss}{Binary indicator for whether or not a tackle for loss on a run play occurred.}
#' \item{fumble_lost}{Binary indicator for if the fumble was lost.}
#' \item{own_kickoff_recovery}{Binary indicator for if the kicking team recovered the kickoff.}
#' \item{own_kickoff_recovery_td}{Binary indicator for if the kicking team recovered the kickoff and scored a TD.}
#' \item{qb_hit}{Binary indicator if the QB was hit on the play.}
#' \item{rush_attempt}{Binary indicator for if the play was a run.}
#' \item{pass_attempt}{Binary indicator for if the play was a pass attempt (includes sacks).}
#' \item{sack}{Binary indicator for if the play ended in a sack.}
#' \item{touchdown}{Binary indicator for if the play resulted in a TD.}
#' \item{pass_touchdown}{Binary indicator for if the play resulted in a passing TD.}
#' \item{rush_touchdown}{Binary indicator for if the play resulted in a rushing TD.}
#' \item{return_touchdown}{Binary indicator for if the play resulted in a return TD.}
#' \item{extra_point_attempt}{Binary indicator for extra point attempt.}
#' \item{two_point_attempt}{Binary indicator for two point conversion attempt.}
#' \item{field_goal_attempt}{Binary indicator for field goal attempt.}
#' \item{kickoff_attempt}{Binary indicator for kickoff.}
#' \item{punt_attempt}{Binary indicator for punts.}
#' \item{fumble}{Binary indicator for if a fumble occurred.}
#' \item{complete_pass}{Binary indicator for if the pass was completed.}
#' \item{assist_tackle}{Binary indicator for if an assist tackle occurred.}
#' \item{lateral_reception}{Binary indicator for if a lateral occurred on the reception.}
#' \item{lateral_rush}{Binary indicator for if a lateral occurred on a run.}
#' \item{lateral_return}{Binary indicator for if a lateral occurred on a return.}
#' \item{lateral_recovery}{Binary indicator for if a lateral occurred on a fumble recovery.}
#' \item{passer_player_id}{Unique identifier for the player that attempted the pass.}
#' \item{passer_player_name}{String name for the player that attempted the pass.}
#' \item{passing_yards}{Numeric yards by the passer_player_name, including yards gained in pass plays with laterals. This should equal official passing statistics.}
#' \item{receiver_player_id}{Unique identifier for the receiver that was targeted on the pass.}
#' \item{receiver_player_name}{String name for the targeted receiver.}
#' \item{receiving_yards}{Numeric yards by the receiver_player_name, excluding yards gained in pass plays with laterals. This should equal official receiving statistics but could miss yards gained in pass plays with laterals. Please see the description of `lateral_receiver_player_name` for further information.}
#' \item{rusher_player_id}{Unique identifier for the player that attempted the run.}
#' \item{rusher_player_name}{String name for the player that attempted the run.}
#' \item{rushing_yards}{Numeric yards by the rusher_player_name, excluding yards gained in rush plays with laterals. This should equal official rushing statistics but could miss yards gained in rush plays with laterals. Please see the description of `lateral_rusher_player_name` for further information.}
#' \item{lateral_receiver_player_id}{Unique identifier for the player that received the last(!) lateral on a pass play.}
#' \item{lateral_receiver_player_name}{String name for the player that received the last(!) lateral on a pass play. If there were multiple laterals in the same play, this will only be the last player who received a lateral. Please see <https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards> for a list of plays where multiple players recorded lateral receiving yards.}
#' \item{lateral_receiving_yards}{Numeric yards by the `lateral_receiver_player_name` in pass plays with laterals. Please see the description of `lateral_receiver_player_name` for further information.}
#' \item{lateral_rusher_player_id}{Unique identifier for the player that received the last(!) lateral on a run play.}
#' \item{lateral_rusher_player_name}{String name for the player that received the last(!) lateral on a run play. If there were multiple laterals in the same play, this will only be the last player who received a lateral. Please see <https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards> for a list of plays where multiple players recorded lateral rushing yards.}
#' \item{lateral_rushing_yards}{Numeric yards by the `lateral_rusher_player_name` in run plays with laterals. Please see the description of `lateral_rusher_player_name` for further information.}
#' \item{lateral_sack_player_id}{Unique identifier for the player that received the lateral on a sack.}
#' \item{lateral_sack_player_name}{String name for the player that received the lateral on a sack.}
#' \item{interception_player_id}{Unique identifier for the player that intercepted the pass.}
#' \item{interception_player_name}{String name for the player that intercepted the pass.}
#' \item{lateral_interception_player_id}{Unique identifier for the player that received the lateral on an interception.}
#' \item{lateral_interception_player_name}{String name for the player that received the lateral on an interception.}
#' \item{punt_returner_player_id}{Unique identifier for the punt returner.}
#' \item{punt_returner_player_name}{String name for the punt returner.}
#' \item{lateral_punt_returner_player_id}{Unique identifier for the player that received the lateral on a punt return.}
#' \item{lateral_punt_returner_player_name}{String name for the player that received the lateral on a punt return.}
#' \item{kickoff_returner_player_name}{String name for the kickoff returner.}
#' \item{kickoff_returner_player_id}{Unique identifier for the kickoff returner.}
#' \item{lateral_kickoff_returner_player_id}{Unique identifier for the player that received the lateral on a kickoff return.}
#' \item{lateral_kickoff_returner_player_name}{String name for the player that received the lateral on a kickoff return.}
#' \item{punter_player_id}{Unique identifier for the punter.}
#' \item{punter_player_name}{String name for the punter.}
#' \item{kicker_player_name}{String name for the kicker on FG or kickoff.}
#' \item{kicker_player_id}{Unique identifier for the kicker on FG or kickoff.}
#' \item{own_kickoff_recovery_player_id}{Unique identifier for the player that recovered their own kickoff.}
#' \item{own_kickoff_recovery_player_name}{String name for the player that recovered their own kickoff.}
#' \item{blocked_player_id}{Unique identifier for the player that blocked the punt or FG.}
#' \item{blocked_player_name}{String name for the player that blocked the punt or FG.}
#' \item{tackle_for_loss_1_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_1_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{qb_hit_1_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_1_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{forced_fumble_player_1_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_name}{String name of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_name}{String name of one of the players with a forced fumble.}
#' \item{solo_tackle_1_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_2_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_name}{String name of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_name}{String name of one of the players with a solo tackle.}
#' \item{assist_tackle_1_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_1_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_1_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_2_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_3_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_4_team}{Team of one of the players with a tackle assist.}
#' \item{tackle_with_assist}{Binary indicator for if there has been a tackle with assist.}
#' \item{tackle_with_assist_1_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_team}{Team of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_team}{Team of one of the players with a tackle with assist.}
#' \item{pass_defense_1_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_1_player_name}{String name of one of the players with a pass defense.}
#' \item{pass_defense_2_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_2_player_name}{String name of one of the players with a pass defense.}
#' \item{fumbled_1_team}{Team of the first player who fumbled on the play.}
#' \item{fumbled_1_player_id}{Unique identifier of the first player who fumbled on the play.}
#' \item{fumbled_1_player_name}{String name of the first player who fumbled on the play.}
#' \item{fumbled_2_player_id}{Unique identifier of the second player who fumbled on the play.}
#' \item{fumbled_2_player_name}{String name of the second player who fumbled on the play.}
#' \item{fumbled_2_team}{Team of the second player who fumbled on the play.}
#' \item{fumble_recovery_1_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_name}{String name of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_name}{String name of one of the players with a fumble recovery.}
#' \item{sack_player_id}{Unique identifier of the player who recorded a solo sack.}
#' \item{sack_player_name}{String name of the player who recorded a solo sack.}
#' \item{half_sack_1_player_id}{Unique identifier of the first player who recorded half a sack.}
#' \item{half_sack_1_player_name}{String name of the first player who recorded half a sack.}
#' \item{half_sack_2_player_id}{Unique identifier of the second player who recorded half a sack.}
#' \item{half_sack_2_player_name}{String name of the second player who recorded half a sack.}
#' \item{return_team}{String abbreviation of the return team.}
#' \item{return_yards}{Yards gained by the return team.}
#' \item{penalty_team}{String abbreviation of the team with the penalty.}
#' \item{penalty_player_id}{Unique identifier for the player with the penalty.}
#' \item{penalty_player_name}{String name for the player with the penalty.}
#' \item{penalty_yards}{Yards gained (or lost) by the posteam from the penalty.}
#' \item{replay_or_challenge}{Binary indicator for whether or not a replay or challenge occurred.}
#' \item{replay_or_challenge_result}{String indicating the result of the replay or challenge.}
#' \item{penalty_type}{String indicating the penalty type of the first penalty in the given play. Will be `NA` if `desc` is missing the type.}
#' \item{defensive_two_point_attempt}{Binary indicator for whether or not the defense had an attempt on a two point conversion; this occurs following a turnover.}
#' \item{defensive_two_point_conv}{Binary indicator whether or not the defense successfully scored on the two point conversion.}
#' \item{defensive_extra_point_attempt}{Binary indicator for whether or not the defense had an attempt on an extra point attempt; this occurs when the defense recovers the ball following a blocked attempt.}
#' \item{defensive_extra_point_conv}{Binary indicator whether or not the defense successfully scored on an extra point attempt.}
#' \item{safety_player_name}{String name for the player who scored a safety.}
#' \item{safety_player_id}{Unique identifier for the player who scored a safety.}
#' \item{season}{4 digit number indicating the season to which the game belongs.}
#' \item{cp}{Numeric value indicating the probability for a complete pass based on comparable game situations.}
#' \item{cpoe}{For a single pass play this is 1 - cp when the pass was completed or 0 - cp when the pass was incomplete. Aggregated over a whole game or season, it indicates how far above or below expectation the passer's completion percentage was.}
#' \item{series}{Series number within the game. Starts at 1 and increments with each new first down; numbers are shared across both teams. NA for kickoffs, extra point/two point conversion attempts, non-plays, and plays with no posteam.}
#' \item{series_success}{1: scored touchdown, gained enough yards for first down.}
#' \item{series_result}{Possible values: First down, Touchdown, Opp touchdown, Field goal, Missed field goal, Safety, Turnover, Punt, Turnover on downs, QB kneel, End of half}
#' \item{order_sequence}{Column provided by NFL to fix out-of-order plays. Available 2011 and beyond with source "nfl".}
#' \item{start_time}{Kickoff time in eastern time zone.}
#' \item{time_of_day}{Time of day of play in UTC "HH:MM:SS" format. Available 2011 and beyond with source "nfl".}
#' \item{stadium}{Game site name.}
#' \item{weather}{String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!}
#' \item{nfl_api_id}{UUID of the game in the new NFL API.}
#' \item{play_clock}{Time on the playclock when the ball was snapped.}
#' \item{play_deleted}{Binary indicator for deleted plays.}
#' \item{play_type_nfl}{Play type as listed in the NFL source. Slightly different to the regular play_type variable.}
#' \item{special_teams_play}{Binary indicator for whether play is special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{st_play_type}{Type of special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{end_clock_time}{Game time at the end of a given play.}
#' \item{end_yard_line}{String indicating the yardline at the end of the given play consisting of team half and yard line number.}
#' \item{fixed_drive}{Manually created drive number in a game.}
#' \item{fixed_drive_result}{Manually created drive result.}
#' \item{drive_real_start_time}{Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').}
#' \item{drive_play_count}{Numeric value of how many regular plays happened in a given drive.}
#' \item{drive_time_of_possession}{Time of possession in a given drive.}
#' \item{drive_first_downs}{Number of first downs in a given drive.}
#' \item{drive_inside20}{Binary indicator for whether the offense was able to get inside the opponent's 20 yard line.}
#' \item{drive_ended_with_score}{Binary indicator for whether the drive ended with a score.}
#' \item{drive_quarter_start}{Numeric value indicating in which quarter the given drive has started.}
#' \item{drive_quarter_end}{Numeric value indicating in which quarter the given drive has ended.}
#' \item{drive_yards_penalized}{Numeric value of how many yards the offense gained or lost through penalties in the given drive.}
#' \item{drive_start_transition}{String indicating how the offense got the ball.}
#' \item{drive_end_transition}{String indicating how the offense lost the ball.}
#' \item{drive_game_clock_start}{Game time at the beginning of a given drive.}
#' \item{drive_game_clock_end}{Game time at the end of a given drive.}
#' \item{drive_start_yard_line}{String indicating where a given drive started consisting of team half and yard line number.}
#' \item{drive_end_yard_line}{String indicating where a given drive ended consisting of team half and yard line number.}
#' \item{drive_play_id_started}{Play_id of the first play in the given drive.}
#' \item{drive_play_id_ended}{Play_id of the last play in the given drive.}
#' \item{away_score}{Total points scored by the away team.}
#' \item{home_score}{Total points scored by the home team.}
#' \item{location}{Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.}
#' \item{result}{Equals home_score - away_score and means the game outcome from the perspective of the home team.}
#' \item{total}{Equals home_score + away_score and means the total points scored in the given game.}
#' \item{spread_line}{The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference)}
#' \item{total_line}{The closing total line for the game. (Source: Pro-Football-Reference)}
#' \item{div_game}{Binary indicator for if the given game was a division game.}
#' \item{roof}{One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{surface}{What type of ground the game was played on. (Source: Pro-Football-Reference)}
#' \item{temp}{The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{wind}{The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{home_coach}{First and last name of the home team coach. (Source: Pro-Football-Reference)}
#' \item{away_coach}{First and last name of the away team coach. (Source: Pro-Football-Reference)}
#' \item{stadium_id}{ID of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{game_stadium}{Name of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{success}{Binary indicator whether epa > 0 in the given play.}
#' \item{passer}{Name of the dropback player (scrambles included) including plays with penalties.}
#' \item{passer_jersey_number}{Jersey number of the passer.}
#' \item{rusher}{Name of the rusher (no scrambles) including plays with penalties.}
#' \item{rusher_jersey_number}{Jersey number of the rusher.}
#' \item{receiver}{Name of the receiver including plays with penalties.}
#' \item{receiver_jersey_number}{Jersey number of the receiver.}
#' \item{pass}{Binary indicator if the play was a pass play (sacks and scrambles included).}
#' \item{rush}{Binary indicator if the play was a rushing play.}
#' \item{first_down}{Binary indicator if the play ended in a first down.}
#' \item{aborted_play}{Binary indicator if the play description indicates "Aborted".}
#' \item{special}{Binary indicator if the play was a special teams play.}
#' \item{play}{Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.}
#' \item{passer_id}{ID of the player in the 'passer' column.}
#' \item{rusher_id}{ID of the player in the 'rusher' column.}
#' \item{receiver_id}{ID of the player in the 'receiver' column.}
#' \item{name}{Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.}
#' \item{jersey_number}{Jersey number of the player listed in the 'name' column.}
#' \item{id}{ID of the player in the 'name' column.}
#' \item{fantasy_player_name}{Name of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{fantasy_player_id}{ID of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{fantasy}{Name of the rusher on rush plays or receiver on pass plays.}
#' \item{fantasy_id}{ID of the rusher on rush plays or receiver on pass plays.}
#' \item{out_of_bounds}{1 if play description contains ran ob, pushed ob, or sacked ob; 0 otherwise.}
#' \item{home_opening_kickoff}{= 1 if the home team received the opening kickoff, 0 otherwise.}
#' \item{qb_epa}{Gives QB credit for EPA for up to the point where a receiver lost a fumble after a completed catch and makes EPA work more like passing yards on plays with fumbles.}
#' \item{xyac_epa}{Expected value of EPA gained after the catch, starting from where the catch was made. Zero yards after the catch would be listed as zero EPA.}
#' \item{xyac_mean_yardage}{Average expected yards after the catch based on where the ball was caught.}
#' \item{xyac_median_yardage}{Median expected yards after the catch based on where the ball was caught.}
#' \item{xyac_success}{Probability play earns positive EPA (relative to where play started) based on where ball was caught.}
#' \item{xyac_fd}{Probability play earns a first down based on where the ball was caught.}
#' \item{xpass}{Probability of dropback scaled from 0 to 1.}
#' \item{pass_oe}{Dropback percent over expected on a given play scaled from 0 to 100.}


================================================
FILE: data-raw/wordmarks.R
================================================
library(dplyr)

teams <- nflfastR::teams_colors_logos |>
  dplyr::filter(!team_abbr %in% c("LAR", "OAK", "SD", "STL"))

purrr::walk(teams$team_abbr, function(x) {
  load <- glue::glue(
    "https://static.www.nfl.com/league/apps/clubs/wordmarks/{x}_fullcolor.png"
  ) |>
    magick::image_read() |>
    magick::image_trim()

  info <- magick::image_info(load)

  # Padding per side needed to center the trimmed image on a 700x192 canvas
  rl <- (700 - info$width) / 2
  tb <- (192 - info$height) / 2

  image <- magick::image_border(load, "transparent", glue::glue("{rl}x{tb}"))

  magick::image_write(
    image,
    path = glue::glue("wordmarks/{x}.png"),
    format = "png"
  )

  # Reuse the current wordmark for the legacy abbreviations of relocated teams
  if (x == "LA") {
    magick::image_write(image, path = "wordmarks/LAR.png", format = "png")
    magick::image_write(image, path = "wordmarks/STL.png", format = "png")
  } else if (x == "LAC") {
    magick::image_write(image, path = "wordmarks/SD.png", format = "png")
  } else if (x == "LV") {
    magick::image_write(image, path = "wordmarks/OAK.png", format = "png")
  }
})


================================================
FILE: man/add_qb_epa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_additional_functions.R
\name{add_qb_epa}
\alias{add_qb_epa}
\title{Compute QB epa}
\usage{
add_qb_epa(pbp, ...)
}
\arguments{
\item{pbp}{is a Data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}

\item{...}{Additional arguments passed to a message function (for internal use).}
}
\description{
Compute QB epa
}
\details{
Add the variable 'qb_epa', which gives QB credit for EPA for up to the point where
a receiver lost a fumble after a completed catch and makes EPA work more
like passing yards on plays with fumbles.
}


================================================
FILE: man/add_xpass.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_add_xpass.R
\name{add_xpass}
\alias{add_xpass}
\title{Add expected pass columns}
\usage{
add_xpass(pbp, ...)
}
\arguments{
\item{pbp}{is a Data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}

\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input Data Frame of the parameter \code{pbp} with the following columns
added:
\describe{
\item{xpass}{Probability of dropback scaled from 0 to 1.}
\item{pass_oe}{Dropback percent over expected on a given play scaled from 0 to 100.}
}
}
\description{
Build columns from the expected dropback model. Will return
\code{NA} on data prior to 2006 since that was before the NFL started marking scrambles.
Must be run on a dataframe that has already had \code{\link[=clean_pbp]{clean_pbp()}} run on it.
Note that the functions \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} and
the database function \code{\link[=update_db]{update_db()}} already include this function.
}


================================================
FILE: man/add_xyac.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_add_xyac.R
\name{add_xyac}
\alias{add_xyac}
\title{Add expected yards after completion (xyac) variables}
\usage{
add_xyac(pbp, ...)
}
\arguments{
\item{pbp}{is a Data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}

\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input Data Frame of the parameter 'pbp' with the following columns
added:
\describe{
\item{xyac_epa}{Expected value of EPA gained after the catch, starting from where the catch was made. Zero yards after the catch would be listed as zero EPA.}
\item{xyac_success}{Probability play earns positive EPA (relative to where play started) based on where ball was caught.}
\item{xyac_fd}{Probability play earns a first down based on where the ball was caught.}
\item{xyac_mean_yardage}{Average expected yards after the catch based on where the ball was caught.}
\item{xyac_median_yardage}{Median expected yards after the catch based on where the ball was caught.}
}
}
\description{
Add expected yards after completion (xyac) variables
}
\details{
Build columns that capture what we should expect after the catch.
}


================================================
FILE: man/build_nflfastR_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/build_nflfastR_pbp.R
\name{build_nflfastR_pbp}
\alias{build_nflfastR_pbp}
\title{Build a Complete nflfastR Data Set}
\usage{
build_nflfastR_pbp(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...,
  decode = TRUE,
  rules = TRUE
)
}
\arguments{
\item{game_ids}{Vector of character ids or a data frame including the variable
\code{game_id} (see details for further information).}

\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory")
where nflfastR searches for raw game play-by-play data.
See \code{\link[=save_raw_pbp]{save_raw_pbp()}} for additional information.}

\item{...}{Additional arguments passed to the scraping functions (for internal use)}

\item{decode}{If \code{TRUE}, the function \code{\link[=decode_player_ids]{decode_player_ids()}} will be executed.}

\item{rules}{If \code{FALSE}, printing of the header and footer in the console output will be suppressed.}
}
\value{
An nflfastR play-by-play data frame like the ones that can be loaded from \url{https://github.com/nflverse/nflverse-data}.
}
\description{
\code{build_nflfastR_pbp} is a convenient wrapper around 6 nflfastR functions:

\itemize{
\item{\code{\link[=fast_scraper]{fast_scraper()}}}
\item{\code{\link[=clean_pbp]{clean_pbp()}}}
\item{\code{\link[=add_qb_epa]{add_qb_epa()}}}
\item{\code{\link[=add_xyac]{add_xyac()}}}
\item{\code{\link[=add_xpass]{add_xpass()}}}
\item{\code{\link[=decode_player_ids]{decode_player_ids()}}}
}

Please see either the documentation of each function or
\href{https://nflfastr.com/articles/field_descriptions.html}{the nflfastR Field Descriptions website}
to learn about the output.
}
\details{
To load valid game_ids please use the package function \code{\link[=fast_scraper_schedules]{fast_scraper_schedules()}}.
}
\examples{
\donttest{
# Build nflfastR pbp for the 2018 and 2019 Super Bowls
try({# to avoid CRAN test problems
build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
})

# It is also possible to directly use the
# output of `load_schedules` as input
try({# to avoid CRAN test problems
nflreadr::load_schedules(2025) |>
  dplyr::slice_tail(n = 3) |>
  build_nflfastR_pbp()
})
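
# A rough sketch of the wrapped pipeline, assuming the six functions listed
# in the description are simply applied in that order. The actual wrapper may
# do more (e.g. message handling), so treat this as illustrative only.
try({# to avoid CRAN test problems
fast_scraper("2019_21_SF_KC") |>
  clean_pbp() |>
  add_qb_epa() |>
  add_xyac() |>
  add_xpass() |>
  decode_player_ids()
})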

\dontshow{
# Close open connections for R CMD Check
future::plan("sequential")
}
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}


================================================
FILE: man/calculate_expected_points.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ep_wp_calculators.R
\name{calculate_expected_points}
\alias{calculate_expected_points}
\title{Compute expected points}
\usage{
calculate_expected_points(pbp_data)
}
\arguments{
\item{pbp_data}{Play-by-play dataset to estimate expected points for.}
}
\value{
The original pbp_data with the following columns appended to it:
\describe{
\item{ep}{expected points.}
\item{no_score_prob}{probability of no more scoring this half.}
\item{opp_fg_prob}{probability next score opponent field goal this half.}
\item{opp_safety_prob}{probability next score opponent safety this half.}
\item{opp_td_prob}{probability of next score opponent touchdown this half.}
\item{fg_prob}{probability next score field goal this half.}
\item{safety_prob}{probability next score safety this half.}
\item{td_prob}{probability next score touchdown this half.}
}
}
\description{
Computes expected points for provided plays. Returns the data with
probabilities of each scoring event and EP added. The following columns
must be present: season, home_team, posteam, roof (coded as 'outdoors',
'dome', or 'open'/'closed'/NA for retractable roofs), half_seconds_remaining,
yardline_100, down, ydstogo, posteam_timeouts_remaining,
defteam_timeouts_remaining.
}
\details{
Computes expected points for provided plays. Returns the data with
probabilities of each scoring event and EP added. The following columns
must be present:
\itemize{
\item{season}
\item{home_team}
\item{posteam}
\item{roof (coded as 'outdoors', 'dome', or 'open'/'closed'/NA (retractable))}
\item{half_seconds_remaining}
\item{yardline_100}
\item{down}
\item{ydstogo}
\item{posteam_timeouts_remaining}
\item{defteam_timeouts_remaining}
}
}
\examples{
\donttest{
try({# to avoid CRAN test problems
library(dplyr)
data <- tibble::tibble(
"season" = 1999:2019,
"home_team" = "SEA",
"posteam" = "SEA",
"roof" = "outdoors",
"half_seconds_remaining" = 1800,
"yardline_100" = c(rep(80, 17), rep(75, 4)),
"down" = 1,
"ydstogo" = 10,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)

nflfastR::calculate_expected_points(data) |>
  dplyr::select(season, yardline_100, td_prob, ep)
})
}
}


================================================
FILE: man/calculate_player_stats.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats.R
\name{calculate_player_stats}
\alias{calculate_player_stats}
\title{Get Official Game Stats}
\usage{
calculate_player_stats(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A Data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}. If the data doesn't include the variable
\code{qb_epa}, the function \code{add_qb_epa()} will be called to add it.}

\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats
for the entire Data frame.}
}
\value{
A data frame including the following columns (all ID columns are
decoded to the gsis ID format):
\describe{
\item{player_id}{ID of the player. Use this to join to other sources.}
\item{player_name}{Name of the player}
\item{player_display_name}{Full name of the player}
\item{position}{Position of the player}
\item{position_group}{Position group of the player}
\item{headshot_url}{URL to a player headshot image}
\item{games}{The number of games where the player recorded passing, rushing or receiving stats.}
\item{recent_team}{Most recent team player appears in \code{pbp} with.}
\item{season}{Season if \code{weekly} is \code{TRUE}}
\item{week}{Week if \code{weekly} is \code{TRUE}}
\item{season_type}{\code{REG} or \code{POST} if \code{weekly} is \code{TRUE}}
\item{opponent_team}{The player's opponent team if \code{weekly} is \code{TRUE}}
\item{completions}{The number of completed passes.}
\item{attempts}{The number of pass attempts as defined by the NFL.}
\item{passing_yards}{Yards gained on pass plays.}
\item{passing_tds}{The number of passing touchdowns.}
\item{interceptions}{The number of interceptions thrown.}
\item{sacks}{The number of times sacked.}
\item{sack_yards}{Yards lost on sack plays.}
\item{sack_fumbles}{The number of sacks with a fumble.}
\item{sack_fumbles_lost}{The number of sacks with a lost fumble.}
\item{passing_air_yards}{Passing air yards (includes incomplete passes).}
\item{passing_yards_after_catch}{Yards after the catch gained on plays in
which player was the passer (this is an unofficial stat and may differ slightly
between different sources).}
\item{passing_first_downs}{First downs on pass attempts.}
\item{passing_epa}{Total expected points added on pass attempts and sacks.
NOTE: this uses the variable \code{qb_epa}, which gives QB credit for EPA for up
to the point where a receiver lost a fumble after a completed catch and makes
EPA work more like passing yards on plays with fumbles.}
\item{passing_2pt_conversions}{Two-point conversion passes.}
\item{pacr}{Passing Air Conversion Ratio. PACR = \code{passing_yards} / \code{passing_air_yards}}
\item{dakota}{Adjusted EPA + CPOE composite based on coefficients which best predict adjusted EPA/play in the following year.}
\item{carries}{The number of official rush attempts (incl. scrambles and kneel downs).
Rushes after a lateral reception don't count as a carry.}
\item{rushing_yards}{Yards gained when rushing with the ball (incl. scrambles and kneel downs).
Also includes yards gained after obtaining a lateral on a play that started
with a rushing attempt.}
\item{rushing_tds}{The number of rushing touchdowns (incl. scrambles).
Also includes touchdowns after obtaining a lateral on a play that started
with a rushing attempt.}
\item{rushing_fumbles}{The number of rushes with a fumble.}
\item{rushing_fumbles_lost}{The number of rushes with a lost fumble.}
\item{rushing_first_downs}{First downs on rush attempts (incl. scrambles).}
\item{rushing_epa}{Expected points added on rush attempts (incl. scrambles and kneel downs).}
\item{rushing_2pt_conversions}{Two-point conversion rushes}
\item{receptions}{The number of pass receptions. Lateral receptions officially
don't count as a reception.}
\item{targets}{The number of pass plays where the player was the targeted receiver.}
\item{receiving_yards}{Yards gained after a pass reception. Includes yards
gained after receiving a lateral on a play that started as a pass play.}
\item{receiving_tds}{The number of touchdowns following a pass reception.
Also includes touchdowns after receiving a lateral on a play that started
as a pass play.}
\item{receiving_air_yards}{Receiving air yards (incl. incomplete passes).}
\item{receiving_yards_after_catch}{Yards after the catch gained on plays in
which player was receiver (this is an unofficial stat and may differ slightly
between different sources).}
\item{receiving_fumbles}{The number of fumbles after a pass reception.}
\item{receiving_fumbles_lost}{The number of fumbles lost after a pass reception.}
\item{receiving_2pt_conversions}{Two-point conversion receptions}
\item{racr}{Receiver Air Conversion Ratio. RACR = \code{receiving_yards} / \code{receiving_air_yards}}
\item{target_share}{The share of targets of the player in all targets of his team}
\item{air_yards_share}{The share of receiving_air_yards of the player in all air_yards of his team}
\item{wopr}{Weighted Opportunity Rating. WOPR = 1.5 × \code{target_share} + 0.7 × \code{air_yards_share}}
\item{fantasy_points}{Standard fantasy points.}
\item{fantasy_points_ppr}{PPR fantasy points.}
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.

Build columns that aggregate official passing, rushing, and receiving stats
either at the game level or at the level of the entire data frame passed.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
# pbp <- nflfastR::load_pbp(2020)

# weekly <- calculate_player_stats(pbp, weekly = TRUE)
# dplyr::glimpse(weekly)

# overall <- calculate_player_stats(pbp, weekly = FALSE)
# dplyr::glimpse(overall)
})
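
# Illustrative sketch with made-up numbers (not package output): how the
# ratio columns RACR and WOPR documented above follow from their inputs.
# The data frame `toy` and its values are hypothetical.
toy <- data.frame(
  receiving_yards = c(120, 45),
  receiving_air_yards = c(100, 90),
  target_share = c(0.30, 0.10),
  air_yards_share = c(0.40, 0.15)
)
# RACR = receiving_yards / receiving_air_yards
toy$racr <- toy$receiving_yards / toy$receiving_air_yards
# WOPR = 1.5 x target_share + 0.7 x air_yards_share
toy$wopr <- 1.5 * toy$target_share + 0.7 * toy$air_yards_share
toy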
}
}
\seealso{
The function \code{\link[=load_player_stats]{load_player_stats()}} and the corresponding examples
on \href{https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats}{the nflfastR website}
}
\keyword{internal}


================================================
FILE: man/calculate_player_stats_def.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats_def.R
\name{calculate_player_stats_def}
\alias{calculate_player_stats_def}
\title{Get Official Game Stats on Defense}
\usage{
calculate_player_stats_def(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A Data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}. If the data doesn't include the variable
\code{qb_epa}, the function \code{add_qb_epa()} will be called to add it.}

\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats
for the entire Data frame.}
}
\value{
A data frame of defensive player stats. See dictionary (# TODO)
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.

Build columns that aggregate official defense stats
either at the game level or at the level of the entire data frame passed.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
  # pbp <- nflfastR::load_pbp(2020)

  # weekly <- calculate_player_stats_def(pbp, weekly = TRUE)
  # dplyr::glimpse(weekly)

  # overall <- calculate_player_stats_def(pbp, weekly = FALSE)
  # dplyr::glimpse(overall)
})
}

}
\seealso{
The function \code{\link[=load_player_stats]{load_player_stats()}} and the corresponding examples
on \href{https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats}{the nflfastR website}
}
\keyword{internal}


================================================
FILE: man/calculate_player_stats_kicking.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats_kicking.R
\name{calculate_player_stats_kicking}
\alias{calculate_player_stats_kicking}
\title{Summarize Kicking Stats}
\usage{
calculate_player_stats_kicking(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A Data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}.}

\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats for
the entire data frame in argument \code{pbp}.}
}
\value{
A data frame of kicking stats.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.

Build columns that aggregate kicking stats at the game level.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
    # pbp <- nflreadr::load_pbp(2021)
    # weekly <- calculate_player_stats_kicking(pbp, weekly = TRUE)
    # dplyr::glimpse(weekly)

    # overall <- calculate_player_stats_kicking(pbp, weekly = FALSE)
    # dplyr::glimpse(overall)
})
}

}
\seealso{
\url{https://nflreadr.nflverse.com/reference/load_player_stats.html} for the nflreadr function to download this from repo (\code{stat_type = "kicking"})
}
\keyword{internal}


================================================
FILE: man/calculate_series_conversion_rates.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_series_conversion_rates.R
\name{calculate_series_conversion_rates}
\alias{calculate_series_conversion_rates}
\title{Compute Series Conversion Information from Play by Play}
\usage{
calculate_series_conversion_rates(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{Play-by-play data as returned by \code{\link[=load_pbp]{load_pbp()}}, \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, or
\code{\link[=fast_scraper]{fast_scraper()}}.}

\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise,
season-by-season stats in argument \code{pbp}.}
}
\value{
A data frame of series information including the following columns:
\describe{
\item{season}{The NFL season}
\item{team}{NFL team abbreviation}
\item{week}{Week if \code{weekly} is \code{TRUE}}
\item{off_n}{The number of series the offense played (excludes QB kneel
downs, kickoffs, extra point/two point conversion attempts, non-plays, and
plays that do not list a "posteam")}
\item{off_scr}{The rate at which a series ended in either new 1st down or
touchdown while the offense was on the field}
\item{off_scr_1st}{The rate at which an offense earned a 1st down
or scored a touchdown on 1st down}
\item{off_scr_2nd}{The rate at which an offense earned a 1st down
or scored a touchdown on 2nd down}
\item{off_scr_3rd}{The rate at which an offense earned a 1st down
or scored a touchdown on 3rd down}
\item{off_scr_4th}{The rate at which an offense earned a 1st down
or scored a touchdown on 4th down}
\item{off_1st}{The rate of series that ended in a new 1st down while the
offense was on the field (does not include offensive touchdown)}
\item{off_td}{The rate of series that ended in an offensive touchdown while the
offense was on the field}
\item{off_fg}{The rate of series that ended in a field goal attempt while the
offense was on the field}
\item{off_punt}{The rate of series that ended in a punt while the
offense was on the field}
\item{off_to}{The rate of series that ended in a turnover (including on downs), in an
opponent score, or at the end of half (or game) while the
offense was on the field}
\item{def_n}{The number of series the defense played (excludes QB kneel
downs, kickoffs, extra point/two point conversion attempts, non-plays, and
plays that do not list a "posteam")}
\item{def_scr}{The rate at which a series ended in either new 1st down or
touchdown while the defense was on the field}
\item{def_scr_1st}{The rate at which a defense allowed a
1st down or touchdown on 1st down}
\item{def_scr_2nd}{The rate at which a defense allowed a
1st down or touchdown on 2nd down}
\item{def_scr_3rd}{The rate at which a defense allowed a
1st down or touchdown on 3rd down}
\item{def_scr_4th}{The rate at which a defense allowed a
1st down or touchdown on 4th down}
\item{def_1st}{The rate of series that ended in a new 1st down while the
defense was on the field (does not include offensive touchdown)}
\item{def_td}{The rate of series that ended in an offensive touchdown while the
defense was on the field}
\item{def_fg}{The rate of series that ended in a field goal attempt while the
defense was on the field}
\item{def_punt}{The rate of series that ended in a punt while the
defense was on the field}
\item{def_to}{The rate of series that ended in a turnover (including on downs), in an
opponent score, or at the end of half (or game) while the
defense was on the field}
}
}
\description{
A "Series" begins on a 1st and 10 and each team attempts to either earn
a new 1st down (on offense) or prevent the offense from converting a new
1st down (on defense). Series conversion rate represents how many series
have been either converted to a new 1st down or ended in a touchdown.
This function computes series conversion rates on offense and defense from
nflverse play-by-play data along with other series results.
The function automatically removes series that ended in a QB kneel down.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
  pbp <- nflfastR::load_pbp(2021)

  weekly <- calculate_series_conversion_rates(pbp, weekly = TRUE)
  dplyr::glimpse(weekly)

  overall <- calculate_series_conversion_rates(pbp, weekly = FALSE)
  dplyr::glimpse(overall)
})
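
# Illustrative sketch with made-up series outcomes (not package output):
# the series conversion rate is the share of series that ended in a new
# 1st down or a touchdown. The labels in `series_result` are hypothetical.
series_result <- c("First down", "Punt", "Touchdown", "First down", "Turnover")
converted <- series_result == "First down" | series_result == "Touchdown"
mean(converted)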
}
}


================================================
FILE: man/calculate_standings.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_standings.R
\name{calculate_standings}
\alias{calculate_standings}
\title{Compute Division Standings and Conference Seeds from Play by Play}
\usage{
calculate_standings(
  nflverse_object,
  tiebreaker_depth = 3,
  playoff_seeds = NULL
)
}
\arguments{
\item{nflverse_object}{Data object of class \code{nflverse_data}. Either schedules
as returned by \code{\link[=fast_scraper_schedules]{fast_scraper_schedules()}} or \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules()}}.
Or play-by-play data as returned by \code{\link[=load_pbp]{load_pbp()}}, \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, or
\code{\link[=fast_scraper]{fast_scraper()}}.}

\item{tiebreaker_depth}{A single value equal to 1, 2, or 3. The default is 3. The
value controls the depth of tiebreakers that shall be applied. The deepest
currently implemented tiebreaker is strength of schedule. The following
values are valid:
\describe{
\item{tiebreaker_depth = 1}{Break all ties with a coinflip. Fastest variant.}
\item{tiebreaker_depth = 2}{Apply head-to-head and division win percentage tiebreakers. Random if still tied.}
\item{tiebreaker_depth = 3}{Apply all tiebreakers through strength of schedule. Random if still tied.}
}}

\item{playoff_seeds}{Number of playoff teams per conference. If \code{NULL} (the
default), the function will try to split \code{nflverse_object} into seasons prior
to 2020 (6 seeds) and seasons from 2020 onward (7 seeds). If set to a numeric, it will be used
for all seasons in \code{nflverse_object}!}
}
\value{
A tibble with NFL regular season standings
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated and replaced by \code{\link[nflseedR:nfl_standings]{nflseedR::nfl_standings()}}.

This function calculates division standings as well as playoff
seeds per conference based on either nflverse play-by-play data or nflverse
schedule data.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
  # load nflverse data both schedules and pbp
  # scheds <- fast_scraper_schedules(2014)
  # pbp <- load_pbp(c(2018, 2021))

  # calculate standings based on pbp
  # calculate_standings(pbp)

  # calculate standings based on schedules
  # calculate_standings(scheds)
})
}
}
\keyword{internal}


================================================
FILE: man/calculate_stats.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_stats.R
\name{calculate_stats}
\alias{calculate_stats}
\title{Calculate NFL Stats}
\usage{
calculate_stats(
  seasons = nflreadr::most_recent_season(),
  summary_level = c("season", "week"),
  stat_type = c("player", "team"),
  season_type = c("REG", "POST", "REG+POST"),
  pbp = NULL
)
}
\arguments{
\item{seasons}{A numeric vector of 4-digit years associated with given NFL
seasons - defaults to latest season. If set to TRUE, returns all available
data since 1999. Ignored if argument \code{pbp} is not \code{NULL}.}

\item{summary_level}{Summarize stats by \code{"season"} or \code{"week"}.}

\item{stat_type}{Calculate \code{"player"} level stats or \code{"team"} level stats.}

\item{season_type}{One of \code{"REG"}, \code{"POST"}, or \code{"REG+POST"}. Filters
data to regular season ("REG"), post season ("POST") or keeps all data.
Only applied if \code{summary_level} == \code{"season"}.}

\item{pbp}{This argument allows passing a subset of nflverse play-by-play
data, created with \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} or loaded with \code{\link[=load_pbp]{load_pbp()}}.
Stats are then calculated based on the \code{game_id}s and \code{play_id}s in this
subset of play-by-play data, rather than using the seasons specified in the
\code{seasons} argument. The function will error if required variables are
missing from the subset, but lists which variables are missing.
If \code{pbp = NULL} (the default), all available games and plays from the
\code{seasons} argument are used to calculate stats.
Please use this responsibly, because the output is structurally identical
to full seasons, even if plays have been filtered out. It may then appear
as if the stats are incorrect. If \code{pbp} is not \code{NULL}, the function will add
the attribute \code{"custom_pbp" = TRUE} to the function output to help identify
stats that are possibly based on play-by-play subsets.}
}
\value{
A tibble of player/team stats summarized by season/week.
}
\description{
Compute various NFL stats based off nflverse Play-by-Play data.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
stats <- calculate_stats(2023, "season", "player")
dplyr::glimpse(stats)
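
# A minimal sketch of the `pbp` argument described above, assuming the first
# four weeks of 2023 are the subset of interest (the week cutoff is an
# arbitrary choice for illustration). Stats are then based only on the plays
# in `pbp_sub`, and `seasons` is ignored.
pbp_sub <- dplyr::filter(load_pbp(2023), week <= 4)
stats_sub <- calculate_stats(
  summary_level = "season",
  stat_type = "team",
  pbp = pbp_sub
)
# According to the argument documentation, this attribute marks output that
# is based on a custom play-by-play subset.
attr(stats_sub, "custom_pbp")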
})
}
}
\seealso{
\link{nfl_stats_variables} for a description of all variables.

\url{https://nflfastr.com/articles/stats_variables.html} for a searchable
table of the stats variable descriptions.
}


================================================
FILE: man/calculate_win_probability.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ep_wp_calculators.R
\name{calculate_win_probability}
\alias{calculate_win_probability}
\title{Compute win probability}
\usage{
calculate_win_probability(pbp_data)
}
\arguments{
\item{pbp_data}{Play-by-play dataset to estimate win probability for.}
}
\value{
The original pbp_data with the following columns appended to it:
\describe{
\item{wp}{win probability.}
\item{vegas_wp}{win probability taking into account pre-game spread.}
}
}
\description{
Computes win probability for provided plays. Returns the data with
probabilities of winning the game. The following columns
must be present: receive_2h_ko (1 if game is in 1st half and possession
team will receive 2nd half kickoff, 0 otherwise), score_differential,
home_team, posteam, half_seconds_remaining, game_seconds_remaining,
spread_line (how many points home team was favored by), down, ydstogo,
yardline_100, posteam_timeouts_remaining, defteam_timeouts_remaining
}
\details{
Computes win probability for provided plays. Returns the data with
spread and non-spread-adjusted win probabilities. The following columns
must be present:
\itemize{
\item{receive_2h_ko (1 if game is in 1st half and possession team will receive 2nd half kickoff, 0 otherwise)}
\item{score_differential}
\item{home_team}
\item{posteam}
\item{half_seconds_remaining}
\item{game_seconds_remaining}
\item{spread_line (how many points home team was favored by)}
\item{down}
\item{ydstogo}
\item{yardline_100}
\item{posteam_timeouts_remaining}
\item{defteam_timeouts_remaining}
}
}
\examples{
\donttest{
try({# to avoid CRAN test problems
library(dplyr)
data <- tibble::tibble(
"receive_2h_ko" = 0,
"home_team" = "SEA",
"posteam" = "SEA",
"score_differential" = 0,
"half_seconds_remaining" = 1800,
"game_seconds_remaining" = 3600,
"spread_line" = c(1, 3, 4, 7, 14),
"down" = 1,
"ydstogo" = 10,
"yardline_100" = 75,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)

nflfastR::calculate_win_probability(data) |>
  dplyr::select(spread_line, wp, vegas_wp)
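
# Illustrative variation (not part of the original example): the same
# situation, but with the possession team leading by 7 points. Only
# `score_differential` changes; all other required columns are reused.
data |>
  dplyr::mutate(score_differential = 7) |>
  nflfastR::calculate_win_probability() |>
  dplyr::select(spread_line, score_differential, wp, vegas_wp)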
})
}
}


================================================
FILE: man/clean_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_additional_functions.R
\name{clean_pbp}
\alias{clean_pbp}
\title{Clean Play by Play Data}
\usage{
clean_pbp(pbp, ...)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}

\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input Data Frame of the parameter 'pbp' with the following columns
added:
\describe{
\item{success}{Binary indicator whether epa > 0 in the given play.}
\item{passer}{Name of the dropback player (scrambles included) including plays with penalties.}
\item{passer_jersey_number}{Jersey number of the passer.}
\item{rusher}{Name of the rusher (no scrambles) including plays with penalties.}
\item{rusher_jersey_number}{Jersey number of the rusher.}
\item{receiver}{Name of the receiver including plays with penalties.}
\item{receiver_jersey_number}{Jersey number of the receiver.}
\item{pass}{Binary indicator if the play was a pass play (sacks and scrambles included).}
\item{rush}{Binary indicator if the play was a rushing play.}
\item{special}{Binary indicator if the play was a special teams play.}
\item{first_down}{Binary indicator if the play ended in a first down.}
\item{aborted_play}{Binary indicator if the play description indicates "Aborted".}
\item{play}{Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.}
\item{passer_id}{ID of the player in the 'passer' column.}
\item{rusher_id}{ID of the player in the 'rusher' column.}
\item{receiver_id}{ID of the player in the 'receiver' column.}
\item{name}{Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.}
\item{fantasy}{Name of the rusher on rush plays or receiver on pass plays.}
\item{fantasy_id}{ID of the rusher on rush plays or receiver on pass plays.}
\item{fantasy_player_name}{Name of the rusher on rush plays or receiver on pass plays (from official stats).}
\item{fantasy_player_id}{ID of the rusher on rush plays or receiver on pass plays (from official stats).}
\item{jersey_number}{Jersey number of the player listed in the 'name' column.}
\item{id}{ID of the player in the 'name' column.}
\item{out_of_bounds}{= 1 if play description contains "ran ob", "pushed ob", or "sacked ob"; = 0 otherwise.}
\item{home_opening_kickoff}{= 1 if the home team received the opening kickoff, 0 otherwise.}
}
}
\description{
Clean Play by Play Data
}
\details{
Build columns that capture what happens on all plays, including
penalties, using string extraction from play description.
Loosely based on Ben's nflfastR guide (\url{https://nflfastr.com/articles/beginners_guide.html})
but updated to work with the RS data, which has a different player format in
the play description; e.g. 24-M.Lynch instead of M.Lynch.
The function also standardizes team abbreviations so that, for example,
the Chargers are always represented by 'LAC' regardless of which year it was.
Starting in 2022, play-by-play data was missing gsis player IDs of rookies.
This function tries to fix as many as possible.
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
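\examples{
\donttest{
try({# to avoid CRAN test problems
# A minimal sketch, assuming the game can be scraped at the time of running
# (network access required). The game ID is only an illustrative choice.
pbp <- fast_scraper("2019_01_GB_CHI")

# Add the cleaned name and indicator columns described in the value section
cleaned <- clean_pbp(pbp)

# Inspect a few of the added columns next to the play description
dplyr::select(cleaned, desc, passer, rusher, receiver, name, success)
})
}
}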


================================================
FILE: man/decode_player_ids.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_decode_player_ids.R
\name{decode_player_ids}
\alias{decode_player_ids}
\title{Decode the player IDs in nflfastR play-by-play data}
\usage{
decode_player_ids(pbp, ..., fast = TRUE)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}

\item{...}{Additional arguments passed to a message function (for internal use).}

\item{fast}{If \code{TRUE} the IDs will be decoded with the highly efficient
function \link[gsisdecoder:decode_ids]{decode_ids}. If \code{FALSE} an nflfastR internal
function will be used for decoding (it is generally not recommended to do this,
unless there is a problem with \link[gsisdecoder:decode_ids]{decode_ids}
which can take several days to fix on CRAN.)}
}
\value{
The input data frame of the parameter \code{pbp} with decoded player IDs.
}
\description{
Takes all columns ending with \code{'player_id'} as well as the
variables \code{'passer_id'}, \code{'rusher_id'}, \code{'fantasy_id'},
\code{'receiver_id'}, and \code{'id'} of an nflfastR play-by-play data set
and decodes the player IDs to the commonly known GSIS ID format 00-00xxxxx.

By default, the function uses the highly efficient \link[gsisdecoder:decode_ids]{decode_ids}
of the package \href{https://cran.r-project.org/package=gsisdecoder}{\code{gsisdecoder}}.
In the unlikely event that there is a problem with this function, an nflfastR
internal decoder can be used with the option \code{fast = FALSE}.

The 2022 play by play data introduced new player IDs that can't be decoded
with gsisdecoder. In that case, IDs are joined through \link[nflreadr:load_players]{nflreadr::load_players}.
}
\examples{
\donttest{
# Decode data frame consisting of some names and ids
decode_player_ids(data.frame(
  name = c("P.Mahomes", "B.Baldwin", "P.Mahomes", "S.Carl", "J.Jones"),
  id = c(
    "32013030-2d30-3033-3338-3733fa30c4fa",
    NA_character_,
    "00-0033873",
    NA_character_,
    "32013030-2d30-3032-3739-3434d4d3846d"
  )
))
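
# Illustrative sketch of the fallback described above: decode with the
# nflfastR internal decoder instead of gsisdecoder. This is generally not
# recommended and is shown only to demonstrate the `fast` argument.
ids <- data.frame(
  name = "P.Mahomes",
  id = "32013030-2d30-3033-3338-3733fa30c4fa"
)
decode_player_ids(ids, fast = FALSE)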
}
}


================================================
FILE: man/fast_scraper.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper}
\alias{fast_scraper}
\title{Get NFL Play by Play Data}
\usage{
fast_scraper(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL),
  ...,
  in_builder = FALSE
)
}
\arguments{
\item{game_ids}{Vector of character ids or a data frame including the variable
\code{game_id} (see details for further information).}

\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory")
where nflfastR searches for raw game play-by-play data.
See \code{\link[=save_raw_pbp]{save_raw_pbp()}} for additional information.}

\item{...}{Additional arguments passed to the scraping functions (for internal use)}

\item{in_builder}{If \code{TRUE}, the final message will be suppressed (for usage inside of \code{\link{build_nflfastR_pbp}}).}
}
\value{
Data frame where each individual row represents a single play for
all passed game_ids containing the following
detailed information (description partly extracted from nflscrapR):
\describe{
\item{play_id}{Numeric play id that when used with game_id and drive provides the unique identifier for a single play.}
\item{game_id}{Ten digit identifier for NFL game.}
\item{old_game_id}{Legacy NFL game ID.}
\item{home_team}{String abbreviation for the home team.}
\item{away_team}{String abbreviation for the away team.}
\item{season_type}{'REG' or 'POST' indicating if the game belongs to regular or post season.}
\item{week}{Season week.}
\item{posteam}{String abbreviation for the team with possession.}
\item{posteam_type}{String indicating whether the posteam is home or away.}
\item{defteam}{String abbreviation for the team on defense.}
\item{side_of_field}{String abbreviation for which team's side of the field the team with possession is currently on.}
\item{yardline_100}{Numeric distance in the number of yards from the opponent's endzone for the posteam.}
\item{game_date}{Date of the game.}
\item{quarter_seconds_remaining}{Numeric seconds remaining in the quarter.}
\item{half_seconds_remaining}{Numeric seconds remaining in the half.}
\item{game_seconds_remaining}{Numeric seconds remaining in the game.}
\item{game_half}{String indicating which half the play is in, either Half1, Half2, or Overtime.}
\item{quarter_end}{Binary indicator for whether or not the row of the data is marking the end of a quarter.}
\item{drive}{Numeric drive number in the game.}
\item{sp}{Binary indicator for whether or not a score occurred on the play.}
\item{qtr}{Quarter of the game (5 is overtime).}
\item{down}{The down for the given play.}
\item{goal_to_go}{Binary indicator for whether or not the posteam is in a goal down situation.}
\item{time}{Time at start of play provided in string format as minutes:seconds remaining in the quarter.}
\item{yrdln}{String indicating the current field position for a given play.}
\item{ydstogo}{Numeric yards in distance from either the first down marker or the endzone in goal down situations.}
\item{ydsnet}{Numeric value for total yards gained on the given drive.}
\item{desc}{Detailed string description for the given play.}
\item{play_type}{String indicating the type of play: pass (includes sacks), run (includes scrambles), punt, field_goal, kickoff, extra_point, qb_kneel, qb_spike, no_play (timeouts and penalties), and missing for rows indicating end of play.}
\item{yards_gained}{Numeric yards gained (or lost) by the possessing team, excluding yards gained via fumble recoveries and laterals.}
\item{shotgun}{Binary indicator for whether or not the play was in shotgun formation.}
\item{no_huddle}{Binary indicator for whether or not the play was in no_huddle formation.}
\item{qb_dropback}{Binary indicator for whether or not the QB dropped back on the play (pass attempt, sack, or scrambled).}
\item{qb_kneel}{Binary indicator for whether or not the QB took a knee.}
\item{qb_spike}{Binary indicator for whether or not the QB spiked the ball.}
\item{qb_scramble}{Binary indicator for whether or not the QB scrambled.}
\item{pass_length}{String indicator for pass length: short or deep.}
\item{pass_location}{String indicator for pass location: left, middle, or right.}
\item{air_yards}{Numeric value for distance in yards perpendicular to the line of scrimmage at where the targeted receiver either caught or didn't catch the ball.}
\item{yards_after_catch}{Numeric value for distance in yards perpendicular to the yard line where the receiver made the reception to where the play ended.}
\item{run_location}{String indicator for location of run: left, middle, or right.}
\item{run_gap}{String indicator for line gap of run: end, guard, or tackle}
\item{field_goal_result}{String indicator for result of field goal attempt: made, missed, or blocked.}
\item{kick_distance}{Numeric distance in yards for kickoffs, field goals, and punts.}
\item{extra_point_result}{String indicator for the result of the extra point attempt: good, failed, blocked, safety (touchback in defensive endzone is 1 point apparently), or aborted.}
\item{two_point_conv_result}{String indicator for result of two point conversion attempt: success, failure, safety (touchback in defensive endzone is 1 point apparently), or return.}
\item{home_timeouts_remaining}{Numeric timeouts remaining in the half for the home team.}
\item{away_timeouts_remaining}{Numeric timeouts remaining in the half for the away team.}
\item{timeout}{Binary indicator for whether or not a timeout was called by either team.}
\item{timeout_team}{String abbreviation for which team called the timeout.}
\item{td_team}{String abbreviation for which team scored the touchdown.}
\item{td_player_name}{String name of the player who scored a touchdown.}
\item{td_player_id}{Unique identifier of the player who scored a touchdown.}
\item{posteam_timeouts_remaining}{Number of timeouts remaining for the possession team.}
\item{defteam_timeouts_remaining}{Number of timeouts remaining for the team on defense.}
\item{total_home_score}{Score for the home team at the end of the play.}
\item{total_away_score}{Score for the away team at the end of the play.}
\item{posteam_score}{Score for the posteam at the start of the play.}
\item{defteam_score}{Score for the defteam at the start of the play.}
\item{score_differential}{Score differential between the posteam and defteam at the start of the play.}
\item{posteam_score_post}{Score for the posteam at the end of the play.}
\item{defteam_score_post}{Score for the defteam at the end of the play.}
\item{score_differential_post}{Score differential between the posteam and defteam at the end of the play.}
\item{no_score_prob}{Predicted probability of no score occurring for the rest of the half based on the expected points model.}
\item{opp_fg_prob}{Predicted probability of the defteam scoring a FG next.}
\item{opp_safety_prob}{Predicted probability of the defteam scoring a safety next.}
\item{opp_td_prob}{Predicted probability of the defteam scoring a TD next.}
\item{fg_prob}{Predicted probability of the posteam scoring a FG next.}
\item{safety_prob}{Predicted probability of the posteam scoring a safety next.}
\item{td_prob}{Predicted probability of the posteam scoring a TD next.}
\item{extra_point_prob}{Predicted probability of the posteam scoring an extra point.}
\item{two_point_conversion_prob}{Predicted probability of the posteam scoring the two point conversion.}
\item{ep}{Using the scoring event probabilities, the estimated expected points with respect to the possession team for the given play.}
\item{epa}{Expected points added (EPA) by the posteam for the given play.}
\item{total_home_epa}{Cumulative total EPA for the home team in the game so far.}
\item{total_away_epa}{Cumulative total EPA for the away team in the game so far.}
\item{total_home_rush_epa}{Cumulative total rushing EPA for the home team in the game so far.}
\item{total_away_rush_epa}{Cumulative total rushing EPA for the away team in the game so far.}
\item{total_home_pass_epa}{Cumulative total passing EPA for the home team in the game so far.}
\item{total_away_pass_epa}{Cumulative total passing EPA for the away team in the game so far.}
\item{air_epa}{EPA from the air yards alone. For completions this represents the actual value provided through the air. For incompletions this represents the hypothetical value that could've been added through the air if the pass was completed.}
\item{yac_epa}{EPA from the yards after catch alone. For completions this represents the actual value provided after the catch. For incompletions this represents the difference between the hypothetical air_epa and the play's raw observed EPA (how much the incomplete pass cost the posteam).}
\item{comp_air_epa}{EPA from the air yards alone only for completions.}
\item{comp_yac_epa}{EPA from the yards after catch alone only for completions.}
\item{total_home_comp_air_epa}{Cumulative total completions air EPA for the home team in the game so far.}
\item{total_away_comp_air_epa}{Cumulative total completions air EPA for the away team in the game so far.}
\item{total_home_comp_yac_epa}{Cumulative total completions yac EPA for the home team in the game so far.}
\item{total_away_comp_yac_epa}{Cumulative total completions yac EPA for the away team in the game so far.}
\item{total_home_raw_air_epa}{Cumulative total raw air EPA for the home team in the game so far.}
\item{total_away_raw_air_epa}{Cumulative total raw air EPA for the away team in the game so far.}
\item{total_home_raw_yac_epa}{Cumulative total raw yac EPA for the home team in the game so far.}
\item{total_away_raw_yac_epa}{Cumulative total raw yac EPA for the away team in the game so far.}
\item{wp}{Estimated win probability for the posteam given the current situation at the start of the given play.}
\item{def_wp}{Estimated win probability for the defteam.}
\item{home_wp}{Estimated win probability for the home team.}
\item{away_wp}{Estimated win probability for the away team.}
\item{wpa}{Win probability added (WPA) for the posteam.}
\item{vegas_wpa}{Win probability added (WPA) for the posteam: spread_adjusted model.}
\item{vegas_home_wpa}{Win probability added (WPA) for the home team: spread_adjusted model.}
\item{home_wp_post}{Estimated win probability for the home team at the end of the play.}
\item{away_wp_post}{Estimated win probability for the away team at the end of the play.}
\item{vegas_wp}{Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.}
\item{vegas_home_wp}{Estimated win probability for the home team incorporating pre-game Vegas line.}
\item{total_home_rush_wpa}{Cumulative total rushing WPA for the home team in the game so far.}
\item{total_away_rush_wpa}{Cumulative total rushing WPA for the away team in the game so far.}
\item{total_home_pass_wpa}{Cumulative total passing WPA for the home team in the game so far.}
\item{total_away_pass_wpa}{Cumulative total passing WPA for the away team in the game so far.}
\item{air_wpa}{WPA through the air (same logic as air_epa).}
\item{yac_wpa}{WPA from yards after the catch (same logic as yac_epa).}
\item{comp_air_wpa}{The air_wpa for completions only.}
\item{comp_yac_wpa}{The yac_wpa for completions only.}
\item{total_home_comp_air_wpa}{Cumulative total completions air WPA for the home team in the game so far.}
\item{total_away_comp_air_wpa}{Cumulative total completions air WPA for the away team in the game so far.}
\item{total_home_comp_yac_wpa}{Cumulative total completions yac WPA for the home team in the game so far.}
\item{total_away_comp_yac_wpa}{Cumulative total completions yac WPA for the away team in the game so far.}
\item{total_home_raw_air_wpa}{Cumulative total raw air WPA for the home team in the game so far.}
\item{total_away_raw_air_wpa}{Cumulative total raw air WPA for the away team in the game so far.}
\item{total_home_raw_yac_wpa}{Cumulative total raw yac WPA for the home team in the game so far.}
\item{total_away_raw_yac_wpa}{Cumulative total raw yac WPA for the away team in the game so far.}
\item{punt_blocked}{Binary indicator for if the punt was blocked.}
\item{first_down_rush}{Binary indicator for if a running play converted the first down.}
\item{first_down_pass}{Binary indicator for if a passing play converted the first down.}
\item{first_down_penalty}{Binary indicator for if a penalty converted the first down.}
\item{third_down_converted}{Binary indicator for if the first down was converted on third down.}
\item{third_down_failed}{Binary indicator for if the posteam failed to convert first down on third down.}
\item{fourth_down_converted}{Binary indicator for if the first down was converted on fourth down.}
\item{fourth_down_failed}{Binary indicator for if the posteam failed to convert first down on fourth down.}
\item{incomplete_pass}{Binary indicator for if the pass was incomplete.}
\item{touchback}{Binary indicator for if a touchback occurred on the play.}
\item{interception}{Binary indicator for if the pass was intercepted.}
\item{punt_inside_twenty}{Binary indicator for if the punt ended inside the twenty yard line.}
\item{punt_in_endzone}{Binary indicator for if the punt was in the endzone.}
\item{punt_out_of_bounds}{Binary indicator for if the punt went out of bounds.}
\item{punt_downed}{Binary indicator for if the punt was downed.}
\item{punt_fair_catch}{Binary indicator for if the punt was caught with a fair catch.}
\item{kickoff_inside_twenty}{Binary indicator for if the kickoff ended inside the twenty yard line.}
\item{kickoff_in_endzone}{Binary indicator for if the kickoff was in the endzone.}
\item{kickoff_out_of_bounds}{Binary indicator for if the kickoff went out of bounds.}
\item{kickoff_downed}{Binary indicator for if the kickoff was downed.}
\item{kickoff_fair_catch}{Binary indicator for if the kickoff was caught with a fair catch.}
\item{fumble_forced}{Binary indicator for if the fumble was forced.}
\item{fumble_not_forced}{Binary indicator for if the fumble was not forced.}
\item{fumble_out_of_bounds}{Binary indicator for if the fumble went out of bounds.}
\item{solo_tackle}{Binary indicator if the play had a solo tackle (could be multiple due to fumbles).}
\item{safety}{Binary indicator for whether or not a safety occurred.}
\item{penalty}{Binary indicator for whether or not a penalty occurred.}
\item{tackled_for_loss}{Binary indicator for whether or not a tackle for loss on a run play occurred.}
\item{fumble_lost}{Binary indicator for if the fumble was lost.}
\item{own_kickoff_recovery}{Binary indicator for if the kicking team recovered the kickoff.}
\item{own_kickoff_recovery_td}{Binary indicator for if the kicking team recovered the kickoff and scored a TD.}
\item{qb_hit}{Binary indicator if the QB was hit on the play.}
\item{rush_attempt}{Binary indicator for if the play was a run.}
\item{pass_attempt}{Binary indicator for if the play was a pass attempt (includes sacks).}
\item{sack}{Binary indicator for if the play ended in a sack.}
\item{touchdown}{Binary indicator for if the play resulted in a TD.}
\item{pass_touchdown}{Binary indicator for if the play resulted in a passing TD.}
\item{rush_touchdown}{Binary indicator for if the play resulted in a rushing TD.}
\item{return_touchdown}{Binary indicator for if the play resulted in a return TD.}
\item{extra_point_attempt}{Binary indicator for extra point attempt.}
\item{two_point_attempt}{Binary indicator for two point conversion attempt.}
\item{field_goal_attempt}{Binary indicator for field goal attempt.}
\item{kickoff_attempt}{Binary indicator for kickoff.}
\item{punt_attempt}{Binary indicator for punts.}
\item{fumble}{Binary indicator for if a fumble occurred.}
\item{complete_pass}{Binary indicator for if the pass was completed.}
\item{assist_tackle}{Binary indicator for if an assist tackle occurred.}
\item{lateral_reception}{Binary indicator for if a lateral occurred on the reception.}
\item{lateral_rush}{Binary indicator for if a lateral occurred on a run.}
\item{lateral_return}{Binary indicator for if a lateral occurred on a return.}
\item{lateral_recovery}{Binary indicator for if a lateral occurred on a fumble recovery.}
\item{passer_player_id}{Unique identifier for the player that attempted the pass.}
\item{passer_player_name}{String name for the player that attempted the pass.}
\item{passing_yards}{Numeric yards by the passer_player_name, including yards gained in pass plays with laterals.
This should equal official passing statistics.}
\item{receiver_player_id}{Unique identifier for the receiver that was targeted on the pass.}
\item{receiver_player_name}{String name for the targeted receiver.}
\item{receiving_yards}{Numeric yards by the receiver_player_name, excluding yards gained in pass plays with laterals.
This should equal official receiving statistics but could miss yards gained in pass plays with laterals.
Please see the description of \code{lateral_receiver_player_name} for further information.}
\item{rusher_player_id}{Unique identifier for the player that attempted the run.}
\item{rusher_player_name}{String name for the player that attempted the run.}
\item{rushing_yards}{Numeric yards by the rusher_player_name, excluding yards gained in rush plays with laterals.
This should equal official rushing statistics but could miss yards gained in rush plays with laterals.
Please see the description of \code{lateral_rusher_player_name} for further information.}
\item{lateral_receiver_player_id}{Unique identifier for the player that received the last(!) lateral on a pass play.}
\item{lateral_receiver_player_name}{String name for the player that received the last(!) lateral on a pass play.
If there were multiple laterals in the same play, this will only be the last player who received a lateral.
Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
for a list of plays where multiple players recorded lateral receiving yards.}
\item{lateral_receiving_yards}{Numeric yards by the \code{lateral_receiver_player_name} in pass plays with laterals.
Please see the description of \code{lateral_receiver_player_name} for further information.}
\item{lateral_rusher_player_id}{Unique identifier for the player that received the last(!) lateral on a run play.}
\item{lateral_rusher_player_name}{String name for the player that received the last(!) lateral on a run play.
If there were multiple laterals in the same play, this will only be the last player who received a lateral.
Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
for a list of plays where multiple players recorded lateral rushing yards.}
\item{lateral_rushing_yards}{Numeric yards by the \code{lateral_rusher_player_name} in run plays with laterals.
Please see the description of \code{lateral_rusher_player_name} for further information.}
\item{lateral_sack_player_id}{Unique identifier for the player that received the lateral on a sack.}
\item{lateral_sack_player_name}{String name for the player that received the lateral on a sack.}
\item{interception_player_id}{Unique identifier for the player that intercepted the pass.}
\item{interception_player_name}{String name for the player that intercepted the pass.}
\item{lateral_interception_player_id}{Unique identifier for the player that received the lateral on an interception.}
\item{lateral_interception_player_name}{String name for the player that received the lateral on an interception.}
\item{punt_returner_player_id}{Unique identifier for the punt returner.}
\item{punt_returner_player_name}{String name for the punt returner.}
\item{lateral_punt_returner_player_id}{Unique identifier for the player that received the lateral on a punt return.}
\item{lateral_punt_returner_player_name}{String name for the player that received the lateral on a punt return.}
\item{kickoff_returner_player_name}{String name for the kickoff returner.}
\item{kickoff_returner_player_id}{Unique identifier for the kickoff returner.}
\item{lateral_kickoff_returner_player_id}{Unique identifier for the player that received the lateral on a kickoff return.}
\item{lateral_kickoff_returner_player_name}{String name for the player that received the lateral on a kickoff return.}
\item{punter_player_id}{Unique identifier for the punter.}
\item{punter_player_name}{String name for the punter.}
\item{kicker_player_name}{String name for the kicker on FG or kickoff.}
\item{kicker_player_id}{Unique identifier for the kicker on FG or kickoff.}
\item{own_kickoff_recovery_player_id}{Unique identifier for the player that recovered their own kickoff.}
\item{own_kickoff_recovery_player_name}{String name for the player that recovered their own kickoff.}
\item{blocked_player_id}{Unique identifier for the player that blocked the punt or FG.}
\item{blocked_player_name}{String name for the player that blocked the punt or FG.}
\item{tackle_for_loss_1_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_1_player_name}{String name for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_2_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_2_player_name}{String name for one of the potential players with the tackle for loss.}
\item{qb_hit_1_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_1_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_2_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_2_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{forced_fumble_player_1_team}{Team of one of the players with a forced fumble.}
\item{forced_fumble_player_1_player_id}{Unique identifier of one of the players with a forced fumble.}
\item{forced_fumble_player_1_player_name}{String name of one of the players with a forced fumble.}
\item{forced_fumble_player_2_team}{Team of one of the players with a forced fumble.}
\item{forced_fumble_player_2_player_id}{Unique identifier of one of the players with a forced fumble.}
\item{forced_fumble_player_2_player_name}{String name of one of the players with a forced fumble.}
\item{solo_tackle_1_team}{Team of one of the players with a solo tackle.}
\item{solo_tackle_2_team}{Team of one of the players with a solo tackle.}
\item{solo_tackle_1_player_id}{Unique identifier of one of the players with a solo tackle.}
\item{solo_tackle_2_player_id}{Unique identifier of one of the players with a solo tackle.}
\item{solo_tackle_1_player_name}{String name of one of the players with a solo tackle.}
\item{solo_tackle_2_player_name}{String name of one of the players with a solo tackle.}
\item{assist_tackle_1_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_1_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_1_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_2_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_2_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_2_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_3_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_3_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_3_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_4_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_4_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_4_team}{Team of one of the players with a tackle assist.}
\item{tackle_with_assist}{Binary indicator for if there has been a tackle with assist.}
\item{tackle_with_assist_1_player_id}{Unique identifier of one of the players with a tackle with assist.}
\item{tackle_with_assist_1_player_name}{String name of one of the players with a tackle with assist.}
\item{tackle_with_assist_1_team}{Team of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_player_id}{Unique identifier of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_player_name}{String name of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_team}{Team of one of the players with a tackle with assist.}
\item{pass_defense_1_player_id}{Unique identifier of one of the players with a pass defense.}
\item{pass_defense_1_player_name}{String name of one of the players with a pass defense.}
\item{pass_defense_2_player_id}{Unique identifier of one of the players with a pass defense.}
\item{pass_defense_2_player_name}{String name of one of the players with a pass defense.}
\item{fumbled_1_team}{Team of the first player with a fumble.}
\item{fumbled_1_player_id}{Unique identifier of the first player who fumbled on the play.}
\item{fumbled_1_player_name}{String name of the first player who fumbled on the play.}
\item{fumbled_2_player_id}{Unique identifier of the second player who fumbled on the play.}
\item{fumbled_2_player_name}{String name of the second player who fumbled on the play.}
\item{fumbled_2_team}{Team of the second player with a fumble.}
\item{fumble_recovery_1_team}{Team of one of the players with a fumble recovery.}
\item{fumble_recovery_1_yards}{Yards gained by one of the players with a fumble recovery.}
\item{fumble_recovery_1_player_id}{Unique identifier of one of the players with a fumble recovery.}
\item{fumble_recovery_1_player_name}{String name of one of the players with a fumble recovery.}
\item{fumble_recovery_2_team}{Team of one of the players with a fumble recovery.}
\item{fumble_recovery_2_yards}{Yards gained by one of the players with a fumble recovery.}
\item{fumble_recovery_2_player_id}{Unique identifier of one of the players with a fumble recovery.}
\item{fumble_recovery_2_player_name}{String name of one of the players with a fumble recovery.}
\item{sack_player_id}{Unique identifier of the player who recorded a solo sack.}
\item{sack_player_name}{String name of the player who recorded a solo sack.}
\item{half_sack_1_player_id}{Unique identifier of the first player who recorded half a sack.}
\item{half_sack_1_player_name}{String name of the first player who recorded half a sack.}
\item{half_sack_2_player_id}{Unique identifier of the second player who recorded half a sack.}
\item{half_sack_2_player_name}{String name of the second player who recorded half a sack.}
\item{return_team}{String abbreviation of the return team.}
\item{return_yards}{Yards gained by the return team.}
\item{penalty_team}{String abbreviation of the team with the penalty.}
\item{penalty_player_id}{Unique identifier for the player with the penalty.}
\item{penalty_player_name}{String name for the player with the penalty.}
\item{penalty_yards}{Yards gained (or lost) by the posteam from the penalty.}
\item{replay_or_challenge}{Binary indicator for whether or not there was a replay or challenge on the play.}
\item{replay_or_challenge_result}{String indicating the result of the replay or challenge.}
\item{penalty_type}{String indicating the penalty type of the first penalty in the given play. Will be \code{NA} if \code{desc} is missing the type.}
\item{defensive_two_point_attempt}{Binary indicator for whether or not the defense had an attempt on a two point conversion; this occurs following a turnover.}
\item{defensive_two_point_conv}{Binary indicator whether or not the defense successfully scored on the two point conversion.}
\item{defensive_extra_point_attempt}{Binary indicator for whether or not the defense had an attempt on an extra point; this occurs following a blocked attempt on which the defense recovers the ball.}
\item{defensive_extra_point_conv}{Binary indicator whether or not the defense successfully scored on an extra point attempt.}
\item{safety_player_name}{String name for the player who scored a safety.}
\item{safety_player_id}{Unique identifier for the player who scored a safety.}
\item{season}{4 digit number indicating the season to which the game belongs.}
\item{cp}{Numeric value indicating the probability for a complete pass based on comparable game situations.}
\item{cpoe}{For a single pass play this is 1 - cp when the pass was completed or 0 - cp when the pass was incomplete. Aggregated over a whole game or season, it indicates how far above or below expectation the passer's completion percentage was.}
\item{series}{Starts at 1 and increments with each new first down; numbers are shared across both teams. \code{NA} for kickoffs, extra point/two point conversion attempts, non-plays, and plays with no posteam.}
\item{series_success}{1: scored touchdown, gained enough yards for first down.}
\item{series_result}{Possible values: First down, Touchdown, Opp touchdown, Field goal, Missed field goal, Safety, Turnover, Punt, Turnover on downs, QB kneel, End of half}
\item{start_time}{Kickoff time in eastern time zone.}
\item{order_sequence}{Column provided by NFL to fix out-of-order plays. Available 2011 and beyond with source "nfl".}
\item{time_of_day}{Time of day of play in UTC "HH:MM:SS" format. Available 2011 and beyond with source "nfl".}
\item{stadium}{Game site name.}
\item{weather}{String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!}
\item{nfl_api_id}{UUID of the game in the new NFL API.}
\item{play_clock}{Time on the playclock when the ball was snapped.}
\item{play_deleted}{Binary indicator for deleted plays.}
\item{play_type_nfl}{Play type as listed in the NFL source. Slightly different to the regular play_type variable.}
\item{special_teams_play}{Binary indicator for whether play is special teams play from NFL source. Available 2011 and beyond with source "nfl".}
\item{st_play_type}{Type of special teams play from NFL source. Available 2011 and beyond with source "nfl".}
\item{end_clock_time}{Game time at the end of a given play.}
\item{end_yard_line}{String indicating the yardline at the end of the given play consisting of team half and yard line number.}
\item{drive_real_start_time}{Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').}
\item{drive_play_count}{Numeric value of how many regular plays happened in a given drive.}
\item{drive_time_of_possession}{Time of possession in a given drive.}
\item{drive_first_downs}{Number of first downs in a given drive.}
\item{drive_inside20}{Binary indicator if the offense was able to get inside the opponent's 20 yard line.}
\item{drive_ended_with_score}{Binary indicator for whether the drive ended with a score.}
\item{drive_quarter_start}{Numeric value indicating in which quarter the given drive has started.}
\item{drive_quarter_end}{Numeric value indicating in which quarter the given drive has ended.}
\item{drive_yards_penalized}{Numeric value of how many yards the offense gained or lost through penalties in the given drive.}
\item{drive_start_transition}{String indicating how the offense got the ball.}
\item{drive_end_transition}{String indicating how the offense lost the ball.}
\item{drive_game_clock_start}{Game time at the beginning of a given drive.}
\item{drive_game_clock_end}{Game time at the end of a given drive.}
\item{drive_start_yard_line}{String indicating where a given drive started consisting of team half and yard line number.}
\item{drive_end_yard_line}{String indicating where a given drive ended consisting of team half and yard line number.}
\item{drive_play_id_started}{Play_id of the first play in the given drive.}
\item{drive_play_id_ended}{Play_id of the last play in the given drive.}
\item{fixed_drive}{Manually created drive number in a game.}
\item{fixed_drive_result}{Manually created drive result.}
\item{away_score}{Total points scored by the away team.}
\item{home_score}{Total points scored by the home team.}
\item{location}{Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.}
\item{result}{Equals home_score - away_score and means the game outcome from the perspective of the home team.}
\item{total}{Equals home_score + away_score and means the total points scored in the given game.}
\item{spread_line}{The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference)}
\item{total_line}{The closing total line for the game. (Source: Pro-Football-Reference)}
\item{div_game}{Binary indicator for if the given game was a division game.}
\item{roof}{One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)}
\item{surface}{What type of ground the game was played on. (Source: Pro-Football-Reference)}
\item{temp}{The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
\item{wind}{The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
\item{home_coach}{First and last name of the home team coach. (Source: Pro-Football-Reference)}
\item{away_coach}{First and last name of the away team coach. (Source: Pro-Football-Reference)}
\item{stadium_id}{ID of the stadium the game was played in. (Source: Pro-Football-Reference)}
\item{game_stadium}{Name of the stadium the game was played in. (Source: Pro-Football-Reference)}
}
}
\description{
Load and parse NFL play-by-play data and add all of the original
nflfastR variables. As nflfastR now provides multiple functions which add
information to the output of this function, it is recommended to use
\code{\link{build_nflfastR_pbp}} instead.
}
\details{
To load valid game_ids please use the package function
\code{\link{fast_scraper_schedules}} (this function can directly handle the
output of that function).
}
\examples{
\donttest{
# Get pbp data for two games
try({# to avoid CRAN test problems
fast_scraper(c("2019_01_GB_CHI", "2013_21_SEA_DEN"))
})


# It is also possible to directly use the
# output of `fast_scraper_schedules` as input
try({# to avoid CRAN test problems
library(dplyr, warn.conflicts = FALSE)
fast_scraper_schedules(2020) |>
  slice_tail(n = 3) |>
  fast_scraper()
})

\dontshow{
# Close open connections for R CMD Check
future::plan("sequential")
}
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.

\code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, \code{\link[=save_raw_pbp]{save_raw_pbp()}}
}


================================================
FILE: man/fast_scraper_roster.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper_roster}
\alias{fast_scraper_roster}
\title{Load Team Rosters for Multiple Seasons}
\usage{
fast_scraper_roster(...)
}
\arguments{
\item{...}{
  Arguments passed on to \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}}
  \describe{
    \item{\code{seasons}}{a numeric vector of seasons to return, defaults to returning
this year's data if it is March or later. If set to \code{TRUE}, will return all available data.
Data available back to 1920.}
    \item{\code{file_type}}{One of \code{c("rds", "csv", "parquet")}. Can also be set globally with
\code{options(nflreadr.prefer)}}
  }}
}
\value{
A tibble of season-level roster data.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated. Please use \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}}.
}
\details{
See \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}} for details.
}
\examples{
\donttest{
# Roster of the 2019 and 2020 seasons
try({# to avoid CRAN test problems
# fast_scraper_roster(2019:2020)
})
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
\keyword{internal}


================================================
FILE: man/fast_scraper_schedules.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper_schedules}
\alias{fast_scraper_schedules}
\title{Load NFL Season Schedules}
\usage{
fast_scraper_schedules(...)
}
\arguments{
\item{...}{
  Arguments passed on to \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}}
  \describe{
    \item{\code{seasons}}{a numeric vector of seasons to return, default \code{TRUE} returns all available data.}
  }}
}
\value{
A tibble of game information for past and/or future games.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated. Please use \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}}.
}
\details{
See \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}} for details.
}
\examples{
\donttest{
# Get schedules for the whole 2015 - 2018 seasons
try({# to avoid CRAN test problems
# fast_scraper_schedules(2015:2018)
})
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
\keyword{internal}


================================================
FILE: man/field_descriptions.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{field_descriptions}
\alias{field_descriptions}
\title{nflfastR Field Descriptions}
\format{
A data frame including names and descriptions of all variables in
an nflfastR dataset.
}
\usage{
field_descriptions
}
\description{
nflfastR Field Descriptions
}
\examples{
\donttest{
field_descriptions
}
}
\seealso{
The searchable table on the
\href{https://nflfastr.com/articles/field_descriptions.html}{nflfastR website}
}
\keyword{datasets}


================================================
FILE: man/missing_raw_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/save_raw_pbp.R
\name{missing_raw_pbp}
\alias{missing_raw_pbp}
\title{Compute Missing Raw PBP Data on Local Filesystem}
\usage{
missing_raw_pbp(
  dir = getOption("nflfastR.raw_directory", default = NULL),
  seasons = TRUE,
  verbose = TRUE
)
}
\arguments{
\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory").
nflfastR will download the raw game files split by season into one sub
directory per season.}

\item{seasons}{a numeric vector of seasons to return, default \code{TRUE} returns all available data.}

\item{verbose}{If \code{TRUE}, will print number of missing game files as well as
oldest and most recent missing ID to console.}
}
\value{
A character vector of missing game IDs. If no files are missing,
returns \code{NULL} invisibly.
}
\description{
Uses \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules()}} to load game IDs of finished games and
compares these IDs to all files saved under \code{dir}.
This function is intended to serve as input for \code{\link[=save_raw_pbp]{save_raw_pbp()}}.
}
\examples{
\donttest{
try(
missing <- missing_raw_pbp(tempdir())
)
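
# A minimal sketch of the intended workflow (illustrative only, not part of
# the original example): pass the missing game IDs on to save_raw_pbp().
# Restricting to one season and downloading only the first missing ID is an
# assumption made here to keep the example small.
try({
  ids <- missing_raw_pbp(tempdir(), seasons = 2021, verbose = FALSE)
  # missing_raw_pbp() returns NULL if nothing is missing, so check the length
  if (length(ids) > 0) save_raw_pbp(ids[1], dir = tempdir())
})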
}
}
\seealso{
\code{\link[=save_raw_pbp]{save_raw_pbp()}}
}


================================================
FILE: man/nfl_stats_variables.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{nfl_stats_variables}
\alias{nfl_stats_variables}
\title{NFL Stats Variables}
\format{
A data frame explaining all variables returned by the function
\code{\link[=calculate_stats]{calculate_stats()}}.
}
\usage{
nfl_stats_variables
}
\description{
NFL Stats Variables
}
\examples{
\donttest{
nfl_stats_variables
}
}
\keyword{datasets}


================================================
FILE: man/nflfastR-package.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nflfastR-package.R
\docType{package}
\name{nflfastR-package}
\alias{nflfastR}
\alias{nflfastR-package}
\title{nflfastR: Functions to Efficiently Access NFL Play by Play Data}
\description{
\if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}}

A set of functions to access National Football League play-by-play data from \url{https://www.nfl.com/}.
}
\section{Parallel Processing and Progress Updates in nflfastR}{
\subsection{Preface}{

Prior to nflfastR v4.0, parallel processing could be activated with an
argument \code{pp} in the relevant functions and progress updates were always
shown. Both of these methods are bad practice and were therefore removed
in nflfastR v4.0.

The next sections describe how to make nflfastR work in parallel processes
and show progress updates if the user wants to.
}

\subsection{More Speed Using Parallel Processing}{

Nearly all nflfastR functions support parallel processing
using \code{\link[furrr:future_map]{furrr::future_map()}} if it is enabled by a call to \code{\link[future:plan]{future::plan()}}
prior to the function call.
Please see the documentation of the functions for detailed information.

As an example, the following code block will resolve all function calls in the
current session using multiple sessions in the background and load play-by-play
data for the 2018 through 2020 seasons or build them freshly for the 2018 and
2019 Super Bowls:

\if{html}{\out{<div class="sourceCode">}}\preformatted{future::plan("multisession")
load_pbp(2018:2020)
build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
}\if{html}{\out{</div>}}

We recommend choosing a default parallel processing method and saving it
as an environment variable in the R user profile to make sure all futures
will be resolved with the chosen method by default.
This can be done by following the steps below.

First, run the following line and the file \code{.Renviron} should be opened automatically.
If you haven't saved any environment variables yet, this will be an empty file.

\if{html}{\out{<div class="sourceCode">}}\preformatted{usethis::edit_r_environ()
}\if{html}{\out{</div>}}

In the opened file \code{.Renviron} add the next line, then save the file and restart your R session.
Please note that this example sets "multisession" as default. For most users
this should be the appropriate plan but please make sure it truly is.

\if{html}{\out{<div class="sourceCode">}}\preformatted{R_FUTURE_PLAN="multisession"
}\if{html}{\out{</div>}}

After the session is freshly restarted please check if the above method worked
by running the next line. If the output is \code{FALSE} you successfully set up a
default non-sequential \code{\link[future:plan]{future::plan()}}. If the output is \code{TRUE} all functions
will behave like they were called with \code{\link[purrr:map]{purrr::map()}} and NOT in multisession.

\if{html}{\out{<div class="sourceCode">}}\preformatted{inherits(future::plan(), "sequential")
}\if{html}{\out{</div>}}

For more information on possible plans please see
\href{https://github.com/futureverse/future/blob/develop/README.md}{the future package Readme}.

For more information on \code{.Renviron} please see
\href{https://rstats.wtf/r-startup.html}{this book chapter}.
}

\subsection{Get Progress Updates while Functions are Running}{

Most nflfastR functions are able to show progress updates
using \code{\link[progressr:progressor]{progressr::progressor()}} if they are turned on before the function is
called. There are at least two basic ways to do this: either by activating
progress updates globally (for the current session) with

\if{html}{\out{<div class="sourceCode">}}\preformatted{progressr::handlers(global = TRUE)
}\if{html}{\out{</div>}}

or by piping the function call into \code{\link[progressr:with_progress]{progressr::with_progress()}}:

\if{html}{\out{<div class="sourceCode">}}\preformatted{load_pbp(2018:2020) |>
  progressr::with_progress()
}\if{html}{\out{</div>}}

Just like in the previous section, it is possible to activate global
progression handlers by default. This can be done by following the steps below.

First, run the following line and the file \code{.Rprofile} should be opened automatically.
If you haven't saved any code yet, this will be an empty file.

\if{html}{\out{<div class="sourceCode">}}\preformatted{usethis::edit_r_profile()
}\if{html}{\out{</div>}}

In the opened file \code{.Rprofile} add the next line, then save the file and restart your R
session. All code in this file will be executed when a new R session starts.
The part \verb{if (requireNamespace("progressr", quietly = TRUE))} makes sure this will only run if the
package progressr is installed to avoid crashing R sessions.

\if{html}{\out{<div class="sourceCode">}}\preformatted{if (requireNamespace("progressr", quietly = TRUE)) progressr::handlers(global = TRUE)
}\if{html}{\out{</div>}}

After the session is freshly restarted please check if the above method worked
by running the next line. If the output is \code{TRUE} you successfully activated
global progression handlers for all sessions.

\if{html}{\out{<div class="sourceCode">}}\preformatted{progressr::handlers(global = NA)
}\if{html}{\out{</div>}}

For more information on how to work with progress handlers please see \link[progressr:progressr]{progressr::progressr}.

For more information on \code{.Rprofile} please see
\href{https://rstats.wtf/r-startup.html}{this book chapter}.
}
}

\seealso{
Useful links:
\itemize{
  \item \url{https://nflfastr.com/}
  \item \url{https://github.com/nflverse/nflfastR}
  \item Report bugs at \url{https://github.com/nflverse/nflfastR/issues}
}

}
\author{
\strong{Maintainer}: Ben Baldwin \email{bbaldwin206@gmail.com}

Authors:
\itemize{
  \item Sebastian Carl \email{mrcaseb@gmail.com}
}

Other contributors:
\itemize{
  \item Lee Sharpe [contributor]
  \item Maksim Horowitz \email{maksim.horowitz@gmail.com} [contributor]
  \item Ron Yurko \email{ryurko@stat.cmu.edu} [contributor]
  \item Samuel Ventura \email{samventura22@gmail.com} [contributor]
  \item Tan Ho [contributor]
  \item John Edwards \email{edwards1860@gmail.com} [contributor]
}

}


================================================
FILE: man/reexports.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nflfastR-package.R
\docType{import}
\name{reexports}
\alias{reexports}
\alias{load_pbp}
\alias{load_player_stats}
\alias{load_team_stats}
\alias{load_schedules}
\alias{load_rosters}
\alias{nflverse_sitrep}
\alias{most_recent_season}
\title{Objects exported from other packages}
\keyword{internal}
\description{
These objects are imported from other packages. Follow the links
below to see their documentation.

\describe{
  \item{nflreadr}{\code{\link[nflreadr]{load_pbp}}, \code{\link[nflreadr]{load_player_stats}}, \code{\link[nflreadr]{load_rosters}}, \code{\link[nflreadr]{load_schedules}}, \code{\link[nflreadr]{load_team_stats}}, \code{\link[nflreadr:latest_season]{most_recent_season}}, \code{\link[nflreadr:sitrep]{nflverse_sitrep}}}
}}


================================================
FILE: man/report.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/report.R
\name{report}
\alias{report}
\title{Get a Situation Report on System, nflverse Package Versions and Dependencies}
\usage{
report(...)
}
\arguments{
\item{...}{
  Arguments passed on to \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}}
  \describe{
    \item{\code{pkg}}{a character vector naming installed packages, or \code{NULL}
(the default) meaning all nflverse packages. The function checks internally
if all packages are installed and informs if that is not the case.}
    \item{\code{recursive}}{a logical indicating whether dependencies of \code{pkg} and their
dependencies (and so on) should be included.
Can also be a character vector listing the types of dependencies, a subset
of \code{c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances")}.
Character string \code{"all"} is shorthand for that vector, character string
\code{"most"} for the same vector without \code{"Enhances"}, character string \code{"strong"}
(default) for the first three elements of that vector.}
    \item{\code{redact_path}}{a logical indicating whether options that contain "path"
in the name should be redacted, default = TRUE}
  }}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

This function was deprecated. Please use \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}}.

This function gives a quick overview of the versions of R and
the operating system as well as the versions of nflverse packages, options,
and their dependencies. It's primarily designed to help you get a quick
idea of what's going on when you're helping someone else debug a problem.
}
\details{
See \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}} for details.
}
\examples{
\donttest{
\dontshow{
# set CRAN mirror to avoid failing checks in weird scenarios
old_ops <- options(repos = c("CRAN" = "https://cran.rstudio.com/"))
}

# report(recursive = FALSE)
nflverse_sitrep(pkg = "nflreadr", recursive = TRUE)

\dontshow{
# restore old options
options(old_ops)
}
}
}
\keyword{internal}


================================================
FILE: man/save_raw_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/save_raw_pbp.R
\name{save_raw_pbp}
\alias{save_raw_pbp}
\title{Download Raw PBP Data to Local Filesystem}
\usage{
save_raw_pbp(
  game_ids,
  dir = getOption("nflfastR.raw_directory", default = NULL)
)
}
\arguments{
\item{game_ids}{A vector of nflverse game IDs.}

\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory").
nflfastR will download the raw game files split by season into one sub
directory per season.}
}
\value{
The function returns a data frame with one row for each downloaded file and
the following columns:
\itemize{
\item \code{success} if the HTTP request was successfully performed, regardless of the
response status code. This is \code{FALSE} in case of a network error, or in case
you tried to resume from a server that did not support this. A value of \code{NA}
means the download was interrupted while in progress.
\item \code{status_code} the HTTP status code from the request. A successful download is
usually \code{200} for full requests or \code{206} for resumed requests. Anything else
could indicate that the downloaded file contains an error page instead of the
requested content.
\item \code{resumefrom} the file size before the request, in case a download was resumed.
\item \code{url} final url (after redirects) of the request.
\item \code{destfile} downloaded file on disk.
\item \code{error} if \code{success == FALSE} this column contains an error message.
\item \code{type} the \code{Content-Type} response header value.
\item \code{modified} the \code{Last-Modified} response header value.
\item \code{time} total elapsed download time for this file in seconds.
\item \code{headers} vector with http response headers for the request.
}
}
\description{
The functions \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} and \code{\link[=fast_scraper]{fast_scraper()}} support loading
raw pbp data from local file systems instead of GitHub servers.
This function is intended to help set this up. It loads raw pbp data
and saves it in the given directory split by season in subdirectories.
}
\examples{
\donttest{
# CREATE LOCAL TEMP DIRECTORY
local_dir <- tempdir()

# LOAD AND SAVE A GAME TO TEMP DIRECTORY
save_raw_pbp("2021_20_BUF_KC", dir = local_dir)

# REMOVE THE DIRECTORY
unlink(file.path(local_dir, 2021))
}
}
\seealso{
\code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, \code{\link[=missing_raw_pbp]{missing_raw_pbp()}}
}


================================================
FILE: man/stat_ids.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{stat_ids}
\alias{stat_ids}
\title{NFL Stat IDs and their Meanings}
\format{
A data frame including NFL stat IDs, names and descriptions used in
an nflfastR dataset.
}
\source{
\url{http://www.nflgsis.com/gsis/Documentation/Partners/StatIDs.html}
}
\usage{
stat_ids
}
\description{
NFL Stat IDs and their Meanings
}
\examples{
\donttest{
stat_ids
}
}
\keyword{datasets}


================================================
FILE: man/teams_colors_logos.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{teams_colors_logos}
\alias{teams_colors_logos}
\title{NFL Team names, colors and logo urls.}
\format{
A data frame with 36 rows and 15 variables containing NFL team level
information, including franchises in multiple cities:
\describe{
\item{team_abbr}{Team abbreviation}
\item{team_name}{Complete Team name}
\item{team_id}{Team id used in the roster function}
\item{team_nick}{Nickname}
\item{team_conf}{Conference}
\item{team_division}{Division}
\item{team_color}{Primary color}
\item{team_color2}{Secondary color}
\item{team_color3}{Tertiary color}
\item{team_color4}{Quaternary color}
\item{team_logo_wikipedia}{Url to Team logo on wikipedia}
\item{team_logo_espn}{Url to higher quality logo on espn}
\item{team_wordmark}{Url to team wordmarks}
\item{team_conference_logo}{Url to AFC and NFC logos}
\item{team_league_logo}{Url to NFL logo}
}
The primary and secondary colors have been taken from nfl.com with some modifications
for better team distinction and most recent team color themes.
The tertiary and quaternary colors are taken from Lee Sharpe's teamcolors.csv,
who took them from the \code{teamcolors} package created by Ben Baumer and
Gregory Matthews. The Wikipedia logo urls are taken from Lee Sharpe's logos.csv.
Team wordmarks are taken from nfl.com.
}
\usage{
teams_colors_logos
}
\description{
NFL Team names, colors and logo urls.
}
\examples{
\donttest{
teams_colors_logos
}
}
\keyword{datasets}


================================================
FILE: man/update_db.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_database_functions.R
\name{update_db}
\alias{update_db}
\title{Update or Create a nflfastR Play-by-Play Database}
\usage{
update_db(
  dbdir = getOption("nflfastR.dbdirectory", default = "."),
  dbname = "pbp_db",
  tblname = "nflfastR_pbp",
  force_rebuild = FALSE,
  db_connection = NULL
)
}
\arguments{
\item{dbdir}{Directory in which the database is or shall be located. Can also
be set globally with \code{options(nflfastR.dbdirectory)}}

\item{dbname}{File name of an existing or desired SQLite database within \code{dbdir}}

\item{tblname}{The name of the play by play data table within the database}

\item{force_rebuild}{Hybrid parameter (logical or numeric) to rebuild parts
of or the complete play by play data table within the database (please see details for further information)}

\item{db_connection}{A \code{DBIConnection} object, as returned by
\code{\link[DBI:dbConnect]{DBI::dbConnect()}} (please see details for further information)}
}
\description{
\code{update_db} updates or creates a database with \code{nflfastR}
play by play data of all completed games since 1999.
}
\details{
This function creates and updates a data table with the name \code{tblname}
within a SQLite database (other drivers via \code{db_connection}) located in
\code{dbdir} and named \code{dbname}.
The data table combines all play by play data for every available game back
to the 1999 season and adds the most recent completed games as soon as they
are available for \code{nflfastR}.

The argument \code{force_rebuild} is of hybrid type. It can rebuild the play
by play data table either for the whole nflfastR era (with \code{force_rebuild = TRUE})
or just for specified seasons (e.g. \code{force_rebuild = c(2019, 2020)}).
Please note the following behavior:
\itemize{
\item \code{force_rebuild = TRUE}: The data table with the name \code{tblname}
will be removed completely and rebuilt from scratch. This is helpful when
new columns are added during the Off-Season.
\item \code{force_rebuild = c(2019, 2020)}: The data table with the name \code{tblname}
will be preserved and only rows from the 2019 and 2020 seasons will be
deleted and re-added. This is intended to be used for ongoing seasons because
the NFL fixes bugs in the underlying data during the week and we recommend
rebuilding the current season every Thursday during the season.
}

The parameter \code{db_connection} is intended for advanced users who want
to use other DBI drivers, such as MariaDB, Postgres or odbc. Please note that
the arguments \code{dbdir} and \code{dbname} are dropped in case a \code{db_connection}
is provided but the argument \code{tblname} will still be used to write the
data table into the database.
}
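\examples{
\dontrun{
# A minimal sketch, not part of the generated documentation. It assumes
# that the RSQLite package is installed and that an internet connection is
# available. The directory, file names and seasons are illustrative only.
db_dir <- tempdir()

# create the database file pbp_db in db_dir, or add newly completed games
# if the table already exists
update_db(dbdir = db_dir)

# keep the table but delete and re-add the rows of the given seasons
update_db(dbdir = db_dir, force_rebuild = c(2019, 2020))

# remove the table completely and rebuild it from scratch
update_db(dbdir = db_dir, force_rebuild = TRUE)

# pass an existing DBI connection; dbdir and dbname are dropped in this
# case, while tblname is still used to name the table
con <- DBI::dbConnect(RSQLite::SQLite(), file.path(db_dir, "custom_db"))
update_db(db_connection = con, tblname = "nflfastR_pbp")
DBI::dbDisconnect(con)
}
}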


================================================
FILE: man/update_pbp_db.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/database.R
\name{update_pbp_db}
\alias{update_pbp_db}
\title{Update or Create a nflverse Play-by-Play Data Table in a Connected Database}
\usage{
update_pbp_db(conn, ..., name = "nflverse_pbp", seasons = most_recent_season())
}
\arguments{
\item{conn}{A \code{DBIConnection} object, as returned by \code{\link[DBI:dbConnect]{DBI::dbConnect()}}}

\item{...}{These dots are for future extensions and must be empty.}

\item{name}{The table name, passed on to \code{\link[DBI:dbQuoteIdentifier]{dbQuoteIdentifier()}}. Options are:
\itemize{
\item a character string with the unquoted DBMS table name,
e.g. \code{"table_name"},
\item a call to \code{\link[DBI:Id]{Id()}} with components to the fully qualified table name,
e.g. \code{Id(schema = "my_schema", table = "table_name")}
\item a call to \code{\link[DBI:SQL]{SQL()}} with the quoted and fully qualified table name
given verbatim, e.g. \code{SQL('"my_schema"."table_name"')}
}}

\item{seasons}{Hybrid argument (logical or numeric) to update parts
of or the complete play by play table within the database.

It can update the play by play data table either for the whole nflfastR era
(with \code{seasons = TRUE}) or just for specified seasons
(e.g. \code{seasons = 2024:2025}).

Defaults to \link{most_recent_season}. Please see details for further information.}
}
\value{
Always returns the database connection invisibly.
}
\description{
The nflfastR play-by-play era dates back to 1999. To analyze all the data
efficiently, there is practically no alternative to working with a database.

This function helps to create and maintain a table containing all
play-by-play data of the nflfastR era in a connected database.
Primarily, the preprocessed data from \link{load_pbp} is written to the database
and, if necessary, supplemented with the latest games using
\link{build_nflfastR_pbp}.
}
\details{
\subsection{The \code{seasons} argument}{

The \code{seasons} argument controls how the table in the connected database is
handled.

With \code{seasons = TRUE}, the table in argument \code{name} will be removed completely
(by calling \link[DBI:dbRemoveTable]{DBI::dbRemoveTable}) and all seasons of the nflfastR era will be
added to a fresh table. This is helpful when new columns are added during the
offseason.

With a numerical vector, e.g. \code{seasons = 2024:2025}, the table in argument
\code{name} will be preserved and only rows from the given seasons will be deleted
and re-added (by calling \link[DBI:dbAppendTable]{DBI::dbAppendTable}). This is intended to be used
for ongoing seasons because the NFL fixes bugs in the underlying data during
the week and we recommend rebuilding the current season every Thursday during
the season.

The default behavior is \code{seasons = most_recent_season()}, which means that
only the most recent season is updated or added.

To keep the table, and thus also the schema, but update all play-by-play
data of the nflfastR era, set

\if{html}{\out{<div class="sourceCode">}}\preformatted{seasons = seq(1999, most_recent_season())
}\if{html}{\out{</div>}}

If \code{seasons} contains multiple seasons, it is possible to control whether the
seasons are loaded individually and written to the database, or whether
multiple seasons should be processed in chunks. The latter is more efficient
because fewer write operations are required, but at the same time, the data
must first be stored in memory. The option \verb{"nflfastR.db_chunk_size"} can
be used to control how many seasons are loaded together in a chunk and
written to the database. With the following option, for example, 5 seasons
are always loaded together and written to the database.

\if{html}{\out{<div class="sourceCode">}}\preformatted{options("nflfastR.db_chunk_size" = 5L)
}\if{html}{\out{</div>}}
}
}
\examples{
\donttest{
con <- DBI::dbConnect(duckdb::duckdb())
try({# to avoid CRAN test problems
update_pbp_db(con, seasons = 2024)
})
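
# A hedged sketch of the chunking option described in the details above:
# load two seasons together and write them to the table as one chunk.
# The chunk size and the seasons are illustrative assumptions, not
# recommendations. The connection con from above is reused.
old_opts <- options("nflfastR.db_chunk_size" = 2L)
try({# to avoid CRAN test problems
update_pbp_db(con, seasons = 2023:2024)
DBI::dbDisconnect(con)
})
# restore the previous option value
options(old_opts)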
}
}


================================================
FILE: nflfastR.Rproj
================================================
Version: 1.0
ProjectId: e1e14382-386c-49b3-9b3f-206a4cc98503

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace

UseNativePipeOperator: Yes


================================================
FILE: pkgdown/_pkgdown.yml
================================================
url: https://nflfastr.com/

template:
  bootstrap: 5
  light-switch: true
  bslib:
    font_scale: 1.1
    base_font: {google: "Roboto"}
    heading_font: {google: "Kanit"}
    code_font: {google: "Fira Code"}
  opengraph:
    image:
      src: man/figures/card.png
      alt: "nflfastR social preview card"
    twitter:
      site: "@nflfastR"
      card: summary_large_image

toc:
  depth: 3

authors:
  Sebastian Carl:
    href: https://mrcaseb.com
  Ben Baldwin:
    href: https://bsky.app/profile/rbsdm.com
  Lee Sharpe:
    href: https://twitter.com/LeeSharpeNFL
  Maksim Horowitz:
    href: https://twitter.com/bklynmaks
  Ron Yurko:
    href: https://twitter.com/Stat_Ron
  Samuel Ventura:
    href: https://twitter.com/stat_sam
  Tan Ho:
    href: https://tanho.ca
  John Edwards:
    href: https://johnbedwards.io
home:
  title: An R package to quickly obtain clean and tidy NFL play by play data
  links:
  - text: nflverse Discord Chat
    href: https://discord.gg/5Er2FBnnQa
  - text: nflfastR Beginner's Guide
    href: articles/beginners_guide.html
  - text: nflfastR stats landing page
    href: https://rbsdm.com/stats/
  - text: Lee Sharpe's nfl game data
    href: https://nflgamedata.com

navbar:
  bg: dark
  type: light
  structure:
    left:  [home, intro, reference, news, articles]
    right: [search, lightswitch, stats, games, discord, github, more]
  components:
    games:
      icon: "fas fa-football-ball fa-lg"
      href: http://nflgamedata.com/
      aria-label: Games
    stats:
      icon: "fas fa-chart-line fa-lg"
      href: https://rbsdm.com/stats/
      aria-label: Stats
    reference:
      text: "Functions"
      href: reference/index.html
    discord:
      icon: "fab fa-discord fa-lg"
      href: https://discord.com/invite/5Er2FBnnQa
      aria-label: Discord
    articles:
      text: "Articles"
      menu:
      - text: A beginner’s guide to nflfastR
        href: articles/beginners_guide.html
      - text: Field Descriptions
        href: articles/field_descriptions.html
      - text: Stats Variable Descriptions
        href: articles/stats_variables.html
      - text: nflfastR models
        href: https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/
      - text: Open Source Football
        href: https://www.opensourcefootball.com/
    more:
      text: "Packages & More"
      menu:
        - text: "nflverse Packages"
        - text: nflfastR
          href: https://nflfastr.com
        - text: nflseedR
          href: https://nflseedr.com
        - text: nfl4th
          href: https://www.nfl4th.com
        - text: nflreadr
          href: https://nflreadr.nflverse.com/
        - text: nflplotR
          href: https://nflplotr.nflverse.com/
        - text: nflverse
          href: https://nflverse.nflverse.com/
        - text: "Open Source Football"
          href: https://www.opensourcefootball.com
        - text: "nflverse Data"
        - text: nflverse GitHub
          href: https://github.com/nflverse
        - text: ffverse
        - text: "ffverse.com"
          href: https://www.ffverse.com
reference:
  - title: Main Functions
    contents:
      - build_nflfastR_pbp
      - update_db
      - update_pbp_db
  - title: Load Functions
    desc: >
      These functions access precomputed data using the nflreadr package.
      See <https://nflreadr.nflverse.com> for info and more data load functions.
    contents:
      - reexports
  - title: Utility Functions
    contents:
      - save_raw_pbp
      - missing_raw_pbp
      - starts_with("calculate_")
  - title: Documentation
    contents:
      - nflfastR-package
      - teams_colors_logos
      - field_descriptions
      - stat_ids
      - nfl_stats_variables
  - title: Lower Level Functions
    desc: >
      These functions are wrapped in the above listed main functions and
      typically not used by the end user.
    contents:
      - fast_scraper
      - add_qb_epa
      - add_xpass
      - add_xyac
      - clean_pbp
      - decode_player_ids
  - title: Deprecated
    desc: 'These functions are no longer recommended for use, see nflreadr for latest versions.'
    contents:
      - fast_scraper_roster
      - fast_scraper_schedules
      - report


================================================
FILE: pkgdown/extra.css
================================================
/*
Check: https://www.w3schools.com/css/css_rwd_mediaqueries.asp
for Responsive Web Design - Media Queries
*/
.row > main {
  max-width: 100%;
}

@media only screen and (min-width: 640px) {
  main + .col-md-3 {
    margin-left: unset;
    padding-left: 5rem;
    max-width: 75%;
  }
}

h4.author,h4.date {
  padding-top:0px;
  margin-top:0px;
}

.navbar-brand {
  font-weight: 300;
  font-size: 1.5rem;
  font-family: 'Kanit', sans-serif;
}

.me-auto {
    color: #009E8D !important;
}

/*
from gt custom css
draws lines between function names on reference page
*/

dt {
  text-decoration: underline;
  text-decoration-style: solid;
  text-underline-offset: 4px;
  font-family: monospace;
  border-top-style: dotted;
  border-top-width: 1px;
  border-top-color: gray;
  margin-bottom: 5px;
  padding-top: 5px;
}

.active .nav-link {
  color: #F85714 !important;
}


================================================
FILE: tests/testthat/_snaps/build_nflfastR_pbp.md
================================================
# default_play is synced with build_nflfastR_pbp

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["play_id", "game_id", "old_game_id", "home_team", "away_team", "season_type", "week", "posteam", "posteam_type", "defteam", "side_of_field", "yardline_100", "game_date", "quarter_seconds_remaining", "half_seconds_remaining", "game_seconds_remaining", "game_half", "quarter_end", "drive", "sp", "qtr", "down", "goal_to_go", "time", "yrdln", "ydstogo", "ydsnet", "desc", "play_type", "yards_gained", "shotgun", "no_huddle", "qb_dropback", "qb_kneel", "qb_spike", "qb_scramble", "pass_length", "pass_location", "air_yards", "yards_after_catch", "run_location", "run_gap", "field_goal_result", "kick_distance", "extra_point_result", "two_point_conv_result", "home_timeouts_remaining", "away_timeouts_remaining", "timeout", "timeout_team", "td_team", "td_player_name", "td_player_id", "posteam_timeouts_remaining", "defteam_timeouts_remaining", "total_home_score", "total_away_score", "posteam_score", "defteam_score", "score_differential", "posteam_score_post", "defteam_score_post", "score_differential_post", "no_score_prob", "opp_fg_prob", "opp_safety_prob", "opp_td_prob", "fg_prob", "safety_prob", "td_prob", "extra_point_prob", "two_point_conversion_prob", "ep", "epa", "total_home_epa", "total_away_epa", "total_home_rush_epa", "total_away_rush_epa", "total_home_pass_epa", "total_away_pass_epa", "air_epa", "yac_epa", "comp_air_epa", "comp_yac_epa", "total_home_comp_air_epa", "total_away_comp_air_epa", "total_home_comp_yac_epa", "total_away_comp_yac_epa", "total_home_raw_air_epa", "total_away_raw_air_epa", "total_home_raw_yac_epa", "total_away_raw_yac_epa", "wp", "def_wp", "home_wp", "away_wp", "wpa", "vegas_wpa", "vegas_home_wpa", "home_wp_post", "away_wp_post", "vegas_wp", "vegas_home_wp", "total_home_rush_wpa", "total_away_rush_wpa", "total_home_pass_wpa", "total_away_pass_wpa", "air_wpa", "yac_wpa", "comp_air_wpa", "comp_yac_wpa", "total_home_comp_air_wpa", "total_away_comp_air_wpa", "total_home_comp_yac_wpa", "total_away_comp_yac_wpa", 
"total_home_raw_air_wpa", "total_away_raw_air_wpa", "total_home_raw_yac_wpa", "total_away_raw_yac_wpa", "punt_blocked", "first_down_rush", "first_down_pass", "first_down_penalty", "third_down_converted", "third_down_failed", "fourth_down_converted", "fourth_down_failed", "incomplete_pass", "touchback", "interception", "punt_inside_twenty", "punt_in_endzone", "punt_out_of_bounds", "punt_downed", "punt_fair_catch", "kickoff_inside_twenty", "kickoff_in_endzone", "kickoff_out_of_bounds", "kickoff_downed", "kickoff_fair_catch", "fumble_forced", "fumble_not_forced", "fumble_out_of_bounds", "solo_tackle", "safety", "penalty", "tackled_for_loss", "fumble_lost", "own_kickoff_recovery", "own_kickoff_recovery_td", "qb_hit", "rush_attempt", "pass_attempt", "sack", "touchdown", "pass_touchdown", "rush_touchdown", "return_touchdown", "extra_point_attempt", "two_point_attempt", "field_goal_attempt", "kickoff_attempt", "punt_attempt", "fumble", "complete_pass", "assist_tackle", "lateral_reception", "lateral_rush", "lateral_return", "lateral_recovery", "passer_player_id", "passer_player_name", "passing_yards", "receiver_player_id", "receiver_player_name", "receiving_yards", "rusher_player_id", "rusher_player_name", "rushing_yards", "lateral_receiver_player_id", "lateral_receiver_player_name", "lateral_receiving_yards", "lateral_rusher_player_id", "lateral_rusher_player_name", "lateral_rushing_yards", "lateral_sack_player_id", "lateral_sack_player_name", "interception_player_id", "interception_player_name", "lateral_interception_player_id", "lateral_interception_player_name", "punt_returner_player_id", "punt_returner_player_name", "lateral_punt_returner_player_id", "lateral_punt_returner_player_name", "kickoff_returner_player_name", "kickoff_returner_player_id", "lateral_kickoff_returner_player_id", "lateral_kickoff_returner_player_name", "punter_player_id", "punter_player_name", "kicker_player_name", "kicker_player_id", "own_kickoff_recovery_player_id", 
"own_kickoff_recovery_player_name", "blocked_player_id", "blocked_player_name", "tackle_for_loss_1_player_id", "tackle_for_loss_1_player_name", "tackle_for_loss_2_player_id", "tackle_for_loss_2_player_name", "qb_hit_1_player_id", "qb_hit_1_player_name", "qb_hit_2_player_id", "qb_hit_2_player_name", "forced_fumble_player_1_team", "forced_fumble_player_1_player_id", "forced_fumble_player_1_player_name", "forced_fumble_player_2_team", "forced_fumble_player_2_player_id", "forced_fumble_player_2_player_name", "solo_tackle_1_team", "solo_tackle_2_team", "solo_tackle_1_player_id", "solo_tackle_2_player_id", "solo_tackle_1_player_name", "solo_tackle_2_player_name", "assist_tackle_1_player_id", "assist_tackle_1_player_name", "assist_tackle_1_team", "assist_tackle_2_player_id", "assist_tackle_2_player_name", "assist_tackle_2_team", "assist_tackle_3_player_id", "assist_tackle_3_player_name", "assist_tackle_3_team", "assist_tackle_4_player_id", "assist_tackle_4_player_name", "assist_tackle_4_team", "tackle_with_assist", "tackle_with_assist_1_player_id", "tackle_with_assist_1_player_name", "tackle_with_assist_1_team", "tackle_with_assist_2_player_id", "tackle_with_assist_2_player_name", "tackle_with_assist_2_team", "pass_defense_1_player_id", "pass_defense_1_player_name", "pass_defense_2_player_id", "pass_defense_2_player_name", "fumbled_1_team", "fumbled_1_player_id", "fumbled_1_player_name", "fumbled_2_player_id", "fumbled_2_player_name", "fumbled_2_team", "fumble_recovery_1_team", "fumble_recovery_1_yards", "fumble_recovery_1_player_id", "fumble_recovery_1_player_name", "fumble_recovery_2_team", "fumble_recovery_2_yards", "fumble_recovery_2_player_id", "fumble_recovery_2_player_name", "sack_player_id", "sack_player_name", "half_sack_1_player_id", "half_sack_1_player_name", "half_sack_2_player_id", "half_sack_2_player_name", "return_team", "return_yards", "penalty_team", "penalty_player_id", "penalty_player_name", "penalty_yards", "replay_or_challenge", 
"replay_or_challenge_result", "penalty_type", "defensive_two_point_attempt", "defensive_two_point_conv", "defensive_extra_point_attempt", "defensive_extra_point_conv", "safety_player_name", "safety_player_id", "season", "cp", "cpoe", "series", "series_success", "series_result", "order_sequence", "start_time", "time_of_day", "stadium", "weather", "nfl_api_id", "play_clock", "play_deleted", "play_type_nfl", "special_teams_play", "st_play_type", "end_clock_time", "end_yard_line", "fixed_drive", "fixed_drive_result", "drive_real_start_time", "drive_play_count", "drive_time_of_possession", "drive_first_downs", "drive_inside20", "drive_ended_with_score", "drive_quarter_start", "drive_quarter_end", "drive_yards_penalized", "drive_start_transition", "drive_end_transition", "drive_game_clock_start", "drive_game_clock_end", "drive_start_yard_line", "drive_end_yard_line", "drive_play_id_started", "drive_play_id_ended", "away_score", "home_score", "location", "result", "total", "spread_line", "total_line", "div_game", "roof", "surface", "temp", "wind", "home_coach", "away_coach", "stadium_id", "game_stadium", "aborted_play", "success", "passer", "passer_jersey_number", "rusher", "rusher_jersey_number", "receiver", "receiver_jersey_number", "pass", "rush", "first_down", "special", "play", "passer_id", "rusher_id", "receiver_id", "name", "jersey_number", "id", "fantasy_player_name", "fantasy_player_id", "fantasy", "fantasy_id", "out_of_bounds", "home_opening_kickoff", "qb_epa", "xyac_epa", "xyac_mean_yardage", "xyac_median_yardage", "xyac_success", "xyac_fd", "xpass", "pass_oe"]
        }
      },
      "value": ["numeric", "character", "character", "character", "character", "character", "integer", "character", "character", "character", "character", "numeric", "character", "numeric", "numeric", "numeric", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "numeric", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "numeric", "character", "character", "character", "numeric", "character", "character", "numeric", "numeric", "numeric", "character", "character", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "character", 
"character", "numeric", "character", "character", "numeric", "character", "character", "numeric", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "numeric", "numeric", "character", "character", "numeric", "numeric", "numeric", "numeric", "character", "character", "integer", "numeric", "numeric", "numeric", "numeric", "character", "numeric", "character", "character", "character", "character", "character", "character", "numeric", "character", "numeric", "character", "character", "character", "numeric", "character", "character", "numeric", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "character", "character", "character", "character", "numeric", "numeric", "integer", "integer", "character", "integer", "integer", "numeric", "numeric", "integer", "character", "character", "integer", 
"integer", "character", "character", "character", "character", "numeric", "numeric", "character", "integer", "character", "integer", "character", "integer", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "character", "character", "integer", "character", "character", "character", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"]
    }


================================================
FILE: tests/testthat/_snaps/stats/calculate_stats.md
================================================
# calculate_stats works

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "season_type", "recent_team", "games", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", 
"fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance_list", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
        }
      },
      "value": ["character", "character", "character", "character", "character", "character", "integer", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
    }

---

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "week", "season_type", "game_id", "team", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", 
"fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
        }
      },
      "value": ["character", "character", "character", "character", "character", "character", "integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
    }

---

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["season", "team", "season_type", "games", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "timeouts", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", 
"fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance_list", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards"]
        }
      },
      "value": ["integer", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer"]
    }

---

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["season", "week", "team", "season_type", "game_id", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "timeouts", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", 
"fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards"]
        }
      },
      "value": ["integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer"]
    }

---

    {
      "type": "character",
      "attributes": {
        "names": {
          "type": "character",
          "attributes": {},
          "value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "week", "season_type", "game_id", "team", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", 
"fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
        }
      },
      "value": ["character", "character", "character", "character", "character", "character", "integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
    }


================================================
FILE: tests/testthat/helpers.R
================================================
# sample games we'll use to check with
game_ids <- c("2025_01_KC_LAC", "2019_01_GB_CHI")

test_dir <- getwd()

pbp_cache <- tempfile("pbp_cache", fileext = ".rds")

load_test_pbp <- function(pbp = pbp_cache, dir = test_dir) {
  if (file.exists(pbp) && !is.null(dir)) {
    if (interactive()) {
      cli::cli_alert_info("Will return pbp from cache")
    }
    return(readRDS(pbp))
  }

  g <- readRDS(file.path(test_dir, "games.rds"))

  # build pbp for the sample games. Model output differs across machines, so
  # callers round it with round_double_to_digits() before comparing
  pbp_data <- build_nflfastR_pbp(game_ids, dir = dir, games = g)
  if (!is.null(dir)) {
    saveRDS(pbp_data, pbp)
  }
  pbp_data
}

save_test_object <- function(object) {
  obj_name <- deparse(substitute(object))
  tmp_file <- tempfile(obj_name, fileext = ".csv")
  modify_digits <- dplyr::mutate_if(object, is.numeric, signif, digits = 3)
  data.table::fwrite(modify_digits, tmp_file, na = "NA")
  invisible(tmp_file)
}

load_expectation <- function(
  type = c("pbp", "sc", "sc_weekly", "ep", "wp"),
  dir = test_dir
) {
  type <- match.arg(type)
  file_name <- switch(
    type,
    "pbp" = "expected_pbp.rds",
    "sc" = "expected_sc.rds",
    "sc_weekly" = "expected_sc_weekly.rds",
    "ep" = "expected_ep.rds",
    "wp" = "expected_wp.rds",
  )
  strip_nflverse_attributes(readRDS(file.path(dir, file_name))) |>
    # we gotta round floating point numbers because of different model output
    # across platforms
    round_double_to_digits()
}

# strip nflverse attributes for tests because timestamp and version cause failures
# .internal.selfref is a data.table attribute that is not necessary in this case
strip_nflverse_attributes <- function(df) {
  input_attrs <- names(attributes(df))
  input_remove <- input_attrs[grepl(
    "nflverse|.internal.selfref|nflfastR",
    input_attrs
  )]
  attributes(df)[input_remove] <- NULL
  df
}

round_double_to_digits <- function(df, digits = 3) {
  dplyr::mutate(
    df,
    dplyr::across(
      .cols = relevant_variables(),
      .fns = function(vec) {
        formatC(vec, digits = digits, format = "fg") |>
          as.numeric() |>
          suppressWarnings()
      }
    )
  )
}

relevant_variables <- function() {
  c(
    dplyr::any_of(c(
      "no_score_prob",
      "opp_fg_prob",
      "opp_safety_prob",
      "opp_td_prob",
      "fg_prob",
      "safety_prob",
      "td_prob",
      "ep",
      "cp",
      "cpoe",
      "pass_oe",
      "xpass"
    )),
    dplyr::ends_with("epa"),
    dplyr::ends_with("wp"),
    dplyr::ends_with("wp_post"),
    dplyr::ends_with("wpa"),
    dplyr::starts_with("xyac")
  )
}


================================================
FILE: tests/testthat/test-build_nflfastR_pbp.R
================================================
test_that("build_nflfastR_pbp works (local data)", {
  # This test used to run on CRAN, but CRAN's changes to env vars (which cause
  # check NOTEs for multi-threading) forced us to skip it there. It uses locally
  # available data, so it can't break because of failed downloads.
  # UPDATE Feb 2026: we'll try testing on CRAN again
  # skip_on_cran()

  pbp <- load_test_pbp(dir = test_dir)
  expect_s3_class(pbp, "nflverse_data")
  pbp <- strip_nflverse_attributes(pbp) |>
    # we gotta round floating point numbers because of different model output
    # across platforms
    round_double_to_digits()
  exp <- load_expectation("pbp")
  expect_equal(pbp, exp)
})

test_that("build_nflfastR_pbp works (outside CRAN)", {
  # this test is almost the same as above. However, it requires data download
  # and will therefore not run on CRAN but everywhere else.
  skip_on_cran()

  skip_if_offline("github.com")
  pbp <- load_test_pbp(dir = NULL)
  pbp <- strip_nflverse_attributes(pbp) |>
    # we gotta round floating point numbers because of different model output
    # across platforms
    round_double_to_digits()
  exp <- load_expectation("pbp")
  expect_equal(pbp, exp)
})

test_that("default_play is synced with build_nflfastR_pbp", {
  # `default_play` is a table of 1 row that is supposed to match the
  # output structure of build_nflfastR_pbp. It is used to initialize the
  # data table in pbp DBs.
  # This test makes sure that it is synced with build_nflfastR_pbp

  exp <- load_expectation("pbp")

  names_and_types_exp <- vapply(exp, class, FUN.VALUE = character(1L))
  names_and_types_def <- vapply(default_play, class, FUN.VALUE = character(1L))

  expect_identical(names_and_types_def, names_and_types_exp)
  expect_snapshot_value(names_and_types_def, style = "json2")
})


================================================
FILE: tests/testthat/test-calculate_series_conversion_rates.R
================================================
test_that("calculate_series_conversion_rates works", {
  # This test used to run on CRAN, but CRAN's changes to env vars (which cause
  # check NOTEs for multi-threading) forced us to skip it there.
  skip_on_cran()

  pbp <- load_test_pbp()

  sc <- calculate_series_conversion_rates(pbp = pbp, weekly = FALSE) |>
    round_double_to_digits()
  sc_weekly <- calculate_series_conversion_rates(pbp = pbp, weekly = TRUE) |>
    round_double_to_digits()

  exp_sc <- load_expectation("sc")
  exp_sc_weekly <- load_expectation("sc_weekly")

  expect_s3_class(sc, "tbl_df")
  expect_s3_class(sc_weekly, "tbl_df")

  expect_equal(sc, exp_sc)
  expect_equal(sc_weekly, exp_sc_weekly)
})


================================================
FILE: tests/testthat/test-calculate_stats.R
================================================
test_that("calculate_stats works", {
  skip_on_cran()
  skip_if_offline("github.com")

  s1 <- calculate_stats(
    seasons = 2023,
    summary_level = "season",
    stat_type = "player"
  )
  s2 <- calculate_stats(
    seasons = 2023,
    summary_level = "week",
    stat_type = "player"
  )
  s3 <- calculate_stats(
    seasons = 2023,
    summary_level = "season",
    stat_type = "team"
  )
  s4 <- calculate_stats(
    seasons = 2023,
    summary_level = "week",
    stat_type = "team"
  )
  s5 <- calculate_stats(
    seasons = 2023,
    summary_level = "week",
    stat_type = "player",
    season_type = "POST"
  )

  names_and_types_s1 <- vapply(s1, class, FUN.VALUE = character(1L))
  names_and_types_s2 <- vapply(s2, class, FUN.VALUE = character(1L))
  names_and_types_s3 <- vapply(s3, class, FUN.VALUE = character(1L))
  names_and_types_s4 <- vapply(s4, class, FUN.VALUE = character(1L))
  names_and_types_s5 <- vapply(s5, class, FUN.VALUE = character(1L))

  var_names <- nflfastR::nfl_stats_variables$variable

  # Make sure variable names are listed in nflfastR::nfl_stats_variables$variable
  expect_in(names(names_and_types_s1), var_names)
  expect_in(names(names_and_types_s2), var_names)
  expect_in(names(names_and_types_s3), var_names)
  expect_in(names(names_and_types_s4), var_names)
  expect_in(names(names_and_types_s5), var_names)

  # Weak row number test
  expect_gt(nrow(s1), 1900)
  expect_gt(nrow(s2), 17500)
  expect_identical(nrow(s3), 32L)
  expect_gt(nrow(s4), 500)
  expect_gt(nrow(s5), 800)

  # Snapshot variable types and names
  expect_snapshot_value(names_and_types_s1, style = "json2", variant = "stats")
  expect_snapshot_value(names_and_types_s2, style = "json2", variant = "stats")
  expect_snapshot_value(names_and_types_s3, style = "json2", variant = "stats")
  expect_snapshot_value(names_and_types_s4, style = "json2", variant = "stats")
  expect_snapshot_value(names_and_types_s5, style = "json2", variant = "stats")
})

test_that("calculate_stats works with pbp subsets", {
  skip_on_cran()
  skip_if_offline("github.com")

  pbp <- load_pbp(2024) |>
    dplyr::filter(week <= 2, grepl("LAC", game_id))
  s <- calculate_stats(summary_level = "week", stat_type = "player", pbp = pbp)

  # Weak row number test
  expect_lt(nrow(s), 130)

  # week is filtered to <= 2 so stats should return only those weeks
  expect_in(unique(s$week), 1:2)

  # drop some required columns
  pbp_wrong <- pbp |> dplyr::mutate(qb_epa = NULL, play_type = NULL)
  expect_error(
    calculate_stats(pbp = pbp_wrong),
    regexp = 'missing the following required variables: "play_type" and "qb_epa"'
  )
})


================================================
FILE: tests/testthat/test-ep_wp_calculators.R
================================================
test_that("calculate_expected_points works", {
  # This test used to run on CRAN, but CRAN's changes to env vars (which cause
  # check NOTEs for multi-threading) forced us to skip it there.
  skip_on_cran()

  data <- tibble::tibble(
    "season" = 2018:2019,
    "home_team" = "SEA",
    "posteam" = "SEA",
    "roof" = "outdoors",
    "half_seconds_remaining" = 1800,
    "yardline_100" = 75,
    "down" = 1,
    "ydstogo" = 10,
    "posteam_timeouts_remaining" = 3,
    "defteam_timeouts_remaining" = 3
  )
  ep <- calculate_expected_points(data) |> round_double_to_digits()
  exp <- load_expectation("ep")
  expect_equal(ep, exp)
})

test_that("calculate_win_probability works", {
  # This test used to run on CRAN, but CRAN's changes to env vars (which cause
  # check NOTEs for multi-threading) forced us to skip it there.
  skip_on_cran()

  data <- tibble::tibble(
    "receive_2h_ko" = 0,
    "home_team" = "SEA",
    "posteam" = "SEA",
    "score_differential" = 0,
    "half_seconds_remaining" = 1800,
    "game_seconds_remaining" = 3600,
    "spread_line" = c(1, 3, 4, 7, 14),
    "down" = 1,
    "ydstogo" = 10,
    "yardline_100" = 75,
    "posteam_timeouts_remaining" = 3,
    "defteam_timeouts_remaining" = 3
  )
  wp <- calculate_win_probability(data) |> round_double_to_digits()
  exp <- load_expectation("wp")
  expect_equal(wp, exp)
})


================================================
FILE: tests/testthat.R
================================================
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/tests.html
# * https://testthat.r-lib.org/reference/test_package.html#special-files

library(testthat)
library(nflfastR)

test_check("nflfastR")


================================================
FILE: tools/check.env
================================================
# Check for usage of more than two cores. We really need to do this
# because CRAN kept rejecting nflfastR.
# The check is not supported on Windows and keeps failing on Debian, so it's
# probably necessary to make sure it doesn't fail on Debian.
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_="2.5"
_R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_="2.5"


================================================
FILE: vignettes/.gitignore
================================================
*.html
*.R
pbp_db


================================================
FILE: vignettes/beginners_guide.Rmd
================================================
---
title: "A beginner's guide to nflfastR"
author: "Ben Baldwin"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

## Introduction

The following guide will assume you have R installed. I also highly recommend working in RStudio. If you need help getting those installed or are unfamiliar with how RStudio is laid out, [please see this section of Lee Sharpe's guide](https://github.com/leesharpe/nfldata/blob/master/RSTUDIO-INTRO.md#r-and-rstudio-introduction).

A quick word if you're new to programming: all of this is happening in R, so you need R installed on your computer to follow along. Make sure you write your code in a script (in RStudio, File --> New File --> R Script) so you can save your work and run multiple lines of code at once. To run code from a script, highlight what you want and press Ctrl + Enter (Cmd + Return on a Mac), or press the Run button at the top of the editor (see Lee's guide). If you don't highlight anything and press Ctrl + Enter, the line your cursor is on will run. As you go through your R journey, you might get stuck and have to google a bunch of things, but that's totally okay and normal. That's how I got started!

## Setup

First, you need to install the magic packages. You only need to run this step once on a given computer. You can type these commands directly into the RStudio console (look for the Console pane in RStudio) since you won't need to run them again.

### Install packages

``` {r eval = FALSE}
install.packages("tidyverse", type = "binary")
install.packages("ggrepel", type = "binary")
install.packages("nflreadr", type = "binary")
install.packages("nflplotR", type = "binary")
```

### Load packages

Okay, now here's the stuff you're going to want to start putting into your R script. The following loads `tidyverse`, which contains a lot of helper functions for working with data, and `ggrepel`, which helps with labeling points in figures, along with `nflreadr` (which allows one to quickly download `nflfastR` data, along with a lot of other data). Finally, `nflplotR` makes plotting easier.
``` {r, results = 'hide', message = FALSE }
library(tidyverse)
library(ggrepel)
library(nflreadr)
library(nflplotR)
```

This one is optional but makes R prefer not to display numbers in scientific notation, which I find very annoying:
``` {r}
options(scipen = 9999)
```
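
If you're curious what that option actually changes, here is a small sketch. It only uses base R's `format()`, and the variable names are made up for illustration. It temporarily switches back to R's default setting so the two styles can be compared side by side:

``` {r}
# remember the current setting, then switch to R's default of 0
old <- options(scipen = 0)
default_style <- format(100000)

# a large scipen penalizes scientific notation
options(scipen = 9999)
fixed_style <- format(100000)

# put back whatever was set before this chunk ran
options(old)

c(default_style, fixed_style)
```

The first value should come out in scientific notation and the second as a plain number.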

### Load data

This will load the full play by play for the 2019 season (including playoffs). We'll get to how to get more seasons later. Note that this is downloading pre-cleaned data from the nflfastR data repository using the `load_pbp()` function included in `nflreadr`, which is much faster than building pbp from scratch.

``` {r}
data <- load_pbp(2019)
```

## Basics: how to look at your data

### Dimensions

```{r echo=FALSE}
rows <- nrow(data)
cols <- ncol(data)
```

Before moving forward, here are a few ways to get a sense of what's in a dataframe. We can check the **dim**ensions of the data, and this tells us that there are ```r rows``` rows (i.e., plays) in the data and ```r cols``` columns (variables):

``` {r}
dim(data)
```

`str` displays the **str**ucture of the dataframe:
``` {r}
str(data[1:10])
```

In the above, I've added in the `[1:10]`, which selects only the first 10 columns, otherwise the list is extremely long (remember from above that there are ```r cols``` columns!). Normally, you would just type `str(data)`.

You can similarly take a glimpse at your data:

``` {r}
glimpse(data[1:10])
```

Where again I'm only showing the first 10 columns. The usual command would be `glimpse(data)`.

### Variable names

Another very useful command is to get the `names` of the variables in the data, which you would get by entering `names(data)` (I won't show the output here because, again, there are ```r cols``` columns).

That is a lot to work with!

### Viewer

One more way to look at your data is with the `View()` function. If you're coming from an Excel background, this will help you feel more at home as a way to see what's in the data.

``` {r eval = FALSE}
View(data)
```
This will open the viewer in RStudio in a new panel. Try it out yourself! Since there are so many columns, the Viewer won't show them all. To pick which columns to view, you can **select** some:

``` {r eval = FALSE}
data |>
  select(home_team, away_team, posteam, desc) |>
  View()
```

The `|>` thing lets you pipe together a bunch of different commands. So we're taking our data, "`select`"ing a few variables we want to look at, and then Viewing. Again, I can't display the results of that here, but try it out yourself!
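
To make the pipe a bit more concrete, here is a minimal sketch on a tiny made-up table (the `toy` data and its values are invented for illustration and are not real play-by-play). It shows that piping gives the same result as nesting the function calls inside each other:

``` {r}
library(dplyr)

# made-up stand-in for the play-by-play data
toy <- tibble(
  home_team = c("MIN", "MIN", "DAL"),
  away_team = c("ATL", "ATL", "NYG"),
  posteam = c("ATL", "MIN", "DAL"),
  desc = c("kickoff", "pass short right", "run left")
)

# piped version: read it top to bottom
piped <- toy |>
  select(posteam, desc) |>
  head(2)

# nested version: same steps, but read from the inside out
nested <- head(select(toy, posteam, desc), 2)

identical(piped, nested)
```

Both versions do the same work. The piped one is just easier to read once you chain more than a couple of steps.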

### Head + manipulation

To start, let's just look at the first few rows (the "head") of the data.

``` {r}
data |> 
  select(posteam, defteam, desc, rush, pass) |> 
  head()
```
A couple things. "`desc`" is the important variable that lists the description of what happened on the play, and `head` says to show the first few rows (the "head" of the data). Since this is already sorted by game, these are the first 6 rows from a week 1 game, ATL @ MIN. To make code easier to read, people often put each part of a pipe on a new line, which is useful when working with more complicated functions. We could run:

``` {r eval = FALSE}
data |> select(posteam, defteam, desc, rush, pass) |> head()
```

And it would return the exact same output as the one written out in multiple lines, but the code isn't as easy to read.

We've covered `select`, and the next important function to learn is `filter`, which lets you filter the data to what you want. The following returns only plays that are run plays or pass plays; i.e., no punts, kickoffs, field goals, or dead ball penalties (e.g. false starts) where we don't know what the attempted play was.

``` {r}
data |> 
  filter(rush == 1 | pass == 1) |>
  select(posteam, desc, rush, pass, name, passer, rusher, receiver) |> 
  head()
```

Compared to the first time we did this, the opening line for the start of the game, the kickoff, and the punt are now gone. Note that if you're checking whether a variable is equal to something, you need to use the double equals sign `==` like above. The technical reason is that a single `=` is already taken: R uses it for assignment and for naming function arguments. Also, the character `|` is used for "or", and `&` for "and". So `rush == 1 | pass == 1` means "rush or pass".
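
Here is a small sketch of how `|` and `&` behave inside `filter`, using a made-up four-row table (the `toy` data is invented for illustration, not taken from the real play-by-play):

``` {r}
library(dplyr)

# made-up plays: one kickoff, two dropbacks, one designed run
toy <- tibble(
  desc = c("kickoff", "pass", "run", "scramble"),
  rush = c(0, 0, 1, 0),
  pass = c(0, 1, 0, 1)
)

# "or": keep rows where at least one condition is true
either <- toy |> filter(rush == 1 | pass == 1)

# "and": keep rows where both conditions are true
both <- toy |> filter(rush == 1 & pass == 1)

nrow(either)
nrow(both)
```

In this toy table no play is flagged as both a rush and a pass, so the "and" version comes back empty while the "or" version drops only the kickoff.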

Note that the `rush`, `pass`, `name`, `passer`, `rusher`, and `receiver` columns are all `nflfastR` creations, where we have provided these to make working with the data easier. As we can see above, `passer` is filled in for all dropbacks (including sacks and scrambles, which also have `pass` = 1), and `name` is equal to the passer on pass plays and the rusher on rush plays. Think of this as the primary player involved on a play.

What if we wanted to view special teams plays? Again, we can use `filter`:

``` {r}
data |> 
  filter(special == 1) |>
  select(down, ydstogo, desc) |> 
  head()
```

Fourth down plays?

``` {r}
data |> 
  filter(down == 4) |>
  select(down, ydstogo, desc) |> 
  head()
```

Fourth down plays that aren't special teams plays?

``` {r}
data |> 
  filter(down == 4 & special == 0) |>
  select(down, ydstogo, desc) |> 
  head()
```

So far, we've just been taking a look at the initial dataset we downloaded, but none of our results are preserved. To save a new dataframe of just the plays we want, we need to use `<-` to assign a new dataframe. Let's save a new dataframe that's just run plays and pass plays with non-missing EPA, called `pbp_rp`.

``` {r}
pbp_rp <- data |>
  filter(rush == 1 | pass == 1, !is.na(epa))
```

In the above, `!is.na(epa)` means to exclude plays with missing (`NA`) EPA. The `!` symbol is often used by computer folk to negate something, so `is.na(epa)` means "EPA is missing" and `!is.na(epa)` means "EPA is not missing", which we have used above.
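
To see what `is.na` and `!` are doing, here is a tiny sketch on a made-up vector (the values are invented for illustration):

```r
# Made-up EPA values with one missing entry
epa <- c(0.5, NA, -1.2)

is.na(epa)    # TRUE where the value is missing
!is.na(epa)   # TRUE where the value is present

# Comparing with NA does not work as a test: the result is NA every time
epa == NA

# Keep only the non-missing values
epa[!is.na(epa)]
```

This is why the filter is written with `is.na` rather than something like `epa != NA`, which would not keep any rows.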

## Some basic stuff: Part 1

Okay, we have a big dataset where we call dropbacks pass plays and non-dropbacks rush plays. Now we actually want to, like, do stuff.

### Group by and Summarize

Let's take a look at how various Cowboys' running backs fared on run plays in 2019:

``` {r}
pbp_rp |>
	filter(posteam == "DAL", rush == 1) |>
	group_by(rusher) |>
	summarize(
	  mean_epa = mean(epa), success_rate = mean(success), ypc = mean(yards_gained), plays = n()
	  ) |>
	arrange(-mean_epa) |>
	filter(plays > 20)
```

There's a lot going on here. We've covered `filter` already. The `group_by` function is an *extremely* useful function that, well, groups by what you tell it -- in this case the rusher. `summarize` is useful for collapsing the data down to a summary of what you're looking at, and here, while grouping by player, we're summarizing the mean of EPA, success, yardage (a bad rushing stat, but since we're here), and getting the number of plays using `n()`, which returns the number of rows in a group. Unsurprisingly, Prescott was much more effective as a rusher in 2019 than the running backs, and there was no meaningful difference between Pollard and Elliott in efficiency.
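
Here is the same `group_by` and `summarize` pattern on a handful of made-up carries, small enough to check by hand (the player names and yardage are invented):

```r
library(dplyr)

# Five made-up carries for two made-up rushers
carries <- tibble(
  rusher = c("A.Back", "A.Back", "B.Runner", "B.Runner", "B.Runner"),
  yards_gained = c(4, 6, 1, 2, 9)
)

# One output row per rusher: mean yards and number of carries
by_rusher <- carries |>
  group_by(rusher) |>
  summarize(ypc = mean(yards_gained), plays = n())

by_rusher
```

Five rows go in and two rows come out, one per group: `A.Back` averages 5 yards on 2 carries and `B.Runner` averages 4 yards on 3 carries.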

If you check the [PFR team stats page](https://www.pro-football-reference.com/teams/dal/2019.htm), you'll notice that the above doesn't match up with the official stats. This is because `nflfastR` computes EPA and provides player names on plays with penalties and on two-point conversions. So to match the official stats, we need to restrict to `down <= 4` (to exclude two-point conversions, which have down listed as `NA`) and `play_type == "run"` (to exclude penalties, which have `play_type == "no_play"`):

``` {r}
pbp_rp |>
	filter(posteam == "DAL", down <= 4, play_type == 'run') |>
	group_by(rusher) |>
	summarize(
	  mean_epa = mean(epa), success_rate = mean(success), ypc=mean(yards_gained), plays=n()
	  ) |>
	filter(plays > 20)
```

Now we exactly match PFR: Zeke has 301 carries at 4.5 yards/carry, and Pollard has 86 carries for 5.3 yards/carry. Note that we still aren't matching Dak's stats to PFR because the NFL classifies scrambles as rush attempts and `nflfastR` does not.

### Manipulating columns: mutate, if_else, and case_when

Let's say we want to make a new column, named `home`, which is equal to 1 if the team with the ball is the home team. Let's introduce another extremely useful function, `if_else`:

``` {r}
pbp_rp |>
  mutate(
    home = if_else(posteam == home_team, 1, 0)
  ) |>
  select(posteam, home_team, home) |>
  head(10)
```

`mutate` is R's word for creating a new column (or overwriting an existing one); in this case, we've created a new column called `home`. The above uses `if_else`, which uses the following pattern: condition (in this case, `posteam == home_team`), value if condition is true (in this case, if `posteam == home_team`, it is 1), and value if the condition is false (0). So we could use this to, for example, look at average EPA/play by home and road teams:

``` {r}
pbp_rp |>
  mutate(
    home = if_else(posteam == home_team, 1, 0)
  ) |>
  group_by(home) |>
  summarize(epa = mean(epa))
```
Note that EPA/play is similar for home teams and away teams because `home` is already built into the `nflfastR` EPA model, so this result is expected. In fact, away EPA/play is somewhat higher, presumably because away teams out-performed their usual in 2019 as homefield advantage continues to decline generally.

`if_else` is nice if you're creating a new column based on a simple condition. But what if you need to do something more complicated? `case_when` is a good option. Here's how it works:

``` {r}
pbp_rp |>
  filter(!is.na(cp)) |>
  mutate(
    depth = case_when(
      air_yards < 0 ~ "Negative",
      air_yards >= 0 & air_yards < 10 ~ "Short",
      air_yards >= 10 & air_yards < 20 ~ "Medium",
      air_yards >= 20 ~ "Deep"
    )
  ) |>
  group_by(depth) |>
  summarize(cp = mean(cp))
```
Note the new syntax for `case_when`: we have condition (for the first one, air yards less than 0), followed by `~`, followed by assignment (for the first one, "Negative"). In the above, we created 4 bins based on air yards and got average completion probability (`cp`) based on the `nflfastR` model. Unsurprisingly, `cp` is lower the longer downfield a throw goes.
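
One detail worth knowing: `case_when` checks the conditions from top to bottom and uses the first one that matches, and rows that match nothing get `NA`. The sketch below, on made-up air yards, leans on that ordering to drop the lower bounds from each condition:

```r
library(dplyr)

# Made-up air yards, including bin edges and a missing value
throws <- tibble(air_yards = c(-3, 0, 9, 10, 25, NA))

throws <- throws |>
  mutate(
    depth = case_when(
      air_yards < 0 ~ "Negative",
      # only reached when the first condition was not TRUE
      air_yards < 10 ~ "Short",
      air_yards < 20 ~ "Medium",
      air_yards >= 20 ~ "Deep"
    )
  )

throws
```

The missing `air_yards` value matches none of the conditions, so its `depth` is `NA`. Spelling out both bounds, as in the chunk above, gives the same bins and is arguably easier to read.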

### A basic figure

Now that we've gained some skills at manipulating data, let's put it to use by making things. Which teams were the most pass-heavy in the first half on early downs with win probability between 20% and 80%, excluding the final 2 minutes of the half when everyone is pass-happy?

``` {r}
schotty <- pbp_rp |>
	filter(wp > .20 & wp < .80 & down <= 2 & qtr <= 2 & half_seconds_remaining > 120) |>
	group_by(posteam) |>
	summarize(mean_pass = mean(pass), plays = n()) |>
	arrange(-mean_pass)
schotty
```

Again, we've already used `filter`, `group_by`, and `summarize`. The new function we are using here is `arrange`, which sorts the data by the variable(s) given. The minus sign in front of `mean_pass` means to sort in descending order.
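
As a small aside, `dplyr` also provides `desc()` for descending sorts. A sketch on made-up pass rates (team abbreviations and numbers are invented):

```r
library(dplyr)

# Made-up pass rates for three made-up teams
rates <- tibble(
  posteam = c("AAA", "BBB", "CCC"),
  mean_pass = c(0.48, 0.61, 0.55)
)

# The minus sign and desc() give the same order for numeric columns
by_minus <- rates |> arrange(-mean_pass)
by_desc <- rates |> arrange(desc(mean_pass))

# desc() also works on text columns, where a minus sign would throw an error
by_name_desc <- rates |> arrange(desc(posteam))

by_minus
by_name_desc
```

For numeric columns the two spellings are interchangeable, so use whichever reads better to you.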

Let's make our first figure:

```{r fig1, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
ggplot(schotty, aes(x = reorder(posteam, -mean_pass), y = mean_pass)) +
  geom_text(aes(label = posteam))
```

This image is kind of a mess -- we still need a title, axis labels, etc -- but gets the point across. We'll get to that other stuff later. But more importantly, we made something interesting using `nflfastR` data! The `reorder` sorts the teams according to pass rate, with the "-" again saying to do it in descending order. `aes` is short for "aesthetic", which is ggplot2's weird way of asking which variables should go on the x and y axes.

Looking at the figure, the Chiefs will never have playoff success until they establish the run.

## Loading multiple seasons

Because all the data is stored in the data repository, it is very fast to load data from multiple seasons.

``` {r}
pbp <- load_pbp(2015:2019)
```

This loads play-by-play data from the 2015 through 2019 seasons. 

Let's make sure we got it all. By now, you should understand what this is doing:

``` {r}
pbp |>
  group_by(season) |>
  summarize(n = n())
```

So each season has about 48,000 plays. Just for fun, let's look at the various play types:

``` {r}
pbp |>
  group_by(play_type) |>
  summarize(n = n())
```

## Figures with QB stats

Let's do some stuff with quarterbacks:

``` {r}
qbs <- pbp |>
  filter(season_type == "REG", !is.na(epa)) |>
  group_by(id, name) |>
  summarize(
    epa = mean(qb_epa),
    cpoe = mean(cpoe, na.rm = T),
    n_dropbacks = sum(pass),
    n_plays = n(),
    team = last(posteam)
  ) |>
  ungroup() |>
  filter(n_dropbacks > 100 & n_plays > 1000)
```

Lots of new stuff here. First, we're grouping by `id` and `name` to make sure we're getting unique players; i.e., if two players have the same name (like Javorius Allen and Josh Allen both being J.Allen), we are also using their id to differentiate them. `qb_epa` is an `nflfastR` creation that is equal to EPA in all instances except for when a pass is completed and a fumble is lost, in which case a QB gets "credit" for the play up to the spot the fumble was lost (making EPA function like passing yards). The `last` part in the `summarize` call gets the last team that a player was observed playing with.

My way of getting a dataset with only quarterbacks without joining to external roster data is to make sure they hit some number of dropbacks. In this case, filtering with `n_dropbacks > 100` makes sure we're only including quarterbacks. The `ungroup()` near the end is good practice after grouping to make sure you don't get weird behavior with the data you created down the line.

Let's make some more figures. The `load_teams()` function is provided in the `nflreadr` package, so since we have already loaded the package, it's ready to use.

``` {r}
load_teams()
```

Let's join this to the `qbs` dataframe we created:

``` {r}
qbs <- qbs |>
  left_join(load_teams(), by = c('team' = 'team_abbr'))
```

`left_join` means keep all the rows from the left dataframe (the first one provided, `qbs`), and join those rows to available rows in the other dataframe. We also need to provide the joining variables, `team` from `qbs` and `team_abbr` from `load_teams()`. Why do we have to type `by = c('team' = 'team_abbr')`? Who knows, but it's what `left_join` requires as instructions for how to match.
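
To make the matching rule concrete, here is a sketch with two tiny made-up tables (`qbs_toy` and `teams_toy` are invented, as are the names, abbreviations, and colors in them):

```r
library(dplyr)

# Left table: one row per made-up quarterback
qbs_toy <- tibble(
  name = c("QB One", "QB Two", "QB Three"),
  team = c("AAA", "BBB", "ZZZ")
)

# Right table: team info, deliberately missing "ZZZ"
teams_toy <- tibble(
  team_abbr = c("AAA", "BBB"),
  team_color = c("#111111", "#222222")
)

# Match `team` in the left table against `team_abbr` in the right table
joined <- qbs_toy |>
  left_join(teams_toy, by = c("team" = "team_abbr"))

joined
```

All three quarterbacks survive the join, and the one whose team has no match gets `NA` for `team_color`. In `dplyr` 1.1.0 and later the same instruction can also be written as `by = join_by(team == team_abbr)`.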

### With team color dots

Now we can make a figure!

```{r fig2, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
qbs |>
  ggplot(aes(x = cpoe, y = epa)) +
  #horizontal line with mean EPA
  geom_hline(yintercept = mean(qbs$epa), color = "red", linetype = "dashed", alpha=0.5) +
  #vertical line with mean CPOE
  geom_vline(xintercept =  mean(qbs$cpoe), color = "red", linetype = "dashed", alpha=0.5) +
  #add points for the QBs with the right colors
  #cex controls point size and alpha the transparency (alpha = 1 is normal)
  geom_point(color = qbs$team_color, cex=qbs$n_plays / 350, alpha = .6) +
  #add names using ggrepel, which tries to make them not overlap
  geom_text_repel(aes(label=name)) +
  #add a smooth line fitting cpoe + epa
  stat_smooth(geom='line', alpha=0.5, se=FALSE, method='lm')+
  #titles and caption
  labs(x = "Completion % above expected (CPOE)",
       y = "EPA per play (passes, rushes, and penalties)",
       title = "Quarterback Efficiency, 2015 - 2019",
       caption = "Data: @nflfastR") +
  #uses the black and white ggplot theme
  theme_bw() +
  #center title with hjust = 0.5
  theme(
    plot.title = element_text(size = 14, hjust = 0.5, face = "bold")
  ) +
  #make ticks look nice
  #if this doesn't work, `install.packages('scales')`
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10))

```

This looks complicated, but is just a way of getting a bunch of different stuff on the same plot: we have lines for averages, dots, names, etc. I added comments above to explain what is going on, but in practice for making figures I usually just copy and paste stuff and/or google what I need.

### With team logos

We could also make the same plot with team logos:

```{r fig3, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
qbs |>
  ggplot(aes(x = cpoe, y = epa)) +
  #horizontal line with mean EPA
  geom_hline(yintercept = mean(qbs$epa), color = "red", linetype = "dashed", alpha=0.5) +
  #vertical line with mean CPOE
  geom_vline(xintercept =  mean(qbs$cpoe), color = "red", linetype = "dashed", alpha=0.5) +
  #add points for the QBs with the logos (this uses nflplotR package)
  geom_nfl_logos(aes(team_abbr = team), width = qbs$n_plays / 45000, alpha = 0.75) +
  #add names using ggrepel, which tries to make them not overlap
  geom_text_repel(aes(label=name)) +
  #add a smooth line fitting cpoe + epa
  stat_smooth(geom='line', alpha=0.5, se=FALSE, method='lm')+
  #titles and caption
  labs(x = "Completion % above expected (CPOE)",
       y = "EPA per play (passes, rushes, and penalties)",
       title = "Quarterback Efficiency, 2015 - 2019",
       caption = "Data: @nflfastR") +
  theme_bw() +
  #center title
  theme(
    plot.title = element_text(size = 14, hjust = 0.5, face = "bold")
  ) +
  #make ticks look nice
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10))

```

The only changes we've made are to use `geom_nfl_logos` instead of `geom_point` (how to figure out the right size for the images in the `width` part? Trial and error).

This figure would look better with fewer players shown, but the point of this is explaining how to do stuff, so let's call this good enough.

### Team tiers plot

If it's helpful, here are a few notes about the [chart originally shown here](https://www.nflfastr.com/articles/nflfastR.html#example-5-plot-offensive-and-defensive-epa-per-play-for-a-given-season), which like the above uses nflplotR for team logos.

```{r ex5, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
library(nflplotR)
# get pbp and filter to regular season rush and pass plays
pbp <- nflreadr::load_pbp(2005) |>
  filter(season_type == "REG") |>
  filter(!is.na(posteam) & (rush == 1 | pass == 1))
# offense epa
offense <- pbp |>
  group_by(team = posteam) |>
  summarise(off_epa = mean(epa, na.rm = TRUE))
# defense epa
defense <- pbp |>
  group_by(team = defteam) |>
  summarise(def_epa = mean(epa, na.rm = TRUE))
# make figure
offense |>
  inner_join(defense, by = "team") |>
  ggplot(aes(x = off_epa, y = def_epa)) +
  # tier lines
  geom_abline(slope = -1.5, intercept = (4:-3)/10, alpha = .2) +
  # nflplotR magic
  nflplotR::geom_mean_lines(aes(x0 = off_epa, y0 = def_epa)) +
  nflplotR::geom_nfl_logos(aes(team_abbr = team), width = 0.07, alpha = 0.7) +
  labs(
    x = "Offense EPA/play",
    y = "Defense EPA/play",
    caption = "Data: @nflfastR",
    title = "2005 NFL Offensive and Defensive EPA per Play"
  ) +
  theme_bw() +
  theme(
    plot.title = element_text(size = 12, hjust = 0.5, face = "bold")
  ) +
  scale_y_reverse()
```


* The `geom_mean_lines()` function adds mean lines for offensive and defensive EPA per play 
* The slope lines are created using `geom_abline()`
* `scale_y_reverse()` reverses the vertical axis so that up = better defense

Everything else should be comprehensible by now!

### A few more things on plotting

There are two ways to view plots. One is in the RStudio Viewer, which shows up in RStudio when you plot something. If plots in your RStudio viewer look ugly and pixelated, you probably need to install the `Cairo` package and then set that as the default viewer by doing Tools --> Global Options --> General --> Graphics --> Backend: Set to Cairo.

The other is to save a .png with your preferred dimensions and resolution. For example, `ggsave("test.png", width = 16, height = 9, units = "cm")` would save the current plot as "`test.png`" with the units specified (you can view all the ggsave options [here](https://ggplot2.tidyverse.org/reference/ggsave.html)). 
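
A minimal sketch of saving a specific plot object rather than whatever was plotted last. It uses the built-in `mtcars` data as a stand-in and writes to a temporary folder, both of which are just choices made for this illustration:

```r
library(ggplot2)

# A throwaway plot using R's built-in mtcars data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder so nothing lands in the working directory
out_file <- file.path(tempdir(), "test.png")

# Passing `plot = p` saves this object instead of the last plot displayed
ggsave(out_file, plot = p, width = 16, height = 9, units = "cm", dpi = 300)

file.exists(out_file)
```

Swap `out_file` for a path of your choosing to keep the image.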

One more note: the RStudio Viewer can take a long time to preview ggplots, especially if you're doing things like adding images. If you're getting frustrated with a plot taking a long time to display, you can take advantage of [ggpreview](https://nflplotr.nflverse.com/reference/ggpreview.html) from `nflplotR`. To do this, first save the plot to an object and then run `ggpreview` on it (if this doesn't make sense, see the examples [here](https://nflplotr.nflverse.com/reference/ggpreview.html)).

## Real life example: let's make a win total model

I'm going to try to go through the process of cleaning and joining multiple data sets to try to get a sense of how I would approach something like this, step-by-step.

### Get team wins each season

We're going to cheat a little and take advantage of Lee Sharpe's famous `games` file, which `nflreadr` makes available through `load_schedules()`. Most of this stuff has been added into `nflfastR`, but it's easier working with this file where each game is one row. If you're curious, the double colon in `nflreadr::load_schedules()` is a way to call a function from a specific package by name, which works whether or not that package has been loaded with `library()`.

``` {r}
games <- nflreadr::load_schedules()
str(games)
```

To start, we want to create a dataframe where each row is a team-season observation, listing how many games they won. There are multiple ways to do this, but I'm going to just take the home and away results and bind together. As an example, here's what the `home` results look like:

``` {r}
home <- games |>
  filter(game_type == 'REG') |>
  select(season, week, home_team, result) |>
  rename(team = home_team)
home |> head(5)
```
Note that we used `rename` to change `home_team` to `team`.

``` {r}
away <- games |>
  filter(game_type == 'REG') |>
  select(season, week, away_team, result) |>
  rename(team = away_team) |>
  mutate(result = -result)
away |> head(5)
```
For away teams, we need to flip the result since result is given from the perspective of the home team. Now let's make a column called `win` based on the result.

``` {r}
results <- bind_rows(home, away) |>
  arrange(week) |>
  mutate(
    win = case_when(
      result > 0 ~ 1,
      result < 0 ~ 0,
      result == 0 ~ 0.5
    )
  )

results |> filter(season == 2019 & team == 'SEA')
```

Doing the `results |> filter(season == 2019 & team == 'SEA')` part at the end isn't actually for saving the data in a new form, but just making sure the previous step did what I wanted. This is a good habit to get into: frequently inspect your data and make sure it looks like you think it should.
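
Another cheap check for this kind of home/away stacking: every game should show up exactly twice, and the flipped results should cancel out. A sketch on two made-up games (team abbreviations and scores are invented):

```r
library(dplyr)

# Two made-up games; result is home score minus away score
games_toy <- tibble(
  home_team = c("AAA", "CCC"),
  away_team = c("BBB", "AAA"),
  result = c(7, -3)
)

# select() can rename while selecting
home_toy <- games_toy |>
  select(team = home_team, result)

away_toy <- games_toy |>
  select(team = away_team, result) |>
  mutate(result = -result)

results_toy <- bind_rows(home_toy, away_toy)

# Twice as many rows as games, and the margins sum to zero
nrow(results_toy) == 2 * nrow(games_toy)
sum(results_toy$result) == 0
```

If either check comes back `FALSE` on real data, something went wrong in the flip or the bind.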

Now that we have the dataframe we wanted, we can get team wins by season easily:

``` {r}
team_wins <- results |>
  group_by(team, season) |>
  summarize(
    wins = sum(win),
    point_diff = sum(result)) |>
  ungroup()

team_wins |>
  arrange(-wins) |>
  head(5)
```

Again, we're making sure the data looks like it "should" by checking the 5 seasons with the most wins, and making sure it looks right.

Now that the team-season win and point differential data is ready, we need to go back to the `nflfastR` data to get EPA/play.

### Get team EPA by season

Let's start by getting data from every season from the `nflfastR` data repository:

``` {r}
pbp <- load_pbp(1999:2019) |>
  filter(
    rush == 1 | pass == 1,
    season_type == "REG",
    !is.na(epa),
    !is.na(posteam),
    posteam != ""
  ) |>
  select(season, posteam, pass, defteam, epa)
```

I'm being pretty aggressive with dropping rows and columns (`filter` and `select`) because otherwise loading this all into memory can be painful on the computer. But this is all we need for what we're doing. Note that I'm only keeping regular season games here (`season_type == "REG"`) since this is how this analysis is usually done.

Now we can get EPA/play on offense and defense. Let's break it out by pass and rush too. I don't remember how to do some of this so let's do it in steps. We know we need to group by team, season, and pass, so there's the beginning:

``` {r}
pbp |>
  group_by(posteam, season, pass) |> 
  summarize(epa = mean(epa)) |>
  head(4)
```

But this makes two rows per team-season. How to get each team-season on the same row? `pivot_wider` is what we need:
``` {r}
pbp |>
  group_by(posteam, season, pass) |> 
  summarize(epa = mean(epa)) |>
  pivot_wider(names_from = pass, values_from = epa) |>
  head(4)
```

This one is hard to wrap my head around so I usually open up the [reference page](https://tidyr.tidyverse.org/reference/pivot_wider.html), read the example, and pray that what I try works. In this case it did. Hooray! This turned our two-lines-per-team dataframe into one, with the 0 column being pass == 0 (run plays) and the 1 column pass == 1. 
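
If it helps, here is `pivot_wider` on a made-up table small enough to follow by eye. The sketch also uses the `names_prefix` argument, which is one way to avoid ending up with columns named `0` and `1` (the teams and EPA values are invented):

```r
library(dplyr)
library(tidyr)

# Two rows per made-up team: one for runs (pass = 0), one for passes (pass = 1)
long <- tibble(
  posteam = c("AAA", "AAA", "BBB", "BBB"),
  pass = c(0, 1, 0, 1),
  epa = c(-0.05, 0.10, 0.02, 0.20)
)

# One row per team; values of `pass` become column names, prefixed with "pass_"
wide <- long |>
  pivot_wider(names_from = pass, values_from = epa, names_prefix = "pass_")

wide
```

The result has one row per team and the columns `posteam`, `pass_0`, and `pass_1`.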

Now let's rename to something more sensible and save:

``` {r}
offense <- pbp |>
  group_by(posteam, season, pass) |> 
  summarize(epa = mean(epa)) |>
  pivot_wider(names_from = pass, values_from = epa) |>
  rename(off_pass_epa = `1`, off_rush_epa = `0`)
```

Note that variable names that are numbers need to be surrounded in tick marks for this to work.

Now we can repeat the same process for defense:

``` {r}
defense <- pbp |>
  group_by(defteam, season, pass) |> 
  summarize(epa = mean(epa)) |>
  pivot_wider(names_from = pass, values_from = epa) |>
  rename(def_pass_epa = `1`, def_rush_epa = `0`)
```

Let's do another sanity check looking at the top 5 pass offenses and defenses:
``` {r}
#top 5 offenses
offense |>
  arrange(-off_pass_epa) |>
  head(5)

#top 5 defenses
defense |>
  arrange(def_pass_epa) |>
  head(5)
```

The top pass defenses (2002 TB, 2017 JAX, 2019 NE) and offenses (2007 Pats, 2004 Colts, 2011 Packers) definitely check out! 

### Fix team names and join

Now we're ready to bind it all together. Actually, let's make sure all the team names are ready too.

``` {r}
team_wins |>
  group_by(team) |>
  summarize(n=n()) |>
  arrange(n)
```

Nope, not yet, we need to fix the Raiders, Rams, and Chargers, which are LV, LA, and LAC in `nflfastR`.

``` {r}
team_wins <- team_wins |>
  mutate(
    team = case_when(
      team == 'OAK' ~ 'LV',
      team == 'SD' ~ 'LAC',
      team == 'STL' ~ 'LA',
      TRUE ~ team
    )
  )
```

The `TRUE` statement at the bottom says that if none of the above cases are found, keep team the same. Let's make sure this worked:
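
To see why that last line matters, here is a sketch comparing the recode with and without the fallback, using a made-up four-row table:

```r
library(dplyr)

teams <- tibble(team = c("OAK", "SD", "STL", "SEA"))

# With the fallback: teams that match nothing keep their name
with_default <- teams |>
  mutate(
    team = case_when(
      team == "OAK" ~ "LV",
      team == "SD" ~ "LAC",
      team == "STL" ~ "LA",
      TRUE ~ team
    )
  )

# Without the fallback: teams that match nothing become NA
without_default <- teams |>
  mutate(
    team = case_when(
      team == "OAK" ~ "LV",
      team == "SD" ~ "LAC",
      team == "STL" ~ "LA"
    )
  )

with_default$team
without_default$team
```

Without the `TRUE` line, `SEA` (and every other team that didn't move) would silently turn into `NA`.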

``` {r}
team_wins |>
  group_by(team) |>
  summarize(n=n()) |>
  arrange(n)
```

HOU has 3 fewer seasons because it didn't exist from 1999 through 2001, which is fine, and all the other teams have the number of seasons that they should. Okay NOW we can join:

``` {r}
data <- team_wins |>
  left_join(offense, by = c('team' = 'posteam', 'season')) |>
  left_join(defense, by = c('team' = 'defteam', 'season'))

data |>
  filter(team == 'SEA' & season >= 2012)
```

Now we're getting really close to doing what we want! Next we need to create new columns for prior year EPA, and let's do point differential too.

``` {r}
data <- data |> 
  arrange(team, season) |>
  group_by(team) |> 
  mutate(
    prior_off_rush_epa = lag(off_rush_epa),
    prior_off_pass_epa = lag(off_pass_epa),
    prior_def_rush_epa = lag(def_rush_epa),
    prior_def_pass_epa = lag(def_pass_epa),
    prior_point_diff = lag(point_diff)
  ) |> 
  ungroup()

data |>
  head(5)
```
Finally! Now we have the data in place and can start doing things with it.
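
The `arrange` and `group_by` before `lag` are doing real work: sorting puts each team's seasons in order, and grouping stops one team's final season from leaking into the next team's first row. A sketch on made-up team-seasons:

```r
library(dplyr)

# Made-up point differentials for two made-up teams
seasons <- tibble(
  team = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  season = c(2018, 2019, 2020, 2019, 2020),
  point_diff = c(10, -5, 30, 0, 12)
)

lagged <- seasons |>
  arrange(team, season) |>
  group_by(team) |>
  mutate(prior_point_diff = lag(point_diff)) |>
  ungroup()

lagged
```

Each team's first season gets `NA` because there is nothing before it. Without the `group_by`, the first `BBB` row would have picked up 30 from the last `AAA` row.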

### Correlations and regressions

``` {r}
data |> 
  select(-team, -season) |>
  cor(use="complete.obs") |>
  round(2)
```

```{r echo=FALSE}
pp = cor(data$off_pass_epa, data$prior_off_pass_epa, use="complete.obs") |>
  round(2)
rr = cor(data$off_rush_epa, data$prior_off_rush_epa, use="complete.obs") |>
  round(2)
pd = cor(data$def_pass_epa, data$prior_def_pass_epa, use="complete.obs") |>
  round(2)
rd = cor(data$def_rush_epa, data$prior_def_rush_epa, use="complete.obs") |>
  round(2)
```

We've covered `select`, but here we see a new use where a minus sign de-selects variables (we need to de-select team name for correlation to work because it doesn't work for character strings, and correlation with the season number itself is meaningless). We've run the correlation on this dataframe, removing missing values, and then rounding to 2 digits. Not surprisingly, we see that wins in the current season are more strongly related to passing offense EPA than rushing EPA or defense EPA, and prior offense carries more predictive power than prior defense. Pass offense is more stable year to year (```r pp```) than rush offense (```r rr```), pass defense (```r pd```), or rush defense (```r rd```).
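
For what `use = "complete.obs"` actually does, here is a sketch on a tiny made-up data frame with a few missing values:

```r
# Made-up data: x and z each have a missing value
m <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(5, NA, 3, 2, 1)
)

# Default behavior: a missing value anywhere in a column gives NA correlations
cor_default <- cor(m)

# complete.obs: drop every row with any missing value first (rows 1, 3, 4 remain)
cor_complete <- cor(m, use = "complete.obs")

round(cor_default, 2)
round(cor_complete, 2)
```

Note that `complete.obs` drops a row if *any* selected column is missing, so which columns you keep in `select` changes which rows the correlations are based on.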

I'm actually surprised that the values for passing offense aren't higher relative to the others. Maybe it was because most of our prior results come from the `nflscrapR` era (2009 - 2019)? Let's check what this looks like since 2009 relative to earlier seasons:

``` {r}
message("2009 through 2019")
data |> 
  filter(season >= 2009) |>
  select(wins, point_diff, off_pass_epa, off_rush_epa, prior_point_diff, prior_off_pass_epa, prior_off_rush_epa) |>
  cor(use="complete.obs") |>
  round(2)
```

``` {r}
message("1999 through 2008")
data |> 
  filter(season < 2009) |>
  select(wins, point_diff, off_pass_epa, off_rush_epa, prior_point_diff, prior_off_pass_epa, prior_off_rush_epa) |>
  cor(use="complete.obs") |>
  round(2)
```

Yep, that seems to be the case. So in the more recent period, passing offense has become slightly more stable but more predictive of following-year success, while at the same time rushing offense has become substantially less stable and less predictive of future team success.

Now let's do a basic regression of wins on prior offense and defense EPA/play. Maybe we should only look at this more recent period to fit our model since it's more relevant for 2020. In the real world, we would be more rigorous about making decisions like this, but let's proceed anyway.

``` {r}
data <- data |> filter(season >= 2009)

fit <- lm(wins ~ prior_off_pass_epa  + prior_off_rush_epa + prior_def_pass_epa + prior_def_rush_epa, data = data)

summary(fit)
```

I'm actually pretty surprised passing offense isn't higher here. How does this compare to simply using point differential?

``` {r}
fit2 <- lm(wins ~ prior_point_diff, data = data)

summary(fit2)
```

So R-squared is somewhat higher for just point differential. This isn't surprising as we've thrown away special teams plays and haven't attempted to make any adjustments for things like fumble luck that we know can improve EPA's predictive power.
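
If you'd rather pull the fit statistics out as numbers than read them off the printed summary, here is a sketch using R's built-in `mtcars` data as a stand-in (the two models below are unrelated to football and are only here to show the mechanics):

```r
# Two stand-in models on built-in data: one with two predictors, one with one
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ wt, data = mtcars)

# summary() stores R-squared and adjusted R-squared as list elements
r2 <- c(
  two_predictors = summary(fit_a)$r.squared,
  one_predictor = summary(fit_b)$r.squared
)
adj_r2 <- c(
  two_predictors = summary(fit_a)$adj.r.squared,
  one_predictor = summary(fit_b)$adj.r.squared
)

round(r2, 3)
round(adj_r2, 3)
```

Plain R-squared never goes down when you add a predictor to a model, so adjusted R-squared is usually the fairer number when comparing models with different numbers of predictors, like the four-variable EPA model and the one-variable point differential model.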

### Predictions

Now let's get the predictions from the EPA model:

``` {r}
preds <- predict(fit, data |> filter(season == 2020)) |>
  #was just a vector, need a tibble to bind
  as_tibble() |>
  #make the column name make sense
  rename(prediction = value) |>
  round(1) |>
  #get names
  bind_cols(
    data |> filter(season == 2020) |> select(team)
  )

preds |>
  arrange(-prediction) |>
  head(5)
```

This mostly checks out. 
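
The `as_tibble` / `rename` / `bind_cols` dance above works, but another way to get the same kind of table is to add the predictions as a column with `mutate`. A sketch using built-in `mtcars` data as a stand-in (the train/new split and the model are invented for illustration):

```r
library(dplyr)

# Stand-in data: fit on some rows, predict for the rest
train <- mtcars[1:25, ]
new_rows <- mtcars[26:32, ]

toy_fit <- lm(mpg ~ wt, data = train)

# Add predictions as a column so they stay lined up with their rows
toy_preds <- new_rows |>
  mutate(prediction = round(predict(toy_fit, newdata = new_rows), 1)) |>
  select(wt, prediction) |>
  arrange(-prediction)

toy_preds
```

Keeping the predictions in the same data frame as the rows they belong to means there is no separate binding step to get out of sync.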

What if we just used simple point differential to predict?

``` {r}
preds2 <- predict(fit2, data |> filter(season == 2020)) |>
  #was just a vector, need a tibble to bind
  as_tibble() |>
  #make the column name make sense
  rename(prediction = value) |>
  round(1) |>
  #get names
  bind_cols(
    data |> filter(season == 2020) |> select(team)
  )

preds2 |>
  arrange(-prediction) |>
  head(5)
```

Not surprisingly, this looks pretty similar. These are very basic models that don't incorporate schedule, roster changes, etc. For example, a better model would take into account Tom Brady no longer playing for the Patriots. But hopefully this has been useful!

## Next Steps

You now should know enough to be able to tackle a great deal of questions using `nflfastR` data. A good way to build up skills is to take interesting things you see and try to replicate them (for making figures, this will also involve a heavy dose of googling stuff).

Looking at others' code is also a good way to learn. One option is to look through the `nflfastR` code base, much of which you should now be able to follow. For example, [here is the function that cleans up the data and prepares it for later stages](https://github.com/mrcaseb/nflfastR/blob/master/R/helper_add_nflscrapr_mutations.R): there's a heavy dose of `mutate`, `group_by`, `arrange`, `lag`, `if_else`, and `case_when`. 

### Resources: The gold standards

This is an R package so this section is pretty R heavy.

* [Introduction to R (**recommended**)](https://r4ds.had.co.nz/explore-intro.html)
* [Open Source Football](https://www.opensourcefootball.com/): Mix of R and Python
* [The Mockup Blog (Thomas Mock)](https://themockup.blog/): Invaluable resource for making cool stuff in R

### Code examples: R

* [Lee Sharpe: basic intro to R and RStudio](https://github.com/leesharpe/nfldata/blob/master/RSTUDIO-INTRO.md)
* [Lee Sharpe: lots of useful NFL / nflscrapR code](https://github.com/leesharpe/nfldata)
* [Lee Sharpe: how to update current season games](https://github.com/leesharpe/nfldata/blob/master/UPDATING-NFLSCRAPR.md)
* [Josh Hermsmeyer: Getting Started with R for NFL Analysis](https://t.co/gxDDhOYhcI)
* [Slavin: visualizing positional tiers in SFB9](https://slavin22.github.io/SFB9-Positional-Tiers/Guide.nb)
* [Ron Yurko: assorted examples](https://github.com/ryurko/nflscrapR-data/tree/master/R)
* [CowboysStats: defensive playmaking EPA](https://github.com/dhouston890/cowboys-stats/blob/master/playmaking_epa_pbp.R)
* [Michael Lopez: function to sample plays](https://github.com/statsbylopez/BlogPosts/blob/master/scrapr-data.R)
* [Michael Lopez: R for NFL analysis (presentation to club staffers)](https://statsbylopez.netlify.com/post/r-for-nfl-analysis/)
* [Mitchell Wesson: QB hits investigation](https://gist.github.com/wessonmo/45781bd25a74e8097e0c8bc8fbacf796)
* [Mitchell Wesson: Investigation of the nflscrapR EP model](https://gist.github.com/wessonmo/ef44ea9873d70f816454cb88b86dcce6)
* [WHoffman: graphs for receivers (aDoT, success rate, and more)](https://github.com/whoffman21279/Steelers/blob/master/receiving_stats)
* [ChiBearsStats: investigation of 3rd downs vs offensive efficiency](https://gist.github.com/ChiBearsStats/dac3266037797032a23f38fd9d64d6a8#file-adjustedthirddowns-txt)
* [ChiBearsStats: the insignificance of field goal kicking](https://gist.github.com/ChiBearsStats/78e33baeed3cd6d3cac0040b47d4ec69)

### More data sources

* [Lee Sharpe: Draft Picks, Draft Values, Games, Logos, Rosters, Standings](https://github.com/leesharpe/nfldata/blob/master/DATASETS.md)
* [greerre: how to get .csv file of weather & stadium data from PFR in python](https://github.com/greerre/pfr_metadata_pull)
* [Parker Fleming: Introduction to College Football Data with R and cfbscrapR](https://gist.github.com/spfleming/2527a6ca2b940af2a8aa1fee9320171d)

### Other code examples: Python

* [Deryck97: nflfastR Python Guide](https://gist.github.com/Deryck97/dff8d33e9f841568201a2a0d5519ac5e)
* [Nick Wan: nflfastR Python Colab Guide](https://colab.research.google.com/github/nickwan/colab_nflfastR/blob/master/nflfastR_starter.ipynb)
* [Cory Jez: animated plot](https://github.com/jezlax/sports_analytics/blob/master/animated_nfl_scatter.py)
* [903124S: Sampling EP](https://gist.github.com/903124/6693fdf6b991437a6d6ef9c5d935c83b)
* [903124S: estimating EPA using nfldb](https://gist.github.com/903124/d304f76688b0699497a35b61b6d1e267)
* [903124S: estimate EPA for college football](https://gist.github.com/903124/3c6f0dc0a100d78b8622573ef4c504f5)
* Blake Atkinson: explosiveness [blog post](https://medium.com/@BlakeAtkinson/the-2018-kansas-city-chiefs-and-an-explosiveness-metric-in-football-c3b3fd447d73) and [python code](https://github.com/btatkinson/yard_value/blob/master/yard_value.ipynb)
* Blake Atkinson: player type visualizations [blog post](https://medium.com/@BlakeAtkinson/visualizing-different-nfl-player-styles-88ef31420539) and [python code](https://github.com/btatkinson/player_vectors/blob/master/player_vectors.ipynb)


================================================
FILE: vignettes/field_descriptions.Rmd
================================================
---
title: "Field Descriptions"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  comment = "#>"
)

with_dt <- requireNamespace("DT")

```

```{r eval = with_dt}
DT::datatable(
  nflfastR::field_descriptions,
  options = list(scrollX = TRUE, pageLength = 25),
  filter = "top",
  rownames = FALSE,
  style = "bootstrap4"
)
```

```{r eval = !with_dt}
knitr::kable(nflfastR::field_descriptions)
```


================================================
FILE: vignettes/nflfastR.Rmd
================================================
---
title: "Get started with nflfastR"
author: "Ben Baldwin & Sebastian Carl"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
future::plan("multisession")
options(dplyr.summarise.inform = FALSE)
options(nflreadr.verbose = FALSE)
```

If you are new to R or are having trouble understanding the code in the below sections we highly recommend the **nflfastR beginner's guide** in `vignette("beginners_guide")`.

# The Main Functions

nflfastR comes with a set of functions to access NFL play-by-play data. This section provides a brief introduction to the essential functions.

nflfastR processes and cleans up play-by-play data and adds variables through [its models](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/). Since some of these tasks are performed by separate functions, the easiest way to compute the complete nflfastR dataset is `build_nflfastR_pbp()`. The main input for that function is a set of game ids which can be accessed with `load_schedules()`. The following code demonstrates how to build the nflfastR dataset for the Super Bowls of the 2017 - 2019 seasons.

```{r}
ids <- nflreadr::load_schedules(2017:2019) |>
  dplyr::filter(game_type == "SB") |>
  dplyr::pull(game_id)
pbp <- nflfastR::build_nflfastR_pbp(ids)
```

In most cases, however, it is not necessary to use this function for individual games, because nflverse provides both a [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp) and two main play-by-play functions: `load_pbp()` and `update_pbp_db()`. We cover `load_pbp()` below; see [Example 6: Using the built-in database function] for how to work with the database function `update_pbp_db()`.

The easiest way to access the data from the release is the function `load_pbp()`. It can load multiple seasons directly into memory and supports multiple data formats. Loading all play-by-play data of the 2022-2024 seasons is as easy as 

```{r}
pbp <- nflfastR::load_pbp(2022:2024)
```

Joining roster data to the play-by-play data set is possible as well. The data can be accessed with the function `load_rosters()` and its application is demonstrated in [Example 8: Working with roster and position data].

# Application Examples

All examples listed below assume that the following libraries are installed (and loaded).

``` {r load, warning = FALSE, message = FALSE}
library(nflfastR)
library(nflplotR)
library(dplyr)
library(ggplot2)
```

## Example 1: Completion Percentage Over Expected (CPOE)

Let's look at CPOE leaders from the 2009 regular season.

As discussed above, `nflfastR` has a data release for all available seasons, so there's no need to actually build them. Let's use that here with the convenience function `load_pbp()` which fetches data from the release (for non-R users, .csv and .parquet are also available in the [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp)).

``` {r ex3-cpoe, warning = FALSE, message = FALSE}
games_2009 <- nflfastR::load_pbp(2009) |> dplyr::filter(season_type == "REG")
games_2009 |>
  dplyr::filter_out(is.na(cpoe)) |>
  dplyr::summarize(
    passer = nflreadr::stat_mode(passer_player_name),
    cpoe = mean(cpoe),
    Atts = n(),
    .by = passer_player_id
  ) |>
  dplyr::filter(Atts > 200) |>
  dplyr::slice_max(cpoe, n = 5) |>
  knitr::kable(digits = 1)
```
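For intuition, CPOE on a single pass is the completion outcome minus the model's completion probability. The following is a minimal sketch with invented numbers, assuming a completion probability column `cp` on a 0 to 1 scale and `cpoe` expressed in percentage points; the `toy_passes` data frame is hypothetical and uses base R only, so it runs without loading any data.

``` {r cpoe-sketch}
# Hypothetical toy data: three passes with assumed model completion probabilities
toy_passes <- data.frame(
  complete_pass = c(1, 0, 1),
  cp = c(0.60, 0.75, 0.35)
)

# Assumed convention: CPOE per pass in percentage points
toy_passes$cpoe <- 100 * (toy_passes$complete_pass - toy_passes$cp)

# Averaging over passes gives the passer-level CPOE
mean_cpoe <- mean(toy_passes$cpoe)
mean_cpoe
```

A completed pass with a low completion probability contributes a large positive value, while an incompletion on an easy throw contributes a large negative one.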

## Example 2: Using Drive Information

When working with `nflfastR`, drive results are automatically included. We use `fixed_drive` and `fixed_drive_result` since the NFL-provided information is a bit wonky. Let's look at how much more likely teams were to score starting from 1st & 10 at their own 20 yard line in 2015 (the last year before touchbacks on kickoffs changed to the 25) than in 2003.

``` {r ex4, warning = FALSE, message = FALSE}
pbp <- nflfastR::load_pbp(c(2003, 2015))

out <- pbp |>
  dplyr::filter(
    season_type == "REG" & down == 1 & ydstogo == 10 & yardline_100 == 80
  ) |>
  dplyr::mutate(
    drive_score = dplyr::case_when(
      fixed_drive_result %in% c("Touchdown", "Field goal") ~ 1L,
      TRUE ~ 0L
    )
  ) |>
  dplyr::summarize(drive_score = mean(drive_score), .by = season)

out |>
  knitr::kable(digits = 3)
```

So `r scales::percent(out$drive_score[1], accuracy = 0.1)` of 1st & 10 plays from teams' own 20 would see the drive end up in a score in 2003, compared to `r scales::percent(out$drive_score[2], accuracy = 0.1)` in 2015. This has implications for Expected Points models (see [this article](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/)).

## Example 3: Plot offensive and defensive EPA per play for a given season

Let's build the **[NFL team tiers](https://rbsdm.com/stats/stats/)** using offensive and defensive expected points added per play for the 2005 regular season. For creating data viz that includes NFL team logos (or wordmarks, or headshots), we recommend the nflverse R package [nflplotR](https://nflplotr.nflverse.com).

When using `load_pbp()`, the helper function `clean_pbp()` has already been run, which creates "rush" and "pass" columns that (a) properly count sacks and scrambles as pass plays and (b) properly include plays with penalties. Using this, we can keep only rush or pass plays.

```{r ex5, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
pbp <- nflfastR::load_pbp(2005) |>
  dplyr::filter(season_type == "REG") |>
  dplyr::filter(!is.na(posteam) & (rush == 1 | pass == 1))
offense <- pbp |>
  dplyr::group_by(team = posteam) |>
  dplyr::summarise(off_epa = mean(epa, na.rm = TRUE))
defense <- pbp |>
  dplyr::group_by(team = defteam) |>
  dplyr::summarise(def_epa = mean(epa, na.rm = TRUE))
offense |>
  dplyr::inner_join(defense, by = "team") |>
  ggplot(aes(x = off_epa, y = def_epa)) +
  geom_abline(
    slope = -1.5,
    intercept = c(.4, .3, .2, .1, 0, -.1, -.2, -.3),
    alpha = .2
  ) +
  nflplotR::geom_mean_lines(aes(x0 = off_epa, y0 = def_epa)) +
  nflplotR::geom_nfl_logos(aes(team_abbr = team), width = 0.07, alpha = 0.7) +
  labs(
    x = "Offense EPA/play",
    y = "Defense EPA/play",
    caption = "Data: @nflfastR",
    title = "2005 NFL Offensive and Defensive EPA per Play"
  ) +
  theme_bw() +
  theme(
    plot.title = element_text(size = 12, hjust = 0.5, face = "bold")
  ) +
  scale_y_reverse()
```

## Example 4: Expected Points calculator

We have provided a calculator for working with the Expected Points model. Here is an example of how to use it, looking at how the Expected Points on a drive beginning after a touchback has changed over time.

While I have put in `'SEA'` for `home_team` and `posteam`, this only matters for figuring out whether the team with the ball is the home team (there's no actual effect for a given team; it would be the same no matter what team is supplied).

``` {r ex6a}
data <- tibble::tibble(
  "season" = 1999:2019,
  "home_team" = "SEA",
  "posteam" = "SEA",
  "roof" = "outdoors",
  "half_seconds_remaining" = 1800,
  "yardline_100" = c(rep(80, 17), rep(75, 4)),
  "down" = 1,
  "ydstogo" = 10,
  "posteam_timeouts_remaining" = 3,
  "defteam_timeouts_remaining" = 3
)
nflfastR::calculate_expected_points(data) |>
  dplyr::select(season, yardline_100, td_prob, ep) |>
  knitr::kable(digits = 2)
```

Not surprisingly, offenses have become much more successful over time, with the kickoff touchback moving from the 20 to the 25 in 2016 providing an additional boost. Note that the `td_prob` in this example is the probability that the next score within the same half will be a touchdown scored by the team with the ball, **not** the probability that the current drive will end in a touchdown (this is why the numbers are different from Example 2 above).
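To make the "next score" idea concrete, Expected Points can be thought of as a probability-weighted sum over the possible next scores in the half. The sketch below uses invented probabilities, not model output. Apart from `td_prob`, the probability names and the point values of 7, 3 and 2 are assumptions chosen for illustration.

``` {r ep-sketch}
# Invented next-score probabilities for one game state (they must sum to 1)
probs <- c(
  td_prob = 0.35, fg_prob = 0.20, safety_prob = 0.00,
  opp_td_prob = 0.25, opp_fg_prob = 0.12, opp_safety_prob = 0.01,
  no_score_prob = 0.07
)

# Assumed point values, in the same order; opponent scores count as negative
points <- c(7, 3, 2, -7, -3, -2, 0)

stopifnot(isTRUE(all.equal(sum(probs), 1)))

# Expected points = sum of probability times point value
ep <- sum(probs * points)
ep
```

Under these assumptions, a higher `td_prob` raises `ep` even if the current drive itself ends without a score, because the probabilities refer to the next score in the half.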

We could compare the most recent four years to the expectation for playing in a dome by inputting all the same things and changing the `roof` input:

``` {r ex6b}
data <- tibble::tibble(
  "season" = 2016:2019,
  "week" = 5,
  "home_team" = "SEA",
  "posteam" = "SEA",
  "roof" = "dome",
  "half_seconds_remaining" = 1800,
  "yardline_100" = c(rep(75, 4)),
  "down" = 1,
  "ydstogo" = 10,
  "posteam_timeouts_remaining" = 3,
  "defteam_timeouts_remaining" = 3
)
nflfastR::calculate_expected_points(data) |>
  dplyr::select(season, yardline_100, td_prob, ep) |>
  knitr::kable(digits = 2)
```

So for 2018 and 2019, 1st & 10 from a home team's own 25 yard line had higher EP in domes than outdoors, which is to be expected.

## Example 5: Win probability calculator

We have also provided a calculator for working with the win probability models. Here is an example of how to use it, looking at how the win probability to begin the game depends on the pre-game spread.

While I have put in `'SEA'` for `home_team` and `posteam`, this only matters for figuring out whether the team with the ball is the home team (there's no actual effect for a given team; it would be the same no matter what team is supplied).

``` {r ex7}
data <- tibble::tibble(
  "receive_2h_ko" = 0,
  "home_team" = "SEA",
  "posteam" = "SEA",
  "score_differential" = 0,
  "half_seconds_remaining" = 1800,
  "game_seconds_remaining" = 3600,
  "spread_line" = c(1, 3, 4, 7, 14),
  "down" = 1,
  "ydstogo" = 10,
  "yardline_100" = 75,
  "posteam_timeouts_remaining" = 3,
  "defteam_timeouts_remaining" = 3
)
nflfastR::calculate_win_probability(data) |>
  dplyr::select(spread_line, wp, vegas_wp) |>
  knitr::kable(digits = 2)
```

Not surprisingly, `vegas_wp` increases with the number of points by which a team was favored coming into the game.

## Example 6: Using the built-in database function

If you're comfortable using `dplyr` functions to manipulate and tidy data, you're ready to use a database. Why should you use a database?

* The provided function in `nflfastR` makes it extremely easy to build a database and keep it updated
* Play-by-play data over 25+ seasons takes up a lot of memory: working with a database allows you to only bring into memory what you actually need
* R makes it *extremely* easy to work with databases.

### Start: install and load packages

To start, we need to install the two packages required for this that aren't installed automatically when `nflfastR` installs: `DBI` and `duckdb` (advanced users can use other types of databases, but this example will use duckdb). The `if` statements make sure the packages won't be updated if they are already installed:

``` {r eval = FALSE}
if (!require("DBI")) install.packages("DBI")
if (!require("duckdb")) install.packages("duckdb")
```

### Overview

There's exactly one function in `nflfastR` that works with databases: `update_pbp_db()`. Some notes:

* `update_pbp_db()` follows the DBI argument naming convention and order. It requires an open connection created with `DBI::dbConnect()`.
* You can specify a different table name with `name`.
* The `seasons` argument controls how the table in the connected database is handled. This is a hybrid argument, and its behavior is described in detail [in the function documentation](https://nflfastr.com/reference/update_pbp_db.html#the-seasons-argument).
* If larger parts of the DB need to be updated, then you should definitely consider doing so in chunks. The `"nflfastR.db_chunk_size"` option is available for this purpose. Further details can also be found in the function documentation.

### Connect to a database

Working with databases always requires an open connection. In this example, we will focus solely on duckdb databases, as duckdb has essentially become the state of the art for this type of data. duckdb can easily create a database in memory. Of course, this doesn't make sense for large amounts of data, because they shouldn't be held in memory, but the process is practically identical with a locally stored database.

So let's connect to an in-memory duckdb database:

``` {r}
connection <- DBI::dbConnect(duckdb::duckdb())
connection
```

### Write data to the database

Let's say I just want to dump play-by-play data of the 2021 - 2024 seasons in my database. Here we go!

``` {r create-db}
nflfastR::update_pbp_db(connection, seasons = 2021:2024)
```

This created a table named "nflverse_pbp" in the connected database and appended the 2021 - 2024 play-by-play data to it.

Wait, that's it? That's it! What if it's partway through the season and you want to make sure that all the new games are added to the database and that data corrections from the NFL propagate into your database? What do you run? `update_pbp_db()`!

``` {r update-db}
nflfastR::update_pbp_db(connection)
```

### Work with the database

Now we're ready to do stuff. If you aren't familiar with databases, they're organized around tables. Here's how to see which tables are present in our database: 

``` {r}
DBI::dbListTables(connection)
```

Since we went with the defaults, there's a table called `nflverse_pbp`. Another useful function is to see the fields (i.e., columns) in a table:

``` {r}
DBI::dbListFields(connection, "nflverse_pbp") |>
  utils::head(10)
```

This is the same list as the list of columns in `nflfastR` play-by-play. Notice we had to supply the name of the table above (`"nflverse_pbp"`). 

With all that out of the way, there's only a couple more things to learn. The main driver here is `tbl()`, which creates a reference to a specific table in a database:

``` {r}
pbp_db <- dplyr::tbl(connection, "nflverse_pbp")
```

And now, everything will magically just "work": you can forget you're even working with a database!

``` {r}
pbp_db |>
  dplyr::group_by(season) |>
  dplyr::summarize(n = dplyr::n())
pbp_db |>
  dplyr::filter(
    rush == 1 | pass == 1,
    down <= 2,
    !is.na(epa),
    !is.na(posteam)
  ) |>
  dplyr::group_by(pass) |>
  dplyr::summarize(mean_epa = mean(epa, na.rm = TRUE))
```

So far, everything has stayed in the database. If you want to bring a query into memory, just use `collect()` at the end:

``` {r}
russ <- pbp_db |>
  dplyr::filter(name == "R.Wilson" & posteam == "SEA") |>
  dplyr::select(desc, epa) |>
  dplyr::collect()
russ
```

So we've searched through `r pbp_db |> dplyr::count() |> dplyr::collect() |> dplyr::pull(n) |> prettyNum(big.mark = ",")` rows of data across 300+ columns and only brought about `r round(nrow(russ), -1)` rows and two columns into memory. Pretty neat! This is how we supply the data to the shiny apps on rbsdm.com without running out of memory on the server. Now there's only one more thing to remember. When you're finished doing what you need with the database:

``` {r}
DBI::dbDisconnect(connection)
```

For more details on using a database with `nflfastR`, see [Thomas Mock's life-changing post here](https://themockup.blog/posts/2019-04-28-nflfastr-dbplyr-rsqlite/). More detailed information on dbplyr (the dplyr database back-end) is given in [Hadley Wickham's R for Data Science (2e)](https://r4ds.hadley.nz/databases.html).

## Example 7: Working with the expected yards after catch model

The variables in `xyac` are as follows:

* `xyac_epa`: The expected value of EPA gained after the catch, **starting from where the catch was made**.
* `xyac_success`: The probability the play earns positive EPA (relative to where play started) based on where ball was caught.
* `xyac_fd`: Probability play earns a first down based on where the ball was caught.
* `xyac_mean_yardage` and `xyac_median_yardage`: Average and median expected yards after the catch based on where the ball was caught.

Some other notes:

* `epa` = `air_epa` + `yac_epa`, where `air_epa` is the EPA associated with a catch at the target location. If a receiver loses a fumble, it is removed from his `yac_epa`
* Expected value of EPA at catch point = `air_epa` + `xyac_epa`
* So if we want to get YAC EPA over expected, we need to compare `yac_epa` to `xyac_epa`, as in the example below
* To get first downs over expected, we could compare `first_down` to `xyac_fd`
* These fields are populated for all pass attempts, whether caught or not, but restrict to completed passes when measuring, for example, YAC EPA over expected
* The expected YAC EPA model doesn't take receiver fumbles into account, so actual minus expected YAC is slightly negative due to fumbles happening
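The identities in these notes can be checked on a toy example. This is only a sketch: the numbers are invented and `toy_catches` is a hypothetical data frame; only the column names `air_epa`, `yac_epa`, `xyac_epa` and `epa` come from the notes above.

``` {r xyac-sketch}
# Hypothetical toy data: three completed passes with invented values
toy_catches <- data.frame(
  air_epa = c(0.50, -0.20, 1.10),
  yac_epa = c(0.30, 0.10, -0.40),
  xyac_epa = c(0.20, 0.25, 0.10)
)

# epa = air_epa + yac_epa
toy_catches$epa <- toy_catches$air_epa + toy_catches$yac_epa

# Expected value of EPA at the catch point = air_epa + xyac_epa
toy_catches$expected_epa_at_catch <- toy_catches$air_epa + toy_catches$xyac_epa

# YAC EPA over expected = yac_epa - xyac_epa
toy_catches$yac_epa_oe <- toy_catches$yac_epa - toy_catches$xyac_epa
toy_catches
```

Note that `epa` minus `expected_epa_at_catch` equals `yac_epa_oe` in every row, which is why comparing `yac_epa` to `xyac_epa` is sufficient.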

Let's create measures for EPA and first downs over expected in 2015:

``` {r ex9-xyac, warning = FALSE, message = FALSE}
nflfastR::load_pbp(2015) |>
  dplyr::group_by(receiver, receiver_id, posteam) |>
  dplyr::mutate(tgt = sum(complete_pass + incomplete_pass)) |>
  dplyr::filter(tgt >= 50) |>
  dplyr::filter(
    complete_pass == 1,
    air_yards < yardline_100,
    !is.na(xyac_epa)
  ) |>
  dplyr::summarize(
    epa_oe = mean(yac_epa - xyac_epa),
    actual_fd = mean(first_down),
    expected_fd = mean(xyac_fd),
    fd_oe = mean(first_down - xyac_fd),
    rec = dplyr::n()
  ) |>
  dplyr::ungroup() |>
  dplyr::select(
    receiver,
    posteam,
    actual_fd,
    expected_fd,
    fd_oe,
    epa_oe,
    rec
  ) |>
  dplyr::slice_max(epa_oe, n = 10) |>
  knitr::kable(digits = 3)
```

The presence of so many running backs on this list suggests that even though it takes into account target depth and pass direction, the model doesn't do a great job capturing space. Alternatively, running backs might be better at generating yards after the catch since running with the football is their primary role.

## Example 8: Working with roster and position data

At long last, there's a way to merge the new play-by-play data with roster information. Use the function `load_rosters()` to get the rosters:

``` {r roster}
roster <- nflfastR::load_rosters(2019)
```

Now let's load play-by-play data from 2019:
``` {r roster_pbp_load}
games_2019 <- nflfastR::load_pbp(2019)
```

Here is what the player IDs look like. `nflfastR` now automatically decodes IDs so that they match the old format with GSIS IDs:

``` {r roster_pbp}
games_2019 |>
  dplyr::filter(rush == 1 | pass == 1, posteam == "SEA") |>
  dplyr::select(name, id)
```

Now we're ready to join to the roster data using these IDs:
``` {r decode_join}
joined <- games_2019 |>
  dplyr::filter(!is.na(receiver_id)) |>
  dplyr::select(posteam, season, desc, receiver, receiver_id, epa) |>
  dplyr::left_join(roster, by = c("receiver_id" = "gsis_id"))
```

``` {r decode_table}
# the real work is done, this just makes a table and has it look nice
joined |>
  dplyr::filter(position %in% c("WR", "TE", "RB")) |>
  dplyr::group_by(receiver_id, receiver, position) |>
  dplyr::summarize(tot_epa = sum(epa), n = n()) |>
  dplyr::arrange(-tot_epa) |>
  dplyr::ungroup() |>
  dplyr::group_by(position) |>
  dplyr::mutate(position_rank = 1:n()) |>
  dplyr::filter(position_rank <= 5) |>
  dplyr::rename(
    Pos_Rank = position_rank,
    Player = receiver,
    Pos = position,
    Tgt = n,
    EPA = tot_epa
  ) |>
  dplyr::select(Player, Pos, Pos_Rank, Tgt, EPA) |>
  knitr::kable(digits = 0)
```

Not surprisingly, all 5 of the top 5 WRs in terms of EPA added come in ahead of the top RB. Note that the number of targets won't match official stats because we're including plays with penalties.

## Example 9: Replicating official stats

The columns like `name`, `passer`, `fantasy` etc. are `nflfastR`-created columns that mimic "real" football: i.e., excluding plays with spikes, counting scrambles and sacks as pass plays, etc. But if you're trying to replicate official statistics -- perhaps for fantasy purposes -- use the `*_player_name` and `*_player_id` columns.

[Let's try to replicate this page of passing leaders](https://www.nfl.com/stats/player-stats/).

``` {r stats1}
nflfastR::load_pbp(2020) |>
  dplyr::filter(
    season_type == "REG",
    complete_pass == 1 | incomplete_pass == 1 | interception == 1,
    !is.na(down)
  ) |>
  dplyr::group_by(passer_player_name, posteam) |>
  dplyr::summarize(
    yards = sum(passing_yards, na.rm = TRUE),
    tds = sum(touchdown == 1 & td_team == posteam),
    ints = sum(interception),
    att = dplyr::n()
  ) |>
  dplyr::arrange(-yards) |>
  utils::head(10) |>
  knitr::kable(digits = 0)
```

These match the official stats on NFL.com (note the filter for `season_type == "REG"` since official stats only count regular season games). Note that we're using `passing_yards` here because `yards_gained` is not equal to passing yards on plays with laterals.

While the above code works in this case, there are several special cases where it is nearly impossible to get official player stats from nflfastR play-by-play data. The reason for this is that the idea of nflfastR play-by-play data is a "tidy" data structure. In other words, the aim is to have one row per play in the data. This can lead to problems if, for example, there are several changes of possession per play (i.e. several fumbles) or if the ball is lateraled in a play. These are just two examples of “abnormal” plays that are not fully captured in a tidy data structure.
We have solved this problem with the function `calculate_stats()`. This function uses playstats of the raw play-by-play data before it is parsed into a tidy structure by nflfastR. 

This function has the following features:

- It determines stats in offense, defense, and special teams,
- either on player level or on team level,
- and can summarize them on season level (separately for regular season and post season) or on week level.

For more information see the function documentation of `calculate_stats()`. Again, **don't try to get an exact match with official stats based on nflfastR play-by-play data**. It usually works, but it can fail on special plays because of details that cannot be resolved in a one-row-per-play structure.

Now let's replicate the above table using `calculate_stats()`:

``` {r stats2}
s <- nflfastR::calculate_stats(
  seasons = 2020,
  summary_level = "season",
  stat_type = "player",
  season_type = "REG"
)
s |>
  dplyr::slice_max(passing_yards, n = 10) |>
  dplyr::select(
    player_name,
    recent_team,
    completions,
    attempts,
    passing_yards,
    passing_tds,
    passing_interceptions
  ) |>
  knitr::kable(digits = 0)
```

The same applies to stats data as to pbp data. Its computation is costly, but can be automated. There is therefore rarely a reason to call `calculate_stats()` directly. Instead, nflverse offers the functions `nflfastR::load_player_stats()` and `nflfastR::load_team_stats()` to load precomputed data from data releases.

# Frequent issues

## The `drive` column looks wacky

Use `fixed_drive` and `fixed_drive_result` instead. See [Example 2: Using Drive Information].

## Why are there so many win probability columns?

`vegas_wp` and `vegas_home_wp` incorporate the pregame spread and are much better models.

## Need more help?

Please ask [in the nflverse Discord server](https://discord.com/invite/5Er2FBnnQa).


================================================
FILE: vignettes/stats_variables.Rmd
================================================
---
title: "NFL Stats Variables"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  comment = "#>"
)

with_dt <- requireNamespace("DT")

```

Below you will find a table that lists and explains all the variables available in `calculate_stats()`. Compared to the old `calculate_player_stats*()` functions that have been deprecated, practically all variables (and their names) have been preserved. However, there are a few differences:

- `recent_team`: renamed to `team` (recent team in weekly data never made sense)
- `interceptions`: renamed to `passing_interceptions` (all passing stats have the passing prefix)
- `sacks`: renamed to `sacks_suffered` (to make clear it's not on defensive side)
- `sack_yards`: renamed to `sack_yards_lost` (to make clear it's not on defensive side)
- `dakota`: not implemented at the moment
- `def_tackles`: there is `def_tackles_solo` and `def_tackles_with_assist`
- `def_fumble_recovery_own`: renamed to `fumble_recovery_own` (it is not exclusive to defense)
- `def_fumble_recovery_yards_own`: renamed to `fumble_recovery_yards_own` (it is not exclusive to defense)
- `def_fumble_recovery_opp`: renamed to `fumble_recovery_opp` (it is not exclusive to defense)
- `def_fumble_recovery_yards_opp`: renamed to `fumble_recovery_yards_opp` (it is not exclusive to defense)
- `def_safety`: renamed to `def_safeties` (we use plural everywhere)
- `def_penalty`: renamed to `penalties` (it is not exclusive to defense)
- `def_penalty_yards`: renamed to `penalty_yards` (it is not exclusive to defense)
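
When migrating code written against the deprecated functions, the pure renames above can be applied with a lookup vector. The following is a minimal sketch, not an exhaustive mapping: `old_stats` is a hypothetical data frame standing in for old output, and the lookup only covers a few of the renames listed above.

```{r echo = TRUE}
# Lookup of old name -> new name, taken from the list above (not exhaustive)
renames <- c(
  interceptions = "passing_interceptions",
  sacks = "sacks_suffered",
  sack_yards = "sack_yards_lost",
  def_safety = "def_safeties"
)

# Hypothetical stand-in for a data frame with the old column names
old_stats <- data.frame(
  player_name = "A.Example",
  interceptions = 2,
  sacks = 10,
  sack_yards = 70
)

# Rename only the columns that appear in the lookup; leave the rest untouched
hits <- names(old_stats) %in% names(renames)
names(old_stats)[hits] <- unname(renames[names(old_stats)[hits]])
names(old_stats)
```

Variables that were split or dropped (such as `def_tackles` or `dakota`) cannot be handled by a simple rename and need to be adapted by hand.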

```{r eval = with_dt}
DT::datatable(
  nflfastR::nfl_stats_variables,
  options = list(scrollX = TRUE, pageLength = 25),
  filter = "top",
  rownames = FALSE,
  style = "bootstrap4"
)
```

```{r eval = !with_dt}
knitr::kable(nflfastR::nfl_stats_variables)
```