for a list of plays where multiple players recorded lateral rushing yards.}
#' \item{lateral_rushing_yards}{Numeric yards by the `lateral_rusher_player_name` in run plays with laterals. Please see the description of `lateral_rusher_player_name` for further information.}
#' \item{lateral_sack_player_id}{Unique identifier for the player that received the lateral on a sack.}
#' \item{lateral_sack_player_name}{String name for the player that received the lateral on a sack.}
#' \item{interception_player_id}{Unique identifier for the player that intercepted the pass.}
#' \item{interception_player_name}{String name for the player that intercepted the pass.}
#' \item{lateral_interception_player_id}{Unique identifier for the player that received the lateral on an interception.}
#' \item{lateral_interception_player_name}{String name for the player that received the lateral on an interception.}
#' \item{punt_returner_player_id}{Unique identifier for the punt returner.}
#' \item{punt_returner_player_name}{String name for the punt returner.}
#' \item{lateral_punt_returner_player_id}{Unique identifier for the player that received the lateral on a punt return.}
#' \item{lateral_punt_returner_player_name}{String name for the player that received the lateral on a punt return.}
#' \item{kickoff_returner_player_name}{String name for the kickoff returner.}
#' \item{kickoff_returner_player_id}{Unique identifier for the kickoff returner.}
#' \item{lateral_kickoff_returner_player_id}{Unique identifier for the player that received the lateral on a kickoff return.}
#' \item{lateral_kickoff_returner_player_name}{String name for the player that received the lateral on a kickoff return.}
#' \item{punter_player_id}{Unique identifier for the punter.}
#' \item{punter_player_name}{String name for the punter.}
#' \item{kicker_player_name}{String name for the kicker on FG or kickoff.}
#' \item{kicker_player_id}{Unique identifier for the kicker on FG or kickoff.}
#' \item{own_kickoff_recovery_player_id}{Unique identifier for the player that recovered their own kickoff.}
#' \item{own_kickoff_recovery_player_name}{String name for the player that recovered their own kickoff.}
#' \item{blocked_player_id}{Unique identifier for the player that blocked the punt or FG.}
#' \item{blocked_player_name}{String name for the player that blocked the punt or FG.}
#' \item{tackle_for_loss_1_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_1_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
#' \item{tackle_for_loss_2_player_name}{String name for one of the potential players with the tackle for loss.}
#' \item{qb_hit_1_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_1_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{qb_hit_2_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see `sack_player` or `half_sack_*_player`.}
#' \item{forced_fumble_player_1_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_1_player_name}{String name of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_team}{Team of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_id}{Unique identifier of one of the players with a forced fumble.}
#' \item{forced_fumble_player_2_player_name}{String name of one of the players with a forced fumble.}
#' \item{solo_tackle_1_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_2_team}{Team of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_id}{Unique identifier of one of the players with a solo tackle.}
#' \item{solo_tackle_1_player_name}{String name of one of the players with a solo tackle.}
#' \item{solo_tackle_2_player_name}{String name of one of the players with a solo tackle.}
#' \item{assist_tackle_1_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_1_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_1_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_2_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_2_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_3_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_3_team}{Team of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_id}{Unique identifier of one of the players with a tackle assist.}
#' \item{assist_tackle_4_player_name}{String name of one of the players with a tackle assist.}
#' \item{assist_tackle_4_team}{Team of one of the players with a tackle assist.}
#' \item{tackle_with_assist}{Binary indicator for if there has been a tackle with assist.}
#' \item{tackle_with_assist_1_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_1_team}{Team of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_id}{Unique identifier of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_player_name}{String name of one of the players with a tackle with assist.}
#' \item{tackle_with_assist_2_team}{Team of one of the players with a tackle with assist.}
#' \item{pass_defense_1_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_1_player_name}{String name of one of the players with a pass defense.}
#' \item{pass_defense_2_player_id}{Unique identifier of one of the players with a pass defense.}
#' \item{pass_defense_2_player_name}{String name of one of the players with a pass defense.}
#' \item{fumbled_1_team}{Team of the first player who fumbled on the play.}
#' \item{fumbled_1_player_id}{Unique identifier of the first player who fumbled on the play.}
#' \item{fumbled_1_player_name}{String name of the first player who fumbled on the play.}
#' \item{fumbled_2_player_id}{Unique identifier of the second player who fumbled on the play.}
#' \item{fumbled_2_player_name}{String name of the second player who fumbled on the play.}
#' \item{fumbled_2_team}{Team of the second player who fumbled on the play.}
#' \item{fumble_recovery_1_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_1_player_name}{String name of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_team}{Team of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_yards}{Yards gained by one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_id}{Unique identifier of one of the players with a fumble recovery.}
#' \item{fumble_recovery_2_player_name}{String name of one of the players with a fumble recovery.}
#' \item{sack_player_id}{Unique identifier of the player who recorded a solo sack.}
#' \item{sack_player_name}{String name of the player who recorded a solo sack.}
#' \item{half_sack_1_player_id}{Unique identifier of the first player who recorded half a sack.}
#' \item{half_sack_1_player_name}{String name of the first player who recorded half a sack.}
#' \item{half_sack_2_player_id}{Unique identifier of the second player who recorded half a sack.}
#' \item{half_sack_2_player_name}{String name of the second player who recorded half a sack.}
#' \item{return_team}{String abbreviation of the return team.}
#' \item{return_yards}{Yards gained by the return team.}
#' \item{penalty_team}{String abbreviation of the team with the penalty.}
#' \item{penalty_player_id}{Unique identifier for the player with the penalty.}
#' \item{penalty_player_name}{String name for the player with the penalty.}
#' \item{penalty_yards}{Yards gained (or lost) by the posteam from the penalty.}
#' \item{replay_or_challenge}{Binary indicator for whether or not there was a replay or challenge.}
#' \item{replay_or_challenge_result}{String indicating the result of the replay or challenge.}
#' \item{penalty_type}{String indicating the penalty type of the first penalty in the given play. Will be `NA` if `desc` is missing the type.}
#' \item{defensive_two_point_attempt}{Binary indicator whether or not the defense had an attempt on a two point conversion; this occurs following a turnover.}
#' \item{defensive_two_point_conv}{Binary indicator whether or not the defense successfully scored on the two point conversion.}
#' \item{defensive_extra_point_attempt}{Binary indicator whether or not the defense had an attempt on an extra point; this occurs when the defense recovers the ball following a blocked attempt.}
#' \item{defensive_extra_point_conv}{Binary indicator whether or not the defense successfully scored on an extra point attempt.}
#' \item{safety_player_name}{String name for the player who scored a safety.}
#' \item{safety_player_id}{Unique identifier for the player who scored a safety.}
#' \item{season}{4 digit number indicating the season to which the game belongs.}
#' \item{cp}{Numeric value indicating the probability for a complete pass based on comparable game situations.}
#' \item{cpoe}{For a single pass play this is 1 - cp when the pass was completed or 0 - cp when the pass was incomplete. Aggregated over a whole game or season, it indicates how far the passer's completion percentage was above or below expectation.}
#' \item{series}{Starts at 1 and increments with each new first down; numbers are shared across both teams. NA for kickoffs, extra point/two point conversion attempts, non-plays, and plays with no posteam.}
#' \item{series_success}{1: scored touchdown, gained enough yards for first down.}
#' \item{series_result}{Possible values: First down, Touchdown, Opp touchdown, Field goal, Missed field goal, Safety, Turnover, Punt, Turnover on downs, QB kneel, End of half}
#' \item{order_sequence}{Column provided by NFL to fix out-of-order plays. Available 2011 and beyond with source "nfl".}
#' \item{start_time}{Kickoff time in eastern time zone.}
#' \item{time_of_day}{Time of day of play in UTC "HH:MM:SS" format. Available 2011 and beyond with source "nfl".}
#' \item{stadium}{Game site name.}
#' \item{weather}{String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!}
#' \item{nfl_api_id}{UUID of the game in the new NFL API.}
#' \item{play_clock}{Time on the playclock when the ball was snapped.}
#' \item{play_deleted}{Binary indicator for deleted plays.}
#' \item{play_type_nfl}{Play type as listed in the NFL source. Slightly different to the regular play_type variable.}
#' \item{special_teams_play}{Binary indicator for whether play is special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{st_play_type}{Type of special teams play from NFL source. Available 2011 and beyond with source "nfl".}
#' \item{end_clock_time}{Game time at the end of a given play.}
#' \item{end_yard_line}{String indicating the yardline at the end of the given play consisting of team half and yard line number.}
#' \item{fixed_drive}{Manually created drive number in a game.}
#' \item{fixed_drive_result}{Manually created drive result.}
#' \item{drive_real_start_time}{Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').}
#' \item{drive_play_count}{Numeric value of how many regular plays happened in a given drive.}
#' \item{drive_time_of_possession}{Time of possession in a given drive.}
#' \item{drive_first_downs}{Number of first downs in a given drive.}
#' \item{drive_inside20}{Binary indicator if the offense was able to get inside the opponent's 20 yard line.}
#' \item{drive_ended_with_score}{Binary indicator if the drive ended with a score.}
#' \item{drive_quarter_start}{Numeric value indicating in which quarter the given drive has started.}
#' \item{drive_quarter_end}{Numeric value indicating in which quarter the given drive has ended.}
#' \item{drive_yards_penalized}{Numeric value of how many yards the offense gained or lost through penalties in the given drive.}
#' \item{drive_start_transition}{String indicating how the offense got the ball.}
#' \item{drive_end_transition}{String indicating how the offense lost the ball.}
#' \item{drive_game_clock_start}{Game time at the beginning of a given drive.}
#' \item{drive_game_clock_end}{Game time at the end of a given drive.}
#' \item{drive_start_yard_line}{String indicating where a given drive started consisting of team half and yard line number.}
#' \item{drive_end_yard_line}{String indicating where a given drive ended consisting of team half and yard line number.}
#' \item{drive_play_id_started}{Play_id of the first play in the given drive.}
#' \item{drive_play_id_ended}{Play_id of the last play in the given drive.}
#' \item{away_score}{Total points scored by the away team.}
#' \item{home_score}{Total points scored by the home team.}
#' \item{location}{Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.}
#' \item{result}{Equals home_score - away_score and means the game outcome from the perspective of the home team.}
#' \item{total}{Equals home_score + away_score and means the total points scored in the given game.}
#' \item{spread_line}{The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference)}
#' \item{total_line}{The closing total line for the game. (Source: Pro-Football-Reference)}
#' \item{div_game}{Binary indicator for if the given game was a division game.}
#' \item{roof}{One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{surface}{What type of ground the game was played on. (Source: Pro-Football-Reference)}
#' \item{temp}{The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{wind}{The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
#' \item{home_coach}{First and last name of the home team coach. (Source: Pro-Football-Reference)}
#' \item{away_coach}{First and last name of the away team coach. (Source: Pro-Football-Reference)}
#' \item{stadium_id}{ID of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{game_stadium}{Name of the stadium the game was played in. (Source: Pro-Football-Reference)}
#' \item{success}{Binary indicator whether epa > 0 in the given play.}
#' \item{passer}{Name of the dropback player (scrambles included) including plays with penalties.}
#' \item{passer_jersey_number}{Jersey number of the passer.}
#' \item{rusher}{Name of the rusher (no scrambles) including plays with penalties.}
#' \item{rusher_jersey_number}{Jersey number of the rusher.}
#' \item{receiver}{Name of the receiver including plays with penalties.}
#' \item{receiver_jersey_number}{Jersey number of the receiver.}
#' \item{pass}{Binary indicator if the play was a pass play (sacks and scrambles included).}
#' \item{rush}{Binary indicator if the play was a rushing play.}
#' \item{first_down}{Binary indicator if the play ended in a first down.}
#' \item{aborted_play}{Binary indicator if the play description indicates "Aborted".}
#' \item{special}{Binary indicator if the play was a special teams play.}
#' \item{play}{Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.}
#' \item{passer_id}{ID of the player in the 'passer' column.}
#' \item{rusher_id}{ID of the player in the 'rusher' column.}
#' \item{receiver_id}{ID of the player in the 'receiver' column.}
#' \item{name}{Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.}
#' \item{jersey_number}{Jersey number of the player listed in the 'name' column.}
#' \item{id}{ID of the player in the 'name' column.}
#' \item{fantasy_player_name}{Name of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{fantasy_player_id}{ID of the rusher on rush plays or receiver on pass plays (from official stats).}
#' \item{fantasy}{Name of the rusher on rush plays or receiver on pass plays.}
#' \item{fantasy_id}{ID of the rusher on rush plays or receiver on pass plays.}
#' \item{out_of_bounds}{1 if play description contains ran ob, pushed ob, or sacked ob; 0 otherwise.}
#' \item{home_opening_kickoff}{= 1 if the home team received the opening kickoff, 0 otherwise.}
#' \item{qb_epa}{Gives QB credit for EPA for up to the point where a receiver lost a fumble after a completed catch and makes EPA work more like passing yards on plays with fumbles.}
#' \item{xyac_epa}{Expected value of EPA gained after the catch, starting from where the catch was made. Zero yards after the catch would be listed as zero EPA.}
#' \item{xyac_mean_yardage}{Average expected yards after the catch based on where the ball was caught.}
#' \item{xyac_median_yardage}{Median expected yards after the catch based on where the ball was caught.}
#' \item{xyac_success}{Probability play earns positive EPA (relative to where play started) based on where ball was caught.}
#' \item{xyac_fd}{Probability play earns a first down based on where the ball was caught.}
#' \item{xpass}{Probability of dropback scaled from 0 to 1.}
#' \item{pass_oe}{Dropback percent over expected on a given play scaled from 0 to 100.}
================================================
FILE: data-raw/wordmarks.R
================================================
library(dplyr)

# Skip alternate and legacy abbreviations; they reuse another wordmark below
teams <- nflfastR::teams_colors_logos |>
  dplyr::filter(!team_abbr %in% c("LAR", "OAK", "SD", "STL"))

purrr::walk(teams$team_abbr, function(x) {
  # Download the wordmark and trim the surrounding empty space
  load <- glue::glue(
    "https://static.www.nfl.com/league/apps/clubs/wordmarks/{x}_fullcolor.png"
  ) |>
    magick::image_read() |>
    magick::image_trim()

  # Pad with a transparent border towards a common 700x192 canvas
  info <- magick::image_info(load)
  rl <- (700 - info$width) / 2
  tb <- (192 - info$height) / 2
  image <- magick::image_border(load, "transparent", glue::glue("{rl}x{tb}"))

  magick::image_write(
    image,
    path = glue::glue("wordmarks/{x}.png"),
    format = "png"
  )

  # Write copies under the alternate and legacy abbreviations
  if (x == "LA") {
    magick::image_write(image, path = "wordmarks/LAR.png", format = "png")
    magick::image_write(image, path = "wordmarks/STL.png", format = "png")
  } else if (x == "LAC") {
    magick::image_write(image, path = "wordmarks/SD.png", format = "png")
  } else if (x == "LV") {
    magick::image_write(image, path = "wordmarks/OAK.png", format = "png")
  }
})
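
# A minimal sketch of the padding arithmetic used above, assuming the goal is
# a common 700x192 canvas. `border_geometry()` is a hypothetical helper that is
# not part of this script or of magick; it only builds the geometry string.
# Unlike the code above, it floors the result so that odd differences do not
# produce fractional pixel values.
border_geometry <- function(width, height, target_w = 700, target_h = 192) {
  # The border is added on both sides, so each side gets half the difference
  rl <- floor((target_w - width) / 2)
  tb <- floor((target_h - height) / 2)
  paste0(rl, "x", tb)
}

# A 500x100 image needs 100 px left/right and 46 px top/bottom
border_geometry(500, 100)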
================================================
FILE: man/add_qb_epa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_additional_functions.R
\name{add_qb_epa}
\alias{add_qb_epa}
\title{Compute QB epa}
\usage{
add_qb_epa(pbp, ...)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}
\item{...}{Additional arguments passed to a message function (for internal use).}
}
\description{
Compute QB epa
}
\details{
Add the variable 'qb_epa', which gives QB credit for EPA for up to the point where
a receiver lost a fumble after a completed catch and makes EPA work more
like passing yards on plays with fumbles.
}
================================================
FILE: man/add_xpass.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_add_xpass.R
\name{add_xpass}
\alias{add_xpass}
\title{Add expected pass columns}
\usage{
add_xpass(pbp, ...)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}
\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input Data Frame of the parameter \code{pbp} with the following columns
added:
\describe{
\item{xpass}{Probability of dropback scaled from 0 to 1.}
\item{pass_oe}{Dropback percent over expected on a given play scaled from 0 to 100.}
}
}
\description{
Build columns from the expected dropback model. Will return
\code{NA} on data prior to 2006 since that was before the NFL started marking scrambles.
Must be run on a dataframe that has already had \code{\link[=clean_pbp]{clean_pbp()}} run on it.
Note that the functions \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} and
the database function \code{\link[=update_db]{update_db()}} already include this function.
}
================================================
FILE: man/add_xyac.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_add_xyac.R
\name{add_xyac}
\alias{add_xyac}
\title{Add expected yards after completion (xyac) variables}
\usage{
add_xyac(pbp, ...)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}
\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input Data Frame of the parameter 'pbp' with the following columns
added:
\describe{
\item{xyac_epa}{Expected value of EPA gained after the catch, starting from where the catch was made. Zero yards after the catch would be listed as zero EPA.}
\item{xyac_success}{Probability play earns positive EPA (relative to where play started) based on where ball was caught.}
\item{xyac_fd}{Probability play earns a first down based on where the ball was caught.}
\item{xyac_mean_yardage}{Average expected yards after the catch based on where the ball was caught.}
\item{xyac_median_yardage}{Median expected yards after the catch based on where the ball was caught.}
}
}
\description{
Add expected yards after completion (xyac) variables
}
\details{
Build columns that capture what we should expect after the catch.
}
================================================
FILE: man/build_nflfastR_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/build_nflfastR_pbp.R
\name{build_nflfastR_pbp}
\alias{build_nflfastR_pbp}
\title{Build a Complete nflfastR Data Set}
\usage{
build_nflfastR_pbp(
game_ids,
dir = getOption("nflfastR.raw_directory", default = NULL),
...,
decode = TRUE,
rules = TRUE
)
}
\arguments{
\item{game_ids}{Vector of character ids or a data frame including the variable
\code{game_id} (see details for further information).}
\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory")
where nflfastR searches for raw game play-by-play data.
See \code{\link[=save_raw_pbp]{save_raw_pbp()}} for additional information.}
\item{...}{Additional arguments passed to the scraping functions (for internal use)}
\item{decode}{If \code{TRUE}, the function \code{\link[=decode_player_ids]{decode_player_ids()}} will be executed.}
\item{rules}{If \code{FALSE}, printing of the header and footer in the console output will be suppressed.}
}
\value{
An nflfastR play-by-play data frame, as it can be loaded from \url{https://github.com/nflverse/nflverse-data}.
}
\description{
\code{build_nflfastR_pbp} is a convenient wrapper around 6 nflfastR functions:
\itemize{
\item{\code{\link[=fast_scraper]{fast_scraper()}}}
\item{\code{\link[=clean_pbp]{clean_pbp()}}}
\item{\code{\link[=add_qb_epa]{add_qb_epa()}}}
\item{\code{\link[=add_xyac]{add_xyac()}}}
\item{\code{\link[=add_xpass]{add_xpass()}}}
\item{\code{\link[=decode_player_ids]{decode_player_ids()}}}
}
Please see either the documentation of each function or
\href{https://nflfastr.com/articles/field_descriptions.html}{the nflfastR Field Descriptions website}
to learn about the output.
}
\details{
To load valid game_ids please use the package function \code{\link[=fast_scraper_schedules]{fast_scraper_schedules()}}.
}
\examples{
\donttest{
# Build nflfastR pbp for the 2018 and 2019 Super Bowls
try({# to avoid CRAN test problems
build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
})
# It is also possible to directly use the
# output of `load_schedules` as input
try({# to avoid CRAN test problems
nflreadr::load_schedules(2025) |>
dplyr::slice_tail(n = 3) |>
build_nflfastR_pbp()
})
\dontshow{
# Close open connections for R CMD Check
future::plan("sequential")
}
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
================================================
FILE: man/calculate_expected_points.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ep_wp_calculators.R
\name{calculate_expected_points}
\alias{calculate_expected_points}
\title{Compute expected points}
\usage{
calculate_expected_points(pbp_data)
}
\arguments{
\item{pbp_data}{Play-by-play dataset to estimate expected points for.}
}
\value{
The original pbp_data with the following columns appended to it:
\describe{
\item{ep}{expected points.}
\item{no_score_prob}{probability of no more scoring this half.}
\item{opp_fg_prob}{probability next score opponent field goal this half.}
\item{opp_safety_prob}{probability next score opponent safety this half.}
\item{opp_td_prob}{probability of next score opponent touchdown this half.}
\item{fg_prob}{probability next score field goal this half.}
\item{safety_prob}{probability next score safety this half.}
\item{td_prob}{probability next score touchdown this half.}
}
}
\description{
Computes expected points for provided plays. Returns the data with
probabilities of each scoring event and EP added. The following columns
must be present: season, home_team, posteam, roof (coded as 'outdoors',
'dome', or 'open'/'closed'/NA (retractable)), half_seconds_remaining,
yardline_100, down, ydstogo, posteam_timeouts_remaining,
defteam_timeouts_remaining
}
\details{
Computes expected points for provided plays. Returns the data with
probabilities of each scoring event and EP added. The following columns
must be present:
\itemize{
\item{season}
\item{home_team}
\item{posteam}
\item{roof (coded as 'outdoors', 'dome', or 'open'/'closed'/NA (retractable))}
\item{half_seconds_remaining}
\item{yardline_100}
\item{down}
\item{ydstogo}
\item{posteam_timeouts_remaining}
\item{defteam_timeouts_remaining}
}
}
\examples{
\donttest{
try({# to avoid CRAN test problems
library(dplyr)
data <- tibble::tibble(
"season" = 1999:2019,
"home_team" = "SEA",
"posteam" = "SEA",
"roof" = "outdoors",
"half_seconds_remaining" = 1800,
"yardline_100" = c(rep(80, 17), rep(75, 4)),
"down" = 1,
"ydstogo" = 10,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
nflfastR::calculate_expected_points(data) |>
dplyr::select(season, yardline_100, td_prob, ep)
})
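
# A hedged sketch, not part of the package: `missing_ep_columns()` is a
# hypothetical helper that reports which of the columns listed in the details
# section are absent from a data frame, so the input can be checked before
# calling calculate_expected_points().
missing_ep_columns <- function(pbp_data) {
  required <- c(
    "season", "home_team", "posteam", "roof", "half_seconds_remaining",
    "yardline_100", "down", "ydstogo", "posteam_timeouts_remaining",
    "defteam_timeouts_remaining"
  )
  # Names that are required but not present in the input
  setdiff(required, names(pbp_data))
}

# Lists every required column except "season"
missing_ep_columns(data.frame(season = 2020))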
}
}
================================================
FILE: man/calculate_player_stats.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats.R
\name{calculate_player_stats}
\alias{calculate_player_stats}
\title{Get Official Game Stats}
\usage{
calculate_player_stats(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A Data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}. If the data doesn't include the variable
\code{qb_epa}, the function \code{add_qb_epa()} will be called to add it.}
\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats
for the entire Data frame.}
}
\value{
A data frame including the following columns (all ID columns are
decoded to the gsis ID format):
\describe{
\item{player_id}{ID of the player. Use this to join to other sources.}
\item{player_name}{Name of the player}
\item{player_display_name}{Full name of the player}
\item{position}{Position of the player}
\item{position_group}{Position group of the player}
\item{headshot_url}{URL to a player headshot image}
\item{games}{The number of games where the player recorded passing, rushing or receiving stats.}
\item{recent_team}{Most recent team player appears in \code{pbp} with.}
\item{season}{Season if \code{weekly} is \code{TRUE}}
\item{week}{Week if \code{weekly} is \code{TRUE}}
\item{season_type}{\code{REG} or \code{POST} if \code{weekly} is \code{TRUE}}
\item{opponent_team}{The player's opponent team if \code{weekly} is \code{TRUE}}
\item{completions}{The number of completed passes.}
\item{attempts}{The number of pass attempts as defined by the NFL.}
\item{passing_yards}{Yards gained on pass plays.}
\item{passing_tds}{The number of passing touchdowns.}
\item{interceptions}{The number of interceptions thrown.}
\item{sacks}{The number of times sacked.}
\item{sack_yards}{Yards lost on sack plays.}
\item{sack_fumbles}{The number of sacks with a fumble.}
\item{sack_fumbles_lost}{The number of sacks with a lost fumble.}
\item{passing_air_yards}{Passing air yards (includes incomplete passes).}
\item{passing_yards_after_catch}{Yards after the catch gained on plays in
which player was the passer (this is an unofficial stat and may differ slightly
between different sources).}
\item{passing_first_downs}{First downs on pass attempts.}
\item{passing_epa}{Total expected points added on pass attempts and sacks.
NOTE: this uses the variable \code{qb_epa}, which gives QB credit for EPA for up
to the point where a receiver lost a fumble after a completed catch and makes
EPA work more like passing yards on plays with fumbles.}
\item{passing_2pt_conversions}{Two-point conversion passes.}
\item{pacr}{Passing Air Conversion Ratio. PACR = \code{passing_yards} / \code{passing_air_yards}}
\item{dakota}{Adjusted EPA + CPOE composite based on coefficients which best predict adjusted EPA/play in the following year.}
\item{carries}{The number of official rush attempts (incl. scrambles and kneel downs).
Rushes after a lateral reception don't count as a carry.}
\item{rushing_yards}{Yards gained when rushing with the ball (incl. scrambles and kneel downs).
Also includes yards gained after obtaining a lateral on a play that started
with a rushing attempt.}
\item{rushing_tds}{The number of rushing touchdowns (incl. scrambles).
Also includes touchdowns after obtaining a lateral on a play that started
with a rushing attempt.}
\item{rushing_fumbles}{The number of rushes with a fumble.}
\item{rushing_fumbles_lost}{The number of rushes with a lost fumble.}
\item{rushing_first_downs}{First downs on rush attempts (incl. scrambles).}
\item{rushing_epa}{Expected points added on rush attempts (incl. scrambles and kneel downs).}
\item{rushing_2pt_conversions}{Two-point conversion rushes}
\item{receptions}{The number of pass receptions. Lateral receptions officially
don't count as receptions.}
\item{targets}{The number of pass plays where the player was the targeted receiver.}
\item{receiving_yards}{Yards gained after a pass reception. Includes yards
gained after receiving a lateral on a play that started as a pass play.}
\item{receiving_tds}{The number of touchdowns following a pass reception.
Also includes touchdowns after receiving a lateral on a play that started
as a pass play.}
\item{receiving_air_yards}{Receiving air yards (incl. incomplete passes).}
\item{receiving_yards_after_catch}{Yards after the catch gained on plays in
which player was receiver (this is an unofficial stat and may differ slightly
between different sources).}
\item{receiving_fumbles}{The number of fumbles after a pass reception.}
\item{receiving_fumbles_lost}{The number of fumbles lost after a pass reception.}
\item{receiving_2pt_conversions}{Two-point conversion receptions.}
\item{racr}{Receiver Air Conversion Ratio. RACR = \code{receiving_yards} / \code{receiving_air_yards}}
\item{target_share}{The player's share of all targets of his team.}
\item{air_yards_share}{The player's share of all \code{receiving_air_yards} of his team.}
\item{wopr}{Weighted Opportunity Rating. WOPR = 1.5 × \code{target_share} + 0.7 × \code{air_yards_share}}
\item{fantasy_points}{Standard fantasy points.}
\item{fantasy_points_ppr}{PPR fantasy points.}
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.
Build columns that aggregate official passing, rushing, and receiving stats
either at the game level or at the level of the entire data frame passed.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
# pbp <- nflfastR::load_pbp(2020)
# weekly <- calculate_player_stats(pbp, weekly = TRUE)
# dplyr::glimpse(weekly)
# overall <- calculate_player_stats(pbp, weekly = FALSE)
# dplyr::glimpse(overall)
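# Illustrative sketch with made-up numbers (not real player data): the ratio
# and weighted columns described in the value section can be recomputed from
# their inputs. The data frame toy and its values are assumptions for this
# example only.
toy <- data.frame(
  receiving_yards = c(90, 40),
  receiving_air_yards = c(120, 50),
  target_share = c(0.30, 0.10),
  air_yards_share = c(0.40, 0.15)
)
# RACR = receiving_yards / receiving_air_yards
toy$racr <- toy$receiving_yards / toy$receiving_air_yards
# WOPR = 1.5 * target_share + 0.7 * air_yards_share
toy$wopr <- 1.5 * toy$target_share + 0.7 * toy$air_yards_share
toy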
})
}
}
\seealso{
The function \code{\link[=load_player_stats]{load_player_stats()}} and the corresponding examples
on \href{https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats}{the nflfastR website}
}
\keyword{internal}
================================================
FILE: man/calculate_player_stats_def.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats_def.R
\name{calculate_player_stats_def}
\alias{calculate_player_stats_def}
\title{Get Official Game Stats on Defense}
\usage{
calculate_player_stats_def(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}. If the data doesn't include the variable
\code{qb_epa}, the function \code{add_qb_epa()} will be called to add it.}
\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats
for the entire data frame.}
}
\value{
A data frame of defensive player stats. See dictionary (# TODO)
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.
Build columns that aggregate official defense stats
either at the game level or at the level of the entire data frame passed.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
# pbp <- nflfastR::load_pbp(2020)
# weekly <- calculate_player_stats_def(pbp, weekly = TRUE)
# dplyr::glimpse(weekly)
# overall <- calculate_player_stats_def(pbp, weekly = FALSE)
# dplyr::glimpse(overall)
})
}
}
\seealso{
The function \code{\link[=load_player_stats]{load_player_stats()}} and the corresponding examples
on \href{https://nflfastr.com/articles/nflfastR.html#example-11-replicating-official-stats}{the nflfastR website}
}
\keyword{internal}
================================================
FILE: man/calculate_player_stats_kicking.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_game_stats_kicking.R
\name{calculate_player_stats_kicking}
\alias{calculate_player_stats_kicking}
\title{Summarize Kicking Stats}
\usage{
calculate_player_stats_kicking(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{A data frame of NFL play-by-play data typically loaded with
\code{\link[=load_pbp]{load_pbp()}} or \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}.}
\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise, stats for
the entire data frame in argument \code{pbp}.}
}
\value{
A data frame of kicking stats.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated because we have a new, much better and
harmonized approach in \code{\link[=calculate_stats]{calculate_stats()}}.
Build columns that aggregate kicking stats
either at the game level or at the level of the entire data frame passed.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
# pbp <- nflreadr::load_pbp(2021)
# weekly <- calculate_player_stats_kicking(pbp, weekly = TRUE)
# dplyr::glimpse(weekly)
# overall <- calculate_player_stats_kicking(pbp, weekly = FALSE)
# dplyr::glimpse(overall)
})
}
}
\seealso{
\url{https://nflreadr.nflverse.com/reference/load_player_stats.html} for the nflreadr function to download this from repo (\code{stat_type = "kicking"})
}
\keyword{internal}
================================================
FILE: man/calculate_series_conversion_rates.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_series_conversion_rates.R
\name{calculate_series_conversion_rates}
\alias{calculate_series_conversion_rates}
\title{Compute Series Conversion Information from Play by Play}
\usage{
calculate_series_conversion_rates(pbp, weekly = FALSE)
}
\arguments{
\item{pbp}{Play-by-play data as returned by \code{\link[=load_pbp]{load_pbp()}}, \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, or
\code{\link[=fast_scraper]{fast_scraper()}}.}
\item{weekly}{If \code{TRUE}, returns week-by-week stats, otherwise,
season-by-season stats for the data in argument \code{pbp}.}
}
\value{
A data frame of series information including the following columns:
\describe{
\item{season}{The NFL season}
\item{team}{NFL team abbreviation}
\item{week}{Week if \code{weekly} is \code{TRUE}}
\item{off_n}{The number of series the offense played (excludes QB kneel
downs, kickoffs, extra point/two point conversion attempts, non-plays, and
plays that do not list a "posteam")}
\item{off_scr}{The rate at which a series ended in either new 1st down or
touchdown while the offense was on the field}
\item{off_scr_1st}{The rate at which an offense earned a 1st down
or scored a touchdown on 1st down}
\item{off_scr_2nd}{The rate at which an offense earned a 1st down
or scored a touchdown on 2nd down}
\item{off_scr_3rd}{The rate at which an offense earned a 1st down
or scored a touchdown on 3rd down}
\item{off_scr_4th}{The rate at which an offense earned a 1st down
or scored a touchdown on 4th down}
\item{off_1st}{The rate of series that ended in a new 1st down while the
offense was on the field (does not include offensive touchdown)}
\item{off_td}{The rate of series that ended in an offensive touchdown while the
offense was on the field}
\item{off_fg}{The rate of series that ended in a field goal attempt while the
offense was on the field}
\item{off_punt}{The rate of series that ended in a punt while the
offense was on the field}
\item{off_to}{The rate of series that ended in a turnover (including on downs), in an
opponent score, or at the end of half (or game) while the
offense was on the field}
\item{def_n}{The number of series the defense played (excludes QB kneel
downs, kickoffs, extra point/two point conversion attempts, non-plays, and
plays that do not list a "posteam")}
\item{def_scr}{The rate at which a series ended in either new 1st down or
touchdown while the defense was on the field}
\item{def_scr_1st}{The rate at which a defense allowed a
1st down or touchdown on 1st down}
\item{def_scr_2nd}{The rate at which a defense allowed a
1st down or touchdown on 2nd down}
\item{def_scr_3rd}{The rate at which a defense allowed a
1st down or touchdown on 3rd down}
\item{def_scr_4th}{The rate at which a defense allowed a
1st down or touchdown on 4th down}
\item{def_1st}{The rate of series that ended in a new 1st down while the
defense was on the field (does not include offensive touchdown)}
\item{def_td}{The rate of series that ended in an offensive touchdown while the
defense was on the field}
\item{def_fg}{The rate of series that ended in a field goal attempt while the
defense was on the field}
\item{def_punt}{The rate of series that ended in a punt while the
defense was on the field}
\item{def_to}{The rate of series that ended in a turnover (including on downs), in an
opponent score, or at the end of half (or game) while the
defense was on the field}
}
}
\description{
A "Series" begins on a 1st and 10 and each team attempts to either earn
a new 1st down (on offense) or prevent the offense from converting a new
1st down (on defense). Series conversion rate represents how many series
have been either converted to a new 1st down or ended in a touchdown.
This function computes series conversion rates on offense and defense from
nflverse play-by-play data along with other series results.
The function automatically removes series that ended in a QB kneel down.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
pbp <- nflfastR::load_pbp(2021)
weekly <- calculate_series_conversion_rates(pbp, weekly = TRUE)
dplyr::glimpse(weekly)
overall <- calculate_series_conversion_rates(pbp, weekly = FALSE)
dplyr::glimpse(overall)
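# Sketch, assuming the column definitions listed in the value section hold
# exactly: off_1st excludes touchdowns, so off_1st + off_td is expected to be
# close to off_scr. This only summarizes the differences and asserts nothing.
conv_sum <- overall$off_1st + overall$off_td
summary(abs(overall$off_scr - conv_sum))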
})
}
}
================================================
FILE: man/calculate_standings.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_standings.R
\name{calculate_standings}
\alias{calculate_standings}
\title{Compute Division Standings and Conference Seeds from Play by Play}
\usage{
calculate_standings(
nflverse_object,
tiebreaker_depth = 3,
playoff_seeds = NULL
)
}
\arguments{
\item{nflverse_object}{Data object of class \code{nflverse_data}. Either schedules
as returned by \code{\link[=fast_scraper_schedules]{fast_scraper_schedules()}} or \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules()}},
or play-by-play data as returned by \code{\link[=load_pbp]{load_pbp()}}, \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, or
\code{\link[=fast_scraper]{fast_scraper()}}.}
\item{tiebreaker_depth}{A single value equal to 1, 2, or 3. The default is 3. The
value controls the depth of tiebreakers that shall be applied. The deepest
currently implemented tiebreaker is strength of schedule. The following
values are valid:
\describe{
\item{tiebreaker_depth = 1}{Break all ties with a coinflip. Fastest variant.}
\item{tiebreaker_depth = 2}{Apply head-to-head and division win percentage tiebreakers. Random if still tied.}
\item{tiebreaker_depth = 3}{Apply all tiebreakers through strength of schedule. Random if still tied.}
}}
\item{playoff_seeds}{Number of playoff teams per conference. If \code{NULL} (the
default), the function will try to split \code{nflverse_object} into seasons prior
to 2020 (6 seeds) and seasons from 2020 onward (7 seeds). If set to a numeric, it
will be used for all seasons in \code{nflverse_object}!}
}
\value{
A tibble with NFL regular season standings
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated and replaced by \code{\link[nflseedR:nfl_standings]{nflseedR::nfl_standings()}}.
This function calculates division standings as well as playoff
seeds per conference based on either nflverse play-by-play data or nflverse
schedule data.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
# load nflverse data both schedules and pbp
# scheds <- fast_scraper_schedules(2014)
# pbp <- load_pbp(c(2018, 2021))
# calculate standings based on pbp
# calculate_standings(pbp)
# calculate standings based on schedules
# calculate_standings(scheds)
})
}
}
\keyword{internal}
================================================
FILE: man/calculate_stats.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calculate_stats.R
\name{calculate_stats}
\alias{calculate_stats}
\title{Calculate NFL Stats}
\usage{
calculate_stats(
seasons = nflreadr::most_recent_season(),
summary_level = c("season", "week"),
stat_type = c("player", "team"),
season_type = c("REG", "POST", "REG+POST"),
pbp = NULL
)
}
\arguments{
\item{seasons}{A numeric vector of 4-digit years associated with given NFL
seasons - defaults to latest season. If set to TRUE, returns all available
data since 1999. Ignored if argument \code{pbp} is not \code{NULL}.}
\item{summary_level}{Summarize stats by \code{"season"} or \code{"week"}.}
\item{stat_type}{Calculate \code{"player"} level stats or \code{"team"} level stats.}
\item{season_type}{One of \code{"REG"}, \code{"POST"}, or \code{"REG+POST"}. Filters
data to regular season ("REG"), post season ("POST") or keeps all data.
Only applied if \code{summary_level} == \code{"season"}.}
\item{pbp}{This argument allows passing a subset of nflverse play-by-play
data, created with \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} or loaded with \code{\link[=load_pbp]{load_pbp()}}.
Stats are then calculated based on the \code{game_id}s and \code{play_id}s in this
subset of play-by-play data, rather than using the seasons specified in the
\code{seasons} argument. The function will error if required variables are
missing from the subset, but lists which variables are missing.
If \code{pbp = NULL} (the default), all available games and plays from the
\code{seasons} argument are used to calculate stats.
Please use this responsibly, because the output is structurally identical
to full seasons, even if plays have been filtered out. It may then appear
as if the stats are incorrect. If \code{pbp} is not \code{NULL}, the function will add
the attribute \code{"custom_pbp" = TRUE} to the function output to help identify
stats that are possibly based on play-by-play subsets.}
}
\value{
A tibble of player/team stats summarized by season/week.
}
\description{
Compute various NFL stats based off nflverse Play-by-Play data.
}
\examples{
\donttest{
try({# to avoid CRAN test problems
stats <- calculate_stats(2023, "season", "player")
dplyr::glimpse(stats)
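# Sketch of the pbp argument, assuming the 2023 play-by-play data can be
# downloaded: stats are computed from a subset (here weeks 1 to 4, an
# arbitrary choice), and the output carries the custom_pbp attribute
# described in the arguments section.
pbp <- nflfastR::load_pbp(2023)
early_pbp <- dplyr::filter(pbp, week <= 4)
early_stats <- calculate_stats(
  summary_level = "week",
  stat_type = "team",
  pbp = early_pbp
)
attr(early_stats, "custom_pbp")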
})
}
}
\seealso{
\link{nfl_stats_variables} for a description of all variables.
\url{https://nflfastr.com/articles/stats_variables.html} for a searchable
table of the stats variable descriptions.
}
================================================
FILE: man/calculate_win_probability.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ep_wp_calculators.R
\name{calculate_win_probability}
\alias{calculate_win_probability}
\title{Compute win probability}
\usage{
calculate_win_probability(pbp_data)
}
\arguments{
\item{pbp_data}{Play-by-play dataset to estimate win probability for.}
}
\value{
The original pbp_data with the following columns appended to it:
\describe{
\item{wp}{win probability.}
\item{vegas_wp}{win probability taking into account pre-game spread.}
}
}
\description{
Computes win probability for provided plays. Returns the data with
probabilities of winning the game. The following columns
must be present: receive_2h_ko (1 if game is in 1st half and possession
team will receive 2nd half kickoff, 0 otherwise), score_differential,
home_team, posteam, half_seconds_remaining, game_seconds_remaining,
spread_line (how many points home team was favored by), down, ydstogo,
yardline_100, posteam_timeouts_remaining, defteam_timeouts_remaining.
}
\details{
Computes win probability for provided plays. Returns the data with
spread and non-spread-adjusted win probabilities. The following columns
must be present:
\itemize{
\item{receive_2h_ko (1 if game is in 1st half and possession team will receive 2nd half kickoff, 0 otherwise)}
\item{score_differential}
\item{home_team}
\item{posteam}
\item{half_seconds_remaining}
\item{game_seconds_remaining}
\item{spread_line (how many points home team was favored by)}
\item{down}
\item{ydstogo}
\item{yardline_100}
\item{posteam_timeouts_remaining}
\item{defteam_timeouts_remaining}
}
}
\examples{
\donttest{
try({# to avoid CRAN test problems
library(dplyr)
data <- tibble::tibble(
"receive_2h_ko" = 0,
"home_team" = "SEA",
"posteam" = "SEA",
"score_differential" = 0,
"half_seconds_remaining" = 1800,
"game_seconds_remaining" = 3600,
"spread_line" = c(1, 3, 4, 7, 14),
"down" = 1,
"ydstogo" = 10,
"yardline_100" = 75,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
nflfastR::calculate_win_probability(data) |>
dplyr::select(spread_line, wp, vegas_wp)
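# Sketch of an input check before calling the function: compare the column
# names of the input with the required columns listed in the details section.
# The names required_cols and missing_cols are only used for this example.
required_cols <- c(
  "receive_2h_ko", "score_differential", "home_team", "posteam",
  "half_seconds_remaining", "game_seconds_remaining", "spread_line",
  "down", "ydstogo", "yardline_100",
  "posteam_timeouts_remaining", "defteam_timeouts_remaining"
)
missing_cols <- setdiff(required_cols, names(data))
# an empty result means all required columns are present
missing_cols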
})
}
}
================================================
FILE: man/clean_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_additional_functions.R
\name{clean_pbp}
\alias{clean_pbp}
\title{Clean Play by Play Data}
\usage{
clean_pbp(pbp, ...)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}
\item{...}{Additional arguments passed to a message function (for internal use).}
}
\value{
The input data frame of the parameter \code{pbp} with the following columns
added:
\describe{
\item{success}{Binary indicator whether epa > 0 in the given play.}
\item{passer}{Name of the dropback player (scrambles included) including plays with penalties.}
\item{passer_jersey_number}{Jersey number of the passer.}
\item{rusher}{Name of the rusher (no scrambles) including plays with penalties.}
\item{rusher_jersey_number}{Jersey number of the rusher.}
\item{receiver}{Name of the receiver including plays with penalties.}
\item{receiver_jersey_number}{Jersey number of the receiver.}
\item{pass}{Binary indicator if the play was a pass play (sacks and scrambles included).}
\item{rush}{Binary indicator if the play was a rushing play.}
\item{special}{Binary indicator if the play was a special teams play.}
\item{first_down}{Binary indicator if the play ended in a first down.}
\item{aborted_play}{Binary indicator if the play description indicates "Aborted".}
\item{play}{Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.}
\item{passer_id}{ID of the player in the 'passer' column.}
\item{rusher_id}{ID of the player in the 'rusher' column.}
\item{receiver_id}{ID of the player in the 'receiver' column.}
\item{name}{Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.}
\item{fantasy}{Name of the rusher on rush plays or receiver on pass plays.}
\item{fantasy_id}{ID of the rusher on rush plays or receiver on pass plays.}
\item{fantasy_player_name}{Name of the rusher on rush plays or receiver on pass plays (from official stats).}
\item{fantasy_player_id}{ID of the rusher on rush plays or receiver on pass plays (from official stats).}
\item{jersey_number}{Jersey number of the player listed in the 'name' column.}
\item{id}{ID of the player in the 'name' column.}
\item{out_of_bounds}{= 1 if play description contains "ran ob", "pushed ob", or "sacked ob"; = 0 otherwise.}
\item{home_opening_kickoff}{= 1 if the home team received the opening kickoff, 0 otherwise.}
}
}
\description{
Clean Play by Play Data
}
\details{
Build columns that capture what happens on all plays, including
penalties, using string extraction from play description.
Loosely based on Ben's nflfastR guide (\url{https://nflfastr.com/articles/beginners_guide.html})
but updated to work with the RS data, which has a different player format in
the play description; e.g. 24-M.Lynch instead of M.Lynch.
The function also standardizes team abbreviations so that, for example,
the Chargers are always represented by 'LAC' regardless of which year it was.
Starting in 2022, play-by-play data was missing gsis player IDs of rookies.
This function tries to fix as many as possible.
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
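\examples{
# Illustrative sketch only, not the package implementation: the details
# section notes that RS play descriptions prefix player names with a jersey
# number, e.g. 24-M.Lynch. A base R regular expression can split the two.
# The variable names below are only used for this example.
player <- "24-M.Lynch"
jersey_number <- as.integer(sub("-.*$", "", player))
player_name <- sub("^[0-9]+-", "", player)
jersey_number
player_name
}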
================================================
FILE: man/decode_player_ids.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_decode_player_ids.R
\name{decode_player_ids}
\alias{decode_player_ids}
\title{Decode the player IDs in nflfastR play-by-play data}
\usage{
decode_player_ids(pbp, ..., fast = TRUE)
}
\arguments{
\item{pbp}{A data frame of play-by-play data scraped using \code{\link[=fast_scraper]{fast_scraper()}}.}
\item{...}{Additional arguments passed to a message function (for internal use).}
\item{fast}{If \code{TRUE}, the IDs will be decoded with the highly efficient
function \link[gsisdecoder:decode_ids]{decode_ids}. If \code{FALSE}, an nflfastR internal
function will be used for decoding (this is generally not recommended
unless there is a problem with \link[gsisdecoder:decode_ids]{decode_ids},
which can take several days to fix on CRAN).}
}
\value{
The input data frame of the parameter \code{pbp} with decoded player IDs.
}
\description{
Takes all columns ending with \code{'player_id'} as well as the
variables \code{'passer_id'}, \code{'rusher_id'}, \code{'fantasy_id'},
\code{'receiver_id'}, and \code{'id'} of an nflfastR play-by-play data set
and decodes the player IDs to the commonly known GSIS ID format 00-00xxxxx.
By default, the function uses the highly efficient \link[gsisdecoder:decode_ids]{decode_ids}
of the package \href{https://cran.r-project.org/package=gsisdecoder}{\code{gsisdecoder}}.
In the unlikely event that there is a problem with this function, an nflfastR
internal decoder can be used with the option \code{fast = FALSE}.
The 2022 play by play data introduced new player IDs that can't be decoded
with gsisdecoder. In that case, IDs are joined through \link[nflreadr:load_players]{nflreadr::load_players}.
}
\examples{
\donttest{
# Decode data frame consisting of some names and ids
decode_player_ids(data.frame(
name = c("P.Mahomes", "B.Baldwin", "P.Mahomes", "S.Carl", "J.Jones"),
id = c(
"32013030-2d30-3033-3338-3733fa30c4fa",
NA_character_,
"00-0033873",
NA_character_,
"32013030-2d30-3032-3739-3434d4d3846d"
)
))
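# Illustrative sketch in base R, not the package or gsisdecoder code: in the
# first encoded id above, hex characters 5 to 24 (dashes removed) are the
# ASCII codes of the GSIS id. This is an observation about this example value.
encoded <- "32013030-2d30-3033-3338-3733fa30c4fa"
hex <- gsub("-", "", encoded, fixed = TRUE)
hex_pairs <- substring(hex, seq(5, 23, by = 2), seq(6, 24, by = 2))
gsis_id <- rawToChar(as.raw(strtoi(hex_pairs, base = 16L)))
gsis_id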
}
}
================================================
FILE: man/fast_scraper.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper}
\alias{fast_scraper}
\title{Get NFL Play by Play Data}
\usage{
fast_scraper(
game_ids,
dir = getOption("nflfastR.raw_directory", default = NULL),
...,
in_builder = FALSE
)
}
\arguments{
\item{game_ids}{Vector of character ids or a data frame including the variable
\code{game_id} (see details for further information).}
\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory")
where nflfastR searches for raw game play-by-play data.
See \code{\link[=save_raw_pbp]{save_raw_pbp()}} for additional information.}
\item{...}{Additional arguments passed to the scraping functions (for internal use)}
\item{in_builder}{If \code{TRUE}, the final message will be suppressed (for usage inside of \code{\link{build_nflfastR_pbp}}).}
}
\value{
Data frame where each individual row represents a single play for
all passed game_ids containing the following
detailed information (description partly extracted from nflscrapR):
\describe{
\item{play_id}{Numeric play id that when used with game_id and drive provides the unique identifier for a single play.}
\item{game_id}{Ten digit identifier for NFL game.}
\item{old_game_id}{Legacy NFL game ID.}
\item{home_team}{String abbreviation for the home team.}
\item{away_team}{String abbreviation for the away team.}
\item{season_type}{'REG' or 'POST' indicating if the game belongs to regular or post season.}
\item{week}{Season week.}
\item{posteam}{String abbreviation for the team with possession.}
\item{posteam_type}{String indicating whether the posteam team is home or away.}
\item{defteam}{String abbreviation for the team on defense.}
\item{side_of_field}{String abbreviation for which team's side of the field the team with possession is currently on.}
\item{yardline_100}{Numeric distance in the number of yards from the opponent's endzone for the posteam.}
\item{game_date}{Date of the game.}
\item{quarter_seconds_remaining}{Numeric seconds remaining in the quarter.}
\item{half_seconds_remaining}{Numeric seconds remaining in the half.}
\item{game_seconds_remaining}{Numeric seconds remaining in the game.}
\item{game_half}{String indicating which half the play is in, either Half1, Half2, or Overtime.}
\item{quarter_end}{Binary indicator for whether or not the row of the data is marking the end of a quarter.}
\item{drive}{Numeric drive number in the game.}
\item{sp}{Binary indicator for whether or not a score occurred on the play.}
\item{qtr}{Quarter of the game (5 is overtime).}
\item{down}{The down for the given play.}
\item{goal_to_go}{Binary indicator for whether or not the posteam is in a goal down situation.}
\item{time}{Time at start of play provided in string format as minutes:seconds remaining in the quarter.}
\item{yrdln}{String indicating the current field position for a given play.}
\item{ydstogo}{Numeric yards in distance from either the first down marker or the endzone in goal down situations.}
\item{ydsnet}{Numeric value for total yards gained on the given drive.}
\item{desc}{Detailed string description for the given play.}
\item{play_type}{String indicating the type of play: pass (includes sacks), run (includes scrambles), punt, field_goal, kickoff, extra_point, qb_kneel, qb_spike, no_play (timeouts and penalties), and missing for rows indicating end of play.}
\item{yards_gained}{Numeric yards gained (or lost) by the possessing team, excluding yards gained via fumble recoveries and laterals.}
\item{shotgun}{Binary indicator for whether or not the play was in shotgun formation.}
\item{no_huddle}{Binary indicator for whether or not the play was in no_huddle formation.}
\item{qb_dropback}{Binary indicator for whether or not the QB dropped back on the play (pass attempt, sack, or scrambled).}
\item{qb_kneel}{Binary indicator for whether or not the QB took a knee.}
\item{qb_spike}{Binary indicator for whether or not the QB spiked the ball.}
\item{qb_scramble}{Binary indicator for whether or not the QB scrambled.}
\item{pass_length}{String indicator for pass length: short or deep.}
\item{pass_location}{String indicator for pass location: left, middle, or right.}
\item{air_yards}{Numeric value for distance in yards perpendicular to the line of scrimmage at where the targeted receiver either caught or didn't catch the ball.}
\item{yards_after_catch}{Numeric value for distance in yards perpendicular to the yard line where the receiver made the reception to where the play ended.}
\item{run_location}{String indicator for location of run: left, middle, or right.}
\item{run_gap}{String indicator for line gap of run: end, guard, or tackle}
\item{field_goal_result}{String indicator for result of field goal attempt: made, missed, or blocked.}
\item{kick_distance}{Numeric distance in yards for kickoffs, field goals, and punts.}
\item{extra_point_result}{String indicator for the result of the extra point attempt: good, failed, blocked, safety (touchback in defensive endzone is 1 point apparently), or aborted.}
\item{two_point_conv_result}{String indicator for result of two point conversion attempt: success, failure, safety (touchback in defensive endzone is 1 point apparently), or return.}
\item{home_timeouts_remaining}{Numeric timeouts remaining in the half for the home team.}
\item{away_timeouts_remaining}{Numeric timeouts remaining in the half for the away team.}
\item{timeout}{Binary indicator for whether or not a timeout was called by either team.}
\item{timeout_team}{String abbreviation for which team called the timeout.}
\item{td_team}{String abbreviation for which team scored the touchdown.}
\item{td_player_name}{String name of the player who scored a touchdown.}
\item{td_player_id}{Unique identifier of the player who scored a touchdown.}
\item{posteam_timeouts_remaining}{Number of timeouts remaining for the possession team.}
\item{defteam_timeouts_remaining}{Number of timeouts remaining for the team on defense.}
\item{total_home_score}{Score for the home team at the end of the play.}
\item{total_away_score}{Score for the away team at the end of the play.}
\item{posteam_score}{Score for the posteam at the start of the play.}
\item{defteam_score}{Score for the defteam at the start of the play.}
\item{score_differential}{Score differential between the posteam and defteam at the start of the play.}
\item{posteam_score_post}{Score for the posteam at the end of the play.}
\item{defteam_score_post}{Score for the defteam at the end of the play.}
\item{score_differential_post}{Score differential between the posteam and defteam at the end of the play.}
\item{no_score_prob}{Predicted probability of no score occurring for the rest of the half based on the expected points model.}
\item{opp_fg_prob}{Predicted probability of the defteam scoring a FG next.}
\item{opp_safety_prob}{Predicted probability of the defteam scoring a safety next.}
\item{opp_td_prob}{Predicted probability of the defteam scoring a TD next.}
\item{fg_prob}{Predicted probability of the posteam scoring a FG next.}
\item{safety_prob}{Predicted probability of the posteam scoring a safety next.}
\item{td_prob}{Predicted probability of the posteam scoring a TD next.}
\item{extra_point_prob}{Predicted probability of the posteam scoring an extra point.}
\item{two_point_conversion_prob}{Predicted probability of the posteam scoring the two point conversion.}
\item{ep}{Using the scoring event probabilities, the estimated expected points with respect to the possession team for the given play.}
\item{epa}{Expected points added (EPA) by the posteam for the given play.}
\item{total_home_epa}{Cumulative total EPA for the home team in the game so far.}
\item{total_away_epa}{Cumulative total EPA for the away team in the game so far.}
\item{total_home_rush_epa}{Cumulative total rushing EPA for the home team in the game so far.}
\item{total_away_rush_epa}{Cumulative total rushing EPA for the away team in the game so far.}
\item{total_home_pass_epa}{Cumulative total passing EPA for the home team in the game so far.}
\item{total_away_pass_epa}{Cumulative total passing EPA for the away team in the game so far.}
\item{air_epa}{EPA from the air yards alone. For completions this represents the actual value provided through the air. For incompletions this represents the hypothetical value that could've been added through the air if the pass was completed.}
\item{yac_epa}{EPA from the yards after catch alone. For completions this represents the actual value provided after the catch. For incompletions this represents the difference between the hypothetical air_epa and the play's raw observed EPA (how much the incomplete pass cost the posteam).}
\item{comp_air_epa}{EPA from the air yards alone only for completions.}
\item{comp_yac_epa}{EPA from the yards after catch alone only for completions.}
\item{total_home_comp_air_epa}{Cumulative total completions air EPA for the home team in the game so far.}
\item{total_away_comp_air_epa}{Cumulative total completions air EPA for the away team in the game so far.}
\item{total_home_comp_yac_epa}{Cumulative total completions yac EPA for the home team in the game so far.}
\item{total_away_comp_yac_epa}{Cumulative total completions yac EPA for the away team in the game so far.}
\item{total_home_raw_air_epa}{Cumulative total raw air EPA for the home team in the game so far.}
\item{total_away_raw_air_epa}{Cumulative total raw air EPA for the away team in the game so far.}
\item{total_home_raw_yac_epa}{Cumulative total raw yac EPA for the home team in the game so far.}
\item{total_away_raw_yac_epa}{Cumulative total raw yac EPA for the away team in the game so far.}
\item{wp}{Estimated win probability for the posteam given the current situation at the start of the given play.}
\item{def_wp}{Estimated win probability for the defteam.}
\item{home_wp}{Estimated win probability for the home team.}
\item{away_wp}{Estimated win probability for the away team.}
\item{wpa}{Win probability added (WPA) for the posteam.}
\item{vegas_wpa}{Win probability added (WPA) for the posteam: spread-adjusted model.}
\item{vegas_home_wpa}{Win probability added (WPA) for the home team: spread-adjusted model.}
\item{home_wp_post}{Estimated win probability for the home team at the end of the play.}
\item{away_wp_post}{Estimated win probability for the away team at the end of the play.}
\item{vegas_wp}{Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line.}
\item{vegas_home_wp}{Estimated win probability for the home team incorporating pre-game Vegas line.}
\item{total_home_rush_wpa}{Cumulative total rushing WPA for the home team in the game so far.}
\item{total_away_rush_wpa}{Cumulative total rushing WPA for the away team in the game so far.}
\item{total_home_pass_wpa}{Cumulative total passing WPA for the home team in the game so far.}
\item{total_away_pass_wpa}{Cumulative total passing WPA for the away team in the game so far.}
\item{air_wpa}{WPA through the air (same logic as air_epa).}
\item{yac_wpa}{WPA from yards after the catch (same logic as yac_epa).}
\item{comp_air_wpa}{The air_wpa for completions only.}
\item{comp_yac_wpa}{The yac_wpa for completions only.}
\item{total_home_comp_air_wpa}{Cumulative total completions air WPA for the home team in the game so far.}
\item{total_away_comp_air_wpa}{Cumulative total completions air WPA for the away team in the game so far.}
\item{total_home_comp_yac_wpa}{Cumulative total completions yac WPA for the home team in the game so far.}
\item{total_away_comp_yac_wpa}{Cumulative total completions yac WPA for the away team in the game so far.}
\item{total_home_raw_air_wpa}{Cumulative total raw air WPA for the home team in the game so far.}
\item{total_away_raw_air_wpa}{Cumulative total raw air WPA for the away team in the game so far.}
\item{total_home_raw_yac_wpa}{Cumulative total raw yac WPA for the home team in the game so far.}
\item{total_away_raw_yac_wpa}{Cumulative total raw yac WPA for the away team in the game so far.}
\item{punt_blocked}{Binary indicator for if the punt was blocked.}
\item{first_down_rush}{Binary indicator for if a running play converted the first down.}
\item{first_down_pass}{Binary indicator for if a passing play converted the first down.}
\item{first_down_penalty}{Binary indicator for if a penalty converted the first down.}
\item{third_down_converted}{Binary indicator for if the first down was converted on third down.}
\item{third_down_failed}{Binary indicator for if the posteam failed to convert first down on third down.}
\item{fourth_down_converted}{Binary indicator for if the first down was converted on fourth down.}
\item{fourth_down_failed}{Binary indicator for if the posteam failed to convert first down on fourth down.}
\item{incomplete_pass}{Binary indicator for if the pass was incomplete.}
\item{touchback}{Binary indicator for if a touchback occurred on the play.}
\item{interception}{Binary indicator for if the pass was intercepted.}
\item{punt_inside_twenty}{Binary indicator for if the punt ended inside the twenty yard line.}
\item{punt_in_endzone}{Binary indicator for if the punt was in the endzone.}
\item{punt_out_of_bounds}{Binary indicator for if the punt went out of bounds.}
\item{punt_downed}{Binary indicator for if the punt was downed.}
\item{punt_fair_catch}{Binary indicator for if the punt was caught with a fair catch.}
\item{kickoff_inside_twenty}{Binary indicator for if the kickoff ended inside the twenty yard line.}
\item{kickoff_in_endzone}{Binary indicator for if the kickoff was in the endzone.}
\item{kickoff_out_of_bounds}{Binary indicator for if the kickoff went out of bounds.}
\item{kickoff_downed}{Binary indicator for if the kickoff was downed.}
\item{kickoff_fair_catch}{Binary indicator for if the kickoff was caught with a fair catch.}
\item{fumble_forced}{Binary indicator for if the fumble was forced.}
\item{fumble_not_forced}{Binary indicator for if the fumble was not forced.}
\item{fumble_out_of_bounds}{Binary indicator for if the fumble went out of bounds.}
\item{solo_tackle}{Binary indicator if the play had a solo tackle (could be multiple due to fumbles).}
\item{safety}{Binary indicator for whether or not a safety occurred.}
\item{penalty}{Binary indicator for whether or not a penalty occurred.}
\item{tackled_for_loss}{Binary indicator for whether or not a tackle for loss on a run play occurred.}
\item{fumble_lost}{Binary indicator for if the fumble was lost.}
\item{own_kickoff_recovery}{Binary indicator for if the kicking team recovered the kickoff.}
\item{own_kickoff_recovery_td}{Binary indicator for if the kicking team recovered the kickoff and scored a TD.}
\item{qb_hit}{Binary indicator if the QB was hit on the play.}
\item{rush_attempt}{Binary indicator for if the play was a run.}
\item{pass_attempt}{Binary indicator for if the play was a pass attempt (includes sacks).}
\item{sack}{Binary indicator for if the play ended in a sack.}
\item{touchdown}{Binary indicator for if the play resulted in a TD.}
\item{pass_touchdown}{Binary indicator for if the play resulted in a passing TD.}
\item{rush_touchdown}{Binary indicator for if the play resulted in a rushing TD.}
\item{return_touchdown}{Binary indicator for if the play resulted in a return TD.}
\item{extra_point_attempt}{Binary indicator for extra point attempt.}
\item{two_point_attempt}{Binary indicator for two point conversion attempt.}
\item{field_goal_attempt}{Binary indicator for field goal attempt.}
\item{kickoff_attempt}{Binary indicator for kickoff.}
\item{punt_attempt}{Binary indicator for punts.}
\item{fumble}{Binary indicator for if a fumble occurred.}
\item{complete_pass}{Binary indicator for if the pass was completed.}
\item{assist_tackle}{Binary indicator for if an assist tackle occurred.}
\item{lateral_reception}{Binary indicator for if a lateral occurred on the reception.}
\item{lateral_rush}{Binary indicator for if a lateral occurred on a run.}
\item{lateral_return}{Binary indicator for if a lateral occurred on a return.}
\item{lateral_recovery}{Binary indicator for if a lateral occurred on a fumble recovery.}
\item{passer_player_id}{Unique identifier for the player that attempted the pass.}
\item{passer_player_name}{String name for the player that attempted the pass.}
\item{passing_yards}{Numeric yards by the passer_player_name, including yards gained in pass plays with laterals.
This should equal official passing statistics.}
\item{receiver_player_id}{Unique identifier for the receiver that was targeted on the pass.}
\item{receiver_player_name}{String name for the targeted receiver.}
\item{receiving_yards}{Numeric yards by the receiver_player_name, excluding yards gained in pass plays with laterals.
This should equal official receiving statistics but could miss yards gained in pass plays with laterals.
Please see the description of \code{lateral_receiver_player_name} for further information.}
\item{rusher_player_id}{Unique identifier for the player that attempted the run.}
\item{rusher_player_name}{String name for the player that attempted the run.}
\item{rushing_yards}{Numeric yards by the rusher_player_name, excluding yards gained in rush plays with laterals.
This should equal official rushing statistics but could miss yards gained in rush plays with laterals.
Please see the description of \code{lateral_rusher_player_name} for further information.}
\item{lateral_receiver_player_id}{Unique identifier for the player that received the last(!) lateral on a pass play.}
\item{lateral_receiver_player_name}{String name for the player that received the last(!) lateral on a pass play.
If there were multiple laterals in the same play, this will only be the last player who received a lateral.
Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
for a list of plays where multiple players recorded lateral receiving yards.}
\item{lateral_receiving_yards}{Numeric yards by the \code{lateral_receiver_player_name} in pass plays with laterals.
Please see the description of \code{lateral_receiver_player_name} for further information.}
\item{lateral_rusher_player_id}{Unique identifier for the player that received the last(!) lateral on a run play.}
\item{lateral_rusher_player_name}{String name for the player that received the last(!) lateral on a run play.
If there were multiple laterals in the same play, this will only be the last player who received a lateral.
Please see \url{https://github.com/mrcaseb/nfl-data/tree/master/data/lateral_yards}
for a list of plays where multiple players recorded lateral rushing yards.}
\item{lateral_rushing_yards}{Numeric yards by the \code{lateral_rusher_player_name} in run plays with laterals.
Please see the description of \code{lateral_rusher_player_name} for further information.}
\item{lateral_sack_player_id}{Unique identifier for the player that received the lateral on a sack.}
\item{lateral_sack_player_name}{String name for the player that received the lateral on a sack.}
\item{interception_player_id}{Unique identifier for the player that intercepted the pass.}
\item{interception_player_name}{String name for the player that intercepted the pass.}
\item{lateral_interception_player_id}{Unique identifier for the player that received the lateral on an interception.}
\item{lateral_interception_player_name}{String name for the player that received the lateral on an interception.}
\item{punt_returner_player_id}{Unique identifier for the punt returner.}
\item{punt_returner_player_name}{String name for the punt returner.}
\item{lateral_punt_returner_player_id}{Unique identifier for the player that received the lateral on a punt return.}
\item{lateral_punt_returner_player_name}{String name for the player that received the lateral on a punt return.}
\item{kickoff_returner_player_name}{String name for the kickoff returner.}
\item{kickoff_returner_player_id}{Unique identifier for the kickoff returner.}
\item{lateral_kickoff_returner_player_id}{Unique identifier for the player that received the lateral on a kickoff return.}
\item{lateral_kickoff_returner_player_name}{String name for the player that received the lateral on a kickoff return.}
\item{punter_player_id}{Unique identifier for the punter.}
\item{punter_player_name}{String name for the punter.}
\item{kicker_player_name}{String name for the kicker on FG or kickoff.}
\item{kicker_player_id}{Unique identifier for the kicker on FG or kickoff.}
\item{own_kickoff_recovery_player_id}{Unique identifier for the player that recovered their own kickoff.}
\item{own_kickoff_recovery_player_name}{String name for the player that recovered their own kickoff.}
\item{blocked_player_id}{Unique identifier for the player that blocked the punt or FG.}
\item{blocked_player_name}{String name for the player that blocked the punt or FG.}
\item{tackle_for_loss_1_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_1_player_name}{String name for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_2_player_id}{Unique identifier for one of the potential players with the tackle for loss.}
\item{tackle_for_loss_2_player_name}{String name for one of the potential players with the tackle for loss.}
\item{qb_hit_1_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_1_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_2_player_id}{Unique identifier for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{qb_hit_2_player_name}{String name for one of the potential players that hit the QB. No sack as the QB was not the ball carrier. For sacks please see \code{sack_player} or \verb{half_sack_*_player}.}
\item{forced_fumble_player_1_team}{Team of one of the players with a forced fumble.}
\item{forced_fumble_player_1_player_id}{Unique identifier of one of the players with a forced fumble.}
\item{forced_fumble_player_1_player_name}{String name of one of the players with a forced fumble.}
\item{forced_fumble_player_2_team}{Team of one of the players with a forced fumble.}
\item{forced_fumble_player_2_player_id}{Unique identifier of one of the players with a forced fumble.}
\item{forced_fumble_player_2_player_name}{String name of one of the players with a forced fumble.}
\item{solo_tackle_1_team}{Team of one of the players with a solo tackle.}
\item{solo_tackle_2_team}{Team of one of the players with a solo tackle.}
\item{solo_tackle_1_player_id}{Unique identifier of one of the players with a solo tackle.}
\item{solo_tackle_2_player_id}{Unique identifier of one of the players with a solo tackle.}
\item{solo_tackle_1_player_name}{String name of one of the players with a solo tackle.}
\item{solo_tackle_2_player_name}{String name of one of the players with a solo tackle.}
\item{assist_tackle_1_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_1_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_1_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_2_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_2_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_2_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_3_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_3_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_3_team}{Team of one of the players with a tackle assist.}
\item{assist_tackle_4_player_id}{Unique identifier of one of the players with a tackle assist.}
\item{assist_tackle_4_player_name}{String name of one of the players with a tackle assist.}
\item{assist_tackle_4_team}{Team of one of the players with a tackle assist.}
\item{tackle_with_assist}{Binary indicator for if there has been a tackle with assist.}
\item{tackle_with_assist_1_player_id}{Unique identifier of one of the players with a tackle with assist.}
\item{tackle_with_assist_1_player_name}{String name of one of the players with a tackle with assist.}
\item{tackle_with_assist_1_team}{Team of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_player_id}{Unique identifier of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_player_name}{String name of one of the players with a tackle with assist.}
\item{tackle_with_assist_2_team}{Team of one of the players with a tackle with assist.}
\item{pass_defense_1_player_id}{Unique identifier of one of the players with a pass defense.}
\item{pass_defense_1_player_name}{String name of one of the players with a pass defense.}
\item{pass_defense_2_player_id}{Unique identifier of one of the players with a pass defense.}
\item{pass_defense_2_player_name}{String name of one of the players with a pass defense.}
\item{fumbled_1_team}{Team of the first player with a fumble.}
\item{fumbled_1_player_id}{Unique identifier of the first player who fumbled on the play.}
\item{fumbled_1_player_name}{String name of the first player who fumbled on the play.}
\item{fumbled_2_player_id}{Unique identifier of the second player who fumbled on the play.}
\item{fumbled_2_player_name}{String name of the second player who fumbled on the play.}
\item{fumbled_2_team}{Team of the second player with a fumble.}
\item{fumble_recovery_1_team}{Team of one of the players with a fumble recovery.}
\item{fumble_recovery_1_yards}{Yards gained by one of the players with a fumble recovery.}
\item{fumble_recovery_1_player_id}{Unique identifier of one of the players with a fumble recovery.}
\item{fumble_recovery_1_player_name}{String name of one of the players with a fumble recovery.}
\item{fumble_recovery_2_team}{Team of one of the players with a fumble recovery.}
\item{fumble_recovery_2_yards}{Yards gained by one of the players with a fumble recovery.}
\item{fumble_recovery_2_player_id}{Unique identifier of one of the players with a fumble recovery.}
\item{fumble_recovery_2_player_name}{String name of one of the players with a fumble recovery.}
\item{sack_player_id}{Unique identifier of the player who recorded a solo sack.}
\item{sack_player_name}{String name of the player who recorded a solo sack.}
\item{half_sack_1_player_id}{Unique identifier of the first player who recorded half a sack.}
\item{half_sack_1_player_name}{String name of the first player who recorded half a sack.}
\item{half_sack_2_player_id}{Unique identifier of the second player who recorded half a sack.}
\item{half_sack_2_player_name}{String name of the second player who recorded half a sack.}
\item{return_team}{String abbreviation of the return team.}
\item{return_yards}{Yards gained by the return team.}
\item{penalty_team}{String abbreviation of the team with the penalty.}
\item{penalty_player_id}{Unique identifier for the player with the penalty.}
\item{penalty_player_name}{String name for the player with the penalty.}
\item{penalty_yards}{Yards gained (or lost) by the posteam from the penalty.}
\item{replay_or_challenge}{Binary indicator for whether or not a replay or challenge occurred.}
\item{replay_or_challenge_result}{String indicating the result of the replay or challenge.}
\item{penalty_type}{String indicating the penalty type of the first penalty in the given play. Will be \code{NA} if \code{desc} is missing the type.}
\item{defensive_two_point_attempt}{Binary indicator for whether or not the defense had an attempt on a two point conversion; this occurs following a turnover.}
\item{defensive_two_point_conv}{Binary indicator for whether or not the defense successfully scored on the two point conversion.}
\item{defensive_extra_point_attempt}{Binary indicator for whether or not the defense had an attempt on an extra point; this occurs when the defense recovers the ball after a blocked attempt.}
\item{defensive_extra_point_conv}{Binary indicator whether or not the defense successfully scored on an extra point attempt.}
\item{safety_player_name}{String name for the player who scored a safety.}
\item{safety_player_id}{Unique identifier for the player who scored a safety.}
\item{season}{4 digit number indicating the season to which the game belongs.}
\item{cp}{Numeric value indicating the probability for a complete pass based on comparable game situations.}
\item{cpoe}{For a single pass play this is 1 - cp when the pass was completed or 0 - cp when the pass was incomplete. Aggregated over a whole game or season, it indicates how far the passer's completion percentage was above or below expectation.}
\item{series}{Starts at 1, each new first down increments, numbers shared across both teams. NA: kickoffs, extra point/two point conversion attempts, non-plays, no posteam.}
\item{series_success}{1: scored touchdown, gained enough yards for first down.}
\item{series_result}{Possible values: First down, Touchdown, Opp touchdown, Field goal, Missed field goal, Safety, Turnover, Punt, Turnover on downs, QB kneel, End of half}
\item{start_time}{Kickoff time in eastern time zone.}
\item{order_sequence}{Column provided by NFL to fix out-of-order plays. Available 2011 and beyond with source "nfl".}
\item{time_of_day}{Time of day of play in UTC "HH:MM:SS" format. Available 2011 and beyond with source "nfl".}
\item{stadium}{Game site name.}
\item{weather}{String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game!}
\item{nfl_api_id}{UUID of the game in the new NFL API.}
\item{play_clock}{Time on the playclock when the ball was snapped.}
\item{play_deleted}{Binary indicator for deleted plays.}
\item{play_type_nfl}{Play type as listed in the NFL source. Slightly different to the regular play_type variable.}
\item{special_teams_play}{Binary indicator for whether play is special teams play from NFL source. Available 2011 and beyond with source "nfl".}
\item{st_play_type}{Type of special teams play from NFL source. Available 2011 and beyond with source "nfl".}
\item{end_clock_time}{Game time at the end of a given play.}
\item{end_yard_line}{String indicating the yardline at the end of the given play consisting of team half and yard line number.}
\item{drive_real_start_time}{Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA').}
\item{drive_play_count}{Numeric value of how many regular plays happened in a given drive.}
\item{drive_time_of_possession}{Time of possession in a given drive.}
\item{drive_first_downs}{Number of first downs in a given drive.}
\item{drive_inside20}{Binary indicator if the offense was able to get inside the opponent's 20 yard line.}
\item{drive_ended_with_score}{Binary indicator for whether the drive ended with a score.}
\item{drive_quarter_start}{Numeric value indicating in which quarter the given drive has started.}
\item{drive_quarter_end}{Numeric value indicating in which quarter the given drive has ended.}
\item{drive_yards_penalized}{Numeric value of how many yards the offense gained or lost through penalties in the given drive.}
\item{drive_start_transition}{String indicating how the offense got the ball.}
\item{drive_end_transition}{String indicating how the offense lost the ball.}
\item{drive_game_clock_start}{Game time at the beginning of a given drive.}
\item{drive_game_clock_end}{Game time at the end of a given drive.}
\item{drive_start_yard_line}{String indicating where a given drive started consisting of team half and yard line number.}
\item{drive_end_yard_line}{String indicating where a given drive ended consisting of team half and yard line number.}
\item{drive_play_id_started}{Play_id of the first play in the given drive.}
\item{drive_play_id_ended}{Play_id of the last play in the given drive.}
\item{fixed_drive}{Manually created drive number in a game.}
\item{fixed_drive_result}{Manually created drive result.}
\item{away_score}{Total points scored by the away team.}
\item{home_score}{Total points scored by the home team.}
\item{location}{Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site.}
\item{result}{Equals home_score - away_score and means the game outcome from the perspective of the home team.}
\item{total}{Equals home_score + away_score and means the total points scored in the given game.}
\item{spread_line}{The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference)}
\item{total_line}{The closing total line for the game. (Source: Pro-Football-Reference)}
\item{div_game}{Binary indicator for if the given game was a division game.}
\item{roof}{One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference)}
\item{surface}{What type of ground the game was played on. (Source: Pro-Football-Reference)}
\item{temp}{The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
\item{wind}{The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference)}
\item{home_coach}{First and last name of the home team coach. (Source: Pro-Football-Reference)}
\item{away_coach}{First and last name of the away team coach. (Source: Pro-Football-Reference)}
\item{stadium_id}{ID of the stadium the game was played in. (Source: Pro-Football-Reference)}
\item{game_stadium}{Name of the stadium the game was played in. (Source: Pro-Football-Reference)}
}
}
\description{
Load and parse NFL play-by-play data and add all of the original
nflfastR variables. As nflfastR now provides multiple functions which add
information to the output of this function, it is recommended to use
\code{\link{build_nflfastR_pbp}} instead.
}
\details{
To load valid game_ids please use the package function
\code{\link{fast_scraper_schedules}} (the function can directly handle the
output of that function).
}
\examples{
\donttest{
# Get pbp data for two games
try({# to avoid CRAN test problems
fast_scraper(c("2019_01_GB_CHI", "2013_21_SEA_DEN"))
})
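# A hedged sketch, not part of the package: the game-level columns result
# (home_score - away_score) and spread_line (positive means the home team
# was favored) described in the value section can be combined to flag
# whether the home team covered the spread.
# home_covered() is a hypothetical helper name used for illustration only.
home_covered <- function(result, spread_line) {
  margin <- result - spread_line
  # NA marks a push, i.e. the result landed exactly on the spread
  ifelse(margin == 0, NA, margin > 0)
}
# toy values, not real games
home_covered(result = c(7, 3, -10), spread_line = c(3.5, 3, -7))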
# It is also possible to directly use the
# output of `fast_scraper_schedules` as input
try({# to avoid CRAN test problems
library(dplyr, warn.conflicts = FALSE)
fast_scraper_schedules(2020) |>
slice_tail(n = 3) |>
fast_scraper()
})
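# A hedged sketch, not part of the original examples: average EPA and CPOE
# per passer, assuming the columns listed in the value section are present.
# summarise_passers() is a hypothetical helper, not an nflfastR function.
summarise_passers <- function(pbp) {
  # keep pass attempts with a known passer; which() drops NA comparisons
  keep <- which(pbp$pass_attempt == 1 & !is.na(pbp$passer_player_name))
  stats::aggregate(
    cbind(epa, cpoe) ~ passer_player_name,
    data = pbp[keep, ],
    FUN = mean,
    na.rm = TRUE,
    na.action = stats::na.pass # keep rows where epa or cpoe is missing
  )
}
try({# to avoid CRAN test problems
summarise_passers(fast_scraper("2019_01_GB_CHI"))
})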
\dontshow{
# Close open connections for R CMD Check
future::plan("sequential")
}
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
\code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, \code{\link[=save_raw_pbp]{save_raw_pbp()}}
}
================================================
FILE: man/fast_scraper_roster.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper_roster}
\alias{fast_scraper_roster}
\title{Load Team Rosters for Multiple Seasons}
\usage{
fast_scraper_roster(...)
}
\arguments{
\item{...}{
Arguments passed on to \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}}
\describe{
\item{\code{seasons}}{a numeric vector of seasons to return, defaults to returning
this year's data if it is March or later. If set to \code{TRUE}, will return all available data.
Data available back to 1920.}
\item{\code{file_type}}{One of \code{c("rds", "csv", "parquet")}. Can also be set globally with
\code{options(nflreadr.prefer)}}
}}
}
\value{
A tibble of season-level roster data.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated. Please use \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}}.
}
\details{
See \code{\link[nflreadr:load_rosters]{nflreadr::load_rosters}} for details.
}
\examples{
\donttest{
# Roster of the 2019 and 2020 seasons
try({# to avoid CRAN test problems
# fast_scraper_roster(2019:2020)
})
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
\keyword{internal}
================================================
FILE: man/fast_scraper_schedules.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/top-level_scraper.R
\name{fast_scraper_schedules}
\alias{fast_scraper_schedules}
\title{Load NFL Season Schedules}
\usage{
fast_scraper_schedules(...)
}
\arguments{
\item{...}{
Arguments passed on to \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}}
\describe{
\item{\code{seasons}}{a numeric vector of seasons to return, default \code{TRUE} returns all available data.}
}}
}
\value{
A tibble of game information for past and/or future games.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated. Please use \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}}.
}
\details{
See \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules}} for details.
}
\examples{
\donttest{
# Get schedules for the whole 2015 - 2018 seasons
try({# to avoid CRAN test problems
# fast_scraper_schedules(2015:2018)
})
}
}
\seealso{
For information on parallel processing and progress updates please
see \link{nflfastR}.
}
\keyword{internal}
================================================
FILE: man/field_descriptions.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{field_descriptions}
\alias{field_descriptions}
\title{nflfastR Field Descriptions}
\format{
A data frame including names and descriptions of all variables in
an nflfastR dataset.
}
\usage{
field_descriptions
}
\description{
nflfastR Field Descriptions
}
\examples{
\donttest{
field_descriptions
}
}
\seealso{
The searchable table on the
\href{https://nflfastr.com/articles/field_descriptions.html}{nflfastR website}
}
\keyword{datasets}
================================================
FILE: man/missing_raw_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/save_raw_pbp.R
\name{missing_raw_pbp}
\alias{missing_raw_pbp}
\title{Compute Missing Raw PBP Data on Local Filesystem}
\usage{
missing_raw_pbp(
dir = getOption("nflfastR.raw_directory", default = NULL),
seasons = TRUE,
verbose = TRUE
)
}
\arguments{
\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory").
nflfastR will download the raw game files split by season into one sub
directory per season.}
\item{seasons}{a numeric vector of seasons to return, default \code{TRUE} returns all available data.}
\item{verbose}{If \code{TRUE}, will print number of missing game files as well as
oldest and most recent missing ID to console.}
}
\value{
A character vector of missing game IDs. If no files are missing,
returns \code{NULL} invisibly.
}
\description{
Uses \code{\link[nflreadr:load_schedules]{nflreadr::load_schedules()}} to load game IDs of finished games and
compares these IDs to all files saved under \code{dir}.
This function is intended to serve as input for \code{\link[=save_raw_pbp]{save_raw_pbp()}}.
}
\examples{
\donttest{
try(
missing <- missing_raw_pbp(tempdir())
)
}
}
\seealso{
\code{\link[=save_raw_pbp]{save_raw_pbp()}}
}
================================================
FILE: man/nfl_stats_variables.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{nfl_stats_variables}
\alias{nfl_stats_variables}
\title{NFL Stats Variables}
\format{
A data frame explaining all variables returned by the function
\code{\link[=calculate_stats]{calculate_stats()}}.
}
\usage{
nfl_stats_variables
}
\description{
NFL Stats Variables
}
\examples{
\donttest{
nfl_stats_variables
}
}
\keyword{datasets}
================================================
FILE: man/nflfastR-package.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nflfastR-package.R
\docType{package}
\name{nflfastR-package}
\alias{nflfastR}
\alias{nflfastR-package}
\title{nflfastR: Functions to Efficiently Access NFL Play by Play Data}
\description{
\if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}}
A set of functions to access National Football League play-by-play data from \url{https://www.nfl.com/}.
}
\section{Parallel Processing and Progress Updates in nflfastR}{
\subsection{Preface}{
Prior to nflfastR v4.0, parallel processing could be activated with an
argument \code{pp} in the relevant functions and progress updates were always
shown. Both of these methods are bad practice and were therefore removed
in nflfastR v4.0.
The next sections describe how to make nflfastR work in parallel processes
and show progress updates if the user wants to.
}
\subsection{More Speed Using Parallel Processing}{
Nearly all nflfastR functions support parallel processing
using \code{\link[furrr:future_map]{furrr::future_map()}} if it is enabled by a call to \code{\link[future:plan]{future::plan()}}
prior to the function call.
Please see the documentation of the functions for detailed information.
As an example, the following code block will resolve all function calls in the
current session using multiple sessions in the background and load play-by-play
data for the 2018 through 2020 seasons or build them freshly for the 2018 and
2019 Super Bowls:
\if{html}{\out{}}\preformatted{future::plan("multisession")
load_pbp(2018:2020)
build_nflfastR_pbp(c("2018_21_NE_LA", "2019_21_SF_KC"))
}\if{html}{\out{
}}
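A plan can also be limited to a single piece of work and restored afterwards.
The following is a minimal sketch that relies on \code{future::plan()} returning
the previous plan when a new one is set. The helper name \code{load_in_parallel}
is hypothetical and not part of nflfastR:
\preformatted{load_in_parallel <- function(seasons) {
  # switch to background sessions and remember the previous plan
  old_plan <- future::plan("multisession")
  # restore the previous plan when the function exits
  on.exit(future::plan(old_plan), add = TRUE)
  load_pbp(seasons)
}
# pbp <- load_in_parallel(2018:2020)
}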
We recommend choosing a default parallel processing method and saving it
as an environment variable in the R user profile to make sure all futures
will be resolved with the chosen method by default.
This can be done by following the steps below.
First, run the following line and the file \code{.Renviron} should be opened automatically.
If you haven't saved any environment variables yet, this will be an empty file.
\if{html}{\out{}}\preformatted{usethis::edit_r_environ()
}\if{html}{\out{
}}
In the opened file \code{.Renviron} add the next line, then save the file and restart your R session.
Please note that this example sets "multisession" as default. For most users
this should be the appropriate plan but please make sure it truly is.
\if{html}{\out{}}\preformatted{R_FUTURE_PLAN="multisession"
}\if{html}{\out{
}}
After the session is freshly restarted please check if the above method worked
by running the next line. If the output is \code{FALSE} you successfully set up a
default non-sequential \code{\link[future:plan]{future::plan()}}. If the output is \code{TRUE} all functions
will behave like they were called with \code{\link[purrr:map]{purrr::map()}} and NOT in multisession.
\if{html}{\out{}}\preformatted{inherits(future::plan(), "sequential")
}\if{html}{\out{
}}
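If the output is unexpectedly \code{TRUE}, it can help to check whether the
environment variable was picked up at all. This is a small sketch using base R
only; an empty string means the variable is not set in the current session:
\preformatted{Sys.getenv("R_FUTURE_PLAN")
}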
For more information on possible plans please see
\href{https://github.com/futureverse/future/blob/develop/README.md}{the future package Readme}.
For more information on \code{.Renviron} please see
\href{https://rstats.wtf/r-startup.html}{this book chapter}.
}
\subsection{Get Progress Updates while Functions are Running}{
Most nflfastR functions are able to show progress updates
using \code{\link[progressr:progressor]{progressr::progressor()}} if they are turned on before the function is
called. There are at least two basic ways to do this by either activating
progress updates globally (for the current session) with
\if{html}{\out{}}\preformatted{progressr::handlers(global = TRUE)
}\if{html}{\out{
}}
or by piping the function call into \code{\link[progressr:with_progress]{progressr::with_progress()}}:
\if{html}{\out{}}\preformatted{load_pbp(2018:2020) |>
progressr::with_progress()
}\if{html}{\out{
}}
Just like in the previous section, it is possible to activate global
progression handlers by default. This can be done by following the steps below.
First, run the following line and the file \code{.Rprofile} should be opened automatically.
If you haven't saved any code yet, this will be an empty file.
\if{html}{\out{}}\preformatted{usethis::edit_r_profile()
}\if{html}{\out{
}}
In the opened file \code{.Rprofile} add the next line, then save the file and restart your R
session. All code in this file will be executed when a new R session starts.
The part \verb{if (requireNamespace("progressr", quietly = TRUE))} makes sure this will only run if the
package progressr is installed to avoid crashing R sessions.
\if{html}{\out{}}\preformatted{if (requireNamespace("progressr", quietly = TRUE)) progressr::handlers(global = TRUE)
}\if{html}{\out{
}}
After the session is freshly restarted please check if the above method worked
by running the next line. If the output is \code{TRUE} you successfully activated
global progression handlers for all sessions.
\if{html}{\out{}}\preformatted{progressr::handlers(global = NA)
}\if{html}{\out{
}}
For more information on how to work with progress handlers please see \link[progressr:progressr]{progressr::progressr}.
For more information on \code{.Rprofile} please see
\href{https://rstats.wtf/r-startup.html}{this book chapter}.
}
}
\seealso{
Useful links:
\itemize{
\item \url{https://nflfastr.com/}
\item \url{https://github.com/nflverse/nflfastR}
\item Report bugs at \url{https://github.com/nflverse/nflfastR/issues}
}
}
\author{
\strong{Maintainer}: Ben Baldwin \email{bbaldwin206@gmail.com}
Authors:
\itemize{
\item Sebastian Carl \email{mrcaseb@gmail.com}
}
Other contributors:
\itemize{
\item Lee Sharpe [contributor]
\item Maksim Horowitz \email{maksim.horowitz@gmail.com} [contributor]
\item Ron Yurko \email{ryurko@stat.cmu.edu} [contributor]
\item Samuel Ventura \email{samventura22@gmail.com} [contributor]
\item Tan Ho [contributor]
\item John Edwards \email{edwards1860@gmail.com} [contributor]
}
}
================================================
FILE: man/reexports.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nflfastR-package.R
\docType{import}
\name{reexports}
\alias{reexports}
\alias{load_pbp}
\alias{load_player_stats}
\alias{load_team_stats}
\alias{load_schedules}
\alias{load_rosters}
\alias{nflverse_sitrep}
\alias{most_recent_season}
\title{Objects exported from other packages}
\keyword{internal}
\description{
These objects are imported from other packages. Follow the links
below to see their documentation.
\describe{
\item{nflreadr}{\code{\link[nflreadr]{load_pbp}}, \code{\link[nflreadr]{load_player_stats}}, \code{\link[nflreadr]{load_rosters}}, \code{\link[nflreadr]{load_schedules}}, \code{\link[nflreadr]{load_team_stats}}, \code{\link[nflreadr:latest_season]{most_recent_season}}, \code{\link[nflreadr:sitrep]{nflverse_sitrep}}}
}}
================================================
FILE: man/report.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/report.R
\name{report}
\alias{report}
\title{Get a Situation Report on System, nflverse Package Versions and Dependencies}
\usage{
report(...)
}
\arguments{
\item{...}{
Arguments passed on to \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}}
\describe{
\item{\code{pkg}}{a character vector naming installed packages, or \code{NULL}
(the default) meaning all nflverse packages. The function checks internally
if all packages are installed and informs if that is not the case.}
\item{\code{recursive}}{a logical indicating whether dependencies of \code{pkg} and their
dependencies (and so on) should be included.
Can also be a character vector listing the types of dependencies, a subset
of \code{c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances")}.
Character string \code{"all"} is shorthand for that vector, character string
\code{"most"} for the same vector without \code{"Enhances"}, character string \code{"strong"}
(default) for the first three elements of that vector.}
\item{\code{redact_path}}{a logical indicating whether options that contain "path"
in the name should be redacted, default = TRUE}
}}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
This function was deprecated. Please use \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}}.
This function gives a quick overview of the versions of R and
the operating system as well as the versions of nflverse packages, options,
and their dependencies. It's primarily designed to help you get a quick
idea of what's going on when you're helping someone else debug a problem.
}
\details{
See \code{\link[nflreadr:sitrep]{nflreadr::nflverse_sitrep}} for details.
}
\examples{
\donttest{
\dontshow{
# set CRAN mirror to avoid failing checks in weird scenarios
old_ops <- options(repos = c("CRAN" = "https://cran.rstudio.com/"))
}
# report(recursive = FALSE)
nflverse_sitrep(pkg = "nflreadr", recursive = TRUE)
\dontshow{
# restore old options
options(old_ops)
}
}
}
\keyword{internal}
================================================
FILE: man/save_raw_pbp.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/save_raw_pbp.R
\name{save_raw_pbp}
\alias{save_raw_pbp}
\title{Download Raw PBP Data to Local Filesystem}
\usage{
save_raw_pbp(
game_ids,
dir = getOption("nflfastR.raw_directory", default = NULL)
)
}
\arguments{
\item{game_ids}{A vector of nflverse game IDs.}
\item{dir}{Path to local directory (defaults to option "nflfastR.raw_directory").
nflfastR will download the raw game files split by season into one sub
directory per season.}
}
\value{
The function returns a data frame with one row for each downloaded file and
the following columns:
\itemize{
\item \code{success} if the HTTP request was successfully performed, regardless of the
response status code. This is \code{FALSE} in case of a network error, or in case
you tried to resume from a server that did not support this. A value of \code{NA}
means the download was interrupted while in progress.
\item \code{status_code} the HTTP status code from the request. A successful download is
usually \code{200} for full requests or \code{206} for resumed requests. Anything else
could indicate that the downloaded file contains an error page instead of the
requested content.
\item \code{resumefrom} the file size before the request, in case a download was resumed.
\item \code{url} final url (after redirects) of the request.
\item \code{destfile} downloaded file on disk.
\item \code{error} if \code{success == FALSE} this column contains an error message.
\item \code{type} the \code{Content-Type} response header value.
\item \code{modified} the \code{Last-Modified} response header value.
\item \code{time} total elapsed download time for this file in seconds.
\item \code{headers} vector with http response headers for the request.
}
}
\description{
The functions \code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}} and \code{\link[=fast_scraper]{fast_scraper()}} support loading
raw pbp data from local file systems instead of GitHub servers.
This function is intended to help set this up. It loads raw pbp data
and saves it in the given directory split by season in subdirectories.
}
\examples{
\donttest{
# CREATE LOCAL TEMP DIRECTORY
local_dir <- tempdir()
# LOAD AND SAVE A GAME TO TEMP DIRECTORY
save_raw_pbp("2021_20_BUF_KC", dir = local_dir)
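# Hedged sketch, not part of the original example: keep the returned data
# frame to check the download. `res` and `failed` are illustrative names.
try({
  res <- save_raw_pbp("2021_20_BUF_KC", dir = local_dir)
  # flag interrupted or failed requests and unexpected status codes
  failed <- is.na(res$success) | !res$success |
    !is.element(res$status_code, c(200, 206))
  res[failed, c("url", "status_code", "error")]
})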
# REMOVE THE DIRECTORY
unlink(file.path(local_dir, 2021), recursive = TRUE)
}
}
\seealso{
\code{\link[=build_nflfastR_pbp]{build_nflfastR_pbp()}}, \code{\link[=missing_raw_pbp]{missing_raw_pbp()}}
}
================================================
FILE: man/stat_ids.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{stat_ids}
\alias{stat_ids}
\title{NFL Stat IDs and their Meanings}
\format{
A data frame including NFL stat IDs, names and descriptions used in
an nflfastR dataset.
}
\source{
\url{http://www.nflgsis.com/gsis/Documentation/Partners/StatIDs.html}
}
\usage{
stat_ids
}
\description{
NFL Stat IDs and their Meanings
}
\examples{
\donttest{
stat_ids
}
}
\keyword{datasets}
================================================
FILE: man/teams_colors_logos.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_documentation.R
\docType{data}
\name{teams_colors_logos}
\alias{teams_colors_logos}
\title{NFL Team names, colors and logo urls.}
\format{
A data frame with 36 rows and 15 variables containing NFL team level
information, including franchises in multiple cities:
\describe{
\item{team_abbr}{Team abbreviation}
\item{team_name}{Complete Team name}
\item{team_id}{Team id used in the roster function}
\item{team_nick}{Nickname}
\item{team_conf}{Conference}
\item{team_division}{Division}
\item{team_color}{Primary color}
\item{team_color2}{Secondary color}
\item{team_color3}{Tertiary color}
\item{team_color4}{Quaternary color}
\item{team_logo_wikipedia}{Url to Team logo on wikipedia}
\item{team_logo_espn}{Url to higher quality logo on espn}
\item{team_wordmark}{Url to team wordmarks}
\item{team_conference_logo}{Url to AFC and NFC logos}
\item{team_league_logo}{Url to NFL logo}
}
The primary and secondary colors have been taken from nfl.com with some modifications
for better team distinction and most recent team color themes.
The tertiary and quaternary colors are taken from Lee Sharpe's teamcolors.csv,
which in turn takes them from the \code{teamcolors} package created by Ben Baumer and
Gregory Matthews. The Wikipedia logo urls are taken from Lee Sharpe's logos.csv.
Team wordmarks are taken from nfl.com.
}
\usage{
teams_colors_logos
}
\description{
NFL Team names, colors and logo urls.
}
\examples{
\donttest{
teams_colors_logos
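# Hedged sketch, not part of the original example: look up one team's
# colors by abbreviation, using the columns documented above.
# `kc` is an illustrative name.
kc <- teams_colors_logos[teams_colors_logos$team_abbr == "KC", ]
kc[, c("team_name", "team_color", "team_color2")]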
}
}
\keyword{datasets}
================================================
FILE: man/update_db.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helper_database_functions.R
\name{update_db}
\alias{update_db}
\title{Update or Create a nflfastR Play-by-Play Database}
\usage{
update_db(
dbdir = getOption("nflfastR.dbdirectory", default = "."),
dbname = "pbp_db",
tblname = "nflfastR_pbp",
force_rebuild = FALSE,
db_connection = NULL
)
}
\arguments{
\item{dbdir}{Directory in which the database is or shall be located. Can also
be set globally with \code{options(nflfastR.dbdirectory)}}
\item{dbname}{File name of an existing or desired SQLite database within \code{dbdir}}
\item{tblname}{The name of the play by play data table within the database}
\item{force_rebuild}{Hybrid parameter (logical or numeric) to rebuild parts
of or the complete play by play data table within the database (please see details for further information)}
\item{db_connection}{A \code{DBIConnection} object, as returned by
\code{\link[DBI:dbConnect]{DBI::dbConnect()}} (please see details for further information)}
}
\description{
\code{update_db} updates or creates a database with \code{nflfastR}
play by play data of all completed games since 1999.
}
\details{
This function creates and updates a data table with the name \code{tblname}
within a SQLite database (other drivers via \code{db_connection}) located in
\code{dbdir} and named \code{dbname}.
The data table combines all play by play data for every available game back
to the 1999 season and adds the most recent completed games as soon as they
are available for \code{nflfastR}.
The argument \code{force_rebuild} is of hybrid type. It can rebuild the play
by play data table either for the whole nflfastR era (with \code{force_rebuild = TRUE})
or just for specified seasons (e.g. \code{force_rebuild = c(2019, 2020)}).
Please note the following behavior:
\itemize{
\item \code{force_rebuild = TRUE}: The data table with the name \code{tblname}
will be removed completely and rebuilt from scratch. This is helpful when
new columns are added during the Off-Season.
\item \code{force_rebuild = c(2019, 2020)}: The data table with the name \code{tblname}
will be preserved and only rows from the 2019 and 2020 seasons will be
deleted and re-added. This is intended to be used for ongoing seasons because
the NFL fixes bugs in the underlying data during the week and we recommend
rebuilding the current season every Thursday during the season.
}
The parameter \code{db_connection} is intended for advanced users who want
to use other DBI drivers, such as MariaDB, Postgres or odbc. Please note that
the arguments \code{dbdir} and \code{dbname} are dropped in case a \code{db_connection}
is provided but the argument \code{tblname} will still be used to write the
data table into the database.
}
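\examples{
\dontrun{
# Hedged sketch, not from the original documentation. It needs an internet
# connection plus the DBI and RSQLite packages, and the first call loads
# every season since 1999. `db_dir` and `con` are illustrative names.
db_dir <- tempdir()
# create the database with the default file and table names
update_db(dbdir = db_dir)
# later: delete and re-add only the 2019 and 2020 seasons
update_db(dbdir = db_dir, force_rebuild = c(2019, 2020))
# inspect the result with DBI
con <- DBI::dbConnect(RSQLite::SQLite(), file.path(db_dir, "pbp_db"))
DBI::dbListTables(con)
DBI::dbDisconnect(con)
}
}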
================================================
FILE: man/update_pbp_db.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/database.R
\name{update_pbp_db}
\alias{update_pbp_db}
\title{Update or Create a nflverse Play-by-Play Data Table in a Connected Database}
\usage{
update_pbp_db(conn, ..., name = "nflverse_pbp", seasons = most_recent_season())
}
\arguments{
\item{conn}{A \code{DBIConnection} object, as returned by \code{\link[DBI:dbConnect]{DBI::dbConnect()}}}
\item{...}{These dots are for future extensions and must be empty.}
\item{name}{The table name, passed on to \code{\link[DBI:dbQuoteIdentifier]{dbQuoteIdentifier()}}. Options are:
\itemize{
\item a character string with the unquoted DBMS table name,
e.g. \code{"table_name"},
\item a call to \code{\link[DBI:Id]{Id()}} with components to the fully qualified table name,
e.g. \code{Id(schema = "my_schema", table = "table_name")}
\item a call to \code{\link[DBI:SQL]{SQL()}} with the quoted and fully qualified table name
given verbatim, e.g. \code{SQL('"my_schema"."table_name"')}
}}
\item{seasons}{Hybrid argument (logical or numeric) to update parts
of or the complete play by play table within the database.
It can update the play by play data table either for the whole nflfastR era
(with \code{seasons = TRUE}) or just for specified seasons
(e.g. \code{seasons = 2024:2025}).
Defaults to \link{most_recent_season}. Please see details for further information.}
}
\value{
Always returns the database connection invisibly.
}
\description{
The nflfastR play-by-play era dates back to 1999. To analyze all the data
efficiently, there is practically no alternative to working with a database.
This function helps to create and maintain a table containing all
play-by-play data of the nflfastR era in a connected database.
Primarily, the preprocessed data from \link{load_pbp} is written to the database
and, if necessary, supplemented with the latest games using
\link{build_nflfastR_pbp}.
}
\details{
\subsection{The \code{seasons} argument}{
The \code{seasons} argument controls how the table in the connected database is
handled.
With \code{seasons = TRUE}, the table in argument \code{name} will be removed completely
(by calling \link[DBI:dbRemoveTable]{DBI::dbRemoveTable}) and all seasons of the nflfastR era will be
added to a fresh table. This is helpful when new columns are added during the
offseason.
With a numerical vector, e.g. \code{seasons = 2024:2025}, the table in argument
\code{name} will be preserved and only rows from the given seasons will be deleted
and re-added (by calling \link[DBI:dbAppendTable]{DBI::dbAppendTable}). This is intended to be used
for ongoing seasons because the NFL fixes bugs in the underlying data during
the week and we recommend rebuilding the current season every Thursday during
the season.
The default behavior is \code{seasons = most_recent_season()}, which means that
only the most recent season is updated or added.
To keep the table, and thus also the schema, but update all play-by-play
data of the nflfastR era, set
\if{html}{\out{}}\preformatted{seasons = seq(1999, most_recent_season())
}\if{html}{\out{
}}
If \code{seasons} contains multiple seasons, it is possible to control whether the
seasons are loaded individually and written to the database, or whether
multiple seasons should be processed in chunks. The latter is more efficient
because fewer write operations are required, but at the same time, the data
must first be stored in memory. The option \verb{"nflfastR.db_chunk_size"} can
be used to control how many seasons are loaded together in a chunk and
written to the database. With the following option, for example, 5 seasons
are always loaded together and written to the database.
\if{html}{\out{}}\preformatted{options("nflfastR.db_chunk_size" = 5L)
}\if{html}{\out{
}}
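To get a feeling for the effect of the chunk size, the following base R
sketch groups seasons the way a chunk size of 5 would. It only illustrates
the idea and is not the package's internal code; \code{seasons}, \code{chunk_size}
and \code{chunks} are illustrative names:
\preformatted{seasons <- 1999:2010
chunk_size <- 5L
# assign each season to a chunk of at most `chunk_size` seasons
chunks <- split(seasons, ceiling(seq_along(seasons) / chunk_size))
# 12 seasons end up in chunks of 5, 5 and 2 seasons
lengths(chunks)
}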
}
}
\examples{
\donttest{
con <- DBI::dbConnect(duckdb::duckdb())
try({# to avoid CRAN test problems
update_pbp_db(con, seasons = 2024)
})
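# Hedged addition, not part of the original example: count the stored plays
# per season and close the connection. Assumes the default table name
# "nflverse_pbp" and that the update above succeeded.
try({
  DBI::dbGetQuery(
    con,
    "SELECT season, COUNT(*) AS n_plays FROM nflverse_pbp GROUP BY season"
  )
})
DBI::dbDisconnect(con)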
}
}
================================================
FILE: nflfastR.Rproj
================================================
Version: 1.0
ProjectId: e1e14382-386c-49b3-9b3f-206a4cc98503
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX
AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
UseNativePipeOperator: Yes
================================================
FILE: pkgdown/_pkgdown.yml
================================================
url: https://nflfastr.com/
template:
bootstrap: 5
light-switch: true
bslib:
font_scale: 1.1
base_font: {google: "Roboto"}
heading_font: {google: "Kanit"}
code_font: {google: "Fira Code"}
opengraph:
image:
src: man/figures/card.png
alt: "nflfastR social preview card"
twitter:
site: "@nflfastR"
card: summary_large_image
toc:
depth: 3
authors:
Sebastian Carl:
href: https://mrcaseb.com
Ben Baldwin:
href: https://bsky.app/profile/rbsdm.com
Lee Sharpe:
href: https://twitter.com/LeeSharpeNFL
Maksim Horowitz:
href: https://twitter.com/bklynmaks
Ron Yurko:
href: https://twitter.com/Stat_Ron
Samuel Ventura:
href: https://twitter.com/stat_sam
Tan Ho:
href: https://tanho.ca
John Edwards:
href: https://johnbedwards.io
home:
title: An R package to quickly obtain clean and tidy NFL play by play data
links:
- text: nflverse Discord Chat
href: https://discord.gg/5Er2FBnnQa
- text: nflfastR Beginner's Guide
href: articles/beginners_guide.html
- text: nflfastR stats landing page
href: https://rbsdm.com/stats/
- text: Lee Sharpe's nfl game data
href: https://nflgamedata.com
navbar:
bg: dark
type: light
structure:
left: [home, intro, reference, news, articles]
right: [search, lightswitch, stats, games, discord, github, more]
components:
games:
icon: "fas fa-football-ball fa-lg"
href: http://nflgamedata.com/
aria-label: Games
stats:
icon: "fas fa-chart-line fa-lg"
href: https://rbsdm.com/stats/
aria-label: Stats
reference:
text: "Functions"
href: reference/index.html
discord:
icon: "fab fa-discord fa-lg"
href: https://discord.com/invite/5Er2FBnnQa
aria-label: Discord
articles:
text: "Articles"
menu:
- text: A beginner’s guide to nflfastR
href: articles/beginners_guide.html
- text: Field Descriptions
href: articles/field_descriptions.html
- text: Stats Variable Descriptions
href: articles/stats_variables.html
- text: nflfastR models
href: https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/
- text: Open Source Football
href: https://www.opensourcefootball.com/
more:
text: "Packages & More"
menu:
- text: "nflverse Packages"
- text: nflfastR
href: https://nflfastr.com
- text: nflseedR
href: https://nflseedr.com
- text: nfl4th
href: https://www.nfl4th.com
- text: nflreadr
href: https://nflreadr.nflverse.com/
- text: nflplotR
href: https://nflplotr.nflverse.com/
- text: nflverse
href: https://nflverse.nflverse.com/
- text: "Open Source Football"
href: https://www.opensourcefootball.com
- text: "nflverse Data"
- text: nflverse GitHub
href: https://github.com/nflverse
- text: ffverse
- text: "ffverse.com"
href: https://www.ffverse.com
reference:
- title: Main Functions
contents:
- build_nflfastR_pbp
- update_db
- update_pbp_db
- title: Load Functions
desc: >
These functions access precomputed data using the nflreadr package.
See the nflreadr documentation for info and more data load functions.
contents:
- reexports
- title: Utility Functions
contents:
- save_raw_pbp
- missing_raw_pbp
- starts_with("calculate_")
- title: Documentation
contents:
- nflfastR-package
- teams_colors_logos
- field_descriptions
- stat_ids
- nfl_stats_variables
- title: Lower Level Functions
desc: >
These functions are wrapped in the above listed main functions and
typically not used by the end user.
contents:
- fast_scraper
- add_qb_epa
- add_xpass
- add_xyac
- clean_pbp
- decode_player_ids
- title: Deprecated
desc: 'These functions are no longer recommended for use, see nflreadr for latest versions.'
contents:
- fast_scraper_roster
- fast_scraper_schedules
- report
================================================
FILE: pkgdown/extra.css
================================================
/*
Check: https://www.w3schools.com/css/css_rwd_mediaqueries.asp
for Responsive Web Design - Media Queries
*/
.row > main {
max-width: 100%;
}
@media only screen and (min-width: 640px) {
main + .col-md-3 {
margin-left: unset;
padding-left: 5rem;
max-width: 75%;
}
}
h4.author,h4.date {
padding-top:0px;
margin-top:0px;
}
.navbar-brand {
font-weight: 300;
font-size: 1.5rem;
font-family: 'Kanit', sans-serif;
}
.me-auto {
color: #009E8D !important;
}
/*
from gt custom css
draws lines between function names on reference page
*/
dt {
text-decoration: underline;
text-decoration-style: solid;
text-underline-offset: 4px;
font-family: monospace;
border-top-style: dotted;
border-top-width: 1px;
border-top-color: gray;
margin-bottom: 5px;
padding-top: 5px;
}
.active .nav-link {
color: #F85714 !important;
}
================================================
FILE: tests/testthat/_snaps/build_nflfastR_pbp.md
================================================
# default_play is synced with build_nflfastR_pbp
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["play_id", "game_id", "old_game_id", "home_team", "away_team", "season_type", "week", "posteam", "posteam_type", "defteam", "side_of_field", "yardline_100", "game_date", "quarter_seconds_remaining", "half_seconds_remaining", "game_seconds_remaining", "game_half", "quarter_end", "drive", "sp", "qtr", "down", "goal_to_go", "time", "yrdln", "ydstogo", "ydsnet", "desc", "play_type", "yards_gained", "shotgun", "no_huddle", "qb_dropback", "qb_kneel", "qb_spike", "qb_scramble", "pass_length", "pass_location", "air_yards", "yards_after_catch", "run_location", "run_gap", "field_goal_result", "kick_distance", "extra_point_result", "two_point_conv_result", "home_timeouts_remaining", "away_timeouts_remaining", "timeout", "timeout_team", "td_team", "td_player_name", "td_player_id", "posteam_timeouts_remaining", "defteam_timeouts_remaining", "total_home_score", "total_away_score", "posteam_score", "defteam_score", "score_differential", "posteam_score_post", "defteam_score_post", "score_differential_post", "no_score_prob", "opp_fg_prob", "opp_safety_prob", "opp_td_prob", "fg_prob", "safety_prob", "td_prob", "extra_point_prob", "two_point_conversion_prob", "ep", "epa", "total_home_epa", "total_away_epa", "total_home_rush_epa", "total_away_rush_epa", "total_home_pass_epa", "total_away_pass_epa", "air_epa", "yac_epa", "comp_air_epa", "comp_yac_epa", "total_home_comp_air_epa", "total_away_comp_air_epa", "total_home_comp_yac_epa", "total_away_comp_yac_epa", "total_home_raw_air_epa", "total_away_raw_air_epa", "total_home_raw_yac_epa", "total_away_raw_yac_epa", "wp", "def_wp", "home_wp", "away_wp", "wpa", "vegas_wpa", "vegas_home_wpa", "home_wp_post", "away_wp_post", "vegas_wp", "vegas_home_wp", "total_home_rush_wpa", "total_away_rush_wpa", "total_home_pass_wpa", "total_away_pass_wpa", "air_wpa", "yac_wpa", "comp_air_wpa", "comp_yac_wpa", "total_home_comp_air_wpa", "total_away_comp_air_wpa", "total_home_comp_yac_wpa", "total_away_comp_yac_wpa", "total_home_raw_air_wpa", 
"total_away_raw_air_wpa", "total_home_raw_yac_wpa", "total_away_raw_yac_wpa", "punt_blocked", "first_down_rush", "first_down_pass", "first_down_penalty", "third_down_converted", "third_down_failed", "fourth_down_converted", "fourth_down_failed", "incomplete_pass", "touchback", "interception", "punt_inside_twenty", "punt_in_endzone", "punt_out_of_bounds", "punt_downed", "punt_fair_catch", "kickoff_inside_twenty", "kickoff_in_endzone", "kickoff_out_of_bounds", "kickoff_downed", "kickoff_fair_catch", "fumble_forced", "fumble_not_forced", "fumble_out_of_bounds", "solo_tackle", "safety", "penalty", "tackled_for_loss", "fumble_lost", "own_kickoff_recovery", "own_kickoff_recovery_td", "qb_hit", "rush_attempt", "pass_attempt", "sack", "touchdown", "pass_touchdown", "rush_touchdown", "return_touchdown", "extra_point_attempt", "two_point_attempt", "field_goal_attempt", "kickoff_attempt", "punt_attempt", "fumble", "complete_pass", "assist_tackle", "lateral_reception", "lateral_rush", "lateral_return", "lateral_recovery", "passer_player_id", "passer_player_name", "passing_yards", "receiver_player_id", "receiver_player_name", "receiving_yards", "rusher_player_id", "rusher_player_name", "rushing_yards", "lateral_receiver_player_id", "lateral_receiver_player_name", "lateral_receiving_yards", "lateral_rusher_player_id", "lateral_rusher_player_name", "lateral_rushing_yards", "lateral_sack_player_id", "lateral_sack_player_name", "interception_player_id", "interception_player_name", "lateral_interception_player_id", "lateral_interception_player_name", "punt_returner_player_id", "punt_returner_player_name", "lateral_punt_returner_player_id", "lateral_punt_returner_player_name", "kickoff_returner_player_name", "kickoff_returner_player_id", "lateral_kickoff_returner_player_id", "lateral_kickoff_returner_player_name", "punter_player_id", "punter_player_name", "kicker_player_name", "kicker_player_id", "own_kickoff_recovery_player_id", "own_kickoff_recovery_player_name", 
"blocked_player_id", "blocked_player_name", "tackle_for_loss_1_player_id", "tackle_for_loss_1_player_name", "tackle_for_loss_2_player_id", "tackle_for_loss_2_player_name", "qb_hit_1_player_id", "qb_hit_1_player_name", "qb_hit_2_player_id", "qb_hit_2_player_name", "forced_fumble_player_1_team", "forced_fumble_player_1_player_id", "forced_fumble_player_1_player_name", "forced_fumble_player_2_team", "forced_fumble_player_2_player_id", "forced_fumble_player_2_player_name", "solo_tackle_1_team", "solo_tackle_2_team", "solo_tackle_1_player_id", "solo_tackle_2_player_id", "solo_tackle_1_player_name", "solo_tackle_2_player_name", "assist_tackle_1_player_id", "assist_tackle_1_player_name", "assist_tackle_1_team", "assist_tackle_2_player_id", "assist_tackle_2_player_name", "assist_tackle_2_team", "assist_tackle_3_player_id", "assist_tackle_3_player_name", "assist_tackle_3_team", "assist_tackle_4_player_id", "assist_tackle_4_player_name", "assist_tackle_4_team", "tackle_with_assist", "tackle_with_assist_1_player_id", "tackle_with_assist_1_player_name", "tackle_with_assist_1_team", "tackle_with_assist_2_player_id", "tackle_with_assist_2_player_name", "tackle_with_assist_2_team", "pass_defense_1_player_id", "pass_defense_1_player_name", "pass_defense_2_player_id", "pass_defense_2_player_name", "fumbled_1_team", "fumbled_1_player_id", "fumbled_1_player_name", "fumbled_2_player_id", "fumbled_2_player_name", "fumbled_2_team", "fumble_recovery_1_team", "fumble_recovery_1_yards", "fumble_recovery_1_player_id", "fumble_recovery_1_player_name", "fumble_recovery_2_team", "fumble_recovery_2_yards", "fumble_recovery_2_player_id", "fumble_recovery_2_player_name", "sack_player_id", "sack_player_name", "half_sack_1_player_id", "half_sack_1_player_name", "half_sack_2_player_id", "half_sack_2_player_name", "return_team", "return_yards", "penalty_team", "penalty_player_id", "penalty_player_name", "penalty_yards", "replay_or_challenge", "replay_or_challenge_result", "penalty_type", 
"defensive_two_point_attempt", "defensive_two_point_conv", "defensive_extra_point_attempt", "defensive_extra_point_conv", "safety_player_name", "safety_player_id", "season", "cp", "cpoe", "series", "series_success", "series_result", "order_sequence", "start_time", "time_of_day", "stadium", "weather", "nfl_api_id", "play_clock", "play_deleted", "play_type_nfl", "special_teams_play", "st_play_type", "end_clock_time", "end_yard_line", "fixed_drive", "fixed_drive_result", "drive_real_start_time", "drive_play_count", "drive_time_of_possession", "drive_first_downs", "drive_inside20", "drive_ended_with_score", "drive_quarter_start", "drive_quarter_end", "drive_yards_penalized", "drive_start_transition", "drive_end_transition", "drive_game_clock_start", "drive_game_clock_end", "drive_start_yard_line", "drive_end_yard_line", "drive_play_id_started", "drive_play_id_ended", "away_score", "home_score", "location", "result", "total", "spread_line", "total_line", "div_game", "roof", "surface", "temp", "wind", "home_coach", "away_coach", "stadium_id", "game_stadium", "aborted_play", "success", "passer", "passer_jersey_number", "rusher", "rusher_jersey_number", "receiver", "receiver_jersey_number", "pass", "rush", "first_down", "special", "play", "passer_id", "rusher_id", "receiver_id", "name", "jersey_number", "id", "fantasy_player_name", "fantasy_player_id", "fantasy", "fantasy_id", "out_of_bounds", "home_opening_kickoff", "qb_epa", "xyac_epa", "xyac_mean_yardage", "xyac_median_yardage", "xyac_success", "xyac_fd", "xpass", "pass_oe"]
}
},
"value": ["numeric", "character", "character", "character", "character", "character", "integer", "character", "character", "character", "character", "numeric", "character", "numeric", "numeric", "numeric", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "numeric", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "numeric", "character", "character", "character", "numeric", "character", "character", "numeric", "numeric", "numeric", "character", "character", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "numeric", "character", "character", 
"numeric", "character", "character", "numeric", "character", "character", "numeric", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "numeric", "character", "character", "character", "character", "character", "character", "character", "character", "character", "numeric", "character", "character", "character", "numeric", "numeric", "character", "character", "numeric", "numeric", "numeric", "numeric", "character", "character", "integer", "numeric", "numeric", "numeric", "numeric", "character", "numeric", "character", "character", "character", "character", "character", "character", "numeric", "character", "numeric", "character", "character", "character", "numeric", "character", "character", "numeric", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "character", "character", "character", "character", "numeric", "numeric", "integer", "integer", "character", "integer", "integer", "numeric", "numeric", "integer", "character", "character", "integer", "integer", 
"character", "character", "character", "character", "numeric", "numeric", "character", "integer", "character", "integer", "character", "integer", "numeric", "numeric", "numeric", "numeric", "numeric", "character", "character", "character", "character", "integer", "character", "character", "character", "character", "character", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"]
}
================================================
FILE: tests/testthat/_snaps/stats/calculate_stats.md
================================================
# calculate_stats works
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "season_type", "recent_team", "games", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", "fg_missed_40_49", 
"fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance_list", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
}
},
"value": ["character", "character", "character", "character", "character", "character", "integer", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
}
---
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "week", "season_type", "game_id", "team", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", 
"fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
}
},
"value": ["character", "character", "character", "character", "character", "character", "integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
}
---
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["season", "team", "season_type", "games", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "timeouts", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", 
"pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance_list", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards"]
}
},
"value": ["integer", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer"]
}
---
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["season", "week", "team", "season_type", "game_id", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "timeouts", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", "fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", 
"fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards"]
}
},
"value": ["integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer"]
}
---
{
"type": "character",
"attributes": {
"names": {
"type": "character",
"attributes": {},
"value": ["player_id", "player_name", "player_display_name", "position", "position_group", "headshot_url", "season", "week", "season_type", "game_id", "team", "opponent_team", "completions", "attempts", "passing_yards", "passing_tds", "passing_interceptions", "sacks_suffered", "sack_yards_lost", "sack_fumbles", "sack_fumbles_lost", "passing_air_yards", "passing_yards_after_catch", "passing_first_downs", "passing_epa", "passing_cpoe", "passing_2pt_conversions", "pacr", "passing_10", "passing_16", "passing_20", "passing_40", "carries", "rushing_yards", "rushing_tds", "rushing_fumbles", "rushing_fumbles_lost", "rushing_first_downs", "rushing_epa", "rushing_2pt_conversions", "rushing_10", "rushing_12", "rushing_20", "rushing_40", "receptions", "targets", "receiving_yards", "receiving_tds", "receiving_fumbles", "receiving_fumbles_lost", "receiving_air_yards", "receiving_yards_after_catch", "receiving_first_downs", "receiving_epa", "receiving_2pt_conversions", "receiving_10", "receiving_16", "receiving_20", "receiving_40", "racr", "target_share", "air_yards_share", "wopr", "special_teams_tds", "def_tackles_solo", "def_tackles_with_assist", "def_tackle_assists", "def_tackles_for_loss", "def_tackles_for_loss_yards", "def_fumbles_forced", "def_sacks", "def_sack_yards", "def_qb_hits", "def_interceptions", "def_interception_yards", "def_pass_defended", "def_tds", "def_fumbles", "def_safeties", "misc_yards", "fumble_recovery_own", "fumble_recovery_yards_own", "fumble_recovery_opp", "fumble_recovery_yards_opp", "fumble_recovery_tds", "penalties", "penalty_yards", "fumbles_forced_by_opp", "fumbles_not_forced", "fumbles_out_of_bounds", "fumbles_total", "fumbles_lost_total", "punt_returns", "punt_return_yards", "kickoff_returns", "kickoff_return_yards", "fg_made", "fg_att", "fg_missed", "fg_blocked", "fg_long", "fg_pct", "fg_made_0_19", "fg_made_20_29", "fg_made_30_39", "fg_made_40_49", "fg_made_50_59", "fg_made_60_", "fg_missed_0_19", "fg_missed_20_29", "fg_missed_30_39", 
"fg_missed_40_49", "fg_missed_50_59", "fg_missed_60_", "fg_made_list", "fg_missed_list", "fg_blocked_list", "fg_made_distance", "fg_missed_distance", "fg_blocked_distance", "pat_made", "pat_att", "pat_missed", "pat_blocked", "pat_pct", "gwfg_made", "gwfg_att", "gwfg_missed", "gwfg_blocked", "gwfg_distance", "pt_att", "pt_blocked", "pt_long", "pt_yards", "pt_inside_20", "pt_out_of_bounds", "pt_downed", "pt_touchback", "pt_fair_caught", "pt_returned", "pt_return_yards", "pt_return_tds", "pt_net_yards", "fantasy_points", "fantasy_points_ppr"]
}
},
"value": ["character", "character", "character", "character", "character", "character", "integer", "integer", "character", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "character", "character", "character", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "integer", "numeric", "numeric"]
}
================================================
FILE: tests/testthat/helpers.R
================================================
# sample games we'll use to check with
game_ids <- c("2025_01_KC_LAC", "2019_01_GB_CHI")
test_dir <- getwd()
pbp_cache <- tempfile("pbp_cache", fileext = ".rds")
load_test_pbp <- function(pbp = pbp_cache, dir = test_dir) {
if (file.exists(pbp) && !is.null(dir)) {
if (interactive()) {
cli::cli_alert_info("Will return pbp from cache")
}
return(readRDS(pbp))
}
g <- readRDS(file.path(test_dir, "games.rds"))
# model output differs across machines, so the tests round the relevant columns
# afterwards with round_double_to_digits() to prevent failing tests
pbp_data <- build_nflfastR_pbp(game_ids, dir = dir, games = g)
if (!is.null(dir)) {
saveRDS(pbp_data, pbp)
}
pbp_data
}
save_test_object <- function(object) {
obj_name <- deparse(substitute(object))
tmp_file <- tempfile(obj_name, fileext = ".csv")
modify_digits <- dplyr::mutate_if(object, is.numeric, signif, digits = 3)
data.table::fwrite(modify_digits, tmp_file, na = "NA")
invisible(tmp_file)
}
load_expectation <- function(
type = c("pbp", "sc", "sc_weekly", "ep", "wp"),
dir = test_dir
) {
type <- match.arg(type)
file_name <- switch(
type,
"pbp" = "expected_pbp.rds",
"sc" = "expected_sc.rds",
"sc_weekly" = "expected_sc_weekly.rds",
"ep" = "expected_ep.rds",
"wp" = "expected_wp.rds"
)
strip_nflverse_attributes(readRDS(file.path(dir, file_name))) |>
# we gotta round floating point numbers because of different model output
# across platforms
round_double_to_digits()
}
# strip nflverse attributes for tests because timestamp and version cause failures
# .internal.selfref is a data.table attribute that is not necessary in this case
strip_nflverse_attributes <- function(df) {
input_attrs <- names(attributes(df))
# escape the dots so they match literally instead of matching any character
input_remove <- input_attrs[grepl(
"nflverse|\\.internal\\.selfref|nflfastR",
input_attrs
)]
attributes(df)[input_remove] <- NULL
df
}
round_double_to_digits <- function(df, digits = 3) {
dplyr::mutate(
df,
dplyr::across(
.cols = relevant_variables(),
.fns = function(vec) {
formatC(vec, digits = digits, format = "fg") |>
as.numeric() |>
suppressWarnings()
}
)
)
}
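# Illustrative sketch only (not part of the original helpers): the same
# formatC() round trip that round_double_to_digits() applies, shown on a plain
# numeric vector. `sketch_round_fg` is a hypothetical name used for this example.
sketch_round_fg <- function(vec, digits = 3) {
  # with format = "fg", `digits` counts significant digits, not decimal places
  as.numeric(formatC(vec, digits = digits, format = "fg"))
}
# Two values that differ only beyond the third significant digit compare equal
# after rounding, which is what keeps small cross-platform differences in model
# output from failing expect_equal():
# identical(sketch_round_fg(0.4567891), sketch_round_fg(0.4567123))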
relevant_variables <- function() {
c(
dplyr::any_of(c(
"no_score_prob",
"opp_fg_prob",
"opp_safety_prob",
"opp_td_prob",
"fg_prob",
"safety_prob",
"td_prob",
"ep",
"cp",
"cpoe",
"pass_oe",
"xpass"
)),
dplyr::ends_with("epa"),
dplyr::ends_with("wp"),
dplyr::ends_with("wp_post"),
dplyr::ends_with("wpa"),
dplyr::starts_with("xyac")
)
}
================================================
FILE: tests/testthat/test-build_nflfastR_pbp.R
================================================
test_that("build_nflfastR_pbp works (local data)", {
# This test used to run on CRAN, but CRAN's changes to env vars, which cause
# check NOTES for multi-threading, forced us to skip it on CRAN. It uses locally
# available data, so it can't break because of failed downloads.
# UPDATE Feb 2026: we'll try testing on CRAN again
# skip_on_cran()
pbp <- load_test_pbp(dir = test_dir)
expect_s3_class(pbp, "nflverse_data")
pbp <- strip_nflverse_attributes(pbp) |>
# we gotta round floating point numbers because of different model output
# across platforms
round_double_to_digits()
exp <- load_expectation("pbp")
expect_equal(pbp, exp)
})
test_that("build_nflfastR_pbp works (outside CRAN)", {
# This test is almost the same as the one above. However, it requires a data
# download and will therefore run everywhere except on CRAN.
skip_on_cran()
skip_if_offline("github.com")
pbp <- load_test_pbp(dir = NULL)
pbp <- strip_nflverse_attributes(pbp) |>
# we gotta round floating point numbers because of different model output
# across platforms
round_double_to_digits()
exp <- load_expectation("pbp")
expect_equal(pbp, exp)
})
test_that("default_play is synced with build_nflfastR_pbp", {
# `default_play` is a table of 1 row that is supposed to match the
# output structure of build_nflfastR_pbp. It is used to initialize the
# data table in pbp DBs.
# This test makes sure that it is synced with build_nflfastR_pbp
exp <- load_expectation("pbp")
names_and_types_exp <- vapply(exp, class, FUN.VALUE = character(1L))
names_and_types_def <- vapply(default_play, class, FUN.VALUE = character(1L))
expect_identical(names_and_types_def, names_and_types_exp)
expect_snapshot_value(names_and_types_def, style = "json2")
})
================================================
FILE: tests/testthat/test-calculate_series_conversion_rates.R
================================================
test_that("calculate_series_conversion_rates works", {
# This test used to run on CRAN, but CRAN's changes to env vars, which cause
# check NOTES for multi-threading, forced us to skip it on CRAN.
skip_on_cran()
pbp <- load_test_pbp()
sc <- calculate_series_conversion_rates(pbp = pbp, weekly = FALSE) |>
round_double_to_digits()
sc_weekly <- calculate_series_conversion_rates(pbp = pbp, weekly = TRUE) |>
round_double_to_digits()
exp_sc <- load_expectation("sc")
exp_sc_weekly <- load_expectation("sc_weekly")
expect_s3_class(sc, "tbl_df")
expect_s3_class(sc_weekly, "tbl_df")
expect_equal(sc, exp_sc)
expect_equal(sc_weekly, exp_sc_weekly)
})
================================================
FILE: tests/testthat/test-calculate_stats.R
================================================
test_that("calculate_stats works", {
skip_on_cran()
skip_if_offline("github.com")
s1 <- calculate_stats(
seasons = 2023,
summary_level = "season",
stat_type = "player"
)
s2 <- calculate_stats(
seasons = 2023,
summary_level = "week",
stat_type = "player"
)
s3 <- calculate_stats(
seasons = 2023,
summary_level = "season",
stat_type = "team"
)
s4 <- calculate_stats(
seasons = 2023,
summary_level = "week",
stat_type = "team"
)
s5 <- calculate_stats(
seasons = 2023,
summary_level = "week",
stat_type = "player",
season_type = "POST"
)
names_and_types_s1 <- vapply(s1, class, FUN.VALUE = character(1L))
names_and_types_s2 <- vapply(s2, class, FUN.VALUE = character(1L))
names_and_types_s3 <- vapply(s3, class, FUN.VALUE = character(1L))
names_and_types_s4 <- vapply(s4, class, FUN.VALUE = character(1L))
names_and_types_s5 <- vapply(s5, class, FUN.VALUE = character(1L))
var_names <- nflfastR::nfl_stats_variables$variable
# Make sure variable names are listed in nflfastR::nfl_stats_variables$variable
expect_in(names(names_and_types_s1), var_names)
expect_in(names(names_and_types_s2), var_names)
expect_in(names(names_and_types_s3), var_names)
expect_in(names(names_and_types_s4), var_names)
expect_in(names(names_and_types_s5), var_names)
# Weak row number test
expect_gt(nrow(s1), 1900)
expect_gt(nrow(s2), 17500)
expect_identical(nrow(s3), 32L)
expect_gt(nrow(s4), 500)
expect_gt(nrow(s5), 800)
# Snapshot variable types and names
expect_snapshot_value(names_and_types_s1, style = "json2", variant = "stats")
expect_snapshot_value(names_and_types_s2, style = "json2", variant = "stats")
expect_snapshot_value(names_and_types_s3, style = "json2", variant = "stats")
expect_snapshot_value(names_and_types_s4, style = "json2", variant = "stats")
expect_snapshot_value(names_and_types_s5, style = "json2", variant = "stats")
})
test_that("calculate_stats works with pbp subsets", {
skip_on_cran()
skip_if_offline("github.com")
pbp <- load_pbp(2024) |>
dplyr::filter(week <= 2, grepl("LAC", game_id))
s <- calculate_stats(summary_level = "week", stat_type = "player", pbp = pbp)
# Weak row number test
expect_lt(nrow(s), 130)
# week is filtered to <= 2 so stats should return only those weeks
expect_in(unique(s$week), 1:2)
# drop some required columns
pbp_wrong <- pbp |> dplyr::mutate(qb_epa = NULL, play_type = NULL)
expect_error(
calculate_stats(pbp = pbp_wrong),
regexp = 'missing the following required variables: "play_type" and "qb_epa"'
)
})
================================================
FILE: tests/testthat/test-ep_wp_calculators.R
================================================
test_that("calculate_expected_points works", {
# This test used to run on CRAN but their changes to env vars which cause
# check NOTES for multi-threading forced us to skip on cran.
skip_on_cran()
data <- tibble::tibble(
"season" = 2018:2019,
"home_team" = "SEA",
"posteam" = "SEA",
"roof" = "outdoors",
"half_seconds_remaining" = 1800,
"yardline_100" = 75,
"down" = 1,
"ydstogo" = 10,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
ep <- calculate_expected_points(data) |> round_double_to_digits()
exp <- load_expectation("ep")
expect_equal(ep, exp)
})
test_that("calculate_win_probability works", {
# This test used to run on CRAN but their changes to env vars which cause
# check NOTES for multi-threading forced us to skip on cran.
skip_on_cran()
data <- tibble::tibble(
"receive_2h_ko" = 0,
"home_team" = "SEA",
"posteam" = "SEA",
"score_differential" = 0,
"half_seconds_remaining" = 1800,
"game_seconds_remaining" = 3600,
"spread_line" = c(1, 3, 4, 7, 14),
"down" = 1,
"ydstogo" = 10,
"yardline_100" = 75,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
wp <- calculate_win_probability(data) |> round_double_to_digits()
exp <- load_expectation("wp")
expect_equal(wp, exp)
})
================================================
FILE: tests/testthat.R
================================================
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/tests.html
# * https://testthat.r-lib.org/reference/test_package.html#special-files
library(testthat)
library(nflfastR)
test_check("nflfastR")
================================================
FILE: tools/check.env
================================================
# Check for usage of more than two cores. We really need to do this
# because CRAN kept rejecting nflfastR
# It is not supported on Windows and keeps failing on Debian, so it's
# probably necessary to make sure it doesn't fail on Debian
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_="2.5"
_R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_="2.5"
================================================
FILE: vignettes/.gitignore
================================================
*.html
*.R
pbp_db
================================================
FILE: vignettes/beginners_guide.Rmd
================================================
---
title: "A beginner's guide to nflfastR"
author: "Ben Baldwin"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
out.width = "100%"
)
```
## Introduction
The following guide will assume you have R installed. I also highly recommend working in RStudio. If you need help getting those installed or are unfamiliar with how RStudio is laid out, [please see this section of Lee Sharpe's guide](https://github.com/leesharpe/nfldata/blob/master/RSTUDIO-INTRO.md#r-and-rstudio-introduction).
A quick word if you're new to programming: all of this is happening in R. Obviously, you need to install R on your computer to do any of this. Make sure you save what you're doing in a script (in RStudio, File --> New File --> R script) so you can save your work and run multiple lines of code at once. To run code from a script, highlight what you want, and press control + enter or press the Run button in the top of the editor (see Lee's guide). If you don't highlight anything and press control + enter, the currently selected line will run. As you go through your R journey, you might get stuck and have to google a bunch of things, but that's totally okay and normal. That's how I got started!
## Setup
First, you need to install the magic packages. You only need to run this step once on a given computer. For these you can just type them into the RStudio console (look for the Console pane in RStudio) directly since you're never going to be doing this again.
### Install packages
``` {r eval = FALSE}
install.packages("tidyverse", type = "binary")
install.packages("ggrepel", type = "binary")
install.packages("nflreadr", type = "binary")
install.packages("nflplotR", type = "binary")
```
### Load packages
Okay, now here's the stuff you're going to want to start putting into your R script. The following loads `tidyverse`, which contains a lot of helper functions for working with data and `ggrepel` for making figures, along with `nflreadr` (which allows one to quickly download `nflfastR` data, along with a lot of other data). Finally, `nflplotR` makes plotting easier.
``` {r, results = 'hide', message = FALSE }
library(tidyverse)
library(ggrepel)
library(nflreadr)
library(nflplotR)
```
This one is optional but makes R prefer not to display numbers in scientific notation, which I find very annoying:
``` {r}
options(scipen = 9999)
```
### Load data
This will load the full play by play for the 2019 season (including playoffs). We'll get to how to get more seasons later. Note that this is downloading pre-cleaned data from the nflfastR data repository using the `load_pbp()` function included in `nflreadr`, which is much faster than building pbp from scratch.
``` {r}
data <- load_pbp(2019)
```
## Basics: how to look at your data
### Dimensions
```{r echo=FALSE}
rows = dim(data)[[1]]
cols = dim(data)[[2]]
```
Before moving forward, here are a few ways to get a sense of what's in a dataframe. We can check the **dim**ensions of the data, and this tells us that there are ```r rows``` rows (i.e., plays) in the data and ```r cols``` columns (variables):
``` {r}
dim(data)
```
`str` displays the **str**ucture of the dataframe:
``` {r}
str(data[1:10])
```
In the above, I've added in the `[1:10]`, which selects only the first 10 columns, otherwise the list is extremely long (remember from above that there are ```r cols``` columns!). Normally, you would just type `str(data)`.
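If the `[1:10]` part looks mysterious, here is a minimal sketch on a made-up data frame (`toy_df` and its columns are invented for illustration and are not part of the `nflfastR` data). Single brackets with one index pick columns, while adding a comma switches to picking rows.

```r
# toy_df is a hypothetical data frame used only for illustration
toy_df <- data.frame(
  posteam = c("SEA", "KC", "DAL"),
  down = c(1, 2, 3),
  ydstogo = c(10, 7, 2),
  epa = c(0.3, -0.1, 1.2)
)

# one index inside single brackets selects columns
first_two_cols <- toy_df[1:2]
names(first_two_cols)

# a comma switches to [rows, columns]; this keeps the first two rows
first_two_rows <- toy_df[1:2, ]
dim(first_two_rows)
```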
You can similarly take a glimpse at your data:
``` {r}
glimpse(data[1:10])
```
Where again I'm only showing the first 10 columns. The usual command would be `glimpse(data)`.
### Variable names
Another very useful command is to get the `names` of the variables in the data, which you would get by entering `names(data)` (I won't show here because, again, it is ```r cols``` columns).
That is a lot to work with!
### Viewer
One more way to look at your data is with the `View()` function. If you're coming from an Excel background, this will help you feel more at home as a way to see what's in the data.
``` {r eval = FALSE}
View(data)
```
This will open the viewer in RStudio in a new panel. Try it out yourself! Since there are so many columns, the Viewer won't show them all. To pick which columns to view, you can **select** some:
``` {r eval = FALSE}
data |>
select(home_team, away_team, posteam, desc) |>
View()
```
The `|>` thing lets you pipe together a bunch of different commands. So we're taking our data, "`select`"ing a few variables we want to look at, and then Viewing. Again, I can't display the results of that here, but try it out yourself!
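To see what the pipe is doing under the hood, here is a small sketch with made-up numbers (`toy_values` is just an illustration). The pipe takes whatever is on its left and hands it to the function on its right as the first argument, so the piped and nested versions below should give the same answer.

```r
# toy_values is a hypothetical vector used only for illustration
toy_values <- c(4, 9, 16)

# piped: read left to right (take the values, square-root them, add them up)
piped <- toy_values |> sqrt() |> sum()

# nested: the same computation, read inside out
nested <- sum(sqrt(toy_values))

identical(piped, nested)
```

The piped version tends to be easier to read once you chain more than two steps.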
### Head + manipulation
To start, let's just look at the first few rows (the "head") of the data.
``` {r}
data |>
select(posteam, defteam, desc, rush, pass) |>
head()
```
A couple things. "`desc`" is the important variable that lists the description of what happened on the play, and `head` says to show the first few rows (the "head" of the data). Since this is already sorted by game, these are the first 6 rows from a week 1 game, ATL @ MIN. To make code easier to read, people often put each part of a pipe on a new line, which is useful when working with more complicated functions. We could run:
``` {r eval = FALSE}
data |> select(posteam, defteam, desc, rush, pass) |> head()
```
And it would return the exact same output as the one written out in multiple lines, but the code isn't as easy to read.
We've covered `select`, and the next important function to learn is `filter`, which lets you filter the data to what you want. The following returns only plays that are run plays and pass plays; i.e., no punts, kickoffs, field goals, or dead ball penalties (e.g. false starts) where we don't know what the attempted play was.
``` {r}
data |>
filter(rush == 1 | pass == 1) |>
select(posteam, desc, rush, pass, name, passer, rusher, receiver) |>
head()
```
Compared to the first time we did this, the opening line for the start of the game, the kickoff, and the punt are now gone. Note that if you're checking whether a variable is equal to something, you need to use the double equals sign `==` like above. (A single `=` is already used for assigning values, so comparison gets its own symbol.) Also, the character `|` is used for "or", and `&` for "and". So `rush == 1 | pass == 1` means "rush or pass".
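Here is a quick sketch of how `==`, `|`, and `&` behave, using a tiny made-up table (`toy_plays` is hypothetical and only mimics the `rush` and `pass` columns). Each comparison returns one `TRUE` or `FALSE` per row, and that is what `filter` uses to decide which rows to keep.

```r
# toy_plays is a hypothetical stand-in for play-by-play data
toy_plays <- data.frame(
  rush = c(1, 0, 0, 1),
  pass = c(0, 1, 0, 0)
)

# TRUE where the play is a rush OR a pass
is_rush_or_pass <- toy_plays$rush == 1 | toy_plays$pass == 1

# TRUE only where the play is both a rush AND a pass (never, in this toy data)
is_rush_and_pass <- toy_plays$rush == 1 & toy_plays$pass == 1

# keep only the rows where the "or" condition holds
toy_plays[is_rush_or_pass, ]
```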
Note that the `rush`, `pass`, `name`, `passer`, `rusher`, and `receiver` columns are all `nflfastR` creations, where we have provided these to make working with the data easier. As we can see above, `passer` is filled in for all dropbacks (including sacks and scrambles, which also have `pass` = 1), and `name` is equal to the passer on pass plays and the rusher on rush plays. Think of this as the primary player involved on a play.
What if we wanted to view special teams plays? Again, we can use `filter`:
``` {r}
data |>
filter(special == 1) |>
select(down, ydstogo, desc) |>
head()
```
Fourth down plays?
``` {r}
data |>
filter(down == 4) |>
select(down, ydstogo, desc) |>
head()
```
Fourth down plays that aren't special teams plays?
``` {r}
data |>
filter(down == 4 & special == 0) |>
select(down, ydstogo, desc) |>
head()
```
So far, we've just been taking a look at the initial dataset we downloaded, but none of our results are preserved. To save a new dataframe of just the plays we want, we need to use `<-` to assign a new dataframe. Let's save a new dataframe that's just run plays and pass plays with non-missing EPA, called `pbp_rp`.
``` {r}
pbp_rp <- data |>
filter(rush == 1 | pass == 1, !is.na(epa))
```
In the above, `!is.na(epa)` means to exclude plays with missing (`NA`) EPA. The `!` symbol is often used by computer folk to negate something, so `is.na(epa)` means "EPA is missing" and `!is.na(epa)` means "EPA is not missing", which we have used above.
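To make the missing-value logic concrete, here is a minimal sketch with a made-up vector (`toy_epa` is hypothetical). It also shows why dropping missing values matters: a single `NA` turns a plain `mean()` into `NA`.

```r
# toy_epa is a hypothetical vector with one missing value
toy_epa <- c(0.5, NA, -1.2)

is.na(toy_epa)    # which values are missing
!is.na(toy_epa)   # which values are NOT missing

# the mean of a vector containing NA is NA
mean_with_na <- mean(toy_epa)

# keeping only the non-missing values gives a usable mean
mean_without_na <- mean(toy_epa[!is.na(toy_epa)])
```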
## Some basic stuff: Part 1
Okay, we have a big dataset where we call dropbacks pass plays and non-dropbacks rush plays. Now we actually want to, like, do stuff.
### Group by and Summarize
Let's take a look at how various Cowboys' running backs fared on run plays in 2019:
``` {r}
pbp_rp |>
filter(posteam == "DAL", rush == 1) |>
group_by(rusher) |>
summarize(
mean_epa = mean(epa), success_rate = mean(success), ypc = mean(yards_gained), plays = n()
) |>
arrange(-mean_epa) |>
filter(plays > 20)
```
There's a lot going on here. We've covered `filter` already. The `group_by` function is an *extremely* useful function that, well, groups by what you tell it -- in this case the rusher. Summarize is useful for collapsing the data down to a summary of what you're looking at, and here, while grouping by player, we're summarizing the mean of EPA, success, yardage (a bad rushing stat, but since we're here), and getting the number of plays using `n()`, which returns the number in a group. Unsurprisingly, Prescott was much more effective as a rusher in 2019 than the running backs, and there was no meaningful difference between Pollard and Elliott in efficiency.
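If the group-then-summarize pattern is hard to picture, here is a small sketch on invented data (`toy_rushes`, the rusher labels, and all of the numbers are made up). With only five rows you can check the output by hand.

```r
library(dplyr)

# toy_rushes is a hypothetical table: one row per carry
toy_rushes <- tibble(
  rusher = c("A", "A", "B", "B", "B"),
  epa = c(0.2, -0.1, 0.5, 0.1, 0.3),
  yards_gained = c(4, 2, 7, 3, 5)
)

# one output row per rusher, with the summaries computed within each group
toy_summary <- toy_rushes |>
  group_by(rusher) |>
  summarize(
    mean_epa = mean(epa),
    ypc = mean(yards_gained),
    plays = n()
  )

toy_summary
```

Rusher A has 2 carries averaging 3 yards and rusher B has 3 carries averaging 5 yards, which is what `n()` and `mean(yards_gained)` report for each group.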
If you check the [PFR team stats page](https://www.pro-football-reference.com/teams/dal/2019.htm), you'll notice that the above doesn't match up with the official stats. This is because `nflfastR` computes EPA and provides player names on plays with penalties and on two-point conversions. So if we want to match the official stats, we need to restrict to `down <= 4` (to exclude two-point conversions, which have down listed as `NA`) and `play_type == 'run'` (to exclude penalties, which have `play_type == 'no_play'`):
``` {r}
pbp_rp |>
filter(posteam == "DAL", down <= 4, play_type == 'run') |>
group_by(rusher) |>
summarize(
mean_epa = mean(epa), success_rate = mean(success), ypc=mean(yards_gained), plays=n()
) |>
filter(plays > 20)
```
Now we exactly match PFR: Zeke has 301 carries at 4.5 yards/carry, and Pollard has 86 carries for 5.3 yards/carry. Note that we still aren't matching Dak's stats to PFR because the NFL classifies scrambles as rush attempts and `nflfastR` does not.
### Manipulating columns: mutate, if_else, and case_when
Let's say we want to make a new column, named `home`, which is equal to 1 if the team with the ball is the home team. Let's introduce another extremely useful function, `if_else`:
``` {r}
pbp_rp |>
mutate(
home = if_else(posteam == home_team, 1, 0)
) |>
select(posteam, home_team, home) |>
head(10)
```
`mutate` is R's word for creating a new column (or overwriting an existing one); in this case, we've created a new column called `home`. The above uses `if_else`, which uses the following pattern: condition (in this case, `posteam == home_team`), value if condition is true (in this case, if `posteam == home_team`, it is 1), and value if the condition is false (0). So we could use this to, for example, look at average EPA/play by home and road teams:
``` {r}
pbp_rp |>
mutate(
home = if_else(posteam == home_team, 1, 0)
) |>
group_by(home) |>
summarize(epa = mean(epa))
```
Note that EPA/play is similar for home teams and away teams because `home` is already built into the `nflfastR` EPA model, so this result is expected. Actually, away EPA/play is somewhat higher, presumably because away teams out-performed their usual in 2019 as homefield advantage continues to decline generally.
`if_else` is nice if you're creating a new column based on a simple condition. But what if you need to do something more complicated? `case_when` is a good option. Here's how it works:
``` {r}
pbp_rp |>
filter(!is.na(cp)) |>
mutate(
depth = case_when(
air_yards < 0 ~ "Negative",
air_yards >= 0 & air_yards < 10 ~ "Short",
air_yards >= 10 & air_yards < 20 ~ "Medium",
air_yards >= 20 ~ "Deep"
)
) |>
group_by(depth) |>
summarize(cp = mean(cp))
```
Note the new syntax for `case_when`: we have condition (for the first one, air yards less than 0), followed by `~`, followed by assignment (for the first one, "Negative"). In the above, we created 4 bins based on air yards and got average completion probability (`cp`) based on the `nflfastR` model. Unsurprisingly, `cp` is lower the longer downfield a throw goes.
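Two details of `case_when` are easy to miss, so here is a sketch on a made-up vector (`toy_air_yards` is hypothetical). First, conditions are checked in order and the first one that is true wins, which is why the lower bounds can be left off below. Second, a value that matches none of the conditions (including a missing value) comes back as `NA`.

```r
library(dplyr)

# toy_air_yards is a hypothetical vector; the last value is missing on purpose
toy_air_yards <- c(-3, 4, 12, 25, NA)

toy_depth <- case_when(
  toy_air_yards < 0 ~ "Negative",
  toy_air_yards < 10 ~ "Short",    # only reached if the first condition failed
  toy_air_yards < 20 ~ "Medium",
  toy_air_yards >= 20 ~ "Deep"
)

toy_depth
```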
### A basic figure
Now that we've gained some skills at manipulating data, let's put it to use by making things. Which teams were the most pass-heavy in the first half on early downs with win probability between 20% and 80%, excluding the final 2 minutes of the half when everyone is pass-happy?
``` {r}
schotty <- pbp_rp |>
filter(wp > .20 & wp < .80 & down <= 2 & qtr <= 2 & half_seconds_remaining > 120) |>
group_by(posteam) |>
summarize(mean_pass = mean(pass), plays = n()) |>
arrange(-mean_pass)
schotty
```
Again, we've already used `filter`, `group_by`, and `summarize`. The new function we are using here is `arrange`, which sorts the data by the variable(s) given. The minus sign in front of `mean_pass` means to sort in descending order.
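As a quick sketch of the sorting step, here are two ways to sort in descending order on invented data (`toy_rates` and its pass rates are made up). The minus sign only works on numbers, while `desc()` from `dplyr` also works on text columns, so it is worth knowing both.

```r
library(dplyr)

# toy_rates is a hypothetical table of pass rates
toy_rates <- tibble(
  posteam = c("KC", "BAL", "SEA"),
  mean_pass = c(0.65, 0.42, 0.51)
)

# minus sign: sort by the negated value, i.e. largest first
by_minus <- toy_rates |> arrange(-mean_pass)

# desc(): the same ordering, spelled out
by_desc <- toy_rates |> arrange(desc(mean_pass))

by_minus$posteam
```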
Let's make our first figure:
```{r fig1, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
ggplot(schotty, aes(x = reorder(posteam, -mean_pass), y = mean_pass)) +
geom_text(aes(label = posteam))
```
This image is kind of a mess -- we still need a title, axis labels, etc -- but gets the point across. We'll get to that other stuff later. But more importantly, we made something interesting using `nflfastR` data! The "reorder" sorts the teams according to pass rate, with the "-" again saying to do it in descending order. "aes" is short for "aesthetic", which is R's weird way of asking which variables should go on the x and y axes.
Looking at the figure, the Chiefs will never have playoff success until they establish the run.
## Loading multiple seasons
Because all the data is stored in the data repository, it is very fast to load data from multiple seasons.
``` {r}
pbp <- load_pbp(2015:2019)
```
This loads play-by-play data from the 2015 through 2019 seasons.
Let's make sure we got it all. By now, you should understand what this is doing:
``` {r}
pbp |>
group_by(season) |>
summarize(n = n())
```
So each season has about 48,000 plays. Just for fun, let's look at the various play types:
``` {r}
pbp |>
group_by(play_type) |>
summarize(n = n())
```
## Figures with QB stats
Let's do some stuff with quarterbacks:
``` {r}
qbs <- pbp |>
filter(season_type == "REG", !is.na(epa)) |>
group_by(id, name) |>
summarize(
epa = mean(qb_epa),
cpoe = mean(cpoe, na.rm = TRUE),
n_dropbacks = sum(pass),
n_plays = n(),
team = last(posteam)
) |>
ungroup() |>
filter(n_dropbacks > 100 & n_plays > 1000)
```
Lots of new stuff here. First, we're grouping by `id` and `name` to make sure we're getting unique players; i.e., if two players have the same name (like Javorius Allen and Josh Allen both being J.Allen), we are also using their id to differentiate them. `qb_epa` is an `nflfastR` creation that is equal to EPA in all instances except for when a pass is completed and a fumble is lost, in which case a QB gets "credit" for the play up to the spot the fumble was lost (making EPA function like passing yards). The `last` part in the `summarize` command gets the last team that a player was observed playing with.
My way of getting a dataset with only quarterbacks without joining to external roster data is to make sure they hit some number of dropbacks. In this case, filtering with `n_dropbacks > 100` makes sure we're only including quarterbacks. The `ungroup()` near the end is good practice after grouping to make sure you don't get weird behavior with the data you created down the line.
Let's make some more figures. The `load_teams()` function is provided in the `nflreadr` package, so since we have already loaded the package, it's ready to use.
``` {r}
load_teams()
```
Let's join this to the `qbs` dataframe we created:
``` {r}
qbs <- qbs |>
left_join(load_teams(), by = c('team' = 'team_abbr'))
```
`left_join` means keep all the rows from the left dataframe (the first one provided, `qbs`), and join those rows to available rows in the other dataframe. We also need to provide the joining variables, `team` from `qbs` and `team_abbr` from `load_teams()`. Why do we have to type `by = c('team' = 'team_abbr')`? Who knows, but it's what `left_join` requires as instructions for how to match.
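Here is a small sketch of what the `by = c(...)` instruction does, using two invented tables (`toy_qbs`, `toy_teams`, and their contents are made up for illustration). The name on the left of the `=` is the column in the first table and the name on the right is the matching column in the second table.

```r
library(dplyr)

# hypothetical left table: one row per quarterback
toy_qbs <- tibble(
  name = c("P.Mahomes", "R.Wilson"),
  team = c("KC", "SEA")
)

# hypothetical right table: one row per team, with a differently named key
toy_teams <- tibble(
  team_abbr = c("KC", "SEA", "DAL"),
  team_nick = c("Chiefs", "Seahawks", "Cowboys")
)

# match toy_qbs$team against toy_teams$team_abbr
toy_joined <- toy_qbs |>
  left_join(toy_teams, by = c("team" = "team_abbr"))

toy_joined
```

The result keeps both rows from `toy_qbs` and adds `team_nick`. The DAL row is dropped because no row on the left asked for it.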
### With team color dots
Now we can make a figure!
```{r fig2, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
qbs |>
ggplot(aes(x = cpoe, y = epa)) +
#horizontal line with mean EPA
geom_hline(yintercept = mean(qbs$epa), color = "red", linetype = "dashed", alpha=0.5) +
#vertical line with mean CPOE
geom_vline(xintercept = mean(qbs$cpoe), color = "red", linetype = "dashed", alpha=0.5) +
#add points for the QBs with the right colors
#cex controls point size and alpha the transparency (alpha = 1 is normal)
geom_point(color = qbs$team_color, cex=qbs$n_plays / 350, alpha = .6) +
#add names using ggrepel, which tries to make them not overlap
geom_text_repel(aes(label=name)) +
#add a smooth line fitting cpoe + epa
stat_smooth(geom='line', alpha=0.5, se=FALSE, method='lm')+
#titles and caption
labs(x = "Completion % above expected (CPOE)",
y = "EPA per play (passes, rushes, and penalties)",
title = "Quarterback Efficiency, 2015 - 2019",
caption = "Data: @nflfastR") +
#uses the black and white ggplot theme
theme_bw() +
#center title with hjust = 0.5
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = "bold")
) +
#make ticks look nice
#if this doesn't work, `install.packages('scales')`
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10))
```
This looks complicated, but is just a way of getting a bunch of different stuff on the same plot: we have lines for averages, dots, names, etc. I added comments above to explain what is going on, but in practice for making figures I usually just copy and paste stuff and/or google what I need.
### With team logos
We could also make the same plot with team logos:
```{r fig3, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
qbs |>
ggplot(aes(x = cpoe, y = epa)) +
#horizontal line with mean EPA
geom_hline(yintercept = mean(qbs$epa), color = "red", linetype = "dashed", alpha=0.5) +
#vertical line with mean CPOE
geom_vline(xintercept = mean(qbs$cpoe), color = "red", linetype = "dashed", alpha=0.5) +
#add points for the QBs with the logos (this uses nflplotR package)
geom_nfl_logos(aes(team_abbr = team), width = qbs$n_plays / 45000, alpha = 0.75) +
#add names using ggrepel, which tries to make them not overlap
geom_text_repel(aes(label=name)) +
#add a smooth line fitting cpoe + epa
stat_smooth(geom='line', alpha=0.5, se=FALSE, method='lm')+
#titles and caption
labs(x = "Completion % above expected (CPOE)",
y = "EPA per play (passes, rushes, and penalties)",
title = "Quarterback Efficiency, 2015 - 2019",
caption = "Data: @nflfastR") +
theme_bw() +
#center title
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = "bold")
) +
#make ticks look nice
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10))
```
The only changes we've made are to use `geom_nfl_logos` instead of `geom_point` (how to figure out the right size for the images in the `width` part? Trial and error).
This figure would look better with fewer players shown, but the point of this is explaining how to do stuff, so let's call this good enough.
### Team tiers plot
If it's helpful, here are a few notes about the [chart originally shown here](https://www.nflfastr.com/articles/nflfastR.html#example-5-plot-offensive-and-defensive-epa-per-play-for-a-given-season), which like the above uses nflplotR for team logos.
```{r ex5, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
library(nflplotR)
# get pbp and filter to regular season rush and pass plays
pbp <- nflreadr::load_pbp(2005) |>
filter(season_type == "REG") |>
filter(!is.na(posteam) & (rush == 1 | pass == 1))
# offense epa
offense <- pbp |>
group_by(team = posteam) |>
summarise(off_epa = mean(epa, na.rm = TRUE))
# defense epa
defense <- pbp |>
group_by(team = defteam) |>
summarise(def_epa = mean(epa, na.rm = TRUE))
# make figure
offense |>
inner_join(defense, by = "team") |>
ggplot(aes(x = off_epa, y = def_epa)) +
# tier lines
geom_abline(slope = -1.5, intercept = (4:-3)/10, alpha = .2) +
# nflplotR magic
nflplotR::geom_mean_lines(aes(x0 = off_epa, y0 = def_epa)) +
nflplotR::geom_nfl_logos(aes(team_abbr = team), width = 0.07, alpha = 0.7) +
labs(
x = "Offense EPA/play",
y = "Defense EPA/play",
caption = "Data: @nflfastR",
title = "2005 NFL Offensive and Defensive EPA per Play"
) +
theme_bw() +
theme(
plot.title = element_text(size = 12, hjust = 0.5, face = "bold")
) +
scale_y_reverse()
```
* The `geom_mean_lines()` function adds mean lines for offensive and defensive EPA per play
* The slope lines are created using `geom_abline()`
* `scale_y_reverse()` reverses the vertical axis so that up = better defense
Everything else should be comprehensible by now!
### A few more things on plotting
There are two ways to view plots. One is in the RStudio Viewer, which shows up in RStudio when you plot something. If plots in your RStudio viewer look ugly and pixelated, you probably need to install the `Cairo` package and then set that as the default viewer by doing Tools --> Global Options --> General --> Graphics --> Backend: Set to Cairo.
The other is to save a .png with your preferred dimensions and resolution. For example, `ggsave("test.png", width = 16, height = 9, units = "cm")` would save the current plot as "`test.png`" with the units specified (you can view all the ggsave options [here](https://ggplot2.tidyverse.org/reference/ggsave.html)).
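As a rough sketch of the saving step, the example below builds a throwaway plot from the built-in `mtcars` data and writes it to a temporary file (the plot, the object names, and the temporary path are all just for illustration). Passing the plot explicitly through the `plot` argument avoids relying on whichever plot happened to be drawn last.

```r
library(ggplot2)

# a throwaway plot, saved to an object so it can be passed to ggsave()
toy_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# write to a temporary file; swap in your own file name in practice
out_file <- tempfile(fileext = ".png")

ggsave(
  filename = out_file,
  plot = toy_plot,
  width = 16,
  height = 9,
  units = "cm",
  dpi = 300
)

file.exists(out_file)
```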
One more note: the RStudio Viewer can take a long time to preview ggplots, especially if you're doing things like adding images. If you're getting frustrated with a plot taking a long time to display, you can take advantage of [ggpreview](https://nflplotr.nflverse.com/reference/ggpreview.html) from `nflplotR`. To do this, first save the plot to an object and then run `ggpreview` on it (if this doesn't make sense, see the examples [here](https://nflplotr.nflverse.com/reference/ggpreview.html)).
## Real life example: let's make a win total model
I'm going to try to go through the process of cleaning and joining multiple data sets to try to get a sense of how I would approach something like this, step-by-step.
### Get team wins each season
We're going to cheat a little and take advantage of Lee Sharpe's famous `games` file. Most of this stuff has been added into `nflfastR`, but it's easier working with this file where each game is one row. If you're curious, the double colon in `nflreadr::load_schedules()` is a way to call a function from a specific package by name, which works even if you haven't loaded that package with `library()`.
``` {r}
games <- nflreadr::load_schedules()
str(games)
```
To start, we want to create a dataframe where each row is a team-season observation, listing how many games they won. There are multiple ways to do this, but I'm going to just take the home and away results and bind together. As an example, here's what the `home` results look like:
``` {r}
home <- games |>
filter(game_type == 'REG') |>
select(season, week, home_team, result) |>
rename(team = home_team)
home |> head(5)
```
Note that we used `rename` to change `home_team` to `team`.
``` {r}
away <- games |>
filter(game_type == 'REG') |>
select(season, week, away_team, result) |>
rename(team = away_team) |>
mutate(result = -result)
away |> head(5)
```
For away teams, we need to flip the result since result is given from the perspective of the home team. Now let's make a column called `win` based on the result.
``` {r}
results <- bind_rows(home, away) |>
arrange(week) |>
mutate(
win = case_when(
result > 0 ~ 1,
result < 0 ~ 0,
result == 0 ~ 0.5
)
)
results |> filter(season == 2019 & team == 'SEA')
```
Doing the `results |> filter(season == 2019 & team == 'SEA')` part at the end isn't actually for saving the data in a new form, but just making sure the previous step did what I wanted. This is a good habit to get into: frequently inspect your data and make sure it looks like you think it should.
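In the same spirit of checking your work, here is the home/away logic run on two invented games (`toy_games` and its scores are made up, including a tie so that all three cases show up). With four rows it's easy to confirm by hand that the sign flip and the `win` column come out right.

```r
library(dplyr)

# hypothetical games: result is the home margin (home score minus away score)
toy_games <- tibble(
  home_team = c("SEA", "DAL"),
  away_team = c("SF", "NYG"),
  result = c(3, 0)
)

# home rows keep the result as is
toy_home <- toy_games |> select(team = home_team, result)

# away rows get the sign flipped
toy_away <- toy_games |>
  select(team = away_team, result) |>
  mutate(result = -result)

toy_results <- bind_rows(toy_home, toy_away) |>
  mutate(
    win = case_when(
      result > 0 ~ 1,
      result < 0 ~ 0,
      result == 0 ~ 0.5
    )
  )

toy_results
```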
Now that we have the dataframe we wanted, we can get team wins by season easily:
``` {r}
team_wins <- results |>
group_by(team, season) |>
summarize(
wins = sum(win),
point_diff = sum(result)) |>
ungroup()
team_wins |>
arrange(-wins) |>
head(5)
```
Again, we're making sure the data looks like it "should" by checking the 5 seasons with the most wins, and making sure it looks right.
Now that the team-season win and point differential data is ready, we need to go back to the `nflfastR` data to get EPA/play.
### Get team EPA by season
Let's start by getting data from every season from the `nflfastR` data repository:
``` {r}
pbp <- load_pbp(1999:2019) |>
filter(
rush == 1 | pass == 1,
season_type == "REG",
!is.na(epa),
!is.na(posteam),
posteam != ""
) |>
select(season, posteam, pass, defteam, epa)
```
I'm being pretty aggressive with dropping rows and columns (`filter` and `select`) because otherwise loading this all into memory can be painful on the computer. But this is all we need for what we're doing. Note that I'm only keeping regular season games here (`season_type == "REG"`) since this is how this analysis is usually done.
Now we can get EPA/play on offense and defense. Let's break it out by pass and rush too. I don't remember how to do some of this so let's do it in steps. We know we need to group by team, season, and pass, so there's the beginning:
``` {r}
pbp |>
group_by(posteam, season, pass) |>
summarize(epa = mean(epa)) |>
head(4)
```
But this makes two rows per team-season. How to get each team-season on the same row? `pivot_wider` is what we need:
``` {r}
pbp |>
group_by(posteam, season, pass) |>
summarize(epa = mean(epa)) |>
pivot_wider(names_from = pass, values_from = epa) |>
head(4)
```
This one is hard to wrap my head around so I usually open up the [reference page](https://tidyr.tidyverse.org/reference/pivot_wider.html), read the example, and pray that what I try works. In this case it did. Hooray! This turned our two-lines-per-team dataframe into one, with the 0 column being pass == 0 (run plays) and the 1 column pass == 1.
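If it helps, here is the same reshaping on a tiny invented table (`toy_long` and its EPA values are made up). The values in the `names_from` column become the new column names, and the values in the `values_from` column fill them in.

```r
library(dplyr)
library(tidyr)

# hypothetical long table: two rows per team (pass == 0 and pass == 1)
toy_long <- tibble(
  posteam = c("SEA", "SEA", "KC", "KC"),
  pass = c(0, 1, 0, 1),
  epa = c(-0.05, 0.15, 0.01, 0.30)
)

# one row per team, with a column named "0" and a column named "1"
toy_wide <- toy_long |>
  pivot_wider(names_from = pass, values_from = epa)

names(toy_wide)
```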
Now let's rename to something more sensible and save:
``` {r}
offense <- pbp |>
group_by(posteam, season, pass) |>
summarize(epa = mean(epa)) |>
pivot_wider(names_from = pass, values_from = epa) |>
rename(off_pass_epa = `1`, off_rush_epa = `0`)
```
Note that variable names that are numbers need to be surrounded in backticks for this to work.
Now we can repeat the same process for defense:
``` {r}
defense <- pbp |>
group_by(defteam, season, pass) |>
summarize(epa = mean(epa)) |>
pivot_wider(names_from = pass, values_from = epa) |>
rename(def_pass_epa = `1`, def_rush_epa = `0`)
```
Let's do another sanity check looking at the top 5 pass offenses and defenses:
``` {r}
#top 5 offenses
offense |>
arrange(-off_pass_epa) |>
head(5)
#top 5 defenses
defense |>
arrange(def_pass_epa) |>
head(5)
```
The top pass defenses (2002 TB, 2017 JAX, 2019 NE) and offenses (2007 Pats, 2004 Colts, 2011 Packers) definitely check out!
### Fix team names and join
Now we're ready to bind it all together. Actually, let's make sure all the team names are ready too.
``` {r}
team_wins |>
group_by(team) |>
summarize(n=n()) |>
arrange(n)
```
Nope, not yet, we need to fix the Raiders, Rams, and Chargers, which are LV, LA, and LAC in `nflfastR`.
``` {r}
team_wins <- team_wins |>
mutate(
team = case_when(
team == 'OAK' ~ 'LV',
team == 'SD' ~ 'LAC',
team == 'STL' ~ 'LA',
TRUE ~ team
)
)
```
The `TRUE` statement at the bottom says that if none of the above cases are found, keep team the same. Let's make sure this worked:
``` {r}
team_wins |>
group_by(team) |>
summarize(n=n()) |>
arrange(n)
```
HOU has 3 fewer seasons because it didn't exist from 1999 through 2001, which is fine, and all the other teams have the number of seasons that they should. Okay NOW we can join:
``` {r}
data <- team_wins |>
left_join(offense, by = c('team' = 'posteam', 'season')) |>
left_join(defense, by = c('team' = 'defteam', 'season'))
data |>
filter(team == 'SEA' & season >= 2012)
```
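A side note on the `by` argument used in the join above: a named element matches columns whose names differ between the two tables, while an unnamed element matches a column with the same name in both. Here is a small sketch with made-up values and hypothetical object names (`wins_toy`, `off_toy`):

```r
library(dplyr)

# Toy tables (made-up values); the team key has a different name in each
wins_toy <- tibble(team = c("SEA", "KC"), season = 2019, wins = c(11, 12))
off_toy  <- tibble(posteam = c("SEA", "KC"), season = 2019, off_epa = c(0.08, 0.17))

# "team" = "posteam" pairs up the differently named key columns;
# the unnamed "season" matches the column of that name in both tables
joined_toy <- wins_toy |>
  left_join(off_toy, by = c("team" = "posteam", "season"))

joined_toy
```

The joined table keeps the key names from the left-hand table (`team`, `season`), which is why `posteam` and `defteam` don't show up in the joined result.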
Now we're getting really close to doing what we want! Next we need to create new columns for prior year EPA, and let's do point differential too.
``` {r}
data <- data |>
arrange(team, season) |>
group_by(team) |>
mutate(
prior_off_rush_epa = lag(off_rush_epa),
prior_off_pass_epa = lag(off_pass_epa),
prior_def_rush_epa = lag(def_rush_epa),
prior_def_pass_epa = lag(def_pass_epa),
prior_point_diff = lag(point_diff)
) |>
ungroup()
data |>
head(5)
```
Finally! Now we have the data in place and can start doing things with it.
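If it's unclear why the `arrange` and `group_by` steps matter before calling `lag`, this toy sketch may help. The table and its name (`toy`) are hypothetical and only there to show the mechanics:

```r
library(dplyr)

# Toy data: three seasons for each of two teams
toy <- tibble(
  team   = c("SEA", "SEA", "SEA", "KC", "KC", "KC"),
  season = c(2018, 2019, 2020, 2018, 2019, 2020),
  wins   = c(10, 11, 12, 12, 12, 14)
)

toy_lagged <- toy |>
  # sort so that each team's seasons are in chronological order
  arrange(team, season) |>
  # group so that lag() never reaches across from one team into another
  group_by(team) |>
  mutate(prior_wins = lag(wins)) |>
  ungroup()

toy_lagged
```

Each team's first season gets `NA` for `prior_wins` because there is no earlier row within that team's group. Without `group_by`, SEA's 2018 row would wrongly pick up KC's 2020 value.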
### Correlations and regressions
``` {r}
data |>
select(-team, -season) |>
cor(use="complete.obs") |>
round(2)
```
```{r echo=FALSE}
pp = cor(data$off_pass_epa, data$prior_off_pass_epa, use="complete.obs") |>
round(2)
rr = cor(data$off_rush_epa, data$prior_off_rush_epa, use="complete.obs") |>
round(2)
pd = cor(data$def_pass_epa, data$prior_def_pass_epa, use="complete.obs") |>
round(2)
rd = cor(data$def_rush_epa, data$prior_def_rush_epa, use="complete.obs") |>
round(2)
```
We've covered `select`, but here we see a new use where a minus sign de-selects variables (we need to de-select team name for correlation to work because it doesn't work for character strings, and correlation with the season number itself is meaningless). We've run the correlation on this dataframe, removing missing values, and then rounding to 2 digits. Not surprisingly, we see that wins in the current season are more strongly related to passing offense EPA than rushing EPA or defense EPA, and prior offense carries more predictive power than prior defense. Pass offense is more stable year to year (```r pp```) than rush offense (```r rr```), pass defense (```r pd```), or rush defense (```r rd```).
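To make the "removing missing values" part concrete, here is a minimal base R sketch of what `use = "complete.obs"` does. The numbers are made up and `toy` is a hypothetical object; the missing value mimics a team's first season, which has no prior-year EPA.

```r
# Toy data (made-up values) with one missing prior-season value
toy <- data.frame(
  off_pass_epa       = c(0.10, 0.25, -0.05, 0.15),
  prior_off_pass_epa = c(NA, 0.20, -0.10, 0.05)
)

# With use = "complete.obs", any row containing an NA is dropped first,
# so this matches the correlation computed on rows 2 through 4 only
with_na <- cor(toy, use = "complete.obs")
by_hand <- cor(toy[-1, ])
all.equal(with_na, by_hand)

# Without `use`, the missing value propagates and the correlation
# between the two columns comes back as NA
default_cor <- cor(toy)
default_cor["off_pass_epa", "prior_off_pass_epa"]
```

One consequence worth keeping in mind: when many columns are passed at once, a row is dropped if *any* of them is missing, so every correlation in the matrix is computed on the same set of rows.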
I'm actually surprised that the values for passing offense aren't higher relative to the others. Maybe it was because most of our prior results come from the `nflscrapR` era (2009 - 2019)? Let's check what this looks like since 2009 relative to earlier seasons:
``` {r}
message("2009 through 2019")
data |>
filter(season >= 2009) |>
select(wins, point_diff, off_pass_epa, off_rush_epa, prior_point_diff, prior_off_pass_epa, prior_off_rush_epa) |>
cor(use="complete.obs") |>
round(2)
```
``` {r}
message("1999 through 2008")
data |>
filter(season < 2009) |>
select(wins, point_diff, off_pass_epa, off_rush_epa, prior_point_diff, prior_off_pass_epa, prior_off_rush_epa) |>
cor(use="complete.obs") |>
round(2)
```
Yep, that seems to be the case. So in the more recent period, passing offense has become slightly more stable and more predictive of following-year success, while at the same time rushing offense has become substantially less stable and less predictive of future team success.
Now let's do a basic regression of wins on prior offense and defense EPA/play. Maybe we should only look at this more recent period to fit our model since it's more relevant for 2020. In the real world, we would be more rigorous about making decisions like this, but let's proceed anyway.
``` {r}
data <- data |> filter(season >= 2009)
fit <- lm(wins ~ prior_off_pass_epa + prior_off_rush_epa + prior_def_pass_epa + prior_def_rush_epa, data = data)
summary(fit)
```
I'm actually pretty surprised passing offense isn't higher here. How does this compare to simply using point differential?
``` {r}
fit2 <- lm(wins ~ prior_point_diff, data = data)
summary(fit2)
```
So R-squared is somewhat higher for just point differential. This isn't surprising as we've thrown away special teams plays and haven't attempted to make any adjustments for things like fumble luck that we know can improve EPA's predictive power.
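Rather than reading the values off the printed summaries, you can pull them out of the summary objects directly. The sketch below uses simulated data (not NFL results) and hypothetical names (`toy`, `fit_a`, `fit_b`) purely to show where the numbers live:

```r
# Simulated toy data, only to demonstrate the mechanics
set.seed(1)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 8 + 2 * toy$x1 + rnorm(n)

fit_a <- lm(y ~ x1 + x2, data = toy) # more predictors
fit_b <- lm(y ~ x1, data = toy)      # fewer predictors

# summary() stores R-squared and adjusted R-squared as list elements
r2 <- c(
  more_predictors  = summary(fit_a)$r.squared,
  fewer_predictors = summary(fit_b)$r.squared
)
adj_r2 <- c(
  more_predictors  = summary(fit_a)$adj.r.squared,
  fewer_predictors = summary(fit_b)$adj.r.squared
)

round(rbind(r2, adj_r2), 3)
```

Since the EPA model above uses four predictors and the point differential model only one, adjusted R-squared (which penalizes extra predictors) is arguably the fairer number to compare.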
### Predictions
Now let's get the predictions from the EPA model:
``` {r}
preds <- predict(fit, data |> filter(season == 2020)) |>
#was just a vector, need a tibble to bind
as_tibble() |>
#make the column name make sense
rename(prediction = value) |>
round(1) |>
#get names
bind_cols(
data |> filter(season == 2020) |> select(team)
)
preds |>
arrange(-prediction) |>
head(5)
```
This mostly checks out.
What if we just used simple point differential to predict?
``` {r}
preds2 <- predict(fit2, data |> filter(season == 2020)) |>
#was just a vector, need a tibble to bind
as_tibble() |>
#make the column name make sense
rename(prediction = value) |>
round(1) |>
#get names
bind_cols(
data |> filter(season == 2020) |> select(team)
)
preds2 |>
arrange(-prediction) |>
head(5)
```
Not surprisingly, this looks pretty similar. These are very basic models that don't incorporate schedule, roster changes, etc. For example, a better model would take into account Tom Brady no longer playing for the Patriots. But hopefully this has been useful!
## Next Steps
You should now know enough to tackle a great many questions using `nflfastR` data. A good way to build up skills is to take interesting things you see and try to replicate them (for making figures, this will also involve a heavy dose of googling stuff).
Looking at others' code is also a good way to learn. One option is to look through the `nflfastR` code base, much of which you should now be able to follow. For example, [here is the function that cleans up the data and prepares it for later stages](https://github.com/mrcaseb/nflfastR/blob/master/R/helper_add_nflscrapr_mutations.R): there's a heavy dose of `mutate`, `group_by`, `arrange`, `lag`, `if_else`, and `case_when`.
### Resources: The gold standards
This is an R package so this section is pretty R heavy.
* [Introduction to R (**recommended**)](https://r4ds.had.co.nz/explore-intro.html)
* [Open Source Football](https://www.opensourcefootball.com/): Mix of R and Python
* [The Mockup Blog (Thomas Mock)](https://themockup.blog/): Invaluable resource for making cool stuff in R
### Code examples: R
* [Lee Sharpe: basic intro to R and RStudio](https://github.com/leesharpe/nfldata/blob/master/RSTUDIO-INTRO.md)
* [Lee Sharpe: lots of useful NFL / nflscrapR code](https://github.com/leesharpe/nfldata)
* [Lee Sharpe: how to update current season games](https://github.com/leesharpe/nfldata/blob/master/UPDATING-NFLSCRAPR.md)
* [Josh Hermsmeyer: Getting Started with R for NFL Analysis](https://t.co/gxDDhOYhcI)
* [Slavin: visualizing positional tiers in SFB9](https://slavin22.github.io/SFB9-Positional-Tiers/Guide.nb)
* [Ron Yurko: assorted examples](https://github.com/ryurko/nflscrapR-data/tree/master/R)
* [CowboysStats: defensive playmaking EPA](https://github.com/dhouston890/cowboys-stats/blob/master/playmaking_epa_pbp.R)
* [Michael Lopez: function to sample plays](https://github.com/statsbylopez/BlogPosts/blob/master/scrapr-data.R)
* [Michael Lopez: R for NFL analysis (presentation to club staffers)](https://statsbylopez.netlify.com/post/r-for-nfl-analysis/)
* [Mitchell Wesson: QB hits investigation](https://gist.github.com/wessonmo/45781bd25a74e8097e0c8bc8fbacf796)
* [Mitchell Wesson: Investigation of the nflscrapR EP model](https://gist.github.com/wessonmo/ef44ea9873d70f816454cb88b86dcce6)
* [WHoffman: graphs for receivers (aDoT, success rate, and more)](https://github.com/whoffman21279/Steelers/blob/master/receiving_stats)
* [ChiBearsStats: investigation of 3rd downs vs offensive efficiency](https://gist.github.com/ChiBearsStats/dac3266037797032a23f38fd9d64d6a8#file-adjustedthirddowns-txt)
* [ChiBearsStats: the insignificance of field goal kicking](https://gist.github.com/ChiBearsStats/78e33baeed3cd6d3cac0040b47d4ec69)
### More data sources
* [Lee Sharpe: Draft Picks, Draft Values, Games, Logos, Rosters, Standings](https://github.com/leesharpe/nfldata/blob/master/DATASETS.md)
* [greerre: how to get .csv file of weather & stadium data from PFR in python](https://github.com/greerre/pfr_metadata_pull)
* [Parker Fleming: Introduction to College Football Data with R and cfbscrapR](https://gist.github.com/spfleming/2527a6ca2b940af2a8aa1fee9320171d)
### Other code examples: Python
* [Deryck97: nflfastR Python Guide](https://gist.github.com/Deryck97/dff8d33e9f841568201a2a0d5519ac5e)
* [Nick Wan: nflfastR Python Colab Guide](https://colab.research.google.com/github/nickwan/colab_nflfastR/blob/master/nflfastR_starter.ipynb)
* [Cory Jez: animated plot](https://github.com/jezlax/sports_analytics/blob/master/animated_nfl_scatter.py)
* [903124S: Sampling EP](https://gist.github.com/903124/6693fdf6b991437a6d6ef9c5d935c83b)
* [903124S: estimating EPA using nfldb](https://gist.github.com/903124/d304f76688b0699497a35b61b6d1e267)
* [903124S: estimate EPA for college football](https://gist.github.com/903124/3c6f0dc0a100d78b8622573ef4c504f5)
* Blake Atkinson: explosiveness [blog post](https://medium.com/@BlakeAtkinson/the-2018-kansas-city-chiefs-and-an-explosiveness-metric-in-football-c3b3fd447d73) and [python code](https://github.com/btatkinson/yard_value/blob/master/yard_value.ipynb)
* Blake Atkinson: player type visualizations [blog post](https://medium.com/@BlakeAtkinson/visualizing-different-nfl-player-styles-88ef31420539) and [python code](https://github.com/btatkinson/player_vectors/blob/master/player_vectors.ipynb)
================================================
FILE: vignettes/field_descriptions.Rmd
================================================
---
title: "Field Descriptions"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
echo = FALSE,
comment = "#>"
)
with_dt <- requireNamespace("DT")
```
```{r eval = with_dt}
DT::datatable(
nflfastR::field_descriptions,
options = list(scrollX = TRUE, pageLength = 25),
filter = "top",
rownames = FALSE,
style = "bootstrap4"
)
```
```{r eval = !with_dt}
knitr::kable(nflfastR::field_descriptions)
```
================================================
FILE: vignettes/nflfastR.Rmd
================================================
---
title: "Get started with nflfastR"
author: "Ben Baldwin & Sebastian Carl"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
future::plan("multisession")
options(dplyr.summarise.inform = FALSE)
options(nflreadr.verbose = FALSE)
```
If you are new to R or are having trouble understanding the code in the below sections we highly recommend the **nflfastR beginner's guide** in `vignette("beginners_guide")`.
# The Main Functions
nflfastR comes with a set of functions to access NFL play-by-play data. This section provides a brief introduction to the essential functions.
nflfastR processes and cleans up play-by-play data and adds variables through [its models](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/). Since some of these tasks are performed by separate functions, the easiest way to compute the complete nflfastR dataset is `build_nflfastR_pbp()`. The main input for that function is a set of game ids which can be accessed with `load_schedules()`. The following code demonstrates how to build the nflfastR dataset for the Super Bowls of the 2017 - 2019 seasons.
```{r}
ids <- nflreadr::load_schedules(2017:2019) |>
dplyr::filter(game_type == "SB") |>
dplyr::pull(game_id)
pbp <- nflfastR::build_nflfastR_pbp(ids)
```
In most cases, however, it is not necessary to use this function for individual games, because nflverse provides both a [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp) and two main play-by-play functions: `load_pbp()` and `update_pbp_db()`. We cover `load_pbp()` below, and please see [Example 6: Using the built-in database function] for how to work with the database function `update_pbp_db()`.
The easiest way to access the data from the release is the function `load_pbp()`. It can load multiple seasons directly into memory and supports multiple data formats. Loading all play-by-play data of the 2022-2024 seasons is as easy as
```{r}
pbp <- nflfastR::load_pbp(2022:2024)
```
Joining roster data to the play-by-play data set is possible as well. The data can be accessed with the function `load_rosters()` and its application is demonstrated in [Example 8: Working with roster and position data].
# Application Examples
All examples listed below assume that the following libraries are installed (and loaded).
``` {r load, warning = FALSE, message = FALSE}
library(nflfastR)
library(nflplotR)
library(dplyr)
library(ggplot2)
```
## Example 1: Completion Percentage Over Expected (CPOE)
Let's look at CPOE leaders from the 2009 regular season.
As discussed above, `nflfastR` has a data release for all available seasons, so there's no need to actually build them. Let's use that here with the convenience function `load_pbp()` which fetches data from the release (for non-R users, .csv and .parquet are also available in the [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp)).
``` {r ex3-cpoe, warning = FALSE, message = FALSE}
games_2009 <- nflfastR::load_pbp(2009) |> dplyr::filter(season_type == "REG")
games_2009 |>
dplyr::filter_out(is.na(cpoe)) |>
dplyr::summarize(
passer = nflreadr::stat_mode(passer_player_name),
cpoe = mean(cpoe),
Atts = n(),
.by = passer_player_id
) |>
dplyr::filter(Atts > 200) |>
dplyr::slice_max(cpoe, n = 5) |>
knitr::kable(digits = 1)
```
## Example 2: Using Drive Information
When working with `nflfastR`, drive results are automatically included. We use `fixed_drive` and `fixed_drive_result` since the NFL-provided information is a bit wonky. Let's look at how much more likely teams were to score starting from 1st & 10 at their own 20 yard line in 2015 (the last year before touchbacks on kickoffs changed to the 25) than in 2003.
``` {r ex4, warning = FALSE, message = FALSE}
pbp <- nflfastR::load_pbp(c(2003, 2015))
out <- pbp |>
dplyr::filter(
season_type == "REG" & down == 1 & ydstogo == 10 & yardline_100 == 80
) |>
dplyr::mutate(
drive_score = dplyr::case_when(
fixed_drive_result %in% c("Touchdown", "Field goal") ~ 1L,
TRUE ~ 0L
)
) |>
dplyr::summarize(drive_score = mean(drive_score), .by = season)
out |>
knitr::kable(digits = 3)
```
So `r scales::percent(out$drive_score[1], accuracy = 0.1)` of 1st & 10 plays from teams' own 20 would see the drive end up in a score in 2003, compared to `r scales::percent(out$drive_score[2], accuracy = 0.1)` in 2015. This has implications for Expected Points models (see [this article](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/)).
## Example 3: Plot offensive and defensive EPA per play for a given season
Let's build the **[NFL team tiers](https://rbsdm.com/stats/stats/)** using offensive and defensive expected points added per play for the 2005 regular season. For creating data visualizations that include NFL team logos (or wordmarks, or headshots), we recommend the nflverse R package [nflplotR](https://nflplotr.nflverse.com).
When using `load_pbp()`, the helper function `clean_pbp()` has already been run, which creates "rush" and "pass" columns that (a) properly count sacks and scrambles as pass plays and (b) properly include plays with penalties. Using this, we can keep only rush or pass plays.
```{r ex5, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600}
pbp <- nflfastR::load_pbp(2005) |>
dplyr::filter(season_type == "REG") |>
dplyr::filter(!is.na(posteam) & (rush == 1 | pass == 1))
offense <- pbp |>
dplyr::group_by(team = posteam) |>
dplyr::summarise(off_epa = mean(epa, na.rm = TRUE))
defense <- pbp |>
dplyr::group_by(team = defteam) |>
dplyr::summarise(def_epa = mean(epa, na.rm = TRUE))
offense |>
dplyr::inner_join(defense, by = "team") |>
ggplot(aes(x = off_epa, y = def_epa)) +
geom_abline(
slope = -1.5,
intercept = c(.4, .3, .2, .1, 0, -.1, -.2, -.3),
alpha = .2
) +
nflplotR::geom_mean_lines(aes(x0 = off_epa, y0 = def_epa)) +
nflplotR::geom_nfl_logos(aes(team_abbr = team), width = 0.07, alpha = 0.7) +
labs(
x = "Offense EPA/play",
y = "Defense EPA/play",
caption = "Data: @nflfastR",
title = "2005 NFL Offensive and Defensive EPA per Play"
) +
theme_bw() +
theme(
plot.title = element_text(size = 12, hjust = 0.5, face = "bold")
) +
scale_y_reverse()
```
## Example 4: Expected Points calculator
We have provided a calculator for working with the Expected Points model. Here is an example of how to use it, looking for how the Expected Points on a drive beginning following a touchback has changed over time.
While I have put in `'SEA'` for `home_team` and `posteam`, this only matters for figuring out whether the team with the ball is the home team (there's no actual effect for a given team; it would be the same no matter what team is supplied).
``` {r ex6a}
data <- tibble::tibble(
"season" = 1999:2019,
"home_team" = "SEA",
"posteam" = "SEA",
"roof" = "outdoors",
"half_seconds_remaining" = 1800,
"yardline_100" = c(rep(80, 17), rep(75, 4)),
"down" = 1,
"ydstogo" = 10,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
nflfastR::calculate_expected_points(data) |>
dplyr::select(season, yardline_100, td_prob, ep) |>
knitr::kable(digits = 2)
```
Not surprisingly, offenses have become much more successful over time, with the kickoff touchback moving from the 20 to the 25 in 2016 providing an additional boost. Note that the `td_prob` in this example is the probability that the next score within the same half will be a touchdown scored by the team with the ball, **not** the probability that the current drive will end in a touchdown (this is why the numbers are different from Example 2 above).
We could compare the most recent four years to the expectation for playing in a dome by inputting all the same things and changing the `roof` input:
``` {r ex6b}
data <- tibble::tibble(
"season" = 2016:2019,
"week" = 5,
"home_team" = "SEA",
"posteam" = "SEA",
"roof" = "dome",
"half_seconds_remaining" = 1800,
"yardline_100" = c(rep(75, 4)),
"down" = 1,
"ydstogo" = 10,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
nflfastR::calculate_expected_points(data) |>
dplyr::select(season, yardline_100, td_prob, ep) |>
knitr::kable(digits = 2)
```
So for 2018 and 2019, 1st & 10 from a home team's own 25 yard line had higher EP in domes than outdoors, which is to be expected.
## Example 5: Win probability calculator
We have also provided a calculator for working with the win probability models. Here is an example of how to use it, looking for how the win probability to begin the game depends on the pre-game spread.
While I have put in `'SEA'` for `home_team` and `posteam`, this only matters for figuring out whether the team with the ball is the home team (there's no actual effect for a given team; it would be the same no matter what team is supplied).
``` {r ex7}
data <- tibble::tibble(
"receive_2h_ko" = 0,
"home_team" = "SEA",
"posteam" = "SEA",
"score_differential" = 0,
"half_seconds_remaining" = 1800,
"game_seconds_remaining" = 3600,
"spread_line" = c(1, 3, 4, 7, 14),
"down" = 1,
"ydstogo" = 10,
"yardline_100" = 75,
"posteam_timeouts_remaining" = 3,
"defteam_timeouts_remaining" = 3
)
nflfastR::calculate_win_probability(data) |>
dplyr::select(spread_line, wp, vegas_wp) |>
knitr::kable(digits = 2)
```
Not surprisingly, `vegas_wp` increases with the number of points by which a team was favored coming into the game.
## Example 6: Using the built-in database function
If you're comfortable using `dplyr` functions to manipulate and tidy data, you're ready to use a database. Why should you use a database?
* The provided function in `nflfastR` makes it extremely easy to build a database and keep it updated
* Play-by-play data over 25+ seasons takes up a lot of memory: working with a database allows you to only bring into memory what you actually need
* R makes it *extremely* easy to work with databases.
### Start: install and load packages
To start, we need to install the two packages required for this that aren't installed automatically when `nflfastR` installs: `DBI` and `duckdb` (advanced users can use other types of databases, but this example will use duckdb). The `if` statements make sure the packages won't be updated if they are already installed:
``` {r eval = FALSE}
if (!require("DBI")) install.packages("DBI")
if (!require("duckdb")) install.packages("duckdb")
```
### Overview
There's exactly one function in `nflfastR` that works with databases: `update_pbp_db()`. Some notes:
* `update_pbp_db()` follows the DBI argument naming convention and order. It requires an open connection created with `DBI::dbConnect()`.
* You can specify a different table name with `name`.
* The `seasons` argument controls how the table in the connected database is handled. This is a hybrid argument, and its behavior is described in detail [in the function documentation](https://nflfastr.com/reference/update_pbp_db.html#the-seasons-argument).
* If larger parts of the DB need to be updated, then you should definitely consider doing so in chunks. The `"nflfastR.db_chunk_size"` option is available for this purpose. Further details can also be found in the function documentation.
### Connect to a database
Working with databases always requires an open connection. In this example, we will focus solely on duckdb databases, as duckdb has essentially become the state of the art for this type of data. duckdb can easily create a database in memory. Of course, this doesn't make sense for large amounts of data, because they shouldn't be stored in memory, but the process is practically identical with a locally stored database.
So let's connect to an in-memory duckdb database:
``` {r}
connection <- DBI::dbConnect(duckdb::duckdb())
connection
```
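For a database that persists between R sessions, the connection only needs a file path. The following is a minimal sketch rather than a recommended setup: the file name `pbp_db.duckdb`, the table `toy_table`, and the object names are all hypothetical, and a temporary directory is used so the example cleans up after itself.

```r
library(DBI)

# Hypothetical location; any writable path works
db_path <- file.path(tempdir(), "pbp_db.duckdb")

# Passing `dbdir` makes duckdb store the database in that file
# instead of keeping it in memory
file_connection <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)

# Write a tiny placeholder table so there is something to find later
DBI::dbWriteTable(file_connection, "toy_table", data.frame(x = 1:3))
DBI::dbDisconnect(file_connection, shutdown = TRUE)

# Reconnecting to the same file shows that the table is still there
file_connection <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)
tables <- DBI::dbListTables(file_connection)
tables
DBI::dbDisconnect(file_connection, shutdown = TRUE)
```

Everything else in this example works the same way with a file-backed connection as with the in-memory one.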
### Write data to the database
Let's say I just want to dump play-by-play data of the 2021 - 2024 seasons in my database. Here we go!
``` {r create-db}
nflfastR::update_pbp_db(connection, seasons = 2021:2024)
```
This created a table named "nflverse_pbp" in the connected database and appended play-by-play data of the 2021 - 2024 seasons to it.
Wait, that's it? That's it! What if it's partway through the season and you want to add all the new games to the database and let data corrections from the NFL propagate into it? What do you run? `update_pbp_db()`!
``` {r update-db}
nflfastR::update_pbp_db(connection)
```
### Work with the database
Now we're ready to do stuff. If you aren't familiar with databases, they're organized around tables. Here's how to see which tables are present in our database:
``` {r}
DBI::dbListTables(connection)
```
Since we went with the defaults, there's a table called `nflverse_pbp`. Another useful function is to see the fields (i.e., columns) in a table:
``` {r}
DBI::dbListFields(connection, "nflverse_pbp") |>
utils::head(10)
```
This is the same list as the list of columns in `nflfastR` play-by-play. Notice we had to supply the name of the table above (`"nflverse_pbp"`).
With all that out of the way, there are only a couple more things to learn. The main driver here is `tbl`, which creates a reference to a specific table in a database:
``` {r}
pbp_db <- dplyr::tbl(connection, "nflverse_pbp")
```
And now, everything will magically just "work": you can forget you're even working with a database!
``` {r}
pbp_db |>
dplyr::group_by(season) |>
dplyr::summarize(n = dplyr::n())
pbp_db |>
dplyr::filter(
rush == 1 | pass == 1,
down <= 2,
!is.na(epa),
!is.na(posteam)
) |>
dplyr::group_by(pass) |>
dplyr::summarize(mean_epa = mean(epa, na.rm = TRUE))
```
So far, everything has stayed in the database. If you want to bring a query into memory, just use `collect()` at the end:
``` {r}
russ <- pbp_db |>
dplyr::filter(name == "R.Wilson" & posteam == "SEA") |>
dplyr::select(desc, epa) |>
dplyr::collect()
russ
```
So we've searched through `r pbp_db |> dplyr::count() |> dplyr::collect() |> dplyr::pull(n) |> prettyNum(big.mark = ",")` rows of data across 300+ columns and only brought about `r round(nrow(russ), -1)` rows and two columns into memory. Pretty neat! This is how we supply the data to the shiny apps on rbsdm.com without running out of memory on the server. Now there's only one more thing to remember. When you're finished doing what you need with the database:
``` {r}
DBI::dbDisconnect(connection)
```
For more details on using a database with `nflfastR`, see [Thomas Mock's life-changing post here](https://themockup.blog/posts/2019-04-28-nflfastr-dbplyr-rsqlite/). More detailed information on dbplyr (the dplyr database back-end) is given in the second edition of [Hadley Wickham's R for Data Science](https://r4ds.hadley.nz/databases.html).
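To see the "stays in the database until `collect()`" idea in isolation, here is a small sketch on a toy table. It assumes the `dbplyr` package is installed alongside `DBI`, `duckdb`, and `dplyr`; the table `toy_pbp` and its values are made up, and the object names are hypothetical.

```r
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb())

# Toy table (made-up values) standing in for play-by-play data
DBI::dbWriteTable(con, "toy_pbp", data.frame(
  posteam = c("SEA", "SEA", "KC", "KC"),
  pass    = c(1, 0, 1, 0),
  epa     = c(0.4, -0.1, 0.6, 0.2)
))

# Nothing is computed here; this object only describes a query
lazy_query <- dplyr::tbl(con, "toy_pbp") |>
  dplyr::filter(pass == 1) |>
  dplyr::group_by(posteam) |>
  dplyr::summarize(mean_epa = mean(epa, na.rm = TRUE))

# Print the SQL that would be sent to the database
dplyr::show_query(lazy_query)

# collect() runs the query and returns an ordinary tibble
result <- dplyr::collect(lazy_query)
result

DBI::dbDisconnect(con, shutdown = TRUE)
```

Looking at the output of `show_query()` is a handy way to check what a pipeline will do before pulling a potentially large result into memory.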
## Example 7: Working with the expected yards after catch model
The variables in `xyac` are as follows:
* `xyac_epa`: The expected value of EPA gained after the catch, **starting from where the catch was made**.
* `xyac_success`: The probability the play earns positive EPA (relative to where play started) based on where ball was caught.
* `xyac_fd`: Probability play earns a first down based on where the ball was caught.
* `xyac_mean_yardage` and `xyac_median_yardage`: Average and median expected yards after the catch based on where the ball was caught.
Some other notes:
* `epa` = `air_epa` + `yac_epa`, where `air_epa` is the EPA associated with a catch at the target location. If a receiver loses a fumble, it is removed from his `yac_epa`
* Expected value of EPA at catch point = `air_epa` + `xyac_epa`
* So if we want to get YAC EPA over expected, we need to compare `yac_epa` to `xyac_epa`, as in the example below
* To get first downs over expected, we could compare `first_down` to `xyac_fd`
* These fields are populated for all pass attempts, whether caught or not, but restrict to completed passes when measuring, for example, YAC EPA over expected
* The expected YAC EPA model doesn't take receiver fumbles into account, so actual minus expected YAC is slightly negative due to fumbles happening
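The relationships in the notes above can be written out for a single completed pass. The numbers below are purely illustrative (not taken from any real play), and the variable names on the left-hand side other than the nflfastR column names are hypothetical:

```r
# Illustrative values for one completed pass (not real data)
air_epa  <- 0.50 # EPA associated with a catch at the target location
yac_epa  <- 0.80 # EPA actually gained after the catch
xyac_epa <- 0.60 # EPA the model expected to be gained after the catch

epa               <- air_epa + yac_epa  # total EPA on the play
expected_at_catch <- air_epa + xyac_epa # expected EPA at the catch point
yac_epa_oe        <- yac_epa - xyac_epa # YAC EPA over expected

# First downs over expected work the same way on a 0/1 outcome
first_down <- 1    # the play did earn a first down
xyac_fd    <- 0.70 # model probability of a first down at the catch point
fd_oe      <- first_down - xyac_fd

c(epa = epa, expected_at_catch = expected_at_catch,
  yac_epa_oe = yac_epa_oe, fd_oe = fd_oe)
```

Averaging `yac_epa_oe` and `fd_oe` over a receiver's completed passes gives per-catch "over expected" measures.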
Let's create measures for EPA and first downs over expected in 2015:
``` {r ex9-xyac, warning = FALSE, message = FALSE}
nflfastR::load_pbp(2015) |>
dplyr::group_by(receiver, receiver_id, posteam) |>
dplyr::mutate(tgt = sum(complete_pass + incomplete_pass)) |>
dplyr::filter(tgt >= 50) |>
dplyr::filter(
complete_pass == 1,
air_yards < yardline_100,
!is.na(xyac_epa)
) |>
dplyr::summarize(
epa_oe = mean(yac_epa - xyac_epa),
actual_fd = mean(first_down),
expected_fd = mean(xyac_fd),
fd_oe = mean(first_down - xyac_fd),
rec = dplyr::n()
) |>
dplyr::ungroup() |>
dplyr::select(
receiver,
posteam,
actual_fd,
expected_fd,
fd_oe,
epa_oe,
rec
) |>
dplyr::slice_max(epa_oe, n = 10) |>
knitr::kable(digits = 3)
```
The presence of so many running backs on this list suggests that even though it takes into account target depth and pass direction, the model doesn't do a great job capturing space. Alternatively, running backs might be better at generating yards after the catch since running with the football is their primary role.
## Example 8: Working with roster and position data
At long last, there's a way to merge the new play-by-play data with roster information. Use `load_rosters()` to get the rosters:
``` {r roster}
roster <- nflfastR::load_rosters(2019)
```
Now let's load play-by-play data from 2019:
``` {r roster_pbp_load}
games_2019 <- nflfastR::load_pbp(2019)
```
Here is what the player IDs look like. `nflfastR` now automatically decodes them to match the old GSIS ID format:
``` {r roster_pbp}
games_2019 |>
dplyr::filter(rush == 1 | pass == 1, posteam == "SEA") |>
dplyr::select(name, id)
```
Now we're ready to join to the roster data using these IDs:
``` {r decode_join}
joined <- games_2019 |>
dplyr::filter(!is.na(receiver_id)) |>
dplyr::select(posteam, season, desc, receiver, receiver_id, epa) |>
dplyr::left_join(roster, by = c("receiver_id" = "gsis_id"))
```
``` {r decode_table}
# the real work is done, this just makes a table and has it look nice
joined |>
dplyr::filter(position %in% c("WR", "TE", "RB")) |>
dplyr::group_by(receiver_id, receiver, position) |>
dplyr::summarize(tot_epa = sum(epa), n = n()) |>
dplyr::arrange(-tot_epa) |>
dplyr::ungroup() |>
dplyr::group_by(position) |>
dplyr::mutate(position_rank = 1:n()) |>
dplyr::filter(position_rank <= 5) |>
dplyr::rename(
Pos_Rank = position_rank,
Player = receiver,
Pos = position,
Tgt = n,
EPA = tot_epa
) |>
dplyr::select(Player, Pos, Pos_Rank, Tgt, EPA) |>
knitr::kable(digits = 0)
```
Not surprisingly, all 5 of the top 5 WRs in terms of EPA added come in ahead of the top RB. Note that the number of targets won't match official stats because we're including plays with penalties.
## Example 9: Replicating official stats
Columns like `name`, `passer`, `fantasy`, etc. are `nflfastR`-created columns that mimic "real" football: i.e., excluding plays with spikes, counting scrambles and sacks as pass plays, etc. But if you're trying to replicate official statistics -- perhaps for fantasy purposes -- use the `*_player_name` and `*_player_id` columns.
[Let's try to replicate this page of passing leaders](https://www.nfl.com/stats/player-stats/).
``` {r stats1}
nflfastR::load_pbp(2020) |>
dplyr::filter(
season_type == "REG",
complete_pass == 1 | incomplete_pass == 1 | interception == 1,
!is.na(down)
) |>
dplyr::group_by(passer_player_name, posteam) |>
dplyr::summarize(
yards = sum(passing_yards, na.rm = TRUE),
tds = sum(touchdown == 1 & td_team == posteam),
ints = sum(interception),
att = dplyr::n()
) |>
dplyr::arrange(-yards) |>
utils::head(10) |>
knitr::kable(digits = 0)
```
These match the official stats on NFL.com (note the filter for `season_type == "REG"` since official stats only count regular season games). Note that we're using `passing_yards` here because `yards_gained` is not equal to passing yards on plays with laterals.
While the above code works in this case, there are several special cases where it is nearly impossible to get official player stats from nflfastR play-by-play data. The reason for this is that the idea of nflfastR play-by-play data is a "tidy" data structure. In other words, the aim is to have one row per play in the data. This can lead to problems if, for example, there are several changes of possession per play (i.e. several fumbles) or if the ball is lateraled in a play. These are just two examples of “abnormal” plays that are not fully captured in a tidy data structure.
We have solved this problem with the function `calculate_stats()`. This function uses playstats of the raw play-by-play data before it is parsed into a tidy structure by nflfastR.
This function has the following features:
- It determines stats in offense, defense, and special teams,
- either on player level or on team level,
- and can summarize them on season level (separately for regular season and post season) or on week level.
For more information see the function documentation of `calculate_stats()`. Again, **don't try to get an exact match with official stats based on nflfastR play-by-play data**. It usually works, but it can fail on plays whose details cannot be recovered from the tidy data structure.
Now let's replicate the above table using `calculate_stats()`:
``` {r stats2}
s <- nflfastR::calculate_stats(
seasons = 2020,
summary_level = "season",
stat_type = "player",
season_type = "REG"
)
s |>
dplyr::slice_max(passing_yards, n = 10) |>
dplyr::select(
player_name,
recent_team,
completions,
attempts,
passing_yards,
passing_tds,
passing_interceptions
) |>
knitr::kable(digits = 0)
```
As with pbp data, computing stats data is costly but can be automated, so there is rarely a reason to call `calculate_stats()` directly. Instead, nflverse offers the functions `nflfastR::load_player_stats()` and `nflfastR::load_team_stats()` to load precomputed data from data releases.
# Frequent issues
## The `drive` column looks wacky
Use `fixed_drive` and `fixed_drive_result` instead. See [Example 2: Using Drive Information].
## Why are there so many win probability columns?
`vegas_wp` and `vegas_home_wp` incorporate the pregame spread and are much better models than `wp` and `home_wp`, which do not.
## Need more help?
Please ask [in the nflverse Discord server](https://discord.com/invite/5Er2FBnnQa).
================================================
FILE: vignettes/stats_variables.Rmd
================================================
---
title: "NFL Stats Variables"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
echo = FALSE,
comment = "#>"
)
with_dt <- requireNamespace("DT")
```
Below you will find a table that lists and explains all the variables available in `calculate_stats()`. Compared to the old, deprecated `calculate_player_stats*()` functions, practically all variables (and their names) have been preserved. The few differences are:
- `recent_team`: renamed to `team` (recent team in weekly data never made sense)
- `interceptions`: renamed to `passing_interceptions` (all passing stats have the passing prefix)
- `sacks`: renamed to `sacks_suffered` (to make clear it's not on defensive side)
- `sack_yards`: renamed to `sack_yards_lost` (to make clear it's not on defensive side)
- `dakota`: not implemented at the moment
- `def_tackles`: split into `def_tackles_solo` and `def_tackles_with_assist`
- `def_fumble_recovery_own`: renamed to `fumble_recovery_own` (it is not exclusive to defense)
- `def_fumble_recovery_yards_own`: renamed to `fumble_recovery_yards_own` (it is not exclusive to defense)
- `def_fumble_recovery_opp`: renamed to `fumble_recovery_opp` (it is not exclusive to defense)
- `def_fumble_recovery_yards_opp`: renamed to `fumble_recovery_yards_opp` (it is not exclusive to defense)
- `def_safety`: renamed to `def_safeties` (we use plural everywhere)
- `def_penalty`: renamed to `penalties` (it is not exclusive to defense)
- `def_penalty_yards`: renamed to `penalty_yards` (it is not exclusive to defense)
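If you have code that still refers to the old column names, the renames in the list above can be captured in a lookup vector. The following is a minimal sketch in base R. `renamed_stats` and `translate_stat_names()` are hypothetical names used for illustration and are not part of nflfastR. `dakota` and `def_tackles` are left out because they have no one-to-one replacement.

```r
# Old calculate_player_stats*() names mapped to their calculate_stats() names.
# Only the one-to-one renames from the list above are included.
renamed_stats <- c(
  recent_team = "team",
  interceptions = "passing_interceptions",
  sacks = "sacks_suffered",
  sack_yards = "sack_yards_lost",
  def_fumble_recovery_own = "fumble_recovery_own",
  def_fumble_recovery_yards_own = "fumble_recovery_yards_own",
  def_fumble_recovery_opp = "fumble_recovery_opp",
  def_fumble_recovery_yards_opp = "fumble_recovery_yards_opp",
  def_safety = "def_safeties",
  def_penalty = "penalties",
  def_penalty_yards = "penalty_yards"
)

# Hypothetical helper: translate a vector of old column names and leave
# names without a listed rename untouched.
translate_stat_names <- function(old_names) {
  hits <- old_names %in% names(renamed_stats)
  old_names[hits] <- unname(renamed_stats[old_names[hits]])
  old_names
}

translate_stat_names(c("player_id", "interceptions", "sacks", "passing_yards"))
```

The helper only rewrites names, so it can be applied to a vector of column names before selecting from the new stats output.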
```{r eval = with_dt}
DT::datatable(
nflfastR::nfl_stats_variables,
options = list(scrollX = TRUE, pageLength = 25),
filter = "top",
rownames = FALSE,
style = "bootstrap4"
)
```
```{r eval = !with_dt}
knitr::kable(nflfastR::nfl_stats_variables)
```