Repository: christopherjenness/NBA-player-movement
Branch: master
Commit: 543cf41f5b11
Files: 8
Total size: 94.3 KB

Directory structure:
gitextract_z95h9qgz/
├── .gitignore
├── README.md
└── game/
    ├── allgames.txt
    ├── game.py
    ├── pbpevents.txt
    ├── scrape_games.py
    ├── spacing_analysis.py
    └── velocity_analysis.py


================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*~
*.DS_Store
*__pycache__/
*.pyc

================================================
FILE: README.md
================================================
# NBA player tracking visualization and analysis

This library contains useful methods for visualizing and analyzing NBA player tracking data. The data is located here and contains all player and ball locations for NBA games from the 2015-16 season. Play-by-play data is obtained from stats.nba.com.

Example visualizations are shown below.

## System Requirements
* `curl`
* `ffmpeg`
* `p7zip`

## TODO
* Long-term solution for play-by-play data. This may break at any moment. [See here](https://github.com/christopherjenness/NBA-player-movement/issues/5)
* Python 3 support. [See here](https://github.com/christopherjenness/NBA-player-movement/issues/4)

## Visualization

Note: these examples use `watch_play()` to visualize plays. This method is extremely slow; `animate_play()` is much faster because it streams frames directly to ffmpeg instead of writing them to disk first.

To visualize games from the tracking data, use the `Game` class in `game.py`, passing the game date followed by the home and away team abbreviations.

```python
from game import Game

game = Game('01.08.2016', 'POR', 'GSW')
game.watch_play(game_time=6, length=120, commentary=False)
```
![NoCommentary](examples/GSWatPORnocommentary.gif)

To easily follow the flow of the game, commentary can be added.
```python
game.watch_play(game_time=6, length=120, commentary=True)
```
![Commentary](examples/GSWatPOR.gif)

A single player can easily be highlighted and tracked.

```python
game.watch_play(game_time=2007, length=10, highlight_player='Stephen Curry',
                commentary=False)
```
![Curry3](examples/Curry3.gif)

All of a player's actions of a given type can be extracted and viewed with a single method call. Currently, the action can be one of `['all_FG', 'made_FG', 'miss_FG', 'rebound']`, but the method can easily be extended to include other actions.

```python
game.watch_player_actions("Stephen Curry", "made_FG")
# This outputs a video for each of Steph's made FGs in the game
# (at most max_vids=5 by default); only one of them is shown here.
```
![CurryFG](examples/CurryFG.gif)

## Analysis (In Progress)

Here, we analyze two aspects of basketball that are difficult to address without player tracking data:
* Defensive Spacing
* Player/Team Velocity

### Defensive Spacing

NBA commentators often praise offensive teams that can "space the defense". If an offense can draw defenders out to the three-point line, passing lanes open up and drives to the basket become less clogged and more efficient. Here we analyze how effectively teams can space the defense.

The workhorse of this analysis is `scipy.spatial.ConvexHull`, which computes the convex hull of the defenders' positions (a larger hull means a more spaced defense). This can be visualized:

```python
game.watch_play(121, 10, commentary=False, show_spacing='home')
```
![SpacingPlay](examples/GSWspacing.gif)

`spacing_analysis.py` contains the code for the following analysis. Only "set plays", where both the offense and the defense are set, were analyzed; "transition plays" have unique spacing properties and were excluded.

Which teams are best at spacing the defense? (Remember, spacing the defense more is thought to be better.)
If we average over all time points for each team, we get the following:

![SpacingBar](examples/DefensiveSpacing.png)

Interestingly, Detroit is the best team at spacing defenses. [This has been documented anecdotally by Mike Prada, and the data back up his claims.](http://www.sbnation.com/nba/2015/1/9/7517125/detroit-pistons-winning-streak-josh-smith-released) Teams thought to run a modern offense, such as Cleveland, are also great at spacing the defense.

But the question is: **Does spacing the defense help you win?** Plotting score differential against defensive spacing shows a positive correlation: spacing the defense an extra 5 square feet correlates with a 4.25-point increase in score differential!

![SpacingScore](examples/SpacingVsScore.png)

If you stare at this graph long enough, you may notice it also shows the size of the NBA's home-court advantage. If you are interested, you can read my analysis of home-court advantage [here.](https://github.com/christopherjenness/my-pdfs/blob/master/NBAHomeTeamAdvantage.pdf)

**How can a team space the defense better?** Intuitively, spacing your offense should draw out the defense. The plot below shows, for each game, how spaced the offense and the defense were. A more spaced offense clearly correlates with a more spaced defense.

![SpacingOffDeff](examples/OffenseVsDefense.png)

But broken down by team, how effectively can each team space the defense? Below is each team's average offensive spacing plotted against how well it spaces the opponent's defense. As expected, teams with a well-spaced offense face a more spaced defense. There are a few interesting exceptions, though.

![TeamSpacing](examples/Spacing_scatter.png)

Notice Toronto (TOR). Toronto has a hard time spacing the defense even though they space out their offense. This is likely due to their star, DeMar DeRozan, being a shooting liability.
Defenders don't need to guard him out on the three-point line, so they can keep the paint clogged.

Notice San Antonio (SAS). San Antonio effectively spaces the defense without spacing out their offense. This may be due to having one of the best three-point shooters in the league, Kawhi Leonard, who needs to be guarded religiously at the three-point line.

I'm currently breaking down defensive spacing per play, to see the effect on individual plays rather than on aggregated game data. This is yielding interesting insights.

### Player Velocity

Player tracking data provides insight into team and player velocity. Here we analyze how player speed affects the flow of the game. The analysis code is in `velocity_analysis.py`.

Using the visualization shown above, team velocities can be displayed as the game progresses:

```python
from game import Game
from velocity_analysis import watch_play_velocities

game = Game('01.08.2016', 'POR', 'GSW')
watch_play_velocities(game, game_time=7, length=54)
```
![TeamVelocity](examples/TeamVelocity.gif)

Alternatively, individual player velocities can be visualized:

```python
watch_play_velocities(game, game_time=7, length=54,
                      highlight_player='Stephen Curry')
```
![StephVelocity](examples/CurryVelocity.gif)

Different teams run offensive and defensive schemes that require different amounts of running. Breaking down velocity by team shows how much effort each team's scheme takes. (Note: I threw out all transition data, since I was interested in "set" plays.)

![OffenseVelocity](examples/VelocityOffenseTeams.png)

The result makes sense: the Spurs, known for their "flowing" offense, have the most running built into their offense.

![DefenseVelocity](examples/VelocityDefenseTeams.png)

Defense is more complicated. Defensive velocity reflects a number of things: closing out, switching, zoning, etc. We will need to break these down to get real insight.
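Both the spacing and the velocity numbers above come down to simple geometry on per-frame court coordinates. The sketch below is illustrative only and is not the code in `spacing_analysis.py` or `velocity_analysis.py`. The helper names `player_track`, `speeds`, and `hull_area` are hypothetical, and the sketch assumes positions are sampled at 25 frames per second (SportVU's nominal rate) and that each entry of `game.moments['positions']` is a list of `[team_id, player_id, x, y, z]` rows in feet.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Assumption: SportVU samples positions at 25 frames per second.
FRAME_INTERVAL = 1.0 / 25


def player_track(moments, player_id):
    """Collect one player's (x, y) position for every frame they appear in.

    Hypothetical helper. Assumes each entry of moments['positions'] is a
    list of [team_id, player_id, x, y, z] rows (the ball uses -1 for both
    ids). Frames where the player is absent are skipped.
    """
    track = []
    for positions in moments['positions']:
        for team_id, pid, x, y, z in positions:
            if pid == player_id:
                track.append((x, y))
                break
    return np.array(track, dtype=float)


def speeds(track, dt=FRAME_INTERVAL):
    """Speed (ft/s) between consecutive frames of an (n_frames, 2) array."""
    track = np.asarray(track, dtype=float)
    step = np.linalg.norm(np.diff(track, axis=0), axis=1)  # feet per frame
    return step / dt


def hull_area(xy):
    """Area (sq ft) enclosed by the convex hull of a set of (x, y) points.

    For 2-D input, scipy's ConvexHull.volume is the enclosed area;
    ConvexHull.area is the perimeter.
    """
    return ConvexHull(np.asarray(xy, dtype=float)).volume
```

Because `player_track` skips frames where the player is off the court, speeds are best computed within a single stretch of play so that a substitution does not show up as one huge jump. Passing the five defenders' `(x, y)` positions for a frame to `hull_area` gives a spacing value in square feet; note the `volume`/`area` naming in scipy, which is easy to mix up in 2-D.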
One aspect of basketball that is currently hard to evaluate is player fatigue. By tracking how player velocity declines over the course of a game, we can use it as a proxy for fatigue. Some teams, such as the Indiana Pacers, decrease in offensive velocity as the game progresses (each dot is the average velocity of a single game).

![INDfatigue](examples/INDfatige.png)

Interestingly, while the Spurs have the highest offensive velocity in the league, they show no apparent fatigue over the course of a game. This may reflect the Spurs' reputed culture.

![SASfatigue](examples/SASfatige.png)

This will be more insightful once fatigue is broken down by player, since different players are affected differently.

================================================
FILE: game/allgames.txt
================================================
01.01.2016.CHA.at.TOR.7z 01.01.2016.DAL.at.MIA.7z 01.01.2016.NYK.at.CHI.7z 01.01.2016.ORL.at.WAS.7z 01.01.2016.PHI.at.LAL.7z 01.02.2016.BKN.at.BOS.7z 01.02.2016.DEN.at.GSW.7z 01.02.2016.DET.at.IND.7z 01.02.2016.HOU.at.SAS.7z 01.02.2016.MEM.at.UTA.7z 01.02.2016.MIL.at.MIN.7z 01.02.2016.NOP.at.DAL.7z 01.02.2016.OKC.at.CHA.7z 01.02.2016.ORL.at.CLE.7z 01.02.2016.PHI.at.LAC.7z 01.02.2016.PHX.at.SAC.7z 01.03.2016.ATL.at.NYK.7z 01.03.2016.CHI.at.TOR.7z 01.03.2016.MIA.at.WAS.7z 01.03.2016.PHX.at.LAL.7z 01.03.2016.POR.at.DEN.7z 01.04.2016.BOS.at.BKN.7z 01.04.2016.CHA.at.GSW.7z 01.04.2016.HOU.at.UTA.7z 01.04.2016.IND.at.MIA.7z 01.04.2016.MEM.at.POR.7z 01.04.2016.MIN.at.PHI.7z 01.04.2016.ORL.at.DET.7z 01.04.2016.SAC.at.OKC.7z 01.04.2016.SAS.at.MIL.7z 01.04.2016.TOR.at.CLE.7z 01.05.2016.GSW.at.LAL.7z 01.05.2016.MIL.at.CHI.7z 01.05.2016.NYK.at.ATL.7z 01.05.2016.SAC.at.DAL.7z 01.06.2016.CHA.at.PHX.7z 01.06.2016.CLE.at.WAS.7z 01.06.2016.DAL.at.NOP.7z 01.06.2016.DEN.at.MIN.7z 01.06.2016.DET.at.BOS.7z 01.06.2016.IND.at.ORL.7z 01.06.2016.LAC.at.POR.7z 01.06.2016.MEM.at.OKC.7z 01.06.2016.NYK.at.MIA.7z 01.06.2016.TOR.at.BKN.7z 01.06.2016.UTA.at.SAS.7z
01.07.2016.ATL.at.PHI.7z 01.07.2016.BOS.at.CHI.7z 01.07.2016.LAL.at.SAC.7z 01.07.2016.UTA.at.HOU.7z 01.08.2016.CLE.at.MIN.7z 01.08.2016.DAL.at.MIL.7z 01.08.2016.DEN.at.MEM.7z 01.08.2016.GSW.at.POR.7z 01.08.2016.IND.at.NOP.7z 01.08.2016.MIA.at.PHX.7z 01.08.2016.NYK.at.SAS.7z 01.08.2016.OKC.at.LAL.7z 01.08.2016.ORL.at.BKN.7z 01.08.2016.TOR.at.WAS.7z 01.09.2016.BKN.at.DET.7z 01.09.2016.CHA.at.LAC.7z 01.09.2016.CHI.at.ATL.7z 01.09.2016.GSW.at.SAC.7z 01.09.2016.MIA.at.UTA.7z 01.09.2016.TOR.at.PHI.7z 01.09.2016.WAS.at.ORL.7z 01.10.2016.BOS.at.MEM.7z 01.10.2016.CHA.at.DEN.7z 01.10.2016.CLE.at.PHI.7z 01.10.2016.DAL.at.MIN.7z 01.10.2016.IND.at.HOU.7z 01.10.2016.MIL.at.NYK.7z 01.10.2016.NOP.at.LAC.7z 01.10.2016.OKC.at.POR.7z 01.10.2016.UTA.at.LAL.7z 01.11.2016.MIA.at.GSW.7z 01.11.2016.SAS.at.BKN.7z 01.11.2016.WAS.at.CHI.7z 01.12.2016.BOS.at.NYK.7z 01.12.2016.CHI.at.MIL.7z 01.12.2016.CLE.at.DAL.7z 01.12.2016.HOU.at.MEM.7z 01.12.2016.NOP.at.LAL.7z 01.12.2016.OKC.at.MIN.7z 01.12.2016.PHX.at.IND.7z 01.12.2016.SAS.at.DET.7z 01.13.2016.ATL.at.CHA.7z 01.13.2016.DAL.at.OKC.7z 01.13.2016.GSW.at.DEN.7z 01.13.2016.IND.at.BOS.7z 01.13.2016.MIA.at.LAC.7z 01.13.2016.MIL.at.WAS.7z 01.13.2016.MIN.at.HOU.7z 01.13.2016.NOP.at.SAC.7z 01.13.2016.NYK.at.BKN.7z 01.13.2016.UTA.at.POR.7z 01.14.2016.CHI.at.PHI.7z 01.14.2016.CLE.at.SAS.7z 01.14.2016.DET.at.MEM.7z 01.14.2016.LAL.at.GSW.7z 01.14.2016.SAC.at.UTA.7z 01.15.2016.ATL.at.MIL.7z 01.15.2016.CHA.at.NOP.7z 01.15.2016.CLE.at.HOU.7z 01.15.2016.DAL.at.CHI.7z 01.15.2016.MIA.at.DEN.7z 01.15.2016.MIN.at.OKC.7z 01.15.2016.PHX.at.BOS.7z 01.15.2016.POR.at.BKN.7z 01.15.2016.WAS.at.IND.7z 01.18.2016.BKN.at.TOR.7z 01.18.2016.BOS.at.DAL.7z 01.18.2016.CHI.at.DET.7z 01.18.2016.GSW.at.CLE.7z 01.18.2016.HOU.at.LAC.7z 01.18.2016.NOP.at.MEM.7z 01.18.2016.ORL.at.ATL.7z 01.18.2016.PHI.at.NYK.7z 01.18.2016.POR.at.WAS.7z 01.18.2016.UTA.at.CHA.7z 01.19.2016.IND.at.PHX.7z 01.19.2016.MIL.at.MIA.7z 01.19.2016.MIN.at.NOP.7z 01.19.2016.OKC.at.DEN.7z 01.20.2016.ATL.at.POR.7z 
01.20.2016.BOS.at.TOR.7z 01.20.2016.CHA.at.OKC.7z 01.20.2016.CLE.at.BKN.7z 01.20.2016.DET.at.HOU.7z 01.20.2016.GSW.at.CHI.7z 01.20.2016.MIA.at.WAS.7z 01.20.2016.MIN.at.DAL.7z 01.20.2016.PHI.at.ORL.7z 01.20.2016.SAC.at.LAL.7z 01.20.2016.UTA.at.NYK.7z 01.21.2016.DET.at.NOP.7z 01.21.2016.MEM.at.DEN.7z 01.22.2016.CHA.at.ORL.7z 01.22.2016.CHI.at.BOS.7z 01.22.2016.IND.at.GSW.7z 01.22.2016.LAC.at.NYK.7z 01.22.2016.MIA.at.TOR.7z 01.22.2016.MIL.at.HOU.7z 01.22.2016.OKC.at.DAL.7z 01.22.2016.SAS.at.LAL.7z 01.22.2016.UTA.at.BKN.7z 01.23.2016.ATL.at.PHX.7z 01.23.2016.CHI.at.CLE.7z 01.23.2016.DET.at.DEN.7z 01.23.2016.IND.at.SAC.7z 01.23.2016.LAL.at.POR.7z 01.23.2016.MEM.at.MIN.7z 01.23.2016.MIL.at.NOP.7z 01.23.2016.NYK.at.CHA.7z 01.23.2016.UTA.at.WAS.7z 10.27.2015.CLE.at.CHI.7z 10.27.2015.DET.at.ATL.7z 10.27.2015.NOP.at.GSW.7z 10.28.2015.CLE.at.MEM.7z 10.28.2015.DEN.at.HOU.7z 10.28.2015.IND.at.TOR.7z 10.28.2015.LAC.at.SAC.7z 10.28.2015.MIN.at.LAL.7z 10.28.2015.NOP.at.POR.7z 10.28.2015.NYK.at.MIL.7z 10.28.2015.PHI.at.BOS.7z 10.28.2015.SAS.at.OKC.7z 10.28.2015.UTA.at.DET.7z 10.28.2015.WAS.at.ORL.7z 10.29.2015.ATL.at.NYK.7z 10.29.2015.DAL.at.LAC.7z 10.29.2015.MEM.at.IND.7z 10.30.2015.BKN.at.SAS.7z 10.30.2015.CHA.at.ATL.7z 10.30.2015.CHI.at.DET.7z 10.30.2015.LAL.at.SAC.7z 10.30.2015.MIA.at.CLE.7z 10.30.2015.MIN.at.DEN.7z 10.30.2015.OKC.at.ORL.7z 10.30.2015.POR.at.PHX.7z 10.30.2015.TOR.at.BOS.7z 10.30.2015.UTA.at.PHI.7z 10.30.2015.WAS.at.MIL.7z 10.31.2015.BKN.at.MEM.7z 10.31.2015.GSW.at.NOP.7z 10.31.2015.NYK.at.WAS.7z 10.31.2015.PHX.at.POR.7z 10.31.2015.SAC.at.LAC.7z 10.31.2015.UTA.at.IND.7z 11.01.2015.ATL.at.CHA.7z 11.01.2015.DAL.at.LAL.7z 11.01.2015.DEN.at.OKC.7z 11.01.2015.HOU.at.MIA.7z 11.01.2015.MIL.at.TOR.7z 11.01.2015.ORL.at.CHI.7z 11.01.2015.SAS.at.BOS.7z 11.02.2015.CLE.at.PHI.7z 11.02.2015.MEM.at.GSW.7z 11.02.2015.MIL.at.BKN.7z 11.02.2015.OKC.at.HOU.7z 11.02.2015.PHX.at.LAC.7z 11.02.2015.POR.at.MIN.7z 11.02.2015.SAS.at.NYK.7z 11.03.2015.ATL.at.MIA.7z 11.03.2015.CHI.at.CHA.7z 
11.03.2015.DEN.at.LAL.7z 11.03.2015.IND.at.DET.7z 11.03.2015.MEM.at.SAC.7z 11.03.2015.ORL.at.NOP.7z 11.03.2015.TOR.at.DAL.7z 11.04.2015.BKN.at.ATL.7z 11.04.2015.BOS.at.IND.7z 11.04.2015.LAC.at.GSW.7z 11.04.2015.NYK.at.CLE.7z 11.04.2015.ORL.at.HOU.7z 11.04.2015.PHI.at.MIL.7z 11.04.2015.POR.at.UTA.7z 11.04.2015.SAC.at.PHX.7z 11.04.2015.SAS.at.WAS.7z 11.04.2015.TOR.at.OKC.7z 11.05.2015.CHA.at.DAL.7z 11.05.2015.MEM.at.POR.7z 11.05.2015.MIA.at.MIN.7z 11.05.2015.OKC.at.CHI.7z 11.05.2015.UTA.at.DEN.7z 11.06.2015.ATL.at.NOP.7z 11.06.2015.DEN.at.GSW.7z 11.06.2015.DET.at.PHX.7z 11.06.2015.HOU.at.SAC.7z 11.06.2015.LAL.at.BKN.7z 11.06.2015.MIA.at.IND.7z 11.06.2015.MIL.at.NYK.7z 11.06.2015.PHI.at.CLE.7z 11.06.2015.TOR.at.ORL.7z 11.06.2015.WAS.at.BOS.7z 11.07.2015.BKN.at.MIL.7z 11.07.2015.CHA.at.SAS.7z 11.07.2015.GSW.at.SAC.7z 11.07.2015.HOU.at.LAC.7z 11.07.2015.MEM.at.UTA.7z 11.07.2015.MIN.at.CHI.7z 11.07.2015.NOP.at.DAL.7z 11.07.2015.ORL.at.PHI.7z 11.07.2015.WAS.at.ATL.7z 11.08.2015.DET.at.POR.7z 11.08.2015.IND.at.CLE.7z 11.08.2015.LAL.at.NYK.7z 11.08.2015.PHX.at.OKC.7z 11.08.2015.TOR.at.MIA.7z 11.09.2015.DET.at.GSW.7z 11.09.2015.MEM.at.LAC.7z 11.09.2015.MIN.at.ATL.7z 11.09.2015.ORL.at.IND.7z 11.09.2015.POR.at.DEN.7z 11.09.2015.SAS.at.SAC.7z 11.10.2015.BOS.at.MIL.7z 11.10.2015.CHA.at.MIN.7z 11.10.2015.DAL.at.NOP.7z 11.10.2015.LAL.at.MIA.7z 11.10.2015.NYK.at.TOR.7z 11.10.2015.OKC.at.WAS.7z 11.10.2015.UTA.at.CLE.7z 11.11.2015.BKN.at.HOU.7z 11.11.2015.DET.at.SAC.7z 11.11.2015.GSW.at.MEM.7z 11.11.2015.IND.at.BOS.7z 11.11.2015.LAC.at.DAL.7z 11.11.2015.LAL.at.ORL.7z 11.11.2015.MIL.at.DEN.7z 11.11.2015.NOP.at.ATL.7z 11.11.2015.NYK.at.CHA.7z 11.11.2015.SAS.at.POR.7z 11.11.2015.TOR.at.PHI.7z 11.12.2015.GSW.at.MIN.7z 11.12.2015.LAC.at.PHX.7z 11.12.2015.UTA.at.MIA.7z 11.13.2015.ATL.at.BOS.7z 11.13.2015.BKN.at.SAC.7z 11.13.2015.CHA.at.CHI.7z 11.13.2015.CLE.at.NYK.7z 11.13.2015.HOU.at.DEN.7z 11.13.2015.LAL.at.DAL.7z 11.13.2015.MIN.at.IND.7z 11.13.2015.NOP.at.TOR.7z 11.13.2015.PHI.at.OKC.7z 
11.13.2015.POR.at.MEM.7z 11.13.2015.UTA.at.ORL.7z 11.14.2015.BKN.at.GSW.7z 11.14.2015.CLE.at.MIL.7z 11.14.2015.DAL.at.HOU.7z 11.14.2015.DEN.at.PHX.7z 11.14.2015.DET.at.LAC.7z 11.14.2015.ORL.at.WAS.7z 11.14.2015.PHI.at.SAS.7z 11.15.2015.BOS.at.OKC.7z 11.15.2015.DET.at.LAL.7z 11.15.2015.MEM.at.MIN.7z 11.15.2015.NOP.at.NYK.7z 11.15.2015.POR.at.CHA.7z 11.15.2015.TOR.at.SAC.7z 11.15.2015.UTA.at.ATL.7z 11.16.2015.BOS.at.HOU.7z 11.16.2015.DAL.at.PHI.7z 11.16.2015.IND.at.CHI.7z 11.16.2015.LAL.at.PHX.7z 11.16.2015.OKC.at.MEM.7z 11.16.2015.POR.at.SAS.7z 11.17.2015.ATL.at.BKN.7z 11.17.2015.CHA.at.NYK.7z 11.17.2015.CLE.at.DET.7z 11.17.2015.DEN.at.NOP.7z 11.17.2015.MIL.at.WAS.7z 11.17.2015.MIN.at.MIA.7z 11.17.2015.TOR.at.GSW.7z 11.18.2015.BKN.at.CHA.7z 11.18.2015.CHI.at.PHX.7z 11.18.2015.DAL.at.BOS.7z 11.18.2015.DEN.at.SAS.7z 11.18.2015.IND.at.PHI.7z 11.18.2015.MIN.at.ORL.7z 11.18.2015.NOP.at.OKC.7z 11.18.2015.POR.at.HOU.7z 11.18.2015.SAC.at.ATL.7z 11.18.2015.TOR.at.UTA.7z 11.19.2015.GSW.at.LAC.7z 11.19.2015.MIL.at.CLE.7z 11.19.2015.SAC.at.MIA.7z 11.20.2015.BKN.at.BOS.7z 11.20.2015.CHI.at.GSW.7z 11.20.2015.DET.at.MIN.7z 11.20.2015.HOU.at.MEM.7z 11.20.2015.LAC.at.POR.7z 11.20.2015.NYK.at.OKC.7z 11.20.2015.PHI.at.CHA.7z 11.20.2015.PHX.at.DEN.7z 11.20.2015.SAS.at.NOP.7z 11.20.2015.TOR.at.LAL.7z 11.20.2015.UTA.at.DAL.7z 11.21.2015.ATL.at.CLE.7z 11.21.2015.MEM.at.SAS.7z 11.21.2015.MIL.at.IND.7z 11.21.2015.NYK.at.HOU.7z 11.21.2015.PHI.at.MIA.7z 11.21.2015.SAC.at.ORL.7z 11.21.2015.WAS.at.DET.7z 11.22.2015.BOS.at.BKN.7z 11.22.2015.DAL.at.OKC.7z 11.22.2015.GSW.at.DEN.7z 11.22.2015.PHX.at.NOP.7z 11.22.2015.POR.at.LAL.7z 11.22.2015.TOR.at.LAC.7z 11.23.2015.DET.at.MIL.7z 11.23.2015.NYK.at.MIA.7z 11.23.2015.OKC.at.UTA.7z 11.23.2015.ORL.at.CLE.7z 11.23.2015.PHI.at.MIN.7z 11.23.2015.PHX.at.SAS.7z 11.23.2015.SAC.at.CHA.7z 11.24.2015.BOS.at.ATL.7z 11.24.2015.CHI.at.POR.7z 11.24.2015.DAL.at.MEM.7z 11.24.2015.IND.at.WAS.7z 11.24.2015.LAC.at.DEN.7z 11.24.2015.LAL.at.GSW.7z 11.25.2015.ATL.at.MIN.7z 
11.25.2015.BKN.at.OKC.7z 11.25.2015.CLE.at.TOR.7z 11.25.2015.DAL.at.SAS.7z 11.25.2015.MEM.at.HOU.7z 11.25.2015.MIA.at.DET.7z 11.25.2015.NOP.at.PHX.7z 11.25.2015.NYK.at.ORL.7z 11.25.2015.PHI.at.BOS.7z 11.25.2015.SAC.at.MIL.7z 11.25.2015.UTA.at.LAC.7z 11.25.2015.WAS.at.CHA.7z 11.27.2015.ATL.at.MEM.7z 11.27.2015.CHI.at.IND.7z 11.27.2015.CLE.at.CHA.7z 11.27.2015.DET.at.OKC.7z 11.27.2015.GSW.at.PHX.7z 11.27.2015.MIA.at.NYK.7z 11.27.2015.MIL.at.ORL.7z 11.27.2015.MIN.at.SAC.7z 11.27.2015.PHI.at.HOU.7z 11.27.2015.SAS.at.DEN.7z 11.27.2015.WAS.at.BOS.7z 11.28.2015.ATL.at.SAS.7z 11.28.2015.DEN.at.DAL.7z 11.28.2015.LAL.at.POR.7z 11.28.2015.NOP.at.UTA.7z 11.28.2015.SAC.at.GSW.7z 11.28.2015.TOR.at.WAS.7z 11.29.2015.BOS.at.ORL.7z 11.29.2015.DET.at.BKN.7z 11.29.2015.HOU.at.NYK.7z 11.29.2015.IND.at.LAL.7z 11.29.2015.MIL.at.CHA.7z 11.29.2015.MIN.at.LAC.7z 11.29.2015.PHI.at.MEM.7z 11.29.2015.PHX.at.TOR.7z 11.30.2015.BOS.at.MIA.7z 11.30.2015.DAL.at.SAC.7z 11.30.2015.DEN.at.MIL.7z 11.30.2015.GSW.at.UTA.7z 11.30.2015.HOU.at.DET.7z 11.30.2015.OKC.at.ATL.7z 11.30.2015.POR.at.LAC.7z 11.30.2015.SAS.at.CHI.7z 12.01.2015.DAL.at.POR.7z 12.01.2015.LAL.at.PHI.7z 12.01.2015.MEM.at.NOP.7z 12.01.2015.ORL.at.MIN.7z 12.01.2015.PHX.at.BKN.7z 12.01.2015.WAS.at.CLE.7z 12.02.2015.DEN.at.CHI.7z 12.02.2015.GSW.at.CHA.7z 12.02.2015.LAL.at.WAS.7z 12.02.2015.MIL.at.SAS.7z 12.02.2015.NOP.at.HOU.7z 12.02.2015.PHI.at.NYK.7z 12.02.2015.PHX.at.DET.7z 12.02.2015.TOR.at.ATL.7z 12.03.2015.DEN.at.TOR.7z 12.03.2015.IND.at.POR.7z 12.03.2015.OKC.at.MIA.7z 12.03.2015.ORL.at.UTA.7z 12.03.2015.SAS.at.MEM.7z 12.04.2015.BKN.at.NYK.7z 12.04.2015.CLE.at.NOP.7z 12.04.2015.HOU.at.DAL.7z 12.04.2015.LAL.at.ATL.7z 12.04.2015.MIL.at.DET.7z 12.04.2015.PHX.at.WAS.7z 12.05.2015.BOS.at.SAS.7z 12.05.2015.CHA.at.CHI.7z 12.05.2015.CLE.at.MIA.7z 12.05.2015.DEN.at.PHI.7z 12.05.2015.GSW.at.TOR.7z 12.05.2015.IND.at.UTA.7z 12.05.2015.NYK.at.MIL.7z 12.05.2015.ORL.at.LAC.7z 12.05.2015.POR.at.MIN.7z 12.05.2015.SAC.at.HOU.7z 12.06.2015.DAL.at.WAS.7z 
12.06.2015.GSW.at.BKN.7z 12.06.2015.LAL.at.DET.7z 12.06.2015.PHX.at.MEM.7z 12.06.2015.SAC.at.OKC.7z 12.07.2015.BOS.at.NOP.7z 12.07.2015.DAL.at.NYK.7z 12.07.2015.DET.at.CHA.7z 12.07.2015.LAC.at.MIN.7z 12.07.2015.LAL.at.TOR.7z 12.07.2015.PHX.at.CHI.7z 12.07.2015.POR.at.MIL.7z 12.07.2015.SAS.at.PHI.7z 12.07.2015.WAS.at.MIA.7z 12.08.2015.GSW.at.IND.7z 12.08.2015.HOU.at.BKN.7z 12.08.2015.OKC.at.MEM.7z 12.08.2015.ORL.at.DEN.7z 12.08.2015.POR.at.CLE.7z 12.08.2015.UTA.at.SAC.7z 12.09.2015.ATL.at.DAL.7z 12.09.2015.CHI.at.BOS.7z 12.09.2015.HOU.at.WAS.7z 12.09.2015.LAC.at.MIL.7z 12.09.2015.LAL.at.MIN.7z 12.09.2015.MEM.at.DET.7z 12.09.2015.MIA.at.CHA.7z 12.09.2015.NYK.at.UTA.7z 12.09.2015.ORL.at.PHX.7z 12.09.2015.SAS.at.TOR.7z 12.10.2015.ATL.at.OKC.7z 12.10.2015.LAC.at.CHI.7z 12.10.2015.NYK.at.SAC.7z 12.10.2015.PHI.at.BKN.7z 12.11.2015.CHA.at.MEM.7z 12.11.2015.CLE.at.ORL.7z 12.11.2015.DET.at.PHI.7z 12.11.2015.GSW.at.BOS.7z 12.11.2015.LAL.at.SAS.7z 12.11.2015.MIA.at.IND.7z 12.11.2015.MIL.at.TOR.7z 12.11.2015.MIN.at.DEN.7z 12.11.2015.OKC.at.UTA.7z 12.11.2015.POR.at.PHX.7z 12.11.2015.WAS.at.NOP.7z 12.12.2015.BOS.at.CHA.7z 12.12.2015.GSW.at.MIL.7z 12.12.2015.IND.at.DET.7z 12.12.2015.LAC.at.BKN.7z 12.12.2015.LAL.at.HOU.7z 12.12.2015.NOP.at.CHI.7z 12.12.2015.NYK.at.POR.7z 12.12.2015.SAS.at.ATL.7z 12.12.2015.WAS.at.DAL.7z 12.13.2015.MEM.at.MIA.7z 12.13.2015.MIN.at.PHX.7z 12.13.2015.PHI.at.TOR.7z 12.13.2015.UTA.at.OKC.7z 12.14.2015.HOU.at.DEN.7z 12.14.2015.LAC.at.DET.7z 12.14.2015.MIA.at.ATL.7z 12.14.2015.NOP.at.POR.7z 12.14.2015.ORL.at.BKN.7z 12.14.2015.PHI.at.CHI.7z 12.14.2015.PHX.at.DAL.7z 12.14.2015.TOR.at.IND.7z 12.14.2015.UTA.at.SAS.7z 12.14.2015.WAS.at.MEM.7z 12.15.2015.CLE.at.BOS.7z 12.15.2015.DEN.at.MIN.7z 12.15.2015.HOU.at.SAC.7z 12.15.2015.MIL.at.LAL.7z 12.16.2015.BOS.at.DET.7z 12.16.2015.CHA.at.ORL.7z 12.16.2015.DAL.at.IND.7z 12.16.2015.MEM.at.CHI.7z 12.16.2015.MIA.at.BKN.7z 12.16.2015.MIL.at.LAC.7z 12.16.2015.MIN.at.NYK.7z 12.16.2015.NOP.at.UTA.7z 12.16.2015.PHI.at.ATL.7z 
12.16.2015.PHX.at.GSW.7z 12.16.2015.POR.at.OKC.7z 12.16.2015.WAS.at.SAS.7z 12.17.2015.HOU.at.LAL.7z 12.17.2015.OKC.at.CLE.7z 12.17.2015.TOR.at.CHA.7z 12.18.2015.ATL.at.BOS.7z 12.18.2015.BKN.at.IND.7z 12.18.2015.DEN.at.UTA.7z 12.18.2015.DET.at.CHI.7z 12.18.2015.LAC.at.SAS.7z 12.18.2015.MEM.at.DAL.7z 12.18.2015.MIL.at.GSW.7z 12.18.2015.NOP.at.PHX.7z 12.18.2015.NYK.at.PHI.7z 12.18.2015.POR.at.ORL.7z 12.18.2015.SAC.at.MIN.7z 12.18.2015.TOR.at.MIA.7z 12.19.2015.CHA.at.WAS.7z 12.19.2015.CHI.at.NYK.7z 12.19.2015.IND.at.MEM.7z 12.19.2015.LAC.at.HOU.7z 12.19.2015.LAL.at.OKC.7z 12.20.2015.ATL.at.ORL.7z 12.20.2015.MIL.at.PHX.7z 12.20.2015.MIN.at.BKN.7z 12.20.2015.NOP.at.DEN.7z 12.20.2015.PHI.at.CLE.7z 12.20.2015.POR.at.MIA.7z 12.20.2015.SAC.at.TOR.7z 12.21.2015.BKN.at.CHI.7z 12.21.2015.CHA.at.HOU.7z 12.21.2015.IND.at.SAS.7z 12.21.2015.MIN.at.BOS.7z 12.21.2015.OKC.at.LAC.7z 12.21.2015.ORL.at.NYK.7z 12.21.2015.PHX.at.UTA.7z 12.21.2015.POR.at.ATL.7z 12.21.2015.SAC.at.WAS.7z 12.22.2015.DAL.at.TOR.7z 12.22.2015.DET.at.MIA.7z 12.22.2015.LAL.at.DEN.7z 12.22.2015.MEM.at.PHI.7z 12.23.2015.BOS.at.CHA.7z 12.23.2015.DAL.at.BKN.7z 12.23.2015.DEN.at.PHX.7z 12.23.2015.DET.at.ATL.7z 12.23.2015.HOU.at.ORL.7z 12.23.2015.MEM.at.WAS.7z 12.23.2015.NYK.at.CLE.7z 12.23.2015.OKC.at.LAL.7z 12.23.2015.PHI.at.MIL.7z 12.23.2015.POR.at.NOP.7z 12.23.2015.SAC.at.IND.7z 12.23.2015.SAS.at.MIN.7z 12.23.2015.UTA.at.GSW.7z 12.25.2015.CHI.at.OKC.7z 12.25.2015.CLE.at.GSW.7z 12.25.2015.LAC.at.LAL.7z 12.25.2015.NOP.at.MIA.7z 12.25.2015.SAS.at.HOU.7z 12.26.2015.BOS.at.DET.7z 12.26.2015.CHI.at.DAL.7z 12.26.2015.CLE.at.POR.7z 12.26.2015.DEN.at.SAS.7z 12.26.2015.HOU.at.NOP.7z 12.26.2015.IND.at.MIN.7z 12.26.2015.LAC.at.UTA.7z 12.26.2015.MEM.at.CHA.7z 12.26.2015.MIA.at.ORL.7z 12.26.2015.NYK.at.ATL.7z 12.26.2015.PHI.at.PHX.7z 12.26.2015.TOR.at.MIL.7z 12.26.2015.WAS.at.BKN.7z 12.27.2015.DEN.at.OKC.7z 12.27.2015.LAL.at.MEM.7z 12.27.2015.NYK.at.BOS.7z 12.27.2015.POR.at.SAC.7z 12.28.2015.ATL.at.IND.7z 12.28.2015.BKN.at.MIA.7z 
12.28.2015.CLE.at.PHX.7z 12.28.2015.LAC.at.WAS.7z 12.28.2015.LAL.at.CHA.7z 12.28.2015.MIL.at.DAL.7z 12.28.2015.MIN.at.SAS.7z 12.28.2015.NOP.at.ORL.7z 12.28.2015.PHI.at.UTA.7z 12.28.2015.SAC.at.GSW.7z 12.28.2015.TOR.at.CHI.7z 12.29.2015.ATL.at.HOU.7z 12.29.2015.CLE.at.DEN.7z 12.29.2015.DET.at.NYK.7z 12.29.2015.MIA.at.MEM.7z 12.29.2015.MIL.at.OKC.7z 12.30.2015.BKN.at.ORL.7z 12.30.2015.DEN.at.POR.7z 12.30.2015.GSW.at.DAL.7z 12.30.2015.IND.at.CHI.7z 12.30.2015.LAC.at.CHA.7z 12.30.2015.LAL.at.BOS.7z 12.30.2015.PHI.at.SAC.7z 12.30.2015.PHX.at.SAS.7z 12.30.2015.UTA.at.MIN.7z 12.30.2015.WAS.at.TOR.7z 12.31.2015.GSW.at.HOU.7z 12.31.2015.LAC.at.NOP.7z 12.31.2015.MIL.at.IND.7z 12.31.2015.MIN.at.DET.7z 12.31.2015.PHX.at.OKC.7z 12.31.2015.POR.at.UTA.7z ================================================ FILE: game/game.py ================================================ """ Library for retrieving basektball player-tracking and play-by-play data. """ import matplotlib matplotlib.use('TkAgg') import os import warnings import json from subprocess import Popen, PIPE import pandas as pd import matplotlib.pyplot as plt from matplotlib.patches import Circle, Rectangle, Arc, Polygon import numpy as np import seaborn as sns from scipy.spatial import ConvexHull # Initialize project os.system('mkdir temp') datalink = None curl_request = None class Game(object): """ Class for basketball game. Contains play by play and player tracking data and methods for anaylsis and plotting. 
""" def __init__(self, date, team1, team2): """ Args: date (str): 'MM.DD.YYYY', date of game team1 (str): 'XXX', abbreviation of team1 in data tracking file name team2 (str): 'XXX', abbreviation of team2 in data tracking file name Attributes: date (str): 'MM.DD.YYYY', date of game team1 (str): 'XXX', abbreviation of team1 in data tracking file name team2 (str): 'XXX', abbreviation of team2 in data tracking file name tracking_id (str): id to access player tracking data Due to the way the SportVU data is stored, game_id is complicated: 'MM.DD.YYYY.AWAYTEAM.at.HOMETEAM' For Example: 01.13.2016.GSW.at.DEN tracking_data (dict): Dictionary of unstructured tracking data scraped from github. game_id (str): ID for game. Lukcily, SportVU and play by play use the same game ID pbp (pd.DataFrame): Play by play data. 33 columns per pbp instance. moments (pd.DataFrame): DataFrame of player tracking data. Each entry is a single snap-shot of where the players are at a given time on the court. Columns: ['quarter', 'universe_time', 'quarter_time', 'shot_clock', 'positions', 'game_time']. moments['positions'] contains a list of where each player and the ball are located. player_ids (dict): dictionary of {player: player_id} for all players in game. away_id (int): ID of away team home_id (int): ID of home team team_colors (dict): dictionary of colors for each team and ball. Used for ploting. 
home_team (str): 'XXX', abbreviation of home team away_team (str): 'XXX', abbreviation of away team """ self.date = date self.team1 = team1 self.team2 = team2 self.flip_direction = False self.tracking_id = ('{self.date}.{self.team2}.at.{self.team1}' .format(self=self)) self.tracking_data = None self.game_id = None self.pbp = None self.moments = None self.player_ids = None self._get_tracking_data() self._get_playbyplay_data() self._format_tracking_data() self._get_player_ids() self.away_id = self.tracking_data['events'][0]['visitor']['teamid'] self.home_id = self.tracking_data['events'][0]['home']['teamid'] self.team_colors = {-1: "orange", self.away_id: "blue", self.home_id: "red"} self.home_team = (self.tracking_data['events'][0]['home'] ['abbreviation']) self.away_team = (self.tracking_data['events'][0]['visitor'] ['abbreviation']) self.flip_direction = False self._determine_direction() print('All data is loaded') def _get_tracking_data(self): """ Helper function for retrieving tracking data Tracking Data is provided by NBA.com, hosted at: https://www.github.com/neilmj """ # Retrive and extract Data into /temp folder os.system(("curl {datalink} -o temp/zipdata" .format(datalink=datalink))) os.system("7za -o./temp x temp/zipdata") os.remove("./temp/zipdata") # Extract game ID from extracted file name. for file in os.listdir('./temp'): if os.path.splitext(file)[1] == '.json': self.game_id = file[:-5] # Load tracking data and remove json file with open('temp/{self.game_id}.json'.format(self=self)) as data_file: self.tracking_data = json.load(data_file) # Load this json os.remove('./temp/{self.game_id}.json'.format(self=self)) return self def _get_playbyplay_data(self): """ Helper function for retrieving play-by-play data. Play-by-play data is obtained via API call to NBA.com This service is likely to go down at any moment and ruin this whole project. 
""" os.system(curl_request) # load play by play into pandas DataFrame with open(("{cwd}/temp/pbp_{self.game_id}.json" .format(cwd=os.getcwd(), self=self))) as json_file: parsed = json.load(json_file)['resultSets'][0] os.remove(("{cwd}/temp/pbp_{self.game_id}.json" .format(cwd=os.getcwd(), self=self))) self.pbp = pd.DataFrame(parsed['rowSet']) self.pbp.columns = parsed['headers'] # Get time in quarter reamining to cross-reference tracking data self.pbp['Qmin'] = (self.pbp['PCTIMESTRING'].str .split(':', expand=True)[0]) self.pbp['Qsec'] = (self.pbp['PCTIMESTRING'].str .split(':', expand=True)[1]) self.pbp['Qtime'] = (self.pbp['Qmin'].astype(int)*60 + self.pbp['Qsec'].astype(int)) self.pbp['game_time'] = ((self.pbp['PERIOD'] - 1) * 720 + (720 - self.pbp['Qtime'])) # Format score so that it makes sense: 'XX-XX' self.pbp['SCORE'] = (self.pbp['SCORE'] .fillna(method='ffill') .fillna('0 - 0')) return self def _get_player_ids(self): """ Helper function for returning player ids for all players in game. Note: This data may also be somewhere more conveniently accessible in tracking_data. 
""" ids = {} for index, row in self.pbp.iterrows(): if row['PLAYER1_NAME'] not in ids: ids[row['PLAYER1_NAME']] = row['PLAYER1_ID'] if row['PLAYER2_NAME'] not in ids: ids[row['PLAYER2_NAME']] = row['PLAYER2_ID'] if row['PLAYER3_NAME'] not in ids: ids[row['PLAYER3_NAME']] = row['PLAYER3_ID'] ids.pop(None) self.player_ids = ids return self def _format_tracking_data(self): """ Heler function to format tracking data into pandas DataFrame """ events = pd.DataFrame(self.tracking_data['events']) moments = [] # Extract 'moments': Each moment is an individual frame for row in events['moments']: for inner_row in row: moments.append(inner_row) moments = pd.DataFrame(moments) moments = moments.drop_duplicates(subset=[1]) moments = moments.reset_index() moments.columns = ['index', 'quarter', 'universe_time', 'quarter_time', 'shot_clock', 'unknown', 'positions'] moments['game_time'] = (moments.quarter - 1) * 720 + \ (720 - moments.quarter_time) moments.drop(['index', 'unknown'], axis=1, inplace=True) self.moments = moments return self def _draw_court(self, color="gray", lw=2, grid=False, zorder=0): """ Helper function to draw court. Modified from Savvas Tjortjoglou with contribution from Michael Wheelock S. Tjortjoglou: http://savvastjortjoglou.com/nba-shot-sharts.html M. 
Wheelock: https://www.linkedin.com/in/michael-s-wheelock-a5635a66 """ ax = plt.gca() # Create the court lines outer = Rectangle((0, -50), width=94, height=50, color=color, zorder=zorder, fill=False, lw=lw) l_hoop = Circle((5.35, -25), radius=.75, lw=lw, fill=False, color=color, zorder=zorder) r_hoop = Circle((88.65, -25), radius=.75, lw=lw, fill=False, color=color, zorder=zorder) l_backboard = Rectangle((4, -28), 0, 6, lw=lw, color=color, zorder=zorder) r_backboard = Rectangle((90, -28), 0, 6, lw=lw, color=color, zorder=zorder) l_outer_box = Rectangle((0, -33), 19, 16, lw=lw, fill=False, color=color, zorder=zorder) l_inner_box = Rectangle((0, -31), 19, 12, lw=lw, fill=False, color=color, zorder=zorder) r_outer_box = Rectangle((75, -33), 19, 16, lw=lw, fill=False, color=color, zorder=zorder) r_inner_box = Rectangle((75, -31), 19, 12, lw=lw, fill=False, color=color, zorder=zorder) l_free_throw = Circle((19, -25), radius=6, lw=lw, fill=False, color=color, zorder=zorder) r_free_throw = Circle((75, -25), radius=6, lw=lw, fill=False, color=color, zorder=zorder) l_corner_a = Rectangle((0, -3), 14, 0, lw=lw, color=color, zorder=zorder) l_corner_b = Rectangle((0, -47), 14, 0, lw=lw, color=color, zorder=zorder) r_corner_a = Rectangle((80, -3), 14, 0, lw=lw, color=color, zorder=zorder) r_corner_b = Rectangle((80, -47), 14, 0, lw=lw, color=color, zorder=zorder) l_arc = Arc((5, -25), 47.5, 47.5, theta1=292, theta2=68, lw=lw, color=color, zorder=zorder) r_arc = Arc((89, -25), 47.5, 47.5, theta1=112, theta2=248, lw=lw, color=color, zorder=zorder) half_court = Rectangle((47, -50), 0, 50, lw=lw, color=color, zorder=zorder) hc_big_circle = Circle((47, -25), radius=6, lw=lw, fill=False, color=color, zorder=zorder) hc_sm_circle = Circle((47, -25), radius=2, lw=lw, fill=False, color=color, zorder=zorder) court_elements = [l_hoop, l_backboard, l_outer_box, outer, l_inner_box, l_free_throw, l_corner_a, l_corner_b, l_arc, r_hoop, r_backboard, r_outer_box, r_inner_box, r_free_throw, 
r_corner_a, r_corner_b, r_arc, half_court, hc_big_circle, hc_sm_circle] # Add the court elements onto the axes for element in court_elements: ax.add_patch(element) return ax def watch_play(self, game_time, length, highlight_player=None, commentary=True, show_spacing=None): """ DEPRECATED. See animate_play() for a similar (faster) method. Method for viewing plays in game. Outputs video file of play in {cwd}/temp Args: game_time (int): time in game to start video (seconds into the game). Currently game_time can also be a tuple of length two with (starting_frame, ending_frame) if you want to watch a play using frames instead of game time. length (int): length of play to watch (seconds) highlight_player (str): If not None, video will highlight the circle of the inputted player for easy tracking. commentary (bool): Whether to include play-by-play commentary underneath video show_spacing (str in ['home', 'away']): show convex hull of home or away team. if None, does not display any convex hull Returns: an instance of self, and outputs video file of play """ warnings.warn(("watch_play is extremely slow.
" "Use animate_play for similar functionality, " "but greater efficiency")) if isinstance(game_time, tuple): starting_frame = game_time[0] ending_frame = game_time[1] else: # Get starting and ending frame from requested game_time and length starting_frame = self.moments[self.moments.game_time.round() == game_time].index.values[0] ending_frame = self.moments[self.moments.game_time.round() == game_time + length].index.values[0] # Make video of each frame for frame in range(starting_frame, ending_frame): self.plot_frame(frame, highlight_player=highlight_player, commentary=commentary, show_spacing=show_spacing) # Stitch temp/{frame}.png into temp/{starting_frame}.mp4; the scale filter forces even dimensions, which yuv420p output requires command = ('ffmpeg -framerate 20 -start_number {starting_frame} ' '-i %d.png -c:v libx264 -r 30 -pix_fmt yuv420p -vf ' '"scale=trunc(iw/2)*2:trunc(ih/2)*2" {starting_frame}' '.mp4').format(starting_frame=starting_frame) os.chdir('temp') os.system(command) os.chdir('..') # Delete images for file in os.listdir('./temp'): if os.path.splitext(file)[1] == '.png': os.remove('./temp/{file}'.format(file=file)) return self def watch_player_actions(self, player_name, action, length=15, max_vids=5): """ Method for viewing all plays of a specified type that a player had in the game. For example: all of Damian Lillard's FG attempts in the game. Outputs a video file for each play in {cwd}/temp Args: player_name (str): Name of player for which to produce videos. Currently, player_name must be perfectly formatted and capitalized, since no string processing is performed. action (str) {'all_FG', 'made_FG', 'miss_FG', 'rebound'}: Action type of interest length (int): length of play to watch (seconds) for each action. max_vids (int): Maximum number of videos to produce. max_vids=None if all videos are desired. If max_vids is less than the total number of actions in the game, the earliest actions are made into videos.
Returns: an instance of self, and outputs video file of plays """ player_action_times = self._get_player_actions(player_name, action) for index, time in enumerate(player_action_times): if index == max_vids: break self.watch_play(time-length, length, highlight_player=player_name, commentary=False) return self def _get_commentary(self, game_time, commentary_length=6, commentary_depth=10): """ Helper function for returning play-by-play events for a given game time. Args: game_time (int): game time (in seconds) for which to retrieve commentary commentary_length (int): Number of play-by-play calls to include in commentary commentary_depth (int): Number of seconds to look in past to retrieve play-by-play calls commentary_depth=10 looks at the previous 10 seconds of game for play-by-play calls Returns: tuple of information (commentary_script, score) commentary_script (str): string of commentary Most recent play-by-play calls, separated by line breaks score (str): Score at current time 'XX - XX' """ lines = [] score = "0 - 0" for game_second in range(game_time - commentary_depth, game_time + 2): for index, row in self.pbp[self.pbp.game_time == game_second].iterrows(): if row['HOMEDESCRIPTION']: lines.append('{self.home_team}: '.format(self=self) + str(row['HOMEDESCRIPTION'])) if row['VISITORDESCRIPTION']: lines.append('{self.away_team}: '.format(self=self) + str(row['VISITORDESCRIPTION'])) if row['NEUTRALDESCRIPTION']: lines.append(str(row['NEUTRALDESCRIPTION'])) score = str(row['SCORE']) # Keep the most recent calls (a '.' placeholder if there are none) and pad with blank lines so the script always has commentary_length lines commentary = (lines or ['.'])[-commentary_length:] commentary += [' '] * (commentary_length - len(commentary)) commentary_script = ' \n'.join(commentary) return (commentary_script, score) def _get_player_actions(self, player_name, action): """ Helper function to get all times a player performed a specific action Args: player_name (str): name of player to get all actions for action (str) {'all_FG', 'made_FG', 'miss_FG', 'rebound'}: Type of action to get all times for. Returns: times (list): list of game times at which the player performed the action """ player_id = self.player_ids[player_name] # EVENTMSGTYPE codes (see pbpevents.txt): 1 = made FG, 2 = missed FG, 4 = rebound action_dict = {'all_FG': [1, 2], 'made_FG': [1], 'miss_FG': [2], 'rebound': [4]} action_df = self.pbp[(self.pbp['PLAYER1_ID'] == player_id) & (self.pbp['EVENTMSGTYPE'] .isin(action_dict[action]))] times = list(action_df['game_time']) return times def _get_moment_details(self, frame_number, highlight_player=None): """ Helper function for getting important information for a given frame Args: frame_number (int): Frame in game to retrieve data for frame_number gets player tracking data from moments.loc[frame_number] highlight_player (str): Name of player to be highlighted in downstream plotting. if None, no player is highlighted.
Returns: tuple of data game_time (int): seconds into game of current moment x_pos (list): list of x coordinates for all players and ball y_pos (np.ndarray): y coordinates for all players and ball, shifted down by 50 ft for plotting colors (list): color coding of each player/ball for coordinate data sizes (list): size of each player/ball (used for showing ball height) quarter (int): Game quarter shot_clock (str): shot clock game_clock (str): game clock edges (list): list of marker edge sizes of each player for video. useful when trying to highlight a player by making their edge thicker. universe_time (int): Time in the universe, in msec """ current_moment = self.moments.loc[frame_number] game_time = int(np.round(current_moment['game_time'])) universe_time = int(current_moment['universe_time']) x_pos, y_pos, colors, sizes, edges = [], [], [], [], [] # Get player positions for player in current_moment.positions: x_pos.append(player[2]) y_pos.append(player[3]) colors.append(self.team_colors[player[0]]) # Use ball height for size (useful to see a shot) if player[0] == -1: sizes.append(max(150 - 2*(player[4] - 5)**2, 10)) else: sizes.append(200) # highlight_player makes their outline much thicker on the video if (highlight_player and player[1] == self.player_ids[highlight_player]): edges.append(5) else: edges.append(0.5) # Unfortunately, the plot is below the y axis, # so the y positions need to be corrected y_pos = np.array(y_pos) - 50 shot_clock = current_moment.shot_clock if np.isnan(shot_clock): shot_clock = 24.00 shot_clock = str(shot_clock).split('.')[0] game_min, game_sec = divmod(current_moment.quarter_time, 60) game_clock = "%02d:%02d" % (game_min, game_sec) quarter = current_moment.quarter return (game_time, x_pos, y_pos, colors, sizes, quarter, shot_clock, game_clock, edges, universe_time) def plot_frame(self, frame_number, highlight_player=None, commentary=True, show_spacing=False, plot_spacing=False, pipe=None): """ Creates an individual frame of the game.
Outputs .png file in {cwd}/temp Args: frame_number (int): number of frame in game to create frame_number gets player tracking data from moments.loc[frame_number] highlight_player (str): Name of player to highlight (by making their outline thicker). if None, no player is highlighted commentary (bool): if True, add play-by-play commentary under frame show_spacing (str in ['home', 'away']): show convex hull of home or away team if None, does not display any convex hull plot_spacing (bool): currently unused pipe (subprocess.Popen): Popen object with open pipe to send image to if None, image is written to disk instead of sent to pipe Returns: an instance of self, and outputs .png file of frame If pipe, ARGB values are sent to pipe object instead of writing to disk. TODO be able to call this method by game time instead of frame_number """ (game_time, x_pos, y_pos, colors, sizes, quarter, shot_clock, game_clock, edges, universe_time) = self._get_moment_details(frame_number, highlight_player=highlight_player) (commentary_script, score) = self._get_commentary(game_time) fig = plt.figure(figsize=(12, 6)) self._draw_court() frame = plt.gca() frame.axes.get_xaxis().set_ticks([]) frame.axes.get_yaxis().set_ticks([]) plt.scatter(x_pos, y_pos, c=colors, s=sizes, alpha=0.85, linewidths=edges) plt.xlim(-5, 100) plt.ylim(-55, 5) sns.set_style('dark') if commentary: plt.figtext(0.23, -.6, commentary_script, size=20) plt.figtext(0.43, 0.125, shot_clock, size=18) plt.figtext(0.5, 0.125, 'Q'+str(quarter), size=18) plt.figtext(0.57, 0.125, str(game_clock), size=18) plt.figtext(0.43, .85, self.away_team + " " + score + " " + self.home_team, size=18) if highlight_player: plt.figtext(0.17, 0.85, highlight_player, size=18) # Add team color indicators to top of frame plt.scatter([30, 67], [2.5, 2.5], s=100, c=[self.team_colors[self.away_id], self.team_colors[self.home_id]]) if show_spacing: # Show convex hull on frame. This assumes row 0 is the ball, rows 1-5 are the home players and rows 6+ the away players xy_pos = np.column_stack((np.array(x_pos), np.array(y_pos))) if show_spacing == 'home': points = xy_pos[1:6, :]
if show_spacing == 'away': points = xy_pos[6:, :] hull = ConvexHull(points) hull_points = points[hull.vertices, :] polygon = Polygon(hull_points, alpha=0.3, color='gray') ax = plt.gca() ax.add_patch(polygon) if pipe: # Write ARGB values to pipe fig.canvas.draw() string = fig.canvas.tostring_argb() pipe.stdin.write(string) plt.close() if commentary: # Stream a second image holding the commentary; ffmpeg reads it as the lower half of the same video frame fig = plt.figure(figsize=(12, 6)) plt.figtext(.2, .4, commentary_script, size=20) fig.canvas.draw() string = fig.canvas.tostring_argb() pipe.stdin.write(string) plt.close() else: # Save image to disk plt.savefig('temp/{frame_number}.png' .format(frame_number=frame_number), bbox_inches='tight') plt.close() return self def _in_formation(self, frame_number): """ Determines whether the game is in a set offense/defense. Returns True if a normal half-court play is being run (the shot clock is below 23 seconds and every tracked player and the ball are on the same side of half court), and False if the game is in transition, out of bounds, a free throw, etc. This is useful for analyzing the plays teams run while discarding extraneous time in the game. """ # Get relevant moment details details = self._get_moment_details(frame_number) x_pos = np.array(details[1]) shot_clock = details[6] # Determine if offense/defense is set if float(shot_clock) < 23: if (x_pos < 47).all() or (x_pos > 47).all(): return True return False def get_spacing_area(self, frame_number): """ Calculates convex hull of home and away team for a given frame. Useful for analyzing the spacing of teams.
Args: frame_number (int): number of frame in game to calculate team convex hulls Returns: tuple of data (home_area, away_area) home_area (float): ConvexHull.area of the home team's positions away_area (float): ConvexHull.area of the away team's positions """ details = self._get_moment_details(frame_number) x_pos = np.array(details[1]) y_pos = np.array(details[2]) xy_pos = np.column_stack((x_pos, y_pos)) # Note: for 2-D points, scipy's ConvexHull.area is the hull perimeter (ft); # the enclosed area (sq ft) is ConvexHull.volume home_area = ConvexHull(xy_pos[1:6, :]).area away_area = ConvexHull(xy_pos[6:, :]).area return (home_area, away_area) def get_offensive_team(self, frame_number): """ Determines which team is on offense. Currently only works if team is in set offense or defense. Args: frame_number (int): number of frame in game to determine offensive team Returns: str in ['home', 'away'], or None if the teams are not set on one half of the court (transition, missing players, or an overtime quarter) """ details = self._get_moment_details(frame_number) x_pos = np.array(details[1]) quarter = details[5] if len(x_pos) != 11: return None if self.flip_direction: if (x_pos < 47).all() and quarter in [1, 2]: return 'away' if (x_pos > 47).all() and quarter in [3, 4]: return 'away' if (x_pos < 47).all() and quarter in [3, 4]: return 'home' if (x_pos > 47).all() and quarter in [1, 2]: return 'home' if (x_pos < 47).all() and quarter in [1, 2]: return 'home' if (x_pos > 47).all() and quarter in [3, 4]: return 'home' if (x_pos < 47).all() and quarter in [3, 4]: return 'away' if (x_pos > 47).all() and quarter in [1, 2]: return 'away' return None def _determine_direction(self): """ Helper function to determine which direction the home team is going. Surprisingly, this is not consistent and depends on the game.
Currently, this method detects which side the players start on and is ~90% accurate """ incorrect_count = 0 correct_count = 0 # Sample every 100th of the first 10,000 frames and vote on which side each team is on for frame in range(0, 10000, 100): details = self._get_moment_details(frame) home_team_x = details[1][1:6] away_team_x = details[1][6:] if np.mean(home_team_x) < np.mean(away_team_x): incorrect_count += 1 else: correct_count += 1 if incorrect_count > correct_count: self.flip_direction = True return None def get_frame(self, game_time): """ Converts a game time to a frame number. If no frame matches game_time, the nearest earlier second with tracking data is used. Args: game_time (int): game time in seconds of interest Returns: frame (int): frame number of game time """ rounded_times = self.moments.game_time.round() test_time = game_time # Step back one second at a time until some frame has that (rounded) game time while True: frames = self.moments[rounded_times == test_time].index.values if len(frames) > 0: return frames[0] test_time -= 1 def get_play_frames(self, event_num, play_type='offense'): """ Finds the starting and ending frames of a play. Args: event_num (int): EVENTNUM of interest in self.pbp NOTE: see pbpevents.txt for the event type (EVENTMSGTYPE) codes play_type (str in ['offense', 'defense']): Team of interest is offense or defense (currently unused) Returns: tuple of (start_frame (int), end_frame (int)): starting and ending frame numbers for the play of interest, or None if the event's team is never found in a set offense """ play_index = self.pbp[self.pbp['EVENTNUM'] == event_num].index[0] event_team = str(self.pbp[self.pbp['EVENTNUM'] == event_num] .PLAYER1_TEAM_ABBREVIATION.head(1).values[0]) if event_team == self.home_team: target_team = 'home' if event_team == self.away_team: target_team = 'away' end_time = int(self.pbp.loc[play_index, 'game_time']) # To find lower bound on starting frame of the play, # determine when the previous play ended putative_start_time = int(self.pbp.loc[play_index - 1, 'game_time']) putative_start_frame = self.get_frame(putative_start_time) end_frame = self.get_frame(end_time) for test_frame in range(putative_start_frame, end_frame): if self.get_offensive_team(test_frame) == target_team: break # If the
previous loop never found an offensive play, # the function returns None else: return None # Add two seconds to game time to let the players settle into position start_frame = self.get_frame(round(self.moments.loc[test_frame].game_time + 2)) return (start_frame, end_frame) def animate_play(self, game_time, length, highlight_player=None, commentary=True, show_spacing=None): """ Method for animating plays in game. Outputs video file of play in {cwd}/temp. Individual frames are streamed directly to ffmpeg without writing them to the disk, which is much faster than watch_play Args: game_time (int): time in game to start video (seconds into the game). Currently game_time can also be a tuple of length two with (starting_frame, ending_frame) if you want to watch a play using frames instead of game time. length (int): length of play to watch (seconds) highlight_player (str): If not None, video will highlight the circle of the inputted player for easy tracking. commentary (bool): Whether to include play-by-play commentary in the animation show_spacing (str in ['home', 'away']): show convex hull spacing of home or away team. If None, does not show spacing.
Returns: an instance of self, and outputs video file of play """ if isinstance(game_time, tuple): starting_frame = game_time[0] ending_frame = game_time[1] else: # Get starting and ending frame from requested game_time and length starting_frame = self.moments[self.moments.game_time.round() == game_time].index.values[0] ending_frame = self.moments[self.moments.game_time.round() == game_time + length].index.values[0] # Make video of each frame filename = "./temp/{game_time}.mp4".format(game_time=game_time) # The raw frame size must match the matplotlib canvas: plot_frame draws 12x6 inch figures, and with commentary a second 12x6 image is streamed per frame, so each video frame is 12x12 inches dpi = plt.rcParams['figure.dpi'] if commentary: size = (int(12 * dpi), int(12 * dpi)) else: size = (int(12 * dpi), int(6 * dpi)) cmdstring = ('ffmpeg', '-y', '-r', '20', # fps '-s', '%dx%d' % size, # size of image string '-pix_fmt', 'argb', # Stream argb data from matplotlib '-f', 'rawvideo', '-i', '-', '-vcodec', 'libx264', filename) # Stream plots to pipe pipe = Popen(cmdstring, stdin=PIPE) for frame in range(starting_frame, ending_frame): self.plot_frame(frame, highlight_player=highlight_player, commentary=commentary, show_spacing=show_spacing, pipe=pipe) pipe.stdin.close() pipe.wait() return self ================================================ FILE: game/pbpevents.txt ================================================ Description of Play-by-play ‘EVENTMSGTYPE’ 1: Made FG 2: Miss FG 3: FT Attempt 4: Rebound 5: Turnover 6: Foul 7: Lane Violation (?) 8: Substitution 9: Timeout 10: Jump Ball 11: (?) 12: Quarter Start 13: Quarter End 14: (?) 15: (?) 16: (?) 17: (?) 18: (?) ================================================ FILE: game/scrape_games.py ================================================ """ Quick script to get all games in the database and save to text file.
""" # urlopen lives in urllib.request on Python 3 and in urllib2 on Python 2 try: from urllib.request import urlopen except ImportError: from urllib2 import urlopen from bs4 import BeautifulSoup def scrape(): page = urlopen(('https://github.com/sealneaward/' 'nba-movement-data/tree/master/data')).read() soup = BeautifulSoup(page, 'html.parser') with open('allgames.txt', 'w') as f: for anchor in soup.find_all('a', class_="js-navigation-open"): # Keep game archives, e.g. '01.01.2016.CHI.at.TOR.7z' (24 characters) if anchor.text.endswith('.7z') and len(anchor.text) == 24: f.write(anchor.text + '\n') if __name__ == '__main__': scrape() ================================================ FILE: game/spacing_analysis.py ================================================ """ Scripts for analyzing spacing of NBA tracking data. The workhorse statistic for spacing is the convex hull. """ import os import pickle import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot as plt from sklearn import linear_model from game import Game def extract_games(): """ Extract games from allgames.txt Returns: list: list of games. Each element in the list is [date, home_team, away_team] example element: ['01.01.2016', 'TOR', 'CHI'] """ games = [] with open('allgames.txt', 'r') as game_file: for line in game_file: # Fields are date (3 parts), away team, 'at', home team, extension game = line.split('.') date = "{game[0]}.{game[1]}.{game[2]}".format(game=game) away = game[3] home = game[5] games.append([date, home, away]) return games def get_spacing_statistics(date, home_team, away_team, write_file=False, write_score=False, write_game=False): """ Calculates spacing statistics for each frame in game Args: date (str): date of game in form 'MM.DD.YYYY'. Example: '01.01.2016' home_team (str): home team in form 'XXX'. Example: 'TOR' away_team (str): away team in form 'XXX'. Example: 'CHI' write_file (bool): If True, write pickle file of spacing statistics into data/spacing directory write_score (bool): If True, write pickle file of game score into data/score directory write_game (bool): If True, write pickle file of tracking data into data/game directory Note: This file is ~100MB.
Returns: tuple: tuple of data (home_offense_areas, home_defense_areas, away_offense_areas, away_defense_areas), where each element of the tuple is a list of convex hull areas for each frame in the game. Returns None if spacing data for the game is already saved in data/spacing. """ filename = ("{date}-{away_team}-" "{home_team}.p").format(date=date, away_team=away_team, home_team=home_team) # Do not recalculate spacing data if already saved to disk if filename in os.listdir('./data/spacing'): return None game = Game(date, home_team, away_team) # Write game data to disk if write_game: with open('data/game/' + filename, "wb") as game_file: pickle.dump(game, game_file) home_offense_areas, home_defense_areas = [], [] away_offense_areas, away_defense_areas = [], [] print(date, home_team, away_team) for frame in range(len(game.moments)): offensive_team = game.get_offensive_team(frame) if offensive_team: home_area, away_area = game.get_spacing_area(frame) if offensive_team == 'home': home_offense_areas.append(home_area) away_defense_areas.append(away_area) if offensive_team == 'away': home_defense_areas.append(home_area) away_offense_areas.append(away_area) results = (home_offense_areas, home_defense_areas, away_offense_areas, away_defense_areas) # Write spacing data to disk if write_file: with open('data/spacing/' + filename, "wb") as spacing_file: pickle.dump(results, spacing_file) # Write the final score (taken from the last play-by-play row) to disk if write_score: score = game.pbp['SCORE'].iloc[-1] with open('data/score/' + filename, "wb") as score_file: pickle.dump(score, score_file) return results def write_spacing(gamelist): """ Writes all spacing statistics to data/spacing directory for each game """ for game in gamelist: try: get_spacing_statistics(game[0], game[1], game[2], write_file=True, write_score=True) except Exception: with open('errorlog.txt', 'a') as myfile: myfile.write("{game} Could not extract spacing data\n" .format(game=game)) def plot_spacing(date, home_team,
away_team, defense=True, save_plot=False): """ Plots team's spacing distribution in a game. Args: date (str): date of game in form 'MM.DD.YYYY'. Example: '01.01.2016' home_team (str): home team in form 'XXX'. Example: 'TOR' away_team (str): away team in form 'XXX'. Example: 'CHI' defense (bool): if True, plot defensive spacing. if False, plot offensive spacing save_plot (bool): if True, save plot to temp/ directory Returns: None Also, shows plt.hist of team spacing during game """ filename = ("{date}-{away_team}-" "{home_team}.p").format(date=date, away_team=away_team, home_team=home_team) if filename in os.listdir('data/spacing'): with open("data/spacing/" + filename, "rb") as spacing_file: data = pickle.load(spacing_file) else: return None # data = (home_offense_areas, home_defense_areas, away_offense_areas, away_defense_areas) plt.figure() if defense: plt.hist(data[1], bins=100, alpha=0.4, label=home_team) plt.hist(data[3], bins=100, alpha=0.4, label=away_team) else: plt.hist(data[0], bins=100, alpha=0.4, label=home_team) plt.hist(data[2], bins=100, alpha=0.4, label=away_team) plt.xlim(20, 100) plt.legend(loc='upper right') # Save before plt.show(): once the shown window is closed, savefig would write a blank figure if save_plot: plt.savefig('temp/spacing{date}.png'.format(date=date)) plt.show() return None def get_spacing_details(game): """ Calculates mean spacing for game.
Args: game (list): [date, home_team, away_team], as returned by extract_games() Returns: tuple of data (home_points, away_points, home_offense_areas, home_defense_areas, away_offense_areas, away_defense_areas) home_points (int): Points scored by home team away_points (int): Points scored by away team home_offense_area (float): Average spacing (sq ft) of home team while on offense home_defense_area (float): Average spacing (sq ft) of home team while on defense away_offense_area (float): Average spacing (sq ft) of away team while on offense away_defense_area (float): Average spacing (sq ft) of away team while on defense If game not saved in data/spacing and data/score directories, returns None """ fname = "{game[0]}-{game[2]}-{game[1]}.p".format(game=game) if (fname in os.listdir('data/spacing') and fname in os.listdir('data/score')): with open("data/spacing/" + fname, "rb") as spacing_file: data = pickle.load(spacing_file) with open("data/score/" + fname, "rb") as score_file: score = pickle.load(score_file).split(' ') # The score string is assumed to be 'AWAY - HOME' away_points, home_points = score[0], score[2] means = tuple(map(np.mean, data)) return (int(home_points), int(away_points), *means) else: return None def get_spacing_df(gamelist): """ Organizes spacing data from all games into a DataFrame Args: gamelist (list): list of games where each element [date, home_team, away_team] example element: ['01.01.2016', 'TOR', 'CHI'] Returns: pd.DataFrame DataFrame of spacing data with columns: ['home_points', 'away_points', 'home_offense_areas', 'home_defense_areas', 'away_offense_areas', 'away_defense_areas', 'away_team', 'home_team', 'space_dif', 'home_win'] within DataFrame: home_win (int): 1 if home team won, -1 if lost space_dif (float): difference (sq ft) between away team's defensive spacing and home team's defensive spacing """ details = [] for game in gamelist: detail = get_spacing_details(game) if detail: # Append (away_team, home_team) to match the column order below details.append((*detail, game[2], game[1])) df = pd.DataFrame(details) df.columns = ['home_points', 'away_points', 'home_offense_areas', 'home_defense_areas', 'away_offense_areas', 'away_defense_areas', 'away_team',
'home_team'] df['space_dif'] = df.away_defense_areas - df.home_defense_areas df['home_win'] = np.sign(df.home_points - df.away_points) df = df[df.home_offense_areas > 80] return df def plot_offense_vs_defense_spacing(spacing_data): """ Plot of offensive vs. defensive spacing for games Args: spacing_data (pd.DataFrame): DataFrame with columns of spacing data ['home_offense_areas', 'home_defense_areas', 'away_offense_areas', 'away_defense_areas'] Returns None Also saves plot to temp/OffenseVsDefense.png """ sns.regplot(x=spacing_data.away_offense_areas, y=spacing_data.home_defense_areas, fit_reg=True, color=sns.color_palette()[0], ci=None) sns.regplot(x=spacing_data.home_offense_areas, y=spacing_data.away_defense_areas, fit_reg=False, color=sns.color_palette()[0], ci=None) plt.xlabel('Average Offensive Spacing (sq ft)', fontsize=16) plt.ylabel('Average Defensive Spacing (sq ft)', fontsize=16) plt.title('Offensive spacing robustly induces defensive spacing', fontsize=16) plt.savefig('temp/OffenseVsDefense.png') plt.close() return None def plot_defense_spacing_vs_score(spacing_data): """ Plot of home team's defensive spacing differential vs score differential for games Args: spacing_data (pd.DataFrame): DataFrame with columns of spacing data ['home_points', 'away_points', 'home_defense_areas', 'away_defense_areas'] Returns None Also saves plot to temp/SpacingVsScore.png
""" y = spacing_data.home_points - spacing_data.away_points x = spacing_data.away_defense_areas - spacing_data.home_defense_areas sns.regplot(x=x, y=y, ci=None) plt.xlabel('Home Team Defensive Spacing Differential (sq ft)', fontsize=16) plt.ylabel('Home Team Score Differential (pts)', fontsize=16) plt.title('Spacing the defense correlates with outscoring opponents', fontsize=16) plt.savefig('temp/SpacingVsScore.png') plt.close() def plot_defense_spacing_vs_wins(spacing_data): """ Plot of home team's defensive spacing differential vs. wins (binary: 0, 1) for games, with a logistic regression fit Args: spacing_data (pd.DataFrame): DataFrame with columns ['space_dif', 'home_win'], as built by get_spacing_df() Returns None Also saves plot to temp/SpacingVsWins.png """ clf = linear_model.LogisticRegression(C=1) X = np.array(spacing_data.space_dif) X = X[:, np.newaxis] y = np.array(spacing_data.home_win) # home_win is -1/1; rescale to 0/1 for plotting y_adjusted = (y+1) / 2 clf.fit(X, y) plt.scatter(X.ravel(), y_adjusted, color=sns.color_palette()[0], s=600, alpha=1, marker='|') plt.xlim(-10, 10) X_test = np.linspace(-10, 10, 300) X_test = X_test[:, np.newaxis] # Predicted probability of a home win: clf.classes_ is [-1, 1], so column 1 is class 1 log_fit = clf.predict_proba(X_test)[:, 1] plt.scatter(X_test.ravel(), log_fit) plt.xlabel('Home Team Defensive Spacing Differential (sq ft)', fontsize=16) plt.ylabel('Home Team Win', fontsize=16) plt.title('Spacing the Defense Correlates with winning', fontsize=16) plt.savefig('temp/SpacingVsWins.png') plt.close() def plot_team_defensive_spacing(spacing_data): """ Bar graph of each team's ability to space the defense, measured as the average defensive spacing of its opponents Args: spacing_data (pd.DataFrame): DataFrame with columns of spacing data ['home_team', 'away_team', 'home_defense_areas', 'away_defense_areas'] Returns None Also saves plot to temp/DefensiveSpacing.png
""" # For each team: average opponent defensive spacing while that team is on offense, pooled over its home and away games df = pd.DataFrame() df['home'] = spacing_data.groupby('home_team')['away_defense_areas'].sum() df['home_count'] = spacing_data.groupby('home_team')['away_defense_areas'].count() df['away'] = spacing_data.groupby('away_team')['home_defense_areas'].sum() df['away_count'] = spacing_data.groupby('away_team')['home_defense_areas'].count() df['average_induced_space'] = (df.home + df.away) / (df.away_count + df.home_count) df['average_induced_space'].sort_values().plot(kind='bar', color=sns.color_palette()[0]) plt.xlabel('', fontsize=16) plt.ylabel("Opponent's Defensive Spacing (sq ft)", fontsize=16) plt.ylim(60, 70) plt.title("Team's ability to space the defense", fontsize=18) plt.savefig('temp/DefensiveSpacing.png') plt.close() def plot_teams_ability_to_space_defense(spacing_data): """ Plots each team's ability to space the defense given its offensive spacing (scatter plot) Args: spacing_data (pd.DataFrame): DataFrame with columns of spacing data ['home_team', 'away_team', 'home_offense_areas', 'home_defense_areas', 'away_offense_areas', 'away_defense_areas'] Returns None Also saves plot to temp/Spacing_scatter.png """ # Start from a frame indexed by team abbreviation df = spacing_data.groupby('home_team').count() df['home'] = spacing_data.groupby('home_team')['away_defense_areas'].sum() df['home_count'] = spacing_data.groupby('home_team')['away_defense_areas'].count() df['away'] = spacing_data.groupby('away_team')['home_defense_areas'].sum() df['away_count'] = spacing_data.groupby('away_team')['home_defense_areas'].count() df['average_induced_space'] = (df.home + df.away) / (df.away_count + df.home_count) df['home_offense'] = spacing_data.groupby('home_team')['home_offense_areas'].sum() df['home_offense_count'] = spacing_data.groupby('home_team')['home_offense_areas'].count() df['away_offense'] = spacing_data.groupby('away_team')['away_offense_areas'].sum() df['away_offense_count'] = spacing_data.groupby('away_team')['away_offense_areas'].count() df['average_offense_space'] = (df.home_offense +
df.away_offense) / (df.away_offense_count + df.home_offense_count) # x axis: offensive spacing, y axis: opponent's defensive spacing (matching the axis labels below) plt.scatter(df['average_offense_space'], df['average_induced_space'], s=74, alpha=0.7, c=sns.color_palette()[0]) for team, row in df.iterrows(): if team in ['DEN', 'SAS', 'LAC', 'CLE', 'DET', 'WAS', 'TOR', 'MIL', 'ORL', 'DAL']: plt.annotate(team, xy=[row['average_offense_space'] - 0.15, row['average_induced_space'] + 0.1]) plt.xlabel('Average Offensive Spacing (sq ft)', fontsize=16) plt.ylabel("Average Opponent's Defensive Spacing (sq ft)", fontsize=16) plt.title("Team's ability to space opponent's defense", fontsize=16) plt.savefig('temp/Spacing_scatter.png') plt.close() if __name__ == "__main__": """ Calls functions to generate plots. Comment out any plot you do not want to generate. If spacing data has not been calculated, uncomment 'write_spacing(all_games)', which will calculate the spacing data for all games and save it to disk. """ all_games = extract_games() # Uncomment if writing spacing first time # write_spacing(all_games) spacing_data = get_spacing_df(all_games) plot_offense_vs_defense_spacing(spacing_data) plot_defense_spacing_vs_score(spacing_data) plot_defense_spacing_vs_wins(spacing_data) plot_team_defensive_spacing(spacing_data) plot_teams_ability_to_space_defense(spacing_data) ================================================ FILE: game/velocity_analysis.py ================================================ """ Analysis of NBA player velocities. """ import os import pickle import numpy as np import matplotlib.pyplot as plt import seaborn as sns import pandas as pd from game import Game # Initialize Project: change this path to your local clone (os.chdir does not expand '~' by itself) os.chdir(os.path.expanduser('~/Desktop/Personal/SportVU/NBA-player-movement')) def extract_games(): """ Extract games from allgames.txt Returns: list: list of games.
Each element in the list is a tuple (date, home_team, away_team) example element: ('01.01.2016', 'TOR', 'CHI') """ games = [] with open('allgames.txt', 'r') as game_file: for line in game_file: game = line.split('.') date = "{game[0]}.{game[1]}.{game[2]}".format(game=game) away = game[3] home = game[5] games.append((date, home, away)) return games def calculate_velocities(game, frame, highlight_player=None): """ Calculates team or player velocity for a frame in a game Args: game (Game): Game instance to get data from frame (int): number of frame in game to calculate velocities frame gets player tracking data from moments.loc[frame] highlight_player (str): Name of player to calculate velocity of. if None, cumulative team velocities are calculated. Returns: tuple of data (game_time, home_velocity, away_velocity) game_time (int): universe time of the frame home_velocity (float): cumulative velocity (ft/msec) of home team away_velocity (float): cumulative velocity (ft/msec) of away team If highlight_player is on the court, returns (game_time, player_velocity) instead. """ details = game._get_moment_details(frame, highlight_player=highlight_player) game_time = details[9] # Highlighted player's edge value (details[8]) is 5 instead of 0.5 # Use this fact to retrieve the index of the player of interest if highlight_player: if 5 in details[8]: player_index = details[8].index(5) else: highlight_player = None # Frame 0 has no previous frame to compare against if frame == 0: if highlight_player: return (game_time, 0) return (game_time, 0, 0) previous_details = game._get_moment_details(frame - 1) # If not all the players are on the court, there is an error in the data if len(details[1]) != 11 or \ len(details[2]) != 11 or \ len(previous_details[1]) != 11 or \ len(previous_details[2]) != 11: return (game_time, 0, 0) delta_x = np.array(details[1]) - np.array(previous_details[1]) delta_y = np.array(details[2]) - np.array(previous_details[2]) delta_coordinates = zip(delta_x, delta_y) distance_traveled = map(lambda coords: np.linalg.norm(coords), delta_coordinates) delta_time = details[9] - previous_details[9] # Note,
universe time is in msec velocity = list(map(lambda distances: distances / delta_time, distance_traveled)) if highlight_player: return (game_time, velocity[player_index]) home_velocity = sum(velocity[1:6]) away_velocity = sum(velocity[6:]) return (game_time, home_velocity, away_velocity) def plot_velocity_frame(game, frame_number, ax, highlight_player=None): """ Creates an individual the frame of game. Args: game (Game): Game instance to get data from frame_number (int): number of frame in game to create frame_number gets player tracking data from moments.ix[frame_number] highlight_player (str): Name of player to highlight (by making their outline thicker). if None, no player is highlighted Returns: plt.fig of frame from game with subplot of velocity. see README.md for example """ (game_time, x_pos, y_pos, colors, sizes, quarter, shot_clock, game_clock, edges, universe_time) = game._get_moment_details(frame_number, highlight_player=highlight_player) (commentary_script, score) = game._get_commentary(game_time) game._draw_court() frame = plt.gca() frame.axes.get_xaxis().set_ticks([]) frame.axes.get_yaxis().set_ticks([]) ax.scatter(x_pos, y_pos, c=colors, s=sizes, alpha=0.85, linewidths=edges) plt.xlim(-5, 100) plt.ylim(-55, 5) sns.set_style('dark') plt.figtext(0.43, 0.105, shot_clock, size=18) plt.figtext(0.5, 0.105, 'Q'+str(quarter), size=18) plt.figtext(0.57, 0.105, str(game_clock), size=18) plt.figtext(0.43, .442, game.away_team + " " + score + " " + game.home_team, size=18) # Add team color indicators to top of frame ax.scatter([30, 67], [2.5, 2.5], s=100, c=[game.team_colors[game.away_id], game.team_colors[game.home_id]]) def watch_play_velocities(game, game_time, length, highlight_player=None): """ Creates an movie of a play which includes a plot of the real-time velocities. Args: game (Game): Game instance to get data from game_time (int): time in game to start video (seconds into the game). 
        length (int): length of play to watch (seconds)
        highlight_player (str): If not None, video will highlight the circle
            of the input player for easy tracking, and also display that
            player's velocity

    Returns:
        None. Outputs a video file of the play with a velocity plot.
        See README.md for example
    """
    starting_frames = game.moments[game.moments.game_time.round() == game_time]
    starting_frame = starting_frames.index.values[0]
    ending_frames = game.moments[
        game.moments.game_time.round() == game_time + length]
    ending_frame = ending_frames.index.values[0]
    frames = range(starting_frame, ending_frame)
    indices = list(range(ending_frame - starting_frame))

    if highlight_player:
        player_velocities = [
            calculate_velocities(game, frame,
                                 highlight_player=highlight_player)[1]
            for frame in frames]
        max_velocity = max(player_velocities)
    else:
        # Calculate each frame's velocities once, then split them by team
        team_velocities = [calculate_velocities(game, frame)
                           for frame in frames]
        home_velocities = [velocities[1] for velocities in team_velocities]
        away_velocities = [velocities[2] for velocities in team_velocities]
        max_velocity = max(home_velocities + away_velocities)

    # Plot each frame
    for index, frame in enumerate(frames):
        _, (ax1, ax2) = plt.subplots(2, figsize=(12, 12))
        plot_velocity_frame(game, frame, ax=ax2,
                            highlight_player=highlight_player)
        ax1.set_xlim([0, len(indices)])
        ax1.set_ylim([0, max_velocity * 1.2])
        if highlight_player:
            ax1.plot(indices[:index + 1], player_velocities[:index + 1],
                     c='black', label=highlight_player)
        else:
            ax1.plot(indices[:index + 1], home_velocities[:index + 1],
                     c=game.team_colors[game.home_id], label=game.home_team)
            ax1.plot(indices[:index + 1], away_velocities[:index + 1],
                     c=game.team_colors[game.away_id], label=game.away_team)
        ax1.set_yticklabels([])
        ax1.set_xticklabels([])
        ax1.set_ylabel('Velocity', fontsize=22)
        if highlight_player:
            ax1.set_title(highlight_player, fontsize=24)
        else:
            ax1.legend(fontsize=18)
        plt.savefig('temp/' + str(index) + '.png')
        plt.close()

    # Make video of each frame
    command = ('ffmpeg -framerate 20 -start_number 0 -i %d.png -c:v '
               'libx264 -r 30 -pix_fmt yuv420p -vf '
               '"scale=trunc(iw/2)*2:trunc(ih/2)*2" {starting_frame}'
               '.mp4').format(starting_frame=starting_frame)
    os.chdir('temp')
    os.system(command)
    os.chdir('..')

    # Delete images
    for file in os.listdir('./temp'):
        if os.path.splitext(file)[1] == '.png':
            os.remove('./temp/{file}'.format(file=file))

    return


def get_velocity_statistics(date, home_team, away_team, write_file=False,
                            write_score=False, write_game=False):
    """
    Calculates velocity statistics for each frame in game

    Args:
        date (str): date of game in form 'MM.DD.YYYY'. Example: '01.01.2016'
        home_team (str): home team in form 'XXX'. Example: 'TOR'
        away_team (str): away team in form 'XXX'. Example: 'CHI'
        write_file (bool): If True, write pickle file of velocity statistics
            into data/velocity directory
        write_score (bool): If True, write pickle file of game score
            into data/score directory
        write_game (bool): If True, write pickle file of tracking data
            into data/game directory
            Note: This file is ~100MB.

    Returns:
        tuple: tuple of data (home_offense_velocities,
            home_defense_velocities, away_offense_velocities,
            away_defense_velocities), where each element of the tuple is a
            list of tuples (frame, game_time, velocity) for each frame in
            the game. Returns None if velocity data for the game is already
            saved to disk.
    """
    # Base filename shared by the game, velocity, and score pickles
    filename = "{date}-{away_team}-{home_team}".format(
        date=date, away_team=away_team, home_team=home_team)

    # Do not recalculate velocity data if already saved to disk
    if filename + '.p' in os.listdir('./data/velocity/'):
        return

    game = Game(date, home_team, away_team)

    # Write game data to disk
    if write_game:
        with open('data/game/' + filename + '.p', 'wb') as game_file:
            pickle.dump(game, game_file)

    home_offense_velocities, home_defense_velocities = [], []
    away_offense_velocities, away_defense_velocities = [], []

    print(date, home_team, away_team)

    for frame in range(1, len(game.moments)):
        offensive_team = game.get_offensive_team(frame)
        if offensive_team:
            (game_time, home_velocity,
             away_velocity) = calculate_velocities(game, frame)
            if offensive_team == 'home':
                home_offense_velocities.append(
                    (frame, game_time, home_velocity))
                away_defense_velocities.append(
                    (frame, game_time, away_velocity))
            if offensive_team == 'away':
                home_defense_velocities.append(
                    (frame, game_time, home_velocity))
                away_offense_velocities.append(
                    (frame, game_time, away_velocity))

    results = (home_offense_velocities, home_defense_velocities,
               away_offense_velocities, away_defense_velocities)

    # Write velocity data to disk
    if write_file:
        with open('data/velocity/' + filename + '.p', 'wb') as velocity_file:
            pickle.dump(results, velocity_file)

    # Write game scores to disk
    if write_score:
        # Score recorded in the last row of the play-by-play data
        # (.ix was removed from pandas; .iloc[-1] selects the last row)
        score = game.pbp['SCORE'].iloc[-1]
        with open('data/score/' + filename + '.p', 'wb') as score_file:
            pickle.dump(score, score_file)

    return results


def write_velocity(gamelist):
    """
    Writes velocity statistics to the data/velocity directory (and game
    scores to data/score) for each game
    """
    for game in gamelist:
        try:
            get_velocity_statistics(game[0], game[1], game[2],
                                    write_file=True, write_score=True)
        except Exception:
            with open('errorlog_velocity.txt', 'a') as myfile:
                myfile.write("{game} Could not extract velocity data\n"
                             .format(game=game))


def extract_velocity(gamelist):
    """
    Loads velocity data, calculates average offensive and defensive velocity
    for each game in gamelist

    Note: requires velocity data to be written for each game in data/velocity
    and data/score (see get_velocity_statistics())

    Args:
        gamelist (list): list of games. Each element of the list is a tuple
            (date, home_team, away_team).
            example element: ('01.01.2016', 'TOR', 'CHI')

    Returns (pd.DataFrame): Dataframe of velocity data with columns:
        0: Home Offensive Velocity
        1: Away Offensive Velocity
        2: Home Defensive Velocity
        3: Away Defensive Velocity
        4: Away Score
        5: Home Score
        6: Away Team
        7: Home Team
    """
    data = []
    for game in gamelist:
        away_team = game[2]
        home_team = game[1]
        print(away_team, home_team)
        filename = "{date}-{away_team}-{home_team}".format(
            date=game[0], away_team=away_team, home_team=home_team)

        # Load velocity/score data
        try:
            with open('data/velocity/' + filename + '.p',
                      'rb') as velocity_file:
                velocity_data = pickle.load(velocity_file)
            with open('data/score/' + filename + '.p', 'rb') as score_file:
                score_data = pickle.load(score_file)
        except Exception:
            print('velocity data not written for: ', game)
            continue

        away_score, home_score = extract_scores(score_data)

        # Organize velocity data by team and offense/defense
        HOV = pd.DataFrame(velocity_data[0])
        HDV = pd.DataFrame(velocity_data[1])
        AOV = pd.DataFrame(velocity_data[2])
        ADV = pd.DataFrame(velocity_data[3])

        # Cut out erroneous velocity data
        # This is due to frame-skipping in the SVU data
        # For example, from the last frame of a quarter to the
        # first frame of the next quarter, etc.
        # Velocities are summed over five players in ft/msec, so this cutoff
        # corresponds to 150 ft/sec
        HOV = HOV[HOV[2] < 0.15]
        AOV = AOV[AOV[2] < 0.15]
        HDV = HDV[HDV[2] < 0.15]
        ADV = ADV[ADV[2] < 0.15]

        game_data = (HOV[2].mean(), AOV[2].mean(), HDV[2].mean(),
                     ADV[2].mean(), away_score, home_score,
                     away_team, home_team)
        data.append(game_data)

    return pd.DataFrame(data)


def extract_fatigue(gamelist):
    """
    Loads velocity data, calculates average offensive and defensive velocity
    for each quarter for each game in gamelist

    Note: requires velocity data to be written for each game in data/velocity
    and data/score (see get_velocity_statistics())

    Args:
        gamelist (list): list of games. Each element of the list is a tuple
            (date, home_team, away_team).
            example element: ('01.01.2016', 'TOR', 'CHI')

    Returns (pd.DataFrame): Long-format dataframe of velocity data with
    columns:
        Tm: team
        Pos: Offense ('Off') or Defense ('Def')
        variable: quarter (1-4), approximated as consecutive fourths of
            the frames
        value: mean team velocity (ft/msec) in that quarter
    """
    data = []
    for game in gamelist:
        away_team = game[2]
        home_team = game[1]
        print(away_team, home_team)
        filename = "{date}-{away_team}-{home_team}".format(
            date=game[0], away_team=away_team, home_team=home_team)

        # Load velocity/score data
        try:
            with open('data/velocity/' + filename + '.p',
                      'rb') as velocity_file:
                velocity_data = pickle.load(velocity_file)
            with open('data/score/' + filename + '.p', 'rb') as score_file:
                score_data = pickle.load(score_file)
        except Exception:
            print('velocity data not written for: ', game)
            continue

        away_score, home_score = extract_scores(score_data)

        # Organize velocity data by team and offense/defense
        HOV = pd.DataFrame(velocity_data[0])
        HDV = pd.DataFrame(velocity_data[1])
        AOV = pd.DataFrame(velocity_data[2])
        ADV = pd.DataFrame(velocity_data[3])

        # Cut out erroneous velocity data
        # This is due to frame-skipping in the SVU data
        # For example, from the last frame of a quarter to the
        # first frame of the next quarter, etc.
        HOV = HOV[HOV[2] < 0.15]
        AOV = AOV[AOV[2] < 0.15]
        HDV = HDV[HDV[2] < 0.15]
        ADV = ADV[ADV[2] < 0.15]

        # Approximate each quarter as one fourth of the frames. Each
        # DataFrame is split by its own length, since the lengths differ.
        quarter_velocities = {}
        for quarter in [1, 2, 3, 4]:
            means = []
            for velocities in (HOV, HDV, AOV, ADV):
                starting_frame = int(len(velocities) / 4 * (quarter - 1))
                ending_frame = int(len(velocities) / 4 * quarter)
                means.append(
                    velocities.iloc[starting_frame:ending_frame][2].mean())
            quarter_velocities[quarter] = means

        df = pd.DataFrame(quarter_velocities)
        df['Tm'] = [home_team, home_team, away_team, away_team]
        df['Pos'] = ['Off', 'Def', 'Off', 'Def']
        game_data = (df, away_score, home_score, away_team, home_team)
        data.append(game_data)

    df = pd.concat([game_data[0] for game_data in data])
    df = pd.melt(df, ['Tm', 'Pos'], [1, 2, 3, 4])
    return df


def velocity_plots(df):
    """
    Makes bar plots of mean offensive and defensive velocity for each team

    Args:
        df (pd.DataFrame): dataframe of velocity data
            Note: use extract_velocity() to obtain this data

    Returns:
        None
        Saves plots to examples/
    """
    # Organize velocity data
    home = df[[0, 2, 5, 7]].copy()
    away = df[[1, 3, 4, 6]].copy()
    home.columns = ['Off', 'Def', 'Pts', 'Tm']
    away.columns = ['Off', 'Def', 'Pts', 'Tm']
    all_dat = pd.concat((home, away))
    ave = all_dat.groupby('Tm').mean()

    # Plot of offense velocity by team
    plt.figure()
    sns.barplot(x='Tm', y='Off', data=all_dat,
                order=ave.sort_values('Off').index,
                color=sns.xkcd_rgb["pale red"])
    plt.ylim(0.022, 0.03)
    locs, labels = plt.xticks()
    plt.setp(labels, rotation=90)
    # Velocities are stored in ft/msec; relabel the y-axis in ft/sec
    locs, labels = plt.yticks()
    plt.yticks(locs, ["%.1f" % x for x in locs * 1000])
    plt.ylabel('Mean Offensive Velocity (ft/sec)')
    plt.xlabel('')
    plt.title('Offensive Velocity')
    plt.savefig('examples/VelocityOffenseTeams')

    # Plot of defense velocity by team
    plt.figure()
    sns.barplot(x='Tm', y='Def', data=all_dat,
                order=ave.sort_values('Def').index,
                color=sns.xkcd_rgb["pale red"])
    plt.ylim(0.018, 0.024)
    locs, labels = plt.xticks()
    plt.setp(labels, rotation=90)
    locs, labels = plt.yticks()
    plt.yticks(locs, ["%.1f" % x for x in locs * 1000])
    plt.ylabel('Mean Defensive Velocity (ft/sec)')
    plt.xlabel('')
    plt.title('Defensive Velocity')
    plt.savefig('examples/VelocityDefenseTeams')


def fatigue_plots(df):
    """
    Makes plots showing game fatigue for SAS and IND

    Args:
        df (pd.DataFrame): dataframe of fatigue data
            Note: use extract_fatigue() to obtain this data

    Returns:
        None
        Saves plots to examples/
    """
    plt.figure()
    sns.swarmplot(x='variable', y='value',
                  data=df[(df.Pos == 'Off') & (df.Tm == 'IND')])
    plt.title('Indiana Pacers Fatigue')
    plt.xlabel('Quarter')
    plt.ylabel('Mean Offensive Velocity (ft/sec)')
    plt.ylim(0.015, 0.034)
    locs, labels = plt.yticks()
    plt.yticks(locs, ["%.1f" % x for x in locs * 1000])
    plt.savefig('examples/INDfatige')

    plt.figure()
    sns.swarmplot(x='variable', y='value',
                  data=df[(df.Pos == 'Off') & (df.Tm == 'SAS')])
    plt.title('San Antonio Spurs Fatigue')
    plt.xlabel('Quarter')
    plt.ylabel('Mean Offensive Velocity (ft/sec)')
    locs, labels = plt.yticks()
    plt.yticks(locs, ["%.1f" % x for x in locs * 1000])
    plt.savefig('examples/SASfatige')


def extract_scores(score_data):
    """
    Organizes score data from string to tuple

    Args:
        score_data (str): string of form 'AWAYSCORE - HOMESCORE'
            Example: '111 - 105'

    Returns:
        scores (tuple): tuple of form (away_score, home_score)
            where each score is an int
    """
    away_score = int(score_data.split('-')[0])
    home_score = int(score_data.split('-')[1])
    scores = (away_score, home_score)
    return scores


def set_plot_params(size):
    """
    Sets font size on plots. 16-22 is a good range.
    """
    SIZE = size
    plt.rc('font', size=SIZE)
    plt.rc('axes', titlesize=SIZE)
    plt.rc('axes', labelsize=SIZE)
    plt.rc('xtick', labelsize=SIZE)
    plt.rc('ytick', labelsize=SIZE)
    plt.rc('legend', fontsize=SIZE)


if __name__ == "__main__":
    set_plot_params(16)
    all_games = extract_games()
    write_velocity(all_games)
    velocity_data = extract_velocity(all_games)
    velocity_plots(velocity_data)
    fatigue_data = extract_fatigue(all_games)
    fatigue_plots(fatigue_data)
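

# ---------------------------------------------------------------------------
# Sketch (not part of the original analysis): calculate_velocities() measures
# speed as the distance each object moves between consecutive frames divided
# by the elapsed universe time in msec. The hypothetical helper below applies
# the same idea to a whole block of frames at once with NumPy. It assumes
# positions are arrays of shape (n_frames, n_objects) in feet and timestamps
# are universe times in msec. It is an illustration, not a drop-in
# replacement for calculate_velocities().
# ---------------------------------------------------------------------------
import numpy as np


def frame_speeds_sketch(x_pos, y_pos, universe_time):
    """
    Hypothetical helper: speeds (ft/msec) between consecutive frames.

    Args:
        x_pos (array-like): x positions in feet, shape (n_frames, n_objects)
        y_pos (array-like): y positions in feet, shape (n_frames, n_objects)
        universe_time (array-like): universe time of each frame in msec,
            shape (n_frames,)

    Returns:
        np.ndarray: shape (n_frames - 1, n_objects); row i is the distance
            traveled from frame i to frame i + 1 divided by the elapsed time
    """
    x_pos = np.asarray(x_pos, dtype=float)
    y_pos = np.asarray(y_pos, dtype=float)
    universe_time = np.asarray(universe_time, dtype=float)
    # Euclidean distance moved by each object between consecutive frames
    distance = np.hypot(np.diff(x_pos, axis=0), np.diff(y_pos, axis=0))
    # Elapsed time per step, as a column so it broadcasts across objects
    elapsed = np.diff(universe_time)[:, np.newaxis]
    return distance / elapsed


# Example usage (hypothetical data): an object that moves 3 ft in x and 4 ft
# in y over 400 msec travels 5 ft, so its speed is 5 / 400 = 0.0125 ft/msec
# (12.5 ft/sec):
#     frame_speeds_sketch([[0, 1], [3, 1]], [[0, 2], [4, 2]], [0, 400])
# Assuming the 11-column layout used by calculate_velocities() (ball in
# column 0), team totals would be speeds[:, 1:6].sum(axis=1) for the home
# team and speeds[:, 6:].sum(axis=1) for the away team.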