Pierre Counathe | NBA Shot Success Probability

NBA Shot Success Probability

Motivation & Objective

This project builds a visualization tool for basketball games that displays the Shot Success Probability at every moment: the probability that the shot goes in if the current ball carrier takes it. The tool is a first step toward in-game decision analysis, since it can be used to assess the choice between passing and shooting: is the shot success probability higher before or after the pass? The data used is described below; more detailed datasets would allow a richer tool that analyzes the Expected Point Value of each possible action (pass, shot, layup, drive, etc.).

Data

Two distinct datasets are used:

A shot logs file is used to build the Shot Success Probability model. It can be found on Kaggle. It contains data on shots taken during the 2014-2015 season: who took the shot, where on the court it was taken, where the closest defender was, and more.
A second dataset, containing the in-game positions of the players on the court, is used to apply the Shot Success Probability model. The game data can be found on this GitHub repository. It contains raw SportVU game data from the 2016 season.

Approach

This project has two main components:

The Shot Success Probability model
Its application to game data and visualization

Shot Success Probability model

The Shot Success Probability model is a simple logistic regression based on the features available in the shot logs file. For each shot of the 2014-2015 season, this file contains the distance from the ball carrier to the closest defender, the distance from the ball carrier to the basket, and a label indicating whether the shot was successful. With the trained model, the Shot Success Probability can be computed for any potential shot in a game, as long as the distances from the ball carrier to the closest defender and to the basket can be computed.
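As an illustration of the kind of model described above, here is a minimal sketch of a two-feature logistic regression fitted by plain gradient descent, using only the Python standard library. The toy shots, the function names, the learning rate and the number of epochs are all made up for the example; they are not the project's data or code, and a real fit would use the full shot logs file.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.001, epochs=5000):
    """Fit weights [bias, w_1, ..., w_k] by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def shot_success_probability(w, dist_to_basket, dist_to_defender):
    """Predicted probability that a shot taken from this situation goes in."""
    return sigmoid(w[0] + w[1] * dist_to_basket + w[2] * dist_to_defender)

# Hypothetical toy shots: (distance to basket, distance to closest defender), in feet.
X = [(2, 5), (3, 4), (5, 6), (8, 3), (18, 6), (12, 5), (22, 4),
     (24, 2), (22, 3), (25, 1), (20, 2), (15, 1), (6, 0.5), (4, 5)]
# 1 = made, 0 = missed (invented labels, for illustration only).
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

w = fit_logistic(X, y)
p_open = shot_success_probability(w, 3.0, 6.0)        # close to the basket, defender far away
p_contested = shot_success_probability(w, 24.0, 1.0)  # far from the basket, defender close
```

In practice a library implementation such as scikit-learn's LogisticRegression would normally replace the hand-written fitting loop; the prediction step stays the same: a weighted sum of the two distances passed through the logistic function.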

Application to Game Data and Visualization

From the game dataset containing the players' positions, it is straightforward to compute the distance from each player to the ball and to determine who the ball carrier is. However, assigning the ball-carrier flag to the player closest to the ball leads to errors when a pass travels past a defender or a teammate who never actually touches the ball. One workaround is to assign the ball-carrier flag based on multiple frames.
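The multi-frame idea can be sketched as follows. The exact rule used in the project is not specified, so this example assumes one plausible version: a player only becomes the ball carrier after being the closest player to the ball for `min_frames` consecutive frames. The frame layout (a ball position plus a dictionary of player positions) and all names are hypothetical.

```python
import math

def closest_player(ball_xy, players_xy):
    """Return the id of the player nearest to the ball in a single frame."""
    return min(players_xy, key=lambda pid: math.dist(ball_xy, players_xy[pid]))

def assign_ball_carrier(frames, min_frames=3):
    """Per-frame ball carrier, switching only after `min_frames` consecutive
    frames in which the same new player is closest to the ball (assumed rule)."""
    carrier = None
    candidate, streak = None, 0
    carriers = []
    for ball_xy, players_xy in frames:
        nearest = closest_player(ball_xy, players_xy)
        if nearest == candidate:
            streak += 1
        else:
            candidate, streak = nearest, 1
        if streak >= min_frames:
            carrier = candidate
        carriers.append(carrier)
    return carriers

# Hypothetical pass from A to C that flies past B, who never touches the ball.
players = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (20.0, 0.0)}
ball_x = [0.0, 0.0, 0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 20.0]
frames = [((x, 0.0), players) for x in ball_x]

carriers = assign_ball_carrier(frames, min_frames=3)
```

With these made-up positions, B is the closest player for two frames while the ball flies by, but never for three in a row, so B is never flagged as the carrier. The price of this rule is a short delay before a real reception is recognized, so `min_frames` would need tuning against the frame rate of the tracking data.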

Once the ball carrier is known, it is straightforward to compute the distance from the ball carrier to the basket and to the closest defender, to flag passes, to apply the Shot Success Probability model, and to build a visualization of the result.
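A minimal sketch of the feature computation and pass flagging is given below. It assumes court coordinates in feet with the basket at (5.25, 25.0), and it assumes a pass is a change of ball carrier between two players of the same team; both are illustrative assumptions, as are the function names and the `team_of` mapping.

```python
import math

# Assumed hoop position in feet on a 94 x 50 court; adjust to the dataset's convention.
BASKET_XY = (5.25, 25.0)

def shot_features(carrier_xy, defenders_xy, basket_xy=BASKET_XY):
    """Return (distance to basket, distance to closest defender) for the ball carrier."""
    dist_to_basket = math.dist(carrier_xy, basket_xy)
    dist_to_defender = min(math.dist(carrier_xy, d) for d in defenders_xy)
    return dist_to_basket, dist_to_defender

def flag_passes(carriers, team_of):
    """Flag frames where the ball carrier changes to a teammate (assumed definition of a pass)."""
    flags = []
    previous = None
    for current in carriers:
        is_pass = (
            previous is not None
            and current is not None
            and current != previous
            and team_of[current] == team_of[previous]
        )
        flags.append(is_pass)
        if current is not None:
            previous = current
    return flags

# Hypothetical frame: carrier 5 ft from the basket, nearest defender 6 ft away.
features = shot_features((8.25, 29.0), [(8.25, 35.0), (20.0, 29.0)])

# Hypothetical carrier sequence: A passes to teammate B, then X (other team) gets the ball.
team_of = {"A": "home", "B": "home", "X": "away"}
pass_flags = flag_passes(["A", "A", "B", "B", "X", "X"], team_of)
```

The two distances returned by `shot_features` are the inputs the Shot Success Probability model expects, so comparing the model's output on the frames just before and just after a flagged pass gives the before/after comparison described in the motivation.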