Part 1 of this article describes how to retrieve the data to create NBA shot charts. If you’re not interested in that, you can download the dataset I used from data.world and skip ahead to Part 2.
Download dataset


If you’re a fan of the NBA and of data analysis and visualization, chances are you’ve come across a few of Kirk Goldsberry’s beautiful and informative shot chart visualizations. They don’t always tell the whole story (what single chart does, especially in a sport like basketball with so many moving variables?), but given enough data points they can tell you a lot about a player’s or team’s shot selection and efficiency.

You can find his work on his Instagram page: https://www.instagram.com/kirkgoldsberry/.

So as a fun exercise, I decided to recreate a similar chart in Tableau for the 2018-19 regular season. The result is shown in the image below. Click on it to access the interactive dashboard on Tableau Public.

For now, you can filter on any of the 25 players who attempted the most shots during the season, but the underlying data set contains all the shots taken by every single player during the 2018-19 regular season. It also contains the coordinates of where every single shot was taken, and whether the player made or missed the shot.

stats.nba.com API

So where do we actually get this data? As it turns out, it is freely available on stats.nba.com. If you’ve ever visited the website, you’ll know that there is a ton of statistics and features available. What is not obvious is that there is also an extensive (but undocumented) API from which you can easily extract all that data.

So much sweet, sweet information! But how to retrieve it?

Note: stats.nba.com doesn’t work for me when I have my VPN turned on (I’m using NordVPN). This applies both to the website and to retrieving data via the API.

I can’t take credit for figuring this out all by myself. After a bit of Googling, I came across the nba_api Python package, which (in the developer’s own words) is “an API Client for www.nba.com, and is meant to make the API Endpoints more accessible and to provide extensive documentation”.

You can access the code and docs on GitHub. For this exercise, we are primarily interested in the shotchartdetail endpoint (documentation). Alternatively, if you prefer to work with R, there is an R package called ballr, but it only provides access to the shotchartdetail endpoint.

Retrieve the JSON: Using the nba_api package

Let’s first see how we can use the nba_api package to retrieve data. Once you’ve installed the package, you can use the example code below to retrieve all the shots taken by James Harden during the 2018-19 regular season.

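import json

from nba_api.stats.endpoints import shotchartdetail

response = shotchartdetail.ShotChartDetail(
    team_id=0,
    player_id=201935,
    # request all field goal attempts; the default 'PTS' appears to return made shots only
    context_measure_simple='FGA',
    season_nullable='2018-19',
    season_type_all_star='Regular Season'
)

# get_json() returns a JSON string, so parse it into a dictionary
content = json.loads(response.get_json())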

The nba_api package provides additional functions to look up a player_id (please refer to their documentation on GitHub). Another way to find a player_id is to visit the relevant player’s page on stats.nba.com and check the URL. For example, the URL to James Harden’s player page is https://stats.nba.com/player/201935/. To retrieve the data for all players at once, use 0 as the player_id.
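
If you collect player page URLs rather than IDs, the URL approach can be scripted. This is only a minimal sketch using the standard library: the helper name player_id_from_url is my own invention, and it assumes the URL follows the /player/<id>/ pattern shown above.

```python
import re
from urllib.parse import urlparse


def player_id_from_url(url):
    """Return the numeric player ID from a stats.nba.com player page URL.

    Hypothetical helper: assumes the path looks like /player/<digits>/.
    """
    path = urlparse(url).path
    match = re.search(r'/player/(\d+)', path)
    if match is None:
        raise ValueError(f'No player ID found in URL: {url}')
    return int(match.group(1))


harden_id = player_id_from_url('https://stats.nba.com/player/201935/')
print(harden_id)
```

The returned integer can then be passed as player_id (nba_api) or PlayerID (raw request).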

Retrieve the JSON: Using standard Python

While it’s perfectly fine to use the nba_api package to retrieve the data, it can be informative to see how the same request works without it, using only the requests and json packages. The code below should return the same results as the one above:

import requests
import json

url_base = 'https://stats.nba.com/stats/shotchartdetail'

headers = {
		'Host': 'stats.nba.com',
		'Connection': 'keep-alive',
		'Accept': 'application/json, text/plain, */*',
		'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
		'Referer': 'https://stats.nba.com/',
		"x-nba-stats-origin": "stats",
		"x-nba-stats-token": "true",
		'Accept-Encoding': 'gzip, deflate, br',
		'Accept-Language': 'en-US,en;q=0.9',
	}

parameters = {
	'ContextMeasure': 'FGA',
	'LastNGames': 0,
	'LeagueID': '00',
	'Month': 0,
	'OpponentTeamID': 0,
	'Period': 0,
	'PlayerID': 201935,
	'SeasonType': 'Regular Season',
	'TeamID': 0,
	'VsDivision': '',
	'VsConference': '',
	'SeasonSegment': '',
	'Season': '2018-19',
	'RookieYear': '',
	'PlayerPosition': '',
	'Outcome': '',
	'Location': '',
	'GameSegment': '',
	'GameId': '',
	'DateTo': '',
	'DateFrom': ''
}


response = requests.get(url_base, params=parameters, headers=headers)
content = json.loads(response.content)

To further break it down:

First we set the URL https://stats.nba.com/stats/shotchartdetail as our base URL. shotchartdetail is the endpoint where we can retrieve shot location data.

url_base = 'https://stats.nba.com/stats/shotchartdetail'

Next we set the request headers via the headers dictionary. Without the correct settings, stats.nba.com will forcibly close the connection, as it seems they are trying to prevent people from scraping the data. For example, the User-Agent setting makes it seem like you’re accessing the URL via a web browser.

headers = {
		'Host': 'stats.nba.com',
		'Connection': 'keep-alive',
		'Accept': 'application/json, text/plain, */*',
		'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
		'Referer': 'https://stats.nba.com/',
		"x-nba-stats-origin": "stats",
		"x-nba-stats-token": "true",
		'Accept-Encoding': 'gzip, deflate, br',
		'Accept-Language': 'en-US,en;q=0.9',
	}

Be aware that the required headers are regularly changed by nba.com. If that happens, we won’t be able to connect until we update the headers. It is possible that the NBA will close off access to this data at some point, so enjoy it while you can.
This issue also applies to the nba_api package. For instance, as of Friday January 24 2020, the nba_api will fail due to x-nba-stats-origin and x-nba-stats-token headers not being set, though this was scheduled to be fixed during the subsequent weekend.
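
When the request is rejected, the body that comes back is often empty or an HTML page rather than JSON, and json.loads then fails with a fairly cryptic "Expecting value: line 1 column 1 (char 0)". A small wrapper can make that failure easier to read. This is a sketch only: parse_stats_body is a hypothetical helper of mine, not part of requests or nba_api, and it assumes you pass it the raw response body (for example response.content).

```python
import json


def parse_stats_body(body):
    """Decode a response body as JSON, with a clearer error if it is not JSON.

    Hypothetical helper: `body` is the raw bytes or text of the response.
    """
    if isinstance(body, bytes):
        body = body.decode('utf-8', errors='replace')
    if not body.strip():
        raise ValueError('Empty response body; the request was probably rejected.')
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # Show the start of the body: it is often an HTML error page.
        raise ValueError(f'Response is not JSON, starts with: {body[:80]!r}') from err


print(parse_stats_body(b'{"resultSets": []}'))
```

If this raises, checking the headers against what your browser currently sends to stats.nba.com is a reasonable first step.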

We then need to set the URL parameters via the parameters dictionary. There are more parameter options than the ones listed here, but these are the ones that are specified as being ‘required’.

parameters = {
	'ContextMeasure': 'FGA',
	'LastNGames': 0,
	'LeagueID': '00',
	'Month': 0,
	'OpponentTeamID': 0,
	'Period': 0,
	'PlayerID': 201935,
	'SeasonType': 'Regular Season',
	'TeamID': 0,
	'VsDivision': '',
	'VsConference': '',
	'SeasonSegment': '',
	'Season': '2018-19',
	'RookieYear': '',
	'PlayerPosition': '',
	'Outcome': '',
	'Location': '',
	'GameSegment': '',
	'GameId': '',
	'DateTo': '',
	'DateFrom': ''
}

As you can see, you can usually pass either 0 or an empty string if you do not want to filter on a category. Which one to use depends on the data type of the field and on how it is defined by stats.nba.com.
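
To see what these parameters turn into, you can build the query string yourself. The sketch below uses urlencode from the standard library with a shortened, illustrative copy of the dictionary (sample_parameters is my own name); requests builds a query string along the same lines when you pass params.

```python
from urllib.parse import urlencode

url_base = 'https://stats.nba.com/stats/shotchartdetail'

# A shortened version of the parameters dictionary, for illustration only.
sample_parameters = {
    'PlayerID': 201935,
    'Season': '2018-19',
    'SeasonType': 'Regular Season',
    'TeamID': 0,
    'Outcome': '',
}

query_string = urlencode(sample_parameters)
print(f'{url_base}?{query_string}')
```

Note how the empty string still shows up as a key with no value (Outcome=), while the 0 is sent as a literal 0.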

After providing the correct settings, we can then retrieve the data in JSON format:

response = requests.get(url_base, params=parameters, headers=headers)
content = json.loads(response.content)

Transform the JSON into a Pandas dataframe

At this point, I would usually export the JSON into a file and load it directly in Tableau or into a database capable of parsing JSON (e.g. Snowflake, Postgres). However, the data here doesn’t really make use of key-value pairs, like you’d expect from a JSON file. Instead, all the data (and headers) are stored as lists within the resultSets key. Fortunately, it is pretty straightforward to insert the data into a Pandas dataframe and export it to a csv file.

import pandas as pd

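# transform contents into dataframe
results = content['resultSets'][0]
columns = results['headers']  # column names; named 'columns' so the request headers dict is not overwritten
rows = results['rowSet']      # one list of values per shot
df = pd.DataFrame(rows, columns=columns)

# write to csv file (example file name, change it to whatever you like)
df.to_csv('nba_shotchartdetail_2018-19.csv', index=False)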
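
To make the resultSets layout a bit more concrete, here is a tiny made-up payload in the same shape, converted without Pandas. LOC_X and LOC_Y are real column names from the response; the other column names and all the values are assumptions for illustration.

```python
# A tiny, made-up payload in the same layout as the real response.
sample_content = {
    'resultSets': [
        {
            'headers': ['PLAYER_NAME', 'LOC_X', 'LOC_Y', 'SHOT_MADE_FLAG'],
            'rowSet': [
                ['James Harden', -12, 25, 1],
                ['James Harden', 230, 40, 0],
            ],
        }
    ]
}

result = sample_content['resultSets'][0]
# Pair each value in a row with the column name at the same position.
records = [dict(zip(result['headers'], row)) for row in result['rowSet']]
made = sum(record['SHOT_MADE_FLAG'] for record in records)

print(records[0])
print(f'{made} of {len(records)} shots made')
```

The headers list and each row in rowSet line up by position, which is why a plain zip is enough to rebuild key-value pairs.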

There you have it. You can now easily change the parameters to retrieve data from a different player, or a different season, or from the playoffs. And this is just one of the many endpoints in the stats.nba.com API, so there is plenty of data left to explore!
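
Changing the player, season or season type can be sketched as below. Both helpers (season_string and with_overrides) are hypothetical names of mine. The season format follows the '2018-19' pattern used above for NBA seasons, and 'Playoffs' as a SeasonType value is an assumption that you should check against the nba_api documentation.

```python
def season_string(start_year):
    """Format an NBA season for the Season parameter, e.g. 2018 -> '2018-19'."""
    end_year = (start_year + 1) % 100
    return f'{start_year}-{end_year:02d}'


def with_overrides(base_parameters, **overrides):
    """Return a copy of the parameters with some values replaced."""
    unknown = set(overrides) - set(base_parameters)
    if unknown:
        # Catch typos such as 'Playerid' before they reach the API.
        raise KeyError(f'Unknown parameter(s): {sorted(unknown)}')
    return {**base_parameters, **overrides}


# Shortened parameters dictionary, for illustration only.
base = {'PlayerID': 201935, 'Season': '2018-19', 'SeasonType': 'Regular Season'}

# All players, 2000-01 season; 'Playoffs' is an assumed SeasonType value.
playoffs_2001 = with_overrides(
    base, PlayerID=0, Season=season_string(2000), SeasonType='Playoffs'
)
print(playoffs_2001)
```

Because with_overrides returns a copy, the original dictionary stays untouched and can be reused for the next request.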

Checking the Data in Tableau

Before we wrap up Part 1, let’s take a brief look at the data in Tableau.

  • Add [LOC_X] to the Columns shelf.
  • Add [LOC_Y] to the Rows shelf.
  • Add [GAME_EVENT_ID] and [GAME_ID] to the Detail shelf.

Awesome! These are all the shots that were attempted during the 2018-19 regular season. We can now move on to Part 2 and see how we can build shot charts in Tableau like the ones made by Kirk Goldsberry.

14 thoughts on “NBA Shot Charts Part 1: Getting the Data (Python)”

    1. Hi Lars, sorry for the very late reply!

      I did some research, and to retrieve data for the WNBA, you can set the LeagueID parameter to ’10’ instead of ’00’. Note that the format for the Season parameter is different from the NBA. For WNBA, the seasons are named ‘2019’ or ‘2020’, instead of ‘2018-19’ or ‘2019-20’, so you’ll need to change the parameters accordingly.

  1. So I am trying to get the 2001 data. How do you manage to get to that year? I have been trying to do that based on the instructions that you mentioned above?

    1. To get data for a specific season, you would need to adjust the Season parameter. Use the value ‘2000-01’ if you want to retrieve the data from the 2000/2001 season.

  2. Great information Daniel. I did notice that when I loaded in the same data for James Harden from 2018-2019, I only received made shots. Therefore when I attempted to look at his efficiency from certain parts of the court, I didn’t have enough information. Do you know how to fix this within the API call?

    1. Hi Jeff,

      As a reference, check the NBA API source code and documentation on Github: https://github.com/swar/nba_api.

      You’ll have to change the setting for `context_measure_simple` when making the API call, i.e.:

      response = shotchartdetail.ShotChartDetail (
      context_measure_simple='FGA',
      team_id=0,
      player_id=0,
      season_nullable='2001-02',
      season_type_all_star='Regular Season'
      )

      The default is set to ‘PTS’, so I assume that would only include the made shots.

  3. Hi
    When I follow your steps, I only get data about made shots, not about missed ones like in your data example. Is there something I’m doing wrong, or is it no longer possible to get data on missed shots? My code in Python looks like this:

    from nba_api.stats.endpoints import shotchartdetail
    import json
    import pandas as pd

    response = shotchartdetail.ShotChartDetail(
    team_id=0,
    player_id=0,
    season_nullable='2001-02',
    season_type_all_star='Regular Season'
    )

    content = json.loads(response.get_json())

    results = content['resultSets'][0]
    headers = results['headers']
    rows = results['rowSet']
    df = pd.DataFrame(rows)
    df.columns = headers

    # write to csv file
    df.to_csv(r'C:\Users\Comp\Desktop\nba_2001-02.csv', index=False)

    1. Hi Tom,

      As a reference, check the NBA API source code and documentation on Github: https://github.com/swar/nba_api.

      You’ll have to change the setting for `context_measure_simple` when making the API call, i.e.:

      response = shotchartdetail.ShotChartDetail (
      context_measure_simple='FGA',
      team_id=0,
      player_id=0,
      season_nullable='2001-02',
      season_type_all_star='Regular Season'
      )

      The default is set to ‘PTS’, so I assume that would only include the made shots.

      1. Thank you for your answer. If I may ask one more question: when I wrote my first question it worked fine, but now I get this error:
        json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
        Does this mean that, as you point out in the article, the NBA has blocked or changed something and it is not possible for now? This is the first time I’ve used any API, so sorry if these questions have an obvious answer.

  4. Hi Daniel!
    I see you provided a data set available for download for the 2018-2019 season called “nba_shotchartdetail_2018-19.csv”. Would you be able to provide the data for 2008-2009 season in the same .csv format? I am having a lot of trouble downloading the nba API through python.
    Thank you!
