Finding & Creation of Datasets Regarding Team Substitution #258

Open
ahmadmunim opened this issue Oct 30, 2024 · 10 comments

@ahmadmunim

Task:
Find sources of team data and logs with regards to team substitution and whether the result of said substitution was a success or failure.

My plan:
The domain I plan on selecting for this task is basketball or hockey games.
I will check out public APIs, public datasets, research papers, and game logs for relevant data.

@hosseinfani hosseinfani added the enhancement New feature or request label Nov 2, 2024
@ahmadmunim
Author

@hosseinfani
Here is what I've found that we can potentially use:

Created a website to help visualize NBA players' rotation patterns for the current season (2023-24)

  • A website that visualizes player rotations in basketball games. It shows when each player is on or off the court at any point in the game: a grey bar means the player is playing, and a plain line means the player is not playing. The game clock runs along the bottom. It also plots the scoring margin (the green line) throughout the game, which shows which players were on the court during runs and momentum swings, good or bad. However, it covers only a limited number of games (around 30-40 per team). The developer of this platform used an open-source NBA API to get the data behind this visualization. I will look into the open-source NBA API I linked earlier.

image

NBA Games Box Scores and Play-by-play

  • Offers box scores and play-by-play logs of every game that has been played. The logs show all the substitutions that occurred in a game, as well as score updates and who scored. They also record negative events such as turnovers and fouls, along with the players responsible. We may need to use a web scraper to collect this data.

image

Is this a good start? Are there other things I need to search? Let me know what you think.

@hosseinfani
Member

Hi @ahmadmunim
These are very interesting. I think this is what we want. We could say that from t1 to t2, a set of players have changed, and the score went up (success), or down (fail).

The only concern I have is that this is not like soccer, where a substituted player cannot return to the field (a complete substitution). Here in basketball, a player can come and go multiple times, right? A temporary substitution?

@ahmadmunim
Author

@hosseinfani
Yes, a substitution in a basketball game isn't permanent. However, I believe the impact of a substitution in a basketball game is easier to track compared to a soccer game.

This is because scoring in a basketball game is more frequent, so it's easier to see a change in scoring that correlates with substitutions. In a soccer game, scoring is less frequent and the pace of the game is much slower, so it's harder to track the impact of a substitution. Often, even if a goal is scored after a substitution, it may not be because of the substituted player.

Using data from basketball games, I imagine it would be easier to train a model that analyses the impact of substitutions.

@hosseinfani
Member

@ahmadmunim Agree.
So, the next step will be obtaining the raw data as collections of games and the temporal events within each game, including the substitutions and (score, time) pairs.

After obtaining this collection, we can form the sets of players, teams (possibly subteams, i.e., the team up to time t), ... and do more preprocessing based on our team class.
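One way the "team up to time t" idea could be sketched is shown below. It assumes the starting lineup is known from some other source and that substitutions arrive in game order as (player_in, player_out) pairs; `lineups_over_time` and the placeholder player names are hypothetical and not part of the project's team class.

```python
from __future__ import annotations


def lineups_over_time(
    starting_lineup: list[str],
    substitutions: list[tuple[str, str]],
) -> list[frozenset[str]]:
    """Return the on-court set before any substitution and after each one.

    Assumption: `substitutions` is ordered in game time and each pair is
    (player_in, player_out).
    """
    on_court = set(starting_lineup)
    lineups = [frozenset(on_court)]
    for player_in, player_out in substitutions:
        if player_out not in on_court:
            raise ValueError(f"{player_out} is not on the court")
        on_court.remove(player_out)
        on_court.add(player_in)
        lineups.append(frozenset(on_court))
    return lineups


# Placeholder names, not real roster data.
lineups = lineups_over_time(
    ["P1", "P2", "P3", "P4", "P5"],
    [("P6", "P1"), ("P7", "P2")],
)
print(sorted(lineups[-1]))
```

Each frozenset is one subteam, so the same player leaving and re-entering simply produces a new entry in the list.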

@ahmadmunim
Author

@hosseinfani
I have found an open-source web scraper that can extract play-by-play events, among other basketball statistics, from basketball-reference.com.

I believe this is a good web scraper to use for this project mainly because the repository is still being updated and it's user-friendly.

Here is the documentation of the web scraper.

I played around with the scraper's features and got it to output a JSON file containing a list of events during a basketball game. Examples of such events include scoring, substitutions, turnovers, and missed shots. Each event also records which player, from which team, performed it.

The JSON's fields are the following:

        "away_score": number of points the away team has scored at this point,
        "away_team": name of the away team,
        "description": the description of the event that occurred, including which player performed it,
        "home_score": number of points the home team has scored at this point,
        "home_team": name of the home team,
        "period": which period the game is currently in (1, 2, 3, or 4),
        "period_type": type of period (quarter or overtime),
        "relevant_team": the team of the player who performed the event,
        "remaining_seconds_in_period": amount of time left in the period in seconds (a quarter starts at 720 seconds)

Below, I've attached a JSON file containing the play-by-play log of a basketball game:
2018_10_06_BOS_PBP.json
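As a minimal sketch, the fields listed above could be loaded into typed records like this. It assumes the file is a JSON array of objects with exactly those nine keys; `PlayByPlayEvent` and `parse_events` are hypothetical names, not part of the scraper.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayByPlayEvent:
    """One play-by-play event; attribute names mirror the JSON keys."""

    away_score: int
    away_team: str
    description: str
    home_score: int
    home_team: str
    period: int
    period_type: str
    relevant_team: str
    remaining_seconds_in_period: float


def parse_events(raw_json: str) -> list[PlayByPlayEvent]:
    """Parse a JSON array of event objects (assumed layout) into records."""
    return [PlayByPlayEvent(**item) for item in json.loads(raw_json)]


# A single event in the assumed layout, inlined so the sketch runs without the file.
sample = """[{
    "away_score": 21,
    "away_team": "PHILADELPHIA 76ERS",
    "description": "R. Covington enters the game for T. McConnell",
    "home_score": 30,
    "home_team": "BOSTON CELTICS",
    "period": 2,
    "period_type": "QUARTER",
    "relevant_team": "PHILADELPHIA 76ERS",
    "remaining_seconds_in_period": 566.0
}]"""
events = parse_events(sample)
print(events[0].description)
```

For the attached file, the same function could be fed the file's text, for example via `Path("2018_10_06_BOS_PBP.json").read_text()`. An object with a missing or unexpected key raises a `TypeError`, which surfaces layout changes early.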

I'm thinking about how many of these play-by-play logs we need. I think the more the merrier, right? Should I collect data for several NBA teams or just one team?

Let me know what you think.

@hosseinfani
Member

hosseinfani commented Nov 29, 2024

@ahmadmunim thank you!
I think we can start with one team and monitor its success (being ahead of the other team) and failure (being behind the other team) across different games.

But a quick question: I had a look at the file, and there seems to be no info about substitutions?!

@ahmadmunim
Author

@hosseinfani if you look at the description field, there are instances of substitutions.

The substitutions are phrased as "Player A enters the game for Player B".
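A minimal sketch of how that phrasing could be matched with a regular expression follows. It assumes every substitution description follows exactly this wording; `SUBSTITUTION_PATTERN` and `parse_substitution` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed wording: "<player in> enters the game for <player out>"
SUBSTITUTION_PATTERN = re.compile(
    r"^(?P<player_in>.+?) enters the game for (?P<player_out>.+)$"
)


def parse_substitution(description: str) -> Optional[Tuple[str, str]]:
    """Return (player_in, player_out), or None if this is not a substitution."""
    match = SUBSTITUTION_PATTERN.match(description.strip())
    if match is None:
        return None
    return match.group("player_in"), match.group("player_out")


print(parse_substitution("R. Covington enters the game for T. McConnell"))
print(parse_substitution("M. Fultz makes 2-pt jump shot from 13 ft"))
```

Returning `None` for non-substitution events lets the same function double as a filter over the whole log.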

@hosseinfani
Member

hosseinfani commented Nov 29, 2024

@ahmadmunim
Yes, I see now. Basically, all the events of each period are logged. I am assuming that remaining_seconds_in_period never increases within a period (ordered events). Nice.
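The ordering assumption could be checked with a short sketch like the one below. It assumes events are dicts with the field names from the log; `is_chronological` is a hypothetical helper. Several events may share the same clock value, so the check only requires that the remaining time never goes up within a period.

```python
from itertools import groupby


def is_chronological(events):
    """Return True if remaining time never increases within each period."""
    by_period = groupby(events, key=lambda e: (e["period_type"], e["period"]))
    for _, group in by_period:
        remaining = [e["remaining_seconds_in_period"] for e in group]
        for earlier, later in zip(remaining, remaining[1:]):
            if later > earlier:
                return False
    return True


# Clock values for three consecutive second-quarter events (same instant twice).
ordered = [
    {"period_type": "QUARTER", "period": 2, "remaining_seconds_in_period": 566.0},
    {"period_type": "QUARTER", "period": 2, "remaining_seconds_in_period": 566.0},
    {"period_type": "QUARTER", "period": 2, "remaining_seconds_in_period": 544.0},
]
print(is_chronological(ordered))
```

`groupby` only groups consecutive events, which is what is wanted here: a log that jumps back to an earlier period would start a new group rather than hide the problem inside one.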

Can you do another preprocessing step to extract such substitutions and label whether each substitution was a success or failure by analyzing the home/away team scores?

Like this

{
    "away_score": 21,
    "away_team": "PHILADELPHIA 76ERS",
    "description": "R. Covington enters the game for T. McConnell",
    "home_score": 30,
    "home_team": "BOSTON CELTICS",
    "period": 2,
    "period_type": "QUARTER",
    "relevant_team": "PHILADELPHIA 76ERS",
    "remaining_seconds_in_period": 566.0
},
{
    "away_score": 21,
    "away_team": "PHILADELPHIA 76ERS",
    "description": "L. Shamet enters the game for J. Redick",
    "home_score": 30,
    "home_team": "BOSTON CELTICS",
    "period": 2,
    "period_type": "QUARTER",
    "relevant_team": "PHILADELPHIA 76ERS",
    "remaining_seconds_in_period": 566.0
},
{
    "away_score": 23,
    "away_team": "PHILADELPHIA 76ERS",
    "description": "M. Fultz makes 2-pt jump shot from 13 ft",
    "home_score": 30,
    "home_team": "BOSTON CELTICS",
    "period": 2,
    "period_type": "QUARTER",
    "relevant_team": "PHILADELPHIA 76ERS",
    "remaining_seconds_in_period": 544.0
},
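The labelling asked for above might be sketched as follows. The window choice is an assumption, not an agreed definition: each substitution is judged by how the substituting team's margin changes from the substitution until just before the next substitution at a different clock time, or until the end of the log. The "neutral" label and the names `team_margin` and `label_substitutions` are also hypothetical.

```python
SUB_MARKER = " enters the game for "


def clock(event):
    """Identify the instant at which an event happened."""
    return (event["period_type"], event["period"], event["remaining_seconds_in_period"])


def team_margin(event, team):
    """Score margin from `team`'s point of view (positive means ahead)."""
    home_lead = event["home_score"] - event["away_score"]
    return home_lead if team == event["home_team"] else -home_lead


def label_substitutions(events):
    """Return (description, margin_change, outcome) for each substitution."""
    labels = []
    for index, event in enumerate(events):
        if SUB_MARKER not in event["description"]:
            continue
        team = event["relevant_team"]
        window_end = event
        for later in events[index + 1:]:
            # Substitutions at the same instant belong to the same lineup change.
            if SUB_MARKER in later["description"] and clock(later) != clock(event):
                break
            window_end = later
        change = team_margin(window_end, team) - team_margin(event, team)
        outcome = "success" if change > 0 else "failure" if change < 0 else "neutral"
        labels.append((event["description"], change, outcome))
    return labels


def make_event(away_score, description, remaining):
    """Build an event dict in the layout of the excerpt above."""
    return {
        "away_score": away_score, "away_team": "PHILADELPHIA 76ERS",
        "description": description,
        "home_score": 30, "home_team": "BOSTON CELTICS",
        "period": 2, "period_type": "QUARTER",
        "relevant_team": "PHILADELPHIA 76ERS",
        "remaining_seconds_in_period": remaining,
    }


excerpt = [
    make_event(21, "R. Covington enters the game for T. McConnell", 566.0),
    make_event(21, "L. Shamet enters the game for J. Redick", 566.0),
    make_event(23, "M. Fultz makes 2-pt jump shot from 13 ft", 544.0),
]
for label in label_substitutions(excerpt):
    print(label)
```

On this three-event excerpt, both substitutions see the away team's margin move from -9 to -7, so both are labelled "success". A fixed time horizon (for example, the next two minutes of play) would be an equally defensible window.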

But one thing is not clear: which team does each player belong to, home or away?

@ahmadmunim
Author

@hosseinfani
For every event, you have the "relevant_team" field whose value is the team of the player who performed the event. The team can either be the home team or the away team.
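As a sketch of how this field could be used, the snippet below groups the players named in substitution events by team. It assumes both players in a substitution belong to the event's "relevant_team" and that descriptions use the "enters the game for" wording; `rosters_from_substitutions` is a hypothetical helper.

```python
from collections import defaultdict

SUB_MARKER = " enters the game for "


def rosters_from_substitutions(events):
    """Map each team name to the set of players seen in its substitutions."""
    rosters = defaultdict(set)
    for event in events:
        player_in, marker, player_out = event["description"].partition(SUB_MARKER)
        if not marker:
            continue  # not a substitution event
        rosters[event["relevant_team"]].update({player_in.strip(), player_out.strip()})
    return dict(rosters)


sample_events = [
    {"description": "R. Covington enters the game for T. McConnell",
     "relevant_team": "PHILADELPHIA 76ERS"},
    {"description": "L. Shamet enters the game for J. Redick",
     "relevant_team": "PHILADELPHIA 76ERS"},
    {"description": "M. Fultz makes 2-pt jump shot from 13 ft",
     "relevant_team": "PHILADELPHIA 76ERS"},
]
print(rosters_from_substitutions(sample_events))
```

Comparing each team name against the event's "home_team" and "away_team" values then tells whether that roster is the home or the away side.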

@ahmadmunim
Author

@hosseinfani

I have expanded on my script. It can now extract substitution events from the JSON object of game logs that I showed in a previous comment.

I also found something that can be helpful. There is an existing statistic known as plus/minus which measures a player's impact on a game by tracking the change in score when said player is on the court. The higher the player's plus/minus, the greater the player's impact.

However, a player's plus/minus can be affected by other aspects of their game such as passing and rebounding as well as the performance of their teammates on the court with them. So this statistic isn't based solely on the player's ability to score.

But I believe this statistic aligns with the intended outcome of my current task.
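To make the definition concrete, here is a minimal sketch of plus/minus computed from stints. It assumes the game has already been split into stints of the form (players on court, points scored by their team, points scored by the opponent); the `plus_minus` function, the stint layout, and the placeholder names are hypothetical.

```python
from collections import defaultdict


def plus_minus(stints):
    """Sum the score differential over every stint a player was on the court."""
    totals = defaultdict(int)
    for on_court, team_points, opponent_points in stints:
        for player in on_court:
            totals[player] += team_points - opponent_points
    return dict(totals)


# Placeholder players and scores, not real game data.
stints = [
    ({"A", "B"}, 6, 2),  # team outscored the opponent by 4 with A and B playing
    ({"A", "C"}, 0, 5),  # team was outscored by 5 with A and C playing
]
print(plus_minus(stints))
```

Every player on the court during a stint receives the same differential, which is why the statistic also reflects teammates' performance and not only the player's own scoring.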

What do you think?
