NBA MVPredictor
🏀

Tags: Data Science, Jupyter, GitHub, Projects, Machine Learning

Project Overview

Motivation:

This project was completed as part of my Introduction to Data Science course, where we created a tutorial that walks users through the entire data science pipeline: data curation, parsing, and management; exploratory data analysis; model building through hypothesis testing, machine learning, or both; and a closing message covering the insights learned along the way. My project analyzes NBA statistics to predict the league's Most Valuable Player (MVP) and to understand which factors most influence the award.

Important Links:

Class Project Website: Final Tutorial Project Page
Final Portfolio (GitHub Pages): GitHub Pages Website
GitHub Repository: GitHub Repository

Project Idea and Goals:

For my project, I will be analyzing datasets containing NBA statistics from many seasons. I hope that, through ETL, EDA, and model building, I will be able to predict who will be the NBA's Most Valuable Player and perhaps determine which feature or set of features carries the most weight in determining the MVP. I will be coding in Python on Google Colaboratory and uploading my work here on GitHub. Some of the tools and libraries that will be used include Pandas, NumPy, SQL, Seaborn, and more.

Project Workflow:

  1. ETL (Extract, Transform, Load):
      • Gathered NBA statistics from publicly available datasets.
      • Cleaned and transformed the data to ensure consistency and accuracy.
      • Dealt with missing values and standardized metrics for analysis.
  2. Exploratory Data Analysis (EDA):
      • Visualized trends in player performance over multiple seasons.
      • Analyzed correlations between various player statistics and MVP winners.
      • Highlighted key patterns using Python libraries like Seaborn and Matplotlib.
  3. Model Building:
      • Experimented with machine learning models to predict the MVP.
      • Conducted feature selection to identify the most influential attributes.
      • Evaluated model performance using metrics like accuracy, precision, recall, and F1-score.
  4. Findings:
      • Rebounding metrics, scoring efficiency, and advanced stats like VORP (Value Over Replacement Player) were the most influential predictors.
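
To make the ETL step in the workflow more concrete, here is a minimal sketch of one common way to handle missing values and standardize a statistic. It is not the project's actual code: the rows, the column names (pts, trb), and the helper functions impute_median and standardize are all invented for illustration, and median imputation followed by z-scoring is only one of several reasonable choices.

```python
from statistics import mean, median, pstdev

# Hypothetical rows: players, columns, and values are invented for illustration.
rows = [
    {"player": "A", "pts": 30.0, "trb": 11.0},
    {"player": "B", "pts": 24.0, "trb": None},  # missing rebound value
    {"player": "C", "pts": 18.0, "trb": 5.0},
    {"player": "D", "pts": 12.0, "trb": 8.0},
]


def impute_median(rows, col):
    """Replace missing values in `col` with the median of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = median(observed)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


def standardize(rows, col):
    """Rescale `col` to z-scores (mean 0, population standard deviation 1)."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, col: (r[col] - mu) / sigma} for r in rows]


# Fill the gap first, then put both statistics on a comparable scale.
clean = impute_median(rows, "trb")
for col in ("pts", "trb"):
    clean = standardize(clean, col)

for r in clean:
    print(r["player"], round(r["pts"], 2), round(r["trb"], 2))
```

Standardizing puts statistics with different ranges (points versus rebounds, for example) on the same scale, which makes correlations and model coefficients easier to compare. With Pandas, the same idea is usually expressed with DataFrame.fillna and column arithmetic.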

Project Review

Skills Gained:

  • Data Wrangling: Cleaning and transforming complex datasets.
  • Statistical Analysis: Identifying trends and drawing meaningful conclusions.
  • Machine Learning: Building and evaluating predictive models.
  • Visualization: Creating compelling graphics to communicate insights.
  • Technical Communication: Documenting and presenting findings effectively.

Reflection:

Completing this project reinforced the importance of data quality and thoughtful feature engineering. It also highlighted the challenges of building models that generalize well. I now better appreciate the iterative nature of data science projects and the need to balance technical precision with storytelling when communicating findings.
 

Gallery:

Model | Accuracy | ROC AUC | Precision (Non-MVP) | Recall (Non-MVP) | F1 (Non-MVP) | Precision (MVP) | Recall (MVP) | F1 (MVP)
--- | --- | --- | --- | --- | --- | --- | --- | ---
Base Model | 97.97% | 0.964811 | 0.98 | 1.00 | 0.99 | 0.56 | 0.26 | 0.36
Feature-Engineered Model | 98.09% | 0.956272 | 0.98 | 1.00 | 0.99 | 0.62 | 0.26 | 0.37
Tuned Model | 94.03% | 0.994489 | 1.00 | 0.94 | 0.97 | 0.26 | 0.95 | 0.40
New Model | 98.99% | 0.993519 | 0.99 | 1.00 | 0.99 | 0.86 | 0.63 | 0.73
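
The metrics reported above show accuracy staying high even when recall on the MVP class is low. The sketch below illustrates how that can happen when one class is rare. The confusion-matrix counts are hypothetical, chosen only to show the effect, and are not taken from the project's data; binary_metrics and f1_from are illustrative helper names, not functions from the project or from any library.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy plus positive-class precision, recall, and F1."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts: 1000 player-seasons, only 20 in the positive (MVP) class.
m = binary_metrics(tp=5, fp=4, fn=15, tn=976)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Consistency check against the reported New Model row (MVP precision 0.86, recall 0.63).
print(round(f1_from(0.86, 0.63), 2))
```

Because the negative class dominates, a model can miss most positives and still score above 98% accuracy, which is why the per-class precision, recall, and F1 columns are the more informative ones when comparing these models.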