NBA ML Pipeline — Predict Player Performance

September 2, 2025 · NBA, Sports Analytics, Machine Learning, Data Engineering, Python, Google Cloud, Docker

An end-to-end, reproducible machine learning pipeline to forecast NBA player points from historical data.
The project combines data engineering 🏗️, feature engineering 🧩, and ML modeling 🤖, and is designed for both local experiments 💻 and cloud deployment ☁️.


🔑 Key Highlights

  • Data pipeline 🛠️: ingest NBA stats via nba_api; clean & build per-game player features (rolling means, usage/pace proxies, opponent strength, rest days).
  • Modeling 📊: baseline regressors → LightGBM; time-aware CV; metrics: MAE, RMSE.
  • Inference 🚀: batch predictions for upcoming slates; outputs: local CSV or BigQuery.
  • Engineering 🧱: modular repo; env pinning; Docker 🐳; deploy on Cloud Run ☁️; one-command run_all.sh.
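
To make the feature and evaluation ideas above concrete, here is a minimal, dependency-free sketch of two of them: leakage-safe rolling features (each row uses only earlier games) and time-aware validation scored with MAE and RMSE. It is not the project's code. The game log, the function names (build_features, expanding_splits) and the window and fold sizes are all hypothetical, and a rolling-mean baseline stands in for the trained regressors.

```python
from datetime import date
from math import sqrt

# Hypothetical game log for one player: (game_date, points). Values are made up.
GAMES = [
    (date(2025, 1, 1), 20),
    (date(2025, 1, 3), 30),
    (date(2025, 1, 4), 10),
    (date(2025, 1, 7), 24),
    (date(2025, 1, 9), 16),
    (date(2025, 1, 10), 28),
]


def build_features(games, window=3):
    """Return one row per game using only information from earlier games."""
    rows = []
    for i, (day, pts) in enumerate(games):
        prior = [p for _, p in games[max(0, i - window):i]]
        rows.append({
            "date": day,
            "points": pts,
            # Mean of up to `window` previous games; None for the first game.
            "roll_pts": sum(prior) / len(prior) if prior else None,
            # Days since the previous game; None for the first game.
            "rest_days": (day - games[i - 1][0]).days if i > 0 else None,
        })
    return rows


def expanding_splits(n_rows, n_folds, min_train):
    """Yield (train_idx, test_idx) where every test row follows all train rows."""
    fold_size = (n_rows - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size if k < n_folds - 1 else n_rows
        yield list(range(train_end)), list(range(train_end, test_end))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


rows = build_features(GAMES)
for train_idx, test_idx in expanding_splits(len(rows), n_folds=2, min_train=2):
    y_true = [rows[i]["points"] for i in test_idx]
    # Baseline: predict the rolling mean. A trained model would be fit on train_idx instead.
    y_pred = [rows[i]["roll_pts"] for i in test_idx]
    print(f"train={len(train_idx)} test={len(test_idx)} "
          f"MAE={mae(y_true, y_pred):.2f} RMSE={rmse(y_true, y_pred):.2f}")
```

The key design choice is that the window ends just before the current game and each test fold starts after its training fold ends, so no future box score leaks into a feature or a validation score.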

🛠️ Pipeline Architecture

[Figure: NBA ML workflow diagram]


🔮 Next Steps

  • Add automatic retraining & evaluation 🔁.
  • Extend predictions to assists, rebounds, turnovers 🏆.
  • Integrate injury & lineup data 🏥.
  • Enhance explainability with SHAP values for feature impact 💡.