Data Science Essentials
Machine learning, predictive modeling and storytelling, with a capstone
What you'll be able to do
- Run an end-to-end data-science workflow
- Apply statistics and hypothesis testing
- Build and evaluate predictive models
- Communicate results with compelling visualizations
Before you start
- Python fundamentals
- Basic statistics and algebra
- Comfort working with data in spreadsheets or code
Phase 1 · Foundations
Python & Statistics for Data Science
NumPy/Pandas plus the inferential statistics (hypothesis tests, confidence intervals) that ML rests on.
- Kaggle: Python + Pandas (course, free)
- StatQuest with Josh Starmer (YouTube; video, free)
- Think Stats (book, free)
- Run a t-test and interpret it
- Bootstrap a confidence interval
- EDA with summary statistics
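As a rough illustration of the t-test and bootstrap items above, here is a minimal sketch using NumPy and SciPy. The two groups are synthetic data generated for the example, and `bootstrap_ci` is a hypothetical helper name, not a library function. It uses the percentile bootstrap, which is one of several bootstrap variants.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two groups whose true means differ by 1.
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=11.0, scale=2.0, size=200)

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)


def bootstrap_ci(sample, stat=np.mean, n_resamples=2000, alpha=0.05, rng=rng):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    n = len(sample)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choice(sample, size=n, replace=True)
        estimates[i] = stat(resample)
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


ci_low, ci_high = bootstrap_ci(group_a)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"95% bootstrap CI for the mean of group A: ({ci_low:.2f}, {ci_high:.2f})")
```

Interpreting the result means stating what the p-value does and does not say: a small value indicates the observed difference would be unlikely if the two population means were equal, not that the difference is large or important.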
Data Wrangling & Feature Engineering
Reshaping, joining, encoding, scaling, and building reproducible preprocessing pipelines.
- Kaggle: Feature Engineering (course, free)
- Scikit-learn: Preprocessing Guide (doc, free)
- ColumnTransformer pipeline
- Target vs. one-hot encoding
- Leakage-free train/test prep
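The ColumnTransformer and leakage-free items above might look something like the following sketch. The toy DataFrame and its column names (`age`, `income`, `city`, `bought`) are invented for illustration. The key idea is to split first and let the pipeline learn scaling and encoding from the training rows only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; the column names and values are hypothetical.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 38, 44, 26, 57],
    "income": [28, 52, 71, 88, 40, 95, 60, 67, 33, 90],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B", "A", "C"],
    "bought": [0, 0, 1, 1, 0, 1, 1, 1, 0, 1],
})
X, y = df.drop(columns="bought"), df["bought"]

# Split before any fitting so test rows cannot influence preprocessing statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, stratify=y, random_state=0
)

# Scale numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# fit() learns means, variances, and categories from the training rows only.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Because preprocessing lives inside the pipeline, the same object can be passed to cross-validation and each fold will refit the scaler and encoder on its own training portion.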
Phase 2 · Machine Learning
Supervised Learning
Regression, classification, trees, ensembles, and rigorous model evaluation.
- Andrew Ng: ML Specialization (Coursera; course, free)
- Kaggle: Intermediate ML (course, free)
- Hands-On ML (O'Reilly; book, paid)
- Cross-validation + tuning
- Gradient boosting (XGBoost) model
- Finish in the top 20% of a Kaggle competition
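A minimal sketch of cross-validated tuning for a gradient-boosted model follows. To keep the example dependency-light, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, which is a substitution and not the same library. The dataset is synthetic and the parameter grid is an arbitrary small example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic classification data (an assumption for illustration).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A deliberately tiny grid; real searches usually cover more values.
param_grid = {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training set only
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_accuracy = search.best_score_                # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)    # one final check on held-out data

print(best_params)
print(f"CV accuracy: {cv_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

The held-out test set is scored once, after tuning, so that the reported number is not biased by the search itself.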
Unsupervised Learning & Intro to Deep Learning
Clustering, dimensionality reduction, and a first neural network with Keras.
- Kaggle: Intro to Deep Learning (course, free)
- Scikit-learn: Clustering Guide (doc, free)
- K-means + silhouette score
- PCA for visualization
- Train a small neural net
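The K-means, silhouette, and PCA items above can be sketched roughly as follows, assuming synthetic blob data and an arbitrary range of candidate cluster counts. The neural-network item is not covered here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with three generated clusters (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300, centers=3, n_features=5, cluster_std=1.0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Compare candidate cluster counts by mean silhouette score (higher is better).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")

# Project to two principal components, e.g. for a scatter plot colored by cluster.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print(X_2d.shape)
```

The silhouette score is a heuristic, so it is worth checking the chosen `k` against a plot of `X_2d` and against what the clusters mean in the problem domain.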
Phase 3 · Big Data, Storytelling & Capstone
Big Data Tools & Storytelling
Working at scale with Spark basics, plus communicating results that drive decisions.
- Databricks: Apache Spark tutorials (course, free)
- Storytelling with Data (book, paid)
- PySpark DataFrame transforms
- Narrative deck from an analysis
- Tailor a result for executives
Capstone: Predictive Modeling Project
A complete project: problem framing, data, model, evaluation, and a written report.
- Kaggle Competitions (link, free)
- UCI Machine Learning Repository (link, free)
- Baseline + improved model
- Error analysis writeup
- Publish notebook + README
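One possible shape for the baseline-versus-improved comparison is sketched below, under the assumption of a synthetic, mildly imbalanced binary dataset. The choice of a majority-class baseline and a random forest as the improved model is illustrative only. A capstone would substitute its own data and models.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, mildly imbalanced data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
# Candidate improvement: a random forest with default settings.
improved = RandomForestClassifier(n_estimators=100, random_state=0)

baseline_cv = cross_val_score(baseline, X_train, y_train, cv=5).mean()
improved_cv = cross_val_score(improved, X_train, y_train, cv=5).mean()
print(f"Baseline CV accuracy: {baseline_cv:.3f}")
print(f"Improved CV accuracy: {improved_cv:.3f}")

# Starting point for error analysis: where do held-out predictions go wrong?
improved.fit(X_train, y_train)
cm = confusion_matrix(y_test, improved.predict(X_test))
print(cm)  # rows are true classes, columns are predicted classes
```

The confusion matrix only counts errors. The writeup would go further by inspecting the misclassified rows and describing any patterns they share.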
Frequently asked
Is the Data Science Essentials roadmap free?
Yes. The Data Science Essentials roadmap is free to follow on Commit, and most of the curated resources are free; the two books marked paid (Hands-On ML and Storytelling with Data) are the exceptions. You can track your progress, keep a daily streak, and earn a shareable certificate at no cost, with no paywall on the roadmap itself.
How long does the Data Science Essentials roadmap take to complete?
About 150 hours of focused study across 6 modules in 3 phases. At roughly one hour a day that is about 5 months; you can move faster by studying more each day.
Do I get a certificate for finishing the Data Science Essentials roadmap?
Yes. When you complete the roadmap on Commit you receive a verifiable certificate of completion that you can add to LinkedIn and your public Commit profile as proof of what you finished.
Make it stick
Copy this roadmap into Commit and turn it into a tracked program with a streak graph, study logging, and a shareable certificate when you finish. Free forever.