Syllabus

All materials, lectures, and assignments (along with their deadlines) are provided here.

Textbooks:

The topics covered in this course are discussed in the following textbooks.
  • Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman, Cambridge University Press, 2nd Edition, 2014. Available as a free download.
  • Han, Jiawei, Jian Pei, and Hanghang Tong, Data mining: Concepts and Techniques, 4th Edition, Morgan Kaufmann, 2022.
  • Robinson, Ian, Jim Webber, and Emil Eifrem, Graph Databases: New Opportunities for Connected Data, 2nd Edition, O'Reilly Media, Inc., 2015.
  • Additional materials and chapters will be referred to when required.

Lectures

Event Date Lecture Suggested Readings Assignments and Deadline
Lecture 1 -- Topics: (no slides)
  • Formal introduction
  • Course details
  • Syllabus
Lecture 2 -- Topics: (slides)
  • What is data mining? Tasks, workflow (CRISP-DM), pitfalls, case studies
Lecture 3 -- Topics: (slides)
  • What is data mining? Tasks, workflow (CRISP-DM), pitfalls, case studies
Lecture 4 -- Topics: (slides)
  • Data understanding: types, distributions, missingness, leakage; EDA essentials
Lecture 5 -- Topics: (slides)
  • Data understanding: types, distributions, missingness, leakage; EDA essentials
Lecture 6 -- Topics: (slides)
  • Data preprocessing: cleaning, encoding, scaling, outliers; train/val/test splits
Lecture 7 -- Topics: (slides)
  • Data preprocessing: cleaning, encoding, scaling, outliers; train/val/test splits
Lecture 8 -- Topics: (slides)
  • Feature engineering: transformations, interaction features, dimensionality reduction intuition
Lecture 9 -- Topics: (slides)
  • Feature engineering: transformations, interaction features, dimensionality reduction intuition
Lecture 10 -- Topics: (slides)
  • Similarity & distance: metrics for numeric/categorical/text; curse of dimensionality
Lecture 11 -- Topics: (slides)
  • Similarity & distance: metrics for numeric/categorical/text; curse of dimensionality
Lecture 12 -- Topics: (slides)
  • Classification I: kNN, Naive Bayes; bias–variance intuition; baseline setting
Lecture 13 -- Topics: (slides)
  • Classification I: kNN, Naive Bayes; bias–variance intuition; baseline setting
Lecture 14 -- Topics: (slides)
  • Classification II: logistic regression, SVM intuition; calibration; thresholding
Lecture 15 -- Topics: (slides)
  • Classification II: logistic regression, SVM intuition; calibration; thresholding
Lecture 16 -- Topics: (slides)
  • Decision trees: CART, impurity, pruning; interpretability; feature importance caveats
Lecture 17 -- Topics: (slides)
  • Decision trees: CART, impurity, pruning; interpretability; feature importance caveats
Lecture 18 -- Topics: (slides)
  • Ensembles: bagging, random forests, boosting (GBDT intuition); when/why they win
Lecture 19 -- Topics: (slides)
  • Ensembles: bagging, random forests, boosting (GBDT intuition); when/why they win
Lecture 20 -- Topics: (slides)
  • Model evaluation: confusion matrix, ROC/PR curves, cross-validation, imbalanced data
-- -- (Feb 25) Last Date for Proposal Submission.
-- -- Mid-Semester Exam Week. Best of luck.
Lecture 21 -- Topics: (slides)
  • Model evaluation: confusion matrix, ROC/PR curves, cross-validation, imbalanced data
Lecture 22 -- Topics: (slides)
  • Regression mining: linear/regression trees; regularization; metrics (RMSE/MAE)
Lecture 23 -- Topics: (slides)
  • Regression mining: linear/regression trees; regularization; metrics (RMSE/MAE)
Lecture 24 -- Topics: (slides)
  • Clustering I: k-means, initialization, choosing k; evaluation (silhouette)
Lecture 25 -- Topics: (slides)
  • Clustering I: k-means, initialization, choosing k; evaluation (silhouette)
Lecture 26 -- Topics: (slides)
  • Clustering II: hierarchical clustering, DBSCAN; density-based intuition; noise handling
Lecture 27 -- Topics: (slides)
  • Clustering II: hierarchical clustering, DBSCAN; density-based intuition; noise handling
Lecture 28 -- Topics: (slides)
  • Dimensionality reduction: PCA, t-SNE/UMAP intuition; visualization vs modeling
Lecture 29 -- Topics: (slides)
  • Dimensionality reduction: PCA, t-SNE/UMAP intuition; visualization vs modeling
Lecture 30 -- Topics: (slides)
  • Association rules: frequent itemsets (Apriori/FP-growth), support/confidence/lift
Lecture 31 -- Topics: (slides)
  • Association rules: frequent itemsets (Apriori/FP-growth), support/confidence/lift
Lecture 32 -- Topics: (slides)
  • Anomaly detection: statistical, LOF, isolation forest; evaluation challenges
-- -- (Mar 30) Last Date for Mid Presentation.
Lecture 33 -- Topics: (slides)
  • Anomaly detection: statistical, LOF, isolation forest; evaluation challenges
Lecture 34 -- Topics: (slides)
  • Text mining basics: TF-IDF, topic modeling (LDA intuition), keyword extraction
Lecture 35 -- Topics: (slides)
  • Text mining basics: TF-IDF, topic modeling (LDA intuition), keyword extraction
Lecture 36 -- Topics: (slides)
  • Time series mining: trend/seasonality, forecasting baselines, anomaly detection in time
Lecture 37 -- Topics: (slides)
  • Time series mining: trend/seasonality, forecasting baselines, anomaly detection in time
Lecture 38 -- Topics: (slides)
  • Ethics & governance: bias, privacy, explainability, responsible reporting
Lecture 39 -- Topics: (slides)
  • Ethics & governance: bias, privacy, explainability, responsible reporting
Lecture 40 -- Topics: (slides)
  • End-to-end Mining Project
Lecture 41 -- Topics: (slides)
  • Wrap-up
Lecture 42 -- Topics: (slides)
  • Wrap-up
-- -- (Apr 30) Last Date for Code Submission.
-- -- (Apr 30) Deadline for Final Presentation Video Submission.
-- -- (Apr 30) Deadline for Report Submission.
-- -- End-Semester Exam Week. Best of luck.
Link Added on Last Date for Submission: