| Lecture 1 |
-- |
Topics: (no slides)
- Formal introduction
- Course details
- Syllabus
|
|
|
| Lecture 2 |
-- |
Topics: (slides)
- Why big data: 5Vs, scaling pain points; batch vs streaming; architecture patterns
|
|
|
| Lecture 3 |
-- |
Topics: (slides)
- Distributed systems basics: partitioning, replication, consistency, fault tolerance
|
|
|
| Lecture 4 |
-- |
Topics: (slides)
- HDFS + MapReduce (conceptual): data locality, jobs, combiner; classic wordcount
|
|
|
| Lecture 5 |
-- |
Topics: (slides)
- HDFS + MapReduce (conceptual): data locality, jobs, combiner; classic wordcount
|
|
|
| Lecture 6 |
-- |
Topics: (slides)
- YARN + Hadoop ecosystem overview: Hive, HBase, Sqoop, Flume; when to use
|
|
|
| Lecture 7 |
-- |
Topics: (slides)
- YARN + Hadoop ecosystem overview: Hive, HBase, Sqoop, Flume; when to use
|
|
|
| Lecture 8 |
-- |
Topics: (slides)
- Spark fundamentals: RDD vs DataFrame; lazy evaluation; actions vs transformations
|
|
|
| Lecture 9 |
-- |
Topics: (slides)
- PySpark DataFrames I: schema, reading/writing CSV/JSON/Parquet; basic ops
|
|
|
| Lecture 10 |
-- |
Topics: (slides)
- PySpark DataFrames I: schema, reading/writing CSV/JSON/Parquet; basic ops
|
|
|
| Lecture 11 |
-- |
Topics: (slides)
- PySpark DataFrames II: groupBy, joins, window functions; handling skew
|
|
|
| Lecture 12 |
-- |
Topics: (slides)
- PySpark DataFrames II: groupBy, joins, window functions; handling skew
|
|
|
| Lecture 13 |
-- |
Topics: (slides)
- Spark SQL: temporary views, SQL queries; optimization intuition
|
|
|
| Lecture 14 |
-- |
Topics: (slides)
- Performance tuning I: partitions, caching, broadcast joins; explain plans
|
|
|
| Lecture 15 |
-- |
Topics: (slides)
- Performance tuning II: shuffle, skew mitigation, salting, AQE; memory tuning basics
|
|
|
| Lecture 16 |
-- |
Topics: (slides)
- Data formats: Parquet/ORC, compression; partitioned tables; lakehouse idea
|
|
|
| Lecture 17 |
-- |
Topics: (slides)
- Data formats: Parquet/ORC, compression; partitioned tables; lakehouse idea
|
|
|
| Lecture 18 |
-- |
Topics: (slides)
- Special Topics: PageRank Algorithm and Implementation
|
|
|
| Lecture 19 |
-- |
Topics: (slides)
- Special Topics: PageRank Algorithm and Implementation
|
|
|
| Lecture 20 |
-- |
Topics: (slides)
- Special Topics: Streaming Algorithms
|
|
|
|
-- |
-- |
(Feb 25) Last Date for Proposal Submission. |
|
|
-- |
-- |
Mid Semester Exam Week |
Best of Luck.
|
| Lecture 21 |
-- |
Topics: (slides)
- Special Topics: Streaming Algorithms
|
|
|
| Lecture 22 |
-- |
Topics: (slides)
- Special Topics: Data Structures for Big Data, k-d Trees, Bloom Filters
|
|
|
| Lecture 23 |
-- |
Topics: (slides)
- Special Topics: Data Structures for Big Data, k-d Trees, Bloom Filters
|
|
|
| Lecture 24 |
-- |
Topics: (slides)
- Special Topics: Decision Making in Distributed Systems and Algorithms, Game Theory and Advertisements
|
|
|
| Lecture 25 |
-- |
Topics: (slides)
- Special Topics: Decision Making in Distributed Systems and Algorithms, Game Theory and Advertisements
|
|
|
| Lecture 26 |
-- |
Topics: (slides)
- Special Topics: Language Embedding and Applications
|
|
|
| Lecture 27 |
-- |
Topics: (slides)
- Special Topics: Language Embedding and Applications
|
|
|
| Lecture 28 |
-- |
Topics: (slides)
- Special Topics: Graph Neural Networks
|
|
|
| Lecture 29 |
-- |
Topics: (slides)
- Workflow orchestration: DAGs, retries, idempotency; Airflow concepts
|
|
|
| Lecture 30 |
-- |
Topics: (slides)
- Workflow orchestration: DAGs, retries, idempotency; Airflow concepts
|
|
|
| Lecture 31 |
-- |
Topics: (slides)
- Data quality: checks, expectations, anomaly detection; lineage basics
|
|
|
| Lecture 32 |
-- |
Topics: (slides)
- Streaming fundamentals: event time vs processing time; watermarks; exactly-once intuition
|
|
|
|
-- |
-- |
(Mar 30) Last Date for Mid Presentation. |
|
| Lecture 33 |
-- |
Topics: (slides)
- Kafka basics: topics, partitions, consumer groups; offset management
|
|
|
| Lecture 34 |
-- |
Topics: (slides)
- Spark Structured Streaming: sources/sinks, windows, state; demo pipeline
|
|
|
| Lecture 35 |
-- |
Topics: (slides)
- NoSQL overview: key-value, document, columnar; HBase/Cassandra/MongoDB tradeoffs
|
|
|
| Lecture 36 |
-- |
Topics: (slides)
- NoSQL overview: key-value, document, columnar; HBase/Cassandra/MongoDB tradeoffs
|
|
|
| Lecture 37 |
-- |
Topics: (slides)
- Graph & search: graph processing intro; Elasticsearch/OpenSearch conceptually
|
|
|
| Lecture 38 |
-- |
Topics: (slides)
- Graph & search: graph processing intro; Elasticsearch/OpenSearch conceptually
|
|
|
| Lecture 39 |
-- |
Topics: (slides)
- Security & governance: access control, encryption, PII handling; audit trails
|
|
|
| Lecture 40 |
-- |
Topics: (slides)
- Security & governance: access control, encryption, PII handling; audit trails
|
|
|
| Lecture 41 |
-- |
Topics: (slides)
- Capstone: build a batch + streaming pipeline + tuning report
|
|
|
| Lecture 42 |
-- |
Topics: (slides)
|
|
|
|
-- |
-- |
(Apr 30) Last Date for Code Submission. |
|
|
-- |
-- |
(Apr 30) Deadline for Final Presentation Video Submission. |
|
|
-- |
-- |
(Apr 30) Deadline for Report Submission. |
|
|
-- |
-- |
End Semester Exam Week |
Best of Luck.
|
|
Link added on last date for submission:
|