DA331 Big Data Analytics: Tools & Techniques

Big Data Analytics is one of the most sought-after skills in industry. In this course, you will learn the foundations of big data tools, understand how to build scalable systems, and learn how to lead successful deployment projects and solve critical problems. You will learn about Apache Spark, Hadoop, Apache Cassandra, MongoDB, Apache Airflow, Apache Kafka, and more.


Announcements

  • New announcements have been added under Submitting Assignments.
  • The course materials and syllabus are available in Lectures.
  • The course website is ready. New updates will be reflected here.

Course Staff

Course Assistants

Logistics

Please take note of these:
  • Announcements are made during lectures. This page may not be up to date.
  • Email is the preferred mode of communication. CC the TAs for a quicker response.
  • The other mode of communication is the Microsoft Teams Discuss group. Please let me know if you have not been added to this group. Use MS Teams so that other students can benefit.
  • Important announcements will be made through the Microsoft Teams Discuss group.
  • For longer discussions, visit only during office hours or by appointment.

Office Hours

Instructor Office Hours: Faculty Room, MDSAI, Tuesday 10:00-11:00 and Wednesday 11:00-12:00.

TA Office Hours: To be announced.

Class components

DA331 has the following components:

  • In-class lectures - twice a week (5103).
  • Lab Sessions - two hours (MDSAI Lab in CCC).
  • Assignments (mostly in the first half of the semester).
  • A midterm covering material from the first half of the semester.
  • The final project.
  • An Endterm covering material from the second half of the semester.

Prerequisites

Students are expected to have the following background:

  • Competitive programming skills.
  • Knowledge of data structures, database systems, and operating systems.
  • Comfort with probability theory.
  • Comfort with linear algebra.

Grading

Breakdown

The weightage breakdown is as follows:

  • 20%: Midterm
  • 20%: Programming Assignment
  • 20%: Endterm
  • 40%: Final project (broken into proposal (5%), mid-presentation (5%), code in repository (10%), final demo/video (5%), and final report (15%))
  • 5%: Bonus: To be Announced
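The arithmetic behind the breakdown above can be sketched in a few lines of Python. This is only an illustration: the names `WEIGHTS`, `PROJECT_WEIGHTS`, `BONUS_WEIGHT`, `project_score`, and `course_total` are hypothetical, scores are assumed to be fractions between 0 and 1, and the bonus is assumed to be added on top of the 100% base (the syllabus does not say whether the total is capped).

```python
# Weights (in percent) copied from the breakdown above.
WEIGHTS = {
    "midterm": 20,
    "programming_assignment": 20,
    "endterm": 20,
    "final_project": 40,
}

# Sub-weights of the final project; they add up to its 40%.
PROJECT_WEIGHTS = {
    "proposal": 5,
    "mid_presentation": 5,
    "code": 10,
    "demo_video": 5,
    "report": 15,
}

# Assumption: the bonus is added on top of the 100% base.
BONUS_WEIGHT = 5


def project_score(parts):
    """Combine per-deliverable fractions (0..1) into one project fraction."""
    earned = sum(PROJECT_WEIGHTS[name] * parts.get(name, 0.0)
                 for name in PROJECT_WEIGHTS)
    return earned / WEIGHTS["final_project"]


def course_total(scores, bonus=0.0):
    """Return the course total in percent.

    scores maps each component in WEIGHTS to a fraction in 0..1;
    missing components count as 0. bonus is also a fraction in 0..1.
    """
    base = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return base + BONUS_WEIGHT * bonus


if __name__ == "__main__":
    project = project_score({
        "proposal": 1.0,
        "mid_presentation": 0.8,
        "code": 0.9,
        "demo_video": 1.0,
        "report": 0.8,
    })
    total = course_total({
        "midterm": 0.8,
        "programming_assignment": 0.9,
        "endterm": 0.7,
        "final_project": project,
    })
    print(f"Project: {project:.1%}, course total: {total:.1f}%")
```

Keeping the sub-weights in their own table makes it easy to check that the project deliverables add up to the 40% assigned to the final project.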

Each group will have 2 members.

For the final project, it is very important that each group does the following:

  1. Get the project proposal approved before the deadline.
  2. Communicate any changes to the proposal.
  3. Appear for a 10-minute mid-presentation with updates on progress.
  4. Give the final demo/video presentation.

Submitting Assignments

Programming assignments should be submitted through GitHub.

The final project (code and final report) should be submitted through GitHub.

The final presentation should be submitted through YouTube.

New Announcements

  1. For labs, the report should clearly indicate where the code is and the command to run it.
  2. If there is a script for data preparation, it should be included as well.
  3. There is no need to include data or any HDFS files.
  4. Code will be evaluated by running it and on the basis of the report. Code related to data preparation should be included as well.
  5. All code and the report (as a PDF file) should be pushed to the repository. No handwritten reports, images of reports, or hard copies will be accepted.
  6. No submissions will be accepted after the deadline. The last commit/push will be considered for evaluation.
  7. Collaboration is encouraged for discussion, but the coding and the report should be done independently.
  8. Turnitin will be used whenever required.
  9. Copying reports from "peers" or from the "internet" is strictly not permitted. If found, the case will be forwarded to the appropriate disciplinary committee.
  10. Questions related to assignments will be taken only during lab hours.

Late assignments

No late submissions are allowed for programming assignments. A late submission will be awarded 0 points.

For the final project, 2 late days are provided and can be used for any of the deliverables. The deliverables are the proposal, mid-presentation, code submission, final report submission, and final video submission.

Honor code

We encourage students to form groups to discuss different topics. Students may discuss and work on programming assignments and quizzes in groups. However, each student must write down the solutions independently, and without referring to written notes from the joint session. In other words, each student must understand the solution well enough in order to reconstruct it by him/herself. In addition, each student should submit his/her own code and mention anyone he/she collaborated with.

Refer to the Code of Conduct Pledge for IIT Guwahati.