Project

This project for the Building Multimodal GenAI course is your chance to go beyond plain text and design AI that can see, read, and listen, making your work immediately relevant to real-world problems. By completing it, you’ll gain portfolio-ready experience with cutting-edge multimodal models, hands-on skills that industry and research labs are actively looking for.

Getting Started

List of Suggested Project Topics

  • Image-to-Report Generator (Medical-lite): generate a structured report from X-ray-like images (or public chest X-ray datasets) with disclaimers.
  • Visual Q&A Tutor: user uploads a diagram (circuit/biology/graph) and the app explains it + answers questions (see the first sketch after this list).
  • Receipt/Invoice Understanding Assistant: extract fields + summarize spending + flag anomalies from invoice images/PDFs.
  • Multimodal Customer Support Bot: troubleshoot using product photos + user text (“my router lights look like this…”).
  • Slide-to-Study Notes Generator: take lecture slide images and produce clean notes + quiz questions.
  • Video Highlight & Caption Generator: summarize a short video and auto-generate chapters + captions.
  • Audio Meeting Summarizer with Action Items: speech-to-text + summary + tasks + deadlines (see the second sketch after this list).
  • Document + Figure Explainer: parse a research PDF and explain figures/tables in simple language.
  • Personalized Accessibility Tool: convert image-heavy content into spoken explanations and simplified text.
  • Multimodal Sentiment/Emotion Analyzer: combine facial cues (video frames) + voice tone + text for emotion trends (ethics-first).
  • “Ask My Lab Notebook”: query experiment photos + handwritten notes (using synthetic/public data) + generate steps & materials.
  • E-commerce Style Finder: upload an outfit image → generate captions, tags, and “similar style” textual descriptions.
  • Food Calorie Estimator (Approx.): image-based food recognition + rough nutrition summary + confidence + disclaimers.
  • Robotics Instruction Parser (Simulated): interpret a scene image + instruction text → output step-by-step action plan.
  • Multimodal RAG Knowledge Assistant: retrieve across PDFs, images, and transcripts; answer with citations to sources (see the third sketch after this list).
  • Safety/Compliance Content Checker: detect policy issues in ad creatives (image + text) and suggest safer alternatives.
  • Handwritten Form Digitizer: extract fields from scanned forms and validate against rules (DOB format, totals, etc.).
  • AR/VR Scene Narrator (Prototype): describe objects and relationships in a camera feed and generate guidance.
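
Starter Code Sketches

To make the scope of these topics concrete, here are three minimal Python sketches tied to items above. They are starting points under stated assumptions (the library and model names are illustrative choices, not course requirements); swap in whatever models, libraries, or hosted APIs your team prefers.

For the Visual Q&A Tutor, a minimal sketch assuming the Hugging Face transformers library and the public Salesforce/blip-vqa-base checkpoint. BLIP returns short answers; free-form explanations would need a larger vision-language model or a hosted multimodal API layered on top of this.

    # Visual Q&A over an uploaded diagram or image.
    # Assumes: pip install transformers torch pillow
    from PIL import Image
    from transformers import pipeline

    # Illustrative public BLIP checkpoint; any VQA-capable model can be substituted.
    vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

    def answer_question(image_path: str, question: str) -> str:
        """Return the model's top answer to a question about the image."""
        image = Image.open(image_path).convert("RGB")
        result = vqa(image=image, question=question, top_k=1)
        return result[0]["answer"]

    if __name__ == "__main__":
        # "circuit_diagram.png" is a placeholder for a user-uploaded file.
        print(answer_question("circuit_diagram.png", "How many resistors are in this circuit?"))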
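
For the Audio Meeting Summarizer with Action Items, a sketch of the first two stages (speech-to-text, then summarization), assuming Whisper and BART checkpoints from the Hugging Face Hub; action-item and deadline extraction would be a further step, for example prompting an LLM over the transcript.

    # Meeting transcription + summary (action items left as a follow-up step).
    # Assumes: pip install transformers torch  (plus ffmpeg for audio decoding)
    from transformers import pipeline

    # Illustrative checkpoints; any ASR and summarization models can be substituted.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize_meeting(audio_path: str) -> dict:
        """Transcribe a recording and produce a short summary of it."""
        # chunk_length_s lets the pipeline handle recordings longer than 30 seconds.
        transcript = asr(audio_path, chunk_length_s=30)["text"]
        # BART accepts roughly 1024 tokens of input; very long transcripts need to be
        # split, summarized piecewise, and then summarized again.
        summary = summarizer(transcript, max_length=150, min_length=40, do_sample=False)
        return {"transcript": transcript, "summary": summary[0]["summary_text"]}

    if __name__ == "__main__":
        # "team_standup.wav" is a placeholder for a meeting recording.
        print(summarize_meeting("team_standup.wav")["summary"])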
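
For the Multimodal RAG Knowledge Assistant, a sketch of the retrieval half assuming sentence-transformers with a CLIP checkpoint, which embeds text and images into one shared vector space. The "index" here is an in-memory list with placeholder content; a real project would use a vector database, ingest PDF pages and transcript chunks, and generate answers that cite the retrieved items.

    # Cross-modal retrieval: one CLIP embedding space for text snippets and images.
    # Assumes: pip install sentence-transformers pillow
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # Illustrative checkpoint; CLIP maps images and text into the same vector space.
    model = SentenceTransformer("clip-ViT-B-32")

    # Toy in-memory corpus with placeholder text and a placeholder image path.
    corpus = [
        {"kind": "text", "content": "Figure 3 shows accuracy versus training epochs."},
        {"kind": "image", "content": "figures/figure3.png"},
    ]

    def embed(item):
        """Embed a text snippet or an image file with the shared CLIP encoder."""
        if item["kind"] == "image":
            return model.encode(Image.open(item["content"]).convert("RGB"))
        return model.encode(item["content"])

    corpus_embeddings = [embed(item) for item in corpus]

    def retrieve(query: str, top_k: int = 3):
        """Return the top_k corpus items most similar to the text query, with scores."""
        query_embedding = model.encode(query)
        scored = [
            (float(util.cos_sim(query_embedding, emb)), item)
            for emb, item in zip(corpus_embeddings, corpus)
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

    if __name__ == "__main__":
        for score, item in retrieve("Which figure reports accuracy over epochs?"):
            print(f"{score:.3f}  {item['kind']}: {item['content']}")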