Agentic & multimodal systems

Presentation defense with live Q&A

Real-time voice flow where the user delivers their work to an AI audience, fields live follow-up questions tied to what they actually said, and receives a structured verdict at the end.

What it is

A real-time voice presentation flow where the user stands up and delivers their work. An AI audience listens to the whole thing, then steps into a live Q&A where it asks follow-up questions tied to specific points the user actually made (not generic prompts), pushing back where the reasoning was thin. The session closes with a structured feedback report that names what the user did well, what landed, what did not, and what to push on next time, delivered conversationally rather than dropped as a wall of text.

What it's for

Most of the platform reads the user through what they write or click. A presentation is the highest-signal way to see whether they actually understood: they have to explain it in their own words, then defend it when someone pushes on a weak point. That moment is what the platform builds toward, so it has to handle live questioning without breaking the realism. If the AI fumbles on the third question, the session stops being usable as interview rehearsal.

How it was built

A LiveKit voice session split into four phases.

Intro: the AI greets the user, sets the topic, and signals it is listening.

Presentation: the user delivers, and every utterance is written to the presentation_sessions table per turn with a timestamp, role, and phase tag, so a mid-session crash never loses what was said.

Q&A: when the user wraps, the AI rereads the full transcript, picks the points that need pressure (thin claims, unjustified jumps, choices the user made without showing why), and asks targeted follow-up questions in voice. The user can defend, hedge, or admit they do not know, and each answer feeds into the next question.

Feedback: the AI generates a structured report covering strengths, weaknesses, and stretch areas, then delivers it conversationally rather than dumping a list.

At session close, the engine fans out to three consumers: the performance calibration (the heavy LLM that rewrites the user's work-skills snapshot off the full transcript), the journey map (a single handoff entry naming what this session said about the user that the rest of the platform should carry forward), and the experience-points engine.

Deepgram handles speech-to-text, ElevenLabs handles voice with streaming TTS so the AI starts replying before its full sentence is composed, and the session itself runs as a LiveKit room with WebRTC transport.
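The phase flow, per-turn records, and session-close fanout described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class and function names (Phase, Turn, PresentationSession, fan_out) are hypothetical, an in-memory list stands in for the presentation_sessions table, and the LiveKit, Deepgram, and ElevenLabs integrations are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, List


class Phase(str, Enum):
    INTRO = "intro"
    PRESENTATION = "presentation"
    QA = "qa"
    FEEDBACK = "feedback"


# Assumption: phases only ever move forward, in this order.
PHASE_ORDER = [Phase.INTRO, Phase.PRESENTATION, Phase.QA, Phase.FEEDBACK]


@dataclass
class Turn:
    role: str        # "user" or "ai" (hypothetical role labels)
    phase: Phase     # phase tag captured at the moment of the utterance
    text: str
    timestamp: str   # ISO 8601, UTC


@dataclass
class PresentationSession:
    phase: Phase = Phase.INTRO
    # In-memory stand-in for the presentation_sessions table.
    turns: List[Turn] = field(default_factory=list)

    def record(self, role: str, text: str) -> Turn:
        """Store one utterance immediately, tagged with the current phase."""
        now = datetime.now(timezone.utc).isoformat()
        turn = Turn(role=role, phase=self.phase, text=text, timestamp=now)
        self.turns.append(turn)
        return turn

    def advance(self) -> Phase:
        """Move to the next phase; the final phase cannot be advanced."""
        idx = PHASE_ORDER.index(self.phase)
        if idx == len(PHASE_ORDER) - 1:
            raise RuntimeError("session is already in its final phase")
        self.phase = PHASE_ORDER[idx + 1]
        return self.phase

    def transcript(self, phase: Phase) -> List[Turn]:
        """Return the turns recorded during one phase, in order."""
        return [t for t in self.turns if t.phase is phase]


def fan_out(session: PresentationSession,
            consumers: Dict[str, Callable[[List[Turn]], object]]) -> Dict[str, object]:
    """Hand the full transcript to each downstream consumer at session close."""
    if session.phase is not Phase.FEEDBACK:
        raise RuntimeError("fan-out runs only once the feedback phase is reached")
    return {name: handler(list(session.turns)) for name, handler in consumers.items()}
```

In this sketch, the Q&A generator would call transcript(Phase.PRESENTATION) to get only what the user said while presenting, and each consumer passed to fan_out receives its own copy of the turn list so one handler cannot mutate what the next one sees.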

My role

Sole author. Built the four-phase machine (intro, presentation, Q&A, feedback), the per-utterance transcript persistence, the live Q&A generator that pulls questions from what the user said, the end-of-session feedback report, and the fanout into performance calibration and the journey map at session close.

Built with

LiveKit, Claude, Deepgram, ElevenLabs, Streaming TTS, WebRTC, Python, Supabase
