Agentic & multimodal systems

Learn Mode, sibling flow to Do Mode

Guided understanding flow where the manager walks the user through what they need to grasp for the deliverable, sequenced with pre-generated videos and an optional cowork session at the end.

What it is

A supportive learning flow the user steps into when they get stuck doing the work in Do Mode. The same manager from their fictional company team takes them off the task for a moment and walks them through everything they need to understand to complete the deliverable. It is structured: each concept the stage was supposed to teach comes with a pre-generated video, and the videos play in the order that builds up to the deliverable. After the videos, the manager offers a cowork session so the user can practice what they just watched with someone beside them.

What it's for

Pushing through a task you do not understand teaches the wrong lesson. The user needs a way to step out of the work, get the missing concepts in the right order, and come back to Do Mode without losing where they were. Learn Mode is the sibling flow to Do Mode: same manager, same memory of the user, different goal (teach the concepts, instead of pushing through the deliverable). Without it, the only options are guess and pass, or guess and fail.

How it was built

Learn Mode triggers when the user explicitly asks the team for help in Do Mode, or when the engine routes them into it on their behalf. The manager pulls the list of understanding points the stage was supposed to teach, picks the pre-generated video for the first gap, and plays it. When a video ends, the next one queues up in the order the deliverable needs. After the sequence, the manager offers a cowork session so the user can talk through what they saw with someone beside them.

The flow runs inside the same conversation engine as Do Mode and shares the persona, the memory, and the user's running profile, so the manager does not feel like a different person when the mode switches. Time-based nudges (after 20, 30, and 45 minutes in Learn Mode) check in with the user so the session does not drift. Every utterance still feeds back into the platform's read of the user, including how much they leaned on Learn Mode for the stage, which the rest of the platform treats as an independence signal.
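As a rough illustration of the sequencing and nudge logic described above, here is a minimal Python sketch. It is not the project's actual code: the `UnderstandingPoint` shape, the `order` field, the `understood` set, and both function names are assumptions made for the example. Only the 20, 30, and 45 minute thresholds come from the description.

```python
from dataclasses import dataclass

# Check-in thresholds in minutes, as stated in the description.
NUDGE_MINUTES = (20, 30, 45)


@dataclass(frozen=True)
class UnderstandingPoint:
    """Hypothetical record for one concept a stage is supposed to teach."""
    concept: str
    order: int      # assumed: position in the sequence the deliverable needs
    video_id: str   # assumed: identifier of the pre-generated video


def build_video_queue(points, understood):
    """Return video ids for concepts not yet understood, in deliverable order."""
    gaps = [p for p in points if p.concept not in understood]
    return [p.video_id for p in sorted(gaps, key=lambda p: p.order)]


def due_nudges(elapsed_minutes, already_sent):
    """Return thresholds that have been reached but not yet sent."""
    return [
        m for m in NUDGE_MINUTES
        if m <= elapsed_minutes and m not in already_sent
    ]


# Example with made-up stage data.
points = [
    UnderstandingPoint("retries", 2, "vid-retries"),
    UnderstandingPoint("routing", 1, "vid-routing"),
    UnderstandingPoint("timeouts", 3, "vid-timeouts"),
]
queue = build_video_queue(points, understood={"retries"})
# queue == ["vid-routing", "vid-timeouts"]
```

Keeping both functions pure (no timers, no I/O) is a choice made for the sketch: whatever actually schedules the check-ins can call `due_nudges` with the elapsed time and the set of nudges already delivered, and the queue can be rebuilt whenever the set of understood concepts changes.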

My role

Co-contributor on the routing into Learn Mode from Do Mode, the video-sequencing logic, and the wiring back into the platform's read of the user when the session ends.

Built with

Python, FastAPI, WebSockets, Gemini, Temporal, pre-generated video sequencing

