Evaluation & calibration

NPC walkthrough of the stage submission verdict

Realtime feedback flow where the user's manager NPC walks them through the stage-submission verdict with audio, evidence highlights on the original file, and a positives sequence that is generated only when the user asks for it.

What it is

A realtime feedback walkthrough that runs after a stage submission lands its verdict. The same manager NPC from the user's scenario team picks up the verdict and walks the user through it conversationally: what the issues are, what was missing in style, what the user did well, and how to push for exceeding expectations. Each item ties back to a visible region on the original submission file so the user can see what the manager is talking about. The walkthrough has audio so the user can hear it, not just read it.

What it's for

A written rubric verdict is a wall of text. Most users skim it, miss the actual point, and stay stuck on the same issues. A manager walking them through it, voice over the file, pointing at the actual region of the work that needs the fix, is how someone learns from the verdict instead of bouncing off it. Keeping the manager the same persona the user has been working with (not a new feedback bot) means the verdict feels like part of the conversation, not a separate machine output.

How it was built

The walkthrough consumes the structured verdict directly from the stage submission endpoint, so it never re-evaluates the work or spends tokens reproducing the verdict. Each verdict item (content issues, style critiques, what-you-did-well, how-to-exceed-expectations) goes through a prompt that puts the manager NPC in the right voice for the scenario and generates a short narrative script for that item. Each script carries a visual evidence region pulled from the original submission file (PDF or image), so the audio and the picture stay in sync when the user plays it back. The initial call generates only the issues walkthrough, which keeps token and audio costs down. Positives are stored alongside it but not narrated until the user clicks to see them. At that point a separate on-demand workflow loads the stored items, attaches the evidence regions, and generates the positives script and audio. The walkthrough runs inside the same conversation engine as Do Mode, so the manager carries the same persona and memory into the walkthrough.
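The split between the initial issues walkthrough and the on-demand positives can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the names `Walkthrough`, `VerdictItem`, `EvidenceRegion`, `Segment`, and the `narrate` callable (a stand-in for the prompt call that writes the manager's script) are all hypothetical, storage is an in-memory dict, and audio generation is left out. It also assumes that only items of kind "positive" are deferred; the write-up does not say which side the how-to-exceed items land on, so the deferred set is a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class EvidenceRegion:
    """Hypothetical highlight on the original submission file."""
    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1 (assumed normalized)


@dataclass(frozen=True)
class VerdictItem:
    """One item from the structured verdict (shape is an assumption)."""
    kind: str  # e.g. "issue", "style", "positive", "exceed"
    text: str
    region: Optional[EvidenceRegion] = None


@dataclass(frozen=True)
class Segment:
    """A narrated script paired with the region it talks about."""
    kind: str
    script: str
    region: Optional[EvidenceRegion]


class Walkthrough:
    def __init__(
        self,
        narrate: Callable[[VerdictItem], str],
        deferred_kinds: FrozenSet[str] = frozenset({"positive"}),
    ) -> None:
        self._narrate = narrate  # stand-in for the script-generating prompt call
        self._deferred_kinds = deferred_kinds
        self._stored: Dict[str, List[VerdictItem]] = {}
        self._cache: Dict[str, List[Segment]] = {}

    def _segment(self, item: VerdictItem) -> Segment:
        # Script and evidence region travel together so playback stays in sync.
        return Segment(item.kind, self._narrate(item), item.region)

    def start(self, submission_id: str, items: List[VerdictItem]) -> List[Segment]:
        """Narrate the non-deferred items now; store the rest without narrating."""
        self._stored[submission_id] = [
            i for i in items if i.kind in self._deferred_kinds
        ]
        return [self._segment(i) for i in items if i.kind not in self._deferred_kinds]

    def positives(self, submission_id: str) -> List[Segment]:
        """Generate deferred segments on first request, then reuse them."""
        if submission_id not in self._cache:
            stored = self._stored.get(submission_id, [])
            self._cache[submission_id] = [self._segment(i) for i in stored]
        return self._cache[submission_id]
```

In this sketch, `start` would be called once the verdict lands and `positives` only when the user clicks through, so the cost of narrating positives is paid at most once per submission and only if the user asks for it.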

My role

Major contributor on the walkthrough prompt, the v2 prompt rewrite, the evidence-highlighting layer that ties each comment to a region on the original submission file, and the on-demand positives split that defers token and audio cost until the user wants it. Audio generation itself is owned by another contributor.

Built with

Python, FastAPI, Gemini, ElevenLabs, PDF and image extraction

