Two Instructions, Two Teams, One Murder Case

A user typed two paragraphs of Chinese. Team Agent produced five expert prompts, ran a six-phase puzzle game, authored a 40KB design brief, and assembled 49 HTML slides — without a single config file.

A 32-year-old pianist found dead at her own piano. Door locked from inside. No wounds, no poison, no struggle. A cup of water. The husband in the next room. The lead AI wrote this scenario and assembled five specialist agents who would spend the next hour putting yes/no questions to a referee constrained to exactly four answer types.

The human who set this in motion typed two instructions in Chinese. The total input across the entire session: roughly 600 words. The output: 49 slides, 14 original images, and a complete triple-twist murder case.


The Two Instructions

The session opened with the /team-agent skill. The user wrote, in Chinese:

"I want an expert-league Turtle Soup session, 6-person setup (you + 5 puzzle experts). No need to explain the rules. You create the puzzle, play the referee. I'm the client and observer — I'll just read the debrief."

The message went on to specify five expert roles with model assignments, a six-phase workflow structure, and the referee constraint. But the user did not write a single system prompt. They described the shape of the game in natural language and left everything else to the lead.

When the game concluded, the user typed the second instruction:

"Turn the whole process into an animated HTML presentation. Spin up two more Agents — both Codex, because they can generate images. Have them draw key images like character portraits, embed them into the HTML to make it more vivid. Multiple pages are fine. Get everything valuable from above into it."

Sixty-eight Chinese words. One paragraph. No designer hired. No image brief written.


Team #1 — Five Detectives and a Referee

The lead authored the puzzle. Lin Wan (林婉), 32, pianist, carrier of familial Long QT Syndrome type 2 (LQT2, KCNH2 gene mutation). Found dead at her Steinway on a March morning. Her husband Chen Zhiyuan (陈志远), also a musician, was working in the adjacent recording studio. Door locked from inside — by Lin Wan herself. No external trauma. The medical examiner called it cardiac arrest.

Five experts received the case file:

The referee answered only four ways: yes / no / not relevant to the puzzle / please rephrase.
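
The four-answer constraint can be pictured as a closed set of values. The sketch below is illustrative only and is not taken from the session: the names `RefereeAnswer` and `parse_referee_reply` are hypothetical, and the assumption is that any reply outside the four allowed strings should be rejected rather than passed on to the experts.

```python
from enum import Enum


class RefereeAnswer(Enum):
    """The four replies the referee was allowed to give (wording from the post)."""
    YES = "yes"
    NO = "no"
    NOT_RELEVANT = "not relevant to the puzzle"
    REPHRASE = "please rephrase"


def parse_referee_reply(text: str) -> RefereeAnswer:
    """Map a raw reply onto one of the four allowed answers.

    Hypothetical helper: raises ValueError for anything outside the set,
    so a referee that starts explaining or hinting is caught immediately.
    """
    normalized = text.strip().lower().rstrip(".")
    for answer in RefereeAnswer:
        if normalized == answer.value:
            return answer
    raise ValueError(f"Reply is not one of the four allowed answers: {text!r}")
```

Keeping the set closed is what makes the game hard: the experts get no partial credit and no hints, only one of four signals per question.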

The workflow ran six phases: puzzle announcement, private initial judgments (fifteen questions submitted independently, unseen by the others), rotating questions with peer pushback, midpoint hypothesis battle (five V2 theories, mutual review, disagreement allowed), convergence sprint, collective reveal.
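
As a rough sketch, the six phases form a strictly ordered sequence. The code below is an illustration under stated assumptions, not the lead's actual implementation: `Phase`, `next_phase`, and `answers_visible_to_peers` are hypothetical names, and the visibility rule assumes that only the private-judgment phase hides submissions from the other experts (the post confirms that phase is private but does not spell out the others).

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six phases named in the post, in the order they ran."""
    PUZZLE_ANNOUNCEMENT = 1
    PRIVATE_INITIAL_JUDGMENTS = 2
    ROTATING_QUESTIONS = 3
    MIDPOINT_HYPOTHESIS_BATTLE = 4
    CONVERGENCE_SPRINT = 5
    COLLECTIVE_REVEAL = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the reveal."""
    if current is Phase.COLLECTIVE_REVEAL:
        return None
    return Phase(current + 1)


def answers_visible_to_peers(phase: Phase) -> bool:
    """Assumption: submissions are hidden from peers only during private judgments."""
    return phase is not Phase.PRIVATE_INITIAL_JUDGMENTS
```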

In Batch 2, forensic submitted two questions that came back yes-yes: Is the victim's sudden cardiac death related to an auditory trigger? Is LQT2 the physiological mechanism? The case pivoted. Chen Zhiyuan had played Lin Wan's own experimental recording — with a sharp vocal scream embedded at the tail — through the shared studio wall. Lin Wan thought she was alone. The sound triggered torsades de pointes. Death in seconds. Physically untraceable.

The second reversal came from forensic and detective working in tandem: Chen Zhiyuan's "suicide by hanging" had a second ligature mark horizontal across the neck, inconsistent with self-suspension, and chlordiazepoxide 0.4‰ in his blood — a drug he didn't take. He had been sedated and strangled before being staged as a suicide.

The third reversal surfaced in the midpoint hypothesis battle. Reporter confronted psych directly: "You still wrote 'agent or producer' in your V2 despite two clear no answers on those categories. You buried your own question under an old hypothesis." Psych replied: "Clinical error. I accept." What followed was the sharpest exchange in the session: psych introduced the concept of grandiose possession, a form of controlling attachment that seeks not financial gain but the sole right to define a person's meaning after death. Reporter gave it that name, and the entire team adopted the framing.

The real killer: Zhou Shen (周慎), Lin Wan's first love. The silent owner of the independent label. He had spent ten years constructing the circumstances: he induced Chen Zhiyuan to kill Lin Wan for the insurance and copyright money, then killed Chen and staged the death as a suicide driven by guilt. The "memorial album" was not a business. It was a love letter to a woman he had decided would belong to no one else.


The Handover

When the reveal concluded, the user did not write a config file. They did not shut down Team #1, clear the workspace, or specify what a creative production team would look like. They wrote one paragraph asking for an HTML presentation. The lead decided Team #1's objective was complete, assessed what a production-phase team would require, and transitioned the workspace accordingly. No YAML. No DAG.


Team #2 — One HTML Designer and Two Image Generators

The second team:

Claude and Codex are models from different labs, trained on different objectives. The html_designer wrote text, structured narrative flow, and CSS. The artists generated images using Codex's built-in image generation tooling. The lead wrote a 40KB design brief — encoding the full case narrative, Nordic noir visual palette, typographic scale, and 14 image prompts with precise aspect ratios. The user wrote none of that.

When artist_char encountered a server error generating char_detective.png, it observed the rate-limit cooldown protocol — waited 90 seconds, retried — without any user intervention.
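
The cooldown behaviour described above amounts to a wait-then-retry loop. The following is a minimal sketch of that pattern, not the agent's actual protocol: `TransientServerError` and `retry_with_cooldown` are hypothetical names, the 90-second default comes from the post, and the cap of two attempts is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServerError(Exception):
    """Hypothetical stand-in for the server error the artist agent hit."""


def retry_with_cooldown(
    task: Callable[[], T],
    cooldown_seconds: float = 90.0,  # wait time reported in the post
    max_attempts: int = 2,           # assumption: one retry after the first failure
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`; on a transient error, wait out the cooldown and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(cooldown_seconds)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

Passing `sleep` as a parameter lets the loop be exercised without actually waiting 90 seconds, for example by recording the requested delays in a list.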


The Result

deck.html is 49 pages. It walks through the case from the puzzle statement to the final reveal — the question matrices, the yes/no attribution per expert, the midpoint theory collision, the grandiose possession breakthrough, and the three-act reversal sequence. Character portraits and scene illustrations are embedded throughout. Total size: 88KB.

The presentation is embedded below. Use arrow keys or click to navigate slides.

The completed presentation, produced by Team Agent with no user input after "turn this into an HTML."

Why This Matters

Cross-lab capability composition. No single-vendor system could have produced this output in this form. Claude wrote the narrative architecture and prose. Codex generated the image assets. Team Agent assembled them in one workspace, with the lead deciding which work went to which provider. The user did not orchestrate this split — they described a goal.

Light orchestration. The game specification was roughly 500 words of natural-language Chinese. The production request was 68. Between those two messages, the system wrote five expert system prompts, ran a six-phase puzzle game with 30+ referee exchanges, produced a 40KB design brief, submitted 14 image generation prompts, and assembled 49 HTML slides. No YAML was written by the user. No agent role docs were authored by the user. The lead inferred all of it.

Bandwidth layering. Without Team Agent, a single person running this session would be managing expert dialogue, referee logic, theory evaluation, design decisions, and image prompts in sequence, in one context window. With Team Agent, these tasks ran across separate workers in parallel, their verbosity contained inside the team. The user's context saw only a debrief report.

The transcript exists. The deck is embedded above. If you have something that would take a full afternoon to coordinate, that is the right entry point.