Two Instructions, Two Teams, One Murder Case

A 32-year-old pianist found dead at her own piano. Door locked from inside. No wounds, no poison, no struggle. A cup of water. The husband in the next room. The lead AI got this scenario and assembled five specialist agents who would spend the next hour asking the referee yes/no questions — a referee constrained to exactly four answer types.

The human who set this in motion typed two instructions in Chinese. The total input across the entire session: roughly 600 Chinese characters. The output: 49 slides, 14 original images, and a complete triple-twist murder case.

The Two Instructions

The session opened with the /team-agent skill. The user wrote, in Chinese:

"I want an expert-league Turtle Soup session, 6-person setup (you + 5 puzzle experts). No need to explain the rules. You create the puzzle, play the referee. I'm the client and observer — I'll just read the debrief."

The message went on to specify five expert roles with model assignments, a six-phase workflow structure, and the referee constraint. But the user did not write a single system prompt. They described the shape of the game in natural language and left everything else to the lead.

When the game concluded, the user typed the second instruction:

"Turn the whole process into an animated HTML presentation. Spin up two more Agents — both Codex, because they can generate images. Have them draw key images like character portraits, embed them into the HTML to make it more vivid. Multiple pages are fine. Get everything valuable from above into it."

Sixty-eight Chinese characters. One paragraph. No designer hired. No image brief written.

Team #1 — Five Detectives and a Referee

The lead authored the puzzle. Lin Wan (林婉), 32, pianist, carrier of familial Long QT Syndrome type 2 (LQT2, KCNH2 gene mutation). Found dead at her Steinway on a March morning. Her husband Chen Zhiyuan (陈志远), also a musician, was working in the adjacent recording studio. Door locked from inside — by Lin Wan herself. No external trauma. The medical examiner called it cardiac arrest.

Five experts received the case file:

forensic (Codex) — Forensic pathologist. Speciality: medical causality and hard biological logic.
insurer (Codex) — Insurance claims investigator. Speciality: financial motive and timeline of policy changes.
psych (Claude) — Clinical psychologist. Speciality: personality profiles, attachment patterns, the interior logic of obsession.
reporter (Claude) — Investigative journalist. Speciality: narrative threading, information gaps, time-sequence reconstruction.
detective (Claude) — Retired criminal investigator. Speciality: scene synthesis, combined intuition.

The referee answered only four ways: yes / no / not relevant to the puzzle / please rephrase.
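
The four-answer constraint is simple enough to write down. The sketch below is one hypothetical way to enforce it, not the session's actual implementation; the names RefereeAnswer and parse_referee_reply are assumptions introduced for illustration.

```python
from enum import Enum


class RefereeAnswer(Enum):
    """The four replies the referee was allowed to give."""
    YES = "yes"
    NO = "no"
    NOT_RELEVANT = "not relevant to the puzzle"
    REPHRASE = "please rephrase"


def parse_referee_reply(raw: str) -> RefereeAnswer:
    """Map a raw reply onto one of the four allowed answers, or raise."""
    normalized = raw.strip().lower()
    for answer in RefereeAnswer:
        if normalized == answer.value:
            return answer
    # Anything else breaks the referee constraint (hypothetical handling).
    raise ValueError(f"reply outside the four allowed answers: {raw!r}")
```

A reply such as " Yes " normalizes to RefereeAnswer.YES, while free-form text like "maybe" is rejected, which is what keeps the referee from leaking hints.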

The workflow ran six phases: puzzle announcement, private initial judgments (fifteen questions submitted independently, unseen by the others), rotating questions with peer pushback, midpoint hypothesis battle (five V2 theories, mutual review, disagreement allowed), convergence sprint, collective reveal.
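
The six phases run strictly in order, which can be sketched as an ordered enum. The Phase and next_phase names are hypothetical, and nothing here is taken from Team Agent's internals; it only restates the sequence described above.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six workflow phases, numbered in the order they ran."""
    PUZZLE_ANNOUNCEMENT = 1
    PRIVATE_INITIAL_JUDGMENTS = 2
    ROTATING_QUESTIONS = 3
    MIDPOINT_HYPOTHESIS_BATTLE = 4
    CONVERGENCE_SPRINT = 5
    COLLECTIVE_REVEAL = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the reveal."""
    if current is Phase.COLLECTIVE_REVEAL:
        return None
    return Phase(current + 1)
```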

In Batch 2, forensic submitted two questions that came back yes-yes: Is the victim's sudden cardiac death related to an auditory trigger? Is LQT2 the physiological mechanism? The case pivoted. Chen Zhiyuan had played Lin Wan's own experimental recording — with a sharp vocal scream embedded at the tail — through the shared studio wall. Lin Wan thought she was alone. The sound triggered torsades de pointes. Death in seconds. Physically untraceable.

The second reversal came from forensic and detective working in tandem. Chen Zhiyuan's "suicide by hanging" showed a second, horizontal ligature mark across the neck, inconsistent with self-suspension, and his blood contained clonazepam at 0.4‰, a drug he had no prescription for. He had been sedated and strangled, then staged as a suicide.

The third reversal surfaced in the midpoint hypothesis battle. Reporter confronted psych directly: "You still wrote 'agent or producer' in your V2 despite two clear no answers on those categories. You buried your own question under an old hypothesis." Psych replied: "Clinical error. I accept." What followed was the sharpest exchange in the session: psych named the underlying dynamic grandiose possession, a form of controlling attachment that seeks not financial gain but the sole right to define a person's meaning after death. The entire team adopted this framing.

The real killer: Zhou Shen (周慎), Lin Wan's first love and the silent owner of the independent label. He had spent ten years constructing the circumstances: he induced Chen Zhiyuan to kill Lin Wan for the insurance and copyright money, then killed Chen and staged the death as the act of a guilty conscience. The "memorial album" was not a business. It was a love letter to a woman he had decided would belong to no one else.

The Handover

When the reveal concluded, the user did not write a config file. They did not shut down Team #1, clear the workspace, or specify what a creative production team would look like. They wrote one paragraph asking for an HTML presentation. The lead decided Team #1's objective was complete, assessed what a production-phase team would require, and transitioned the workspace accordingly. No YAML. No DAG.

Team #2 — One HTML Designer and Two Image Generators

The second team:

html_designer (Claude) — prose structure, typographic layout, animation logic, HTML/CSS
artist_char (Codex) — character portraits: 8 PNGs (Lin Wan, Chen Zhiyuan, Zhou Shen, and the five experts)
artist_scene (Codex) — crime scene illustrations: 6 PNGs (death scene, LQT2 mechanism, neck ligature diagram, 6-node timeline, relationship network, three-reversal structure)
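
The division of labor above can be written as a small manifest, which also confirms that 8 portraits plus 6 scene illustrations account for the 14 images. This is a minimal sketch under stated assumptions: the Worker type and images_per_worker helper are hypothetical, and the subject labels are copied from the list above rather than from any real file names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Worker:
    name: str
    model: str
    subjects: Tuple[str, ...]  # one PNG per subject; empty for non-image workers


TEAM_2 = (
    Worker("html_designer", "Claude", ()),
    Worker("artist_char", "Codex", (
        "Lin Wan", "Chen Zhiyuan", "Zhou Shen",
        "forensic", "insurer", "psych", "reporter", "detective",
    )),
    Worker("artist_scene", "Codex", (
        "death scene", "LQT2 mechanism", "neck ligature diagram",
        "6-node timeline", "relationship network", "three-reversal structure",
    )),
)


def images_per_worker(team: Tuple[Worker, ...]) -> Dict[str, int]:
    """Count how many images each worker is responsible for."""
    return {worker.name: len(worker.subjects) for worker in team}


counts = images_per_worker(TEAM_2)
total_images = sum(counts.values())
```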

Claude and Codex are models from different labs, trained on different objectives. The html_designer wrote text, structured narrative flow, and CSS. The artists generated images using Codex's built-in image generation tooling. The lead wrote a 40KB design brief — encoding the full case narrative, Nordic noir visual palette, typographic scale, and 14 image prompts with precise aspect ratios. The user wrote none of that.

When artist_char encountered a server error generating char_detective.png, it followed the rate-limit cooldown protocol on its own: it waited about 90 seconds and retried successfully, with no user intervention.
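
A wait-then-retry step of this kind can be sketched in a few lines. The source does not describe the actual protocol beyond the roughly 90-second wait, so everything else here is an assumption: the retry_after_cooldown name, the two-attempt limit, and the choice of RuntimeError as the stand-in for a server error are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_after_cooldown(
    task: Callable[[], T],
    cooldown_seconds: float = 90.0,  # the wait reported in the article
    max_attempts: int = 2,           # assumed: one initial try plus one retry
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run `task`; on a RuntimeError, wait for the cooldown and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(cooldown_seconds)
    raise AssertionError("unreachable")
```

Passing the sleep function as a parameter is a design choice for testability: a caller can substitute a recorder and verify the cooldown without actually waiting 90 seconds.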

The Result

deck.html is 49 pages. It walks through the case from the puzzle statement to the final reveal — the question matrices, the yes/no attribution per expert, the midpoint theory collision, the grandiose possession breakthrough, and the three-act reversal sequence. Character portraits and scene illustrations are embedded throughout. Total size: 88KB.

The presentation is embedded below. Use arrow keys or click to navigate slides.

Why This Matters

Cross-lab capability composition. No single-vendor framework has a commercial reason to offer this combination. Claude wrote the narrative architecture and prose. Codex generated the image assets. Team Agent assembled them in one workspace, with the lead deciding which work went to which provider. The user did not orchestrate this split; they described a goal.

Light orchestration. The game specification was roughly 500 Chinese characters of natural language. The production request was 68. Between those two messages, the system wrote five expert system prompts, ran a six-phase puzzle game with 30+ referee exchanges, produced a 40KB design brief, submitted 14 image generation prompts, and assembled 49 HTML slides. The user wrote no YAML and no agent role docs. The lead inferred all of it.

Bandwidth layering. Without Team Agent, a single person running this session would be managing expert dialogue, referee logic, theory evaluation, design decisions, and image prompts in sequence, all in one context window. With Team Agent, these tasks ran in parallel across separate workers, each in its own context, and their verbosity stayed inside the team. The user's context saw only a debrief report.

The transcript exists. The deck is embedded above. If you have something that would take a full afternoon to coordinate, that is the right entry point.
