Three before/after fixtures showing the analogy layer doing the work it claims to do. Each pair shows the same decision twice: once as a typical AI answer (no layer), once with the four-component layer applied. What changed is named explicitly.
Building a system of specialised AI experts orchestrated as sub-agents. The experts kept losing track of project context mid-task. The user asked the AI: how do I fix this?
Three approaches commonly fix expert-drift problems in agent systems:

1. Retrain briefings
2. Front-load context
3. Guardrail layer
Each has trade-offs around maintenance burden, latency, and how invasive the change is. Option 2 is generally the cheapest to implement.
The user picks option 2.
Three approaches commonly fix this:

1. Retrain briefings
2. Front-load context
3. Guardrail layer
In kitchen terms: the three options are send the chefs on a training course, pin the day's instructions on the kitchen door, and hire a manager to keep the chefs on task.
| Technical | Kitchen | What this means |
|---|---|---|
| Retrain briefings | Training course | Expensive, slow, fundamental; right when the underlying skill is wrong |
| Front-load context | Pinned door instructions | Cheap, immediate; easily ignored under pressure |
| Guardrail layer | Hire a manager | Adds a role and overhead; only worth it if the kitchen can't self-correct |
The user reads the simulation question, recognises that all three options share an assumption (the experts are the right unit of organisation; discipline is the answer), and pauses the decision. They come back later with a different framing: change how context flows through the system, not how the experts are policed.
{"ts": "2026-02-01T00:00:00Z", "session_id": null, "trigger_conditions": ["options", "tradeoff"], "domain_used": "cooking", "domain_override_reason": null, "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": true, "user_feedback": null, "decision_class": "build", "outcome_label": "quality", "notes": "Mirrors WORKED_EXAMPLES.md Example 1 (kitchen-build framing rejected; switched packs)."}
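The record above sets all four `components` flags to true. As a minimal sketch of how a record could be checked for that, the snippet below lists any component that is absent or not true. The helper name `missing_components` is hypothetical and not part of the skill; only the four field names come from the record itself.

```python
import json

# The four component names are taken from the audit record above.
REQUIRED_COMPONENTS = ("summary", "mapping_table", "simulation_prompt", "limit_statement")


def missing_components(record: dict) -> list:
    """Return the required components that are absent or not set to true."""
    components = record.get("components", {})
    return [name for name in REQUIRED_COMPONENTS if components.get(name) is not True]


# A trimmed copy of the record above (only the fields this check needs).
line = (
    '{"domain_used": "cooking", "components": {"summary": true, '
    '"mapping_table": true, "simulation_prompt": true, "limit_statement": true}, '
    '"frame_rejection": true}'
)
record = json.loads(line)
print(missing_components(record))  # prints []
```

An empty list means the full four-component layer was applied; a record that skipped the simulation prompt, for example, would come back as `["simulation_prompt"]`.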
Writing a chapter on sustainability-linked finance for a mixed audience (finance professionals + general business readers). Needs to explain what a sustainability-linked loan is and why it exists, without losing non-finance readers in the substrate.
A sustainability-linked loan (SLL) is a corporate loan whose financial terms — typically the interest margin — adjust based on the borrower's performance against pre-defined sustainability performance targets (SPTs). The targets are linked to KPIs that are material to the borrower's business (e.g. emissions intensity, water consumption, board diversity). Independent verification of KPI performance is required at each review date. SLLs are general-purpose-of-proceeds instruments, distinguishing them from use-of-proceeds instruments like green bonds.
The reader who works in finance follows this. The reader who doesn't has bounced.
A sustainability-linked loan is a corporate loan whose interest rate moves up or down depending on whether the borrower hits agreed sustainability targets — emissions, water, board diversity, whatever's material to them.
In construction terms: a developer building a hotel agrees the rent steps down if the building hits its energy-efficiency targets, and steps up if it misses them. The lender is taking a position on operational performance, not just on whether the building gets built.
| Technical | Construction | What this means |
|---|---|---|
| Sustainability performance target | Energy-efficiency clause in the lease | A measurable promise about how the asset will run |
| Margin step-up / step-down | Rent goes up or down | The borrower pays less if targets are hit and more if they are missed |
| Verification at review date | Inspector visits the building | Outside eyes confirm the claim before the financial change applies |
| General-purpose-of-proceeds | The loan funds whatever the developer needs | Unlike a green bond, which has to fund verified-green works only |
The non-finance reader now has a structural picture of why this instrument exists. The finance reader's eyes don't glaze over because the technical content is still right there.
{"ts": "2026-03-01T00:00:00Z", "session_id": null, "trigger_conditions": ["non_trivial"], "domain_used": "construction", "domain_override_reason": null, "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "user_feedback": null, "decision_class": "explanation", "outcome_label": "quality", "notes": "Mirrors WORKED_EXAMPLES.md Example 2 (construction pack used for an ESG mapping explanation)."}
A knowledge worker, deep into the current generation of AI tools, with seven or eight subscriptions accumulated. None individually expensive; collectively significant. Nobody has audited the whole. The user asks the AI: should I keep these?
Based on usage frequency and overlap, the following recommendations:
| Tool | Plan | Use/mo | Recommendation |
|---|---|---|---|
| ChatGPT Plus | $20/mo | 80+ | keep |
| Perplexity Pro | $20/mo | 15 | consider cancelling |
| Midjourney | $30/mo | 4 | cancel |
| DALL-E (via Plus) | included | 2 | already covered |
| GitHub Copilot | $19/mo | 60+ | keep |
| Otter.ai | $17/mo | 12 | review |
| Asana AI add-on | $11/mo | 0 (long unused) | cancel |
| Fireflies | $19/mo | 10 | review |
Estimated savings: $61/month if you cancel the three flagged tools.
The user reads the table, doesn't cancel anything, closes the tab.
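The $61 figure can be reproduced from the table. The sketch below sums the plan prices of the flagged tools, assuming "flagged" means a recommendation of "cancel" or "consider cancelling" (that reading is an assumption; the table does not define the term). Prices and recommendations are copied from the table.

```python
# Monthly plan prices (USD) and recommendations copied from the table above.
tools = [
    ("ChatGPT Plus", 20, "keep"),
    ("Perplexity Pro", 20, "consider cancelling"),
    ("Midjourney", 30, "cancel"),
    ("DALL-E (via Plus)", 0, "already covered"),  # included in Plus, no extra cost
    ("GitHub Copilot", 19, "keep"),
    ("Otter.ai", 17, "review"),
    ("Asana AI add-on", 11, "cancel"),
    ("Fireflies", 19, "review"),
]

# Assumption: these two recommendation values are the "flagged" ones.
FLAGGED = {"cancel", "consider cancelling"}


def monthly_savings(rows, flagged=FLAGGED):
    """Sum the monthly price of every tool whose recommendation is flagged."""
    return sum(price for _, price, rec in rows if rec in flagged)


print(monthly_savings(tools))  # prints 61
```

The arithmetic is trivial, which is part of the fixture's point: the number is correct and the user still does nothing with it.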
Here's the inventory:
(same table as above, kept inline so the technical content is still present)
In kitchen terms: this isn't an inventory question, it's a staffing question. You're running a kitchen.
| Tool | Kitchen role | What that role costs you |
|---|---|---|
| ChatGPT Plus | Head chef — does most things competently, on the line every service | Indispensable. Justified. |
| Perplexity Pro | Senior cook hired for one specific cuisine | Useful when you cook that cuisine; expensive when you don't. Are you still cooking it? |
| Midjourney + DALL-E | Two pastry chefs on the same station | One of them is on the schedule out of habit. The kitchen doesn't need both. |
| GitHub Copilot | Sous-chef who only works certain dishes but is excellent at them | Indispensable for the dishes they cover. |
| Otter / Fireflies | Two prep cooks doing slightly different mise en place | They specialise — but the head chef could probably do most of this prep. Worth asking whether you actually need two. |
| Asana AI add-on | Specialist supplier on retainer for an ingredient you haven't ordered in a long stretch | Cancel. The retainer is paying for nothing. |
[…] home and look at the standing-orders list as a budget question.

The user cancels Asana (clear retainer), keeps Perplexity for another billing cycle and watches whether they actually cook that cuisine, consolidates the prep-cook station by choosing Fireflies over Otter, and leaves the rest alone. The total cut is smaller than the bare table recommended ($28/mo vs $61/mo), but it's intentional rather than mechanical.
The audit schema's outcome_label is the comfort/quality field. Flip the value and watch how the same decision-record changes shape:
{"ts": "2026-04-01T00:00:00Z", "domain_used": "cooking", "decision_class": "audit", "outcome_label": "comfort", "frame_rejection": false, "components_complete": true}
This case sits on the comfort side of the comfort/quality split. A paper making a quality claim that rests on one quality-adjacent case and two comfort-adjacent cases has an evidence gap. Naming it honestly in the audit schema (outcome_label) is the point.
The audit log (audit/decisions.jsonl) is shaped to surface this question — outcome_label is the field that names it. If your real-world use of the skill consistently logs comfort and rarely quality, that is a finding, not a failure. The paper says so.
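One way to see whether a log leans toward comfort or quality is to tally `outcome_label` across its records. The following is a minimal sketch, assuming the log is JSON Lines with one record per line; the function name `outcome_counts` is hypothetical and not part of the skill.

```python
import json
from collections import Counter


def outcome_counts(lines):
    """Count outcome_label values across an iterable of JSON Lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines in the log
            continue
        counts[json.loads(line).get("outcome_label")] += 1
    return counts


# Illustrative sample only; these are not real log entries.
sample = [
    '{"outcome_label": "comfort"}',
    '{"outcome_label": "comfort"}',
    '{"outcome_label": "quality"}',
    "",
]
counts = outcome_counts(sample)
print(counts["comfort"], counts["quality"])  # prints 2 1
```

Against the real log, the same function could be fed an open file handle, for example `with open("audit/decisions.jsonl", encoding="utf-8") as f: counts = outcome_counts(f)`. A count dominated by `comfort` is the finding described above, not an error condition.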
Each of the three examples maps to a row that could be logged in audit/decisions.jsonl. Three records, three different decision_class values, two different outcome_label values, one frame_rejection. That is what a healthy log looks like.
{"ts": "2026-02-01T00:00:00Z", "domain_used": "cooking", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": true, "decision_class": "build", "outcome_label": "quality"}
{"ts": "2026-03-01T00:00:00Z", "domain_used": "construction", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "explanation", "outcome_label": "quality"}
{"ts": "2026-04-01T00:00:00Z", "domain_used": "cooking", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "audit", "outcome_label": "comfort"}
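The "three records, three decision classes, two outcome labels, one frame rejection" summary can be checked mechanically. This is a minimal sketch over trimmed copies of the three records above; the function name `summarise` and the keys of the summary dict are hypothetical, chosen for illustration.

```python
import json

# Trimmed copies of the three records above (only the fields being summarised).
LOG = """\
{"decision_class": "build", "outcome_label": "quality", "frame_rejection": true}
{"decision_class": "explanation", "outcome_label": "quality", "frame_rejection": false}
{"decision_class": "audit", "outcome_label": "comfort", "frame_rejection": false}
"""


def summarise(jsonl_text):
    """Summarise a JSON Lines audit log: record count and variety of key fields."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {
        "records": len(records),
        "decision_classes": len({r["decision_class"] for r in records}),
        "outcome_labels": len({r["outcome_label"] for r in records}),
        "frame_rejections": sum(1 for r in records if r["frame_rejection"]),
    }


print(summarise(LOG))
# prints {'records': 3, 'decision_classes': 3, 'outcome_labels': 2, 'frame_rejections': 1}
```

A log where `decision_classes` or `outcome_labels` stays at 1 as records accumulate would suggest the layer is only being exercised on one kind of decision.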