Worked examples · v1.0 · 2026-05-12

Analogy Layer — Worked Examples

Three before/after fixtures showing the analogy layer doing the work it claims to do. Each pair shows the same decision twice: once as a typical AI answer (no layer), once with the four-component layer applied. What changed is named explicitly.

Companion to: Analogies as Cognitive Interfaces · Pack default: cooking · Reference implementation: private

What's on this page

  1. Kitchen sub-agent: quality outcome, frame rejection
  2. ESG instrument explanation: quality outcome, construction pack
  3. Subscription audit: comfort outcome, cooking + home
  4. How these relate to the audit: the JSONL the skill writes

Example 01 · paper §8 · real

Kitchen sub-agent

Building a system of specialised AI experts orchestrated as sub-agents. The experts kept losing track of project context mid-task. The user asked the AI: how do I fix this?

Before · no analogy layer · shape 0

Three approaches commonly fix expert-drift problems in agent systems:

  1. Retrain the underlying expert briefings — enrich each expert's persona file with more project context, more examples, and tighter scope language. This is the "fix it at the source" approach.
  2. Front-load context per invocation — pin the relevant project state to the front of every expert prompt at runtime, so the expert always opens with current state regardless of what it remembered.
  3. Add a guardrail / verification layer — a meta-agent that watches each expert's output, flags drift, and either re-prompts or corrects in place.

Each has trade-offs around maintenance burden, latency, and how invasive the change is. Option 2 is generally the cheapest to implement.

The user picks option 2.

After · analogy layer applied · pack: cooking

Three approaches commonly fix this:

  1. Retrain the underlying expert briefings — enrich each persona with more context, examples, tighter scope.
  2. Front-load context per invocation — pin the relevant project state to the front of every expert prompt at runtime.
  3. Add a guardrail / verification layer — a meta-agent that watches output, flags drift, and corrects.

In kitchen terms: the three options are send the chefs on a training course, pin the day's instructions on the kitchen door, and hire a manager to keep the chefs on task.

Technical | Kitchen | What this means
Retrain briefings | Training course | Expensive, slow, fundamental; right when the underlying skill is wrong
Front-load context | Pinned door instructions | Cheap, immediate; easily ignored under pressure
Guardrail layer | Hire a manager | Adds a role and overhead; only worth it if the kitchen can't self-correct

Run it forward: What's the problem the chefs are actually having? Is the recipe ambiguous? Are the orders arriving in a form that's hard to act on? Are the right ingredients out of reach? If the answer is any of those, all three options above are about disciplining staff for a structural failure, and none of them touch the real cause.

Where this analogy stops: Kitchens have one head chef per shift; sub-agent systems can have multiple coordinated controllers. The "one boss" frame doesn't generalise; ignore it past this decision.

The user reads the simulation question, recognises that all three options share an assumption (the experts are the right unit of organisation; discipline is the answer), and pauses the decision. They come back later with a different framing: change how context flows through the system, not how the experts are policed.

Delta
  • frame_rejection: true. The canonical case: the simulation question made the shared assumption visible.
  • decision_class: build
  • outcome_label: quality. The eventual decision was different and demonstrably better, judged retrospectively against what option 2 would have produced.
  • components_shipped: 4 of 4

audit/decisions.jsonl — record 1
{"ts": "2026-02-01T00:00:00Z", "session_id": null, "trigger_conditions": ["options", "tradeoff"], "domain_used": "cooking", "domain_override_reason": null, "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": true, "user_feedback": null, "decision_class": "build", "outcome_label": "quality", "notes": "Mirrors WORKED_EXAMPLES.md Example 1 (kitchen-build framing rejected; switched packs)."}
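
The components_shipped figure in the delta can be derived from the record's components object. The following is a minimal sketch, not the skill's actual implementation: the helper name components_shipped and the COMPONENT_KEYS tuple are assumptions introduced here for illustration, and the record is trimmed to the fields the sketch reads.

```python
import json

# The four component flags that appear in record 1's "components" object.
COMPONENT_KEYS = ("summary", "mapping_table", "simulation_prompt", "limit_statement")


def components_shipped(record: dict) -> str:
    """Return a 'shipped of total' string for one audit record (hypothetical helper)."""
    components = record.get("components", {})
    shipped = sum(1 for key in COMPONENT_KEYS if components.get(key) is True)
    return f"{shipped} of {len(COMPONENT_KEYS)}"


# Record 1, trimmed to the fields this sketch reads.
line = (
    '{"ts": "2026-02-01T00:00:00Z", "domain_used": "cooking", '
    '"components": {"summary": true, "mapping_table": true, '
    '"simulation_prompt": true, "limit_statement": true}, '
    '"frame_rejection": true, "decision_class": "build", "outcome_label": "quality"}'
)
record = json.loads(line)
print(components_shipped(record))  # 4 of 4
```

A record that shipped only the summary would report "1 of 4", which makes partial applications of the layer visible in the log.
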
Example 02 · paper §9b · real

ESG instrument explanation

Writing a chapter on sustainability-linked finance for a mixed audience (finance professionals + general business readers). Needs to explain what a sustainability-linked loan is and why it exists, without losing non-finance readers in the substrate.

Before · no analogy layer · shape 0

A sustainability-linked loan (SLL) is a corporate loan whose financial terms — typically the interest margin — adjust based on the borrower's performance against pre-defined sustainability performance targets (SPTs). The targets are linked to KPIs that are material to the borrower's business (e.g. emissions intensity, water consumption, board diversity). Independent verification of KPI performance is required at each review date. SLLs are general-purpose-of-proceeds instruments, distinguishing them from use-of-proceeds instruments like green bonds.

The reader who works in finance follows this. The reader who doesn't has bounced.

After · analogy layer applied · pack: construction

A sustainability-linked loan is a corporate loan whose interest rate moves up or down depending on whether the borrower hits agreed sustainability targets — emissions, water, board diversity, whatever's material to them.

In construction terms: a developer building a hotel agrees the rent steps down if the building hits its energy-efficiency targets, and steps up if it misses them. The lender is taking a position on operational performance, not just on whether the building gets built.

Technical | Construction | What this means
Sustainability performance target | Energy-efficiency clause in the lease | A measurable promise about how the asset will run
Margin step-up / step-down | Rent goes up or down | The borrower pays less when targets are hit and more when they are missed
Verification at review date | Inspector visits the building | Outside eyes confirm the claim before the financial change applies
General-purpose-of-proceeds | The loan funds whatever the developer needs | Unlike a green bond, which has to fund verified-green works only

Run it forward: What happens if the building underperforms in the third review cycle but the market for hotels has shifted in a way nobody foresaw? Who eats the difference: the developer (who built honestly), or the lender (who priced the risk in)? Real SLLs have answered this question in their contracts; the answer shapes whether the instrument is a sharp tool or a blunt one.

Where this analogy stops: A real building has physical inspection and depreciation curves; an SLL's targets are usually accounting-based and forward-looking. Don't push the metaphor into balance-sheet treatment or impairment.

The non-finance reader now has a structural picture of why this instrument exists. The finance reader's eyes don't glaze over because the technical content is still right there.

Delta
  • frame_rejection: false. Not the goal here; this is delegation-analogy on the production side.
  • decision_class: explanation
  • outcome_label: quality. The reader can ask a better next question ("who eats the variance?") whether they read it in the analogy or in the substrate.
  • components_shipped: 4 of 4

audit/decisions.jsonl — record 2
{"ts": "2026-03-01T00:00:00Z", "session_id": null, "trigger_conditions": ["non_trivial"], "domain_used": "construction", "domain_override_reason": null, "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "user_feedback": null, "decision_class": "explanation", "outcome_label": "quality", "notes": "Mirrors WORKED_EXAMPLES.md Example 2 (construction pack used for an ESG mapping explanation)."}
Example 03 · paper §10 · constructed but representative

Subscription audit

A knowledge worker, deep into the current generation of AI tools, with seven or eight subscriptions accumulated. None individually expensive; collectively significant. Nobody has audited the whole. The user asks the AI: should I keep these?

Before · no analogy layer · shape 0

Based on usage frequency and overlap, the following recommendations:

Tool | Plan | Use/mo | Recommendation
ChatGPT Plus | $20/mo | 80+ | keep
Perplexity Pro | $20/mo | 15 | consider cancelling
Midjourney | $30/mo | 4 | cancel
DALL-E (via Plus) | included | 2 | already covered
GitHub Copilot | $19/mo | 60+ | keep
Otter.ai | $17/mo | 12 | review
Asana AI add-on | $11/mo | 0 (long unused) | cancel
Fireflies | $19/mo | 10 | review

Estimated savings: $61/month if you cancel the three flagged tools.

The user reads the table, doesn't cancel anything, closes the tab.
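
The $61 figure in the bare answer can be reproduced from the table's prices. This is a small sketch under two assumptions that are not stated in the original: "consider cancelling" is counted as flagged alongside "cancel" (which is what makes three flagged tools), and DALL-E, listed as "included", is priced at 0.

```python
# (tool, monthly price in USD, recommendation) copied from the table above.
# Assumption: DALL-E is listed as "included", so it is priced at 0 here.
subscriptions = [
    ("ChatGPT Plus", 20, "keep"),
    ("Perplexity Pro", 20, "consider cancelling"),
    ("Midjourney", 30, "cancel"),
    ("DALL-E (via Plus)", 0, "already covered"),
    ("GitHub Copilot", 19, "keep"),
    ("Otter.ai", 17, "review"),
    ("Asana AI add-on", 11, "cancel"),
    ("Fireflies", 19, "review"),
]

# Assumption: both of these recommendations count as "flagged".
FLAGGED = {"cancel", "consider cancelling"}

flagged = [(tool, price) for tool, price, rec in subscriptions if rec in FLAGGED]
monthly_savings = sum(price for _, price in flagged)
total_monthly = sum(price for _, price, _ in subscriptions)

print(monthly_savings)  # 61
print(total_monthly)    # 136
```

The arithmetic is correct and complete, which is part of the point of this fixture: the bare answer is not wrong, it just does not move the user to act.
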

After · analogy layer applied · pack: cooking · secondary: home

Here's the inventory:

(same table as above, kept inline so the technical content is still present)

In kitchen terms: this isn't an inventory question, it's a staffing question. You're running a kitchen.

Tool | Kitchen role | What that role costs you
ChatGPT Plus | Head chef: does most things competently, on the line every service | Indispensable. Justified.
Perplexity Pro | Senior cook hired for one specific cuisine | Useful when you cook that cuisine; expensive when you don't. Are you still cooking it?
Midjourney + DALL-E | Two pastry chefs on the same station | One of them is on the schedule out of habit. The kitchen doesn't need both.
GitHub Copilot | Sous-chef who only works certain dishes but is excellent at them | Indispensable for the dishes they cover.
Otter / Fireflies | Two prep cooks doing slightly different mise en place | They specialise, but the head chef could probably do most of this prep. Worth asking whether you actually need two.
Asana AI add-on | Specialist supplier on retainer for an ingredient you haven't ordered in a long stretch | Cancel. The retainer is paying for nothing.

Run it forward: If you cancel Perplexity (the senior cook), are you actually going to stop cooking that cuisine? Or will you start asking the head chef to do the senior cook's job, and discover after the next billing cycle that the head chef does it badly at scale? That happens with cancellations; it's not always cheaper, just smaller-looking on the bill.

Where this analogy stops: Real kitchens don't have free trials that quietly convert. The drifting-renewal pathology is more household than kitchen; if that's the shape of the problem, switch packs to home and look at the standing-orders list as a budget question.

The user cancels Asana (clear retainer), keeps Perplexity for another billing cycle and watches whether they actually cook that cuisine, consolidates the prep-cook station by choosing Fireflies over Otter, and leaves the rest alone. The total cut is smaller than the bare table recommended ($28/mo, from Asana at $11 plus Otter at $17, vs $61/mo), but it's intentional rather than mechanical.

Delta
  • frame_rejection: false. The frame held; the simulation just produced more nuanced choices.
  • decision_class: audit
  • outcome_label: comfort. The user engaged with the audit they'd been avoiding; the cost change is modest; the relationship to the cost is what shifted.
  • components_shipped: 4 of 4

The audit schema's outcome_label is the comfort/quality field. Here is a trimmed decision record for this example with the value set to comfort:


{"ts": "2026-04-01T00:00:00Z", "domain_used": "cooking", "decision_class": "audit", "outcome_label": "comfort", "frame_rejection": false, "components_complete": true}
A comfort label says the user engaged with the work but the cost/decision substance changed modestly. That is information the log surfaces, not a failure. The paper's central evidence problem is exactly this: comfort cases are easier to produce than quality cases, and the schema names that honestly.
Footnote — the comfort / quality caveat

This case sits on the comfort side of the comfort/quality split. A paper making a quality claim that rests on one quality-adjacent case and two comfort-adjacent cases has an evidence gap. Naming it honestly in the audit schema (outcome_label) is the point.

The audit log (audit/decisions.jsonl) is shaped to surface this question — outcome_label is the field that names it. If your real-world use of the skill consistently logs comfort and rarely quality, that is a finding, not a failure. The paper says so.
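
One way to check whether a log leans toward comfort is to tally outcome_label across its records. The sketch below is an illustration, not part of the skill: the function name outcome_share is an assumption, and the sample log holds only the outcome labels of the three examples on this page.

```python
import json
from collections import Counter


def outcome_share(jsonl_text: str) -> dict:
    """Return each outcome_label's share of the logged records (hypothetical helper)."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    counts = Counter(r.get("outcome_label") for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# The three examples on this page, trimmed to the fields this sketch reads.
sample_log = "\n".join([
    '{"decision_class": "build", "outcome_label": "quality"}',
    '{"decision_class": "explanation", "outcome_label": "quality"}',
    '{"decision_class": "audit", "outcome_label": "comfort"}',
])

shares = outcome_share(sample_log)
print({label: round(share, 2) for label, share in shares.items()})
# {'quality': 0.67, 'comfort': 0.33}
```

Run against a real audit/decisions.jsonl, a comfort share that stays high over time would be the finding the footnote describes.
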

How these examples relate to the audit

Each of the three examples maps to a row that could be logged in audit/decisions.jsonl. Three records, three different decision_class values, two different outcome_label values, one frame_rejection. That is what a healthy log looks like.

audit/decisions.jsonl — all three records
{"ts": "2026-02-01T00:00:00Z", "domain_used": "cooking",      "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": true,  "decision_class": "build",        "outcome_label": "quality"}
{"ts": "2026-03-01T00:00:00Z", "domain_used": "construction", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "explanation",  "outcome_label": "quality"}
{"ts": "2026-04-01T00:00:00Z", "domain_used": "cooking",      "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "audit",        "outcome_label": "comfort"}
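
The "three, three, two, one" summary can be checked directly against the three records. This is a minimal sketch that embeds the lines shown above as a string; in practice the text would presumably be read from audit/decisions.jsonl, and the variable names are illustrative only.

```python
import json

# The three records from this page, one JSON object per line.
LOG = """\
{"ts": "2026-02-01T00:00:00Z", "domain_used": "cooking", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": true, "decision_class": "build", "outcome_label": "quality"}
{"ts": "2026-03-01T00:00:00Z", "domain_used": "construction", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "explanation", "outcome_label": "quality"}
{"ts": "2026-04-01T00:00:00Z", "domain_used": "cooking", "components": {"summary": true, "mapping_table": true, "simulation_prompt": true, "limit_statement": true}, "frame_rejection": false, "decision_class": "audit", "outcome_label": "comfort"}
"""

records = [json.loads(line) for line in LOG.splitlines() if line.strip()]

decision_classes = {r["decision_class"] for r in records}
outcome_labels = {r["outcome_label"] for r in records}
frame_rejections = sum(1 for r in records if r["frame_rejection"])
# True only if every record shipped all of its listed components.
all_components = all(all(r["components"].values()) for r in records)

print(len(records), len(decision_classes), len(outcome_labels), frame_rejections)
# 3 3 2 1
print(all_components)  # True
```
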