The Fable Distillation: Making a $3 Model Work Like a $30 Model

The Real Problem With Budget Models

Local models and cheap API models don't fail because they can't write code. They fail because they declare victory without checking, follow dead plans past the point of no return, and report "fixed" when they mean "tried."

These aren't capability gaps — they're discipline gaps. And discipline can be taught at inference time.

The Fable Distillation is a behavioral overlay: a set of explicit patterns, injected at the start of a session, that change how a model works. No fine-tuning, no RLHF, no model merging. It's prompt engineering, but built from postmortem analysis of real agent failures rather than vibes.

Three Layers That Stack

Layer 1 — Operating Rules. Think before acting. Work from goals instead of step-lists. Gather your own context instead of asking permission to look. Verify everything before claiming it's done. Match effort to blast radius.

Layer 2 — Execution Process. The phase-by-phase machine for how real work moves: intake (understand before touching), orient (map before changing, baseline before modifying), plan (goal-path not ritual, riskiest unknown first), act (smallest change then fastest check), debug (reproduce, rank, bisect, three strikes), verify (adversarial pass), deliver (outcome first, honest state).

Layer 3 — Long-Horizon Survival. Named failure modes with countermeasures (hallucinated success, confidence laundering, zombie loops, fix stacking, goal mutation), orchestration protocols for delegated work, and opening moves specific to each task shape.
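The "three strikes" step from Layer 2, which also guards against the zombie loops and fix stacking named in Layer 3, can be sketched as a small harness-side tracker. This is an illustration under stated assumptions, not part of the ruleset itself. It assumes that strikes are counted per hypothesis, that a strike is one fix attempt whose verification check failed, and that the third strike means abandoning that hypothesis and re-ranking. `DebugLoop` and its return values are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DebugLoop:
    """Hypothetical tracker for a 'three strikes' debugging rule.

    Assumptions: strikes are counted per hypothesis, and a strike is one
    fix attempt whose verification check failed.
    """

    max_strikes: int = 3
    strikes: dict = field(default_factory=dict)  # hypothesis -> failed attempts

    def record(self, hypothesis: str, check_passed: bool) -> str:
        """Log one fix attempt and return what the agent should do next."""
        if check_passed:
            return "verified"
        self.strikes[hypothesis] = self.strikes.get(hypothesis, 0) + 1
        if self.strikes[hypothesis] >= self.max_strikes:
            # Limit reached: signal that this hypothesis is dead, so the
            # caller stops stacking fixes and goes back to re-rank.
            return "abandon"
        # Under the limit: another attempt on this hypothesis is allowed.
        return "retry"


loop = DebugLoop()
for _ in range(3):
    print(loop.record("off-by-one in slot index", check_passed=False))
# prints: retry, retry, abandon (one per line)
```

Counting per hypothesis rather than per session is a choice made for the sketch: it lets the agent move on to the next-ranked cause without resetting the whole debug phase.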

The Claims-Audit Gate: A Real Postmortem

Here's what happened when I asked a model running the overlay to merge two complex workflow files — 260 nodes and 300+ connections:

It reasoned well: it identified a design flaw in the handoff between stages, redesigned the handoff using a data-dependency pattern, and wrote good code. The intake and orient phases clearly worked.

Then at the end it reported: "306 links — all valid, no broken references."

That was true in a narrow sense: it had verified that every link's endpoints referenced existing nodes. But the serialisation format stores each connection in three places, and it checked only one. When I loaded the workflow, nearly every connection in the second stage was silently missing.
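To make the gap concrete, here is a minimal sketch of a check that covers all three places. The real file schema isn't given here, so the format below is a hypothetical stand-in: a top-level `links` list of `{"id", "src", "dst"}` records, plus per-node `inputs` and `outputs` lists of link ids. The function name `check_link_consistency` is also illustrative. Check 1 corresponds to the endpoint validation the model actually ran; checks 2 and 3 are the ones it skipped.

```python
def check_link_consistency(workflow: dict) -> list[str]:
    """Cross-check every link against all three places it is recorded.

    Hypothetical format (an assumption, not the real file schema):
      workflow["links"]: list of {"id": ..., "src": node_id, "dst": node_id}
      workflow["nodes"]: {node_id: {"inputs": [link ids], "outputs": [link ids]}}
    """
    problems = []
    nodes = workflow["nodes"]
    for link in workflow["links"]:
        lid, src, dst = link["id"], link["src"], link["dst"]
        # Check 1: both endpoints exist (the only check the model ran).
        if src not in nodes or dst not in nodes:
            problems.append(f"link {lid}: dangling endpoint")
            continue
        # Check 2: the source node lists this link among its outputs.
        if lid not in nodes[src]["outputs"]:
            problems.append(f"link {lid}: missing from node {src} outputs")
        # Check 3: the target node lists this link among its inputs.
        if lid not in nodes[dst]["inputs"]:
            problems.append(f"link {lid}: missing from node {dst} inputs")
    # Reverse direction: node-side references to links that don't exist.
    known = {link["id"] for link in workflow["links"]}
    for nid, node in nodes.items():
        for lid in node["inputs"] + node["outputs"]:
            if lid not in known:
                problems.append(f"node {nid}: references unknown link {lid}")
    return problems


wf = {
    "links": [{"id": 1, "src": "a", "dst": "b"}],
    "nodes": {"a": {"inputs": [], "outputs": [1]}, "b": {"inputs": [], "outputs": []}},
}
print(check_link_consistency(wf))
# prints: ['link 1: missing from node b inputs']
```

In this example the endpoint check alone would report the link as valid, which mirrors the failure above.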

The honest report would have been: "306 links resolve per endpoint validation; full graph consistency is inferred, not verified."

So I added a harness-side gate: a stop-hook that fires when the model's closing message makes totality claims ("all valid", "no errors") and forces it either to name the exact check behind each claim or to downgrade the claim to inferred. That's how the distillation evolves: real failures, root-cause analysis, specific countermeasures. Not "be more careful."
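The gate's logic can be sketched in a few lines of Python. This is a minimal illustration, not the actual hook. The regex patterns, the `verified by:` / `inferred` labelling convention, and the `{"decision": "block", "reason": ...}` return shape are all assumptions made for the example. How a particular harness wires a stop-hook in, and what it expects back, will differ.

```python
import re
from typing import Optional

# Hypothetical patterns for "totality" claims; a real gate would need more.
TOTALITY_PATTERNS = [
    r"\ball (?:\w+ ){0,3}(?:valid|pass(?:ing|ed)?|fixed|working)\b",
    r"\bno (?:\w+ ){0,2}(?:errors?|issues|failures|references)\b",
    r"\bfully (?:verified|working|fixed)\b",
]
# Hypothetical convention: a claim is backed if it names its check
# ("verified by: ...") or is explicitly downgraded ("inferred").
BACKING = re.compile(r"verified by:|\binferred\b", re.IGNORECASE)


def unbacked_claims(message: str) -> list[str]:
    """Return sentences that make a totality claim without a named check."""
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    return [
        s
        for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in TOTALITY_PATTERNS)
        and not BACKING.search(s)
    ]


def stop_hook(closing_message: str) -> Optional[dict]:
    """Block the stop if any totality claim is unbacked; otherwise allow it."""
    claims = unbacked_claims(closing_message)
    if not claims:
        return None  # nothing to audit, let the session end
    return {
        "decision": "block",
        "reason": "Name the exact check behind each claim, or mark it inferred: "
        + " | ".join(claims),
    }


print(stop_hook("306 links: all valid, no broken references."))
# blocks: the claim names no check
```

Keyword matching is deliberately crude here. The point is only that the gate lives in the harness, outside the model, so the model cannot talk its way past it.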

What It Doesn't Do

It doesn't make a small model smarter. A model that can't write valid Python won't start writing valid Python because you told it to verify. What changes is what happens around the code: whether it checks its work, whether it notices when its approach stopped working three attempts ago, whether it reports "done" or "done, per this specific check."

In practice, the gap between a capable-but-undisciplined model and a capable-and-disciplined one is bigger than the gap between model sizes.

What You Get

Full Fable-5 ruleset — the behavioral overlay that runs on every session
Phase-by-phase execution process — the complete anatomy of how a task actually gets done
Named failure modes + countermeasures — 13 specific ways agents die and how to catch them mid-failure
Task-shape protocols — opening moves for bugfixes, features, refactors, migrations, and investigations
Portable format — works as a Claude Code skill, Cursor/Windsurf/Continue rules file, or a standalone prompt pack
Integration guides — specific setup instructions for each platform

Get the Fable Distillation

One-time purchase. Instant download. Works with any agent harness.

$14 AUD

Buy Now

Secure checkout via Polar, which acts as Merchant of Record, so tax is handled automatically.

Written by Indra's Mirror — AI tools for people who run local models and need them to actually work.