The Challenge
Build a system where a user types "show me a dealer search page" and gets a live, interactive HTML mockup that matches a specific design system — correct colors, components, spacing, and data patterns. Not generic HTML. Design-system-faithful HTML.
The engineering problem: how do you make an LLM output HTML that matches your specific design system instead of generic Bootstrap?
The Answer: Grounding
You inject your design system's rules and component patterns directly into the LLM prompt. The system prompt contains:
- Tailwind config with your custom prefix and color tokens
- HTML patterns for each component (buttons, cards, tables, badges)
- Layout rules and spacing conventions
The model pattern-matches these examples into its output. It doesn't "understand" your design system — the examples constrain the probability space of what HTML it produces.
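Concretely, grounding is just prompt assembly. A minimal sketch of the injection step — the component names, class strings, and prompt wording below are illustrative placeholders, not the actual library:

```python
# Hypothetical component patterns; in the real system these come verbatim
# from the design system's documented HTML examples.
COMPONENT_PATTERNS = {
    "stat_card": '<div class="tw-rounded-lg tw-border tw-p-4">...</div>',
    "data_table": '<table class="tw-w-full tw-text-sm">...</table>',
}

def build_system_prompt(needed_components: list[str]) -> str:
    """Inject the selected component patterns verbatim into the system prompt."""
    sections = [
        f"### {name}\n{COMPONENT_PATTERNS[name]}"
        for name in needed_components
        if name in COMPONENT_PATTERNS
    ]
    return (
        "Generate HTML that matches this design system.\n"
        "Copy these component patterns exactly:\n\n" + "\n\n".join(sections)
    )
```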
Architecture
The system has three key engineering patterns:
1. Two-Phase Selective Context Injection
Injecting all 18 component patterns into every prompt wastes tokens and confuses the model. Instead, run two LLM calls:
Phase 1 (cheap, ~400ms): Classify which components the prompt needs.
Input: "show me a search page with filters"
Output: ["data_table", "filter_chips", "stat_card"]
Phase 2 (main generation, 3-8s): Generate HTML with only those 3 component patterns injected.
Result: 44% fewer context tokens, better output quality because the model isn't distracted by irrelevant examples.
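The two-phase flow can be sketched with the LLM abstracted as a plain callable — the prompt wording, component names, and function signatures here are assumptions, not the production code:

```python
import json

def classify_components(llm, user_prompt: str) -> list[str]:
    """Phase 1: cheap, deterministic call that names the components the page needs."""
    raw = llm(
        "Return a JSON array of component names this page needs, "
        "chosen from: data_table, filter_chips, stat_card, badge.\n"
        f"Page request: {user_prompt}",
        temperature=0,
    )
    return json.loads(raw)

def generate_mockup(llm, user_prompt: str, patterns: dict[str, str]) -> str:
    """Phase 2: main generation with only the selected patterns injected."""
    needed = classify_components(llm, user_prompt)
    context = "\n\n".join(patterns[c] for c in needed if c in patterns)
    return llm(
        f"Design-system patterns:\n{context}\n\nBuild: {user_prompt}",
        temperature=0.3,
    )
```

The point of the split is visible in the second prompt: it carries only the patterns Phase 1 selected, not all 18.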
2. Zero-Cost Intent Detection + Seed Data
A keyword classifier (no ML, no LLM call) detects the screen type from the prompt:
```python
INTENT_MAP = {
    "dealer_search": ["dealer search", "search dealer", "dealer list"],
    "dealer_details": ["dealer detail", "dealer profile"],
    "reports": ["report", "analytics", "dashboard"],
}
```

Matching a key loads curated domain data from a JSON file — real field names, realistic values, correct status types. This data is injected into the prompt with the instruction: "Use these exact values — do not invent substitutes."
Why it works: Without seed data, the LLM invents placeholder content like "Dealer Name", "$10,000". With real domain data, the mockup looks credible. This technique is called few-shot grounding — concrete examples constrain the model's vocabulary.
3. SSE Streaming with iframe Throttling
The generated HTML streams to the browser via Server-Sent Events. But the critical insight is throttling iframe updates to 500ms.
Without throttling, writing iframe.srcdoc = html on every chunk causes the browser to reload the CSS framework dozens of times per second. Each reload wipes and re-scans all classes. By accumulating ~10 chunks before each iframe write, CSS re-executions drop by 10-20x.
The final chunk always triggers an immediate write to guarantee a clean result.
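The batching logic itself is language-agnostic. A Python sketch of the accumulate-and-flush pattern — in the real system the sink is the browser-side iframe.srcdoc assignment, and the class name and interface here are invented for illustration:

```python
import time

class ThrottledWriter:
    """Accumulate streamed chunks; write to the sink at most once per interval."""

    def __init__(self, sink, interval: float = 0.5, clock=time.monotonic):
        self.sink = sink            # in the browser: iframe.srcdoc = html
        self.interval = interval
        self.clock = clock          # injectable for testing
        self.buffer = ""
        self.last_flush = clock()

    def write(self, chunk: str) -> None:
        self.buffer += chunk
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self) -> None:
        self.sink(self.buffer)      # one sink write covers all buffered chunks
        self.last_flush = self.clock()

    def close(self) -> None:
        self.flush()                # the final chunk always lands immediately
```

With chunks arriving every ~100ms, the sink sees roughly one write per 500ms instead of one per chunk, which is exactly what keeps the CSS framework from re-scanning on every delta.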
Security: iframe Sandbox
The iframe uses sandbox="allow-scripts allow-popups" — notably without allow-same-origin.
Counter-intuitively, allow-same-origin is the dangerous option in srcdoc mode — it grants the iframe's content access to the parent page's DOM, cookies, and localStorage. Since the iframe runs LLM-generated code (which could contain anything), removing same-origin isolation is critical.
allow-scripts is necessary for the CSS framework's JavaScript to run. This is a calculated tradeoff: script execution within an isolated sandbox.
Template Substitution: Why .replace() Over .format()
The system prompt template uses .replace() for placeholder substitution:
```python
return (
    template
    .replace("{HEAD_BLOCK}", HEAD_BLOCK)
    .replace("{cop_context_json}", json.dumps(context, indent=2))
    .replace("{SEED_DATA_BLOCK}", seed_block)
)
```

Python's .format() and f-strings interpret every {...} as a placeholder, and the template contains JavaScript objects with literal curly braces — {prefix: 'tw-', theme: {extend: {colors: ...}}} — so .format() would raise a KeyError on the first brace it misreads as a field. .replace() only touches the three specific placeholders we control.
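The failure mode is easy to reproduce — the template below is a toy, not the real prompt file:

```python
import json

# A template with literal braces from a JS config, plus one real placeholder.
template = (
    "<script>tailwind.config = {prefix: 'tw-'}</script>\n"
    "Context:\n{cop_context_json}"
)

# .replace() touches only the placeholder we control; the JS braces survive.
prompt = template.replace("{cop_context_json}", json.dumps({"app": "dealers"}))
assert "tailwind.config = {prefix: 'tw-'}" in prompt

# .format() treats {prefix: 'tw-'} as a replacement field and raises KeyError.
try:
    template.format(cop_context_json="{}")
except KeyError as exc:
    print(f"format() crashed: KeyError {exc}")
```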
The Design Philosophy
Use ML only where you need generative power. Use rules everywhere else.
| Task | Approach | Cost |
|---|---|---|
| Intent detection | Keyword map | 0ms, deterministic |
| Component detection in HTML | CSS marker scan | 0ms, deterministic |
| Seed data injection | JSON lookup | 0ms, deterministic |
| Component classification | LLM (temp=0) | ~400ms |
| HTML generation | LLM (temp=0.3) | 3-8s |
The LLM handles two tasks: classifying a free-form prompt into component needs, and generating novel HTML from a description. Everything else is rules — instant, free, and failure-proof.
Takeaways
- Ground LLM outputs with your design system's actual patterns. Don't hope the model knows your component library — inject the patterns verbatim.
- Selective context injection saves tokens and improves quality. A cheap classification pass eliminates irrelevant context.
- Domain seed data makes mockups credible. Real values beat placeholder text for stakeholder buy-in.
- Throttle streaming UI updates. 500ms batching prevents CSS framework thrashing.
- Sandbox LLM-generated content. Never grant allow-same-origin to untrusted HTML in iframes.