The Challenge
Build a system where a user types "show me a dealer search page" and gets a live, interactive HTML mockup that matches a specific design system — correct colors, components, spacing, and data patterns. Not generic HTML. Design-system-faithful HTML.
The engineering problem: how do you make an LLM output HTML that matches your specific design system instead of generic Bootstrap?
The Answer: Grounding
You inject your design system's rules and component patterns directly into the LLM prompt. The system prompt contains:
- Tailwind config with your custom prefix and color tokens
- HTML patterns for each component (buttons, cards, tables, badges)
- Layout rules and spacing conventions
The model pattern-matches these examples into its output. It doesn't "understand" your design system — the examples constrain the probability space of what HTML it produces.
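Concretely, grounding is just prompt assembly. A minimal sketch of the injection step — the component names, class strings, and prompt wording below are illustrative placeholders, not the actual library:

```python
# Hypothetical component patterns; in the real system these come verbatim
# from the design system's documented HTML examples.
COMPONENT_PATTERNS = {
    "stat_card": '<div class="tw-rounded-lg tw-border tw-p-4">...</div>',
    "data_table": '<table class="tw-w-full tw-text-sm">...</table>',
}

def build_system_prompt(needed_components: list[str]) -> str:
    """Inject the selected component patterns verbatim into the system prompt."""
    sections = [
        f"### {name}\n{COMPONENT_PATTERNS[name]}"
        for name in needed_components
        if name in COMPONENT_PATTERNS
    ]
    return (
        "Generate HTML that matches this design system.\n"
        "Copy these component patterns exactly:\n\n" + "\n\n".join(sections)
    )
```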
Architecture
The system has three key engineering patterns:
1. Two-Phase Selective Context Injection
Injecting all 18 component patterns into every prompt wastes tokens and confuses the model. Instead, run two LLM calls:
Phase 1 (cheap, ~400ms): Classify which components the prompt needs.
Input: "show me a search page with filters"
Output: ["data_table", "filter_chips", "stat_card"]
Phase 2 (main generation, 3-8s): Generate HTML with only those 3 component patterns injected.
Result: 44% fewer context tokens, better output quality because the model isn't distracted by irrelevant examples.
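The two-phase flow can be sketched with the LLM abstracted as a plain callable — the prompt wording, component names, and function signatures here are assumptions, not the production code:

```python
import json

def classify_components(llm, user_prompt: str) -> list[str]:
    """Phase 1: cheap, deterministic call that names the components the page needs."""
    raw = llm(
        "Return a JSON array of component names this page needs, "
        "chosen from: data_table, filter_chips, stat_card, badge.\n"
        f"Page request: {user_prompt}",
        temperature=0,
    )
    return json.loads(raw)

def generate_mockup(llm, user_prompt: str, patterns: dict[str, str]) -> str:
    """Phase 2: main generation with only the selected patterns injected."""
    needed = classify_components(llm, user_prompt)
    context = "\n\n".join(patterns[c] for c in needed if c in patterns)
    return llm(
        f"Design-system patterns:\n{context}\n\nBuild: {user_prompt}",
        temperature=0.3,
    )
```

The point of the split is visible in the second prompt: it carries only the patterns Phase 1 selected, not all 18.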
2. Zero-Cost Intent Detection + Seed Data
A keyword classifier (no ML, no LLM call) detects the screen type from the prompt:
```python
INTENT_MAP = {
    "dealer_search": ["dealer search", "search dealer", "dealer list"],
    "dealer_details": ["dealer detail", "dealer profile"],
    "reports": ["report", "analytics", "dashboard"],
}
```

Matching a key loads curated domain data from a JSON file — real field names, realistic values, correct status types. This data is injected into the prompt with the instruction: "Use these exact values — do not invent substitutes."
Why it works: Without seed data, the LLM invents placeholder content like "Dealer Name", "$10,000". With real domain data, the mockup looks credible. This technique is called few-shot grounding — concrete examples constrain the model's vocabulary.
3. SSE Streaming with iframe Throttling
The generated HTML streams to the browser via Server-Sent Events. But the critical insight is throttling iframe updates to 500ms.
Without throttling, writing iframe.srcdoc = html on every chunk causes the browser to reload the CSS framework dozens of times per second. Each reload wipes and re-scans all classes. By accumulating ~10 chunks before each iframe write, CSS re-executions drop by 10-20x.
The final chunk always triggers an immediate write to guarantee a clean result.
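The batching logic itself is language-agnostic. A Python sketch of the accumulate-and-flush pattern — in the real system the sink is the browser-side iframe.srcdoc assignment, and the class name and interface here are invented for illustration:

```python
import time

class ThrottledWriter:
    """Accumulate streamed chunks; write to the sink at most once per interval."""

    def __init__(self, sink, interval: float = 0.5, clock=time.monotonic):
        self.sink = sink            # in the browser: iframe.srcdoc = html
        self.interval = interval
        self.clock = clock          # injectable for testing
        self.buffer = ""
        self.last_flush = clock()

    def write(self, chunk: str) -> None:
        self.buffer += chunk
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self) -> None:
        self.sink(self.buffer)      # one sink write covers all buffered chunks
        self.last_flush = self.clock()

    def close(self) -> None:
        self.flush()                # the final chunk always lands immediately
```

With chunks arriving every ~100ms, the sink sees roughly one write per 500ms instead of one per chunk, which is exactly what keeps the CSS framework from re-scanning on every delta.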
Security: iframe Sandbox
The iframe uses sandbox="allow-scripts allow-popups" — notably without allow-same-origin.
Counter-intuitively, allow-same-origin is the dangerous option in srcdoc mode — it grants the iframe's content access to the parent page's DOM, cookies, and localStorage. Since the iframe runs LLM-generated code (which could contain anything), removing same-origin isolation is critical.
allow-scripts is necessary for the CSS framework's JavaScript to run. This is a calculated tradeoff: script execution within an isolated sandbox.
Template Substitution: Why .replace() Over .format()
The system prompt template uses .replace() for placeholder substitution:
```python
return (
    template
    .replace("{HEAD_BLOCK}", HEAD_BLOCK)
    .replace("{cop_context_json}", json.dumps(context, indent=2))
    .replace("{SEED_DATA_BLOCK}", seed_block)
)
```

Python's .format() and f-strings interpret every {...} as a placeholder, and the template contains JavaScript objects with literal curly braces — {prefix: 'tw-', theme: {extend: {colors: ...}}} — so .format() would raise a KeyError on the first brace it misreads as a field. .replace() only touches the three specific placeholders we control.
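The failure mode is easy to reproduce — the template below is a toy, not the real prompt file:

```python
import json

# A template with literal braces from a JS config, plus one real placeholder.
template = (
    "<script>tailwind.config = {prefix: 'tw-'}</script>\n"
    "Context:\n{cop_context_json}"
)

# .replace() touches only the placeholder we control; the JS braces survive.
prompt = template.replace("{cop_context_json}", json.dumps({"app": "dealers"}))
assert "tailwind.config = {prefix: 'tw-'}" in prompt

# .format() treats {prefix: 'tw-'} as a replacement field and raises KeyError.
try:
    template.format(cop_context_json="{}")
except KeyError as exc:
    print(f"format() crashed: KeyError {exc}")
```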
The Design Philosophy
Use ML only where you need generative power. Use rules everywhere else.
| Task | Approach | Cost |
|---|---|---|
| Intent detection | Keyword map | 0ms, deterministic |
| Component detection in HTML | CSS marker scan | 0ms, deterministic |
| Seed data injection | JSON lookup | 0ms, deterministic |
| Component classification | LLM (temp=0) | ~400ms |
| HTML generation | LLM (temp=0.3) | 3-8s |
The LLM handles two tasks: classifying a free-form prompt into component needs, and generating novel HTML from a description. Everything else is rules — instant, free, and failure-proof.
Takeaways
- Ground LLM outputs with your design system's actual patterns. Don't hope the model knows your component library — inject the patterns verbatim.
- Selective context injection saves tokens and improves quality. A cheap classification pass eliminates irrelevant context.
- Domain seed data makes mockups credible. Real values beat placeholder text for stakeholder buy-in.
- Throttle streaming UI updates. 500ms batching prevents CSS framework thrashing.
- Sandbox LLM-generated content. Never grant allow-same-origin to untrusted HTML in iframes.