Problem Context
AI coding assistants send your prompts to cloud APIs. If your prompt contains sensitive data — SSNs, API keys, salary information, medical records — that data leaves your device. Even with provider privacy policies, the data is transmitted.
I wanted a privacy-first architecture: every prompt scanned locally on-device before it reaches any cloud API. If PII is detected, the prompt is blocked with a user-friendly error. Zero data leaves the device without consent.
System Architecture
The system uses a two-stage hybrid approach:
Stage 1: Regex pre-filter (under 50ms) catches obvious patterns — SSNs, credit cards, emails, API keys, passwords. Fast, reliable, deterministic.
Stage 2: On-device LLM (~500ms) catches contextual PII — names tied to salaries, medical diagnoses, physical addresses, employee data. Only runs on prompts >200 characters to avoid hallucination on short text.
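A minimal sketch of the kind of Stage 1 pre-filter described above. The pattern names and exact regexes here are illustrative assumptions, not the project's actual rule set, which is larger and tuned:

```python
import re

# Illustrative Stage 1 patterns -- the real rule set is larger and tuned.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_(?:live|test)_[A-Za-z0-9]{16,}\b"),
}

def regex_prefilter(text: str) -> list[str]:
    """Return the names of all pattern categories found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

Because this stage is pure pattern matching, it is deterministic and runs in well under the 50ms budget on typical prompt sizes.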
Key Engineering Decisions
1. Two-Stage Scanning Over Single-Pass
A single regex pass is fast but misses 60% of PII. A single LLM pass is accurate but adds 500ms to every prompt. The hybrid approach gives fast feedback for clean prompts (regex only) and deep scanning for potentially risky ones.
2. 200-Character Threshold for LLM
The ~3B on-device model hallucinates PII on short text. Given the input "Hello world", it might report "Name: John Doe, Salary: $200000". Requiring >200 characters gives the model enough context for reliable classification.
3. Session Refresh Every 5 Chunks
Large prompts are split into 500-character chunks. Processing many chunks in one session causes context window overflow. Refreshing the session every 5 chunks clears the context while maintaining quality:
```python
async def scan_file(text: str, chunk_size: int = 500):
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    session = None
    chunks_since_refresh = 0
    results = []
    for chunk in chunks:
        # Refresh the session every 5 chunks to clear the context window
        if chunks_since_refresh >= 5 or session is None:
            session = LanguageModelSession(
                instructions=SCAN_INSTRUCTIONS,
                model=model,
            )
            chunks_since_refresh = 0
        result = await session.respond(chunk, generating=PIIClassification)
        results.append(result)
        chunks_since_refresh += 1
    return results
```
4. Exit Code Protocol
| Code | Meaning | Action |
|---|---|---|
| 0 | No PII found | Prompt proceeds to cloud API |
| 2 | PII detected | Prompt blocked, error shown |
| timeout | Scan took >30s | Prompt blocked for safety |
The hook script uses jq to extract the prompt from the event JSON, writes to a temp file (cleaned up on exit), and calls the scanner.
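The actual hook is a shell script built on jq, but the exit-code protocol and timeout behavior can be sketched equivalently in Python. The scanner command, event field name, and helper name here are assumptions for illustration:

```python
import json
import os
import subprocess
import sys
import tempfile

SCAN_TIMEOUT_S = 30  # scans longer than this block the prompt for safety

def run_hook(event_json: str, scanner_cmd=None) -> int:
    """Extract the prompt from the hook event, scan it, and return the exit code.

    0 -> no PII, prompt proceeds; 2 -> PII detected (or timeout), prompt blocked.
    scanner_cmd is a placeholder for the real scanner invocation.
    """
    scanner_cmd = scanner_cmd or ["pii-scan"]
    prompt = json.loads(event_json).get("prompt", "")
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(prompt)
        result = subprocess.run(scanner_cmd + [path], timeout=SCAN_TIMEOUT_S)
        return result.returncode
    except subprocess.TimeoutExpired:
        print("PII scan timed out; blocking prompt", file=sys.stderr)
        return 2  # fail closed: a hung scan must not let the prompt through
    finally:
        os.unlink(path)  # temp file cleaned up on every exit path
```

The important design choice is failing closed: both an explicit PII hit and a timeout map to a blocking exit code, so the prompt only proceeds on a clean, completed scan.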
Performance Characteristics
| Prompt Type | Regex Time | LLM Time | Total |
|---|---|---|---|
| Short (under 200 chars) | under 10ms | skipped | under 50ms |
| Medium (200-1KB) | under 15ms | ~300ms | ~350ms |
| Large (1-10KB) | under 20ms | ~500ms | ~550ms |
| Very large (over 10KB) | under 50ms | ~800ms | ~850ms |
The tradeoff: 50-850ms delay per prompt to guarantee no PII leaves the device.
Technical Tradeoffs
| Decision | Benefit | Cost |
|---|---|---|
| Hybrid regex + LLM | Best of both: speed + accuracy | Two code paths to maintain |
| Fresh session per classification | No context bleed between scans | 555ms vs ~30ms with reuse |
| 200-char threshold | Prevents hallucination | Short prompts with PII slip through to regex-only |
| 30-second timeout | Prevents hangs | Long documents may timeout |
Impact
- 100% PII detection accuracy on 25 diverse test cases (15 PII, 10 clean)
- Zero false positives — no clean prompts incorrectly blocked
- Zero data transmission before local scanning completes
- Graceful degradation — works as regex-only on machines without the on-device model
Lessons Learned
- Privacy-first architecture means scanning before transmission, not after. If data leaves the device, it's already too late for privacy guarantees.
- Binary classification beats extraction for small models. Asking "does this contain PII?" (yes/no) is dramatically more reliable than asking "extract all PII from this text."
- Explicit negative examples in prompts eliminate false positives. Telling the model what NOT to flag ("toll-free numbers, error codes, code variables") is as important as what to flag.
- Session management is critical for batch processing. Without periodic session refresh, the context window fills up and scanning stops silently. The refresh pattern (every N items) keeps quality high while managing memory.
- Threshold tuning prevents hallucination. Small models on insufficient context invent data. The 200-character minimum was found empirically — below it, false positive rate climbs to ~15%.
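The binary-classification lesson can be made concrete as a structured-output schema. The `PIIClassification` name matches the type used in the scanning loop above, but the fields shown here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PIIClassification:
    """Structured output the on-device model fills in per chunk.

    A yes/no verdict plus a coarse category is far more reliable from a
    ~3B model than asking it to extract the PII values themselves.
    """
    contains_pii: bool
    category: str = "none"  # e.g. "financial", "medical", "identity"

def is_blocked(result: PIIClassification) -> bool:
    """Map the model's verdict to the block/proceed decision."""
    return result.contains_pii
```

Keeping the schema this small constrains the model to a decision it can make reliably, instead of a generation task that invites hallucinated values.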