Problem Context
AI coding assistants send your prompts to cloud APIs. If your prompt contains sensitive data — SSNs, API keys, salary information, medical records — that data leaves your device. Even with provider privacy policies, the data is transmitted.
I wanted a privacy-first architecture: every prompt scanned locally on-device before it reaches any cloud API. If PII is detected, the prompt is blocked with a user-friendly error. Zero data leaves the device without consent.
System Architecture
The system uses a two-stage hybrid approach:
Stage 1: Regex pre-filter (under 50ms) catches obvious patterns — SSNs, credit cards, emails, API keys, passwords. Fast, reliable, deterministic.
Stage 2: On-device LLM (~500ms) catches contextual PII — names tied to salaries, medical diagnoses, physical addresses, employee data. Only runs on prompts >200 characters to avoid hallucination on short text.
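A minimal sketch of the kind of Stage 1 pre-filter described above. The pattern names and exact regexes here are illustrative assumptions, not the project's actual rule set, which is larger and tuned:

```python
import re

# Illustrative Stage 1 patterns -- the real rule set is larger and tuned.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_(?:live|test)_[A-Za-z0-9]{16,}\b"),
}

def regex_prefilter(text: str) -> list[str]:
    """Return the names of all pattern categories found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

Because this stage is pure pattern matching, it is deterministic and runs in well under the 50ms budget on typical prompt sizes.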
Key Engineering Decisions
1. Two-Stage Scanning Over Single-Pass
A single regex pass is fast but misses 60% of PII. A single LLM pass is accurate but adds 500ms to every prompt. The hybrid approach gives fast feedback for clean prompts (regex only) and deep scanning for potentially risky ones.
2. 200-Character Threshold for LLM
The ~3B on-device model hallucinates PII on short text. Given the input "Hello world", it might report "Name: John Doe, Salary: $200000". Requiring >200 characters gives the model enough context for reliable classification.
3. Session Refresh Every 5 Chunks
Large prompts are split into 500-character chunks. Processing many chunks in one session causes context window overflow. Refreshing the session every 5 chunks clears the context while maintaining quality:
```python
async def scan_file(text: str, chunk_size: int = 500):
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    session = None
    chunks_since_refresh = 0
    results = []
    for chunk in chunks:
        # Refresh the session every 5 chunks to clear the context window
        if chunks_since_refresh >= 5 or session is None:
            session = LanguageModelSession(
                instructions=SCAN_INSTRUCTIONS,
                model=model,
            )
            chunks_since_refresh = 0
        result = await session.respond(chunk, generating=PIIClassification)
        results.append(result)
        chunks_since_refresh += 1
    return results
```
4. Exit Code Protocol
| Code | Meaning | Action |
|---|---|---|
| 0 | No PII found | Prompt proceeds to cloud API |
| 2 | PII detected | Prompt blocked, error shown |
| timeout | Scan took >30s | Prompt blocked for safety |
The hook script uses jq to extract the prompt from the event JSON, writes to a temp file (cleaned up on exit), and calls the scanner.
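The actual hook is a shell script built on jq, but the exit-code protocol and timeout behavior can be sketched equivalently in Python. The scanner command, event field name, and helper name here are assumptions for illustration:

```python
import json
import os
import subprocess
import sys
import tempfile

SCAN_TIMEOUT_S = 30  # scans longer than this block the prompt for safety

def run_hook(event_json: str, scanner_cmd=None) -> int:
    """Extract the prompt from the hook event, scan it, and return the exit code.

    0 -> no PII, prompt proceeds; 2 -> PII detected (or timeout), prompt blocked.
    scanner_cmd is a placeholder for the real scanner invocation.
    """
    scanner_cmd = scanner_cmd or ["pii-scan"]
    prompt = json.loads(event_json).get("prompt", "")
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(prompt)
        result = subprocess.run(scanner_cmd + [path], timeout=SCAN_TIMEOUT_S)
        return result.returncode
    except subprocess.TimeoutExpired:
        print("PII scan timed out; blocking prompt", file=sys.stderr)
        return 2  # fail closed: a hung scan must not let the prompt through
    finally:
        os.unlink(path)  # temp file cleaned up on every exit path
```

The important design choice is failing closed: both an explicit PII hit and a timeout map to a blocking exit code, so the prompt only proceeds on a clean, completed scan.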
Performance Characteristics
| Prompt Type | Regex Time | LLM Time | Total |
|---|---|---|---|
| Short (under 200 chars) | under 10ms | skipped | under 50ms |
| Medium (200-1KB) | under 15ms | ~300ms | ~350ms |
| Large (1-10KB) | under 20ms | ~500ms | ~550ms |
| Very large (over 10KB) | under 50ms | ~800ms | ~850ms |
The tradeoff: 50-850ms delay per prompt to guarantee no PII leaves the device.
Technical Tradeoffs
| Decision | Benefit | Cost |
|---|---|---|
| Hybrid regex + LLM | Best of both: speed + accuracy | Two code paths to maintain |
| Fresh session per classification | No context bleed between scans | 555ms vs ~30ms with reuse |
| 200-char threshold | Prevents hallucination | Short prompts with PII slip through to regex-only |
| 30-second timeout | Prevents hangs | Long documents may timeout |
Impact
- 100% PII detection accuracy on 25 diverse test cases (15 PII, 10 clean)
- Zero false positives — no clean prompts incorrectly blocked
- Zero data transmission before local scanning completes
- Graceful degradation — works as regex-only on machines without the on-device model
Lessons Learned
- Privacy-first architecture means scanning before transmission, not after. If data leaves the device, it's already too late for privacy guarantees.
- Binary classification beats extraction for small models. Asking "does this contain PII?" (yes/no) is dramatically more reliable than asking "extract all PII from this text."
- Explicit negative examples in prompts eliminate false positives. Telling the model what NOT to flag ("toll-free numbers, error codes, code variables") is as important as what to flag.
- Session management is critical for batch processing. Without periodic session refresh, the context window fills up and scanning stops silently. The refresh pattern (every N items) keeps quality high while managing memory.
- Threshold tuning prevents hallucination. Small models on insufficient context invent data. The 200-character minimum was found empirically — below it, false positive rate climbs to ~15%.
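The binary-classification lesson can be made concrete as a structured-output schema. The `PIIClassification` name matches the type used in the scanning loop above, but the fields shown here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PIIClassification:
    """Structured output the on-device model fills in per chunk.

    A yes/no verdict plus a coarse category is far more reliable from a
    ~3B model than asking it to extract the PII values themselves.
    """
    contains_pii: bool
    category: str = "none"  # e.g. "financial", "medical", "identity"

def is_blocked(result: PIIClassification) -> bool:
    """Map the model's verdict to the block/proceed decision."""
    return result.contains_pii
```

Keeping the schema this small constrains the model to a decision it can make reliably, instead of a generation task that invites hallucinated values.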