#llm
11 pieces of content
I Evaluated Fine-Tuning Across 3 Projects — None of Them Needed It
Three projects, three evaluations, zero cases where fine-tuning was justified. Here's the decision framework, the cost math, and why simpler approaches won every time.
How ReAct Agents Recover from Their Own Mistakes
ReAct agents recover from their own mistakes — not because the model is clever, but because of how tools return errors and how the loop is structured. Here's what that looks like in practice.
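The loop structure this piece describes can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the article's actual code; the tool, the stand-in model, and all names are made up. The key move is that tool errors are caught and returned as observations instead of crashing the loop, so the model sees its mistake and can retry.

```python
# Minimal sketch of a ReAct-style loop where tool errors become
# observations instead of exceptions, so the model can self-correct.
# All names here are illustrative, not from the article.

def search_tool(query: str) -> str:
    """A toy tool that rejects empty queries."""
    if not query:
        raise ValueError("empty query")
    return f"results for {query!r}"

def run_tool(tool, arg: str) -> str:
    # Design choice: never let a tool crash the loop. The error text
    # is fed back to the model as the next observation.
    try:
        return tool(arg)
    except Exception as e:
        return f"ERROR: {type(e).__name__}: {e}"

def fake_model(history: list[str]) -> str:
    # Stand-in for an LLM: after seeing an error observation,
    # it "recovers" by issuing a corrected query.
    if history and history[-1].startswith("ERROR"):
        return "retry query"
    return ""  # first attempt: a bad (empty) query

def react_loop(max_steps: int = 3) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        action = fake_model(history)
        observation = run_tool(search_tool, action)
        history.append(observation)
        if not observation.startswith("ERROR"):
            return observation  # success: stop looping
    return history[-1]
```

With a real model in place of `fake_model`, the same structure holds: recovery comes from the error-as-observation contract and the loop, not from the model being clever.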
Letting the Model Pick Its Own Tools: How Tool Use Inverts Control Flow
The model autonomously combined keyword search and vector search in the optimal sequence — without being told to. Then I ran experiments to measure what vague descriptions, over-calling, and temperature actually do to tool selection.
My LLM Pipeline Passed Every Manual Check — Then 36 Tests Proved Otherwise
Five manual runs looked fine. Then 36 automated tests exposed non-deterministic sourcing, biased scoring, and a confidence threshold that fired randomly.
Building a Local PII Privacy Gate for AI Coding Assistants
How I built a hybrid regex + on-device LLM scanner that blocks prompts containing PII before they reach cloud APIs — zero data leaves the device.
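The hybrid shape described here can be sketched as a two-stage gate: a cheap regex pass first, then an on-device model as a second opinion on anything the patterns miss. The patterns and the stub classifier below are my assumptions for illustration, not the article's actual implementation.

```python
import re

# Two-stage PII gate sketch. Patterns and the stub classifier are
# illustrative assumptions, not the article's real implementation.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def llm_says_pii(prompt: str) -> bool:
    # Placeholder for the on-device model's binary yes/no judgment
    # on fuzzier PII that regex cannot catch.
    return "my home address" in prompt.lower()

def gate(prompt: str) -> bool:
    """Return True only if the prompt is safe to send to a cloud API."""
    if any(p.search(prompt) for p in PII_PATTERNS):
        return False  # regex hit: block before any network call
    if llm_says_pii(prompt):
        return False  # second stage: on-device model flags it
    return True
```

Because both stages run locally, a blocked prompt never leaves the device; only prompts that pass both checks reach the cloud.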
Regex vs On-Device LLM for PII Detection: A 25-Case Benchmark
A 25-case benchmark comparing regex pattern matching against Apple's on-device Foundation Models for PII detection — 52% vs 100% F1, and why binary classification beats extraction.
On-Device vs Cloud LLM: A Practical Benchmark
Benchmarking Apple's on-device Foundation Models against cloud LLMs across commit message generation, code review, and text classification — latency, quality, cost, and privacy tradeoffs.
Privacy-First Git Commit Message Generator
An on-device tool that analyzes git diffs and generates structured conventional commit messages — zero data leaves your machine.
Session Reuse Eliminates Hangs in Batch LLM Processing
How to Build an LLM-Powered UI Generator
A technical deep dive into building a system where users type plain English and get live HTML mockups that match a specific design system — grounding, token budgets, streaming, and security.
Reducing LLM Token Usage by 44% with Selective Context Injection