#python
23 pieces of content
I Evaluated Fine-Tuning Across 3 Projects — None of Them Needed It
Three projects, three evaluations, zero cases where fine-tuning was justified. Here's the decision framework, the cost math, and why simpler approaches won every time.
How ReAct Agents Recover from Their Own Mistakes
ReAct agents recover from their own mistakes — not because the model is clever, but because of how tools return errors and how the loop is structured. Here's what that looks like in practice.
Letting the Model Pick Its Own Tools: How Tool Use Inverts Control Flow
The model autonomously combined keyword search and vector search in the optimal sequence — without being told to. Then I ran experiments to measure what vague descriptions, over-calling, and temperature actually do to tool selection.
When Embeddings Fail: Why Vector Search Can't Judge Capability
I added vector search to the screening pipeline and watched it rank a junior frontend developer above a Principal Engineer whose systems processed 1B+ events/day. The embedding model matched vocabulary, not capability.
My LLM Pipeline Passed Every Manual Check — Then 36 Tests Proved Otherwise
Five manual runs looked fine. Then 36 automated tests exposed non-deterministic sourcing, biased scoring, and a confidence threshold that fired randomly.
Auditing My AI Systems: Patterns, Tradeoffs, and Gaps I Was Working Around
I catalogued every AI decision across three production systems and found a consistent pattern — along with five gaps I'd been working around instead of solving.
Building a Local PII Privacy Gate for AI Coding Assistants
How I built a hybrid regex + on-device LLM scanner that blocks prompts containing PII before they reach cloud APIs — zero data leaves the device.
Regex vs On-Device LLM for PII Detection: A 25-Case Benchmark
A comprehensive benchmark comparing regex pattern matching against Apple's on-device Foundation Models for PII detection — 52% F1 vs 100% F1, and why binary classification beats extraction.
Privacy-First Git Commit Message Generator
An on-device tool that analyzes git diffs and generates structured conventional commit messages — zero data leaves your machine.
Session Reuse Eliminates Hangs in Batch LLM Processing
Building a Live Terminal Dashboard for AI Coding Sessions
How to Build an LLM-Powered UI Generator
A technical deep dive into building a system where users type plain English and get live HTML mockups that match a specific design system — grounding, token budgets, streaming, and security.
Reducing LLM Token Usage by 44% with Selective Context Injection
Building FastAPI Services on Kubernetes
How I structure FastAPI applications for Kubernetes deployment — from project layout to health checks, pod templates, and CI/CD.
Always Profile Before You Optimize
Optimizing a High-Throughput FastAPI Service
How I achieved 50% lower latency and 40% higher throughput on a critical FastAPI service through async pipelines, SQL optimization, and strategic caching.
Async Python in Production: What They Don't Tell You
Async improves throughput but introduces debugging complexity, connection pool pitfalls, and error handling surprises. Lessons from running async APIs at scale.
Building a Modular Rule Engine for Credit Decisioning
How I designed a plugin-based rule engine that automated credit decisions, reduced manual review by 30%, and made rule changes deployable in hours instead of sprints.
Correlation IDs Go in Middleware, Not App Code
Kubernetes Pod Template Generator
A tool for generating production-ready Kubernetes pod templates with best practices baked in.
Structured Logging Is a Library, Not a Guideline
Restaurant Insights — NLP-Powered Dining Recommendations
A tool that finds nearby restaurants, analyzes reviews with NLP, and checks for parking — built in 3 days with APIs, then rebuilt in 3 minutes with AI prompts.
MindInChess — Chess Analysis App
A chess analysis app using Stockfish for blunder detection, accuracy scoring, and color-coded move insights with PGN upload and Chess.com/Lichess import support.