Tag: #performance
6 pieces of content
Session Reuse Eliminates Hangs in Batch LLM Processing
Reducing LLM Token Usage by 44% with Selective Context Injection
Always Profile Before You Optimize
Optimizing a High-Throughput FastAPI Service
How I achieved 50% lower latency and 40% higher throughput on a critical FastAPI service through async pipelines, SQL optimization, and strategic caching.
Async Python in Production: What They Don't Tell You
Async improves throughput but introduces debugging complexity, connection-pool pitfalls, and error-handling surprises. Lessons from running async APIs at scale.
Why CPU-Based Autoscaling Fails for API Services
CPU utilization is the wrong signal for scaling API services. Here's why request-latency-based HPA produces better scaling behavior and how to implement it.