How I Cut Our Anthropic Bill by 84%: A Prompt Caching Playbook for 2026
Most teams treat Claude's prompt caching like a checkbox. Here's the production tuning playbook from three companies that dropped their bills 70-85% in a month.
Exploring tech, coding, and startups. A contributor to The Stack Stories.
Same workload, same embeddings, 90 days each. pgvector cost 30% of Pinecone and matched it on end-to-end latency. Where each system genuinely wins.
The European Commission issued its first AI Act infringement notices in May 2026. A practical compliance plan for small teams: what you owe, what you can ignore, and what costs €38K versus €4K to get right.
We migrated a production publication from Next.js 15.3 to 16's cache components over six weeks. Here are the numbers, the incidents, and what to do differently.
Six months and 17 production deployments later, here's when a Claude Skill wins, when an MCP server wins, and the hybrid pattern most teams miss.
Scaling laws stopped buying us reasoning. The next phase of AI is neuro-symbolic, world-model-driven, and considerably stranger than another transformer.
Autonomous coding agents stopped being a demo in 2026. Here is what actually shipped, what broke first, and how senior engineering changed shape.
After 12 months of running React 19 Server Actions on Next.js 15.4 at scale, here is the candid debrief: forms, validation, error handling, and the patterns that survived contact with real users.
After deploying 14 production agents on Claude Opus 4.7 and GPT-5.1, here is the unfiltered playbook on tool design, evals, cost control, and the failure modes nobody warned us about.
Chrome 134 made Gemini Nano generally available through the Prompt API. After porting three features to on-device inference, our edge inference bill fell 71%. Here is the engineering reality.
We run 50 microservices with six engineers. AI-powered DevOps tooling is the only reason that math works. Here is the actual stack, the cost, and the trade-offs nobody talks about.
By May 2026 most of the JavaScript build chain has been rewritten in Rust or Go. Here is the practical guide to what to use today, what to skip, and which tools genuinely make a difference.