Scheduled
titles will change · dates won't slip quietlyI added evals to my AI writer. Here's what was actually broken.
A production writer, scored for the first time. What the baseline says, and what vibes never caught.
Q3 2026Context-stuffing vs. real RAG: four approaches, measured on my own data.
Stuffing, keyword, semantic, hybrid+rerank. Same queries, same content, real numbers. One winner ships.
Q4 2026I cut my AI cost with zero quality loss. Here's the trace.
Where waste hides in a production LLM funnel, and the proof the savings cost nothing.
Q1 2027I red-teamed my own AI products.
Attack success before and after guardrails, benign-refusal rate alongside. Refusing everything isn't safety.
Q2 2027