Tim Costagliola — Applied AI Engineer

The Profile

The engineer who won’t ship
what he can’t defend.

Six AI products in production, built alone — then a year spent in public closing the gap that separates a demo from a system: evals, retrieval, observability, safety.

Written & built by the subject · Six products shipped · Every claim measured

I taught myself to build software the way most people learn a language: by moving somewhere it’s spoken and refusing to leave. The result is six shipped AI products — a marketing platform, a local-first assistant, tools that quote jobs and answer reviews — each one built end-to-end, alone, and run by real people.

Somewhere along the way I noticed the difference between the AI products people demo and the ones businesses actually trust. It isn’t the model. It’s everything wrapped around the model: the eval suite that catches a regression before a customer does, the retrieval layer that keeps answers grounded in fact, the traces that explain where the money went, the red-team pass that finds the jailbreak first. So that became the work — and I do it in public, with the numbers attached.

I direct AI to write a great deal of my code. I also read every line it writes, and I keep the receipts. If that sounds like the kind of engineer you need, the letters section is at the bottom of the page.

Fig. 01 — The author. Portrait forthcoming; the work sat for it first.

The Work

Three features · value first, engineering close behind

No. 1

Twexly

An AI marketing platform that gives a one-person business the output of a small agency. The part that matters: an eval and observability layer that keeps every generation on-brand and on-budget as it scales. The AI stays under control — not merely live.

production Next.js · Postgres RLS · metered AI funnel
headline figure — measuring — cost per generation, before and after the eval gate

Read the case study

Fig. 02 — The cost line the eval gate is built to bend. The real chart lands with the data.

No. 2

ARIA

A local-first AI assistant that keeps your data yours. A hand-written agent loop and a symlink-safe permission engine let it do real work on your machine — without handing your files to anyone’s cloud, including mine.

from-scratch agent runtime · Claude & Ollama backends
headline figure — measuring — the test suite that gates every release

Read the case study

Fig. 03 — The agent loop, drawn from memory. It was written the same way.

No. 3

ReplyPilot

Turns hours of review-response drudgery into minutes — and never auto-posts anything that could embarrass the business. On-brand replies drafted automatically; anything risky is held for a human. Time returned, reputation intact.

voice-learning few-shot · suggest-before-send guardrails
headline figure — measuring — the share held back for human review

Read the case study

Fig. 04 — Draft and reply. The held ones are the whole point.

Also in the catalogue: Handoffio, quoting & scheduling for solo contractors · Project Nightlight, a passive RF threat-detection field kit, built to a documented ethics standard · the roadmap artifacts (an eval harness, a measured RAG pipeline, a red-team report), shipping through 2026–27.

In Brief — the facts, small print, no varnish

Trade. Applied AI engineering: evals, retrieval, observability, safety — wrapped around full-stack product work (Next.js, Postgres, Python).

Record. Six AI products designed, built, and shipped solo; real users, real revenue constraints, no team to hide behind.

Method. Direct AI agents to build fast; read and defend every line; measure before claiming. The eval suite is not optional and never was.

Currently. A year-long public curriculum — evals in Q1, RAG in Q2, observability and safety after — every phase ending in a measured, published artifact.

Seeking. Applied AI / AI-engineering roles; also automation work for small teams who want their hours back.

Elsewhere. The build-log below, GitHub for the code, and a résumé in the letters section — one page, as it should be.

The Column

Build-log · published as it happens

Aug 2026 · I added evals to my AI writer. Here’s what was actually broken. · 9 min
Oct 2026 · Context-stuffing vs. real RAG: I measured all four approaches on my own data. · 12 min
Dec 2026 · I cut my AI cost with zero quality loss — the trace that showed me how. · 8 min
Feb 2027 · I red-teamed my own AI products. Here’s how far attack-success fell. · 14 min

Dates ahead of today are the publishing schedule, kept honestly. The essays land when the numbers do.

Letters

Write to the author.

Hiring for Applied AI? Drowning a small team in busywork an agent should be doing? Either way, the reply comes from a person, and quickly.

Email · GitHub
