AI automations
#automations
Workflows that run unattended inside real businesses, with the error handling, retries, and human checkpoints that let the owner stop babysitting.
ReplyPilot
Review responses drafted in the owner's voice from a handful of examples. Suggest-before-send by design; a null-provider fallback runs the whole app offline and labels fallback output instead of faking it.
Measuring
Review responder (n8n)
Classify → draft → human approval → post → log, self-hosted and production-hardened: error workflows, retries, idempotency. Built to run for months, not demos.
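To make the retry and idempotency guarantees concrete, here is a minimal Python sketch. It is not the n8n workflow itself; `ReplyPoster`, `idempotency_key`, and the `send` callable are hypothetical names, and the in-memory set stands in for whatever durable store a real deployment would use.

```python
import hashlib
import time


def idempotency_key(review_id: str, draft_text: str) -> str:
    """Stable key: the same review and draft always hash to the same value."""
    return hashlib.sha256(f"{review_id}\n{draft_text}".encode("utf-8")).hexdigest()


class ReplyPoster:
    """Posts an approved reply at most once, retrying transient failures."""

    def __init__(self, send, max_attempts=3, base_delay=0.0):
        self._send = send              # hypothetical callable that performs the real post
        self._posted = set()           # stand-in for a durable store of posted keys
        self._max_attempts = max_attempts
        self._base_delay = base_delay  # seconds; doubled after each failed attempt

    def post(self, review_id: str, draft_text: str) -> str:
        key = idempotency_key(review_id, draft_text)
        if key in self._posted:
            return "skipped"           # a re-run of the workflow must not double-post
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._send(review_id, draft_text)
                self._posted.add(key)
                return "posted"
            except ConnectionError:
                if attempt == self._max_attempts:
                    raise              # out of retries: surface to the error workflow
                time.sleep(self._base_delay * 2 ** (attempt - 1))
```

The key is recorded only after a successful send, so a crash mid-retry leaves the reply eligible for another attempt instead of silently dropping it.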
In progress
Production systems
#systems
Full products designed, built, and operated end-to-end.
Twexly
Multi-tenant AI marketing platform, ~33k lines. Tenant isolation enforced by Postgres row-level security. Every model call flows through one generate() funnel: routing, retries, structured output, cost metering in micro-dollars. A tree-grep test enforces the invariant that no model call bypasses the funnel.
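A tree-grep test of this kind can be sketched in a few lines. This is an illustration in Python, not Twexly's actual test: the forbidden pattern, the allowed file path, and the `find_violations` name are all assumptions chosen for the example.

```python
import re
from pathlib import Path

# Hypothetical pattern: any direct mention of a provider SDK.
FORBIDDEN = re.compile(r"\b(openai|anthropic)\b")
# Hypothetical location of the single funnel module allowed to use the SDK.
ALLOWED = {"llm/generate.py"}


def find_violations(root: Path) -> list[str]:
    """Walk the source tree and list every line that bypasses the funnel."""
    violations = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel in ALLOWED:
            continue  # the funnel itself may talk to the provider
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if FORBIDDEN.search(line):
                violations.append(f"{rel}:{lineno}")
    return violations
```

A test suite would assert that `find_violations(repo_root)` is empty, so a stray direct SDK call fails the build rather than slipping past routing and cost metering.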
Handoffio
Quoting and scheduling for solo contractors, 22k lines. Treats LLM output as untrusted input: strict JSON contract, hard validation layer. Nothing reaches a client unvalidated.
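Treating model output as untrusted input might look like the following minimal sketch. The field names in `QUOTE_FIELDS`, the `ContractError` type, and `parse_quote` are hypothetical; Handoffio's real contract is not shown in this page.

```python
import json


class ContractError(ValueError):
    """Raised when model output does not match the expected contract."""


# Hypothetical contract: exactly these fields, with exactly these types.
QUOTE_FIELDS = {"customer": str, "line_items": list, "total_cents": int}


def parse_quote(raw: str) -> dict:
    """Parse and validate raw model output; raise instead of guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("top level must be an object")
    if set(data) != set(QUOTE_FIELDS):
        raise ContractError(f"unexpected or missing fields: {sorted(data)}")
    for name, expected in QUOTE_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ContractError(f"{name} must be {expected.__name__}")
    if data["total_cents"] < 0:
        raise ContractError("total_cents must be non-negative")
    return data
```

Rejecting extra fields as well as missing ones keeps the contract strict: anything the model adds beyond the agreed shape is treated as a failure, not passed downstream.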
Production
Agents & tooling
#agents
Agent runtimes and the plumbing around them. Written by hand first, so no framework holds mysteries.
ARIA
Local-first assistant on a from-scratch runtime: a hand-written think→act→observe loop with a loop-guard that catches repeated identical tool calls; one zod-typed tool registry serving both Ollama function-calling and an in-process MCP server; a symlink-safe permission engine. Files never leave the machine. 199 test files.
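A loop-guard for repeated identical tool calls can be sketched as below. ARIA's own runtime uses zod, which suggests TypeScript; this Python version is only an illustration, and the `LoopGuard` class, its `max_repeats` threshold, and the fingerprinting scheme are assumptions.

```python
import json
from collections import deque


class LoopGuard:
    """Flags when the same tool call repeats max_repeats times in a row."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        # Only the most recent max_repeats fingerprints are kept.
        self._recent = deque(maxlen=max_repeats)

    def check(self, tool: str, args: dict) -> bool:
        """Record a call; return True if the agent appears stuck."""
        # sort_keys makes the fingerprint independent of argument order.
        fingerprint = json.dumps([tool, args], sort_keys=True)
        self._recent.append(fingerprint)
        window_full = len(self._recent) == self.max_repeats
        return window_full and len(set(self._recent)) == 1
```

In a think→act→observe loop, the runtime would call `check` before each act step and break out, or ask the model to change approach, when it returns True.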
Production
ARIA's loop, rebuilt in LangGraph
The same control flow as a stateful graph: loop-guard, Postgres checkpointing, a human-approval interrupt with resume-from-kill.
In progress
Public MCP server + Agent Skill
Published to the registry with OAuth, annotations, a threat model, and tests. Then maintained in public: issues answered, releases cut.
2026
Measurement & safety
#measurement
The machinery that makes everything else defensible. Every artifact ships with its numbers.
Eval harness
Golden dataset from real briefs, deterministic checks, a calibrated LLM judge, and a CI gate that fails the build on regression. Pointed first at Twexly's writer, then proven general against ReplyPilot.
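The CI gate reduces to comparing current scores against a stored baseline and returning a non-zero exit code on regression. The sketch below assumes scores are floats keyed by metric name; the metric names, the `tolerance` default, and both function names are hypothetical.

```python
def regressions(baseline: dict, current: dict, tolerance: float = 0.02) -> dict:
    """Return metrics that are missing or dropped more than tolerance below baseline."""
    failed = {}
    for metric, base in baseline.items():
        score = current.get(metric)
        if score is None or score < base - tolerance:
            failed[metric] = (base, score)
    return failed


def gate(baseline: dict, current: dict, tolerance: float = 0.02) -> int:
    """Exit code for CI: 0 lets the build pass, 1 fails it."""
    failed = regressions(baseline, current, tolerance)
    for metric, (base, score) in sorted(failed.items()):
        print(f"REGRESSION {metric}: baseline={base} current={score}")
    return 1 if failed else 0
```

A CI step would end with `sys.exit(gate(baseline, current))`. A missing metric counts as a failure, so a check cannot pass simply by being dropped from the run.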
In progress · Q3 2026
Retrieval bake-off
Context-stuffing vs keyword vs semantic vs hybrid+rerank, scored on labeled queries over real content. The winner ships to production.
Q4 2026
Cost & trace dashboard
Full-funnel tracing, per-feature cost attribution, online scoring of live traffic. Then cost cut with quality held flat, proven by evals.
Q1 2027
Red-team report
A versioned attack battery run against my own products, guardrails built in depth, attack success measured before and after. Responsibly disclosed.
Q2 2027
Hardware & embedded
#hardware
Range past the browser, down to interrupt handlers.
Project Nightlight
Passive Wi-Fi/RF threat-detection field kit. The same detector implemented three times: Python, host-testable C, ISR-safe Arduino. Bespoke C test harness, CI, and a documented ethics standard for defensive use.
Field-tested