Blog · Tag
latency.
4 posts in this archive.
Draft latency, a year on: 45s P95 to 28s
A year of draft-latency work. What moved P95 from 45 seconds to 28, which changes cost quality and which cost money, and the three tradeoffs we chose not to take.
The SLA on draft generation: 45 seconds, 95th percentile
The operational target we hold draft generation to, why it's 45 seconds and not 30 or 90, and the specific things we do to hold the number under peak federal-FY-Q2 load.
Draft autocomplete latency, end to end
Typing lag, inference queue, streaming output. The three budgets that add up to the 240ms P95 we hold ourselves to, and what happens when any one of them slips.
Our retrieval latency budget, explained
Where the milliseconds go in a single retrieval call: embedding lookup, vector search, reranker, hybrid merge, payload hydration. P50 120ms, P95 400ms, and what we cut to get there.