Blog · Tag
llm-infra.
2 posts in this archive.
Engineering
Draft autocomplete latency, end to end
Typing lag, inference queue, streaming output. The three budgets that add up to the 240ms P95 we hold ourselves to, and what happens when any one of them slips.
The PursuitAgent engineering team
Engineering
Caching the draft step
How we cache partial drafts across proposals without introducing stale-answer risk. The cache key design, invalidation rules, and the directional cost impact we measured internally.
The PursuitAgent engineering team