Tag: eval
3 posts in this archive.
Engineering
Retrieval evaluation, part 2: dealing with numeric claims
Why numeric facts break vanilla retrieval, and the two tactics that fix it: hybrid search and numeric-claim isolation. Continuation of the eval series.
The PursuitAgent engineering team
Engineering
Our eval harness, on the command line
A walkthrough of the dev loop for retrieval changes: one command to baseline, one to re-run, one to diff. The CLI ergonomics that keep us from tuning by feel.
The PursuitAgent engineering team
Engineering · Long read
How we evaluate retrieval quality on our own corpus
Our gold set, the metrics we track, the eval harness on a laptop, the regression-guard CI job, and the directional numbers we'll publicly stand behind. Long read.
The PursuitAgent engineering team