eval.

3 posts in this archive.

Retrieval evaluation, part 2: dealing with numeric claims

Why numeric facts break vanilla retrieval, and the two tactics that fix it: hybrid search and numeric-claim isolation. A continuation of the eval series.

The PursuitAgent engineering team Aug 25, 2025

Engineering

Our eval harness, on the command line

A walkthrough of the dev loop for retrieval changes: one command to baseline, one to re-run, one to diff. These are the CLI ergonomics that keep us from tuning by feel.

The PursuitAgent engineering team Jul 7, 2025

Engineering Long read

How we evaluate retrieval quality on our own corpus

Our gold set, the metrics we track, the eval harness on a laptop, the regression-guard CI job, and the directional numbers we'll publicly stand behind. Long read.

The PursuitAgent engineering team Jul 2, 2025
