Blog · Tag
embeddings.
6 posts in this archive.
Embedding evaluation, revisited
What we measure differently from 12 months ago. How the gold set grew, which metrics earned their spot in CI, and which ones we quietly retired.
Clustering win themes across 200 past bids
How we cluster win-theme assertions across a corpus of past proposals to surface repeat themes, where the signal is real, and where the clustering is just noise dressed as insight.
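The clustering idea behind that post can be sketched as a greedy single-pass grouping over theme embeddings. This is a minimal illustration, not the pipeline from the post: the threshold value and the function names here are assumptions.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_themes(embeddings, threshold=0.8):
    """Greedy single-pass clustering: each vector joins the most similar
    existing cluster centroid above `threshold`, otherwise it starts a
    new cluster. Returns one cluster id per input vector."""
    centroids, labels = [], []
    for vec in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(vec, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(vec))
            labels.append(len(centroids) - 1)
        else:
            # Fold the new member into a running centroid average.
            n = labels.count(best)
            centroids[best] = [(c * n + v) / (n + 1)
                               for c, v in zip(centroids[best], vec)]
            labels.append(best)
    return labels
```

A greedy pass like this is order-sensitive, which is one way "noise dressed as insight" creeps in: near-threshold assertions land in different clusters depending on ingest order.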
Migrating to Gemini Embedding v3, the safe way
A dual-index backfill and a staged cutover across two weeks. How we evaluated retrieval deltas before the switch, what we watched for during the cutover, and the one metric that gated the final flip.
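A cutover gate of the kind the post describes can be approximated by comparing top-k results from the old and new indexes for the same queries. The metric below (top-k overlap) and the 0.85 gate value are illustrative assumptions, not the post's actual numbers.

```python
def topk_overlap(old_ids, new_ids, k=10):
    """Fraction of the old index's top-k hits that also appear in the
    new index's top-k for the same query."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    if not old_top:
        return 1.0
    return len(old_top & new_top) / len(old_top)

def cutover_ready(paired_results, k=10, min_mean_overlap=0.85):
    """Gate the final flip on mean overlap across a query sample.
    `paired_results` is a list of (old_topk_ids, new_topk_ids) pairs."""
    scores = [topk_overlap(o, n, k) for o, n in paired_results]
    return sum(scores) / len(scores) >= min_mean_overlap
```

Running both indexes side by side during the backfill is what makes a comparison like this possible before any traffic moves.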
Per-customer embedding tenancy, explained
How tenant isolation works at the vector level in PursuitAgent. Why we use Postgres row-level security on pgvector as the default, where shared embedding spaces would be cheaper, and the trade-offs we are not willing to make.
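At the application layer, a tenant-scoped pgvector search looks roughly like the sketch below. The table and column names are hypothetical; with RLS enabled on the table, the explicit tenant filter is redundant, but keeping it is a common defense-in-depth choice.

```python
def tenant_scoped_query(tenant_id, query_vec, k=5):
    """Builds a parameterized nearest-neighbor search over a pgvector
    column. `<->` is pgvector's L2 distance operator; `kb_blocks` and
    its columns are assumed names, not PursuitAgent's actual schema."""
    sql = (
        "SELECT id, content FROM kb_blocks "
        "WHERE tenant_id = %(tenant_id)s "
        "ORDER BY embedding <-> %(query_vec)s::vector "
        "LIMIT %(k)s"
    )
    params = {"tenant_id": tenant_id, "query_vec": str(query_vec), "k": k}
    return sql, params
```

The row-level security policy itself lives in the database (CREATE POLICY plus ENABLE ROW LEVEL SECURITY on the table), so even a query that forgets the WHERE clause cannot cross tenants.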
Semantic deduplication of KB blocks at ingest
How we merge near-duplicate KB blocks at ingest time using embedding similarity, the threshold we settled on after testing four values, and the trade-off we accept by tuning toward over-merging.
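The ingest-time merge described above can be sketched as: compare the incoming block's embedding against what's already indexed and merge if similarity clears a threshold. The 0.92 default below is a placeholder, not the value the post settled on.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def ingest_block(block_id, embedding, index, threshold=0.92):
    """Merge the incoming block into the most similar existing block if
    cosine similarity clears `threshold`; otherwise keep it as new.
    Lowering `threshold` tunes toward over-merging, the trade-off the
    post accepts. `index` maps block id -> embedding."""
    best_id, best_sim = None, threshold
    for existing_id, existing_vec in index.items():
        sim = cosine(embedding, existing_vec)
        if sim >= best_sim:
            best_id, best_sim = existing_id, sim
    if best_id is not None:
        return ("merged_into", best_id)
    index[block_id] = embedding
    return ("kept", block_id)
```

The linear scan is fine for a sketch; at real corpus sizes the candidate lookup would go through the vector index instead.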
Embedding model selection: why Gemini Embedding 2 for proposals
A teardown of how we evaluated four embedding models — Gemini Embedding 2, OpenAI text-embedding-3-large, Cohere embed-v4, and Voyage — for a proposal corpus, and the methodology that drove the choice.
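A model bake-off like that typically scores each candidate on a shared gold set with a retrieval metric such as recall@k. A minimal version, assuming a gold set of (retrieved ids, relevant ids) pairs:

```python
def recall_at_k(retrieved, relevant, k=10):
    """Share of a query's relevant ids found in the top-k retrieved."""
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average recall@k over a gold set; `runs` is a list of
    (retrieved_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

Scoring all four models on the same gold set with the same k keeps the comparison apples to apples; the specific metrics and k the post used are in the teardown itself.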
See the proposal workflow
Take the 5-minute tour, then start a trial workspace when you're ready to run a real pursuit against your own source material.