Blog · Tag

ingest.

6 posts in this archive.

A year of ingest pipeline, condensed

Forty changes to the ingest pipeline across a year of shipping. The five that actually mattered, the ones that didn't, and what the pattern says about where to spend the next year's ingest budget.

Bo Bergstrom
Engineering

Shipped: bulk RFP ingest with duplicate detection

A short changelog entry. Bulk ingest of 10 RFPs in a minute, with block-level duplicate detection so the same clauses across multiple RFPs don't double-count in your KB.

PursuitAgent
Engineering

Turning a SOC 2 PDF into 140 KB blocks

The ingest, the extraction, the linking. A worked trace of how a SOC 2 Type II report becomes the set of KB blocks that DDQ answers cite, with the real pgvector row shape at the end.

The PursuitAgent engineering team
Engineering

Semantic deduplication of KB blocks at ingest

How we merge near-duplicate KB blocks at ingest time using embedding similarity, the threshold we settled on after testing four values, and the trade-off we accept by tuning toward over-merging.

The PursuitAgent engineering team
Engineering

Inside the ingest pipeline: parse, extract, index

How a PDF becomes searchable KB blocks. LlamaParse for parsing, structural-plus-semantic extraction, pgvector indexing with HNSW. Where each stage wins and where it falls over.

The PursuitAgent engineering team
Engineering

Shipped: multi-doc RFP ingest with attachment dependencies

RFPs ship as bundles. The scoring rubric, the technical appendix, the pricing workbook. The Analyzer now ingests all of them as one pursuit, with dependencies tracked between them.

PursuitAgent
