Field notes

What 120 public debrief transcripts tell us about why bids lose

We read 120 public debrief and protest decisions — federal GAO and state equivalents — and coded the stated reasons for losses. Four patterns repeat. Three things surprised us.

The PursuitAgent research team · Research · 6 min read

This is a research note from the PursuitAgent research team. Most win-loss data is private. Public sector procurement is the exception — federal protest decisions, state procurement appeals, and a smaller universe of formal debriefs surface enough text to read at scale. We read 120 of them.

The full methodology and caveats are at the bottom of this post. A quick sketch of the corpus comes first, then the findings, because the findings are why you’re here.

The corpus

  • 80 GAO bid protest decisions issued between January 2024 and December 2025, drawn from the GAO public database. We selected protests that included substantive evaluation discussion — not procedural-only filings.
  • 30 state procurement appeals, drawn from California, Texas, New York, and Massachusetts state procurement portals. Public-records sourcing varied by state; we kept only appeals with published evaluation rationale.
  • 10 published debrief summaries from federal agencies that proactively publish post-award narratives (a small number do; most don’t).

Total: 120 documents, ranging from ~3 pages to ~80 pages. We coded each for the stated reason the bid was unsuccessful — a different question from “the protest’s argument,” which is what the losing vendor claimed. Stated reason is the agency’s own evaluation language.

The codes were applied by two readers independently, with a third reader resolving disagreements. The full coded dataset is available on request to PursuitAgent customers under research-use terms.

Pattern 1 — Compliance failures, hidden behind technical-merit language

In 41 of 120 (34%), the agency’s stated reason for non-selection was a compliance failure that the agency described in technical-merit terms.

A typical example, reworded for anonymity: “The proposal did not adequately address the cybersecurity continuous monitoring requirement.” Read as a technical-merit comment, that sounds like a quality issue. Read it against the underlying compliance matrix, though, and the agency is telling you that the proposal failed to map a required compliance row to a discrete piece of evidence. The “adequately” is doing all the work, and what it means is “you mentioned it but you didn’t show it.”

This pattern matters because losing vendors typically hear this language and infer that they need to write better about cybersecurity. The actual failure was structural: the response didn’t show the auditable evidence the compliance matrix demanded. The fix is in the compliance matrix piece: it’s a mapping problem, not a writing problem.
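
To make the structural point concrete, here is a minimal sketch of the check this implies, assuming a compliance matrix represented as requirement rows with mapped evidence references. The data shape and field names are ours, for illustration; this is not PursuitAgent’s internal tooling.

```python
from dataclasses import dataclass, field

@dataclass
class ComplianceRow:
    """One compliance-matrix row: a requirement plus the evidence mapped to it."""
    requirement_id: str
    requirement_text: str
    evidence_refs: list[str] = field(default_factory=list)  # e.g. volume/section cites
    mentioned_in_narrative: bool = False

def rows_at_risk(matrix: list[ComplianceRow]) -> list[ComplianceRow]:
    """Pattern 1's failure mode: the requirement is discussed in prose,
    but no discrete piece of evidence is mapped to the row."""
    return [r for r in matrix if r.mentioned_in_narrative and not r.evidence_refs]

matrix = [
    ComplianceRow("SEC-04", "Cybersecurity continuous monitoring",
                  mentioned_in_narrative=True),                   # mentioned, not shown
    ComplianceRow("SEC-05", "Incident response plan",
                  evidence_refs=["Vol II, Section 3.2; Appendix C"],
                  mentioned_in_narrative=True),                   # mentioned and shown
]
for row in rows_at_risk(matrix):
    print(f"{row.requirement_id}: mentioned but not evidenced")  # SEC-04
```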

Pattern 2 — Past performance gaps, named directly

In 28 of 120 (23%), the agency directly cited the past-performance section as the differentiator, in phrasing like “the awardee demonstrated three relevant prior contracts at scale; the protester’s references were either not directly relevant or did not approach the scale of this requirement.”

Two things to notice. First, agencies are more direct about past performance than about most other dimensions of evaluation. They will name it. Second, the failure mode is consistent: the losing vendor cited references that they considered relevant but that the evaluator considered insufficient. The disconnect is almost always about scale and recency: references that are too small or too old, no matter how well they are written up.

Pattern 3 — Pricing dressed as value, value dressed as price

In 22 of 120 (18%), price was the stated reason, but on a close reading the agency was almost always making a value judgment. “The protester’s price was higher and the technical advantages did not justify the difference” is a value statement disguised as a price statement.

This is the data behind Bo’s piece on the “we lost on price” excuse. In the protest record, vendors almost never lose on raw price alone for non-commodity bids; they lose when the technical case wasn’t strong enough to justify the premium. The corollary is that buyers who tell you “you lost on price” often mean “your value case wasn’t tight enough to make me defend the price internally.”

Pattern 4 — Win-theme mismatch with evaluation criteria

In 17 of 120 (14%), the agency cited a misalignment between what the proposal emphasized and what the solicitation prioritized. The classic version: a proposal foregrounded technical innovation when the solicitation’s weighted criteria emphasized risk reduction and on-time delivery.

This pattern is the most preventable of the four. It’s a capture failure first: the team didn’t read the evaluation weights, so the win themes were aligned to the wrong axis. VisibleThread’s recurring point about reading the requirements before writing applies here in its strongest form.

The remaining 12

The other 12 of 120 spread across small categories: page-count violations, late submission, format-of-deliverable failures, security clearance gaps, conflict-of-interest determinations. Each is its own kind of avoidable mistake; none has the volume to support a pattern claim.

Three things that surprised us

Surprise one — agencies are more transparent than expected. Federal protest decisions name reasons specifically. The “vague language” complaint we hear from losing vendors is partly a function of not reading the full decision. The decision is usually 20+ pages; the summary is a paragraph. Vendors read the paragraph.

Surprise two — past performance dominates more than capture-side wisdom suggests. Capture-side training emphasizes win themes and discriminators. The protest record suggests past performance citation is a stronger driver than either, especially for federal work over $10M ACV. If your past-performance section is weak, no win theme rescues it.

Surprise three — “innovation” arguments lose more than they win. When a proposal made a strong “innovation” case, it correlated with a lower win rate in our corpus (33% vs. 47% for proposals that didn’t lead with innovation). Sample sizes are not large enough for a confident claim, but the direction is consistent across the federal and state subcorpora. The hypothesis: agencies that buy through formal procurement are not actually buying innovation; they’re buying low-risk delivery, and innovation language reads as risk.
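
A quick way to see why we hedge on the 33% vs. 47% gap is a post-hoc significance check. The subgroup counts below are hypothetical (we’re not publishing per-subgroup sizes here), chosen only to reproduce those rates at protest-corpus scale:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical subgroup counts chosen to match the reported rates (~33% vs. ~47%).
x1, n1 = 10, 30   # wins among innovation-led proposals
x2, n2 = 42, 90   # wins among everything else
p1, p2 = x1 / n1, x2 / n2

# Standard two-proportion z-test under a pooled null.
p_pool = (x1 + x2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")  # z = -1.28, p = 0.20
```

At sizes like these, the direction is visible but the gap is not statistically distinguishable from noise, which is exactly the hedge in the paragraph above.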

Methodology and caveats

The corpus is biased toward bids that went to protest. Protested bids are a small fraction of all bids and they skew larger and more contested. This is not a sample of all losing bids; it’s a sample of the losing bids whose losers cared enough to challenge the decision.

Coding inter-rater reliability was 0.78 (Cohen’s kappa), acceptable but not strong. Disagreement clustered at the boundary between Pattern 1 (compliance dressed as technical merit) and Pattern 4 (win-theme mismatch) — those are genuinely adjacent in some decisions.
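
For readers who want the arithmetic behind that number: Cohen’s kappa is observed agreement corrected for the agreement two readers would reach by chance, given how often each uses each code. A toy fragment (the labels are illustrative, not our coding sheet), using scikit-learn’s implementation:

```python
from sklearn.metrics import cohen_kappa_score

# Two readers' pattern codes for the same eight documents (toy data).
reader_a = ["P1", "P2", "P1", "P4", "P3", "P1", "P2", "P4"]
reader_b = ["P1", "P2", "P4", "P4", "P3", "P1", "P2", "P1"]

# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is the chance agreement implied by each reader's label frequencies.
print(round(cohen_kappa_score(reader_a, reader_b), 2))  # 0.65 for this fragment
```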

Stated reasons are not causal explanations. The agency’s stated reason for selection or non-selection is filtered through the contracting officer’s preference for defensible language. We treat the corpus as evidence about what agencies say, which is the input losing vendors actually receive.

What this means for the win-loss dashboard

Two implications for the dashboard we shipped.

First, when a buyer says “you lost on price,” the dashboard surfaces how often that stated reason has appeared in the team’s own corpus and overlays the open question of whether technical-case strength correlates with debrief comment volume. We’re not claiming this answers the question; we’re claiming the question is worth asking with structure.
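
A sketch of the aggregation behind that overlay, assuming a flat record per coded debrief (field names are illustrative, not the dashboard’s schema):

```python
from collections import Counter

# Hypothetical coded records; one per debrief in the team's own corpus.
debriefs = [
    {"bid": "bid-001", "stated_reason": "price"},
    {"bid": "bid-002", "stated_reason": "past_performance"},
    {"bid": "bid-003", "stated_reason": "price"},
    {"bid": "bid-004", "stated_reason": "compliance"},
]
freq = Counter(d["stated_reason"] for d in debriefs)
total = sum(freq.values())
for reason, n in freq.most_common():
    print(f"{reason}: {n}/{total} ({n / total:.0%})")  # price: 2/4 (50%) ...
```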

Second, when past performance is cited as a stated reason, the dashboard links to the proposal’s past-performance section and the KB blocks that sourced it, with their last-verified dates. An 18-month-old past-performance block cited in a federal bid is a leading indicator we’d flag.
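
And a sketch of the staleness check itself, assuming each knowledge-base block carries a last-verified date (the 18-month threshold is the one named above; the block shape is ours for illustration):

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=548)  # ~18 months

def stale_blocks(blocks, today):
    """Return KB blocks whose last-verified date exceeds the staleness threshold."""
    return [b for b in blocks if today - b["last_verified"] > STALE_AFTER]

blocks = [
    {"id": "pp-041", "last_verified": date(2024, 2, 1)},   # stale at the date below
    {"id": "pp-107", "last_verified": date(2025, 9, 15)},  # fresh
]
for b in stale_blocks(blocks, today=date(2025, 12, 1)):
    print(f"flag {b['id']}: last verified {b['last_verified']}")  # pp-041
```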

The next research note in this series lands in March: a state-by-state cross-cut of public win/loss disclosures and how state-level patterns compare with the federal data here.

Sources

  1. GAO bid protest decisions database
  2. Federal Acquisition Regulation 15.506 — Postaward debriefing
  3. Leulu & Co — The proposal post-mortem
  4. VisibleThread — Government proposal writing
  5. Bo Bergstrom — The 'we lost on price' excuse