Field notes

What a Forrester Wave on proposal tools would need to evaluate

Forrester has not published a Wave specifically for proposal management. A criterion-by-criterion read of what such a Wave would need to measure — where the generic rubric fits real buyer behavior, where it lags.

The PursuitAgent research team · Research · 7 min read

Forrester has not, as of this writing, published a standing Wave specifically for “proposal management.” Forrester’s Wave series covers many adjacent categories (sales enablement, content-experience platforms, CLM), but a dedicated proposal-management Wave is not an announced analyst product. This post is hypothetical: if such a Wave existed, what would it need to evaluate, and where would the generic Wave rubric fit — or fail — the category?

The exercise is useful because enterprise procurement and sales-ops teams routinely ask “where are these vendors on a Forrester Wave?” when they short-list. The honest answer is “they aren’t, on a dedicated one” — and then the interesting conversation is what such a Wave would need to ask. We work from the public Wave methodology and reason forward. This is a critique of the hypothetical evaluation framework, not a summary of a real analyst release.

The Wave criteria, grouped

A generic Forrester Wave evaluates vendors across two top-level dimensions: current offering (what the product does today) and strategy (where the vendor is going). Within current offering, the public Wave methodology typically clusters software-category criteria into roughly six groups. A proposal-management Wave, by that pattern, would likely cover:

Content management. How the product handles the knowledge base — versioning, ownership, freshness, classification, search. Every analyst rubric would need this: content quality is the load-bearing input for a proposal tool. The criterion would correctly be weighted heavily.

Authoring & generation. How the product helps draft responses. AI suggestion, template handling, multi-author editing, formatting. This is where the AI-feature arms race shows up most visibly. The criterion would likely include “AI-driven response generation” as a sub-bullet, which is exactly where a generic Wave rubric would start to lag — see below.

Collaboration & workflow. Assignments, reviews, SME workflows, approvals. The 48% SME-bottleneck reality (detailed under gap two below) lives here. The criterion would be right but would likely be underweighted in a generic Wave report relative to how much SME workflow actually determines team velocity.

Reporting & analytics. Win/loss tracking, content reuse stats, time-to-respond metrics. A predictable criterion, and one where most products would score similarly, because the actual analytics surface in incumbents is thin.

Integrations. CRM, content platforms, document management, sales engagement. Correctly weighted for enterprise procurement.

Security & administration. SSO, RBAC, data residency, audit logs. Foundational for enterprise. Hard to differentiate on.

Where the hypothetical rubric would match real buyer behavior

Three of the criterion groups, if a Wave used them, would map cleanly to what we see in actual buyer evaluations.

Content management would correctly be the heaviest criterion. The dominant complaint across G2 and Capterra reviews of the incumbents is content rot — Loopio’s “Magic” feature is widely reported to fail when the content library degrades. A rubric that weights content management heavily would reflect what actually breaks in deployments.

Integrations match enterprise behavior because procurement teams check vendor compatibility against the existing stack before they evaluate features. A tool that doesn’t integrate with Salesforce or SharePoint or Google Drive in the way the buyer expects is filtered before the AI evaluation begins.

Security & administration matter more than feature differentiation for regulated buyers. Any Wave that gave security its own criterion bucket would be honest about how procurement actually ranks vendors.

Where the rubric would lag

Three significant gaps, each of which we hear about from buyers who have just deployed an incumbent and are unhappy.

Gap one — AI generation gets evaluated as a feature, not as grounding

A generic Wave “AI-driven response generation” criterion would treat AI as a feature: does the product have it, how well does it generate text, does it support multiple languages. It would not strongly distinguish between AI that drafts from the buyer’s KB with citations and AI that drafts from the model’s training data with confident-sounding fabrications.

This is the single most consequential gap in the rubric. Stanford HAI's research on legal RAG tools found hallucination rates of 17–33% in commercial products that all advertised "grounded AI." The same dynamic applies in proposal tools: AI features in incumbents such as Loopio ("Magic") and Responsive generate text that reads fluently and, under deadline pressure, ships with hallucinated compliance claims, fabricated case studies, and invented numbers (a failure mode AutogenAI has documented).

A criterion that asks “does the product cite the source for every claim, and does the product refuse to generate when retrieval fails” would separate vendors that practice grounded AI from vendors that ship hallucination wrapped in marketing. A generic rubric would not ask that.
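To make the distinction concrete, here is a minimal sketch of the gate that criterion is probing for. Everything below is illustrative (our names, not any vendor's API): generation refuses when retrieval returns nothing above a relevance floor, and every claim in a draft carries a citation back to a KB passage.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str      # source document in the knowledge base
    span: str        # the exact text the citation points to
    score: float     # retrieval relevance, 0..1

@dataclass
class DraftClaim:
    text: str
    citation: Passage  # every claim must carry one

RELEVANCE_FLOOR = 0.55  # assumed threshold; tune against your own KB

def draft_answer(question: str, retrieved: list[Passage]) -> list[DraftClaim] | None:
    """Return cited claims, or None (refuse) when retrieval fails.

    The point of the sketch: a grounded product refuses rather than
    falling back to the model's training data.
    """
    supported = [p for p in retrieved if p.score >= RELEVANCE_FLOOR]
    if not supported:
        return None  # refusal path: no KB support, no draft
    # Placeholder "generation": in a real product an LLM would write the
    # prose, constrained to cite only the supported passages.
    return [DraftClaim(text=f"Per {p.doc_id}: {p.span}", citation=p) for p in supported]

if __name__ == "__main__":
    kb_hits = [Passage("security-whitepaper-v3", "SOC 2 Type II report renewed annually", 0.82)]
    print(draft_answer("Do you hold SOC 2?", kb_hits))       # cited claims
    print(draft_answer("Do you support on-prem GPUs?", []))  # None -> refuse
```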

Gap two — SME workflow gets underweighted

Qorus’s research on the SME bottleneck — 48% of teams citing SME collaboration as their top challenge for five consecutive years — names the operational reality every proposal lead deals with. A generic collaboration criterion would be one among six in the current-offering dimension. In actual buyer evaluations we observe, SME workflow weighting is much higher.

A rubric that broke “collaboration & workflow” into its components — SME assignment patterns, async-interview support, ticketed asks with SLAs, review discipline — would more accurately reflect where buyer pain lives. A bundled criterion lets vendors score well on workflow with surface features that do not address the underlying SME-velocity problem.

Gap three — content freshness would not be its own criterion

Any “content management” criterion would likely fold content freshness inside it rather than treating it as a top-level axis. The most consistent failure mode in incumbent deployments is that content libraries go stale within months. A rubric that elevated freshness — automated freshness scoring, owner-driven reviews, decay detection — would let buyers see which vendors have made freshness a product feature versus which vendors treat it as a customer-managed maintenance task.

The 1up.ai team framed this exactly: proposal tools are “mostly just knowledge management” — and the knowledge-management problem is unsolved if the knowledge is stale. A category rubric that named freshness as a distinct factor would shift incumbent product roadmaps.
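To illustrate what "freshness as a product feature" could mean mechanically, here is a minimal sketch of owner-reset decay scoring. The half-life and threshold are assumptions for illustration, not figures from any vendor.

```python
from datetime import date, timedelta

REVIEW_HALF_LIFE_DAYS = 180  # assumed: score halves every six months without an owner review

def freshness_score(last_reviewed: date, today: date | None = None) -> float:
    """Exponential decay from the last owner review; 1.0 means just reviewed."""
    today = today or date.today()
    age_days = max((today - last_reviewed).days, 0)
    return 0.5 ** (age_days / REVIEW_HALF_LIFE_DAYS)

def needs_review(last_reviewed: date, floor: float = 0.4) -> bool:
    """Flag items whose score has decayed below the floor."""
    return freshness_score(last_reviewed) < floor

# Example: an answer last reviewed 14 months ago gets flagged for its owner.
stale_since = date.today() - timedelta(days=425)
print(round(freshness_score(stale_since), 2), needs_review(stale_since))
```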

Criteria we would add

If we were redesigning the rubric, we would add three criteria.

Citation density and verifiability. For every claim in a generated draft, can a reviewer click through to a specific source span in the KB that supports the claim? This is testable in a 15-minute demo. A vendor that cannot demonstrate it is selling fluency, not grounding.
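A hedged sketch of that demo check, reusing the Passage and DraftClaim types from the gap-one sketch above (still illustrative, not a vendor API):

```python
def verify_citations(claims: list[DraftClaim], kb_text: dict[str, str]) -> list[DraftClaim]:
    """Return the claims whose cited span cannot be found verbatim in the cited document."""
    return [
        c for c in claims
        if c.citation.span not in kb_text.get(c.citation.doc_id, "")
    ]

# An empty result means every claim is click-through verifiable;
# anything returned is a claim the reviewer cannot trace to the KB.
```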

Refusal behavior. When the KB does not contain the answer, does the product refuse to generate, or does it hallucinate from training data? Testable with a small set of crafted prompts that ask about content the demo KB does not contain.
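That probe is cheap to script. A sketch of what the test could look like, where generate_draft stands in for whatever generation call the vendor exposes in the demo:

```python
# Probes ask about content deliberately absent from the demo KB.
# A grounded product should refuse (return None or an explicit not-found) on every one.
PROBES = [
    "What is your FedRAMP High authorization date?",    # not in demo KB
    "Summarize your 2023 penetration test findings.",   # not in demo KB
    "Which aerospace customers can we reference?",       # not in demo KB
]

def run_refusal_probes(generate_draft) -> float:
    """Return the fraction of probes the product correctly refused.

    generate_draft is a stand-in for the vendor's generation call;
    it should return None on refusal.
    """
    refusals = sum(1 for q in PROBES if generate_draft(q) is None)
    return refusals / len(PROBES)

# Anything below 1.0 means the product invented answers for content the KB
# does not contain, exactly the behavior the criterion is meant to catch.
```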

Post-mortem and write-back. Does the product capture what worked and didn’t on each bid, and does it write the lessons back to the KB so the next bid starts smarter than the last? This is the compounding test. Most incumbents fail it.
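As a data shape, the test is whether the product has a first-class record along these lines and a path that writes it back into the KB rather than into a slide deck. Field names below are ours, purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class BidPostMortem:
    bid_id: str
    outcome: str                                                 # "won" | "lost" | "no-decision"
    answers_reused: list[str] = field(default_factory=list)     # KB item ids that held up
    answers_rewritten: list[str] = field(default_factory=list)  # KB items that needed rework
    lessons: list[str] = field(default_factory=list)

def write_back(pm: BidPostMortem, kb: dict[str, dict]) -> None:
    """Fold the post-mortem into the KB so the next bid starts from it."""
    for item_id in pm.answers_rewritten:
        kb.setdefault(item_id, {}).setdefault("flags", []).append(
            f"rewritten during {pm.bid_id}; review before reuse"
        )
    for item_id in pm.answers_reused:
        kb.setdefault(item_id, {})["last_validated_in"] = pm.bid_id
```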

What a Wave would do well

Rubrics are imperfect by design. They have to be legible to a buyer who has 30 minutes to read the report, not to the proposal-ops lead who lives in the tool every day. The strength of a Wave, when one exists in a category, is that it forces vendors to publish capability against a public standard.

Our critique above is about where the generic Wave rubric would lag, not whether it should exist. A category without analyst coverage is a category buyers can’t evaluate at all. A category with imperfect analyst coverage is a category buyers can evaluate cautiously, with their own supplementary criteria.

The supplementary criteria above — citation verifiability, refusal behavior, post-mortem write-back — are the questions we recommend buyers ask vendors directly, in a working demo, before short-listing. Any future Wave would be a starting point. The demo is the actual evaluation.

If you are evaluating a proposal tool against Wave-shaped criteria right now, we have deeper posts on the incumbents: the Loopio teardown and the Responsive teardown walk through the public capability and pricing of two vendors that would plausibly sit in the upper-right of a proposal-management Wave if one were ever published.

Sources

  1. Forrester — research methodology
  2. Loopio — Best DDQ software
  3. 1up.ai — The problem with RFP software
  4. G2 — Responsive (formerly RFPIO) reviews