State of Proposal Tools — Wave 1 2025
The category benchmark. What customers say about the incumbents and the challengers, what's true in pricing, where the AI moment lands honestly, and what changes in 2025.
This is the first State of Proposal Tools report. We will publish Wave 2 in February 2026 and a new wave every six months on a steady cadence after that. The point of the report is to consolidate, in one place, what is publicly knowable about the proposal-software category — vendor positioning, customer reviews, pricing signals, AI claims, and the workflow bottlenecks the tools have not yet moved.
The report synthesizes 26 public sources plus our own observations from operating PursuitAgent. It does not include proprietary customer data, vendor-supplied metrics, or numbers we cannot tie to a public citation. Every claim of fact has a link.
We are a category participant. That position carries obvious bias. We have tried to compensate by publishing the methodology in the closing section and by citing competitors against their own customer reviews rather than against our opinions. Where we describe what we ourselves do, we say “we” and label the claim as ours.
Executive summary
Five findings, in order of how confident we are in them.
1. Customer-library staleness is the dominant complaint across incumbents. Loopio, Responsive, Qvidian, and QorusDocs all carry recurring G2 and Capterra review feedback that the content library rots and the AI on top rots with it (Capterra: Loopio, Autorfp aggregator, G2: Responsive, Capterra: Qorus, G2: Qvidian). This is the most-cited theme in the public review base.
2. The “AI in proposals” claim has not yet survived contact with audit. Stanford HAI’s legal-RAG study found commercial legal RAG tools — products marketed as grounded — hallucinate 17–33% of the time. Industry sentiment on Hacker News reflects similar skepticism: the RAG-hallucination thread and the reverse-RAG thread both center on whether retrieval-plus-generation is sufficient. AutogenAI’s own practitioner write-up names the failure modes specifically: invented case studies, fabricated stats, incorrect compliance claims.
3. Workflow bottlenecks have not moved in five years. Qorus’s five-year-tracked survey shows 48% of proposal teams citing SME collaboration as their top problem, year after year. Quilt puts sales-engineer time per RFP at 100 to 300 hours. The tools have changed; the work has not compressed.
4. AI-first challengers are entering, but pricing transparency remains poor across the board. AutogenAI, Arphie, Quilt, 1up, and others have raised in the last 24 months. Public pricing remains quote-only at most enterprise tiers, including ours; this is a category norm we have started to push against (the pricing-in-public series), but most vendors have not.
5. The buyer side of procurement is underserved. Fairmarkit’s analysis notes the buyer-side pain — operational teams writing RFPs that read like wish lists — and there are very few tools serving that side of the market. The category’s writing leans almost exclusively toward sellers.
The rest of this report unpacks each of these. We close with recommendations for buyers and a methodology section that is more honest about what this report is and what it is not.
The landscape
The proposal-software category has three layers worth distinguishing.
Incumbents (10+ years old, $50M+ revenue)
Loopio. Toronto-based, founded 2014, the most-reviewed product in the category by some distance. Strong on content-library management, weak on the AI features added after 2023. The “Magic” feature — their AI suggestion engine — is the recurring complaint vector across Capterra reviews, where reviewers consistently report that “the answers are usually wrong” on nuanced questions and that the suggestions degrade as the underlying library ages. The Autorfp aggregator summarizes the dominant theme: response quality declines when content libraries fall behind, and the expensive tool starts to feel like “an overpriced document repository.” We covered the pricing and feature analysis in depth in the Loopio teardown.
Responsive (formerly RFPIO). Portland-based, founded 2015, the largest direct competitor to Loopio in the enterprise tier. Strong on customer count and integrations. Weak on search and on the v3 UX rollout. The G2 review base carries the persistent meme: “the search is terrible. It constantly misidentifies what I’m searching for.” The pros-and-cons review cut is harsher: UX described as “sooooo clunky, impossible to locate exactly what you’re trying to find,” with the v3 launch making it “LESS intuitive and buggy” rather than better. Our Responsive teardown goes into detail.
Upland Qvidian. The oldest product in the category, acquired by Upland in 2017. Strong on enterprise-customer renewals through incumbency. Weak on every other axis the G2 review base measures: the UI is described as dated, the AI as inadequate, performance as slow, and the price as expensive. Reviewers note that new users have trouble fully utilizing the product. The renewal economics are the moat; the product quality has not kept pace with the category.
QorusDocs. Seattle-based, founded 2008, the smallest of the four incumbents. The Capterra review base is consistent across years: very slow performance (“long waits to preview files, view the cart, use the functionality”), a dashboard that hard-caps at 10 pursuits forcing teams to lose visibility, and content searches that return less-relevant results than the library implies should be available.
Challengers (less than 5 years old, AI-first)
AutogenAI. UK-based, founded 2022. AI-first product architecture; the only challenger in the category with significant venture funding from a top-tier UK firm. Strong on the AI marketing surface, weak on the verifiability of its citation-fidelity claims. Their own practitioner blog is among the more honest pieces in the category about hallucination risk. We covered them in the AutogenAI teardown.
Arphie. US-based, founded 2023. Specializes in security questionnaires and DDQs. Their glossary writeup names the underlying numbers honestly: 30-40 hours on a comprehensive security questionnaire; the time-lag problem where responses can be outdated before they are reviewed.
Quilt. Founded 2023. SaaS-focused. Public blog leans toward operational analysis: their bottleneck post is the source of the most-cited 100–300 hours per RFP estimate.
PursuitAgent (us). Founded 2025. We will not score ourselves in this report. The relevant artifact for our own positioning is the grounded-retrieval pillar.
Adjacent tools
1up. Sales-enablement-adjacent, but their analysis of the RFP-software market is one of the sharpest practitioner pieces in the category: most RFP tools “are mostly just knowledge management” with enterprise bloat that leaves users “getting lost.”
Leulu & Co., Sparrow Genie, Shelf, others. Ecosystem players writing on proposal-adjacent topics — post-mortems that don’t happen, content libraries that rot, outdated knowledge bases that undermine trust. Not direct competitors but their content informs the category’s pain-point map.
Shipley, Lohfeld, VisibleThread, Bid Lab, Trident Proposals, Loopio’s content arm, Fairmarkit, PropLibrary. Industry-knowledge publishers. Cited heavily throughout this report.
We have not attempted to size individual vendor revenues; the public information is too thin for that to be honest. Loopio and Responsive are the two largest by customer count and reviewer volume; the rest of the incumbents are clearly smaller; the challengers are pre-scale. That’s the most we will commit to.
What customers are actually saying
Across the public review base — G2, Capterra, and aggregator summaries — six themes recur. Each theme below is supported by a direct citation. We are quoting the substance, not paraphrasing.
Theme 1 — “Magic doesn’t work well.” The Loopio AI suggestion feature is the most-cited example. Capterra reviewers repeatedly note that the AI works on basic questions and fails on nuanced content; they end up re-editing most suggestions. The aggregator analysis generalizes the pattern: AI quality degrades when the library is not actively maintained, and “Magic” produces outdated or irrelevant suggestions once the library falls behind, turning the tool into “an overpriced document repository.”
Theme 2 — “The search is terrible.” Responsive’s G2 review base carries the most direct version of this complaint: search “constantly misidentifies what I’m searching for and shows completely unrelated results.” The underlying technical reality is that incumbent search tends to be keyword-match in a category where the buyer’s vocabulary and the seller’s KB vocabulary often diverge; a toy illustration of that mismatch appears at the end of this section.
Theme 3 — “Clunky UX.” The Responsive pros-and-cons review cut names the v3 UX rollout as actively worse than the prior version. Qvidian reviewers say the same about that product’s UI. The pattern: products that have been shipping for a decade or more have accumulated a decade or more of features, and the surface area shows.
Theme 4 — “Very slow, expensive.” Qorus’s Capterra base is the cleanest example: “very slow” recurs across years; the dashboard hard-cap at 10 pursuits forces visibility losses; content search returns less-relevant results. Qvidian reviewers add the cost dimension — the product is described as expensive relative to the value delivered.
Theme 5 — “Content libraries that rot.” This is the meta-theme that connects four of the above. Across vendors, customers consistently report that their content libraries fall out of currency within months of being populated, and the AI features sitting on top surface stale content as if it were current. Sparrow Genie’s analysis names the root cause: unclear ownership and stale content; team members respond with “outdated answers from a Google Doc that hasn’t been touched in eight months.” Shelf’s broader knowledge-base analysis makes the same point about KB content in general — outdated content “actively undermines user trust.”
Theme 6 — “Workflow bottlenecks unchanged.” SME collaboration as the top challenge across five consecutive Qorus surveys. 100-300 hours per RFP per Quilt’s bottleneck analysis. Proposal managers spending more time chasing SMEs than building strategy per Lohfeld. The tools have changed; the work has not compressed.
These themes are correlated. A product that ships an AI feature on top of a stale library will generate stale answers, which will read as “Magic doesn’t work,” which will surface as a search complaint when reviewers can’t find the right block to edit the answer toward. The four-vendor convergence on the same review themes suggests a category-level structural issue, not a vendor-specific quality issue.
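Theme 2 in miniature: a keyword-overlap scorer, a crude stand-in for what the reviewed search features appear to approximate, scores zero whenever the buyer and the knowledge base phrase the same concept in different words. The function and the example strings below are ours and purely illustrative:

```python
def keyword_overlap(query: str, block: str) -> float:
    """Jaccard overlap on lowercased word sets, a toy proxy for keyword search."""
    q, b = set(query.lower().split()), set(block.lower().split())
    return len(q & b) / len(q | b) if (q | b) else 0.0

# Same concept, disjoint vocabulary: a keyword matcher scores the right block at zero.
query = "how do you encrypt customer data at rest"
kb_block = "all stored records are protected with AES-256 disk-level encryption"
print(keyword_overlap(query, kb_block))  # 0.0, so the relevant block never surfaces
```

Semantic retrieval closes some of this gap, but as the next section argues, retrieval alone does not settle the grounding question.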
The AI-in-proposals moment
Three observations on where AI in proposal software is actually landing in 2025.
Citations vs. genuine grounding
The most-cited evidence in the broader RAG-skepticism literature is the Stanford HAI legal-RAG paper, which found that commercial legal RAG tools — Lexis+ AI, Westlaw AI, Ask Practical Law — hallucinate at 17–33% rates despite grounding. The headline takeaway: citations being present does not mean the cited claim is supported by the retrieved evidence. The same gap exists in proposal tools, and it is the gap “AI with citations” marketing tries to obscure.
The Hacker News thread on RAG hallucination reflects the practitioner consensus: the naive retrieve-and-generate approach still fabricates, even with well-curated corpora. The reverse-RAG discussion, centered on Mayo Clinic’s per-claim verification approach, is sharper still: practitioners are debating whether per-claim entailment is the only viable defense, and whether the economics work at scale.
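Mechanically, per-claim verification is straightforward to sketch: split the draft into claims, score each claim against the retrieved evidence, and flag anything below a threshold instead of shipping it. The sketch below is ours, with a word-overlap placeholder where a real system would call an NLI model or an LLM judge; it does not describe any specific vendor's implementation:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    claim: str
    best_source: str | None
    score: float
    supported: bool

def score_entailment(claim: str, evidence: str) -> float:
    # Placeholder: fraction of claim words found in the evidence.
    # Swap in a real NLI model or LLM judge for anything serious.
    c, e = set(claim.lower().split()), set(evidence.lower().split())
    return len(c & e) / len(c) if c else 0.0

def verify_draft(claims: list[str], retrieved_blocks: list[str],
                 threshold: float = 0.5) -> list[Verdict]:
    """Score every claim in a draft against the retrieved evidence, claim by claim.
    Claims whose best supporting block falls below the threshold are flagged for
    refusal or human review rather than shipped."""
    verdicts = []
    for claim in claims:
        scored = [(score_entailment(claim, b), b) for b in retrieved_blocks]
        score, block = max(scored, default=(0.0, None))
        verdicts.append(Verdict(claim, block, score, score >= threshold))
    return verdicts
```

The economics objection in the Hacker News thread is visible right in the loop: one verification pass per claim, on top of retrieval and generation.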
The proposal-tools market has not yet caught up. Most vendor marketing still treats “AI with citations” as the differentiator. The customer who pilots their own questions and clicks the citation links will discover the gap.
What practitioners are pushing back on
Trident Proposals wrote one of the more cited pieces of practitioner pushback in the last year: ChatGPT “doesn’t know you, your team, or your experience.” The argument is that proposal-grade prose is built from verifiable specifics, not theoretical best practice — and generic AI cannot supply the specifics.
AutogenAI’s own blog — a vendor in the category — names the proposal-specific failure modes: invented case studies, fabricated statistics, fabricated compliance claims. They are unusually honest for vendor content; we credit them for it. The honest framing is that the failure modes exist; the mitigations are partial; the discipline of grounded retrieval is necessary but not sufficient.
What this means in 2025
The category has bifurcated into “AI with citations” (the marketing claim) and “genuinely grounded retrieval” (the engineering claim). The customer’s pilot is the only place the two get distinguished. In Wave 2 of this report we plan to publish a citation-fidelity protocol — a 30-minute pilot procedure any buyer can run on any vendor — to make the distinction reproducible.
Pricing trends
What is publicly knowable about proposal-software pricing in 2025:
Loopio. Quote-only. The Loopio teardown reconstructs the per-seat pricing from public review data and cites the ~$1,700/seat/year figure that recurs in customer reviews. The floor varies by deal size; mid-market deals reportedly start around $30K-$50K annually, with enterprise tiers running considerably higher.
Responsive. Quote-only. Public reviews include scattered references to per-seat pricing in the same range as Loopio. The Responsive pricing trail post goes deeper.
Qvidian, QorusDocs. Quote-only. Less public data; customer reviews suggest similar floors with steeper enterprise multiples. Qvidian customers describe the product as expensive relative to value; the actual prices are not disclosed.
AutogenAI, Arphie, Quilt. Quote-only at the moment, with an enterprise sales motion. Some lower-tier products (1up, others) are starting to publish per-seat or starter-tier pricing publicly; this is a minority of the market.
The pattern. The category is structurally quote-only. Floors are deal-by-deal. Per-seat economics depend on customer count, content-library size, integration requirements, and AI-feature tier. A small-team buyer evaluating five vendors should expect five different deal structures and three weeks of sales conversations to compare them.
This is a category-level failure of transparency. Buyers cannot comparison-shop. Pricing fairness is determined by negotiation skill rather than published rates. We have argued elsewhere (the pricing-in-public series) that this is wrong; we have changed our own behavior in response. The rest of the category has not yet.
Workflow bottlenecks unchanged
Three numbers anchor the workflow story. All three are public; none of them has moved in five years.
48% — SME collaboration as the top challenge. Qorus has tracked this annually for five consecutive years. The number does not move. The best engineers are also the busiest; proposal work competes with billable work and loses. Lohfeld’s analysis names the same pattern from the proposal-manager side: more time spent chasing SME responses than building strategy.
100 to 300 hours — sales-engineer time per RFP. Quilt’s bottleneck analysis puts sales-engineer time at this range, the equivalent of 12 to 37 full workdays per deal. The most expensive talent on the team is pulled away from discovery, demos, and strategy. Revenue teams treat RFPs as “a necessary evil,” producing low win rates and burned-out SMEs.
500+ security questionnaires per year, 200-400 questions each. Safe Security’s analysis names this load at enterprise scale. One engineer reportedly processed 250+ questionnaires in a year, “spending his entire week responding to questionnaires rather than securing systems.” Vendors recycle stale answers; orgs collect reassuring “yes” responses that don’t reflect real security posture. The DDQ playbook goes into our own approach to this load.
The Loopio DDQ analysis supplements these numbers: 200-350 questions per DDQ, 15-40 hours per questionnaire, with 47% of response teams describing the work as a multi-day ordeal.
These numbers have not moved in the five years the surveys have been run. Tools have proliferated. AI has been added. The headline workflow times have not compressed in the public data. We will be honest about why we think that is in the next section.
A separate workflow theme: post-mortems don’t happen. Leulu’s analysis puts the failure mode plainly — “the debrief ends, the document is published, people move on.” The lessons don’t embed in workflow. The same mistakes recur.
And a craft theme: win themes are interchangeable. PropLibrary’s analysis names the swap test: “If you can swap your company name with another and the win theme still makes sense, it’s too generic.” Across the public proposal corpus we have looked at, most win themes fail this test. Tools that draft win themes from generic prompts produce worse offenders.
The VisibleThread analysis of government proposals closes the loop: rushing into writing without fully understanding the requirements is the leading cause of proposal failure. The tools that auto-draft from a thin understanding of the RFP are the tools that produce the most-failing proposals, not the most-winning ones.
A note on review ceremony: the Shipley color-team process is widely used but not widely useful at the size of team that adopts it. Bid Lab’s analysis is sharp on this: the four-round process built for federal pursuits is regularly imported into 10-person shops, where it adds meetings without the scale that justifies them. This is a workflow tax that the tools have not addressed.
The buyer side carries its own pain. Fairmarkit’s analysis names the operational-team pattern: RFPs that read like wish lists, vendors that respond by inflating prices or promising features they cannot deliver. The category has very little tooling pointed at this side of procurement.
What changes in 2025
Speculation, labeled as such.
Grounded retrieval becomes the differentiator. The customer who pilots their own questions and verifies the citations will start to distinguish “AI with citations” from “genuinely grounded retrieval.” We expect this to be the load-bearing axis of vendor selection by mid-2025. Stanford HAI’s legal-RAG findings make the case sharply: the presence of citations says little about their correctness. Markets eventually price the latter.
Citation discipline becomes table stakes. Vendors who refuse to ship low-confidence drafts will be selected over vendors who paper over uncertainty with yellow warnings. Refusal as a product feature is counterintuitive, and we expect it to win in regulated procurements first.
Content-health tooling becomes a product surface. Freshness scoring, expiry workflows, last-used signals — currently buried in admin panels — move toward primary-product status. The Sparrow Genie analysis of why content libraries fail names the cause; vendors who treat the cause as a product opportunity rather than a customer-success problem will win. (A sketch of what such a score could look like follows these predictions.)
Win-loss intelligence emerges as a category. The Leulu post-mortem analysis names the gap; vendors will start to fill it. The compounding loop — every win making the next bid easier — is engineering, not a slogan.
The AI-first vs. incumbent gap narrows from both sides. Incumbents will improve their AI; challengers will improve their feature breadth. The middle of the market will be where the interesting buyer decisions get made.
We are not predicting any of these with certainty. The first three we are reasonably sure of; the last two are open.
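On the content-health prediction, here is a back-of-the-envelope sketch of what a freshness score could look like, assuming the three signals named above are tracked per content block. The weights and the 180-day horizon are illustrative assumptions, not anyone's shipped model:

```python
from datetime import date

def freshness_score(last_reviewed: date, last_used: date | None,
                    has_owner: bool, today: date | None = None,
                    horizon_days: int = 180) -> float:
    """Return 0.0 (rotten) to 1.0 (fresh) for one content block.
    Blocks past the review horizon or without an owner decay toward zero."""
    today = today or date.today()
    age_days = (today - last_reviewed).days
    review = max(0.0, 1.0 - age_days / horizon_days)  # linear decay over the horizon
    usage = 1.0 if last_used is not None else 0.5     # never-used content is suspect
    owner = 1.0 if has_owner else 0.5                 # orphaned content rots fastest
    return review * usage * owner

# A block last reviewed eight months ago, never reused, with no owner of record
# scores zero: the "Google Doc that hasn't been touched in eight months" case.
print(freshness_score(date(2024, 6, 1), None, False, today=date(2025, 2, 1)))  # 0.0
```

The point of surfacing a number like this in the primary product, rather than an admin panel, is that it turns library rot from a renewal-call surprise into a daily signal.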
Recommendations
For buyers evaluating proposal tools in 2025:
Pilot the AI on your own stale questions first. Take five questions from a recent RFP, ask each shortlisted vendor to draft answers using their AI feature, and verify each citation by clicking through. If the citations point at the wrong block — or no block — the AI is decoration. This is a 30-minute exercise. It is more diagnostic than any analyst report. (A simple way to keep score during this pilot follows these recommendations.)
Demand citation-verification UX. A proposal tool that lets the reviewer see the source block behind every cited sentence, side-by-side, is the minimum bar for grounded AI. A tool that hides this behind a separate workflow is prioritizing convenience over correctness.
Ask for reference calls with customers at your size. The Trident analysis names the underlying fact: a tool that works for a 200-person proposal shop may not work for a 5-person one, and vice versa. Reference calls at your scale are the cheapest insight you will get.
Demand pricing transparency. Quote-only pricing favors vendors and disadvantages buyers. Vendors who publish floors and ranges are signaling discipline and confidence; vendors who don’t are protecting their negotiation surface at your expense.
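To keep score during the pilot described above, something as simple as the following is enough. The record shape is our own suggestion, not any vendor's export format:

```python
from dataclasses import dataclass

@dataclass
class CitationCheck:
    question: str
    cited_claim: str       # the sentence the vendor's AI drafted
    source_found: bool     # did the citation resolve to a real block?
    claim_supported: bool  # does that block actually support the sentence?

def fidelity_rate(checks: list[CitationCheck]) -> float:
    """Fraction of cited claims whose citation both resolves and supports the claim."""
    if not checks:
        return 0.0
    good = sum(1 for c in checks if c.source_found and c.claim_supported)
    return good / len(checks)

pilot = [
    CitationCheck("data residency", "EU data stays in Frankfurt", True, True),
    CitationCheck("SSO support", "SAML and OIDC are both supported", True, False),
    CitationCheck("uptime SLA", "99.99% uptime, contractually guaranteed", False, False),
]
print(f"citation fidelity: {fidelity_rate(pilot):.0%}")  # 33%: decoration, not grounding
```

Run the same five questions against every shortlisted vendor and the fidelity rates become directly comparable.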
For vendors building in this category:
Be transparent on citation fidelity. Publish your gold-set methodology, your precision@5 numbers, your refusal rates. Stand behind them in pilots. The customer who runs the citation-verification pilot will learn the truth either way; vendors who lead with the truth will win that customer’s trust. (A sketch of how these numbers fall out of a gold set follows these recommendations.)
Build content-health as a product feature, not a customer-success problem. Freshness alerts, last-used signals, owner-of-record tracking. These are the signals that distinguish a product from a glorified document repository.
Treat refusal as a product feature. A system that confidently refuses to answer a low-confidence question is a system the regulated buyer will trust. A system that always answers is a system that occasionally fabricates.
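For concreteness, here is how precision@5 and refusal rate fall out of a gold set, using a hypothetical gold set in which each question is paired with the block IDs a human marked as relevant. A sketch of the computation, not a prescribed benchmark:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved block IDs that a human marked relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for block_id in top_k if block_id in relevant) / len(top_k)

def refusal_rate(answers: list[str | None]) -> float:
    """Fraction of gold-set questions the system declined to answer (None)."""
    return sum(1 for a in answers if a is None) / len(answers) if answers else 0.0

# One gold-set item: five blocks retrieved, three of them marked relevant by a human.
print(precision_at_k(["b12", "b7", "b3", "b99", "b4"], {"b12", "b3", "b4"}))  # 0.6
# Four gold-set questions, two refused rather than answered at low confidence.
print(refusal_rate(["drafted", None, "drafted", None]))  # 0.5
```

Published numbers like these are only as honest as the gold set behind them, which is why the methodology has to ship alongside the metrics.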
Methodology and limits
This report synthesizes 26 public sources plus our own observations from operating PursuitAgent. The sources are listed at the end of this report and cited inline.
What this report is:
- A synthesis of publicly available customer reviews, vendor blogs, industry analyses, and academic literature on RAG and grounded AI.
- An honest reading of the category’s most-cited pain points.
- A reflection of our own position as a category participant.
What this report is not:
- A vendor-by-vendor revenue or customer-count comparison. We do not have audited numbers; the public ones are unreliable.
- A scorecard. Magic-Quadrant-style ranking would require a methodology we do not yet trust, and we are unwilling to publish a chart that pretends to a precision the underlying data cannot support.
- A pricing matrix. The category is structurally quote-only; we will not synthesize a pricing comparison from speculation.
- A buyer recommendation. We are a competitor. The recommendations section is procedural — what to test, what to demand — not “buy vendor X.”
What we did not include and why:
- Reddit threads. Direct URLs did not surface cleanly during research; the content is referenced secondhand in industry blogs we did cite.
- LinkedIn posts. Generic AI-hype content dominated the search results; we excluded them rather than wade through the noise.
- GitHub issue queues on RAG-citation bugs. The Stanford paper and the two Hacker News threads cover the same ground more authoritatively.
- Vendor-supplied case studies and customer counts. Self-reported and not independently verifiable.
We will publish Wave 2 in February 2026 and rerun the analysis with whatever public data has accumulated by then. We expect the headline themes to remain stable; we expect the AI-fidelity gap to be the most-moved axis.
Closing
The proposal-software category is overdue for a rebuild. The incumbents have content libraries that rot, search that doesn’t find, and AI features that hallucinate. The challengers have AI-first products that have not yet survived contact with audit. The workflow bottlenecks have not moved in five years. The pricing is opaque.
Each of those facts is fixable. None of them have been fixed. The vendors who fix them will win the next decade of this market. The vendors who marginally improve on the existing structure will be displaced.
We are biased in saying so — we are building one of the alternatives. The bias is the point. A report from a non-participant would not have access to the operator’s view of where the structural cracks are. A report from a participant unwilling to name the cracks would not be honest. We have tried to name the cracks without pretending we are not a participant.
The next twelve months will tell us whether the bet on grounded retrieval — as a product feature, not a marketing claim — is the right one. We expect it is. We will measure ourselves publicly and publish the results, the same way Wave 2 of this report will be published. The category is rebuilt by the vendors who measure themselves in the open.
This post is by the PursuitAgent research team. Research posts are a shared byline rather than a single author; views reflect PursuitAgent’s position and the work the team is doing on the category.
Sources
1. Capterra — Loopio reviews
2. Autorfp — Loopio reviews summary
3. G2 — Responsive (formerly RFPIO)
4. G2 — Responsive pros and cons
5. Capterra — Qorus for proposal management
6. G2 — Upland Qvidian
7. Loopio — Best DDQ software
8. Safe Security — Vendor security questionnaire best practices
9. AutogenAI — AI hallucination, how proposal teams reduce risk
10. Stanford HAI — Hallucination-Free? Assessing Legal RAG Tools
11. Hacker News — RAG and the hallucination question
12. Hacker News — Mayo Clinic reverse-RAG
13. VisibleThread — Government proposal writing
14. Qorus — Winning proposals: stop wrangling SMEs
15. Quilt — How to identify bottlenecks in your RFP process
16. 1up — The problem with RFP software
17. PropLibrary — Proposal win themes
18. Lohfeld Consulting — How to fix the proposal processes holding you back
19. Leulu & Co — The proposal post-mortem
20. Shipley — Color team reviews
21. Bid Lab — Color team reviews explained
22. Arphie — How AI is transforming security questionnaire processes
23. Sparrow Genie — RFP content library best practices
24. Shelf — Outdated knowledge base
25. Fairmarkit — 4 RFP pain points
26. Trident Proposals — Why ChatGPT and AI should not write your next proposal
27. PursuitAgent — The Loopio teardown
28. PursuitAgent — The Responsive teardown
29. PursuitAgent — The AutogenAI teardown
30. PursuitAgent — Grounded retrieval pillar
31. PursuitAgent — DDQ response playbook
See grounded retrieval in the product.
Start a trial workspace and watch PursuitAgent draft cited answers from the documents you provide.