Query rewriting for RFP questions with implicit context
Most RFP questions retrieve poorly because they assume context the corpus does not carry. Query rewriting turns “describe your approach” into a retrieval string that hits. Examples, the rewrite chain, and the cost tradeoff.
Most RFP questions are written to be readable, not to be retrieved. “Describe your approach to data migration.” “Discuss your methodology for stakeholder engagement.” “Provide details of your quality assurance program.” These are competent prompts for a writer. They are bad prompts for a retriever.
Bad how? They retrieve too broadly. “Describe your approach” matches every paragraph in the corpus that says “our approach.” The system finds 200 candidate blocks at borderline relevance scores, none of them strongly preferred. The downstream ranker does its best, but the candidates frequently fail to clear the retrieval floor (see the grounded drafting loop), and the question goes back to the human.
Query rewriting fixes this. The rest of this post walks through what it does, how we run it in production, and where it costs more than it earns.
The shape of the problem
An RFP question lives inside a section. The section sits inside a document. The document sits inside a procurement context. The question, read literally, drops all of that context. “Describe your approach” — to what? The answer is always in the surrounding text, but the retriever does not see the surrounding text by default.
Consider a question stem under section 3.4 of a state-government IT modernization RFP titled “Data Migration”:
3.4.7 — Describe your approach.
Read on its own, this matches everything. Read with the section heading and the document context, it means “describe your approach to data migration in a state-government IT modernization context.” That second version is the one we want to send to retrieval.
The rewrite chain
Production runs a three-step rewrite for any question that scores below a “specificity threshold” on a quick lexical check (short, abstract verbs, no domain noun phrases). Most RFP questions trip the threshold.
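A minimal sketch of what that lexical check can look like. The verb list and the length threshold below are illustrative stand-ins, not the production values:

import re

# Illustrative stand-ins for the production word list and threshold.
ABSTRACT_VERBS = {"describe", "discuss", "explain", "provide", "detail", "outline"}

def needs_rewrite(question: str, min_content_words: int = 6) -> bool:
    """Crude specificity check: a short question built around an abstract
    verb with few domain noun phrases gets sent to the rewriter."""
    words = re.findall(r"[a-z0-9/.-]+", question.lower())
    content = [w for w in words if w not in ABSTRACT_VERBS and len(w) > 3]
    has_abstract_verb = any(w in ABSTRACT_VERBS for w in words)
    return has_abstract_verb and len(content) < min_content_words

# needs_rewrite("Describe your approach.")  -> True (goes to the rewriter)
# needs_rewrite("What is the maximum file size your API accepts on a POST to /v1/documents?")  -> False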
Step 1 — Inherit context. The rewriter is given the question stem, the section heading, two levels of document outline above it, and a one-paragraph document summary. It produces a rewritten query that explicitly attaches the inherited context.
Original: "Describe your approach."
Rewritten: "Describe the vendor's approach to data migration from a
legacy mainframe-based case management system to a cloud-native
platform for a state-level health and human services agency."
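A sketch of how the context-inheritance call can be assembled. The prompt wording, the QuestionContext fields, and the call_model helper are assumptions for illustration; the point is what the rewriter sees (heading, two outline levels, document summary) and what it does not (the corpus, the other questions).

from dataclasses import dataclass

@dataclass
class QuestionContext:
    stem: str              # "Describe your approach."
    section_heading: str   # "3.4 Data Migration"
    outline: list[str]     # two levels of document outline above the section
    doc_summary: str       # one-paragraph document summary

def build_rewrite_prompt(ctx: QuestionContext) -> str:
    # The rewriter is asked to attach the inherited context explicitly.
    return (
        "Rewrite the question below as a standalone retrieval query that "
        "makes its implicit context explicit.\n"
        f"Document summary: {ctx.doc_summary}\n"
        f"Outline: {' > '.join(ctx.outline)}\n"
        f"Section: {ctx.section_heading}\n"
        f"Question: {ctx.stem}\n"
        "Rewritten query:"
    )

# rewritten = call_model("small-model", build_rewrite_prompt(ctx))  # call_model is a hypothetical helper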
Step 2 — Decompose if compound. If the rewritten query contains “and,” a comma-separated list, or two distinct noun phrases, it gets split. We retrieve once per part and union the results before ranking.
Compound: "Describe your approach to migration and your data
validation methodology."
Decomposed:
- "approach to data migration"
- "data validation methodology"
Step 3 — Generate paraphrases. A small model produces three paraphrases of the rewritten query that use different vocabulary for the same concept. This handles the case where the corpus uses terminology the question does not. (“Migration” in the question, “cutover” in the corpus.)
The retriever runs against every query in the set (the rewritten query, the paraphrases, and the original question), the results are merged, the top-K is taken, and the ranker prioritizes blocks that retrieved well across multiple paraphrases (a weak signal that the block is genuinely on-topic, not just keyword-matching one phrasing).
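Sketched below, assuming a retrieve(query, k) function that returns (block_id, score) pairs. The agreement bonus weight is a placeholder, not the production ranker's, and the original question stays in the query set for the reasons discussed further down.

from collections import defaultdict

def retrieve_with_paraphrases(original, rewritten, paraphrases, retrieve, k=50):
    queries = [original, rewritten] + paraphrases
    best_score = defaultdict(float)
    hit_count = defaultdict(int)   # how many queries retrieved this block
    for q in queries:
        for block_id, score in retrieve(q, k):
            best_score[block_id] = max(best_score[block_id], score)
            hit_count[block_id] += 1
    # Weak boost for blocks that retrieved under multiple phrasings;
    # the 0.05 weight is illustrative only.
    merged = {b: best_score[b] + 0.05 * (hit_count[b] - 1) for b in best_score}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:k]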
What this costs
Query rewriting is a small-model call per question. We use a Haiku-class model for it because the task is structured and the latency budget is tight. Per-question cost is in the low single-digit cents at our current pricing, and the latency add is around 200 to 400 milliseconds.
That is not free. On a 300-question security questionnaire, the rewrite layer adds a few dollars to the run cost (300 questions at a couple of cents each) and a minute or two to the wall-clock. Both are inside the budget envelope. We are publishing the cost-per-response breakdown later this week with the marginal cost of each pipeline stage broken out.
When rewriting hurts
Rewriting does not always help. Three cases where it hurts.
The question is already specific. “What is the maximum file size your API accepts on a POST to /v1/documents?” — that question is already a retrieval-grade query. Rewriting it adds noise and can drift the meaning. The specificity threshold check exists to skip rewriting in these cases. Roughly 30% of questions in our corpus skip the rewriter.
The inherited context is wrong. Section 3.4 might be titled “Data Migration” but the question is actually about a sub-topic the heading does not capture. The rewriter inherits the wrong context and produces a worse query than the original. We mitigate this by always retaining the original question as one of the paraphrases, so even when the rewriter drifts, the original retrieval signal is preserved.
Compound decomposition over-splits. “Discuss your security and compliance posture” gets decomposed into two queries. Sometimes the corpus has a single block that addresses both jointly, and the decomposed queries individually under-retrieve it. We catch some of these in the merge step (the joint block retrieves on both sub-queries with moderate scores and ends up high after merging), but not all. This is one of the open problems on the retrieval side.
What we tuned
Three things, in order of impact.
The inherited-context window. We started with “all section headings up to the document root.” That was too much. Long context strings retrieved worse than short ones because the model’s rewrite drifted toward generic state-procurement language. We cut to two levels of outline plus the document summary. Retrieval scores improved.
Whether to keep the original question in the paraphrase set. We tried it both ways. Keeping the original always wins by a small margin on our held-out evaluation set. The original sometimes carries vocabulary signal that the rewriter does not preserve.
The size of the paraphrase set. Three paraphrases plus the rewritten query plus the original. We tried five and seven; gains plateaued at three.
Where we are not done
Multi-turn questions. Some RFP sections set up context across multiple paragraphs and then ask a question that depends on the prior paragraphs. The rewriter currently does not see paragraph-level context within the section, only the section heading. We are evaluating whether to extend the inherited context window or to run a separate “section summary” pre-step that the rewriter consumes.
Table-form questions. Security questionnaires especially love a table format where the column header is the implicit context for every row. “Encryption at rest — yes/no/details.” The rewriter has to know that “Encryption at rest” is the topic and “yes/no/details” is the answer schema. Current handling is heuristic and brittle. A more structured table-aware rewriter is in the next milestone.
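One plausible shape for that kind of heuristic, sketched with illustrative names; the production handling is not necessarily this:

def table_row_query(row_label: str, column_header: str, section_heading: str) -> str:
    """Questionnaire table rows: the row label ("Encryption at rest") carries
    the topic; the column header ("yes/no/details") is the answer schema and
    is kept out of the retrieval string."""
    return f"{section_heading}: {row_label}"

# table_row_query("Encryption at rest", "yes/no/details", "Data security")
# -> "Data security: Encryption at rest"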
Cross-document context. When a question references “the system described in Section 2,” the rewriter does not currently follow that reference. The retrieval target is the system, not the section. We are prototyping a reference resolver that walks the document graph at rewrite time. Not in production.
The short version
Query rewriting is the cheapest, highest-payoff retrieval improvement we have made on the RFP-questions side. It costs cents per question. It moves the retrieval-floor pass-through rate from “borderline acceptable” to “comfortable” on the questions that previously failed silently. It is not novel — query rewriting has been in retrieval literature for a decade — but it is under-deployed in proposal AI, and the gap is visible if you compare a pipeline with it to a pipeline without it.
Next time a question retrieves badly in your account, look at what the system rewrote it to. The rewrite is logged. If the rewrite is wrong, that is the most useful data point you can hand back to us.