Field notes

How the draft packet is generated, line by line

The prompt, the retrieval context, and the output template that produce an SME draft packet. A worked example from a real-shaped RFP question to a ready-to-review answer.

The PursuitAgent engineering team · Engineering · 6 min read

Yesterday’s post covered what’s in an SME draft packet — the question, the context, the proposed answer, the conditional sections. This post is the engineering version: the actual prompt, the retrieval context shape, and the output template.

I’ll use a worked example. The question is rephrased from a real (anonymized) federal RFP we used as a test case during development.

The input

Question: "Describe the offeror's approach to ensuring continuity of
operations during a regional cloud-provider outage. Include RTO and
RPO commitments and any past performance demonstrating the approach."

Bid: federal civilian agency, IT modernization, midsize ($25-50M),
incumbent has been on contract 6 years. Section weighted 15%.

The packet builder runs the five-step pipeline we described yesterday: hybrid retrieval, rerank, grounded draft, claim verification, prior-answer lookup. This post zooms into step three — the grounded draft.
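For orientation, here is a minimal sketch of how those five steps chain together. Every name in it (the Block and PacketDraft shapes, the injected step callables) is illustrative, not the production code.

from dataclasses import dataclass, field

@dataclass
class Block:
    block_id: str
    text: str
    freshness: str          # "current" | "superseded" | "stale"
    score: float = 0.0

@dataclass
class PacketDraft:
    question: str
    blocks: list
    draft: dict = field(default_factory=dict)        # structured output from step 3
    warnings: list = field(default_factory=list)      # claim-verification flags from step 4
    prior_answers: list = field(default_factory=list)

def build_packet(question, bid_context, retrieve, rerank, draft, verify, prior_lookup):
    # Steps are passed in as callables so the sketch stays self-contained.
    candidates = retrieve(question, bid_context)              # 1. hybrid retrieval
    ranked = rerank(question, candidates)                     # 2. rerank (incl. freshness)
    top = [b for b in ranked if b.freshness == "current"][:3]
    packet = PacketDraft(question, top)
    packet.draft = draft(question, bid_context, top)          # 3. grounded draft
    packet.warnings = verify(packet.draft, top)               # 4. claim verification
    packet.prior_answers = prior_lookup(question)             # 5. prior-answer lookup
    return packet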

The retrieval context

Before the prompt runs, the reranker has surfaced four candidate KB blocks:

  1. Block A — the company’s BCP/DR plan, dated 4 months ago, with explicit RTO/RPO targets (RTO 4 hours, RPO 15 minutes for tier-1 services).
  2. Block B — a customer case study from 18 months ago describing a real regional outage and the company’s response (failover to secondary region in 23 minutes, no data loss).
  3. Block C — the architecture doc describing multi-region active-active configuration.
  4. Block D — a stale block from the same company’s old single-region disclosure, dated 14 months ago, marked superseded but still indexed.

The retriever returns all four. The reranker scores Block D low because the freshness signal flags it as superseded. The packet builder passes the top three to the drafting prompt with their full text and explicit citations.
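A rough sketch of that selection step, reusing the Block shape from the pipeline sketch above. The penalty value and the hard filter on superseded blocks are assumptions, not the tuned production behavior.

def select_source_blocks(reranked, k=3, stale_penalty=0.25):
    # Down-weight anything the freshness signal marks superseded or stale,
    # then keep the top-k current blocks for the drafting prompt.
    rescored = []
    for block in reranked:
        score = block.score
        if block.freshness in ("superseded", "stale"):
            score *= stale_penalty        # this is what pushes Block D below the cut
        rescored.append((score, block))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [b for _, b in rescored[:k] if b.freshness == "current"]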

The prompt

The drafting prompt has three sections: a system message that defines the constraint, a user message that frames the task, and a structured-output template.

[system]
You are drafting a single section of a proposal response. Your job is
to compose an answer to the user's question using ONLY the source
blocks provided. Do not introduce facts, numbers, or named entities
that do not appear in the source blocks. If the source blocks do not
support a complete answer to the question, return a partial answer
and list the unanswered components in the "unknowns" field.

Every claim in your answer must cite the source block ID it came
from. Numeric claims must match the source block exactly. Do not
soften or harden numeric commitments.

If a source block is marked "superseded" or "stale", do not use its
content. Reference the freshness flag instead.

[user]
Question: "Describe the offeror's approach to ensuring continuity of
operations during a regional cloud-provider outage. Include RTO and
RPO commitments and any past performance demonstrating the approach."

Bid context: federal civilian agency, IT modernization, midsize
($25-50M). Section weight 15%.

Source blocks:
- Block A (BCP/DR plan, freshness: current, 4 months old): [text]
- Block B (Customer case study, freshness: current, 18 months old): [text]
- Block C (Architecture doc, freshness: current): [text]

Output format: structured JSON matching the OutputTemplate schema.

The constraint to use only the source blocks is the load-bearing instruction. We tested earlier versions that allowed the model to “fill gaps with reasonable inference” and got the same hallucination patterns Stanford’s legal-RAG research documented at 17–33% rates. The strict-source constraint cuts the rate substantially — but not to zero, which is why step four (claim verification) runs after.
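For completeness, a sketch of how the two messages could be assembled from the selected blocks. The wording is abridged from the prompt above, and the helper itself is illustrative; the production template carries the full text.

SYSTEM_PROMPT = (
    "You are drafting a single section of a proposal response. Compose an answer "
    "using ONLY the source blocks provided. Do not introduce facts, numbers, or "
    "named entities that do not appear in the source blocks. Cite the source block "
    "ID for every claim; numeric claims must match the source exactly. If a block "
    "is marked 'superseded' or 'stale', do not use its content."
)

def build_draft_messages(question, bid_context, blocks):
    # Assemble the user message for the grounded-draft call (step 3).
    block_lines = "\n".join(
        f"- Block {b.block_id} (freshness: {b.freshness}): {b.text}"
        for b in blocks
    )
    user_message = (
        f'Question: "{question}"\n\n'
        f"Bid context: {bid_context}\n\n"
        f"Source blocks:\n{block_lines}\n\n"
        "Output format: structured JSON matching the OutputTemplate schema."
    )
    return SYSTEM_PROMPT, user_message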

The output template

The drafting model returns a structured object, not a paragraph. The composer turns it into the SME-facing format.

{
  "answer_sections": [
    {
      "heading": "Approach",
      "text": "Vendor X operates a multi-region active-active configuration
               for tier-1 services [Block C]. During a regional cloud-provider
               outage, traffic fails over automatically to the secondary
               region [Block A].",
      "citations": ["Block C", "Block A"]
    },
    {
      "heading": "RTO and RPO",
      "text": "RTO commitment for tier-1 services is 4 hours; RPO is 15
               minutes [Block A].",
      "citations": ["Block A"]
    },
    {
      "heading": "Past performance",
      "text": "In a documented regional outage 18 months ago, Vendor X
               failed over to the secondary region in 23 minutes with
               no data loss reported [Block B].",
      "citations": ["Block B"]
    }
  ],
  "unknowns": [],
  "warnings": []
}

The output is then composed into the SME packet’s “proposed answer” section, with citation pins next to each claim.
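The schema behind that object is small. Here is a sketch in Pydantic with field names taken from the example above; the OutputTemplate name comes from the prompt, and the composer function is illustrative.

from pydantic import BaseModel

class AnswerSection(BaseModel):
    heading: str
    text: str
    citations: list[str]          # source block IDs, e.g. "Block A"

class OutputTemplate(BaseModel):
    answer_sections: list[AnswerSection]
    unknowns: list[str] = []      # question components the blocks could not answer
    warnings: list[str] = []      # e.g. compound claims spanning multiple blocks

def compose_proposed_answer(draft: OutputTemplate) -> str:
    # Render the structured draft as the SME-facing "proposed answer" prose,
    # keeping the [Block X] citation pins inline next to each claim.
    return "\n\n".join(f"{s.heading}\n{s.text}" for s in draft.answer_sections)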

What the verification step catches

The structured output goes into step four — claim verification — before being shown to the SME. Verification looks at numeric and named-entity claims and checks them against the cited source.

In this example, the verifier checks three numeric claims:

  • “RTO commitment 4 hours” — cited as Block A. Verifier finds “4 hours” in Block A. Pass.
  • “RPO 15 minutes” — cited as Block A. Verifier finds “15 minutes” in Block A. Pass.
  • “23 minutes failover” — cited as Block B. Verifier finds “23 minutes” in Block B. Pass.

If any of these had failed — if the model had said “12 minutes” when Block B said “23” — the verifier would have flagged the claim as a numeric mismatch and either refused to ship the draft or moved the specific claim to the “unknowns” list for the SME to confirm. We covered the verifier in more depth in the claim-level verification post.
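A stripped-down version of that check, assuming each drafted section carries its citations and the cited block text is available by ID. The real verifier also handles named entities and units; this only does the exact-number comparison.

import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def verify_numeric_claims(sections, blocks_by_id):
    # Flag any number in a drafted section that does not appear verbatim
    # in at least one of that section's cited source blocks.
    mismatches = []
    for section in sections:
        cited_text = " ".join(blocks_by_id[c].text for c in section.citations)
        allowed = set(NUMBER.findall(cited_text))
        for number in NUMBER.findall(section.text):
            if number not in allowed:
                mismatches.append((section.heading, number))
    return mismatches   # empty list means every numeric claim passed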

What the SME sees

The SME gets a Slack message with the packet rendered as readable prose, citations as clickable pins, and the deadline. Their two-minute job: confirm the numbers are still current, confirm the case study reference is the right one to cite, and approve. If they disagree, they edit the answer in-line and the system captures the diff. We covered the diff capture in the answer-block tagging changelog (publishing Day 135).
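The diff capture itself needs very little machinery. A minimal sketch using the standard library; what we actually store alongside it is the subject of that changelog post.

import difflib

def capture_sme_edit(drafted: str, edited: str) -> list[str]:
    # Record what the SME changed as a unified diff, so the edit can be
    # fed back into answer-block tagging later.
    return list(difflib.unified_diff(
        drafted.splitlines(), edited.splitlines(),
        fromfile="drafted", tofile="sme_edit", lineterm="",
    ))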

Where the prompt still misses

Three known weaknesses. We’re tracking each.

Compound claims across blocks. When an answer composes facts from two blocks (e.g., “we have multi-region for tier-1 services [Block C] and tier-2 services [Block E]”), the verifier struggles to confirm the compound claim, because no single cited block entails it on its own. We surface these as warnings, and the SME reviews. We wrote about the limitation in Grounded Retrieval 101 Part 4.

Implicit numbers. The buyer asks “how often do you patch?” and the source block says “we maintain a 30-day patch SLA.” The model often reports “monthly” instead of “every 30 days,” which is conceptually equivalent but technically a paraphrase the verifier flags. We’re tuning this.

Outdated freshness flags. A block can be flagged “current” by the freshness system but reference a SOC 2 report that is now 13 months old. The block-level freshness flag does not see inside the block. We’re working on span-level freshness signals.

The packet generation pipeline is one part of a larger SME workflow. The SME bot delivers the packet. The SLA tickets track the response. The async interview pattern is the workflow the packet supports. None of them work without the others.
