Field notes

A Christmas Eve draft review note

Two drafts crossed my desk this morning, Christmas Eve. What they said, what they missed, and why the review still happened despite the date.


Two drafts crossed my desk this morning. The date is December 24th. Neither was supposed to land today; both did. This is a short note on what I saw.

Draft one: the confident-but-ungrounded

The first was a response to a buyer's 180-question security due-diligence questionnaire (DDQ). It read well: clean sentences, no obvious errors, compliance language in the right places. The citation panel on the side showed every answer tagged to a source document in the KB.

The problem surfaced on the fifth question I spot-checked. The answer claimed a specific certification audit cadence: “annually, with interim surveillance audits at six months.” The cited source said: “annually, with surveillance audits as required by the standard.” The two are not the same sentence. The citation was correct at the reference level — the right document, the right clause — and wrong at the claim level. The drafting prompt had smoothed the source’s “as required” into a specific “at six months” that was not in the source.

This is the Stanford HAI failure mode, in the wild, in a production draft. The citation was not a lie; the claim under the citation was. A reviewer who trusts the citation panel and does not re-read the source document ships the smoothed version into the buyer’s hands.

The draft got flagged, the verification prompt got a new test case added to the regression suite, and the answer got rewritten against the source’s exact language. None of that took more than 15 minutes. The cost of not catching it — a buyer audit finding a claim the source did not support — would have taken months to live down.
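If you are wondering what that test case looks like, here is a minimal sketch. Everything in it is hypothetical: verify_claim is a crude stand-in for the actual verification prompt, and the clause text is paraphrased. But the assertion is the real one: a draft that adds specifics the cited source does not contain must be flagged.

```python
import re

# Hypothetical regression test for the "smoothed specificity" failure.
# verify_claim() is a crude stand-in for the pipeline's real verification
# prompt; the test shape, not the implementation, is the point.

SOURCE_CLAUSE = (
    "Certification audits are performed annually, with surveillance "
    "audits as required by the standard."
)

SMOOTHED_DRAFT = (
    "Certification audits are performed annually, with interim "
    "surveillance audits at six months."
)

def verify_claim(draft_sentence: str, source_text: str) -> bool:
    """Pass only if every specific in the draft also appears in the source.

    Deliberately crude: pull numeric and temporal tokens out of the draft
    and require each to occur in the cited source. A real verifier would
    use a model, but the assertion it must satisfy is identical.
    """
    specifics = re.findall(
        r"\b(?:\d+|six|twelve|monthly|quarterly)\b", draft_sentence.lower()
    )
    return all(token in source_text.lower() for token in specifics)

def test_smoothed_cadence_is_flagged():
    # The draft invented "at six months"; the verifier must reject it.
    assert not verify_claim(SMOOTHED_DRAFT, SOURCE_CLAUSE)

def test_source_exact_language_passes():
    # An answer rewritten in the source's own words must pass.
    assert verify_claim(SOURCE_CLAUSE, SOURCE_CLAUSE)
```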

Draft two: the correctly-grounded but wrongly-structured

The second was an RFP response section on a public-sector bid. The claims were accurate; every cited source supported the statement it was attached to. The problem was structural: the response followed the library's default narrative template, while the buyer's requirements document asked for a tabular answer with specific columns.

The content was right. The shape was wrong. An evaluator scoring against a compliance matrix does not read a narrative and award compliance; they look for the table the matrix asked for, and if it is not there, the response loses points against a section the team actually had the evidence for.

This is not a grounding problem. It is a classification problem: the inbound question was table-structured, and the classify-question prompt routed it to the narrative drafter. The durable fix lives upstream, in the classifier. The immediate fix is to re-run the draft through the table prompt. That is 20 minutes of work.
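A sketch of what that upstream fix might look like. Every name here is invented; the keyword heuristic, classify_question, and the drafter registry are illustrations of the routing pattern, not the product's actual prompts. The principle is the one the note describes: the answer's shape is decided before drafting, so shape errors get fixed before drafting too.

```python
# Hypothetical routing layer for draft two's failure. The keyword heuristic
# stands in for whatever the classify-question prompt actually does; the
# point is that the answer's shape is chosen before any drafting happens.

TABLE_SIGNALS = (
    "tabular format",
    "following columns",
    "complete the table",
    "provide a matrix",
)

def classify_question(requirement_text: str) -> str:
    """Return "table" when the buyer's requirement asks for tabular output."""
    text = requirement_text.lower()
    return "table" if any(s in text for s in TABLE_SIGNALS) else "narrative"

# Invented prompt names, one per answer shape.
DRAFTERS = {
    "narrative": "narrative-drafter-prompt",  # the library's default template
    "table": "table-drafter-prompt",          # emits the columns the matrix asks for
}

def route(requirement_text: str) -> str:
    """Pick the drafting prompt. Misrouting here is exactly the failure
    draft two exhibited, which is why the durable fix lives in this layer."""
    return DRAFTERS[classify_question(requirement_text)]

# Draft two's requirement would now land on the table prompt:
assert route("Respond in tabular format with the following columns: ...") == "table-drafter-prompt"
```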

Why the review happened today

Because deadlines do not wait for calendars. The DDQ was due December 27th. The RFP response was due December 30th. “We will review it after the holidays” is not a sentence the buyer accepts. The drafts were reviewed today because they had to be, and the reviewer is working a short day to make room for a family dinner at five.

What this suggests about grounded AI

Both drafts looked correct at a glance. Both had structural errors that a glance missed. The value of grounded AI is not that the drafts are perfect — it is that the errors are localized and catchable. Draft one’s error was a single smoothed phrase against a citation; draft two’s error was a template mismatch. Neither draft was garbage. Both drafts were close enough that a thirty-minute review could fix them.

The category we are trying to build toward — where the draft is trustworthy enough to ship with light review — is not yet here. It is closer than it was in January. It is further than the marketing would have you believe. Today’s drafts were a snapshot of the gap.

Back to dinner.