Field notes

The color-team review discipline, explained for modern teams

Pink, red, gold, white. The four-team review discipline most modern proposal shops know by name and don't actually run. This post reclaims it — what each team is for, why teams skip it, the rubrics, and how to run reviews async in 2025.

Sarah Smith · 18 min read · RFP Mechanics

Color teams — pink, red, gold, white — are the Shipley orthodoxy for proposal review. Every proposal manager I have worked with knows the names. Most of them do not actually run the reviews.

That is the core failure I want to argue against in this post. The names live in the org chart. The discipline does not. A team that has “scheduled red team for next Tuesday” on its calendar has done the easy half of the work; the hard half is whether Tuesday’s meeting produces signed action items that change what ships, or whether it is a 90-minute Zoom that everybody attends, nobody prepares for, and that produces nothing.

Shipley wrote the canonical text on this in their Proposal Guide and their public color-team posts. APMP carries the same vocabulary in the Body of Knowledge. The orthodoxy was built for federal pursuits with eight-figure values and three-month proposal cycles, and it shows: full-day reviews, large rooms, formal report-outs. Modern teams — 10-person commercial shops, mid-market RFP factories responding to two bids a week — import the names and skip the substance because the original ceremony does not fit the cadence.

This post reclaims the discipline. The four teams, in order. What each one is for. Why teams skip them. The rubrics that make each review do real work. How to run reviews async in 2025 without a 90-minute Zoom. And the honest cases where a team should collapse two reviews into one.

The four teams, explained

Shipley defines four staged reviews at four points in the lifecycle. The percentages below are the standard reference points; treat them as anchors, not rules.

Pink team — structure, ~30% drafted

What it is. The first review. The draft is roughly 30% complete — outline approved, win themes drafted, two or three sample sections written to the standard the rest will follow. Pink team is not a content review; it is a structure-and-strategy review. The question is not “is the writing good,” it is “are we writing the right document.”

Who attends. A small group: the proposal manager, the capture lead, one or two senior reviewers who are not drafting (a sales engineer, a solution architect, a finance lead), and the strategist or executive sponsor whose mental model of the win is being implemented in the response. Five to seven people. Drafters do not review their own sections at pink team; they answer questions about them.

Rubric. Three questions, in order:

  1. Does the response structure match the buyer’s compliance matrix? Section by section, requirement by requirement — every “shall” and “must” in the RFP has a section that owns the response, with a pointer to the writer.
  2. Are the win themes named, specific, and threaded? A win theme that does not appear in three or more sections is a slogan in a cover letter.
  3. Is the proposed shape of each section appropriate? Page allocation, voice, level of technical depth, evidence pattern.

Output. A revised compliance matrix with section assignments confirmed, a revised win-theme list (with any theme that fails the swap test deleted; the test is defined under the red rubric below), and a list of structural changes the drafters need to apply before red team.

The whole point is that mistakes caught at pink team cost minutes to fix. Mistakes caught at gold team — when 95% of the document is drafted around the wrong structure — cost days.

Red team — content, ~80% drafted

What it is. The substantive content review. The draft is roughly 80% complete: every section is written, win themes are integrated, evidence is in place, citations are attached. Red team is the moment a fresh set of senior reviewers reads the response from beginning to end as if they were the buyer’s evaluation panel.

Who attends. A larger group than pink team. Five to ten reviewers, organized by the response’s evaluation factors. The technical-approach evaluator reviews the technical-approach section. The past-performance evaluator reviews the past-performance section. The cost reviewer reviews the cost narrative (not the spreadsheet — that is a separate audit).

Rubric. Five questions:

  1. Do the win themes appear in every major section, with evidence the evaluator can verify?
  2. Does each section answer the question the RFP actually asked, in the rubric’s language? Or does it answer a related question that the writer found easier?
  3. Are claims sourced? Every factual statement about our company, our product, or our past performance has a citation a reviewer can check.
  4. Is there a discriminator on every page? A discriminator is a sentence that names something we offer that the competition does not. A page without a discriminator is a page the evaluator will not score.
  5. Does the response read as if it were written for this buyer? Or as if it were written for “any buyer in this category”?

Output. A scored review per section — typically green/yellow/red against each rubric point — and a written list of action items per section, with owners and deadlines. Red team is the last review where major structural changes are still possible. After red team, you are polishing.

Gold team — win themes plus compliance, ~95% drafted

What it is. The final review before submission. The draft is 95% complete: every section finalized, win themes locked, all evidence cited, the cover letter and executive summary in their final shape. Gold team’s job is to make sure nothing slips between here and submission.

Who attends. A small, senior group. The executive sponsor, the proposal manager, the capture lead, and one external reviewer who has not seen the draft before. The external reviewer is the highest-leverage attendee — they read it the way the evaluator will read it, with no prior context to fill in the gaps. If a section confuses the external reviewer, it will confuse the evaluator.

Rubric. Three questions:

  1. Compliance — every requirement in the matrix has a non-null pointer to a response section, every “shall” is addressed, every page-limit and format constraint is met.
  2. Sourcing — does any sentence in the response make a claim a reviewer cannot verify? If yes, either source it or delete it.
  3. Cohesion — do the executive summary, the technical sections, the past-performance section, and the cover letter tell the same story? A response that contradicts itself across sections is a response that the evaluation panel will read as careless.

Output. A short list of must-fix items (compliance gaps, unsourced claims, contradictions) and a short list of nice-to-have items the team will fix if time allows. The submission goes out when every must-fix is closed.

White team — post-submission

What it is. The retrospective. After submission. Sometimes after award. White team is what most teams call the post-mortem; Shipley folds it into the color framework because the discipline is the same — staged review against a rubric, with named output.

Who attends. The proposal manager, the capture lead, the primary writers, and — if the team can get them — someone from the buyer’s evaluation panel via debrief.

Rubric. Four questions:

  1. What did we learn about this buyer that we did not know at kickoff?
  2. What did we learn about our own knowledge base (KB) — content blocks that worked, blocks that were missing, blocks that were stale?
  3. Which win themes earned weight in evaluation, and which did not?
  4. What changes to our process do we make for the next bid?

Output. Updates to the KB. Win themes promoted or retired. Process changes scheduled. The next bid starts from the accumulated intelligence of every prior bid that ran a real white team.

This connects directly to the post-mortem stage of the eight-stage RFP pipeline. A white team that does not write its output back into the corpus is a white team that did not happen.

Why modern teams skip it

I have watched the same three failure modes in dozens of proposal shops across commercial mid-market and federal pursuits. Each one is fixable; each one is also the default state.

Failure mode 1 — reviews are cancelled because the draft isn’t ready. The most common one. Pink team is scheduled for Tuesday. Monday afternoon, the proposal manager looks at the draft. The compliance matrix is half-built. Two of the sample sections aren’t written. Win themes are still bullet points. The proposal manager makes the rational decision: cancel pink team and reschedule for Thursday. Thursday, the same thing happens. By the time something gets reviewed, the draft is at 70%, and the review that should have been a pink team turns into a hybrid pink-and-red that catches structural issues too late to fix without missing the deadline.

The pattern’s root cause is upstream — missed kickoff deadlines, unfinished capture work, subject-matter experts (SMEs) who didn’t deliver — but the visible symptom is review cancellation. Once a team has cancelled two pink teams in a row, they stop scheduling pink teams. The discipline collapses.

Failure mode 2 — reviews happen but produce no action items. The second most common. Red team is on the calendar. Eight people show up. The draft is ready. The reviewers read it. The meeting happens. Comments fly. Nothing gets written down. The proposal manager promises to “consolidate the feedback” and the team goes back to writing.

A week later the submission ships. Half the comments were addressed because the section owners remembered them. Half were not. Nobody can tell which is which. The review happened in the sense that it took 90 minutes; it did not happen in the sense that anything changed because of it.

The pattern’s root cause is the absence of a tracker. Reviews need a written rubric, comments captured against the rubric, and named action items with owners and deadlines before the meeting ends. Without that scaffold, the meeting is theater.

Failure mode 3 — reviews are done by the people who drafted. The third one is more subtle. The team is small — 10 people total, four working on this proposal, all of them writing. There is no separate reviewer pool. Red team becomes a session where the four writers read each other’s sections. Each of them is too close to the work. They notice typos and miss structural issues. They are too kind because they will be on the receiving end next bid.

A drafter reviewing their own work — or a peer’s, in a small enough team that the dynamic is reciprocal — is not running a review. It is an edit pass. Edits and reviews are different acts. Reviews require distance. The Bid Lab makes the same point about small shops importing the ceremony without the structural conditions that make it work.

A worked example. A six-person commercial shop responds to a 60-page RFP. Four people draft. Two people are notionally available to review but one is the founder who is on a sales trip and the other is the head of customer success who is in three customer calls every day of the response window. Red team is scheduled. The day before red team, the proposal manager realizes neither reviewer has read the draft. The 90-minute meeting becomes the reading session — the reviewers open the document for the first time at 9:00 AM and try to read 60 pages live while commenting. By 10:00 AM they have read 20 pages. By 10:30 they are exhausted. The meeting ends. The draft has been read once, fast, by people who should have read it three days earlier with the rubric in hand. The team submits and loses on a structural issue a fresh reader would have caught at hour one of a real review.

The fix in that case is not “skip red team.” The fix is “either acquire a reviewer who can give the draft 90 minutes of distance, or accept that the bid does not have a real red team and budget time for a structural rewrite when red team would have surfaced one.” Both options are honest. The default option — pretending the meeting was a review when it was a reading session — is not.

The review rubrics

A review without a rubric is a vibe. The rubric is what makes a review reproducible — what makes Tuesday’s red team comparable to last quarter’s red team, what makes one reviewer’s input commensurable with another’s.

Pink rubric — does structure match compliance? The pink-team reviewer takes the compliance matrix and the response outline and walks them line by line. For every requirement: which section answers it, who is writing it, what is the page allocation. Anything unassigned is a structural gap. Anything assigned to a section that does not yet exist in the outline is a structural gap. Anything assigned to a writer who is also assigned to four other sections is a capacity problem the proposal manager needs to solve before the next milestone.

The output of a pink-rubric review is a marked-up matrix. Green rows are clean. Yellow rows have an assignment but a question (page allocation looks light, win-theme integration unclear, etc.). Red rows are unassigned or assigned to a non-existent section.
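To make the marked-up matrix concrete, here is a minimal Python sketch of the classification, assuming a hypothetical row shape. Field names and thresholds are illustrative, not from any particular proposal tool.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class MatrixRow:
    requirement: str       # the RFP "shall"/"must" text
    section: str | None    # response section that owns the answer
    writer: str | None     # named drafter
    pages: float | None    # page allocation
    notes: str = ""        # open questions (theme integration, light pages, ...)

def classify(row: MatrixRow, outline: set[str]) -> str:
    """Green/yellow/red per the pink rubric."""
    if row.section is None or row.section not in outline:
        return "red"     # unassigned, or assigned to a section that doesn't exist
    if row.writer is None or row.pages is None or row.notes:
        return "yellow"  # assigned, but a question is still open
    return "green"

def overloaded_writers(rows: list[MatrixRow], cap: int = 4) -> list[str]:
    """Writers carrying more than `cap` sections: a capacity problem for
    the proposal manager to solve, not a rubric color."""
    load = Counter(r.writer for r in rows if r.writer)
    return [w for w, n in load.items() if n > cap]
```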

Red rubric — do win themes appear with evidence? The red-team reviewer reads each section with the win-theme list in hand. For each major section: is win theme A present? Is win theme B present? Is win theme C present? Is each presence backed by evidence the evaluator can verify? A win theme that appears as italicized boilerplate in the cover letter and nowhere else is not a win theme; it is a slogan.

PropLibrary’s swap test belongs in the red-rubric review: take any win-theme paragraph, swap your company name for a competitor’s, and see whether the paragraph still parses. If it does, the win theme is generic and the evaluator will filter it as fluff.

The output of a red-rubric review is a per-section scorecard with a pass/fail per win theme and a written rationale where the score is not pass.
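A minimal sketch of that scorecard, with the swap test included as the small helper it really is (a string swap; the judgment stays human). All names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ThemeScore:
    theme: str
    passed: bool
    rationale: str = ""  # required whenever passed is False

@dataclass
class SectionScorecard:
    section: str
    reviewer: str
    scores: list[ThemeScore] = field(default_factory=list)

    def failures(self) -> list[ThemeScore]:
        return [s for s in self.scores if not s.passed]

def swap_test(paragraph: str, us: str, competitor: str) -> str:
    """PropLibrary's swap test, mechanized only up to the human step:
    return the paragraph with our name swapped for a competitor's.
    If it still reads as true, score the theme a fail as generic."""
    return paragraph.replace(us, competitor)
```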

Gold rubric — does any sentence lack a verifiable source? The gold-team reviewer reads the final draft sentence by sentence and asks of each substantive claim: where would I check this? If the answer is “I can’t” — if there is no link, no citation, no reference to a document a reviewer or an evaluator could open — the sentence is a gold-team finding. Either source it or delete it.

This is the rubric that grounded AI is meant to enforce automatically — every sentence drafted from a citable KB block, with the citation rendered inline. Even with grounded drafting, the gold rubric is worth running by hand on a senior reviewer pass, because the failure mode the rubric catches is exactly the failure mode unaided LLM drafting introduces in non-grounded portions of the response (the cover letter, the executive summary, the human-edited final pass).

The output of a gold-rubric review is a list of sentences flagged for citation or deletion, with no other category.
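A sketch of the mechanical half of that sweep, assuming citations render as inline [n] markers or URLs; adjust the pattern to whatever your drafting tool actually emits. Deciding which flagged sentences are substantive claims stays with the human reviewer.

```python
import re

CITATION = re.compile(r"\[\d+\]|https?://\S+")  # assumed citation conventions
SENTENCE = re.compile(r"(?<=[.!?])\s+")         # naive split; fine for a review sweep

def gold_findings(text: str) -> list[str]:
    """Sentences with no checkable source: the gold rubric's single
    finding category. Source each one or delete it."""
    return [s for s in SENTENCE.split(text.strip())
            if s and not CITATION.search(s)]
```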

White rubric — what did we learn that changes the KB? The white-team reviewer takes the post-mortem questions and writes the answers as KB updates. Not as meeting notes. As actual changes to actual content blocks. Win theme A worked — promote that block to “preferred for healthcare verticals.” Win theme B did not — retire that block, or re-tag it. The technical-approach answer to question 3.4 was scored low — open the corresponding KB block, edit it, version it, attach the lessons.

The output of a white-rubric review is a diff against the KB, not a document.
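One way to make “a diff, not a document” concrete: an assumed, illustrative schema for KB change records. Your knowledge base’s real schema will differ, and the block IDs, bid tags, and dates below are made up.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class KBChange:
    block_id: str
    op: str      # "promote" | "retire" | "edit"
    detail: str  # what changes and why
    bid: str     # which pursuit taught us this
    when: date

white_team_diff = [
    KBChange("theme-A", "promote", "preferred for healthcare verticals",
             "bid-2025-041", date(2025, 10, 2)),
    KBChange("theme-B", "retire", "carried no evaluation weight in last 3 bids",
             "bid-2025-041", date(2025, 10, 2)),
    KBChange("ans-3.4", "edit", "rewrite per debrief; scored marginal",
             "bid-2025-041", date(2025, 10, 2)),
]
```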

Running a review in 2025

The 90-minute synchronous review meeting is the part of the orthodoxy modern teams should retire first. It worked when the team was 50 people in one office on one bid for three months. It does not work when the team is six people across three time zones responding to the second bid this week.

Async-first is the alternative.

Setup. Three days before the review milestone, the proposal manager publishes the draft (or the section, for staged reviews) to a review tool with the rubric attached and named reviewers tagged. Each reviewer has a 48-hour window to leave typed comments against the rubric. Comments are labeled with severity — blocker, major, minor, or nit — using the same scale we use in our own engineering code reviews, because the labels are useful and the underlying review craft is similar.

Discussion. A 30-minute synchronous meeting on day three resolves blocker and major items. Minor and nit items are handled by the section owners on their own time. The synchronous meeting is shorter because the reviewers are not reading the draft live; they have read it. They are aligning on the small set of items that need conversation.

Action items. Every blocker and major item leaves the meeting with a named owner and a deadline. Those deadlines are before the next milestone — never on the day of the next milestone, never on submission day. The proposal manager tracks them in the same tracker as the rest of the proposal’s open work.
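That cadence reduces to date arithmetic. A small sketch, assuming the 30-minute sync lands on the milestone day itself:

```python
from datetime import date, timedelta

def review_schedule(milestone: date, next_milestone: date) -> dict[str, date]:
    """Publish draft plus rubric three days out, close the 48-hour
    comment window, hold the sync on milestone day, and land every
    blocker/major fix strictly before the next milestone."""
    publish = milestone - timedelta(days=3)
    return {
        "publish_draft_and_rubric": publish,
        "comments_close": publish + timedelta(days=2),
        "sync_discussion": milestone,
        "action_items_due": next_milestone - timedelta(days=1),
    }

# Example: red team on Oct 14, gold team on Oct 21.
# review_schedule(date(2025, 10, 14), date(2025, 10, 21))
```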

Audit trail. Every comment is timestamped. Every action item has a status. After the bid is submitted, the white team can see exactly what was raised at red team, which items were addressed, and which items were deferred with a written rationale. Reviews that don’t leave an audit trail can’t be improved.

Reviewer load. Async-first means the reviewer’s 90-minute commitment becomes a 60-minute read on their own clock plus a 30-minute discussion. That is roughly the same total time, distributed differently — but the distributed version is what makes the review possible at all for senior reviewers whose calendars don’t have a 90-minute synchronous block in the window the proposal needs. A VP who can’t move 90 minutes can almost always move three 20-minute blocks across three days. The async pattern is what opens up the calendar of the reviewers a small shop needs most.

Tooling. The specifics matter less than the discipline, but a tool that supports inline comments against rendered prose, severity labels, named owners, and a status field per action item is the floor. Most modern proposal platforms cover this; a shared Google Doc with a rubric pinned to the top and a tracking spreadsheet alongside it covers it too. The discipline is the rubric, the labels, the owners, and the audit trail. The tool is whatever lets those four things live together.
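For a team rolling its own in a shared doc plus a script, here is a minimal sketch of that floor as a data model; every name is illustrative.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum

class Severity(Enum):
    BLOCKER = "blocker"
    MAJOR = "major"
    MINOR = "minor"
    NIT = "nit"

@dataclass
class ReviewComment:
    section: str
    rubric_item: str  # which rubric question the comment is against
    body: str
    severity: Severity
    reviewer: str
    raised_at: datetime = field(default_factory=datetime.now)

@dataclass
class ActionItem:
    comment: ReviewComment
    owner: str
    due: date
    status: str = "open"  # open | done | deferred
    rationale: str = ""   # required when status == "deferred"

def audit(items: list[ActionItem]) -> dict[str, int]:
    """What the white team asks of the red-team record: how many items
    were raised, closed, and deferred with a written rationale."""
    counts = {"open": 0, "done": 0, "deferred": 0}
    for i in items:
        counts[i.status] = counts.get(i.status, 0) + 1
    return counts
```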

The Lohfeld Consulting team has argued that proposal managers spend more time chasing SME responses than building strategy. The async-review pattern moves some of that chasing into a structured tool — comments and action items live in one tracker — and frees the proposal manager’s synchronous time for the items that genuinely need conversation. The 90-minute Zoom is not the discipline. The discipline is the rubric, the comments, and the closed action items.

When to skip a team

You shouldn’t. Each team does work the others can’t substitute for. Pink team is structural; red team is content; gold team is final; white team is learning. Collapsing red into gold means catching content issues at 95% complete instead of 80% — too late to fix structurally. Collapsing pink into red means catching structural issues at 80% complete instead of 30% — even later.

That said, smaller shops sometimes collapse pink and red into a single midpoint review at roughly 60% complete. The honest answer is that this works when two conditions hold: the response is short (under 30 pages), and the team has run enough prior bids against this buyer that the structural questions are mostly answered before the draft starts. Under those conditions, a single 60%-draft review can do the work of pink and red without losing too much. Under any other conditions, the collapsed review catches structural problems too late to fix and content problems too early to evaluate.
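Those two conditions reduce to a one-line check. A sketch, with the “enough prior bids” threshold as an assumed stand-in, since that part is a judgment call:

```python
def can_collapse_pink_and_red(pages: int, prior_bids_with_buyer: int) -> bool:
    """Short response AND a buyer whose structural questions are already
    answered. The page limit comes from the conditions above; the bid
    threshold (2) is a made-up placeholder for 'enough'."""
    return pages < 30 and prior_bids_with_buyer >= 2
```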

The team that should never be skipped is white. Win or lose, the post-submission review is the only mechanism that compounds. Without it, every bid starts from zero. The Shipley canon has been clear about this for forty years; the practical failure rate on actually running a white team is, in my experience, somewhere north of 80%. The most consequential review is the one teams skip most often.

If you read one thing in this post and change one habit, change that one. Run the white team. Write the output back to the KB. The next bid will be measurably easier.

Closing

Color teams are not a Shipley relic. They are a discipline that was packaged for federal pursuits and that, with adaptation, scales down to the cadence of a 10-person commercial shop responding to two bids a week. The names are the easy part. The rubrics are the work. The async-first format is the modern adaptation. The white team is the one that compounds.

The canonical pipeline post — the eight-stage RFP response pipeline — covers Stage 6 (Color-Team Review) at a higher level. The full color-team playbook, with templates, sample comment scales, and a worked example of a red-team session run async, is the next post in this thread and lands later in Q4. Until then: pick the next bid in your pipeline, schedule the four reviews on the calendar, write the rubric for each one, and run the first one as if it were the only review you have. The discipline starts there.

Sources

  1. Shipley Proposal Guide (7th ed.), Shipley Associates
  2. Shipley — Color team reviews
  3. APMP Body of Knowledge
  4. Lohfeld Consulting — How to fix the proposal processes holding you back
  5. PropLibrary — Proposal win themes: the good, the bad, and six examples
  6. The Bid Lab — Proposal color team reviews explained

See grounded retrieval in the product.

Start a trial workspace and watch PursuitAgent draft cited answers from the documents you provide.