Block reuse tracking: the metric that matters
Which KB blocks got used, in which proposals, and whether their use correlated with winning. How we instrument reuse, what the numbers told us, and where the signal turns into noise.
A knowledge base without reuse telemetry is a filing cabinet. A knowledge base with reuse telemetry is a system that can learn. This post is about the telemetry we instrument on every KB block, what it told us in the first year of running it in production, and the places the signal stops being useful.
What we track
Every KB block in the product is tracked in a usage table, and every draft that pulls the block writes a row to it. The minimal schema:
CREATE TABLE block_usage (
    block_id          UUID NOT NULL,          -- the KB block that was retrieved
    proposal_id       UUID NOT NULL,          -- the proposal whose draft pulled it
    retrieved_at      TIMESTAMP NOT NULL,     -- when the retriever surfaced it
    rank_in_retrieval INT NOT NULL,           -- its position in the retrieval results
    included_in_draft BOOLEAN NOT NULL,       -- did the drafter actually use it
    included_in_final BOOLEAN NOT NULL,       -- did it survive the review cycle
    reviewer_edits    INT NOT NULL DEFAULT 0, -- edit operations on the derived text
    reviewer_approval TEXT,                   -- optional: accepted, accepted_with_edits, replaced, flagged_for_update
    proposal_outcome  TEXT                    -- set at post-mortem: won, lost, withdrawn, no_decision
);
Seven fields beyond the identifiers. Each one is a decision we made, each one has a specific read.
retrieved_at and rank_in_retrieval. What did the retriever surface when this question was asked, and where did the block rank. Useful for measuring retrieval quality independent of drafting quality.
included_in_draft. Did the draft actually use the block, or did the drafter pick a different one. The gap between “retrieved” and “included in draft” tells us where the drafter disagreed with the retriever.
included_in_final. Did the block survive the review cycle, or did the reviewer remove it. The gap between draft and final tells us where the reviewer disagreed with the drafter.
reviewer_edits. A count of edit operations on the text derived from this block. Heavy edits mean the block was close enough to be kept but not quite right. A block that is consistently heavily edited is a block that wants to be rewritten.
reviewer_approval. Optional structured field the reviewer can set: accepted, accepted_with_edits, replaced, flagged_for_update. Filled in by reviewers about 60% of the time. We do not require it — required fields get filled badly.
proposal_outcome. Populated at post-mortem time: won, lost, withdrawn, no_decision. The honest part here: we get this populated on roughly 70% of proposals. The other 30% are proposals whose outcome we never learn, mostly because the customer never tells us either way.
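Both gaps described above (retriever versus drafter, and drafter versus reviewer) fall out of this schema with one aggregation. A minimal sketch in Postgres-flavored SQL; the column names come from the schema above, and the query itself is illustrative rather than our production reporting:

-- Per-block survival through the pipeline: retrieved -> in draft -> in final.
SELECT
    block_id,
    COUNT(*) AS times_retrieved,
    AVG(CASE WHEN included_in_draft THEN 1.0 ELSE 0.0 END) AS draft_inclusion_rate,
    AVG(CASE WHEN included_in_final THEN 1.0 ELSE 0.0 END) AS final_inclusion_rate
FROM block_usage
GROUP BY block_id
ORDER BY times_retrieved DESC;

A low draft_inclusion_rate at a high retrieval count is the retriever-disagreement signal; a wide spread between draft_inclusion_rate and final_inclusion_rate is the reviewer-disagreement signal.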
What the first year told us
We ran this telemetry through 2025 across our active customer base. The aggregated, anonymized findings:
The top 5% of blocks account for 55% of draft usage. This is the power-law pattern Sparrow’s research on content libraries anticipates, and it is the single most actionable finding. The 5% are the blocks every KB maintenance cycle should revisit first.
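The concentration number is cheap to reproduce. A sketch, assuming Postgres (PERCENT_RANK ranks blocks by draft usage; the 5% cutoff matches the finding above):

WITH usage_per_block AS (
    SELECT block_id, COUNT(*) AS draft_uses
    FROM block_usage
    WHERE included_in_draft
    GROUP BY block_id
),
ranked AS (
    SELECT
        draft_uses,
        PERCENT_RANK() OVER (ORDER BY draft_uses DESC) AS pct_rank,
        SUM(draft_uses) OVER () AS total_uses
    FROM usage_per_block
)
-- Share of all draft usage captured by the top 5% of blocks.
SELECT SUM(draft_uses)::float / MAX(total_uses) AS top_5pct_share
FROM ranked
WHERE pct_rank <= 0.05;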
Blocks that get heavily edited on first use rarely get used again. Once a reviewer edits a block more than six times, the block’s retrieval rate in subsequent proposals drops by roughly 60%. Drafters remember which blocks are worth pulling. The system should too, and now it does: we weight blocks with high edit counts lower in retrieval.
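A sketch of the signal behind that down-weighting. The real adjustment lives inside our retrieval stack; this query only computes a per-block penalty factor, and the six-edit threshold and 0.5 multiplier are illustrative, not our production values:

-- Per-block retrieval weight derived from edit history (illustrative values).
SELECT
    block_id,
    AVG(reviewer_edits) AS avg_edits,
    CASE WHEN AVG(reviewer_edits) > 6 THEN 0.5 ELSE 1.0 END AS retrieval_weight
FROM block_usage
WHERE included_in_draft
GROUP BY block_id;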
Blocks that have been modified within the last 90 days correlate with higher win rates. We cannot make a strong causal claim here. The correlation is about 8 percentage points in our sample, which is large but confounded: freshly updated blocks live in KBs owned by teams that are paying attention, and attentive teams win more regardless. What we can say is the correlation is consistent, and the stale-KB pattern Shelf has written about shows up in our data as well.
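How we’d measure that correlation, as a sketch. It assumes a hypothetical blocks catalog table with an updated_at column, which is not part of the block_usage schema above, and it only approximates “fresh at retrieval time,” since updated_at records only the most recent edit:

-- Win rate for final-included blocks, split by whether the block was
-- modified in the 90 days before retrieval. Approximation: `updated_at`
-- holds only the latest edit, not a per-retrieval snapshot.
SELECT
    (b.updated_at BETWEEN u.retrieved_at - INTERVAL '90 days'
                      AND u.retrieved_at) AS fresh_at_retrieval,
    AVG(CASE WHEN u.proposal_outcome = 'won' THEN 1.0 ELSE 0.0 END) AS win_rate
FROM block_usage u
JOIN blocks b ON b.block_id = u.block_id
WHERE u.included_in_final
  AND u.proposal_outcome IN ('won', 'lost')
GROUP BY 1;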
Blocks marked flagged_for_update that don’t get updated within 30 days compound the problem. The flag is a signal of reviewer dissatisfaction. Blocks that carry a flag for more than 30 days are 2.3 times more likely to be replaced outright on the next use, rather than edited. The flag is a wake-up call with a short half-life.
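Finding the blocks where the flag has gone stale, under the same hypothetical blocks table assumption (the schema does not store review timestamps, so retrieved_at stands in as a proxy for when the flag was set):

-- Blocks flagged for update over 30 days ago and not edited since the flag.
SELECT
    u.block_id,
    MAX(u.retrieved_at) AS last_flagged_at  -- proxy for when the review happened
FROM block_usage u
JOIN blocks b ON b.block_id = u.block_id
WHERE u.reviewer_approval = 'flagged_for_update'
GROUP BY u.block_id, b.updated_at
HAVING b.updated_at < MAX(u.retrieved_at)                 -- no edit after the flag
   AND MAX(u.retrieved_at) < NOW() - INTERVAL '30 days';  -- flag older than 30 days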
What the data cannot tell us
Three things. I want to be honest about them.
Why a block was included or excluded. Our schema records that a drafter kept or replaced a block; it does not record the reason. Some percentage of replacements are “the retrieved block was fine, I just have a personal phrasing preference.” We can’t separate that noise from “the retrieved block was actually wrong.” The reviewer approval field is our closest proxy, and it is optional.
Whether a block’s contribution is causal to the outcome. A won proposal that used a given block did not necessarily win because of the block. The block might have been neutral, or even mildly harmful, and the proposal won on other axes. We can correlate, never attribute.
Whether a block is “good” in the abstract. Usage metrics measure how a block performs inside a retrieval stack and a drafting stack. Move either of those stacks and the same block looks different. A block that scores poorly today might score well under a better reranker.
How the telemetry changes product behavior
Four places, currently:
- Retrieval re-ranking. Blocks with high usage and high reviewer approval rank higher in retrieval for semantically similar questions. Blocks with high edit counts or high replacement rates rank lower.
- KB health dashboard. The maintenance view surfaces the top-usage blocks that haven’t been reviewed in more than 180 days. This is the single most-used internal tool our customer success team has in its weekly rhythm.
- Retire recommendations. Blocks with zero retrieval in the last 365 days get surfaced as candidates for archival. We don’t auto-archive (retiring a block that a regulator asks about next month is a real risk), but we name them; a sketch of the query follows this list.
- Post-mortem prompts. When a proposal closes as won or lost, the post-mortem flow surfaces the top 10 blocks that fed the response and asks the reviewer to label which ones worked. Those labels update the block’s reviewer-approval pattern.
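The retire-candidate query mentioned in the list above, sketched against the same hypothetical blocks catalog table:

-- Candidates for archival: blocks with zero retrievals in the last 365 days.
-- Surfaced for a human decision; never auto-archived.
SELECT b.block_id
FROM blocks b
LEFT JOIN block_usage u
       ON u.block_id = b.block_id
      AND u.retrieved_at > NOW() - INTERVAL '365 days'
WHERE u.block_id IS NULL;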
The honest limit
The telemetry is observational. It describes what happened. It does not, by itself, tell us what to do. A block with a 70% replacement rate could be a bad block, or it could be a block the retriever keeps surfacing in the wrong context. We look at the pair — the block and the retrieval pattern that called it up — before we make a decision. In practice we have found the signal is usually honest: a consistently replaced block is usually a block that wants updating, not a block that is being pulled into the wrong context.
The telemetry is also, like all telemetry, only as good as its coverage. We get 70% outcome labels. The other 30% of proposals are a silent hole in the data. Improving that number is a customer-success problem, not an engineering problem, and it is the lowest-hanging improvement available anywhere in the compounding loop.
The one-line takeaway: reuse telemetry is the measurement layer of a KB that compounds. Without it, the content library is a filing cabinet. With it, the cabinet starts to learn which drawers the team actually opens — and which ones should be emptied.