Efficient Claim Verification with 4-Step Evidence Gates

Systematically grounds claims in evidence

Takes a list of claims with initial confidence scores and runs each through four verification gates — File Evidence, Read vs Reconstruct, LL Conflict, and Untested Assumption — returning a scored result per claim with a VERIFIED, FLAGGED, or UNVERIFIED status.

Verification Gate Sub-Routine

Cowork | QA (Quality Assurance) | Difficulty: Intermediate | Time: Under 1 min per claim

Tags: Introduced in Beta, Verification, Confidence, Sub-Routine, QA, Claims, Evidence, Cowork

TL;DR

What It Does

Systematically grounds claims in evidence. Takes a list of claims with initial confidence scores and runs each through four verification gates — File Evidence, Read vs Reconstruct, LL Conflict, and Untested Assumption — returning a scored result per claim with a VERIFIED, FLAGGED, or UNVERIFIED status.

How It Works

The recipe first checks whether late-session mode is active (via CWK-010 token monitoring at >70% usage). Each claim then passes through four gates in sequence. Gate 1 (File Evidence) checks whether the claim can be backed by a specific file, line, or commit; failure costs 20 confidence points. Gate 2 (Read vs Reconstruct) determines whether the claim rests on a current-session tool observation or on memory reconstruction; failure costs 15. Gate 3 (LL Conflict) scans Lessons Learned for contradictions; a conflict costs 15. Gate 4 (Untested Assumption) flags assertions about Cowork platform behavior that were not tested this session; the penalty is 10. The final confidence is clamped to 0–100. All gates pass = VERIFIED; 60 or above with failures = FLAGGED; below 60 = UNVERIFIED.
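The gate penalties and status thresholds described above can be sketched in a few lines of Python. This is a minimal illustration rather than the recipe's actual implementation: `GATE_PENALTIES`, `score_claim`, and the gate keys are hypothetical names, the late-session penalty is left out for simplicity, and a final score of exactly 60 is treated as FLAGGED, following the ≥60 rule given in the step-by-step section of this recipe.

```python
# Hypothetical sketch of the scoring rules; names are illustrative, not part of CRAFT.
GATE_PENALTIES = {
    "file_evidence": 20,        # Gate 1
    "read_vs_reconstruct": 15,  # Gate 2
    "ll_conflict": 15,          # Gate 3
    "untested_assumption": 10,  # Gate 4
}


def score_claim(initial_confidence, gate_passed):
    """Return (final_confidence, status) for one claim.

    gate_passed maps each of the four gate names to True (pass) or False (fail).
    """
    failed = [gate for gate in GATE_PENALTIES if not gate_passed[gate]]
    penalty = sum(GATE_PENALTIES[gate] for gate in failed)  # penalties stack
    final = max(0, min(100, initial_confidence - penalty))  # clamp to 0-100
    if not failed:
        status = "VERIFIED"
    elif final >= 60:
        status = "FLAGGED"
    else:
        status = "UNVERIFIED"
    return final, status
```

For instance, a claim starting at 80 that fails Gate 3 and Gate 4 loses 25 points, so `score_claim` returns `(55, "UNVERIFIED")`.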

What To Expect

A VERIFICATION_SUMMARY containing per-claim results with initial and final confidence scores, gate-by-gate pass/fail details, penalties applied, and a status classification. The calling recipe receives this summary and decides how to act — R-1 blocks on UNVERIFIED, R-8 re-delegates FLAGGED claims, and CWK-021 excludes UNVERIFIED from findings.
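One possible shape for the summary is sketched below. The recipe does not publish a schema, so every field and function name here (`ClaimResult`, `build_summary`, `total_claims`, and so on) is an assumption, chosen only to mirror the contents listed above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClaimResult:
    # Hypothetical per-claim record; field names are illustrative.
    claim: str
    initial_confidence: int
    final_confidence: int
    status: str  # "VERIFIED", "FLAGGED", or "UNVERIFIED"
    failed_gates: list = field(default_factory=list)
    penalties: dict = field(default_factory=dict)


def build_summary(results, called_by="standalone", late_session=False):
    """Aggregate per-claim results into a summary dictionary."""
    counts = Counter(result.status for result in results)
    return {
        "total_claims": len(results),
        "verified": counts["VERIFIED"],      # Counter yields 0 for absent keys
        "flagged": counts["FLAGGED"],
        "unverified": counts["UNVERIFIED"],
        "late_session_mode": late_session,
        "called_by": called_by,
        "claims": list(results),
    }
```

Under this sketch an empty input simply produces a summary with zero counts, which matches the documented behavior for an empty claims list.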

Best Results When You

Provide claims with realistic initial confidence scores rather than defaulting to 100. Include source context when available — it helps Gate 1 and Gate 2 resolve faster. Call this sub-routine early in the session when possible, before late-session decay applies.

Typical Time

Under 1 minute per claim. A batch of 5–10 claims typically completes in 2–5 minutes depending on how many re-reads Gate 2 triggers.

Difficulty

Intermediate — the sub-routine itself is automatic, but understanding the gate results and interpreting confidence adjustments requires familiarity with CRAFT’s verification philosophy and the calling recipe’s handling of each status level.

How To Start

This recipe is a sub-routine — it is called by other recipes, not invoked directly by the user in most cases. It can be called standalone for ad hoc claim verification.

STEP 1 · Provide Claims and Context

Supply the list of claims to verify and optionally identify which recipe is calling the sub-routine.

Available parameters

claims · list · required
List of claim objects to verify. Each claim: {"claim": "text", "initial_confidence": 0–100, "context": "optional source context"}. More claims mean more processing time, but each claim is verified independently.
called_by · string · optional · default: standalone
Recipe ID that invoked this sub-routine. Used for audit trail and to determine if late-session decay applies. When omitted, runs in standalone mode.
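A rough sketch of how these two parameters might be checked before the gates run is shown below. The recipe states only that error handling covers malformed claim objects; raising `ValueError` is an assumption made for illustration, as are the function names `normalize_claim` and `normalize_request`.

```python
def normalize_claim(raw):
    """Validate one claim object and return a cleaned copy (hypothetical helper)."""
    if not isinstance(raw, dict):
        raise ValueError("claim must be an object")
    text = raw.get("claim")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'claim' must be non-empty text")
    conf = raw.get("initial_confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("'initial_confidence' must be a number")
    if not 0 <= conf <= 100:
        raise ValueError("'initial_confidence' must be between 0 and 100")
    return {
        "claim": text,
        "initial_confidence": conf,
        "context": raw.get("context"),  # optional source context
    }


def normalize_request(claims, called_by="standalone"):
    """Apply the documented default for called_by and validate every claim."""
    return {
        "claims": [normalize_claim(c) for c in claims],
        "called_by": called_by,
    }
```

Because each claim is independent, the list comprehension could equally be replaced by a loop that collects errors per claim instead of stopping at the first one.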

Invocations

Standalone Verification

#H->AI::Directive: (Run verification gates on claims with claims=[{"claim": "The cookbook has 48 recipes", "initial_confidence": 85}])

Called by Another Recipe

#H->AI::Directive: (Run verification gates on claims with claims=[{"claim": "LL-054 requires ID ordering", "initial_confidence": 90, "context": "LL file line 8340"}], called_by="RCP-CWK-021")

Batch Verification

#H->AI::Directive: (Run verification gates on claims with claims=[{"claim": "Claim A", "initial_confidence": 80}, {"claim": "Claim B", "initial_confidence": 70}, {"claim": "Claim C", "initial_confidence": 95}], called_by="RCP-CWK-001")

Usage Examples

Three scenarios — standalone verification, a calling recipe receiving mixed results, and late-session decay in action.

Example 1 · Standalone claim verification

User: Verify this claim — “CWK-017 does not exist in the cookbook” with confidence 90.

Cat: Initializing verification gate sub-routine. Late-session mode: inactive. Gate 1 (File Evidence): Grepping cookbook — CWK-017 not found. PASS. Gate 2 (Read vs Reconstruct): Based on fresh grep. PASS. Gate 3 (LL Conflict): LL-054 confirms CWK-017 as legitimate gap. PASS. Gate 4 (Untested Assumption): Not a platform claim. PASS. Result: VERIFIED. Confidence: 90 → 90.

Example 2 · Called by CWK-021 with mixed results

CWK-021: Passes 3 claims to CWK-024 for verification.

Cat: Claim 1: “Two-repo architecture.” All gates PASS. VERIFIED (85 → 85). Claim 2: “Git pull works in Cowork.” Gate 3: FAIL (conflicts with ULL-001, -15). Gate 4: FAIL (untested, -10). UNVERIFIED (80 → 55). Claim 3: “Session crashes cause total loss.” Gate 1: FAIL (-20). Gate 2: FAIL (-15). UNVERIFIED (75 → 40). Summary: 1 verified, 0 flagged, 2 unverified.

Example 3 · Late-session decay active

R-8: Calls CWK-024 at 75% token usage.

Cat: Late-session decay active: -10 penalty. Claim: “Recipe index has 128 entries.” Gate 2: FAIL — reconstructed. Re-reading now. Confirmed after re-read. All gates PASS with fresh observation — late-session penalty exempted. VERIFIED (70 → 70).

How AI Reads This Recipe

When this recipe is triggered, the AI processes claims through a systematic 4-step pipeline. The AI should:

INITIALIZE VERIFICATION CONTEXT. Check if CWK-010 late-session mode is active (token usage >70%). If active, set a -10 confidence penalty for claims lacking fresh tool observations. Create an empty results list.
PROCESS EACH CLAIM THROUGH 4 GATES. Gate 1 (File Evidence): Can the claim point to a file, line, or commit? Verify via tool call. FAIL = -20. Gate 2 (Read vs Reconstruct): Is the claim from a current-session observation or memory? If reconstructed, force re-read. FAIL after re-read = -15. Gate 3 (LL Conflict): Does any Lesson Learned contradict the claim? Grep LL file. Conflict = -15. Gate 4 (Untested Assumption): Does the claim assert untested Cowork behavior? Untested = -10. Apply late-session penalty if active and not all gates passed fresh. Clamp 0–100. Classify: all PASS = VERIFIED, ≥60 = FLAGGED, <60 = UNVERIFIED.
GENERATE VERIFICATION SUMMARY. Aggregate: total claims, verified/flagged/unverified counts, late-session mode status, calling recipe, per-claim detail.
REPORT. Output summary. List any FLAGGED or UNVERIFIED claims with failed gates, penalties, and confidence trajectory. Return VERIFICATION_SUMMARY to calling recipe. R-1 blocks UNVERIFIED, R-8 re-delegates FLAGGED, CWK-021 excludes UNVERIFIED.
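The Gate 2 behavior in the steps above (force a re-read when a claim is reconstructed, and apply the penalty only if the re-read fails) can be sketched as follows. The function name, the `reread` callback, and the returned dictionary are hypothetical; in a real session the re-read would be a tool call against the source file.

```python
def gate_read_vs_reconstruct(observed_this_session, reread):
    """Gate 2 sketch.

    observed_this_session: True if the claim rests on a current-session tool observation.
    reread: zero-argument callable that re-reads the source and returns True
            when the claim is confirmed by the fresh read.
    """
    if observed_this_session:
        # Already grounded in fresh tool output; no re-read needed.
        return {"passed": True, "reread": False, "penalty": 0}
    confirmed = reread()  # forced re-read for reconstructed claims
    return {
        "passed": confirmed,
        "reread": True,
        "penalty": 0 if confirmed else 15,  # -15 only if the re-read fails
    }
```

A claim that starts out reconstructed but is confirmed on re-read therefore ends with a fresh observation and no Gate 2 penalty, as in the late-session example earlier on this page.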

This recipe implements R-6 (QA Phase B) as a reusable sub-routine extracted from CWK-021 Step 3b. It integrates with CWK-010 late-session confidence decay (R-13) and is called by the R-1 Fernand gate, R-8 sub-agent verification, and CWK-021 analysis. Error handling covers empty claim lists and malformed claim objects.

When to Use This Recipe

Use this recipe when you:

Need to systematically verify claims before including them in a handoff, report, or analysis output.
Are building a recipe that produces assertions about project state and want a reusable verification gate.
Want to audit confidence in late-session claims where memory reconstruction risk is elevated.
Need to check whether assertions conflict with established Lessons Learned before acting on them.

Do not use this recipe when:

You are verifying trivial claims that are directly observable (e.g., “this file exists” when you just created it). The 4-gate pipeline adds overhead not justified for claims with obvious, immediate evidence. Use CWK-024 for claims involving inference, reconstruction, or cross-reference — not for simple tool output confirmation.

Recipe FAQ

Q. What are the four gates?

Gate 1 (File Evidence): Can you point to a file, line, or commit supporting the claim? Gate 2 (Read vs Reconstruct): Is the claim based on a current-session tool observation or memory reconstruction? Gate 3 (LL Conflict): Does any Lesson Learned contradict the claim? Gate 4 (Untested Assumption): Does the claim assert untested Cowork or tool behavior?

Q. What do the confidence penalties mean?

Gate 1 failure: -20 (no evidence is a serious gap). Gate 2 failure: -15 (reconstruction error is likely). Gate 3 failure: -15 (direct contradiction with institutional knowledge). Gate 4 failure: -10 (untested but may still be correct). Late-session decay: -10 (memory less reliable at high token usage). Penalties stack.

Q. What is late-session decay?

When CWK-010 detects token usage above 70%, a -10 confidence penalty applies to claims that do not have all gates passing with fresh tool observations. Claims fully grounded in fresh tool output are exempt.
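The exemption rule can be expressed as a small function. This is a sketch under stated assumptions: token usage is passed in as a fraction between 0 and 1, and the names `late_session_penalty`, `LATE_SESSION_THRESHOLD`, and `LATE_SESSION_PENALTY` are hypothetical rather than taken from CWK-010.

```python
LATE_SESSION_THRESHOLD = 0.70  # decay applies above 70% token usage
LATE_SESSION_PENALTY = 10


def late_session_penalty(token_usage, all_gates_passed_fresh):
    """Return the confidence penalty (0 or 10) for one claim.

    token_usage: fraction of the token budget used so far (assumed 0.0 to 1.0).
    all_gates_passed_fresh: True when every gate passed on fresh tool output.
    """
    late_session = token_usage > LATE_SESSION_THRESHOLD
    if late_session and not all_gates_passed_fresh:
        return LATE_SESSION_PENALTY
    return 0  # early session, or the claim is fully grounded and exempt
```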

Q. How do calling recipes use the results?

Each calling recipe has its own policy. R-1 (Fernand gate): FLAGGED triggers handoff revision, UNVERIFIED blocks the handoff. R-8 (sub-agent): FLAGGED re-delegates the task, UNVERIFIED flags to the user. CWK-021 (Issue Analyzer): FLAGGED gets noted in findings, UNVERIFIED gets excluded entirely.
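These per-caller policies amount to a lookup table. The sketch below is illustrative only: the dictionary keys, the action strings, and the fallbacks for VERIFIED claims and unknown callers are assumptions, since the recipe describes the policies in prose and leaves the handling to each calling recipe.

```python
# Hypothetical policy table built from the answer above.
CALLER_POLICY = {
    "R-1": {"FLAGGED": "revise handoff", "UNVERIFIED": "block handoff"},
    "R-8": {"FLAGGED": "re-delegate task", "UNVERIFIED": "flag to user"},
    "CWK-021": {"FLAGGED": "note in findings", "UNVERIFIED": "exclude from findings"},
}


def action_for(called_by, status):
    """Look up what a calling recipe does with a claim of the given status."""
    if status == "VERIFIED":
        return "accept"  # assumption: verified claims need no special handling
    # Assumption: unknown callers (including standalone) just receive the report.
    return CALLER_POLICY.get(called_by, {}).get(status, "report to caller")
```

Keeping the policy in data rather than in branching logic makes it straightforward to add another calling recipe without touching the verification sub-routine itself.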

Q. Can I call this standalone?

Yes. Omit the called_by parameter and provide claims directly. The sub-routine runs identically — the only difference is the audit trail shows “standalone” instead of a calling recipe ID.

Q. What happens with an empty claims list?

The sub-routine returns an empty VERIFICATION_SUMMARY with zero counts. No error is raised — an empty input is a valid edge case for recipes that may conditionally have zero claims to verify.

Version History

Changes to this recipe over time. Most recent first.

v1.00a 2026-02-28

Initial release. Reusable 4-gate verification sub-routine extracted from CWK-021 Step 3b: File Evidence (-20), Read vs Reconstruct (-15), LL Conflict (-15), Untested Assumption (-10). CWK-010 late-session decay integration (R-13, -10 penalty at >70% token usage). Three claim statuses: VERIFIED (all pass), FLAGGED (≥60), UNVERIFIED (<60). Confidence clamped 0–100. Called by R-1, R-8, CWK-021. Error handling for empty and malformed inputs.

Get this recipe with CRAFT for Claude Cowork

Cowork recipes ship bundled with CRAFT for Claude Cowork — there’s no separate download. Clone the framework once, and your AI runs every recipe automatically when invoked.

git clone https://github.com/CRAFTFramework/craft-framework

Pull anytime to stay on the latest version — free to clone, no login or email required.

Then start your session

Once CRAFT is in your project folder, open a new Cowork session and ask Claude to initialize. For example:

You

Please initialize my CRAFT session.

Claude

CRAFT session ready. Your project is loaded, your persona is active, and your recipes are available. What would you like to work on?


Not familiar with Git? Download as a ZIP

No command line needed. Just download, move, and unzip:

Open the CRAFT framework repo on GitHub.
Click the green Code button, then choose Download ZIP.
Move the downloaded ZIP into your Claude Cowork project folder.
Unzip it: double-click on Mac, or right-click → Extract All on Windows.

RCP-CWK-024 Verification Gate Sub-Routine
