CAPAS gives review teams a deterministic gate before scientific claims enter reports, governed datasets, or fine-tuning preparation. It checks whether supplied evidence licenses a claim for controlled reuse, and returns a replayable packet — the verdict, the deterministic reason, the required evidence contract, the no-LLM marker, and fine-tune readiness — re-derivable from the same input.
Synthetic benchmark: full verdict-space coverage on an adversarial grid, not a production drift rate. No production pilot has been run yet. Full benchmark methodology →
This is the literal output of the engine on a claim whose reported number re-derives correctly but whose supplied accounting evidence is internally inconsistent. Reproducible: capas_sdk.gate("financial_metric_claim", evidence).
{
  "schema_version": "capas-claim-payload-v3",
  "verdict": "REJECT",
  "reason": "reported_value matches reference within tolerance and period matches; OVERRIDDEN by a domain invariant violation: balance identity VIOLATED — assets 1000 != liabilities 600 + equity 300 (residual 100). The books do not close.",
  "required_fields": ["reported_value", "reference_value", "tolerance", "metric_period_match"],
  "invariant_audit": "FLAG",
  "fine_tune_ready": false,
  "non_claim": "This decision is rule-based over supplied evidence fields, not an LLM judgment."
}
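The residual quoted in the reason string is plain arithmetic: 1000 - (600 + 300) = 100. A minimal sketch of such a balance-identity check follows. The helper name `check_balance_identity`, the `InvariantResult` type, and the optional `tolerance` parameter are assumptions made for illustration, not the engine's actual function or message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvariantResult:
    holds: bool
    residual: float
    message: str


def check_balance_identity(assets: float, liabilities: float, equity: float,
                           tolerance: float = 0.0) -> InvariantResult:
    """Hypothetical illustration: assets must equal liabilities + equity."""
    residual = assets - (liabilities + equity)
    holds = abs(residual) <= tolerance
    if holds:
        message = "balance identity holds"
    else:
        message = (f"balance identity VIOLATED: assets {assets:g} != "
                   f"liabilities {liabilities:g} + equity {equity:g} "
                   f"(residual {residual:g})")
    return InvariantResult(holds, residual, message)


# Values taken from the sample packet above.
result = check_balance_identity(assets=1000, liabilities=600, equity=300)
print(result.message)
```

Because the check is a pure function of the supplied fields, re-running it on the same evidence reproduces the same residual and the same message.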
An LLM may be used upstream — to extract the payload from a paper, or draft a rewrite suggestion. It is never used to determine admissibility. The verdict (ACCEPT / REWRITE / REJECT / HOLD) is produced only by versioned deterministic rules over the supplied evidence fields. The same payload always yields the same verdict, so any decision can be independently re-run and audited. The non_claim field is the machine-readable marker of this.
Text-ingested claims additionally carry source evidence spans; the hosted API wraps any packet in a signed, content-addressed certificate (capas_certstore) for tamper-evidence.
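One way to picture a signed, content-addressed wrapper is sketched below. The content address is a SHA-256 digest of a canonical JSON serialization of the packet, and the signature is an HMAC over that digest. The function names, the certificate layout, and the choice of HMAC are all assumptions for illustration; the source does not describe how capas_certstore actually addresses or signs packets.

```python
import hashlib
import hmac
import json


def canonical_bytes(packet: dict) -> bytes:
    # Sorted keys and fixed separators make the serialization deterministic.
    return json.dumps(packet, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def make_certificate(packet: dict, key: bytes) -> dict:
    # Hypothetical layout: digest as content address, HMAC as signature.
    digest = hashlib.sha256(canonical_bytes(packet)).hexdigest()
    signature = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {"content_address": digest, "signature": signature, "packet": packet}


def verify_certificate(cert: dict, key: bytes) -> bool:
    # Recompute both values from the packet and compare in constant time.
    digest = hashlib.sha256(canonical_bytes(cert["packet"])).hexdigest()
    expected = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(digest, cert["content_address"])
            and hmac.compare_digest(expected, cert["signature"]))


cert = make_certificate({"verdict": "REJECT", "fine_tune_ready": False}, b"demo-key")
print(verify_certificate(cert, b"demo-key"))
```

Any change to the packet after signing changes the recomputed digest, so verification fails. That property is what tamper-evidence means in this sketch.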
A claim can be ACCEPTed for a report yet still not be ready for training data. These 14 deterministic checks (verbatim from the engine) gate fine_tune_ready after an ACCEPT — they never change the verdict, only whether the claim may enter fine-tuning preparation.
Verbatim from capas.evaluate_fine_tune_readiness. Any unmet criterion appears as a named blocker on the packet.
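The separation between verdict and readiness can be sketched as follows. The two criteria shown (`license_cleared`, `reviewer_hash_present`) and the `verdict_not_accept` blocker are hypothetical placeholders, not the engine's 14 checks, and `evaluate_readiness` is an illustrative stand-in rather than `capas.evaluate_fine_tune_readiness`.

```python
from typing import Callable

# Hypothetical criteria for illustration only; the engine's real checks differ.
CRITERIA: dict[str, Callable[[dict], bool]] = {
    "license_cleared": lambda ev: ev.get("license") is not None,
    "reviewer_hash_present": lambda ev: bool(ev.get("reviewer_hash")),
}


def evaluate_readiness(verdict: str, evidence: dict) -> dict:
    """Return readiness and named blockers; the verdict passes through unchanged."""
    if verdict != "ACCEPT":
        return {"verdict": verdict, "fine_tune_ready": False,
                "blockers": ["verdict_not_accept"]}
    # Each unmet criterion is reported by name.
    blockers = [name for name, check in CRITERIA.items() if not check(evidence)]
    return {"verdict": verdict, "fine_tune_ready": not blockers,
            "blockers": blockers}


packet = evaluate_readiness("ACCEPT", {"license": "CC-BY-4.0"})
print(packet)
```

In this sketch the claim stays ACCEPT for reporting while `fine_tune_ready` is false, with the missing reviewer hash listed as the named blocker.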
CAPAS targets the point where a cautious source sentence becomes an over-scoped reusable claim.
The gate runs before records enter fine-tuning, publication workflows, governed datasets, or downstream reports.
Each output carries decision reason, evidence spans, blockers, and a non-LLM marker.
The gate takes a claim and an evidence package. It checks them against a claim-type evidence contract. The contract defines what evidence is required and what scope it licenses. The gate returns a verdict — deterministically.
Input: claim text plus evidence fields (statistical, artifact, source URL, license, reviewer hash…)
Gate: 90+ deterministic rule functions check the claim against the evidence contract. No LLM in the decision path.
Output: verdict plus blockers, reviewer action, audit hash, and evidence spans
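The input, gate, and output flow can be illustrated with a toy gate over the four required fields shown in the sample packet. The rule order and the verdict mapping here (missing field gives HOLD, failed comparison gives REJECT) are assumptions for illustration. They are not the engine's actual rules, and REWRITE is omitted for brevity.

```python
REQUIRED_FIELDS = ("reported_value", "reference_value",
                   "tolerance", "metric_period_match")


def toy_gate(evidence: dict) -> dict:
    """Illustrative rule order only; not the engine's actual rules."""
    # Contract check: every required field must be supplied.
    missing = [f for f in REQUIRED_FIELDS if f not in evidence]
    if missing:
        return {"verdict": "HOLD",
                "reason": "missing required fields: " + ", ".join(missing)}
    # Rule checks over the supplied fields, with no model call anywhere.
    gap = abs(evidence["reported_value"] - evidence["reference_value"])
    if gap > evidence["tolerance"]:
        return {"verdict": "REJECT",
                "reason": f"reported_value differs from reference by {gap:g}"}
    if not evidence["metric_period_match"]:
        return {"verdict": "REJECT", "reason": "metric period does not match"}
    return {"verdict": "ACCEPT",
            "reason": "reported_value matches reference within tolerance "
                      "and period matches"}


evidence = {"reported_value": 10.0, "reference_value": 10.4,
            "tolerance": 0.5, "metric_period_match": True}
print(toy_gate(evidence))
```

The function reads only its argument and has no randomness or external state, so calling it twice on the same evidence returns identical results.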
Emerging AI-governance frameworks ask teams to document the quality and provenance of the data behind a claim. CAPAS produces exactly that artifact, per claim, deterministically — with no model in the decision path. It is an input to compliance and review work, not a certification of it.
Article 10 of the EU AI Act requires high-risk AI systems to use training, validation, and testing data that meet quality and governance criteria. CAPAS records, per claim, whether supplied evidence licenses reuse, which gives a traceable check at the moment data enters a dataset.
The NIST AI Risk Management Framework emphasizes documentation and traceability across the data lifecycle. Every CAPAS decision emits a reason, evidence spans, provenance blockers, and an audit hash that can be reviewed after the fact.
Unlike LLM-as-judge evaluators, CAPAS has no language model in the decision path. The same payload always yields the same verdict, so a decision can be independently re-run and audited — which a stochastic model judgment cannot guarantee.
Regulatory references are provided for context only. CAPAS does not certify compliance with the EU AI Act, the NIST AI RMF, or any other framework; it produces deterministic, auditable decision artifacts that support governance and review processes.
A controlled operating test: can the organization identify which claims are licensed, which must be rewritten, which must be rejected, and which require more evidence before reuse?
Select one vertical corpus: AI governance, pharma evidence review, model risk, journal reproducibility, or materials R&D.
Convert 500 structured records into CAPAS payloads through the guided constructor, CLI, or upstream extraction adapter.
Run deterministic batch gating and sample 100 decisions for expert adjudication.
Report decision mix, reviewer agreement, false reject rate, provenance blockers, and review capacity redirected.
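The pilot metrics can be computed from the batch verdicts and the adjudicated sample, as in the sketch below. The definitions are assumptions: reviewer agreement is the share of sampled decisions where the expert verdict equals the gate verdict, and false reject rate is the share of sampled gate REJECTs that the expert did not reject. A real pilot may define these differently. The function name and the toy data are hypothetical.

```python
from collections import Counter


def pilot_report(gate_verdicts: list[str],
                 adjudicated: list[tuple[str, str]]) -> dict:
    """gate_verdicts: all batch verdicts; adjudicated: (gate, expert) pairs."""
    total = len(gate_verdicts)
    mix = {v: n / total for v, n in Counter(gate_verdicts).items()}
    agree = sum(1 for gate, expert in adjudicated if gate == expert)
    rejects = [(g, e) for g, e in adjudicated if g == "REJECT"]
    false_rejects = sum(1 for _, e in rejects if e != "REJECT")
    return {
        "decision_mix": mix,
        "reviewer_agreement": agree / len(adjudicated) if adjudicated else None,
        "false_reject_rate": false_rejects / len(rejects) if rejects else None,
    }


# Toy data, not pilot results.
gate = ["ACCEPT", "REJECT", "REJECT", "HOLD"]
sample = [("ACCEPT", "ACCEPT"), ("REJECT", "REJECT"),
          ("REJECT", "REWRITE"), ("HOLD", "HOLD")]
report = pilot_report(gate, sample)
print(report)
```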
ACCEPT: Licensed for controlled reuse. Fine-tune readiness obligations may still need to be cleared.
REWRITE: Evidence supports a narrower claim. Returns the licensed boundary.
REJECT: Returns which evidence is missing or failing, not a silent no.
HOLD: Returns the steps: supply the missing field, verify provenance, re-gate.
CAPAS gates supplied evidence fields; it does not infer hidden evidence, provide legal advice, certify broad scientific truth, or replace external review.
The 1,238 decisions and 78% gated share are reproducible from the engine’s own benchmark (benchmarks/family_decision_mix.py) over a synthetic decision-space grid — they demonstrate full verdict-space coverage, NOT a real-world drift rate. No production pilot has been run yet; real rates require an independently adjudicated corpus.
Review-capacity estimates are planning assumptions and must be calibrated against the customer baseline.
Do not share payload URLs or exports containing confidential source text, reviewer IDs, witness IDs, licensed materials, or proprietary provenance paths without authorization. Data handling & security →