◆ MTCP Pricing

Evaluation that stands up
in procurement

Independent AI release assurance for enterprise procurement and regulatory compliance. Get a deployment verdict before you commit to a model.

EU AI Act compliance deadline: August 2026 — enterprise buyers are purchasing now

DOI: 10.17605/OSF.IO/DXGK5 SSRN Paper: 6482082 HuggingFace: mtcp-boundary-500 32 models evaluated · 4 temperature settings · 183,924 probe interactions · Independent

Free

£0

always free

Full access to the public evidence layer. See how 32 frontier models score before you commit to anything.

View Evidence →

What's included

✓ Full public evidence layer — 32 models
✓ BIS, CPD, and TSI scores for all models
✓ Temperature breakdown (T=0.0–0.8)
✓ Methodology documentation
✓ Grade scale and metric definitions
— Private model evaluation
— Formal evaluation report
— Evaluation certificate

Pro

^£499

per month

Private evaluations with formal reports and Release Decision Packs. For teams procuring models or running recurring governance checks.

Start Evaluation →

Everything in Free, plus

✓ 2 private model evaluations/month
✓ Full 200-probe behavioral durability evaluation
✓ All 4 temperature settings
✓ Formal PDF evaluation report
✓ Release Decision Pack (deployment verdict + signals)
✓ Evaluation certificate (procurement-ready)
✓ Evidence comparison against all 32 models
✓ Results private by default
— API access

Most chosen for compliance

Enterprise

^£1,999

per month

Unlimited evaluations, API access, and audit-ready outputs for board-level AI governance and EU AI Act compliance. NDA available.

Contact for Access →

Everything in Pro, plus

✓ Release Decision Pack per evaluation
✓ Full evidence audit trail (machine-readable JSON)
✓ Tamper-evident SHA-256 integrity hash
✓ Runtime risk signals and deployment verdict
✓ Unlimited private evaluations
✓ Full API access (results + runs)
✓ 200 + 20 control probe runs
✓ Audit certificate (EU AI Act, NIST RMF)
✓ Confidential endpoint submission
✓ NDA available on request
✓ Priority turnaround — 48hr SLA
✓ Dedicated account contact

⚡

EU AI Act — August 2026 Compliance Deadline

The EU AI Act's operator control obligations (Article 12 logging requirements, runtime enforcement, and accountability frameworks) take full effect in August 2026. Enterprise buyers in regulated sectors are establishing AI governance documentation now, before the deadline. MTCP evaluation certificates and reports are designed to serve as formal assurance evidence within AI governance submissions — including NIST AI RMF, ISO/IEC 42001, and EU AI Act Article 12 documentation. Read the Buyer Brief →

Feature comparison

Everything across all tiers, in detail.

Feature	Free	Pro	Enterprise
Evidence access
Public leaderboard (32+ models)	✓	✓	✓
BIS, CPD, TSI scores	✓	✓	✓
Temperature breakdown (T=0.0–0.8)	✓	✓	✓
Model cards (per-model detail)	✓	✓	✓
Private evaluations
Private model evaluation	—	2/month	Unlimited
200-probe behavioral durability evaluation	—	✓	✓
Control probe run (20 probes)	—	—	✓
Confidential endpoint submission	—	—	✓
Reports & outputs
Formal PDF evaluation report	—	✓	✓
Evaluation certificate (procurement)	—	✓	✓
Audit certificate (EU AI Act / NIST)	—	—	✓
Evidence comparison (vs 32 models)	—	✓	✓
API & integrations
REST API access	—	—	✓
Embeddable evaluation badge	—	✓	✓
Support & legal
NDA available	—	—	✓
Dedicated account contact	—	—	✓
Turnaround SLA	—	5 business days	48 hours
Custom probe suite	—	—	—

Common questions

Can't find what you need? Email us directly.

What exactly does an MTCP evaluation test?

MTCP runs a 200-probe multi-turn correction sequence against your model's API. Each probe tests whether the model maintains an explicitly corrected constraint across 3 turns. We measure Boundary Integrity Score (BIS), Control Probe Degradation (CPD), and Temporal Stability Index (TSI) across four temperature settings. Full methodology →

Do you need access to model weights or internals?

No. MTCP is fully black-box. We only require API access — the same endpoint you'd give any user. No model weights, training data, or vendor cooperation is needed. Your API key is used only during the evaluation run and is never stored.

How is the MTCP certificate useful for procurement?

The certificate records the model identifier, evaluation date, BIS score, grade, and comparative release assurance standing. It carries a DOI-registered methodology reference and can be attached directly to procurement documentation, board risk assessments, or EU AI Act Article 12 submissions.

Is my evaluation kept confidential?

Yes. Private evaluations are never published or included in the public evidence layer without explicit written permission. Enterprise customers can submit under NDA. Results are stored in an EU-region database (Neon PostgreSQL, Frankfurt).

What's the difference between Pro and Enterprise?

Pro gives you 2 evaluations/month with formal reports and procurement-ready certificates. Enterprise adds unlimited evaluations, control probe runs (the 20-probe concealed set), REST API access, 48hr SLA, and audit certificates formatted for EU AI Act and NIST RMF submissions.

How does white-label work?

We licence the MTCP methodology and reporting pipeline for you to run under your own brand — for internal use or resale to your clients. Includes custom probe suite development, co-branding options, and advisory support. Contact us to discuss.

Can I trial before committing to Pro?

Yes. The free public evidence layer shows exactly how 32 evaluated models perform. If you want to see how your specific model compares before subscribing, you can submit a single evaluation request — we'll quote you a one-off rate before any subscription is needed.

What models have already been evaluated?

32 models including frontier models from OpenAI, Anthropic, Google, Meta, Mistral, Cohere, NVIDIA, Groq, Cerebras, DeepSeek, and AWS Bedrock. We continuously add new models. Enterprise customers can request priority evaluation. Full list at /evidence/public-findings.

Ready to evaluate?

Submit your model for a private MTCP evaluation. Receive a formal report, Release Decision Pack and deployment verdict — typically within 5 business days.

Request Evaluation Enterprise Enquiry View Evidence

Questions? research@mtcp.live · NDA available · EU-hosted data · Methodology DOI: 10.17605/OSF.IO/DXGK5

Evaluation that stands upin procurement

Feature comparison

Common questions

Ready to evaluate?

Evaluation that stands up
in procurement