OSF DOI: 10.17605/OSF.IO/DXGK5 SSRN Paper: 6482082 HuggingFace: mtcp-boundary-500
EU AI Act — August 2026
MTCP Pricing

Evaluation that stands up
in procurement

Independent AI release assurance for enterprise procurement and regulatory compliance. Get a deployment verdict before you commit to a model.

EU AI Act compliance deadline: August 2026 — enterprise buyers are purchasing now
DOI: 10.17605/OSF.IO/DXGK5 SSRN Paper: 6482082 HuggingFace: mtcp-boundary-500 32 models evaluated · 4 temperature settings · 181,448 probe interactions · Independent
Free
£0
always free

Full access to the public evidence layer. See how 32 frontier models score before you commit to anything.

View Evidence →
  • Full public evidence layer — 32 models
  • BIS, CPD, and TSI scores for all models
  • Temperature breakdown (T=0.0–0.8)
  • Methodology documentation
  • Grade scale and metric definitions
  • Private model evaluation
  • Formal evaluation report
  • Evaluation certificate
Pro
£499
per month

Private evaluations with formal reports and Release Decision Packs. For teams procuring models or running recurring governance checks.

Start Evaluation →
  • 2 private model evaluations/month
  • Full 200-probe behavioral durability evaluation
  • All 4 temperature settings
  • Formal PDF evaluation report
  • Release Decision Pack (deployment verdict + signals)
  • Evaluation certificate (procurement-ready)
  • Evidence comparison against all 32 models
  • Results private by default
  • API access
EU AI Act — August 2026 Compliance Deadline
The EU AI Act's operator control obligations (Article 12 logging requirements, runtime enforcement, and accountability frameworks) take full effect in August 2026. Enterprise buyers in regulated sectors are establishing AI governance documentation now, before the deadline. MTCP evaluation certificates and reports are designed to serve as formal assurance evidence within AI governance submissions — including NIST AI RMF, ISO/IEC 42001, and EU AI Act Article 12 documentation. Read the Buyer Brief →

Feature comparison

Everything across all tiers, in detail.

Feature Free Pro Enterprise
Evidence access
Public leaderboard (32+ models)
BIS, CPD, TSI scores
Temperature breakdown (T=0.0–0.8)
Model cards (per-model detail)
Private evaluations
Private model evaluation 2/month
200-probe behavioral durability evaluation
Control probe run (20 probes)
Confidential endpoint submission
Reports & outputs
Formal PDF evaluation report
Evaluation certificate (procurement)
Audit certificate (EU AI Act / NIST)
Evidence comparison (vs 32 models)
API & integrations
REST API access
Embeddable evaluation badge
Support & legal
NDA available
Dedicated account contact
Turnaround SLA 5 business days
Custom probe suite

Common questions

Can't find what you need? Email us directly.

What exactly does an MTCP evaluation test?
MTCP runs a 200-probe multi-turn correction sequence against your model's API. Each probe tests whether the model maintains an explicitly corrected constraint across 3 turns. We measure Boundary Integrity Score (BIS), Control Probe Degradation (CPD), and Temporal Stability Index (TSI) across four temperature settings. Full methodology →
Do you need access to model weights or internals?
No. MTCP is fully black-box. We only require API access — the same endpoint you'd give any user. No model weights, training data, or vendor cooperation is needed. Your API key is used only during the evaluation run and is never stored.
How is the MTCP certificate useful for procurement?
The certificate records the model identifier, evaluation date, BIS score, grade, and comparative release assurance standing. It carries a DOI-registered methodology reference and can be attached directly to procurement documentation, board risk assessments, or EU AI Act Article 12 submissions.
Is my evaluation kept confidential?
Yes. Private evaluations are never published or included in the public evidence layer without explicit written permission. Enterprise customers can submit under NDA. Results are stored in an EU-region database (Neon PostgreSQL, Frankfurt).
What's the difference between Pro and Enterprise?
Pro gives you 2 evaluations/month with formal reports and procurement-ready certificates. Enterprise adds unlimited evaluations, control probe runs (the 20-probe concealed set), REST API access, 48hr SLA, and audit certificates formatted for EU AI Act and NIST RMF submissions.
How does white-label work?
We licence the MTCP methodology and reporting pipeline for you to run under your own brand — for internal use or resale to your clients. Includes custom probe suite development, co-branding options, and advisory support. Contact us to discuss.
Can I trial before committing to Pro?
Yes. The free public evidence layer shows exactly how 32 evaluated models perform. If you want to see how your specific model compares before subscribing, you can submit a single evaluation request — we'll quote you a one-off rate before any subscription is needed.
What models have already been evaluated?
32 models including frontier models from OpenAI, Anthropic, Google, Meta, Mistral, Cohere, NVIDIA, Groq, Cerebras, DeepSeek, and AWS Bedrock. We continuously add new models. Enterprise customers can request priority evaluation. Full list at /evidence/public-findings.

Ready to evaluate?

Submit your model for a private MTCP evaluation. Receive a formal report, Release Decision Pack and deployment verdict — typically within 5 business days.

Questions? research@mtcp.live  ·  NDA available  ·  EU-hosted data  ·  Methodology DOI: 10.17605/OSF.IO/DXGK5