AI RELEASE ASSURANCE · EU AI ACT READY · EMPIRICAL

Models don't always stay aligned after interaction.

MTCP tells you whether yours will.

Multi-model, multi-language constraint persistence evaluation. 32 models across 12 languages. 183,924 probe interactions.

Test Your Model (5 min) · View Evidence →

✓ Black-box: API access only, no weights or vendor cooperation needed
✓ Empirical: 183,924 real probe interactions, not simulated
✓ Audit-ready: SHA-256 signed Release Decision Pack

Aligned to: EU AI Act · NIST AI RMF · ISO/IEC 42001 · FCA · MAS FEAT · NDMO · NCA

MTCP identifies which models can be governed at runtime and which will bleed through no matter how much control-plane engineering you throw at them.

For AI Engineers

Test if your model maintains safety constraints across temperature settings and conversation turns. Get a deploy/don't-deploy answer in 5 minutes.

For Procurement Teams

Compare AI providers on constraint durability. See which models maintain alignment under real-world variation.

For Compliance Officers

Audit trail proving AI models maintain safety constraints across operating conditions. EU AI Act Article 12 ready.

How MTCP Works

1. Submit an API endpoint. No weights or vendor access are needed.
2. MTCP runs the full behavioural durability evaluation.
3. Receive a Release Decision Pack with a verdict: APPROVED / APPROVED WITH RESTRICTIONS / REJECTED.
4. Download the tamper-evident evidence trail (SHA-256).
5. Gate deployment or satisfy a regulatory audit.
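The evidence-trail step above relies on SHA-256 for tamper evidence. As a minimal sketch of what checking a downloaded file against a published digest could look like, the Python below uses only the standard library. The payload, the function names, and the idea that a digest is published alongside the pack are assumptions for illustration; the actual Release Decision Pack format is not described on this page.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def evidence_matches(data: bytes, published_digest: str) -> bool:
    """Check evidence bytes against a published digest (hypothetical workflow)."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sha256_hex(data), published_digest.strip().lower())


# Hypothetical evidence payload; a real pack would be read from disk in binary mode.
evidence = b'{"verdict": "APPROVED"}'
digest = sha256_hex(evidence)

print(evidence_matches(evidence, digest))         # True
print(evidence_matches(evidence + b" ", digest))  # False
```

A matching digest only shows that the file is unchanged relative to that digest. It says nothing about who produced the file, so the digest itself has to come from a trusted channel.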

Read Methodology → · View Evidence →

Models evaluated: 32

Total evaluations: 183,924

Governance layers: 15

MTCP Governance Stack — 15 Layers

BIS — Single-model constraint persistence

CSAS — Cross-system coordination admissibility

JRS — Jurisdiction resolution at boundaries

TDS — Temporal drift detection over time

CCS — Constraint conflict resolution

RES — Remediation effectiveness measurement

ACPS — Adversarial persistence resistance

BEC — Blockchain evidence chain integrity

COS — Constraint object specification

LRP — Legitimacy resolution protocol

GRC — Governance reference conditions

DRA — Deployment readiness attestation

Gate — Admissibility enforcement (PERMIT/DENY)

Quantum — Post-quantum cryptographic validity

PRP — Runtime behavioural monitoring

Measure

183,924 probe interactions across 32 frontier models in 12 languages at 4 temperature settings. The largest independent constraint persistence dataset published.

Boundary Integrity Score

Verify

Concealed control probes detect training data exposure. SHA-256 signed evidence packs. Machine-readable audit trail per run.

Control Probe Degradation

Gate

Release Decision Pack delivers APPROVED / APPROVED WITH RESTRICTIONS / REJECTED verdict with runtime guidance and regulatory alignment metadata.

Release Decision Pack

Beyond Single-Model Evaluation

MTCP evaluates constraint persistence at three levels. Each level produces empirical evidence, a grading scale, and audit-ready documentation.

Single Model
Does the model hold constraints across multi-turn interaction? Measured by BIS.
Cross-System
Do constraints survive handoff between coordinated AI systems? Measured by CSAS.
Jurisdiction
Was the authority governing a coordination boundary explicitly established? Measured by JRS.
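To make the Single Model question concrete, here is a rough sketch of one way to quantify whether a constraint holds across conversation turns. It is not the BIS formula, which this page does not define. The persistence rate, the `holds` checker, and the toy conversation are all assumptions chosen for illustration.

```python
from typing import Callable, Optional, Sequence


def persistence_rate(responses: Sequence[str], holds: Callable[[str], bool]) -> float:
    """Fraction of turns whose response still satisfies the constraint."""
    if not responses:
        raise ValueError("need at least one turn")
    return sum(1 for r in responses if holds(r)) / len(responses)


def first_violation(responses: Sequence[str], holds: Callable[[str], bool]) -> Optional[int]:
    """1-based index of the first failing turn, or None if every turn holds."""
    for turn, reply in enumerate(responses, start=1):
        if not holds(reply):
            return turn
    return None


# Toy constraint (hypothetical): replies must never contain the word "password".
def no_password(reply: str) -> bool:
    return "password" not in reply.lower()


# Toy conversation, invented for this example only.
conversation = [
    "I can't share credentials.",
    "Still can't help with that.",
    "Fine, the password is hunter2.",
    "Sorry, I should not have said that.",
]

print(persistence_rate(conversation, no_password))  # 0.75
print(first_violation(conversation, no_password))   # 3
```

Reporting the first failing turn alongside the rate separates a model that slips once late in a conversation from one that fails early and often.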

Multi-Language Evaluation

The first multi-language, multi-script constraint persistence evaluation. 12 languages across 4 script families.

Latin script
100% constraint persistence across all evaluated models.
CJK
Highest failure rates observed. Script distance from English predicts degradation.
Arabic-script
Intermediate performance. Critical for Gulf sovereign AI deployment.
Tamil
Script distance from English is the strongest predictor of constraint failure rate.

Non-Latin deployment requires language-specific evaluation. Standard English-only benchmarks cannot predict multilingual constraint reliability.

Who Uses MTCP

Procurement Teams
Compare 32 evaluated models before vendor selection. Attach MTCP certificate to procurement documentation.
AI Risk Officers
Empirical evidence for board-level risk sign-off. Quantified BIS, CPD, and TSI scores per model.
Compliance Leads
EU AI Act Article 12 ready. NIST AI RMF aligned. Audit-ready evidence packs downloadable immediately.
Deployment Gatekeepers
Set minimum BIS threshold. Block release on REJECTED verdict. Retest after model changes.
Sovereign AI Programmes
Independent evaluation of model stacks across Arabic and multilingual contexts. NDMO and NCA alignment evidence. Board-ready compliance documentation for Gulf sovereign AI infrastructure.
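The Deployment Gatekeepers workflow above (set a minimum BIS threshold, block release on a REJECTED verdict) could be wired into a release pipeline roughly as follows. This is a sketch under stated assumptions: the `Verdict` enum, the `release_allowed` function, and a BIS scale running from 0 to 1 are hypothetical, and the example scores are made up.

```python
from enum import Enum


class Verdict(Enum):
    APPROVED = "APPROVED"
    APPROVED_WITH_RESTRICTIONS = "APPROVED WITH RESTRICTIONS"
    REJECTED = "REJECTED"


def release_allowed(verdict: Verdict, bis: float, min_bis: float) -> bool:
    """Permit release only if the verdict is not REJECTED and BIS meets the floor."""
    if verdict is Verdict.REJECTED:
        return False
    # Assumption: higher BIS is better and the threshold is inclusive.
    return bis >= min_bis


print(release_allowed(Verdict.APPROVED, bis=0.92, min_bis=0.85))  # True
print(release_allowed(Verdict.APPROVED, bis=0.80, min_bis=0.85))  # False
print(release_allowed(Verdict.REJECTED, bis=0.99, min_bis=0.85))  # False
```

Checking the verdict before the score means a REJECTED result blocks release even when the numeric score clears the threshold.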

Public Evidence

Full results with temperature breakdowns and metric definitions.

The MTCP evidence layer provides comparative release assurance data across 32 independently evaluated frontier models. 183,924 structured probe interactions at four temperature settings.

View Evidence →

Ready to evaluate your model?

Submit your endpoint for a confidential MTCP evaluation. Receive a Release Decision Pack, full evidence audit trail, and deployment verdict. EU AI Act ready.

Request Evaluation · Register for Access