AI RELEASE ASSURANCE · EU AI ACT READY · EMPIRICAL
Models don't always stay aligned after interaction.
MTCP tells you whether yours will.
Multi-model, multi-language constraint persistence evaluation. 32 models across 12 languages. 183,924 probe interactions.
✓ Black-box — API access only, no weights or vendor cooperation needed
✓ Empirical — 183,924 real probe interactions, not simulated
✓ Audit-ready — SHA-256 signed Release Decision Pack
Aligned to: EU AI Act · NIST AI RMF · ISO/IEC 42001 · FCA · MAS FEAT · NDMO · NCA
MTCP identifies which models can be governed at runtime and which will bleed through no matter how much control-plane engineering you throw at them.
For AI Engineers
Test if your model maintains safety constraints across temperature settings and conversation turns.
Get a deploy/don't-deploy answer in 5 minutes.
For Procurement Teams
Compare AI providers on constraint durability.
See which models maintain alignment under real-world variation.
For Compliance Officers
Audit trail proving AI models maintain safety constraints across operating conditions.
EU AI Act Article 12 ready.
How MTCP Works
1. Submit an API endpoint (no weights or vendor access needed)
2. MTCP runs a full behavioural durability evaluation
3. Receive a Release Decision Pack: APPROVED / APPROVED WITH RESTRICTIONS / REJECTED
4. Download the tamper-evident evidence trail (SHA-256)
5. Gate deployment or satisfy a regulatory audit
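Step 4 refers to a tamper-evident SHA-256 evidence trail. As a minimal sketch of what a client-side integrity check could look like, the Python below recomputes a file's SHA-256 digest and compares it with an expected hex digest. The file name, the function names, and the way the expected digest reaches you are assumptions for illustration, not MTCP's documented format. A bare hash only detects modification; establishing who produced the file would also need a signature or a trusted channel for the digest.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_is_intact(path: Path, expected_hex: str) -> bool:
    """Compare the recomputed digest with a published one (hypothetical workflow)."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking, via timing, where the two strings differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A deployment pipeline could call `evidence_is_intact(Path("evidence.json"), published_digest)` before trusting the verdict inside the file; both names in that call are placeholders.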
Total evaluations: 183,924
MTCP Governance Stack — 15 Layers
BIS — Single-model constraint persistence
CSAS — Cross-system coordination admissibility
JRS — Jurisdiction resolution at boundaries
TDS — Temporal drift detection over time
CCS — Constraint conflict resolution
RES — Remediation effectiveness measurement
ACPS — Adversarial persistence resistance
BEC — Blockchain evidence chain integrity
COS — Constraint object specification
LRP — Legitimacy resolution protocol
GRC — Governance reference conditions
Gate — Admissibility enforcement (PERMIT/DENY)
Quantum — Post-quantum cryptographic validity
Measure
183,924 probe interactions across 32 frontier models in 12 languages at 4 temperature settings. The largest independent constraint persistence dataset published.
Boundary Integrity Score
Verify
Concealed control probes detect training data exposure. SHA-256 signed evidence packs. Machine-readable audit trail per run.
Control Probe Degradation
Gate
Release Decision Pack delivers APPROVED / APPROVED WITH RESTRICTIONS / REJECTED verdict with runtime guidance and regulatory alignment metadata.
Release Decision Pack
Beyond Single-Model Evaluation
MTCP evaluates constraint persistence at three levels. Each level produces empirical evidence, a grading scale, and audit-ready documentation.
- Single Model
Does the model hold constraints across multi-turn interaction? Measured by BIS.
- Cross-System
Do constraints survive handoff between coordinated AI systems? Measured by CSAS.
- Jurisdiction
Was the authority governing a coordination boundary explicitly established? Measured by JRS.
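To make the single-model question concrete, here is a toy sketch of a multi-turn persistence check in Python. It is not the BIS definition, which this page does not specify. The `Model` and `Check` types, the `persistence_rate` function, and the stand-in `toy_model` are all hypothetical names introduced for illustration. The sketch replays a conversation turn by turn and reports the fraction of replies that still satisfy a constraint.

```python
from typing import Callable, List

# Hypothetical types: a "model" maps the conversation so far to a reply,
# and a "check" decides whether a reply still honours the constraint.
Model = Callable[[List[str]], str]
Check = Callable[[str], bool]


def persistence_rate(model: Model, check: Check, turns: List[str]) -> float:
    """Fraction of turns in which the reply still satisfies the constraint."""
    if not turns:
        raise ValueError("at least one turn is required")
    history: List[str] = []
    held = 0
    for user_message in turns:
        history.append(user_message)
        reply = model(list(history))
        history.append(reply)
        if check(reply):
            held += 1
    return held / len(turns)


# Toy stand-in: obeys a "reply in uppercase" constraint for two turns, then drifts.
def toy_model(history: List[str]) -> str:
    turn_index = len(history) // 2  # completed exchanges before this reply
    text = f"answer {turn_index}"
    return text.upper() if turn_index < 2 else text


rate = persistence_rate(toy_model, str.isupper, ["q1", "q2", "q3", "q4"])
print(rate)  # 0.5: the constraint held on 2 of 4 turns
```

In a real evaluation the stand-in would be replaced by calls to the endpoint under test, repeated per language and temperature setting.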
Multi-Language Evaluation
The first multi-language, multi-script constraint persistence evaluation. 12 languages across 4 script families.
- Latin script
100% constraint persistence across all evaluated models.
- CJK
Highest failure rates observed. Script distance from English predicts degradation.
- Arabic-script
Intermediate performance. Critical for Gulf sovereign AI deployment.
- Tamil
Script distance from English is the strongest predictor of constraint failure rate.
Non-Latin deployment requires language-specific evaluation. Standard English-only benchmarks cannot predict multilingual constraint reliability.
- Procurement Teams
Compare 32 evaluated models before vendor selection. Attach MTCP certificate to procurement documentation.
- AI Risk Officers
Empirical evidence for board-level risk sign-off. Quantified BIS, CPD, and TSI scores per model.
- Compliance Leads
EU AI Act Article 12 ready. NIST AI RMF aligned. Audit-ready evidence packs downloadable immediately.
- Deployment Gatekeepers
Set minimum BIS threshold. Block release on REJECTED verdict. Retest after model changes.
- Sovereign AI Programmes
Independent evaluation of model stacks across Arabic and multilingual contexts. NDMO and NCA alignment evidence. Board-ready compliance documentation for Gulf sovereign AI infrastructure.
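The gatekeeper workflow above (set a minimum BIS threshold, block release on a REJECTED verdict) can be sketched as a small release check. This is an illustration under stated assumptions: the `DecisionPack` shape and field names are hypothetical, and BIS is assumed here to be a numeric score where higher is better, which this page does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVED = "APPROVED"
    APPROVED_WITH_RESTRICTIONS = "APPROVED WITH RESTRICTIONS"
    REJECTED = "REJECTED"


@dataclass(frozen=True)
class DecisionPack:
    """Hypothetical, simplified view of a Release Decision Pack."""
    verdict: Verdict
    bis: float  # assumed: higher means better constraint persistence


def release_allowed(pack: DecisionPack, min_bis: float) -> bool:
    """Block on REJECTED, or when BIS falls below the team's own threshold."""
    if pack.verdict is Verdict.REJECTED:
        return False
    return pack.bis >= min_bis
```

A CI job could call `release_allowed` after each model change and fail the pipeline when it returns `False`, which matches the "retest after model changes" step.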
Public Evidence
Full results with temperature breakdowns and metric definitions.
The MTCP evidence layer provides comparative release assurance data across 32 independently evaluated frontier models. 183,924 structured probe interactions at four temperature settings.
Ready to evaluate your model?
Submit your endpoint for a confidential MTCP evaluation. Receive a Release Decision Pack, full evidence audit trail, and deployment verdict. EU AI Act ready.