Understanding MTCP in 2 minutes
MTCP tests whether AI models stay aligned over the course of a user interaction. Most testing checks only single responses. MTCP checks whether a model maintains its safety constraints across entire conversations and across temperature settings.
A model might seem safe in testing but degrade when users push boundaries across multiple turns. MTCP catches this before production deployment.
Example: A model passes a single-shot safety test. But across a 3-turn conversation with temperature variation, its constraint persistence drops 40%. MTCP finds this.
Test safety constraints across 3-turn conversation sequences, not just single responses.
Test at 4 temperature settings (0.0, 0.2, 0.5, 0.8) to measure stability under different sampling conditions.
Detect contamination and gaming through parallel control probe testing.
Get a SAFE/REVIEW/RISK recommendation, not just numbers.
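The test grid described above (3 turns at each of 4 temperatures) can be sketched roughly as follows. This is a minimal illustration, not MTCP's actual harness: the `model` and `holds` callables, the stub model, and the `persistence_rate` helper are all hypothetical names introduced here, and control probes are left out.

```python
from typing import Callable, Dict, List

TEMPERATURES = [0.0, 0.2, 0.5, 0.8]  # the four settings listed above
TURNS = 3                            # conversation length listed above

# Hypothetical interfaces (assumptions, not part of MTCP):
# model(history, temperature) -> reply text
# holds(reply) -> True if the reply still respects the constraint
ModelFn = Callable[[List[str], float], str]
CheckFn = Callable[[str], bool]


def run_grid(model: ModelFn, prompts: List[str], holds: CheckFn) -> Dict[float, List[bool]]:
    """Run one 3-turn conversation per temperature and record, per turn,
    whether the constraint held."""
    if len(prompts) != TURNS:
        raise ValueError(f"expected {TURNS} prompts, got {len(prompts)}")
    results: Dict[float, List[bool]] = {}
    for temp in TEMPERATURES:
        history: List[str] = []  # fresh conversation for each temperature
        held: List[bool] = []
        for prompt in prompts:
            history.append(prompt)
            reply = model(history, temp)
            history.append(reply)
            held.append(holds(reply))
        results[temp] = held
    return results


def persistence_rate(results: Dict[float, List[bool]]) -> float:
    """Fraction of (temperature, turn) cells where the constraint held."""
    cells = [ok for held in results.values() for ok in held]
    return sum(cells) / len(cells)


# Stub model for demonstration only: it gives in on the third turn
# when the temperature is 0.5 or higher.
def stub_model(history: List[str], temperature: float) -> str:
    turn = (len(history) + 1) // 2
    if turn == 3 and temperature >= 0.5:
        return "OK, here is how..."
    return "I can't help with that."


def stub_holds(reply: str) -> bool:
    return reply.startswith("I can't")


if __name__ == "__main__":
    grid = run_grid(stub_model, ["ask", "push", "push harder"], stub_holds)
    for temp, held in grid.items():
        print(temp, held)
    print("persistence:", round(persistence_rate(grid), 3))
```

With this stub, every single-turn check passes, and the failure only appears in the third turn at the two higher temperatures, which is the kind of degradation a single-shot test would miss.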
Boundary Integrity Score
How well the model maintains constraints across interactions
Temperature Stability Index
How consistent behavior is across temperature settings
Control Probe Degradation
Contamination and gaming detection score
Deploy / Review / Don't Deploy
Clear action based on metrics
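To make the step from metrics to an action concrete, here is a hedged sketch of one way three scores could be turned into a SAFE/REVIEW/RISK label. The score ranges, their direction (higher or lower is better), and every threshold below are placeholder assumptions chosen for illustration. They are not MTCP's published definitions or cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    bis: float  # Boundary Integrity Score; assumed in [0, 1], higher is better
    tsi: float  # Temperature Stability Index; assumed in [0, 1], higher is better
    cpd: float  # Control Probe Degradation; assumed in [0, 1], lower is better


# Placeholder thresholds (assumptions for this sketch only).
SAFE_MIN = 0.9   # both BIS and TSI must reach this for SAFE
RISK_MIN = 0.7   # either BIS or TSI below this means RISK
CPD_MAX = 0.1    # degradation above this blocks a SAFE label


def recommend(m: Metrics) -> str:
    """Map metrics to a label using the weakest of BIS and TSI."""
    weakest = min(m.bis, m.tsi)
    if weakest < RISK_MIN:
        return "RISK"
    if weakest >= SAFE_MIN and m.cpd <= CPD_MAX:
        return "SAFE"
    return "REVIEW"


if __name__ == "__main__":
    print(recommend(Metrics(bis=0.95, tsi=0.93, cpd=0.02)))
    print(recommend(Metrics(bis=0.95, tsi=0.80, cpd=0.02)))
    print(recommend(Metrics(bis=0.60, tsi=0.95, cpd=0.02)))
```

Using the weakest score rather than an average is a deliberate choice in this sketch: a model that is stable across temperatures but loses its constraints over turns should not be rescued by the better number.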
You can also browse the Leaderboard to compare the 32 models we've already tested.