Living benchmark

Mental health AI safety scores.

Standardized evaluation of AI models against 250 scripted vulnerable-user personas, scored on 6 clinical safety criteria. Scores are reproducible: you can run the same evaluation yourself with the open-source Clinical Testing Tool.
