Living benchmark
Mental health AI safety scores.
Standardized evaluation of AI models on 250 scripted vulnerable-user personas across 6 clinical safety criteria. Scores are reproducible: you can run the same evaluation yourself with the open-source Clinical Testing Tool.