What we're building
Our first product is the Mental Health Safety Tester—a Python CLI for scripted pre-deployment testing of mental-health-oriented chatbots. It runs synthetic vulnerable-user personas against your system, then uses an LLM-as-judge to score responses against clinical safety criteria.
Scripted personas
JSON-defined multi-turn scripts simulating users in crisis (e.g. passive or active suicidal ideation) and non-crisis scenarios (e.g. mild anxiety).
Structured evaluation
Judge model evaluates each conversation on crisis urgency and avoidance of diagnosis/treatment advice. Results saved as timestamped JSON (and optional Markdown).
Why it matters
Deploying chatbots in mental health contexts without safety evaluation is risky. Our tool is one building block for an offline safety pipeline—not a replacement for clinical review or governance, but a repeatable way to catch regressions and compare systems before they go live.