Building safer AI for mental health

We build tooling to evaluate and improve the safety of AI systems in sensitive domains—starting with mental health chatbots.

Learn about our tool View on GitHub →

What we're building

Our first product is the Mental Health Safety Tester—a Python CLI for scripted pre-deployment testing of mental-health-oriented chatbots. It runs synthetic vulnerable-user personas against your system, then uses an LLM-as-judge to score responses against clinical safety criteria.

Scripted personas

JSON-defined multi-turn scripts simulating users in crisis (e.g. passive or active suicidal ideation) and non-crisis scenarios (e.g. mild anxiety).

Structured evaluation

Judge model evaluates each conversation on crisis urgency and avoidance of diagnosis/treatment advice. Results saved as timestamped JSON (and optional Markdown).

Why it matters

Deploying chatbots in mental health contexts without safety evaluation is risky. Our tool is one building block for an offline safety pipeline—not a replacement for clinical review or governance, but a repeatable way to catch regressions and compare systems before they go live.