What 20,000 Real Conversations Reveal About Mental Health AI Safety
arXiv, 2026
This paper examines how well mental health support systems handle safety risks by comparing simulation-based testing with what actually happens in real use. Researchers evaluated both general-purpose models and a system built specifically for mental health across high-risk scenarios including suicide risk, self-harm, and harmful content. They also reviewed 20,000 real conversations with the purpose-built system.
In testing, the purpose-built system produced significantly fewer harmful responses than the general-purpose models. In real-world conversations, every identified suicide-risk case received crisis resources, and only a very small number of self-harm mentions failed to trigger an intervention, so the overall rate of missed responses was low.
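To make the monitoring side concrete, the sketch below shows one way intervention coverage could be computed from conversation logs. This is a minimal illustration under an assumed log schema (the fields risk_type and intervention_delivered and the function coverage_by_risk are hypothetical, not the paper's actual pipeline):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationRecord:
    # Hypothetical log schema for illustration; the paper does not specify fields.
    conversation_id: str
    risk_type: Optional[str]       # e.g. "suicide_risk", "self_harm", or None
    intervention_delivered: bool   # did the system surface crisis resources?


def coverage_by_risk(records):
    """Fraction of risk-flagged conversations that received an intervention."""
    delivered = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        if r.risk_type is None:
            continue  # no safety-relevant content was detected
        total[r.risk_type] += 1
        if r.intervention_delivered:
            delivered[r.risk_type] += 1
    return {risk: delivered[risk] / total[risk] for risk in total}


if __name__ == "__main__":
    logs = [
        ConversationRecord("c1", "suicide_risk", True),
        ConversationRecord("c2", "self_harm", True),
        ConversationRecord("c3", "self_harm", False),  # a missed intervention
        ConversationRecord("c4", None, False),         # no risk detected
    ]
    for risk, rate in sorted(coverage_by_risk(logs).items()):
        print(f"{risk}: {rate:.1%} coverage")
```

In this framing, the paper's real-world results correspond to 100% coverage for identified suicide-risk conversations and a small number of misses among self-harm mentions.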
The study also found that simulation-based testing produced higher failure rates than those observed in real use, pointing to a gap between controlled evaluation environments and how these systems perform in everyday deployment.
Why it matters:
This paper points to a shift in how safety should be evaluated: pre-deployment testing alone may not reflect how systems behave with real users, which underscores the importance of ongoing monitoring once a system is deployed.
It also shows that systems designed specifically for mental health, with focused safeguards, can handle sensitive situations more safely than general-purpose models.