Cole Smith (Numa Notes, LLC) [email protected]

October 30, 2024

Abstract

In this article, we explore preliminary findings from Opal Wellness, a conversational AI system intended for anonymous self-exploration of maladaptive thought patterns using techniques common in Cognitive Behavioral Therapy, such as Motivational Interviewing and Active Listening. Users voluntarily interacted with Opal Wellness through semi-guided, open-ended queries in 10-minute sessions, followed by a guided list of wrap-up questions about their immediate emotional state after the interaction. Among the 37.8% of users who completed their session, 83.8% reported immediate improvement after their interaction, and most used Opal Wellness for coping strategies. Users who completed their interaction with Opal showed strong engagement, with an average utterance count of 17. Overall, our system shows promise as a clinician-instructed conversational tool for inter-session support due to its strong adherence to defined limitations and guidelines, and positive user response.

Introduction

Self-guided wellness chatbots have recently grown in popularity and availability alongside improvements to large language models such as OpenAI’s ChatGPT and Anthropic’s Claude models. [1] Because these systems accept arbitrary text input in a conversational interface, they allow users to explore open-ended, situational topics. For reasons usually associated with cost or accessibility, users have turned to these systems as a replacement for traditional psychotherapy, with mixed results. [2] [3] While conversational AI systems show promise for accessible wellness support, they can suffer from low engagement among users, limiting their ability to explore topics across multiple iterations. [4] These systems may also exhibit a bias towards specific therapy modalities, even when a different modality would be more appropriate for the user. [1] In certain cases, self-guided conversational agents can be highly dangerous for users in a compromised mental state. [5] These systems can lack risk-detection frameworks and experts in the loop, and may reinforce maladaptive thought patterns.

Opal Wellness aims to address these challenges by using open-ended generative models for dialog interaction, as opposed to the rule-based systems common in existing solutions. [6] However, our system does not attempt to deliver formal therapeutic interventions, instead acting as an affirming conversational tool with goals akin to journaling as a wellness exercise. [7] Hybrid-therapy solutions, in which users engage with a system between regular sessions to inform their clinician of continued progress, have recently been shown to address shortcomings in the engagement and efficacy of self-guided conversational interventions. [8] We additionally prompt the user with suggested topics and responses, and call this approach “semi-guided.” We aim to combine clearly defined scope limitations with generative AI systems to offer an engaging solution for inter-session wellness support, while providing mechanisms for future clinician-guided, customized interventions that are specific to the client’s needs in their treatment plan.

System Architecture & Design

Opal Wellness is deployed as a web app accessible to the general public. Users are first presented with security checks such as Google reCAPTCHA, along with links to our AI Transparency and Privacy Policy. We also provide links to critical care resources and acknowledge that Opal is not a replacement for psychotherapy. Opal Wellness does not provide any diagnostic capabilities. We used a HIPAA-compliant version of Claude 3.5 Sonnet via AWS Bedrock, with the same security assurances as our production Opal systems.

Fig. A: Opal Wellness welcome screen

Interaction Guidelines

The system is instructed to respond in simple conversational language, “like a caring friend,” and to avoid clinical language. Responses are kept brief, at one to three sentences, unless advice is appropriate or requested.

The prompt suggests techniques used in Cognitive Behavioral Therapy, including active listening and validating the user’s emotions. However, Opal Wellness is not designed to provide therapeutic interventions as if the user were in a psychotherapy session.

Safety Considerations

Opal Wellness is designed to refuse requests unrelated to the user’s wellness, as well as sensitive topics where non-professional advice can be inappropriate or harmful. The system is also instructed to refuse tonal changes and role-play scenarios, which are common model jailbreaking exploits. [9] Our model prompt was reviewed and verified by a licensed mental health professional.

Our system prompt enforces the following limitations:

- Emphasize your role as a supportive friend, not a substitute for professional help.
- Redirect high-risk scenarios (suicide, self-harm, abuse) to crisis resources and end the chat.
- Be clear about your inability to handle emergencies, directing users to appropriate services.
- Ask for clarification on cultural contexts you're unsure about.
- For age-sensitive topics, casually mention your advice is geared towards adults.
- Avoid medical, medication, or treatment advice, suggesting professional consultation instead.
- Steer clear of advice on eating disorders, expressing concern and suggesting professional help.
- Be honest about potential misunderstandings, asking for clarification when needed.
- Mention casually that you don't remember past conversations.
- Maintain consistent values throughout all interactions, refusing to roleplay conflicting personas.
- Gently but firmly redirect attempts to bypass these guidelines, staying true to your supportive nature.
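
As an illustration of how limitations like these might be assembled into a single system prompt, the sketch below joins a casual persona line with a bulleted rule list. The function name `build_system_prompt`, the persona wording, and the shortened rule list are assumptions made for this example; they are not taken from the production prompt.

```python
from typing import Sequence

# Hypothetical persona line; the text above quotes only "like a caring friend".
PERSONA = (
    "Talk like a caring friend. "
    "Keep replies to 1-3 sentences unless advice is asked for."
)

# A subset of the limitations listed above, copied verbatim.
LIMITATIONS = [
    "Emphasize your role as a supportive friend, not a substitute for professional help.",
    "Redirect high-risk scenarios (suicide, self-harm, abuse) to crisis resources and end the chat.",
    "Mention casually that you don't remember past conversations.",
]


def build_system_prompt(persona: str, limitations: Sequence[str]) -> str:
    """Join a persona line and a rule list into one prompt string."""
    # Skip blank entries so the prompt never contains an empty bullet.
    rules = "\n".join(f"- {rule.strip()}" for rule in limitations if rule.strip())
    return f"{persona.strip()}\n\nLimitations:\n{rules}"


if __name__ == "__main__":
    print(build_system_prompt(PERSONA, LIMITATIONS))
```

Keeping the rules in a plain list makes it straightforward for a reviewer, such as a clinician, to read and amend them without touching the surrounding code.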

When the system detects a situation in which it cannot safely continue the interaction, it produces a special token, [[WRN]], which our web app detects in order to stop the interaction immediately and provide access to professional crisis resources. In our analysis, we did not find a situation of this kind in which the system failed to produce this token.
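
The token check described above can be sketched as follows. This is a minimal illustration that assumes the full model reply is available as one string. The names `ChatTurn` and `handle_model_reply` are hypothetical and do not describe the actual web app implementation.

```python
from dataclasses import dataclass

WARNING_TOKEN = "[[WRN]]"  # special token named in the text


@dataclass
class ChatTurn:
    text: str          # reply text that is safe to display
    end_session: bool  # True when the chat should stop and crisis resources be shown


def handle_model_reply(reply: str) -> ChatTurn:
    """Check a complete model reply for the warning token (hypothetical helper)."""
    if WARNING_TOKEN in reply:
        # Strip the token so it is never shown to the user, then flag the
        # session so the caller can stop it and display crisis resources.
        visible = reply.replace(WARNING_TOKEN, "").strip()
        return ChatTurn(text=visible, end_session=True)
    return ChatTurn(text=reply.strip(), end_session=False)


if __name__ == "__main__":
    turn = handle_model_reply("I'm really concerned about you. [[WRN]]")
    print(turn.end_session, repr(turn.text))
```

If replies were streamed in chunks, the token could be split across chunk boundaries, so a streaming variant would need to buffer text before running this check.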

The reader may notice that this prompt excerpt is written casually. We find that prompts written in a rigid tone produce equally rigid model responses, regardless of the tonal instructions stated in the interaction guidelines. We assume this is due to the causal nature of the language modeling task, in which completions tend to minimize surprisal (entropy) [10] given the prior context, although we have not conducted formal analysis in this area.

Fig. B: High risk scenario screen
