Introduction

Mental health disorders affect over 970 million people worldwide, and the incidence of these disorders has only increased over the course of the COVID-19 pandemic [1]. As the incidence, awareness, and public acceptance of mental health disorders continue to increase, access to appropriate treatment poses an ever-expanding challenge to both clinicians and patients. The current and expected workforce of psychiatrists is insufficient to meet these needs, given a projected shortage of between 14,280 and 31,091 psychiatrists by 2024 [2]. Furthermore, 96% of counties in the USA have an unmet need for mental health prescribers, suggesting inadequate access to mental health treatment in large parts of the country [2].

Patients and clinicians alike have been searching for ways to increase access to treatment, improve clinical outcomes, and strengthen the therapeutic alliance. With at least 85% of the US population owning a smartphone [3], any search for ways to expand access to psychiatric treatment inevitably includes mobile health technologies, or “mHealth”. By increasing both the scalability of operations and accessibility through digital clinics, mental health apps can reach more patients and enable care that would otherwise be untenable in underserved areas. While mental health apps have the potential to offer solutions, they also present risks, including direct patient harm, loss of privacy, fragmented or inefficient care, and potentially increased costs. Balancing these risks and benefits requires clinical judgment and knowledge of the technological and social factors at play [4•].

Accordingly, this paper seeks to support those interested in this space. We will first review the current app landscape, including the distinctions between general wellness products (GWPs), apps as medical devices, and apps with FDA approval (so-called digital therapeutics). Next, we will discuss the importance of app evaluation models, with a focus on the American Psychiatric Association (APA)’s App Evaluation Model. The APA’s model has been adopted by several healthcare organizations, including the NYC Department of Health and Mental Hygiene, Kaiser Permanente, Vinfen, and Beth Israel Deaconess Medical Center. It has also been reviewed and used by several federal agencies, including the Agency for Healthcare Research and Quality (AHRQ) and the US Department of Veterans Affairs [5,6,7,8], and it has been featured in peer-reviewed journal articles and in the popular press [9]. We will then discuss the current challenges involved in developing and using app evaluation models. Finally, we will describe how apps are being implemented in various clinical environments, such as digital clinics, and the barriers to their use.

A Wild West—the need for app evaluation

Digital health apps exist in a dynamic, constantly changing market. In 2020, over 90,000 new digital health apps were released. Many apps are found by searching the major app stores (Apple, Google Play, Amazon, Samsung, Microsoft) [10]. The most common uses for digital health apps include patient self-help and facilitating video-, phone-, or text-based telehealth visits. Mental health apps offer a variety of features, such as symptom tracking, habit formation, targeted behavior change, peer support, mindfulness, and cognitive behavioral therapy [11, 12•]. Many papers have characterized mental health apps in greater detail; that work is outside the scope of this review.

Although apps have the potential to increase access to care, not all of them are effective, and some may even harm patients. A review of mental health apps for bipolar disorder found apps giving consumers critically inaccurate information, including suggestions to drink “a shot of liquor” to help them sleep during a manic episode and implications that bipolar disorder is contagious [13]. Beyond inappropriate clinical content, many apps may not provide clear or easily accessible privacy policies, raising red flags about the protection of sensitive mental health data. Moreover, apps are continuously being added to, removed from, or even “abandoned” within app stores, which may indicate a lack of ongoing support or availability and raises questions about continuity of care when using mHealth [14].

Another feature that distinguishes apps from traditional therapeutics is how frequently they are updated. Updates may change an app’s features, user interface, or privacy policy; conversely, some apps are never updated at all (so-called zombie apps), which can pose security risks to users or leave the app unusable in a rapidly evolving technological environment [15•]. Patients and consumers do not always consider the harms that updates can introduce, such as new risks from using the app or new privacy concerns. Indeed, the marketplace is rife with a growing number of mediocre or ineffective apps. Coupled with a lack of adequate regulation and accountability, this potential for harm underscores the need for effective methods by which clinicians and patients alike can evaluate apps before use [16•].

The current app regulatory landscape—classifying apps and highlighting limitations

Mental health apps are commonly classified into three categories: (1) general wellness products (GWPs), (2) medical devices, and (3) FDA-approved apps. Table 1 describes each classification and highlights which software functions the FDA may consider “low safety risk” [12•, 17].

Table 1 Similarities and differences between three common app classifications and examples of mental health apps the FDA may consider “low safety risk”

The FDA has provided examples of “low risk” apps (see Table 1) that are eligible for enforcement discretion, with a particular focus on mental health apps. This discretion for low-risk GWP apps was reinforced at the onset of the COVID-19 pandemic and the accompanying Public Health Emergency (PHE) declaration, when the FDA released an enforcement policy intended to expand the availability of digital health devices that treat psychiatric disorders [12•, 18]. Because the FDA does not exercise its regulatory authority over these apps, many patients and clinicians may find it difficult to ascertain the boundary between GWP and medical apps. It is therefore essential for patients and clinicians to carefully evaluate all apps, not just those claiming to be wellness products [12•].

In an effort to regulate digital health products, including mental health apps, the FDA had been exploring a new regulatory system called Pre-Certification (Pre-Cert), which aimed to streamline the approval process for companies with a track record of producing safe and effective digital health products. In September 2022, however, the FDA announced that the Pre-Cert pilot program had ended, stating that a new regulatory paradigm would require legislative change and leaving the next steps uncertain [19, 20•]. In the meantime, challenges remain in assessing the efficacy and utility of mental health apps.

App evaluation frameworks

With limited government regulation and oversight of apps, providers and patients continue to search for guidance on how to identify safe and evidence-based mental health apps. In response, numerous research groups and organizations have created original app evaluation frameworks intended to guide users in evaluating apps themselves. The frameworks vary in intended audience, evaluation scope, question type, and scoring [16•]. Choosing a framework that supports systematic evaluation of an app’s suitability for patients and providers, while also meeting an organization’s own requirements, is an important first step in app evaluation. But how does one decide between the different models? We suggest examining each model for the following factors: (1) clinical and privacy considerations, (2) ease of use, (3) diversity of perspective in the model’s creation, and (4) adoption of the model [21].

APA App Evaluation Model development

The APA App Evaluation Model was initially developed by John Torous’ Digital Psychiatry Lab at Beth Israel Deaconess Medical Center (BIDMC) in 2019 [22]. It was later refined by a committee of 12 individuals from diverse backgrounds, including physicians, nurse practitioners, medical students, patients, and researchers, convened by the American Psychiatric Association. The original model, prior to committee action, was created by reviewing 961 questions across all 45 existing app evaluation models in 2017. Redundant questions were removed, and the remaining 357 questions were grouped into five priority levels: background information, privacy and safety, evidence, ease of use, and data integration. These levels were then arranged into a pyramid to encourage prioritization of privacy and safety first [4•]. The idea is that if an app does not meet the criteria at a lower level, no further evaluation is necessary. Overall, the focus was on creating a data-driven tool that relied on objective evidence rather than on subjective qualities and expert consensus [4•]. The model has since been revised, and its five levels are currently (1) accessibility and background, (2) privacy and security, (3) clinical foundation, (4) engagement style, and (5) therapeutic goal [15•]. The evaluation yields a qualitative assessment of an app that helps the clinician and patient make an informed decision about app use in the patient’s individual circumstances. Numerical scores were avoided to maintain validity in a dynamic app marketplace where apps are constantly updated and changed.
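The bottom-up, stop-at-the-first-failure logic of the pyramid can be illustrated with a short sketch. This is a hypothetical illustration only: it assumes each level can be reduced to a single reviewer judgment, and the `checks` functions and the `app` record fields are invented for this example. The actual model is a set of questions answered by people, not software. Consistent with the model’s design, the sketch records qualitative notes rather than computing a numerical score.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Current APA App Evaluation Model levels, ordered from the base of the
# pyramid (evaluated first) to the top.
LEVELS: List[str] = [
    "accessibility and background",
    "privacy and security",
    "clinical foundation",
    "engagement style",
    "therapeutic goal",
]

# Hypothetical: a reviewer's judgment for one level, returning
# (meets_criteria, qualitative note).
Check = Callable[[dict], Tuple[bool, str]]


def evaluate(app: dict, checks: Dict[str, Check]) -> Tuple[Optional[str], List[str]]:
    """Walk the pyramid bottom-up and stop at the first level the app fails.

    Returns the name of the failed level (or None if every level is met)
    and the qualitative notes recorded up to that point. No numerical
    score is computed.
    """
    notes: List[str] = []
    for level in LEVELS:
        meets, note = checks[level](app)
        notes.append(f"{level}: {note}")
        if not meets:
            # Failing a lower level means no further evaluation is needed.
            return level, notes
    return None, notes


# Usage with a hypothetical app that lacks an accessible privacy policy.
app = {"has_privacy_policy": False}
checks: Dict[str, Check] = {level: (lambda a: (True, "meets criteria")) for level in LEVELS}
checks["privacy and security"] = lambda a: (
    a["has_privacy_policy"],
    "privacy policy available" if a["has_privacy_policy"] else "no accessible privacy policy",
)

failed_level, notes = evaluate(app, checks)
print(failed_level)  # → privacy and security
print(notes)
```

Returning at the first failed level mirrors the model’s guidance that an app failing a lower level need not be evaluated further, and the ordered list makes that prioritization explicit.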

Resources to enable app evaluation and use

One barrier to implementing and using any app evaluation framework is the time and effort that evaluations require. To streamline the process, organizations have created databases, or “evaluation hubs,” that are freely available to the public. The mHealth Index and Navigation Database (MIND) is one such database; it is based on an expanded version of the APA App Evaluation Model with 105 questions [23•]. The MIND website provides an accessible user interface that categorizes apps based on the answers to the MIND framework’s questions and encourages engagement by allowing app users to add their own ratings [23•]. Evaluation hubs are a convenient way for clinicians and patients to assess apps using criteria that matter for their needs. Like apps and frameworks, however, evaluation hubs should be carefully vetted: some do not update their app evaluations, some have been found to have low interrater reliability across frameworks, and some do not share the criteria by which apps are evaluated [15•, 24].

In addition to using evaluation hubs, some clinics employ digital navigators who use these tools to create a list of apps that the institution then approves and authorizes for use. Although this approach may limit the options available to clinicians and their patients to those permitted by institutional guidelines, it promotes the use of quality apps and gives clinicians a go-to list, thereby lowering the barriers to implementing apps in the clinic [25•].

Adoption of app evaluation in the clinical setting

As more mental health apps enter the market, it seems clear that apps are here to stay, and users are beginning to rely on this technology for clinical reasons. In practice, however, many questions remain about how to integrate apps effectively and safely into existing treatment models. Although app evaluation frameworks exist, clinicians and others struggle to adopt them in everyday use.

Apps offer a new type of clinical intervention, and the process of evaluating them can feel unfamiliar to providers and others who work in health care systems. This raises the question: who is responsible for evaluating apps for use in clinical practice? The provider? The patient? The healthcare system at large? Table 2 identifies three systemic levels at which app evaluation should occur. We assert that, to adopt apps effectively into clinical practice, evaluation should occur at each of these three levels. Below, we highlight current barriers to app evaluation at each level and propose potential solutions.

Table 2 Factors influencing app utilization in clinical settings

Governmental regulation

The broadest level at which app evaluation must happen is the US government. Formal regulation from governing bodies, such as the Food and Drug Administration (FDA), is limited within the USA, and, as previously discussed, oversight at this level has encountered several barriers. One challenge is that governmental bodies have been reluctant to adopt a framework for app evaluation, for several reasons. First, while regulatory oversight of apps is important, some companies developing these tools contend that increased regulation is a barrier to innovation. One possible solution is for a governing body to adopt a modified app evaluation framework, such as the APA Model. Because the evaluation process would be endorsed by a respected and trusted governing body, this would encourage potential app users and developers to use the framework and to adhere to a minimum level of review for app quality and safety. These reviews could be made available in a searchable database and vary in specificity or depth based on how the app is categorized. The reviews could also indicate whether an app poses a low or high risk to the user based on certain features, and the app could then be regulated accordingly.

Second, the perceived trustworthiness of app reviews may depend on how resources are allocated to them. For instance, are reviewers compensated for their work and, if so, how does this influence their reviews? Notably, some countries have systematically integrated an app review process into their healthcare systems, including the United Kingdom (UK) through the National Institute for Health and Care Excellence (NICE). Stern et al. discuss Germany’s Digital Healthcare Act, passed in 2019, which created a “Fast-Track” regulatory and reimbursement pathway for digital health applications in the German market (known as DiGA) [26•]. This pathway establishes market access for DiGA that are lower-risk medical devices used primarily by patients. To be eligible for regulatory review, these devices must meet pre-specified requirements related to safety, functionality, quality, data protection, data security, and interoperability. The regulatory process allows flexibility in how, and over what period, researchers present evidence that using a DiGA is better than not using it, known as positive care effects [26•]. Acceptable evidence includes clinical or epidemiological studies as well as studies using methods from health care, social, or behavioral research.

This approach welcomes real-world data (RWD) and real-world evidence (RWE) collected outside traditional randomized controlled trials. RWD are data relating to patient health status or care delivery; RWE is the clinical evidence about the usage, risks, or benefits of a device derived from the analysis of RWD. Although this approach enables rapid, flexible, and timely evaluation of digital medical products, there are no international standards for best evidence practices in the use of RWE. It will be important to consider new data science methods for analyzing these data, to educate health care providers on how to assess studies that incorporate such evidence, and to define the evidence required for regulatory approval versus payer coverage [26•]. Overall, the promise of translating app evaluation into practice has been realized outside the USA, even though it remains unrealized here at home [27].

Healthcare system–wide evaluation

At the healthcare system level, leaders are seeking ways to systematically incorporate mobile app evaluation and integration into clinical psychiatric practice. In reviewing current practices, we see two models emerging within healthcare systems: the creation of “digital clinics” and the use of “digital care navigators.” Digital clinics have previously been defined as clinics that augment standard patient services, such as office visits or telehealth appointments, with digital tools (mobile apps, digital platforms, etc.) [28]. We propose expanding this definition to the provision of services using digital tools together with corresponding digital training and defined workflows, which enable care to be augmented alongside face-to-face visits or telehealth encounters. The Digital Psychiatry Clinic at BIDMC is perhaps an exemplar of this expanded definition of a digital clinic. The clinic is strategically embedded within a framework of care, enabling digitally trained navigators to work within the healthcare team to promote the use of mobile apps for patient care. It also incorporates digital data gathering and interventions through its mindLAMP app, an open-source app that collects both active and passive data from patients and offers digital tools such as journaling, safety planning, and mindfulness activities [29]. Furthermore, while some institutions have not created full digital clinics, interest is growing in digital navigators: new team members with specialized training in digital health technology who can assist patients and providers in selecting digital tools [30]. Figure 1 shows the role of a digital navigator within a digital clinic environment.

Fig. 1

The role of the digital navigator in a digital clinic. Digital navigators assist clinicians and patients with incorporating apps into clinical practice. For clinicians, digital navigators perform app evaluations using app evaluation frameworks or by referencing evaluation hubs, and they maintain a running list of recommended apps for clinicians to use. For patients, digital navigators introduce apps and provide technical support in using them. For apps that collect data (sleep, activity, subjective patient-entered data), the digital navigator receives these data from the mHealth app, processes them, and summarizes them for clinician use. Overall, digital navigators support the use of mHealth apps so that clinicians and patients can focus on addressing symptoms, goals, and treatment.

At the heart of creating sustainable digital clinics and digital navigator training programs is app evaluation. Because evaluating apps can be time-intensive, we recommend that healthcare systems consider using evaluation hubs that publish app evaluations based on a high-quality framework. We further suggest that training programs for digital navigators include training in an app evaluation framework and take into account the unique clinical environments in which the apps are intended to be deployed [25•, 31].

Beyond the challenge of finding systematic and time-efficient ways to evaluate apps, additional barriers at this level include obtaining reimbursement for staff members’ time spent using digital technologies with patients. Because reimbursement remains limited, many digital clinics and health systems are rolling out novel digital therapeutics and technologies solely in the realm of research. For example, Zucker Hillside Hospital/Northwell Health’s Digital Psychiatry Program currently relies predominantly on funded studies to bring novel tools, such as the Valera app and wearable devices, to its clinical patient population. We address reimbursement below in Additional Considerations.

Individual clinician or patient evaluation

Finally, at the most basic level, clinicians and patients need to be able to evaluate apps independently. Discussions about treatment fundamentally begin with the healthcare provider and the patient. We believe it is important to empower individuals to evaluate apps for several reasons. Although governmental regulations may be on the horizon, implementing changes at a system level takes time, and apps are already in use. Furthermore, even in an advanced healthcare system with structured government regulation, digital navigators, and digital clinics, treatment decisions would still be made individually, through discussions between healthcare providers and patients. Empowering members of the healthcare team to discuss the risks, benefits, and clinical appropriateness of digital tools, and to tailor those tools to each patient’s needs, will be imperative to maintaining the therapeutic alliance in a new digital age [25•, 31, 32]. As discussed above, few apps have been compared with evidence-based treatments, and still less is known about how particular populations engage with these apps, which is crucial for delivering culturally relevant care. When discussing apps with a patient, consider the patient’s background, literacy level, and preferred language, and look for apps that incorporate cultural values, norms, or references that would most benefit the patient [33•]. Apps developed and designed with diverse populations in mind will be better equipped to address the needs of underserved patients and to deliver culturally relevant care [33•].

However, several barriers exist, including providers’ lack of training in evaluating apps and concern over the time it would take to evaluate, and become familiar with, the abundance of digital apps and tools being created. Here we underscore the importance of medical education curricula that teach healthcare providers about app evaluation frameworks, as well as the creation and use of evaluation hubs such as MIND. In summary, methods for evaluating apps at the patient-doctor, health system, and government levels are all necessary.

Additional considerations for app implementation

Informed consent

A patient’s decision to use an app as part of their clinical care should genuinely reflect their autonomous choice [34]. Healthcare professionals should discuss with patients the possible risks and benefits of any recommended therapy, including an app, and patients should feel comfortable declining to use an app if they do not want to. The APA ethics committee recommends that healthcare professionals engage in an informed consent process that states the potential risks inherent in using an app, including loss of personal privacy [35]. Patients may assume that mHealth apps adhere to the same privacy and security standards as health care entities, which they expect when interacting with their providers, but this may not be the case. Patient consent to use these apps should therefore be informed and voluntary.

Reimbursement

Currently, reimbursement for clinicians who use mental health apps in clinical care is inconsistent across practice management and care delivery models. For instance, the 2023 Remote Therapeutic Monitoring (RTM) codes include Current Procedural Terminology (CPT) code 98978, which allows a practitioner to bill for the use of a cognitive behavioral therapy device. The value and payment for this code are established by Medicare Administrative Contractors and may vary nationally [36, 37]. To bill for this code, the app must monitor the patient for at least 16 days per month and be used for at least 20 min per month. The practitioner must check in with the patient during that month, and only one practitioner may bill the RTM CPT code during a 30-day period. These codes cannot be reported in combination with Remote Patient Monitoring (RPM) codes [37]. Overall, increased reimbursement for apps would facilitate their implementation, but much work remains to be done in this area.
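The billing criteria summarized above can be expressed as a simple checklist. The sketch below is illustrative only: it encodes just the criteria as summarized in this section, which may not capture every payer requirement, and the `MonthlyRecord` fields and the `rtm_criteria_unmet` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonthlyRecord:
    """One patient's app use over a single 30-day period (hypothetical schema)."""
    monitored_days: int                 # days on which the app monitored the patient
    minutes_of_use: int                 # total minutes the app was used
    practitioner_checked_in: bool       # practitioner checked in with the patient
    billed_by_other_practitioner: bool  # RTM already billed by someone else this period
    rpm_codes_billed: bool              # RPM codes reported for the same period


def rtm_criteria_unmet(record: MonthlyRecord) -> List[str]:
    """List the criteria, as summarized in the text, that the record fails.

    An empty list means every summarized criterion is met.
    """
    unmet: List[str] = []
    if record.monitored_days < 16:
        unmet.append("fewer than 16 monitored days")
    if record.minutes_of_use < 20:
        unmet.append("less than 20 minutes of use")
    if not record.practitioner_checked_in:
        unmet.append("no practitioner check-in")
    if record.billed_by_other_practitioner:
        unmet.append("already billed by another practitioner")
    if record.rpm_codes_billed:
        unmet.append("cannot be combined with RPM codes")
    return unmet


# Usage: 18 monitored days and 25 minutes of use, with a check-in.
record = MonthlyRecord(18, 25, True, False, False)
print(rtm_criteria_unmet(record))  # → []
```

Returning the list of unmet criteria, rather than a single yes/no answer, shows which requirement a given month falls short of.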

Conclusion

Mental health apps have the potential to increase access to mental health care if they are clinically effective, easy to use and respect the privacy of patients. With over 10,000 mental health apps and few regulatory guidelines in place, finding appropriate apps is challenging. App evaluation frameworks such as the APA App Evaluation Model and evaluation hubs such as MIND help to guide relevant app discovery. Governmental, healthcare system, clinician-patient, and digital navigator roles are crucial in furthering clinical app use. Overall, these efforts are key in harnessing this technology for the benefit of patients.