INTRODUCTION

Mobile technology applications, or “apps,” have been widely promoted as a strategy to improve health through enhanced self-management of chronic conditions for patients and families.1 There is a movement toward harnessing patient-generated data through apps to track personalized trends in health behaviors such as diet, exercise, medication adherence, or other healthcare-related tasks in order to improve daily behaviors and ultimately health outcomes.2 Healthcare systems are becoming increasingly interested in using apps to integrate patient-generated data, such as home blood glucose or blood pressure readings, into the electronic health record in order to improve treatment plans.3 Most importantly, patients, including low-income and non-English-speaking populations, perceive that mobile technology could help with self-management.4

Many have suggested that mobile technology has the potential to reduce health disparities.5–7 There is evidence that racial/ethnic minorities in the US are just as likely as whites to use mobile phones and smartphones.8–10 Additionally, smartphone use is increasing among low-income populations.11,12 Therefore, even though many low-income chronic disease patients do not have access to these technologies today, developing effective self-management support tools on mobile platforms is still critical, because we expect their use to continue to expand. Experts have suggested that because mobile technology is ubiquitous, apps can lower barriers to engaging in positive health behaviors and self-managing chronic conditions.13 However, there are also concerns about technologies widening the digital divide if only advantaged populations use them.14 In particular, it may be important to tailor these technologies to various groups in order to make them beneficial for diverse audiences, to improve health care quality, and to reduce costs.15 In addition, there may be increasing interest among vulnerable/underserved populations in utilizing mobile health for diabetes management specifically.16

Previous evidence has documented very poor usability of health systems’ internet-based patient portal websites among an older, racially/ethnically diverse patient population.17 In addition, studies have identified mobile health app usability barriers for older patients.18 While researchers have evaluated the usability of diabetes apps themselves,19–21 there have been very few studies that have examined the usability of commercially available mobile apps among end-users, especially among a predominantly lower-income patient population.22–24 Therefore, we selected several mobile applications for diabetes, depression, and caregiving and conducted usability testing with diverse patients in each target group.

METHODS

Mobile Application Selection

Our search strategy sought to identify popular and well-rated apps targeting individuals belonging to vulnerable populations. We selected three areas of apps for evaluation: diabetes, depression, and caring for the elderly. We selected diabetes because diabetes apps are the most prevalent chronic disease-specific apps available commercially.25 In addition, managing diabetes requires significant self-management skills (such as titrating insulin in response to blood glucose values), for which mobile apps may be useful. We selected depression because mental health disorders represent the largest area in which the U.S. government has invested app development efforts (primarily targeting veterans).26–28 We also focused on depression because of the suggestion that app-based therapy may complement or partially replace face-to-face interactions with a clinician.29 Caring for the elderly often involves geographically dispersed caregivers and asynchronous communication, which makes it a clear target for technology-enabled improvement.30

We queried the Apple iTunes (iOS) and Google Play (Android) stores on 3 November 2014, using the search terms “diabetes,” “depression,” and “elderly.” For each of the three search terms, we extracted the first 50 iOS listings (150 apps in total) and first 48 Android listings (144 apps in total), including the description, reviews, ratings, and screenshots.

First, three reviewers (KS, KD, and LPN) individually selected the five best iOS and five best Android apps for each of the three areas based on the app store listings. We judged app quality through a holistic evaluation based on its description, consumer ratings and reviews, and screenshots. The reviewers then met to purposefully sample four apps from each area (12 in total), with the goal of selecting the best apps with different functionalities within each of the three areas. During this process, we downloaded and tested each of the apps that we considered for inclusion in the final cohort to ensure that the functionality and appearance of the app matched its description in the app store. We arrived at the number of apps chosen for each area (four) by balancing the time required for usability testing with the goal of contrasting different approaches to the same health conditions. All the selected apps were available for download free of charge.

We attempted to contact the developers of the apps via email to give them the opportunity to opt out of having their app mentioned in this study. Only one developer requested that their app not be named; we refer to this app as ‘Diabetes app.’

Study Setting and Patients

The University of California, San Francisco (UCSF) Committee on Human Research approved the study. This study was based at a publicly funded urban outpatient primary care clinic located on a hospital campus. We recruited participants via flyers posted in the primary care clinic, co-recruitment with another usability study of the hospital’s patient portal, from a diabetes support group, and through provider referral.

Participants were eligible for the study if they were English speaking, were over 18 years of age, and had adequate vision, hearing, and cognitive ability to consent and participate in the study. In order to gain an understanding of how participants would use these apps for their own conditions, we recruited participants who had the target condition for which each app was developed: type 2 diabetes, depression, or being a caregiver.

We collected demographic information including age, gender, and race/ethnicity (White or Caucasian, Black or African American, Hispanic/Latino, Asian or Pacific Islander, American Indian/Native American, or Other). We used questions previously adapted for in-depth interviews about patient portal use to assess: 1) interest in using the internet to manage their health and 2) frequency of internet use.4,31 To estimate health literacy status, we administered a one-item scale noting confidence filling out forms (not at all, a little bit, somewhat, quite a bit, extremely)32 that has been shown to be predictive of internet-based personal health record use in previous literature.33 In accordance with prior studies, we classified participants noting any lack of confidence filling out forms as having limited health literacy.34 Because this safety-net setting does not accept private insurance, participants either have Medicare/Medicaid or do not have health insurance. While we did not ask questions about income level, the patient population at this hospital is known to be low income.35,36 All participants self-identified as having type 2 diabetes, having depressive symptoms, or being a caregiver for someone with a chronic condition in order to participate in the testing of the concordant category of apps. In addition to these conditions, we asked participants if they had asthma or chronic obstructive pulmonary disease (COPD), heart disease, high blood pressure, heart failure, or chronic kidney disease. During the interview process, we did not collect protected health information or patient identifying information.
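The health literacy classification described above can be sketched in a few lines of Python. This is an illustration only, not the study's analysis code: the function name is hypothetical, and it assumes that "any lack of confidence" means any response below "extremely," a cutoff the text does not state explicitly.

```python
# Response options of the one-item "confidence filling out forms" screener,
# listed in the order given in the text.
RESPONSES = ["not at all", "a little bit", "somewhat", "quite a bit", "extremely"]


def has_limited_health_literacy(response: str) -> bool:
    """Classify a screener response (hypothetical helper).

    Assumption: any answer other than "extremely" counts as
    "any lack of confidence" and therefore as limited health literacy.
    """
    answer = response.strip().lower()
    if answer not in RESPONSES:
        raise ValueError(f"unrecognized response: {response!r}")
    return answer != "extremely"


if __name__ == "__main__":
    for option in RESPONSES:
        print(option, "->", has_limited_health_literacy(option))
```

Under a different cutoff (for example, treating "quite a bit" as adequate), only the final comparison would change.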

Study Interview

Participants were asked to complete a variety of tasks using information provided to them for each of the mobile applications in the category in which they were participating. For example, we provided an empty prescription medication bottle with instructions for metformin 1,000 mg twice daily and asked diabetes patients to enter these medication instructions into each app (Complete interview guide is available online as Appendix 1). In order to give the participants context for using the apps, we only asked them to evaluate apps that were created for the health conditions that were relevant to them. For instance, only participants who had active caregiving responsibilities tested the caregiving apps. We explicitly asked patients to consider how technology like the apps they were testing would fit into their lives and their self-management activities prior to asking them to test the apps.

Two different types of tablet devices were used: an Apple iPad fourth generation model number MD510LL/A and a Samsung Android model number SM-P600. Patients alternated between accessing the app on the Apple and the Android tablets unless the app was only available on one platform.

Two video cameras were used during the interview, recording both sound and the participant’s image, with one camera focused on the participant’s face and the second camera focused on the tablet that the participant was using in order to record their hand movements. We conducted interviews in a private office with the door closed. One interviewer (GIG) conducted all the interviews; for two interviews, a second interviewer was present.

Analysis

For this analysis, we focused on selected tasks, in the broad categories of data entry and information retrieval (Table 1). This allowed the comparison of tasks across apps and across chronic conditions to be as similar as possible.

Table 1. Task Demonstrations Included in this Analysis

Video files of the interviews were stored on a password-protected secure server maintained by UCSF.

Coding and Analysis

The coding scheme to categorize task completion was developed a priori using adapted usability metrics from prior studies.17,37 We identified the proportion of tasks that were completed independently; and the degree of completion, categorized as: a) successful/straight-forward, b) successful/prolonged, c) partial, unsuccessful/prolonged, and d) gave up.17,37 Definitions and examples of the categories of task completion are outlined in Table 2.

Table 2. Description of Responses for Each Task Completion Type

All coders (CC, GIG, SO) first coded the same interview in order to calibrate their coding and refine the definitions of tasks and codes. They met together to compare their coding of this initial calibration video to reach consensus. Following this consensus process, each subsequent video of an interview was coded by a single individual. After the initial coding of each video, each code was reviewed by a different coder (CC, GIG) with any discrepancies noted. The two coders met to resolve any differences and reach consensus on each code.

In addition to this deductive approach to classifying barriers to usability, we also captured open-ended comments from participants about usability that we felt shed further light on their experience with these apps. We analyzed these comments with inductive, open coding: investigators (GIG, US, CRL) read the comments and identified themes.38,39 Thematic saturation was reached after three to four interviews in each app category, but all comments were coded as pre-specified.

RESULTS

The 26 patients included in this evaluation were diverse, with varying prior computer or tablet experience and varying reported health literacy (Table 3). Most had one or more chronic health conditions (Table 3). All apps required significant manual data entry, and most tasks required progression through multiple screens and steps. Tasks ranged in complexity from numeric scoring (such as entering a recent blood glucose value for diabetic patients or rating the user’s mood on a scale for the depression apps) to free text entry for journal or diary entries. Across all tasks, participants attempted completion; none simply gave up when confronted with the app.

Table 3. Patient Participant Demographics

We first examined patients’ performance in entering data into each application. Data entry required significant effort for all apps with proportions of successful data entry task completion (combined categories of “Successful/Straight-Forward” and “Successful/Prolonged”) ranging from 89 % for blood glucose entry for InCheck, a diabetes app, to 50 % for entering a medication or appointment into Capzule, a caregiving app (Fig. 1). For 51 of 101 tasks (51 %), participants were able to complete data entry without assistance. They were hampered by the need to navigate through multiple screens and by unclear explanations of what data needed to be entered. For diabetes, there was wide variability even in ease of entry for blood glucose, one of the simplest tasks to complete. In Diabetes Connect, 2/10 patients were able to successfully log their blood glucose without assistance; in “Diabetes App,” 3/10 were successful; in Social Diabetes, 7/10 were successful; in InCheck, 8/9 were successful (Fig. 1a).
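As a worked check of the arithmetic, the per-app blood glucose entry proportions can be recomputed from the counts reported above. The sketch below is illustrative: the dictionary and function names are hypothetical, and only the counts come from the text.

```python
# (participants who logged blood glucose successfully, participants who attempted),
# taken from the counts reported in the text for each diabetes app.
blood_glucose_entry = {
    "Diabetes Connect": (2, 10),
    "Diabetes App": (3, 10),
    "Social Diabetes": (7, 10),
    "InCheck": (8, 9),
}


def percent(successes: int, attempts: int) -> int:
    """Return the success proportion as a whole-number percentage."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return round(100 * successes / attempts)


if __name__ == "__main__":
    for app, (successes, attempts) in blood_glucose_entry.items():
        print(f"{app}: {successes}/{attempts} = {percent(successes, attempts)} %")
```

The InCheck count of 8/9 rounds to the 89 % quoted at the top of the range.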

Figure 1.

Mobile application data entry tasks. a Diabetes Apps. Log Blood Sugar for all diabetes apps. Participants were provided with a blood sugar reading and were asked to log that blood sugar reading in the App. b Depression Apps.* *Data entry for the depression apps included recording mood (Optimism and T2 MoodTracker) and taking a PHQ9 test (Depression CBT and MoodTools). c Caregiver Apps. Participants were provided with a medication and/or an appointment time and asked to enter the Medication or Appointment depending on the app.

Participants struggled even more with data retrieval from the apps (Fig. 2). Participants often had difficulty retrieving data, such as appointments entered into the caregiving apps. Figure 2 shows the proportion of participants completing data retrieval for each application. Participants completed 79/185 (43 %) of data retrieval tasks across all 11 apps without assistance.

Figure 2.

Mobile applications data retrieval tasks. a Diabetes Apps. Participants were asked to Check Average Blood Sugar for all apps except InCheck.* *Diabetes Data Retrieval – InCheck task was recipe retrieval. b Depression Apps. Optimism & T2 MoodTracker – Graph retrieval of previously entered emotional/mental states. Depression CBT and MoodTools – Retrieval of Audio Meditation talk and an inspirational video. c Caregiver Apps. Medication or Appointment Retrieval.* *Data retrieval for the caregiver apps was not always conducted for participants due to the length of the interview process and the order in which apps were tested, with Capzule always the last app tested. Data retrieval for Capzule was for a Blood Pressure flow chart.

In their spontaneous comments during the exercise, participants expressed three main themes (Table 4). While they expressed interest in using technology for self-management support, they also expressed a lack of confidence in mobile technology use and frustration in attempting to perform self-management tasks using the apps under study.

Table 4. Participant Reflections About Apps for Self-Management

DISCUSSION

Mobile apps have great potential to improve patients’ self-management of chronic diseases. However, overall, the usability of the apps was suboptimal. Patients and caregivers who are the target populations for these mobile health apps struggled to complete basic self-management tasks. This demonstrates the gap between the potential and the reality of mobile health technology for self-management with regard to the population in this study.

Apps developed for patients with chronic illness or family members assisting these patients should be appropriate across a wide age spectrum. Despite this, none of the apps appeared to have simple interfaces with large buttons and easy-to-follow instructions and navigation, which would likely be necessary for engaging a broad age range—and would make the apps relevant for those with lower literacy as well.

A core premise of the apps is that tracking data digitally confers advantages over recording with pen and paper because of the ability of the app to synthesize data. However, current apps’ data retrieval interfaces simply did not work for participants. If they cannot retrieve their own synthesized data effectively, participants cannot realize the benefit of using technology. In general, the apps’ functions were presented as self-evident, without an explanation of why each might be an important activity for monitoring a chronic condition or for caregiving. For instance, the diabetes apps gave no explanation for why a user would want to look back at a prior meal.

The ease of use of these applications would also be greatly improved with more automated features. In particular, all of the apps that were analyzed were stand-alone programs not linked to any other data. There are barriers to development of more integrated apps, but it would clearly improve usability enormously if medical information about visits or medications could be auto-populated from patients’ devices, pharmacies and/or the electronic health record.

Despite its strengths, our study has several limitations. There are hundreds of apps available to help manage diabetes,20 depression, and caregiving. While we reviewed a limited number of apps, they were chosen through expert review of a large number of commercially available apps as representative of the very best. Our sample size, while modest, is comparable with similar studies.40 Direct observation is prohibitively time intensive for larger samples and we did reach thematic saturation. Many patients had limited familiarity with tablet computers. However, given the high prevalence of diabetes and depression among low-income populations, apps need to be developed that are appropriate for even those not well versed in tablet use in order to ensure that health disparities do not widen. Our study participants knew they were being recorded, which can affect observed behavior;41 it is possible that we overestimated the usability of apps because of social desirability bias. We asked participants to imagine how these apps might help them manage their chronic conditions, but it is possible their responses to the usability testing would differ in a different context, for example, if their own care team or the health system provided self-management apps to them. We did not evaluate whether apps used a theoretical framework or construct in their design; this would be an important future step. Finally, we did not assess the efficacy of the apps for improving health outcomes. However, we view ability to interact with each app as a prerequisite to efficacy studies.

These results suggest that there are significant usability barriers for diverse populations with chronic conditions. Patients could often not complete basic yet critical tasks, like entering their glucose levels. This underscores the need for these applications to have better usability. Enormous private investment has entered the mobile health application marketplace, in the hopes that mobile technology can improve chronic disease management and reduce health care costs. Our data suggest that these gains will not materialize unless usability improves significantly. Usability is just one prerequisite for widespread use of mobile apps for health; future studies should examine provider data needs and electronic health record information.

We recommend the following design features to enhance the usability of mobile health apps for diabetes, depression and caregiving: (1) a clear rationale embedded in the design such that participants are reminded of the reason behind each task; (2) use of simple language supplemented by graphics throughout; (3) reducing the number of screens for completion of each task; and (4) reducing manual data entry as much as possible, by integrating with pedometers and glucometers, for example.

Our results demonstrate the unmet need for participatory design, extensive testing, and training with diverse patients.42 Without this type of up-front attention to usability, we would not expect diverse populations to adopt mobile technology. Such formative work should be followed with rigorous evaluation approaches using either randomized trials or quasi-experimental designs that measure a range of implementation outcomes including uptake, use, self-management behaviors, health outcomes, and sustainment.43 If we cannot harness the potential of mobile technology to improve self-management and, ultimately, health, it will be a missed opportunity in efforts to ameliorate health disparities.