1 Introduction

With the emergence of health tracking systems, such as smart watches and smart phone apps that provide continuous health information on demand, there is great potential for systems designed specifically for older adults to monitor their own health. Voice-controlled in-home personal assistants can offer older adults an easier, hands-free way to access their health information. We propose to provide continuous, on-demand health information to independent-living older adults through a voice-assistant system.

Several independent-living older adults at Tigerplace (an independent living facility with tiered levels of skilled care that allow older adults to age in place) have in-home sensors, such as depth sensors, bed sensors, and motion sensors, to track their daily activities and health [1, 2]. The sensor data are analyzed by various algorithms to generate health alerts for clinicians [3]. Studies show that the automated health alert system enhances the registered nurse care coordination model at Tigerplace, with the length of stay of older adults living with sensor systems nearly twice that of older adults living without one [4]. In this study, we explore how these health data and alerts can also be presented in a simpler and more accessible form to older adults and their family members. The voice-assistant system has been designed so that an older adult’s health information can be accessed by the older adult as well as by designated family members. The health information is presented as audio, text, and graphical visualizations. The text messages are developed using linguistic summaries based on trends in the sensor data [5]. This may help older adults manage their own chronic health conditions and follow a healthier aging trajectory.

In this study, we used two voice assistant devices with built-in displays, the Amazon Echo Show and the Lenovo Smart Display, which provide health information through voice responses, text messages displayed on the screen, and data visualization graphs. Considering the health literacy appropriate for an aging population, we created shorter and simpler messages to avoid overloading older adults with information. To deliver useful health information in an easy-to-interpret format, the voice responses were made slower and the data visualization graphs were made simpler.

This paper provides an overview of the development of the two voice assistant systems and a preliminary comparison of their performance in recognizing voice commands. In Sect. 3, we discuss initial older adult input from a focus group study. In Sect. 4, we describe the comparative study of the two voice assistant prototypes for personal health, including a brief description of the voice-assistant platforms, a comparison of the speech recognition capabilities of the two devices, and test scenarios for using the voice-assistant app. In addition, we describe the app development process for the two devices and discuss the advantages and limitations of each platform. Section 5 includes conclusions and future work on voice-assistant systems for older adults.

2 Related Work

Studies have shown that designing user interfaces for older adults brings unique challenges. The information provided through these interfaces must be simple for older adults to understand [6]. Previous studies have explored the human-computer interaction (HCI) challenges in developing different user interface options for older adults and their family members. In [7], Skubic et al. presented the challenges of two user interfaces for consumer health applications. The first provides sensor data information for detecting early signs of health decline, and the second is an interactive remote physical therapy (PT) system that can be used for remote PT sessions between a client and a therapist. The study provides insights for developing interactive user interface systems that can engage older adults effectively in managing their health conditions.

There have been studies on voice-assistant interfaces for older adults [8,9,10,11]. König et al. and Riva et al. have explored custom-designed voice assistants for older adults [8, 9]. Alexenko et al. have discussed the benefits of using voice-assistive technology to control an assistive robot and conducted a speech recognition accuracy test for younger versus older adults [10]. In [11], Schlögl et al. have shown that voice-assistant devices can be used effectively by older adults, though an adequate fallback modality is necessary should errors arise. Several studies have also explored the usability of voice-assistive technologies in different fields of healthcare and health management [12,13,14]. In [12], Carroll et al. designed a routine management system using Amazon Alexa. They found that the system was simple yet effective for individuals with early and middle stage dementia. In [13], Pradhan et al. show the effectiveness of the Amazon Echo voice assistant for users with a range of disabilities.

In recent years, significant growth in natural language processing technologies has enabled the development of several consumer voice-assistant devices, such as Google Home and Amazon Echo [15, 16]. Several studies have developed interactive user interfaces for older adults using these devices [17,18,19,20]. Ma et al. developed a personalized healthcare application using Amazon Alexa [17]. The voice-assistant application provides health information collected through a wearable sensor. The authors also performed a speech recognition accuracy test. In [18], Ennis et al. designed a smart cabinet system that includes an Amazon Echo device and a bathroom cabinet. The system can track objects in the cabinet and provide relevant information in response to a voice command. Their findings show that the system received a positive usability score. However, they highlighted a few limitations of using an off-the-shelf voice assistant, e.g., Amazon Echo cannot proactively speak. In [19], Cheng et al. explored the potential and limitations of a Google Home application for diabetes self-management for older adults, compared with a mobile application. Results show that the participants were inclined towards the Google Home application over the mobile application.

In [20], Choi et al. conducted a study with nineteen older adult participants (age: 65+) over a two-month period to explore the feasibility of using voice-assistant devices to support aging in place. The authors conducted semi-structured interviews to gather the participants’ overall attitudes towards the voice-assistant devices. Results show that the participants had a positive experience with the devices and expressed interest in using this technology as a health management device to keep track of health data such as blood pressure or blood sugar.

3 Initial Feedback from Focus Groups

As an initial study, twenty-three older adult participants (Mean age = 80; 85% female) and five family members (Mean age = 64; 100% female) were recruited to get their feedback on different possible platforms that could be used to show their health information and health messages, or those of their family member [21]. The participants were informed about the health and wellness system that the University of Missouri Center for Eldercare and Rehabilitation Technology (CERT) has developed to track the health of older adults. The system primarily includes three different sensor systems, including a depth sensor system to track gait and fall risk and detect falls, a noninvasive bed sensor system to track heart rate, respiration rate, sleep patterns, and restlessness in bed, and a motion sensor system to track daily activity patterns. Sensor information is accessible via a web portal for the clinical staff in senior housing sites [7]. The participants of this focus group study were then asked about the idea of using a personal health system, specifically designed for older adults and their family members to visualize the same sensor data in a format designed for them.

In this focus group study, different interface platforms were explored. These platforms included smart phones, computers, televisions, tablets, voice-assistant devices, and smart watches. Focus group participants were shown prototypes of smart phones, voice-assistant systems, and tablets that were connected to our research database containing data from an in-home sensor system for older adults living independently at Tigerplace [1]. The participants were informed about different health data presentation options for each platform, which include voice messages, text messages displayed on a screen, and text plus voice messages with data visualization graphs. Their comments were noted for each platform.

In the case of data visualization graphs, the participants preferred a simpler graph that represents their health changes. However, they did not want to forgo important information about their health for simplicity’s sake. Most participants preferred a line graph over bar graphs and over risk meter visualizations that show the health risk level in the form of a thermometer gauge.

Smart phones were the focus group participants’ preferred way to interact with the sensor-generated health information. Smart phone and computer use were both highly preferred, but a combination of technology interfaces was desirable. Both older adult and family member participants wanted options for interacting with health information and receiving health messages, e.g., emergency health alerts sent via text message and other health information accessible via computer. Most participants did not prefer a television as a medium for their health information. Among the voice-assisted options, they preferred a voice assistant with a visual display. However, since investigating the feasibility of a voice-assisted device with this population was not the main aim of the focus group study, our results are limited. In addition, since voice-assisted platforms are relatively new and many participants had no prior experience with them, a more in-depth study was planned to investigate this further.

This initial focus group study helped us to understand the preferences of older adults and family members in receiving their health messages and other health information based on the sensor data. The voice assistant systems described in this paper were designed based on the input received from the focus group participants. The voice assistant platforms show the data in the form of simple line graphs with adequate information and simple text messages that summarize the data trends.

4 Voice Assistant Prototypes for Personal Health

The Google Assistant and Amazon Echo voice assistant platforms were used to conduct this study. Four test scenarios were designed for evaluating the voice assistant systems; their details are provided in Sect. 4.3. A common prototype voice-command app was developed for both platforms based on these test scenarios. The study aims to recruit older adults and family members in dyads for interviews to gather their overall feedback on these voice-assistant systems. In a typical interview session, participants interact with the voice-assistant systems using the test scenario scripts listed in Tables 3, 4, 5 and 6, and provide feedback based on their experience. To make the interaction easier for the participants, we placed a note with the app activation command on top of each device. The activation command can be followed by a set of health questions listed in the scenario tables. Figures 1 and 2 show the voice assistant devices in use during the interview sessions. Speech recognition accuracies are compared for both systems.

Fig. 1. Lenovo Smart Display with built-in Google Assistant showing fall-risk information within the Health System App.

Fig. 2. Amazon Echo Show with built-in Alexa showing fall-risk information within the Health System App.

Fig. 3. Sample data visualization for sleep quality.

Fig. 4. Sample data visualization for fall risk.

4.1 Platforms

In this study, two leading consumer voice-assistant platforms with displays were used: the Amazon Echo Show with a 10-inch display and the Lenovo Smart Display with Google Assistant, which also has a 10-inch display. These platforms were selected because they have comparable displays and offer multi-modal interaction between the voice assistant system and the older adult user. Table 1 shows the physical dimensions and specifications of the Amazon Echo Show and Lenovo Smart Display.

Table 1. Device specifications for Amazon Echo Show and Lenovo Smart Display.

Both devices stay in an always-listening mode once plugged in and activate when they hear a specific wake word. A user asks the voice assistant a question by first speaking the wake word. The words spoken after the wake word are processed, and a voice response is delivered to the user. The built-in display on each device shows the text of the device’s response and, where appropriate, data visualization graphs.

Lenovo Smart Display with Google Assistant.

By default, the wake word for Google Assistant powered devices is “Ok Google” or “Hey Google”. Google Assistant transcribes the question asked by the user and displays it on the screen. Figure 1 shows a Lenovo Smart Display device running the health app.

Amazon Echo Show with Alexa.

By default, the wake word for Alexa powered devices is “Alexa”; however, this can be changed to “Echo”, “Amazon”, or “Computer”. The Amazon Echo device does not transcribe the question text as is done with the Google Assistant device. Figure 2 shows a picture of the Amazon Echo Show display running the health app.

4.2 Speech Recognition

We tested the speech recognition accuracy of both voice assistants using the “repeat after me” feature of Google Assistant and the “copycat” skill on the Amazon Echo Show to determine which platform recognized speech better. Twelve sentences were used in this test, each with a different combination of words. Alexa recognized some of the words incorrectly, while Google Assistant recognized and repeated all the words correctly. Table 2 compares the speech recognition performance of the Amazon Echo Show and Google Assistant. The first column, Amazon Echo (2016), contains results reported in [17]. The second and third columns, Amazon Echo Show and Google Assistant, list the words misinterpreted by the two voice assistants, using test data from six older adults (65+) and four younger adults. These results are preliminary.

Table 2. Speech recognition test results with word-list and misinterpretations.
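
One way to tabulate misinterpreted words from such a repeat-back test is to align the spoken sentence with the text the device repeats and report the word spans that differ. The sketch below is illustrative only: the scoring procedure used in this study is not described here, the function names are hypothetical, the example sentence is not taken from Table 2, and the alignment from Python's standard difflib module is a stand-in technique.

```python
from difflib import SequenceMatcher


def misrecognized_words(spoken: str, repeated: str) -> list[tuple[str, str]]:
    """Return (expected, heard) pairs where the repeated text differs."""
    expected = spoken.lower().split()
    heard = repeated.lower().split()
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, dropped, or inserted span of words.
            errors.append((" ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return errors


def word_accuracy(spoken: str, repeated: str) -> float:
    """Fraction of spoken words that were repeated back correctly, in order."""
    expected = spoken.lower().split()
    heard = repeated.lower().split()
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected) if expected else 1.0


# Hypothetical example: the device hears "full" instead of "fall".
print(misrecognized_words("what is my fall risk today",
                          "what is my full risk today"))
```

Punctuation is not stripped in this sketch, so transcripts would need to be normalized before comparison.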

4.3 Test Scenarios

Test scenarios were developed for two health topics, sleep quality and fall risk, each with responses customized for the older adult and for a family member, giving four scenario scripts in total. The older adult’s query produces a second-person singular response, e.g., you or your. The family member’s query produces a third-person singular masculine or feminine response referring to the older adult, e.g., he or she. Tables 3, 4, 5, and 6 show the test scenario scripts. We also include test scenarios for husbands and wives to query each other’s health data.

Table 3. Script for older adults for sleep quality.
Table 4. Script for family members for sleep quality.
Table 5. Script for older adults for fall risk.
Table 6. Script for family members for fall risk.
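
The switch between second-person and third-person responses can be sketched as a lookup of pronouns keyed by who is asking. This is a minimal illustration, not the prototype's code (which was written in Node.js): the PRONOUNS table, the sleep_summary function, and the message wording are all hypothetical, and the wording is not taken from the scenario scripts in Tables 3 to 6.

```python
# Hypothetical pronoun table: "self" is the older adult asking about
# their own data; "male"/"female" describe the older adult when a
# family member is asking.
PRONOUNS = {
    "self": {"subject": "you", "possessive": "your"},
    "male": {"subject": "he", "possessive": "his"},
    "female": {"subject": "she", "possessive": "her"},
}


def sleep_summary(perspective: str, hours: float) -> str:
    """Build a short sleep message in the requested grammatical person."""
    p = PRONOUNS[perspective]
    subject = p["subject"].capitalize()
    return (f"{subject} slept {hours:g} hours last night, "
            f"which is close to {p['possessive']} usual amount.")


print(sleep_summary("self", 7.5))
print(sleep_summary("female", 7.5))
```

Keeping the pronouns in one table means the same message template serves the older adult, a family member, and a spouse querying the other's data.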

4.4 Ease of Development

In this section, we describe the ease of programming and development on the Alexa and Google Assistant platforms, along with the development methods used to accommodate older adults. While voice-assisted technology may be accessible and easy to use for younger adults, several changes are needed to make it more accessible to older adults.

The user interfaces for presenting health data visualizations to older adults were developed based on the focus group results and recommended guidelines [6]. These guidelines indicate that some fonts are more readable than others: older adults prefer sans-serif fonts and perceive them as more legible than serif fonts. The default font family for our data visualizations is therefore sans-serif.

The same guidelines note that brighter, clearer colors stand out and draw attention, so that less effort is needed to focus on a specific image. Enhancing the contrast on the screen allows a viewer to rely on pre-attentive processes when searching for information: a bright, high-contrast combination of colors draws attention on its own and relieves the user of extra effort in viewing the screen. Because this type of searching helps both older and younger adults, the data visualizations use bright, contrasting colors. As a result, the line in the line graph is easier to see, and the boundaries between regions of the graph are clearly visible to the viewer. The contrasting colors are ones commonly associated with everyday phenomena; for example, a bold red, as on a stop sign, indicates a higher fall risk.

In the earlier focus group study, participants preferred a simpler graph when shown several types of data visualizations. The graphs used in the health app are therefore simple, and the text messages are displayed in a larger size [6].

The health apps on the two devices were configured with the sets of training phrases (Google) and sample utterances (Amazon) listed in Tables 7 and 8. The apps were also trained with follow-up questions. Synonyms and similar-sounding words were included to improve the performance of the app, e.g., mom and mum for mother, and dad for father. More training data will be added to the apps based on the information collected in the dyad interviews.

Table 7. Training phrases (Google)/sample utterances (Amazon) for sleep quality.
Table 8. Training phrases (Google)/sample utterances (Amazon) for fall risk.

Finally, as a third accommodation for older adults, we slowed the speed of the audio response on the voice-assistant devices. Both devices support Speech Synthesis Markup Language (SSML) tags, which modify various aspects of an audio response, and we used them to slow the rate of speech. The user interface development guidelines for older adults indicate that faster rates of speech become more difficult to process with age and that older adults generally favor slower rates of speech. Consistent with these guidelines, the results of our initial focus group study show that older adults preferred a slow speech response.
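
As an illustration of slowing speech with SSML, the sketch below wraps a plain-text message in a prosody element with a rate attribute, which is part of the SSML standard. The function name is hypothetical, and the rate value "slow" is an assumption: the specific rate used in the prototype is not stated here.

```python
from xml.sax.saxutils import escape


def slow_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in SSML that asks for a slower speaking rate."""
    # escape() protects characters such as & and < that would
    # otherwise make the SSML document malformed.
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'


print(slow_ssml("Your fall risk is low."))
# <speak><prosody rate="slow">Your fall risk is low.</prosody></speak>
```

The resulting string would be returned as the spoken part of the response on either platform, in place of plain text.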

Google Assistant.

The Google Assistant app was developed using Google’s Dialogflow API V2 platform, which supports natural language conversations on devices with Google Assistant enabled. The platform can be used to develop voice-assistant applications that carry on a continuous two-way conversation between the Google Assistant device and the user until the user’s intent is fulfilled or the conversation ends. The Dialogflow platform uses “Intents” as unique identifiers that correspond to specific user utterances. Each intent has a set of training phrases, which consist of the many possible variations of a query that share the same intent. Each intent also has a dedicated response, which can be one of several response types [22], and can have a set of follow-up intents. The device sends the user’s utterance to Google Assistant, which routes it to the fulfillment service via HTTP POST requests. The fulfillment for this application was developed in Node.js 8. Several platforms were explored for deploying the fulfillment, including a University of Missouri server, the Inline Editor provided by the Dialogflow platform, and the Heroku web platform. The prototype code was written in the Dialogflow Inline Editor, which is powered by Cloud Functions for Firebase.
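
The request and response exchange with a fulfillment service can be sketched as a function that reads the matched intent from the POST body and returns a text reply. The fields queryResult, intent.displayName, and fulfillmentText follow the Dialogflow V2 webhook format; everything else is an assumption for illustration, including the intent names, the reply text, and the use of Python rather than the Node.js used for the prototype.

```python
# Hypothetical intent names and placeholder replies; the real app
# would build replies from the sensor-data summaries.
REPLIES = {
    "SleepQuality": "Here is your sleep summary.",
    "FallRisk": "Here is your fall risk summary.",
}
FALLBACK = "Sorry, I did not understand that. Please try again."


def handle_webhook(request_body: dict) -> dict:
    """Map a Dialogflow V2 webhook request to a webhook response."""
    query_result = request_body.get("queryResult", {})
    intent_name = query_result.get("intent", {}).get("displayName", "")
    reply = REPLIES.get(intent_name, FALLBACK)
    # Dialogflow speaks or displays the fulfillmentText field.
    return {"fulfillmentText": reply}


example_request = {
    "queryResult": {
        "queryText": "how did I sleep last night",
        "intent": {"displayName": "SleepQuality"},
    }
}
print(handle_webhook(example_request))
```

In a deployment, a web framework or cloud function would parse the POST body as JSON, call a function like this one, and send the returned dictionary back as the JSON response.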

Alexa.

An Amazon Echo application, or “skill”, is composed of “intents”. An intent represents an action that the skill can perform and may contain optional “slot” values that let it carry out more specific tasks requested by the user. To invoke an intent, the user speaks to the device, and the spoken words are matched to an intent via an “utterance”. Utterances are sample phrases, optionally containing placeholders for slot values, that Amazon compares against the recognized words to decide which intent to select. When an intent is successfully determined, the code written to handle that intent is run.
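
The relationship between intents, utterances, and slots can be illustrated with a toy matcher that turns each sample utterance into a pattern and extracts the slot value. This is only a teaching sketch under stated assumptions: Amazon's actual matching is performed by its own language understanding service and is not a pattern match, and the intent names, slot name, and utterances below are hypothetical rather than those in Tables 7 and 8.

```python
import re

# Hypothetical intents with sample utterances; {relation} marks a slot.
# Utterances without slots are listed first so they take precedence.
INTENTS = {
    "SleepQualityIntent": [
        "how did i sleep last night",
        "how did {relation} sleep last night",
    ],
    "FallRiskIntent": [
        "what is my fall risk",
        "what is the fall risk for {relation}",
    ],
}


def utterance_to_regex(utterance: str) -> re.Pattern:
    """Turn a sample utterance into a regex with one group per slot."""
    parts = re.split(r"\{(\w+)\}", utterance)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal words
        else:
            pattern += f"(?P<{part}>.+?)"    # slot placeholder
    return re.compile(f"^{pattern}$")


def match_intent(spoken: str) -> tuple:
    """Return (intent name, slot values) or (None, {}) if nothing matches."""
    text = spoken.lower().strip()
    for name, utterances in INTENTS.items():
        for utterance in utterances:
            found = utterance_to_regex(utterance).match(text)
            if found:
                return name, found.groupdict()
    return None, {}


print(match_intent("How did mom sleep last night"))
```

A slot value such as "mom" would then be normalized through the synonym list (mom and mum for mother) before the skill looks up the corresponding data.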

Development for Amazon Alexa was done in Node.js version 8.10. Amazon’s Alexa Presentation Language (APL), a JSON-based format for describing visual content such as images, was used alongside Node.js to present full-screen data visualizations on the Echo Show devices; at the time of prototype implementation, APL was in a public beta release. Amazon’s AWS Lambda service hosted the code required for the Alexa skill. The code was uploaded to Lambda as a zip file containing the Node.js code that handles each intent and the modules that the code imports.
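
The shape of a Lambda-hosted intent handler can be sketched as a function that inspects the incoming Alexa request and returns the response envelope. The request fields (request.type, request.intent.name) and the response fields (version, outputSpeech, shouldEndSession) follow the Alexa custom skill JSON format; the intent names and speech text are hypothetical, the APL directive for the full-screen graph is omitted, and Python is used here for illustration even though the prototype was written in Node.js.

```python
# Hypothetical intent names and placeholder speech text.
SPEECH = {
    "SleepQualityIntent": "Here is your sleep summary.",
    "FallRiskIntent": "Here is your fall risk summary.",
}


def build_response(text: str, end_session: bool) -> dict:
    """Wrap text in the Alexa response envelope, spoken at a slow rate."""
    # The text is hard-coded above, so it is not XML-escaped here.
    ssml = f'<speak><prosody rate="slow">{text}</prosody></speak>'
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context: object) -> dict:
    """Entry point that AWS Lambda calls with the Alexa request."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_response("Welcome. Ask about sleep or fall risk.", False)
    if request.get("type") == "IntentRequest":
        name = request.get("intent", {}).get("name", "")
        if name in SPEECH:
            return build_response(SPEECH[name], True)
    return build_response("Sorry, I did not understand that.", False)


example_event = {
    "request": {"type": "IntentRequest", "intent": {"name": "FallRiskIntent"}}
}
print(lambda_handler(example_event, None)["response"]["outputSpeech"]["ssml"])
```

Keeping the session open after the launch request and the fallback lets the user ask a follow-up question without repeating the activation command.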

4.5 Discussion

While the idea behind the Echo Show and the Lenovo Smart Display is the same, each has advantages and disadvantages compared with the other. For example, we have shown that Google Assistant scores better than Alexa in speech recognition tests. However, Google Assistant currently cannot display full-screen images as Alexa can [22]. For an application that requires an image to be easily seen or displayed across the whole screen, working with Amazon’s Alexa may prove easier. In addition, Google Assistant currently cannot change its wake word to anything other than the defaults “Ok Google” and “Hey Google”, while the Alexa wake word can be changed from “Alexa” to “Computer”, “Amazon”, or “Echo”.

So far, we have interviewed four dyads, each consisting of a person aged 65+ and one of their family members. In these preliminary interviews, some of the older adult participants preferred the Google Assistant because they found its voice more natural. Others preferred the Amazon Echo because they found the wake word “Alexa” easier to use than “Hey Google”. They also preferred the larger graphs on the Amazon Echo Show to those on the Google Assistant device. Thus far in this dyad interview study, all eight participants liked the technology and said that they would like to use it.

Although preliminary, the study illustrates the potential of voice assistant platforms as a user interface for older adults, as well as the tradeoffs between the two platforms investigated. The preliminary results show that our target users are enthusiastic about the voice assistant technologies as a healthcare information interface.

5 Conclusion and Future Work

In this study, we have developed a voice-assistant app for the Google Assistant and Amazon Echo platforms, based on feedback from an earlier focus group study and on previous literature. The applications can provide on-demand health information, such as sleep quality and fall risk, to independent-living older adults and their family members. Four test scenarios were designed to gather feedback from older adults and their family members on the usability of the voice-assistant devices for managing and tracking health.

The speech recognition capabilities of the two voice assistant devices were also compared. Preliminary results show that Google Assistant performs better than Amazon Echo in accurately recognizing speech.

Currently, we are actively recruiting dyads, each consisting of a person aged 65+ and one of their family members, to test and give feedback on the two devices and the health app.