
1 Introduction

Managing the costs of dementia is a pressing problem facing many countries around the world [29]. In the United States alone, an estimated 5.4 million Americans suffer from Alzheimer’s Disease (AD), the most common cause of dementia today. The cost of managing AD is estimated to be as high as $236 billion a year, and this number is projected to continue to grow. By 2050, the number of AD sufferers is expected to reach 13.8 million, costing the country $1 trillion [3]. Early detection of dementia can help patients, their families, and caregivers better prepare for the disease and improve the overall quality of life [9, 28]. The stage preceding dementia is termed mild cognitive impairment (MCI) [22]. Since age is the greatest risk factor for developing dementia, making screening for MCI more accessible to the larger older adult population is a priority.

The current approach for diagnosing dementia and MCI relies on accurate characterization of everyday functioning. Currently, self-reported and family-reported questionnaires are used to identify functional ability [15, 19]. Questionnaires have numerous advantages, including efficiency and low cost. However, they also have numerous drawbacks. Questionnaire data reflect one’s appraisal of his or her ability to perform daily activities in the natural context. Limited insight, bias, and/or cognitive dysfunction may compromise the validity of self-reports [24, 31]; consequently, caregiver-reports are generally preferred. However, some adults are not comfortable asking a relative to report on their functioning, and many do not have a knowledgeable or healthy informant. As of 2014, over a quarter (28%) of non-institutionalized adults over age 65 lived alone, and the proportion of elders living alone increases with advanced age, when functional difficulties may be more likely to emerge (e.g., 46% of women aged 75 and over lived alone). Even older adults who live with a close family member may not be observed when performing daily activities, particularly when functional difficulties are mild and do not disrupt independence. Additionally, older adults differ in their current and past daily routines, which may be problematic when assessing functioning with questionnaires. For example, a person who typically performs only a few simple tasks (e.g., making a sandwich for one person) throughout the day may be judged by an informant to be less impaired than a person who is consistently required to perform more complex tasks (e.g., preparing a full-course meal for a large family).

Performance-based tests address some of the drawbacks of questionnaires by recording participant behaviors while they perform standardized daily tasks in the laboratory [17]. The major limitation of performance-based tests is that they are time-consuming and require extensive training to score, particularly when identifying subtle difficulties. This paper describes our initial experiences in developing a wearable computing-based system to address the limitations of the Naturalistic Action Test (NAT) and other performance-based tests. Our system consists of three components: (1) a smartwatch that collects accelerometer and gyroscope data and synchronizes them with the captured video; (2) a video annotation system that allows clinicians to record notes while viewing the captured video; and (3) an analytical toolkit to sift through the sensor readings and identify features based on clinician-supplied input parameters. Our objective is to eventually help the clinician streamline the analysis of NATs by processing the smartwatch-collected sensor values to identify episodes that resemble errors.

2 Related Work

The popularity of smartphones and smartwatches has led to a new approach for monitoring human activity. These devices come with sensors that can detect, among other things, ambient light, acceleration, and orientation. While these sensors cannot directly determine what the user is doing, there has been considerable research into algorithms that use the sensor data to infer human activity [2, 14, 18]. Most existing work has focused on recognizing common physical activities such as walking, jogging, and biking [6, 7, 13, 16, 25]. More recent research has considered recognizing more complex activities such as eating [11, 27] and smoking [23]. Our work differs from existing research in that we focus on identifying errors rather than activities. Compared to activities such as walking, errors are more difficult to quantify and are more personalized.

Fig. 1. Photograph of a participant performing the NAT (breakfast preparation task). The left picture shows the start of the task. Note the distractor objects (ice-cream scoop, paint brush, and salt shaker) on the table. The right picture shows the participant using his dominant arm to reach for the coffee. Participants are instructed to use their dominant arm as much as possible while carrying out the task.

Another related area of research is the use of cameras to infer human activity [5, 20, 21]. Rather than inferring human activities from wearable sensors, this approach algorithmically classifies human activities from captured camera images. More recent work in this area has expanded beyond traditional cameras towards 3D cameras such as the Microsoft Kinect, which can capture additional depth information [8, 30]. We avoided using cameras for detection because of their lack of flexibility. The advantage of wearable devices is that they can be carried or worn unobtrusively on the person virtually all the time, unlike camera-based systems, which can only record activities in a specific physical space.

The ubiquitous computing approach to human activity recognition is a holistic one that integrates different types of sensors and cameras [1, 10, 12]. Work on “smart table” systems [4, 26, 32], for instance, which are used to track diet, could also be repurposed to monitor NATs. However, ubiquitous computing systems often require embedding sensors directly into the environment, which is expensive to deploy in practice. Our approach, on the other hand, uses consumer wearable devices such as a smartwatch, which are readily available.

3 Background

The NAT is a standardized, performance-based measure of everyday functioning that requires participants to complete common tasks of increasing complexity with little guidance from the examiner. The clinician can identify and quantify the severity of a participant’s cognitive impairments based on his or her actions while carrying out the task. Participants with cognitive impairments have been shown to commit more errors while completing the NAT than participants without cognitive impairment. A wide range of error types have been reported, including overt errors such as inaccurate task sequencing, use of distractor objects, and omission of task steps. In people with milder cognitive impairment, micro-errors have been observed and shown to correlate with performance on cognitive tests. Micro-errors are more subtle than overt errors and include misreaching toward distractor objects and hesitations before reaching for target objects (see Table 1).

Table 1. Summary of errors used in NAT analysis

An example of a NAT task is to pack a lunch bag, with the necessary objects (e.g., thermos lids) and distracting objects (e.g., spatula) distributed evenly on a large table. Figure 1 is a picture of a participant performing a breakfast preparation task involving making toast and coffee. A typical NAT session involves the participant sitting at a table with the objects necessary to complete the task, as well as additional objects serving as distractors. For example, a coffee-making task will include the coffee powder, milk, sugar, etc. within arm’s reach on the table, as well as distractor objects such as a salt shaker and a spatula that are not part of completing the task. The clinician gives the participant the task instructions ahead of time; the complete instructions are shown in Fig. 2. The entire task is recorded on video for later analysis of the participant’s performance.

Fig. 2. Verbal instructions to participant.

The NAT video is scored by trained coders who are blind to participant details (e.g., group membership, test scores). The video is scored for overt errors and micro-errors using standard scoring procedures, which focus on cognitive failures during the task and do not penalize participants for clumsiness or poor dexterity. Overt errors are grouped according to a widely published taxonomy that includes omissions, commissions, and action additions. Micro-errors include imprecise actions that do not reach the level of an overt error. Table 1 summarizes these overt and micro-errors.

A key advantage of using NATs is that the tasks are sufficiently commonplace to be familiar to people across different socio-economic backgrounds, thus allowing a well-developed NAT to be used with a wide range of participants. However, the entire analysis process is both time-consuming and labor-intensive. Each video needs to be reviewed by multiple trained scorers, who then need to arrive at a consensus on the outcomes. Our system seeks to simplify this process through the use of a smartwatch and an analytical toolkit.

4 System Design

Our system consists of three components: a smartwatch component to collect accelerometer and gyroscope orientation information; a video annotation program to facilitate the annotation and segmentation of the videos; and an analytical toolkit to process the resulting data.

4.1 NAT Workflow with Wearables

At the start of the experiment, the examiner synchronizes the smartwatch and video recorder to the wall-clock time. The participant is instructed to wear the smartwatch on his or her dominant wrist throughout the duration of the task. We also place a brightly colored bracelet with the words “DO NOT USE” printed on it on the non-dominant hand to remind the participant not to use that hand. The NAT was initially designed for stroke populations and was developed to be completed using only one hand, in order to accommodate people with hemiparesis following stroke. At the end of the experiment, the data from the smartwatch is extracted and archived.

Fig. 3. Division of the breakfast preparation task into sub-tasks.

The video annotation program is used to divide a NAT task into smaller sub-tasks. Figure 3 illustrates the division of the breakfast preparation task into sub-tasks. The coders use the video annotation program to identify the start and end times for each sub-task. Since both the recorded video footage and the smartwatch are synchronized to wall-clock time at the beginning of the experiment, we can associate each sub-task with the corresponding accelerometer and gyroscope data for the duration of that sub-task. This annotated data is fed into the analytical toolkit for later analysis.
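Concretely, this association reduces to a simple interval lookup over timestamps. The following Python sketch illustrates the idea under our own naming; the `SubTask` record and `samples_for_subtask` helper are hypothetical and not part of the paper’s toolkit.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One smartwatch sample: (wall-clock timestamp in seconds, x, y, z).
Sample = Tuple[float, float, float, float]

# A hypothetical record for one annotated sub-task; the start and end
# times come from the coder's annotations of the synchronized video.
@dataclass
class SubTask:
    name: str     # e.g. "Add Jelly"
    start: float  # wall-clock start time (s)
    end: float    # wall-clock end time (s)

def samples_for_subtask(subtask: SubTask, samples: List[Sample]) -> List[Sample]:
    """Select the smartwatch samples recorded during a sub-task.
    This works because watch and video share one wall-clock reference."""
    return [s for s in samples if subtask.start <= s[0] <= subtask.end]
```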

Fig. 4. Example of configuration file for the analytical toolkit.

The analytical toolkit uses the smartwatch data to identify videos, or segments of videos, that are suggestive of errors. The toolkit allows the coder to specify parameters to identify segments of interest. Figure 4 shows an example of the configuration file where the coder wants to determine the number of pauses, and the length of each pause, that occurred within each sub-task. The output also includes the timestamp of the video segments that correspond to each pause event so that the coder can return to the original video to review as needed.
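To give a flavor of the kind of parameters involved, a configuration for the pause analysis might look like the following sketch. The key names here are invented for illustration; the actual file format is the one shown in Fig. 4.

```python
# Hypothetical toolkit configuration for counting pauses per sub-task.
config = {
    "feature": "pause",          # which feature to extract
    "magnitude_threshold": 1.0,  # arm is considered static below this magnitude
    "min_pause_duration": 0.5,   # seconds a static run must last to count
    "per_subtask": True,         # report results separately for each sub-task
    "report": ["count", "durations", "video_timestamps"],
}
```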

4.2 Associating Smartwatch Data with Errors

Rather than attempting to match the accelerometer and gyroscope data directly to the errors listed in Table 1, our approach is to identify features that are suggestive of errors. To better understand errors and how they relate to smartwatch data, we examined the collected videos to determine possible features that can be identified from the smartwatch data.

Fig. 5. Example of a micro-error from the “Add Jelly” subtask. Part (a) shows the participant adding jelly to the bread. In part (b), the participant is done adding jelly, and the next step is to replace the cap onto the jelly jar to complete the subtask. Part (c) shows the participant’s hand reaching towards (but not touching) the jelly jar instead of the cap, and part (d) shows the participant quickly withdrawing his arm when he realizes this is a mistake. Parts (e) and (f) show the participant performing the correct action of picking up the cap and placing it onto the jelly jar.

Figure 5 shows several frames of a video segment of a participant with MCI performing the NAT lunch sub-task of putting jelly onto toast. In Fig. 5(a) the participant is adding jelly to the toast; in (b) he places the knife down on the plate; and in (c) he starts to reach towards the jelly jar. This action was coded as a micro-error because, as shown in (d), even before touching the jelly jar, the participant quickly redirects his reach towards the jelly jar lid. In (f), he places the lid onto the jelly jar. The micro-error depicted in (c) is characterized by a sudden and sharp arm movement from one object to another. Recall that a micro-error is an imprecise action that does not reach the level of an overt error, as described in Table 1. From our preliminary observations of videos of participants with and without MCI performing NATs, we have identified two features that might be indicative of micro-errors.

The first feature is pauses, where the participant’s hand remains stationary in the air in the middle of completing a task. This could indicate a participant’s hesitancy about completing a sub-task (e.g., does salt or sugar go with coffee?). The mobility of the arm can be measured by computing the magnitude of the x, y, and z-axis accelerometer readings, \(\sqrt{x^2 + y^2 + z^2}\). A pause is a consecutive period of time during which the arm remains static, determined by whether the magnitude of the accelerometer data stays below a user-specified threshold value \(\tau \), i.e. \(\sqrt{x^2 + y^2 + z^2} < \tau \).
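As a concrete illustration, the following Python sketch implements this pause definition. It is not the paper’s toolkit code, and it assumes the readings are linear acceleration (gravity removed), since otherwise the magnitude for a stationary arm would sit near 9.8 m/s² rather than near zero.

```python
import math
from typing import List, Optional, Tuple

# One smartwatch sample: (wall-clock timestamp in seconds, x, y, z).
Sample = Tuple[float, float, float, float]

def find_pauses(samples: List[Sample], tau: float,
                min_duration: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) intervals during which the accelerometer
    magnitude stays below tau, i.e. the arm is considered static."""
    pauses: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for t, x, y, z in samples:
        if math.sqrt(x * x + y * y + z * z) < tau:
            if start is None:
                start = t                       # a static run begins
        else:
            if start is not None and t - start >= min_duration:
                pauses.append((start, t))       # a long-enough run ends
            start = None
    if start is not None and samples[-1][0] - start >= min_duration:
        pauses.append((start, samples[-1][0]))  # run extends to end of data
    return pauses
```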

The second feature is the presence of sudden movements, where the participant’s hand abruptly accelerates in the same or opposite direction. This could indicate instances where the participant realizes a mistake before completing an action, e.g., initially reaching for the ice-cream scoop (instead of the spoon) but self-correcting before actually touching it. We flag a sudden movement when the difference between consecutive x- or y-axis accelerometer values exceeds a user-supplied threshold. We do not consider the z-axis values because a fast downward movement may be common when picking up objects.
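A matching sketch for this feature, reusing the `Sample` type from the pause example above, could look as follows (again our own naming, not the toolkit’s):

```python
def find_sudden_movements(samples: List[Sample], delta: float) -> List[float]:
    """Return timestamps at which consecutive x- or y-axis accelerometer
    readings jump by more than delta. The z-axis is ignored because a
    fast downward motion is normal when picking up objects."""
    events: List[float] = []
    for (_, x0, y0, _), (t1, x1, y1, _) in zip(samples, samples[1:]):
        if abs(x1 - x0) > delta or abs(y1 - y0) > delta:
            events.append(t1)
    return events
```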

4.3 Preliminary Results and Discussion

We tested our system on two older adults, OA2 and OA7, performing the NAT lunch task. One of the older adults (OA2) has mild cognitive impairment (MCI), whereas the other (OA7) does not. Figure 6 shows the demographics of the two older adults, as well as the differences in pauses and sudden movements based on the smartwatch data. The Mini Mental Status Exam (MMSE) is a test of global cognitive status given to the older adults; both OA2 and OA7 performed within the healthy range on this test. The Functional Assessment Questionnaire (FAQ) is a self-reported measure of functioning, where a higher score indicates more functional problems; both older adults scored within the normal range on the FAQ.

Fig. 6. Preliminary results from smartwatch data.

As shown in Fig. 6, the human error coding showed that the older adult with MCI committed 1 overt error and 9 micro-errors, whereas the healthy older adult made no overt or micro-errors. Data from the watch revealed that the older adult with MCI had approximately 50% more pauses than the healthy older adult, as well as 17% more sudden movements. The thresholds for pauses and sudden movements were an accelerometer magnitude of 1.0 and a difference in acceleration of 0.5 m/s², respectively.

Our preliminary findings indicate that the smartwatch can capture data from a NAT that can be indicative of MCI. However, we also identified two open issues. The first is the need to develop methods to individualize the parameters and thresholds in the toolkit. Older adults differ in the speed and dexterity of their physical movements, which may not be directly related to their cognitive abilities, yet the pause and sudden-movement features are sensitive to these physical differences. One of our future aims is to explore ways of adjusting the parameters based on a participant’s physical capabilities. One approach is to include a profiling stage in the NAT to capture sufficient data to adjust the parameters. The second open issue is to identify a better method for accurately identifying episodes of overt and micro-errors. From Fig. 2, we see that the instructions to the participants are fairly generic. This means that participants’ arm movements may differ because they are completing a subtask differently from other older adults (e.g., adding two scoops of sugar rather than just one). Such individual differences may not be indicative of any difference in cognitive state. One approach we are considering is a finer-grained analysis that examines a few specific sub-tasks, rather than the entire NAT trace, to identify errors.

5 Conclusion

Early detection of dementia is an important problem for many countries, especially those with aging populations. Current methods of detection are not scalable to large populations. This paper describes a system that uses consumer smartwatches to capture data from NATs to facilitate the identification of MCI among older adults. Preliminary experiments indicate that the approach is promising.