Introduction

As the population ages, more individuals will be diagnosed with dementia. The World Health Organization (WHO) estimates that currently nearly 50 million people have dementia worldwide with approximately 10 million new cases being diagnosed annually [1]. With the progression of the disease, at some point persons with dementia (PwD) are unable to live alone and, thus, the PwD would require assistance in taking care of their affairs. At this stage of dementia, a caregiver is necessary. This care can be provided by an institution which caters to the elderly, or by a family member or a close friend. Informal caregiving is defined as unpaid, ongoing assistance with activities of daily living to a person with a chronic illness or disability, such as dementia [2,3,4]. For African Americans, the inclination is to provide care to their own loved ones in lieu of placing their family member into structured institutions [5].

Caregivers have identified PwD agitation as a major concern. Because agitation is the foremost issue that caregivers of PwD face, it is important to explore a technological solution that can distinguish between episodes of agitation and non-agitation and forecast agitation events using data captured prior to, during, and after an agitation event. Due to the sporadic nature of PwD agitation, it has been difficult for researchers to pinpoint when an episode may occur. However, our hypothesis is that a link may exist between agitation and the ambient environment. In a nursing home setting catering to dementia patients, a study by Van Hoof identified bright lights as one of the causes of both restlessness and increased frequency of agitation [6]. Furthermore, Joosse suggests that heightened sound levels correlate significantly with PwD agitation in a nursing home environment [7].

This research extends the proposal by Goins et al. [8], which recommended a data-driven model based on deep learning algorithms and a combination of ambient environment data, behavioral data from both members of the dyad, and physiological data from the PwD to determine the occurrence of agitation. The research presented in this paper strives to forecast upcoming agitation accurately. As is often the case with health-related studies, class imbalance in the dataset (i.e., significantly more instances of one class than of another) presents a challenge when applying data-driven methods. For instance, one would expect more non-cancer than cancer results when screening mammograms for breast cancer, fewer patients with poor arteries than with healthy ones when assessing the risk of heart attack, and fewer positive than negative cases of COVID-19 when swabbing the general public. In this study, this problem is addressed by down-sampling the data of the majority class (i.e., non-agitation) while retaining all of the data associated with agitation events.

In this paper, a data-driven deep learning methodology is presented that distinguishes between agitation and non-agitation time segments using data collected within the home of a single dementia dyad from the Behavioral and Environmental Sensing and Intervention for Caregiver Empowerment (BESI) study [9,10,11,12]. Here, five environmental factors (i.e., light, audio, temperature, humidity, and air pressure) were collected and validated against the ground truth record of agitation events as reported by the caregivers. The BESI research effort consisted of three phases: (1) designing and testing the use of a tablet to collect PwD agitation information from a select few caregiver homes, and verifying a system for collecting environmental data from the home and behavioral data from the PwD; (2) collecting and analyzing data from 12 dementia households to ascertain which ambient features were most linked to agitation; and (3) providing an intervention system that detected PwD agitation and notified the caregiver of that agitation, along with carefully aligned offers of interventions to shorten the length and decrease the severity of the detected agitation events.

Here, we implement a data-driven approach to forecast an upcoming agitation in PwD based on environmental stimuli. To this end, we use data from one of the dyads in the BESI project which was collected from 5 active relay stations for 64 days. Along with caregiver reports of PwD agitation, the data include ambient acoustic noise level, illuminance, environment temperature, atmospheric pressure, and humidity level. We preprocess the data by estimating the missing values, followed by standardization and normalization. Then, we implement and train two deep learning models without reducing dimension in the data. We also apply Principal Component Analysis (PCA) to reduce the dimensionality and train and optimize two other models based on PCA’s output.

The remainder of this paper is organized as follows. The Related Work section provides an overview of the applications of data-driven methods in the medical domain. The Data-Driven Modeling section discusses data-driven methods to assess caregiver burden. We then present a case study on data-driven forecasting of agitation in PwD. The paper closes with the Results and Discussion section, followed by the Conclusion.

Related Work

Family caregivers of PwD often experience extreme stress and depression, issues of declining physical and psychological health, financial problems, and limited personal space [13,14,15]. In addition to declines in emotional and physical health, the colossal effect of social isolation on informal PwD caregivers is a valid concern [16]. One of the most persistent difficulties faced by caregivers, though, is the angst associated with challenging PwD behaviors [17]. Agitation is one of the major challenges that the caregiver faces, and PwD agitation follows no set pattern. As a result, the caregiver must be prepared at all times so that these events can be handled properly. Traditionally, agitation-related assessment and caregiver interventions are discussed with caregivers in medical settings. To the best of the authors’ knowledge, no agitation prediction model has been reported in the literature.

PwD agitation can be defined by three distinct characteristics: physical, verbal, and psychotic behaviors [18]. Aggressive behaviors include activities which could cause physical harm, such as hitting, kicking, pushing, biting, and scratching; destruction of property; and vocal activities such as talking loudly and using inappropriate or threatening language [18, 19]. Nonaggressive behaviors include pacing, wandering, and other general restlessness, as well as hiding items [17]. Physical agitation behaviors displayed by the PwD, especially those involving aggression, are the most burdensome for familial caregivers. A study on PwD patients concluded that negative PwD behavior, found in over half of the patients, not only interrupts patient care but also frustrates caregivers [20]. The literature confirms that angry and aggressive PwD behaviors are predictors of caregiver depression.

It has been recommended that managing PwD disturbing behaviors should begin with nonpharmacologic interventions to improve behaviors [21]. This includes providing a safe environment, eliminating conditions in the environment that could cause stress (e.g., reduce TV sound, increase lighting, etc.) and identifying agitating or frightening situations. Interventions include redirecting the PwD, establishing appropriate sleep schedules, and developing a reward system for positive behavior [21]. Increased social support has proven to benefit the caregiver by reducing their burden [22]. Moreover, social support has been identified as a moderator of the connection between negative PwD behavior and caregiver’s depressive moods; however, social support mechanisms are not consistent across all caregivers of PwD [22, 23]. Likewise, a novel method, simulated presence therapy (SPT), can be used to reduce agitation. Here, during an agitation episode, presence of a celebrated family member is simulated by playing a recording of the family member for the PwD [24].

Data-driven technology has shown promise in many areas of medical research and patient care. Moreover, due to the abundance of available data emanating from a myriad of sources, many health-related research endeavors have benefited from deep learning architectures which make use of this plethora of data. Data-driven technology has also been used in the medical environment. For instance, using electronic health records of patients with major depressive disorder (MDD), two collaborating North Carolina universities (Duke and UNC-CH) applied a data-driven approach in creating a visual representation of data from past and present patients with similar diagnosis to successfully assist physicians in decision support [25].

In Japan, where the elderly population is the highest in the world, past check-up data (waist circumference, body mass index, systolic and diastolic blood pressure, etc.), history of medications for hypertension, diabetes, and dyslipidemia were collected from pre-elderly patients (younger than 60 years old). This abundance of data along with information on previous recommendation for candidacy for a health-guidance was used to identify regular health guidance candidates. Using ensemble learning (in particular, Gradient-Boosting Decision Tree or GBDT), the results outperformed the baseline approach in the AUC by over 40%, resulting in an accuracy of 99.3% with a confidence interval of 0.993 [26].

In another study on screening and prediction of mental health, data from ubiquitous sensors, social media, and healthcare systems were fused for digital phenotyping applications. Four challenges of this data-driven approach were noted: heterogeneity, volume, noise, and sparse data [27]. Depression has been studied using a data-driven approach, determining trajectory groups based on more closely spaced (i.e., weekly) severity ratings [28]. A sleep study found that the effects of insomnia may not be limited to sleep complaints, suggesting that a data-driven approach including other factors such as quality of life, demographics, dysfunctional beliefs, and childhood trauma should be considered [29]. Data-driven methodology was also used as a complementary method in determining the food intake of adolescents, with data-driven methods revealing some characteristics that were not present in the hypothesis-driven method [30]. In the treatment of acute inflammation, the medicine dosage designed by data-driven methods increased survival rates from 73% to 88%, an outcome indicating that data-driven models can contribute to personalized treatment [31]. In another study, deep learning models were utilized in the risk assessment and detection of diabetic foot ulcers [32].

Finally, machine learning and deep learning models have been applied successfully to medical diagnostics and screening procedures. In a lung cancer study, a convolutional neural network was trained to predict carcinoma in whole slide images. The results of the network trained on four separate datasets were within the range of 97% for the area under the curve [33]. In a similar lung cancer study, lung nodules were classified using deep learning algorithms: the convolutional neural network yielded an accuracy of almost 90% compared to 85% for traditional computer-aided diagnostics (CADx), the deep belief network slightly exceeded CADx, and the stacked denoising autoencoder only slightly trailed CADx in accuracy [34]. A similar study using computed tomography (CT) and positron emission tomography (PET) scans of 14 patients with varying stages of esophageal cancer was conducted to determine the need for surgery. Traditional machine learning methods, which rely on manually extracted features, were compared with convolutional deep learning methods, which extract their own features. When traditional and deep learning techniques were used in unison, 100% accuracy was achieved, whereas an accuracy slightly less than 93% was obtained using only traditional machine learning methods [35].

Data-Driven Modeling

Environmental data provide insights for assessment of caregiver’s burden. For example, some environmental events such as dimming light or increased noise levels might correspond to triggering the agitation in the PwD. Finding these events which are correlated with agitation and their impact on caregiver burden is of interest.

Monitoring the dyad’s environment has been made achievable and affordable with the latest advances in digital technology. With the popularity of Internet of Things technology, a PwD is already living in a smart home or the home can feasibly be converted into a smart home. There are plenty of sensors within a smart home which actively measure different types of variables including climate (humidity level, atmospheric pressure, temperature), illuminance, and sound levels. Alternatively, personal devices such as smart phones and tablets can be utilized. Miniaturized devices such as smart phones and wearable devices in close proximity to the dyad can be utilized for capturing biomarkers.

Assessment of Caregiver Burden

While the caregiver is the first person to notice caregiver burden, other means such as surveys, conversations with family members and friends, a caregiver support group, and the medical team assigned to the PwD can be employed to ascertain the caregiver burden. However, the description of feelings that an individual provides is highly qualitative and subjective. Hence, it is difficult to develop a comprehensive framework to adequately measure caregiver burden. Data-driven methodologies can be employed to address this challenge.

Data-Driven Assessment

Different sources of data can be utilized to ascertain caregiver burden. For instance, psychometric surveys are administered to the caregiver and summarized to create a subjective synopsis of the experiences of everyday caregiving. Similarly, moments of outburst and tranquility can be ascertained using movement data collected by wearable devices worn by the PwD. Determining the appropriate tools for caregiver burden assessment is important because caregiver burden has many possible causes. These tools range from subjective caregiver input to sensors and other smart home devices.

Case Study: Data-Driven Forecasting of Agitation in PwD

Data Preprocessing

The data utilized in this paper were acquired from the BESI project, as explained in the Introduction. Three sets of environmental data were collected by each relay station placed in the home. Collected environmental data consisted of light, audio, and interior weather conditions. For this phase of the study, each of the environmental sensors gathered data throughout a time period of approximately two months. The intensity of light surrounding the sensor was measured in units of lux. For audio data, decibel levels were collected by the relay stations indicating sound intensity and not the actual voices or words from individuals. Lastly, weather data such as temperature, humidity, and barometric pressure were obtained using the relay stations. A time stamp was taken along with each data value.

The dyad chosen for this work had five relay stations dispersed throughout the home. Data were collected for a span of exactly 64 days. The raw data are provided in different segments, each of which contains a timestamp indicating the start and duration of that segment. Each segment contains data in records which have a time offset in seconds. The sum of a segment’s timestamp and a record’s time offset gives the absolute time of that record. In BESI, the measuring devices, i.e., relay stations, were installed in different locations of the dyad’s house to gather a comprehensive environmental reading. The floorplan of the house in which these relay stations were installed is available; however, since the location-specific data of agitation incidents are unknown in most cases, we have averaged the data from the relay stations.

An Exploratory Data Analysis (EDA) showed that (1) the sampling rate varied from segment to segment; and (2) there are some time windows with no available data. To address the inconsistent sampling rate, we calculated the time difference between two consecutive segments based on their timestamps and then linearly interpolated the content of the first segment to match that time difference. While linear interpolation could also be applied to the missing data problem, a missing period may be long enough that linear interpolation alone yields inconsistent estimates. Therefore, assuming that each sensor’s data follow the same pattern throughout the day, sensor readings from the days before and after the missing value can be leveraged to estimate it. The x-axis in Fig. 1 shows the days while the y-axis shows the seconds in each day. Therefore, the last record of each day is connected to the first record of the next day. Assume that at time t, the sensor reading of the ith day and jth second is missing. The last sensor reading before t is a and the first sensor reading after t is b, which are d1 and d2 seconds before and after t, respectively. The sensor readings at the jth second of the days before and after, if available, are c and d. The missing reading at t can be estimated using Eq. 1. The proof of Eq. 1 is provided in the appendix. Figure 2 shows a snapshot of temperature data before and after applying Eq. 1.

Fig. 1 Reference points for the estimation of missing values

Fig. 2 Left: before applying Eq. 1, missing values are filled with zeros. Right: after applying Eq. 1

$$t=\begin{cases}\dfrac{d_{1}\left(b+c+d\right)+d_{2}\left(a+c+d\right)}{3\left(d_{1}+d_{2}\right)} & \text{if both } c \text{ and } d \text{ are available}\\[6pt] \dfrac{d_{1}\left(b+c\right)+d_{2}\left(a+c\right)}{2\left(d_{1}+d_{2}\right)} & \text{if only } c \text{ is available}\\[6pt] \dfrac{d_{1}\left(b+d\right)+d_{2}\left(a+d\right)}{2\left(d_{1}+d_{2}\right)} & \text{if only } d \text{ is available}\\[6pt] \dfrac{a d_{2}+b d_{1}}{d_{1}+d_{2}} & \text{if neither } c \text{ nor } d \text{ is available}\end{cases}.$$
(1)
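As an illustration, Eq. 1 can be written as a small function. The sketch below is not the code used in the study; the function name `estimate_missing` and its argument names are chosen here for illustration. It relies on the observation that each case of Eq. 1 is algebraically the plain average of the time-weighted linear interpolation between a and b and whichever of c and d are available.

```python
from typing import Optional


def estimate_missing(a: float, b: float, d1: float, d2: float,
                     c: Optional[float] = None,
                     d: Optional[float] = None) -> float:
    """Estimate a missing reading following the four cases of Eq. 1.

    a, b   : last reading before and first reading after the missing point
    d1, d2 : seconds between a and the missing point, and between the
             missing point and b
    c, d   : readings at the same second of the previous and next day
             (None when unavailable)
    """
    if d1 + d2 <= 0:
        raise ValueError("d1 + d2 must be positive")
    # Last case of Eq. 1: time-weighted linear interpolation between a and b.
    interpolated = (a * d2 + b * d1) / (d1 + d2)
    # Remaining cases: average the interpolated value with the same-second
    # readings that exist; this expands to the fractions shown in Eq. 1.
    neighbours = [v for v in (c, d) if v is not None]
    return (interpolated + sum(neighbours)) / (1 + len(neighbours))
```

For example, with a = 20, b = 22, d1 = 10, and d2 = 30, the interpolation-only case gives 20.5; supplying c = 21 as well gives 20.75.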

After sequencing the segments and applying the interpolations, the result is a 5 by 5,529,600 tensor (64 days × 86,400 s per day), in which one dimension indexes the five sensor readings and the other indexes 1-second time steps. The mean, standard deviation, median, and maximum of every 60 consecutive samples (one minute) are calculated and stored for each sensor. Finally, the values are standardized by subtracting the mean and dividing by the standard deviation. The final dataset is a 20 by 88,911 tensor (four statistics for each of the five sensors). After some initial models were trained, it was discovered that normalizing the data, which scales the data to the range [0, 1], improves the results; therefore, normalization is applied to the standardized data.
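The aggregation and scaling steps can be sketched for a single sensor as follows. This is a minimal illustration using only the Python standard library, not the study’s implementation; the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics
from typing import List, Sequence


def minute_features(readings: Sequence[float], window: int = 60) -> List[List[float]]:
    """Return [mean, std, median, max] for each full block of `window` samples."""
    rows = []
    for start in range(0, len(readings) - window + 1, window):
        block = readings[start:start + window]
        rows.append([
            statistics.fmean(block),
            statistics.pstdev(block),  # population std (assumption)
            statistics.median(block),
            max(block),
        ])
    return rows


def standardize(column: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]


def min_max(column: Sequence[float]) -> List[float]:
    """Rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(x - lo) / span for x in column]
```

Applying `minute_features` to each of the five sensors yields the 20 per-minute features; each feature column is then passed through `standardize` and `min_max` in that order.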

The selected dyad recorded a total of 41 agitation events. Each event has a timestamp and a level which represents the severity of agitation from the caregiver’s perspective. We converted the classification problem to a regression problem by considering a time window with a span of one hour for each agitation event (− 30 min to + 30 min). In this time window a Gaussian-like function is applied. This function starts from 0 at 30 min prior to the timestamp, ramps up to a peak equal to the agitation level at the recorded timestamp, and decreases back to 0 at 30 min after the agitation. The combination of these 41 Gaussian-like curves is considered the ground truth for the model. The highest value caregivers reported for agitation level is 6; therefore, our values range from 0 to 6.
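One possible way to build such a target signal is sketched below. The exact shape of the Gaussian-like function and the rule for combining overlapping curves are not specified in the text, so both are assumptions here: the sketch uses a truncated Gaussian (with a hypothetical width `sigma` of 10 min) rescaled to reach exactly 0 at ± 30 min, and merges overlapping curves by taking the maximum.

```python
import math
from typing import List, Sequence, Tuple


def bump(offset_min: float, level: float, half_width: float = 30.0,
         sigma: float = 10.0) -> float:
    """Truncated Gaussian that is 0 at +/- half_width and `level` at offset 0."""
    if abs(offset_min) >= half_width:
        return 0.0
    edge = math.exp(-half_width ** 2 / (2 * sigma ** 2))
    value = math.exp(-offset_min ** 2 / (2 * sigma ** 2))
    # Shift and rescale so the curve touches 0 at the window edges.
    return level * (value - edge) / (1.0 - edge)


def ground_truth(n_minutes: int, events: Sequence[Tuple[int, float]]) -> List[float]:
    """Per-minute regression target built from (minute, level) agitation reports.

    Overlapping bumps are merged with max (an assumption of this sketch).
    """
    target = [0.0] * n_minutes
    for minute, level in events:
        for t in range(max(0, minute - 30), min(n_minutes, minute + 31)):
            target[t] = max(target[t], bump(t - minute, level))
    return target
```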

Considering the aforementioned one-hour time window for each agitation episode leads to 2460 non-zero samples versus 86,451 samples with the value of zero. This means 97.2% of samples belong to one class (i.e., non-agitation) while 2.8% of samples belong to the other class (i.e., agitation). This is naturally a class imbalance problem. Because common remedies for this issue, such as collecting more data or data augmentation, are not feasible here, under-sampling was applied. In this approach, each non-agitation sample is kept with a ¼ chance. This leads to having 27,965 non-agitation samples, which gives us a 92% to 8% dataset, the lowest amount of under-sampling that achieves reasonable results.
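The under-sampling step can be illustrated with a short sketch. It is not the study’s code: the function name, the fixed random seed, and the convention that any non-zero target marks an agitation sample are assumptions made for the example.

```python
import random
from typing import List, Sequence


def undersample(targets: Sequence[float], keep_prob: float = 0.25,
                seed: int = 0) -> List[int]:
    """Return the indices to keep: every agitation sample (target > 0),
    plus each non-agitation sample with probability `keep_prob`."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    kept = []
    for i, y in enumerate(targets):
        if y > 0 or rng.random() < keep_prob:
            kept.append(i)
    return kept
```

Because the selection is random, the number of retained non-agitation samples varies from run to run around `keep_prob` times the original count.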

Learning Algorithm Selection

The aim of this work is to forecast an agitation based on changes in the environmental stimuli. To this end, the models are developed to receive sensory data from the past 30 min and predict the possibility of an agitation. Since, as described in the previous section, we have 20 features, the models need to accept a 20 by 30 tensor as input and generate a scalar value, which is discussed in more detail in the next section. Since the data are time series and have temporal dependencies, Long Short-Term Memory (LSTM) is a good choice to develop this model. However, it is also possible to flatten the 20 by 30 tensor into a 600-element vector and develop a Multi-Layer Perceptron (MLP) model based on it. Moreover, we applied PCA to reduce the number of features from 20 to 10. This means that PCA reduced the input tensor to the size of 10 by 30.

Training Models

The LSTM model consists of 32 LSTM blocks followed by a 10-neuron fully connected hidden layer which uses Rectified Linear Units (ReLU) as the activation function. The MLP model connects its 600-neuron input layer to the output via a 10-neuron hidden layer. In the MLP model, both input and hidden layers use sigmoid as the activation function. The numbers of trainable parameters are 7125 and 6131 for the LSTM and MLP models, respectively. Similarly, the two other LSTM and MLP models, which are implemented based on PCA, have 5845 and 3131 trainable parameters, respectively. Except for the input layers, the topologies of the models are kept the same before and after applying PCA. Table 1 summarizes the trained models.
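The reported parameter counts can be checked with a few lines of arithmetic. The sketch below assumes the standard LSTM parametrization (four gates, each with input weights, recurrent weights, and a bias) and a single-neuron output layer. For the MLP it further assumes that the sigmoid "input layer" is itself a 10-neuron dense layer fed by the flattened input; this reading is an assumption, adopted because it reproduces the reported totals.

```python
def lstm_params(n_inputs: int, n_units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (n_units * (n_inputs + n_units) + n_units)


def dense_params(n_inputs: int, n_units: int) -> int:
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


def lstm_model_params(n_features: int) -> int:
    # 32 LSTM blocks -> 10-neuron ReLU layer -> 1 output neuron
    return lstm_params(n_features, 32) + dense_params(32, 10) + dense_params(10, 1)


def mlp_model_params(n_features: int, n_steps: int = 30) -> int:
    # Assumed reading: flattened input -> 10-neuron sigmoid layer
    # -> 10-neuron sigmoid layer -> 1 output neuron
    flat = n_features * n_steps
    return dense_params(flat, 10) + dense_params(10, 10) + dense_params(10, 1)


print(lstm_model_params(20), mlp_model_params(20))  # without PCA
print(lstm_model_params(10), mlp_model_params(10))  # with PCA (10 features)
```

Under these assumptions the functions return 7125 and 6131 for 20 features and 5845 and 3131 for 10 features, matching the counts stated above.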

Table 1 Summary of trained models

All models were trained using the Adam optimizer and the Mean-Squared Error (MSE) loss function. The 20 by 30 samples were generated from the sampled dataset, which was described in the previous section, using a sliding window technique in which the oldest element of the sample is removed and a new element is added. Therefore, each sample shares a 20 by 29 subsample with the previous sample. Then, both before and after applying PCA, the samples were shuffled and 2/3 of them were selected for training and 1/3 for validation. Since nearly ¾ of the dataset was unseen by the models, the whole dataset was used as a test dataset. Figure 3 illustrates the entire pipeline for training models.
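The window generation and the 2/3 to 1/3 split can be sketched as follows. The function names and the fixed seed are hypothetical, and pairing each window with the target at its last time step is an assumption, since the text does not state the alignment. Windows are stored time-major here (30 rows of 20 features), i.e., the transpose of the 20 by 30 layout.

```python
import random
from typing import List, Sequence, Tuple

Window = List[Sequence[float]]


def sliding_windows(rows: Sequence[Sequence[float]], targets: Sequence[float],
                    length: int = 30) -> List[Tuple[Window, float]]:
    """Pair each run of `length` consecutive rows with the target at its last row.

    Consecutive windows overlap in length - 1 rows: the oldest row is
    dropped and one new row is appended.
    """
    samples = []
    for end in range(length, len(rows) + 1):
        samples.append((list(rows[end - length:end]), targets[end - 1]))
    return samples


def train_val_split(samples: Sequence[Tuple[Window, float]],
                    train_fraction: float = 2 / 3, seed: int = 0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```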

Fig. 3 The employed pipeline for training models

Results and Discussion

The outputs of the models were smoothed using a rolling average. Since the problem was initially converted into regression, it needs to be converted back to classification. This is done by applying a threshold value of 1, i.e., an output equal to or greater than 1 is considered an agitation, while all values below 1 are considered non-agitation. The results of the models are shown in Table 2.
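The post-processing can be illustrated with the sketch below. The rolling-average window size is not given in the text, so the default of 5 is a placeholder, and the trailing (rather than centered) window is likewise an assumption.

```python
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing rolling average; the first entries average what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def to_labels(values: Sequence[float], threshold: float = 1.0) -> List[int]:
    """1 (agitation) when the smoothed output is >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]
```

For instance, smoothing the outputs [0, 0, 3, 3, 3, 0, 0] with a window of 3 gives [0, 0, 1, 2, 3, 2, 1], which the threshold of 1 maps to [0, 0, 1, 1, 1, 1, 1].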

Table 2 Results of trained models

As explained earlier, with more than 97% of samples in the non-agitation class, the dataset is highly imbalanced. Therefore, a model which assigns the non-agitation class to all samples achieves more than 97% accuracy. For this reason, accuracy is not an informative metric here and other metrics such as precision and recall are needed. As shown in Table 2, the models have achieved a wide range of precision and recall, which makes the interpretation of results difficult. Therefore, we consider the F1-Score, which combines precision and recall, as the determinative performance metric. Model 4 has achieved the best F1-Score; however, we need a baseline to compare against. Assume we have four different datasets, each containing two classes. The first dataset is perfectly balanced; in the second, third, and fourth datasets, however, the minority class has 25%, 10%, and 1% of the population, respectively. Assume the behavior of a dummy classifier is defined as:

$$\left\{\begin{array}{c}fn=n\times (1-c)\\ fp=p\times (1-c)\\ \begin{array}{c}tn=n\times c\\ tp=p\times c\end{array}\end{array}\right.,$$
(2)

where c is the rate of correct classification, n and p are the numbers of negative and positive samples, respectively, and fn, fp, tn, and tp denote false negatives, false positives, true negatives, and true positives, respectively. As shown in Fig. 4, which depicts the F1-Score of this dummy classifier versus the parameter c, the more unbalanced a dataset is, the lower the F1-Score is for the same value of c. The horizontal lines in Fig. 4 show the F1-Scores of the trained models. The intersections of these lines with the datasets’ curves indicate how good a model is: the farther right the intersection, the better. If our dataset were balanced, there would be room to improve on model 4, as the intersection of model 4 with the balanced dataset curve is almost in the middle range of the graph. However, for our dataset, model 4 intersects the dataset curve at the far side of the graph, which means model 4 is close to the best theoretically possible model and there is little to no room for improvement. It is nevertheless possible that the performance of the models was reduced by the missing values and the estimation used to fill them. Although we do not have any control over the missing values, better performance in similar applications can be expected for a dataset without missing values.
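The dummy-classifier baseline of Eq. 2 is easy to reproduce numerically. The sketch below follows Eq. 2 as written and uses the standard definition F1 = 2tp / (2tp + fp + fn); the function name and the `total` argument are chosen for illustration. Because F1 depends only on the sum fp + fn, the curve is the same however the two error counts are attributed to the classes.

```python
def dummy_f1(minority_fraction: float, c: float, total: int = 100_000) -> float:
    """F1-Score of the dummy classifier of Eq. 2.

    minority_fraction : share of positive (minority) samples in the dataset
    c                 : rate of correct classification
    """
    p = total * minority_fraction  # positive samples
    n = total - p                  # negative samples
    tp = p * c
    fp = p * (1 - c)  # as written in Eq. 2
    fn = n * (1 - c)  # as written in Eq. 2
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


for fraction in (0.5, 0.25, 0.10, 0.01):
    print(fraction, round(dummy_f1(fraction, c=0.9), 3))
```

For a balanced dataset the expression reduces to F1 = c, while for smaller minority fractions the same value of c yields a lower F1-Score, which is the behavior described for Fig. 4.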

Fig. 4 Comparison of models’ performances on different datasets

Conclusion

The work presented here can significantly reduce the caregiver’s burden by providing insight into the immediate future. Caregivers can live their lives more peacefully knowing that they will be notified if their loved ones are about to experience agitation. Moreover, a model which translates environmental factors into the possibility of agitation can be used in a simulation engine to better understand how PwD react to different stimuli. This in turn provides more insight into the environmental events that trigger agitation, thus helping to arrange the environment with less potential for agitation and reducing caregiver burden further.

We implemented a data-driven approach to forecast an upcoming agitation in PwD based on environmental stimuli. To this end, we used data from one of the dyads in the BESI project which had five active relay stations for 64 days. The data include ambient acoustic noise level, illuminance, environment temperature, atmospheric pressure, and humidity level. We preprocessed the data by estimating the missing values, followed by standardization and normalization. Then, we implemented and trained two Deep Learning models without reducing dimension in the data. We also applied PCA to reduce the dimensionality and implemented and trained two other models based on PCA’s output. Despite some challenges such as missing values and imbalanced nature of the data, one of the trained models achieves promising results.