
1 Introduction

Autism Spectrum Disorder (ASD) is a neurodevelopmental disability characterized by persistent difficulties in social communication and interaction [1]. Patients with ASD may display repetitive patterns of challenging behaviors, such as self-injury, meltdowns and tantrums, and aggression [2], which arise from a set of triggers.

Agitation has negative outcomes for both ASD patients and their caregivers, leading to increased stress levels and impaired quality of life [3]. Consequently, there is a growing need for assistive technologies that improve the well-being of children with ASD and their caregivers. Technological advances make it possible to develop solutions that help prevent these maladaptive behaviors. Traditionally, such behaviors have been reported by caregivers, which requires the caregiver to be physically present with the child during his daily activities. Automatic detection methods based on wearable signals are therefore needed to replace this constant physical assistance. Recently, the detection of agitated behaviors from wearable data has attracted the attention of researchers; however, the literature on this topic remains scarce.

Given the above, this paper explores the feasibility of detecting agitated behaviors of children with ASD based on their physiological and kinetic signals. We compare four classification algorithms (SVM, Random Forest, XGBoost, and TabNet). The main contributions of our study are:

  • Collection of real data from a 9-year-old male autistic child during his everyday activities. Unlike previous works, our data were collected in a natural, real-life setting.

  • Investigation of numerous features from electrodermal activity, blood volume pulse, and acceleration signals.

  • Development of various classifiers resulting in improved performance compared to the state-of-the-art methods.

The remainder of this paper is organized as follows: Sect. 2 provides a literature review. Section 3 details our proposed models. Section 4 discusses our results and compares them to previous research. Finally, Sect. 5 concludes this paper.

2 Literature Review

In this section, we first describe the EDA and BVP signals and then review related work.

2.1 Physiological Signals

Biomedical signals carry rich information about changes in psychophysiological systems. Hence, we investigate the performance of physiological signals, mainly EDA and BVP, in detecting agitated behaviors of children with ASD.

Electrodermal Activity: EDA, also known as Galvanic Skin Response (GSR), reflects autonomic changes in the electrical conductance properties of the skin resulting from increased sweat gland activity [4, 5]. EDA has been widely used as a physiological index in psychophysiological research because it is linked to autonomic emotional and cognitive processes. It serves as a valuable biomedical indicator in many fields, such as the study of emotions and behaviors [6].

EDA is the sum of two components: the tonic activity (Skin Conductance Level, SCL) and the phasic activity (Skin Conductance Response, SCR) [4]. Several methods have been developed to extract SCR and SCL from EDA. In this paper, we use cvxEDA [7], which relies on convex optimization and a maximum a posteriori approach.
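cvxEDA itself solves a convex optimization problem (as far as we know, NeuroKit2 exposes it through `eda_phasic(..., method="cvxeda")`). To illustrate only the additive model EDA = SCL + SCR, the sketch below swaps in a much simpler decomposition that is not cvxEDA: the tonic level is estimated with a zero-phase low-pass filter at an assumed 0.05 Hz cutoff, and the phasic part is the residual. The function `split_tonic_phasic` and the synthetic recording are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def split_tonic_phasic(eda, fs=4.0, cutoff_hz=0.05, order=2):
    """Simplified tonic/phasic split (NOT cvxEDA).

    SCL is approximated by a zero-phase low-pass filter; SCR is the residual,
    so scl + scr reproduces eda up to floating-point rounding.
    """
    eda = np.asarray(eda, dtype=float)
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    scl = filtfilt(b, a, eda)  # slow tonic level
    scr = eda - scl            # fast phasic responses
    return scl, scr


# Synthetic 4 Hz recording: slow drift plus two skin-conductance responses.
fs = 4.0
t = np.arange(0, 300, 1 / fs)
drift = 2.0 + 0.002 * t
responses = sum(0.3 * np.exp(-((t - c) ** 2) / 2.0) for c in (100, 200))
eda = drift + responses

scl, scr = split_tonic_phasic(eda, fs=fs)
print(np.allclose(scl + scr, eda))  # True
```

The simpler filter makes the additive structure easy to see, but unlike cvxEDA it does not model the shape of individual SCRs, so its phasic estimate shows small negative lobes around each response.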

The phasic component corresponds to a series of SCR peaks, each typically characterized by three features: onset, peak amplitude, and half-recovery time. Several methods have been proposed to identify these peaks and their characteristics; we use the method described in [8] to detect the peaks from SCR.
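To make the three peak characteristics concrete, here is a minimal sketch under simplified definitions that are our own assumptions (it is not the method of [8]): a peak is a local maximum of SCR found with `scipy.signal.find_peaks`, its onset is the last minimum between the previous peak and the current one, and the half-recovery time is the delay after the peak until the signal first falls to onset + amplitude/2. The function `scr_peak_features` and the toy SCR are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def scr_peak_features(scr, fs=4.0, min_amplitude=0.01):
    """Return (onset_s, peak_s, amplitude, half_recovery_s) for each SCR peak."""
    scr = np.asarray(scr, dtype=float)
    peaks, _ = find_peaks(scr)
    features, prev = [], 0
    for p in peaks:
        segment = scr[prev:p + 1]
        # Onset: last occurrence of the minimum between the previous peak and p.
        onset = p - int(np.argmin(segment[::-1]))
        prev = p
        amplitude = scr[p] - scr[onset]
        if amplitude < min_amplitude:
            continue  # ignore tiny fluctuations
        half_level = scr[onset] + amplitude / 2
        # First sample at or after the peak that is at or below the half level.
        below = np.nonzero(scr[p:] <= half_level)[0]
        half_recovery = below[0] / fs if below.size else np.nan
        features.append((float(onset / fs), float(p / fs),
                         float(amplitude), float(half_recovery)))
    return features


# Toy SCR at 4 Hz: flat, linear rise (2.5 s to 3.5 s), slower linear decay.
scr = np.zeros(80)
scr[10:15] = np.linspace(0.0, 1.0, 5)
scr[15:23] = np.linspace(1.0, 0.0, 9)[1:]
print(scr_peak_features(scr))  # [(2.5, 3.5, 1.0, 1.0)]
```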

Blood Volume Pulse: BVP is measured using a photoplethysmography (PPG) sensor, which relies on the pulse oximetry technique to measure the blood volume in the microvasculature of the target tissue [9]. The sensor illuminates the surface of the skin with IR light and measures changes in the back-scattered light and in the portion absorbed by the skin [10]. Although this technique is inexpensive and suitable for daily-life applications, the BVP signal is susceptible to several artifacts. Nevertheless, health-related features such as heart rate and heart rate variability can be extracted from BVP.

Each BVP pulse has three characteristic points: the diastolic point, the systolic point, and the dicrotic notch.

In our study, we use the NeuroKit [26] and HRV-analysis [25] Python packages to process EDA and BVP signals. These packages offer comprehensive tools for the analysis and interpretation of physiological data.

2.2 Related Works

The literature on the detection of agitated behaviors using biomedical data is limited, although studies have investigated related areas in ASD patients, such as emotion recognition [11, 12] and anxiety and stress detection [13].

Alban et al. [14] developed three machine learning models (Decision Trees, SVM, and Multi-Layer Perceptron) to detect challenging behaviors of a 10-year-old male ASD child interacting with social robots and toys, based on physiological and kinetic signals. From each signal, they extracted four time-domain features (mean, min, max, and std) over two-second windows. To balance the dataset, they applied resampling. Combining all the signals, the MLP achieved the best accuracy of \(97\%\). Although this result reflects good performance, it is tied to the experimental protocol of the study (interaction with social robots and toys).

Meltdowns and tantrums often lead to impulsivity and agitation. Khullar et al. [15] designed a wristband with three sensors collecting Heart Rate (HR), Skin Temperature (ST), and Galvanic Skin Response (GSR) to detect meltdowns and tantrums in ASD patients. Several preprocessing techniques were applied (removal of null values, filtering, and normalization), and a CNN+LSTM model was developed to detect these behaviors. The reported precision, recall, and F1-score are 0.98, 0.95, and 0.97, respectively. Unfortunately, the authors did not provide any details about the ASD patients or the data collection protocol.

Other studies have developed models to predict atypical behaviors. For instance, Goodwin et al. [16] developed a predictive model to anticipate agitated behaviors 1 min before they occur using physiological and kinetic signals. Their study involved 20 ASD patients. The signals were filtered and processed to extract 10 time-domain statistical features, and 2 binary aggression labels were also extracted; these were used to train a ridge-regularized logistic regression model. Two types of models were investigated: a global model using data from all participants, and 20 person-dependent models, one developed for each patient. The Area Under the Curve (AUC) of the global model is 0.71, while the average AUC of the person-dependent models is 0.84.

In a second study, Goodwin et al. [17] improved their previous models by using Principal Component Analysis (PCA) for feature reduction and a kernel-based SVM to predict aggressive behaviors 3 min in advance. The global model achieved an increased AUC of 0.98.
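The PCA-plus-kernel-SVM combination can be expressed as a scikit-learn pipeline. The sketch below runs on synthetic data from `make_classification`; the 95% explained-variance threshold, the RBF kernel, and the 5-fold AUC evaluation are illustrative choices, not the settings reported in [17].

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for windowed wearable features (not data from [17]).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

# Standardize, keep the principal components explaining 95% of the variance,
# then classify with an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC(kernel="rbf"))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC: {auc.mean():.2f}")
```

Placing the scaler and PCA inside the pipeline means they are refit on each training fold, so the held-out fold never influences the projection.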

The major drawback of the aforementioned studies is that the datasets were collected in controlled environments under specific experimental setups. Consequently, the results depend on the experimental protocols. In contrast, our study focuses on developing detection models for agitated behaviors in a real-world context, during the everyday activities of an ASD child.

3 Proposed Models

3.1 Data Collection

There is currently no publicly available dataset for this specific topic. Consequently, the first step consists of collecting real data from an ASD child using the Empatica E4 wristband.

Although many of the families we contacted declined to participate, we were able to recruit a 9-year-old male autistic child for our experiments. The child has moderate to severe autism and does not take any medication. Data were collected over a period of 6 months during his daily activities (playing outside, doing homework, etc.).

The E4 records EDA (4 Hz), BVP (64 Hz), Heart Rate (HR, 1 Hz), Inter-Beat Intervals (IBI), Skin Temperature (ST, 4 Hz), and acceleration (ACC) along the x, y, and z axes (32 Hz). The wristband has an internal memory capacity of 60 hours. Data are transferred from the E4 to a computer via the E4 Manager software, which must be installed on that computer, and the recorded signals are exported to CSV files for further analysis.
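As we understand Empatica's documented export format, each per-signal CSV file (e.g. EDA.csv, BVP.csv, ACC.csv) starts with a row holding the session start time as a UNIX timestamp (UTC) and a row holding the sampling rate in Hz, followed by the samples; IBI.csv uses a different layout. A minimal reader under that assumption is sketched below; `read_e4_csv` is a hypothetical helper, so the layout should be checked against actual exports.

```python
import io

import numpy as np


def read_e4_csv(source):
    """Parse an E4-style CSV export (assumed layout, see lead-in).

    Row 1: session start (UNIX timestamp, UTC); row 2: sampling rate in Hz;
    remaining rows: samples, one column per channel (three for ACC).
    """
    raw = np.loadtxt(source, delimiter=",", ndmin=2)
    start_time = float(raw[0, 0])
    fs = float(raw[1, 0])
    samples = raw[2:]
    # Reconstruct a timestamp for every sample from the start time and rate.
    timestamps = start_time + np.arange(len(samples)) / fs
    return timestamps, samples, fs


# Tiny in-memory example in the assumed EDA.csv layout (4 Hz).
example = io.StringIO("1600000000.000000\n4.000000\n0.10\n0.12\n0.15\n0.14\n")
ts, eda, fs = read_e4_csv(example)
print(fs, eda.ravel(), ts[-1] - ts[0])
```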

The child wore the device and performed his activities naturally, while his parents accompanied him and annotated the moments of agitation. The participant refused to wear the device at school, so most activities were recorded at home. Because the child mostly acts aggressively at school or when confronting other people, moments of agitation were relatively rare compared to normal behaviors.

3.2 Signal Processing

Signal quality significantly affects model performance. The collected signals are corrupted by various sources of artifacts; therefore, in this step we apply filters to reduce the noise in the raw signals and then extract relevant features. The following paragraphs detail the processing steps applied to each signal.

EDA Processing: EDA is hampered by its sensitivity to motion artifacts [19]. First, raw EDA is pre-processed using the wavelet-based adaptive denoising procedure described in [19]. Second, the signal is filtered using a \(4^{th}\) order Butterworth low-pass forward-backward filter with \(F_{cut} = 0.5\,\text {Hz}\), as in [20]. Third, we apply min-max normalization to remove scale differences between recordings. Next, EDA is decomposed into SCR and SCL using the cvxEDA algorithm [7]. The three resulting signals (EDA, SCR, and SCL) are segmented into overlapping windows for feature extraction [20, 21]. In contrast to the previously reviewed papers, which investigated a limited set of statistical features from raw EDA, our work covers a wider range of features from raw EDA, SCR, and SCL: time-domain, peak-related, and frequency-domain features. In total, we calculate 80 distinct features, summarized in Table 1.
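The filtering and normalization steps above, followed by segmentation into 30-second windows with \(50\%\) overlap (the setting given in Sect. 4.1), can be sketched with SciPy as follows. This is an illustration under stated assumptions rather than our exact implementation: the wavelet denoising of [19] and the cvxEDA decomposition are omitted, and `clean_eda` and `sliding_windows` are hypothetical helpers.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def clean_eda(eda, fs=4.0, cutoff_hz=0.5, order=4):
    """4th-order Butterworth low-pass applied forward-backward, then min-max."""
    eda = np.asarray(eda, dtype=float)
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    filtered = filtfilt(b, a, eda)  # forward-backward: no phase shift
    lo, hi = filtered.min(), filtered.max()
    return (filtered - lo) / (hi - lo)  # scale each recording to [0, 1]


def sliding_windows(signal, fs, window_s=30.0, overlap=0.5):
    """Split a 1-D signal into fixed-length windows with fractional overlap.

    Trailing samples that do not fill a complete window are dropped.
    """
    signal = np.asarray(signal)
    win = int(round(window_s * fs))
    step = int(round(win * (1 - overlap)))
    starts = range(0, signal.size - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])


# Hypothetical 5-minute EDA recording at 4 Hz (1200 samples) with sensor noise.
rng = np.random.default_rng(0)
fs = 4.0
t = np.arange(0, 300, 1 / fs)
raw = 1.5 + 0.2 * np.sin(2 * np.pi * 0.01 * t) + 0.05 * rng.standard_normal(t.size)
clean = clean_eda(raw, fs=fs)
windows = sliding_windows(clean, fs=fs)
print(windows.shape)  # (19, 120)
```

Because each signal has its own sampling rate, calling `sliding_windows` with the matching `fs` (4 Hz for EDA, 64 Hz for BVP, 32 Hz for ACC) keeps the windows of all signals aligned on the same 30-second spans.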

Table 1. Extracted features from EDA, SCR, and SCL.

BVP Processing: The BVP signals of our recordings are very noisy. We first apply winsorization, capping values above the \(98^{th}\) percentile and below the \(2^{nd}\) percentile at those percentiles, as in [20]. Next, we apply a \(6^{th}\) order Butterworth band-pass forward-backward filter with a passband of \([0.6, 3.3]\,\text{Hz}\), as in [22], and normalize the cleaned BVP using min-max scaling. To extract systolic points, we use the pipeline described in [23]; the detected points are then cleaned using HRV-analysis [25] to discard outliers and ectopic beats [24]. RR intervals are calculated from the BVP as the intervals between successive systolic points. The features extracted from the RR intervals comprise time-domain, frequency-domain, geometrical, and Poincaré-plot features. Table 2 defines the 30 BVP features.
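A simplified version of this BVP chain is sketched below. Assumptions: the "\(6^{th}\) order" is read as the overall band-pass order, which SciPy's `butter(3, ..., btype="bandpass")` produces (use N=6 instead if [22] meant a 6th-order prototype); systolic points are found with plain `find_peaks` and a hypothetical minimum spacing of 0.33 s instead of the pipeline of [23]; no ectopic-beat cleaning is performed; and only a few of the 30 features (mean RR, mean HR, SDNN, RMSSD, and the Poincaré descriptors SD1 and SD2) are computed. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def preprocess_bvp(bvp, fs=64.0):
    """Winsorize (2nd/98th percentiles), band-pass 0.6-3.3 Hz, min-max scale."""
    bvp = np.asarray(bvp, dtype=float)
    lo, hi = np.percentile(bvp, [2, 98])
    clipped = np.clip(bvp, lo, hi)  # winsorization caps extremes, keeps length
    # butter(3, ..., "bandpass") yields a 6th-order band-pass filter overall.
    b, a = butter(3, [0.6, 3.3], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, clipped)
    return (filtered - filtered.min()) / (filtered.max() - filtered.min())


def rr_features(bvp_clean, fs=64.0):
    """Naive systolic-peak detection plus a few time-domain/Poincare features."""
    # Hypothetical minimum peak spacing of 0.33 s (about 180 bpm at most).
    peaks, _ = find_peaks(bvp_clean, distance=int(0.33 * fs))
    rr_ms = np.diff(peaks) / fs * 1000.0  # intervals between systolic points
    diff_rr = np.diff(rr_ms)
    sdnn = np.std(rr_ms, ddof=1)
    sdsd = np.std(diff_rr, ddof=1)
    return {
        "mean_rr_ms": float(np.mean(rr_ms)),
        "mean_hr_bpm": float(60000.0 / np.mean(rr_ms)),
        "sdnn_ms": float(sdnn),
        "rmssd_ms": float(np.sqrt(np.mean(diff_rr ** 2))),
        "sd1_ms": float(np.sqrt(0.5) * sdsd),
        "sd2_ms": float(np.sqrt(max(2 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))),
    }


# Synthetic 60 s BVP at 64 Hz with a 1.25 Hz pulse (75 bpm) and mild noise.
fs = 64.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
bvp = np.sin(2 * np.pi * 1.25 * t) + 0.02 * rng.standard_normal(t.size)
clean = preprocess_bvp(bvp, fs)
feats = rr_features(clean, fs)
print({k: round(v, 1) for k, v in feats.items()})
```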

Table 2. Extracted features from BVP signal.

ACC Processing: We filtered the ACC signals as suggested by [27], applying a \(4^{th}\) order Butterworth low-pass filter with \(F_{cut} = 10\,\text{Hz}\). Features extracted from each axis include time-domain and frequency-domain features, summarized in Table 3. In total, 69 features are extracted from the ACC data.
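A per-axis sketch of this step is shown below, assuming the 32 Hz E4 rate. The feature set is a small illustrative subset (mean, std, min, max, dominant frequency, and total power from Welch's power spectral density), not the 69 features of Table 3, and `acc_axis_features` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch


def acc_axis_features(axis, fs=32.0):
    """Low-pass one ACC axis (4th-order Butterworth, 10 Hz), then compute a few
    illustrative time- and frequency-domain features."""
    b, a = butter(4, 10.0, btype="low", fs=fs)
    x = filtfilt(b, a, np.asarray(axis, dtype=float))
    freqs, psd = welch(x, fs=fs, nperseg=min(256, x.size))
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "min": float(np.min(x)),
        "max": float(np.max(x)),
        # Skip the DC bin when looking for the dominant movement frequency.
        "dominant_freq_hz": float(freqs[np.argmax(psd[1:]) + 1]),
        # Rectangle-rule integral of the PSD over frequency.
        "total_power": float(np.sum(psd) * (freqs[1] - freqs[0])),
    }


# Hypothetical 30 s window: 2 Hz rocking on x, slow sway on y, near-static z.
fs = 32.0
t = np.arange(0, 30, 1 / fs)
acc = {
    "x": 0.5 + np.sin(2 * np.pi * 2.0 * t),
    "y": 0.1 * np.sin(2 * np.pi * 0.5 * t),
    "z": 1.0 + 0.05 * np.sin(2 * np.pi * 1.0 * t),
}
features = {f"{ax}_{name}": value
            for ax, sig in acc.items()
            for name, value in acc_axis_features(sig, fs).items()}
print(features["x_dominant_freq_hz"])  # 2.0
```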

Table 3. Extracted features from ACC signals.

3.3 Methodology

Our main goal is to detect agitated behaviors of ASD patients using wearable data. For this purpose, we build several binary classifiers. However, as stated above, the collected dataset is severely imbalanced. For skewed class distributions, cost-sensitive learning [28] is typically applied: during training, misclassifying agitated behaviors is assigned a higher cost than misclassifying normal behaviors.
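One common way to implement cost-sensitive learning is to weight each class inversely to its frequency. The sketch below reproduces the formula behind scikit-learn's `class_weight="balanced"` option, n_samples / (n_classes × n_c), on a hypothetical label vector with 950 normal and 50 agitated windows; these counts are illustrative, not those of our dataset. For XGBoost, the analogous parameter is `scale_pos_weight`, which XGBoost's documentation suggests setting around n_negative / n_positive.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight for class c = n_samples / (n_classes * n_c).

    Same heuristic as scikit-learn's class_weight="balanced": the rarer the
    class (here, agitation), the larger its misclassification cost.
    """
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Hypothetical labels: 950 normal windows (0) and 50 agitated windows (1).
y = np.array([0] * 950 + [1] * 50)
weights = balanced_class_weights(y)
scale_pos_weight = (y == 0).sum() / (y == 1).sum()  # XGBoost counterpart
print(weights, scale_pos_weight)
```

With these counts, agitated windows receive weight 1000 / (2 × 50) = 10 and normal windows 1000 / 1900 ≈ 0.53, while `scale_pos_weight` evaluates to 19.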

The first step after collecting the data consists of processing the different signals and extracting salient features. We then explore several types of classifiers: Support Vector Machine (SVM) [30], Random Forest (RF) [29], XGBoost, and TabNet [32]. SVM finds a maximum-margin hyperplane that separates the classes of the training data. Random Forest is an ensemble learning algorithm based on multiple decision trees, each trained on a different bootstrap subset of the training data and considering a random subset of features at each split. XGBoost follows the principle of boosting [31], which combines a set of weak classifiers into a stronger one. TabNet, on the other hand, is a deep-learning model designed for tabular data. It uses sequential attention to select the most relevant features at each decision step.

4 Experiments

4.1 Setup

Our dataset comprises multiple sessions; the child exhibited aggressive behaviors in only 6 of the recordings, and these behaviors lasted only a few minutes. To extract features, we segmented the recordings into 30-second windows with a \(50\%\) overlap and calculated, for each window, the features described in Sect. 3.2. The models were then trained on a dataset containing segments from all sessions. To optimize model performance, we conducted an extensive grid search over the hyperparameters of each model. The hyperparameters that yielded the best results are:

  • SVM: kernel = rbf, gamma = scale, and C = 5.

  • RF: n_estimators = 1000, criterion = gini, max_depth = 20, max_features = auto, min_samples_split = 5 and min_samples_leaf = 4.

  • XGBoost: n_estimators = 2500, eta = 0.01, max_delta_step = 8, max_depth = 12, min_child_weight = 12, gamma = 0.5, and reg_alpha = 0.1.

  • TabNet: n_d, n_a = 64, n_steps = 6, gamma = 1.5, lambda_sparse = 0.01, learning_rate = 0.025, momentum = 0.98, max_epochs = 300.
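For readers reproducing the setup, the reported hyperparameters map onto library constructors roughly as sketched below. This is an assumed mapping, not released code: SVM and RF use scikit-learn, while the XGBoost and TabNet settings are kept as plain dictionaries intended for `xgboost.XGBClassifier` and pytorch-tabnet's `TabNetClassifier`, so the snippet runs without those two packages installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

svm = SVC(kernel="rbf", gamma="scale", C=5)

# max_features="auto" has been removed from recent scikit-learn releases;
# for classifiers it meant "sqrt", which is used here instead.
rf = RandomForestClassifier(n_estimators=1000, criterion="gini", max_depth=20,
                            max_features="sqrt", min_samples_split=5,
                            min_samples_leaf=4)

# Keyword arguments for xgboost.XGBClassifier; its scikit-learn wrapper names
# eta "learning_rate".
xgb_params = dict(n_estimators=2500, learning_rate=0.01, max_delta_step=8,
                  max_depth=12, min_child_weight=12, gamma=0.5, reg_alpha=0.1)

# Keyword arguments for pytorch_tabnet.tab_model.TabNetClassifier; the learning
# rate goes through optimizer_params and max_epochs through .fit().
tabnet_params = dict(n_d=64, n_a=64, n_steps=6, gamma=1.5, lambda_sparse=0.01,
                     momentum=0.98, optimizer_params={"lr": 0.025})
tabnet_fit_params = dict(max_epochs=300)
```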

4.2 Evaluation Metrics

We used stratified 10-fold cross-validation to evaluate our models because it preserves the class imbalance ratio in each fold. We calculated precision, recall, F1-score, and balanced accuracy, as well as the area under the ROC curve (AUC) and the area under the Precision-Recall curve (PR AUC). AUC summarizes the trade-off between the true positive rate and the false positive rate across decision thresholds, whereas PR AUC summarizes the trade-off between precision and recall. The latter is particularly informative when the classes are imbalanced.
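The evaluation protocol can be sketched with scikit-learn's `StratifiedKFold` and `cross_validate`. Assumptions: a synthetic imbalanced dataset with 179 features stands in for ours, a class-weighted logistic regression stands in for the four classifiers, and PR AUC is estimated with the `average_precision` scorer, a step-wise approximation of the area under the precision-recall curve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the window-level feature matrix.
X, y = make_classification(n_samples=1000, n_features=179, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)

# Each of the 10 folds keeps roughly the same 90/10 class ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["precision", "recall", "f1", "balanced_accuracy",
           "roc_auc", "average_precision"]  # average_precision ~ PR AUC
clf = LogisticRegression(class_weight="balanced", max_iter=5000)
scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)

for name in scoring:
    vals = scores[f"test_{name}"]
    print(f"{name}: {vals.mean():.2f} +/- {vals.std():.2f}")
```

Since windows overlap by 50%, neighbouring windows from the same session share samples and can end up in different folds; if that leakage is a concern, scikit-learn's `StratifiedGroupKFold` with session IDs as groups is one alternative.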

4.3 Results

In this section, we will present the results achieved by each classifier and compare them to previous research.

Table 4 reports the average evaluation metrics of the different classifiers trained on the combined features from the EDA, BVP, and ACC signals.

Table 4. Average evaluation metrics of the stratified 10-fold of the different classifiers using the combined features (Mean\(\%\) ± Standard Deviation\(\%\))

These results reveal that the XGBoost model outperformed RF, TabNet, and SVM, achieving an \(80\%\) balanced accuracy. Specifically, the precision, recall, F1-score, AUC, and PR AUC of XGBoost are 0.77, 0.60, 0.67, 0.80, and 0.72, respectively. SVM achieved a precision similar to that of XGBoost (0.7), but its recall is notably lower (0.38), resulting in a lower F1-score of 0.50 and a PR AUC of 0.61. The RF model achieved the lowest average F1-score (0.41) and PR AUC (0.43). Although TabNet has outperformed XGBoost in several competitions, including on Kaggle, XGBoost yielded the best performance in our study. One possible reason is the limited amount of data available to train TabNet. Nevertheless, TabNet achieved the second-highest F1-score, AUC, and balanced accuracy, with values of \(56\%\), \(76\%\), and \(77\%\), respectively, indicating acceptable preliminary results. Collecting more data and further tuning TabNet's parameters could improve its performance. In addition, XGBoost trained and validated faster than TabNet.
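As a quick consistency check on the reported XGBoost figures, the F1-score is the harmonic mean of precision \(P\) and recall \(R\):

\[ F_1 = \frac{2PR}{P+R} = \frac{2 \times 0.77 \times 0.60}{0.77 + 0.60} = \frac{0.924}{1.37} \approx 0.67 \]

For SVM, the same formula gives \(2 \times 0.7 \times 0.38 / (0.7 + 0.38) \approx 0.49\); the small gap to the reported 0.50 can come from rounding of the stated precision or from averaging F1 over folds, since the mean of per-fold F1-scores is generally not the F1 of the mean precision and recall.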

In Table 5, we compare the results of our XGBoost model with those of a previous study [14], in which the authors developed an MLP classifier to detect the challenging behaviors of a male ASD child using four statistical features (min, max, mean, and std) from EDA, BVP, HR, and ACC.

Table 5. Performance comparison of our model and the results of [14]

Fig. 1. The importance of features determined by the XGBoost classifier for each signal: EDA (a), BVP (b), and ACC (c)

Table 5 shows that our method achieved higher performance. A likely explanation is that the features extracted in our work carry more information reflecting the behavioral state of the child.

To identify the contribution of each signal to our detection model, we computed the feature importances of the XGBoost classifier. The feature vector concatenates the EDA (80), BVP (30), and ACC (69) features, in that order, resulting in a 179-dimensional vector. Figure 1 displays the contribution of each feature for EDA (Fig. 1a), BVP (Fig. 1b), and ACC (Fig. 1c).

Based on our results, the features extracted from the EDA, BVP, and ACC signals contributed \(80.15\%\), \(5.6\%\), and \(14.25\%\), respectively, to the XGBoost model. These results support the assumption in the literature that EDA is effective for identifying the psychophysiological state of individuals. As mentioned in Sect. 2, EDA has been extensively used in studies on arousal, such as stress, anxiety, and emotion detection. Conversely, BVP contributes less to the detection of agitated behaviors, possibly because this signal is very susceptible to environmental artifacts, which makes heart rate and heart rate variability estimates less reliable.
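Per-signal shares like these can be obtained by summing the importances of the features in each block of the 179-dimensional vector (80 EDA, 30 BVP, and 69 ACC features, in that order) and normalizing by the total. A minimal sketch, assuming a contiguous block layout; `importance_by_signal` is hypothetical, and the importances in the example are random placeholders standing in for a fitted model's `feature_importances_`.

```python
import numpy as np

# Block sizes of the concatenated feature vector, in order.
GROUPS = {"EDA": 80, "BVP": 30, "ACC": 69}


def importance_by_signal(importances, groups=GROUPS):
    """Sum per-feature importances within each contiguous signal block and
    express each block's share as a percentage of the total importance."""
    importances = np.asarray(importances, dtype=float)
    if importances.size != sum(groups.values()):
        raise ValueError("expected one importance value per feature")
    sums, start = {}, 0
    for name, size in groups.items():
        sums[name] = importances[start:start + size].sum()
        start += size
    total = sum(sums.values())
    return {name: 100.0 * s / total for name, s in sums.items()}


# Placeholder importances; with a fitted xgboost.XGBClassifier one would pass
# model.feature_importances_ instead.
rng = np.random.default_rng(0)
print(importance_by_signal(rng.random(179)))
```

The resulting shares depend on the importance type (for XGBoost, e.g. gain, weight, or cover), so percentages are only comparable when computed with the same type.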

Previous research focused on kinetic data to detect agitated behaviors. However, kinetic signals alone may not generalize well. Our results demonstrate the feasibility of combining physiological biomarkers with kinetic data to detect agitated behaviors of people on the spectrum.

5 Conclusion

Agitated behaviors represent significant challenges for individuals with autism and their caregivers, impacting their well-being, and technological solutions could improve their quality of life. This paper studied the viability of using signals collected from wearable devices during everyday activities to detect agitation. The proposed framework involves pre-processing to remove additive noise, followed by the extraction of relevant features from each type of signal. The features are then combined to train different classifiers to detect agitated behaviors.

Our approach showed promising results across multiple evaluation metrics. In particular, XGBoost achieved the highest balanced accuracy, precision, recall, F1-score, AUC, and PR AUC.

These findings are subject to some limitations, including the limited number of recorded moments of agitation. Moreover, we were only able to recruit a single ASD child. We firmly believe that the proposed models could be improved by collecting a larger dataset from more ASD patients.

Our findings have significant implications for enhancing the quality of life of patients with ASD. Timely notifications of moments of agitation would allow caregivers to prevent the harmful consequences of these destructive behaviors. Our proposed framework holds the potential to alleviate the negative impact of these behaviors and promote a safer and more supportive environment for patients with autism. In future work, we will also focus on developing predictive models capable of anticipating agitated behaviors before they occur. Such a proactive approach would help caregivers intervene early and prevent unwanted consequences.