Supporting autism spectrum disorder screening and intervention with machine learning and wearables: a systematic literature review

The number of individuals with autism spectrum disorder (ASD) is increasing dramatically. For them, it is difficult to obtain an early diagnosis or an intervention that prevents challenging behaviors, which may cause social isolation and economic loss for the whole family. This SLR aims at understanding and summarizing the current research work on this topic and at analyzing the limitations and open challenges to guide future work. We consider papers published between 2015 and the beginning of 2021. The initial selection included about 2140 papers, 11 of which met our selection criteria. The papers have been analyzed by mainly considering: (1) the kind of action taken on the autistic individual, (2) the considered wearables, (3) the machine learning approaches, and (4) the evaluation strategies. The results reveal that the topic is very relevant, but that the considered studies have many limitations, such as a small number of participants, the absence of datasets and of experimentation in real contexts, the need to consider privacy issues, and the need to adopt appropriate validation approaches. The issues highlighted in this analysis may be useful for improving machine learning techniques and for identifying areas of interest in which to experiment with different noninvasive sensors.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by social communication deficits, difficulty in speech, and repetitive and unusual sensory-motor behaviors [27]. The term "spectrum" indicates that there are different levels of severity. Generally, all ASD individuals, including the less severe cases, need support throughout their life. The number of ASD individuals is dramatically increasing: in 2018, 1 in 59 children received an ASD diagnosis in the United States. Three out of four autistic individuals are male. Most of them received the diagnosis after the age of 4, although ASD can be detected as early as two years of age. Some ASD individuals are diagnosed later [3]. It is estimated that 10-33% of autistic adults have an intellectual disability and require continuous support [5].
One of the problems of the ASD population is hyper- or hypo-sensitivity to sensory stimuli, such as bright lights or loud sounds [1]. As an example, ASD individuals rarely tolerate crowded places, due to the alteration in information processing that compromises their cognitive and social responses. The whole family suffers from this lifelong problem. In addition, an autistic individual represents a heavy financial burden for the family, due to the care needed and the lost earnings of the caregivers, who often cannot work at all.
At present, the cause of ASD is unknown. ASD cannot be cured and lasts for the whole life, but when it is detected early (from 6 months to 3 years), many deficits may be reduced, and communication and social skills may be improved. The control of repetitive behavior may be enhanced too.
Nowadays, wearables and mobile technologies are economically accessible and provide adequate processing power and memory to support screening activities and to improve the life of ASD individuals and of their caregivers. Many research works rely on videos to detect or monitor ASD people, but the evaluation of these systems has revealed many false positives, caused by noise, a limited view, or caregiver interference [12]. Systems based on data acquired through wearables, such as accelerometers or biometric sensors, can overcome the limitations of nonwearable devices and obtain results with appropriate accuracy. They are a better solution when ASD individuals move in unstructured spaces, such as at school or at home [46].
In recent years, the efficacy of Machine Learning in predicting and monitoring different types of diseases in the medical [22,39] and clinical psychology domains, or in performing sensor data classification [11], has been largely demonstrated in the literature and in clinical practice [24]. The combination of wearables and Machine Learning techniques may be useful to detect particular patterns of movements or physiological signals of ASD individuals that are too complex to be recognized or assessed by traditional clinical methods [8], which are very expensive and time-consuming. This paper proposes a systematic literature review (SLR) on the combined use of machine learning techniques and wearable sensors for detection and intervention on ASD individuals. Due to the rapid technological evolution, we consider papers published between 2015 and February 2021. The aim of the paper is twofold:
• understanding and summarizing the current research work related to the adoption of Machine Learning and wearables for supporting ASD detection and interventions;
• analyzing the limitations and open challenges to guide future work.
More specifically, we aim at investigating (1) the types of actions conducted on ASD individuals by machine learning and wearables, (2) the types of wearable adopted in previous work, (3) the types of Machine Learning techniques exploited by researchers, and (4) the adopted evaluation approaches. These objectives are expressed in detail by the research questions reported in Table 1.

Related work
The use of wearable technology to support autism screening and intervention has been largely investigated, and previous research works have been devoted to the analysis of the literature. In particular, the SLR conducted in [21] considered research papers on wearable and mobile technologies from 2000 to 2017; many other papers have been published since then, up to January 2021. The authors initially identified 4722 papers, and only 83 of them passed the selection. Besides, they did not give specific attention to ML techniques.
Hosseinzade et al. [16] focused their attention on IoT devices and ML techniques to support ASD diagnosis. The papers ranged from 2014 to the beginning of 2020. Unlike them, we consider both screening and intervention and conduct a deeper analysis by examining other factors, such as the target of the intervention (e.g., children, adults, and caregivers), the datasets, and the assessment of the Machine Learning techniques. They examined 28 research studies, some of them concerning the use of mobile devices or toys for collecting data.
Williams and Gilbert [44] performed an SLR focusing on the support offered by wearables for autonomy and self-determination in autistic people. The considered papers range from 2010 to 2018, and the SLR is not specifically focused on Machine Learning techniques.
Hyde et al. [17] conducted a review by considering 45 studies adopting supervised machine learning in ASD.
In this paper, we consider both wearables and Machine Learning adopted in diagnosis and intervention on ASD people in a more recent period (2015-beginning 2021) and focus on different aspects, such as the kind of end-users, the machine learning algorithms, and their validation and evaluation techniques.

Contribution
The contribution of this SLR may be summarized as follows:
• We select a set of 11 studies that adopted wearables and Machine Learning techniques to conduct actions to support ASD individuals. These studies may be considered for extending the knowledge on this subject.
• We analyze the identified research works w.r.t. (1) the considered wearables, (2) the adopted Machine Learning techniques, and (3) the validation and performance of the ML models.
This work is structured as follows: the section "Methodology" presents the methodology adopted to conduct the SLR, while the section "Results" reports our findings. The section "Discussion and implication" discusses the obtained results and the section "Threat to validity" highlights the threats to validity. Finally, the section "Conclusion" concludes the paper with final remarks.

Methodology
As a first step, we identified the goal of the paper, defined using the Goal-Question-Metric (GQM) approach [41]: Characterize and evaluate (Purpose) the primary domains of inquiry and inquiry gaps (Issue) in wearables and Machine Learning technology research applied to ASD detection and intervention (Object) from a critical disability studies perspective (Viewpoint).

Table 1 (research questions and their motivations)
RQ1
Motivation: To explore the current state-of-the-art of autism intervention using machine learning and IoT Healthcare.
RQ2-autistic traits
RQ2: What type of autistic traits and problems does the action refer to?
Motivation: To explore the current state-of-the-art of the autistic traits considered in the action.
RQ3-wearable device
RQ3: Which kind of IoT/wearable devices have been adopted?
Motivation: To explore the current state-of-the-art of IoT/wearable devices for supporting intervention in autism.
RQ4-machine learning approaches
RQ4: What machine learning algorithms have been used to support intervention?
Motivation: To analyze the machine learning approaches adopted by previous work, to help researchers in the selection of the machine learning approaches for ASD detection and treatment with wearable computing.
RQ5-evaluation
RQ5: How has the evaluation been conducted and measured in these studies?
Motivation: To study the methodologies exploited to measure and validate the proposed machine learning approaches, and to assess the improvements of ASD people.
To characterize the papers related to wearables and Machine Learning technology research applied to ASD detection and intervention, we formulated the research questions in Table 1. In particular, we aimed at analyzing the kind of action addressed in the paper, e.g., detection of ASD (RQ1); the kind of autistic traits considered in the action, e.g., non-verbal ASD people (RQ2); the kind of wearable device, e.g., accelerometer or EEG (RQ3); and the Machine Learning model adopted, e.g., SVM or CNN (RQ4). We also examined how the evaluation of the experiment has been conducted and measured (RQ5).

Search strategy
To conduct our SLR, we followed the guidelines proposed by [19], largely adopted in the literature.
To collect the papers which are relevant for our study, we first defined the search terms, and then the source of information to be searched, the search process, and the criteria adopted for selecting the relevant papers.

Search approach
To define the search terms, we performed five steps [19]:
1. We determined the relevant terms from the research questions by identifying population, intervention, and outcome as follows:
• population: autism;
• intervention: Machine Learning techniques and wearable technologies;
• outcome: type of action.
As an example, the research question RQ1 may be formulated as follows with the above classification:
2. For all selected terms, we identified the synonyms. In particular, we considered the following:
• autism ("autism" OR "autistic people" OR "ASD" OR "Autism Spectrum Disorder" OR "Autistic disorder");
• Wearable ("Wearable" OR "IoT" OR "Wireless Body Area Network" OR "WBAN" OR "sensors");
• Machine Learning ("machine learning" OR "supervised learning" OR "classification" OR "regression" OR "unsupervised learning");
((autis* OR "ASD" OR "Autism Spectrum Disorder") AND ("IoT" OR "wearable" OR "Wireless Body Area Network" OR "WBAN" OR "sensors") AND ("machine learning" OR "Deep Learning" OR "neural network"))
5. We integrated the search string into a summarized form when the number of Boolean operators is limited, as happens, for example, in the ScienceDirect digital library.
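The way a Boolean search string is assembled from synonym groups can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the three term groups are copied from the search string reported above, and the wildcard stem autis* is our reading of the truncated first term.

```python
def or_group(terms):
    """Join synonyms with OR; quote every term except wildcard stems such as autis*."""
    quoted = [t if t.endswith("*") else f'"{t}"' for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups):
    """Combine the OR-groups (population, wearable, ML) with AND."""
    return "(" + " AND ".join(or_group(g) for g in groups) + ")"


# Term groups taken from the search string in the text (autis* is an assumed wildcard).
POPULATION = ["autis*", "ASD", "Autism Spectrum Disorder"]
WEARABLE = ["IoT", "wearable", "Wireless Body Area Network", "WBAN", "sensors"]
MACHINE_LEARNING = ["machine learning", "Deep Learning", "neural network"]

query = build_query([POPULATION, WEARABLE, MACHINE_LEARNING])
print(query)
```

Keeping the groups as separate lists makes it easy to produce the shortened variant needed by libraries that limit the number of Boolean operators, simply by passing fewer synonyms per group.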

Searched resources
In an SLR, the selection of the resources to query for the relevant papers proposed in the literature plays a decisive role. To answer our research questions, we selected the following sources.
We selected these resources because they are the most relevant in the Computer Science field, contain a very large number of documents, and are adopted in most SLRs [4].

Article selection process
The selection of the relevant papers to answer our research questions has been conducted by following the process depicted in Fig. 1.
We submitted the search queries described in the previous section to the search bars of the selected databases, further inspected the references of the papers (snowballing), and manually searched for other papers.
The total number of papers retrieved from the digital libraries and from the manual search was 2137. Then, we applied the inclusion and exclusion criteria detailed in the section "Inclusion and exclusion criteria". The first author of the paper examined the title, abstract, and keywords of the retrieved papers and performed an initial selection. The remaining 126 papers were assessed by both authors by applying the inclusion criteria. Subsequently, we applied the inclusion and exclusion criteria and obtained 11 papers. Starting from these papers, we performed a backward and forward snowball search, where the former consists in examining the papers referenced by the selected ones, and the latter in including the papers that reference the selected ones. We applied the inclusion and exclusion criteria to these additional papers, but none of them passed the selection. The quality of the remaining 11 papers was also assessed according to the criteria described in the section "Quality assessment", to verify that all the considered papers were suitable to answer all our research questions. It is worth mentioning that the second author of the paper checked all the phases of the selection. After discussion, 11 papers remained.

Inclusion and exclusion criteria
We defined the following criteria for establishing if a paper may be useful for our research.

Exclusion criteria
• research not related to autism;
• research not related to wearable or IoT devices;
• research not exploiting Machine Learning techniques;
• research written in a language different from English;
• research not in peer-reviewed journals or conferences, books, and lecture notes;
• papers written before 2015.

Inclusion criteria
• English papers (short or full) which exploit Machine Learning techniques and wearable technology to support autism screening/intervention (Table 2).
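The criteria above can be read as a simple filter over candidate papers. The following is a minimal sketch, assuming a hypothetical Paper record whose boolean fields are filled in by the reviewer; the field names, the sample records, and the reduction of the publication-venue criterion to a single peer_reviewed flag are assumptions made for illustration and are not part of the review protocol.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record; field names are illustrative, not taken from the review."""
    title: str
    year: int
    language: str
    peer_reviewed: bool
    about_autism: bool
    uses_wearable_or_iot: bool
    uses_machine_learning: bool


def exclusion_reasons(paper):
    """Return the exclusion criteria that the paper triggers (empty list if none)."""
    checks = [
        (not paper.about_autism, "not related to autism"),
        (not paper.uses_wearable_or_iot, "not related to wearable or IoT devices"),
        (not paper.uses_machine_learning, "no Machine Learning techniques"),
        (paper.language != "English", "not written in English"),
        (not paper.peer_reviewed, "not peer-reviewed"),
        (paper.year < 2015, "written before 2015"),
    ]
    return [reason for triggered, reason in checks if triggered]


def is_included(paper):
    """A paper is kept only if no exclusion criterion applies."""
    return not exclusion_reasons(paper)


# Made-up candidates, used only to exercise the filter.
candidates = [
    Paper("Example A", 2019, "English", True, True, True, True),
    Paper("Example B", 2013, "English", True, True, True, False),
]
selected = [p.title for p in candidates if is_included(p)]
print(selected)
```

Returning the list of triggered criteria, rather than a bare yes/no, keeps a record of why each paper was discarded, which is useful when two reviewers compare their decisions.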

Quality assessment
The final step of the selection process consists in examining the surviving papers in detail, by answering the following questions:
• Q1: Is the action clearly defined?
• Q2: Is the technological solution specified in detail?
• Q3: Is the machine learning technique clearly defined?
• Q4: Are the evaluation strategies and metrics explicitly reported?
• Q5: Has the intervention been tested in a real setting?
The answer to each question is scored "Yes" = 1, "Partially" = 0.5, or "No" = 0. For each selected paper, the score is computed by summing the scores of the five questions related to it. Thus, the maximum score a paper may obtain is 5. We classified the paper quality as High (score ≥ 4), Medium (3 ≤ score < 4), and Low (score < 3) [4]. Both authors of the paper jointly assessed each document. The papers which reached a medium or high score were selected. The final selection was passed by 11 studies.
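The scoring rule can be made concrete with a short sketch. The answer values and thresholds are those stated above for questions Q1 to Q5; the function names and the sample answers are hypothetical.

```python
# Score assigned to each possible answer, as stated in the text.
ANSWER_SCORES = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}


def quality_score(answers):
    """Sum the scores of the answers to Q1..Q5 (maximum 5)."""
    if len(answers) != 5:
        raise ValueError("expected one answer for each of Q1..Q5")
    return sum(ANSWER_SCORES[a] for a in answers)


def quality_class(score):
    """High: score >= 4; Medium: 3 <= score < 4; Low: score < 3."""
    if score >= 4:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Made-up assessment of one paper.
answers = ["Yes", "Yes", "Yes", "Partially", "No"]
score = quality_score(answers)
print(score, quality_class(score))
```

With these sample answers the paper scores 3.5 and falls in the Medium class, so it would pass the selection.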

Data extraction
The final papers which passed the selection were analyzed for answering the research questions. To this aim, for each paper, the form in Table 3 was filled in by the first author of this paper.

Results
Many works have been discarded because they either used wearables but did not adopt machine learning techniques, or used information sources other than sensors, such as videos. All of the three variables of this research (autism, wearables, and Machine Learning techniques) were found together in the selected studies. In this section, we report the results we obtained for answering the proposed research questions.

Demographics
The selected papers are reported in Table 4. For each paper, we specify its reference, title, authors, publication year, and publication type. The bar chart in Fig. 2 summarizes the publication types of the selected papers. In particular, all the selected papers got a medium or high score and appeared in journals or conferences, except paper [40], which is a very well-written technical report treating a very relevant topic and satisfying the quality requirements. The distribution of the publication years of the selected papers is depicted in Fig. 3. As can be seen, there is growing interest in this topic, but the published works are still few. Therefore, there is room for new research in this area.

Screening
The screening procedure for detecting autism is a very long and expensive process: in the UK, it may take three and a half years to get an ASD diagnosis. An ASD diagnosis is formulated after a series of explicit qualitative and quantitative evaluations conducted through the direct observation of the individual while performing semi-structured behavioral tasks. The clinician assesses and scores the individual's reactions to specific situations. Family members are also interviewed.
Many ASD people remain without a diagnosis or receive it when they are adults, with dramatic consequences, because early detection is fundamental to get the best results. The diagnosis is difficult because autism is a spectrum, with varied symptoms and behaviors. While the diagnostic process is fairly standardized for children, there are currently no standard criteria for adults. They may have only some symptoms that make daily life difficult. Some may feel or behave differently, or others may consider them different. All of this heavily affects their life.
Among the various symptoms of ASD, the ability to imitate or copy others' movements is also impaired. This may heavily impact the person's social life, learning, and language [10]. Many studies have confirmed that autistic adults are characterized by a lower movement imitation precision than non-autistic individuals.
Vabalas et al. [40] investigated whether simple imitation tasks may be adopted to recognize an autistic person. To analyze the attention to the movement performed on the screen, they adopted an eye tracker to collect eye movement; a motion tracker was used for kinematic data collection.
The various ASD diagnosis approaches, such as DSM, ICD, ADOS, and ADI-R, do not take into account the biological bases of the disorder. Many studies have highlighted the relationship between sensory dysfunction in ASD and electrodermal activity (EDA). EDA is related to the electrical properties of the skin, influenced by the variations in sweating, skin conductance, heart rate, and blood flow to muscles [34]. It varies when the person is subject to external or internal stimuli and may be useful in the ASD diagnosis.
Virtual Reality (VR) is increasingly adopted in the neurosciences because it makes it possible to create a replica of natural phenomena and social interactions, providing an immersive experience that stimulates many human senses. It offers the advantage that the stimuli may be provided in a controlled way and that the data may be collected with high accuracy. Alcaniz et al. [1] proposed to adopt VR in conjunction with electrodermal activity to study the individual response to different and repeatable situations in a controlled way, in a context that simulates real situations. They used VR and EDA reactions to detect ASD. Olfactory stimuli were also provided to the experiment participants by exploiting the Olorama Technology for producing the required scents.

Emotion classification and communication
Autistic people generally suffer from impaired emotional abilities. They have difficulty both in expressing their emotions and in recognizing the emotions of their interlocutors. Autistic people experience moments of crisis due to their inability to control their emotions. These manifestations, called meltdowns, may have very dangerous consequences for themselves, in the case of self-injuring behaviors, or for other people, in the case of aggressive behaviors. Thus, understanding their emotions may help the caregivers prevent an episode of crisis.
Table rows (action, number of papers, references):
Challenging behavior prediction, 1, [29,46]
Stereotypical motor movements, 2, [30,36]
Self-injury behavior, 1, [7]
Aggression to others, 1, [14]
Healthcare monitoring, 1, [28]
The recognition of the internal emotion may be conducted by analyzing physical signals, such as facial expression, speech, and movements, or physiological signals, such as electroencephalogram (EEG), electrocardiogram (ECG), and temperature [2]. Facial expressions can be intentionally manipulated, while this does not happen with physiological signals, which provide a more realistic emotion representation. In [6], an approach to children's emotion recognition is proposed that exploits an EEG sensor. The emotion classification has been conducted by exploiting Machine Learning techniques on a sample of 18 children.
Aslam and Altaf [2] designed an ad-hoc processor for emotion recognition based on scalp EEG. They considered a set of four emotions, namely sad, angry, relaxed, and happy, one for each quadrant of the circumplex model of Russell [35].

Challenging behavior prediction
ASD children often exhibit challenging behaviors, which dramatically impact the whole family and are the main reason for hospitalization [23]. They include (see footnote 2):
• Physically challenging behaviors. They occur when ASD individuals are aggressive towards others (e.g., pulling hair).
• Emotionally challenging behaviors. They occur when ASD individuals shout or use offensive language.
• Self-injurious behavior (SIB). It occurs when the aggressive behavior of ASD individuals is directed towards themselves (e.g., head-banging).
• Property disruption challenging behaviors. They occur when ASD individuals are aggressive towards objects.
Many studies demonstrated that the Applied Behavior Analysis (ABA) therapeutic methodology may reduce these challenging behaviors [9]. It is based on the identification of the antecedents, that is, of the situations which may cause the behavior. Many studies are devoted to predicting that such behavior is about to occur.
Wearable devices may detect the signals of an initial state of stress and anxiety and may be adopted to support caregivers in intervening before the challenging behavior escalates [23]. Masino et al. [29] proposed an approach to detect challenging behavior based on Heart Rate (HR) and beat-to-beat (RR) intervals.
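The two signals named above are directly related: an RR interval of a given duration corresponds to an instantaneous heart rate of 60 000 divided by that duration in milliseconds. The sketch below shows this standard conversion only; it is not the method of [29], and the function name and the sample intervals are made up for illustration.

```python
def heart_rate_bpm(rr_intervals_ms):
    """Convert beat-to-beat (RR) intervals in milliseconds to instantaneous HR in bpm."""
    if any(rr <= 0 for rr in rr_intervals_ms):
        raise ValueError("RR intervals must be positive")
    return [60000.0 / rr for rr in rr_intervals_ms]


# Hypothetical RR series: shorter intervals mean a faster heart rate.
rr_ms = [800, 750, 1000]
print(heart_rate_bpm(rr_ms))
```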
Zheng et al. [46] proposed a predictive framework named PreMAC to alert caregivers of the onset of challenging behaviors. It is based on Random Forest and collects multimodal signals.
SIB is one of the main causes of hospitalization of ASD individuals [18]. Indeed, they may cause themselves physical harm, such as abrasions, lacerations, and contusions. Generally, ASD individuals subject to SIB have reduced verbal or cognitive abilities. Often, these actions are triggered when the ASD individual fails to communicate his needs, or they may be a means to get social attention. These individuals are often unable to communicate their pain, such as a headache, and if they reach a very high level of anxiety, they may resort to self-injury to decrease their arousal level. The absence of verbal communication makes it difficult to get explanations of the reasons for such behaviors.
Footnote 2: https://www.ambitiousaboutautism.org.uk/
Cantin et al. [7] adopted wearable accelerometers to detect SIB. Because children have difficulty accepting the sensors, these have to be discreet and allow a flexible placement. They are placed in different positions, such as wrists, waist, and/or ankles, or within pant pockets, depending on what the child accepts. Materials (e.g., cotton or fleece) are also added to the sensors according to the child's acceptance.
One of the most serious challenging behaviors is aggression. It is unpredictable and may have serious consequences both for other people and for the ASD individual and his family. This kind of behavior makes it difficult for the autistic person to access therapeutic services and also to receive medical support. It heavily contributes to the isolation of the family, because stressful environments such as restaurants or cinemas may trigger sudden aggression. Support staff, such as educators, therapists, or personal assistants, may often leave their job or claim compensation for the harm suffered. In this way, the family is left increasingly alone. Often, the final solution for the ASD person is a residential living placement, which results in a poor quality of life for him and in high costs for the healthcare system. It has been assessed that physiological arousal is associated with aggressive behavior [25]. Goodwin et al. [14] aimed at verifying whether it is possible to predict aggressive behaviors by analyzing changes in physiological arousal.

Alleviating stereotypical motor movements
Stereotypies are activities involving specific, repetitive, purposeless movements or behaviors, such as rocking, hand flapping or waving, covering the ears, or fixating on an object [37]. They dramatically impact learning and social interactions. In stressful situations, they increase and may cause a meltdown.
The identification of Stereotypical Motor Movements (SMMs) is very relevant to improve the life of ASD people, not only for identifying ASD but also for evaluating the intervention they are receiving, such as drugs or therapy. Indeed, if the intervention is effective, SMMs should decrease. The causes triggering these movements, including environmental factors or sensory perceptions, could also be detected. Monitoring SMMs may be very useful to reduce their frequency and duration and to determine an adequate therapy [26].
For these reasons, it is very relevant to detect SMMs. Generally, SMM detection consists of manually extracting features by observing the accelerometer signals.
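As an illustration of what hand-crafted feature extraction from accelerometer signals typically looks like, the sketch below cuts a tri-axial signal into overlapping windows and computes the mean and standard deviation of each axis. It is a generic example under stated assumptions: the window size, the step, the choice of features, and the sample values are ours and are not taken from the reviewed studies.

```python
from statistics import mean, pstdev


def sliding_windows(samples, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window):
    """Return [mean_x, std_x, mean_y, std_y, mean_z, std_z] for one window."""
    features = []
    for axis_values in zip(*window):  # regroup the (x, y, z) samples by axis
        features.extend([mean(axis_values), pstdev(axis_values)])
    return features


# Made-up tri-axial samples (x, y, z); the z axis is constant.
signal = [
    (0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.0, 0.2, 1.0),
    (0.2, 0.2, 1.0), (0.0, 0.0, 1.0), (0.2, 0.0, 1.0),
]
features = [window_features(w) for w in sliding_windows(signal, size=4, step=2)]
print(len(features), len(features[0]))
```

Each feature vector would then be labeled (SMM or not) and passed to a classifier; approaches based on Convolutional Neural Networks replace the hand-written `window_features` step with features learned from the raw windows.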
Rad et al. [30] proposed an approach based on Convolutional Neural Network for discriminating features for SMM detection by exploiting an accelerometer.
Sadouk et al. [36] analyzed in detail the problem of SMM detection. When using an accelerometer, it is necessary to determine the main features characterizing SMMs and to consider two types of variance: intrasubject intensity variance, due to the variation in the intensity of SMMs in the same individual, and intersubject intensity variance, due to the variation in intensity between different individuals. Convolutional Neural Networks have been adopted for identifying SMMs within subjects, while transfer learning and an SVM classifier have been adopted to deal with intersubject classification.
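The difference between within-subject and across-subject detection becomes clearer when looking at how the data are split for evaluation. A common way to measure intersubject generalization is a leave-one-subject-out split, sketched below; it is shown here as a generic technique, not as the protocol of [36], and the function name and subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# One subject identifier per recorded window (made-up data).
subject_ids = ["s1", "s1", "s2", "s2", "s3"]
splits = list(leave_one_subject_out(subject_ids))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Because no window of the held-out subject is ever seen during training, the test score reflects how well a model copes with intersubject variance; a within-subject evaluation would instead split the windows of a single individual.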

Emergency handling
In addition to challenging behaviors, ASD individuals may have other health problems, such as epilepsy or heart attacks. There are autism centers that aim at supporting autistic people, but ASD individuals are often unable to communicate and ask for help. They may have a fever, but they are not able to say so, and their illness can get worse. To manage this aspect together with challenging behaviors, Mamun et al. [28] proposed AutiLife, a healthcare monitoring system for autism centers based on the 5G cellular network. Wearable devices are adopted to measure blood pressure, heart rate, body temperature, and body movements, and to capture speech signals. All this information is transferred to a server, which exploits ML techniques to detect health issues and to perform appropriate actions. The system is not fully implemented. We decided to select this paper because it addresses a very relevant issue for ASD individuals.

RQ2-autistic traits
ASD individuals outwardly look like other people, but they have some traits which may heavily impact their life and make them dangerous for themselves and others. Some of the considered studies described the traits of the ASD individuals involved in the experiment. In the experiment proposed by Goodwin et al. [14], participants were minimally verbal and/or had an intellectual disability. Cantin-Garside et al.'s [7] participants were non-verbal or showed repetitive behavior, and manifested aggressive behavior. Vabalas et al. [40] focused on the lack of attention that causes the failure of an imitation task. ASD sensory dysfunction is revealed through EDA analysis when the individual is stimulated in a Virtual Reality environment [1]. Mamun et al. [28] refer to individuals who are unable to communicate because they are non-verbal or incapable of declaring their illness. This also holds for Aslam and Altaf [2].
Table rows: Body temperature, 2, [1,28,46]; Others, [32].

RQ3-wearable devices
Many different types of devices have been exploited. The Neurosky Mindwave Mobile Headband, equipped with an EEG sensor, has been adopted by [6] for emotion communication and is shown in Fig. 4. Heart rate and beat-to-beat (RR) intervals were adopted by Masino et al. [29]. The motion tracker adopted by [40] was the Polhemus Fastrak, equipped with a motion sensor to be put on the index finger, while the EyeLink 1000 Plus eye tracker was selected to capture eye movements.
An efficient on-chip implementation of EEG-based emotion detection has been proposed in [2], using a 180 nm CMOS process, with validation on multiple benchmark EEG datasets for real-time emotion classification. Cantin et al. [7] adopted tri-axial accelerometers (ActiGraph GT9X Link) to record the movements of ASD children.
For predicting challenging behavior, Zheng et al. [46] adopted Microsoft Kinect for the analysis of facial expressions and head movements; the Empatica E4 wristband collects data concerning blood volume pulse (BVP), electrodermal activity (EDA), body temperature, and acceleration. The body movements are collected by an ad-hoc developed device named Wings (see Fig. 5). The device has been designed to be noninvasive (its weight is 232 g). The left part of Fig. 5 shows the device when it is worn. The cloth chosen is one of those most preferred by children. Also, [1,14] adopted the Empatica E4 for biosignal collection in their experiments.
Summary for RQ3: To collect ASD sensory data, most approaches use an accelerometer, specifically the Empatica E4 wristband, which also collects Blood Volume Pulse (BVP) and Electrodermal Activity (EDA). EEG, Heart Rate, and Body Temperature have also been used.

RQ4-machine learning approaches
Different machine learning approaches have been considered in the selected studies. Their distribution is reported in Table 7. Some of them compare the performance of different algorithms for selecting the one that performs better.
Cantin et al. [7] aimed at detecting self-injury behaviors. They selected SVM, which performed better than Decision Trees in the detection. They also aimed at classifying the different types of behaviors. In this case, performance varies depending on the kind of movement. The movements of more active participants were more difficult to detect than those of the quieter ones. They experimented with a different number of sensors, based on the participant's acceptance.
Zheng et al. [46] compared different ML techniques. Better accuracy was reached by Random Forest (RF), k Nearest Neighbors (kNN), Decision Tree (DT), and Neural Network (NN), while Support Vector Machine (SVM), Discriminant Analysis (DA), and Naïve Bayes (NB) performed worse. The best accuracy was reached by Random Forest: 98.51% at the individual level. They also analyzed the contribution of each single source of data and of their combination.
Sadouk et al. [36] adopted CNNs for SMM detection within subjects, while for SMM detection across subjects, they proposed a knowledge transfer platform associated with an SVM classifier. They also applied cross-domain transfer learning to be able to detect the SMMs of any atypical person.
Summary for RQ4: Support Vector Machine (SVM) is the preferred classification approach.

RQ5-evaluation
To answer this question, several aspects have to be considered, such as the datasets adopted in the evaluation, the evaluation metrics used for the ML models, the validation techniques, and the obtained performances. Besides, it is relevant to verify whether the improvement of ASD individuals is evaluated and how. In the following, we examine how these aspects have been addressed by the selected papers.

Datasets
Ad-hoc datasets are generally adopted, as in the case of [30], which collected accelerometer signals from 6 subjects with autism. Vabalas et al. [40] adopted an ad-hoc dataset collecting data from 15 autistic and 15 non-autistic adults for the training set, and from 7 autistic and 7 non-autistic participants for the testing set. The data of 11 children were collected by [7] for SIB prediction. More than 200 episodes of SIB were observed.
Alcaniz et al. [1] performed two experiments to investigate whether sensory processing can discriminate between ASD and non-ASD individuals using electrodermal activity in two multimodal virtual environments. Participants were 54 and 40 children, respectively. Goodwin et al.'s [14] dataset was composed of the data collected from 20 ASD individuals tolerating the E4. In a previous work [13], they published the data and the code related to their experiment. It contains the accelerometer data of 6 ASD participants collected in laboratory and classroom settings. This dataset has also been used by [30,36]. The latter also adopted the HAR (Human Activity Recognition) dataset, which collects data related to basic activities of typical people. The dataset of Sadouk et al. and the code are also available. Aslam et al. [2] adopted two publicly available datasets for emotion recognition: DEAP [20] and SEED [45], containing physiological and EEG signals, respectively. They are not specific to ASD people.
Table 8 (fragment; metric, number of papers, references): F-measure, 3, [7,30,36]; Feature selection stability (KI), 1, [40].

Summary-Datasets
The datasets created in the selected papers for evaluating the proposed approaches were generally not published. Three papers adopted the dataset produced by Goodwin et al. in [13]. The number of participants in the experiments is limited.

Evaluation metrics
Table 8 reports the distribution of the metrics adopted in the selected studies. Accuracy is preferred.
Classifiers of screening approaches labeled their data (ASD, no ASD) based on diagnoses, which avoids the threshold bias problem that often occurs in classification tasks. In the intervention approaches, the phenomena to be detected are analyzed and labeled by an expert.
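As a reminder of how two of the reported metrics behave, the following minimal sketch computes accuracy and F-measure from predicted and expert-provided labels. The label names ("SMM", "none") and the toy predictions are hypothetical and are not taken from any of the selected papers.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) with respect to one positive label."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive label."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 2 positive windows out of 10.
y_true = ["SMM", "SMM"] + ["none"] * 8
never_positive = ["none"] * 10                         # never detects the behavior
one_hit = ["SMM", "none"] + ["none"] * 7 + ["SMM"]     # 1 hit, 1 miss, 1 false alarm

for name, y_pred in [("never_positive", never_positive), ("one_hit", one_hit)]:
    print(name, accuracy(y_true, y_pred), f_measure(y_true, y_pred, "SMM"))
```

With rare events, a classifier that never detects the behavior can still obtain a high accuracy while its F-measure is zero, which is one reason to report more than accuracy alone.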

Summary-Evaluation metrics
Accuracy is preferred. We note the absence of evaluation of the acceptance of the sensors by the ASD individuals.

Validation techniques
The validation techniques adopted in the considered research work are reported in Table 9.
Three of the selected papers [1,36,46] adopted k-fold cross-validation [33]. According to this strategy, the dataset is divided into k subsets of the same size. One of them is selected as the test set, and the other k − 1 compose the training set. Each subset serves as the test set once, and thus the process is repeated k times. According to [15], the size of the dataset is too big and the causality may be invalidated. For this reason, the size of k should be at least 100. Besides, Tantithamthavorn et al. [38] proved that this technique is not reliable. Cross-validation may be biased by the choice of the hyperparameters. This problem is overcome by adopting the nested cross-validation technique, in which the hyperparameters are also optimized during training. It is applied in two of the selected papers. The other three papers applied the most reliable cross-validation technique, nested Leave-One-Out [42], which is composed of an outer evaluation loop and an inner hyperparameter selection and training loop. In particular, Zheng et al. adopted both k-fold cross-validation and LOO in two different contexts. Vabalas et al. [40] used nested cross-validation and separated model development and independent model testing datasets.
Table 9 Validation techniques (technique, number of papers, references): Leave-one-out (LOO), 2, [2,29,46]; k-fold cross-validation, 1, [1,36,46]; Nested cross-validation, 3, [40].
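The k-fold and nested schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any selected paper: the one-dimensional toy data, the threshold "model", the parameter grid, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # sizes differ by at most one
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def nested_cv(X, y, fit, predict, param_grid, outer_k=5, inner_k=3):
    """Mean outer-fold accuracy, with hyperparameters chosen on training data only."""
    def score(params, train, test):
        model = fit([X[i] for i in train], [y[i] for i in train], params)
        return sum(predict(model, X[i]) == y[i] for i in test) / len(test)

    outer_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), outer_k):
        # Inner loop: select the hyperparameter without touching the outer test fold.
        def inner_score(params):
            vals = [score(params,
                          [train_idx[i] for i in tr],
                          [train_idx[i] for i in te])
                    for tr, te in k_fold_indices(len(train_idx), inner_k, seed=1)]
            return sum(vals) / len(vals)

        best = max(param_grid, key=inner_score)
        # Outer loop: the held-out fold is used once, after selection is finished.
        outer_scores.append(score(best, train_idx, test_idx))
    return sum(outer_scores) / len(outer_scores)


# Hypothetical toy problem: a fixed decision threshold is the only hyperparameter.
X = [i / 20 for i in range(20)]
y = [1 if x > 0.5 else 0 for x in X]


def fit(X_train, y_train, threshold):
    return threshold  # nothing is learned; the "model" is just the threshold


def predict(model, x):
    return 1 if x > model else 0


estimate = nested_cv(X, y, fit, predict, param_grid=[0.3, 0.5, 0.7])
print("nested CV accuracy estimate:", estimate)
```

The point of the nesting is that the outer test fold plays no role in choosing the hyperparameter, so the reported estimate is not inflated by the selection step.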

Performances
The support offered by the selected approaches strictly depends on the classification performances. It is difficult to compare the performances of the considered studies, because the datasets have different sizes and the objectives of the classifications are different. Besides, different metrics and validation techniques have been adopted.
Concerning the screening approaches, Vabalas et al. [40] detected ASD individuals with 73% accuracy using sensors and 70% accuracy by analyzing eye movements. When combining the two techniques, 78% accuracy is reached with nested cross-validation. The combined use of EDA and virtual environments to detect ASD individuals in [1] achieved 85% accuracy on the final confirmatory test set (n = 20 participants).
For the intervention support, emotion recognition techniques [6] reached 82% accuracy with Random Forest. Aslam and Altaf [2] examined the accuracy of their EEG-based approach on the two considered datasets. On the DEAP dataset, their best accuracy was 85% for valence and 82.5% for arousal, while on the SEED dataset they reached 100% for valence.
In the detection of stress conditions, Masino et al. [29] reached 93% accuracy. Their dataset was created by collecting heart rate and RR interval measurements during rest and during activities generating stress. Participants were 22 ASD and 16 non-ASD individuals.
For the prevention of challenging behaviors, Random Forest [46] reached 98.5% accuracy with the multimodal model. Zheng et al. adopted nested cross-validation techniques on a dataset different from the training one.

Summary-performances
The performances related to ASD screening have to be improved, while the detection of challenging behaviors, stress conditions, and negative affective states seems to have good results.

ASD improvements
Summary-ASD improvements
No evaluation of the advantages of the adopted technologies in real contexts, or of the possible improvements in the lives of ASD individuals due to the adoption of the proposed technologies, has been conducted.

Discussion and implication
From the analysis of the selected papers which combine wearables and Machine Learning techniques in the screening and intervention on ASD individuals, it emerged that the use of these technologies is promising, but it is only in its pioneering phase.
The selected papers proposed very useful solutions for detection and intervention in challenging and dangerous situations, but many further aspects have to be considered. In particular, ASD individuals may have problems when equipped with sensors. Therefore, as considered by Cantin et al. [7], the sensor configuration has to be carefully handled, because it may need to change depending on the individual's acceptance. Generally, all the works provided enough technical details on the adopted sensors, but not on their acceptance.
The performance of the Machine Learning techniques is assessed, but the experimentation is conducted in lab sessions, and further validation is needed in real contexts. The creation of sensor datasets that researchers can use to experiment with Machine Learning techniques appears to be critical. The number of participants is very limited, probably due to the difficulty of involving ASD individuals in the experimentation. Besides, in many cases, they should also have the same autistic traits.
In some cases, results were not tested with independent data. Thus, further experimentation is needed on wider samples with different sociodemographic characteristics [1]. The best results should be reached by combining different biomarkers, such as eye-tracking, body movement analysis, EDA, and EEG.
When conducting research on ASD people, the understanding of and respect for disability and autism as identity should be considered together with the technological aspects [43]. Indeed, the cognitive and communication deficits of ASD individuals may increase the risk of violation of their human rights concerning the privacy of their information. Many families do not participate in the experiments for fear that their data could get into the wrong hands, exposing them to stigma or discrimination. The considered works did not examine data privacy issues, while ethical concerns have been considered by [1,7,40].
One of the main issues is the lack of public datasets of sensor-based data that researchers can use to compare the various Machine Learning techniques. The improvements in the lives of ASD individuals due to the adoption of the proposed technologies have not been assessed yet. Considering the small number of papers, the increasing number of ASD individuals, and the increasing need for this kind of application, we can deduce that there is still a large margin for new research contributions on this hot topic.

Threats to validity
The selection process of the papers to analyze represents the main threat to the validity of this study. By examining the process, the first threat is the completeness of the search terms adopted to formulate the queries. We tried to mitigate this threat by adding to the search query all the synonyms of each term and verifying their presence in the selected papers. The paper selection step has been conducted by both the authors, for a deeper verification. Moreover, in addition to the papers found on the selected databases, snowballing has been conducted to search other papers and manual search on other databases such as Google Scholar has been performed. Exclusion and inclusion criteria were rigorously followed.
For assessing the quality of the selected papers, we provided five questions to be answered for each paper and only the papers with an adequate score were considered. For each paper that passed the quality assessment, a form was filled to collect all the details useful to answer our research questions.
It is possible that, while screening 2140 papers, some papers of interest were excluded. The limited year range (only papers published from 2015 onward) may also exclude some relevant papers.

Conclusion
In this paper, we reported the results of a Systematic Literature Review on the adoption of wearables and Machine Learning techniques for detection and intervention on ASD individuals. We focused our analysis on the following aspects: (1) what kind of actions on ASD individuals have been proposed, (2) what kind of autistic traits have been considered, (3) what kind of sensors have been adopted, (4) which Machine Learning algorithms have been selected, and (5) how they have been evaluated.
We selected the papers in the 2015-February 2021 range. The initial population was 2140 papers. From them, we extracted 11 papers with the required quality where Machine Learning techniques and wearable sensors were considered.
The analysis showed, on the one hand, the great relevance of the topic and, on the other hand, that there are some limitations in the considered studies and that there is still room and need for new research in this area. We hope that this study will be useful for conducting new high-quality studies aiming at supporting ASD individuals and their caregivers.
Author Contributions Both the authors contributed to the manuscript equally.
Funding Not applicable.
Data Availability Not applicable.

Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Code availability Not applicable.
Ethics approval Not applicable.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.