Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing

Purpose The main aim of our study was to explore the utility of artificial intelligence (AI) in diagnosing autism spectrum disorder (ASD). The study primarily focused on using machine learning (ML) and deep learning (DL) models to detect potential ASD cases by analyzing text inputs, especially from social media platforms like Twitter. The goal is to overcome ongoing challenges in ASD diagnosis, such as the requirement for specialized professionals and extensive resources. Timely identification, particularly in children, is essential to provide immediate intervention and support, thereby improving the quality of life for affected individuals.
Methods We employed natural language processing (NLP) techniques along with ML models such as decision trees, extreme gradient boosting (XGB), and the k-nearest neighbors algorithm (KNN), and DL models such as recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and bidirectional encoder representations from transformers (BERT and BERTweet). We extracted a dataset of 404,627 tweets from Twitter users using the platform's API and classified them based on whether they were written by individuals claiming to have ASD (ASD users) or by those without ASD (non-ASD users). From this dataset, we used a subset of 90,000 tweets (45,000 from each classification group) for the training and testing of these models.
Results The application of our AI models yielded promising results, with the predictive model reaching an accuracy of almost 88% when classifying texts that potentially originated from individuals with ASD.
Conclusion Our research demonstrated the potential of using AI, particularly DL models, in enhancing the accuracy of ASD detection and diagnosis. This approach signifies the critical role AI can play in advancing early diagnostic techniques, enabling better patient outcomes and underlining the importance of early identification of ASD, especially in children.


† Sergio Rubio-Martín, María Teresa García-Ordás, Martín Bayón-Gutiérrez, Natalia Prieto-Fernández, and José Alberto Benítez-Andrades have contributed equally to this work.
*Correspondence: srubm@unileon.es
1 SALBIS Research Group, Dept. of Electric, Systems and Automatics Engineering, Universidad de León, Campus of Vegazana s/n, 24071 León, León, Spain. Full list of author information is available at the end of the article.

Introduction
Autism spectrum disorder (ASD) is a developmental disability that impacts individuals' social and interactive skills when engaging with others [1]. The condition typically manifests before the age of three and can persist throughout a person's life, leading to a lower quality of life for those who remain undiagnosed in childhood [2]. ASD encompasses a wide range of subtype conditions, with one of the subtypes known as Asperger Syndrome (AS), which is classified as severity 1 within the autism spectrum [3]. AS was officially recognized as an ASD subtype in 2013 by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and later by the World Health Organization (WHO) in its International Classification of Diseases 11th Revision (ICD-11) in 2022 [3].
Early diagnosis of ASD offers numerous scientific advantages. Firstly, early intervention significantly improves social, communicative, and cognitive skills in children with ASD [4,5]. Detecting the disorder at an early age allows professionals to implement tailored interventions that address the specific needs of the child, leading to better long-term development. Secondly, early diagnosis provides parents and caregivers with access to appropriate resources and support, improving the overall quality of life for both the child and the family [6]. This includes specialized therapies, counseling, and adjustments to the child's environment to accommodate their unique requirements. Additionally, early ASD diagnosis positively impacts the individual's educational and occupational trajectory, offering increased opportunities for success in academic and work settings [7]. By providing suitable interventions from the outset, children have greater chances to acquire skills necessary for future achievements. Lastly, early diagnosis contributes to scientific research by enhancing our understanding of the causes, progression, and variability of ASD. This knowledge aids in identifying risk factors and developing more effective treatment and prevention strategies [8].
While many studies have focused on detecting ASD, most rely on time-consuming interviews and questionnaires, emphasizing the need for early diagnosis [9]. However, recent advancements in artificial intelligence (AI) offer promising solutions in the healthcare field [10,11]. AI models have been successfully applied to detect and diagnose various types of cancer, such as prostate, breast, and cervical cancer [12][13][14]. In the context of ASD, previous research has primarily utilized machine learning techniques and datasets consisting of images [15,16], or focused on specific techniques like eye-tracking [17]. More recently, studies have explored the use of AI and machine learning to analyze textual data from social media platforms, particularly Twitter, to identify ASD-related behaviors and patterns [18]. By examining the content and structure of tweets, AI algorithms can provide valuable insights for early diagnosis and intervention, offering new opportunities for accurate and timely ASD diagnosis [19,20].
Given the significance of social networks and AI in early ASD diagnosis, there is a research gap in utilizing information from Twitter users' biographies to develop models that aid in this endeavor [21]. Individuals on the autism spectrum often disclose their condition in their biographies using hashtags, plain text, emojis, or emoticons, enabling more precise identification of individuals with different ASD subtypes. Twitter, with its accessibility for extracting textual data, is the platform of choice for this study. The primary objective is to develop artificial intelligence models that can diagnose ASD by analyzing the texts posted by users who openly disclose their condition in their Twitter biographies.
Building upon our preliminary study presented at a conference, where we delved into the potential of artificial intelligence for diagnosing Autism Spectrum Disorder (ASD) [22], this manuscript introduces several substantial advancements. Our key contributions in this extended research are:
• Additional Machine Learning Models: Beyond the models explored in our initial study, we have trained and evaluated others, notably including KNN. The rigorous process of including and evaluating these models, each with its unique characteristics and parameters, demanded significant effort for fine-tuning and optimization tailored to our dataset.
• Extended Deep Learning Models: We have further ventured into deep learning, incorporating models like RNN and LSTM, renowned for their prowess in handling data sequences like texts. This exploration also involved experimenting with various configurations to hone their performance, necessitating considerable computational resources and time.
• Pretrained BERT Models: Our exploration did not stop at conventional models. We ventured into BERT models, testing two distinct pretrained versions, each with its unique training datasets and features, influencing their performance on our tasks.
The magnitude of effort invested in testing these models with our dataset is immense. Each model underwent its distinct tuning, training, and validation process, demanding intensive computational resources and time. This rigorous approach was pivotal to pinpoint the model or combination thereof that yielded the best accuracy and performance for our specific problem. These contributions not only expand the scope and depth of our initial study but also underscore our meticulous exploration of configurations and parameters. The overarching goal remained consistent: to enhance the accuracy and efficacy of our models in detecting ASD.
Furthering our contributions, this manuscript also introduces:
• A comprehensive Twitter dataset comprising 404,627 tweets from 252 distinct users, with 221 users explicitly indicating ASD in their biographies.
• The application of a diverse array of ML and DL techniques to forge predictive models, aiding in diagnosing patients based on discernible patterns in texts.
• A comparative analysis of the accuracy achieved by traditional ML techniques vis-à-vis DL techniques in the predictive models.
In summation, recognizing the burgeoning significance of social networks coupled with the prowess of artificial intelligence in early ASD diagnosis, it is imperative to harness the insights from Twitter users' biographies. By scrutinizing textual data and leveraging machine learning techniques, we can craft models that significantly aid in the precise and timely diagnosis of ASD. Such strides hold the promise to deepen our understanding of ASD-related behaviors and experiences, refine interventions, and ultimately uplift the lives of individuals with ASD and their families. The structure of the paper is as follows: "Material and methods" section offers a comprehensive explanation of the methodology utilized in the various techniques proposed. In "Experiments and results" section, the experiments and results are presented, along with a comparative analysis of the different techniques. Finally, "Discussion and conclusions" section encompasses the discussion, where the conclusion is also presented for a cohesive narrative within the same section.

Material and methods
The paper provides a detailed explanation of the research methodology in the following subsections. Firstly, "ASD dataset collection and classification" section describes the approach taken to obtain the complete dataset and outlines the classification process for each example. Moving on to "Machine learning and deep learning models used" section, the paper presents the machine learning and deep learning models employed to address the problem at hand. Additionally, "Hardware and Software used for the experiments" section provides an overview of the hardware specifications of the computer utilized in the research. The research outline can be visualized in Fig. 1.

ASD dataset collection and classification
Initially, several datasets pertaining to ASD were explored, but they lacked sufficient representativeness. This was primarily due to two factors: first, these datasets did not contain enough records to train robust artificial intelligence models; second, their fields or columns did not carry information relevant to the learning process of these models. Consequently, the decision was made to create a new dataset from scratch. To accomplish this, Twitter was chosen as the source of data, specifically focusing on English tweets from users who self-identified as having ASD in their biography profiles. The dataset extraction process involved programming a Python script capable of accessing the publicly available user data. Accessing this information required utilizing Twitter's API, which is exclusively accessible to developers who have undergone prior verification by the platform. Leveraging the API access, the data was exported to a CSV format for convenient handling and analysis.
Fig. 1 Outline of the research done
The configuration of the tweet data within the dataset was as follows:
• The initial step involved manually screening users by examining their biography profiles to determine whether they self-identified as having ASD. To identify these users, specific keywords such as 'Autism', 'ASD', 'Asperger', 'Aspie', 'Autistic', and 'ActuallyAutistic' were utilized. The inclusion of 'Asperger' and 'Aspie' as keywords stems from the recognition that Asperger's is now considered a subtype within the autism spectrum by the scientific community. Figure 2 illustrates two examples of ASD users, with certain data points such as username, location, and date of birth removed in compliance with Twitter's policies.
• During the user search process, each individual profile was meticulously reviewed to ensure accurate classification. It was important to discern between profiles belonging to individuals and those associated with organizations or societies, which led to the exclusion of certain profiles despite containing the relevant keywords. Additionally, there were cases where users were part of an ASD person's family, indicated by phrases such as 'father of an ASD kid' or 'mother of an ASD kid', which resulted in their exclusion from the dataset. Furthermore, some users identified themselves as 'ASD advocates', indicating their support for individuals with ASD but not personally having ASD themselves; these users were excluded as well.
• Subsequently, the complete dataset of tweets was labeled automatically, based on the information available in each user's biography. As a result, two distinct groups were formed:
- Tweets authored by users or individuals with ASD.
- Tweets authored by users or individuals without ASD.
• Once the dataset of texts with a binary classification (written by individuals with ASD or without ASD) is available, various artificial intelligence models are trained. These models take texts as input and output a classification, indicating whether the text was written by someone with ASD or not.

Machine learning and deep learning models used
An intriguing research avenue was explored, focusing on the evaluation of ML and DL models to identify the most accurate approach for addressing the problem. Consequently, different ML models were employed in this study to obtain results. While these models were tested, there was also a curiosity to investigate the potential of DL models and assess whether they could outperform traditional models in terms of accuracy. The following section provides an explanation of the models utilized:
• Decision trees [23]: A machine learning model that excels in situations where nonlinear relationships among variables are prominent. The tree is built by an algorithm that recursively splits the data into smaller subsets, which helps to focus on the most important features of the data. This process continues until a stopping criterion is met, such as reaching the depth limit of the tree.
• XGB (eXtreme Gradient Boosting) [24]: Like random forests (RF), this machine learning model combines many decision trees and can train each tree on a randomly selected subset of variables, but it incorporates several optimizations. Unlike RF, the trees are built sequentially: each tree takes into account the results of the previous trees, giving importance to the misclassified instances. After creating each tree, the error is calculated, and the next tree is fitted to correct that error. The combination of trees helps to give an accurate prediction, and the model implements different techniques to avoid overfitting.
• KNN (K-Nearest neighbours) [25]: A machine learning model that classifies a test sample by looking at the K labeled training examples nearest to it. The most important choice in this model is a value of 'k' that yields the best accuracy, where 'k' is the number of nearest neighbours taken into account when predicting the group to which the current data point belongs. The method used to measure distance is also vital, since several alternatives exist; in this study the Euclidean distance is used. To make a prediction, the model assigns to each data point the most common category found among its 'k' nearest neighbours. This model does not "learn" in the usual sense: it simply stores a copy of the training data and uses it to make predictions for new data.
• BERT (Bidirectional Encoder Representations from Transformers) [28]: A deep learning model that generally achieves high accuracy in natural language processing (NLP) tasks. It is particularly suitable for this study because its encoder reads the complete word sequence in both directions, taking into consideration the contextual information of surrounding words. Moreover, BERT is a pre-trained model, which means it has been trained on a large corpus of text data. During pre-training, BERT learns to predict missing words in a sentence and to tell whether two sentences are consecutive or randomly paired. This model is well suited to tasks such as named entity recognition and sentiment analysis, among others [29], so it is one of the strongest candidates to achieve the highest accuracy among the models considered. In addition to a BERT-base model, this research uses a specialised variant called BERTweet [30], which was pre-trained specifically on a corpus of tweets. Tweets usually contain informal language, expressions, emojis and abbreviations that are not commonly found in large collections of web text. BERT-base and BERTweet use the same underlying model architecture but differ in the type of data on which they were pre-trained.
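The KNN prediction rule described in the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-dimensional feature vectors and the helper names `euclidean` and `knn_predict` are hypothetical, and the study itself applied KNN to text features that are not specified here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # "Training" is just storing the data; all the work happens at prediction time.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hypothetical feature vectors standing in for vectorised tweets.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["non-ASD", "non-ASD", "non-ASD", "ASD", "ASD", "ASD"]

prediction = knn_predict(train_points, train_labels, (0.95, 1.0), k=3)
```

An odd `k` avoids tied votes in a binary task such as this one.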

Evaluation metrics
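To evaluate the models, the results are displayed through confusion matrices, which make it possible to visualise the performance of the classification models. The confusion matrix consists of the elements shown in Table 1: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
From the confusion matrix, several performance metrics can be computed:
• Accuracy: The proportion of correct predictions among the total number of cases.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision (or Positive Predictive Value): The proportion of positive identifications that were actually correct.
$$\text{Precision} = \frac{TP}{TP + FP}$$
• Recall (or Sensitivity or True Positive Rate): The proportion of actual positives that were correctly identified.
$$\text{Recall} = \frac{TP}{TP + FN}$$
• F1 Score: The harmonic mean of precision and recall.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
• Specificity (or True Negative Rate): The proportion of actual negatives that were correctly identified.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
These metrics provide a comprehensive view of the model's performance, especially in cases where the classes are imbalanced.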
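As a worked illustration of these metrics, the following sketch computes them from the four confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=80, tn=70, fp=30, fn=20)
```

With these counts, accuracy is 150/200 = 0.75, recall is 80/100 = 0.8, and specificity is 70/100 = 0.7.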

Hardware and Software used for the experiments
To conduct all the experiments, two separate Jupyter Notebooks were employed. Both notebooks used the Python 3.9 programming language and were executed on a computer with the following specifications: Intel(R) Core(TM) i7-9700K CPU @ 3.60 GHz, 32.0 GB RAM, and an NVIDIA GeForce RTX 2080 graphics card.

Data extraction and pre-processing
The initial step involved manually identifying users for the experiments by searching for specific keywords, including 'Autism', 'ASD', 'Asperger', 'Aspie', 'Autistic', and 'ActuallyAutistic', within their biographies. This selection process was carried out diligently, with each profile being manually reviewed. Consequently, several users were excluded as they did not correspond to individuals claiming to have ASD. The following user categories were discarded from the ASD group:
• Profiles belonging to organizations or societies.
• Users who identified themselves as ASD advocates rather than patients.
• Family members of individuals with ASD, such as those who mentioned being the 'Father of an ASD kid', 'Mother of an ASD kid', or part of an 'ASD family'.
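The screening above was carried out manually. Purely as an illustration, a keyword pre-filter that could support such a review might look like the sketch below. The keyword list comes from the text, while the exclusion hints, the function name `screen_biography`, and the returned category names are hypothetical assumptions rather than the authors' procedure; any flagged profile would still need human review.

```python
import re

# Keywords listed in the text, lower-cased for matching.
ASD_KEYWORDS = {"autism", "asd", "asperger", "aspie", "autistic", "actuallyautistic"}

# Hypothetical phrases hinting at the excluded categories (family, advocates, organizations).
EXCLUSION_HINTS = ("father of", "mother of", "family", "advocate", "organization", "society")


def screen_biography(bio):
    """Return a rough screening category for a biography string."""
    text = bio.lower()
    # Whole-word matching avoids hits such as "asd" inside an unrelated word.
    words = set(re.findall(r"[a-z]+", text))
    if not (words & ASD_KEYWORDS):
        return "no_keyword"
    if any(hint in text for hint in EXCLUSION_HINTS):
        return "likely_excluded"
    return "candidate"


examples = [
    "Proud #ActuallyAutistic artist",
    "Father of an ASD kid",
    "Coffee lover and runner",
]
categories = [screen_biography(bio) for bio in examples]
```

A filter like this only narrows the set of profiles to inspect; it cannot tell whether a self-description is accurate.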
Data was obtained from public Twitter users using a Python script programmed to interact with Twitter's developer API. This facilitated the extraction of user publications, which were then exported to a CSV file. The tweets were collected from January 1st, 2017, to January 31st, 2022, covering a period of approximately five years. The dataset was designed to consist of two groups: a representative number of ASD patients and a group of individuals without ASD. Subsequently, several pre-processing steps were applied to the data, which are outlined as follows:
• Removal of duplicate tweets or those with identical content.
• Elimination of retweets posted by the users.
• Exclusion of tweets that were not extracted correctly or in their entirety.
• Removal of tweets automatically published by users through sharing options from other platforms like YouTube and Facebook.
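The four pre-processing steps listed above can be sketched as a single filtering pass. This is a minimal sketch under assumptions: the record fields (`text`, `is_retweet`, `truncated`, `source`), the `AUTO_SHARE_SOURCES` set, and the function name `clean_tweets` are hypothetical and do not describe the authors' script or the exact layout of their CSV export.

```python
# Hypothetical source names for posts shared automatically from other platforms.
AUTO_SHARE_SOURCES = {"YouTube", "Facebook"}


def clean_tweets(tweets):
    """Apply the four pre-processing filters to a list of tweet dictionaries."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = (tweet.get("text") or "").strip()
        # Drop tweets that are empty or flagged as not extracted in full.
        if not text or tweet.get("truncated", False):
            continue
        # Drop retweets.
        if tweet.get("is_retweet", False) or text.startswith("RT @"):
            continue
        # Drop posts published automatically through other platforms.
        if tweet.get("source") in AUTO_SHARE_SOURCES:
            continue
        # Drop duplicates, ignoring case and whitespace differences.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


sample = [
    {"text": "Stimming helps me focus.", "source": "Twitter Web App"},
    {"text": "stimming helps me  focus.", "source": "Twitter Web App"},
    {"text": "RT @someone: great thread"},
    {"text": "I liked a video", "source": "YouTube"},
    {"text": "This tweet was cut off", "truncated": True},
    {"text": "Long day at work."},
]
cleaned = clean_tweets(sample)
```

Normalising case and whitespace before the duplicate check is a design choice of this sketch; an exact-match comparison would keep near-identical tweets.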
For the experiment, a total of 252 users were considered, with 221 classified as ASD users and 31 classified as non-ASD users. Prior to the pre-processing procedure, the dataset consisted of 1,014,723 classified tweets.
After undergoing the aforementioned steps, the dataset was reduced and cleaned, resulting in 404,627 tweets. From the complete dataset, a subset of 90,000 tweets was selected, with an equal distribution of 45,000 tweets each from ASD and non-ASD users.
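A balanced subset of this kind can be drawn as in the sketch below. The text does not state how the 45,000 tweets per group were chosen, so uniform random sampling is an assumption here, and the `label` field and the function name `balanced_subset` are hypothetical. The toy data uses 20 tweets per class in place of 45,000.

```python
import random


def balanced_subset(tweets, per_class, seed=0):
    """Draw the same number of tweets from every label, then shuffle the result."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_label = {}
    for tweet in tweets:
        by_label.setdefault(tweet["label"], []).append(tweet)

    subset = []
    for label, group in by_label.items():
        if len(group) < per_class:
            raise ValueError(f"not enough tweets for label {label!r}")
        subset.extend(rng.sample(group, per_class))
    rng.shuffle(subset)
    return subset


# Toy, imbalanced data: 100 tweets from one group and 30 from the other.
toy_tweets = [{"text": f"tweet {i}", "label": "ASD"} for i in range(100)]
toy_tweets += [{"text": f"tweet {i}", "label": "non-ASD"} for i in range(100, 130)]

subset = balanced_subset(toy_tweets, per_class=20)
asd_count = sum(1 for t in subset if t["label"] == "ASD")
non_asd_count = sum(1 for t in subset if t["label"] == "non-ASD")
```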

Implementation of machine learning and deep learning models
The dataset was randomly divided into training and testing sets, with 75% allocated for training the models and 25% for testing. The primary objective was to compare the models' results and identify the most accurate one in this specific context. To achieve optimal results, a search was conducted for the hyperparameters that most improve model performance. This process, known as hyperparameter search, was carried out using the GridSearchCV class of the scikit-learn Python library.
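The split and the hyperparameter search can be sketched with scikit-learn as follows. Several choices are assumptions made only for this illustration and are not stated in the text: the toy sentences, the TF-IDF vectorisation, the stratified split, the two-fold cross-validation, and the small grid over `n_neighbors`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Toy corpus (hypothetical): 8 texts per class.
texts = [f"routine sensory overload day {i}" for i in range(8)]
texts += [f"football match weekend plans {i}" for i in range(8)]
labels = ["ASD"] * 8 + ["non-ASD"] * 8

# 75% of the data for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Assumed feature extraction: TF-IDF vectors fed to a Euclidean-distance KNN.
pipeline = Pipeline(
    [
        ("tfidf", TfidfVectorizer()),
        ("knn", KNeighborsClassifier(metric="euclidean")),
    ]
)

# Exhaustive search over a small, illustrative grid of k values.
param_grid = {"knn__n_neighbors": [1, 3]}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
```

Placing the vectoriser inside the pipeline means it is refitted on each cross-validation fold, so no information from the held-out fold leaks into the features.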
The hyperparameters for each ML model are outlined in Table 2.
The RNN model is made up of the following layers:
• Embedding layer.
• Two fully connected layers with dropout between them.
• A final output layer with a single neuron, which is responsible for classifying the sample.
Fig. 3 shows the scheme of the RNN architecture.
The LSTM model is made up of the following layers:
• A text vectorization step through which the input passes.
• One fully connected layer.
• A final output layer with a single neuron, which is in charge of classifying the sample.
The only difference between the LSTM and Bi-LSTM architectures is the recurrent layer itself (LSTM versus Bi-LSTM). Fig. 4 shows the schemes of the LSTM and Bi-LSTM architectures.

Results
Three ML models, namely decision trees, XGB, and KNN, were trained, alongside five DL models, namely RNN, LSTM, Bi-LSTM, BERT and BERTweet. The results, displayed in Table 3, support the hypothesis that some basic DL models achieve higher accuracy than ML models with tuned hyperparameters. Figure 5 displays the confusion matrices for the eight classification models used in a binary classification task aimed at identifying individuals with Autism Spectrum Disorder (ASD).
The BERTweet model stands out as the top-performing model, exhibiting a high number of true positives and true negatives, indicating its strong ability to accurately identify individuals with and without ASD. As a deep learning model, BERT leverages neural networks to discern intricate patterns within the input data. This highlights the potential of deep learning models in extracting relevant patterns, thus enhancing the precision of classification.
Although hyperparameter optimization was performed for the machine learning models, the BERTweet model outperformed all the others. The KNN model achieved the lowest accuracy at 60.8%, followed by the decision tree with 61.2%, LSTM with 69.5%, RNN with 69.9%, Bi-LSTM with 70.3%, and XGB with 71.6%. Notably, the BERT-based models achieved the best accuracies: 84.3% for BERT and 87.7% for BERTweet. The model with the best accuracy was therefore BERTweet. In summary, the analysis of the confusion matrices emphasizes the importance of selecting the appropriate model for detecting ASD and evaluating its performance using tools such as confusion matrices. The high accuracy of the BERT-based models and their ability to learn complex patterns in the data suggest that deep learning models have the potential to significantly enhance the accuracy of classification tasks involving individuals with and without ASD.

Discussion and conclusions
In this study, a cohort of Twitter users was examined and classified into two groups: ASD users and non-ASD users. This classification process involved an initial search for specific terms within the users' biographies, followed by a manual review of the selected users' timelines based on their biographical descriptions. The Twitter API was then utilized to collect the users' posts, automatically labeling the texts as originating from either ASD or non-ASD users. The main objective of this research was to develop highly accurate models capable of predicting whether a given text was authored by an ASD user.
After preprocessing the dataset, a subset of 45,000 texts was selected from each group (ASD and non-ASD users), resulting in a total of 90,000 tweets. These tweets were further divided into training and test sets to train various models using both traditional machine learning and deep learning techniques.
The best ML model was XGB, with an accuracy of 71.6%. However, of all the models tested, the best one was the DL model BERTweet, with an accuracy of 87.7%. This aligns with previous studies showing that BERT-based models are highly effective at categorizing many kinds of text, including tweets [18,31,32].
It is important to acknowledge certain limitations of this study. One significant limitation is the potential biases in the collected dataset. These biases could arise from users providing false information in their biographies or tweets being authored by individuals other than the profile owners. Moreover, while our models show promise, they are not intended to replace medical specialists. Instead, they aim to assist in identifying potential ASD traits. Before these models can be considered for use in a medical consultation platform, these limitations must be addressed. However, this research serves as a foundation for future investigations. This includes exploring hyperparameters to further enhance the accuracy of BERT models and other deep learning models. Additionally, future efforts will involve training and evaluating additional deep learning and machine learning models that have not been previously examined, ensuring that no high-performing models are overlooked.

Fig. 2 Example Twitter bios of people claiming to have ASD

Table 3 Results of ML and DL models (tp true positives, fp false positives, tn true negatives, fn false negatives, f1 f1-score, acc accuracy)