Introduction

Attention deficit hyperactivity disorder (ADHD) is a common pediatric neurodevelopmental disorder, with a global prevalence of 5% among people aged 18 and under (Sayal et al. 2018). Owing to insufficient dopamine production, the prefrontal cortex, which is crucial for managing behavior, emotion, and attention, is particularly underdeveloped in those with ADHD (Arnsten 2009; Loh et al. 2022a; Barua et al. 2022). As such, individuals with ADHD exhibit forgetfulness, disorganization, and loss of concentration and attention (Magnus et al. 2021). Hence, stimulant medications like Ritalin or Concerta, which increase dopamine levels in the brain, are frequently used to treat ADHD (Arnsten 2009). If diagnosed early and treated promptly, an individual may restore neuronal connections to the prefrontal cortex and resume normal activities of daily living (Mattfeld et al. 2014). Otherwise, ADHD symptoms may persist into adulthood, increasing the likelihood of depression and antisocial behaviors as well as other undesirable outcomes such as crime, academic underachievement, interpersonal relationship problems, and low employability (Sayal et al. 2018; Shaw et al. 2012; Taylor et al. 1996). Conduct disorder (CD) is comorbid in approximately 30% of ADHD cases (Biederman et al. 1991). Characteristics of CD include aggression towards people and animals, theft, violation of rules, and destruction of property (Lillig 2018b). Furthermore, patients with both ADHD and CD have been shown to be the least responsive to treatment (Carpentier et al. 2012; Shaw et al. 2012). It is therefore important to identify ADHD patients who are comorbid with CD so that a treatment protocol best suited to them can be arranged, rather than giving them the same treatment as ADHD alone.

According to the American Psychiatric Association, a comprehensive clinical interview and behavior rating scales are required to confirm a diagnosis of ADHD or CD (Levy 2014; Marshall et al. 2021; Salekin 2016). Clinical interviews are conducted with patients, family members, and teachers to determine whether the patient exhibits symptoms in multiple settings (Marshall et al. 2021; Valo and Tannock 2010). Behavioral rating scales are frequently used in conjunction with clinical interviews and are designed to meet the diagnostic requirements of the Diagnostic and Statistical Manual of Mental Disorders (DSM) (American Psychiatric Association 2013; Marshall et al. 2021). These assessment approaches, however, rely on subjective judgments, and the evaluation process is lengthy and tedious, which delays diagnosis and impedes timely intervention. Furthermore, subjective evaluation of symptoms is prone to misdiagnosis (Sansone and Sansone 2011); there have been cases of students mimicking ADHD symptoms to obtain ADHD prescription stimulants, which can aid concentration (Hall et al. 2005), weight management (Piran and Robinson 2006), and academic (Rabiner et al. 2010) and athletic performance (McDuff and Baron 2005). Therefore, an objective evaluation of ADHD and CD is essential to facilitate early diagnosis and to reduce the likelihood of misdiagnosis and abuse of ADHD prescription stimulants.

Studies have shown visible differences in brain activity recorded using electroencephalography (EEG) and magnetic resonance imaging (MRI) between ADHD patients and controls (Sridhar et al. 2017; Travell and Visser 2006), and between non-medicated and medicated responders (Loo and Barkley 2005). This study uses EEG data acquired from ADHD, ADHD + CD, and CD patients to develop a computer-aided diagnostic (CAD) tool based on artificial intelligence. As EEG is high-dimensional data in which the number of features exceeds the number of observations, we use a deep learning (DL) model, specifically a convolutional neural network (CNN), rather than conventional machine learning (ML) models to perform the classification (Mirza et al. 2019). This is because feature extraction and selection, which are crucial procedures when creating an ML model, can result in information loss (Faust et al. 2019; Loh et al. 2020; Mirza et al. 2019). DL models based on neural networks can process high-dimensional data with minimal information loss (Faust et al. 2019), and the feature extraction and selection process is not necessary to develop a DL model (Faust et al. 2019).

Hence, we propose ADHD/CD-NET, a DL system that serves as a cost-effective CAD tool for ADHD and CD diagnosis. DL models, however, are not without drawbacks. The ‘black box’ nature of the DL model results in poor interpretability, as neither clinicians nor developers have information on how the DL model arrives at its prediction (Loh et al. 2022). Fortunately, explainable artificial intelligence (XAI) techniques have recently been developed to provide explanations for a DL model’s predicted results (Barredo Arrieta et al. 2020; Nazar et al. 2021). In this study, ADHD/CD-NET incorporates a well-known XAI technique known as gradient-weighted class activation mapping (Grad-CAM) to provide an interpretation of the predicted result (Zhou et al. 2015). The novelties of our study are summarized as follows:

  • To the best of our knowledge, this is the first study to use explainable DL approaches to analyze EEG data to distinguish ADHD from its comorbidities, ADHD + CD and CD.

  • We have also proposed a unique EEG preprocessing strategy, which involves transforming each EEG channel using CWT and then estimating the Pearson correlation coefficient between the transformed channels to generate a channel-wise CWT correlation matrix.

  • We employed the XAI technique (Grad-CAM) to visualize the interactions between EEG channels for patients with ADHD, ADHD + CD, and CD. This technique highlighted significant pairs of correlated EEG channels that played a crucial role in the classification of ADHD, ADHD + CD, and CD patients.

Related works

Numerous studies have been conducted to detect ADHD objectively. Table 1 lists recent works completed between 2017 and 2022 that focused on distinguishing ADHD patients from healthy controls using EEG signals. Exceptional performance has been achieved with ML and DL models, with the lowest classification accuracies being 81% (Kim et al. 2021) and 83% (Vahid et al. 2019) for ML and DL models, respectively, and the highest being 100% (Kaur et al. 2020; Öztoprak et al. 2017) and 99.50% (Ahmadi et al. 2021). In addition, 7 out of 9 DL studies used a CNN model, demonstrating that the CNN is the go-to DL model for EEG analysis in ADHD detection, which we also adopted in our study. Previous works have successfully demonstrated that the EEG characteristics of ADHD patients differ from those of healthy controls. The next step in improving ADHD diagnosis is to further distinguish ADHD from CD, a common comorbidity of ADHD that is frequently misdiagnosed as ADHD (and vice versa) due to similar clinical symptoms (Faraone et al. 1997; Kuhne et al. 1997). Distinguishing ADHD, ADHD + CD, and CD is therefore a much more difficult task than separating ADHD from healthy controls. Hence, this study proposes ADHD/CD-NET, a deep learning system for objectively distinguishing ADHD from ADHD + CD and CD using EEG signals.

Table 1 List of related EEG-based ML and DL studies that used the ADHD vs. healthy control dataset to detect ADHD

Methods

The deep learning system, ADHD/CD-NET, proposed in this study is depicted in Fig. 1. The subsequent sections describe the dataset used in this study, explain how we convert 12-channel EEG signals into a channel-wise CWT correlation matrix, expand on the model architecture of ADHD/CD-NET, and introduce Grad-CAM, which is employed to explain ADHD/CD-NET.

Fig. 1

Flowchart process of ADHD/CD-NET

Data acquisition

Private dataset (internal evaluation)

The private EEG data for this study came from a clinical trial that was approved by the Domain Specific Review Board (DSRB) of the National Healthcare Group (NHG) in Singapore (DSRB 2008/00410) (Raine et al. 2019). The goal of the trial, which involved 123 participants (7–16 years old) from the Child Guidance Clinic in Singapore, was to determine whether omega-3 supplements reduce aggression and whether aggression can be further reduced when the supplements are used in conjunction with standard therapies. In addition to analyzing the efficacy of the omega-3 supplements, the trial also obtained EEG data from its participants at the baseline time point. The EEG data were anonymized and de-identified to ensure patient confidentiality. The participants were divided into three groups: CD only (16 participants), ADHD only (45 participants), and ADHD + CD (62 participants), following the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders, fourth edition, Text Revision (DSM-IV-TR). Furthermore, the parents of these children completed a computerized Diagnostic Interview Schedule for Children (DISC), a standard diagnostic test that is widely used in ADHD research and assessment (Lewin et al. 2014). Participants remained resting with their eyes open for 3 min to collect resting-state EEG using MP150 single-channel EEG 100c biopotential amplifiers linked to the data acquisition software AcqKnowledge. The 12 EEG channels taken are as follows: Fp1, Fp2, F3, F4, P3, P4, O1, O2, F7, F8, T3, and T4. As a result, 123 participants’ 12-channel EEG signals with a sampling frequency of 500 Hz were collected, and all 12 channels are used in this study to develop a multi-channel CAD tool. Each signal is then segmented into 8 chunks of 21.25-s epochs, each with 10,625 timesteps (21.25 s \(\times\) 500 Hz), resulting in 128 CD samples (16 children \(\times\) 8 chunks), 360 ADHD samples (45 children \(\times\) 8 chunks), and 496 ADHD + CD samples (62 children \(\times\) 8 chunks). The 21.25-s epoch duration was chosen after experimenting with various segment durations, as it consistently produced the best results.
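To make the segmentation step concrete, the snippet below is a minimal sketch of how a 3-min, 500 Hz, 12-channel recording could be split into eight non-overlapping 21.25-s epochs. The array names and the dummy recording are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np

FS = 500                            # sampling frequency (Hz)
EPOCH_SEC = 21.25                   # epoch duration used in this study
EPOCH_LEN = int(EPOCH_SEC * FS)     # 10,625 timesteps per epoch
N_EPOCHS = 8                        # 8 chunks per participant

def segment_recording(eeg):
    """Split a (12, n_timesteps) recording into (8, 12, 10625) epochs."""
    epochs = [eeg[:, k * EPOCH_LEN:(k + 1) * EPOCH_LEN] for k in range(N_EPOCHS)]
    return np.stack(epochs)

# Example with a dummy 3-min recording (12 channels x 90,000 samples).
dummy_eeg = np.random.randn(12, 3 * 60 * FS)
print(segment_recording(dummy_eeg).shape)   # (8, 12, 10625)
```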

Public dataset (external evaluation)

The publicly available EEG data used in this study comes from (Ali Motie Nasrabadi, Armin Allahverdy, Mehdi Samavati, 2020) and includes 61 children with ADHD and 60 healthy controls (HC), all within the age range of 7 to 12 years old. A qualified psychiatrist diagnosed the children with ADHD using DSM-IV criteria, and they were given Ritalin for up to 6 months. There were no psychiatric illnesses, epilepsy, or reports of high-risk behaviors among the children in the HC group. The EEG recording was based on a visual attention test in which the children were shown images containing cartoon characters and asked to count them. Each image was presented immediately after the child responded, without interruption, to ensure continuous stimulation during EEG recording. The EEG data were collected at the Psychology and Psychiatry Research Center at Roozbeh Hospital (Tehran, Iran) using 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a sampling frequency of 128 Hz. Similarly, we divided each EEG signal into 16 chunks of 4-s epochs, each with 512 timesteps (4 s \(\times\) 128 Hz), yielding 976 ADHD samples (61 children \(\times\) 16 chunks) and 960 HC samples (60 children \(\times\) 16 chunks). Likewise, the 4-s epoch duration was chosen after experimenting with various segment durations, as it consistently produced the best results.

Preprocessing

This subsection explains how EEG signals are converted to scalograms using the continuous wavelet transform (CWT) and how correlations are calculated between the channel scalograms, resulting in a channel-wise CWT correlation matrix.

Continuous wavelet transform (CWT)

Wavelets are a multi-resolution approach that allows for time and frequency fidelity in different frequency bands, making them extremely useful in signal decomposition (Brunton and Kutz 2019). Wavelet fundamentals begin with the mother wavelet \(\psi \left(t\right)\), which is described in Eq. 1, where \(a\) and \(b\) play the roles of scaling and translating the mother wavelet \(\psi\), respectively (Brunton and Kutz 2019).

$${\psi }_{a,b}\left(t\right)=\frac{1}{\sqrt{a}}\psi \left(\frac{t-b}{a}\right)$$
(1)

In CWT, wavelets of varying scales and times are shifted across the input signal, yielding coefficients that are a function of the wavelet scale and shift parameters (Raghavendra et al. 2021). Equation 2, in which the input signal is denoted by \(f\left(t\right)\), describes this CWT mechanism. The resulting transformed signal is converted into a coefficient matrix of size \(n \times m\), where \(n\) is the total number of scales and \(m\) is the length of the signal (Raghavendra et al. 2021). In our study, we set \(n\) = 30 and \(m\) = 10,625, which is the number of timesteps in each segmented EEG signal.

$${\mathcal{W}}_{\psi }\left(f\right)\left(a,b\right)= \langle f,{\psi }_{a,b}\rangle = {\int }_{-\infty }^{\infty }f\left(t\right)\,{\overline{\psi }}_{a,b}\left(t\right)\,dt$$
(2)

Channel-wise CWT correlation matrix

PyWavelets (Lee et al. 2019), an open-source wavelet transformation library for Python, was used to apply CWT to each EEG segment, as indicated in Fig. 2. After experimenting with all of the wavelets available in PyWavelets, we chose the Gaussian wavelet ‘gaus6’ for our signal transformation as it produced the best results. As a result, for each EEG channel, we obtained a scalogram of size 30 \(\times\) 10,625, giving 360 \(\times\) 10,625 for 12 channels. We then computed the Pearson correlation coefficient between all pairs of rows of this stacked representation (12 channels \(\times\) 30 scales), resulting in a channel-wise CWT correlation matrix of size 360 \(\times\) 360, which we use to train our deep learning model.

Fig. 2

12-Channel EEG segment conversion to channel-wise CWT correlation matrix
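The preprocessing pipeline described above can be sketched with PyWavelets and NumPy as shown below. This is a minimal illustration that assumes a single (12, 10625) EEG segment as input; the scale range of 1–30 corresponds to the 30 scales stated earlier, and the function and variable names are ours, not the authors'.

```python
import numpy as np
import pywt

N_SCALES = 30
SCALES = np.arange(1, N_SCALES + 1)          # 30 CWT scales

def cwt_correlation_matrix(segment, wavelet="gaus6"):
    """Convert a (12, 10625) EEG segment into a (360, 360) channel-wise
    CWT correlation matrix."""
    scalograms = []
    for channel in segment:
        # pywt.cwt returns (coefficients, frequencies); coefficients has shape (30, 10625)
        coeffs, _ = pywt.cwt(channel, SCALES, wavelet)
        scalograms.append(coeffs)
    # Stack the 12 scalograms row-wise: (12 * 30, 10625) = (360, 10625)
    stacked = np.vstack(scalograms)
    # Pearson correlation between every pair of rows -> (360, 360)
    return np.corrcoef(stacked)

# Example with one dummy segment
dummy_segment = np.random.randn(12, 10625)
print(cwt_correlation_matrix(dummy_segment).shape)   # (360, 360)
```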

Deep CNN model

After multiple rounds of hyperparameter tuning, we developed ADHD/CD-NET, illustrated in Fig. 3 and outlined in Table 2. This deep CNN model was specifically designed to classify channel-wise CWT correlation matrices into three distinct classes: ADHD, ADHD + CD, and CD. Numerous studies have shown that CNN models are suitable for medical fields that require image analysis from CT, MRI, PET, and X-ray (Soffer et al. 2019; Yamashita et al. 2018). This is because CNN models are designed to emulate the image recognition abilities of the human visual system (Balderas Silva et al. 2018). In addition to the analysis of medical images, CNN models can also be utilized for biosignals. In this case, there are two approaches: (a) use a 1-dimensional (1D) CNN for direct analysis of the full signal, or (b) convert the biosignals into a 2-dimensional (2D) representation and use a 2D CNN. The latter approach was used in this study, where 12-channel EEG data segments were transformed into 2D channel-wise CWT correlation matrices for our proposed 2D CNN model.

Fig. 3

Model architecture of ADHD/CD-NET

Table 2 Model layer parameters of ADHD/CD-NET

The convolutional layer, pooling layer, and fully-connected layer are the three key layers that make up a fundamental CNN model. Equation 3 illustrates the operation of the convolutional layer, which converts the input image into a much simpler representation for image classification. \(S\) represents the input image, \(*\) is the discrete convolution operation, and \(W\) is the convolutional kernel’s weight, which is updated continuously as the kernel moves over the input feature (Albawi et al. 2017; Yildirim et al. 2019). The result of the convolutional layer is the feature map (\(O\)), represented by Eq. 4, where \(i\) and \(j\) are the feature map’s dimensions (Albawi et al. 2017; Yildirim et al. 2019). The feature map is further simplified by the max pooling layer, which is applied after each convolutional operation. This lowers the feature map’s complexity, reducing the likelihood that the model will overfit (Hafemann et al. 2017). In addition, at layer no. 11 (Fig. 3; Table 2), we included a global max pooling layer that covers the entire feature map rather than the restricted kernel used in a standard pooling layer. Hence, the global max pooling layer reduces the complexity of the feature map far more than a simple max pooling layer.

$$\left(S*W\right)\left(i,j\right)=\sum _{m}\sum _{n}S\left(m,n\right)W(i-m,j-n)$$
(3)
$${O}_{n}^{l}={\left({S}_{W\left(i,j\right)}*W\left(i,j\right)\right)}_{n}$$
(4)

In ADHD/CD-NET, the convolution and max pooling operations reduce the input matrix from 360 \(\times\) 360 to 11 \(\times\) 11, which is then reduced to a 1D array of length 128 after the global max pooling layer. The fully-connected part, which is the neural network component of the CNN model, receives this 1D array as input. It is composed of two layers: the first layer is made up of 32 neurons, while the final output layer is made up of 3 neurons that use the SoftMax activation function to determine the likelihood that the sample falls into one of three categories: ADHD, ADHD + CD, or CD. In addition, we placed a dropout layer with a rate of 0.2 just before the fully-connected layers to lessen the likelihood of model overfitting. For the optimizer and loss function of the deep CNN model, we used Adam with a learning rate of 0.0001 and sparse categorical cross-entropy, respectively. We then trained the model for 700 epochs with a batch size of 15. To address the imbalance in the dataset caused by the CD class being significantly smaller than the ADHD and ADHD + CD classes, a weighted loss was also incorporated during model training. This guarantees that the minority CD class is given more weight than the ADHD and ADHD + CD classes, allowing the model to emphasize learning the CD class rather than being overwhelmed by the potential bias created by the large ADHD + CD class. ADHD/CD-NET was created in Python using TensorFlow (v2.9.1). The specifications of the computer used to train the model are an Intel Core i9-12900F CPU, an Nvidia Quadro A2000 12 GB GPU, 128 GB RAM, and a 1.0 TB 2.5-inch SATA SSD.
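For illustration, the sketch below shows one way the described architecture could be expressed in TensorFlow/Keras. The exact filter counts and kernel sizes are specified in Table 2 and are not reproduced here, so the layer widths below are assumptions; the input size, the global max pooling at layer 11, the dropout rate, the dense layer sizes, and the optimizer and loss follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_adhd_cd_net(input_shape=(360, 360, 1), n_classes=3):
    """Illustrative sketch of ADHD/CD-NET; filter counts are assumed, not taken from Table 2."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Five conv + max-pool stages shrink the map: 360 -> 180 -> 90 -> 45 -> 22 -> 11
        layers.Conv2D(8, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        # Layer 11: global max pooling collapses 11 x 11 x 128 into a length-128 vector
        layers.GlobalMaxPooling2D(),
        layers.Dropout(0.2),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then follow the settings stated above, e.g. `model.fit(x_train, y_train, epochs=700, batch_size=15, class_weight=class_weights)`, where `class_weights` up-weights the minority CD class.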

Gradient-weighted class activation mapping (Grad-CAM)

Grad-CAM is a well-known XAI method for interpreting CNN models by showing users what the CNN model “sees” as a significant characteristic when generating a prediction (Jahmunah et al. 2022; Selvaraju et al. 2016). The distinctive features gradually become more noticeable as the convolution operations of successive CNN layers continue to convolve the feature map; as a result, the last convolutional layer is considered to have the most distinctive features highlighted. Therefore, Grad-CAM is frequently applied to the final convolutional layer to determine which features have been highlighted, utilizing the gradient information provided by the neurons in that layer as they assign importance values to the regions of interest on the feature map (Jahmunah et al. 2022; Selvaraju et al. 2016). Grad-CAM then produces a heatmap with the important regions highlighted in red and the less important regions remaining in blue.
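A minimal Grad-CAM sketch for a Keras model is given below. It assumes the hypothetical `build_adhd_cd_net` model from the earlier sketch and a last convolutional layer identified by name, and it follows the standard Grad-CAM recipe (gradient of the class score with respect to the last convolutional feature map, spatial averaging of the gradients, weighted sum of the feature maps, then ReLU) rather than reproducing the authors' exact implementation.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Compute a Grad-CAM heatmap for one (360, 360, 1) input matrix."""
    # Sub-model mapping the input to the last conv layer's activations and the predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    # Gradient of the class score with respect to the conv feature map.
    grads = tape.gradient(class_score, conv_out)
    # Average the gradients over the spatial dimensions: one weight per feature-map channel.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, then ReLU keeps only positive contributions.
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalized to [0, 1]

# Example (layer name is hypothetical): heatmap = grad_cam(model, x_sample, "conv2d_4")
```

The resulting low-resolution map is then typically upsampled to the 360 × 360 input size and overlaid on the channel-wise CWT correlation matrix, as in the heatmaps shown in Figs. 7, 8 and 9.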

Results

Internal evaluation results (private dataset)

We used 10-fold cross-validation to evaluate the model. Figure 4 depicts the model performance graph during training, which shows that the model did not overfit, as seen from the consistently small difference between the training and validation curves. In addition, we used a model checkpoint during training to save the best-performing model weights, which were then used to evaluate the test fold in the 10-fold cross-validation.
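The snippet below is a minimal sketch of this evaluation protocol, assuming the hypothetical `build_adhd_cd_net` builder from the earlier sketch and stand-in arrays `X` and `y` for the channel-wise CWT correlation matrices and their integer labels; the checkpoint callback, validation split, class weighting, and training settings mirror what is described in the Methods.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.class_weight import compute_class_weight

# Stand-in data; the real study uses 984 matrices of shape (360, 360, 1)
# with labels 0 = ADHD, 1 = ADHD + CD, 2 = CD.
X = np.zeros((60, 360, 360, 1), dtype="float32")
y = np.repeat([0, 1, 2], 20)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    model = build_adhd_cd_net()                     # hypothetical builder from the earlier sketch
    weights = compute_class_weight("balanced", classes=np.unique(y), y=y[train_idx])
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        f"best_fold_{fold}.weights.h5",
        monitor="val_accuracy",
        save_best_only=True,                        # keep only the best-performing weights
        save_weights_only=True,
    )
    model.fit(
        X[train_idx], y[train_idx],
        validation_split=0.1,                       # roughly one training fold held out for tuning
        epochs=700, batch_size=15,                  # training settings stated in the Methods
        class_weight=dict(enumerate(weights)),      # weighted loss favouring the minority CD class
        callbacks=[checkpoint], verbose=0,
    )
    model.load_weights(f"best_fold_{fold}.weights.h5")   # restore the best weights before testing
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    print(f"Fold {fold}: test accuracy = {acc:.4f}")
```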

Fig. 4

Performance graph of ADHD/CD-NET during model training

The dataset was divided into ten equal folds for 10-fold cross-validation, with nine folds used for model training and the remaining fold used to evaluate model performance. In addition, we used one fold from the training data as a validation set for model tuning. This process was repeated ten times so that every fold was used for training, validation, and testing. The results of the 10-fold cross-validation are shown in Table 3. ADHD/CD-NET achieved a high overall classification accuracy of 93.70%. The model also has a high specificity of 95.35%, which means it correctly recognizes the majority of true negative samples rather than misclassifying them as false positives. The remaining metrics are sensitivity and precision. Sensitivity measures how many positive samples were correctly classified as true positives rather than false negatives, while precision compares the true positives to the false positives. Hence, there is a trade-off between sensitivity and precision: a high sensitivity may indicate that the model correctly recognizes most of the true positive samples, whereas a low precision indicates that the model incorrectly classifies many other samples as the positive class. ADHD/CD-NET achieves 90.83% sensitivity and 91.85% precision, indicating that our model effectively balances this trade-off. The model performance can also be visualized using the confusion matrix in Fig. 5, which shows that the majority of samples in each class were correctly identified despite the class imbalance; even though the CD class has the fewest samples, the model correctly predicted 115 out of 128 CD samples.
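To make the reported metrics concrete, the sketch below computes accuracy and macro-averaged (one-vs-rest) sensitivity, specificity, and precision from a 3-class confusion matrix. This is a standard formulation and an assumption about how the aggregate figures are derived, not the authors' exact evaluation script; the example matrix is illustrative, not the study's actual confusion matrix.

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy plus macro-averaged sensitivity, specificity and precision from a
    square confusion matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    sens, spec, prec = [], [], []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))      # true positives recovered (sensitivity/recall)
        spec.append(tn / (tn + fp))      # true negatives recovered (specificity)
        prec.append(tp / (tp + fp))      # predicted positives that are correct (precision)
    accuracy = np.trace(cm) / total
    return accuracy, np.mean(sens), np.mean(spec), np.mean(prec)

# Illustrative 3-class confusion matrix (rows/columns: ADHD, ADHD + CD, CD), not the real one.
example_cm = [[340, 15, 5],
              [20, 470, 6],
              [8, 5, 115]]
print(macro_metrics(example_cm))
```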

Table 3 Performance parameter of ADHD/CD-NET for internal evaluation with our private dataset
Fig. 5

Normalized confusion matrix of ADHD/CD-NET for internal evaluation with our private dataset. ‘A’ represents ADHD samples, ‘AC’ represents ADHD + CD samples, and ‘C’ represents CD samples

Explanation with Grad-CAM (private dataset)

To demonstrate the explainability of ADHD/CD-NET, we performed a separate experiment in which we randomly selected one subject from each category of ADHD, ADHD + CD, and CD as the test set, with the remaining subjects used to train the model. This ensures that the test set is entirely new to the model. Recall that we segmented the EEG data of each patient into 8 chunks in Sect. 2.1. Figure 6 depicts the classification result of the model on the test set. As can be seen, the model correctly identifies the ADHD and CD classes with 100% accuracy. The ADHD + CD subject, on the other hand, had three segments misclassified as ADHD. We then applied Grad-CAM to all of the segments and, interestingly, discovered that the correlation between channels Fp2 and P4 appears to be a consistent significant contributor to why a segment is predicted as ADHD (Fig. 7). Repeating patterns are less visible in the ADHD + CD and CD samples, especially the former, because it contains a combination of ADHD and CD characteristics (Fig. 8). Nonetheless, some regions in the CD samples appear to be significant for predicting CD: the correlations of Fp1 with F8 and Fp1 with T3 (Fig. 9).

Fig. 6

Confusion matrix of ADHD/CD-NET on the test set of three subjects, one from each disorder category. ‘A’ represents ADHD samples, ‘AC’ represents ADHD + CD samples, and ‘C’ represents CD samples

Fig. 7

Grad-CAM heatmap produced for each segment’s channel-wise CWT matrix in the ADHD subject. Red circles indicate the EEG channels recognized by ADHD/CD-NET as important for prediction

Fig. 8

Grad-CAM heatmap produced for each segment’s channel-wise CWT matrix in the ADHD+CD subject. Red circles indicate the EEG channels recognized by ADHD/CD-NET as important for prediction

Fig. 9

Grad-CAM heatmap produced for each segment’s channel-wise CWT matrix in the CD subject. Red circles indicate the EEG channels recognized by ADHD/CD-NET as important for prediction

External evaluation results (public dataset)

Prior to using the external public dataset to assess ADHD/CD-NET, the weights from the earlier training on the internal private dataset were cleared. This ensures that ADHD/CD-NET is evaluated from a clean slate and that memory from earlier training does not interfere with the evaluation on the external public dataset. Table 4 displays the results of the 10-fold cross-validation for the external evaluation. ADHD/CD-NET achieved a high classification accuracy of 98.19%, sensitivity of 98.36%, specificity of 98.02%, and precision of 98.06%. This demonstrates that ADHD/CD-NET can also perform well on an external dataset and accurately distinguish between ADHD and HC, as seen in the confusion matrix in Fig. 10; ADHD/CD-NET correctly detected 960 out of 976 ADHD samples and 941 out of 960 HC samples.

Table 4 Performance parameter of ADHD/CD-NET for external evaluation with public dataset
Fig. 10

Normalized confusion matrix of ADHD/CD-NET for external evaluation with public dataset

Discussion

This study objectively distinguishes ADHD from CD using EEG signals via our proposed deep learning system, ADHD/CD-NET. The treatment protocols for these three disorder categories differ; while stimulant and non-stimulant medications are available for ADHD (Brown et al. 2018), there are currently no medications approved by the U.S. Food and Drug Administration (FDA) to treat CD (Lillig 2018a). Misdiagnosis of CD as ADHD increases the likelihood of incorrect medication prescription and may also result in a higher risk of drug abuse; as mentioned earlier, students may fake ADHD symptoms to obtain stimulant medication (Sansone and Sansone 2011). Therefore, objective ADHD, ADHD + CD, and CD diagnosis is required not only for the benefit of early detection and treatment but also to reduce the likelihood of drug abuse.

Previously, our team proposed ML models using the same dataset, as listed in Table 5. The present study is the first to use a DL model with this unique dataset to differentiate ADHD, ADHD + CD, and CD. In their study, (Tor et al. 2021) used empirical mode decomposition and discrete wavelet transform to decompose the signal, then extracted nonlinear features such as entropy, fractal dimension, and Lempel-Ziv complexity. The top significant features were then selected using a sequential forward selection technique to train a k-nearest neighbour (kNN) ML classifier, resulting in a high classification accuracy of 97.88%. Despite the high model performance, the proposed technique suffers from poor interpretability because nonlinear features of EEG signals are not recognized as a clinical standard for ADHD or CD diagnosis. When the EEG signals were decomposed to extract nonlinear features, information such as the time and location of the EEG characteristics contributing to the diagnosis was lost.

Table 5 List of studies that used the same private dataset for classification of ADHD, ADHD + CD, and CD

Similarly, (Koh et al. 2022) followed the same procedure, decomposing the ECG signals using the empirical wavelet transform (EWT) and extracting EWT entropy features to train their best-performing ML classifier, a bagged tree, resulting in 87.19% classification accuracy. Likewise, these EWT entropies are not clinically recognized either. Therefore, this research aims to improve on previous works by incorporating time localization and channel selection. Segmenting EEG signals into chunks of 0–21.25 s, 21.26–42.50 s, 42.51–63.75 s, and so on reveals which EEG segment in time exhibits ADHD or CD characteristics, while Grad-CAM's heatmap provides information on which EEG channels display such characteristics. This matters because EEG signals are highly variable, which means that not all EEG channels may capture the important characteristics contributing to the diagnosis; similarly, not all time segments consistently exhibit the characteristics of ADHD or CD patients. Thus, our study addresses this limitation by identifying the contributing EEG segments and significant channels for ADHD, ADHD + CD, and CD detection, thereby providing objective data for relevant medical professionals in this field, such as psychologists, pediatricians, and neurologists.

Furthermore, we externally evaluated our proposed model with a public dataset (Ali Motie Nasrabadi, Armin Allahverdy, Mehdi Samavati, 2020) to demonstrate that ADHD/CD-NET can discriminate ADHD from HC in addition to ADHD, ADHD + CD, and CD. ADHD/CD-NET produced results comparable to earlier research that employed the same public dataset; as shown in Table 6, all studies achieved classification accuracies greater than 90% using DL models. (Khare and Acharya 2023) achieved the highest classification accuracy of 99.81%, followed by (Talebi and Motie Nasrabadi 2022) with 99.09%, while ADHD/CD-NET achieved a comparable 98.19%. This shows that ADHD/CD-NET can perform both the three-class (ADHD, ADHD + CD, and CD) and binary (ADHD and HC) classification tasks. It can be noted from Table 6 that the performance of our proposed model is comparable with state-of-the-art techniques. We have also shown that our model is able to classify ADHD from HC on an external public dataset without changing any layer parameters, demonstrating that the generated model is accurate and robust.

Table 6 List of studies that used the same public dataset for classification of ADHD and HC

In summary, the novelties and the significant aspects of our research are listed as follows:

  • This is the first study to use an explainable DL model to distinguish between ADHD, ADHD + CD, and CD.

  • Another innovative concept is the preprocessing method for transforming EEG into a channel-wise CWT correlation matrix.

  • We achieved 93.70% classification accuracy, demonstrating the efficacy of ADHD/CD-NET.

  • Grad-CAM was also used to highlight the EEG channels that significantly influenced the classification outcome.

  • As a result, ADHD/CD-NET can locate the EEG channels that exhibit abnormal EEG features and provide time localization of such traits.

  • ADHD/CD-NET also achieves a high classification accuracy of 98.19% with an external public dataset, separating ADHD samples from HC samples.

Despite the benefits of ADHD/CD-NET, we are nevertheless constrained by issues such as the limited amount of data, which compromised the performance of ADHD/CD-NET. The diversity of the data is another limitation: Singapore is home to three major ethnic groups (Chinese, Malay, and Indian), so the data used in this study represent only the Singaporean population and cannot be generalized to other populations (Chong 2007).

Having an objective CAD tool for ADHD and CD diagnosis and differentiation, on the other hand, can significantly reduce the healthcare burden in Singapore’s mental health institutions, where there is a shortage of healthcare professionals; the psychiatrist-to-population ratio is an appallingly low 2.6 per 100,000 (Chong 2007). Therefore, future work must be undertaken to eventually incorporate CAD tools in mental healthcare facilities. Building on the success of this study, we hope to develop a DL model for ADHD, ADHD + CD, and CD differentiation using ECG signals, providing an additional parameter for objective diagnosis alongside EEG. Furthermore, ECG signals can be easily captured by smartwatches, unlike the laborious electrode setup needed to record EEG signals, and ECG signal preprocessing is significantly less complex than EEG’s, which lowers computational complexity. If such a DL system is created, patients with ADHD or CD may be able to monitor their conditions using smartphones. This will help confirm the diagnosis of ADHD because mental health professionals and clinicians will have access to daily ECG data to see whether the patient exhibits any characteristics of ADHD or CD. Hence, the path of future ADHD or CD detection and monitoring systems should shift towards widely accessible physiological data, such as ECG (Khare et al. 2023; Loh et al. 2023). As such, future work will require the collection of ECG data via wearable devices from ADHD, ADHD + CD, CD, and healthy control participants to develop this ADHD or CD detection and monitoring system.

Conclusion

This work proposes a deep learning system (ADHD/CD-NET) for EEG signal-based detection of CD, ADHD + CD, and ADHD. A unique preprocessing method is used in this proposed system to turn segments of 12 EEG channels into channel-wise CWT correlation matrices, which the deep CNN model then analyzes and categorizes into one of the three disorder categories. ADHD/CD-NET achieved a high classification accuracy of 93.70%, demonstrating that our system can distinguish ADHD, ADHD + CD, and CD, which is difficult for mental health professionals and clinicians to do because these disorders share similar clinical symptoms. Additionally, we use Grad-CAM, which can assist us in highlighting the critical EEG channels to consider for diagnosis. Hence, ADHD/CD-NET is capable of performing time localization from the EEG signal segments and selecting significant EEG channels for diagnosis, offering objective analysis for mental health professionals and clinicians to consider when making a diagnosis. Additionally, we tested ADHD/CD-NET using an external public dataset and achieved 98.19% classification accuracy when differentiating ADHD from healthy controls. In the future, we intend to evaluate our model using a larger dataset with more patient diversity and additional physiological signals, such as ECG, which can be easily obtained via smartwatches.