1 Introduction

Parkinson's disease (PD), first described by James Parkinson, affects an individual's movement, leading to muscle stiffness, tremors, and changes in speech and handwriting [1]. The condition arises when the nerve cells that produce the chemical dopamine break down, leaving them unable to transmit messages accurately; genetic factors may contribute to this breakdown. It can result in depression, nervous disorders, and memory impairment [2]. Several studies have attempted to diagnose the disease at an early stage, though with limited success. Identifying the disease early is important so that patients can maintain their quality of life [3]. At its advanced stage, the disease affects day-to-day tasks, and the person might need help from others. The later stages of PD are severe: the patient develops stiffness in the legs, which can make it impossible to stand or walk and might cause freezing on standing. Several tests have been used to identify the disease, such as handwriting, speech, and voice exams [4]. The handwriting exam is widely employed for diagnosing PD because the data are easy and inexpensive to collect.

In recent times, datasets have grown in the number of features and instances, which makes them noisier [5]. Noisy datasets raise the computational cost, lower the prediction accuracy, slow down training, and increase model complexity. Thus, feature selection (FS) has become an important step in the machine learning (ML) pipeline before the model is trained [6]. Processing and preprocessing the data is a complex task, since the quantity of data grows with the feature and instance counts. Growth in data makes it more vulnerable to noise, which might degrade results and performance. Therefore, it becomes indispensable to treat the data [7]. Complexity and computational cost tend to increase when a large amount of data is used. Hence, FS plays an important role in building ML architectures. In FS, also called parameter selection, a subset of features is selected from the existing features.

The primary objective of FS is to improve the accuracy that the algorithm achieves after selection relative to its accuracy before selection. FS also helps by reducing the computational complexity and cost associated with the datasets [8]. As datasets gain more instances and features, they become noisier, and noisy datasets reduce the prediction accuracy of models, increase computational cost and complexity, and slow down training [9]. Consequently, FS has become an essential step in ML before training the model. The FS approach focuses on finding a subset of the full feature set with minimal degradation of network performance, so that the subset predicts the target with accuracy similar to that of the original feature set at a reduced computational cost. FS also helps in understanding the causes of disease, reduces the computational requirements, and prevents performance degradation, which contributes to better and faster convergence of deep model training.

FS models are classified into wrapper-based and filter-based approaches [10]. In wrapper-based approaches, the FS algorithm uses an ML model to evaluate the accuracy of each candidate feature subset, which generally yields high accuracy. However, these approaches are less effective on high-dimensional datasets because of the long training time [11]. In contrast, filter-based approaches use statistical measures of data dependency to reach a good subset faster. Filter-based approaches are less accurate but more scalable, faster, and less computationally expensive than wrapper-based approaches [12].

The QMFOFS-HCNN technique aims to improve Parkinson's disease (PD) detection and classification by utilizing Quantum Mayfly Optimization (QMFO) for feature selection, a Convolutional Neural Network with Attention Long Short-Term Memory (CNN-ALSTM) for classification, and hyperparameter tuning with the Nadam optimizer. The contributions of this study are as follows: (i) It uses QMFO to select relevant features, enhancing classification accuracy and reducing computational complexity. (ii) It employs CNN-ALSTM for PD classification, which is well suited to biomedical time series data and uses an attention mechanism to capture important information. (iii) It fine-tunes model parameters with the Nadam optimizer, improving overall performance. (iv) It demonstrates superior accuracy and detection rates compared to existing methods on benchmark PD datasets. (v) It efficiently selects minimal features while maintaining high accuracy, which is crucial for real-world applications. (vi) It is effective across various PD datasets, suggesting broader applicability.

Thus, this study develops a quantum mayfly optimization-based feature subset selection with a hybrid convolutional neural network (QMFOFS-HCNN) technique for PD detection and classification. The principal aim of the QMFOFS-HCNN technique is to identify the optimal feature subsets and enhance the classification accuracy of PD diagnosis. The technique first designs a novel QMFO approach for optimal feature selection, which addresses the curse of dimensionality. In addition, a CNN with attention long short-term memory (CNN-ALSTM) model is employed to detect and classify PD. To boost the PD classification outcomes, the Nadam optimizer is used to tune the hyperparameters. The experimental validation is carried out on benchmark datasets, and the results are assessed under several measures.

2 Literature survey

The authors in [13] developed a cloud-based PD predictive model for medical decision-making that helps physicians identify Parkinson-affected persons remotely. An efficient expanded cat swarm optimization (ECSO)-based FS method was examined to resolve the problem of data dimensionality. Applying the FS method with a K-nearest neighbour (KNN) classifier considerably enhanced the disease prediction performance. Solana-Lavalle et al. [14] focused on increasing the accuracy and reducing the number of selected vocal features in PD diagnosis while using the most extensive and newest open-source dataset. Although this public dataset contains 754 features, only 8 to 20 features are selected for classification after wrapper feature subset selection. The KNN, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF) classifiers are employed for detecting PD from voice.

Mathur et al. [15] applied different ML methods that can improve the use of datasets and play a significant part in early disease prediction. The algorithms were then compared, and the most accurate one was selected. The experimental outcomes show that the combination of an artificial neural network (ANN) and the KNN algorithm is more effective than the other approaches. The authors in [11] introduced two NN-based methods, a voice impairment classifier and a spectrogram detector, that aim to help patients and doctors identify the disease at an early stage. A CNN was evaluated extensively on large-scale image classification of gait signals transformed into spectrogram images, and a deep dense ANN was evaluated on voice recordings to predict the disease. El Maachi et al. [12] developed a deep learning (DL) method for analysing gait data. A 1D-Convnet is used to construct the deep neural network (DNN) classifier, which processes eighteen 1D signals from foot sensors. Haq et al. [16] introduced an ML- and DNN-based non-invasive predictive model for timely and accurate diagnosis of PD. The prediction methods, namely SVM, linear regression (LR), and DNN, were used to classify healthy people and PD patients. Zhang et al. [17] proposed an energy direction feature based on empirical mode decomposition (EDF-EMD) to capture the differences in voice signals between healthy people and PD patients. First, the intrinsic mode functions (IMFs) were obtained by decomposing the voice signal with empirical mode decomposition.

In Parkinson's disease (PD) research, several previous studies have aimed to diagnose the condition in its early stages but have had limited success [18, 19]. Detecting PD early on is crucial for improving the quality of life for patients. Existing approaches have explored methods such as handwriting analysis, speech assessment, and voice examinations, with handwriting being a preferred choice due to its ease of data collection and affordability [20,21,22]. However, contemporary data sets have grown in size and complexity, introducing noise that can hinder algorithm accuracy, increase computational costs, and slow down data processing [23]. Researchers have turned to feature selection (FS) techniques to address these challenges as a critical step in machine learning (ML) model development. FS helps optimize algorithm performance by selecting a subset of relevant features, reducing computational complexity, and mitigating data noise. The QMFOFS-HCNN technique presented in this study represents a significant advancement in PD detection and classification. It leverages Quantum Mayfly Optimization (QMFO) for feature selection, employs a Convolutional Neural Network with Attention Long Short Term Memory (CNN-ALSTM) for classification, and fine-tunes model parameters using the Nadam optimizer. The key contributions of this research include improved accuracy, feature subset optimization, and enhanced classification performance. Importantly, this technique efficiently selects minimal features while maintaining high accuracy, making it suitable for real-world applications across various PD datasets. Compared to prior work, this study introduces a comprehensive and innovative approach to PD detection, offering the potential for more accurate and efficient diagnoses. 
While previous research has explored various machine learning methods and feature selection techniques [24], the QMFOFS-HCNN method stands out for its superior accuracy, computational efficiency, and adaptability across diverse PD datasets.

3 Material and methods

3.1 Dataset

The proposed method has been employed with datasets related to Parkinson's disease, encompassing diverse types of sound recordings, as well as data from Parkinson's HandPD, which are as follows:

In the Speech PD dataset, a set of biomedical voice measurements has been gathered from 31 individuals, 23 of whom have PD. Each column corresponds to a particular voice measurement, and each row corresponds to one of the 195 voice recordings taken from these individuals. The main aim of this dataset is to discriminate between healthy individuals (coded as 0) and those with PD (coded as 1). The dataset was created by Max Little of the University of Oxford in collaboration with the National Centre for Voice and Speech in Denver, Colorado, where the speech signals were recorded.

In the Voice PD dataset, the training data comprise records from 20 individuals with PD (14 male and 6 female) and 20 healthy individuals (10 male and 10 female), who were examined at the Department of Neurology of the Cerrahpasa Faculty of Medicine, Istanbul University. In the data acquisition process, 28 PD patients were asked to repeat the vowels 'a' and 'o' three times each, yielding a total of 168 voice recordings. These recordings act as an independent test set for validating the results obtained on the training dataset.

In the HandPD meander dataset, data have been collected from a total of 158 individuals, including 74 in the patient group and 18 in the healthy group. The dataset comprises 632 data instances with 13 distinct features, together with 632 images of meanders drawn by the individuals. The individuals ranged in age from 14 to 79 years. The handwriting exams were carried out at Botucatu Medical School, São Paulo State University, Brazil.

In the HandPD spiral dataset, participants were asked to draw spirals instead of meanders. This dataset comprises data from 158 individuals, with 632 data instances and 13 distinct features. The handwriting exams were carried out at Botucatu Medical School, São Paulo State University, Brazil.

3.2 Methods

3.2.1 Design of QMFOFS-HCNN model

This study develops a novel QMFOFS-HCNN technique for detecting and classifying PD. It aims to identify the optimal feature subsets and optimize the classification performance of PD diagnosis. The technique encompasses several processes: QMFO-based feature subset selection, CNN-ALSTM-based classification, and Nadam-based hyperparameter tuning. Using the QMFOFS and Nadam techniques helps boost the PD classification outcomes. Figure 1 depicts the overall working process of the proposed QMFOFS-HCNN technique. First, the Parkinson dataset is pre-processed to remove artefacts. Subsequently, the dataset is divided into training and testing sets. Afterward, a novel QMFO-based feature selection (FS) method is used to address the curse of dimensionality by reducing the computational complexity. Datasets with many instances and features tend to be noisier, and noisy datasets reduce the prediction accuracy of models, increase the computational cost and complexity, and slow down the training process.

Fig. 1 Overall working process of QMFOFS-HCNN technique

Consequently, the FS approach focuses on finding an appropriate subset of the full feature set that gives high network performance at a lower computational cost. To improve the efficiency of the MFO algorithm, the QMFO technique is derived; for details, refer to [25]. The CNN-ALSTM model is then employed for PD classification, and the Nadam optimizer is used to tune its hyperparameters and boost the classification outcome. The CNN-ALSTM is a hybrid DL approach that extracts features from the raw data with the CNN and performs prediction with the LSTM neural network [26]; the attention mechanism assigns weights to the LSTM outputs. Thus, the proposed work develops a QMFOFS-HCNN technique for PD detection and classification, whose primary aim is to identify the optimal feature subsets and enhance the classification accuracy of PD diagnosis.

3.2.2 Algorithmic design of QMFOFS technique

The MFO algorithm is inspired by the social behaviour of mayflies (MFs) [27]. MFs are assumed to be adults as soon as they hatch, and only the fittest survive. Two populations are initially created, representing the male and female MFs. Each candidate is represented by a \(d\)-dimensional vector \(x=\left({x}_{1},\dots ,{x}_{d}\right)\), and its quality is evaluated by the fitness function (FF) \(f(x)\). The velocity \(v=({v}_{1},\dots ,{v}_{d})\) is the change in the candidate's position. Each candidate adjusts its trajectory based on its own best position so far (pbest) and the best position found by any MF (gbest).

The gathering of male MFs in swarms means that each male adjusts its position according to its own experience and that of its neighbors. Let \({x}_{i}^{t}\) be the current position of candidate solution \(i\) at time \(t\); the position is changed by adding a velocity \({v}_{i}^{t+1}\) as [28]:

$${x}_{i}^{t+1}={x}_{i}^{t}+{v}_{i}^{t+1}$$
(1)

with \({x}_{i}^{0}\sim U({x}_{\mathrm{min}},{x}_{\mathrm{max}})\). Considering that the male population moves at low velocities, the velocity is computed as follows:

$${v}_{ij}^{t+1}={v}_{ij}^{t}+{a}_{1}{e}^{-\beta {r}_{p}^{2}} \left(p{{\text{best}}}_{ij}-{x}_{ij}^{t}\right)+{a}_{2}{e}^{-\beta {r}_{g}^{2}} \left(g{{\text{best}}}_{j}-{x}_{ij}^{t}\right),$$
(2)

where \({v}_{ij}^{t}\) is the velocity of MF \(i\) in dimension \(j\) at time \(t\), \({x}_{ij}^{t}\) is its position, \({a}_{1}\) and \({a}_{2}\) are positive attraction constants, and \(\beta\) is a fixed visibility coefficient. \(p{\text{best}}_{i}\) is the best position that candidate solution \(i\) has visited so far, and its update at step \(t+1\) is defined in Eq. (3).

$$p{\text{best}}_{i} = \left\{ {\begin{array}{*{20}l} {x_{i}^{t + 1} ,} \hfill & {if f\left( {x_{i}^{t + 1} } \right) < f\left( {p{\text{best}}_{i} } \right)} \hfill \\ {{\text{same as before}},} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(3)

where \(f:{\mathbb{R}}^{n}\to {\mathbb{R}}\) is the objective function to be minimized, and \(gbest\) is the global best position found up to time \(t\). The coefficient \(\beta\) in Eq. (2) limits an MF's visibility to the rest of the population. \({r}_{p}\) is the distance between \({x}_{i}\) and \(p{\text{best}}_{i}\), and \({r}_{g}\) is the distance between \({x}_{i}\) and \(gbest\). Both distances are computed as in Eq. (4).

$$\Vert {x}_{i}-{X}_{i}\Vert =\sqrt{{\sum }_{j=1}^{n}({x}_{ij}-{X}_{ij}{)}^{2}}$$
(4)

where \({x}_{ij}\) is the \({j}^{{\text{th}}}\) component of the \({i}^{{\text{th}}}\) candidate, and \({X}_{i}\) corresponds to \(p{\text{best}}_{i}\) or \(gbest\).
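The male update of Eqs. (1) to (4) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names and the default values of `a1`, `a2`, and `beta` are assumptions chosen for the example, and the fitness function `f` is supplied by the caller and assumed to be minimized.

```python
import math

def euclidean(a, b):
    """Eq. (4): Cartesian distance between two position vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def male_velocity(v, x, pbest, gbest, a1=1.0, a2=1.5, beta=2.0):
    """Eq. (2): attraction towards pbest and gbest, damped by distance.

    a1, a2 and beta are illustrative defaults, not values from the paper.
    """
    w_p = a1 * math.exp(-beta * euclidean(x, pbest) ** 2)
    w_g = a2 * math.exp(-beta * euclidean(x, gbest) ** 2)
    return [vj + w_p * (pj - xj) + w_g * (gj - xj)
            for vj, xj, pj, gj in zip(v, x, pbest, gbest)]

def male_step(x, v, pbest, gbest, f, **coeffs):
    """One male update: new velocity, new position, refreshed pbest."""
    v_new = male_velocity(v, x, pbest, gbest, **coeffs)
    x_new = [xj + vj for xj, vj in zip(x, v_new)]   # Eq. (1)
    if f(x_new) < f(pbest):                          # Eq. (3), minimization
        pbest = x_new
    return x_new, v_new, pbest
```

Because the attraction weights decay with the squared distance, a male that is far from pbest or gbest is pulled only weakly, which is how the visibility coefficient \(\beta\) limits the influence of distant positions.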

The best candidates in the swarm keep performing an up-and-down motion with varying velocities. For these candidates, the velocity is defined as in Eq. (5).

$$v_{ij}^{t + 1} = v_{ij}^{t} + d \times r$$
(5)

where \(d\) is the coefficient of the up-and-down motion, and \(r\) is a random value between \(-1\) and 1. Figure 2 shows the flowchart of the MFO technique.

Fig. 2 Flowchart of MFO algorithm

The female MFs do not gather in swarms; instead, they move towards the males. Let \({y}_{i}^{t}\) be the current position of female MF \(i\) at time \(t\). Its position is updated as:

$$y_{i}^{t + 1} = y_{i}^{t} + v_{i}^{t + 1}$$
(6)

with \({y}_{i}^{0}\sim U({x}_{\mathrm{min}},{x}_{\mathrm{max}})\). The velocity of the female MFs is defined as in Eq. (7).

$$v_{ij}^{t + 1} = \left\{ {\begin{array}{*{20}l} {v_{ij}^{t} + a_{2} e^{{ - \beta r_{mf}^{2} }} \left( {x_{ij}^{t} - y_{ij}^{t} } \right),} \hfill & if \quad {f\left( {y_{i} } \right) > f\left( {x_{i} } \right)} \hfill \\ {v_{ij}^{t} + fl \times r,} \hfill & {if \quad f\left( {y_{i} } \right) \le f\left( {x_{i} } \right)} \hfill \\ \end{array} } \right.$$
(7)

where \({v}_{ij}^{t}\) is the velocity of the \({i}^{th}\) female in dimension \(j\) at time \(t\), \({y}_{ij}^{t}\) is the position of the \({i}^{th}\) female candidate solution at time \(t\), \({a}_{2}\) is a positive attraction constant, \(\beta\) is the fixed visibility coefficient, \({r}_{{\text{mf}}}\) is the distance between the male and female candidate solutions calculated using Eq. (4), \(fl\) is the random-walk coefficient used when the female is not attracted by a male, and \(r\) is a random number between \(-1\) and 1. Mating is modelled by a crossover operator: a male parent and a female parent are selected, and two offspring are generated according to Eq. (8).
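A minimal Python sketch of the female update in Eqs. (6) and (7) is given here for illustration. It assumes a minimization problem, so a female approaches her paired male only when the male has the lower fitness value. The function names, the default coefficients, and the choice to draw \(r\) independently for each dimension are assumptions of this sketch, not details taken from the paper.

```python
import math
import random

def female_velocity(v, y, x_male, f, a2=1.5, beta=2.0, fl=1.0, rng=random):
    """Eq. (7): move towards the paired male if he is fitter, else random walk.

    a2, beta and fl are illustrative defaults; r is drawn per dimension.
    """
    if f(y) > f(x_male):
        r_mf = math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x_male, y)))
        w = a2 * math.exp(-beta * r_mf ** 2)
        return [vj + w * (xj - yj) for vj, xj, yj in zip(v, x_male, y)]
    return [vj + fl * rng.uniform(-1.0, 1.0) for vj in v]

def female_position(y, v_new):
    """Eq. (6): add the new velocity to the current position."""
    return [yj + vj for yj, vj in zip(y, v_new)]
```

Passing a seeded `random.Random` instance as `rng` makes the random-walk branch reproducible, which is convenient when comparing runs.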

$$\begin{gathered} {\text{offspring}}\;1 = L \times {\text{male}} + \left( {1 - L} \right) \times {\text{female}} \hfill \\ {\text{offspring}}\;2 = L \times {\text{female}} + \left( {1 - L} \right) \times {\text{male}} \hfill \\ \end{gathered}$$
(8)

where \(L\) is a random number. Initially, the velocity of the offspring is set to 0. To improve the efficiency of the MFO algorithm, the QMFO technique is derived [25]. Quantizing the mayfly individuals improves the search of the feature space by balancing exploitation and exploration. The basic unit of quantum computing (QC) is the qubit. A qubit has two basis states, \(|0>\) and \(|1>\), and its state is expressed as a linear combination of these two basis states:

$$|Q>=\alpha |0>+\beta |1>.$$
(9)

\({|\alpha |}^{2}\) is the probability of observing state \(|0>\) and \({|\beta |}^{2}\) is the probability of observing state \(|1>\), where \({|\alpha |}^{2}+{|\beta |}^{2}=1.\) A quantum individual is composed of \(n\) qubits. Owing to quantum superposition, it can represent \({2}^{n}\) possible values simultaneously:

$$\Psi = \sum\limits_{{x = 0}}^{{2^{n} - 1}} {C_{x} |x > } ,\sum\limits_{{x = 0}}^{{2^{n} - 1}} {|C_{x} |^{2} } = 1.$$
(10)
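As a small illustration of Eq. (9), the sketch below builds normalized qubit amplitudes and collapses a list of qubits into a bit string. The mapping from an observed bit to a selected feature (bit 1 meaning the feature is kept) is an assumption made for this example, since the text does not spell out the encoding, and the function names are hypothetical.

```python
import math
import random

def make_qubit(theta):
    """Return amplitudes (alpha, beta) = (cos theta, sin theta).

    This parameterization guarantees alpha^2 + beta^2 = 1.
    """
    return math.cos(theta), math.sin(theta)

def observe(qubits, rng=random):
    """Collapse each qubit to 1 with probability beta^2, otherwise 0.

    Assumption: bit d == 1 is read as "feature d is selected".
    """
    return [1 if rng.random() < beta * beta else 0 for _, beta in qubits]
```

With `theta = pi / 4` every feature starts with an equal chance of being selected, which is a common neutral initialization for quantum-inspired search.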

The state of a qubit is modified by quantum gates such as the Hadamard, rotation, and NOT gates. The rotation gate serves as a mutation operator that drives the quantum individuals towards better solutions and eventually to the global optimum.

The rotation gate is shown as follows:

$$\left[ {\begin{array}{*{20}c} {\alpha^{d} \left( {t + 1} \right)} \\ {\beta^{d} \left( {t + 1} \right)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\text{cos}}\left( {\Delta \theta^{d} } \right)} & { - {\text{sin}}\left( {\Delta \theta^{d} } \right)} \\ {{\text{sin}}\left( {\Delta \theta^{d} } \right)} & {{\text{cos}}\left( {\Delta \theta^{d} } \right)} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\alpha^{d} \left( t \right)} \\ {\beta^{d} \left( t \right)} \\ \end{array} } \right]{\text{for }}d = 1,2, \ldots ,n.$$
(11)

Here \(\Delta \theta^{d} = \Delta \times S \left( {\alpha^{d} , \beta^{d} } \right)\) is the rotation angle of the \(d\)th qubit, where \(\Delta\) and \(S\left( {\alpha^{d} , \beta^{d} } \right)\) are the magnitude and direction of the rotation, respectively.
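The rotation gate of Eq. (11) is a plain 2D rotation of the amplitude pair, which the following sketch makes explicit. The helper `rotation_step` and its `direction` argument (standing in for \(S(\alpha^{d}, \beta^{d})\), taken here as +1 or -1) are assumptions for illustration; how the direction is chosen is not specified in this section.

```python
import math

def rotate(alpha, beta, delta_theta):
    """Eq. (11): apply the rotation gate to one qubit's amplitudes."""
    c, s = math.cos(delta_theta), math.sin(delta_theta)
    return c * alpha - s * beta, s * alpha + c * beta

def rotation_step(alpha, beta, delta, direction):
    """Rotate by delta_theta = delta * direction (direction is +1 or -1)."""
    return rotate(alpha, beta, delta * direction)
```

Since the gate is an orthogonal matrix, the updated amplitudes still satisfy \({|\alpha |}^{2}+{|\beta |}^{2}=1\), so no renormalization is needed after the update.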

The mathematical model of the QMFOFS approach is established as follows. In a classification (i.e., supervised learning) task, the dataset has size \({N}_{S}\times {N}_{F}\), where \({N}_{S}\) is the number of instances and \({N}_{F}\) is the number of features. The objective of the FS problem is to select a subset of features \(S\) from the full set of \({N}_{F}\) features such that the size of \(S\) is smaller than \({N}_{F}\). This is achieved by minimizing the following objective function:

$${\text{Fit}}=\lambda \times {\gamma }_{S}+\left(1-\lambda \right)\times \left(\frac{\left|S\right|}{{N}_{F}}\right)$$
(12)

where \({\gamma }_{S}\) denotes the classification error obtained using \(S\), \(|S|\) is the number of selected features, and \(\lambda\) balances \({\gamma }_{S}\) against the selection ratio \(\left(\frac{\left|S\right|}{{N}_{F}}\right)\).
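Eq. (12) can be evaluated directly from a binary feature mask and a classification error. In the sketch below, the function name and the default `lam=0.99` are assumptions for illustration (the value of \(\lambda\) is not given in this section), and the error rate is assumed to come from whatever classifier is trained on the selected features.

```python
def fs_fitness(error_rate, mask, lam=0.99):
    """Eq. (12): weighted sum of classification error and selection ratio.

    mask is a list of 0/1 flags, one per feature; lam is an illustrative
    default, not a value reported in the paper.
    """
    n_features = len(mask)
    n_selected = sum(mask)
    return lam * error_rate + (1.0 - lam) * (n_selected / n_features)
```

For two subsets with the same error rate, the one with fewer selected features receives the lower (better) fitness, which is what pushes the search towards compact subsets.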

3.2.3 The process involved in CNN-ALSTM-based classification

At this stage, the CNN-ALSTM model is employed for PD classification. The CNN-ALSTM is a hybrid DL approach that extracts features from the raw data and performs prediction with the LSTM neural network [26]. The CNN layers extract suitable features from the time series data; exposing this additional hidden information can improve the forecast accuracy. The experimental outcomes illustrate that a CNN block comprising one convolutional layer with 16 kernels of size \(3\times 1\) and one convolutional layer with 32 kernels of size \(3\times 1\) optimizes the forecast efficiency. The feature vector obtained from the second CNN layer is input to the LSTM layer for forecasting, and the elements of the feature vector are connected to the 32 units of the LSTM layer. The attention mechanism assigns larger weights to the features that are most strongly associated with the current output. Finally, the fully connected (FC) layer processes the output vector of the attention mechanism using the unfolding function. The output is the forecasted value of AC2 at the next time step. The LSTM is well suited to forecasting experimental time series data, and recent work reports high predictive performance when CNN and LSTM are combined in different applications. The inverse-normalized predicted power is obtained from Eq. (13).

$${{\text{Pr}}}_{P}={{\text{Pr}}}_{Iac2}*\left({P}_{{\text{max}}}-{P}_{{\text{min}}}\right)+{P}_{{\text{min}}},$$
(13)

where \({{\text{Pr}}}_{P}\) is the forecasted power, \({{\text{Pr}}}_{Iac2}\) is the normalized forecasted value of \(AC2\), and \({P}_{{\text{max}}}\) and \({P}_{{\text{min}}}\) are the maximum and minimum values used for normalization.
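Eq. (13) is the inverse of min-max scaling. The short sketch below pairs it with the forward scaling so the round trip is visible; the function names are hypothetical, and the forward formula is the standard min-max definition, assumed here because the section states only the inverse.

```python
def normalise(p, p_min, p_max):
    """Standard min-max scaling to [0, 1] (assumed forward transform)."""
    return (p - p_min) / (p_max - p_min)

def denormalise(p_norm, p_min, p_max):
    """Eq. (13): map a normalized prediction back to the original scale."""
    return p_norm * (p_max - p_min) + p_min
```

Applying `denormalise` to the output of `normalise` with the same bounds recovers the original value, so the bounds used at training time must be stored and reused at prediction time.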

The LSTM cell structure efficiently mitigates the exploding and vanishing gradient problems. The LSTM cell has four essential components: the cell state and the input, output, and forget gates. The gates control how information in the cell state is updated, retained, and deleted. The forward computation is as follows:

$$\begin{gathered} f_{z} = \sigma \left( {W_{f} \cdot \left[ {h_{z - 1} , x_{z} } \right] + b_{f} } \right), \hfill \\ i_{z} = \sigma \left( {W_{i} \cdot \left[ {h_{z - 1} , x_{z} } \right] + b_{i} } \right), \hfill \\ O_{z} = \sigma \left( {W_{O} \cdot \left[ {h_{z - 1} , x_{z} } \right] + b_{o} } \right), \hfill \\ \tilde{C}_{z} = {\text{tanh}}\left( {W_{C} \cdot \left[ {h_{z - 1} , x_{z} } \right] + b_{c} } \right), \hfill \\ C_{z} = f_{z} \cdot C_{z - 1} + i_{z} \cdot \tilde{C}_{z} , \hfill \\ h_{z} = O_{z} \cdot {\text{tanh}}\left( {C_{z} } \right), \hfill \\ \end{gathered}$$
(14)

where \({W}_{f}\), \({W}_{i}\), and \({W}_{O}\) are the weight matrices of the forget, input, and output gates, respectively; \({b}_{f}\), \({b}_{i}\), and \({b}_{o}\) are the corresponding bias terms; \({W}_{C}\) and \({b}_{c}\) are the weight matrix and bias of the candidate cell state \(\tilde{C}_{z}\); \(x_{z}\) is the input, and \(h_{z}\) and \(C_{z}\) are the hidden state and cell state at step \(z\); \(\sigma\) is the sigmoid activation function; and \({\text{tanh}}\) is the hyperbolic tangent activation function.
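One forward step of Eq. (14) can be written out with plain Python lists to show how the gates interact. This is an illustrative sketch rather than the model used in the study: the `params` dictionary layout, the helper names, and the use of nested lists for weight matrices are all assumptions made for the example.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, b, v):
    """Return W.v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step following Eq. (14).

    params maps 'f', 'i', 'o', 'c' to (W, b) pairs (hypothetical layout).
    """
    hx = h_prev + x                                   # concatenation [h, x]
    f = [sigmoid(a) for a in affine(*params["f"], hx)]        # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], hx)]        # input gate
    o = [sigmoid(a) for a in affine(*params["o"], hx)]        # output gate
    c_tilde = [math.tanh(a) for a in affine(*params["c"], hx)]
    c = [fz * cp + iz * ct
         for fz, cp, iz, ct in zip(f, c_prev, i, c_tilde)]    # new cell state
    h = [oz * math.tanh(cz) for oz, cz in zip(o, c)]          # new hidden state
    return h, c
```

The additive form of the cell-state update, in which the old state is scaled by the forget gate rather than passed through a squashing function, is what lets gradients flow across many steps.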

The attention mechanism is inspired by the signal-processing mechanism of the brain that is peculiar to human vision: the eye rapidly scans the global image to find the target region that requires attention and ignores the other, unnecessary regions. Attention mechanisms have been applied effectively to model training in this and related areas. The proposed model uses the LSTM hidden-state output vectors \(H=\{{h}_{1}, {h}_{2},\cdots ,{h}_{t}\}\) as the input of the attention mechanism, which determines the attention weight \({\alpha }_{i}\) of each \({h}_{i}\) as shown in Eq. (15).

$$\begin{gathered} e_{i} = {\text{tanh}}\left( {W_{h} h_{i} + b_{h} } \right), \hfill \\ \alpha_{i} = \frac{{{\text{exp}}\left( {e_{i} } \right)}}{{\mathop \sum \nolimits_{k = 1}^{t} {\text{exp}}\left( {e_{k} } \right)}}, \hfill \\ \end{gathered}$$
(15)

where \({\alpha }_{i}\) is the attention weight, \({W}_{h}\) is the weight matrix applied to \({h}_{i}\), and \({b}_{h}\) is the bias.
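The attention weights of Eq. (15) amount to a softmax over per-step scores. The sketch below treats \(W_{h}\) as a single weight vector so that each score \(e_{i}\) is a scalar, and it adds a weighted sum of the hidden states as the attention output. Both of these choices, along with the function names, are assumptions for illustration; the section does not state how the weighted vector is formed.

```python
import math

def attention_weights(H, w_h, b_h):
    """Eq. (15): score each hidden state, then normalize with a softmax."""
    scores = [math.tanh(sum(w * x for w, x in zip(w_h, h)) + b_h) for h in H]
    exps = [math.exp(e) for e in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(H, alphas):
    """Assumed attention output: the alpha-weighted sum of hidden states."""
    dim = len(H[0])
    return [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(dim)]
```

Because the scores pass through tanh before the softmax, they are bounded in (-1, 1), so no single time step can dominate the weights entirely.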

3.3 Hyperparameter tuning

The Nadam optimizer is used to tune the hyperparameters of the CNN-ALSTM model. Nadam is an extension of the Adam optimizer [29] that incorporates Nesterov momentum and can be applied to improve the training efficiency of DL approaches. The update rules of the Adam optimizer are given by the following equations:

$$g_{t} = \nabla_{{\theta_{t} }} J\left( {\theta_{t} } \right)$$
(16)

$$m_{t} = \beta_{1} m_{t - 1} + \left( {1 - \beta_{1} } \right)g_{t}$$
(17)

$$v_{t} = \beta_{2} v_{t - 1} + \left( {1 - \beta_{2} } \right)g_{t}^{2}$$
(18)

$$\hat{m}_{t} = \frac{{m_{t} }}{{1 - \beta_{1}^{t} }}$$
(19)

$$\hat{v}_{t} = \frac{{v_{t} }}{{1 - \beta_{2}^{t} }}$$
(20)

$$\theta_{t + 1} = \theta_{t} - \frac{\eta }{{\sqrt {\hat{v}_{t} } + \varepsilon }}\hat{m}_{t}$$
(21)

where \({g}_{t}\) is the gradient vector of the CNN-ALSTM model at training step \(t\); \(\eta\) is the learning rate; \(J({\theta }_{t})\) is the objective (loss) function of the CNN-ALSTM model; \({\nabla }_{{\theta }_{t}}\) denotes the partial derivative of \(J({\theta }_{t})\) with respect to \({\theta }_{t}\); \({m}_{t}\) and \({v}_{t}\) are the first- and second-order moments of the gradient; \({\widehat{m}}_{t}\) and \({\widehat{v}}_{t}\) are the bias-corrected estimates of \({m}_{t}\) and \({v}_{t}\), which offset the initialization bias; \({\beta }_{1}\) and \({\beta }_{2}\) are the exponential decay rates of \({m}_{t}\) and \({v}_{t}\); \(\varepsilon\) is a small constant that keeps the denominator from being zero; and \(t\) is the iteration number in the training process. Substituting Eq. (17) into Eq. (19) and the result into Eq. (21) gives

$${\theta }_{t+1}={\theta }_{t}-\frac{\eta }{\sqrt{{\widehat{v}}_{t}}+\varepsilon }\left(\frac{{\beta }_{1}{m}_{t-1}}{1-{\beta }_{1}^{t}}+\frac{(1-{\beta }_{1}){g}_{t}}{1-{\beta }_{1}^{t}}\right)$$
(22)

The term \({m}_{t-1}/(1-{\beta }_{1}^{t})\) approximates the bias-corrected estimate \({\widehat{m}}_{t-1}\) of the momentum vector at the previous step; replacing it with \({\widehat{m}}_{t-1}\) gives:

$${\theta }_{t+1}={\theta }_{t}-\frac{\eta }{\sqrt{{\widehat{v}}_{t}}+\varepsilon }\left({\beta }_{1}{\widehat{m}}_{t-1}+\frac{(1-{\beta }_{1}){g}_{t}}{1-{\beta }_{1}^{t}}\right)$$
(23)

With the addition of Nesterov momentum, the bias-corrected estimate \({\widehat{m}}_{t}\) of the current momentum vector directly replaces the bias-corrected estimate \({\widehat{m}}_{t-1}\) of the previous momentum, which results in the Nadam update rule:

$${\theta }_{t+1}={\theta }_{t}-\frac{\eta }{\sqrt{{\widehat{v}}_{t}}+\varepsilon }\left({\beta }_{1}{\widehat{m}}_{t}+\frac{(1-{\beta }_{1}){g}_{t}}{1-{\beta }_{1}^{t}}\right)$$
(24)

The conventional momentum approach has the demerit that the learning rate remains the same in the training procedure and utilizes an individual learning rate for updating weights.
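The Nadam update derived above can be sketched as a single step function in plain Python. The function name and the default values of `lr`, `beta1`, `beta2`, and `eps` are assumptions for illustration (they are commonly used settings, not values reported in this paper), and parameters are stored as flat lists for simplicity.

```python
import math

def nadam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam update; t is the 1-based iteration count.

    Default hyperparameters are illustrative, not taken from the paper.
    """
    m = [beta1 * mj + (1 - beta1) * gj for mj, gj in zip(m, grad)]       # Eq. (17)
    v = [beta2 * vj + (1 - beta2) * gj * gj for vj, gj in zip(v, grad)]  # Eq. (18)
    m_hat = [mj / (1 - beta1 ** t) for mj in m]                          # Eq. (19)
    v_hat = [vj / (1 - beta2 ** t) for vj in v]                          # Eq. (20)
    theta = [
        th - lr / (math.sqrt(vh) + eps)
        * (beta1 * mh + (1 - beta1) * gj / (1 - beta1 ** t))             # Eq. (24)
        for th, mh, vh, gj in zip(theta, m_hat, v_hat, grad)
    ]
    return theta, m, v
```

In use, `m` and `v` start as zero vectors, and the returned values are fed back into the next call together with an incremented `t`.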

4 Experimental validation

The performance of the QMFOFS-HCNN technique is validated on four benchmark datasets, namely HandPD meander, HandPD spiral, voice PD, and speech PD [30], using several evaluation metrics: accuracy, detection rate, and false alarm rate. Accuracy is the proportion of observations that are correctly classified. The detection rate is the percentage of actual positives that the model correctly identifies as positive. The false alarm rate (FAR) is the percentage of actual negatives that the model incorrectly predicts as positive.
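The three metrics can be computed from the confusion-matrix counts, as in the following sketch. It assumes binary labels with 1 for PD and 0 for healthy, matching the coding of the Speech PD dataset; the function name and the returned dictionary keys are hypothetical.

```python
def pd_metrics(y_true, y_pred):
    """Accuracy, detection rate (TP rate) and false alarm rate (FP rate).

    Labels are assumed to be 1 (PD) and 0 (healthy).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

A high detection rate alone can be achieved by predicting every sample as positive, which is why it is reported alongside the false alarm rate.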

Figure 3 compares the FS results of the QMFOFS-HCNN system with those of existing methods on the four datasets. The QMFOFS-HCNN technique achieves an effective outcome by selecting the fewest features. For instance, on the HandPD spiral dataset, the QMFOFS-HCNN technique selected three features, while the modified grasshopper optimization algorithm (MGOA) [31], modified grey wolf optimizer (MGWO) [32], optimized cuttlefish algorithm (OCFA) [30], and improved sailfish optimization algorithm with deep learning (IFSO-DL) [33] systems selected 5, 7, 8, and 4 features, respectively. Similarly, on the voice PD dataset, the QMFOFS-HCNN technique selected six features, while the MGOA, MGWO, OCFA, and IFSO-DL systems selected 8, 9, 17, and 7 features, respectively.

Fig. 3 FS analysis of QMFOFS-HCNN technique

Table 1 demonstrates the comparative PD detection analysis of the QMFOFS-HCNN system with existing approaches on the HandPD spiral and HandPD Meander dataset [31, 33].

Table 1 Results analysis of existing with proposed model on HandPD spiral dataset and HandPD meander dataset

Figure 4 presents the comparative accuracy (\({{\text{accu}}}_{y}\)) analysis of the QMFOFS-HCNN system with existing techniques on the HandPD spiral and HandPD meander datasets. The results show that the QMFOFS-HCNN technique achieves higher accuracy than the other techniques on both datasets. For instance, on the HandPD spiral dataset, the QMFOFS-HCNN technique reached the maximum \({{\text{accu}}}_{y}\) of 96.35%, whereas the MGOA-KNeN, MGOA-RANDF, MGOA-DT (C4.5), MGWO-KNeN, MGWO-RANDF, MGWO-DT (C4.5), and IFSO-DL techniques obtained lower \({{\text{accu}}}_{y}\) values of 75.54%, 92.62%, 89.88%, 74.13%, 92.62%, 92.03%, and 93.61%, respectively.

Fig. 4 Accuracy (\({{\text{accu}}}_{y}\)) analysis of the QMFOFS-HCNN technique on the HandPD spiral and HandPD meander datasets

Figure 5 compares the QMFOFS-HCNN technique with recent models in terms of detection rate \({d}_{{\text{rate}}}\) on the HandPD spiral and HandPD meander datasets. The experimental values indicate that the QMFOFS-HCNN system achieved the highest \({d}_{{\text{rate}}}\) values among the compared techniques on both datasets. For instance, on the HandPD spiral dataset, the QMFOFS-HCNN technique reached a \({d}_{{\text{rate}}}\) of 99.22%, whereas the MGOA-KNeN, MGOA-RANDF, MGOA-DT (C4.5), MGWO-KNeN, MGWO-RANDF, MGWO-DT (C4.5), and IFSO-DL techniques obtained lower \({d}_{{\text{rate}}}\) values of 84.89%, 97.99%, 95.58%, 82.54%, 94.99%, 93.65%, and 98.04%, respectively.

Fig. 5 Detection rate analysis of the QMFOFS-HCNN technique on the HandPD spiral and HandPD meander datasets

Figure 6 provides the accuracy and loss curves of the QMFOFS-HCNN system on the HandPD spiral and HandPD meander datasets. The outcomes show that the accuracy tends to increase and the loss tends to decrease as the epoch count grows. The training loss is low and the validation accuracy is high on both datasets.

Table 2 presents the comparative PD detection analysis of the QMFOFS-HCNN technique and existing approaches on the speech PD and voice PD datasets. Figure 7 depicts the comparative \({{\text{accu}}}_{y}\) analysis of the QMFOFS-HCNN technique and existing methods on the same two datasets. The results show that the QMFOFS-HCNN system achieved higher accuracy than the other techniques on both datasets. For instance, on the speech PD dataset, the QMFOFS-HCNN technique reached the highest \({{\text{accu}}}_{y}\) of 98.50%, whereas the MGOA-KNeN, MGOA-RANDF, MGOA-DT (C4.5), MGWO-KNeN, MGWO-RANDF, MGWO-DT (C4.5), and IFSO-DL approaches obtained lower \({{\text{accu}}}_{y}\) values of 89.69%, 95.56%, 85.53%, 92.35%, 93.64%, 90.18%, and 96.19%, respectively.

Fig. 6 Accuracy and loss analysis of the QMFOFS-HCNN method on the HandPD spiral and HandPD meander datasets

Table 2 Comparison of existing models and the proposed model on the speech PD and voice PD datasets

Fig. 7 Accuracy (\({{\text{accu}}}_{y}\)) analysis of the QMFOFS-HCNN technique on the speech PD and voice PD datasets

Figure 8 compares the QMFOFS-HCNN approach with recent models in terms of detection rate \({d}_{{\text{rate}}}\) on the speech PD and voice PD datasets. The experimental values indicate that the QMFOFS-HCNN system attained the highest \({d}_{{\text{rate}}}\) values among the compared techniques on both datasets. For instance, on the speech PD dataset, the QMFOFS-HCNN approach reached a \({d}_{{\text{rate}}}\) of 99.98%, while the MGOA-KNeN, MGOA-RANDF, MGOA-DT (C4.5), MGWO-KNeN, MGWO-RANDF, MGWO-DT (C4.5), and IFSO-DL systems yielded \({d}_{{\text{rate}}}\) values of 96.56%, 90.17%, 97.21%, 99.95%, 94.28%, 99.16%, and 99.98%, respectively.

Fig. 8 Detection rate analysis of the QMFOFS-HCNN technique on the speech PD and voice PD datasets

Figure 9 provides the accuracy and loss curves of the QMFOFS-HCNN methodology on the speech PD and voice PD datasets. The outcomes show that the accuracy tends to increase and the loss tends to decrease as the epoch count grows. The training loss is low and the validation accuracy is high on both datasets. These results indicate that the proposed model outperforms the other methods for PD classification.
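The trend described for the accuracy and loss curves can also be checked numerically from a per-epoch training history. The following is a minimal sketch, assuming the history is available as plain lists of per-epoch values. The helper `trend_slope` and the numbers below are hypothetical placeholders, not values read from the figures.

```python
from typing import Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of `values` against the epoch index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two epochs")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Hypothetical per-epoch history, for illustration only.
val_accuracy = [0.71, 0.80, 0.86, 0.90, 0.93, 0.95]
train_loss = [0.62, 0.45, 0.33, 0.25, 0.19, 0.15]

accuracy_improves = trend_slope(val_accuracy) > 0  # positive slope: accuracy rising
loss_decreases = trend_slope(train_loss) < 0       # negative slope: loss falling
print(accuracy_improves, loss_decreases)
```

A fitted slope is used here rather than a strict epoch-by-epoch comparison because real training curves usually fluctuate from one epoch to the next even when the overall trend is clear.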

Fig. 9 Accuracy and loss analysis of the QMFOFS-HCNN technique on the speech PD and voice PD datasets

The study compared the QMFOFS-HCNN technique with existing methods for PD detection on four benchmark datasets, and the technique demonstrated several strengths. First, it provided efficient feature selection: it selected fewer features while maintaining or improving classification performance, which reduces data dimensionality. Second, it achieved higher accuracy than the existing methods, improving the classification of PD patients. Third, it achieved higher detection rates, which are crucial for accurate PD diagnosis, and exhibited a lower FAR, which reduces the risk of misdiagnosis. Fourth, it outperformed existing methods consistently across the datasets, demonstrating its versatility. Finally, the model showed increasing accuracy and decreasing loss during training, indicating effective learning.

5 Conclusion

This study developed a novel QMFOFS-HCNN method for detecting and classifying PD. It aimed to identify optimal feature subsets and enhance the classification accuracy of PD diagnosis. The proposed QMFOFS-HCNN technique comprises several processes: QMFO-based feature selection, CNN-ALSTM-based classification, and Nadam-based hyperparameter tuning. The use of the QMFOFS and Nadam techniques helps to boost the PD classification outcomes. The experimental validation was carried out on benchmark datasets, and the results were assessed under several aspects. The comparative results indicated the promising performance of the QMFOFS-HCNN technique on several evaluation metrics. Therefore, the QMFOFS-HCNN technique can be utilized as a proficient tool for PD detection and classification, contributing to medical diagnostics. However, it is important to acknowledge some limitations of this study. First, the performance evaluation was conducted on benchmark datasets, so the real-world applicability of the technique may require further validation on more diverse and extensive datasets. Second, while the QMFOFS-HCNN method shows promise, it may benefit from additional optimization and fine-tuning to achieve even higher accuracy.

In the future, outlier detection techniques can be incorporated into the QMFOFS-HCNN technique to improve the classification results and robustness. Additionally, integrating real-time data collection and analysis for PD diagnosis could improve the practicality and timeliness of the method. Overall, this study lays the foundation for more advanced and effective PD diagnostic tools, and further refinement and validation in clinical settings will be essential for its successful implementation.