1 Introduction

Parkinson’s disease (PD) is a chronic neurological disorder caused by the progressive degeneration and death of dopaminergic neurons, which are responsible for coordinating movement at the level of muscular tone [43]. People with PD show different symptoms including rigidity, tremor, slow movements, impaired voice, and poor balance [26, 28, 40, 52]. Based on these symptoms, different automated approaches have been developed for PD detection [17, 24, 30, 33, 41, 45, 57]. However, PD detection through vocal signal processing is particularly beneficial for two main reasons: (1) the literature suggests that around 90% of PD patients have voice impairment issues [48], and (2) voice disorders are considered early symptoms of PD, hence PD detection through voice signals is a promising way for the early prediction of PD [7, 43]. Additionally, voice recording-based PD detection enables home-based tele-monitoring and tele-diagnosis of PD [53]. Motivated by these factors, this paper develops a novel learning model for early detection of PD through acoustic signal processing and machine learning methods.

Recently, data mining and machine learning researchers have developed various automated systems for the detection of PD based on voice or speech signals, physiological signals, wearable sensors for gait analysis, and handwriting movement analysis [2, 24, 27, 32, 40, 48, 49, 55]. Owing to the above-mentioned facts, it is quite natural to detect PD from vocal data. Sarkar et al. obtained a balanced dataset by collecting multiple types of voice samples from 68 subjects [53]. Dysphonia-related features were extracted from the voice signals using the Praat software [21]. They highlighted the problem of subject overlap in data having many voice or speech samples per subject and proposed a cross-validation scheme suited to this setting, i.e., leave-one-subject-out cross-validation (LOSO CV). Under LOSO CV, they obtained an accuracy of 55% using k-nearest neighbour (KNN) and SVM models. Subsequently, many machine learning researchers utilized the data collected by Sarkar et al. [53] and tried to improve the PD detection accuracy by evaluating the feasibility of various feature extraction and feature selection methods [11, 16, 18, 19, 23, 32, 37, 39, 42, 46, 47]. For example, Canturk et al. utilized a machine learning system consisting of four feature selection algorithms and five different classifiers but achieved an accuracy of only 57.5% [23]. Benba et al. explored the feasibility of human factor cepstral coefficient (HFCC) features and utilized only the multiple types of vowel phonation data. For classification, they developed an SVM model with different types of kernels and obtained a PD detection accuracy of 87.5% using LOSO CV [19]. Recently, Rahman et al. [50] collected a relatively larger dataset and showed that achieving high PD detection accuracy on a larger dataset is a challenging task. Most recently, Ali et al. showed that instead of selecting features from multiple types of voice data, improved performance can be obtained by selecting samples before feature selection [13]. That is, feature selection from data having only one type of sample is a better strategy than feature selection from heterogeneous data, i.e., data containing multiple types of hybrid samples.

After critically analyzing the results published by Ali et al. [13], we noticed that the improved performance obtained by sample selection before feature selection is due to the fact that different types of sustained vowel phonations are sensitive to different subsets of features and different models; consequently, different samples or vowel phonations have different optimal subsets of features and different optimal models. Hence, if we construct one model and try to obtain one global optimal subset of features for data having multiple types of samples together, performance degrades. Thus, in this paper, we exploit these findings and propose a novel ensemble method, namely EOFSC (Ensemble model with Optimal Features and Sample-Dependent base Classifiers). We consider multiple types of data containing three different types of sustained vowel phonations, i.e., the vowels “a”, “o” and “u”. In the first set of experiments, we explore the feasibility of integrating feature selection through an F-score-based statistical model with a deep neural network (DNN) and other conventional machine learning models (including conventional ensemble models such as AdaBoost and random forest). Through these numerical experiments, we further consolidate the findings of Ali et al. [13]; consequently, three different DNN configurations with different optimal subsets of features are obtained for the three different types of vowel phonation data. In the second step, we utilize the three types of base classifiers (the DNN models), which are sample and feature dependent. Finally, the three developed base models/classifiers are integrated to construct the EOFSC model, using a majority voting criterion to evaluate the final decision of the newly developed model. Experimental results prove the effectiveness of the feature-driven DNN integrated system (F-DNN) over conventional DNN and all the other similar hybrid/integrated systems. Simulation results show that the proposed ensemble model further enhances the performance of the F-DNN integrated system by 6.5%. Figure 1 provides more details about EOFSC.

Fig. 1 Block diagram of the proposed EOFSC ensemble method. EOFSC: ensemble model with optimal features and sample-dependent base classifiers, \(\lambda \): the hyper-parameter setting of the neural network, \(N_{s}\): size of the optimal subset of features

The main contributions of this study are summarized as follows:

  1. Feature selection at the input level of DNN has not been well studied [54]. Recently, Taherkhani et al. [54] found that feature selection coupled with the feature extraction capability of deep learning improves the performance of deep learning models. In this paper, we consolidate this finding by cascading an F-score-based statistical model for feature selection with a DNN model for PD detection using multiple types of phonation datasets.

  2. The performance of the F-DNN method was compared with conventional DNN, ten similar integrated/hybrid intelligent systems based on conventional machine learning models (including conventional ensemble models such as AdaBoost and random forest), and many renowned previous methods. Experimental results validate the effectiveness of integrating feature selection with DNN based on three commonly used evaluation criteria, i.e., accuracy, ROC curves, and AUC.

  3. To further improve PD detection performance, this paper proposes a novel ensemble model and validates its effectiveness by demonstrating the performance improvement experimentally on two different voice datasets.

The remainder of the manuscript is organized as follows. Section 2 briefly discusses the multiple types of vowel phonation data and the proposed method. Section 3 discusses the experimental results. Section 4 deals with the comparative study, and the last section presents the conclusion of the whole study.

2 Materials and methods

2.1 Multiple types of vowel phonation datasets

In this paper, we use two different multiple types of vowel phonation datasets. The first dataset is a publicly available benchmark dataset known as multiple types of speech data for Parkinson’s disease [29]. The dataset was collected and made public by Sarkar et al. [53] and was distributed in two parts, named the training database and the testing database. It is worth noting that the name training database does not mean that these data are utilized only for training purposes and the testing database only for testing purposes. The main objective was to simulate a proposed method using the training database and then re-simulate it by training the model on the training database and testing it on the testing database, which validates the effectiveness of any proposed method more robustly. The training database was constructed by recording multiple types of speech samples, i.e., words, sentences, vowels, and numbers. Thus, 26 samples were recorded from each subject, and from each sample a set of 26 acoustic features was extracted using the Praat acoustic software [21]. However, recent studies reported that better performance can be obtained by utilizing the multiple types of vowels only, and that many of the extracted features are irrelevant for the speech data [13]. Hence, in this paper we follow the same methodology and utilize multiple types of vowel phonations for each subject. That is, for each subject three vowel phonations were considered, i.e., the vowels “a”, “o” and “u”. The vowel “a” data is denoted by \(D_{1}\), the vowel “o” data by \(D_{2}\), and the vowel “u” data by \(D_{3}\). Details of the extracted set of 26 features and their statistical parameters are tabulated in Table 1. From the testing database, the same set of features was extracted; however, the testing database contains three replications of vowel “a” and three replications of vowel “o”. Moreover, the training database contains data of 40 subjects (20 healthy subjects and 20 PD patients), and the testing database contains data of 28 PD patients.

The second dataset is also a multiple types of voice phonations dataset. It was collected by Rahman et al. [50] at Lady Reading Hospital (Medical Teaching Institution), Pakistan. This dataset is relatively larger and was collected from 160 subjects, of which 60 subjects belong to the PD class and the remaining 100 subjects to the healthy class. From each subject three vowel phonations were recorded, i.e., the vowels “a”, “o” and “u”. The vowel “a” data is denoted by \(D_{1}\), the vowel “o” data by \(D_{2}\), and the vowel “u” data by \(D_{3}\). Hence, the dataset consists of \(160 \times 3 = 480\) voice phonations, of which 180 voice recordings are from the PD class and 300 voice recordings are from the healthy class. From each sample we extracted 18 time-frequency features and 26 Mel frequency cepstral coefficient-based features, i.e., (MFCC0, MFCC1, ..., MFCC12) and their derivatives (Delta0, Delta1, ..., Delta12).
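To illustrate how the 26 MFCC-based features of the second dataset can be obtained, the following minimal sketch uses the librosa library; the sampling rate, the frame-level mean aggregation, and the function name are our own assumptions, not a description of the authors' extraction pipeline, and the 18 time-frequency features are omitted here.

```python
import librosa
import numpy as np

def mfcc_delta_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Hypothetical extraction of MFCC0-MFCC12 and Delta0-Delta12 for one recording."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # MFCC0 ... MFCC12 per frame
    delta = librosa.feature.delta(mfcc)                   # Delta0 ... Delta12 per frame
    # Average over frames to obtain one 26-dimensional vector per vowel phonation
    return np.concatenate([mfcc.mean(axis=1), delta.mean(axis=1)])
```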

Table 1 Statistical parameters of the extracted set of features. m: mean, std: standard deviation, \( Tr_{h} \): healthy subjects of the training database, \( Tr_{pd} \): PD patients of the training database, \( Ts_{pd} \): PD patients of the testing database

2.2 The proposed methods

In classification systems, feature selection methods are used to mine the most relevant features in a feature space [3, 4, 6, 9, 10, 14, 51]. This paper proposes the use of feature ranking through the F-score. The F-score-based feature ranking model measures the discrimination between two sets of real numbers [25]. For a given dataset with instances \(I_{j}\), \(j=1,2,\ldots,n\), if the number of instances related to healthy subjects is \(m_{+}\) and the number of instances of PD patients is \(m_{-}\), then the F-score of the kth feature is defined as

$$\begin{aligned} S_1= & {} \frac{1}{(m_{+}-1)} \sum _{j=1}^{m_{+}}(I_{j,k}^{(+)}-\overline{I}_{k}^{(+)})^{2} \nonumber \\ S_2= & {} \frac{1}{(m_{-}-1)} \sum _{j=1}^{m_{-}}(I_{j,k}^{(-)}-\overline{I}_{k}^{(-)})^{2} \nonumber \\ F(k)= & {} \frac{(\overline{I}_{k}^{(+)} - \overline{I}_{k})^{2} + ( \overline{I}_{k}^{(-)} - \overline{I}_{k})^{2}}{S_1 + S_2}, \end{aligned}$$
(1)

where \(\overline{I}_{k}\), \(\overline{I}_{k}^{(+)}\), and \(\overline{I}_{k}^{(-)}\) are the averages of the \(k\mathrm{th}\) feature over the whole, positive, and negative datasets, respectively. Moreover, \(I_{j,k}^{(+)}\) is the \(k\mathrm{th}\) feature of the \(j\mathrm{th}\) positive instance and \(I_{j,k}^{(-)}\) is the \(k\mathrm{th}\) feature of the \(j\mathrm{th}\) negative instance. In (1), the numerator measures the discrimination between the positive and negative sets, while the denominator measures the variation within each of the two sets [5]. The discriminative power of a feature is proportional to its F-score value. After feature ranking by the F-score-based statistical model, a threshold on the F-score must be chosen, i.e., only those features whose F-score exceeds the threshold are selected. In this study, we apply a hybrid grid search algorithm (HGSA) to search for the optimal threshold that results in an optimal subset of the extracted set of features. The obtained feature subset is supplied to the DNN model for classification.
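To make the ranking step concrete, the sketch below computes Eq. (1) for every feature of a feature matrix; the function name and the NumPy-based implementation are our own illustration, not the authors' code.

```python
import numpy as np

def f_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Compute the F-score of Eq. (1) for each column (feature) of X.

    X: (n_instances, n_features) feature matrix.
    y: binary labels, 1 for healthy (positive) and 0 for PD (negative).
    """
    X_pos, X_neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    mean_pos, mean_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter terms S1 and S2 (sample variances of each class)
    s1 = ((X_pos - mean_pos) ** 2).sum(axis=0) / (len(X_pos) - 1)
    s2 = ((X_neg - mean_neg) ** 2).sum(axis=0) / (len(X_neg) - 1)
    # Between-class separation divided by within-class scatter
    return ((mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2) / (s1 + s2)

# Features whose F-score exceeds a chosen threshold t are retained:
# selected = np.where(f_scores(X, y) > t)[0]
```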

The performance of a DNN model depends on its hyper-parameter configuration, and an inappropriate network configuration results in poor performance. Hyper-parameters are the variables that determine a neural network architecture or configuration. To optimize the neural network architecture, two important hyper-parameters are considered in this paper, i.e., the number of hidden layers (L) and the width of each hidden layer, i.e., the number of neurons in each hidden layer (\(W_{h}\)). It is worth mentioning that two types of neural networks are discussed in the literature, i.e., artificial neural networks (ANNs), or shallow neural networks, and DNNs. A shallow neural network or ANN refers to a neural network that uses only one hidden layer [8]. When we optimize an ANN, we can only tune the number of neurons in its hidden layer, i.e., the width of the hidden layer; we cannot tune the number of hidden layers, as there is no concept of depth in shallow neural networks. In contrast, DNNs refer to neural networks that use multiple hidden layers and are trained using new methods [1, 12, 34, 35]. More precisely, neural networks with a many-layer structure, i.e., two or more hidden layers, are called deep neural networks [44]. Before utilizing a neural network for classification tasks, it is trained on training data. During the training process, the neural network learns a fitting function known as the hypothesis \(h_{\alpha }(x)\) from the patterns of the training data. The values of the parameters of the hypothesis are estimated by minimizing the objective function given as follows

$$\begin{aligned} J(\alpha ) = \frac{1}{m} \sum _{j=1}^{m}{\text {cost}}(h_\alpha (x^{(j)}), y^{j}), \end{aligned}$$
(2)

where m stands for the number of training samples and \(\alpha \in \mathbb {R}^{d}\) represents the neural network parameters. To solve (2), we used the L-BFGS algorithm, which is an optimizer in the family of quasi-Newton methods. After the training phase, the performance of the trained network is checked by applying the validation or testing data. This is regarded as a neural network hyper-parameter optimization problem, i.e., we need to search for the hyper-parameter configuration \(B_{\lambda }\) that yields maximum generalization performance, or minimum validation loss, under LOSO CV. Thus, the main objective of the hyper-parameter optimization problem for LOSO CV on a dataset having \(N_{S}\) subjects is to find the optimal model or hyper-parameters \(\lambda \) that minimize \(l(\lambda )\). This can be formulated as follows.

$$\begin{aligned} l(\lambda ) = \frac{1}{N_{S}} \sum _{i=1}^{N_{S}}\mathcal {L}(B_{\lambda }, D^{i}_\text {train}, D^{i}_\text {valid}), \end{aligned}$$
(3)

where \(D^{i}_\text {train}\) and \(D^{i}_\text {valid}\) denote the training data and the validation (testing) data during the i-th fold of LOSO CV, respectively, and the function \(\mathcal {L}\) yields the loss obtained during each fold of the LOSO CV.
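As an illustration of Eq. (3), the sketch below estimates the LOSO CV loss of one candidate configuration \(B_{\lambda }\) with scikit-learn; the use of MLPClassifier with the lbfgs solver and a zero-one loss is our assumed concrete realization, not the authors' exact implementation.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neural_network import MLPClassifier

def loso_loss(X, y, groups, hidden_layer_sizes):
    """Mean per-subject loss l(lambda) of Eq. (3) for one DNN configuration.

    groups holds one subject identifier per row of X, so each LOSO fold
    leaves out all samples of a single subject.
    """
    losses = []
    for train_idx, valid_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes,
                            solver="lbfgs", max_iter=1000, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        # Zero-one loss on the held-out subject (an assumed choice of L)
        losses.append(1.0 - clf.score(X[valid_idx], y[valid_idx]))
    return float(np.mean(losses))
```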

Equation (3) formulates the optimization of the neural network model only. However, we also need to search for the subset of features that ensures minimum loss, which is another optimization problem. Thus, the two optimization problems are merged, or hybridized, into one. The first optimization problem is searching for an optimal threshold for the F-score-based statistical model, which results in an optimal subset of n features, while the second is the optimization of the neural network configuration. Hence, (3) can be modified as follows.

$$\begin{aligned} l(\lambda , n) = \frac{1}{N_{S}} \sum _{i=1}^{N_{S}}\mathcal {L}(B_{\lambda }, n, D^{i}_\text {train}, D^{i}_\text {valid}). \end{aligned}$$
(4)

In (4), \(\lambda \) denotes the hyper-parameters of the neural network model while n is the parameter of the F-score-based statistical model. To solve the optimization problem in (4), we utilize the HGSA algorithm. The algorithm arranges n, \(W_{1}\), and \(W_{2}\) as coordinates of a point on a grid; that is, each point on the grid is represented by \((n, W_{1}, W_{2})\), where \(W_{1}\) and \(W_{2}\) denote the number of neurons in the first and second hidden layers, respectively. Thus, the grid is hybrid in nature, as each point on the grid merges the hyper-parameters of the two models as coordinates of one point. For each experiment, or each type of dataset, the search algorithm returns an optimal point on the hybrid grid, whose coordinates denote the optimal subset of features and the optimal DNN hyper-parameters that produce the optimized performance. In the first step, HGSA yields three subsets of features and three different DNN configurations for the three different types of vowel phonation data. It is important to note that the same algorithm is used for the other ten similar hybrid intelligent systems that use ten conventional machine learning models; however, the DNN model is replaced by one of the ten conventional machine learning models, and hence \(\lambda \) denotes the hyper-parameter(s) of the conventional machine learning model under consideration.
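A minimal sketch of the hybrid grid search is given below; it assumes the f_scores and loso_loss helpers introduced above and exhaustively scans a small illustrative grid of \((n, W_{1}, W_{2})\) triples, which is our simplification of the HGSA rather than the authors' exact search procedure.

```python
from itertools import product
import numpy as np

def hybrid_grid_search(X, y, groups,
                       n_grid=range(1, 27),        # candidate feature-subset sizes
                       w1_grid=(1, 5, 9, 20, 30),  # illustrative widths, not the paper's grid
                       w2_grid=(4, 6, 18, 30)):
    """Return the (n, W1, W2) grid point with the lowest LOSO CV loss of Eq. (4)."""
    ranking = np.argsort(f_scores(X, y))[::-1]      # features sorted by decreasing F-score
    best_point, best_loss = None, np.inf
    for n, w1, w2 in product(n_grid, w1_grid, w2_grid):
        top_n = ranking[:n]                          # keeping the top-n features is equivalent
                                                     # to choosing an F-score threshold
        loss = loso_loss(X[:, top_n], y, groups, hidden_layer_sizes=(w1, w2))
        if loss < best_loss:
            best_point, best_loss = (n, w1, w2), loss
    return best_point, best_loss
```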

After constructing three different DNN models corresponding to the three different types of vowel phonations, the ensemble model, i.e., EOFSC, is developed. The EOFSC model integrates the three base classifiers (i.e., the DNN models), which are sample and feature dependent, i.e., for each type of dataset (having one specific type of sample per subject), the corresponding base classifier has its own optimal subset of features. These three base classifiers are ensembled, and a voting criterion is utilized to evaluate the final prediction of the developed EOFSC model. The working of the proposed EOFSC method is depicted in Fig. 1.
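The final voting step can be sketched as follows; the three fitted per-vowel classifiers and the hard majority vote are shown schematically, and the variable names are ours.

```python
import numpy as np

def eofsc_predict(base_models, feature_subsets, samples_by_vowel):
    """Majority-vote prediction of the EOFSC ensemble for one subject.

    base_models: the three per-vowel DNN classifiers (for vowels "a", "o", "u").
    feature_subsets: the corresponding optimal feature-index subsets.
    samples_by_vowel: the subject's three feature vectors, one per vowel.
    """
    votes = [model.predict(x[subset].reshape(1, -1))[0]
             for model, subset, x in zip(base_models, feature_subsets, samples_by_vowel)]
    # With three base classifiers, the majority class is simply the most frequent vote
    return np.bincount(votes).argmax()
```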

3 Simulation results and discussion

In this section, a total of nine experiments are performed. The first five experiments are performed using dataset 1, the next three on dataset 2, and the last experiment carries out independent testing across the two datasets. The first experiment is simulated on data that contain multiple types of vowel phonations for each individual, while the second experiment is designed for datasets having one type of vowel phonation for each subject. For both experimental settings, the F-DNN method is applied. To further validate the performance of the developed method, the third experiment is performed on the testing database following the approach of Sarkar et al. [53] and Benba et al. [19]. In order to validate the effectiveness of integrating feature selection with DNN, ten similar integrated systems are also developed in experiment four and compared with F-DNN. In the fifth experiment, the three DNN models, which are sample and feature dependent, are utilized to construct the proposed ensemble model, and its performance is compared with that of the three base classifiers; a further improvement of 5% is observed for dataset 1. Similar experiments are carried out for dataset 2. For validation purposes, leave-one-subject-out (LOSO) cross-validation is performed, and for evaluation purposes, six different evaluation metrics are utilized, including ROC curves, area under the curve (AUC), specificity, accuracy, sensitivity, and Matthews correlation coefficient (MCC).
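For reference, the scalar metrics used throughout the experiments can be computed from the pooled LOSO predictions as sketched below; this is a generic illustration with scikit-learn, not the authors' evaluation script.

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef, roc_auc_score

def evaluation_metrics(y_true, y_pred, y_score):
    """Accuracy, sensitivity, specificity, MCC, and AUC from pooled LOSO predictions.

    y_pred holds hard class labels (1 = PD, 0 = healthy) and y_score the
    predicted probability of the PD class used for the ROC/AUC.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate on PD subjects
        "specificity": tn / (tn + fp),   # true-negative rate on healthy subjects
        "MCC": matthews_corrcoef(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),
    }
```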

3.1 Experiments on the first multiple types of vowel phonations dataset

3.1.1 Experiment No 1: performance of the F-DNN method using LOSO CV on data containing multiple vowel phonations for each individual

In this experiment, we consider multiple types of vowel phonations for each individual. The dataset is denoted by \(D_{m}\). During the validation process for the F-DNN model, LOSO CV is utilized: the multiple samples of one subject are used for validation and the remaining data are used to train the model. In this experiment, we utilize the F-score-based statistical model to eliminate irrelevant features, and the selected subset of features is supplied to the DNN model for classification. The evaluation measures at different subsets of features are tabulated in Table 2. It can be observed from the table that for the optimal subset of 25 features, i.e., for \(n=25\), a PD detection accuracy of 87.5% is obtained. This optimal subset, ranked by F-score and searched by the HGSA algorithm, includes all features \(F_{i}\) with \(i \in \{1, \ldots, 26\} \setminus \{18\}\). The performance of the multiple samples per subject dataset \(D_{m}\) at different subsets of features is depicted in Fig. 2.

Fig. 2 Accuracy of the multiple samples per subject dataset at different subsets of features. \(D_m\): dataset having multiple types of samples per subject. X-axis: size of the subset of features, i.e., n. Y-axis: accuracy, denoted by ACC(%) in Table 2

Table 2 Results of the proposed method using LOSO CV for multiple types of samples per subject data. L: total number of layers in the DNN including the input and output layers, \(W_{1}\): width of the first hidden layer, \(W_{2}\): width of the second hidden layer, ACC: accuracy (%), Sen: sensitivity (%), Spec: specificity (%)

To validate the effectiveness of the F-DNN method on the multiple types of vowel phonations data, we also checked the performance of the conventional neural network on the same dataset. The conventional neural network model was optimized using a grid search algorithm. Its best performance was an accuracy of 77.5%, sensitivity of 65%, specificity of 90%, and MCC value of 0.568, achieved with an optimized configuration having 9 neurons in the first hidden layer and 30 neurons in the second hidden layer. Hence, it is clear that the F-DNN method improves the accuracy of the conventional neural network by 10% for multiple samples per subject data.

3.1.2 Experiment No 2: performance of the proposed method using LOSO CV on data having one sample per subject

In this experiment, we utilize one sample per subject data; thus, we construct three different datasets from the multiple samples per subject data. Each dataset is independently processed by simulating the F-DNN method. Table 3 reports the performance of each dataset at different subsets of features and different network configurations. From the table, it is clear that the highest PD detection accuracy of 90% is obtained for \(D_{2}\), i.e., vowel “o”, at an optimal subset of features of size \(n=7\) (Table 3). Thus, the findings of this study consolidate the findings of [53], which pointed out that vowel “o” samples contain complementary information for PD compared to other types of samples. Additionally, the simulation results validate the importance of optimizing the neural network for each subset of features through HGSA: if an optimally configured neural network is not utilized, we may obtain poor performance even with an optimal subset of features. Comparing the optimal subsets of features obtained in experiment 2 with the optimal subset of features produced in experiment 1, it is evident that different types of samples are sensitive to different subsets of features and different DNN configurations. The performance of each dataset at different subsets of features is depicted in Fig. 3.

Fig. 3 Accuracies of the datasets having one type of vowel phonation for each individual at different subsets of features. \(D_{1}\): dataset containing only vowel “a” phonations, \(D_{2}\): dataset containing only vowel “o” phonations, \(D_{3}\): dataset containing only vowel “u” phonations

Table 3 Results of the proposed method using LOSO CV for datasets having one type of vowel phonation for each individual. \(D_j\), j=1,2,3: Type of dataset used. \(D_{1}\): Dataset containing vowel “a” phonation for each subject, \(D_{2}\): Dataset containing vowel “o” phonation for each subject, \(D_{3}\): Dataset containing vowel “u” phonation for each subject

From Fig. 3, it is evident that an accuracy of 72.5% is obtained for \(D_{1}\) on the full feature set, i.e., when only a conventional neural network is used, while an accuracy of 87.5% is obtained when the F-DNN integrated method is applied. Similarly, an accuracy of 80% is obtained for \(D_{2}\) with the conventional neural network, whereas an accuracy of 90% is achieved with the F-DNN integrated method. Finally, an accuracy of 67.5% is obtained for \(D_{3}\) using the conventional neural network, while an accuracy of 87.5% is achieved when the F-DNN integrated method is applied. Hence, we can conclude that the F-DNN integrated method improves the strength of the conventional DNN.

3.1.3 Experiment No 3: performance of the proposed method using LOSO validation on testing database

In this experiment, the performance of the F-DNN integrated method is validated on the testing database while the F-DNN model is trained on the data of the training database. After the training process, the performance of the F-DNN method is evaluated by performing LOSO validation on the testing database, which results in 100% accuracy. The evaluation measures on the full set of features and on a reduced subset of features are reported in Table 4. It is worth noting that the table does not report specificity and MCC because the testing database contains no healthy subjects.

Table 4 Performance of the F-DNN method for LOSO validation on testing database

3.1.4 Experiment No 4: performance comparison of the F-DNN method with other similar integrated methods

To validate the strength of the F-DNN method, we developed ten similar integrated intelligent systems by utilizing ten conventional machine learning models. Each of the developed integrated systems uses the same F-score-based statistical model for feature ranking, a machine learning model for prediction, and HGSA for searching for an optimal subset of features and an optimized version of the machine learning model. The evaluation measures for each learning system are given in Table 5. It is evident from the table that the highest accuracy of 90% is obtained by the F-DNN method.

Table 5 Performance comparison of the proposed method with other similar methods that use a conventional machine learning model instead of a deep learning model. C/E/K/\(W_{1}\): C is the hyperparameter of SVM (Lin), SVM (RBF), SVM (Poly), and LR; E is the hyperparameter of the AdaBoost and RF models denoting the number of estimators; K is the hyperparameter of the KNN model (number of neighbours); \(W_{1}\) is the number of neurons in the first hidden layer of the DNN. G/D/\(W_{2}\): G is the gamma hyperparameter of SVM (RBF); D is the hyperparameter of RF denoting the depth of the base estimator; \(W_{2}\) is the number of neurons in the second hidden layer of the DNN

To further validate the strength of the proposed F-DNN method against the conventional DNN, we utilized two more evaluation metrics, namely the ROC curve and the AUC. In machine learning, the quality of the output of a learning model is usually gauged using ROC curves; a model having a larger area under the ROC curve (AUC) is considered more robust. The ROC curves of the proposed method and the conventional DNN are drawn in Fig. 4. It is evident from these curves that F-DNN has the better ROC curve, owing to its larger AUC of 0.945.

Fig. 4 ROC charts of the conventional DNN and the proposed F-DNN model for dataset 1. a ROC chart of the conventional DNN model. b ROC chart of the DNN-based integrated system, i.e., the F-DNN model

3.1.5 Experiment No 5: performance of the EOFSC ensemble model

In this experiment, we utilize the models from experiment 2 that yielded the optimized performance for each type of data. As there are three types of sustained phonation datasets, three models with different subsets of features and DNN configurations were obtained, and the configurations of these three models (the base classifiers of the proposed ensemble model) are feature and sample dependent. In this experiment, we integrate the three models and utilize the voting criterion to evaluate the final prediction of the developed EOFSC ensemble model. From experiment 2, it can be noticed that the first DNN model with \(W_{1}=1\) and \(W_{2}=18\) yielded a PD detection accuracy of 87.5%, the second DNN with \(W_{1}=5\) and \(W_{2}=4\) yielded 90%, and the third DNN model with \(W_{1}=20\) and \(W_{2}=6\) also yielded 87.5%. However, when the proposed EOFSC method was developed, an improved accuracy of 95% was obtained, which is a further 5% better than the best of the three base classifiers. Moreover, the proposed ensemble model yielded a sensitivity of 100% and a specificity of 90%; hence, both these metrics also improved by 5% compared to the F-DNN integrated system.

3.2 Experiments on the second multiple types of vowel phonations dataset

3.2.1 Experiment No 6: performance of the proposed method using LOSO CV on data having one sample per subject

In this experiment, similar to experiment 2, we utilize one sample per subject data, resulting in three different datasets derived from the multiple samples per subject data. The F-DNN method is developed for each of the datasets. In accordance with dataset 1, the highest PD detection accuracy of 87.5% is obtained for \(D_{2}\), i.e., vowel “o”, at \(n=10\). Again, it is shown that different types of samples are sensitive to different subsets of features and different DNN configurations.

3.2.2 Experiment No 7: performance comparison of the F-DNN method with other similar integrated methods using dataset 2

Similar to experiment 4 for dataset 1, in this experiment we develop ten similar integrated intelligent systems by utilizing ten conventional machine learning models. For dataset 2, the evaluation measures for each method are given in Table 6. It is evident from the table that the highest accuracy of 87.5% is obtained by the F-DNN method, whereas the conventional DNN model results in 75% accuracy. Thus, the results for the second dataset also validate the effectiveness of integrating feature selection with the DNN model.

Table 6 Performance comparison of the proposed method with other similar methods that use conventional machine learning models for dataset 2. C/E/K/\(W_{1}\): C is the hyperparameter of SVM (Lin), SVM (RBF), SVM (Poly), and LR; E is the hyperparameter of the AdaBoost and RF models denoting the number of estimators; K is the hyperparameter of the KNN model (number of neighbours); \(W_{1}\) is the number of neurons in the first hidden layer of the DNN. G/D/\(W_{2}\): G is the gamma hyperparameter of SVM (RBF); D is the hyperparameter of RF denoting the depth of the base estimator; \(W_{2}\) is the number of neurons in the second hidden layer of the DNN

For the second dataset, we also plotted ROC curves for the proposed F-DNN method and the conventional DNN, as shown in Fig. 5. It is evident from these curves that F-DNN has the better ROC curve, owing to its larger AUC of 0.893.

Fig. 5 ROC charts of the conventional DNN and the proposed F-DNN model for dataset 2. a ROC chart of the conventional DNN model. b ROC chart of the DNN-based integrated system, i.e., the F-DNN model

3.2.3 Experiment No 8: performance of the EOFSC ensemble model

In this experiment, we utilize the models from experiment 6 that yielded the optimized performance for each type of data: for each type of vowel phonation dataset, a corresponding optimal network model is developed. The proposed EOFSC ensemble model is developed for dataset 2 by integrating the three models and utilizing the voting criterion to evaluate the final prediction. From experiment 7, it can be noticed that integrating feature selection with DNN improved the PD detection accuracy from 75% to 87.5%. The proposed EOFSC method was developed by ensembling the F-DNN models for \(D_{1}\), \(D_{2}\), and \(D_{3}\), resulting in an improved PD detection accuracy of 93.75%; thus, the proposed ensemble further improves the performance of the F-DNN models by 6.25%. Moreover, the proposed ensemble model yielded a sensitivity of 100%, a specificity of 90%, and an MCC of 0.878. The experimental results therefore show an increase in sensitivity from 91.16% to 100% and in specificity from 85% to 90%.

3.3 Experiment No 9: independent testing

Recently published work has shown that achieving high accuracy under cross-validation is relatively easy, especially when the datasets are small; however, maintaining such high performance during independent testing is a challenging task, and results obtained during independent testing are more reliable and practical. Therefore, for a more practical validation, we also carried out independent testing. We trained our model using the data of the second dataset and tested it using the data of the first dataset (20 PD and 20 healthy subjects). The optimal single-phonation performance of 70% accuracy was obtained for the vowel “o” phonations, whereas the proposed EOFSC ensemble approach yielded 85% PD detection accuracy during independent testing. These results clearly highlight the importance of the proposed EOFSC ensemble approach.

4 Performance comparison of the proposed methods with previous methods

To validate the effectiveness of the proposed EOFSC ensemble method, a comparative analysis is carried out with previously published state-of-the-art methods on both datasets. Table 7 provides a brief description of the previously reported methods on both datasets.

Table 7 Performance comparison of the proposed method with other methods

5 Conclusion

In this study, the findings of recently published work on PD detection based on multiple types of voice data were critically analyzed. It was pointed out that different types of voice phonations are sensitive to different subsets of features and different models. Based on these findings, the feasibility of integrating feature selection with DNN was evaluated. The results show that integrating feature selection with DNN models further improves their performance. Additionally, the obtained results consolidated the findings of the recently published work, i.e., for each type of voice phonation, a unique subset of features and a unique model was obtained.

Exploiting the above-discussed findings, a novel ensemble model, namely EOFSC, was developed. The ensemble model further improved the performance of the DNN-based integrated system (F-DNN) obtained under the optimal phonation by 6.5%. It was observed that the proposed ensemble model shows better performance than integrated systems based on DNN and other conventional machine learning models, and than many renowned previous methods for PD detection based on multiple types of speech, voice, or vowel phonation data. Based on the obtained results, it can be concluded that the proposed ensemble approach is a step forward in the domain of automated PD detection.