Abstract
Purpose
Parkinson’s disease (PD) is one of the most prevalent neurodegenerative diseases worldwide. Current procedures for detecting PD are costly and labour-intensive. Besides causing movement disorders, PD also affects speech, producing variation in pitch, monotonicity, slurring of words, or slow speech. This study applies several machine learning binary classification algorithms to detect PD from speech features.
Methods
The publicly available Parkinson’s disease speech features dataset is imbalanced, with only 192 healthy instances compared to 564 PD instances. The Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbours (SMOTE-ENN) rectifies the class imbalance by oversampling the minority class and then undersampling, which yields a balanced dataset free of noisy samples. Six machine learning binary classifiers are investigated: Random Forest, K-Nearest Neighbours, Support Vector Machine, Extreme Gradient Boosting, Decision Tree, and Logistic Regression.
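The resampling idea can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the choice of k = 3, the majority-vote editing rule, and the toy two-dimensional data are all assumptions made for illustration, and a real study would more likely rely on an established library.

```python
import math
import random

def nearest(points, query, k, skip=None):
    """Indices of the k points closest to `query` (Euclidean), ignoring index `skip`."""
    order = sorted((i for i in range(len(points)) if i != skip),
                   key=lambda i: math.dist(points[i], query))
    return order[:k]

def smote(X, y, minority, k=3, seed=0):
    """Oversample a binary dataset: create synthetic minority samples by
    interpolating between a minority sample and one of its k minority
    neighbours until both classes have the same size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(X) - len(minority_pts)) - len(minority_pts)
    synthetic = []
    for _ in range(max(deficit, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(minority_pts, minority_pts[i], k, skip=i))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority_pts[i], minority_pts[j])])
    return X + synthetic, y + [minority] * len(synthetic)

def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant, an assumption here):
    keep a sample only if most of its k nearest neighbours share its label."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = [y[j] for j in nearest(X, x, k, skip=i)]
        if votes.count(label) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean noisy samples with ENN."""
    X_over, y_over = smote(X, y, minority, k, seed)
    return enn(X_over, y_over, k)

# Toy imbalanced data (hypothetical): 10 "healthy" (0) vs 30 "PD" (1) points.
rng = random.Random(1)
healthy = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
patients = [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(30)]
X_toy = healthy + patients
y_toy = [0] * 10 + [1] * 30
X_res, y_res = smote_enn(X_toy, y_toy, minority=0)
print("class counts after SMOTE-ENN:", y_res.count(0), y_res.count(1))
```

After the SMOTE step the two classes are equal in size; the ENN step may then remove samples from either class, so the final counts are close to balanced rather than exactly equal.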
Results
The classification algorithms are analysed and compared using several standard evaluation metrics. Resampling with SMOTE-ENN and dimensionality reduction with principal component analysis (PCA) are applied to the dataset to enhance performance and prevent overfitting. The combination of SMOTE-ENN and Support Vector Machine (SVM) yields an accuracy of 96.5%.
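As a rough illustration of why several metrics are compared rather than accuracy alone, the sketch below computes common confusion-matrix metrics and applies them to a hypothetical classifier that labels every instance of the original 192/564 dataset as PD. The function names and the all-PD baseline are assumptions for illustration; the abstract does not list which metrics the study reports.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion-matrix counts.
    Ratios with a zero denominator are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,  # true positive rate
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # false positive rate
        "f1": (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0),
    }

# Hypothetical baseline: predict "PD" (1) for all 192 healthy and 564 PD instances.
y_true = [0] * 192 + [1] * 564
y_pred = [1] * 756
baseline = metrics(*confusion_counts(y_true, y_pred))
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

This baseline reaches an accuracy of 564/756 (about 0.746) while misclassifying every healthy instance, so its false positive rate is 1.0. That gap is the reason imbalanced data are usually rebalanced and judged on more than one metric.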
Conclusion
Speech features offer a predictive and non-intrusive basis for detection, making the diagnostic process painless and straightforward. The reported results are promising as an aid to PD diagnosis, so that treatment can be administered as early as possible. The findings can therefore support detection of PD at an early stage with high accuracy.
Availability of data and materials
The data analyzed in this study are available at https://tinyurl.com/n24jt4vz.
Abbreviations
- ANN: Artificial neural networks
- AUC: Area under the curve
- EC: Electrocardiogram
- ENN: Edited nearest neighbours
- FN: False negative
- FP: False positive
- FPR: False positive rate
- KNN: K-Nearest neighbours
- MFCC: Mel frequency cepstral coefficient
- ML: Machine learning
- PCA: Principal component analysis
- PD: Parkinson’s disease
- RBF: Radial basis function
- ROC: Receiver operating characteristics
- RF: Random Forest
- SMOTE: Synthetic minority oversampling technique
- SVM: Support vector machine
- TN: True negative
- TP: True positive
Acknowledgements
The authors are grateful to the SRM Institute of Science and Technology, Kattankulathur Campus, Chennai, India.
Funding
No funding was received.
Author information
Contributions
Conceptualization, Methodology, Formal analysis, Writing-Original draft: Samiappan Dhanalakshmi and Sudeshna Das; Investigation, Validation, Visualization, Reviewing and Editing: Samiappan Dhanalakshmi, Sudeshna Das and Ramalingam Senthil. All authors were involved in the preparation of subsequent drafts and made substantial contributions. In addition, all authors approved the final version.
Ethics declarations
Ethics approval and consent to participate
Ethical clearances were obtained from SRM Medical College Hospital and Research Center, India. Ethics Clearance Number: 1739/IEC/2019.
Consent for publication
All authors agreed with the content for publication.
Conflict of interest
The authors declare no potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dhanalakshmi, S., Das, S. & Senthil, R. Speech features-based Parkinson’s disease classification using combined SMOTE-ENN and binary machine learning. Health Technol. 14, 393–406 (2024). https://doi.org/10.1007/s12553-023-00810-x
DOI: https://doi.org/10.1007/s12553-023-00810-x