Speech features-based Parkinson’s disease classification using combined SMOTE-ENN and binary machine learning

  • Original Paper
  • Health and Technology

Abstract

Purpose

Parkinson’s disease (PD) is one of the most prevalent neurodegenerative diseases worldwide. The currently available process for detecting PD is costly and labour-intensive. Along with causing a movement disorder, PD also affects speech, producing variation in pitch, monotonicity, slurring of words, or a slow speaking rate. This study uses different machine learning binary classification algorithms for the detection and classification of PD.

Methods

The publicly available Parkinson’s disease speech features dataset is imbalanced, with only 192 healthy instances compared to 564 PD instances. The Synthetic Minority Oversampling Technique – Edited Nearest Neighbours (SMOTE-ENN) algorithm rectifies the class imbalance by combining oversampling and undersampling, resulting in a balanced dataset with noisy samples removed. Six machine learning binary classifiers are investigated: Random Forest, K-Nearest Neighbours, Support Vector Machine, Extreme Gradient Boosting, Decision Tree, and Logistic Regression.
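The resampling idea described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-feature Gaussian toy data, the label coding (1 = PD, 0 = healthy), the neighbour counts `k=5` and `k=3`, and the helper names `make_toy_data`, `smote`, and `edited_nearest_neighbours` are all hypothetical. The ENN step here uses a simple majority vote among neighbours, which is one common variant. Only the class counts (564 and 192) come from the abstract.

```python
import math
import random
from collections import Counter


def make_toy_data(n_majority, n_minority, seed=0):
    """Hypothetical stand-in for the speech features: two overlapping 2-D Gaussian blobs."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_majority)]
    X += [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(n_minority)]
    y = [1] * n_majority + [0] * n_minority  # assumed coding: 1 = PD, 0 = healthy
    return X, y


def k_nearest(point, pool, k):
    """Indices of the (at most) k points in `pool` closest to `point` (Euclidean)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(X, y, minority, k=5, seed=0):
    """Oversample the minority class of a binary problem until both classes are equal.

    Each synthetic sample lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        others = minority_pts[:i] + minority_pts[i + 1:]  # exclude the base itself
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


def edited_nearest_neighbours(X, y, k=3):
    """Drop every sample whose label disagrees with the majority of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = X[:i] + X[i + 1:]
        other_labels = y[:i] + y[i + 1:]
        votes = Counter(other_labels[j] for j in k_nearest(x, others, k))
        if votes.most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Class counts taken from the abstract; everything else is illustrative.
X, y = make_toy_data(n_majority=564, n_minority=192)
X_bal, y_bal = smote(X, y, minority=0)
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)
print("original   :", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN  :", Counter(y_clean))
```

After the SMOTE step the two classes have equal counts. The ENN step then removes samples from either class that sit among neighbours of the other class, so the final counts are close to balanced but not necessarily identical. In practice, resampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data.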

Results

The classification algorithms have been analysed and compared using several standard evaluation metrics. Resampling with the SMOTE-ENN technique and dimensionality reduction using principal component analysis (PCA) have been applied to the dataset to enhance classification performance and prevent overfitting. The combination of SMOTE-ENN and Support Vector Machine (SVM) yields an accuracy of 96.5%.
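The abstract does not name the individual evaluation metrics. As a hedged illustration, the sketch below computes several commonly used ones from the four confusion-matrix counts (TP, FP, TN, FN). The function name `binary_metrics` and the example counts are hypothetical. The example treats PD as the positive class and evaluates a trivial classifier that labels every one of the 756 original instances as PD, which shows why accuracy alone can mislead on imbalanced data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity, true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0     # false positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical baseline: predict "PD" for all 564 PD and 192 healthy instances.
baseline = binary_metrics(tp=564, fp=192, tn=0, fn=0)
for name, value in baseline.items():
    print(f"{name:9s} {value:.3f}")
```

This always-PD baseline reaches an accuracy of 564/756, about 74.6%, while misclassifying every healthy instance (false positive rate of 1.0). That is the motivation for balancing the classes and for reporting more than one metric.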

Conclusion

Speech-feature analysis is a predictive and non-intrusive method, making the diagnostic process painless and straightforward. The reported results show promise for aiding the diagnosis of PD so that treatment can be administered as early as possible. Thus, the primary findings are beneficial for detecting PD at an early stage with high accuracy.

Availability of data and materials

The data analyzed in this study are available at https://tinyurl.com/n24jt4vz.

Abbreviations

ANN: Artificial neural networks
AUC: Area under the curve
EC: Electrocardiogram
ENN: Edited nearest neighbours
FN: False negative
FP: False positive
FPR: False positive rate
KNN: K-Nearest neighbours
MFCC: Mel frequency cepstral coefficient
ML: Machine learning
PCA: Principal component analysis
PD: Parkinson’s disease
RBF: Radial basis function
ROC: Receiver operating characteristics
RF: Random Forest
SMOTE: Synthetic minority oversampling technique
SVM: Support vector machine
TN: True negative
TP: True positive

Acknowledgements

The authors are grateful to the SRM Institute of Science and Technology, Kattankulathur Campus, Chennai, India.

Funding

No funding was received.

Author information

Contributions

Conceptualization, Methodology, Formal analysis, Writing-Original draft: Samiappan Dhanalakshmi and Sudeshna Das; Investigation, Validation, Visualization, Reviewing and Editing: Samiappan Dhanalakshmi, Sudeshna Das and Ramalingam Senthil. All authors were involved in the preparation of subsequent drafts and made substantial contributions. In addition, all authors approved the final version.

Corresponding author

Correspondence to Ramalingam Senthil.

Ethics declarations

Ethics approval and consent to participate

Ethical clearances were obtained from SRM Medical College Hospital and Research Center, India. Ethics Clearance Number: 1739/IEC/2019.

Consent for publication

All authors agreed with the content for publication.

Conflict of interest

The authors declare no potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dhanalakshmi, S., Das, S. & Senthil, R. Speech features-based Parkinson’s disease classification using combined SMOTE-ENN and binary machine learning. Health Technol. 14, 393–406 (2024). https://doi.org/10.1007/s12553-023-00810-x

  • DOI: https://doi.org/10.1007/s12553-023-00810-x
