Conventional Machine Learning and Feature Engineering for Vocal Fold Precancerous Lesions Detection Using Acoustic Features

Circuits, Systems, and Signal Processing

Abstract

The success of laryngeal cancer treatment depends heavily on early-stage detection. Detecting a precancerous lesion is more difficult than trying to prevent one from appearing. In this study, we propose a quick, noninvasive way to address that issue. When the vocal folds are affected by pathological lesions, the voice they produce becomes pathological as well. The proposed method is therefore based on the analysis of pathological speech. Our aim is to characterize the speech signal using a set of features. To that end, several algorithms operating in different domains are used to extract speech features such as MFCC, LPC, LPCC, and HNR. Because human speech is nonstationary, feature extraction is performed frame by frame with a frame duration of 30 ms. More than 170 features are computed for each speech frame. Since the extracted features can be highly correlated or nonsignificant, we apply a feature engineering process. Feature engineering followed by principal component analysis (PCA) leads to the retention of 28 components. With a support vector machine (SVM) classifier, promising experimental results are obtained in terms of standard metrics: the accuracy, recall, precision, and \(F_{1}\) scores are, respectively, 0.94, 0.95, 0.89, and 0.92.
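
Two quantities in the abstract lend themselves to a short illustration: the 30 ms framing and the relation between the reported precision, recall, and \(F_{1}\) score. The Python sketch below is not the authors' implementation. The function names `frame_signal` and `f1_from_precision_recall`, the 16 kHz sampling rate, the non-overlapping frames, and the synthetic test tone are all assumptions made for illustration, since the abstract states neither the sampling rate nor the frame overlap.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=30.0, hop_ms=None):
    """Split a 1-D sample sequence into fixed-duration frames.

    frame_ms=30.0 matches the frame duration given in the abstract.
    hop_ms=None means non-overlapping frames (an assumption; the abstract
    does not state the overlap). Trailing samples that do not fill a
    complete frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = frame_len if hop_ms is None else int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop lengths must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical input: one second of a 100 Hz tone sampled at 16 kHz.
sample_rate = 16000  # assumed; not given in the abstract
signal = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(sample_rate)]

frames = frame_signal(signal, sample_rate)
print(len(frames), len(frames[0]))  # number of frames, samples per frame

# Consistency check on the reported scores (precision 0.89, recall 0.95).
reported_f1 = f1_from_precision_recall(0.89, 0.95)
print(round(reported_f1, 2))
```

Under these assumptions each frame holds 480 samples, and the harmonic mean of the reported precision and recall rounds to the reported \(F_{1}\) of 0.92. In the pipeline the abstract describes, each such frame would then be passed to the feature extractors (MFCC, LPC, LPCC, HNR, and others) before feature engineering, PCA, and SVM classification.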

Author information

Corresponding author

Correspondence to Anis Ben Aicha.

Ethics declarations

Conflict of interest

Authors Anis Ben Aicha and Fadi Kacem declare that they have no conflict of interest.

About this article

Cite this article

Ben Aicha, A., Kacem, F. Conventional Machine Learning and Feature Engineering for Vocal Fold Precancerous Lesions Detection Using Acoustic Features. Circuits Syst Signal Process 43, 1905–1937 (2024). https://doi.org/10.1007/s00034-023-02551-8

  • DOI: https://doi.org/10.1007/s00034-023-02551-8
