
Automatic Hypernasality Detection in Cleft Palate Speech Using CNN

Published in: Circuits, Systems, and Signal Processing

Abstract

Automatic hypernasality detection in cleft palate speech can facilitate diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, with the speech spectrogram as input. The average F1-scores for the hypernasality detection task are 0.9485 on a dataset of children's speech and 0.9746 on a dataset of adults' speech. The experiments explore the influence of spectral resolution on detection performance: higher spectral resolution highlights the vocal tract characteristics of hypernasality, such as formants and spectral zeros. The CNN learns efficient features via two-dimensional filtering, whereas the feature extraction capability of shallow classifiers is limited; compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score of 0.9485. Among the network architectures compared, a convolutional filter of size 1 × 8 achieves the highest F1-score: it covers more frequency information and is better suited to hypernasality detection than filters of size 3 × 3, 4 × 4, 5 × 5, and 6 × 6. An analysis of hypernasality-sensitive vowels shows that the vowel /i/ is the most sensitive to hypernasality. Compared with the state of the art, the proposed CNN-based system achieves better detection performance, and an experiment on a heterogeneous corpus demonstrates that the CNN handles speech variability better than shallow classifiers.
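The two quantities at the heart of the abstract can be made concrete with a short sketch. This is an illustrative toy example, not the authors' code: `conv2d_valid` shows why a 1 × 8 kernel spans 8 neighboring frequency bins within a single time frame of a spectrogram (emphasizing spectral structure such as formants rather than temporal context), and `f1_score` shows how the reported F1 values are computed from precision and recall. The toy spectrogram, kernel values, and confusion counts are assumptions chosen only for illustration.

```python
def conv2d_valid(spec, kernel):
    """Naive 'valid' 2-D cross-correlation on list-of-lists matrices."""
    kt, kf = len(kernel), len(kernel[0])
    t, f = len(spec), len(spec[0])
    out = []
    for i in range(t - kt + 1):
        row = []
        for j in range(f - kf + 1):
            # Sum of elementwise products over the kernel window.
            s = sum(kernel[a][b] * spec[i + a][j + b]
                    for a in range(kt) for b in range(kf))
            row.append(s)
        out.append(row)
    return out

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy spectrogram: 100 time frames x 64 frequency bins.
spec = [[(i + j) % 7 / 7.0 for j in range(64)] for i in range(100)]
kernel_1x8 = [[1.0 / 8] * 8]      # 1 time frame x 8 frequency bins
feat = conv2d_valid(spec, kernel_1x8)
print(len(feat), len(feat[0]))    # 100 57 -- time axis intact, frequency axis reduced
print(round(f1_score(tp=90, fp=5, fn=5), 4))   # 0.9474
```

Note that the 1 × 8 output keeps every time frame (100) while shrinking the frequency axis to 64 − 8 + 1 = 57 bins, which is the sense in which such a filter "considers more frequency information" than a square 3 × 3 or 5 × 5 filter.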




Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61503264.


Corresponding author

Correspondence to Ling He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, X., Tang, M., Yang, S. et al. Automatic Hypernasality Detection in Cleft Palate Speech Using CNN. Circuits Syst Signal Process 38, 3521–3547 (2019). https://doi.org/10.1007/s00034-019-01141-x

