Automatic Hypernasality Detection in Cleft Palate Speech Using CNN

Wang, Xiyue; Tang, Ming; Yang, Sen; Yin, Heng; Huang, Hua; He, Ling

doi:10.1007/s00034-019-01141-x

Published: 20 May 2019

Volume 38, pages 3521–3547, (2019)

Circuits, Systems, and Signal Processing

Xiyue Wang¹,
Ming Tang¹,
Sen Yang¹,
Heng Yin²,
Hua Huang¹ &
…
Ling He ORCID: orcid.org/0000-0002-7168-2737¹

Abstract

Automatic hypernasality detection in cleft palate speech can support diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, taking the speech spectrogram as input. The average F1-scores for the hypernasality detection task are 0.9485 on a dataset of children's speech and 0.9746 on a dataset of adults' speech. The experiments explore how spectral resolution influences detection performance: a higher spectral resolution highlights the vocal tract parameters associated with hypernasality, such as formants and spectral zeros. The CNN learns effective features through two-dimensional filtering, whereas the feature extraction capability of shallow classifiers is limited. Compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score, 0.9485. Among the network architectures compared, the convolutional filter of size 1 × 8 achieves the highest F1-score. This filter covers more frequency information and is better suited to hypernasality detection than filters of size 3 × 3, 4 × 4, 5 × 5, and 6 × 6. An analysis of hypernasality-sensitive vowels indicates that /i/ is the vowel most sensitive to hypernasality. The proposed CNN-based system achieves better detection performance than the state-of-the-art methods reported in the literature. An experiment on a heterogeneous corpus demonstrates that the CNN handles speech variability better than the shallow classifiers.
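To make the filter-size comparison in the abstract concrete, the following is a minimal sketch, not the authors' network. It computes a magnitude spectrogram and applies one randomly initialized 1 × 8 kernel and one 3 × 3 kernel to it. The axis layout (time frames × frequency bins, so that the "8" runs along frequency) is an assumption inferred from the statement that the 1 × 8 filter considers more frequency information. The sampling rate, FFT size, hop length, test signal, and helper names (`spectrogram`, `conv2d_valid`) are all hypothetical.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram with shape (time_frames, frequency_bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN layer applies."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Synthetic stand-in for a vowel segment (assumed values, not real speech).
rng = np.random.default_rng(0)
sr = 16000                      # assumed sampling rate in Hz
t = np.arange(sr) / sr          # one second of samples
signal = (np.sin(2 * np.pi * 300 * t)
          + 0.5 * np.sin(2 * np.pi * 2500 * t)
          + 0.01 * rng.standard_normal(sr))

spec = spectrogram(signal)      # rows: time frames, columns: frequency bins
hz_per_bin = sr / 512           # spectral resolution of this spectrogram

k_1x8 = rng.standard_normal((1, 8))   # 1 frame  x 8 frequency bins
k_3x3 = rng.standard_normal((3, 3))   # 3 frames x 3 frequency bins

out_1x8 = conv2d_valid(spec, k_1x8)
out_3x3 = conv2d_valid(spec, k_3x3)

print("spectrogram shape:", spec.shape)
print("1x8 output shape:", out_1x8.shape, "span:", 8 * hz_per_bin, "Hz")
print("3x3 output shape:", out_3x3.shape, "span:", 3 * hz_per_bin, "Hz")
```

Under these assumed settings each frequency bin is 31.25 Hz wide, so a single 1 × 8 kernel spans 250 Hz of one frame, while a 3 × 3 kernel spans about 94 Hz across three frames. A larger FFT size would narrow the bins, which is one way to raise the spectral resolution that the abstract discusses.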

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61503264).

Author information

Authors and Affiliations

College of Electrical Engineering and Information Technology, Sichuan University, Chengdu, China
Xiyue Wang, Ming Tang, Sen Yang, Hua Huang & Ling He
Hospital of Stomatology, Sichuan University, Chengdu, China
Heng Yin

Corresponding author

Correspondence to Ling He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Tang, M., Yang, S. et al. Automatic Hypernasality Detection in Cleft Palate Speech Using CNN. Circuits Syst Signal Process 38, 3521–3547 (2019). https://doi.org/10.1007/s00034-019-01141-x

Received: 31 August 2018
Revised: 09 May 2019
Accepted: 10 May 2019
Published: 20 May 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s00034-019-01141-x
