A Hybrid Deep Ensemble for Speech Disfluency Classification

Pravin, Sheena Christabel; Palanivelan, M.

doi:10.1007/s00034-021-01657-1

Published: 11 February 2021

Volume 40, pages 3968–3995 (2021)
Circuits, Systems, and Signal Processing

Abstract

In this paper, a novel Hybrid Deep Ensemble (HDE) is proposed for automatic speech disfluency classification on a sparse speech dataset. Classification of speech disfluencies for the diagnosis of speech disorders has long relied on sophisticated deep learning models. The proposed model accomplishes this task with high accuracy through a more straightforward approach: an optimal combination of diverse machine learning and deep learning algorithms in a hierarchical arrangement, which includes a deep autoencoder that yields compressed latent features. The proposed model considerably reduces processing time and overcomes the cumbersome hyper-parameter tuning and large data demands of deep learning algorithms while retaining high classification accuracy. Experimental results show that the proposed HDE outperforms both the individual base learners and a deep neural network baseline. The proposed model and the baseline models were evaluated in terms of Cohen’s kappa coefficient, Hamming loss, Jaccard score, F-score and classification accuracy.
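The abstract lists five evaluation metrics without stating how they are computed. As a minimal sketch, assuming single-label multiclass predictions (one disfluency class per sample) and macro averaging for the Jaccard score and F-score, the metrics can be computed in plain Python as below. This is not the authors' evaluation code; the `evaluate` function name is hypothetical, and the disfluency labels in the usage example are illustrative placeholders, not the classes of the paper's dataset.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return the five metrics named in the abstract for single-label multiclass labels.

    Jaccard score and F-score are macro-averaged over classes (an assumption).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))

    # Accuracy is the fraction of exact matches (also the observed agreement p_o).
    accuracy = sum(t == p for t, p in pairs) / n

    # With exactly one label per sample, Hamming loss reduces to the error rate.
    hamming_loss = 1.0 - accuracy

    # Cohen's kappa: chance agreement p_e comes from the two marginal label distributions.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    # Kappa is undefined when chance agreement is total (p_e == 1).
    kappa = (accuracy - p_e) / (1.0 - p_e) if p_e < 1.0 else float("nan")

    # Per-class counts for the macro-averaged Jaccard score and F1-score.
    jaccards, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = tp + fp + fn  # > 0 because c occurs in y_true or y_pred
        jaccards.append(tp / denom)
        f1s.append(2 * tp / (2 * tp + fp + fn))

    return {
        "accuracy": accuracy,
        "hamming_loss": hamming_loss,
        "cohen_kappa": kappa,
        "jaccard_macro": sum(jaccards) / len(labels),
        "f1_macro": sum(f1s) / len(labels),
    }


if __name__ == "__main__":
    # Hypothetical disfluency labels, for illustration only.
    truth = ["repetition", "prolongation", "interjection",
             "repetition", "prolongation", "interjection"]
    preds = ["repetition", "prolongation", "repetition",
             "repetition", "interjection", "interjection"]
    for name, value in evaluate(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

For multi-label data, Hamming loss no longer equals the error rate and must be averaged per label. In practice, the library functions `accuracy_score`, `cohen_kappa_score`, `hamming_loss`, `jaccard_score` and `f1_score` in scikit-learn's `sklearn.metrics` module cover these cases, with the averaging scheme chosen through the `average` parameter.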

Data Availability

The disfluent speech dataset generated and analysed during the current study is available from the corresponding author on reasonable request.

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments and suggestions, which helped improve the manuscript. This project is funded by AICTE, India, under the Research Progress Scheme (RPS). The grant reference number is 8-40/RIFD/RPS/Policy-1/2017-18, dated 15 March 2019. The authors are joint investigators on the project.

Author information

Authors and Affiliations

Department of ECE, Rajalakshmi Engineering College, Chennai, India
Sheena Christabel Pravin & M. Palanivelan

Corresponding author

Correspondence to Sheena Christabel Pravin.

Ethics declarations

Conflict of interest

The authors alone are responsible for the content and writing of the paper, and they report no conflict of interest.

About this article

Cite this article

Pravin, S.C., Palanivelan, M. A Hybrid Deep Ensemble for Speech Disfluency Classification. Circuits Syst Signal Process 40, 3968–3995 (2021). https://doi.org/10.1007/s00034-021-01657-1

Received: 30 May 2020
Revised: 17 January 2021
Accepted: 22 January 2021
Issue Date: August 2021
