Abstract
More than 70 million people worldwide suffer from stuttering, which can undermine their confidence in public speaking. Many seek therapy sessions, but therapy is often a temporary solution: once the sessions end, the problem may recur. This work applies state-of-the-art machine learning algorithms, which have improved substantially in recent years, to this problem. We use the UCLASS archive, which provides stuttered-speech recordings in .wav format with time-aligned transcriptions. We evaluated several algorithms and optimized our model through hyperparameter tuning to maximize its accuracy. The resulting model was tested on randomly selected speech samples from the same dataset, ranging from mild to heavy stuttering, and a significant reduction in Word Error Rate (WER) was observed for most test cases.
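The WER metric reported above is the word-level edit distance between a reference transcription and a hypothesis, normalized by the reference length. As an illustrative sketch (not the authors' implementation, and the example sentences are invented), it can be computed with a standard dynamic-programming Levenshtein distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Stuttered speech inserts repetitions/part-words relative to the clean
# reference, so removing them should lower WER against that reference.
print(wer("the cat sat on the mat", "the the c- cat sat on the mat"))
```

Here the two extra tokens ("the", "c-") count as insertions against a six-word reference, giving a WER of 2/6 before any stutter removal.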
Acknowledgement
The authors express their sincere gratitude to the Vellore Institute of Technology, Vellore, for its encouragement, support, and the resources provided to complete this project. We also thank the University College London Archive of Stuttered Speech (UCLASS) and the volunteers who contributed the data, as well as the Russian Foundation for Basic Research (project 19–57-45008–IND_a) for the Russian researcher and the Department of Science and Technology (DST) (INTRUSRFBR382) for the Indian researcher.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rajput, S., Nersisson, R., Raj, A.N.J., Mary Mekala, A., Frolova, O., Lyakso, E. (2022). Speech Stuttering Detection and Removal Using Deep Neural Networks. In: Liu, Q., Liu, X., Chen, B., Zhang, Y., Peng, J. (eds) Proceedings of the 11th International Conference on Computer Engineering and Networks. Lecture Notes in Electrical Engineering, vol 808. Springer, Singapore. https://doi.org/10.1007/978-981-16-6554-7_50
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6553-0
Online ISBN: 978-981-16-6554-7
eBook Packages: Intelligent Technologies and Robotics (R0)