Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks

Cheng, Feng; Wang, Xiaochen; Gang, Li; Tu, Weiping; Wang, Jinshan

doi:10.1007/978-3-319-77383-4_69


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10736)

Included in the following conference series:

Pacific Rim Conference on Multimedia


Abstract

Speech intelligibility is a significant factor in successful speech communication. Many methods have been proposed to enhance intelligibility, mainly by operating on the speech signal, for example by increasing its amplitude or modifying its spectrum. However, their effect is limited when the background noise is extremely strong. In this paper, we propose a preprocessed noise cancellation model that enhances speech intelligibility by predicting the cancelling signal and superimposing it onto the speech signal. We build a deep neural network (DNN) model to improve the accuracy of the prediction algorithm. The effectiveness of the algorithm was verified by objective and subjective tests: across a variety of test cases, the average signal-to-noise ratio (SNR) improved by 4.5 dB, the average speech intelligibility index (SII) increased by 5.4%, and the average comparison mean opinion score (CMOS) rose by 1.16.
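The superimposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' DNN model: the predictor is replaced by a hypothetical stand-in that recovers the noise with an assumed 20% amplitude error, and the signals are synthetic sinusoids rather than real speech or machine noise. All names (`snr_db`, `superimpose`, `predicted_noise`) are illustrative assumptions.

```python
import numpy as np


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB, computed from mean signal powers."""
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))


def superimpose(speech, cancelling):
    """Add the cancelling signal to the speech before playback."""
    return speech + cancelling


fs = 16000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs          # one second of samples

# Synthetic stand-ins (assumptions, not data from the paper).
speech = 0.1 * np.sin(2 * np.pi * 220 * t)
noise = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)

# Hypothetical imperfect predictor: recovers 80% of the noise amplitude.
predicted_noise = 0.8 * noise
cancelling = -predicted_noise   # anti-phase cancelling signal

played = superimpose(speech, cancelling)
at_ear = played + noise         # what the listener receives
residual_noise = at_ear - speech

snr_before = snr_db(speech, noise)
snr_after = snr_db(speech, residual_noise)
improvement = snr_after - snr_before

print(f"SNR before: {snr_before:.2f} dB")
print(f"SNR after:  {snr_after:.2f} dB")
print(f"Improvement: {improvement:.2f} dB")
```

The improvement printed here follows only from the assumed 20% prediction error and is unrelated to the 4.5 dB figure reported above. The sketch shows why prediction accuracy matters: the residual noise power scales with the square of the prediction error.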

X. Wang: This work is supported by the National Natural Science Foundation of China (No. 61231015, 61671335); the National High Technology Research and Development Program of China (863 Program) (No. 2015AA016306); and the Hubei Province Technological Innovation Major Project (No. 2016AAA015).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, Wuhan, China
Feng Cheng, Xiaochen Wang, Li Gang, Weiping Tu & Jinshan Wang
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China
Feng Cheng, Xiaochen Wang & Weiping Tu
Research Institute of Wuhan University in Shenzhen, Shenzhen, China
Feng Cheng, Xiaochen Wang, Li Gang & Jinshan Wang

Corresponding author

Correspondence to Xiaochen Wang.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Bing Zeng
University of Chinese Academy of Sciences, Beijing, China
Qingming Huang
University of Ottawa, Ottawa, Ontario, Canada
Abdulmotaleb El Saddik
University of Electronic Science and Technology of China, Chengdu, China
Hongliang Li
Chinese Academy of Sciences, Beijing, China
Shuqiang Jiang
Harbin Institute of Technology, Harbin, China
Xiaopeng Fan

About this paper

Cite this paper

Cheng, F., Wang, X., Gang, L., Tu, W., Wang, J. (2018). Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_69

DOI: https://doi.org/10.1007/978-3-319-77383-4_69
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science, Computer Science (R0)
