Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility

Schlesinger, A.; Luther, Chr.

doi:10.1007/978-3-642-37762-4_11

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

At the threshold of understanding speech in noisy environments, the neural exploitation of interaural differences contributes a significant margin of intelligibility. Mimicking spatial hearing to improve speech intelligibility in hearing aids and robotics has produced compelling results. This chapter reviews binaural speech processors and highlights approaches for their optimal application. Binaural speech-enhancement algorithms draw on the assumption that the target and the noise signal strike a head-mounted processor from different directions and therefore cause distinctive interaural parameters. This assumption is, however, often violated in degraded acoustics. To study this degradation, the chapter starts with an examination of binaural statistics in different noise conditions. Subsequently, standard binaural speech processors are studied that use different waveform features, namely, the binaural coherence of the fine-structure, the binaural differences of the fine-structure, and the binaural differences of the envelope. To ensure a fair comparison, each algorithm underwent a stochastic optimization of its parameters in a set of prototypical speech-in-noise scenes, with an instrumental measure of speech intelligibility serving as the objective function. Furthermore, the binaural speech processors are applied at the output of commercially available hearing aids that feature superdirective beamformers with different directivity modes. In this way, the SNR gain that adds to the pre-processing of the beamformer is assessed. For deriving filter gains from binaural statistics, part three of this chapter describes histogram-based methods and parametric approaches for the binaural fine-structure algorithm, and compares these in a realistic environment with reverberation and additive noise. Part three can also be read as a hands-on description, addressed to students and engineers who are aiming for an ad-hoc implementation of a binaural system for speech enhancement.

Notes

1. For a particular direction of sound incidence, the interaural transfer function is defined as the quotient of the corresponding head-related transfer functions at each ear. There is also a running interaural amplitude-modulation transfer function for a particular sound direction, which is equivalent to the quotient of the corresponding amplitude-modulation transfer functions of each ear. While the former is rooted in binaural differences of the fine-structure of the waveform, the latter is caused by interaural differences of the envelope. Both types of transfer functions are approximately independent of each other.
2. Disjointness between two signals in the time-frequency domain, for instance, the signal \(\delta _{j}(d, m)\) and the signal \(\delta _{j^{\prime }}(d, m)\), can be expressed as \(\delta _{j}(d, m)\,\delta _{j^{\prime }}(d, m) \approx 0\), \(\forall j^{\prime } \neq j\), where \(d\) and \(m\) denote the frequency and time index, respectively.
3. The same recording chain and the same source-receiver distance were used during the HRTF measurements in the anechoic chamber of the Bochum institute.
4. The frame length and the step size were 16 ms and 8 ms, respectively. They thereby corresponded to the settings in [29, 35].
5. The I3 was originally developed to predict the effects of additive noise, peak clipping, and center clipping on speech intelligibility [20].
6. Throughout this chapter, the target source and the interferers were arranged in the horizontal plane.
7. The hearing-aid program "low" of the hearing glasses provides a benefit of 4.4 dB as assessed with the directivity-index method of the ANSI S3.35-2005 standard [4]. In the mode "high", the hearing aid offers an improvement of the directivity index of 7.2 dB.
8. The term coherent-interference condition is used here to describe an interfering sound source under anechoic conditions.
9. To obtain an instrumental measure of absolute perceptual speech intelligibility, a perceptual intelligibility test with the applied speech material has to be carried out, and the instrumental results subsequently have to be fitted to the perceptual recognition scores with a logistic function. Un-fitted instrumental measures therefore merely quantify trends.
10. The current section is an excerpt from the diploma thesis of Chr. Luther, "Speech intelligibility enhancement based on multivariate models of binaural interaction", Ruhr-University Bochum, 2012. Contact the author to obtain a copy.
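
Notes 2 and 4 can be illustrated together with a small sketch. The code below frames a signal with the 16 ms frame length and 8 ms step size stated in Note 4, and then evaluates how close two signals come to the disjointness condition of Note 2. The Hann window, the function names `stft` and `tf_overlap`, and the normalized-overlap score are assumptions made for this illustration; they are not taken from the chapter.

```python
import numpy as np


def stft(x, fs, frame_ms=16.0, hop_ms=8.0):
    """Short-time spectrum of x, returned with shape (frames, bins).

    Frame length and step size default to the values in Note 4.
    The Hann window is an assumption made for this sketch.
    """
    n = int(round(fs * frame_ms / 1000.0))    # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))    # samples per step
    win = np.hanning(n)
    starts = range(0, len(x) - n + 1, hop)
    frames = np.stack([x[s:s + n] * win for s in starts])
    return np.fft.rfft(frames, axis=1)


def tf_overlap(a, b, fs):
    """Normalized time-frequency overlap of two signals (hypothetical score).

    The score is 0 when the product of the two magnitude spectra vanishes
    in every time-frequency bin, i.e. when the condition of Note 2 holds
    exactly, and 1 when both magnitude patterns are identical.
    """
    mag_a = np.abs(stft(a, fs))
    mag_b = np.abs(stft(b, fs))
    return float(np.sum(mag_a * mag_b)
                 / (np.linalg.norm(mag_a) * np.linalg.norm(mag_b)))


# Usage: two tones at well separated frequencies barely overlap.
fs = 16000
t = np.arange(fs) / fs
low_tone = np.sin(2 * np.pi * 500 * t)
high_tone = np.sin(2 * np.pi * 3000 * t)
score = tf_overlap(low_tone, high_tone, fs)
```

A score close to 0 indicates that the two signals occupy different time-frequency bins, which is the situation in which time-frequency masking can separate them. Speech mixtures in reverberation would be expected to give larger scores than this idealized two-tone case.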

References

[1] J. B. Allen, D. A. Berkley, and J. Blauert. Multimicrophone signal-processing technique to remove room reverberation from speech signals. J. Acoust. Soc. Am., 62:912–915, 1977.
[2] ANSI/ASA. American national standard methods for calculation of the speech intelligibility index. Technical report, Am. Nat. Standards of the Acoust. Soc. Am., S3.5-1997 (R2007).
[3] J. Blauert. Epistemological bases of binaural perception: a constructivists’ approach. In Forum Acusticum 2011, Aalborg, Denmark, 2011.
[4] M. M. Boone. Directivity measurements on a highly directive hearing aid: The hearing glasses. In AES 120th Conv., Paris, France, 2006.
[5] M. M. Boone, R. C. G. Opdam, and A. Schlesinger. Downstream speech enhancement in a low directivity binaural hearing aid. In Proc. 20th Intl. Congr. Acoust., ICA 2010, Sydney, Australia, 2010.
[6] A. W. Bronkhorst. The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acta Acust./Acustica, 86:117–128, 2000.
[7] M. Dietz, S. D. Ewert, and V. Hohmann. Auditory model based direction estimation of concurrent speakers from binaural signals. Speech Communication, 53:592–605, 2011.
[8] A. J. Duquesnoy. Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons. J. Acoust. Soc. Am., 74:739–743, 1983.
[9] N. I. Durlach and H. S. Colburn. Handbook of Perception, volume 4, chapter Binaural phenomena. New York: Academic Press, 1978.
[10] C. Faller and J. Merimaa. Source localization in complex listening situations: Selection of binaural cues based on interaural coherence. J. Acoust. Soc. Am., 116:3075–3089, 2004.
[11] W. Gaik and W. Lindemann. Ein digitales Richtungsfilter, basierend auf der Auswertung interauraler Parameter von Kunstkopfsignalen. In Fortschr. Akust., DAGA’86, volume 86, pages 721–724, Oldenburg, Germany, 1986.
[12] E. L. J. George. Factors affecting speech reception in fluctuating noise and reverberation. PhD thesis, Vrije Universiteit, The Netherlands, 2007.
[13] J. Greenberg and P. Zurek. Microphone arrays: Signal processing techniques and applications, chapter Microphone-array hearing aids. Springer-Verlag, 2001.
[14] V. Hamacher, U. Kornagel, T. Lotter, and H. Puder. Advances in digital speech transmission, chapter Binaural signal processing in hearing aids. John Wiley & Sons Ltd., 2008.
[15] S. Harding, J. Barker, and G. J. Brown. Mask estimation for missing data speech recognition based on statistics of binaural interaction. IEEE Trans. Audio, Speech, and Language Processing, 14:58–67, 2005.
[16] C. Houck, J. Joines, and M. Kay. A genetic algorithm for function optimization: a Matlab implementation. North Carolina State University, Raleigh, NC, Technical Report, 1995.
[17] T. Houtgast and H. J. M. Steeneken. Past, present and future of the Speech Transmission Index, chapter The roots of the STI approach, pages 3–11. TNO Human Factors, Soesterberg, The Netherlands, 2002.
[18] L. Jeffress. A place theory of sound localization. J. Comparative and Physiological Psychol., 41:35–39, 1948.
[19] J. M. Kates and K. H. Arehart. A model of speech intelligibility and quality in hearing aids. In IEEE Worksh. Applications of Signal Process. to Audio and Acoustics, WASPAA, pages 53–56, New Paltz, 2005.
[20] J. M. Kates and K. H. Arehart. Coherence and the speech intelligibility index. J. Acoust. Soc. Am., 117:2224–2237, 2005.
[21] B. Kollmeier and R. Koch. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. J. Acoust. Soc. Am., 95:1593–1602, 1994.
[22] D. Kolossa. High-level processing of binaural features. In Forum Acusticum 2011, Aalborg, Denmark, 2011.
[23] J. Li, S. Sakamoto, S. Hongo, M. Akagi, and Y. Suzuki. Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication. Speech Comm., 53:677–689, 2010.
[24] J. Ma, Y. Hu, and P. C. Loizou. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. J. Acoust. Soc. Am., 125:3387–3405, 2009.
[25] N. Madhu. Data-driven mask generation for source separation. In Int. Symp. Auditory and Audiological Res., ISAAR, Marienlyst, Denmark, 2009.
[26] M. I. Mandel, R. J. Weiss, and D. P. W. Ellis. Model-based expectation maximization source separation and localization. IEEE Trans. Audio, Speech, and Language Process., 53:382–394, 2010.
[27] R. Martin. Microphone arrays: Signal processing techniques and applications, chapter Small microphone arrays with postfilters for noise and acoustic echo reduction. Springer-Verlag, 2001.
[28] I. Merks. Binaural application of microphone arrays for improved speech intelligibility in a noisy environment. PhD thesis, Delft University of Technology, The Netherlands, 2000.
[29] J. Nix and V. Hohmann. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J. Acoust. Soc. Am., 119:463–479, 2006.
[30] J. Peissig. Binaurale Hörgerätestrategien in komplexen Störschallsituationen (Binaural strategies for hearing aids in complex noise situations). PhD thesis, Georg-August Universität, Göttingen, Germany, 1992.
[31] B. Rakerd and W. M. Hartmann. Localization of sound in rooms. V. Binaural coherence and human sensitivity to interaural time differences in noise. J. Acoust. Soc. Am., 128:3052–3063, 2010.
[32] K. Reindl, P. Prokein, E. Fischer, Z. Y., and W. Kellermann. Combining monaural beamforming and blind source separation for binaural speech enhancement in multi-microphone hearing aids. In ITG-Fachtg. Sprachkommunikation, Nürnberg, Germany, 2010.
[33] N. Roman, D. L. Wang, and G. J. Brown. A classification-based cocktail-party processor. volume 16, Vancouver, Canada, 2003.
[34] A. Sarampalis, S. Kalluri, B. Edwards, and E. Hafter. Objective measures of listening effort: Effects of background noise and noise reduction. J. Speech, Language, and Hearing Res., 52:1230–1240, 2009.
[35] A. Schlesinger. Binaural model-based speech intelligibility enhancement and assessment in hearing aids. PhD thesis, Delft University of Technology, The Netherlands, 2012.
[36] A. Schlesinger. Transient-based speech transmission index for predicting intelligibility in nonlinear speech enhancement processors. In IEEE Intl. Conf. Acoustics, Speech and Signal Process., ICASSP, volume 1, pages 3993–3996, Kyoto, Japan, 2012.
[37] K. U. Simmer, J. Bitzer, and C. Marro. Microphone arrays: Signal processing techniques and applications, chapter Post-filtering techniques. Springer-Verlag, 2001.
[38] C. H. Taal, R. C. Hendriks, and R. Heusdens. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In IEEE Intl. Conf. Acoust., Speech and Signal Process., ICASSP 2010, pages 4214–4217, Dallas, United States of America, 2010.
[39] The MathWorks, Inc. MATLAB R2012a Documentation, 2012.
[40] TNO. Multilingual database. TNO Human Factors Research Institute, Soesterberg, The Netherlands, 2000.
[41] R. J. Weiss, M. I. Mandel, and D. P. W. Ellis. Combining localization cues and source model constraints for binaural source separation. Speech Communication, 53:606–621, 2011.
[42] P. Welch. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio and Electroacoustics, 15:70–73, 1967.
[43] O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Processing, 52:1830–1847, 2004.
[44] Y. Zheng, K. Reindl, and W. Kellermann. BSS for improved interference estimation for blind speech signal extraction with two microphones. In 3rd IEEE Intl. Worksh. Computational Advances in Multi-Sensor Adaptive Process., CAMSAP, pages 253–256, Aruba, Dutch Antilles, 2009.

Acknowledgments

The authors gratefully acknowledge their indebtedness to two anonymous reviewers for helpful advice. Part of this work has been supported by the Dutch Technology Foundation STW-project # DTF.7459.

Author information

Authors and Affiliations

Institute of Communication Acoustics, Ruhr-University Bochum, Bochum, Germany
A. Schlesinger & Chr. Luther

Corresponding author

Correspondence to A. Schlesinger.

Editor information

Editors and Affiliations

Fak. Elektrotechnik, LS Allgm.Elektrotechn.+Akustik, Univ. Bochum, Bochum, Germany
Jens Blauert

About this chapter

Cite this chapter

Schlesinger, A., Luther, C. (2013). Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_11

DOI: https://doi.org/10.1007/978-3-642-37762-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering, Engineering (R0)
