Abstract
At the threshold of understanding speech in noisy environments, the neural exploitation of interaural differences provides a significant margin of intelligibility. Mimicking spatial hearing to improve speech intelligibility in hearing aids and robotics has yielded compelling results. This chapter reviews binaural speech processors and highlights approaches for their optimal application. Binaural speech-enhancement algorithms rest on the assumption that the target and the noise signal strike a head-mounted processor from different directions and therefore cause distinctive interaural parameters. This assumption is, however, often violated in degraded acoustics. To study this degradation, the chapter starts with an examination of binaural statistics in different noise conditions. Subsequently, standard binaural speech processors are studied that use different waveform features, namely, the binaural coherence of the fine-structure, the binaural differences of the fine-structure, and the binaural differences of the envelope. To allow a fair comparison, each algorithm underwent a stochastic optimization of its parameters in a set of prototypical speech-in-noise scenes, with an instrumental measure of speech intelligibility serving as the objective function. Furthermore, the binaural speech processors are applied at the output of commercially available hearing aids that feature superdirective beamformers with different directivity modes. In this way, the SNR gain that they add to the pre-processing of the beamformer is assessed. For deriving filter gains from binaural statistics, part three of this chapter describes histogram-based methods and parametric approaches for the binaural fine-structure algorithm and compares them in a realistic environment with reverberation and additive noise.
Part three can also be read as a hands-on description addressed to students and engineers who seek an ad-hoc implementation of a binaural system for speech enhancement.
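As an illustration of one waveform feature named above, the binaural coherence of the fine-structure, the following is a minimal sketch of a recursively smoothed magnitude-squared coherence estimate for a single frequency bin. It is not the chapter's algorithm: the function name `interaural_coherence`, the smoothing constant `alpha`, and the synthetic test signals are assumptions chosen for this example only.

```python
import cmath
import random


def interaural_coherence(left, right, alpha=0.9):
    """Magnitude-squared coherence per frame for one frequency bin.

    left, right: complex short-time spectral coefficients, one per frame.
    alpha: recursive smoothing constant (an assumed value, not from the chapter).
    Returns a list of values between 0 and 1.
    """
    p_ll = p_rr = 0.0   # smoothed auto-power of the left and right channel
    p_lr = 0j           # smoothed cross-power between the channels
    eps = 1e-12         # guards against division by zero
    coherence = []
    for l, r in zip(left, right):
        p_ll = alpha * p_ll + (1 - alpha) * abs(l) ** 2
        p_rr = alpha * p_rr + (1 - alpha) * abs(r) ** 2
        p_lr = alpha * p_lr + (1 - alpha) * l * r.conjugate()
        coherence.append(abs(p_lr) ** 2 / (p_ll * p_rr + eps))
    return coherence


random.seed(0)
frames = 200
left = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(frames)]

# Directional source: the right ear receives a scaled, phase-shifted copy.
itf = 0.7 * cmath.exp(-0.5j)  # hypothetical interaural transfer factor
right_directional = [itf * l for l in left]

# Diffuse-like noise: the right ear receives an independent signal.
right_diffuse = [complex(random.gauss(0, 1), random.gauss(0, 1))
                 for _ in range(frames)]

coh_directional = interaural_coherence(left, right_directional)
coh_diffuse = interaural_coherence(left, right_diffuse)
print("directional:", round(coh_directional[-1], 3))
print("diffuse:", round(coh_diffuse[-1], 3))
```

In this toy setting the directional source yields a coherence close to one, while the independent channels yield a clearly lower value. A coherence-based processor could map such values to a filter gain, but the mapping itself is outside the scope of this sketch.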
Notes
1. For a particular direction of sound incidence, the interaural transfer function is defined as the quotient of the corresponding head-related transfer functions at each ear. There is also a running interaural amplitude-modulation transfer function for a particular sound direction, which is equivalent to the quotient of the corresponding amplitude-modulation transfer functions of each ear. While the former is rooted in binaural differences of the fine-structure of the waveform, the latter is caused by interaural differences of the envelope. Both types of transfer functions are approximately independent of each other.
2. Disjointness between two signals in the time-frequency domain, for instance, the signal \(\delta _{j}(d, m)\) and the signal \(\delta _{j^{\prime }}(d, m)\), can be expressed as \(\delta _{j}(d, m)\delta _{j^{\prime }}(d, m) \approx 0\), \(\forall j^{\prime } \neq j\), where \(d\) and \(m\) denote the frequency and time index, respectively.
3. The same recording chain as well as the same source-receiver distance were applied during the HRTF measurements in the anechoic chamber of the Bochum institute.
4.
5. The I3 has originally been developed to predict the effects of additive noise, peak clipping and center clipping on speech intelligibility [20].
6. Throughout this chapter, the target source and the interferers were arranged in the horizontal plane.
7. The hearing-aid program "low" of the hearing glasses provides a benefit of 4.4 dB as assessed with the directivity-index method of the ANSI S3.35-2005 standard [4]. In the mode "high", the hearing aid offers an improvement of the directivity index of 7.2 dB.
8. The term coherent-interference condition was used here to describe an interfering sound source under anechoic conditions.
9. In order to provide for an instrumental measure of absolute perceptual speech intelligibility, a perceptual intelligibility test with the applied speech material as well as a subsequent fitting of the instrumental results to the perceptual recognition scores with a logistic function need to be executed. Therefore, un-fitted instrumental measures merely quantify trends.
10. The current section is an excerpt from the diploma thesis of Ch. Luther, Speech intelligibility enhancement based on multivariate models of binaural interaction, Ruhr-University Bochum, 2012. Contact the author to obtain a copy.
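The disjointness condition in the second note, a vanishing product of two time-frequency representations, can be made concrete with a small numerical check. The following is a minimal sketch under stated assumptions: the normalized measure `overlap_index` and the toy grids are hypothetical and serve only to illustrate the idea. They are not taken from the chapter.

```python
import math


def overlap_index(spec_a, spec_b):
    """Normalized overlap of two time-frequency grids indexed as [d][m].

    Returns 0.0 when the product of the two grids vanishes in every cell
    (disjoint signals) and 1.0 when the magnitudes are proportional.
    """
    product_sum = sum(abs(a) * abs(b)
                      for row_a, row_b in zip(spec_a, spec_b)
                      for a, b in zip(row_a, row_b))
    energy_a = sum(abs(a) ** 2 for row in spec_a for a in row)
    energy_b = sum(abs(b) ** 2 for row in spec_b for b in row)
    if energy_a == 0 or energy_b == 0:
        return 0.0  # an empty signal overlaps with nothing
    return product_sum / math.sqrt(energy_a * energy_b)


# Toy grids with two frequency bins (d) and three frames (m).
source_j = [[1.0, 0.0, 0.0],
            [0.0, 2.0, 0.0]]
source_k = [[0.0, 3.0, 0.0],
            [0.0, 0.0, 1.5]]

disjoint_overlap = overlap_index(source_j, source_k)  # no shared cells
self_overlap = overlap_index(source_j, source_j)      # identical grids
print(disjoint_overlap, self_overlap)
```

Real speech mixtures are only approximately disjoint, so a measure of this kind would typically return a small positive value rather than exactly zero.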
References
J. B. Allen, D. A. Berkley, and J. Blauert. Multimicrophone signal-processing technique to remove room reverberation from speech signals. J. Acoust. Soc. Am., 62:912–915, 1977.
ANSI/ASA. American national standard methods for calculation of the speech intelligibility index. Technical report, Am. Nat. Standards of the Acoust. Soc. Am., S3.5-1997 (R2007).
J. Blauert. Epistemological bases of binaural perception—a constructivists’ approach. In Forum Acusticum 2011, Aalborg, Denmark, 2011.
M. M. Boone. Directivity measurements on a highly directive hearing aid: The hearing glasses. In AES 120th Conv., Paris, France, 2006.
M. M. Boone, R. C. G. Opdam, and A. Schlesinger. Downstream speech enhancement in a low directivity binaural hearing aid. In Proc. 20th Intl. Congr. Acoust., ICA 2010, Sydney, Australia, 2010.
A. W. Bronkhorst. The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acta Acust./Acustica, 86:117–128, 2000.
M. Dietz, S. D. Ewert, and V. Hohmann. Auditory model based direction estimation of concurrent speakers from binaural signals. Speech Communication, 53:592–605, 2011.
A. J. Duquesnoy. Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons. J. Acoust. Soc. Am., 74:739–743, 1983.
N. I. Durlach and H. S. Colburn. Handbook of Perception, volume 4, chapter Binaural phenomena. New York: Academic Press, 1978.
C. Faller and J. Merimaa. Source localization in complex listening situations: Selection of binaural cues based on interaural coherence. J. Acoust. Soc. Am., 116:3075–3089, 2004.
W. Gaik and W. Lindemann. Ein digitales Richtungsfilter, basierend auf der Auswertung interauraler Parameter von Kunstkopfsignalen. In Fortschr. Akust., DAGA’86, volume 86, pages 721–724, Oldenburg, Germany, 1986.
E. L. J. George. Factors affecting speech reception in fluctuating noise and reverberation. PhD thesis, Vrije Universiteit, The Netherlands, 2007.
J. Greenberg and P. Zurek. Microphone arrays: Signal processing techniques and applications, chapter Microphone-array hearing aids. Springer-Verlag, 2001.
V. Hamacher, U. Kornagel, T. Lotter, and H. Puder. Advances in digital speech transmission, chapter Binaural signal processing in hearing aids. John Wiley & Sons Ltd., 2008.
S. Harding, J. Barker, and G. J. Brown. Mask estimation for missing data speech recognition based on statistics of binaural interaction. IEEE Trans. Audio, Speech, and Language Processing, 14:58–67, 2005.
C. Houck, J. Joines, and M. Kay. A genetic algorithm for function optimization: a Matlab implementation. North Carolina State University, Raleigh, NC, Technical Report, 1995.
T. Houtgast and H. J. M. Steeneken. Past, present and future of the Speech Transmission Index, chapter The roots of the STI approach, pages 3–11. TNO Human Factors, Soesterberg, The Netherlands, 2002.
L. Jeffress. A place theory of sound localization. J. Comparative and Physiological Psychol., 41:35–39, 1948.
J. M. Kates and K. H. Arehart. A model of speech intelligibility and quality in hearing aids. In IEEE Worksh. Applications of Signal Process. to Audio and Acoustics, WASPAA, pages 53–56, New Paltz, 2005.
J. M. Kates and K. H. Arehart. Coherence and the speech intelligibility index. J. Acoust. Soc. Am., 117:2224–2237, 2005.
B. Kollmeier and R. Koch. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. J. Acoust. Soc. Am., 95:1593–1602, 1994.
D. Kolossa. High-level processing of binaural features. In Forum Acusticum 2011, Aalborg, Denmark, 2011.
J. Li, S. Sakamoto, S. Hongo, M. Akagi, and Y. Suzuki. Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication. Speech Comm., 53:677–689, 2010.
J. Ma, Y. Hu, and P. C. Loizou. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. J. Acoust. Soc. Am., 125:3387–3405, 2009.
N. Madhu. Data-driven mask generation for source separation. In Int. Symp. Auditory and Audiological Res., ISAAR, Marienlyst, Denmark, 2009.
M. I. Mandel, R. J. Weiss, and D. P. W. Ellis. Model-based expectation maximization source separation and localization. IEEE Trans. Audio, Speech, and Language Process., 18:382–394, 2010.
R. Martin. Microphone arrays: Signal processing techniques and applications, chapter Small microphone arrays with postfilters for noise and acoustic echo reduction. Springer-Verlag, 2001.
I. Merks. Binaural application of microphone arrays for improved speech intelligibility in a noisy environment. PhD thesis, Delft University of Technology, The Netherlands, 2000.
J. Nix and V. Hohmann. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J. Acoust. Soc. Am., 119:463–479, 2006.
J. Peissig. Binaurale Hörgerätestrategien in komplexen Störschallsituationen (Binaural strategies for hearing aids in complex noise situations). PhD thesis, Georg-August Universität, Göttingen, Germany, 1992.
B. Rakerd and W. M. Hartmann. Localization of sound in rooms. V. Binaural coherence and human sensitivity to interaural time differences in noise. J. Acoust. Soc. Am., 128:3052–3063, 2010.
K. Reindl, P. Prokein, E. Fischer, Z. Y., and W. Kellermann. Combining monaural beamforming and blind source separation for binaural speech enhancement in multi-microphone hearing aids. In ITG-Fachtg. Sprachkommunikation, Nürnberg, Germany, 2010.
N. Roman, D. L. Wang, and G. J. Brown. A classification-based cocktail-party processor. volume 16, Vancouver, Canada, 2003.
A. Sarampalis, S. Kalluri, B. Edwards, and E. Hafter. Objective measures of listening effort: Effects of background noise and noise reduction. J. Speech, Language, and Hearing Res., 52:1230–1240, 2009.
A. Schlesinger. Binaural model-based speech intelligibility enhancement and assessment in hearing aids. PhD thesis, Delft University of Technology, The Netherlands, 2012.
A. Schlesinger. Transient-based speech transmission index for predicting intelligibility in nonlinear speech enhancement processors. In IEEE Intl. Conf. Acoustics, Speech and Signal Process., ICASSP, volume 1, pages 3993–3996, Kyoto, Japan, 2012.
K. U. Simmer, J. Bitzer, and C. Marro. Microphone arrays: Signal processing techniques and applications, chapter Post-filtering techniques. Springer-Verlag, 2001.
C. H. Taal, R. C. Hendriks, and R. Heusdens. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In IEEE Intl. Conf. Acoust., Speech and Signal Process., ICASSP 2010, pages 4214–4217, Dallas, United States of America, 2010.
The MathWorks, Inc. MATLAB R2012a Documentation, 2012.
TNO. Multilingual database. TNO Human Factors Research Institute, Soesterberg, The Netherlands, 2000.
R. J. Weiss, M. I. Mandel, and D. P. W. Ellis. Combining localization cues and source model constraints for binaural source separation. Speech Communication, 53:606–621, 2011.
P. Welch. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio and Electroacoustics, 15:70–73, 1967.
O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Processing, 52:1830–1847, 2004.
Y. Zheng, K. Reindl, and W. Kellermann. BSS for improved interference estimation for blind speech signal extraction with two microphones. In 3rd IEEE Intl. Worksh. Computational Advances in Multi-Sensor Adaptive Process., CAMSAP, pages 253–256, Aruba, Dutch Antilles, 2009.
Acknowledgments
The authors gratefully acknowledge their indebtedness to two anonymous reviewers for helpful advice. Part of this work has been supported by the Dutch Technology Foundation STW-project # DTF.7459.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schlesinger, A., Luther, C. (2013). Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_11
DOI: https://doi.org/10.1007/978-3-642-37762-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering (R0)