Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility

The Technology of Binaural Listening

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

At the threshold of speech understanding in noisy environments, the neural exploitation of interaural differences contributes a significant margin of intelligibility. Mimicking spatial hearing to improve speech intelligibility in hearing aids and robotics has yielded compelling results. This chapter reviews binaural speech processors and highlights approaches for their optimal application. Binaural speech-enhancement algorithms rest on the assumption that the target and the noise signal strike a head-mounted processor from different directions and thus produce distinctive interaural parameters. In degraded acoustics, however, this assumption is often violated. To study this degradation, the chapter starts with an examination of binaural statistics in different noise conditions. Subsequently, standard binaural speech processors are studied that use different waveform features, namely the binaural coherence of the fine structure, the binaural differences of the fine structure, and the binaural differences of the envelope. To ensure a fair comparison, the algorithmic parameters of each algorithm were stochastically optimized in a set of prototypical speech-in-noise scenes, with an instrumental measure of speech intelligibility serving as the objective function. Furthermore, the binaural speech processors are applied at the output of commercially available hearing aids that feature superdirective beamformers with different directivity modes. In this way, the SNR gain that the binaural processors add to the pre-processing of the beamformer is assessed. For deriving filter gains from binaural statistics, part three of this chapter describes histogram-based methods and parametric approaches for the binaural fine-structure algorithm, and compares them in a realistic environment with reverberation and additive noise.
Part three can also be read as a hands-on description for students and engineers who are striving for an ad-hoc implementation of a binaural speech-enhancement system.
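The fine-structure cues mentioned above can be made concrete with a short sketch. The following Python/NumPy code is a minimal illustration under stated assumptions, not the chapter's implementation: it computes an STFT with the 16 ms frame length and 8 ms step size given in note 4 (at an assumed sampling rate of 16 kHz this means 256-sample frames with a 128-sample hop; the Hann window is also an assumption), derives per-bin interaural level and phase differences, and forms a binary gain for a target assumed to lie straight ahead. The function names and tolerance values are hypothetical; in the chapter, filter gains are derived from binaural statistics and the algorithmic parameters are optimized against an intelligibility measure.

```python
import numpy as np

def stft(x, fs, frame_ms=16.0, step_ms=8.0):
    """Hann-windowed STFT with 16 ms frames and 8 ms steps.

    Returns a complex array of shape (frames, frequency bins).
    """
    n = int(round(fs * frame_ms / 1000.0))     # frame length in samples
    hop = int(round(fs * step_ms / 1000.0))    # step size in samples
    if len(x) < n:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - n) // hop
    # Row k holds the sample indices of frame k.
    idx = np.arange(n)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * np.hanning(n), axis=1)

def interaural_cues(XL, XR, eps=1e-12):
    """Per-bin interaural level difference (dB) and phase difference (rad)."""
    ild = 20.0 * np.log10((np.abs(XL) + eps) / (np.abs(XR) + eps))
    ipd = np.angle(XL * np.conj(XR))
    return ild, ipd

def binary_gain(ild, ipd, ild_tol_db=3.0, ipd_tol_rad=0.5):
    """1 where the cues match a frontal target (ILD ~ 0 dB, IPD ~ 0 rad), else 0.

    The tolerances are hypothetical placeholders for parameters that a
    stochastic optimizer would tune against an intelligibility measure.
    """
    return ((np.abs(ild) < ild_tol_db) & (np.abs(ipd) < ipd_tol_rad)).astype(float)
```

In use, the left and right ear signals are each passed through stft, the cues and gain are computed from the two spectra, and both ear spectra are multiplied by the same gain before resynthesis (for example by overlap-add). Applying one real-valued gain to both ears leaves the interaural level and phase differences of the retained bins unchanged.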

Notes

  1. For a particular direction of sound incidence, the interaural transfer function is defined as the quotient of the corresponding head-related transfer functions at each ear. Analogously, a running interaural amplitude-modulation transfer function can be defined for a particular sound direction as the quotient of the corresponding amplitude-modulation transfer functions of the two ears. While the former arises from binaural differences in the fine structure of the waveform, the latter arises from interaural differences in the envelope. The two types of transfer function are approximately independent of each other.

  2. Disjointness of two signals in the time-frequency domain, for instance \(\delta_{j}(d, m)\) and \(\delta_{j'}(d, m)\), can be expressed as \(\delta_{j}(d, m)\,\delta_{j'}(d, m) \approx 0\) for all \(j' \neq j\), where \(d\) and \(m\) denote the frequency and time index, respectively.

  3. The same recording chain and the same source-receiver distance were used during the HRTF measurements in the anechoic chamber of the Bochum institute.

  4. Frame length and step size were 16 ms and 8 ms, respectively, matching the settings in [29, 35].

  5. The I3 was originally developed to predict the effects of additive noise, peak clipping, and center clipping on speech intelligibility [20].

  6. Throughout this chapter, the target source and the interferers were arranged in the horizontal plane.

  7. In the hearing-aid program "low", the hearing glasses provide a benefit of 4.4 dB, as assessed with the directivity-index method of the ANSI S3.35-2005 standard [4]. In the mode "high", the hearing aid improves the directivity index by 7.2 dB.

  8. The term coherent-interference condition is used here to describe an interfering sound source under anechoic conditions.

  9. To obtain an instrumental measure of absolute perceptual speech intelligibility, a perceptual intelligibility test with the applied speech material must be conducted, and the instrumental results must subsequently be fitted to the perceptual recognition scores with a logistic function. Without this fitting, instrumental measures merely quantify trends.

  10. The current section is an excerpt from the diploma thesis of Ch. Luther, Speech intelligibility enhancement based on multivariate models of binaural interaction, Ruhr-University Bochum, 2012. Contact the author to obtain a copy.
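Notes 1 and 8 can be illustrated together with a toy example. The following Python/NumPy sketch is a minimal illustration under explicit assumptions: the interaural transfer function (ITF) is taken as the left-ear HRTF divided by the right-ear HRTF (note 1 does not fix the orientation), the HRTFs are idealized as a pure delay and gain rather than measured responses, and a 16 kHz sampling rate is assumed. All names are hypothetical. The code derives interaural level and phase differences from the ITF and shows that, under coherent interference, a frame-averaged interaural coherence estimate is close to 1, whereas for independent ear signals (a crude stand-in for a diffuse noise field) it stays near 0.

```python
import numpy as np

def interaural_transfer_function(h_left, h_right):
    """ITF for one direction: quotient of the left- and right-ear HRTFs."""
    return h_left / h_right

def itf_cues(itf):
    """Interaural level difference (dB) and phase difference (rad) of an ITF."""
    return 20.0 * np.log10(np.abs(itf)), np.angle(itf)

def interaural_coherence(XL, XR, eps=1e-12):
    """Magnitude of the normalized interaural cross-spectrum per frequency.

    XL, XR: complex STFTs with frames along axis 0; expectations are
    approximated by averaging over frames (Welch-style averaging [42]).
    """
    cross = np.mean(XL * np.conj(XR), axis=0)
    p_left = np.mean(np.abs(XL) ** 2, axis=0)
    p_right = np.mean(np.abs(XR) ** 2, axis=0)
    return np.abs(cross) / np.sqrt(p_left * p_right + eps)

# Toy HRTFs (hypothetical): the right ear receives the sound 0.5 ms later
# and at half the amplitude, as for a source on the listener's left.
f = np.fft.rfftfreq(256, d=1.0 / 16000.0)   # 129 bins, 62.5 Hz spacing
tau = 0.5e-3
h_left = np.ones_like(f, dtype=complex)
h_right = 0.5 * np.exp(-2j * np.pi * f * tau)
ild_db, ipd_rad = itf_cues(interaural_transfer_function(h_left, h_right))

# Coherent interference: both ear spectra are the same source spectrum
# filtered by the two HRTFs (delay assumed short relative to the frame).
rng = np.random.default_rng(0)
S = rng.standard_normal((2000, f.size)) + 1j * rng.standard_normal((2000, f.size))
coh_coherent = interaural_coherence(h_left * S, h_right * S)

# Independent signals at the two ears as a crude diffuse-field stand-in.
N = rng.standard_normal((2000, f.size)) + 1j * rng.standard_normal((2000, f.size))
coh_independent = interaural_coherence(S, N)
```

With these toy HRTFs the ILD is 20 log10(2), about 6.02 dB, in every bin, and the IPD grows as 2 pi f tau until it wraps, e.g. pi/4 at 250 Hz. Reverberation or additional sources would pull the coherence below 1, which is one way the interaural parameters become less distinctive in degraded acoustics.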

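The logistic fitting step described in note 9 can be sketched as follows. This Python/NumPy code is a minimal illustration under assumptions: a two-parameter logistic function is fitted by linear least squares on the logit of the recognition scores. The chapter does not specify the fitting procedure, a nonlinear least-squares or maximum-likelihood fit would be common alternatives, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(measure, score, eps=1e-6):
    """Fit score ~ 1 / (1 + exp(-(a * measure + b))).

    Uses linear least squares on the logit of the scores, a simple
    stand-in for a nonlinear or maximum-likelihood fit.
    score: perceptual recognition scores as proportions in (0, 1).
    """
    p = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    a, b = np.polyfit(np.asarray(measure, dtype=float), np.log(p / (1.0 - p)), 1)
    return a, b

def predict_intelligibility(measure, a, b):
    """Map instrumental values to predicted recognition proportions."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(measure, dtype=float) + b)))
```

Calling fit_logistic with the instrumental values of a set of test conditions and the corresponding measured recognition proportions returns the slope a and offset b; predict_intelligibility then maps new instrumental values onto the perceptual scale. Scores of exactly 0 or 1 are clipped because their logit is infinite.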
References

  1. J. B. Allen, D. A. Berkley, and J. Blauert. Multimicrophone signal-processing technique to remove room reverberation from speech signals. J. Acoust. Soc. Am., 62:912–915, 1977.

  2. ANSI/ASA. American national standard methods for calculation of the speech intelligibility index. Technical report, Am. Nat. Standards of the Acoust. Soc. Am., S3.5-1997 (R2007).

  3. J. Blauert. Epistemological bases of binaural perception—a constructivists’ approach. In Forum Acusticum 2011, Aalborg, Denmark, 2011.

  4. M. M. Boone. Directivity measurements on a highly directive hearing aid: The hearing glasses. In AES 120th Conv., Paris, France, 2006.

  5. M. M. Boone, R. C. G. Opdam, and A. Schlesinger. Downstream speech enhancement in a low directivity binaural hearing aid. In Proc. 20th Intl. Congr. Acoust., ICA 2010, Sydney, Australia, 2010.

  6. A. W. Bronkhorst. The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acta Acust./Acustica, 86:117–128, 2000.

  7. M. Dietz, S. D. Ewert, and V. Hohmann. Auditory model based direction estimation of concurrent speakers from binaural signals. Speech Communication, 53:592–605, 2011.

  8. A. J. Duquesnoy. Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons. J. Acoust. Soc. Am., 74:739–743, 1983.

  9. N. I. Durlach and H. S. Colburn. Handbook of Perception, volume 4, chapter Binaural phenomena. New York: Academic Press, 1978.

  10. C. Faller and J. Merimaa. Source localization in complex listening situations: Selection of binaural cues based on interaural coherence. J. Acoust. Soc. Am., 116:3075–3089, 2004.

  11. W. Gaik and W. Lindemann. Ein digitales Richtungsfilter, basierend auf der Auswertung interauraler Parameter von Kunstkopfsignalen (A digital directional filter based on the evaluation of interaural parameters of dummy-head signals). In Fortschr. Akust., DAGA’86, volume 86, pages 721–724, Oldenburg, Germany, 1986.

  12. E. L. J. George. Factors affecting speech reception in fluctuating noise and reverberation. PhD thesis, Vrije Universiteit, The Netherlands, 2007.

  13. J. Greenberg and P. Zurek. Microphone arrays: Signal processing techniques and applications, chapter Microphone-array hearing aids. Springer-Verlag, 2001.

  14. V. Hamacher, U. Kornagel, T. Lotter, and H. Puder. Advances in digital speech transmission, chapter Binaural signal processing in hearing aids. John Wiley & Sons Ltd., 2008.

  15. S. Harding, J. Barker, and G. J. Brown. Mask estimation for missing data speech recognition based on statistics of binaural interaction. IEEE Trans. Audio, Speech, and Language Processing, 14:58–67, 2005.

  16. C. Houck, J. Joines, and M. Kay. A genetic algorithm for function optimization: a Matlab implementation. North Carolina State University, Raleigh, NC, Technical Report, 1995.

  17. T. Houtgast and H. J. M. Steeneken. Past, present and future of the Speech Transmission Index, chapter The roots of the STI approach, pages 3–11. TNO Human Factors, Soesterberg, The Netherlands, 2002.

  18. L. Jeffress. A place theory of sound localization. J. Comparative and Physiological Psychol., 41:35–39, 1948.

  19. J. M. Kates and K. H. Arehart. A model of speech intelligibility and quality in hearing aids. In IEEE Worksh. Applications of Signal Process. to Audio and Acoustics, WASPAA, pages 53–56, New Paltz, 2005.

  20. J. M. Kates and K. H. Arehart. Coherence and the speech intelligibility index. J. Acoust. Soc. Am., 117:2224–2237, 2005.

  21. B. Kollmeier and R. Koch. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. J. Acoust. Soc. Am., 95:1593–1602, 1994.

  22. D. Kolossa. High-level processing of binaural features. In Forum Acusticum 2011, Aalborg, Denmark, 2011.

  23. J. Li, S. Sakamoto, S. Hongo, M. Akagi, and Y. Suzuki. Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication. Speech Comm., 53:677–689, 2010.

  24. J. Ma, Y. Hu, and P. C. Loizou. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. J. Acoust. Soc. Am., 125:3387–3405, 2009.

  25. N. Madhu. Data-driven mask generation for source separation. In Int. Symp. Auditory and Audiological Res., ISAAR, Marienlyst, Denmark, 2009.

  26. M. I. Mandel, R. J. Weiss, and D. P. W. Ellis. Model-based expectation maximization source separation and localization. IEEE Trans. Audio, Speech, and Language Process., 53:382–394, 2010.

  27. R. Martin. Microphone arrays: Signal processing techniques and applications, chapter Small microphone arrays with postfilters for noise and acoustic echo reduction. Springer-Verlag, 2001.

  28. I. Merks. Binaural application of microphone arrays for improved speech intelligibility in a noisy environment. PhD thesis, Delft University of Technology, The Netherlands, 2000.

  29. J. Nix and V. Hohmann. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J. Acoust. Soc. Am., 119:463–479, 2006.

  30. J. Peissig. Binaurale Hörgerätestrategien in komplexen Störschallsituationen (Binaural strategies for hearing aids in complex noise situations). PhD thesis, Georg-August Universität, Göttingen, Germany, 1992.

  31. B. Rakerd and W. M. Hartmann. Localization of sound in rooms. V. Binaural coherence and human sensitivity to interaural time differences in noise. J. Acoust. Soc. Am., 128:3052–3063, 2010.

  32. K. Reindl, P. Prokein, E. Fischer, Y. Zheng, and W. Kellermann. Combining monaural beamforming and blind source separation for binaural speech enhancement in multi-microphone hearing aids. In ITG-Fachtg. Sprachkommunikation, Nürnberg, Germany, 2010.

  33. N. Roman, D. L. Wang, and G. J. Brown. A classification-based cocktail-party processor. volume 16, Vancouver, Canada, 2003.

  34. A. Sarampalis, S. Kalluri, B. Edwards, and E. Hafter. Objective measures of listening effort: Effects of background noise and noise reduction. J. Speech, Language, and Hearing Res., 52:1230–1240, 2009.

  35. A. Schlesinger. Binaural model-based speech intelligibility enhancement and assessment in hearing aids. PhD thesis, Delft University of Technology, The Netherlands, 2012.

  36. A. Schlesinger. Transient-based speech transmission index for predicting intelligibility in nonlinear speech enhancement processors. In IEEE Intl. Conf. Acoustics, Speech and Signal Process., ICASSP, volume 1, pages 3993–3996, Kyoto, Japan, 2012.

  37. K. U. Simmer, J. Bitzer, and C. Marro. Microphone arrays: Signal processing techniques and applications, chapter Post-filtering techniques. Springer-Verlag, 2001.

  38. C. H. Taal, R. C. Hendriks, and R. Heusdens. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In IEEE Intl. Conf. Acoust., Speech and Signal Process., ICASSP 2010, pages 4214–4217, Dallas, USA, 2010.

  39. The MathWorks, Inc. MATLAB R2012a Documentation, 2012.

  40. TNO. Multilingual database. TNO Human Factors Research Institute, Soesterberg, The Netherlands, 2000.

  41. R. J. Weiss, M. I. Mandel, and D. P. W. Ellis. Combining localization cues and source model constraints for binaural source separation. Speech Communication, 53:606–621, 2011.

  42. P. Welch. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio and Electroacoustics, 15:70–73, 1967.

  43. O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Processing, 52:1830–1847, 2004.

  44. Y. Zheng, K. Reindl, and W. Kellermann. BSS for improved interference estimation for blind speech signal extraction with two microphones. In 3rd IEEE Intl. Worksh. Computational Advances in Multi-Sensor Adaptive Process., CAMSAP, pages 253–256, Aruba, Dutch Antilles, 2009.

Acknowledgments

The authors are indebted to two anonymous reviewers for helpful advice. Part of this work was supported by the Dutch Technology Foundation STW, project DTF.7459.

Author information

Corresponding author

Correspondence to A. Schlesinger.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schlesinger, A., Luther, C. (2013). Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_11

  • DOI: https://doi.org/10.1007/978-3-642-37762-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37761-7

  • Online ISBN: 978-3-642-37762-4

  • eBook Packages: Engineering, Engineering (R0)
