Abstract
People with hearing impairments often find it difficult to follow a conversation in a multi-speaker environment. Modern hearing aids can suppress background noise, but they cannot selectively enhance a single conversation without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering such help. Several challenges remain, including the lack of access to the clean sound sources in the environment against which the neural signals can be compared. We propose a novel framework that combines single-channel speech separation algorithms with AAD. We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers heard by a listener, along with the listener's neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker's voice to assist the listener. Using invasive electrophysiology recordings, our system is able to decode the attention of a subject and detect switches in attention using only the mixed audio. We also identified the regions of the auditory cortex that contribute to AAD. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Our novel framework for AAD bridges the gap between the most recent advances in speech processing technologies and speech prosthesis research, and moves us closer to the development of cognitively controlled hearing aids.
Research supported by NIH, NIDCD, DC014279.
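To make the four-stage pipeline concrete, below is a minimal sketch in Python of the attention-decoding (stage 3) and amplification (stage 4) steps. It assumes the separation front end has already produced per-speaker waveforms, that a linear stimulus-reconstruction decoder has been pretrained on the listener's neural data, and that the neural recording and the speech envelopes have been resampled to a common rate. The function names (speech_envelope, decode_attention, amplify_attended) and the simple RMS envelope are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def speech_envelope(audio, frame_len=128):
    """Crude amplitude envelope: RMS over non-overlapping frames.
    (A simplification; richer auditory representations are common in AAD.)"""
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def decode_attention(neural, separated_sources, decoder):
    """Stimulus-reconstruction AAD: reconstruct the attended speech
    envelope from neural data with a linear decoder, then pick the
    separated source whose envelope correlates best with it.

    neural            : (time, channels) neural recording
    separated_sources : list of 1-D audio arrays from the separator
    decoder           : (channels,) pretrained linear decoder weights
                        (real decoders typically also include time lags)
    """
    reconstruction = neural @ decoder  # (time,)
    scores = []
    for src in separated_sources:
        env = speech_envelope(src)
        m = min(len(env), len(reconstruction))  # align lengths
        scores.append(np.corrcoef(reconstruction[:m], env[:m])[0, 1])
    return int(np.argmax(scores)), scores

def amplify_attended(separated_sources, attended_idx, gain_db=12.0):
    """Remix: boost the attended speaker, keep the others at their level."""
    gain = 10 ** (gain_db / 20)
    mix = sum(src * (gain if i == attended_idx else 1.0)
              for i, src in enumerate(separated_sources))
    return mix / np.max(np.abs(mix))  # normalize to avoid clipping
```

Switches in attention can then be tracked by repeating the correlation comparison over a sliding window of the neural data and flagging a sustained change in the winning source; the exact windowing used in the chapter is not specified here.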
Cite this chapter
O’Sullivan, J. et al. (2020). Neural Decoding of Attentional Selection in Multi-speaker Environments Without Access to Clean Sources. In: Guger, C., Allison, B.Z., Miller, K. (eds) Brain–Computer Interface Research. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-49583-1_6
DOI: https://doi.org/10.1007/978-3-030-49583-1_6