An Auditory Scene Analysis Approach to Monaural Speech Segregation

doi:10.1007/3-540-33213-8_12

Part of the book series: Signals and Communication Technology (SCT)

Editors and Affiliations

Eberhard Hänsler: Technische Universität Darmstadt, Institute of Telecommunications, Merckstrasse 25, D-64283 Darmstadt, Germany
Gerhard Schmidt: Harman/Becker Automotive Systems, Acoustic Signal Processing, D-89077 Ulm, Germany

About this chapter

Cite this chapter

(2006). An Auditory Scene Analysis Approach to Monaural Speech Segregation. In: Hänsler, E., Schmidt, G. (eds) Topics in Acoustic Echo and Noise Control. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33213-8_12

DOI: https://doi.org/10.1007/3-540-33213-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33212-1
Online ISBN: 978-3-540-33213-8
eBook Packages: Engineering, Engineering (R0)
