References
A.M.A. Ali, J. Van der Spiegel: Acoustic-phonetic features for the automatic classification of stop consonants, IEEE Trans. Speech Audio Process., 9, 833-841, 2001.
J.P. Barker, M.P. Cooke, D.P.W. Ellis: Decoding speech in the presence of other sources, Speech Comm., 45, 5-25, 2005.
J. Bird, C.J. Darwin: Effects of a difference in fundamental frequency in separating two sentences, in A.R. Palmer, A. Rees, A.Q. Summerfield, R. Meddis (eds.), Psychophysical and Physiological Advances in Hearing, London, UK: Whurr, 263-269, 1998.
P. Boersma, D. Weenink: Praat: Doing Phonetics by Computer, Version 4.2.31, http://www.fon.hum.uva.nl/praat/, 2004.
S.F. Boll: Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process., 27, 113-120, 1979.
A.S. Bregman: Auditory Scene Analysis, Cambridge, MA, USA: MIT Press, 1990.
G.J. Brown, M.P. Cooke: Computational auditory scene analysis, Comput. Speech and Language, 8, 297-336, 1994.
G.J. Brown, D.L. Wang: Separation of speech by computational auditory scene analysis, in J. Benesty, S. Makino, J. Chen (eds.), Speech Enhancement, Berlin, Germany: Springer, 371-402, 2005.
D.S. Brungart, P.S. Chang, B.D. Simpson, D.L. Wang: Isolating the energetic component of speech-on-speech masking with an ideal binary mask, Submitted for journal publication, 2005.
J. Canny: A computational approach to edge detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 8, 679-698, 1986.
R.P. Carlyon, T.M. Shackleton: Comparing the fundamental frequencies of resolved and unresolved harmonics: evidence for two pitch mechanisms? J. Acoust. Soc. Am., 95, 3541-3554, 1994.
P.S. Chang: Exploration of Behavioral, Physiological, and Computational Approaches to Auditory Scene Analysis, M.S. Thesis, The Ohio State University Dept. Comput. Sci. & Eng., 2004 (available at http://www.cse.ohio-state.edu/pnl/theses).
M.P. Cooke: Modelling Auditory Processing and Organisation, Cambridge, UK: Cambridge University Press, 1993.
M.P. Cooke, P. Green, L. Josifovski, A. Vizinho: Robust automatic speech recognition with missing and unreliable acoustic data, Speech Comm., 34, 267-285, 2001.
L.A. Drake: Sound Source Separation via Computational Auditory Scene Analysis (CASA) - Enhanced Beamforming, Ph.D. Dissertation, Northwestern University Dept. Elec. Eng., 2001.
D.P.W. Ellis: Prediction-driven Computational Auditory Scene Analysis, Ph.D. Dissertation, MIT Dept. Elec. Eng. & Comput. Sci., 1996.
Y. Ephraim, H.L. van Trees: A signal subspace approach for speech enhancement, IEEE Trans. Speech Audio Process., 3, 251-266, 1995.
J. Garofolo, L. Lamel, et al.: DARPA TIMIT acoustic-phonetic continuous speech corpus, NISTIR 4930, 1993.
H. Helmholtz: On the Sensations of Tone, 2nd English ed., New York, NY, USA: Dover Publishers, 1863.
J. Holdsworth, I. Nimmo-Smith, R.D. Patterson, P. Rice: Implementing a gammatone filter bank, MRC Applied Psych. Unit, 1988.
G. Hu, D.L. Wang: Speech segregation based on pitch tracking and amplitude modulation, Proc. WASPAA ’01, 79-82, New Paltz, New York, USA, 2001.
G. Hu, D.L. Wang: Separation of stop consonants, Proc. ICASSP ’03, 2, 749-752, 2003.
G. Hu, D.L. Wang: Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. Neural Net., 15, 1135-1150, 2004.
G. Hu, D.L. Wang: Auditory segmentation based on event detection, Proc. ISCA Tutorial and Research Workshop on Stat. & Percept. Audio Process., 2004.
G. Hu, D.L. Wang: Separation of fricatives and affricates, Proc. ICASSP ’05, 1, 1101-1104, Philadelphia, PA, USA, 2005.
A. Hyvärinen, J. Karhunen, E. Oja: Independent Component Analysis, New York, NY, USA: Wiley, 2001.
ISO: Normal Equal-loudness Level Contours for Pure Tones under Free-field Listening Conditions (ISO 226), International Standards Organization.
J. Jensen, J.H.L. Hansen: Speech enhancement using a constrained iterative sinusoidal model, IEEE Trans. Speech Audio Process., 9, 731-740, 2001.
H. Krim, M. Viberg: Two decades of array signal processing research: The parametric approach, IEEE Signal Process. Mag., 13, 67-94, 1996.
P. Ladefoged: Vowels and Consonants, Oxford, UK: Blackwell, 2001.
J.C.R. Licklider: A duplex theory of pitch perception, Experientia, 7, 128-134, 1951.
D. Marr: Vision, New York, NY, USA: Freeman, 1982.
R. Meddis: Simulation of auditory-neural transduction: Further studies, J. Acoust. Soc. Am., 83, 1056-1063, 1988.
R. Meddis, M. Hewitt: Modelling the identification of concurrent vowels with different fundamental frequencies, J. Acoust. Soc. Am., 91, 233-245, 1992.
B.C.J. Moore: An Introduction to the Psychology of Hearing, 5th ed., San Diego, CA, USA: Academic Press, 2003.
R.D. Patterson, I. Nimmo-Smith, J. Holdsworth, P. Rice: An efficient auditory filterbank based on the gammatone function, MRC Applied Psych. Unit. 2341, 1988.
J.O. Pickles: An Introduction to the Physiology of Hearing, 2nd ed., London, UK: Academic Press, 1988.
R. Plomp: The Ear as a Frequency Analyzer, J. Acoust. Soc. Am., 36, 1628-1636, 1964.
R. Plomp: The Intelligent Ear, Mahwah, NJ, USA: Lawrence Erlbaum Associates, 2002.
R. Plomp, A.M. Mimpen: The ear as a frequency analyzer II, J. Acoust. Soc. Am., 43, 764-767, 1968.
N. Roman, D.L. Wang: A pitch-based model for separation of reverberant speech, Proc. INTERSPEECH ’05, 2109-2112, Lisbon, Portugal, 2005.
N. Roman, D.L. Wang, G.J. Brown: Speech segregation based on sound localization, J. Acoust. Soc. Am., 114, 2236-2252, 2003.
B.H. Romeny, L. Florack, J. Koenderink, M. Viergever (eds.): Scale-space Theory in Computer Vision, Berlin, Germany: Springer, 1997.
D.F. Rosenthal, H.G. Okuno (eds.): Computational Auditory Scene Analysis, Mahwah, NJ, USA: Lawrence Erlbaum Associates, 1998.
S.T. Roweis: One microphone source separation, Proceedings of the Annual Neural Information Processing Systems (NIPS 2000) Conference, 2001.
H. Sameti, H. Sheikhzadeh, L. Deng, R.L. Brennan: HMM-based strategies for enhancement of speech signals embedded in nonstationary noise, IEEE Trans. Speech Audio Process., 6, 445-455, 1998.
Y. Shao, D.L. Wang: Model-based sequential organization in cochannel speech, IEEE Trans. Speech Audio Process., in press, 2005.
M. Slaney, R.F. Lyon: A perceptual pitch detector, Proc. ICASSP ’90, 1, 357-360, Albuquerque, NM, USA, 1990.
S. Srinivasan, D.L. Wang: A schema-based model for phonemic restoration, Speech Comm., 45, 63-87, 2005.
K.N. Stevens: Acoustic Phonetics, Cambridge, MA, USA: MIT Press, 1998.
D.L. Wang: On ideal binary mask as the computational goal of auditory scene analysis, in P. Divenyi (ed.), Speech Separation by Humans and Machines, Norwell, MA, USA: Kluwer, 181-197, 2005.
D.L. Wang, G.J. Brown: Separation of speech from interfering sounds based on oscillatory correlation, IEEE Trans. Neural Net., 10, 684-697, 1999.
M. Weintraub: A Theory and Computational Model of Auditory Monaural Sound Separation, Ph.D. Dissertation, Stanford University Dept. Elec. Eng., 1985.
M. Wu, D.L. Wang, G.J. Brown: A multipitch tracking algorithm for noisy speech, IEEE Trans. Speech Audio Process., 11, 229-241, 2003.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
(2006). An Auditory Scene Analysis Approach to Monaural Speech Segregation. In: Hänsler, E., Schmidt, G. (eds) Topics in Acoustic Echo and Noise Control. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33213-8_12
DOI: https://doi.org/10.1007/3-540-33213-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33212-1
Online ISBN: 978-3-540-33213-8
eBook Packages: Engineering, Engineering (R0)