An Auditory Scene Analysis Approach to Monaural Speech Segregation

  • Chapter
Topics in Acoustic Echo and Noise Control

Part of the book series: Signals and Communication Technology (SCT)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

(2006). An Auditory Scene Analysis Approach to Monaural Speech Segregation. In: Hänsler, E., Schmidt, G. (eds) Topics in Acoustic Echo and Noise Control. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33213-8_12

  • DOI: https://doi.org/10.1007/3-540-33213-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33212-1

  • Online ISBN: 978-3-540-33213-8

  • eBook Packages: Engineering (R0)