Signal Processing for Audio HCI

Chapter

Abstract

This chapter reviews recent advances in computer audio processing from the viewpoint of improving the human-computer interface. Microphone arrays are described as basic tools for untethered audio acquisition, and principles for the synthesis of realistic virtual audio are outlined. The influence of room acoustics on audio acquisition and production is also considered. The chapter finishes with a review of several relevant signal processing systems, including a fast head-related transfer function (HRTF) measurement system and a complete system for capture, visualization, and reproduction of auditory scenes.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, USA
  2. Department of Computer Science and Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, USA
