Abstract
This chapter reviews recent advances in computer audio processing from the viewpoint of improving the human-computer interface. Microphone arrays are described as basic tools for untethered audio acquisition, and principles for the synthesis of realistic virtual audio are outlined. The influence of room acoustics on audio acquisition and production is also considered. The chapter finishes with a review of several relevant signal processing systems, including a fast head-related transfer function (HRTF) measurement system and a complete system for capture, visualization, and reproduction of auditory scenes.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zotkin, D.N., Duraiswami, R. (2010). Signal Processing for Audio HCI. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6345-1_10
DOI: https://doi.org/10.1007/978-1-4419-6345-1_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6344-4
Online ISBN: 978-1-4419-6345-1
eBook Packages: Engineering, Engineering (R0)