Signal Processing for Audio HCI

Zotkin, Dmitry N.; Duraiswami, Ramani

doi:10.1007/978-1-4419-6345-1_10

Abstract

This chapter reviews recent advances in computer audio processing from the viewpoint of improving the human-computer interface. Microphone arrays are described as basic tools for untethered audio acquisition, and principles for the synthesis of realistic virtual audio are outlined. The influence of room acoustics on audio acquisition and production is also considered. The chapter finishes with a review of several relevant signal processing systems, including a fast head-related transfer function (HRTF) measurement system and a complete system for capture, visualization, and reproduction of auditory scenes.

Author information

Authors and Affiliations

Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD, 20742, USA
Dmitry N. Zotkin
Department of Computer Science and Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD, 20742, USA
Ramani Duraiswami

Corresponding author

Correspondence to Dmitry N. Zotkin.

Editor information

Editors and Affiliations

Dept. of Electrical and, University of Maryland, A. V. Williams Bldg. 2311, College Park, 20742, Maryland, USA
Shuvra S. Bhattacharyya
Leiden Inst. Advanced Computer Science, Leiden Embedded Research Center, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, Netherlands
Ed F. Deprettere
RWTH Aachen University, Templergraben 55, Aachen, 52056, Germany
Rainer Leupers
Department of Computer Systems, Tampere University of Technology, Korkeakoulunkatu 1, Tampere, 33720, Finland
Jarmo Takala

About this chapter

Cite this chapter

Zotkin, D.N., Duraiswami, R. (2010). Signal Processing for Audio HCI. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6345-1_10

DOI: https://doi.org/10.1007/978-1-4419-6345-1_10
Published: 16 July 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6344-4
Online ISBN: 978-1-4419-6345-1
eBook Packages: Engineering, Engineering (R0)
