Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Chien, Jen-Tzung; Lai, Jain-Ray

doi:10.1007/978-1-4757-6363-8_6

Abstract

This paper presents an algorithm that combines a microphone array with model adaptation for hands-free speech recognition. Our aim is to remove the inconvenience of the head-mounted or hand-held microphone required by conventional speech recognizers. To improve speech quality in the presence of car noise, a linear microphone array serves as a robust acquisition system. A time-domain coherence measure (TDCM) is used to reliably estimate the time delay between the speech signals collected by different microphones. The estimated delay is then used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models (HMMs) to bring them closer to the acoustic conditions of the enhanced test speech, which makes recognition more robust. In acquisition and recognition experiments on connected Chinese digits, we found that TDCM estimates the time delay effectively and that increasing the speech sampling rate helps in determining the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
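
The acquisition stage described above (estimate inter-microphone delays, then align and average the channels) can be illustrated with a minimal sketch. The abstract does not define the TDCM, so this sketch substitutes a plain cross-correlation peak search for the delay estimator. The function names `estimate_delay` and `delay_and_sum`, the integer-sample delays, and the circular shifting via `np.roll` are all assumptions made for illustration, not the chapter's method.

```python
import numpy as np

def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) by which `sig` trails `ref`.

    Assumption: a plain cross-correlation peak search stands in for the
    chapter's TDCM, whose definition is not given in the abstract.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    # np.roll shifts circularly; acceptable when lags are short
    # relative to the signal length.
    scores = [np.dot(ref, np.roll(sig, -lag)) for lag in lags]
    return int(lags[int(np.argmax(scores))])

def delay_and_sum(channels, delays):
    """Advance each channel by its estimated delay, then average."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

# Synthetic four-microphone example: one source, independent noise per channel.
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
true_delays = [0, 3, 6, 9]  # hypothetical delays in samples
channels = [np.roll(source, d) + 0.5 * rng.standard_normal(source.size)
            for d in true_delays]

# Estimate each channel's delay relative to the first microphone.
delays = [estimate_delay(channels[0], ch, max_lag=20) for ch in channels]
enhanced = delay_and_sum(channels, delays)
```

Because the search runs over integer sample lags, one lag step corresponds to 1/fs seconds at sampling rate fs. A higher sampling rate therefore gives a finer delay grid, which is consistent with the abstract's observation that raising the sampling rate helps delay estimation.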

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 70101, Taiwan, Republic of China
Jen-Tzung Chien & Jain-Ray Lai

Editor information

Editors and Affiliations

National Cheng Kung University, Taiwan, R.O.C.
Jhing-Fa Wang
Tokyo Institute of Technology, Tokyo, Japan
Sadaoki Furui
Georgia Institute of Technology, Atlanta, Georgia, USA
Biing-Hwang Juang

About this chapter

Cite this chapter

Chien, JT., Lai, JR. (2004). Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition. In: Wang, JF., Furui, S., Juang, BH. (eds) Real World Speech Processing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6363-8_6

DOI: https://doi.org/10.1007/978-1-4757-6363-8_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5439-8
Online ISBN: 978-1-4757-6363-8
eBook Packages: Springer Book Archive
