
Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Chapter in Real World Speech Processing

Abstract

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of the head-mounted or hand-held microphones required by conventional speech recognizers. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between the speech signals collected by different microphones. The estimated delays are used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM estimates the time delay effectively and that increasing the speech sampling rate improves time-delay estimation. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
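The abstract does not define TDCM or give the beamformer equations. The following is a minimal sketch. It assumes TDCM behaves like a normalized time-domain cross-correlation searched over integer lags, with each channel then advanced by its estimated delay and averaged (delay-and-sum). The function names `coherence_delay` and `delay_and_sum`, the 4-microphone setup, and the synthetic delays are hypothetical illustrations, not the chapter's implementation.

```python
import numpy as np


def coherence_delay(ref, sig, max_lag):
    """Integer lag (samples) maximizing the normalized cross-correlation
    between ref and sig. A positive lag means sig lags ref.

    Assumption: the chapter's TDCM is approximated here by the normalized
    time-domain cross-correlation coefficient; its exact form may differ.
    """
    n = len(ref)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping segments under the hypothesis sig[t] = ref[t - lag].
        if lag >= 0:
            a, b = ref[: n - lag], sig[lag:n]
        else:
            a, b = ref[-lag:], sig[: n + lag]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        score = np.dot(a, b) / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Advance each channel by its delay relative to channel 0, then average."""
    n = min(len(x) for x in channels)
    out = np.zeros(n)
    for x, d in zip(channels, delays):
        aligned = np.zeros(n)
        if d >= 0:
            aligned[: n - d] = x[d:n]   # channel lags: shift it earlier
        else:
            aligned[-d:] = x[: n + d]   # channel leads: shift it later
        out += aligned
    return out / len(channels)


# Synthetic demo (hypothetical): white-noise "speech" reaching a 4-mic
# linear array with integer delays, plus independent sensor noise.
rng = np.random.default_rng(0)
fs = 8000
src = rng.standard_normal(fs)
true_delays = [0, 3, 6, 9]
mics = [np.roll(src, d) + 0.5 * rng.standard_normal(fs) for d in true_delays]

est = [coherence_delay(mics[0], m, max_lag=20) for m in mics]
enhanced = delay_and_sum(mics, est)

# Mean squared error against the clean source, skipping the tail samples
# that some channels cannot cover after alignment.
k = max(est)
err_single = np.mean((mics[0][:-k] - src[:-k]) ** 2)
err_beam = np.mean((enhanced[:-k] - src[:-k]) ** 2)
print(est, err_single, err_beam)
```

Averaging four aligned channels with independent noise cuts the noise power by roughly a factor of four in this toy setup. Real car noise is partly correlated across microphones, so gains there would be smaller. The sketch also shows why the sampling rate matters for integer-lag estimation. At 8 kHz one sample spans 125 µs, or about 4.3 cm of acoustic path at 343 m/s. At 16 kHz the resolution halves to about 2.1 cm.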
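The abstract does not say which HMM adaptation technique is used. One standard option is maximum a posteriori (MAP) re-estimation of the Gaussian means in the style of Gauvain and Lee. It interpolates between the original model and the adaptation data as mu_hat_k = (tau * mu_k + sum_t gamma_t(k) x_t) / (tau + sum_t gamma_t(k)). The sketch below assumes the occupation probabilities gamma_t(k) are already available. `map_adapt_means` and the prior weight `tau` are hypothetical names, and this is not necessarily the chapter's method.

```python
import numpy as np


def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """MAP re-estimation of Gaussian mean vectors.

    prior_means: (K, D) means of the original (clean-trained) HMM Gaussians
    frames:      (T, D) feature vectors from the enhanced adaptation speech
    posteriors:  (T, K) occupation probabilities gamma_t(k); assumed given,
                 e.g. from a forward-backward pass or a Viterbi alignment
    tau:         prior weight (hypothetical default; tuned in practice)
    """
    occ = posteriors.sum(axis=0)          # (K,) soft frame counts
    weighted = posteriors.T @ frames      # (K, D) sum_t gamma_t(k) x_t
    return (tau * prior_means + weighted) / (tau + occ)[:, None]


# Toy example: ten frames at [2, 2], all assigned to component 0.
prior = np.array([[0.0, 0.0], [5.0, 5.0]])
x = np.full((10, 2), 2.0)
gamma = np.zeros((10, 2))
gamma[:, 0] = 1.0
adapted = map_adapt_means(prior, x, gamma, tau=10.0)
# component 0 moves to [1, 1] (halfway to the data); component 1 stays [5, 5]
print(adapted)
```

With little adaptation data (small occupation counts) the means stay near the original model. As the counts grow relative to tau, the means approach the sample mean of the enhanced speech. That property suits adapting toward test conditions from a small amount of data.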

Copyright information

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Chien, JT., Lai, JR. (2004). Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition. In: Wang, JF., Furui, S., Juang, BH. (eds) Real World Speech Processing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6363-8_6

  • DOI: https://doi.org/10.1007/978-1-4757-6363-8_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5439-8

  • Online ISBN: 978-1-4757-6363-8

  • eBook Packages: Springer Book Archive
