Speaker Identification and Speech Recognition Using Phased Arrays

  • Roger Xu
  • Gang Mei
  • ZuBing Ren
  • Chiman Kwan
  • Julien Aube
  • Cedrick Rochet
  • Vincent Stanford
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3864)

Abstract

We summarize our research results on an innovative approach to making smart meeting rooms accessible to hands-free users. Specifically, we developed an autodirective system to acquire speech in a noisy room using a microphone array, and to identify the speech from a privileged speaker among others in real time. We successfully established that a commercial speaker-dependent speech recognition product could recognize beamformed speech acquired using our autodirective algorithm. We used the NIST Smart Flow System and the Mk-III microphone array developed by the National Institute of Standards and Technology to conduct our experiments.

Keywords

Speech Recognition, Multimodal Interface, Speaker Verification, Speaker Identification, Microphone Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Roger Xu (1)
  • Gang Mei (1)
  • ZuBing Ren (1)
  • Chiman Kwan (1)
  • Julien Aube (1)
  • Cedrick Rochet (1)
  • Vincent Stanford (2)

  1. Intelligent Automation, Inc., Rockville, USA
  2. The National Institute of Standards and Technology, Gaithersburg, USA
