International Journal of Speech Technology, Volume 10, Issue 2–3, pp 63–74

Development of the compact English LVCSR acoustic model for embedded entertainment robot applications

  • Xavier Menéndez-Pidal
  • Ajay Patrikar
  • Lex Olorenshaw
  • Hitoshi Honda

DOI: 10.1007/s10772-008-9012-6

Cite this article as:
Menéndez-Pidal, X., Patrikar, A., Olorenshaw, L. et al. Int J Speech Technol (2007) 10: 63. doi:10.1007/s10772-008-9012-6

Abstract

In this paper we discuss two techniques for reducing the size of the acoustic model while maintaining or improving the accuracy of the recognition engine. The first technique, demiphone modeling, aims to reduce the redundancy in a context-dependent, state-clustered Hidden Markov Model (HMM). Three-state demiphones, optimally designed from the triphone decision tree, are introduced to drastically reduce the phone space of the acoustic model and to improve system accuracy. The second redundancy elimination technique is a more classical approach based on parameter tying: similar variance vectors in each HMM cluster are tied together to reduce the number of parameters. The closeness between variance vectors is measured using a Vector Quantizer (VQ) so that the information carried by the variance parameters is maintained. The paper also reports speech recognition improvements obtained by assigning a variable number of Gaussians per cluster and by using gender-based HMMs. The main motivation behind these techniques is to improve the acoustic model while lowering its memory usage, which may help an embedded Large Vocabulary Continuous Speech Recognition (LVCSR) application reduce memory and improve accuracy.
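
The variance-tying idea can be illustrated with a minimal sketch. The abstract does not specify the quantizer design, the distance measure, or the codebook size, so everything below is an assumption made for illustration: a plain Lloyd (k-means) quantizer, Euclidean distance on log-variances, and the hypothetical names `sq_dist` and `tie_variances`. It should not be read as the authors' implementation.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tie_variances(variances, codebook_size, iterations=20):
    """Tie similar variance vectors using a small VQ codebook (Lloyd's algorithm).

    Returns (codebook, assignment), where assignment[i] is the index of the
    shared codebook entry that replaces variances[i].
    """
    if not 0 < codebook_size <= len(variances):
        raise ValueError("codebook_size must be between 1 and len(variances)")
    # Assumption of this sketch: compare variances in the log domain so that
    # small and large variances are treated on a relative scale.
    logs = [[math.log(v) for v in vec] for vec in variances]
    # Deterministic initialisation: evenly spaced input vectors.
    step = len(logs) / codebook_size
    codebook = [list(logs[int(k * step)]) for k in range(codebook_size)]
    assignment = [0] * len(logs)
    for _ in range(iterations):
        # Assignment step: nearest codeword for every variance vector.
        assignment = [
            min(range(codebook_size), key=lambda k: sq_dist(vec, codebook[k]))
            for vec in logs
        ]
        # Update step: each codeword becomes the mean of its members.
        for k in range(codebook_size):
            members = [logs[i] for i, a in enumerate(assignment) if a == k]
            if members:
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    # Map the codewords back from the log domain to variances.
    return [[math.exp(x) for x in cw] for cw in codebook], assignment


# Toy data: four 2-dimensional variance vectors tied down to two shared ones.
toy = [[1.0, 2.0], [1.1, 1.9], [10.0, 20.0], [9.0, 22.0]]
codebook, assignment = tie_variances(toy, codebook_size=2)
```

In this sketch, storage drops from one variance vector per Gaussian to `codebook_size` shared vectors plus one small index per Gaussian. How much accuracy survives depends on the codebook size, which the abstract does not report.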

Keywords

  • Large Vocabulary Continuous Speech Recognition
  • Acoustic modeling
  • Hidden Markov Model
  • Embedded speech recognition systems
  • Redundancy elimination
  • HMM parameters optimization
  • HMM memory reduction
  • Triphones
  • Demiphones

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Xavier Menéndez-Pidal (1)
  • Ajay Patrikar (2)
  • Lex Olorenshaw (2)
  • Hitoshi Honda (3)

  1. R&D Laboratory, SONY Computer Entertainment of America, Foster City, USA
  2. Former Spoken Language Technology Laboratory, SONY Electronics, San José, USA
  3. Information Technologies Laboratories, SONY Corporation, Tokyo, Japan
