Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Abstract

This paper presents rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition in Slovak. We propose a novel technique of knowledge-based triphone tying that allows the synthesis of unseen triphones. The proposed technique is compared with decision tree-based state tying, and we show that for larger acoustic models, with 3000 states or more, a triphone-mapped HMM system outperforms a tree-based state-tying system on a large-vocabulary continuous speech transcription task. The experiments use 350 hours of a Slovak audio database of mixed read and spontaneous speech. The relative reduction in word error rate was 4.23% for models with 7500 states and 4.13% for models with 11500 states.
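To make the idea of mapping an unseen triphone onto a trained one more concrete, the following minimal Python sketch may help. It is an illustration only: the abstract does not give the paper's actual rules, so the broad phonetic classes in `PHONE_CLASS`, the HTK-style `l-c+r` naming, and the helper names `split_triphone`, `same_class` and `map_triphone` are all assumptions made here, not the authors' method.

```python
# Hypothetical broad phonetic classes (an assumption for illustration;
# NOT the rule set used in the paper).
PHONE_CLASS = {
    "p": "stop", "t": "stop", "k": "stop",
    "m": "nasal", "n": "nasal",
    "a": "vowel", "e": "vowel", "o": "vowel",
}


def split_triphone(name):
    """Split an HTK-style triphone name 'l-c+r' into (left, centre, right)."""
    left, rest = name.split("-", 1)
    centre, right = rest.split("+", 1)
    return left, centre, right


def same_class(a, b):
    """True if both phones are known and belong to the same broad class."""
    return a in PHONE_CLASS and b in PHONE_CLASS and PHONE_CLASS[a] == PHONE_CLASS[b]


def map_triphone(name, seen):
    """Return the model that `name` would share parameters with.

    `seen` is the set of triphones observed in training. An unseen triphone
    is mapped to a seen one with the same centre phone whose left and right
    contexts fall in the same broad classes; if none exists, the sketch
    backs off to the centre monophone.
    """
    if name in seen:
        return name
    left, centre, right = split_triphone(name)
    for candidate in sorted(seen):  # sorted for a deterministic choice
        c_left, c_centre, c_right = split_triphone(candidate)
        if (c_centre == centre
                and same_class(c_left, left)
                and same_class(c_right, right)):
            return candidate
    return centre


seen_triphones = {"p-a+n", "t-e+m"}
mapped = map_triphone("k-a+m", seen_triphones)    # unseen, maps to "p-a+n"
backoff = map_triphone("a-k+o", seen_triphones)   # no match, falls back to "k"
```

In this sketch the mapping is a fixed lookup driven by phonetic knowledge, in contrast to decision tree-based state tying, where the grouping is learned from data. How the paper's rules are actually defined and ordered is not stated in the abstract.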

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Darjaa, S., Cerňak, M., Beňuš, Š., Rusko, M., Sabo, R., Trnka, M. (2011). Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_34

  • DOI: https://doi.org/10.1007/978-3-642-23538-2_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23537-5

  • Online ISBN: 978-3-642-23538-2

  • eBook Packages: Computer Science, Computer Science (R0)
