Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition

Darjaa, Sakhia; Cerňak, Miloš; Beňuš, Štefan; Rusko, Milan; Sabo, Róbert; Trnka, Marián

doi:10.1007/978-3-642-23538-2_34

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Abstract

This paper presents rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition for Slovak. We propose a novel technique of knowledge-based triphone tying that allows the synthesis of unseen triphones. The proposed technique is compared with decision tree-based state tying, and it is shown that for bigger acoustic models, with 3000 or more states, a triphone-mapped HMM system achieves better performance than a tree-based state tying system on a large vocabulary continuous speech transcription task. Experiments performed on 350 hours of a Slovak audio database of mixed read and spontaneous speech are presented. The relative decrease in word error rate was 4.23% for models with 7500 states and 4.13% for models with 11500 states.
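To make the idea of triphone mapping more concrete, here is a minimal Python sketch. It assumes HTK-style triphone names (l-c+r) and a hypothetical back-off scheme: an unseen triphone is first matched to a trained triphone whose left and right contexts belong to the same broad phonetic class, and otherwise falls back to the context-independent centre phone. The phone classes in PHONE_CLASS, the function names and the back-off order are illustrative assumptions. The abstract does not specify the authors' actual mapping rules for Slovak.

```python
from itertools import product

# Hypothetical broad phonetic classes. The paper's real rule set is not given
# in the abstract, so these groupings are illustrative only.
PHONE_CLASS = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar",
    "a": "open_vowel", "e": "mid_vowel", "o": "mid_vowel",
    "i": "close_vowel", "u": "close_vowel",
    "sil": "silence",
}


def split_triphone(name):
    """Split an HTK-style 'l-c+r' triphone name into (left, centre, right)."""
    left, rest = name.split("-", 1)
    centre, right = rest.split("+", 1)
    return left, centre, right


def same_class(phone):
    """Return phones sharing the broad class of `phone`, with `phone` itself first."""
    cls = PHONE_CLASS.get(phone)
    others = sorted(p for p, c in PHONE_CLASS.items() if c == cls and p != phone)
    return [phone] + others


def map_triphone(name, seen):
    """Map a (possibly unseen) triphone to the name of a trained model.

    Assumed back-off order (a sketch, not the authors' method):
    1. the triphone itself, if it was seen in training;
    2. a seen triphone whose contexts are swapped for same-class phones;
    3. the centre phone as a context-independent monophone.
    """
    if name in seen:
        return name
    left, centre, right = split_triphone(name)
    # Try every combination of same-class left and right contexts.
    for l, r in product(same_class(left), same_class(right)):
        candidate = f"{l}-{centre}+{r}"
        if candidate in seen:
            return candidate
    return centre
```

With seen = {"a-t+a", "o-t+a", "a-d+o"}, map_triphone("e-t+a", seen) returns "o-t+a" because e and o share the assumed mid-vowel class. map_triphone("i-k+u", seen) finds no same-class match and returns the monophone "k". This sketch backs off over whole triphone contexts. Decision tree-based state tying, by comparison, clusters individual HMM states using likelihood-driven phonetic questions.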

Author information

Authors and Affiliations

Institute of Informatics, Slovak Academy of Sciences, Dúbravská c. 9, 845 07, Bratislava, Slovakia
Sakhia Darjaa, Miloš Cerňak, Štefan Beňuš, Milan Rusko, Róbert Sabo & Marián Trnka
Department of Eng. and Am. Studies, Constantine the Philosopher University, Nitra, Slovakia
Štefan Beňuš

Editor information

Editors and Affiliations

Department of Computer Sciences, University of West Bohemia, Univerzitní 22, 306 14, Pilsen, Czech Republic
Ivan Habernal
Faculty of Applied Sciences, Dept. of Computer Science and Engineering, University of West Bohemia, Univerzitní 8, 306 14, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Darjaa, S., Cerňak, M., Beňuš, Š., Rusko, M., Sabo, R., Trnka, M. (2011). Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_34

DOI: https://doi.org/10.1007/978-3-642-23538-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)