The use of speed-up techniques for a speech recognizer system

Kocsor, András; Gosztolya, Gábor

doi:10.1007/s10772-008-9005-5

Published: 02 December 2008

Volume 9, pages 95–107, (2006)

International Journal of Speech Technology

Abstract

For an automatic speech recognition application, speed matters as well as accuracy. A real-time recognizer limits the time that can be spent searching for the best hypothesis, and this limit can itself reduce recognition accuracy. The search method therefore plays an important role in the recognition task, and so does its efficiency, i.e. how quickly it finds the uttered words. The literature offers various ideas for speeding up this search: search heuristics, multi-pass search, and a family of aggregation operators. In this paper we test each of these methods in turn and combine them with a set of other novel speed-up ideas. The test results confirm that all of these techniques are valuable: combinations of them made the speech recognition process over 12 times faster than the basic multi-stack decoding algorithm, and almost 11 times faster than the Viterbi beam search method.

Author information

Authors and Affiliations

Research Group on Artificial Intelligence, Hungarian Academy of Sciences and University of Szeged, Aradi vértanúk tere 1, 6720, Szeged, Hungary
András Kocsor & Gábor Gosztolya

Corresponding author

Correspondence to Gábor Gosztolya.

About this article

Cite this article

Kocsor, A., Gosztolya, G. The use of speed-up techniques for a speech recognizer system. Int J Speech Technol 9, 95–107 (2006). https://doi.org/10.1007/s10772-008-9005-5

Received: 14 December 2005
Accepted: 09 October 2008
Published: 02 December 2008
Issue Date: December 2006
DOI: https://doi.org/10.1007/s10772-008-9005-5
