The use of speed-up techniques for a speech recognizer system

International Journal of Speech Technology

Abstract

In speech recognition, the speed of an automatic speech recognition application matters as well as its accuracy. A real-time speech recognizer must limit the time spent searching for the best hypothesis, and this limit can itself reduce recognition accuracy. The search method applied, and in particular its efficiency (how quickly it finds the uttered words), therefore plays an important role in the speech recognition task. The literature offers various ideas for speeding up this search process: search heuristics, multi-pass search, and the application of a family of aggregation operators. In this paper we test all these methods in turn and combine them with a set of other novel speed-up ideas. The test results confirm that all of these techniques are valuable: combinations of them made the speech recognition process over 12 times faster than the basic multi-stack decoding algorithm, and almost 11 times faster than the Viterbi beam search method.
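
The trade-off between search time and accuracy described in the abstract can be illustrated with a generic Viterbi beam search, one of the two baselines named there. The following is a minimal sketch, not the authors' implementation: the function name `viterbi_beam`, its parameters, and the toy model layout (plain lists of log-probabilities) are all assumptions made for illustration. A narrower beam expands fewer hypotheses per frame, which saves time but may discard the path that would have turned out best.

```python
import math


def viterbi_beam(log_init, log_trans, log_emit, beam_width):
    """Frame-synchronous Viterbi search with beam pruning (illustrative).

    log_init[s]     : log-probability of starting in state s
    log_trans[p][s] : log-probability of moving from state p to state s
    log_emit[t][s]  : log-probability of frame t under state s
    beam_width      : hypotheses scoring more than this below the best
                      hypothesis of a frame are dropped before expansion
                      (math.inf disables pruning)
    Returns (best_log_score, best_state_path, expanded_hypotheses).
    """
    n_states = len(log_init)
    # active maps each reachable state to its best (score, path) so far.
    active = {
        s: (log_init[s] + log_emit[0][s], [s])
        for s in range(n_states)
        if log_init[s] > -math.inf
    }
    expanded = 0
    for t in range(1, len(log_emit)):
        # Beam pruning: keep only hypotheses close to the current best.
        best = max(score for score, _ in active.values())
        survivors = {s: h for s, h in active.items()
                     if h[0] >= best - beam_width}
        nxt = {}
        for p, (score, path) in survivors.items():
            for s in range(n_states):
                if log_trans[p][s] == -math.inf:
                    continue  # transition not allowed
                expanded += 1
                cand = score + log_trans[p][s] + log_emit[t][s]
                # Viterbi recombination: keep the best path into each state.
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = nxt
    final_state = max(active, key=lambda s: active[s][0])
    score, path = active[final_state]
    return score, path, expanded
```

Calling the function with `beam_width=math.inf` gives the exact Viterbi result, while a small `beam_width` returns a score that is never higher and an `expanded` count that is never larger, which is the speed versus accuracy compromise in miniature.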



Author information


Correspondence to Gábor Gosztolya.

Cite this article

Kocsor, A., Gosztolya, G. The use of speed-up techniques for a speech recognizer system. Int J Speech Technol 9, 95–107 (2006). https://doi.org/10.1007/s10772-008-9005-5


  • DOI: https://doi.org/10.1007/s10772-008-9005-5
