Algorithm Optimizations: Low Computational Complexity

  • Miroslav Novak
Part of the Advances in Pattern Recognition book series (ACVPR)

Advances in ASR are driven by both scientific achievements in the field and the availability of more powerful hardware. While very powerful CPUs allow us to use ever more complex algorithms in server-based large vocabulary ASR systems (e.g., in telephony applications), the capability of embedded platforms will always lag behind. Nevertheless, as the popularity of ASR applications grows, we can expect an increasing demand for functionality on embedded platforms as well. For example, replacing simple command-and-control grammar-based applications with natural language understanding (NLU) systems leads to increased vocabulary sizes and thus to the need for greater CPU performance. In this chapter we present an overview of ASR decoder design options, with an emphasis on techniques that are suitable for embedded platforms. One needs to keep in mind that there is no one-size-fits-all solution; specific algorithmic improvements may be best applied only to highly restricted applications or scenarios. The optimal solution is usually achieved by choosing the algorithms that maximize specific benefits for a particular platform and task.

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Miroslav Novak, Speech and Language Technologies, IBM T.J. Watson Research Center, USA