Algorithm Optimizations: Low Memory Footprint

  • Marcel Vasilache

For speech recognition algorithms targeting mobile devices, the memory footprint is a critical parameter. Although memory consumption can be both static (long-term) and dynamic (run-time), in this chapter we focus mainly on the long-term memory requirements and, more specifically, on techniques for acoustic model compression. Like all compression methods, acoustic model compression exploits redundancies within the data as well as the limits on the accuracy needed for parameter representation. Considering the data redundancies specific to hidden Markov models (HMMs), parameter tying and state or density clustering algorithms are presented, with cases such as semicontinuous HMMs (SCHMMs) and subspace distribution clustered HMMs (SDCHMMs). Regarding parameter representation, a simple scalar quantized representation is shown for the case of quantized HMMs (qHMMs). The effects on computational complexity are also reviewed for all the compression methods presented.
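To make the idea of a scalar quantized parameter representation concrete, the following is a minimal sketch and not the chapter's qHMM scheme. It assumes a hypothetical model of 1000 Gaussians with 39-dimensional means, a single uniform grid shared by all mean components, and 5 bits per value; the function names, sizes, and bit width are illustrative choices only.

```python
import random


def quantize_uniform(values, bits):
    """Map floats to integer codes on a uniform grid spanning [min, max]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1          # highest code value
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate floats from integer codes."""
    return [lo + c * step for c in codes]


random.seed(0)

# Hypothetical acoustic model: 1000 Gaussians, 39-dimensional mean vectors.
num_values = 1000 * 39
means = [random.gauss(0.0, 1.0) for _ in range(num_values)]

bits = 5  # assumed bit width, chosen only for illustration
codes, lo, step = quantize_uniform(means, bits)
restored = dequantize(codes, lo, step)

# Rounding to the nearest grid point bounds the error by half a step.
max_err = max(abs(a - b) for a, b in zip(means, restored))

# Static footprint: 32-bit floats versus bit-packed codes.
float_bytes = num_values * 4
packed_bytes = (num_values * bits + 7) // 8

print("step size:", step)
print("max abs error:", max_err)
print("float32 bytes:", float_bytes)
print("packed bytes:", packed_bytes)
```

Under these assumptions the packed codes take 5/32 of the floating-point storage, at the cost of a reconstruction error of at most half a quantization step per parameter. Whether that error is acceptable for recognition accuracy is an empirical question that this sketch does not address.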


Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Marcel Vasilache, Nokia, Tampere, Finland
