Model Selection of Combined Neural Nets for Speech Recognition

  • Chapter
Combining Artificial Neural Nets

Part of the book series: Perspectives in Neural Computing ((PERSPECT.NEURAL))

Summary

The problem of finding criteria by which a model is chosen to match a problem and the available data, and to give optimal future performance, is a crucial issue in practical applications; it should not be underestimated when proposing model combination to solve a complex regression or classification task. How can it be ensured that each specialised model has been trained with enough material and that the aggregate model has the optimal structure for reducing error on novel inputs? What if a key requirement is minimisation of training material and time?

This chapter introduces bootstrap error estimation for automatic model selection in combined networks: the resulting model is embedded in the acoustic front-end of an automatic speech recognition system based on hidden Markov models. The method is evaluated in two applications: a large-vocabulary (10,000 words) continuous speech recognition task, and digit recognition over a noisy telephone line. Bootstrap estimates of the minimum MSE allow the selection of regression models that improve system recognition performance. The procedure supports a flexible strategy for dealing with inter-speaker variability without requiring an additional validation set. Recognition results are compared across linear, generalised Radial Basis Function and Multilayer Perceptron network architectures, and with system re-training methods.
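The core idea of bootstrap error estimation for model selection can be sketched in a few lines. The toy code below is an illustration only, not the chapter's implementation: it uses trivial constant and linear regressors in place of the RBF and MLP architectures, and all function names and parameters are invented for this sketch. Each candidate model is repeatedly refit on bootstrap resamples of the data and scored on the out-of-bag points, giving an MSE estimate that needs no separate validation set; the model with the lowest estimate is selected.

```python
import random

def fit_constant(xs, ys):
    # baseline regressor: predict the mean of the training targets
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_linear(xs, ys):
    # ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return lambda x, a=a, b=my - (sxy / sxx if sxx else 0.0) * mx: a * x + b

def bootstrap_mse(fit, xs, ys, n_boot=200, seed=0):
    # average out-of-bag squared error over n_boot bootstrap resamples
    rng = random.Random(seed)
    n, errs = len(xs), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        oob = [i for i in range(n) if i not in set(idx)]
        if not oob:
            continue
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        errs.append(sum((model(xs[i]) - ys[i]) ** 2 for i in oob) / len(oob))
    return sum(errs) / len(errs)

# toy data: a noisy line, so the linear model should win
data_rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.1) for x in xs]

scores = {name: bootstrap_mse(f, xs, ys)
          for name, f in [("constant", fit_constant), ("linear", fit_linear)]}
best = min(scores, key=scores.get)   # model with lowest bootstrap MSE estimate
```

In the chapter's setting the same selection loop would run over the candidate normalisation networks, with the bootstrap MSE replacing a held-out validation score.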




Copyright information

© 1999 Springer-Verlag London Limited

About this chapter

Cite this chapter

Sharkey, A.J.C. (1999). Model Selection of Combined Neural Nets for Speech Recognition. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_9

  • DOI: https://doi.org/10.1007/978-1-4471-0793-4_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-004-0

  • Online ISBN: 978-1-4471-0793-4

  • eBook Packages: Springer Book Archive
