Summary
Finding criteria by which a model can be chosen to match a problem and the available data, and to give optimal future performance, is a crucial issue in practical applications; it should not be underestimated when model combination is proposed to solve a complex regression or classification task. How can it be ensured that each specialised model has been trained on enough material, and that the aggregate model has the optimal structure for reducing error on novel inputs? What if a key requirement is the minimisation of training material and time?
This chapter introduces bootstrap error estimation for automatic model selection in combined networks: the resulting model is embedded in the acoustic front-end of an automatic speech recognition system based on hidden Markov models. The method is evaluated in two applications: a large-vocabulary (10,000 words) continuous speech recognition task, and digit recognition over a noisy telephone line. Bootstrap estimates of the minimum MSE allow the selection of regression models that improve system recognition performance. The procedure supports a flexible strategy for dealing with inter-speaker variability without requiring an additional validation set. Recognition results are compared for linear, generalised Radial Basis Function and Multilayer Perceptron network architectures, and with system re-training methods.
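The core idea of selecting a regression model by its bootstrap estimate of prediction MSE can be illustrated with a minimal sketch. This is not the chapter's implementation: the candidate models (toy polynomial regressions), the data, and the out-of-bootstrap evaluation scheme shown here are illustrative assumptions, not details taken from the text.

```python
# Illustrative sketch (hypothetical models and data): choosing among candidate
# regression models by a bootstrap estimate of out-of-sample MSE.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_mse(fit, predict, X, y, n_boot=50):
    """Average MSE over bootstrap resamples.

    For each resample, the model is fit on the indices drawn with
    replacement and evaluated on the points left out of that resample.
    """
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bootstrap points
        if oob.size == 0:
            continue
        params = fit(X[idx], y[idx])
        resid = y[oob] - predict(params, X[oob])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Two toy candidates: degree-1 and degree-3 polynomial regression.
def make_poly(deg):
    fit = lambda X, y: np.polyfit(X, y, deg)
    predict = lambda p, X: np.polyval(p, X)
    return fit, predict

X = np.linspace(-1.0, 1.0, 80)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(80)

scores = {deg: bootstrap_mse(*make_poly(deg), X, y) for deg in (1, 3)}
best = min(scores, key=scores.get)   # model with minimum bootstrap MSE
```

The selected model is the one minimising the bootstrap MSE estimate; no separate validation set is held out, which is the practical advantage the chapter exploits.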
© 1999 Springer-Verlag London Limited
Cite this chapter
Sharkey, A.J.C. (1999). Model Selection of Combined Neural Nets for Speech Recognition. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_9
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4