Summary
Finding criteria by which a model can be chosen to match a problem and the available data, and to give optimal future performance, is a crucial issue in practical applications; it should not be underestimated when model combination is proposed to solve a complex regression or classification task. How can it be ensured that each specialised model has been trained on enough material, and that the aggregate model has the optimal structure for reducing error on novel inputs? What if a key requirement is the minimisation of training material and time?
This chapter introduces bootstrap error estimation for automatic model selection in combined networks: the resulting model is embedded in the acoustic front-end of an automatic speech recognition system based on hidden Markov models. The method is evaluated in two applications: a large-vocabulary (10,000 words) continuous speech recognition task, and digit recognition over a noisy telephone line. Bootstrap estimates of the minimum MSE allow the selection of regression models that improve system recognition performance. The procedure supports a flexible strategy for dealing with inter-speaker variability without requiring an additional validation set. Recognition results are compared for linear, generalised Radial Basis Function and Multilayer Perceptron network architectures, and with system re-training methods.
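The core idea of selecting a regression model by its bootstrap estimate of prediction MSE can be illustrated with a minimal sketch. This is not the chapter's implementation: the candidate models (toy polynomial regressions), the data, and the out-of-bootstrap evaluation scheme shown here are illustrative assumptions, not details taken from the text.

```python
# Illustrative sketch (hypothetical models and data): choosing among candidate
# regression models by a bootstrap estimate of out-of-sample MSE.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_mse(fit, predict, X, y, n_boot=50):
    """Average MSE over bootstrap resamples.

    For each resample, the model is fit on the indices drawn with
    replacement and evaluated on the points left out of that resample.
    """
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bootstrap points
        if oob.size == 0:
            continue
        params = fit(X[idx], y[idx])
        resid = y[oob] - predict(params, X[oob])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Two toy candidates: degree-1 and degree-3 polynomial regression.
def make_poly(deg):
    fit = lambda X, y: np.polyfit(X, y, deg)
    predict = lambda p, X: np.polyval(p, X)
    return fit, predict

X = np.linspace(-1.0, 1.0, 80)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(80)

scores = {deg: bootstrap_mse(*make_poly(deg), X, y) for deg in (1, 3)}
best = min(scores, key=scores.get)   # model with minimum bootstrap MSE
```

The selected model is the one minimising the bootstrap MSE estimate; no separate validation set is held out, which is the practical advantage the chapter exploits.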
© 1999 Springer-Verlag London Limited
Cite this chapter
Sharkey, A.J.C. (1999). Model Selection of Combined Neural Nets for Speech Recognition. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_9
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4