Abstract
The goal of the SPICOS project is the development of a system for answering spoken database queries. This paper describes the present state of two modules for speech recognition developed in this project. The two approaches described can be characterized as a bottom-up and an integrated approach.
In the bottom-up approach, a data-driven two-network matching parser compares the input network of alternative phonological units with a word lexicon organized as a cyclic network. This lexicon contains not only the standard pronunciations but also models inter-word and intra-word assimilations. Substitutions, deletions, and insertions of single phonemes are also taken into account during the match. The output of the parser is a network of word hypotheses. Results are presented for phoneme and word recognition.
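To make the matching idea concrete, the sketch below scores an observed phoneme string against lexicon pronunciations while allowing substitutions, deletions, and insertions of single phonemes. It is a deliberately simplified illustration, not the parser described in the paper: it aligns plain sequences rather than networks, and the function names, unit costs, phoneme symbols, and toy lexicon are all assumptions made for this example.

```python
def phoneme_edit_distance(observed, reference,
                          sub_cost=1.0, del_cost=1.0, ins_cost=1.0):
    """Cheapest alignment of an observed phoneme list with a reference one.

    The unit costs are illustrative assumptions, not values from the paper.
    """
    n, m = len(observed), len(reference)
    # dist[i][j]: cost of aligning observed[:i] with reference[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * ins_cost      # observed phonemes with no counterpart
    for j in range(1, m + 1):
        dist[0][j] = j * del_cost      # reference phonemes that were not observed
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = observed[i - 1] == reference[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),  # match / substitution
                dist[i - 1][j] + ins_cost,                         # insertion
                dist[i][j - 1] + del_cost,                         # deletion
            )
    return dist[n][m]


def best_word(observed, lexicon):
    """Return the word whose closest pronunciation variant is cheapest to align."""
    return min(
        lexicon,
        key=lambda word: min(phoneme_edit_distance(observed, variant)
                             for variant in lexicon[word]),
    )


# Hypothetical toy lexicon: each word lists a standard pronunciation and,
# where applicable, an assimilated variant (symbols are made up for the example).
TOY_LEXICON = {
    "haben": [["h", "a:", "b", "@", "n"], ["h", "a:", "b", "m"]],
    "hafen": [["h", "a:", "f", "@", "n"]],
}

if __name__ == "__main__":
    heard = ["h", "a:", "b", "m"]
    print(best_word(heard, TOY_LEXICON))
```

Listing the assimilated variant in the lexicon lets the reduced form match exactly, while the error costs absorb remaining recognition mistakes; a network-based matcher would share common prefixes between variants instead of scoring each one separately.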
In the integrated approach, three knowledge sources (phoneme models, a pronunciation lexicon, and a language model) are integrated into a global search procedure that is based on statistical decision theory and finds the word sequence that best explains the input speech signal. A stochastic language model based on probabilities of trigrams, bigrams, and unigrams of word categories incorporates language restrictions into the search process. The word-error rate is reduced from 22% without a language model to 9% with the stochastic language model.
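One common way to combine trigram, bigram, and unigram probabilities of word categories is linear interpolation of relative frequencies. The sketch below illustrates that idea only; the abstract does not state which smoothing scheme the system uses, so the class name, the fixed interpolation weights, the sentence-start padding symbol, and the toy category sequences are all assumptions.

```python
from collections import Counter


class InterpolatedCategoryModel:
    """Linear interpolation of trigram, bigram, and unigram category frequencies."""

    def __init__(self, category_sequences, weights=(0.6, 0.3, 0.1)):
        # weights = (trigram, bigram, unigram); assumed values that sum to 1
        self.weights = weights
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()   # counts of the histories
        self.total = 0
        for seq in category_sequences:
            padded = ["<s>", "<s>"] + list(seq)       # assumed sentence-start padding
            for k in range(2, len(padded)):
                c2, c1, c = padded[k - 2], padded[k - 1], padded[k]
                self.uni[c] += 1
                self.bi[(c1, c)] += 1
                self.tri[(c2, c1, c)] += 1
                self.ctx1[c1] += 1
                self.ctx2[(c2, c1)] += 1
                self.total += 1

    def prob(self, c, c1, c2):
        """P(c | c2, c1), where c1 is the previous category and c2 the one before it."""
        w3, w2, w1 = self.weights
        p1 = self.uni[c] / self.total if self.total else 0.0
        p2 = self.bi[(c1, c)] / self.ctx1[c1] if self.ctx1[c1] else 0.0
        p3 = (self.tri[(c2, c1, c)] / self.ctx2[(c2, c1)]
              if self.ctx2[(c2, c1)] else 0.0)
        return w3 * p3 + w2 * p2 + w1 * p1


# Hypothetical training data: sentences already mapped to word categories.
TOY_SEQUENCES = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN"],
]

if __name__ == "__main__":
    model = InterpolatedCategoryModel(TOY_SEQUENCES)
    print(model.prob("NOUN", c1="DET", c2="<s>"))
```

Because the unigram term is never zero for a category seen in training, a trigram that was not observed still receives some probability, so the search is steered by the language model rather than having hypotheses ruled out outright. In practice the weights would be estimated on held-out data rather than fixed by hand.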
Copyright information
© 1989 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paeseler, A., Ney, H., Steinbiss, V., Höge, H., Marschall, E. (1989). Continuous-Speech Recognition in the SPICOS-II System. In: Brauer, W., Freksa, C. (eds) Wissensbasierte Systeme. Informatik-Fachberichte, vol 227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-75182-0_28
DOI: https://doi.org/10.1007/978-3-642-75182-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51838-9
Online ISBN: 978-3-642-75182-0
eBook Packages: Springer Book Archive