Continuous-Speech Recognition in the SPICOS-II System

Paeseler, Annedore; Ney, Hermann; Steinbiss, Volker; Höge, Harald; Marschall, Erwin

doi:10.1007/978-3-642-75182-0_28

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 227)

Abstract

The goal of the SPICOS project is the development of a system for answering spoken database queries. This paper describes the present state of two modules for speech recognition developed in this project. The two approaches described can be characterized as a bottom-up and an integrated approach.

In the bottom-up approach, a data-driven two-network matching parser compares the input network of alternative phonological units with a word lexicon organized as a cyclic network. This lexicon contains not only the standard pronunciations but also models of inter-word and intra-word assimilations. Substitutions, deletions, and insertions of single phonemes are also taken into account during the match. The output of the parser is a network of word hypotheses. Results are presented for phoneme and word recognition.
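The role of substitutions, deletions, and insertions in the match can be illustrated with a much simpler setting than the one described above. The following is a minimal sketch, assuming a single linear phoneme sequence is compared with a single lexicon pronunciation; the parser in the paper matches networks against each other, which this sketch does not attempt. The function name, the unit costs, and the phoneme symbols are hypothetical.

```python
def phoneme_edit_distance(observed, reference,
                          sub_cost=1.0, del_cost=1.0, ins_cost=1.0):
    """Minimum total cost of aligning `observed` with `reference`.

    A deletion means a reference phoneme is missing from the observation,
    an insertion means the observation contains an extra phoneme.
    The unit costs are placeholders, not values from the paper.
    """
    rows, cols = len(reference) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # Aligning against an empty sequence costs only deletions or insertions.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = reference[i - 1] == observed[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitution
                dist[i - 1][j] + del_cost,                         # deletion
                dist[i][j - 1] + ins_cost,                         # insertion
            )
    return dist[-1][-1]


# Hypothetical example: a reduced pronunciation with one phoneme dropped
# and one assimilated.
reference = ["h", "a", "b", "e", "n"]
observed = ["h", "a", "b", "m"]
cost = phoneme_edit_distance(observed, reference)
```

In a real system the costs would depend on the phoneme pair involved, so that acoustically plausible confusions are penalized less than implausible ones.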

In the integrated approach, three knowledge sources (phoneme models, a pronunciation lexicon, and a language model) are integrated into a global search procedure. This procedure is based on statistical decision theory and finds the word sequence that best explains the input speech signal. A stochastic language model based on probabilities of trigrams, bigrams, and unigrams of word categories is used to incorporate language restrictions into the search process. The word-error rate is reduced from 22% without a language model to 9% with a stochastic language model.
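One common way to combine trigram, bigram, and unigram probabilities of word categories is linear interpolation of their relative frequencies. The sketch below assumes that scheme; the abstract does not say how the three probabilities are combined, so this is an illustration, not the paper's estimator. The function names, the fixed weights, and the category labels are hypothetical.

```python
from collections import Counter


def train(category_sequences):
    """Count category n-grams (n = 1, 2, 3) and their histories."""
    counts = Counter()   # n-gram counts, keyed by tuple of categories
    context = Counter()  # history counts, keyed by the n-gram minus its last item
    for seq in category_sequences:
        padded = ["<s>", "<s>"] + list(seq)  # sentence-start padding
        for i in range(2, len(padded)):
            for n in (1, 2, 3):
                gram = tuple(padded[i - n + 1:i + 1])
                counts[gram] += 1
                context[gram[:-1]] += 1
    return counts, context


def rel_freq(counts, context, gram):
    """Relative frequency of the last category given the preceding ones."""
    denom = context[gram[:-1]]
    return counts[gram] / denom if denom else 0.0


def interpolated_prob(counts, context, a, b, c, weights=(0.6, 0.3, 0.1)):
    """P(c | a, b) as a weighted sum of trigram, bigram, and unigram estimates.

    The weights are placeholders that sum to 1; in practice they would be
    estimated on held-out data.
    """
    w3, w2, w1 = weights
    return (w3 * rel_freq(counts, context, (a, b, c))
            + w2 * rel_freq(counts, context, (b, c))
            + w1 * rel_freq(counts, context, (c,)))


# Hypothetical training data: sentences already mapped to word categories.
counts, context = train([["DET", "NOUN", "VERB"], ["DET", "NOUN"]])
p = interpolated_prob(counts, context, "<s>", "DET", "NOUN")
```

Because the unigram term is nonzero for every category seen in training, an unseen trigram still receives some probability, which keeps the search from ruling out a word sequence only because its category trigram never occurred in the training data.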

Author information

Authors and Affiliations

Philips GmbH Forschungslaboratorium Hamburg, P.O. Box 540840, D-2000, Hamburg 54, West Germany
Annedore Paeseler, Hermann Ney & Volker Steinbiss
Corporate Research and Development, Siemens AG, P.O. Box 830951, D-8000, Munich 83, West Germany
Harald Höge & Erwin Marschall

Editor information

Editors and Affiliations

Institut für Informatik, Technische Universität München, Postfach 202420, D-8000, München 2, Germany
W. Brauer & C. Freksa

About this paper

Cite this paper

Paeseler, A., Ney, H., Steinbiss, V., Höge, H., Marschall, E. (1989). Continuous-Speech Recognition in the SPICOS-II System. In: Brauer, W., Freksa, C. (eds) Wissensbasierte Systeme. Informatik-Fachberichte, vol 227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-75182-0_28

DOI: https://doi.org/10.1007/978-3-642-75182-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51838-9
Online ISBN: 978-3-642-75182-0
eBook Packages: Springer Book Archive