Linguistic Processing in a Speech Understanding System

Giachin, Egidio P.; Rullent, Claudio

doi:10.1007/978-3-642-76626-8_43

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

The goal of a speech understanding system is to correctly identify the action to be taken in response to a user’s spoken request. To this end, the system has to rely on some type of linguistic knowledge besides merely recognizing words. Several approaches have been proposed for employing language modeling in speech understanding. They include unified architectures, which integrate modular knowledge sources that account for every level of knowledge from acoustics to linguistics, and two-level architectures, in which the separation between recognition and linguistic processing is well defined. Within the two-level approach, two main methods may be conceived: either linguistic constraints are integrated into the recognizer, which decodes a single string of words that is then handled by a natural language interface, or the recognizer produces a scored word lattice that is subsequently processed by a suitable linguistic module. For the present study, the latter method was considered the more promising, provided that a satisfactory solution to efficient word lattice parsing could be found.

Parsing a word lattice is a search activity over an extremely large space. It may be performed in two basic modes: left-to-right and score-driven middle-out. Optimal algorithms based on the left-to-right mode require computation that grows polynomially with the lattice length, while those based on the middle-out mode require computation that grows exponentially with it. However, it is possible to devise score-driven middle-out methods whose amount of computation depends on the average likelihood score of the word sequence they are expected to output. Hence, if these words are recognized with a good score, computation may be lower than with left-to-right methods.
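The score-driven middle-out idea can be illustrated with a small sketch. This is not the algorithm described in the paper, which the abstract does not specify. It is a plain best-first search under simplifying assumptions: every word hypothesis carries a non-negative cost (a negative log-likelihood), two hypotheses are adjacent only when their frame boundaries match exactly, and no grammar or pruning heuristic is applied. The names `WordHyp` and `middle_out_search` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WordHyp:
    word: str
    start: int    # first frame covered by the hypothesis
    end: int      # one past the last frame covered
    cost: float   # negative log-likelihood (assumed >= 0); lower is better


def middle_out_search(lattice, n_frames):
    """Best-first search that grows word sequences outward from any seed word.

    Returns (total_cost, words) for the cheapest sequence that covers frames
    0..n_frames without gaps, or None if the lattice admits no such sequence.
    """
    tie = count()  # unique tie-breaker, so the heap never compares sequences
    frontier = [(h.cost, next(tie), (h,)) for h in lattice]  # every word is a seed
    heapq.heapify(frontier)
    seen = set()  # the same sequence is reachable from several seeds
    while frontier:
        cost, _, seq = heapq.heappop(frontier)
        if seq in seen:
            continue
        seen.add(seq)
        if seq[0].start == 0 and seq[-1].end == n_frames:
            return cost, [h.word for h in seq]
        for h in lattice:
            if h.end == seq[0].start:      # extend to the left
                heapq.heappush(frontier, (cost + h.cost, next(tie), (h,) + seq))
            if h.start == seq[-1].end:     # extend to the right
                heapq.heappush(frontier, (cost + h.cost, next(tie), seq + (h,)))
    return None


if __name__ == "__main__":
    # Toy lattice over 9 frames; words and costs are invented for illustration.
    toy = [
        WordHyp("show", 0, 3, 1.0),
        WordHyp("so", 0, 2, 2.5),
        WordHyp("a", 2, 3, 3.0),
        WordHyp("me", 3, 5, 0.5),
        WordHyp("flights", 5, 9, 0.75),
        WordHyp("lights", 6, 9, 0.5),
    ]
    print(middle_out_search(toy, 9))  # (2.25, ['show', 'me', 'flights'])
```

Because the cheapest partial sequence is always expanded first, well-scored words are joined early and the search can stop before poorly scored alternatives are touched, which is the dependence on score that the abstract refers to. The `seen` set only removes exact duplicates. The sketch makes no attempt to reproduce the heuristics with which the paper exploits this redundancy.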

This paper describes in detail an algorithm that was shown experimentally to exhibit high parsing efficiency in the task it was designed for (1000-word continuous speech understanding, restricted semantic domain, and high syntactic freedom). Improved efficiency is achieved through heuristics that exploit the redundancy of the middle-out parsing approach to cut down the search without appreciably compromising the optimality of the method. Problems such as the imperfect determination of the start and end points of words and the absence of short function words from the lattice are also taken into account.

Experimental results, evaluated on lattices produced by the speaker-dependent version of the recognizer available in 1988, show that high-speed speech understanding is feasible with habitable language models (for a specific application) and reasonable accuracy of comprehension. Parsing time is about 1.8 seconds on a Sun 4 workstation, and correct sentence understanding is 82% for a language model of perplexity 25.
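As a reading aid for the perplexity figure, the conventional definition is given below. The abstract does not state how the authors computed the value, so this standard form is an assumption. The symbols $N$ (number of words in the test text) and $w_i$ (the $i$-th word) are introduced here for illustration.

```latex
\[
  PP = 2^{H}, \qquad
  H = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_1, \dots, w_{i-1})
\]
\[
  PP = 25 \;\Longrightarrow\; H = \log_2 25 \approx 4.64 \text{ bits per word}
\]
```

Under this definition, a perplexity of 25 means the language model is, on average, as uncertain about the next word as if it were choosing uniformly among 25 equally likely words.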

Author information

Authors and Affiliations

CSELT — Centro Studi e Laboratori Telecomunicazioni, Via G. Reiss Romoli 274, 10148, Torino, Italy
Egidio P. Giachin & Claudio Rullent

Editor information

Editors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Torino, Italy
Pietro Laface
School of Computer Science, 3480 University St., Montreal, Quebec, H3A 2A7, Canada
Renato De Mori

About this paper

Cite this paper

Giachin, E.P., Rullent, C. (1992). Linguistic Processing in a Speech Understanding System. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_43

DOI: https://doi.org/10.1007/978-3-642-76626-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive
