Abstract
The goal of a speech understanding system is to correctly identify the action to be taken in response to a user’s spoken request. To this end, the system has to rely on some type of linguistic knowledge besides merely recognizing words. Several approaches have been proposed to employ language modeling in speech understanding. They include unified architectures integrating modular knowledge sources that account for every level of knowledge from acoustics to linguistics, and two-level architectures in which the separation between recognition and linguistic processing is well defined. Within the two-level approach, two main methods may be conceived: linguistic constraints are integrated into the recognizer, which decodes a single string of words that is then processed by a natural language interface; or the recognizer produces a scored word lattice that is subsequently processed by a suitable linguistic module. For the present study, the latter method was considered the most promising, provided a satisfactory solution to efficient word lattice parsing could be found.
Parsing a word lattice is a search problem with an extremely large search space. It may be performed in two basic modes, namely the left-to-right mode and the score-driven middle-out mode. Optimal algorithms based on the left-to-right mode require computation that grows polynomially with the lattice length, while those based on the middle-out mode require computation that grows exponentially with length. However, it is possible to devise score-driven middle-out methods so that the amount of computation they require depends on the average likelihood score of the word sequence they are expected to output. Hence, if these words are recognized with a good score, the computation may be lower than with left-to-right methods.
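The score-driven middle-out idea can be illustrated with a minimal sketch. It is not the algorithm described in this paper: it uses no grammar, assumes word boundaries match exactly, and ranks partial sequences by plain average word score, so the first complete sequence it returns is not guaranteed to be optimal. The names `WordHyp` and `middle_out_search` and the toy lattice are hypothetical, introduced only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHyp:
    """One scored word hypothesis in the lattice (hypothetical structure)."""
    word: str
    start: int    # first frame covered (inclusive)
    end: int      # frame after the last one covered (exclusive)
    score: float  # likelihood-like score; higher is better


def middle_out_search(lattice, first_frame, last_frame):
    """Best-first, middle-out search for a word sequence covering the utterance.

    Every hypothesis seeds a partial sequence. The partial sequence with the
    best average score is expanded first, growing to the left or to the right
    with hypotheses whose boundaries match exactly (a simplifying assumption).
    Returns (words, average_score, expansions) or (None, 0.0, expansions).
    """
    agenda = []
    counter = 0  # unique tie-breaker so the heap never compares sequences
    for hyp in lattice:
        heapq.heappush(agenda, (-hyp.score, counter, (hyp,)))
        counter += 1

    seen = set()  # the same sequence is reachable from several seeds
    expansions = 0
    while agenda:
        neg_avg, _, seq = heapq.heappop(agenda)
        if seq in seen:
            continue
        seen.add(seq)
        expansions += 1
        left, right = seq[0].start, seq[-1].end
        if left == first_frame and right == last_frame:
            return [h.word for h in seq], -neg_avg, expansions
        for hyp in lattice:
            candidates = []
            if hyp.end == left:       # extend to the left
                candidates.append((hyp,) + seq)
            if hyp.start == right:    # extend to the right
                candidates.append(seq + (hyp,))
            for new_seq in candidates:
                avg = sum(h.score for h in new_seq) / len(new_seq)
                heapq.heappush(agenda, (-avg, counter, new_seq))
                counter += 1
    return None, 0.0, expansions


# Toy lattice with invented words and scores.
toy_lattice = [
    WordHyp("show", 0, 3, 0.9),
    WordHyp("so", 0, 3, 0.4),
    WordHyp("me", 3, 5, 0.8),
    WordHyp("my", 3, 5, 0.5),
    WordHyp("flights", 5, 10, 0.95),
    WordHyp("lights", 5, 10, 0.6),
]
words, avg_score, expansions = middle_out_search(toy_lattice, 0, 10)
print(words, round(avg_score, 3), expansions)
```

In this toy case the well-scored words are expanded first and the low-scoring alternatives are never expanded, which mirrors the observation that the work done depends on the score of the words in the output sequence. The `seen` set also shows the redundancy of the middle-out mode: the same partial sequence can be built from more than one seed.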
This paper describes in detail an algorithm that was experimentally shown to exhibit high parsing efficiency in the task it was designed for (1000-word continuous speech understanding, restricted semantic domain, and high syntactic freedom). Improved efficiency is reached through the use of heuristics which, exploiting the redundancy of the middle-out parsing approach, make it possible to cut down the search without appreciably compromising the optimality of the method. Problems such as the imperfect determination of the start and end points of words and the absence of short function words from the lattice are also taken into account.
Experimental results, evaluated on lattices produced by the speaker-dependent version of the recognizer available in 1988, show that high-speed speech understanding is compatible with habitable language models (for a specific application) and reasonable accuracy of comprehension. Parsing time is about 1.8 seconds on a Sun 4 workstation and correct sentence understanding is 82% for a language model of perplexity 25.
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giachin, E.P., Rullent, C. (1992). Linguistic Processing in a Speech Understanding System. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_43
DOI: https://doi.org/10.1007/978-3-642-76626-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive