
Linguistic Processing in a Speech Understanding System

  • Conference paper

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

The goal of a speech understanding system is to correctly identify the action to be taken in response to a user’s spoken request. To this end, the system must rely on some kind of linguistic knowledge besides merely recognizing words. Several approaches have been proposed for using language modeling in speech understanding. They include unified architectures, which integrate modular knowledge sources covering every level from acoustics to linguistics, and two-level architectures, in which the separation between recognition and linguistic processing is well defined. Within the two-level approach, two main methods may be conceived: either linguistic constraints are integrated into the recognizer, which decodes a single word string that is then processed by a natural language interface, or the recognizer produces a scored word lattice that is subsequently processed by a suitable linguistic module. For the present study, the latter method was considered the most promising, provided that a satisfactory solution to efficient word lattice parsing could be found.
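
As an illustration of the lattice-based method, a scored word lattice can be pictured as a set of competing word hypotheses, each with a start frame, an end frame, and a recognizer score. The Python sketch below is a minimal sketch, not the paper's data structure: it assumes frame-indexed boundaries (end exclusive) and log-likelihood scores, and the names `WordHypothesis` and `WordLattice` as well as the toy words and scores are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    """One scored word hypothesis (hypothetical representation)."""
    word: str
    start: int    # first frame covered by the word
    end: int      # frame just after the word (exclusive)
    score: float  # recognizer log-likelihood; higher is better


class WordLattice:
    """A scored word lattice, indexed by start frame for fast adjacency lookups."""

    def __init__(self, hypotheses):
        self.hypotheses = list(hypotheses)
        self._by_start = defaultdict(list)
        for hyp in self.hypotheses:
            self._by_start[hyp.start].append(hyp)

    def starting_at(self, frame):
        """Return every hypothesis that begins at `frame`."""
        return list(self._by_start.get(frame, []))

    def successors(self, hyp):
        """Return the hypotheses that begin exactly where `hyp` ends."""
        return self.starting_at(hyp.end)


# Toy lattice: competing hypotheses ("show" vs. "so") coexist; the
# linguistic module, not the recognizer, decides between them.
lattice = WordLattice([
    WordHypothesis("show", 0, 20, -12.0),
    WordHypothesis("so", 0, 15, -14.5),
    WordHypothesis("me", 20, 31, -6.0),
    WordHypothesis("the", 31, 38, -4.0),
])
print([h.word for h in lattice.successors(lattice.hypotheses[0])])  # ['me']
```

Keeping all competing hypotheses, rather than a single decoded string, is what lets a later linguistic module recover from a locally better-scoring but ungrammatical word choice.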

Parsing a word lattice is a search activity over an extremely large space. It may be performed in two basic modes: the left-to-right mode and the score-driven middle-out mode. Optimal algorithms based on the left-to-right mode require computation that grows polynomially with the lattice length, whereas those based on the middle-out mode require computation that grows exponentially with length. However, score-driven middle-out methods can be devised so that the amount of computation they require depends on the average likelihood score of the word sequence they are expected to output. Hence, if these words are recognized with good scores, computation may be lower than with left-to-right methods.
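
To make the score-driven middle-out mode concrete, here is a minimal Python sketch of an island-driven best-first search. It is not the authors' algorithm: it assumes log-likelihood scores, ranks partial islands by their mean per-word score, and uses a hypothetical `accept` callback in place of real linguistic constraints. Because a mean score is not an upper bound on the final score, the first complete analysis it returns is not guaranteed to be optimal.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    word: str
    start: int    # first frame of the word
    end: int      # frame just after the word (exclusive)
    score: float  # recognizer log-likelihood; higher is better


def middle_out_parse(lattice, total_frames, accept=lambda words: True):
    """Best-first island-driven search over a word lattice (illustrative only).

    Every hypothesis seeds an island; islands grow to the left or right with
    adjacent hypotheses, and the island with the highest mean per-word score
    is always expanded first. Returns the first island covering
    [0, total_frames), or None if no such island exists.
    """
    by_start, by_end = {}, {}
    for hyp in lattice:
        by_start.setdefault(hyp.start, []).append(hyp)
        by_end.setdefault(hyp.end, []).append(hyp)

    heap, counter = [], 0

    def push(island):
        nonlocal counter
        if accept(tuple(h.word for h in island)):  # stand-in for linguistic checks
            mean = sum(h.score for h in island) / len(island)
            heapq.heappush(heap, (-mean, counter, island))  # counter breaks ties
            counter += 1

    for hyp in lattice:
        push((hyp,))

    seen = set()
    while heap:
        _, _, island = heapq.heappop(heap)
        if island in seen:  # same island reached from another seed or order
            continue
        seen.add(island)
        start, end = island[0].start, island[-1].end
        if start == 0 and end == total_frames:
            return island
        for hyp in by_start.get(end, []):  # extend rightwards
            push(island + (hyp,))
        for hyp in by_end.get(start, []):  # extend leftwards
            push((hyp,) + island)
    return None


lattice = [
    WordHypothesis("show", 0, 20, -12.0),
    WordHypothesis("so", 0, 15, -14.5),
    WordHypothesis("me", 20, 31, -6.0),
    WordHypothesis("the", 31, 38, -4.0),
    WordHypothesis("trains", 38, 60, -10.0),
]
result = middle_out_parse(lattice, total_frames=60)
print([h.word for h in result])  # ['show', 'me', 'the', 'trains']
```

The same word sequence can be grown from several seed words and in several orders; the sketch simply discards repeated islands with the `seen` set, which illustrates the kind of redundancy that makes middle-out parsing expensive when scores are poor.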

This paper describes in detail an algorithm that was experimentally shown to achieve high parsing efficiency in the task it was designed for (1000-word continuous speech understanding, restricted semantic domain, and high syntactic freedom). The improved efficiency comes from heuristics that exploit the redundancy of the middle-out parsing approach to prune the search without appreciably compromising the optimality of the method. Problems such as the imperfect determination of word start and end points and the absence of short function words from the lattice are also taken into account.
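
One simple way to tolerate imprecise word boundaries and missing function words is to relax the adjacency test used when joining two hypotheses. The Python sketch below is an assumption for illustration, not the paper's method: the thresholds `tolerance`, `max_gap` (in frames) and `gap_penalty` (in log-score units), the function name `junction_adjustment`, and the toy words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    word: str
    start: int    # first frame of the word
    end: int      # frame just after the word (exclusive)
    score: float  # recognizer log-likelihood


def junction_adjustment(left, right, tolerance=3, max_gap=12, gap_penalty=-5.0):
    """Score adjustment for placing `right` immediately after `left`.

    Returns 0.0 when the boundaries match within `tolerance` frames (gap or
    overlap), `gap_penalty` when the gap is wide enough that a short function
    word may be missing from the lattice, and None when the two words cannot
    be joined. All thresholds are hypothetical.
    """
    gap = right.start - left.end  # negative values mean the words overlap
    if abs(gap) <= tolerance:
        return 0.0                # imprecise boundary: treat as adjacent
    if tolerance < gap <= max_gap:
        return gap_penalty        # room for an unrecognized function word
    return None                   # overlap too large or gap too wide


me = WordHypothesis("me", 20, 31, -6.0)
print(junction_adjustment(me, WordHypothesis("the", 33, 40, -4.0)))      # 0.0
print(junction_adjustment(me, WordHypothesis("trains", 38, 60, -10.0)))  # -5.0
print(junction_adjustment(me, WordHypothesis("Milan", 60, 80, -9.0)))    # None
```

A fuller treatment would also check which short function word (for example an article or a preposition) could grammatically fill the gap before accepting the penalty.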

Experimental results, evaluated on lattices produced by the speaker-dependent version of the recognizer available in 1988, show that high-speed speech understanding is feasible with language models that are habitable (for a specific application) and with reasonable comprehension accuracy. Parsing time is about 1.8 seconds on a Sun 4 workstation, and correct sentence understanding is 82% for a language model of perplexity 25.
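
Perplexity can be read as the average number of equally likely words the language model allows at each point, so a perplexity of 25 corresponds roughly to choosing among 25 words. The Python sketch below computes it from the model's conditional probabilities P(w_i | w_1 ... w_{i-1}) for a word sequence, as the geometric mean of the inverse probabilities; the function name is hypothetical, and how the paper measured the perplexity of its own language model is not stated here.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence: exp(-(1/N) * sum(log p_i)).

    `word_probs` holds the model's conditional probability for each word;
    every probability must be strictly positive.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# A model that always chooses uniformly among 25 words has perplexity 25,
# and so does one alternating between 5-way and 125-way choices.
print(round(perplexity([1 / 25] * 10), 6))     # 25.0
print(round(perplexity([1 / 5, 1 / 125]), 6))  # 25.0
```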

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Giachin, E.P., Rullent, C. (1992). Linguistic Processing in a Speech Understanding System. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_43

  • DOI: https://doi.org/10.1007/978-3-642-76626-8_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-76628-2

  • Online ISBN: 978-3-642-76626-8

  • eBook Packages: Springer Book Archive