Robust Data Oriented Spoken Language Understanding

  • Chapter
New Developments in Parsing Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 23)

Abstract

Spoken utterances do not always abide by linguistically motivated grammatical rules. They exhibit various phenomena that are considered outside the realm of theoretically oriented linguistic research. For a language model that extends linguistically motivated grammars with probabilistic reasoning, the problem is how to achieve the robustness that speech understanding requires. This paper addresses the robustness of the Data Oriented Parsing (DOP) model within a Dutch speech-based dialogue system. It presents an extension of the DOP model into a head-driven variant, which allows for Markovian generation of parse trees. It is shown empirically that the new variant improves over the original DOP model on two tasks: the formal understanding of speech utterances, and the extraction of semantic concepts from word lattices output by a speech recognizer.




Copyright information

© 2004 Kluwer Academic Publishers

About this chapter

Cite this chapter

Sima’an, K. (2004). Robust Data Oriented Spoken Language Understanding. In: Bunt, H., Carroll, J., Satta, G. (eds) New Developments in Parsing Technology. Text, Speech and Language Technology, vol 23. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2295-6_16

  • DOI: https://doi.org/10.1007/1-4020-2295-6_16

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-2293-7

  • Online ISBN: 978-1-4020-2295-1

  • eBook Packages: Humanities, Social Sciences and Law
