Robust Data Oriented Spoken Language Understanding

Sima’an, Khalil

doi:10.1007/1-4020-2295-6_16

Khalil Sima’an

Part of the book series: Text, Speech and Language Technology (TLTB, volume 23)

Abstract

Spoken utterances do not always abide by linguistically motivated grammatical rules. These utterances exhibit various phenomena considered outside the realm of theoretically oriented linguistic research. For a language model that extends linguistically motivated grammars with probabilistic reasoning, the problem is how to provide the robustness that is necessary for speech understanding. This paper addresses the issue of the robustness of the Data Oriented Parsing (DOP) model within a Dutch speech-based dialogue system. It presents an extension of the DOP model into a head-driven variant, which allows for Markovian generation of parse trees. It is shown empirically that the new variant improves over the original DOP model on two tasks: the formal understanding of speech utterances, and the extraction of semantic concepts from word lattices output by a speech recognizer.


Author information

Authors and Affiliations

Computational Linguistics, University of Amsterdam, Nieuwe Achtergracht 166, Amsterdam, The Netherlands
Khalil Sima’an

Editor information

Editors and Affiliations

Tilburg University, Tilburg, The Netherlands
Harry Bunt
University of Sussex, Brighton, UK
John Carroll
University of Padua, Padua, Italy
Giorgio Satta

About this chapter

Cite this chapter

Sima’an, K. (2004). Robust Data Oriented Spoken Language Understanding. In: Bunt, H., Carroll, J., Satta, G. (eds) New Developments in Parsing Technology. Text, Speech and Language Technology, vol 23. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2295-6_16

DOI: https://doi.org/10.1007/1-4020-2295-6_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2293-7
Online ISBN: 978-1-4020-2295-1
eBook Packages: Humanities, Social Sciences and Law
