Abstract
Spoken utterances do not always abide by linguistically motivated grammatical rules. They exhibit various phenomena that are considered outside the realm of theoretically oriented linguistic research. For a language model that extends linguistically motivated grammars with probabilistic reasoning, the problem is how to achieve the robustness that speech understanding requires. This paper addresses the robustness of the Data Oriented Parsing (DOP) model within a Dutch speech-based dialogue system. It presents an extension of the DOP model into a head-driven variant, which allows for Markovian generation of parse trees. It is shown empirically that the new variant improves over the original DOP model on two tasks: the formal understanding of speech utterances, and the extraction of semantic concepts from word lattices output by a speech recognizer.
© 2004 Kluwer Academic Publishers
About this chapter
Cite this chapter
Sima’an, K. (2004). Robust Data Oriented Spoken Language Understanding. In: Bunt, H., Carroll, J., Satta, G. (eds) New Developments in Parsing Technology. Text, Speech and Language Technology, vol 23. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2295-6_16
DOI: https://doi.org/10.1007/1-4020-2295-6_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2293-7
Online ISBN: 978-1-4020-2295-1
eBook Packages: Humanities, Social Sciences and Law