The search for robustness in natural language understanding

Artificial Intelligence Review

Abstract

Practical natural language understanding (NLU) systems used to be concerned with very small domains only: they knew exactly what a potential text might be about and what kinds of sentence structures to expect. This optimistic assumption is no longer tenable if NLU is to scale up to deal with text that naturally occurs in the "real world". The key issue is robustness: the system needs to be prepared for cases where the input data does not correspond to the expectations encoded in the grammar. In this paper, we survey the approaches to the robustness problem that have been developed over the last decade. We inspect techniques for handling both syntactically and semantically ill-formed input in sentence parsing, then look briefly at more recent ideas concerning the extraction of information from texts and at the related question of the role that linguistic research plays in this game. Finally, the robust sentence parsing schemes are classified on a more abstract level of analysis.

Additional information

Dept. of Computer Science, University of Toronto

For helpful comments on earlier drafts of this paper, I thank Judy Dick, Graeme Hirst, Diane Horton, Kem Luther, and Jan Wiebe. Financial support by the University of Toronto is acknowledged. Communication and requests for reprints should be directed to the author at Department of Computer Science, University of Toronto, Toronto, Canada M5S 1A4.

About this article

Cite this article

Stede, M. The search for robustness in natural language understanding. Artif Intell Rev 6, 383–414 (1992). https://doi.org/10.1007/BF00123691

  • DOI: https://doi.org/10.1007/BF00123691
