
Syntactic parsing as a knowledge acquisition problem

Long Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1319)

Abstract

Corpus linguistics involves the construction and annotation of large databases of text drawn from spoken and written language. These corpora have applications in natural language processing (NLP) and in grammar teaching. Annotating them brings the knowledge acquisition (KA) “bottleneck” into a new application area. This paper introduces parse checking as a KA problem and compares it with other tree-oriented KA methodologies, such as laddering and clustering. It argues that corpus linguistics represents a significant application area for KA. The laddering tools discussed here have been used to process thousands of tree structures. The paper compares two tools in use on the ICE-GB corpus. One of them, ICE Tree II, exploits the structure of grammatical trees more fully than the other. Timing results show a main learning effect that dominates any difference between the two tools. However, the more integrated tool reduces the scope for error.

Keywords

Noun Phrase; Syntax Tree; Repertory Grid; Syntactic Parsing; English Usage



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

Department of English (Survey of English Usage), University College London, London, UK
