Syntactic parsing as a knowledge acquisition problem
Corpus linguistics involves the construction and annotation of large databases of text from spoken and written language. These corpora have applications in natural language processing (NLP) and in the teaching of grammar. Annotating them raises the knowledge acquisition (KA) “bottleneck” problem in a new application area. This paper introduces parse checking as a KA problem and compares it to other tree-oriented KA methodologies such as laddering and clustering. It argues that corpus linguistics represents a significant application area for KA. The laddering tools discussed here have been used to process thousands of tree structures. The paper compares two tools in use on the ICE-GB corpus. One tool, ICE Tree II, exploits the structure of grammatical trees more fully than the other. Timing results show a main learning effect that dominates any difference between the two tools. However, the more integrated tool reduces the scope for error.
- Book Title: Knowledge Acquisition, Modeling and Management
- Book Subtitle: 10th European Workshop, EKAW '97, Sant Feliu de Guixols, Catalonia, Spain, October 15–18, 1997, Proceedings
- Pages: 285–300
- Series Title: Lecture Notes in Computer Science
- Series Subtitle: Lecture Notes in Artificial Intelligence
- Publisher: Springer Berlin Heidelberg