Abstract
This chapter describes a system that automatically acquires subcategorization information for Swedish verbs from corpora, using available lexical resources to reduce the effects of the knowledge acquisition bottleneck. I emphasize the need for such knowledge and focus on automatically validating the content of these resources and completing it, both qualitatively and quantitatively, as far as possible. The aim of this work is to maintain a complete knowledge base that can be easily updated and extended with minimal human supervision. This knowledge is of vital importance in a broad range of applications, such as lexically specified grammars (e.g. HPSG), parsing, and language learning.
Copyright information
© 1999 Springer Science+Business Media Dordrecht
Cite this chapter
Kokkinakis, D. (1999). Corpus-Based Argument Identification Using a Statistically Enriched Valency MRD. In: Saint-Dizier, P. (ed.) Predicative Forms in Natural Language and in Lexical Knowledge Bases. Text, Speech and Language Technology, vol 6. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2746-4_7
DOI: https://doi.org/10.1007/978-94-017-2746-4_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5146-2
Online ISBN: 978-94-017-2746-4
eBook Packages: Springer Book Archive