Corpus-Based Argument Identification Using a Statistically Enriched Valency MRD

Kokkinakis, Dimitrios

doi:10.1007/978-94-017-2746-4_7

Part of the book series: Text, Speech and Language Technology (TLTB, volume 6)

Abstract

This chapter describes a system that automatically acquires subcategorization information for Swedish verbs from corpora, using available lexical resources to reduce the effects of the knowledge acquisition bottleneck. I emphasize the need for such knowledge and focus on automatically validating the content of these resources and completing it, both qualitatively and quantitatively, as far as possible. The aim is to maintain a complete knowledge base that can be easily updated and extended with minimal human supervision. Such knowledge is vital to a broad range of applications, including lexically specified grammars (e.g. HPSG), parsing, and language learning.
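The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of validating and completing a valency dictionary with corpus counts. It is not the chapter's method. The function `enrich_lexicon`, the frame labels (`NP`, `NP_NP`, `NP_PP`, `INTRANS`), the threshold `min_rel_freq`, and the toy counts are all hypothetical, and the sketch assumes the corpus has already been analysed into (verb, frame) pairs.

```python
from collections import Counter, defaultdict


def enrich_lexicon(lexicon, observations, min_rel_freq=0.1):
    """Attach corpus relative frequencies to dictionary frames.

    lexicon:      dict mapping a verb to the set of frames the dictionary lists
    observations: iterable of (verb, frame) pairs extracted from a corpus
    min_rel_freq: hypothetical cut-off for proposing frames the dictionary lacks
    """
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1

    enriched = {}
    for verb, frames in lexicon.items():
        total = sum(counts[verb].values())
        # Validation: relative frequency of each listed frame (0.0 if unattested).
        listed = {f: (counts[verb][f] / total if total else 0.0) for f in frames}
        # Completion: frames seen in the corpus but absent from the dictionary,
        # kept only when frequent enough to deserve a human look.
        candidates = {
            f: c / total
            for f, c in counts[verb].items()
            if f not in frames and c / total >= min_rel_freq
        }
        enriched[verb] = {"listed": listed, "candidates": candidates}
    return enriched


# Toy data, invented for illustration only.
lexicon = {"ge": {"NP", "NP_NP"}, "sova": {"INTRANS"}}
observations = (
    [("ge", "NP_NP")] * 6 + [("ge", "NP_PP")] * 3 + [("ge", "NP")] * 1
)
result = enrich_lexicon(lexicon, observations)
```

In this toy run, the listed frames of `ge` receive relative frequencies, the unlisted frame `NP_PP` is proposed as a candidate for review, and `sova` has no corpus evidence, so its listed frame stays at 0.0. A real system would need a statistical filter for noisy observations rather than a fixed threshold.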

Author information

Authors and Affiliations

Språkdata/Dept. of Swedish Language, Göteborg University, SE 405 20, Sweden
Dimitrios Kokkinakis

Editor information

Editors and Affiliations

IRIT-CNRS, Toulouse, France
Patrick Saint-Dizier

About this chapter

Cite this chapter

Kokkinakis, D. (1999). Corpus-Based Argument Identification Using a Statistically Enriched Valency MRD. In: Saint-Dizier, P. (ed.) Predicative Forms in Natural Language and in Lexical Knowledge Bases. Text, Speech and Language Technology, vol 6. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2746-4_7

DOI: https://doi.org/10.1007/978-94-017-2746-4_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5146-2
Online ISBN: 978-94-017-2746-4
eBook Packages: Springer Book Archive