Corpus-Based Argument Identification Using a Statistically Enriched Valency MRD

Predicative Forms in Natural Language and in Lexical Knowledge Bases

Part of the book series: Text, Speech and Language Technology (TLTB, volume 6)

Abstract

This chapter describes a system that automatically acquires subcategorization information for Swedish verbs from corpora, using available lexical resources to reduce the effects of the knowledge acquisition bottleneck. I emphasize the need for such knowledge and focus on the automatic validation of these resources and on the qualitative and quantitative completion of their content, as far as that is possible. The aim of this work is to maintain a complete knowledge base that can be easily updated and extended with minimal human supervision. Such knowledge is vital to a broad range of applications, including lexically specified grammars (e.g. HPSG), parsing, and language learning.

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Kokkinakis, D. (1999). Corpus-Based Argument Identification Using a Statistically Enriched Valency MRD. In: Saint-Dizier, P. (ed.) Predicative Forms in Natural Language and in Lexical Knowledge Bases. Text, Speech and Language Technology, vol 6. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2746-4_7

  • DOI: https://doi.org/10.1007/978-94-017-2746-4_7

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5146-2

  • Online ISBN: 978-94-017-2746-4

  • eBook Packages: Springer Book Archive
