Corpus-driven unsupervised learning of verb subcategorization frames

  • Roberto Basili
  • Maria Teresa Pazienza
  • Michele Vindigni
Machine Learning 2
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1321)


The behavior of verbs in sublanguages is highly specific and does not follow general principles of lexical decomposition. NLP applications require domain-specific lexicons for tasks such as surface parsing and shallow semantic interpretation. The reduced set of verb senses specific to a given domain is better suited to efficient processing in real-world tasks (e.g. information extraction and retrieval). This paper proposes a method for learning verb subcategorization patterns from corpora. Conceptual clustering techniques are applied to the results of surface parsing in order to extract the relevant domain-typical senses and automatically build a lexicon of subcategorization frames. The aim is to learn a core of lexico-grammatical knowledge that can support more sophisticated parsing strategies in a target NLP application. Results obtained for Italian from several corpora are presented.
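As an illustration of the kind of conceptual clustering the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes that each parsed occurrence of a verb has already been reduced to the set of syntactic relations observed around it, and it groups occurrences into formal concepts (the nodes of a Galois lattice). The sentence identifiers, the relation labels (`subj`, `obj`, `pp_a`) and the function name `formal_concepts` are all hypothetical.

```python
# Hypothetical toy data: one entry per parsed occurrence of a single verb,
# mapped to the set of syntactic relations a surface parser found for it.
observations = {
    "s1": frozenset({"subj", "obj"}),
    "s2": frozenset({"subj", "obj", "pp_a"}),
    "s3": frozenset({"subj", "pp_a"}),
    "s4": frozenset({"subj", "obj"}),
}


def formal_concepts(context):
    """Return {intent: extent} for every formal concept of a small context.

    An intent is a set of relations closed under intersection of the
    observed relation sets; its extent is every occurrence that shows
    all relations in the intent.
    """
    all_attrs = frozenset().union(*context.values())
    # Start from the full attribute set, then close under intersection
    # with each occurrence's relation set.
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {attrs & intent for intent in intents}
    return {
        intent: frozenset(o for o, attrs in context.items() if intent <= attrs)
        for intent in intents
    }


# Rank candidate frames by how many occurrences support them.
ranked = sorted(formal_concepts(observations).items(),
                key=lambda item: (-len(item[1]), sorted(item[0])))
for intent, extent in ranked:
    print(sorted(intent), "supported by", sorted(extent))
```

In this sketch each intent plays the role of a candidate subcategorization frame and the size of its extent is a crude measure of corpus support; how frames are actually scored and selected is not specified in the abstract.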


Keywords: Conceptual Cluster, Legal Domain, Syntactic Relation, Grammatical Relation, Galois Lattice
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Roberto Basili (1)
  • Maria Teresa Pazienza (1)
  • Michele Vindigni (1)
  1. Department of Computer Science, System and Production, University of Roma, Tor Vergata, Roma, Italy
