Learning verbal transitivity using loglinear models

  • Nuno Miguel Marques
  • Gabriel Pereira Lopes
  • Carlos Agra Coelho
Regular Papers Applications of ML
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


In this paper we show how loglinear models can be used to cluster verbs based on their subcategorization preferences. We describe how information about the phrases or clauses a verb co-occurs with can be learned computationally from an automatically tagged corpus of 9,333,555 words. We use loglinear modeling to describe the relation between the acquired counts of the part-of-speech tags co-occurring with the verbs in predetermined positions. Based on these results, an unsupervised clustering algorithm is proposed.
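As a rough illustration of the kind of analysis the abstract describes, the sketch below fits the simplest loglinear model (independence between verb and tag) to a small verb-by-tag count table and computes the likelihood-ratio statistic G². It is not the authors' procedure: the verb names, the tag set, and all counts are hypothetical, and the paper may well use richer models and more context positions. A large G² would suggest that the two verbs differ in which tags follow them, which is the sort of evidence a clustering step could build on.

```python
import math


def independence_fit(table):
    """Fitted counts under the loglinear independence model
    log m_ij = lambda + lambda_i(verb) + lambda_j(tag).
    For this model the maximum-likelihood fit is row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).
    Cells with an observed count of zero contribute nothing."""
    expected = independence_fit(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Hypothetical data: rows are verbs, columns are counts of the
# part-of-speech tag found immediately after the verb.
tags = ["noun", "preposition", "other"]
counts = {
    "verb_a": [80, 15, 5],
    "verb_b": [10, 70, 20],
}

table = list(counts.values())
g2 = g_squared(table)
dof = (len(table) - 1) * (len(tags) - 1)
print(f"G^2 = {g2:.2f} on {dof} degrees of freedom")
```

When two verbs have proportional tag counts, the independence model fits exactly and G² is zero; the further G² is from zero relative to its degrees of freedom, the less plausible it is that the verbs share the same tag distribution.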


Subcategorization; Learning from Corpora; Loglinear Modeling; Clustering; Natural Language Processing



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Nuno Miguel Marques (1)
  • Gabriel Pereira Lopes (1)
  • Carlos Agra Coelho (2)
  1. Dep. Informática, FCT/UNL, Portugal
  2. Dep. Matemática, ISA/UTL, Portugal
