Dependency Language Modeling Using KNN and PLSI

  • Hiram Calvo
  • Kentaro Inui
  • Yuji Matsumoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5845)


In this paper we present a comparison of two language models based on dependency triples. We explore using the verb alone to predict the most plausible argument, as in selectional preferences, as well as using both the verb and another argument as predictors. The latter approach suffers from data sparseness, which must be addressed with smoothing techniques. Based on our results with the K-Nearest Neighbors (KNN) algorithm, we conclude that adding more information is useful for attaining higher precision, whereas the PLSI model proved sensitive to this added information, yielding better results with the simpler (verb-only) model. Our results suggest that combining the strengths of both algorithms would provide the best results.
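To make the contrast concrete, here is a minimal sketch (not the paper's implementation) of the two conditioning schemes over toy dependency triples. The data, function names, and the linear interpolation used as a stand-in for the paper's KNN/PLSI smoothing are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy dependency data: (verb, subject, object) triples.
# Purely illustrative; the paper extracts triples from parsed corpora.
data = [
    ("eat", "man", "apple"), ("eat", "man", "bread"),
    ("eat", "dog", "bone"), ("drink", "man", "water"),
    ("drink", "dog", "water"), ("drive", "man", "car"),
]

verb_obj = defaultdict(Counter)       # object counts given verb only
verb_subj_obj = defaultdict(Counter)  # object counts given (verb, subject)
for v, s, o in data:
    verb_obj[v][o] += 1
    verb_subj_obj[(v, s)][o] += 1

def p_verb_only(verb, obj):
    """Selectional-preference-style estimate P(obj | verb)."""
    total = sum(verb_obj[verb].values())
    return verb_obj[verb][obj] / total if total else 0.0

def p_verb_arg(verb, subj, obj, lam=0.7):
    """P(obj | verb, subj), interpolated back to the verb-only model.

    The richer condition is sparser, so unseen (verb, subj) pairs fall
    back to the verb-only estimate. This linear smoothing is a simple
    stand-in for the KNN and PLSI smoothing compared in the paper.
    """
    total = sum(verb_subj_obj[(verb, subj)].values())
    p_pair = verb_subj_obj[(verb, subj)][obj] / total if total else 0.0
    return lam * p_pair + (1 - lam) * p_verb_only(verb, obj)

print(p_verb_only("eat", "apple"))          # verb-only estimate
print(p_verb_arg("eat", "man", "apple"))    # verb + argument estimate
```

The richer model sharpens predictions when the (verb, argument) pair has been observed, but degrades to the verb-only estimate otherwise, which is exactly the sparseness trade-off the paper studies.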


Keywords: Data Sparseness · Selectional Preference · Probabilistic Latent Semantic Analysis · Plausible Argument · Human Language Technology




  1. Agirre, E., Martinez, D.: Learning class-to-class selectional preferences. In: Workshop on Computational Natural Language Learning, ACL (2001)
  2. Bolshakov, I.A., Galicia-Haro, S.N., Gelbukh, A.: Detection and Correction of Malapropisms in Spanish by means of Internet Search. In: Matoušek, V., Mautner, P., Pavelka, T. (eds.) TSD 2005. LNCS (LNAI), vol. 3658, pp. 115–122. Springer, Heidelberg (2005)
  3. Budanitsky, A., Hirst, G.: Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In: NAACL Workshop on WordNet and other lexical resources (2001)
  4. Calvo, H., Gelbukh, A., Kilgarriff, A.: Automatic Thesaurus vs. WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 177–188. Springer, Heidelberg (2005)
  5. Clarkson, P.R., Rosenfeld, R.: Statistical Language Modeling Using the CMU-Cambridge Toolkit. In: Procs. ESCA Eurospeech (1997)
  6. Foley, W.A.: Anthropological linguistics: An introduction. Blackwell Publishing, Malden (1997)
  7. Fuji, A., Iwayama, M. (eds.): Patent Retrieval Task (PATENT). Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access (2005)
  8. Gao, J., Suzuki, H.: Learning of dependency structure for language modeling. In: Procs. of the 41st Annual Meeting of the Association for Computational Linguistics, vol. 1 (2003)
  9. Gao, J., Nie, J.Y., Wu, G., Cao, G.: Dependence language model for information retrieval. In: Procs. of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 170–177 (2004)
  10. Gelbukh, A., Sidorov, G.: On Indirect Anaphora Resolution. In: PACLING 1999, pp. 181–190 (1999)
  11. Hofmann, T.: Probabilistic Latent Semantic Analysis. In: Uncertainty in Artificial Intelligence, UAI (1999)
  12. Kawahara, D., Kurohashi, S.: Japanese Case Frame Construction by Coupling the Verb and its Closest Case Component. In: 1st Intl. Conf. on Human Language Technology Research, ACL (2001)
  13. Lee, L.: Measures of Distributional Similarity. In: Procs. 37th ACL (1999)
  14. Lin, D.: Automatic Retrieval and Clustering of Similar Words. In: Procs. 36th Annual Meeting of the ACL and 17th International Conference on Computational Linguistics (1998)
  15. Lin, D.: Dependency-based Evaluation of MINIPAR. In: Proc. Workshop on the Evaluation of Parsing Systems (1998)
  16. McCarthy, D., Carroll, J.: Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences. Computational Linguistics 29(4), 639–654 (2003)
  17. McCarthy, D., Koeling, R., Weeds, J., Carroll, J.: Finding predominant senses in untagged text. In: Procs. 42nd Meeting of the ACL, pp. 280–287 (2004)
  18. Padó, S., Lapata, M.: Dependency-Based Construction of Semantic Space Models. Computational Linguistics 33(2), 161–199 (2007)
  19. Padó, U., Crocker, M., Keller, F.: Modeling Semantic Role Plausibility in Human Sentence Processing. In: Procs. EACL (2006)
  20. Ponzetto, S.P., Strube, M.: Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution. In: Procs. Human Language Technology Conference, NAACL, pp. 192–199 (2006)
  21. Resnik, P.: Selectional Constraints: An Information-Theoretic Model and its Computational Realization. Cognition 61, 127–159 (1996)
  22. Rosenfeld, R.: Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE 88(8), 1270–1278 (2000)
  23. Salgueiro, P., Alexandre, T., Marcu, D., Volpe Nunes, M.: Unsupervised Learning of Verb Argument Structures. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 59–70. Springer, Heidelberg (2006)
  24. Weeds, J., Weir, D.: A General Framework for Distributional Similarity. In: Procs. Conf. on EMNLP, vol. 10, pp. 81–88 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hiram Calvo (1, 2)
  • Kentaro Inui (2)
  • Yuji Matsumoto (2)

  1. Center for Computing Research, National Polytechnic Institute, Mexico
  2. Nara Institute of Science and Technology, Nara, Japan
