Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information

  • Maryam Aminian
  • Mohammad Sadegh Rasooli
  • Hossein Sameti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7912)

Abstract

Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics, with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs motivated us to apply unsupervised algorithms to Persian verb clustering. In this paper, we report experiments on inducing the semantic classes of Persian verbs based on Levin’s theory of verb classes. Syntactic information extracted from dependency trees is used as the base features for clustering the verbs. Since no manual classification of Persian verbs existed prior to this work, we prepared a manual classification of 265 verbs into 43 semantic classes. We show that the spectral clustering algorithm outperforms K-means and improves on the baseline algorithm by about 17% in F-measure and 0.13 in Rand index.
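The Rand index reported above measures how often two clusterings agree on whether a pair of items belongs together. The following is a minimal sketch of the standard (unadjusted) definition, not the authors' evaluation code; the function name, the toy class labels, and the cluster IDs are all illustrative assumptions, and the paper's exact evaluation setup may differ.

```python
from itertools import combinations


def rand_index(gold, predicted):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    items together, or both put them apart.
    """
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(gold)), 2))
    if not pairs:
        raise ValueError("need at least two items to form a pair")
    agreements = sum(
        (gold[i] == gold[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy data: five verbs with made-up gold classes
# and made-up cluster IDs from some clustering algorithm.
gold = ["motion", "motion", "transfer", "transfer", "perception"]
predicted = [0, 0, 1, 2, 2]
score = rand_index(gold, predicted)
print(score)
```

Because the measure only compares pair memberships, the cluster IDs do not need to match the gold class names; a difference of 0.13 therefore means agreement on 13 percentage points more of the verb pairs.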



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Maryam Aminian (1)
  • Mohammad Sadegh Rasooli (2)
  • Hossein Sameti (1)

  1. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
  2. Department of Computer Science, Columbia University, New York, USA
