Unsupervised Identification of Persian Compound Verbs

  • Mohammad Sadegh Rasooli
  • Heshaam Faili
  • Behrouz Minaei-Bidgoli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7094)


Compound verb identification is one of the main tasks related to multiword expressions (MWEs). Unsupervised identification of multiword verbs has been studied extensively in many languages, but no notable work has yet addressed Persian. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with considerable syntactic flexibility, such as an unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. In this paper, we propose two unsupervised methods to automatically detect compound verbs in Persian. The first is a bootstrapping method that extends the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that both approaches outperform a baseline that uses PMI as its association measure.
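
For orientation, the standard PMI of a pair (x, y) is log2( p(x, y) / (p(x) p(y)) ). The following is a minimal Python sketch of that measure computed from co-occurrence counts. It illustrates only the plain PMI association score named as the baseline, not the bootstrapping extension or the K-means method, whose details are not given in the abstract. The function name `pmi_scores` and the transliterated toy pairs are assumptions made for illustration and do not come from the paper's data.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return base-2 PMI for each distinct (nonverbal_element, light_verb) pair.

    `pairs` is a list of observed co-occurrences; probabilities are
    plain relative frequencies over that list (no smoothing).
    """
    pair_counts = Counter(pairs)
    nonverbal_counts = Counter(n for n, _ in pairs)
    verb_counts = Counter(v for _, v in pairs)
    total = len(pairs)

    scores = {}
    for (n, v), count in pair_counts.items():
        p_xy = count / total
        p_x = nonverbal_counts[n] / total
        p_y = verb_counts[v] / total
        scores[(n, v)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy observations (transliterated), for illustration only.
toy_pairs = [
    ("harf", "zadan"), ("harf", "zadan"), ("harf", "zadan"),
    ("ketab", "zadan"),
    ("ketab", "kardan"),
    ("kar", "kardan"), ("kar", "kardan"), ("kar", "kardan"),
]

scores = pmi_scores(toy_pairs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data the pairs that always occur together score higher than the pairs whose noun is shared between two verbs. A ranking of this kind is the sort of signal a PMI baseline relies on. With real corpora, low-frequency pairs tend to receive inflated PMI values, so a frequency threshold or smoothing is commonly applied.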


Keywords: multiword expression, light verb constructions, unsupervised identification, bootstrapping, K-means, Persian




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mohammad Sadegh Rasooli (1)
  • Heshaam Faili (2)
  • Behrouz Minaei-Bidgoli (1)

  1. Department of Computer Engineering, Iran University of Science and Technology, Iran
  2. School of Electrical & Computer Engineering, Tehran University, Iran
