Skip to main content

Unsupervised Identification of Persian Compound Verbs

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI,volume 7094)

Abstract

One of the main tasks related to multiword expressions (MWEs) is compound verb identification. There have been so many works on unsupervised identification of multiword verbs in many languages, but there has not been any conspicuous work on Persian language yet. Persian multiword verbs (known as compound verbs), are a kind of light verb construction (LVC) that have syntactic flexibility such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics have made the task in Persian very difficult. In this paper, two different unsupervised methods have been proposed to automatically detect compound verbs in Persian. In the first method, extending the concept of pointwise mutual information (PMI) measure, a bootstrapping method has been applied. In the second approach, K-means clustering algorithm is used. Our experiments show that the proposed approaches have gained results superior to the baseline which uses PMI measure as its association metric.

Keywords

  • multiword expression
  • light verb constructions
  • unsupervised identification
  • bootstrapping
  • K-means
  • Persian

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-642-25324-9_34
  • Chapter length: 13 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   84.99
Price excludes VAT (USA)
  • ISBN: 978-3-642-25324-9
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   109.99
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Smadja, F.: Retrieving collocations from text: Xtract. Computational Linguistics 19(1), 143–177 (1993)

    Google Scholar 

  2. Choueka, Y., Klein, T., Neuwitz, E.: Automatic retrieval of frequent idiomatic and collocational expressions in a large corpus. Journal for Literary and Linguistic Computing 4(1), 34–38 (1983)

    Google Scholar 

  3. Evert, S.: Corpora and collocations. In: Corpus Linguistics. An International Handbook, pp. 1212–1248 (2009)

    Google Scholar 

  4. Pecina, P.: Lexical association measures and collocation extraction. Language Resources and Evaluation 44(1), 137–158 (2010)

    CrossRef  Google Scholar 

  5. Diab, M.T., Bhutada, P.: Verb noun construction MWE token supervised classification. In: Workshop on Multiword Expressions (ACL-IJCNLP 2009), pp. 17–22. Association for Computational Linguistics, Suntec (2009)

    Google Scholar 

  6. Bannard, C., Baldwin, T., Lascarides, A.: A statistical approach to the semantics of verb-particles. In: ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp. 65–72. Association for Computational Linguistics (2003)

    Google Scholar 

  7. Diab, M.T., Krishna, M.: Unsupervised Classification of Verb Noun Multi-word Expression Tokens. In: Gelbukh, A. (ed.) CICLing 2009. LNCS, vol. 5449, pp. 98–110. Springer, Heidelberg (2009)

    CrossRef  Google Scholar 

  8. Fazly, A., Stevenson, S.: Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Workshop on A Broader Perspective on Multiword Expressions. Association for Computational Linguistics, Prague (2007)

    Google Scholar 

  9. Sag, I., et al.: Multiword expressions: A pain in the neck for NLP. In: 6th Conference on Natural Language Learning (COLING 2002), pp. 1–15 (2002)

    Google Scholar 

  10. Villavicencio, A., Copestake, A.: On the nature of idioms. In: LinGO Working (2002)

    Google Scholar 

  11. Fazly, A., Cook, P., Stevenson, S.: Unsupervised type and token identification of idiomatic expressions. Computational Linguistics 35(1), 61–103 (2009)

    CrossRef  Google Scholar 

  12. Villavicencio, A., Copestake, A.: Verb-particle constructions in a computational grammar of English. Citeseer (2002)

    Google Scholar 

  13. Karimi-Doostan, G.: Light verbs and structural case. Lingua 115(12), 1737–1756 (2005)

    CrossRef  Google Scholar 

  14. Fazly, A., Stevenson, S., North, R.: Automatically learning semantic knowledge about multiword predicates. Language Resources and Evaluation 41(1), 61–89 (2007)

    CrossRef  Google Scholar 

  15. Karimi-Doostan, G.: Event structure of verbal nouns and light verbs. In: Aspects of Iranian Linguistics: Papers in Honor of Mohammad Reza Bateni, pp. 209–226 (2008)

    Google Scholar 

  16. Fazly, A., Nematzadeh, A., Stevenson, S.: Acquiring Multiword Verbs: The Role of Statistical Evidence. In: 31st Annual Conference of the Cognitive Science Society, Amsterdam, The Netherlands, pp. 1222–1227 (2009)

    Google Scholar 

  17. Lin, D.: Automatic identification of non-compositional phrases. In: 37th Annual Meeting of Association for Computational Linguistics, pp. 317–324. Association for Computational Linguistics, College Park (1999)

    Google Scholar 

  18. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19(1), 61–74 (1993)

    Google Scholar 

  19. Pecina, P.: An extensive empirical study of collocation extraction methods. In: ACL Student Research Workshop. Association for Computational Linguistics (2005)

    Google Scholar 

  20. Hoang, H.H., Kim, S.N., Kan, M.-Y.: A re-examination of lexical association measures. In: Workshop on Multiword Expressions (ACL-IJCNLP 2009), pp. 31–39. Association for Computational Linguistics, Suntec (2009)

    Google Scholar 

  21. Krenn, B., Evert, S.: Can we do better than frequency? A case study on extracting PP-verb collocations. In: ACL Workshop on Collocations. Citeseer (2001)

    Google Scholar 

  22. Bu, F., Zhu, X., Li, M.: Measuring the non-compositionality of multiword expressions. In: 23rd International Conference on Computational Linguistics (Coling 2010). Association for Computational Linguistics, Beijing (2010)

    Google Scholar 

  23. Blaheta, D., Johnson, M.: Unsupervised learning of multi-word verbs. In: 39th Annual Meeting and 10th Conference of the European Chapter of the Association for Computational Linguistics (ACL 39), Toulouse, France (2001)

    Google Scholar 

  24. Baldwin, T., Villavicencio, A.: Extracting the unextractable: A case study on verb-particles. In: 6th Conference on Natural Language Learning (COLING 2002). Association for Computational Linguistics, Stroudsburg (2002)

    Google Scholar 

  25. Birke, J., Sarkar, A.: A clustering approach for the nearly unsupervised recognition of nonliteral language. In: EACL 2006, pp. 329–336 (2006)

    Google Scholar 

  26. Katz, G., Giesbrecht, E.: Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In: Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties. Association for Computational Linguistics, Sydney (2006)

    Google Scholar 

  27. Cook, P., Fazly, A., Stevenson, S.: Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context. In: Workshop on A Broader Perspective on Multiword Expressions. Association for Computational Linguistics, Prague (2007)

    Google Scholar 

  28. Fazly, A., Stevenson, S.: Automatically constructing a lexicon of verb phrase idiomatic combinations. In: EACL 2006 (2006)

    Google Scholar 

  29. Bannard, C.: A measure of syntactic flexibility for automatically identifying multiword expressions in corpora. In: Workshop on A Broader Perspective on Multiword Expressions. Association for Computational Linguistics, Prague (2007)

    Google Scholar 

  30. Cook, P., Fazly, A., Stevenson, S.: The VNC-Tokens Dataset. In: LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pp. 19–22 (2008)

    Google Scholar 

  31. Diab, M.T., Krishna, M.: Handling sparsity for verb noun MWE token classification. In: Workshop on Geometrical Models of Natural Language Semantics. Association for Computational Linguistics, Athens (2009)

    Google Scholar 

  32. Pecina, P.: A machine learning approach to multiword expression extraction. In: Shared Task for Multiword Expressions (MWE 2008), pp. 54–57 (2008)

    Google Scholar 

  33. Kaalep, H.-J., Muischnek, K.: Multi-word verbs of Estonian: a database and a corpus. In: LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pp. 23–26 (2008)

    Google Scholar 

  34. Bömová, A., et al.: The Prague Dependency Treebank: A three-level annotation scenario. Treebanks: Building and Using Parsed Corpora, 103–127 (2003)

    Google Scholar 

  35. Bijankhan, M.: The role of the corpus in writing a grammar: An introduction to a software. Iranian Journal of Linguistics 19(2) (2004)

    Google Scholar 

  36. Fazly, A.: Automatic acquisition of lexical knowledge about multiword predicates. Citeseer (2007)

    Google Scholar 

  37. Dabir-Moghaddam, M.: Compound verbs in Persian. Studies in the Linguistic Sciences 27(2), 25–59 (1997)

    Google Scholar 

  38. Family, N.: Explorations of Semantic Space: The Case of Light Verb Constructions in Persian. In: Ecole des Hautes Etudes en Sciences Sociales, Paris, France (2006)

    Google Scholar 

  39. Pantcheva, M.: First Phase Syntax of Persian Complex Predicates: Argument Structure and Telicity. Journal of South Asian Linguistics 2(1) (2010)

    Google Scholar 

  40. Müller, S.: Persian complex predicates and the limits of inheritance-based analyses. Journal of Linguistics 46(03), 601–655 (2010)

    CrossRef  Google Scholar 

  41. Karimi Doostan, G.: Separability of light verb constructions in Persian. Studia Linguistica 65(1), 70–95 (2011)

    CrossRef  Google Scholar 

  42. Ghomeshi, J.: Non-projecting nouns and the ezafe: construction in Persian. Natural Language & Linguistic Theory 15(4), 729–788 (1997)

    CrossRef  Google Scholar 

  43. Anvari, H., Ahmadi-Givi, H.: Persian grammar 2, 2nd edn. Fatemi, Tehran (2006)

    Google Scholar 

  44. Deza, E., Deza, M.M.: Encyclopedia of Distances. Springer, Heidelberg (2009)

    CrossRef  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rasooli, M.S., Faili, H., Minaei-Bidgoli, B. (2011). Unsupervised Identification of Persian Compound Verbs. In: Batyrshin, I., Sidorov, G. (eds) Advances in Artificial Intelligence. MICAI 2011. Lecture Notes in Computer Science(), vol 7094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25324-9_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-25324-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25323-2

  • Online ISBN: 978-3-642-25324-9

  • eBook Packages: Computer ScienceComputer Science (R0)