Unsupervised Identification of Persian Compound Verbs
Abstract
Compound verb identification is one of the main tasks related to multiword expressions (MWEs). Unsupervised identification of multiword verbs has been studied extensively in many languages, but no notable work has yet addressed Persian. Persian multiword verbs, known as compound verbs, are a kind of light verb construction (LVC) with considerable syntactic flexibility: the distance in words between the light verb and the nonverbal element is unrestricted, and the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. In this paper, two unsupervised methods are proposed to automatically detect compound verbs in Persian. The first method applies bootstrapping, extending the concept of the pointwise mutual information (PMI) measure. The second method uses the K-means clustering algorithm. Our experiments show that both proposed approaches outperform a baseline that uses the PMI measure as its association metric.
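As a point of reference for the baseline's association metric, the following is a minimal sketch of plain PMI computed over (nonverbal element, light verb) pairs. It does not reproduce the paper's extended PMI or its bootstrapping procedure, which the abstract does not specify. The function name `pmi_scores`, the transliterated pairs, and their counts are illustrative assumptions, not data from the paper.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return plain PMI for each distinct (nonverbal_element, light_verb) pair.

    `pairs` is a list of co-occurring (nonverbal_element, light_verb) tuples,
    assumed to have been extracted from a corpus beforehand.
    """
    pair_counts = Counter(pairs)
    nv_counts = Counter(nv for nv, _ in pairs)
    lv_counts = Counter(lv for _, lv in pairs)
    total = len(pairs)

    scores = {}
    for (nv, lv), joint in pair_counts.items():
        # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
        #           = log2( joint * total / (count(x) * count(y)) )
        scores[(nv, lv)] = math.log2(joint * total / (nv_counts[nv] * lv_counts[lv]))
    return scores


# Toy, hand-made co-occurrence data (hypothetical counts, for illustration only).
toy_pairs = (
    [("harf", "zadan")] * 3
    + [("kar", "kardan")] * 3
    + [("ketab", "kardan")]
    + [("ketab", "zadan")]
)

scores = pmi_scores(toy_pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy data the pairs that always occur together receive a higher score than the pairs whose noun combines with both verbs, which is the behaviour a PMI-based ranking relies on. A real candidate extractor would also have to handle the intervening words and inflected nonverbal elements described above.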
Keywords
multiword expression, light verb constructions, unsupervised identification, bootstrapping, K-means, Persian