Abstract
Identifying compound verbs is one of the main tasks related to multiword expressions (MWEs). Unsupervised identification of multiword verbs has been studied extensively in many languages, but no notable work has yet addressed Persian. Persian multiword verbs, known as compound verbs, are a kind of light verb construction (LVC) with considerable syntactic flexibility: the distance between the light verb and the nonverbal element is unrestricted, and the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. This paper proposes two unsupervised methods for automatically detecting Persian compound verbs. The first applies a bootstrapping method that extends the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that both approaches outperform a baseline that uses PMI as its association measure.
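As a rough illustration of the PMI association measure that the baseline relies on, the sketch below scores candidate (nonverbal element, light verb) pairs using the standard definition PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated by relative frequency. It is not the paper's implementation: the function name `pmi_scores` and the transliterated toy pairs are assumptions made for illustration only, and the bootstrapping extension and K-means method are not reproduced here.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return the PMI of each distinct (nonverbal, light_verb) pair.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), where all probabilities
    are relative frequencies over the observed list of pairs.
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)  # nonverbal element frequencies
    y_counts = Counter(y for _, y in pairs)  # light verb frequencies
    total = len(pairs)

    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy co-occurrence data (transliterated), not from the paper.
toy_pairs = [
    ("harf", "zadan"), ("harf", "zadan"), ("harf", "zadan"),
    ("kar", "kardan"), ("kar", "kardan"),
    ("harf", "kardan"),
    ("ketab", "kardan"), ("ketab", "zadan"),
]

scores = pmi_scores(toy_pairs)
# Rank candidates from most to least strongly associated.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies predict receive positive scores, so a simple baseline could treat the top-ranked pairs as compound verb candidates.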
Keywords
- multiword expression
- light verb constructions
- unsupervised identification
- bootstrapping
- K-means
- Persian
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rasooli, M.S., Faili, H., Minaei-Bidgoli, B. (2011). Unsupervised Identification of Persian Compound Verbs. In: Batyrshin, I., Sidorov, G. (eds) Advances in Artificial Intelligence. MICAI 2011. Lecture Notes in Computer Science, vol 7094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25324-9_34
DOI: https://doi.org/10.1007/978-3-642-25324-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25323-2
Online ISBN: 978-3-642-25324-9
eBook Packages: Computer Science, Computer Science (R0)