Abstract
This paper addresses the identification of Reduplicated Multiword Expressions (RMWEs), which is important for natural language applications such as Machine Translation and Information Retrieval. In the present task, reduplicated MWEs are identified in Manipuri language texts using a Conditional Random Field (CRF) tool. Manipuri is highly agglutinative, and reduplication is frequent in the language. The features selected for running the CRF tool include stem words, number of suffixes, number of prefixes, prefixes in the word, suffixes in the word, Part of Speech (POS) of the surrounding words, surrounding stem words, length of the word, word frequency and a digit feature. Experimental results show the effectiveness of the proposed approach, with overall average Recall, Precision and F-Score values of 92.91%, 91.90% and 92.40%, respectively.
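The feature set listed in the abstract can be illustrated with a minimal sketch of per-token feature extraction for a CRF tagger. This is not the authors' implementation: the affix inventories, the greedy stripping rule, the toy English tokens, the POS tags, and the names `strip_affixes`, `token_features` and `f_score` are all assumptions made for illustration. The last function only checks that the reported F-Score is consistent with the harmonic mean of the reported Recall and Precision.

```python
from collections import Counter

# Hypothetical affix inventories (toy English forms, NOT Manipuri affixes
# and NOT taken from the paper); a real system would use language-specific lists.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def strip_affixes(word):
    """Greedily strip known affixes; return (prefixes, stem, suffixes)."""
    prefixes, suffixes = [], []
    stem = word
    stripped = True
    while stripped:  # peel prefixes from the left
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= 2:
                prefixes.append(p)
                stem = stem[len(p):]
                stripped = True
                break
    stripped = True
    while stripped:  # peel suffixes from the right
        stripped = False
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= 2:
                suffixes.insert(0, s)
                stem = stem[:-len(s)]
                stripped = True
                break
    return prefixes, stem, suffixes


def token_features(sent, i, freq):
    """Build the feature dict for token i of sent, a list of (word, pos) pairs."""
    word, _ = sent[i]
    prefixes, stem, suffixes = strip_affixes(word)
    prev_word, prev_pos = sent[i - 1] if i > 0 else ("", "BOS")
    next_word, next_pos = sent[i + 1] if i < len(sent) - 1 else ("", "EOS")
    return {
        "stem": stem,
        "num_prefixes": len(prefixes),
        "num_suffixes": len(suffixes),
        "prefixes": "+".join(prefixes),
        "suffixes": "+".join(suffixes),
        "prev_pos": prev_pos,            # POS of the surrounding words
        "next_pos": next_pos,
        "prev_stem": strip_affixes(prev_word)[1],  # surrounding stem words
        "next_stem": strip_affixes(next_word)[1],
        "length": len(word),
        "frequency": freq[word],         # corpus frequency of the surface form
        "is_digit": word.isdigit(),      # digit feature
    }


def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sent = [("unpacked", "VB"), ("boxes", "NN"), ("42", "CD")]
    freq = Counter(word for word, _ in sent)
    for i in range(len(sent)):
        print(token_features(sent, i, freq))
    print(round(f_score(92.91, 91.90), 2))
```

Feature dictionaries of this shape are the usual input to CRF toolkits that accept one feature mapping per token; the exact template format depends on the tool, which the abstract does not name.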
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nongmeikapam, K., Laishram, D., Singh, N.B., Chanu, N.M., Bandyopadhyay, S. (2011). Identification of Reduplicated Multiword Expressions Using CRF. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_4
DOI: https://doi.org/10.1007/978-3-642-19400-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)