Identification of Reduplicated Multiword Expressions Using CRF

Nongmeikapam, Kishorjit; Laishram, Dhiraj; Singh, Naorem Bikramjit; Chanu, Ngariyanbam Mayekleima; Bandyopadhyay, Sivaji

doi:10.1007/978-3-642-19400-9_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Included in the following conference series: International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper deals with the identification of Reduplicated Multiword Expressions (RMWEs), which is important for natural language applications such as Machine Translation and Information Retrieval. In the present task, reduplicated MWEs are identified in Manipuri language texts using a Conditional Random Field (CRF) tool. Manipuri is highly agglutinative in nature, and reduplication is very common in the language. The important features selected for running the CRF tool include stem words, number of suffixes, number of prefixes, prefixes in the word, suffixes in the word, Part of Speech (POS) of the surrounding words, surrounding stem words, length of the word, word frequency and a digit feature. Experimental results show the effectiveness of the proposed approach, with overall average Recall, Precision and F-Score values of 92.91%, 91.90% and 92.40% respectively.
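
The feature set listed in the abstract can be pictured as one feature dictionary per token, which is the input shape many CRF toolkits accept. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `token_features` and `f_score`, the token dictionary layout, the one-word context window, and the placeholder tokens and tags are all hypothetical, and the stems, affixes and POS tags are assumed to come from an upstream stemmer and tagger. The `f_score` helper uses the standard harmonic mean of precision and recall, which may differ from how the paper averaged its scores.

```python
from collections import Counter


def token_features(sent, i, freq):
    """Build a feature dict for token i of a pre-analysed sentence.

    Each token is a dict with keys: word, stem, prefixes, suffixes, pos.
    This layout is an assumption made for illustration.
    """
    tok = sent[i]
    feats = {
        "stem": tok["stem"],
        "n_prefixes": len(tok["prefixes"]),
        "n_suffixes": len(tok["suffixes"]),
        "prefixes": "+".join(tok["prefixes"]),
        "suffixes": "+".join(tok["suffixes"]),
        "length": len(tok["word"]),
        "frequency": freq[tok["word"]],
        "has_digit": any(ch.isdigit() for ch in tok["word"]),
    }
    # Context features: POS and stem of the neighbouring words.
    # A window of one word on each side is an assumption.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(sent):
            feats[f"pos[{offset:+d}]"] = sent[j]["pos"]
            feats[f"stem[{offset:+d}]"] = sent[j]["stem"]
    return feats


def f_score(recall, precision):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens and tags, not real Manipuri data.
sentence = [
    {"word": "preSTEMsuf", "stem": "STEM", "prefixes": ["pre"],
     "suffixes": ["suf"], "pos": "TAG_A"},
    {"word": "STEMsuf", "stem": "STEM", "prefixes": [],
     "suffixes": ["suf"], "pos": "TAG_B"},
    {"word": "x42", "stem": "x42", "prefixes": [],
     "suffixes": [], "pos": "TAG_C"},
]

# Word frequency is counted over the toy sentence only; a real system
# would count over the whole corpus.
freq = Counter(tok["word"] for tok in sentence)
features = [token_features(sentence, i, freq) for i in range(len(sentence))]
```

Under the standard F1 definition, `f_score(92.91, 91.90)` comes to roughly 92.40, which agrees with the F-Score reported in the abstract. A list of such dictionaries per sentence, paired with a label sequence marking reduplicated expressions, would then be passed to whichever CRF toolkit is used.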

Author information

Authors and Affiliations

Dept. of Computer Sc. & Engg., Manipur Institute of Technology, Manipur University, Imphal, India
Kishorjit Nongmeikapam, Dhiraj Laishram & Naorem Bikramjit Singh
Dept. of Education Technology, Kanan Devi Memorial College of Education, Imphal, India
Ngariyanbam Mayekleima Chanu
Dept. of Computer Sc. & Engg., Jadavpur University, Jadavpur, Kolkata, India
Sivaji Bandyopadhyay

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander F. Gelbukh

About this paper

Cite this paper

Nongmeikapam, K., Laishram, D., Singh, N.B., Chanu, N.M., Bandyopadhyay, S. (2011). Identification of Reduplicated Multiword Expressions Using CRF. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_4

DOI: https://doi.org/10.1007/978-3-642-19400-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)
