
Identification of Reduplicated Multiword Expressions Using CRF

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Abstract

This paper deals with the identification of Reduplicated Multiword Expressions (RMWEs), which is important for natural language applications such as Machine Translation and Information Retrieval. In the present task, reduplicated MWEs are identified in Manipuri language texts using a Conditional Random Field (CRF) tool. Manipuri is highly agglutinative, and reduplication is very common in the language. The important features selected for running the CRF tool include stem words, number of suffixes, number of prefixes, prefixes in the word, suffixes in the word, Part of Speech (POS) of the surrounding words, surrounding stem words, length of the word, word frequency, and a digit feature. Experimental results show the effectiveness of the proposed approach, with overall average Recall, Precision and F-Score values of 92.91%, 91.90% and 92.40%, respectively.
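The feature list in the abstract can be illustrated with a minimal sketch of a per-token feature function of the kind usually fed to a CRF toolkit. Everything here is an assumption made for illustration: the abstract does not state the feature encoding, the context window, or the CRF tool, so the function name `token_features`, the token dictionary keys, the window of one word on each side, and the placeholder tokens (which are not real Manipuri words) are all hypothetical. The small `f_score` helper only checks that the reported F-Score is consistent with the harmonic mean of the reported Precision and Recall.

```python
from collections import Counter


def token_features(sent, i, freq, window=1):
    """Build a feature dict for token i of a sentence.

    `sent` is a list of dicts with hypothetical keys:
    'word', 'stem', 'prefixes', 'suffixes', 'pos'.
    `freq` maps each word to its corpus frequency.
    """
    tok = sent[i]
    feats = {
        "stem": tok["stem"],                      # stem word
        "n_prefixes": len(tok["prefixes"]),       # number of prefixes
        "n_suffixes": len(tok["suffixes"]),       # number of suffixes
        "prefixes": "+".join(tok["prefixes"]),    # prefixes in the word
        "suffixes": "+".join(tok["suffixes"]),    # suffixes in the word
        "length": len(tok["word"]),               # length of the word
        "frequency": freq[tok["word"]],           # word frequency
        "has_digit": any(ch.isdigit() for ch in tok["word"]),  # digit feature
    }
    # POS tags and stems of the surrounding words (window size is assumed).
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(sent):
            feats[f"pos[{off:+d}]"] = sent[j]["pos"]
            feats[f"stem[{off:+d}]"] = sent[j]["stem"]
    return feats


def f_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-Score)."""
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens for illustration only; not real Manipuri data.
sent = [
    {"word": "patak", "stem": "tak", "prefixes": ["pa"], "suffixes": [], "pos": "VB"},
    {"word": "takna", "stem": "tak", "prefixes": [], "suffixes": ["na"], "pos": "VB"},
    {"word": "2011", "stem": "2011", "prefixes": [], "suffixes": [], "pos": "CD"},
]
freq = Counter(tok["word"] for tok in sent)

feats = token_features(sent, 1, freq)
reported_f = round(f_score(91.90, 92.91), 2)
```

Each token's dictionary would then be paired with a label marking whether the token belongs to a reduplicated MWE, and the resulting sequences passed to whichever CRF implementation is used. The F-Score helper treats the reported values as a single Precision/Recall pair; since the abstract reports overall averages, exact agreement is not guaranteed in general, although here the rounded result matches the reported 92.40%.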

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nongmeikapam, K., Laishram, D., Singh, N.B., Chanu, N.M., Bandyopadhyay, S. (2011). Identification of Reduplicated Multiword Expressions Using CRF. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_4

  • DOI: https://doi.org/10.1007/978-3-642-19400-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19399-6

  • Online ISBN: 978-3-642-19400-9

  • eBook Packages: Computer Science, Computer Science (R0)
