International Journal of Information Technology, Volume 11, Issue 4, pp 667–675

Disambiguation using joint entropy in part of speech of written Myanmar text

  • Sin Thi Yar Myint
  • G. R. Sinha
Original Research

Abstract

The Myanmar language, also known as Burmese, has free word order, and the syntactic role of a word can vary with its position and the structure of the sentence. As a result, a single word in a Myanmar sentence can carry several ambiguous part-of-speech (POS) tags. This research presents a method for disambiguating the POS of words in written Myanmar text. We aim to remove this ambiguity and assign a single POS tag to each word of a sentence. The method has two steps: (i) the input sentence is segmented into words using syllable segmentation rules and a forward maximum matching approach with a monolingual Myanmar dictionary, and (ii) Joint Entropy (JE) is applied to each POS-ambiguous word in the sentence using a monolingual Myanmar tagged corpus. The joint probability value can provide useful and accurate POS disambiguation for the free word order and structure of Myanmar text. The monolingual Myanmar tagged corpus and the tagged dictionary were created with 620 sentences and 15,000 words, respectively. This study attempts a practical word segmentation and POS tagging system that can help overcome a bottleneck of machine translation systems from Myanmar to other languages and of research activities related to natural language processing (NLP).
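
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not give the exact scoring rule. The sketch assumes that the input is already split into syllables, that forward maximum matching greedily takes the longest dictionary match, and that an ambiguous word receives the candidate tag with the highest joint probability p(previous tag, tag) estimated from a tagged corpus. The joint entropy H(X, Y) = -Σ p(x, y) log2 p(x, y) is computed over the same distribution. All function names, the romanized toy syllables, and the tags N, V, and PART are hypothetical.

```python
import math
from collections import Counter


def forward_maximum_matching(syllables, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    run of syllables found in the dictionary, else fall back to one syllable."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def joint_distribution(tagged_sentences):
    """Estimate p(previous_tag, tag) by relative frequency of tag bigrams."""
    counts = Counter()
    for sentence in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sentence]
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def joint_entropy(joint):
    """H(X, Y) = -sum of p(x, y) * log2 p(x, y) over observed pairs."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)


def disambiguate(words, lexicon, joint):
    """Assumed rule: keep the candidate tag whose joint probability with the
    previously chosen tag is highest; unknown words get the tag 'UNK'."""
    result, prev = [], "<s>"
    for word in words:
        candidates = lexicon.get(word, ["UNK"])
        best = max(candidates, key=lambda t: joint.get((prev, t), 0.0))
        result.append((word, best))
        prev = best
    return result


# Hypothetical toy data (romanized placeholders, not real Myanmar entries).
lexicon = {"kala": ["N"], "ka": ["N"], "pa": ["V", "PART"]}
corpus = [
    [("kala", "N"), ("pa", "V")],
    [("kala", "N"), ("pa", "V")],
    [("pa", "PART"), ("kala", "N")],
]

words = forward_maximum_matching(["ka", "la", "pa"], set(lexicon))
joint = joint_distribution(corpus)
tagged = disambiguate(words, lexicon, joint)
entropy = joint_entropy(joint)
```

In this toy run the ambiguous word "pa" follows a noun, and the corpus contains the tag pair (N, V) but never (N, PART), so the verb reading is kept. A real system would need smoothing for unseen tag pairs and a tie-breaking rule, neither of which the abstract describes.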

Keywords

Part of speech; Joint entropy; Natural language processing; Myanmar language

Copyright information

© Bharati Vidyapeeth's Institute of Computer Applications and Management 2019

Authors and Affiliations

  1. Faculty of Computer Science, Myanmar Institute of Information Technology, Mandalay, Myanmar
  2. Myanmar Institute of Information Technology, Mandalay, Myanmar
