A System for Recognition of Named Entities in Odia Text Corpus Using Machine Learning Algorithm

  • Bishwa Ranjan Das
  • Srikanta Patnaik
  • Sarada Baboo
  • Niladri Sekhar Dash
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 31)

Abstract

This paper presents a novel approach to recognizing named entities in an Odia corpus. Developing a named entity recognition (NER) system for Odia using a Support Vector Machine (SVM) is a challenging task in intelligent computing. NER aims to classify each word in a document into one of a set of predefined named entity classes, in either a linear or a non-linear fashion. Building a baseline NER system requires a named-entity-annotated corpus and a set of features. Language-specific rules are added to the system to recognize particular NE classes. Gazetteers and context patterns are also added to improve performance, because identifying such rules and context patterns requires language-specific knowledge. We have used the required lexical databases to prepare rules and identify context patterns for Odia. Experimental results show that our approach achieves higher accuracy than previous approaches.
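The pipeline described in the abstract (token-level features, an SVM that assigns each word an NE class, then a gazetteer layer on top) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abstract does not specify the SVM implementation, kernel, or feature set. Here the feature set (word identity, two-character suffix, neighbouring words, gazetteer flag), the linear SVM trained one-vs-rest with a Pegasos-style subgradient solver, the values of `lam` and `epochs`, the PER/LOC/O tag set, the `GAZETTEER` entries, the "fill in `O` from the gazetteer" rule, and the romanized toy sentences are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical gazetteer (word -> NE class); entries are illustrative only.
GAZETTEER = {"kataka": "LOC", "puri": "LOC"}

# Hypothetical romanized toy sentences with gold tags (not from the paper's corpus).
SENTENCES = [["rama", "kataka", "gale"], ["sita", "puri", "asile"]]
LABELS = [["PER", "LOC", "O"], ["PER", "LOC", "O"]]


def token_features(sent, i, gazetteer):
    """Sparse binary features for token i: identity, 2-char suffix, neighbours, gazetteer hit."""
    word = sent[i]
    feats = {
        "bias": 1.0,
        "w=" + word: 1.0,
        "suf=" + word[-2:]: 1.0,
        "prev=" + (sent[i - 1] if i > 0 else "<s>"): 1.0,
        "next=" + (sent[i + 1] if i + 1 < len(sent) else "</s>"): 1.0,
    }
    if word in gazetteer:
        feats["gaz=" + gazetteer[word]] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(xs, ys, lam, epochs):
    """Pegasos-style subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    Tokens are visited in a fixed order (not sampled at random) so runs are reproducible.
    """
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this token
            for f in w:  # step on the L2 regulariser shrinks every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # step on the hinge term pushes w toward y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def train_ner(sentences, labels, gazetteer, lam=1e-3, epochs=60):
    """One-vs-rest: one binary linear SVM per NE class."""
    xs = [token_features(s, i, gazetteer) for s in sentences for i in range(len(s))]
    gold = [tag for tags in labels for tag in tags]
    return {
        c: train_binary_svm(xs, [1.0 if g == c else -1.0 for g in gold], lam, epochs)
        for c in sorted(set(gold))
    }


def predict(models, sent, gazetteer):
    """Assign each token the class whose SVM scores it highest."""
    tags = []
    for i in range(len(sent)):
        x = token_features(sent, i, gazetteer)
        tags.append(max(models, key=lambda c: dot(models[c], x)))
    return tags


def apply_gazetteer(sent, tags, gazetteer):
    """Rule layer: a token the SVM left as 'O' takes its gazetteer class, if listed."""
    return [gazetteer.get(w, t) if t == "O" else t for w, t in zip(sent, tags)]


if __name__ == "__main__":
    models = train_ner(SENTENCES, LABELS, GAZETTEER)
    for sent in SENTENCES:
        print(sent, apply_gazetteer(sent, predict(models, sent, GAZETTEER), GAZETTEER))
```

On the two toy sentences, `predict` recovers the gold tags, and `apply_gazetteer` then relabels any token the SVM left as `O` if it is listed in the gazetteer, loosely mirroring the abstract's point that gazetteers are layered on top of the learned model. A realistic system would instead use an established SVM library, tuned hyperparameters, and a much larger annotated corpus.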

Keywords

Support vector machine, Named entity recognition, Part-of-speech tagging, Root word

Copyright information

© Springer India 2015

Authors and Affiliations

  • Bishwa Ranjan Das (1)
  • Srikanta Patnaik (1)
  • Sarada Baboo (2)
  • Niladri Sekhar Dash (3)
  1. Department of Computer Science and Information Technology, Institute of Technical Education and Research, SOA University, Bhubaneswar, India
  2. Department of Computer Science and Application, Sambalpur University, Burla, India
  3. Linguistic Research Unit, Indian Statistical Institute, Kolkata, India