Assamese Named Entity Recognition System Using Naive Bayes Classifier

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 905)


Named Entity Recognition (NER) is crucial for information extraction, question answering, document summarization and machine translation, which are undoubtedly important Natural Language Processing (NLP) tasks. This work is a detailed analysis of our previously developed NER system, with particular emphasis on how individual features contribute to the recognition of person, location and organization named entities, and how these features, in different combinations, affect the system's performance. We have also evaluated how the features behave as the training and test corpora grow. Since the system is based on supervised learning, it requires a large part-of-speech (POS) tagged and named-entity tagged training corpus as well as a POS-tagged test corpus. The overall system performs best when the training corpus contains 5000 words and the test corpus contains 50 named entities, reaching 95% precision, 84% recall and an F1-measure of 89%. This work offers a new perspective on the use of features for recognizing ENAMEX tags in an Assamese corpus.
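The abstract describes a supervised Naive Bayes tagger trained on a POS-tagged, NE-tagged corpus and scored by precision, recall and F1. What follows is a minimal sketch of that kind of pipeline. The abstract does not state the feature set, smoothing or scoring protocol, so the following are all assumptions for illustration and not the authors' implementation: the features (POS tag, two-character suffix, previous POS tag), add-one smoothing, token-level scoring, the names `features`, `NaiveBayesTagger`, `token_prf` and `f1_from`, and the invented toy sentences. The last line checks that the reported 95% precision and 84% recall are consistent with the reported 89% F1.

```python
import math
from collections import Counter, defaultdict


def features(tokens, pos_tags, i):
    """Hypothetical feature set (assumption): POS tag, 2-char suffix, previous POS tag."""
    return {
        "pos": pos_tags[i],
        "suffix2": tokens[i][-2:],
        "prev_pos": pos_tags[i - 1] if i > 0 else "<S>",
    }


class NaiveBayesTagger:
    """Categorical Naive Bayes over per-token features with add-one smoothing."""

    def fit(self, sentences):
        self.class_counts = Counter()               # label -> number of tokens
        self.feature_counts = defaultdict(Counter)  # label -> (name, value) -> count
        self.values = defaultdict(set)              # feature name -> observed values
        for tokens, pos_tags, labels in sentences:
            for i, label in enumerate(labels):
                self.class_counts[label] += 1
                for name, value in features(tokens, pos_tags, i).items():
                    self.feature_counts[label][(name, value)] += 1
                    self.values[name].add(value)
        self.total = sum(self.class_counts.values())
        return self

    def predict_one(self, feats):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(label) + sum over features of log P(value | label);
            # the extra +1 in the denominator reserves mass for unseen values.
            score = math.log(n_label / self.total)
            for name, value in feats.items():
                count = self.feature_counts[label][(name, value)]
                score += math.log((count + 1) / (n_label + len(self.values[name]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, tokens, pos_tags):
        return [self.predict_one(features(tokens, pos_tags, i)) for i in range(len(tokens))]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def token_prf(gold, pred, outside="O"):
    """Token-level precision/recall/F1 over non-O labels (a simplification)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    n_pred = sum(1 for p in pred if p != outside)
    n_gold = sum(1 for g in gold if g != outside)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy, invented training data in rough romanization (not from the paper's corpus).
train = [
    (["ram", "guwahatiloi", "gol"], ["NNP", "NNP", "VM"], ["PER", "LOC", "O"]),
    (["sita", "jorhatloi", "gol"], ["NNP", "NNP", "VM"], ["PER", "LOC", "O"]),
]
tagger = NaiveBayesTagger().fit(train)
pred = tagger.predict(["mohan", "tezpurloi", "gol"], ["NNP", "NNP", "VM"])
print(pred)                                   # ['PER', 'LOC', 'O']
print(token_prf(["PER", "LOC", "O"], pred))   # (1.0, 1.0, 1.0)
print(round(f1_from(0.95, 0.84), 2))          # 0.89, F1 implied by the reported P and R
```

Scoring in log space avoids numerical underflow when many feature probabilities are multiplied. Entity-level scoring, which counts a multi-word name as a single match, would be stricter than the token-level version sketched here.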


Keywords: Named entity · Corpus · Naive Bayes classifier · Machine learning


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Royal Group of Institutions, Guwahati, India
  2. Department of Design, Indian Institute of Technology Guwahati, Guwahati, India
  3. Department of Computer Science and Engineering, Assam Don Bosco University, Guwahati, India
