Assamese Named Entity Recognition System Using Naive Bayes Classifier
Abstract
Named Entity Recognition (NER) is crucial to information extraction, question answering, document summarization and machine translation, which are undoubtedly important Natural Language Processing (NLP) tasks. This work is a detailed analysis of our previously developed NER system, with emphasis on how individual features contribute to the recognition of person, location and organization named entities and how different combinations of these features affect the system's performance. In addition, we evaluate the behaviour of the features as the training and test corpora grow. Since the system is based on supervised learning, it requires a large training corpus tagged with both parts of speech and named entities, as well as a test corpus tagged with parts of speech. The overall system performs best when the training corpus contains 5000 words and the test corpus contains 50 named entities, reaching 95% precision, 84% recall and an F1-measure of 89%. This work adds a new dimension to the use of features for recognizing ENAMEX tags in an Assamese corpus.
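As a rough illustration of the kind of classifier described, the sketch below trains a Naive Bayes model with add-one smoothing on categorical token features and checks that the reported precision and recall are consistent with the reported F1-measure. It is not the authors' implementation: the feature names (`pos`, `suffix`, `prev_pos`), the tag values and the toy training rows are hypothetical placeholders, since the abstract does not list the actual feature set.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Naive Bayes over categorical features with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        # (label, feature name) -> counts of each observed feature value
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of values seen anywhere in training
        self.feature_values = defaultdict(set)

    def fit(self, samples):
        for features, label in samples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in features.items():
                counts = self.feature_counts[(label, name)]
                seen = counts[value]
                denom = sum(counts.values())
                vocab = len(self.feature_values[name]) + 1  # +1 slot for unseen values
                score += math.log((seen + self.alpha) / (denom + self.alpha * vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy rows: (features of one token, named-entity label).
TRAIN = [
    ({"pos": "NNP", "suffix": "pur", "prev_pos": "IN"}, "LOCATION"),
    ({"pos": "NNP", "suffix": "pur", "prev_pos": "VB"}, "LOCATION"),
    ({"pos": "NNP", "suffix": "ta", "prev_pos": "TITLE"}, "PERSON"),
    ({"pos": "NNP", "suffix": "ah", "prev_pos": "TITLE"}, "PERSON"),
    ({"pos": "NNP", "suffix": "ty", "prev_pos": "DT"}, "ORGANIZATION"),
    ({"pos": "NN", "suffix": "ok", "prev_pos": "DT"}, "O"),
    ({"pos": "VB", "suffix": "il", "prev_pos": "NN"}, "O"),
]

tagger = NaiveBayesTagger().fit(TRAIN)
location_guess = tagger.predict({"pos": "NNP", "suffix": "pur", "prev_pos": "IN"})
person_guess = tagger.predict({"pos": "NNP", "suffix": "xx", "prev_pos": "TITLE"})
reported_f1 = f1_score(0.95, 0.84)  # precision and recall quoted in the abstract
```

With precision 0.95 and recall 0.84, the harmonic mean is 2(0.95)(0.84)/(0.95 + 0.84) ≈ 0.892, which agrees with the reported F1-measure of 89%.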
Keywords
Named entity, Corpus, Naive Bayes classifier, Machine learning