Named Entity Recognition for Mongolian Language
- 1 Citations
- 1.4k Downloads
Abstract
This paper presents a pioneering work on building a Named Entity Recognition system for the Mongolian language, with an agglutinative morphology and a subject-object-verb word order. Our work explores the fittest feature set from a wide range of features and a method that refines machine learning approach using gazetteers with approximate string matching, in an effort for robust handling of out-of-vocabulary words. As well as we tried to apply various existing machine learning methods and find optimal ensemble of classifiers based on genetic algorithm. The classifiers uses different feature representations. The resulting system constitutes the first-ever usable software package for Mongolian NER, while our experimental evaluation will also serve as a much-needed basis of comparison for further research.
Keywords
Mongolian named entity recognition Genetic algorithm Machine learning String matchingPreview
Unable to display preview. Download preview PDF.
References
- 1.Bender, O., Och, F.J., Ney, H.: Maximum entropy models for named entity recognition. In: Proceedings of CoNLL-2003, pp. 148–151 (2003)Google Scholar
- 2.Isozaki, H., Kazawa, H.: Efficient support vector classifiers for named entity recognition. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002), Taipei, Taiwan (2002)Google Scholar
- 3.Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic modelsfor segmenting and labeling sequence data. In: Machine Learning International Workshop (2001)Google Scholar
- 4.Florian, R., Ittycheriah, A., Jing, H., Zhang, T.: Named entity recognition through classifier combination. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL (2003)Google Scholar
- 5.Desmet, B., Hoste, V.: Dutch named entity recognition using classifier ensembles. In: Preceedings of the 20th Meeting of Computational Linguistics, Netherlands (2010)Google Scholar
- 6.Ekbal, A., Saha, S.: Maximum entropy classifier ensembling using genetic algorithm for ner in bengali. In: Proceedings of the International Conference on Language Resource and Evaluation (LERC) (2010)Google Scholar
- 7.Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string distance metrics for name-matching tasks. In: IJCAI 2003 Workshop on Information Integration on the Web (IIWeb 2003), Acapulco, Mexico, pp. 73–78 (2003)Google Scholar
- 8.Piskorski, J., Wieloch, K., Pikula, M., Sydow, M.: Towards person name matching for inflective languages. In: NLPIX, Beijing, China (2008)Google Scholar
- 9.Szarvas, G., Farkas, R., Kocsor, A.: A multilingual named entity recognition system using boosting c4.5 decision tree learning algorithms. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds.) DS 2006. LNCS (LNAI), vol. 4265, pp. 267–278. Springer, Heidelberg (2006)CrossRefGoogle Scholar
- 10.Sang, E.F.T.K., De Meulder, F.: Introduction to the conll-2003 shared task: Language-independent named entity recognition. In: CoNLL-2003, Canada (2003)Google Scholar
- 11.Purev, J., Odbayar, C.: Part of speech tagging for mongolian corpus. In: The 7th Workshop on Asian Language Resources, Singapore (2009)Google Scholar
- 12.Simon, E., Kornai, A.: Approaches to hungarian named entity recognition. In: PhD thesis, Budapest University of Technology and Economics. (2013)Google Scholar
- 13.Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)CrossRefGoogle Scholar
- 14.Winkler, W.E.: The state of record linkage and current research problems. In: Technical report, Statistical Research Division, U.S. Bureau of the Census, Washington, DC (1999)Google Scholar
- 15.Fleggi, I.P., Sunter, A.B.: A theory for record linkage. Journal of the American Statistical Society 64, 1183–1210 (1969)CrossRefGoogle Scholar
- 16.Monge, A., Elkan, C.: The field matching problem: Algorithms and applications. In: Proceedings of Knowledge Discovery and Data Mining, pp. 267–270 (1996)Google Scholar
- 17.Goldberg, D.E.: Genetic algorithm in search, optimization, and machine learning. Addison-Wesley Publishing Company, Boston (1989)Google Scholar