Abstract
This paper presents pioneering work on building a Named Entity Recognition (NER) system for Mongolian, a language with agglutinative morphology and subject-object-verb word order. We search for the best-performing feature set among a wide range of features, and we propose a method that refines the machine learning approach using gazetteers with approximate string matching, for robust handling of out-of-vocabulary words. We also apply various existing machine learning methods and use a genetic algorithm to find an optimal ensemble of classifiers, where the classifiers use different feature representations. The resulting system constitutes the first usable software package for Mongolian NER, and our experimental evaluation provides a much-needed basis of comparison for further research.
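To illustrate the idea of gazetteer lookup with approximate string matching, the following minimal Python sketch matches a token against gazetteer entries using a normalized edit-distance similarity. It is not the system described in this paper: the choice of Levenshtein distance, the function names `levenshtein`, `similarity` and `gazetteer_lookup`, the threshold of 0.75, and the sample gazetteer entry are all assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def gazetteer_lookup(token: str,
                     gazetteer: Dict[str, str],
                     threshold: float = 0.75) -> Optional[Tuple[str, str, float]]:
    """Return (entry, label, score) for the best entry at or above threshold.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    best: Optional[Tuple[str, str, float]] = None
    needle = token.lower()
    for entry, label in gazetteer.items():
        score = similarity(needle, entry.lower())
        if score >= threshold and (best is None or score > best[2]):
            best = (entry, label, score)
    return best


# Hypothetical one-entry gazetteer; a suffixed surface form still matches.
sample_gazetteer = {"Улаанбаатар": "LOC"}
match = gazetteer_lookup("Улаанбаатарт", sample_gazetteer)
```

In an agglutinative language, a name usually appears with suffixes attached, so exact lookup would miss many occurrences. A tolerance-based match of this kind could supply a gazetteer feature to a classifier even for word forms not listed verbatim.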
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Munkhjargal, Z., Bella, G., Chagnaa, A., Giunchiglia, F. (2015). Named Entity Recognition for Mongolian Language. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_28
DOI: https://doi.org/10.1007/978-3-319-24033-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)