Named Entity Recognition for Mongolian Language

  • Zoljargal Munkhjargal
  • Gabor Bella
  • Altangerel Chagnaa
  • Fausto Giunchiglia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)

Abstract

This paper presents pioneering work on building a Named Entity Recognition (NER) system for Mongolian, a language with agglutinative morphology and subject-object-verb word order. We search a wide range of features for the best-performing feature set, and we propose a method that refines the machine learning approach with gazetteers and approximate string matching, aiming at robust handling of out-of-vocabulary words. We also apply various existing machine learning methods and use a genetic algorithm to find an optimal ensemble of classifiers. The classifiers use different feature representations. The resulting system constitutes the first usable software package for Mongolian NER, and our experimental evaluation will serve as a much-needed basis of comparison for further research.

Keywords

Mongolian named entity recognition; Genetic algorithm; Machine learning; String matching

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zoljargal Munkhjargal (1)
  • Gabor Bella (2)
  • Altangerel Chagnaa (1)
  • Fausto Giunchiglia (2)

  1. DICS, National University of Mongolia, Ulaanbaatar, Mongolia
  2. DISI, University of Trento, Trento, Italy
