Named Entity Recognition for Mongolian Language

Munkhjargal, Zoljargal; Bella, Gabor; Chagnaa, Altangerel; Giunchiglia, Fausto

doi:10.1007/978-3-319-24033-6_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

This paper presents pioneering work on building a Named Entity Recognition (NER) system for the Mongolian language, which has an agglutinative morphology and a subject-object-verb word order. Our work explores the fittest feature set from a wide range of features, together with a method that refines the machine learning approach using gazetteers with approximate string matching, in an effort to handle out-of-vocabulary words robustly. We also applied various existing machine learning methods and searched for an optimal ensemble of classifiers using a genetic algorithm. The classifiers use different feature representations. The resulting system constitutes the first-ever usable software package for Mongolian NER, while our experimental evaluation will also serve as a much-needed basis of comparison for further research.
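To make the idea of gazetteer lookup with approximate string matching more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which string metric or threshold the system uses, so normalized Levenshtein similarity and a cutoff of 0.8 are assumptions chosen for illustration, and the names `levenshtein`, `similarity`, and `gazetteer_lookup` are hypothetical. The sketch shows how an inflected token, which would miss an exact dictionary lookup in an agglutinative language, can still be matched to its gazetteer entry.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def gazetteer_lookup(token: str,
                     gazetteer: Iterable[str],
                     threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return the best-matching entry and its score, or None below threshold.

    The threshold value is an illustrative assumption, not a figure
    taken from the paper.
    """
    best: Optional[Tuple[str, float]] = None
    for entry in gazetteer:
        score = similarity(token.lower(), entry.lower())
        if score >= threshold and (best is None or score > best[1]):
            best = (entry, score)
    return best


# Hypothetical toy gazetteer of location names.
locations = ["Улаанбаатар", "Дархан", "Эрдэнэт"]

# "Улаанбаатарт" carries a case suffix, so an exact lookup would fail.
match = gazetteer_lookup("Улаанбаатарт", locations)
print(match[0] if match else None)
```

In a feature-based NER pipeline, the returned score (or a binary "matched" flag) could be fed to the classifier as one more token feature; whether the original system does this is not stated in the abstract.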

Author information

Authors and Affiliations

DICS, National University of Mongolia, 14200, Ulaanbaatar, Mongolia
Zoljargal Munkhjargal & Altangerel Chagnaa
DISI, University of Trento, 38100, Trento, Italy
Gabor Bella & Fausto Giunchiglia

Corresponding author

Correspondence to Zoljargal Munkhjargal.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Munkhjargal, Z., Bella, G., Chagnaa, A., Giunchiglia, F. (2015). Named Entity Recognition for Mongolian Language. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_28

DOI: https://doi.org/10.1007/978-3-319-24033-6_28
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
