Utilizing Annotated Wikipedia Article Titles to Improve a Rule-Based Named Entity Recognizer for Turkish

Küçük, Dilek

doi:10.1007/978-3-642-40769-7_59

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8132)

Included in the following conference series:

International Conference on Flexible Query Answering Systems

Abstract

Named entity recognition is an information extraction task that aims to identify named entities, such as person, location, and organization names, along with some numeric and temporal expressions, in free natural language texts. In this study, we target named entity recognition on Turkish texts, for which information extraction research is considerably rarer than for well-studied languages. We discuss the effects of using annotated Wikipedia article titles to enrich the lexical resources of a rule-based named entity recognizer for Turkish, after evaluating the enriched recognizer against its initial version. The evaluation results demonstrate that the presented extension improves recognition performance on different text genres, particularly on historical and financial news text sets, for which the initial recognizer was not engineered. The current study is significant as it is the first to address the use of Wikipedia articles as an information source for improving named entity recognition on Turkish texts.
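
The idea of enriching a recognizer's lexical resources with annotated titles can be illustrated with a minimal sketch. This is not the recognizer described in the paper: the function names, the entity type labels, the sample entries, and the simple longest-match lookup are all assumptions made for illustration, and Turkish morphology (suffixed name forms) is ignored entirely.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: entity type -> set of known names.
Gazetteer = Dict[str, Set[str]]


def enrich_gazetteer(gazetteer: Gazetteer,
                     annotated_titles: List[Tuple[str, str]]) -> Gazetteer:
    """Return a copy of the gazetteer with each (title, type) pair added."""
    enriched = {etype: set(names) for etype, names in gazetteer.items()}
    for title, etype in annotated_titles:
        enriched.setdefault(etype, set()).add(title)
    return enriched


def recognize(text: str, gazetteer: Gazetteer) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of gazetteer entries over whitespace tokens."""
    tokens = text.split()
    lookup = {name: etype for etype, names in gazetteer.items() for name in names}
    max_len = max((len(name.split()) for name in lookup), default=0)
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lookup:
                found.append((candidate, lookup[candidate]))
                matched = n
                break
        i += matched if matched else 1
    return found


# Illustrative data only; not taken from the paper's resources or test sets.
base_gazetteer: Gazetteer = {"LOCATION": {"Ankara"}}
annotated_titles = [("Türkiye İş Bankası", "ORGANIZATION")]
enriched_gazetteer = enrich_gazetteer(base_gazetteer, annotated_titles)

sentence = "Türkiye İş Bankası Ankara şubesi açıldı"
base_hits = recognize(sentence, base_gazetteer)
enriched_hits = recognize(sentence, enriched_gazetteer)
print(base_hits)
print(enriched_hits)
```

Running the same lookup with the initial and the enriched resources mirrors, in miniature, the comparison of the enriched recognizer against its initial version: only the enriched lookup can find the organization name that was added from the annotated titles.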

Author information

Authors and Affiliations

Electrical Power Technologies Group, TÜBİTAK Energy Institute, 06800, Ankara, Turkey
Dilek Küçük

Editor information

Editors and Affiliations

Department of Electronic Systems, Aalborg University, 6700, Esbjerg, Denmark
Henrik Legind Larsen
Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
Maria J. Martin-Bautista
Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
María Amparo Vila
CBIT, Roskilde University, Universitetsvej 1, 4000, Roskilde, Denmark
Troels Andreasen
CBIT, Roskilde University, 4000, Roskilde, Denmark
Henning Christiansen

About this paper

Cite this paper

Küçük, D. (2013). Utilizing Annotated Wikipedia Article Titles to Improve a Rule-Based Named Entity Recognizer for Turkish. In: Larsen, H.L., Martin-Bautista, M.J., Vila, M.A., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2013. Lecture Notes in Computer Science, vol 8132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40769-7_59

DOI: https://doi.org/10.1007/978-3-642-40769-7_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40768-0
Online ISBN: 978-3-642-40769-7
eBook Packages: Computer Science, Computer Science (R0)
