TMD-NER: Turkish multi-domain named entity recognition for informal texts

Yilmaz, Selim F.; Mutlu, Furkan B.; Balaban, Ismail; Kozat, Suleyman S.

doi:10.1007/s11760-023-02898-0

Original Paper
Published: 19 December 2023

Volume 18, pages 2255–2263, (2024)
Signal, Image and Video Processing

Abstract

We examine named entity recognition (NER), an essential and widely used first step in many natural language processing tasks, including chatbots and language translation. We focus on applying NER to noisy texts such as tweets, which is difficult because of the casual and unstructured language common on these media. We use the largest available labeled datasets for Turkish NER, covering three informal platforms: Twitter, Facebook, and Donanimhaber. We choose Turkish as a representative agglutinative language, whose structure differs significantly from that of well-known languages such as English, French, and German. The methodologies and insights gained from this study can be extended to other agglutinative languages, such as Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets with 16 named entity tags, using a framework in which bidirectional long short-term memory (BiLSTM) networks are followed by conditional random fields (CRF), together known as the BiLSTM-CRF model. Our experiments yield an F1 score of 84% on a combined dataset, which indicates that deep learning models can be used effectively for business applications in informal settings in agglutinative languages such as Turkish.
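
To illustrate the CRF stage of a BiLSTM-CRF tagger in general terms, the sketch below runs Viterbi decoding over per-token tag scores. It is a minimal illustration and not the authors' implementation: the function name `viterbi_decode`, the score layout, and the toy three-tag scheme in the usage note are assumptions, and in a real model the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][j]   : score of tag j at token t (assumed BiLSTM output).
    transitions[i][j] : score of moving from tag i to tag j (assumed learned).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] holds the best score of any path ending in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Choose the previous tag that maximizes the score into tag j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

With a hypothetical tag set `O=0, B-PER=1, I-PER=2` and a strongly negative `O -> I-PER` transition, the decoder prefers `B-PER, I-PER` over the locally best but inconsistent `O, I-PER`. This is the kind of label-consistency constraint that a CRF layer adds on top of independent per-token predictions.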

Availability of data and materials

Data sharing is not applicable to this article as no new data were analyzed in this study.

Notes

For instance, employing i as opposed to ı, or g instead of ğ.
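
The substitution described in this note can be sketched as a character-folding step. The snippet is only an illustration under stated assumptions: the helper name `ascii_fold` and the full mapping are hypothetical (the note itself names only ı/i and ğ/g), and it is not claimed to be the normalization used in the article.

```python
# Hypothetical mapping from Turkish-specific letters to the ASCII letters
# that informal writers often type instead. Only the pairs ı/i and ğ/g are
# named in the note; the remaining pairs are an assumption.
_TR_TO_ASCII = str.maketrans("ıİğĞüÜşŞöÖçÇ", "iIgGuUsSoOcC")


def ascii_fold(text: str) -> str:
    """Replace Turkish-specific letters with their ASCII look-alikes."""
    return text.translate(_TR_TO_ASCII)


def same_after_folding(a: str, b: str) -> bool:
    """Compare two strings while ignoring Turkish diacritic differences."""
    return ascii_fold(a) == ascii_fold(b)
```

For example, `same_after_folding("Iğdır", "Igdir")` treats the formal and the informal spelling of the same place name as equal, which is one way a lookup against an entity list could tolerate this kind of noise.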

References

Yilmaz, S.F., Balaban, I., Tekin, S.F., Kozat, S.S.: Hybrid framework for named entity recognition in Turkish social media. In 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2020)
Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition (2003) arXiv preprint arXiv:cs/0306050
Chen, X., Du, J., Zhang, H.: Lipreading with DenseNet and resBi-LSTM. SIViP 14, 981–989 (2020)
Bontcheva, K., et al.: TwitIE: an open-source information extraction pipeline for microblog text. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP), pp. 83–90 (2013)
Mohit, B.: Named entity recognition. In Natural Language Processing of Semitic Languages, pp. 221–245 (2014)
Mollá, D., et al.: Named entity recognition for question answering (2006)
Babych, B., Hartley, A.: Improving machine translation quality with automatic named entity recognition. In Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools. Association for Computational Linguistics, pp. 1–8 (2003)
Shi, Y., et al.: A natural language-inspired multilabel video streaming source identification method based on deep neural networks. SIViP 15, 1161–1168 (2021)
Ritter, A., et al.: Named entity recognition in tweets: an experimental study. In Proceedings of the Conference Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 1524–1534 (2011)
Şahinuç, F., Yilmaz, E.H., Toraman, C., Koç, A.: The effect of gender bias on hate speech detection. SIViP 1–7 (2022)
Yeniterzi, R., et al.: Turkish named-entity recognition. In Turkish Natural Language Processing, pp. 115–132. Springer (2018)
Alazaidah, R., Ahmad, F.K.: Trending challenges in multi label classification. Int. J. Adv. Comput. Sci. Appl. (2016)
Tür, G.: A statistical information extraction system for Turkish. Ph.D. dissertation, Bilkent Univ. (2000)
Küçük, D., Yazici, A.: A hybrid named entity recognizer for Turkish with applications to different text genres. In Computing and Information Science, pp. 113–116. Springer (2011)
Tatar, S., Cicekli, I.: Automatic rule learning exploiting morphological features for named entity recognition in Turkish. J. Inf. Sci. 37(2), 137–151 (2011)
Tür, G., Hakkani-Tür, D., Oflazer, K.: A statistical information extraction system for Turkish. Nat. Lang. Eng. 9(2), 181–210 (2003)
Küçük, D., et al.: Named entity recognition experiments on Turkish texts. In International Conference on Flexible Query Answering Systems, pp. 524–535. Springer (2009)
Şeker, G.A., Eryiğit, G.: Initial explorations on using CRFs for Turkish named entity recognition. In Proceedings of COLING, pp. 2459–2474 (2012)
Demir, H., Özgür, A.: Improving named entity recognition for morphologically rich languages using word embeddings. In ICMLA (2014)
Çelikkaya, G., et al.: Named entity recognition on real data: a preliminary investigation for Turkish. In Proceedings of the 7th International Conference on Information, Communication and Computing Technology, IEEE, pp. 1–5 (2013)
Eken, B., Tantug, C.: Recognizing named entities in Turkish tweets. In Proceedings of the Fourth International Conference on Software Engineering and Application, Dubai, UAE (2015)
Küçük, D., Steinberger, R.: Experiments to improve named entity recognition on Turkish tweets (2014) arXiv preprint arXiv:1410.8668
Vural, N.M., Ilhan, F., Yilmaz, S.F., Ergüt, S., Kozat, S.S.: Achieving online regression performance of LSTMs with simple RNNs. IEEE Trans. Neural Netw. Learn. Syst. 33(12), 7632–7643 (2022)
Yilmaz, S.F., Kaynak, E.B., Koç, A., Dibeklioğlu, H., Kozat, S.S.: Multi-label sentiment analysis on 100 languages with dynamic weighting for label imbalance. IEEE Trans. Neural Netw. Learn. Syst. (2021)
Jin, Y., Xie, J., Guo, W., Luo, C., Wu, D., Wang, R.: LSTM-CRF neural network with gated self attention for Chinese NER. IEEE Access 7, 136694–136703 (2019)
Akkaya, E.K.: Deep neural networks for named entity recognition on social media. Master’s thesis, Fen Bilimleri Enstitüsü (2018)
Yilmaz, S.F., Balaban, I., Kozat, S.S.: Improved named entity recognition in Turkish news via word lookup methods. In 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2020)
Nakayama, H., et al.: doccano: Text annotation tool for human (2018) [Online]. Available: https://github.com/doccano/doccano
Eryiğit, G.: ITU Turkish NLP web service. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 1–4 (2014)
Akın, A.A., Akın, M.D.: Zemberek, an open source NLP framework for Turkic languages. Structure 10, 1–5 (2007)
Manning, C., et al.: The Stanford CoreNLP natural language processing toolkit. In 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
Hassan, H., Menezes, A.: Social text normalization using contextual graph random walks. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), pp. 1577–1586 (2013)
Giritlioğlu, D., Mandira, B., Yilmaz, S.F., Ertenli, C.U., Akgür, B.F., Kınıklıoğlu, M., Kurt, A.G., Mutlu, E., Gürel, ŞC., Dibeklioğlu, H.: Multimodal analysis of personality traits on videos of self-presentation and induced behavior. J. Multimodal User Interfaces 15(4), 337–358 (2021)
Mandıra, B., Giritlioglu, D., Yilmaz, S.F., Ertenli, C.U., Akgür, B.F., Kınıklıoglu, M., Kurt, A.G., Doganlı, M.N., Mutlu, E., Gürel, S.C., et al.: Spatiotemporal and multimodal analysis of personality traits. In 15th International Summer Workshop on Multimodal Interfaces (2019)
Collobert, R., et al.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Grave, E., et al.: Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Kuru, O., et al.: CharNER: character-level named entity recognition. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 911–921 (2016)
Gungor, O., et al.: Morphological embeddings for named entity recognition in morphologically rich languages (2017) arXiv preprint arXiv:1706.00506
Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, pp. 1064–1074 (2016)
Lesk, M.E., Schmidt, E.: Lex: A lexical analyzer generator (1975)
Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging (2015) arXiv preprint arXiv:1508.01991
Reimers, N., Gurevych, I.: Reporting score distributions: performance study of LSTM-networks for sequence tagging (2017) arXiv preprint arXiv:1707.09861
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014) arXiv preprint arXiv:1412.6980
Eşref, Y., Can, B.: Using morpheme-level attention mechanism for Turkish sequence labelling. In 27th Signal Processing and Communications Applications Conference (SIU). IEEE, pp. 1–4 (2019)
Güneş, A., Tantug, A.C.: Turkish named entity recognition with deep learning. In 26th Signal Processing and Communications Applications Conference (SIU). IEEE, pp. 1–4 (2018)

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, Imperial College London, London, UK
Selim F. Yilmaz
Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
Furkan B. Mutlu & Suleyman S. Kozat
Department of Statistics, Middle East Technical University, Ankara, Turkey
Ismail Balaban

Contributions

All authors agreed on the content of this study. SY and FM conducted the analysis following the agreed steps. The results and conclusions were examined and written up jointly.

Corresponding author

Correspondence to Furkan B. Mutlu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has appeared in part at the 2022 IEEE Signal Processing and Communications Applications Conference [1] and was done when Selim F. Yilmaz was affiliated with Bilkent University, Ankara, Turkey.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 4986 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yilmaz, S.F., Mutlu, F.B., Balaban, I. et al. TMD-NER: Turkish multi-domain named entity recognition for informal texts. SIViP 18, 2255–2263 (2024). https://doi.org/10.1007/s11760-023-02898-0

Received: 19 September 2023
Revised: 14 November 2023
Accepted: 15 November 2023
Published: 19 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11760-023-02898-0
