TMD-NER: Turkish multi-domain named entity recognition for informal texts

  • Original Paper
Signal, Image and Video Processing

Abstract

We examine named entity recognition (NER), an essential and commonly used first step in many natural language processing tasks, including chatbots and language translation. We focus on applying NER to noisy texts such as tweets, which is difficult because of the casual and unstructured language common on these platforms. In this study, we use the largest available labeled datasets for Turkish NER, covering three informal platforms: Twitter, Facebook, and Donanimhaber. We choose Turkish as a representative agglutinative language whose structure differs significantly from that of well-known languages such as English, French, and German. The methodologies and insights gained from this study can be extended to other agglutinative languages, such as Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets with 16 named entity tags, using a framework that combines bidirectional long short-term memory (BiLSTM) networks with a conditional random field (CRF) layer, together known as the BiLSTM-CRF model. Our experiments show an F1 score of 84% on the combined dataset, which indicates that deep learning models can be used effectively for business applications in informal settings in agglutinative languages such as Turkish.
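
As a rough illustration of the CRF stage in a BiLSTM-CRF tagger, the following Python sketch runs Viterbi decoding over per-token tag scores of the kind a BiLSTM would emit. The tag set, the emission scores, and the transition scores are invented for illustration and are not taken from the paper; a trained model would learn both score tables, and the authors' implementation may differ.

```python
from typing import List

# Hypothetical three-tag BIO scheme (the paper uses 16 entity tags).
TAGS = ["O", "B-PER", "I-PER"]

# TRANSITIONS[prev][cur]: assumed scores; O -> I-PER is heavily penalized
# because an inside tag should not follow an outside tag.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 1.0],    # from I-PER
]

# EMISSIONS[t][tag]: assumed per-token scores standing in for BiLSTM output.
EMISSIONS = [
    [1.0, 0.8, 0.0],
    [0.2, 0.1, 2.0],
    [2.0, 0.0, 0.0],
]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence."""
    n_tags = len(transitions)
    # Best score of any path ending in each tag at the first token.
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for cur in range(n_tags):
            # Best previous tag for reaching `cur` at this position.
            best_prev = max(range(n_tags),
                            key=lambda prev: scores[prev] + transitions[prev][cur])
            step_scores.append(scores[best_prev] + transitions[best_prev][cur] + emit[cur])
            step_back.append(best_prev)
        scores = step_scores
        backpointers.append(step_back)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda tag: scores[tag])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


decoded = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)
```

Choosing the highest-scoring tag for each token independently would give O, I-PER, O, which is not a valid BIO sequence; with these assumed scores, the transition penalty steers the decoder to B-PER, I-PER, O instead.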

Availability of data and materials

Data sharing is not applicable to this article as no new data were analyzed in this study.

Notes

  1. For instance, employing i as opposed to ı, or g instead of ğ.
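
To make the substitution in this note concrete, the Python sketch below folds Turkish-specific letters to the ASCII letters informal writers often type in their place, and uses the folded form as a fallback key for a lookup. The gazetteer entries and the function names are hypothetical, and this is not presented as the authors' preprocessing; it only shows why an exact-match lookup can miss informally spelled names.

```python
from typing import Optional

# Turkish letters with no ASCII equivalent, paired with the ASCII letters
# commonly substituted for them in informal writing.
ASCII_FOLD = str.maketrans("ıİğĞşŞçÇöÖüÜ", "iIgGsScCoOuU")


def fold_to_ascii(text: str) -> str:
    """Replace Turkish-specific letters with their ASCII look-alikes."""
    return text.translate(ASCII_FOLD)


# Hypothetical gazetteer keyed on correctly spelled names.
GAZETTEER = {"Iğdır": "LOCATION", "Muğla": "LOCATION"}

# The same entries indexed by their ASCII-folded spelling.
FOLDED_GAZETTEER = {fold_to_ascii(name): tag for name, tag in GAZETTEER.items()}


def lookup(token: str) -> Optional[str]:
    """Try an exact match first, then fall back to the folded spelling."""
    return GAZETTEER.get(token) or FOLDED_GAZETTEER.get(fold_to_ascii(token))


print(lookup("Igdir"))   # informal spelling, found through the folded index
print(lookup("Ankara"))  # not in the hypothetical gazetteer
```

Folding is lossy: distinct words can collapse to the same ASCII string, so a real system would need to handle the resulting ambiguity.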

References

  1. Yilmaz, S.F., Balaban, I., Tekin, S.F., Kozat, S.S.: Hybrid framework for named entity recognition in Turkish social media. In 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2020)

  2. Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition (2003) arXiv preprint arXiv:cs/0306050

  3. Chen, X., Du, J., Zhang, H.: Lipreading with DenseNet and resBi-LSTM. SIViP 14, 981–989 (2020)

  4. Bontcheva, K., et al.: TwitIE: an open-source information extraction pipeline for microblog text. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP), pp. 83–90 (2013)

  5. Mohit, B.: Named entity recognition. In Natural Language Processing of Semitic Languages, pp. 221–245 (2014)

  6. Mollá, D., et al.: Named entity recognition for question answering (2006)

  7. Babych, B., Hartley, A.: Improving machine translation quality with automatic named entity recognition. In Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools. Association for Computational Linguistics, pp. 1–8 (2003)

  8. Shi, Y., et al.: A natural language-inspired multilabel video streaming source identification method based on deep neural networks. SIViP 15, 1161–1168 (2021)

  9. Ritter, A., et al.: Named entity recognition in tweets: an experimental study. In Proceedings of the Conference Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 1524–1534 (2011)

  10. Şahinuç, F., Yilmaz, E.H., Toraman, C., Koç, A.: The effect of gender bias on hate speech detection. SIViP 1–7 (2022)

  11. Yeniterzi, R., et al.: Turkish named-entity recognition. In Turkish Natural Language Processing, pp. 115–132. Springer (2018)

  12. Alazaidah, R., Ahmad, F.K.: Trending challenges in multi label classification. Int. J. Adv. Comput. Sci. Appl. (2016)

  13. Tür, G.: A statistical information extraction system for Turkish. Ph.D. dissertation, Bilkent Univ. (2000)

  14. Küçük, D., Yazici, A.: A hybrid named entity recognizer for Turkish with applications to different text genres. In Computing and Information Science, pp. 113–116. Springer (2011)

  15. Tatar, S., Cicekli, I.: Automatic rule learning exploiting morphological features for named entity recognition in Turkish. J. Inf. Sci. 37(2), 137–151 (2011)

  16. Tür, G., Hakkani-Tür, D., Oflazer, K.: A statistical information extraction system for Turkish. Nat. Lang. Eng. 9(2), 181–210 (2003)

  17. Küçük, D., et al.: Named entity recognition experiments on Turkish texts. In International Conference on Flexible Query Answering Systems, pp. 524–535. Springer (2009)

  18. Şeker, G.A., Eryiğit, G.: Initial explorations on using CRFs for Turkish named entity recognition. In Proceedings of COLING, pp. 2459–2474 (2012)

  19. Demir, H., Özgür, A.: Improving named entity recognition for morphologically rich languages using word embeddings. In ICMLA (2014)

  20. Çelikkaya, G., et al.: Named entity recognition on real data: a preliminary investigation for Turkish. In Proceedings of the 7th International Conference on Information, Communication and Computing Technology, IEEE, pp. 1–5 (2013)

  21. Eken, B., Tantug, C.: Recognizing named entities in Turkish tweets. In Proceedings of the Fourth International Conference on Software Engineering and Application, Dubai, UAE (2015)

  22. Küçük, D., Steinberger, R.: Experiments to improve named entity recognition on Turkish tweets (2014) arXiv preprint arXiv:1410.8668

  23. Vural, N.M., Ilhan, F., Yilmaz, S.F., Ergüt, S., Kozat, S.S.: Achieving online regression performance of LSTMs with simple RNNs. IEEE Trans. Neural Netw. Learn. Syst. 33(12), 7632–7643 (2022)

  24. Yilmaz, S.F., Kaynak, E.B., Koç, A., Dibeklioğlu, H., Kozat, S.S.: Multi-label sentiment analysis on 100 languages with dynamic weighting for label imbalance. IEEE Trans. Neural Netw. Learn. Syst. (2021)

  25. Jin, Y., Xie, J., Guo, W., Luo, C., Wu, D., Wang, R.: LSTM-CRF neural network with gated self attention for Chinese NER. IEEE Access 7, 136694–136703 (2019)

  26. Akkaya, E.K.: Deep neural networks for named entity recognition on social media. Master’s thesis, Fen Bilimleri Enstitüsü (2018)

  27. Yilmaz, S.F., Balaban, I., Kozat, S.S.: Improved named entity recognition in Turkish news via word lookup methods. In 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2020)

  28. Nakayama, H., et al.: doccano: Text annotation tool for human (2018) [Online]. Available: https://github.com/doccano/doccano

  29. Eryiğit, G.: ITU Turkish NLP web service. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 1–4 (2014)

  30. Akın, A.A., Akın, M.D.: Zemberek, an open source NLP framework for Turkic languages. Structure 10, 1–5 (2007)

  31. Manning, C., et al.: The Stanford CoreNLP natural language processing toolkit. In 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)

  32. Hassan, H., Menezes, A.: Social text normalization using contextual graph random walks. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), pp. 1577–1586 (2013)

  33. Giritlioğlu, D., Mandira, B., Yilmaz, S.F., Ertenli, C.U., Akgür, B.F., Kınıklıoğlu, M., Kurt, A.G., Mutlu, E., Gürel, ŞC., Dibeklioğlu, H.: Multimodal analysis of personality traits on videos of self-presentation and induced behavior. J. Multimodal User Interfaces 15(4), 337–358 (2021)

  34. Mandıra, B., Giritlioglu, D., Yilmaz, S.F., Ertenli, C.U., Akgür, B.F., Kınıklıoglu, M., Kurt, A.G., Doganlı, M.N., Mutlu, E., Gürel, S.C., et al.: Spatiotemporal and multimodal analysis of personality traits. In 15th International Summer Workshop on Multimodal Interfaces (2019)

  35. Collobert, R., et al.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  36. Grave, E., et al.: Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)

  37. Kuru, O., et al.: CharNER: character-level named entity recognition. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 911–921 (2016)

  38. Gungor, O., et al.: Morphological embeddings for named entity recognition in morphologically rich languages (2017) arXiv preprint arXiv:1706.00506

  39. Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, pp. 1064–1074 (2016)

  40. Lesk, M.E., Schmidt, E.: Lex: A lexical analyzer generator (1975)

  41. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

  42. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging (2015) arXiv preprint arXiv:1508.01991

  43. Reimers, N., Gurevych, I.: Reporting score distributions: performance study of LSTM-networks for sequence tagging (2017) arXiv preprint arXiv:1707.09861

  44. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014) arXiv preprint arXiv:1412.6980

  45. Eşref, Y., Can, B.: Using morpheme-level attention mechanism for Turkish sequence labelling. In 27th Signal Processing and Communications Applications Conference (SIU). IEEE, pp. 1–4 (2019)

  46. Güneş, A., Tantug, A.C.: Turkish named entity recognition with deep learning. In 26th Signal Processing and Communications Applications Conference (SIU). IEEE, pp. 1–4 (2018)

Funding

Not applicable.

Author information

Contributions

All authors agreed on the content of this study. SY and FM conducted the analysis based on the agreed steps. The results and conclusions were examined and written up jointly.

Corresponding author

Correspondence to Furkan B. Mutlu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has appeared in part at the 2022 IEEE Signal Processing and Communications Applications Conference [1] and was done when Selim F. Yilmaz was affiliated with Bilkent University, Ankara, Turkey.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 4986 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yilmaz, S.F., Mutlu, F.B., Balaban, I. et al. TMD-NER: Turkish multi-domain named entity recognition for informal texts. SIViP 18, 2255–2263 (2024). https://doi.org/10.1007/s11760-023-02898-0

  • DOI: https://doi.org/10.1007/s11760-023-02898-0
