A Bank Information Extraction System Based on Named Entity Recognition with CRFs from Noisy Customer Order Texts in Turkish

  • Conference paper
Knowledge Engineering and Semantic Web (KESW 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 649)

Abstract

Every day, hundreds of thousands of customer transactions arrive at bank operation centers via fax. The information required to complete each transaction (money transfer, salary payment, tax payment, etc.) is extracted manually by operators from the images of customer orders. Our information extraction system uses Conditional Random Fields (CRFs) to obtain the required named entities for each transaction type from the noisy text of customer orders. The difficulty of the problem arises from three facts: customer orders come in many different formats; the image resolution of the orders is so low that the OCR (Optical Character Recognition) output is highly noisy; and Turkish remains challenging for natural language processing techniques because of the structure of the language. This paper describes the difficulties of our problem domain and details the methodology developed for extracting entities such as client name, organization name, bank account number, IBAN, amount, currency and explanation.
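
To make the CRF-based setup concrete, the following is a minimal Python sketch of two pieces such a tagger typically needs: per-token features that tolerate OCR noise, and a decoder that turns a BIO label sequence into entity spans. It is an illustration under stated assumptions, not the feature set or code used in the paper. The feature names, the label names (CLIENT, IBAN, AMOUNT, CURRENCY), the helper functions and the sample tokens are all hypothetical, and no CRF training is performed here.

```python
import re


def word_shape(token):
    """Map a token to a coarse shape: uppercase -> 'A', lowercase -> 'a',
    digit -> '9', anything else kept. Runs of the same code are collapsed,
    so small OCR length errors change the shape less often."""
    shape = []
    for ch in token:
        if ch.isdigit():
            code = "9"
        elif ch.isalpha():
            code = "A" if ch.isupper() else "a"
        else:
            code = ch
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def token_features(tokens, i):
    """Hypothetical feature dictionary for the token at position i.
    Note: str.lower() is not Turkish-aware (dotted/dotless i)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "has_digit": any(c.isdigit() for c in tok),
        # Assumption: Turkish IBANs start with 'TR' followed by two check digits.
        "iban_prefix": bool(re.match(r"^TR\d{2}", tok)),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def bio_to_spans(tokens, labels):
    """Group BIO labels (e.g. B-CLIENT, I-CLIENT, O) into (type, text) spans."""
    spans = []
    cur_type, cur_tokens = None, []
    for tok, lab in zip(tokens, labels):
        starts_new = lab.startswith("B-") or (
            lab.startswith("I-") and lab[2:] != cur_type
        )
        if starts_new:
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = lab[2:], [tok]
        elif lab.startswith("I-"):
            cur_tokens.append(tok)
        else:  # 'O' closes any open span
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


# Invented sample data, for illustration only.
sample_tokens = ["Alici", ":", "Ahmet", "Yilmaz", "TR12", "1.250,00", "TL"]
sample_labels = ["O", "O", "B-CLIENT", "I-CLIENT", "B-IBAN", "B-AMOUNT", "B-CURRENCY"]
sample_spans = bio_to_spans(sample_tokens, sample_labels)
```

In a full pipeline, the feature dictionaries would be passed to a CRF toolkit, which predicts one label per token; `bio_to_spans` then recovers the entity strings that an operator would otherwise type in by hand.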

Notes

  1. The TurkIE dataset contains approximately 55K tokens from news articles on terrorism from both online and print news sources.

  2. ACL 2015 Workshop on Noisy User-generated Text (W-NUT).

  3. Celikkaya et al.’s data contains approximately 5K tweets with about 50K tokens.

  4. Zemberek: https://github.com/ahmetaa/zemberek-nlp.

  5. The gazetteer is provided by the bank.

  6. Available from http://mallet.cs.umass.edu/.

References

  1. Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: probabilistic models for segmenting and labeling sequence data (2001)

  2. Sutton, C., McCallum, A.: An introduction to conditional random fields. Mach. Learn. 4(4), 267–373 (2011)

  3. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)

  4. Seker, G.A., Eryigit, G.: Initial explorations on using CRFs for Turkish named entity recognition. In: COLING, pp. 2459–2474 (2012)

  5. Yeniterzi, R.: Exploiting morphology in Turkish named entity recognition system. In: Proceedings of the ACL 2011 Student Session, pp. 105–110. Association for Computational Linguistics (2011)

  6. Tkachenko, M., Simanovsky, A.: Named entity recognition: exploring features. In: KONVENS, pp. 118–127 (2012)

  7. Tatar, S., Cicekli, I.: Automatic rule learning exploiting morphological features for named entity recognition in Turkish. J. Inf. Sci. 37(2), 137–151 (2011)

  8. Kucuk, D., Steinberger, R.: Experiments to improve named entity recognition on Turkish tweets. In: Proceedings of 5th Workshop on Language Analysis for Social Media, pp. 71–78 (2014)

  9. Yamada, I., Takeda, H., Takefuji, Y.: Enhancing named entity recognition in Twitter messages using entity linking. In: ACL-IJCNLP, p. 136 (2015)

  10. Eken, B., Tantug, C.: Recognizing named entities in Turkish tweets. In: Proceedings of 4th International Conference on Software Engineering and Applications, Dubai (2015)

  11. Tur, G., Hakkani-Tur, D., Oflazer, K.: A statistical information extraction system for Turkish. Nat. Lang. Eng. 9(02), 181–210 (2003)

  12. Celikkaya, G., Torunoglu, D., Eryigit, G.: Named entity recognition on real data: a preliminary investigation for Turkish. In: 7th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–5 (2013)

  13. Guthrie, D., Allison, B., Liu, W., Guthrie, L., Wilks, Y.: A closer look at skip-gram modelling. In: Proceedings of 5th International Conference on Language Resources and Evaluation (LREC-2006), pp. 1–4 (2006)

  14. Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, pp. 104–107 (2004)

  15. Klinger, R., Friedrich, C.M., Fluck, J., Hofmann-Apitius, M.: Named entity recognition with combinations of conditional random fields. In: Proceedings of 2nd Biocreative Challenge Evaluation Workshop (2007)

  16. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 134–141 (2003)

Author information

Corresponding author

Correspondence to Erdem Emekligil.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Emekligil, E., Arslan, S., Agin, O. (2016). A Bank Information Extraction System Based on Named Entity Recognition with CRFs from Noisy Customer Order Texts in Turkish. In: Ngonga Ngomo, AC., Křemen, P. (eds) Knowledge Engineering and Semantic Web. KESW 2016. Communications in Computer and Information Science, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-45880-9_8

  • DOI: https://doi.org/10.1007/978-3-319-45880-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45879-3

  • Online ISBN: 978-3-319-45880-9

  • eBook Packages: Computer Science, Computer Science (R0)
