Abstract
Each day, hundreds of thousands of customer transactions arrive at the bank's operation center via the fax channel. The information required to complete each transaction (money transfer, salary payment, tax payment, etc.) is extracted manually by operators from the images of customer orders. Our information extraction system uses CRFs (Conditional Random Fields) to obtain the required named entities for each transaction type from the noisy text of customer orders. The difficulty of the problem arises from three facts: customer orders come in different formats; the image resolution of the orders is so low that the OCR-ed (Optical Character Recognition) texts are highly noisy; and Turkish remains challenging for natural language processing techniques due to the structure of the language. This paper describes the difficulties of our problem domain and provides details of the methodology developed for extracting entities such as client name, organization name, bank account number, IBAN, amount, currency and explanation.
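To make the sequence-labeling setup more concrete, the following is a minimal sketch of the kind of per-token features a CRF tagger could consume for noisy order text. It is not the feature set used in the paper: the function names, the feature names, the sample tokens, and the `iban_like` pattern (the letters TR followed by 24 digits) are all assumptions chosen for illustration.

```python
import re


def word_shape(token):
    """Map a token to a coarse shape string, e.g. 'TR33' -> 'A0'."""
    shape = re.sub(r"[A-ZÇĞİÖŞÜ]", "A", token)
    shape = re.sub(r"[a-zçğıöşü]", "a", shape)
    shape = re.sub(r"\d", "0", shape)
    # Squeeze runs of the same symbol so that OCR-induced length
    # differences have less influence on the feature value.
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, i):
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "is_numeric": bool(re.fullmatch(r"[\d.,]+", tok)),
        # Assumed pattern: 'TR' followed by 24 digits.
        "iban_like": bool(re.fullmatch(r"TR\d{24}", tok)),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        # Neighbouring tokens give the CRF local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Illustrative tokens only; not taken from the paper's data.
tokens = ["IBAN", "TR330006100519786457841326", "tutar", "1.250,00", "TL"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Each dictionary in `features` would then be paired with a label such as IBAN, amount, or currency when training a CRF. Shape and affix features are a common choice for noisy input because they can still fire when individual characters are misrecognized.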
Notes
1. TurkIE dataset contains approximately 55K tokens from news articles on terrorism from both online and print news sources.
2. ACL 2015 Workshop on Noisy User-generated Text (W-NUT).
3. Celikkaya et al.'s data contains approximately 5K tweets with about 50K tokens.
4. Zemberek https://github.com/ahmetaa/zemberek-nlp.
5. The gazetteer is provided by the bank.
6. Available from http://mallet.cs.umass.edu/.
References
Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: probabilistic models for segmenting and labeling sequence data (2001)
Sutton, C., McCallum, A.: An introduction to conditional random fields. Found. Trends Mach. Learn. 4(4), 267–373 (2011)
Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)
Seker, G.A., Eryigit, G.: Initial explorations on using CRFs for Turkish named entity recognition. In: COLING, pp. 2459–2474 (2012)
Yeniterzi, R.: Exploiting morphology in Turkish named entity recognition system. In: Proceedings of the ACL 2011 Student Session. pp. 105–110. Association for Computational Linguistics (2011)
Tkachenko, M., Simanovsky, A.: Named entity recognition: exploring features. In: KONVENS, pp. 118–127 (2012)
Tatar, S., Cicekli, I.: Automatic rule learning exploiting morphological features for named entity recognition in Turkish. J. Inf. Sci. 37(2), 137–151 (2011)
Kucuk, D., Steinberger, R.: Experiments to improve named entity recognition on Turkish tweets. In: Proceedings of 5th Workshop on Language Analysis for Social Media, pp. 71–78 (2014)
Yamada, I., Takeda, H., Takefuji, Y.: Enhancing named entity recognition in Twitter messages using entity linking. In: ACL-IJCNLP, p. 136 (2015)
Eken, B., Tantug, C.: Recognizing named entities in Turkish tweets. In: Proceedings of 4th International Conference on Software Engineering and Applications, Dubai (2015)
Tur, G., Hakkani-Tur, D., Oflazer, K.: A statistical information extraction system for Turkish. Nat. Lang. Eng. 9(02), 181–210 (2003)
Celikkaya, G., Torunoglu, D., Eryigit, G.: Named entity recognition on real data: a preliminary investigation for Turkish. In: 7th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–5 (2013)
Guthrie, D., Allison, B., Liu, W., Guthrie, L., Wilks, Y.: A closer look at skip-gram modelling. In: Proceedings of 5th international Conference on Language Resources and Evaluation (LREC-2006), pp. 1–4 (2006)
Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, pp. 104–107 (2004)
Klinger, R., Friedrich, C.M., Fluck, J., Hofmann-Apitius, M.: Named entity recognition with combinations of conditional random fields. In: Proceedings of 2nd Biocreative Challenge Evaluation Workshop (2007)
Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. vol. 1, pp. 134–141 (2003)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Emekligil, E., Arslan, S., Agin, O. (2016). A Bank Information Extraction System Based on Named Entity Recognition with CRFs from Noisy Customer Order Texts in Turkish. In: Ngonga Ngomo, AC., Křemen, P. (eds) Knowledge Engineering and Semantic Web. KESW 2016. Communications in Computer and Information Science, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-45880-9_8
DOI: https://doi.org/10.1007/978-3-319-45880-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45879-3
Online ISBN: 978-3-319-45880-9
eBook Packages: Computer Science, Computer Science (R0)