Abstract
Banking information systems continuously generate large quantities of data as interconnected streams (transactions, event logs, time series, metrics, graphs, processes, etc.). Such data streams need to be processed online to support critical business applications such as real-time fraud detection, network security attack prevention, and predictive maintenance of information system infrastructure. Many algorithms have been proposed for data stream learning; however, most of them do not address the important challenges and constraints imposed by real-world applications, in particular when models must be trained incrementally from heterogeneous data and deployed within complex big data architectures. Based on banking applications and lessons learned in production environments at BNP Paribas, a major international banking group and leader in the Eurozone, we identify the most important current challenges for mining IT data streams. Our goal is to highlight the key challenges faced by data scientists and data engineers in complex industry settings when building or deploying models for real-world streaming applications. We provide future research directions on stream learning that will accelerate the adoption of online learning models for solving real-world problems, thereby bridging the gap between the research and industry communities. Finally, we provide some recommendations to tackle some of these challenges.
Notes
- 1. Noah Fiedel talk at TensorFlow Dev Summit 2017 - https://www.youtube.com/watch?v=q_IkJcPyNl0 at 2′24″.
- 2.
- 3. BWT refers to transferring funds between banks to eliminate the source of dirty and criminal money.
- 4.
- 5. Presented at the 6th International Workshop on Quantitative Approaches to Software Quality - 2018.
- 6.
- 7. Due to space restrictions, we refer the reader to the official documentation of Apache Kafka, Apache NiFi, Apache Flink, and Jenkins for more details.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Barry, M., Bifet, A., Chiky, R., Montiel, J., Tran, VT. (2021). Challenges of Machine Learning for Data Streams in the Banking Industry. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_9
DOI: https://doi.org/10.1007/978-3-030-93620-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93619-8
Online ISBN: 978-3-030-93620-4
eBook Packages: Computer Science, Computer Science (R0)