
Challenges of Machine Learning for Data Streams in the Banking Industry


Part of the Lecture Notes in Computer Science book series (LNISA, volume 13147)


Banking information systems continuously generate large quantities of data as interconnected streams (transactions, event logs, time series, metrics, graphs, processes, etc.). Such data streams need to be processed online to support critical business applications such as real-time fraud detection, network security attack prevention, and predictive maintenance of information system infrastructure. Many algorithms have been proposed for data stream learning; however, most of them do not address the important challenges and constraints imposed by real-world applications, in particular when models must be trained incrementally while mining heterogeneous data and then deployed within a complex big data architecture. Based on banking applications and lessons learned in production environments of BNP Paribas, a major international banking group and leader in the Eurozone, we identify the most important current challenges for mining IT data streams. Our goal is to highlight the key challenges faced by data scientists and data engineers within complex industry settings when building or deploying models for real-world streaming applications. We provide future research directions on stream learning that will accelerate the adoption of online learning models for solving real-world problems, thereby bridging the gap between the research and industry communities. Finally, we provide some recommendations to tackle some of these challenges.


  • Challenges
  • Production
  • IT
  • Streaming
  • Banking

  • DOI: 10.1007/978-3-030-93620-4_9
  • Chapter length: 13 pages


  1. Noah Fiedel talk at TensorFlow Dev Summit 2017 - at 2,24”.

  3. BWT refers to transferring funds between banks to eliminate the source of dirty and criminal money.

  5. Presented at the 6th International Workshop on Quantitative Approaches to Software Quality - 2018.

  7. Due to space restrictions, we refer the reader to the official documentation of Apache Kafka, Apache NiFi, Apache Flink, and Jenkins for more details.



Author information

Corresponding author

Correspondence to Albert Bifet.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Barry, M., Bifet, A., Chiky, R., Montiel, J., Tran, VT. (2021). Challenges of Machine Learning for Data Streams in the Banking Industry. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham.

  • DOI: 10.1007/978-3-030-93620-4_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93619-8

  • Online ISBN: 978-3-030-93620-4

  • eBook Packages: Computer Science, Computer Science (R0)