Encyclopedia of Big Data Technologies

Living Edition
| Editors: Sherif Sakr, Albert Zomaya

Apache Flink

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_303-1

Synonyms

Overview

Today, virtually all data is continuously generated as streams of events. This includes business transactions, interactions with web or mobile application, sensor or device logs, and database modifications. There are two ways to process continuously produced data, namely batch and stream processing. For stream processing, the data is immediately ingested and processed by a continuously running application as it arrives. For batch processing, the data is first recorded and persisted in a storage system, such as a file system or database system, before it is (periodically) processed by an application that processes a bounded data set. While stream processing typically achieves lower latencies to produce results, it induces operational challenges because streaming applications which run 24 × 7 make high demands on failure recovery and consistency guarantees.

The most fundamental difference between batch and stream processing applications is that...

This is a preview of subscription content, log in to check access.

References

  1. Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc VLDB Endowment 8(12):1792–1803CrossRefGoogle Scholar
  2. Alexandrov A, Ewen S, Heimel M, Hueske F, Kao O, Markl V, …, Warneke D (2011) MapReduce and PACT-comparing data parallel programming models. In BTW, pp 25–44Google Scholar
  3. Alexandrov A, Bergmann R, Ewen S, Freytag JC, Hueske F, Heise A, …, Naumann F (2014) The stratosphere platform for big data analytics. VLDB J 23(6):939–964CrossRefGoogle Scholar
  4. Battré D, Ewen S, Hueske F, Kao O, Markl V, Warneke D (2010) Nephele/PACTs: a programming model and execution framework for web-scale analytical processing. In: Proceedings of the 1st ACM symposium on cloud computing. ACM, pp 119–130Google Scholar
  5. Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015a) Apache Flink: stream and batch processing in a single engine. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol 36, no. 4Google Scholar
  6. Carbone P et al (2015b) Apache Flink: stream and batch processing in a single engine. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol 36, no. 4Google Scholar
  7. Carbone P et al (2015c) Lightweight asynchronous snapshots for distributed dataflows. In CoRR abs/1506.08603. http://arxiv.org/abs/1506.08603
  8. Carbone P, Ewen S, Fóra G, Haridi S, Richter S, Tzoumas K (2017) State management in apache flink®: consistent stateful distributed stream processing. Proc VLDB Endowment 10(12):1718–1729CrossRefGoogle Scholar
  9. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRefGoogle Scholar
  10. Ewen S, Tzoumas K, Kaufmann M, Markl V (2012) Spinning fast iterative data flows. Proc VLDB Endowment 5(11):1268–1279CrossRefGoogle Scholar
  11. Ghemawat S, Gobioff H, Leung ST (2003) The google file system. ACM SIGOPS Oper Syst Rev 37(5):29–43. ACMCrossRefGoogle Scholar
  12. Hueske F, Peters M, Sax MJ, Rheinländer A, Bergmann R, Krettek A, Tzoumas K (2012) Opening the black boxes in data flow optimization. Proc VLDB Endowment 5(11):1256–1267CrossRefGoogle Scholar
  13. Koliopoulos A (2017) Drivetribe’s modern take on CQRS with Apache Flink. Drivetribe. https://data-artisans.com/blog/drivetribe-cqrs-apache-flink. Visited on 7 Sept 2017
  14. Mani Chandy K, Lamport L (1985) Distributed snapshots: determining global states of distributed systems. ACM Trans Comp Syst (TOCS) 3(1):63–75CrossRefGoogle Scholar
  15. The Apache Software Foundation. RocksDB|A persistent key-value store|RocksDB. http://rocksdb.org/. Visited on 30 Sept 2017

Recommended Reading

  1. Friedman E, Tzoumas K (2016) Introduction to Apache Flink: stream processing for real time and beyond. O’Reilly Media, Sebastopol. ISBN 1491976586Google Scholar
  2. Hueske F, Kalavri V (2018) Stream processing with Apache Flink: fundamentals, implementation, and operation of streaming applications. O’Reilly Media, Sebastopol. ISBN 149197429XGoogle Scholar

Authors and Affiliations

  1. 1.data Artisans GmbHBerlinGermany

Section editors and affiliations

  • Alessandro Margara
    • 1
  • Tilmann Rabl
    • 2
  1. 1.Politecnico di Milano
  2. 2.Database Systems and Information Management GroupTechnische Universität BerlinBerlinGermany