Overview
Today, virtually all data is continuously generated as streams of events. This includes business transactions, interactions with web or mobile applications, sensor or device logs, and database modifications. There are two ways to process continuously produced data: batch processing and stream processing. In stream processing, the data is ingested and processed as it arrives by a continuously running application. In batch processing, the data is first recorded and persisted in a storage system, such as a file system or database system, and then processed (typically periodically) by an application that operates on a bounded data set. While stream processing typically produces results with lower latency, it introduces operational challenges, because streaming applications that run 24 × 7 place high demands on failure recovery and consistency guarantees.
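The contrast between the two processing styles can be illustrated with a small sketch in plain Python. It does not use Apache Flink or any of its APIs; the event shape (a key and an integer value) and the function names batch_totals and stream_totals are assumptions made purely for illustration. The batch function sees the complete, bounded data set before it produces a result, whereas the streaming function keeps running state and emits an updated result for every event as it arrives.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, int]


def batch_totals(stored_events: List[Event]) -> Dict[str, int]:
    """Batch style: the bounded data set is fully available before processing."""
    totals: Dict[str, int] = {}
    for key, value in stored_events:
        totals[key] = totals.get(key, 0) + value
    # A single result is produced only after all events have been read.
    return totals


def stream_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stream style: update state per event and emit a result immediately."""
    totals: Dict[str, int] = {}  # state kept by the long-running application
    for key, value in events:
        totals[key] = totals.get(key, 0) + value
        # An updated result is emitted as soon as the event is processed.
        yield key, totals[key]


events = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4)]

batch_result = batch_totals(events)
stream_result = list(stream_totals(iter(events)))
```

In this sketch the streaming variant holds its running totals in memory only. Under that assumption, a failure would lose the state, which hints at why failure recovery and consistency guarantees matter for applications that run continuously.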
The most fundamental difference between batch and stream processing applications is that...
Recommended Reading
Friedman E, Tzoumas K (2016) Introduction to Apache Flink: stream processing for real time and beyond. O’Reilly Media, Sebastopol. ISBN 1491976586
Hueske F, Kalavri V (2018) Stream processing with Apache Flink: fundamentals, implementation, and operation of streaming applications. O’Reilly Media, Sebastopol. ISBN 149197429X
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Hueske, F., Walther, T. (2018). Apache Flink. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_303-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_303-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
Chapter history
Latest: Apache Flink. Published: 17 May 2022. DOI: https://doi.org/10.1007/978-3-319-63962-8_303-2
Original: Apache Flink. Published: 24 April 2018. DOI: https://doi.org/10.1007/978-3-319-63962-8_303-1