Today, virtually all data is generated continuously as streams of events. This includes business transactions, interactions with web or mobile applications, sensor or device logs, and database modifications. There are two ways to process continuously produced data: batch processing and stream processing. In stream processing, a continuously running application ingests and processes the data immediately as it arrives. In batch processing, the data is first recorded and persisted in a storage system, such as a file system or a database system, and is then processed (periodically) by an application that operates on a bounded data set. While stream processing typically produces results with lower latency, it introduces operational challenges, because streaming applications that run 24 × 7 place high demands on failure recovery and consistency guarantees.
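The contrast between the two processing styles can be illustrated with a minimal sketch in plain Python. It does not use Apache Flink or any of its APIs; the `Event` shape and the function names `batch_totals` and `stream_totals` are assumptions made purely for illustration. The batch function reads a complete, bounded data set and produces one result at the end, whereas the streaming function keeps state and emits an updated result for every event as it arrives.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, int]


def batch_totals(stored_events: Iterable[Event]) -> Dict[str, int]:
    """Batch style: the bounded data set is fully recorded before processing."""
    totals: Counter = Counter()
    for key, value in stored_events:
        totals[key] += value
    # A single result is produced only after all input has been read.
    return dict(totals)


def stream_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stream style: update state and emit a result for each arriving event."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
        # The running total for this key is available immediately.
        yield key, totals[key]


events = [("a", 1), ("b", 2), ("a", 3)]
batch_result = batch_totals(events)
stream_result = list(stream_totals(iter(events)))
```

In this sketch the streaming variant holds its running totals in memory only. That state is what a continuously running application would need to recover consistently after a failure, which is the operational challenge described above.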
The most fundamental difference between batch and stream processing applications is that...