Introduction
Apache Apex (2018; Weise et al. 2017) is a large-scale, stream-first big data processing framework for low-latency, high-throughput, and fault-tolerant processing of unbounded (or bounded) datasets on clusters. Apex development started in 2012, and it became an Apache Software Foundation project in 2015. Built on a unified stateful streaming architecture, Apex supports both real-time and batch processing, with event-time windowing and exactly-once processing semantics (Fig. 1).
References
Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. PVLDB 6:1033–1044
Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8:1792–1803
Apache Apex (2018) https://apex.apache.org/
Apache Calcite (2018) https://calcite.apache.org/
Bertolucci M et al (2015) Static and dynamic big data partitioning on Apache Spark. PARCO
Carbone P et al (2015a) Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38:28–38
Carbone P et al (2015b) Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603
Carbone P et al (2017) State management in Apache Flink®: consistent stateful distributed stream processing. PVLDB 10:1718–1729
Confluent blog (2018) https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/
Del Monte B (2017) Efficient migration of very large distributed state for scalable stream processing. PhD@VLDB
Fernandez RC et al (2013) Integrating scale out and fault tolerance in stream processing using operator state management. SIGMOD conference
Floratou A et al (2017) Dhalion: self-regulating stream processing in Heron. PVLDB 10:1825–1836
Hummer W et al (2013) Elastic stream processing in the cloud. Wiley Interdisc Rev: Data Min Knowl Discov 3:333–345
Jacques-Silva G et al (2016) Consistent regions: guaranteed tuple processing in IBM streams. PVLDB 9:1341–1352
Kulkarni S et al (2015) Twitter Heron: stream processing at scale. SIGMOD conference
Lin W et al (2016) StreamScope: continuous reliable distributed processing of big data streams. NSDI
Nasir MAU (2016) Fault tolerance for stream processing engines. CoRR abs/1605.00928
Noghabi SA et al (2017) Stateful scalable stream processing at LinkedIn. PVLDB 10:1634–1645
Sattler K-U, Beier F (2013) Towards elastic stream processing: patterns and infrastructure. BD3@VLDB
Sebepou Z, Magoutis K (2011) CEC: continuous eventual checkpointing for data stream processing operators. In: 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN). pp 145–156
To Q-C et al (2017) A survey of state management in big data processing systems. CoRR abs/1702.01596
Weise T et al (2017) Learning Apache Apex. Packt Publishing
Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI
Zaharia M et al (2013) Discretized streams: fault-tolerant streaming computation at scale. SOSP
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Gundabattula, A., Weise, T. (2018). Apache Apex. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_316-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_316-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering