Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Apache Apex

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_316-1


Apache Apex (2018; Weise et al. 2017) is a stream-first big data processing framework for low-latency, high-throughput, and fault-tolerant processing of unbounded (or bounded) datasets on large-scale clusters. Apex development started in 2012, and it became a project at the Apache Software Foundation in 2015. Built on a unified stateful streaming architecture, Apex handles both real-time and batch processing and supports event-time windowing and exactly-once processing semantics (Fig. 1).
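The entry names event-time windowing, stateful processing, and exactly-once semantics without showing how they interact. The following minimal Python sketch does not use the Apex API (Apex operators are written in Java); it only illustrates the general ideas under stated assumptions. A hypothetical `EventTimeWindowCounter` operator counts events per key in tumbling event-time windows, emits a window only once a watermark passes its end, and exposes hypothetical `checkpoint`/`restore` methods so that events replayed after a simulated failure are counted exactly once in the restored state. The class, its methods, and the policy of dropping late events are illustrative assumptions, not Apex behavior.

```python
from collections import defaultdict


class EventTimeWindowCounter:
    """Hypothetical operator (not the Apex API): per-key counts in
    tumbling event-time windows, emitted once the watermark passes."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.counts = defaultdict(int)  # operator state: (window_start, key) -> count
        self.watermark = float("-inf")  # no event-time progress yet

    def process(self, key, event_time):
        # Assign the event to its window by event time, not arrival order.
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            return  # assumed policy: drop events for already-emitted windows
        self.counts[(start, key)] += 1

    def advance_watermark(self, watermark):
        # Emit (and forget) every window whose end is at or before the watermark.
        self.watermark = max(self.watermark, watermark)
        ready = sorted(k for k in self.counts
                       if k[0] + self.window_size <= self.watermark)
        return [(start, key, self.counts.pop((start, key))) for start, key in ready]

    def checkpoint(self):
        # Copy the state so later processing does not mutate the snapshot.
        return dict(self.counts), self.watermark

    def restore(self, snapshot):
        counts, watermark = snapshot
        self.counts = defaultdict(int, counts)
        self.watermark = watermark


op = EventTimeWindowCounter(window_size=10)
for key, t in [("a", 1), ("b", 12), ("a", 7), ("a", 3)]:  # events arrive out of order
    op.process(key, t)
print(op.advance_watermark(10))  # [(0, 'a', 3)]
print(op.advance_watermark(20))  # [(10, 'b', 1)]

op.process("c", 25)
snapshot = op.checkpoint()       # state as of the checkpoint
op.process("c", 27)              # processed, then a simulated failure loses `op`
recovered = EventTimeWindowCounter(window_size=10)
recovered.restore(snapshot)
recovered.process("c", 27)       # replay only the events after the checkpoint
print(recovered.advance_watermark(30))  # [(20, 'c', 2)]
```

Restoring a checkpoint and replaying later input yields exactly-once effects on operator state only; an end-to-end guarantee additionally needs replayable sources and idempotent or transactional outputs, which this sketch does not model.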


References

  1. Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. PVLDB 6:1033–1044
  2. Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8:1792–1803
  3. Apache Apex (2018) https://apex.apache.org/
  4. Apache Calcite (2018) https://calcite.apache.org/
  5. Bertolucci M et al (2015) Static and dynamic big data partitioning on Apache Spark. PARCO
  6. Carbone P et al (2015a) Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38:28–38
  7. Carbone P et al (2015b) Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603
  8. Carbone P et al (2017) State management in Apache Flink®: consistent stateful distributed stream processing. PVLDB 10:1718–1729
  9. Del Monte B (2017) Efficient migration of very large distributed state for scalable stream processing. PhD@VLDB
  10. Fernandez RC et al (2013) Integrating scale out and fault tolerance in stream processing using operator state management. SIGMOD Conference
  11. Floratou A et al (2017) Dhalion: self-regulating stream processing in Heron. PVLDB 10:1825–1836
  12. Hummer W et al (2013) Elastic stream processing in the cloud. Wiley Interdiscip Rev Data Min Knowl Discov 3:333–345
  13. Jacques-Silva G et al (2016) Consistent regions: guaranteed tuple processing in IBM Streams. PVLDB 9:1341–1352
  14. Kulkarni S et al (2015) Twitter Heron: stream processing at scale. SIGMOD Conference
  15. Lin W et al (2016) StreamScope: continuous reliable distributed processing of big data streams. NSDI
  16. Nasir MAU (2016) Fault tolerance for stream processing engines. CoRR abs/1605.00928
  17. Noghabi SA et al (2017) Stateful scalable stream processing at LinkedIn. PVLDB 10:1634–1645
  18. Sattler K-U, Beier F (2013) Towards elastic stream processing: patterns and infrastructure. BD3@VLDB
  19. Sebepou Z, Magoutis K (2011) CEC: continuous eventual checkpointing for data stream processing operators. In: 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), pp 145–156
  20. To Q-C et al (2017) A survey of state management in big data processing systems. CoRR abs/1702.01596
  21. Weise T et al (2017) Learning Apache Apex. Packt Publishing
  22. Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI
  23. Zaharia M et al (2013) Discretized streams: fault-tolerant streaming computation at scale. SOSP

Authors and Affiliations

  1. Commonwealth Bank of Australia, Sydney, Australia
  2. Atrato Inc., San Francisco, USA

Section editors and affiliations

  • Alessandro Margara (1)
  • Tilmann Rabl (2)
  1. Politecnico di Milano
  2. Database Systems and Information Management Group, Technische Universität Berlin, Berlin, Germany