Apache Apex

  • Reference work entry
Encyclopedia of Big Data Technologies


Apache Apex (2018; Weise et al. 2017) is a large-scale stream-first big data processing framework that can be used for low-latency, high-throughput, and fault-tolerant processing of unbounded (or bounded) datasets on clusters. Apex development started in 2012, and it became a project at the Apache Software Foundation in 2015. Apex can be used for real-time and batch processing, based on a unified stateful streaming architecture, with support for event-time windowing and exactly-once processing semantics (Fig. 1).

Apache Apex, Fig. 1. Apex as distributed stream processor
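To make the notion of event-time windowing concrete, the following minimal sketch groups events into fixed, non-overlapping windows based on the timestamp carried by each event rather than on arrival order. It is plain Python and does not use the Apex API; the function name `tumbling_window_counts`, the `(event_time, key)` tuple layout, and the window size are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping event-time windows.

    `events` is an iterable of (event_time, key) pairs in arrival order,
    which may differ from event-time order (hypothetical layout).
    """
    counts = defaultdict(int)
    for event_time, key in events:
        # The window is derived from the event's own timestamp, so
        # out-of-order arrivals are still assigned to the correct window.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Events arrive out of event-time order: timestamp 12 precedes 3 and 9.
events = [(1, "a"), (12, "a"), (3, "b"), (9, "a"), (11, "b")]
result = tumbling_window_counts(events, window_size=10)
```

A real stream processor must additionally decide when a window is complete (for example with watermarks and allowed lateness) and must checkpoint the accumulated counts to recover from failures; the sketch leaves both out.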


  • Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. PVLDB 6:1033–1044

  • Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8:1792–1803

  • Apache Apex (2018)

  • Apache Calcite (2018)

  • Bertolucci M et al (2015) Static and dynamic big data partitioning on Apache Spark. PARCO

  • Carbone P et al (2015a) Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38:28–38

  • Carbone P et al (2015b) Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603

  • Carbone P et al (2017) State management in Apache Flink®: consistent stateful distributed stream processing. PVLDB 10:1718–1729

  • Confluent blog (2018)

  • Del Monte B (2017) Efficient migration of very large distributed state for scalable stream processing. PhD@VLDB

  • Fernandez RC et al (2013) Integrating scale out and fault tolerance in stream processing using operator state management. SIGMOD conference

  • Floratou A et al (2017) Dhalion: self-regulating stream processing in Heron. PVLDB 10:1825–1836

  • Hummer W et al (2013) Elastic stream processing in the cloud. Wiley Interdiscip Rev: Data Min Knowl Discov 3:333–345

  • Jacques-Silva G et al (2016) Consistent regions: guaranteed tuple processing in IBM streams. PVLDB 9:1341–1352

  • Kulkarni S et al (2015) Twitter Heron: stream processing at scale. SIGMOD conference

  • Lin W et al (2016) StreamScope: continuous reliable distributed processing of big data streams. NSDI

  • Nasir MAU (2016) Fault tolerance for stream processing engines. CoRR abs/1605.00928

  • Noghabi SA et al (2017) Stateful scalable stream processing at LinkedIn. PVLDB 10:1634–1645

  • Sattler K-U, Beier F (2013) Towards elastic stream processing: patterns and infrastructure. BD3@VLDB

  • Sebepou Z, Magoutis K (2011) CEC: continuous eventual checkpointing for data stream processing operators. In: 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), pp 145–156

  • To Q-C et al (2017) A survey of state management in big data processing systems. CoRR abs/1702.01596

  • Weise T et al (2017) Learning Apache Apex. Packt Publishing

  • Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI

  • Zaharia M et al (2013) Discretized streams: fault-tolerant streaming computation at scale. SOSP

Author information

Corresponding authors

Correspondence to Ananth Gundabattula or Thomas Weise.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this entry

Gundabattula, A., Weise, T. (2019). Apache Apex. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham.
