Apache Apex

  • Living reference work entry
Encyclopedia of Big Data Technologies

Introduction

Apache Apex (2018; Weise et al. 2017) is a large-scale stream-first big data processing framework that can be used for low-latency, high-throughput, and fault-tolerant processing of unbounded (or bounded) datasets on clusters. Apex development started in 2012, and it became a project at the Apache Software Foundation in 2015. Apex can be used for real-time and batch processing, based on a unified stateful streaming architecture, with support for event-time windowing and exactly-once processing semantics (Fig. 1).

Fig. 1: Apex as distributed stream processor
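To make the event-time windowing mentioned in the introduction concrete, the following is a minimal, library-free sketch in Python. It does not use the Apex API: the function names, the sample events, and the one-minute tumbling window are assumptions chosen purely for illustration. It only shows the core idea that an event is assigned to a window by its own timestamp rather than by its arrival order; watermarks, triggers, checkpointing, and exactly-once delivery are deliberately left out.

```python
from collections import defaultdict


def window_start(event_time_ms, size_ms):
    """Return the start of the tumbling window that contains event_time_ms."""
    return event_time_ms - (event_time_ms % size_ms)


def count_by_event_time(events, size_ms):
    """Count events per (window_start, key) using each event's own timestamp.

    events: iterable of (event_time_ms, key) pairs in arrival order,
    which may differ from event-time order.
    """
    counts = defaultdict(int)
    for event_time_ms, key in events:
        counts[(window_start(event_time_ms, size_ms), key)] += 1
    return dict(counts)


# Hypothetical input: the third event arrives late (out of event-time order).
events = [(1000, "a"), (61000, "a"), (59000, "a"), (62000, "b")]
result = count_by_event_time(events, size_ms=60000)
```

Here the late event with timestamp 59000 is still counted in the first window, because the assignment depends only on the event's timestamp.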

References

  • Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. PVLDB 6:1033–1044

  • Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8:1792–1803

  • Apache Apex (2018) https://apex.apache.org/

  • Apache Calcite (2018) https://calcite.apache.org/

  • Bertolucci M et al (2015) Static and dynamic big data partitioning on Apache Spark. PARCO

  • Carbone P et al (2015a) Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38:28–38

  • Carbone P et al (2015b) Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603

  • Carbone P et al (2017) State management in Apache Flink®: consistent stateful distributed stream processing. PVLDB 10:1718–1729

  • Confluent blog (2018) https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/

  • Del Monte B (2017) Efficient migration of very large distributed state for scalable stream processing. PhD@VLDB

  • Fernandez RC et al (2013) Integrating scale out and fault tolerance in stream processing using operator state management. SIGMOD conference

  • Floratou A et al (2017) Dhalion: self-regulating stream processing in Heron. PVLDB 10:1825–1836

  • Hummer W et al (2013) Elastic stream processing in the cloud. Wiley Interdisc Rew: Data Min Knowl Discov 3:333–345

  • Jacques-Silva G et al (2016) Consistent regions: guaranteed tuple processing in IBM streams. PVLDB 9:1341–1352

  • Kulkarni S et al (2015) Twitter Heron: stream processing at scale. SIGMOD conference

  • Lin W et al (2016) StreamScope: continuous reliable distributed processing of big data streams. NSDI

  • Nasir MAU (2016) Fault tolerance for stream processing engines. CoRR abs/1605.00928

  • Noghabi SA et al (2017) Stateful scalable stream processing at LinkedIn. PVLDB 10:1634–1645

  • Sattler K-U, Beier F (2013) Towards elastic stream processing: patterns and infrastructure. BD3@VLDB

  • Sebepou Z, Magoutis K (2011) CEC: continuous eventual checkpointing for data stream processing operators. In: 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN). pp 145–156

  • To Q-C et al (2017) A survey of state management in big data processing systems. CoRR abs/1702.01596

  • Weise T et al (2017) Learning Apache Apex. Packt Publishing

  • Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI

  • Zaharia M et al (2013) Discretized streams: fault-tolerant streaming computation at scale. SOSP


Correspondence to Ananth Gundabattula or Thomas Weise.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Gundabattula, A., Weise, T. (2018). Apache Apex. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_316-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_316-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
