Deterministic Model for Distributed Speculative Stream Processing

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2018)

Abstract

Users of modern distributed stream processing systems have to choose between non-deterministic computations and high latency caused by the need for excessive buffering. We introduce a speculative model based on a MapReduce-complete set of operations that allows us to achieve both determinism and low latency. Experiments show that our prototype can outperform existing solutions due to the low overhead of optimistic synchronization.

Corresponding author

Correspondence to Igor E. Kuralenok.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kuralenok, I.E., Trofimov, A., Marshalkin, N., Novikov, B. (2018). Deterministic Model for Distributed Speculative Stream Processing. In: Benczúr, A., Thalheim, B., Horváth, T. (eds) Advances in Databases and Information Systems. ADBIS 2018. Lecture Notes in Computer Science, vol 11019. Springer, Cham. https://doi.org/10.1007/978-3-319-98398-1_16

  • DOI: https://doi.org/10.1007/978-3-319-98398-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98397-4

  • Online ISBN: 978-3-319-98398-1

  • eBook Packages: Computer Science, Computer Science (R0)
