Reliable stream data processing for elastic distributed stream processing systems

Cluster Computing

Abstract

Distributed stream processing systems (DSPSs) have proven to be an effective way to process and analyze large-scale data streams in real time, and their reliability has become a popular research topic in recent years. Novel elastic DSPSs can seamlessly adapt to changes in stream workload, which introduces new reliability challenges: (1) operators can be scaled up and down at runtime, so fault-tolerance methods must keep data backups consistent under these runtime dynamics; (2) rollback recovery to the last checkpoint may undo recent auto-scaling adjustments, which incurs high cost and an unacceptable impact on the system. In this paper, we put forward a novel fault-tolerant mechanism to deal with these issues. In particular, we propose a self-adaptive backup unit, the elastic data slice (EDS), which can partition and merge data backups at runtime according to operator auto-scaling. The consistency of recovery is guaranteed by new upstream backup protocols, which restart the system from its state after auto-scaling rather than from the last checkpoint, and thus avoid high recovery latency. Based on these techniques, we implement a prototype system named SPATE. Evaluations on SPATE show that our mechanism supports auto-scaling changes with overhead similar to that of existing approaches, while achieving low recovery latency despite auto-scaling.
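The abstract does not describe the internal layout of an EDS, so the following is only a rough illustration of the general idea of a backup unit that partitions and merges along with operator scaling. It assumes key-partitioned operators and upstream-buffered tuples carrying a sequence number; the names `BackupSlice`, `route`, `split_slice`, and `merge_slices` are hypothetical and are not taken from SPATE.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical tuple layout: (sequence number, partition key, payload).
Record = Tuple[int, str, int]


@dataclass
class BackupSlice:
    """Illustrative backup unit: tuples buffered upstream for one operator instance."""
    records: List[Record] = field(default_factory=list)

    def append(self, seq: int, key: str, payload: int) -> None:
        self.records.append((seq, key, payload))


def route(key: str, n: int) -> int:
    """Deterministic key-to-instance routing (assumed here; CRC32 modulo n)."""
    return zlib.crc32(key.encode("utf-8")) % n


def split_slice(backup: BackupSlice, n: int) -> List[BackupSlice]:
    """Scale-out: repartition one slice into n slices, one per new operator instance."""
    parts = [BackupSlice() for _ in range(n)]
    for seq, key, payload in backup.records:
        parts[route(key, n)].append(seq, key, payload)
    return parts


def merge_slices(slices: List[BackupSlice]) -> BackupSlice:
    """Scale-in: combine several slices into one, restoring sequence order."""
    merged = BackupSlice()
    for s in slices:
        merged.records.extend(s.records)
    merged.records.sort(key=lambda r: r[0])
    return merged
```

In this sketch the backup follows the operator: when an operator is split into `n` instances, its slice is split with the same routing function, so each new instance can be replayed from its own slice without rolling back to a checkpoint taken before the scaling step.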

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61602205 and 61772228), the National Key Research and Development Program of China (Grant Nos. 2017YFC1502306, 2016YFB0201503 and 2016YFB0701101), the Major Special Research Project of Science and Technology Department of Jilin Province (20160203008GX), and the Jilin Scientific and Technological Development Program (20170520066JH).

Author information

Corresponding author

Correspondence to Hongliang Li.

About this article

Cite this article

Wei, X., Zhuang, Y., Li, H. et al. Reliable stream data processing for elastic distributed stream processing systems. Cluster Comput 23, 555–574 (2020). https://doi.org/10.1007/s10586-019-02939-9

  • DOI: https://doi.org/10.1007/s10586-019-02939-9
