Abstract
Performance optimization, and fault tolerance optimization in particular, is an important concern for in-memory computing. When a node fails, an in-memory computing framework may lose data, and without a checkpoint the lost data must be recomputed, which increases execution time. In Spark's traditional strategy, the programmer chooses the checkpoints manually, which introduces uncertainty and risk. This paper therefore focuses on the checkpoint strategy of the in-memory computing framework Spark. Following a theoretical analysis, we present a checkpoint selection algorithm that takes into account the length of the RDD lineage, the computational cost, the operation complexity, and the size of the RDD when setting a checkpoint. The greater the weight of an RDD, the higher its priority. RDDs with higher cost are set as checkpoints first, which reduces the recomputation cost of the task. When a failure occurs, the recovery algorithm is executed, which improves the efficiency of task recovery. Experimental results show that the strategy optimizes Spark's fault tolerance mechanism and improves the efficiency of job recovery.
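The weight-based selection described in the abstract can be illustrated with a small sketch. The abstract does not give the weight formula, so everything below is an assumption made for illustration: the `RddStats` record, the equal coefficients, the normalization by the per-factor maximum, and the choice to let all four factors raise the weight are hypothetical and are not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RddStats:
    """Hypothetical per-RDD measurements; the field names are assumptions."""
    name: str
    lineage_length: int    # number of ancestor RDDs in the lineage
    compute_cost: float    # e.g. seconds needed to compute the RDD
    op_complexity: float   # score for the transformation that produced it
    size: float            # e.g. size of the materialized RDD in MB


FACTORS = ("lineage_length", "compute_cost", "op_complexity", "size")


def rdd_weights(rdds, coeffs=(0.25, 0.25, 0.25, 0.25)):
    """Return {name: weight}, a weighted sum of max-normalized factors.

    Assumed formula, not taken from the paper.
    """
    if not rdds:
        return {}
    # Normalize each factor by its maximum so the four terms are comparable.
    maxima = [max(getattr(r, f) for r in rdds) or 1 for f in FACTORS]
    return {
        r.name: sum(
            c * getattr(r, f) / m for c, f, m in zip(coeffs, FACTORS, maxima)
        )
        for r in rdds
    }


def select_checkpoints(rdds, k, coeffs=(0.25, 0.25, 0.25, 0.25)):
    """Return the names of the k highest-weight RDDs, highest first."""
    weights = rdd_weights(rdds, coeffs)
    ranked = sorted(rdds, key=lambda r: weights[r.name], reverse=True)
    return [r.name for r in ranked[:k]]
```

In a real PySpark job, the selected RDDs would then be marked with `RDD.checkpoint()` after a checkpoint directory has been set via `SparkContext.setCheckpointDir`. How the statistics are collected and how the coefficients are tuned is left open here.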
Acknowledgements
The authors are grateful to the anonymous referees for their insightful suggestions and comments. This research was supported by the National Natural Science Foundation of China under Grant Nos. 61262088, 61462079, 61363083, 61562086.
Cite this article
Ying, C., Yu, J. & He, J. Towards fault tolerance optimization based on checkpoints of in-memory framework spark. J Ambient Intell Human Comput (2018). https://doi.org/10.1007/s12652-018-1018-6
DOI: https://doi.org/10.1007/s12652-018-1018-6