Towards fault tolerance optimization based on checkpoints of in-memory framework spark

Ying, Changtian; Yu, Jiong; He, JingSha

doi:10.1007/s12652-018-1018-6


Original Research
Published: 07 September 2018


Journal of Ambient Intelligence and Humanized Computing

Changtian Ying¹,², Jiong Yu² & JingSha He³


Abstract

Performance optimization, and fault tolerance optimization in particular, is a significant aspect of in-memory computing. When a node fails, an in-memory computing framework may lose data, and without a checkpoint the lost data must be recomputed, which increases execution time. In the traditional Spark strategy, the programmer chooses the checkpoints manually, which brings uncertainty and risk. This paper therefore focuses on the checkpoint strategy of the in-memory computing framework Spark. Following a theoretical analysis, we present a checkpoint selection algorithm that takes into account the length of the RDD lineage, the computational cost, the operation complexity, and the size of the RDD when setting checkpoints. The greater the weight of an RDD, the higher its priority. RDDs with higher cost are set as checkpoints first, which reduces the recomputation cost of the task. When a failure occurs, the recovery algorithm is executed, which improves the efficiency of task recovery. The experimental results show that the strategy optimizes the fault tolerance mechanism of Spark and improves the efficiency of job recovery.
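The abstract names the four factors behind an RDD's weight but does not give the formula, so the following is only a minimal sketch of weight-based checkpoint selection, not the authors' algorithm. It assumes a linear combination of the four factors. The names `RddStats`, `weight`, and `select_checkpoints`, the coefficient tuple, and the sample numbers are all hypothetical. In particular, how the RDD size should influence the weight is left to the caller through its coefficient.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RddStats:
    """Per-RDD measurements (hypothetical structure, not from the paper)."""
    name: str
    lineage_length: int    # number of ancestor RDDs back to the source data
    compute_cost: float    # time needed to compute this RDD, in seconds
    op_complexity: float   # relative complexity of the producing operation
    size_mb: float         # size of the materialised RDD, in megabytes


def weight(rdd: RddStats, coeffs: Tuple[float, float, float, float]) -> float:
    """Assumed linear weight over the four factors listed in the abstract."""
    a, b, c, d = coeffs
    return (a * rdd.lineage_length
            + b * rdd.compute_cost
            + c * rdd.op_complexity
            + d * rdd.size_mb)


def select_checkpoints(rdds: Sequence[RddStats],
                       coeffs: Tuple[float, float, float, float],
                       k: int) -> List[RddStats]:
    """Return the k RDDs with the greatest weight, highest priority first."""
    ranked = sorted(rdds, key=lambda r: weight(r, coeffs), reverse=True)
    return ranked[:k]


# Illustrative input only; these numbers are made up.
candidates = [
    RddStats("a", lineage_length=1, compute_cost=2.0, op_complexity=1.0, size_mb=10.0),
    RddStats("b", lineage_length=5, compute_cost=8.0, op_complexity=3.0, size_mb=20.0),
    RddStats("c", lineage_length=3, compute_cost=4.0, op_complexity=2.0, size_mb=5.0),
]
chosen = select_checkpoints(candidates, coeffs=(1.0, 1.0, 1.0, 0.1), k=2)
print([r.name for r in chosen])
```

With these made-up inputs the RDD with the longest lineage and highest compute cost ranks first. In a real Spark job, the selected RDDs would then be marked with the standard `checkpoint()` call before an action triggers their computation.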



Acknowledgements

The authors are grateful to the anonymous referees for their insightful suggestions and comments. This research was supported by the National Natural Science Foundation of China under Grant Nos. 61262088, 61462079, 61363083, 61562086.

Author information

Authors and Affiliations

School of Mechanical and Electrical Engineering, Shaoxing University, Shaoxing, China
Changtian Ying
School of Software, Xinjiang University, Urumqi, China
Changtian Ying & Jiong Yu
School of Software, Beijing University of Technology, Beijing, China
JingSha He


Corresponding author

Correspondence to Changtian Ying.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Ying, C., Yu, J. & He, J. Towards fault tolerance optimization based on checkpoints of in-memory framework spark. J Ambient Intell Human Comput (2018). https://doi.org/10.1007/s12652-018-1018-6


Received: 06 July 2018
Accepted: 27 August 2018
Published: 07 September 2018
DOI: https://doi.org/10.1007/s12652-018-1018-6
