Energy-Aware Fault-Tolerant Scheduling Under Reliability and Time Constraints in Heterogeneous Systems

  • Tian Guo
  • Jing Liu
  • Wei Hu
  • Mengxue Wei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10956)


As heterogeneous systems are deployed widely in various fields, reliability has become a major concern. Consequently, fault tolerance receives a great deal of attention in both industry and academia, especially for safety-critical systems. Such systems require tasks to complete correctly within a given deadline even when an error occurs, so support for fault tolerance is imperative. Scheduling is an efficient approach to achieving fault tolerance: multiple copies of each task are allocated to different processors. Existing fault-tolerant scheduling algorithms realize fault tolerance without considering an energy limit. To address this issue, this paper proposes an energy-aware fault-tolerant scheduling algorithm, DRB-FTSA-E. The algorithm adopts the active replication strategy and aims for high utilization of energy consumption while completing a set of tasks under given reliability and time constraints. It finds all schemes that meet the time and system reliability constraints, and chooses the scheme with the maximum utilization of energy consumption as the final scheduling scheme. Simulation results show that the proposed algorithm can effectively achieve the maximum utilization of energy consumption while meeting the reliability and time constraints.
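The selection step described above (enumerate replicated schemes, keep those that satisfy the time and reliability constraints, pick the best by an energy criterion) can be illustrated with a small brute-force sketch. This is not DRB-FTSA-E itself; the abstract does not define the task model or the "utilization of energy consumption" metric, so everything below is an assumption made for illustration: tasks are independent and run back to back on each processor, a replica on processor `p` succeeds with probability `exp(-fail_rate[p] * exec_time)`, a task succeeds if at least one of its active replicas succeeds, and the score is taken to be system reliability divided by energy. The names `evaluate` and `best_scheme` are hypothetical.

```python
import math
from itertools import combinations, product


def evaluate(scheme, exec_time, fail_rate, power):
    """Return (makespan, system reliability, energy) for one scheme.

    scheme[t] is the tuple of processors that run an active replica of task t.
    Assumed model: independent tasks, sequential execution per processor,
    exponential failure law, energy = power * execution time.
    """
    busy = [0.0] * len(fail_rate)   # accumulated busy time per processor
    reliability = 1.0
    energy = 0.0
    for task, procs in enumerate(scheme):
        all_replicas_fail = 1.0
        for p in procs:
            duration = exec_time[task][p]
            busy[p] += duration
            energy += power[p] * duration
            all_replicas_fail *= 1.0 - math.exp(-fail_rate[p] * duration)
        # Active replication: the task fails only if every replica fails.
        reliability *= 1.0 - all_replicas_fail
    return max(busy), reliability, energy


def best_scheme(exec_time, fail_rate, power, deadline, min_reliability):
    """Exhaustively search replica placements and return the best feasible one.

    Returns (score, scheme, makespan, reliability, energy), or None when no
    scheme meets both constraints. The score (reliability per unit energy)
    is a placeholder for the paper's undefined utilization metric.
    """
    n_procs = len(fail_rate)
    # Every non-empty subset of processors is a candidate replica set.
    choices = [c for k in range(1, n_procs + 1)
               for c in combinations(range(n_procs), k)]
    best = None
    for scheme in product(choices, repeat=len(exec_time)):
        makespan, rel, energy = evaluate(scheme, exec_time, fail_rate, power)
        if makespan > deadline or rel < min_reliability:
            continue  # violates the time or reliability constraint
        score = rel / energy
        if best is None or score > best[0]:
            best = (score, scheme, makespan, rel, energy)
    return best
```

The search is exponential in the number of tasks, so it is only meant to make the filter-then-select structure concrete on toy inputs; swapping in a different scoring expression changes the selection rule without touching the feasibility check.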


Fault-tolerant, Scheduling algorithm, Reliability, Time constraint, Energy consumption



The authors would like to express their sincere gratitude to the editors and the referees. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61602350, 61602349) and the Open Foundation of Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (2016znss26C).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
