Abstract
Engineering modern many-core systems is challenging because of their scale and complexity. Their dependability cannot be ensured without understanding how it interacts with performance and energy consumption. This calls for new structuring mechanisms that depart from the traditional ways in which systems are developed (such as strict layering, strong encapsulation, abstraction, and information hiding). This paper reports on the initial steps of PhD work on development methods and tools for architecting cross-layer fault tolerance in many-core systems, in which error detection and error recovery are applied at several system layers in a coordinated fashion to ensure overall system efficiency.
Acknowledgments
This work is supported by the EPSRC/UK PRiME project and by the School of Computing Science, Newcastle University (UK).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gensh, R., Romanovsky, A., Yakovlev, A. (2015). Engineering Cross-Layer Fault Tolerance in Many-Core Systems. In: Fantechi, A., Pelliccione, P. (eds) Software Engineering for Resilient Systems. SERENE 2015. Lecture Notes in Computer Science, vol 9274. Springer, Cham. https://doi.org/10.1007/978-3-319-23129-7_5
DOI: https://doi.org/10.1007/978-3-319-23129-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23128-0
Online ISBN: 978-3-319-23129-7
eBook Packages: Computer Science, Computer Science (R0)