Engineering Cross-Layer Fault Tolerance in Many-Core Systems

Gensh, Rem; Romanovsky, Alexander; Yakovlev, Alex

doi:10.1007/978-3-319-23129-7_5

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9274)

Included in the following conference series:

International Workshop on Software Engineering for Resilient Systems

Abstract

Engineering modern many-core systems is a challenging task because of their scale and complexity. We cannot focus on ensuring their dependability without understanding its interplay with performance and energy consumption. This calls for developing new structuring mechanisms that step away from the traditional ways systems are developed (such as strict layering, strong encapsulation, abstraction, and hiding). The paper reports on the initial steps of a PhD work focusing on development methods and tools for architecting cross-layer fault tolerance in many-core systems, in which error detection and error recovery are applied at several system layers in a concerted, coordinated fashion to ensure overall system efficiency.

Acknowledgments

This work is supported by the EPSRC/UK PRiME project and by the School of Computing Science, Newcastle University (UK).

Author information

Authors and Affiliations

Centre for Software Reliability, Newcastle University, Newcastle upon Tyne, UK
Rem Gensh & Alexander Romanovsky
School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne, UK
Alex Yakovlev

Authors

Rem Gensh
Alexander Romanovsky
Alex Yakovlev

Corresponding author

Correspondence to Rem Gensh.

Editor information

Editors and Affiliations

University of Florence, Florence, Italy
Alessandro Fantechi
University of Gothenburg, Gothenburg, Sweden
Patrizio Pelliccione

About this paper

Cite this paper

Gensh, R., Romanovsky, A., Yakovlev, A. (2015). Engineering Cross-Layer Fault Tolerance in Many-Core Systems. In: Fantechi, A., Pelliccione, P. (eds) Software Engineering for Resilient Systems. SERENE 2015. Lecture Notes in Computer Science, vol 9274. Springer, Cham. https://doi.org/10.1007/978-3-319-23129-7_5

DOI: https://doi.org/10.1007/978-3-319-23129-7_5
Published: 28 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23128-0
Online ISBN: 978-3-319-23129-7
eBook Packages: Computer Science, Computer Science (R0)
