Engineering Cross-Layer Fault Tolerance in Many-Core Systems

  • Conference paper
Software Engineering for Resilient Systems (SERENE 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9274)

Abstract

Engineering modern many-core systems is a challenging task because of their scale and complexity. We cannot focus on ensuring their dependability without understanding its interplay with performance and energy consumption. This calls for developing new structuring mechanisms that step away from the traditional ways systems are developed (such as strict layering, strong encapsulation, abstraction, and information hiding). The paper reports on the initial steps of a PhD project on development methods and tools for architecting cross-layer fault tolerance in many-core systems, in which error detection and error recovery are applied at several system layers in a concerted, coordinated fashion to ensure overall system efficiency.

Acknowledgments

This work is supported by the EPSRC/UK PRiME project and by the School of Computing Science, Newcastle University (UK).

Author information

Corresponding author

Correspondence to Rem Gensh.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gensh, R., Romanovsky, A., Yakovlev, A. (2015). Engineering Cross-Layer Fault Tolerance in Many-Core Systems. In: Fantechi, A., Pelliccione, P. (eds) Software Engineering for Resilient Systems. SERENE 2015. Lecture Notes in Computer Science, vol 9274. Springer, Cham. https://doi.org/10.1007/978-3-319-23129-7_5

  • DOI: https://doi.org/10.1007/978-3-319-23129-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23128-0

  • Online ISBN: 978-3-319-23129-7

  • eBook Packages: Computer Science, Computer Science (R0)
