Evaluating Application Vulnerability to Soft Errors in Multi-level Cache Hierarchy

  • Zhe Ma
  • Trevor Carlson
  • Wim Heirman
  • Lieven Eeckhout
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7156)


As cache capacity increases dramatically with each new processor generation, soft errors originating in cache memories have become a major reliability concern for high-performance processors. This paper presents an application-specific soft error vulnerability analysis to understand how an application responds to soft errors in different levels of the cache hierarchy. Building on a high-performance processor simulator called Graphite, we have implemented a fault injection framework that can selectively inject bit flips into different cache levels. We simulated a wide range of relevant bit error patterns and measured the applications' vulnerability to bit errors. Our experimental results show that applications differ in their vulnerability to bit errors in different cache levels (e.g., for a given cache, the failure rate of one program is more than double that of another); the results also indicate the probabilities of different failure behaviors for the given applications.
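To make the idea of injecting bit flips and measuring a failure rate concrete, the following is a minimal, self-contained sketch. It is not the Graphite-based framework described above: the 64-byte cache line, the adjacent-bit error pattern, the `live_bytes` notion of which bytes the program later reads, and all function names are assumptions made purely for illustration.

```python
import random


def flip_bits(line, bit_positions):
    """Return a copy of `line` with the given bit positions inverted."""
    corrupted = bytearray(line)
    for pos in bit_positions:
        byte_index, bit_index = divmod(pos, 8)
        corrupted[byte_index] ^= 1 << bit_index
    return corrupted


def adjacent_pattern(rng, line_bits, width):
    """Pick `width` adjacent bit positions (a simple multi-bit upset model)."""
    start = rng.randrange(line_bits - width + 1)
    return list(range(start, start + width))


def workload(line, live_bytes):
    """Hypothetical program: it only consumes the first `live_bytes` bytes."""
    return sum(line[:live_bytes])


def failure_rate(trials, width, live_bytes=16, seed=0):
    """Fraction of injections that change the workload's output."""
    rng = random.Random(seed)
    line = bytearray(range(64))          # assumed 64-byte cache line
    golden = workload(line, live_bytes)  # fault-free reference output
    failures = 0
    for _ in range(trials):
        pattern = adjacent_pattern(rng, len(line) * 8, width)
        if workload(flip_bits(line, pattern), live_bytes) != golden:
            failures += 1                # flip reached data the program uses
    return failures / trials


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, failure_rate(10_000, width))
```

In this toy model, flips that land in bytes the workload never reads are masked, so the measured rate depends on how much of the line is live. That is one simple way to picture why different programs can show different failure rates for the same cache.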


Soft error, processor simulator, fault injection





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zhe Ma
    • 1
    • 3
  • Trevor Carlson
    • 2
    • 3
  • Wim Heirman
    • 2
    • 3
  • Lieven Eeckhout
    • 2
    • 3
  1. Imec, Leuven, Belgium
  2. Ghent University, Gent, Belgium
  3. Intel ExaScience Lab, Leuven, Belgium
