Abstract
A study of Google’s data center revealed that the incidence of main memory errors is surprisingly high. These errors can corrupt applications and the system itself, reducing reliability. The high error rate indicates that new resiliency techniques will be vital in future memories. Developing such techniques requires a framework for conducting flexible and repeatable experiments. This paper describes such a framework, StealthWorks, which facilitates research on software resilience by behaviorally emulating memory errors in a live system. We illustrate its use in two ways: studying program tolerance to random errors, and developing a new software technique that continuously tests memory for errors.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rahman, M., Childers, B.R., Cho, S. (2010). StealthWorks: Emulating Memory Errors. In: Barringer, H., et al. Runtime Verification. RV 2010. Lecture Notes in Computer Science, vol 6418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16612-9_27
DOI: https://doi.org/10.1007/978-3-642-16612-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16611-2
Online ISBN: 978-3-642-16612-9
eBook Packages: Computer Science, Computer Science (R0)