
Language Support for Reliable Memory Regions

  • Saurabh Hukerikar
  • Christian Engelmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10136)

Abstract

The path to exascale computational capabilities in high-performance computing (HPC) systems is challenged by the inability of present software technologies to adapt to the rapid evolution of supercomputing architectures. Power constraints have driven system designs to include increasingly heterogeneous architectures and diverse memory technologies and interfaces. Future systems are also expected to experience an increased rate of errors, such that applications will no longer be able to assume correct behavior of the underlying machine. To enable the scientific community to scale their applications and harness the capabilities of exascale systems, we need software strategies that enable explicit management of resilience to errors in the system, in addition to locality of reference in the complex memory hierarchies of future HPC systems.

In prior work, we introduced the concept of explicitly reliable memory regions, called havens. Memory management using havens supports reliability management through a region-based approach to memory allocations. Havens enable the creation of robust memory regions, whose resilient behavior is guaranteed by software-based protection schemes. In this paper, we propose language support for havens through type annotations that make the structure of a program’s havens more explicit and convenient for HPC programmers to use. We describe how the extended haven-based memory management model is implemented, and demonstrate the use of the language-based annotations to affect the resiliency of a conjugate gradient solver application.
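The region-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation or API: the `Haven` class and its method names are hypothetical, and a single XOR parity byte stands in for whatever software protection scheme a real haven would use. The sketch only shows the general pattern of allocating objects inside one region, protecting the region as a unit, and detecting a corrupted byte before data is read.

```python
class Haven:
    """Hypothetical reliable memory region (illustrative only).

    Objects are bump-allocated inside one buffer, and a single XOR
    parity byte over the buffer is kept up to date on every write.
    """

    def __init__(self, size):
        self.buf = bytearray(size)  # backing storage for the whole region
        self.top = 0                # bump pointer: next free offset
        self.parity = 0             # XOR of all bytes in buf (zero initially)

    def alloc(self, nbytes):
        """Reserve nbytes in the region and return the block's offset."""
        if self.top + nbytes > len(self.buf):
            raise MemoryError("haven exhausted")
        offset = self.top
        self.top += nbytes
        return offset

    def write(self, offset, data):
        """Store data and update the parity incrementally."""
        if offset < 0 or offset + len(data) > self.top:
            raise IndexError("write outside the allocated part of the haven")
        for i, new in enumerate(data):
            self.parity ^= self.buf[offset + i] ^ new
            self.buf[offset + i] = new

    def verify(self):
        """Recompute the parity and compare it with the stored value."""
        acc = 0
        for b in self.buf:
            acc ^= b
        return acc == self.parity

    def read(self, offset, nbytes):
        """Return nbytes from the region, refusing if corruption is detected."""
        if not self.verify():
            raise RuntimeError("haven corrupted: parity mismatch")
        return bytes(self.buf[offset:offset + nbytes])

    def release(self):
        """Free every object in the region at once and reset its state."""
        self.buf = bytearray(len(self.buf))
        self.top = 0
        self.parity = 0


# Usage: allocate an object, then simulate a single bit flip in memory.
haven = Haven(64)
x = haven.alloc(8)
haven.write(x, b"solution")
intact = haven.read(x, 8)
haven.buf[x] ^= 0x04            # inject a fault behind the haven's back
detected = not haven.verify()
```

A single parity byte detects any single bit flip but can miss some multi-bit errors and cannot correct anything; a stronger scheme would trade extra memory and time for correction capability.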

Keywords

Program Object, Memory Management, Static Annotation, Performance Overhead, Type Annotation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, USA
