Skip to main content

Language Support for Reliable Memory Regions

  • Conference paper
  • First Online:
  • 883 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10136))

Abstract

The path to exascale computational capabilities in high-performance computing (HPC) systems is challenged by the inadequacy of present software technologies to adapt to the rapid evolution of architectures of supercomputing systems. The constraints of power have driven system designs to include increasingly heterogeneous architectures and diverse memory technologies and interfaces. Future systems are also expected to experience an increased rate of errors, such that the applications will no longer be able to assume correct behavior of the underlying machine. To enable the scientific community to succeed in scaling their applications, and to harness the capabilities of exascale systems, we need software strategies that enable explicit management of resilience to errors in the system, in addition to locality of reference in the complex memory hierarchies of future HPC systems.

In prior work, we introduced the concept of explicitly reliable memory regions, called havens. Memory management using havens supports reliability management through a region-based approach to memory allocations. Havens enable the creation of robust memory regions, whose resilient behavior is guaranteed by software-based protection schemes. In this paper, we propose language support for havens through type annotations that make the structure of a program’s havens more explicit and convenient for HPC programmers to use. We describe how the extended haven-based memory management model is implemented, and demonstrate the use of the language-based annotations to affect the resiliency of a conjugate gradient solver application.

This work was sponsored by the U.S. Department of Energy’s Office of Advanced Scientific Computing Research. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Shalf, J., Dosanjh, S., Morrison, J.: Exascale computing technology challenges. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds.) VECPAR 2010. LNCS, vol. 6449, pp. 1–25. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19328-6_1

    Chapter  Google Scholar 

  2. Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dallya, W., Denneau, M., Franzon, P., Harrod, W., Hill, K., Hiller, J., Karp, S., Keckler, S., Klein, D., Lucas, R., Richards, M., Scarpelli, A., Scott, S., Snavely, A., Sterling, T., Williams, R.S., Yelick, K.: Exascale computing study: technology challenges in achieving exascale systems. Technical report, DARPA, September 2008

    Google Scholar 

  3. DeBardeleben, N., Laros, J., Daly, J., Scott, S., Engelmann, C., Harrod, B.: High-end computing resilience: analysis of issues facing the HEC community and path-forward for research and development. Whitepaper, December 2009

    Google Scholar 

  4. Amarasinghe, S., Hall, M., Lethin, R., Pingali, K., Quinlan, D., Sarkar, V., Shalf, J., Lucas, R., Yelick, K., Balaji, P., Diniz, P.C., Koniges, A., Snir, M., Sachs, S.R., Yelick, K.: Exascale programming challenges: report of the 2011 workshop on exascale programming challenges. Technical report, U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), July 2011

    Google Scholar 

  5. Hukerikar, S., Engelmann, C.: Havens: explicit reliable memory regions for HPC applications. In: IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6, September 2016

    Google Scholar 

  6. Tofte, M., Talpin, J.P.: Implementation of the typed call-by-value \({\lambda }\)-calculus using a stack of regions. In: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1994, pp. 188–201. ACM, New York (1994)

    Google Scholar 

  7. Ross, D.T.: The AED free storage package. Commun. ACM 10(8), 481–492 (1967)

    Article  MATH  Google Scholar 

  8. Vo, K.P.: Vmalloc: a general and efficient memory allocator. Softw. Pract. Exp. 26(3), 357–374 (1996)

    Article  Google Scholar 

  9. Hanson, D.R.: Fast allocation and deallocation of memory based on object lifetimes. Softw. Pract. Exp. 20(1), 5–12 (1990)

    Article  Google Scholar 

  10. Barrett, D.A., Zorn, B.G.: Using lifetime predictors to improve memory allocation performance. In: Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, PLDI 1993, New York, NY, USA, pp. 187–196 (1993)

    Google Scholar 

  11. Gay, D., Aiken, A.: Memory management with explicit regions. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI 1998, pp. 313–323. ACM, New York (1998)

    Google Scholar 

  12. Aiken, A., Fähndrich, M., Levien, R.: Better static memory management: improving region-based analysis of higher-order languages. In: Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, PLDI 1995, pp. 174–18 (1995)

    Google Scholar 

  13. Tofte, M., Birkedal, L., Elsman, M., Hallenberg, N., Olesen, T.H., Sestoft, P., Bertelsen, P.: Programming with regions in the ML kit. Technical report (diku-tr-97/12), University of Copenhagen, Denmark, April 1997

    Google Scholar 

  14. Makholm, H.: A region-based memory manager for prolog. In: Proceedings of the 2nd International Symposium on Memory Management, ISMM 2000, pp. 25–34. ACM, New York (2000)

    Google Scholar 

  15. Grossman, D., Morrisett, G., Jim, T., Hicks, M., Wang, Y., Cheney, J.: Region-based memory management in Cyclone. In: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI 2002, pp. 282–293. ACM, New York (2002)

    Google Scholar 

  16. Rust: the rust programming language. http://www.rust-lang.org

  17. Chung, J., Lee, I., Sullivan, M., Ryoo, J.H., Kim, D.W., Yoon, D.H., Kaplan, L., Erez, M.: Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 58:1–58:11 (2012)

    Google Scholar 

  18. Hukerikar, S., Lucas, R.F.: Rolex: resilience-oriented language extensions for extreme-scale systems. J. Supercomput. 72, 1–33 (2016)

    Article  Google Scholar 

  19. Chien, A., Balaji, P., Beckman, P., Dun, N., Fang, A., Fujita, H., Iskra, K., Rubenstein, Z., Zheng, Z., Schreiber, R., Hammond, J., Dinan, J., Laguna, I., Richards, D., Dubey, A., van Straalen, B., Hoemmen, M., Heroux, M., Teranishi, K., Siegel, A.: Versioned distributed arrays for resilience in scientific applications: global view resilience. Procedia Comput. Sci. 51, 29–38 (2015)

    Article  Google Scholar 

  20. Bridges, P.G., Hoemmen, M., Ferreira, K.B., Heroux, M.A., Soltero, P., Brightwell, R.: Cooperative application/OS DRAM fault recovery. In: Alexander, M., et al. (eds.) Euro-Par 2011, Part II. LNCS, vol. 7156, pp. 241–250. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29740-3_28

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Saurabh Hukerikar .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hukerikar, S., Engelmann, C. (2017). Language Support for Reliable Memory Regions. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science(), vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-52709-3_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52708-6

  • Online ISBN: 978-3-319-52709-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics