Compiler Support for Array Distribution on NUMA Shared Memory Multiprocessors

The Journal of Supercomputing

Abstract

Management of program data to improve data locality and reduce false sharing is critical for scaling performance on NUMA shared memory multiprocessors. We use HPF-like data decomposition directives to partition and place arrays in data-parallel applications on Hector, a shared-memory NUMA multiprocessor. We describe a compiler system for automating the partitioning and placement of arrays. The compiler exploits Hector's shared memory architecture to efficiently implement distributed arrays. Experimental results from a prototype implementation demonstrate the effectiveness of these techniques. They also demonstrate the magnitude of the performance improvement attainable when our compiler-based data management schemes are used instead of operating system data management policies; performance improves by up to a factor of 5.
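To make the idea of HPF-like data decomposition concrete, the following is a minimal sketch of how one-dimensional BLOCK and CYCLIC distributions map array elements to processors. It follows the standard HPF definitions only; it is not the compiler described in the article. The function names (`block_owner`, `cyclic_owner`, `partition`) and the example sizes are hypothetical, chosen for illustration.

```python
def block_owner(i, n, p):
    """Owner of element i (0-based) of an n-element array under BLOCK.

    Assumes the usual HPF convention: contiguous blocks of ceil(n / p) elements.
    """
    block = -(-n // p)  # integer ceil(n / p)
    return i // block


def cyclic_owner(i, p):
    """Owner of element i (0-based) under CYCLIC: elements are dealt round-robin."""
    return i % p


def partition(n, p, owner):
    """Group the indices 0..n-1 by the processor that owner(i) assigns them to."""
    parts = [[] for _ in range(p)]
    for i in range(n):
        parts[owner(i)].append(i)
    return parts


# Hypothetical example: a 10-element array distributed over 4 processors.
block_parts = partition(10, 4, lambda i: block_owner(i, 10, 4))
cyclic_parts = partition(10, 4, lambda i: cyclic_owner(i, 4))

print(block_parts)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(cyclic_parts)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

On a NUMA machine, a mapping of this kind would let each partition be placed in the memory local to the processor that owns it, which is the sort of placement the abstract refers to.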




About this article

Cite this article

Abdelrahman, T.S., Wong, T.N. Compiler Support for Array Distribution on NUMA Shared Memory Multiprocessors. The Journal of Supercomputing 12, 349–371 (1998). https://doi.org/10.1023/A:1008035807599

  • DOI: https://doi.org/10.1023/A:1008035807599
