A programming interface for NUMA shared-memory clusters

  • Marcus Dormanns
  • Walter Sprangers
  • Hubert Ertl
  • Thomas Bemmerl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1225)


We describe a programming interface for parallel computing on NUMA (Non-Uniform Memory Access) shared-memory machines. Although interest in this architecture is growing rapidly and more and more hardware manufacturers offer products of this type, parallelization support is still lacking. We developed SMI, the Shared Memory Interface, and implemented it as a library on an SCI-coupled cluster of workstations. It aims to provide sophisticated support that accounts for the NUMA performance characteristics and allows a step-by-step parallelization. We show its application to the parallelization of a sparse matrix computation.
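The excerpt does not reproduce the SMI API itself, but the programming style it describes, a library-based, region-oriented shared-memory model on an SCI-coupled cluster, can be illustrated with a small sketch. The smi_* names below are hypothetical placeholders, not the actual SMI interface, and the stub bodies emulate a single-process run so the example compiles and runs stand-alone; on a real NUMA cluster the shared region would be physically distributed across the nodes' memories.

/*
 * Hypothetical sketch of region-based shared-memory parallelization in
 * the style the abstract ascribes to SMI.  The smi_* names are
 * illustrative placeholders, NOT the real SMI API; the stubs emulate a
 * single-process run with malloc so the code builds stand-alone.
 */
#include <stdio.h>
#include <stdlib.h>

static int g_rank = 0, g_size = 1;            /* stub: one process        */

static void  smi_init(int *argc, char ***argv) { (void)argc; (void)argv; }
static int   smi_rank(void)                    { return g_rank; }
static int   smi_size(void)                    { return g_size; }
static void *smi_alloc_shared(size_t bytes)    { return malloc(bytes); }   /* stand-in for a cluster-wide region */
static void  smi_barrier(void)                 { /* no-op in the stub */ }
static void  smi_finalize(void)                { }

int main(int argc, char **argv)
{
    smi_init(&argc, &argv);

    const int n = 1 << 20;
    /* One logically shared vector; on a NUMA cluster it would span the
     * nodes' local memories.                                            */
    double *x = smi_alloc_shared(n * sizeof(double));

    /* Block partitioning: each process initializes and works on "its"
     * slice, so most accesses stay node-local, which is what the NUMA
     * cost model rewards.                                               */
    int rank = smi_rank(), size = smi_size();
    int chunk = (n + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < n) ? lo + chunk : n;

    double local_sum = 0.0;
    for (int i = lo; i < hi; i++) {
        x[i] = 1.0 / (double)(i + 1);
        local_sum += x[i];
    }

    smi_barrier();   /* make all contributions globally visible */

    if (rank == 0)
        printf("rank 0 partial sum: %f\n", local_sum);

    free(x);
    smi_finalize();
    return 0;
}

The block partitioning mirrors the step-by-step parallelization the abstract mentions: a sequential loop is first split by process rank, and only the data placement and synchronization points need NUMA-specific attention afterwards.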


Keywords: parallel programming interface, shared memory, parallelization, NUMA, multiprocessor



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Marcus Dormanns (1)
  • Walter Sprangers (1)
  • Hubert Ertl (1)
  • Thomas Bemmerl (1)

  1. Lehrstuhl für Betriebssysteme, RWTH Aachen, Aachen, Germany