Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport

Antony, Joseph; Janes, Pete P.; Rendell, Alistair P.

doi:10.1007/11945918_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4297)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

Modern shared memory multiprocessor systems commonly have non-uniform memory access (NUMA) designs with asymmetric memory bandwidth and latency characteristics. Operating systems now provide application programming interfaces that allow the user to control thread and memory placement explicitly. To date, however, there have been relatively few detailed assessments of the importance of memory and thread placement for complex applications.

This paper outlines a framework for performing memory and thread placement experiments on Solaris and Linux. Thread binding and location-specific memory allocation, together with its verification, are discussed and contrasted.

Using the framework, the performance characteristics of serial versions of lmbench, Stream and various BLAS libraries (ATLAS, GOTO, ACML on Opteron/Linux and Sunperf on Opteron, UltraSPARC/Solaris) are measured on two different hardware platforms (UltraSPARC/FirePlane and Opteron/HyperTransport). A simple model describing performance as a function of memory distribution is proposed and assessed for both the Opteron and UltraSPARC.
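As a rough illustration of the thread binding and verification step described in the abstract, the following minimal Python sketch pins the calling thread to one CPU on Linux and then reads the affinity mask back to confirm the binding. It is not the authors' framework. It relies on the standard-library wrappers os.sched_setaffinity and os.sched_getaffinity, which are available on Linux only; the helper names pin_to_cpu and is_bound_to are hypothetical, and the sketch does not cover memory placement or the Solaris interfaces.

```python
import os


def pin_to_cpu(cpu):
    """Bind the calling thread to a single CPU (hypothetical helper).

    Returns the previous affinity set so the caller can restore it,
    or None on platforms without the Linux affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # assumption: non-Linux platforms are simply skipped
    previous = os.sched_getaffinity(0)  # 0 means "the calling thread"
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


def is_bound_to(cpu):
    """Verify the binding by reading the affinity mask back (hypothetical helper)."""
    return os.sched_getaffinity(0) == {cpu}
```

A caller would typically pick a CPU from the set returned by os.sched_getaffinity(0), run the measured kernel while bound, and then restore the previous set with os.sched_setaffinity(0, previous). Reading the mask back, rather than assuming the call succeeded, mirrors the verification step the abstract emphasises.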

Author information

Authors and Affiliations

Department of Computer Science, Australian National University, Canberra, Australia
Joseph Antony, Pete P. Janes & Alistair P. Rendell

Editor information

Editors and Affiliations

Yves Robert
Manish Parashar, Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854, USA
Ramamurthy Badrinath, Hewlett-Packard ISO, Sy 192, Whitefield Road, Mahadevapura Post, Bangalore 560048, India
Viktor K. Prasanna, Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA

About this paper

Cite this paper

Antony, J., Janes, P.P., Rendell, A.P. (2006). Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport. In: Robert, Y., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2006. HiPC 2006. Lecture Notes in Computer Science, vol 4297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945918_35

DOI: https://doi.org/10.1007/11945918_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68039-0
Online ISBN: 978-3-540-68040-6
eBook Packages: Computer Science, Computer Science (R0)
