Abstract
Highly scalable parallel computers, e.g. SCI-coupled workstation clusters, are NUMA architectures. Good static locality is therefore essential for the performance and scalability of parallel programs on these machines. This paper describes novel techniques that optimize static locality at compile time by applying data transformations and data distributions. The metric that guides the optimizations employs Ehrhart polynomials and makes it possible to calculate the amount of static locality precisely. The effectiveness of these techniques has been confirmed by experiments conducted on the SCI-coupled workstation cluster of the PC² at the University of Paderborn.
This work has been supported in part by the DFG Sonderforschungsbereich 376 “Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”, Paderborn.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Heine, F., Slowik, A. (2000). Volume Driven Data Distribution for NUMA-Machines. In: Bode, A., Ludwig, T., Karl, W., Wismüller, R. (eds) Euro-Par 2000 Parallel Processing. Euro-Par 2000. Lecture Notes in Computer Science, vol 1900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44520-X_53
DOI: https://doi.org/10.1007/3-540-44520-X_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67956-1
Online ISBN: 978-3-540-44520-3
eBook Packages: Springer Book Archive