The Challenges of Efficient Code-Generation for Massively Parallel Architectures

McGuiness, Jason M; Egan, Colin; Christianson, Bruce; Gao, Guang

doi:10.1007/11859802_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4186)

Included in the following conference series:

Asia-Pacific Conference on Advances in Computer Systems Architecture

Abstract

The memory wall [15] may be overcome by increasing the bandwidth and reducing the latency of the processor-to-memory connection, for example by implementing cellular architectures such as the IBM Cyclops. Such massively parallel architectures have sophisticated memory models. In this paper we use DIMES (the Delaware Iterative Multiprocessor Emulation System), developed by CAPSL at the University of Delaware, as a hardware evaluation tool for cellular architectures. We contend that it remains an open question which approach to parallelism is ideal from the programmer's perspective: at the language level, for example UPC or HPF; through trace scheduling; or at the library level, for example OpenMP or POSIX threads. To investigate this, we use a threaded Mandelbrot-set generator with a work-stealing algorithm to evaluate the DIMES cthread programming model for writing a simple multi-threaded program.
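
The work-stealing scheme mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python with the standard `threading` module and is not the DIMES cthread API or the authors' implementation; the function names, the image region, the iteration cap and the choice of one row per task are all assumptions made for illustration. Each worker takes rows from the tail of its own queue and, once that queue is empty, steals from the head of another worker's queue.

```python
import threading
from collections import deque

MAX_ITER = 50  # assumed iteration cap, chosen only for illustration


def escape_time(c, max_iter=MAX_ITER):
    """Iterations until |z| > 2 under z -> z*z + c, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_row(y, width, height):
    """Escape times for one row of the region [-2, 1] x [-1, 1].

    Assumes width >= 2 and height >= 2.
    """
    im = -1.0 + 2.0 * y / (height - 1)
    return [escape_time(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)]


def mandelbrot_work_stealing(width, height, n_workers=4):
    """Render the image with n_workers threads; return (image, rows_done)."""
    # One queue of row indices per worker, dealt round-robin.
    queues = [deque(range(w, height, n_workers)) for w in range(n_workers)]
    locks = [threading.Lock() for _ in range(n_workers)]
    image = [None] * height
    rows_done = [0] * n_workers

    def take(me):
        # Prefer local work: pop from the tail of our own queue.
        with locks[me]:
            if queues[me]:
                return queues[me].pop()
        # Otherwise steal from the head of another worker's queue.
        for offset in range(1, n_workers):
            victim = (me + offset) % n_workers
            with locks[victim]:
                if queues[victim]:
                    return queues[victim].popleft()
        return None  # every queue is empty; no task creates new work

    def worker(me):
        while True:
            y = take(me)
            if y is None:
                return
            image[y] = render_row(y, width, height)
            rows_done[me] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image, rows_done


if __name__ == "__main__":
    img, done = mandelbrot_work_stealing(40, 20)
    print("rows rendered per worker:", done)
```

Because rows near the set cost more iterations than rows far from it, a static split tends to leave some workers idle; stealing lets them pick up the remaining rows. The rendered image does not depend on which worker computed which row, so it should match a sequential rendering.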

References

[1] Almási, G., Cascaval, C., Castaños, J.G., Denneau, M., Lieber, D., Moreira, J.E., Warren, H.S.: Dissecting Cyclops: Detailed Analysis of a Multithreaded Architecture. ACM SIGARCH Computer Architecture News 31 (March 2003)
[2] Cascaval, C., Castaños, J.G., Ceze, L., Denneau, M., Gupta, M., Lieber, D., Moreira, J.E., Strauss, K., Warren, H.S.: Evaluation of a Multithreaded Architecture for Cellular Computing. In: 8th International Symposium on High-Performance Computer Architecture (HPCA) (2002)
[3] Cavalheiro, G.G.H., Doreille, M., Galilée, F., Gautier, T., Roch, J.-L.: Scheduling Parallel Programs on Non-Uniform Memory Architectures. In: HPCA Conference – Workshop on Parallel Computing for Irregular Applications WPCIA1, Orlando, USA (January 1999)
[4] del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture. In: Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), Madison, Wisconsin, June 4 (2005)
[5] del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture. In: Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing Symposium, Denver, Colorado, April 3-8 (2005)
[6] Duller, A., Towner, D., Panesar, G., Gray, A., Robbins, W.: picoArray technology: the tool’s story. In: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. IEEE, Los Alamitos (2005)
[7] Gao, G.R., Sarkar, V.: Location Consistency - a New Memory Model and Cache Consistency Protocol. IEEE Transactions on Computers 49(8) (August 2000)
[8] Gao, G.R., Theobald, K.B., Govindarajan, R., Leung, C., Hu, Z., Wu, H., Lu, J., del Cuvillo, J., Jacquet, A., Janot, V., Sterling, T.L.: Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress. In: International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, April 22-26 (2003)
[9] El-Ghazawi, T.A., Carlson, W.W., Draper, J.M.: UPC Language Specifications V1.1.1 (October 2003)
[10] Kakulavarapu, P., Morrone, C.J., Theobald, K., Amaral, J.N., Gao, G.R.: A Comparative Performance Study of Fine-Grain Multi-threading on Distributed Memory Machines. In: 19th IEEE International Performance, Computing and Communication Conference (IPCCC 2000), Phoenix, Arizona, USA, February 20-22 (2000)
[11] McGuiness, J.M.: A DIMES Demonstration Application: Mandelbrot-Set Generation Using a Work-Stealing Algorithm. CAPSL Technical Note 11, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware (June 2003), ftp://ftp.capsl.udel.edu/pub/doc/notes
[12] Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman & Co., New York (1982)
[13] Rodenas, D., Martorell, X., Ayguade, E., Labarta, J., Almasi, G., Cascaval, C., Castanos, J., Moreira, J.: Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. In: 19th IEEE International Parallel and Distributed Processing Symposium, vol. 1, p. 110 (2005)
[14] Sakane, H., Yakay, L., Karna, V., Leung, C., Gao, G.R.: DIMES: An Iterative Emulation Platform for Multiprocessor-System-on-Chip Designs. In: IEEE International Conference on Field-Programmable Technology, Tokyo, Japan, December 15-17 (2003)
[15] Wulf, W., McKee, S.: Hitting the memory wall: Implications of the obvious. Computer Architecture News 23(1), 20–24 (1995)
[16] Zhang, Y., Zhu, W., Chen, F., Hu, Z., Gao, G.R.: Sequential Consistency Revisited: The Sufficient Conditions and Method to Reason Consistency Model of a Multiprocessor-on-a-chip Architecture. In: The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2005), Innsbruck, Austria, February 15-17 (2005)

Author information

Authors and Affiliations

Department of Compiler Technology and Computer Architecture, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, U.K.
Jason M McGuiness, Colin Egan & Bruce Christianson
CAPSL, University of Delaware, Delaware, U.S.A.
Guang Gao

Editor information

Editors and Affiliations

Computer Systems Architecture Group, University of Amsterdam, The Netherlands
Chris Jesshope
School of Computer Science, University of Hertfordshire, College Lane, AL10 9AB, Hatfield, UK
Colin Egan

About this paper

Cite this paper

McGuiness, J.M., Egan, C., Christianson, B., Gao, G. (2006). The Challenges of Efficient Code-Generation for Massively Parallel Architectures. In: Jesshope, C., Egan, C. (eds) Advances in Computer Systems Architecture. ACSAC 2006. Lecture Notes in Computer Science, vol 4186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11859802_38

DOI: https://doi.org/10.1007/11859802_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40056-1
Online ISBN: 978-3-540-40058-5
eBook Packages: Computer Science, Computer Science (R0)
