The Challenges of Efficient Code-Generation for Massively Parallel Architectures

  • Conference paper
Advances in Computer Systems Architecture (ACSAC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4186)

Abstract

Overcoming the memory wall [15] may be achieved by increasing the bandwidth and reducing the latency of the processor-to-memory connection, for example by implementing cellular architectures such as the IBM Cyclops. Such massively parallel architectures have sophisticated memory models. In this paper we use DIMES (the Delaware Iterative Multiprocessor Emulation System), developed by CAPSL at the University of Delaware, as a hardware evaluation tool for cellular architectures. We contend that the ideal approach to parallelism from the programmer's perspective remains an open question: parallelism could be expressed at the language level, as in UPC or HPF; through trace scheduling; or at the library level, as in OpenMP or POSIX threads. To investigate this, we use a threaded Mandelbrot-set generator with a work-stealing algorithm to evaluate the DIMES cthread programming model for writing a simple multi-threaded program.
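
As an illustration of the kind of program the abstract describes, the following is a minimal sketch of a Mandelbrot-set generator that distributes image rows across threads with work stealing. It is not the paper's implementation and does not use the DIMES cthread API: it assumes Python's standard `threading` module, and every name in it (`WorkQueue`, `render_row`, `mandelbrot`, the image size and iteration cap) is hypothetical. Because of Python's global interpreter lock, the sketch shows the scheduling logic only and should not be expected to demonstrate a parallel speed-up.

```python
import threading
from collections import deque

# Illustrative sizes chosen for this sketch; they are not taken from the paper.
WIDTH, HEIGHT, MAX_ITER = 64, 32, 50


def escape_time(c, max_iter=MAX_ITER):
    """Number of iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_row(y):
    """Compute one image row over the region [-2, 1] x [-1, 1]."""
    im = -1.0 + 2.0 * y / (HEIGHT - 1)
    return [escape_time(complex(-2.0 + 3.0 * x / (WIDTH - 1), im))
            for x in range(WIDTH)]


class WorkQueue:
    """Lock-protected deque: the owner pops from the tail, thieves take from the head."""

    def __init__(self, rows):
        self._rows = deque(rows)
        self._lock = threading.Lock()

    def pop_local(self):
        with self._lock:
            return self._rows.pop() if self._rows else None

    def steal(self):
        with self._lock:
            return self._rows.popleft() if self._rows else None


def worker(me, queues, image, stolen):
    while True:
        row = queues[me].pop_local()
        if row is None:
            # Own queue is empty: try to steal a row from each other worker in turn.
            for victim in range(len(queues)):
                if victim != me:
                    row = queues[victim].steal()
                    if row is not None:
                        stolen[me] += 1
                        break
        if row is None:
            return  # every queue was empty and no new rows are ever created
        image[row] = render_row(row)


def mandelbrot(num_workers=4):
    # All rows start on worker 0, so the other workers can only get work by stealing.
    queues = [WorkQueue(range(HEIGHT))]
    queues += [WorkQueue([]) for _ in range(num_workers - 1)]
    image = [None] * HEIGHT
    stolen = [0] * num_workers
    threads = [threading.Thread(target=worker, args=(i, queues, image, stolen))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image, stolen
```

Calling `mandelbrot(4)` returns the escape-time image together with a per-worker count of stolen rows. The counts vary from run to run, since they depend on how the threads happen to be scheduled, whereas the image itself does not.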

References

  1. Almási, G., Cascaval, C., Castaños, J.G., Denneau, M., Lieber, D., Moreira, J.E., Warren, H.S.: Dissecting Cyclops: Detailed Analysis of a Multithreaded Architecture. ACM SIGARCH Computer Architecture News 31 (March 2003)

  2. Cascaval, C., Castaños, J.G., Ceze, L., Denneau, M., Gupta, M., Lieber, D., Moreira, J.E., Strauss, K., Warren, H.S.: Evaluation of a Multithreaded Architecture for Cellular Computing. In: 8th International Symposium on High-Performance Computer Architecture (HPCA) (2002)

  3. Cavalheiro, G.G.H., Doreille, M., Galilée, F., Gautier, T., Roch, J.-L.: Scheduling Parallel Programs on Non-Uniform Memory Architectures. In: HPCA Conference – Workshop on Parallel Computing for Irregular Applications WPCIA1, Orlando, USA (January 1999)

  4. del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture. In: Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), Madison, Wisconsin, June 4 (2005)

  5. del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture. In: Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing Symposium, Denver, Colorado, April 3 - 8 (2005)

  6. Duller, A., Towner, D., Panesar, G., Gray, A., Robbins, W.: picoArray technology: the tool’s story. In: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. IEEE, Los Alamitos (2005)

  7. Gao, G.R., Sarkar, V.: Location Consistency - a New Memory Model and Cache Consistency Protocol. IEEE Transactions on Computers 49(8) (August 2000)

  8. Gao, G.R., Theobald, K.B., Govindarajan, R., Leung, C., Hu, Z., Wu, H., Lu, J., del Cuvillo, J., Jacquet, A., Janot, V., Sterling, T.L.: Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress. In: International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, April 22 - 26 (2003)

  9. El-Ghazawi, T.A., Carlson, W.W., Draper, J.M.: UPC Language Specifications V1.1.1 (October 2003)

  10. Kakulavarapu, P., Morrone, C.J., Theobald, K., Amaral, J.N., Gao, G.R.: A Comparative Performance Study of Fine-Grain Multi-threading on Distributed Memory Machines. In: 19th IEEE International Performance, Computing and Communication Conference-IPCCC 2000, Phoenix, Arizona, USA, February 20-22 (2000)

  11. McGuiness, J.M.: A DIMES Demonstration Application: Mandelbrot-Set Generation Using a Work-Stealing Algorithm. CAPSL Technical Note 11, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware (June 2003), ftp://ftp.capsl.udel.edu/pub/doc/notes

  12. Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H.Freeman & Co., New York (1982)

  13. Rodenas, D., Martorell, X., Ayguade, E., Labarta, J., Almasi, G., Cascaval, C., Castanos, J., Moreira, J.: Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. In: 19th IEEE International Parallel and Distributed Processing Symposium, vol. 1, p. 110 (2005)

  14. Sakane, H., Yakay, L., Karna, V., Leung, C., Gao, G.R.: DIMES: An Iterative Emulation Platform for Multiprocessor-System-on-Chip Designs. In: IEEE International Conference on Field-Programmable Technology, Tokyo, Japan, December 15-17 (2003)

  15. Wulf, W., McKee, S.: Hitting the memory wall: Implications of the obvious. Computer Architecture News 23(1), 20–24 (1995)

  16. Zhang, Y., Zhu, W., Chen, F., Hu, Z., Gao, G.R.: Sequential Consistency Revisited: The Sufficient Conditions and Method to Reason Consistency Model of a Multiprocessor-on-a-chip Architecture. In: The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN2005), Innsbruck, Austria, February 15 - 17 (2005)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McGuiness, J.M., Egan, C., Christianson, B., Gao, G. (2006). The Challenges of Efficient Code-Generation for Massively Parallel Architectures. In: Jesshope, C., Egan, C. (eds) Advances in Computer Systems Architecture. ACSAC 2006. Lecture Notes in Computer Science, vol 4186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11859802_38

  • DOI: https://doi.org/10.1007/11859802_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40056-1

  • Online ISBN: 978-3-540-40058-5

  • eBook Packages: Computer Science, Computer Science (R0)
