Design and Analysis of Algorithms for Shared-Memory Multiprocessors

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1663)


Shared-memory multiprocessors feature parallelism and a steep cache hierarchy, two salient characteristics that distinguish them from the commodity processors of ten years ago. Both of these characteristics can be exploited effectively using the same general strategy: divide-and-conquer recursion. This talk overviews the Cilk multithreaded programming language being developed in the MIT Laboratory for Computer Science, which allows a programmer to exploit parallelism through divide-and-conquer. In addition, I show how divide-and-conquer allows caches to be used effectively.

In the first part of my talk, I introduce the Cilk programming language. Cilk minimally extends the C programming language to allow interactions among computational threads to be specified in a simple and highlevel fashion. Cilk’s provably efficient runtime system dynamically maps a user’s program onto available physical resources, freeing the programmer from concerns of communication protocols and load balancing. In addition, Cilk provides an abstract performance model that a programmer can use to predict the multiprocessor performance of his application from its execution on a single processor. Not only do Cilk programs scale up to run efficiently on multiple processors, but unlike existing parallel-programming environments, such as MPI and HPF, Cilk programs “scale down”: the efficiency of a Cilk program on one processor rivals that of a comparable C program. I illustrate Cilk programming through the example of divide-and-conquer matrix multiplication.

The second part of my talk presents a strategy for designing algorithms to exploit multiple levels of caching effectively. Unlike previous algorithms, these algorithms are “cache oblivious”: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, I show that cache-oblivious algorithms can be developed that use an optimal amount of work and move data optimally among multiple levels of cache. Problems that can be solved efficiently by cache-oblivious algorithms include matrix multiplication, FFT, sorting, and matrix transpose, all of which are solved in a divide-and-conquer fashion.

Together, these two technologies provide a foundation for the design and analysis of efficient algorithms for shared-memory multiprocessors.


Programming Language Matrix Multiplication Load Balance Physical Resource Cache Size 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  1. 1.MIT Laboratory for Computer ScienceUSA

Personalised recommendations