Lessons learned from implementing BSP
We focus on two criticisms of Bulk Synchronous Parallelism (BSP): that delaying communication until specific points in a program causes poor performance, and that frequent barrier synchronisations are too expensive for high-performance parallel computing. We show that these criticisms are misguided, not just about BSP but about parallel programming in general, because they are based on misconceptions about the origins of poor performance. The main implication for parallel programming is that higher levels of abstraction do not only make software construction easier—they also make high-performance implementation easier.
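To make the superstep structure under discussion concrete, here is a minimal sketch of the BSP pattern in Python, with threads standing in for BSP processes; the names (`worker`, `outgoing`, `inbox`) and the example computation are illustrative, not taken from the paper:

```python
import threading

P = 4  # number of BSP processes (modelled as threads)
barrier = threading.Barrier(P)

# outgoing[i][j]: messages buffered on process i, addressed to process j
outgoing = [[[] for _ in range(P)] for _ in range(P)]
inbox = [[] for _ in range(P)]
results = [0] * P

def worker(pid):
    # Superstep 1: purely local computation; communication is only
    # buffered, not delivered, until the superstep ends.
    value = pid * pid
    for dest in range(P):
        if dest != pid:
            outgoing[pid][dest].append(value)

    barrier.wait()  # barrier = end of superstep: all messages are now deliverable

    # Superstep 2: each process reads the messages addressed to it
    # and continues computing with the received data.
    for src in range(P):
        inbox[pid].extend(outgoing[src][pid])
    results[pid] = sum(inbox[pid])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the sketch is the discipline the abstract refers to: communication issued during a superstep takes effect only at the barrier, so the runtime is free to batch and reorder message delivery between supersteps.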
Keywords: Shared Memory, Delivery Time, Runtime System, Cache Coherence, Barrier Synchronisation