Communication performance optimisation requires minimising variance
The cost of communication in message-passing systems can be computed only from a large number of low-level details. Consequently, the only architectural measure that such systems naturally suggest is a first-order one: latency. We show that a second-order property, the standard deviation of delivery times, is also of interest. Most importantly, the average performance of a large communication system depends not only on the average performance of its components but also on the standard deviation of their performance. In other words, building a high-performance system requires components that are not only fast on average but also have small variance. We illustrate this effect using distributions of the BSP g parameter. Lower bounds on the communication performance of large systems can be derived from data measured over single links.
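The dependence of system-level averages on component-level variance can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it assumes that a communication phase finishes only when the slowest of several independent links has delivered, and that per-link delivery times are normally distributed (clipped at zero). The function name `mean_completion_time` and all parameter values are hypothetical and chosen for illustration only.

```python
import random
import statistics


def mean_completion_time(num_links, mean, stdev, trials=2000, seed=0):
    """Estimate the mean time of a communication phase that ends only
    when the slowest of `num_links` independent links has delivered.

    Assumption (not from the paper): link delivery times are i.i.d.
    normal with the given mean and standard deviation, clipped at 0.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    completion_times = []
    for _ in range(trials):
        link_times = [max(0.0, rng.gauss(mean, stdev)) for _ in range(num_links)]
        # The phase is as slow as its slowest link.
        completion_times.append(max(link_times))
    return statistics.fmean(completion_times)


if __name__ == "__main__":
    # Every link has the same mean delivery time (1.0); only the
    # standard deviation and the number of links vary.
    for stdev in (0.0, 0.1, 0.3):
        for num_links in (1, 16, 256):
            t = mean_completion_time(num_links, mean=1.0, stdev=stdev)
            print(f"stdev={stdev:.1f} links={num_links:4d} mean completion={t:.3f}")
```

Under these assumptions, the mean completion time stays at the per-link mean when the standard deviation is zero, and rises with both the standard deviation and the number of links even though the per-link mean never changes.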