Automatic reduction tree generation for fine-grain parallel architectures when iteration count is unknown
Over the last few years, the research trend in future-generation high-performance computing systems has been moving toward multi-threaded parallel architectures. The importance of exploiting and controlling parallelism has therefore grown: parallel activities must be both synchronized and reduced. In fine-grain parallel computation, designing efficient micro-synchronization, at the same level of granularity as the grain size, is essential for implementation. This article discusses methods of synchronizing parallel activities, focusing on the case in which the number of activities to be gathered is determined at run time. A new reduction graph that loses no parallelism is proposed; it is especially useful when the number of parallel activities is determined dynamically. The method was developed primarily for instruction-level dataflow computers, and its full potential should be realized when trends in parallel processing return to finer grain sizes.
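To illustrate the general idea of a reduction tree over a run-time-determined number of activities (this is only a sketch of the standard pairwise technique, not the paper's specific reduction graph), the partial results can be combined level by level, halving the count at each level. On a parallel machine every combine within a level could proceed concurrently, giving depth ceil(log2 N) instead of the N-1 steps of a sequential fold:

```python
# Illustrative sketch (hypothetical helper, not the paper's algorithm):
# a balanced pairwise reduction tree over N values, where N is known
# only at run time.
from operator import add

def tree_reduce(values, combine=add):
    """Reduce `values` pairwise, level by level, as a balanced tree.

    Each level halves the number of partial results, so the tree has
    logarithmic depth; on a fine-grain parallel machine all combines
    within one level could execute concurrently.
    """
    level = list(values)
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # odd element carries up to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

# N determined dynamically, e.g. from input data:
print(tree_reduce(range(1, 101)))
```

Note that the tree's shape here is fixed only once the iteration count is known; the difficulty the article addresses is generating such a reduction structure when that count cannot be determined at compile time.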