Abstract
Code generation belongs to the back end of a parallelizing compiler and produces efficient computation and communication code for the target parallel computing system. Traditional approaches handle array redistribution mainly by generating communication code in which each processor sends all data defined in its local memory to all processors, but this introduces a large amount of communication redundancy, which increases as the number of processors grows. To address this problem, this paper presents an accurate code generation algorithm for array redistribution on distributed-memory architectures. The algorithm determines the source processor and the destination processor of each array element's migration during array redistribution from the transformation of data decompositions, and then generates accurate communication code. The experimental results show that the proposed algorithm effectively reduces communication redundancy as the number of processors grows and improves the parallel performance of applications.
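The idea of deriving each element's source and destination processor from the old and new data decompositions can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a one-dimensional array redistributed from a block decomposition to a cyclic one, and the function names (`block_owner`, `cyclic_owner`, `redistribution_plan`, `broadcast_volume`) are hypothetical. The baseline volume assumes every processor sends all of its local elements to each of the other processors.

```python
from collections import defaultdict


def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division gives the block size
    return i // block


def cyclic_owner(i, n, p):
    """Owner of element i when elements are dealt round-robin to p processors."""
    return i % p


def redistribution_plan(n, p, src_owner, dst_owner):
    """Map (source, destination) processor pairs to the element indices
    that must actually move. Elements whose owner does not change are skipped."""
    plan = defaultdict(list)
    for i in range(n):
        s = src_owner(i, n, p)
        d = dst_owner(i, n, p)
        if s != d:
            plan[(s, d)].append(i)
    return dict(plan)


def broadcast_volume(n, p):
    """Elements transferred if every processor sends all of its local
    elements to each of the other p - 1 processors (assumed baseline)."""
    return n * (p - 1)


# Example: 8 elements on 4 processors, block -> cyclic.
plan = redistribution_plan(8, 4, block_owner, cyclic_owner)
exact_volume = sum(len(indices) for indices in plan.values())
naive_volume = broadcast_volume(8, 4)
```

In this toy case the exact plan moves only the elements whose owner changes, while the broadcast baseline grows with both the array size and the processor count. A real code generator would derive the same sets symbolically from the decompositions rather than by enumerating elements.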
Rights and permissions
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Zhao, B., Ding, R., Han, L. et al. Code generation for accurate array redistribution on automatic distributed-memory parallelization. Int J Netw Distrib Comput 2, 11–25 (2014). https://doi.org/10.2991/ijndc.2014.2.1.2
DOI: https://doi.org/10.2991/ijndc.2014.2.1.2