Abstract
Graph partitioners play an important role in many parallel work distribution and locality optimization approaches. Surprisingly, however, to our knowledge there is no freely available parallel graph partitioner designed for execution on a shared memory multicore system. This paper presents a shared memory parallel graph partitioner, ParCubed, for use in the context of sparse tiling run-time data and computation reordering. Sparse tiling is a run-time scheduling technique that schedules groups of iterations across loops together when they access the same data and one or more of the loops contains indirect array accesses. For sparse tiling, which is implemented with an inspector/executor strategy, the inspector needs to find an initial seed partitioning of adequate quality very quickly. We compare our presented hierarchical clustering partitioner, ParCubed, with GPart and METIS in terms of partitioning speed, partitioning quality, and the effect the generated seed partitions have on executor speed. We find that the presented partitioner is 25 to 100 times faster than METIS on a 16 core machine. The total edge cut of the partitioning generated by ParCubed was found not to exceed 1.27x that of the partitioning found by METIS.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krieger, C.D., Strout, M.M. (2013). A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_13
DOI: https://doi.org/10.1007/978-3-642-37658-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer Science (R0)