Impact of Variable Privatization on Extracting Synchronization-Free Slices for Multi-core Computers

  • Marek Palkowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7686)

Abstract

Variable privatization is an important technique that has been used by compilers to parallelize loops by eliminating storage-related dependences. In this paper, we present an approach that combines extracting synchronization-free slices available in program loops with variable privatization. This permits us to reduce the number of dependence relations and, as a consequence, the time complexity of algorithms aimed at extracting synchronization-free slices. This enlarges the scope of applicability of those algorithms and reduces the time required to parallelize loops. The scope of applicability of the approach is illustrated by means of the NAS Parallel Benchmark suite. Results of a performance analysis for parallelized loops executed on a multi-core computer are presented and compared with those obtained by other loop parallelization techniques. Future work is outlined.

Keywords

Iteration space slicing · Scalar and array variable privatization · Automatic loop parallelizer · NAS Parallel Benchmark

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marek Palkowski, Faculty of Computer Science, West Pomeranian University of Technology, Szczecin, Poland