
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation


This paper provides an overview and an evaluation of the Cetus source-to-source compiler infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler for research in automatic parallelization of C programs. In the meantime, Cetus has been used for many additional program transformation tasks. It serves as a compiler infrastructure for many projects in the US and internationally. Recently, Cetus has been supported by the National Science Foundation to build a community resource. The compiler has gone through several iterations of benchmark studies, each followed by the implementation of those techniques that could improve the parallel performance of the benchmark programs. These efforts have resulted in a system that compares favorably with state-of-the-art parallelizers, such as Intel’s ICC. A key limitation of advanced optimizing compilers is their lack of runtime information, such as the program input data. We discuss and evaluate several techniques that support dynamic optimization decisions. Finally, because there is an extensive body of proposed compiler analyses and transformations for parallelization, the question arises of how important each technique is. This paper evaluates the impact of the individual Cetus techniques on overall program performance.
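To make the terms in the abstract concrete, the following is a minimal sketch of what source-to-source parallelization of a C loop with a runtime decision can look like. It is an illustration only and is not taken from the paper: the function names `scale_serial` and `scale_parallel` and the `PAR_THRESHOLD` value are hypothetical, and the OpenMP `if` clause stands in for the general idea of deferring a decision until the input size is known.

```c
#include <stddef.h>

/* Hypothetical cutoff (an assumption, not a value from the paper):
 * loops with fewer iterations than this are run serially. */
#define PAR_THRESHOLD ((size_t)10000)

/* Input as a programmer might write it: each iteration touches a
 * different element, so there are no cross-iteration dependences. */
void scale_serial(double *a, const double *b, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* Output in the style a source-to-source parallelizer could emit:
 * the same C loop, annotated with an OpenMP directive. The if clause
 * postpones the "is it worth running in parallel?" decision to run
 * time, when n (which depends on the program input) is known. */
void scale_parallel(double *a, const double *b, double s, size_t n)
{
    long i;
    #pragma omp parallel for if(n > PAR_THRESHOLD)
    for (i = 0; i < (long)n; i++)
        a[i] = s * b[i];
}
```

Because the result is still ordinary C, it can be handed to any backend compiler. Built without OpenMP support, the directive is ignored and the loop runs serially with the same result.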




Author information


Corresponding author

Correspondence to Hansang Bae.


About this article

Cite this article

Bae, H., Mustafa, D., Lee, JW. et al. The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation. Int J Parallel Prog 41, 753–767 (2013).


Keywords

  • Automatic parallelization
  • Compiler infrastructure
  • Source-to-source translation
  • Performance