International Journal of Parallel Programming, Volume 36, Issue 6, pp 571–591

A Case Study on Compiler Optimizations for the Intel® Core™ 2 Duo Processor

  • Aart J. C. Bik
  • David L. Kreitzer
  • Xinmin Tian


The complexity of modern processors poses increasingly difficult challenges to software optimization. Modern optimizing compilers have become essential tools for leveraging the power of recent processors: they apply high-level optimizations to exploit multi-core platforms and single-instruction-multiple-data (SIMD) instructions, and advanced code generation to deal with microarchitectural performance aspects. Using the Intel® Core™ 2 Duo processor and the Intel Fortran/C++ compiler as a case study, this paper gives a detailed account of the sort of optimizations required to obtain high performance on modern processors.


Keywords: Code generation, Compilers, Optimization, Parallelization, Vectorization





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Aart J. C. Bik (1)
  • David L. Kreitzer (1)
  • Xinmin Tian (1)
  1. Intel Corporation, Santa Clara, USA
