Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations’ Perspective

  • Jack Liu
  • Youfeng Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3923)

Abstract

Intel Extended Memory 64 Technology (EM64T) and AMD 64-bit architecture (AMD64) are emerging 64-bit x86 architectures that are fully x86 compatible. Compared with the 32-bit x86 architecture, the 64-bit x86 architectures offer applications several new features. For instance, applications can address 64 bits of virtual memory space, perform operations on 64-bit-wide operands, access 16 general-purpose registers (GPRs) and 16 extended multi-media (XMM) registers, and use a register-based argument passing convention. In this paper, we investigate the performance impact of these new features from the standpoint of compiler optimizations. Our research compiler is based on the Intel Fortran/C++ production compiler, and our experiments are conducted on the SPEC2000 benchmark suite. Results show that with 64-bit-wide pointer and long data types, several SPEC2000 C benchmarks slow down by more than 20%, mainly because of the enlarged memory footprint. To evaluate the performance potential of 64-bit x86 architectures, we designed and implemented the LP32 code model, in which pointer and long are both 32 bits wide. Our experiments demonstrate that, on average, the LP32 code model speeds up the SPEC2000 C benchmarks by 13.4%. For the register-based argument passing convention, our experiments show that the performance gain is less than 1% because of aggressive function inlining. Finally, we observe that using 16 GPRs and 16 XMM registers significantly outperforms using only 8 GPRs and 8 XMM registers. However, our results also show that using 12 GPRs and 12 XMM registers achieves performance competitive with that of 16 GPRs and 16 XMM registers.
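
The footprint effect described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the struct names, field choices, and helper functions below are hypothetical, and fixed-width integer types only stand in for the pointer and long widths that a 64-bit model and the LP32 model would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node with 64-bit pointer and long (assumed layout). */
struct node_lp64 {
    uint64_t next;   /* stands in for a 64-bit pointer */
    int64_t  key;    /* stands in for a 64-bit long */
    int32_t  flags;  /* int is 32 bits in both models */
};

/* The same node with pointer and long narrowed to 32 bits, as in LP32. */
struct node_lp32 {
    uint32_t next;   /* stands in for a 32-bit pointer */
    int32_t  key;    /* stands in for a 32-bit long */
    int32_t  flags;
};

/* Three 4-byte fields need no padding, so the narrow node is 12 bytes. */
static_assert(sizeof(struct node_lp32) == 12, "unexpected LP32 node size");

/* Total bytes occupied by node_count nodes of the given size. */
size_t footprint_bytes(size_t node_size, size_t node_count) {
    return node_size * node_count;
}

/* Whole nodes that fit in one cache line of line_bytes bytes. */
size_t nodes_per_cache_line(size_t node_size, size_t line_bytes) {
    return line_bytes / node_size;
}
```

Calling `nodes_per_cache_line(sizeof(struct node_lp32), 64)` and the same for `struct node_lp64` shows how many fewer nodes fit in an assumed 64-byte cache line once the fields widen. The exact size of the wide node depends on the ABI's alignment rules, so the sketch does not fix it.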

Keywords

Normalize Execution Time, Register Allocation, Performance Characterization, SPEC2000 Benchmark, Compatibility Mode
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jack Liu (1)
  • Youfeng Wu (1)
  1. Intel Corporation, Santa Clara, USA
