Evaluation of Large L3 Caches Using TPC-H Trace Samples

  • Jaeheon Jeong
  • Ramendra Sahoo
  • Krishnan Sugavanam
  • Ashwini Nanda
  • Michel Dubois


In this chapter we evaluate the miss rates of four L3 cache architectures for small-scale multiprocessors. Eight processors are partitioned into 1, 2, 4, or 8 clusters of 8, 4, 2, or 1 processors, respectively. Each cluster has a large L3 cache, and the aggregate amount of L3 cache in each of the four architectures varies between 64 MB and 1 GB. The target of our evaluations is decision support systems. We use bus trace samples obtained during the execution of a 100 GB TPC-H benchmark on an 8-way multiprocessor. The 12 time samples were taken at one-hour intervals during the first day of execution of TPC-H. Each sample contains 64 M bus references.
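The design space described above can be enumerated with a short sketch. The abstract gives only the cluster counts and the 64 MB to 1 GB range; the power-of-two size steps, the even split of the aggregate cache across clusters, and all names below are assumptions made for illustration.

```python
NUM_PROCESSORS = 8
CLUSTER_COUNTS = (1, 2, 4, 8)                   # from the abstract
AGGREGATE_SIZES_MB = (64, 128, 256, 512, 1024)  # assumption: power-of-two steps


def configurations():
    """List every (cluster count, aggregate L3 size) pair in the assumed grid."""
    configs = []
    for clusters in CLUSTER_COUNTS:
        for aggregate_mb in AGGREGATE_SIZES_MB:
            configs.append({
                "clusters": clusters,
                "procs_per_cluster": NUM_PROCESSORS // clusters,
                "aggregate_mb": aggregate_mb,
                # assumption: the aggregate L3 capacity is split evenly
                "per_cluster_mb": aggregate_mb / clusters,
            })
    return configs


if __name__ == "__main__":
    for cfg in configurations():
        print(cfg)
```

Under these assumptions, the smallest per-cluster cache is 8 MB (eight clusters sharing 64 MB) and the largest is a single 1 GB cache shared by all eight processors.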

We first show the distribution of bus references across samples and across processors within the same sample. The major problem with time samples is the cold-start misses at the beginning of each sample. We show the cache warm-up rate for all cluster architectures and cache sizes. Unfortunately, systems with aggregate L3 cache sizes above 128 MB are never completely warm with our 64 M-reference samples. Thus we evaluate cache architectures under three conditions: cold cache at the beginning of each sample, warm sets only, and stitched trace. We classify misses to understand their cause. Observations are similar across all three simulation types. We also show that the 12 time samples exhibit similar behavior.
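The three evaluation conditions can be illustrated with a minimal single-cache sketch. The abstract does not define them, so this follows one common reading: "cold" flushes the cache before each sample, "warm sets only" counts only references to sets that were already full before the access, and "stitched" carries cache state over from one sample to the next. The class, the function, the LRU policy, and the 64-byte block size are hypothetical, and coherence between clusters is ignored.

```python
from collections import OrderedDict


class SetAssocCache:
    """Tiny LRU set-associative cache model (hypothetical, for illustration)."""

    def __init__(self, num_sets, ways, block_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def flush(self):
        for s in self.sets:
            s.clear()

    def access(self, addr):
        """Return (hit, set_was_warm) for one reference."""
        block = addr // self.block_bytes
        s = self.sets[block % self.num_sets]
        warm = len(s) == self.ways       # set was already full before this access
        hit = block in s
        if hit:
            s.move_to_end(block)         # mark as most recently used
        else:
            if warm:
                s.popitem(last=False)    # evict the least recently used block
            s[block] = True
        return hit, warm


def miss_rates(samples, cache, stitched=False):
    """Return (overall miss rate, miss rate over warm sets only)."""
    refs = misses = warm_refs = warm_misses = 0
    for sample in samples:
        if not stitched:
            cache.flush()                # cold cache at the start of each sample
        for addr in sample:
            hit, warm = cache.access(addr)
            refs += 1
            misses += not hit
            if warm:
                warm_refs += 1
                warm_misses += not hit
    overall = misses / refs if refs else 0.0
    warm_only = warm_misses / warm_refs if warm_refs else None
    return overall, warm_only
```

With `stitched=False` the first value is the cold-start miss rate and the second is the warm-sets-only estimate; with `stitched=True` the samples are treated as one continuous trace. The warm-sets-only value is `None` when no set ever fills, which mirrors the situation in which a large cache never warms up within a sample.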

One of the major observations from the twelve 64 M-reference trace samples is the large number of interprocessor and I/O coherence misses.


Keywords (machine-generated, not supplied by the authors): Address Space; Cache Size; Shared Cache; Trace Sample; Large Cache




Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Jaeheon Jeong (1)
  • Ramendra Sahoo (2)
  • Krishnan Sugavanam (2)
  • Ashwini Nanda (2)
  • Michel Dubois (3)

  1. IBM, Research Triangle Park, USA
  2. IBM Research, Yorktown Heights, USA
  3. Department of EE-Systems, University of Southern California, Los Angeles, USA