A quantitative approach for architecture-invariant parallel workload characterization

  • Abdullah I. Meajil
  • Tarek El-Ghazawi
  • Thomas Sterling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1184)


Experimental design of parallel computers calls for quantifiable methods to compare and evaluate the requirements of different workloads within an application domain. Such metrics can help establish the basis for scientific design of parallel computers driven by application needs, so that performance can be optimized relative to cost. In this work, a parallelism-based framework is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This method is architecture-invariant and can be used effectively for comparing workloads and assessing resource requirements. Our workload characterization is derived from two measures: the parallel-instruction centroid and the parallel workload similarity. The centroid is a workload approximation which captures the type and amount of parallel work generated by the workload on average. It is an efficient measure which aggregates average parallelism, instruction mix, and critical path length. When captured together with abstracted information about communication requirements, the result is a powerful tool for understanding the requirements of workloads and their potential performance on target parallel machines. The parallel workload similarity is based on the normalized Euclidean distance (ned) between workload centroids, which provides an efficient means of comparing workloads. This provides the basis for quantifiable analysis of workloads to make informed decisions on the composition of parallel benchmark suites. It is shown that this workload characterization method outperforms comparable ones in accuracy, as well as in time and space requirements. Analysis of the NAS Parallel Benchmark (NPB) workloads and their performance is presented to demonstrate some of the applications of, and insight provided by, this framework. The parallel-instruction workload model is used to study the similarities among the NPB workloads in a quantitative manner. The results confirm that the workloads in NPB represent a wide range of non-redundant benchmarks with different characteristics.
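The two measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything below is an assumption made for illustration: a workload is modeled as a list of parallel steps, each step is a vector of operation counts per instruction type, the centroid is the per-type mean over all steps, and the distance is normalized by the sum of the two vector norms so that it falls in [0, 1]. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(steps: Sequence[Sequence[float]]) -> List[float]:
    """Per-type mean of operation counts over all parallel steps.

    Assumption: each step is a vector with one count per instruction type.
    """
    if not steps:
        raise ValueError("workload has no parallel steps")
    n = len(steps)
    dims = len(steps[0])
    return [sum(step[d] for step in steps) / n for d in range(dims)]


def average_parallelism(c: Sequence[float]) -> float:
    """Sum of centroid components: mean operations issued per step."""
    return sum(c)


def normalized_euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance scaled into [0, 1].

    Assumed normalization (not taken from the paper): divide by
    ||a|| + ||b||, which bounds the result by the triangle inequality.
    """
    if len(a) != len(b):
        raise ValueError("centroids must have the same number of types")
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return 0.0 if scale == 0 else dist / scale


# Hypothetical workloads; columns are (integer, memory, floating-point) ops.
workload_a = [[2, 1, 0], [4, 1, 2]]
workload_b = [[0, 3, 1], [0, 1, 1]]

centroid_a = centroid(workload_a)
centroid_b = centroid(workload_b)
similarity_distance = normalized_euclidean_distance(centroid_a, centroid_b)
```

Under these assumptions, the number of steps plays the role of the critical path length, and a distance near 0 suggests that two workloads exercise a machine in similar ways, while a distance near 1 suggests they are dissimilar.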


Instruction-Level Parallelism; Parallel Computer Architecture; Parallel Workload Characterization; Performance Evaluation; Workload Similarity





Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Abdullah I. Meajil (1)
  • Tarek El-Ghazawi (2)
  • Thomas Sterling (2)
  1. Electrical Engineering and Computer Science Department, The George Washington University, USA
  2. Center of Excellence in Space Data & Information Sciences, NASA/Goddard Space Flight Center, USA
