Abstract
The performance of a computer system depends on the characteristics of the workload it must serve: for example, if work is evenly distributed, performance will be better than if it comes in unpredictable bursts that lead to congestion. Thus, performance evaluations require representative workloads in order to produce dependable results. This can be achieved by collecting data about real workloads and creating statistical models that capture their salient features. This survey covers methodologies for doing so. Emphasis is placed on problematic issues such as correlations between workload parameters, heavy-tailed distributions, and rare events. These considerations lead to the notion of structural modeling, in which the general statistical model of the workload is replaced by a model of the process generating the workload.
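To make the opening example about even versus bursty work concrete, the following minimal sketch (not taken from the paper) simulates a single-server FIFO queue with a fixed service time and compares two arrival patterns that have the same average arrival rate and the same utilization: evenly spaced arrivals, and arrivals in bursts of ten. The function name `fifo_waits` and the parameter values (`SERVICE = 0.9`, 100 jobs, bursts of 10) are illustrative assumptions.

```python
from statistics import mean


def fifo_waits(arrivals, service):
    """Return the waiting time of each job in a single-server FIFO queue.

    arrivals: non-decreasing arrival times; service: fixed service time per job.
    Each job starts when both it has arrived and the server is free
    (the Lindley recursion for waiting times).
    """
    waits = []
    server_free_at = 0.0
    for t in arrivals:
        start = max(t, server_free_at)  # wait only if the server is still busy
        waits.append(start - t)
        server_free_at = start + service
    return waits


# Illustrative parameters: one arrival per time unit on average, 90% utilization.
SERVICE = 0.9
N_JOBS = 100

# Both workloads deliver 100 jobs over 100 time units.
even = [float(i) for i in range(N_JOBS)]              # one job every time unit
bursty = [10.0 * (i // 10) for i in range(N_JOBS)]    # 10 jobs at once every 10 units

if __name__ == "__main__":
    print(f"even:   mean wait = {mean(fifo_waits(even, SERVICE)):.2f}")
    print(f"bursty: mean wait = {mean(fifo_waits(bursty, SERVICE)):.2f}")
```

Under these assumptions the evenly spaced workload never waits (mean 0.00), while the bursty one averages 4.05 time units of waiting, even though the average load is identical. A model that kept only the mean arrival rate would not distinguish the two.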
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feitelson, D.G. (2002). Workload Modeling for Performance Evaluation. In: Calzarossa, M.C., Tucci, S. (eds) Performance Evaluation of Complex Systems: Techniques and Tools. Performance 2002. Lecture Notes in Computer Science, vol 2459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45798-4_6
DOI: https://doi.org/10.1007/3-540-45798-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44252-3
Online ISBN: 978-3-540-45798-5
eBook Packages: Springer Book Archive