Abstract
There are many choices to make when evaluating the performance of a complex system. In the context of parallel job scheduling, one must decide what workload to use and what measurements to take. These decisions sometimes have subtle implications that are easy to overlook. In this paper we document numerous pitfalls one may fall into, with the hope of providing at least some help in avoiding them. Along the way, we also identify topics that could benefit from additional research.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frachtenberg, E., Feitelson, D.G. (2005). Pitfalls in Parallel Job Scheduling Evaluation. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2005. Lecture Notes in Computer Science, vol 3834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605300_13
DOI: https://doi.org/10.1007/11605300_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31024-2
Online ISBN: 978-3-540-31617-6
eBook Packages: Computer Science (R0)