
Resampling with Feedback: A New Paradigm of Using Workload Data for Performance Evaluation

(Extended Version)

  • Conference paper

Job Scheduling Strategies for Parallel Processing (JSSPP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12985)

Abstract

Reliable performance evaluations require representative workloads. This has led to the use of accounting logs from production systems as a source of workload data for simulations. But using such logs directly suffers from various deficiencies: they provide data about only one specific situation, and they lack flexibility, namely the ability to adjust the workload as needed. Creating workload models solves some of these problems but creates others, most notably the danger of missing important details that were not recognized in advance and therefore not included in the model. Resampling resolves many of these deficiencies by combining the best of both worlds. It is based on partitioning real workloads into basic components (specifically the job streams contributed by different users), and then generating new workloads by sampling from this pool of basic components. The generated workloads are adjusted dynamically to the conditions of the simulated system using a feedback loop, which may change the throughput. Using this methodology, analysts can create multiple varied (but related) workloads from the same original log, while retaining much of the structure that exists in the original workload. Resampling with feedback thus provides a new way to use workload logs that benefits from their realism while eliminating many of their drawbacks. In addition, it enables evaluations of throughput effects that are impossible with static workloads.

This paper reflects a keynote address at JSSPP 2021, and provides more details than a previous version from a keynote at Euro-Par 2016 [18]. It summarizes my and my students’ work and reflects a personal view. The goal is to show the big picture and the building and interplay of ideas, at the possible expense of not providing a full overview of and comparison with related work.
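The core idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the log format, the single-server FCFS model, and the use of logged inter-submission gaps as think times are all simplifying assumptions made here for brevity.

```python
import heapq
import random

def resample_user_streams(log, n_users, seed=0):
    """Partition a workload log into its basic components (per-user job
    streams), then draw streams with replacement to assemble a new,
    related workload. `log` is a list of (user, think_time, runtime)
    tuples -- an illustrative format, not the paper's."""
    rng = random.Random(seed)
    streams = {}
    for user, think, runtime in log:
        streams.setdefault(user, []).append((think, runtime))
    pool = list(streams.values())
    return [pool[rng.randrange(len(pool))] for _ in range(n_users)]

def simulate_with_feedback(user_streams):
    """Toy single-server FCFS simulation with feedback: each user submits
    the next job only a think time after the previous one finished, so a
    slower system stretches the submission process and lowers throughput.
    Returns (number of jobs completed, makespan)."""
    pending = []  # heap of (submit_time, tie_breaker, user, job_index)
    seq = 0
    for u, stream in enumerate(user_streams):
        heapq.heappush(pending, (stream[0][0], seq, u, 0))
        seq += 1
    server_free = 0.0
    finished, makespan = 0, 0.0
    while pending:
        submit, _, u, j = heapq.heappop(pending)
        _, runtime = user_streams[u][j]
        finish = max(submit, server_free) + runtime
        server_free = makespan = finish
        finished += 1
        if j + 1 < len(user_streams[u]):
            # Feedback: the next submission depends on this job's finish.
            next_think = user_streams[u][j + 1][0]
            heapq.heappush(pending, (finish + next_think, seq, u, j + 1))
            seq += 1
    return finished, makespan
```

With a static (open) replay, submission times are fixed in advance; here a congested server automatically pushes later submissions back, which is what makes throughput a measurable outcome rather than an input.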


References

  1. Chapin, S.J., et al.: Benchmarks and standards for the evaluation of parallel job schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 67–90. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_4

  2. Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: 4th Workshop on Workload Characterization, pp. 140–148 (2001). https://doi.org/10.1109/WWC.2001.990753

  3. Denning, P.J.: Performance analysis: experimental computer science at its best. Comm. ACM 24(11), 725–727 (1981). https://doi.org/10.1145/358790.358791

  4. Downey, A.B.: A parallel workload model and its implications for processor allocation. Cluster Comput. 1(1), 133–145 (1998). https://doi.org/10.1023/A:1019077214124

  5. Downey, A.B., Feitelson, D.G.: The elusive goal of workload characterization. Perform. Eval. Rev. 26(4), 14–29 (1999). https://doi.org/10.1145/309746.309750

  6. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Statist. 7(1), 1–26 (1979). https://doi.org/10.1214/aos/1176344552

  7. Efron, B., Gong, G.: A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37(1), 36–48 (1983). https://doi.org/10.2307/2685844

  8. Feitelson, D.G.: Memory usage in the LANL CM-5 workload. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 78–94. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_17

  9. Feitelson, D.G.: Metrics for parallel job scheduling and their convergence. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 188–205. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45540-X_11

  10. Feitelson, D.G.: The forgotten factor: facts on performance evaluation and its dependence on workloads. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 49–60. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45706-2_4

  11. Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45798-4_6

  12. Feitelson, D.G.: Metric and workload effects on computer systems evaluation. Computer 36(9), 18–25 (2003). https://doi.org/10.1109/MC.2003.1231190

  13. Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel Distrib. Syst. 16(2), 175–182 (2005). https://doi.org/10.1109/TPDS.2005.18

  14. Feitelson, D.G.: Experimental computer science: the need for a cultural change (2005). http://www.cs.huji.ac.il/~feit/papers/exp05.pdf

  15. Feitelson, D.G.: Locality of sampling and diversity in parallel system workloads. In: 21st International Conference on Supercomputing, pp. 53–63 (2007). https://doi.org/10.1145/1274971.1274982

  16. Feitelson, D.G.: Looking at data. In: 22nd IEEE International Symposium on Parallel and Distributed Processing (2008). https://doi.org/10.1109/IPDPS.2008.4536092

  17. Feitelson, D.G.: Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press, Cambridge (2015)

  18. Feitelson, D.G.: Resampling with feedback — a new paradigm of using workload data for performance evaluation. In: Dutot, P.-F., Trystram, D. (eds.) Euro-Par 2016. LNCS, vol. 9833, pp. 3–21. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43659-3_1

  19. Feitelson, D.G., Mu’alem, A.W.: On the definition of “on-line’’ in job scheduling problems. SIGACT News 36(1), 122–131 (2005). https://doi.org/10.1145/1052796.1052797

  20. Feitelson, D.G., Mu’alem Weil, A.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th International Parallel Processing Symposium, pp. 542–546 (1998). https://doi.org/10.1109/IPPS.1998.669970

  21. Feitelson, D.G., Naaman, M.: Self-tuning systems. IEEE Softw. 16(2), 52–60 (1999). https://doi.org/10.1109/52.754053

  22. Feitelson, D.G., Nitzberg, B.: Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 337–360. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_38

  23. Feitelson, D.G., Rudolph, L.: Distributed hierarchical control for parallel processing. Computer 23(5), 65–77 (1990). https://doi.org/10.1109/2.53356

  24. Feitelson, D.G., Rudolph, L.: Evaluation of design choices for gang scheduling using distributed hierarchical control. J. Parallel Distrib. Comput. 35(1), 18–34 (1996). https://doi.org/10.1006/jpdc.1996.0064

  25. Feitelson, D.G., Rudolph, L.: Metrics and benchmarking for parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1998. LNCS, vol. 1459, pp. 1–24. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0053978

  26. Feitelson, D.G., Shmueli, E.: A case for conservative workload modeling: parallel job scheduling with daily cycles of activity. In: 17th Modelling, Analysis & Simulation of Computer and Telecommunication Systems (2009). https://doi.org/10.1109/MASCOT.2009.5366139

  27. Feitelson, D.G., Tsafrir, D.: Workload sanitation for performance evaluation. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 221–230 (2006). https://doi.org/10.1109/ISPASS.2006.1620806

  28. Feitelson, D.G., Tsafrir, D., Krakov, D.: Experience with using the parallel workloads archive. J. Parallel Distrib. Comput. 74(10), 2967–2982 (2014). https://doi.org/10.1016/j.jpdc.2014.06.013

  29. Floyd, S., Paxson, V.: Difficulties in simulating the Internet. IEEE/ACM Trans. Netw. 9(4), 392–403 (2001). https://doi.org/10.1109/90.944338

  30. Jann, J., Pattnaik, P., Franke, H., Wang, F., Skovira, J., Riordan, J.: Modeling of workload in MPPs. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 95–116. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_18

  31. Lifka, D.A.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_35

  32. Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Parallel Distrib. Comput. 63(11), 1105–1122 (2003). https://doi.org/10.1016/S0743-7315(03)00108-4

  33. Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001). https://doi.org/10.1109/71.932708

  34. Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/

  35. Prasad, R.S., Dovrolis, C.: Measuring the congestion responsiveness of internet traffic. In: Uhlig, S., Papagiannaki, K., Bonaventure, O. (eds.) PAM 2007. LNCS, vol. 4427, pp. 176–185. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71617-4_18

  36. Schroeder, B., Harchol-Balter, M.: Web servers under overload: how scheduling can help. ACM Trans. Internet Technol. 6(1), 20–52 (2006)

  37. Shmueli, E., Feitelson, D.G.: Using site-level modeling to evaluate the performance of parallel system schedulers. In: 14th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 167–176 (2006). https://doi.org/10.1109/MASCOTS.2006.50

  38. Shmueli, E., Feitelson, D.G.: Uncovering the effect of system performance on user behavior from traces of parallel systems. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 274–280 (2007). https://doi.org/10.1109/MASCOTS.2007.67

  39. Shmueli, E., Feitelson, D.G.: On simulation and design of parallel-systems schedulers: are we doing the right thing? IEEE Trans. Parallel Distrib. Syst. 20(7), 983–996 (2009). https://doi.org/10.1109/TPDS.2008.152

  40. Snir, M.: Computer and information science and engineering: one discipline, many specialties. Comm. ACM 54(3), 38–43 (2011). https://doi.org/10.1145/1897852.1897867

  41. Talby, D., Feitelson, D.G., Raveh, A.: Comparing logs and models of parallel workloads using the co-plot method. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 43–66. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_3

  42. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Modeling user runtime estimates. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2005. LNCS, vol. 3834, pp. 1–35. Springer, Heidelberg (2005). https://doi.org/10.1007/11605300_1

  43. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans. Parallel Distrib. Syst. 18(6), 789–803 (2007). https://doi.org/10.1109/TPDS.2007.70606

  44. Tsafrir, D., Feitelson, D.G.: Instability in parallel job scheduling simulation: the role of workload flurries. In: 20th International Parallel & Distributed Processing Symposium (2006). https://doi.org/10.1109/IPDPS.2006.1639311

  45. Tsafrir, D., Feitelson, D.G.: The dynamics of backfilling: solving the mystery of why increased inaccuracy may help. In: IEEE International Symposium on Workload Characterization, pp. 131–141 (2006). https://doi.org/10.1109/IISWC.2006.302737

  46. Tsafrir, D., Ouaknine, K., Feitelson, D.G.: Reducing performance evaluation sensitivity and variability by input shaking. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 231–237 (2007). https://doi.org/10.1109/MASCOTS.2007.58

  47. Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level. In: ACM SIGCOMM Conference, pp. 100–113 (1995)

  48. Zakay, N., Feitelson, D.G.: On identifying user session boundaries in parallel workload logs. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2012. LNCS, vol. 7698, pp. 216–234. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35867-8_12

  49. Zakay, N., Feitelson, D.G.: Workload resampling for performance evaluation of parallel job schedulers. Concurr. Comput. Pract. Exp. 26(12), 2079–2105 (2014). https://doi.org/10.1002/cpe.3240

  50. Zakay, N., Feitelson, D.G.: Preserving user behavior characteristics in trace-based simulation of parallel job scheduling. In: 22nd Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 51–60 (2014). https://doi.org/10.1109/MASCOTS.2014.15

  51. Zakay, N., Feitelson, D.G.: Semi-open trace based simulation for reliable evaluation of job throughput and user productivity. In: 7th IEEE International Conference on Cloud Computing Technology & Science, pp. 413–421 (2015). https://doi.org/10.1109/CloudCom.2015.35

  52. Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: 8th International Symposium on High Performance Distributed Computing, pp. 236–243 (1999). https://doi.org/10.1109/HPDC.1999.805303

Acknowledgments

The work described here was by and large performed by several outstanding students, especially Edi Shmueli, Netanel Zakay, and Dan Tsafrir. Our work was supported by the Israel Science Foundation (grants no. 219/99 and 167/03) and the Ministry of Science and Technology, Israel.

Author information

Correspondence to Dror G. Feitelson.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Feitelson, D.G. (2021). Resampling with Feedback: A New Paradigm of Using Workload Data for Performance Evaluation. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2021. Lecture Notes in Computer Science, vol. 12985. Springer, Cham. https://doi.org/10.1007/978-3-030-88224-2_1

  • DOI: https://doi.org/10.1007/978-3-030-88224-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88223-5

  • Online ISBN: 978-3-030-88224-2

  • eBook Packages: Computer Science, Computer Science (R0)
