
Resampling with Feedback: A New Paradigm of Using Workload Data for Performance Evaluation

(Extended Version)

  • Conference paper

Job Scheduling Strategies for Parallel Processing (JSSPP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12985)

Abstract

Reliable performance evaluations require representative workloads. This has led to the use of accounting logs from production systems as a source of workload data for simulations. But using such logs directly suffers from various deficiencies: they provide data about only one specific situation, and they lack flexibility, namely the ability to adjust the workload as needed. Creating workload models solves some of these problems but creates others, most notably the danger of missing important details that were not recognized in advance and therefore not included in the model. Resampling resolves many of these deficiencies by combining the best of both worlds. It is based on partitioning real workloads into basic components (specifically the job streams contributed by different users), and then generating new workloads by sampling from this pool of basic components. The generated workloads are adjusted dynamically to the conditions of the simulated system using a feedback loop, which may change the throughput. Using this methodology, analysts can create multiple varied (but related) workloads from the same original log, while retaining much of the structure that exists in the original workload. Resampling with feedback thus provides a new way to use workload logs that benefits from their realism while eliminating many of their drawbacks. In addition, it enables evaluations of throughput effects that are impossible with static workloads.

This paper reflects a keynote address at JSSPP 2021, and provides more details than a previous version from a keynote at Euro-Par 2016 [18]. It summarizes my and my students’ work and reflects a personal view. The goal is to show the big picture and the building and interplay of ideas, at the possible expense of not providing a full overview of and comparison with related work.
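The core idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the log format, the single-server FCFS model, and the use of logged inter-submission gaps as think times are all simplifying assumptions made here for brevity.

```python
import heapq
import random

def resample_user_streams(log, n_users, seed=0):
    """Partition a workload log into its basic components (per-user job
    streams), then draw streams with replacement to assemble a new,
    related workload. `log` is a list of (user, think_time, runtime)
    tuples -- an illustrative format, not the paper's."""
    rng = random.Random(seed)
    streams = {}
    for user, think, runtime in log:
        streams.setdefault(user, []).append((think, runtime))
    pool = list(streams.values())
    return [pool[rng.randrange(len(pool))] for _ in range(n_users)]

def simulate_with_feedback(user_streams):
    """Toy single-server FCFS simulation with feedback: each user submits
    the next job only a think time after the previous one finished, so a
    slower system stretches the submission process and lowers throughput.
    Returns (number of jobs completed, makespan)."""
    pending = []  # heap of (submit_time, tie_breaker, user, job_index)
    seq = 0
    for u, stream in enumerate(user_streams):
        heapq.heappush(pending, (stream[0][0], seq, u, 0))
        seq += 1
    server_free = 0.0
    finished, makespan = 0, 0.0
    while pending:
        submit, _, u, j = heapq.heappop(pending)
        _, runtime = user_streams[u][j]
        finish = max(submit, server_free) + runtime
        server_free = makespan = finish
        finished += 1
        if j + 1 < len(user_streams[u]):
            # Feedback: the next submission depends on this job's finish.
            next_think = user_streams[u][j + 1][0]
            heapq.heappush(pending, (finish + next_think, seq, u, j + 1))
            seq += 1
    return finished, makespan
```

With a static (open) replay, submission times are fixed in advance; here a congested server automatically pushes later submissions back, which is what makes throughput a measurable outcome rather than an input.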


References

  1. Chapin, S.J., et al.: Benchmarks and standards for the evaluation of parallel job schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 67–90. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_4

  2. Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: 4th Workshop on Workload Characterization, pp. 140–148 (2001). https://doi.org/10.1109/WWC.2001.990753

  3. Denning, P.J.: Performance analysis: experimental computer science at its best. Comm. ACM 24(11), 725–727 (1981). https://doi.org/10.1145/358790.358791

  4. Downey, A.B.: A parallel workload model and its implications for processor allocation. Cluster Comput. 1(1), 133–145 (1998). https://doi.org/10.1023/A:1019077214124

  5. Downey, A.B., Feitelson, D.G.: The elusive goal of workload characterization. Perform. Eval. Rev. 26(4), 14–29 (1999). https://doi.org/10.1145/309746.309750

  6. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Statist. 7(1), 1–26 (1979). https://doi.org/10.1214/aos/1176344552

  7. Efron, B., Gong, G.: A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37(1), 36–48 (1983). https://doi.org/10.2307/2685844

  8. Feitelson, D.G.: Memory usage in the LANL CM-5 workload. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 78–94. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_17

  9. Feitelson, D.G.: Metrics for parallel job scheduling and their convergence. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 188–205. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45540-X_11

  10. Feitelson, D.G.: The forgotten factor: facts on performance evaluation and its dependence on workloads. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 49–60. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45706-2_4

  11. Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45798-4_6

  12. Feitelson, D.G.: Metric and workload effects on computer systems evaluation. Computer 36(9), 18–25 (2003). https://doi.org/10.1109/MC.2003.1231190

  13. Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel Distrib. Syst. 16(2), 175–182 (2005). https://doi.org/10.1109/TPDS.2005.18

  14. Feitelson, D.G.: Experimental computer science: the need for a cultural change (2005). http://www.cs.huji.ac.il/~feit/papers/exp05.pdf

  15. Feitelson, D.G.: Locality of sampling and diversity in parallel system workloads. In: 21st International Conference on Supercomputing, pp. 53–63 (2007). https://doi.org/10.1145/1274971.1274982

  16. Feitelson, D.G.: Looking at data. In: 22nd IEEE International Symposium on Parallel and Distributed Processing (2008). https://doi.org/10.1109/IPDPS.2008.4536092

  17. Feitelson, D.G.: Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press, Cambridge (2015)

  18. Feitelson, D.G.: Resampling with feedback — a new paradigm of using workload data for performance evaluation. In: Dutot, P.-F., Trystram, D. (eds.) Euro-Par 2016. LNCS, vol. 9833, pp. 3–21. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43659-3_1

  19. Feitelson, D.G., Mu’alem, A.W.: On the definition of “on-line’’ in job scheduling problems. SIGACT News 36(1), 122–131 (2005). https://doi.org/10.1145/1052796.1052797

  20. Feitelson, D.G., Mu’alem Weil, A.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th International Parallel Processing Symposium, pp. 542–546 (1998). https://doi.org/10.1109/IPPS.1998.669970

  21. Feitelson, D.G., Naaman, M.: Self-tuning systems. IEEE Softw. 16(2), 52–60 (1999). https://doi.org/10.1109/52.754053

  22. Feitelson, D.G., Nitzberg, B.: Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 337–360. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_38

  23. Feitelson, D.G., Rudolph, L.: Distributed hierarchical control for parallel processing. Computer 23(5), 65–77 (1990). https://doi.org/10.1109/2.53356

  24. Feitelson, D.G., Rudolph, L.: Evaluation of design choices for gang scheduling using distributed hierarchical control. J. Parallel Distrib. Comput. 35(1), 18–34 (1996). https://doi.org/10.1006/jpdc.1996.0064

  25. Feitelson, D.G., Rudolph, L.: Metrics and benchmarking for parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1998. LNCS, vol. 1459, pp. 1–24. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0053978

  26. Feitelson, D.G., Shmueli, E.: A case for conservative workload modeling: parallel job scheduling with daily cycles of activity. In: 17th Modelling, Analysis & Simulation of Computer and Telecommunication Systems (2009). https://doi.org/10.1109/MASCOT.2009.5366139

  27. Feitelson, D.G., Tsafrir, D.: Workload sanitation for performance evaluation. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 221–230 (2006). https://doi.org/10.1109/ISPASS.2006.1620806

  28. Feitelson, D.G., Tsafrir, D., Krakov, D.: Experience with using the parallel workloads archive. J. Parallel Distrib. Comput. 74(10), 2967–2982 (2014). https://doi.org/10.1016/j.jpdc.2014.06.013

  29. Floyd, S., Paxson, V.: Difficulties in simulating the Internet. IEEE/ACM Trans. Netw. 9(4), 392–403 (2001). https://doi.org/10.1109/90.944338

  30. Jann, J., Pattnaik, P., Franke, H., Wang, F., Skovira, J., Riordan, J.: Modeling of workload in MPPs. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 95–116. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_18

  31. Lifka, D.A.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_35

  32. Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Parallel Distrib. Comput. 63(11), 1105–1122 (2003). https://doi.org/10.1016/S0743-7315(03)00108-4

  33. Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001). https://doi.org/10.1109/71.932708

  34. Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/

  35. Prasad, R.S., Dovrolis, C.: Measuring the congestion responsiveness of internet traffic. In: Uhlig, S., Papagiannaki, K., Bonaventure, O. (eds.) PAM 2007. LNCS, vol. 4427, pp. 176–185. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71617-4_18

  36. Schroeder, B., Harchol-Balter, M.: Web servers under overload: how scheduling can help. ACM Trans. Internet Technol. 6(1), 20–52 (2006)

  37. Shmueli, E., Feitelson, D.G.: Using site-level modeling to evaluate the performance of parallel system schedulers. In: 14th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 167–176 (2006). https://doi.org/10.1109/MASCOTS.2006.50

  38. Shmueli, E., Feitelson, D.G.: Uncovering the effect of system performance on user behavior from traces of parallel systems. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 274–280 (2007). https://doi.org/10.1109/MASCOTS.2007.67

  39. Shmueli, E., Feitelson, D.G.: On simulation and design of parallel-systems schedulers: are we doing the right thing? IEEE Trans. Parallel Distrib. Syst. 20(7), 983–996 (2009). https://doi.org/10.1109/TPDS.2008.152

  40. Snir, M.: Computer and information science and engineering: one discipline, many specialties. Comm. ACM 54(3), 38–43 (2011). https://doi.org/10.1145/1897852.1897867

  41. Talby, D., Feitelson, D.G., Raveh, A.: Comparing logs and models of parallel workloads using the co-plot method. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 43–66. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_3

  42. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Modeling user runtime estimates. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2005. LNCS, vol. 3834, pp. 1–35. Springer, Heidelberg (2005). https://doi.org/10.1007/11605300_1

  43. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans. Parallel Distrib. Syst. 18(6), 789–803 (2007). https://doi.org/10.1109/TPDS.2007.70606

  44. Tsafrir, D., Feitelson, D.G.: Instability in parallel job scheduling simulation: the role of workload flurries. In: 20th International Parallel & Distributed Processing Symposium (2006). https://doi.org/10.1109/IPDPS.2006.1639311

  45. Tsafrir, D., Feitelson, D.G.: The dynamics of backfilling: solving the mystery of why increased inaccuracy may help. In: IEEE International Symposium on Workload Characterization, pp. 131–141 (2006). https://doi.org/10.1109/IISWC.2006.302737

  46. Tsafrir, D., Ouaknine, K., Feitelson, D.G.: Reducing performance evaluation sensitivity and variability by input shaking. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 231–237 (2007). https://doi.org/10.1109/MASCOTS.2007.58

  47. Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level. In: ACM SIGCOMM Conference, pp. 100–113 (1995)

  48. Zakay, N., Feitelson, D.G.: On identifying user session boundaries in parallel workload logs. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2012. LNCS, vol. 7698, pp. 216–234. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35867-8_12

  49. Zakay, N., Feitelson, D.G.: Workload resampling for performance evaluation of parallel job schedulers. Concurr. Comput. Pract. Exp. 26(12), 2079–2105 (2014). https://doi.org/10.1002/cpe.3240

  50. Zakay, N., Feitelson, D.G.: Preserving user behavior characteristics in trace-based simulation of parallel job scheduling. In: 22nd Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 51–60 (2014). https://doi.org/10.1109/MASCOTS.2014.15

  51. Zakay, N., Feitelson, D.G.: Semi-open trace based simulation for reliable evaluation of job throughput and user productivity. In: 7th IEEE International Conference on Cloud Computing Technology & Science, pp. 413–421 (2015). https://doi.org/10.1109/CloudCom.2015.35

  52. Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: 8th International Symposium on High Performance Distributed Computing, pp. 236–243 (1999). https://doi.org/10.1109/HPDC.1999.805303

Acknowledgments

The work described here was by and large performed by several outstanding students, especially Edi Shmueli, Netanel Zakay, and Dan Tsafrir. Our work was supported by the Israel Science Foundation (grants no. 219/99 and 167/03) and the Ministry of Science and Technology, Israel.

Author information

Correspondence to Dror G. Feitelson.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Feitelson, D.G. (2021). Resampling with Feedback: A New Paradigm of Using Workload Data for Performance Evaluation. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2021. Lecture Notes in Computer Science, vol. 12985. Springer, Cham. https://doi.org/10.1007/978-3-030-88224-2_1

  • DOI: https://doi.org/10.1007/978-3-030-88224-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88223-5

  • Online ISBN: 978-3-030-88224-2

  • eBook Packages: Computer Science, Computer Science (R0)
