
Elastic CPU Cap Mechanism for Timely Dataflow Applications

  • M. Reza Hoseinyfarahabady
  • Nazanin Farhangsadr
  • Albert Y. Zomaya
  • Zahir Tari
  • Samee U. Khan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10860)

Abstract

Sudden surges in the incoming workload can degrade the run-time performance of data-flow applications. Our work addresses the problem of capping CPU usage during the elastic scaling of timely data-flow (TDF) applications that run in a shared computing environment, where each application can have a different quality of service (QoS) requirement. The key argument is that an unwise consolidation decision to dynamically scale the computing resources up or out in response to unexpected workload changes can degrade the performance of some (if not all) collocated applications, because they compete fiercely for shared resources such as the last level cache. The proposed solution uses a queue-based model to predict the performance degradation that results from running data-flow applications together. CPU cap adjustment is formulated as an optimization problem whose aim is to reduce QoS violation incidents among applications while raising the CPU utilization level of server nodes and preventing the formation of bottlenecks caused by competition among collocated applications. The controller uses an efficient dynamic method to find a solution at each round of the controlling epoch. The performance evaluation compares the proposed controller against an enhanced QoS-aware version of the round robin strategy, which is deployed in many commercial packages. Experimental results confirm that the proposed solution improves QoS satisfaction by nearly 148% on average and reduces the latency of processing data records for applications in the highest QoS classes by nearly 19% during workload surges.
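
As a rough illustration of how a queue-based model can drive CPU cap decisions, the following minimal sketch treats each application as an M/M/1 queue whose service rate scales linearly with its CPU cap, and grants caps in QoS-class order. It is not the controller proposed in the paper: the M/M/1 and linear-scaling assumptions, the greedy allocation, and all names (`App`, `mm1_latency`, `min_cap`, `assign_caps`) are hypothetical, and interference on shared resources such as the last level cache is not modeled.

```python
from typing import Dict, List, NamedTuple


class App(NamedTuple):
    """Hypothetical description of one collocated data-flow application."""
    name: str
    arrival_rate: float    # incoming records per second
    full_core_rate: float  # records per second served at cap = 1.0 (assumed)
    latency_target: float  # QoS target on mean latency, in seconds
    qos_class: int         # lower value = higher priority (assumed convention)


def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue; infinite if the queue is unstable."""
    if service_rate <= arrival_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def min_cap(app: App) -> float:
    """Smallest cap meeting the latency target under the assumptions above.

    From 1 / (cap * mu - lam) <= T it follows that cap >= (lam + 1 / T) / mu.
    """
    return (app.arrival_rate + 1.0 / app.latency_target) / app.full_core_rate


def assign_caps(apps: List[App], capacity: float) -> Dict[str, float]:
    """Greedy sketch: serve higher QoS classes first until capacity runs out."""
    caps: Dict[str, float] = {}
    remaining = capacity
    for app in sorted(apps, key=lambda a: a.qos_class):
        granted = min(min_cap(app), remaining)
        caps[app.name] = granted
        remaining -= granted
    return caps
```

Under these assumptions, an application whose granted cap falls below `min_cap` is predicted to violate its latency target, which is the kind of QoS violation incident the optimization aims to reduce.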

Keywords

Shared resource interference · Distributed stream processing · Scheduling and resource allocation algorithms

Notes

Acknowledgement

We acknowledge the support of the Australian Research Council (ARC) for the work carried out in this paper, under the Linkage Project scheme (LP160100406). Samee U. Khan’s work is supported by (while serving at) the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • M. Reza Hoseinyfarahabady (1)
  • Nazanin Farhangsadr (1)
  • Albert Y. Zomaya (1)
  • Zahir Tari (2)
  • Samee U. Khan (3)

  1. School of IT, Center for Distributed and High Performance Computing, The University of Sydney, Sydney, Australia
  2. School of Science, RMIT University, Melbourne, Australia
  3. Department of Electrical and Computer Engineering, North Dakota State University, Fargo, USA
