Modeling Multiclass Task-Based Applications on Heterogeneous Distributed Environments
The volume of data, one of the five “V” characteristics of Big Data, is growing much faster than the ability of existing systems to manage it within an acceptable time. Several technologies have been developed to address this scalability issue. For instance, MapReduce was introduced to process huge amounts of data by splitting the computation into a set of tasks that are executed concurrently. Saving even a marginal amount of time in the processing of all the tasks of a set can bring valuable benefits to the execution of the whole application and to the management costs of the entire data center. To this end, we propose a technique to minimize the global processing time of a set of tasks with different service requirements that are executed concurrently on two or more heterogeneous systems. The validity of the proposed technique is demonstrated using a multiformalism model that combines Queueing Networks and Petri Nets. Applying the technique to an Apache Hive case study shows that the described allocation policy can lead to performance gains in both total execution time and energy consumption.
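To make the allocation problem concrete, the following is a minimal sketch and not the technique proposed in the paper. It assumes two hypothetical systems, "A" and "B", that each serve their assigned tasks one at a time, two task classes with made-up per-system service times, and an exhaustive search over all possible splits. The names `makespan` and `best_split` and all numeric values are illustrative assumptions.

```python
from itertools import product


def makespan(split, counts, service):
    """Global processing time for one allocation.

    split[c]  : number of class-c tasks sent to system "A" (the rest go to "B").
    counts[c] : total number of class-c tasks.
    service   : service[s][c] is the time system s needs for one class-c task.
    Assumption: each system processes its tasks sequentially.
    """
    time_a = sum(n_a * service["A"][c] for c, n_a in enumerate(split))
    time_b = sum((counts[c] - n_a) * service["B"][c] for c, n_a in enumerate(split))
    # Both systems work in parallel, so the slower one sets the total time.
    return max(time_a, time_b)


def best_split(counts, service):
    """Exhaustively try every split and return the one with the smallest makespan."""
    ranges = [range(n + 1) for n in counts]
    return min(product(*ranges), key=lambda s: makespan(s, counts, service))


# Hypothetical workload: 4 tasks of each class; system A is faster on class 0,
# system B is faster on class 1.
counts = [4, 4]
service = {"A": [1.0, 3.0], "B": [2.0, 1.0]}

split = best_split(counts, service)
print("tasks sent to A per class:", split)
print("global processing time:", makespan(split, counts, service))
print("even split for comparison:", makespan((2, 2), counts, service))
```

Under these assumed numbers, sending every task to the system that serves its class fastest halves the global processing time compared with an even split. Real systems serve tasks concurrently and contend for resources, which is why the paper relies on a queueing and Petri net model rather than a sequential sum like this one.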
Keywords: Pool depletion systems, MapReduce, Schedulers, Energy efficiency, Performance evaluation, Queueing networks, Petri nets, Multiformalism models
This research was supported in part by the European Commission under the grant ANTAREX H2020 FET-HPC-671623.