Abstract
Choosing the right configuration for big data frameworks is an important yet difficult task. These frameworks expose a complex set of parameters that must be tuned to achieve the best performance in terms of throughput and latency. Learning-based auto-tuning methods built on traditional machine learning models might not be effective for this task because they require huge amounts of high-quality training data, which are time-consuming and very expensive to collect. A good alternative is to use reinforcement learning to train an intelligent agent through trial and error. In this context, we propose a framework-agnostic auto-tuning system that implements an actor-critic algorithm, namely TD3 (Twin Delayed Deep Deterministic Policy Gradient). We show that the agent can find an optimal configuration in a continuous, high-dimensional search space within a limited number of steps. We conducted extensive experiments on Apache Spark under different workloads from the HiBench, TPC-DS and TPC-H benchmarking tools. In this paper, we give a detailed representation of the reinforcement learning environment and identify the best design through experiments. Results show that our approach outperforms state-of-the-art tuning methods and can improve the performance of Spark workloads over the default configurations by up to \(\sim 77\%\), with an average of \(\sim 45\%\). It also shows a promising adaptation behaviour to workload variation during evaluation.
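To make the TD3 ingredients mentioned in the abstract more concrete, the following is a minimal sketch, not the implementation used in the paper. It shows two things: the standard TD3 critic target (target policy smoothing plus clipped double Q-learning, as described by Fujimoto et al.), and one possible way to map a continuous action in [-1, 1] onto tunable parameters. The function names, the parameter ranges, and the choice of the three Spark properties are assumptions made for illustration only; the third TD3 ingredient, delayed policy updates, is not shown.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(
    reward: float,
    next_state: Vector,
    done: bool,
    target_actor: Callable[[Vector], Vector],
    target_critic_1: Callable[[Vector, Vector], float],
    target_critic_2: Callable[[Vector, Vector], float],
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Compute the TD3 critic target for a single transition."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target
    # action, then keep the result inside the action range [-1, 1].
    smoothed: List[float] = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip), -1.0, 1.0)
        for a in target_actor(next_state)
    ]
    # Clipped double Q-learning: use the smaller of the two target critics
    # to limit overestimation of the action value.
    q_min = min(
        target_critic_1(next_state, smoothed),
        target_critic_2(next_state, smoothed),
    )
    # No bootstrapping from terminal transitions.
    return reward + (0.0 if done else gamma * q_min)


def scale_action(
    action: Vector, bounds: Sequence[Tuple[str, float, float]]
) -> Dict[str, float]:
    """Map each action component from [-1, 1] linearly onto [low, high]."""
    return {
        name: low + (a + 1.0) / 2.0 * (high - low)
        for a, (name, low, high) in zip(action, bounds)
    }


# Hypothetical tuning ranges, chosen only for this example.
EXAMPLE_BOUNDS = [
    ("spark.executor.cores", 1, 8),
    ("spark.executor.memory_gb", 1, 16),
    ("spark.sql.shuffle.partitions", 50, 450),
]

if __name__ == "__main__":
    # Toy stand-ins for the target networks.
    actor = lambda s: [0.0, 0.0, 0.0]
    q1 = lambda s, a: 1.0
    q2 = lambda s, a: 2.0
    print(td3_target(1.0, [0.0], False, actor, q1, q2, gamma=0.5))
    print(scale_action([-1.0, 1.0, 0.0], EXAMPLE_BOUNDS))
```

In a real tuner the scaled values would still need rounding for integer-valued parameters and conversion to the string formats that the target framework expects; that step is framework-specific and is left out here.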
Funding
This work is supported and funded by the Walloon Region, Belgium.
Ethics declarations
Conflict of Interests/Competing Interests
The authors declare that there is no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ben Slimane, N., Sagaama, H., Marwani, M. et al. Mjolnir: A framework agnostic auto-tuning system with deep reinforcement learning. Appl Intell 53, 14008–14022 (2023). https://doi.org/10.1007/s10489-022-03956-9
DOI: https://doi.org/10.1007/s10489-022-03956-9