Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks

Pérez, Jaime; Arroba, Patricia; Moya, José M.

doi:10.1007/s10489-022-03557-6

Published: 29 April 2022

Volume 53, pages 1469–1486, (2023)

Applied Intelligence

Abstract

The Cloud paradigm is at a critical point: existing energy-efficiency techniques are reaching a plateau, while the demand for computing resources at Data Center (DC) facilities continues to increase exponentially. The main challenge in achieving a global energy-efficiency strategy based on Artificial Intelligence is the massive amount of data needed to feed the algorithms. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. For this purpose, we implement a powerful generative algorithm: Generative Adversarial Networks (GANs). Specifically, our work combines the disciplines of GAN-based data augmentation and scenario forecasting, filling the gap in the generation of synthetic data in DCs. Furthermore, we propose a methodology to increase the variability and heterogeneity of the generated data by introducing on-demand anomalies without additional effort or expert knowledge. We also suggest the use of Kullback-Leibler Divergence and Mean Squared Error as new metrics for validating synthetic time-series generation, as they provide a better overall comparison of multivariate data distributions. We validate our approach using real data collected in an operating Data Center, successfully generating synthetic data that is helpful for prediction and optimization models. Our research will help optimize the energy consumed in Data Centers, although the proposed methodology can be applied to any similar time-series problem.
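To make the two validation metrics named in the abstract concrete, the following is a minimal sketch of how Kullback-Leibler Divergence and Mean Squared Error could be computed between a real and a synthetic multivariate series. It is not the authors' implementation, which this page does not describe: the histogram-based density estimate, the shared bin edges, the `eps` smoothing, the element-wise MSE over time-aligned series, and the function names `kl_divergence_per_feature` and `mean_squared_error` are all assumptions made for illustration.

```python
import numpy as np


def kl_divergence_per_feature(real, synthetic, bins=20, eps=1e-10):
    """Histogram-based estimate of KL(real || synthetic), one value per feature.

    real, synthetic: arrays of shape (n_samples, n_features).
    The binning scheme and eps smoothing are illustrative assumptions.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    kl = np.empty(real.shape[1])
    for j in range(real.shape[1]):
        # Use shared bin edges so both histograms cover the same support.
        lo = min(real[:, j].min(), synthetic[:, j].min())
        hi = max(real[:, j].max(), synthetic[:, j].max())
        if hi == lo:
            hi = lo + 1.0  # avoid a zero-width range for constant features
        edges = np.linspace(lo, hi, bins + 1)
        p, _ = np.histogram(real[:, j], bins=edges)
        q, _ = np.histogram(synthetic[:, j], bins=edges)
        # Smooth and normalize so empty bins do not cause log(0) or division by zero.
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        kl[j] = np.sum(p * np.log(p / q))
    return kl


def mean_squared_error(real, synthetic):
    """Element-wise MSE between two time-aligned arrays of equal shape."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    return float(np.mean((real - synthetic) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 3))       # stand-in for measured sensor data
    synthetic = real + rng.normal(scale=0.1, size=real.shape)
    print(kl_divergence_per_feature(real, synthetic))
    print(mean_squared_error(real, synthetic))
```

Under these assumptions, the divergence compares the overall value distribution of each variable, while the MSE compares the series point by point, so the two numbers capture different kinds of mismatch between real and generated data.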

Notes

Adam Data Centers [https://adam.es/data-center/]
Tychetools [https://www.tychetools.com]

Acknowledgements

This project has been partially supported by the Spanish Ministry of Science and Innovation under the grant PID2019-110866RB-I00, Adam Data Centers and Tychetools.

Author information

Authors and Affiliations

IIT - Instituto de Investigación Tecnológica, Universidad Pontificia Comillas, 28015, Madrid, Spain
Jaime Pérez
LSI - Integrated Systems Laboratory, ETSIT, Universidad Politécnica de Madrid, 28040, Madrid, Spain
Patricia Arroba & José M. Moya
CCS - Center for Computational Simulation, Campus de Montegancedo UPM, 28660, Madrid, Spain
Patricia Arroba & José M. Moya

Corresponding author

Correspondence to Jaime Pérez.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally to this work.

About this article

Cite this article

Pérez, J., Arroba, P. & Moya, J.M. Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks. Appl Intell 53, 1469–1486 (2023). https://doi.org/10.1007/s10489-022-03557-6

Accepted: 27 March 2022
Published: 29 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03557-6
