Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks

Applied Intelligence

Abstract

The Cloud paradigm is at a critical point: existing energy-efficiency techniques are reaching a plateau, while demand for computing resources at Data Center (DC) facilities continues to increase exponentially. The main challenge in achieving a global energy-efficiency strategy based on Artificial Intelligence is the massive amount of data needed to feed the algorithms. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. For this purpose, we implement a powerful generative algorithm: Generative Adversarial Networks (GANs). Specifically, our work combines the disciplines of GAN-based data augmentation and scenario forecasting, filling the gap in the generation of synthetic data in DCs. Furthermore, we propose a methodology to increase the variability and heterogeneity of the generated data by introducing on-demand anomalies without additional effort or expert knowledge. We also suggest using Kullback-Leibler Divergence and Mean Squared Error as new metrics for validating synthetic time-series generation, as they provide a better overall comparison of multivariate data distributions. We validate our approach using real data collected in an operating Data Center, successfully generating synthetic data that is helpful for prediction and optimization models. Our research will help optimize the energy consumed in Data Centers, although the proposed methodology can be applied to any similar time-series problem.
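As an illustration of the two validation metrics named in the abstract, the following minimal sketch compares a real and a synthetic multivariate series variable by variable. It is not the authors' implementation: the histogram-based estimate of the Kullback-Leibler Divergence, the bin count, the smoothing constant `eps`, and the function names `kl_divergence`, `mse`, and `compare` are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(real, synthetic, bins=50, eps=1e-10):
    """Histogram estimate of KL(real || synthetic) for a single variable.

    Assumption: both samples are binned on a shared range, and a small
    constant eps is added so that empty bins do not produce log(0).
    """
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synthetic, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


def mse(real, synthetic):
    """Mean Squared Error between two aligned series of equal shape."""
    return float(np.mean((real - synthetic) ** 2))


def compare(real, synthetic, bins=50):
    """Compare two arrays of shape (timesteps, variables), per variable."""
    n_vars = real.shape[1]
    return {
        "kl_per_variable": [
            kl_divergence(real[:, j], synthetic[:, j], bins) for j in range(n_vars)
        ],
        "mse_per_variable": [
            mse(real[:, j], synthetic[:, j]) for j in range(n_vars)
        ],
    }
```

Calling `compare(real, synthetic)` on two arrays of equal shape returns one divergence and one error per variable. Lower values suggest the synthetic series is closer to the real one, both in distribution (KL) and point by point (MSE).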

Notes

  1. Adam Data Centers [https://adam.es/data-center/]

  2. Tychetools [https://www.tychetools.com]

Acknowledgements

This project has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-110866RB-I00, and by Adam Data Centers and Tychetools.

Author information

Corresponding author

Correspondence to Jaime Pérez.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally to this work.

About this article

Cite this article

Pérez, J., Arroba, P. & Moya, J.M. Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks. Appl Intell 53, 1469–1486 (2023). https://doi.org/10.1007/s10489-022-03557-6

  • DOI: https://doi.org/10.1007/s10489-022-03557-6
