A novel deep generative modeling-based data augmentation strategy for improving short-term building energy predictions

  • Research Article
  • Journal: Building Simulation

Abstract

Short-term building energy prediction is a fundamental task in building operation management. While many studies have explored supervised machine learning techniques for energy prediction, few have addressed the data shortage problem that can arise when developing data-driven models. One promising solution is data augmentation, which enriches existing building data resources to support reliable predictive modeling. This study proposes a deep generative modeling-based data augmentation strategy for improving short-term building energy predictions. Two types of conditional variational autoencoders (CVAEs) were designed for synthetic energy data generation, using fully connected and one-dimensional convolutional layers, respectively. Data experiments were designed to evaluate the value of data augmentation using actual measurements from 52 buildings. The results indicate that CVAEs can generate high-quality synthetic data samples, which in turn help to enhance the accuracy of short-term building energy predictions. The average performance enhancement ratios (PERs) in terms of the coefficient of variation of root mean squared error (CV-RMSE) range between 12% and 18%. Practical guidelines were obtained to ensure the validity and quality of synthetic building energy data. The research outcomes are valuable for enhancing the robustness and reliability of data-driven models for smart building operation management.
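To make the reported metric concrete, the following is a minimal Python sketch of how CV-RMSE and a performance enhancement ratio might be computed. The CV-RMSE formula (RMSE divided by the mean of the measurements) is the conventional one. The PER formula, the function names, and the sample numbers are assumptions for illustration only, since the abstract does not give the exact definition used in the study.

```python
import math


def cv_rmse(measured, predicted):
    """Return CV-RMSE in percent: RMSE normalised by the mean measurement."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    squared_errors = ((m - p) ** 2 for m, p in zip(measured, predicted))
    rmse = math.sqrt(sum(squared_errors) / n)
    mean_measured = sum(measured) / n
    return 100.0 * rmse / mean_measured


def performance_enhancement_ratio(cv_baseline, cv_augmented):
    """ASSUMED definition: relative reduction in CV-RMSE, in percent."""
    return 100.0 * (cv_baseline - cv_augmented) / cv_baseline


# Hypothetical hourly energy values (not data from the study).
measured = [100.0, 120.0, 80.0, 100.0]
pred_without_augmentation = [110.0, 110.0, 90.0, 90.0]
pred_with_augmentation = [105.0, 115.0, 85.0, 95.0]

cv_base = cv_rmse(measured, pred_without_augmentation)
cv_aug = cv_rmse(measured, pred_with_augmentation)
per = performance_enhancement_ratio(cv_base, cv_aug)
print(f"CV-RMSE without augmentation: {cv_base:.1f}%")
print(f"CV-RMSE with augmentation:    {cv_aug:.1f}%")
print(f"PER (assumed definition):     {per:.1f}%")
```

With these made-up numbers, CV-RMSE falls from 10% to 5%, which corresponds to a PER of 50% under the assumed definition. The study reports average PERs between 12% and 18%.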

Abbreviations

CVAE: conditional variational autoencoder

CV-RMSE: coefficient of variation of root mean squared error

GAN: generative adversarial network

LSTM: long short-term memory

M_1, M_2, …, M_12: month from January to December

P(A,B): joint probability of A and B

P(A∣B): conditional probability of A given B

PER: performance enhancement ratio

RMSE: root mean squared error

T_1, T_2, …, T_n: time steps from 1 to n

VAE: variational autoencoder
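As a reading aid for the probability notation above, the relation between joint and conditional probability, and the training objective commonly stated for conditional variational autoencoders, can be written as follows. The symbols x (an energy data sample), c (the conditioning information), z (the latent variable), and the parameter sets θ and φ are assumptions introduced here for illustration. The exact loss used in the study is not given in this excerpt.

```latex
% Conditional probability in terms of the joint probability
P(A \mid B) = \frac{P(A, B)}{P(B)}, \qquad P(B) > 0

% Standard CVAE evidence lower bound (assumed notation)
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)
```

The first term rewards accurate reconstruction of x given the condition c, while the KL term keeps the encoder's latent distribution close to the prior. New synthetic samples are then typically drawn by sampling z from the prior and decoding it together with a chosen c.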

Acknowledgements

The authors gratefully acknowledge the support of this research by the National Natural Science Foundation of China (No. 51908365, No. 71772125) and the Philosophical and Social Science Program of Guangdong Province, China (GD18YGL07).

Author information

Corresponding author

Correspondence to Rui Tang.

Cite this article

Fan, C., Chen, M., Tang, R. et al. A novel deep generative modeling-based data augmentation strategy for improving short-term building energy predictions. Build. Simul. 15, 197–211 (2022). https://doi.org/10.1007/s12273-021-0807-6

  • DOI: https://doi.org/10.1007/s12273-021-0807-6
