
CGAN-based synthetic multivariate time-series generation: a solution to data scarcity in solar flare forecasting

Neural Computing and Applications


One of the major bottlenecks in refining supervised algorithms is data scarcity. It often stems from extremely expensive and lengthy data collection processes. In natural domains such as heliophysics, it may take decades to collect samples large enough for machine learning purposes. Inspired by the massive success of generative adversarial networks (GANs) in generating synthetic images, in this study we employed the conditional GAN (CGAN) on a recently released benchmark dataset tailored for solar flare forecasting. Our goal is to generate synthetic multivariate time-series data that (1) are statistically similar to the real data and (2) improve the performance of flare prediction when used to remedy the scarcity of strong flares. To evaluate the generated samples, first, we used the Kullback–Leibler divergence and adversarial accuracy measures to quantify the similarity between the real and synthetic data in terms of their descriptive statistics. Second, we evaluated the impact of the generated samples by training a predictive model on their descriptive statistics, which resulted in a significant improvement (over 1100% in the true skill statistic, TSS, and 350% in the Heidke skill score, HSS). Third, we used the generated time series to examine their high-dimensional contribution to mitigating the scarcity of the strong flares, where we also observed a significant improvement in terms of TSS (4%, 7%, and 31%) and HSS (75%, 35%, and 72%) compared to oversampling, undersampling, and synthetic oversampling methods, respectively. We believe our findings can open new doors toward more robust and accurate flare forecasting models.
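The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions: TSS as recall minus false-alarm rate, HSS in its common two-class form (often called HSS2), and the Kullback–Leibler divergence between two discrete distributions over the same bins. Which HSS variant and which binning the paper uses is not stated in the abstract, so these choices, the function names, and the example counts are all assumptions for illustration.

```python
import math


def tss(tp, fn, fp, tn):
    """True skill statistic: recall minus false-alarm rate.

    Undefined (division by zero) if either class has no samples.
    """
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke skill score, two-class form (assumed variant, often called HSS2)."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if q_i == 0:
            raise ValueError("q must be nonzero wherever p is nonzero")
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not taken from the paper.
    print("TSS:", tss(tp=40, fn=10, fp=20, tn=130))
    print("HSS:", hss(tp=40, fn=10, fp=20, tn=130))
    # Hypothetical histograms of one descriptive statistic (real vs. synthetic).
    print("KL :", kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

Both skill scores equal 1 for a perfect forecast and 0 for a forecast with no skill, and a KL divergence of 0 means the two histograms are identical.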






This project has been supported in part by funding from the Division of Advanced Cyberinfrastructure within the Directorate for Computer and Information Science and Engineering, and the Division of Atmospheric & Geospace Sciences within the Directorate for Geosciences, under NSF awards #193155 and #1936361.

Corresponding author

Correspondence to Yang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, Y., Kempton, D.J., Ahmadzadeh, A. et al. CGAN-based synthetic multivariate time-series generation: a solution to data scarcity in solar flare forecasting. Neural Comput & Applic 34, 13339–13353 (2022).
