
Hourly solar irradiance forecasting based on statistical methods and a stochastic modeling approach for residual error compensation

  • ORIGINAL PAPER

Stochastic Environmental Research and Risk Assessment

Abstract

By reducing fossil fuel use, renewable energy improves the economy, quality of life, and the environment. These impacts make renewable energy forecasting crucial for lowering fossil fuel utilization. This paper aims to advance the time series forecasting literature mathematically, focusing on solar irradiance applications at sites in Los Angeles, Denver, and Hawaii. A three-phase hybrid time series forecasting method is devised for this endeavor. In phase I, an ARFIMA model is used to forecast the original solar irradiance time series. Next, the residuals are retrieved by subtracting the phase I results from the observed time series to set the stage for the following phase. In phase II, a novel enhanced fractional Brownian motion is used for residual forecasting; its parameters are estimated adaptively to capture the dynamic statistical characteristics of the time series efficiently. Finally, in phase III, the results of phases I and II are numerically combined to form the final forecasts. The residual forecasting part, in phase II, shows substantial superiority. Moreover, when the proposed hybrid algorithm is compared with existing cutting-edge algorithms on the same solar irradiance applications, the results demonstrate significantly improved performance.



Notes

  1. https://sdgresources.relx.com/

  2. It should be noted that not all nonlinear dynamics exhibit complexity, and conversely, not all complexity is attributed to nonlinearity or intrinsic system behavior. For instance, stochastic chaos can possess high complexity despite having small values of the maximum LE. However, within the intersection of nonlinearity and complexity, there exists a realm known as Complex Nonlinear Dynamics. This term encompasses complexity that arises specifically from intrinsic nonlinearities within a system (Mihailović et al. 2018, 2021).

  3. The ability of fBm to model fractality is advantageous in detecting and modeling heavy-tailed time series.

References


Acknowledgements

The authors offer their sincere gratitude to the anonymous reviewers, whose comments and suggestions enhanced the paper significantly.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Contributions

Ali Nikseresht: Conceptualization, Methodology, Software, Writing, Data curation, Visualization. Hamidreza Amindavar: Conceptualization, Supervision, Writing-Reviewing, and Editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hamidreza Amindavar.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The datasets used in the current paper are available upon request.

Supplementary file

High-resolution versions of the current paper’s figures (TIF File (.tif)) and their corresponding MATLAB graphical files (FIG File (.fig)) are provided as a supplementary file (except for Figs. 1, 6, 10, 11, and 12, which are already at the highest resolution).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendices

Appendix A

Using various statistical performance evaluation metrics when forecasting time series is beneficial because different metrics assess different aspects of forecasting accuracy, such as magnitude errors, percentage errors, or scaling errors, and together they provide a more comprehensive picture of a model's performance. For example, MAE focuses on magnitude errors, while MAPE emphasizes percentage errors.

Different metrics are also sensitive to different types of errors. Metrics such as MAE and RMSE respond to both overestimation and underestimation, whereas MAPE and MASPE are more sensitive to percentage errors. Spearman's rho and Kendall's tau, in turn, are robust to outliers and resistant to extreme values because they are based on the ranks of the observations rather than their actual values; they are therefore less influenced by a few extreme data points, which is advantageous for time series that contain outliers or follow heavy-tailed distributions. By combining metrics, errors can be detected and analyzed from multiple perspectives, allowing a more nuanced assessment.

Finally, no single metric is universally superior or immune to specific limitations; each has its strengths and weaknesses. Employing multiple metrics mitigates the limitations of any individual one and yields a more robust evaluation of the forecasting model, reducing the risk of relying on a single metric that overlooks certain aspects of forecast accuracy or is unduly influenced by outliers or specific characteristics of the data.

All the formulae for the metrics employed in the assessment process in Sect. 2.1 are as follows:

Root Mean Square Error (RMSE):

$${\text{RMSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2}$$
(12)

in which \(N\) denotes the time series length, \(Y_i\) is the original residual time series, and \(\hat{Y}_i\) is the forecasted residual time series.

Normalized Root Mean Square Error (nRMSE):

$${\text{nRMSE}} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2}}{\frac{1}{N}\sum_{i=1}^{N} Y_i}$$
(13)

Root Mean Squared Relative Error (RMSRE):

$${\text{RMSRE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( \frac{Y_i - \hat{Y}_i}{Y_i} \right)^2}$$
(14)

Root Mean Square Difference (RMSD):

$${\text{RMSD}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left[ \left( Y_i - \bar{Y} \right) - \left( \hat{Y}_i - \bar{\hat{Y}} \right) \right]^2}$$
(15)

where \(\bar{Y}\) and \(\bar{\hat{Y}}\) are the means of the original and the forecasted residual time series, respectively.

Mean Absolute Error (MAE):

$${\text{MAE}} = \frac{1}{N}\sum_{i=1}^{N} \left| Y_i - \hat{Y}_i \right|$$
(16)

Mean Absolute Percentage Error (MAPE):

$${\text{MAPE}} = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| \times 100$$
(17)

Mean Absolute Scaled Percentage Error (MASPE):

$${\text{MASPE}} = \frac{{\text{MAE}}_{Target\,Algorithm\,(TA)}}{{\text{MAE}}_{Naive\,Algorithm\,(NA)}} \times 100 = \frac{\sum_{i=1}^{N} \left| Y_i - \hat{Y}_{(TA)i} \right|}{\sum_{i=1}^{N} \left| Y_i - \hat{Y}_{(NA)i} \right|} \times 100$$
(18)

R-squared (\(R^2\)):

$$R^{2} = 1 - \frac{{\text{RSS}}}{{\text{TSS}}} = 1 - \frac{\sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^2}$$
(19)

in which RSS denotes the “sum of squares of residuals” and TSS the “total sum of squares”.
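To make the definitions concrete, the following NumPy sketch evaluates Eqs. (12)–(17) and (19) as written above on a synthetic forecast. The function name and test data are illustrative assumptions; this is not the paper’s MATLAB implementation.

```python
import numpy as np

def metrics(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """Point-forecast metrics per Eqs. (12)-(17) and (19).
    y: observed series, y_hat: forecasts (same length N)."""
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))                       # Eq. (12)
    nrmse = rmse / np.mean(y)                               # Eq. (13)
    rmsre = np.sqrt(np.mean((err / y) ** 2))                # Eq. (14)
    rmsd = np.sqrt(np.mean(((y - y.mean())
                            - (y_hat - y_hat.mean())) ** 2))  # Eq. (15)
    mae = np.mean(np.abs(err))                              # Eq. (16)
    mape = np.mean(np.abs(err / y)) * 100                   # Eq. (17)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)  # Eq. (19)
    return dict(RMSE=rmse, nRMSE=nrmse, RMSRE=rmsre,
                RMSD=rmsd, MAE=mae, MAPE=mape, R2=r2)

# Synthetic example only: pseudo-irradiance values and a noisy forecast.
rng = np.random.default_rng(1)
y = rng.uniform(100, 1000, size=500)       # e.g. irradiance in W/m^2
y_hat = y + rng.normal(0, 25, size=500)
print(metrics(y, y_hat))
```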

The formulas for Spearman’s rho and Kendall’s tau are as follows:

Spearman’s rho is equivalent to Pearson’s linear correlation coefficient applied to the rankings of the columns \({x}_{a}\) and \({y}_{b}\). If all the ranks in each column are distinct, Spearman’s rho equation simplifies to:

$$\rho_{s}\left( a, b \right) = 1 - \frac{6\sum \mathcal{D}^{2}}{N\left( N^{2} - 1 \right)}$$
(10)

where \(\mathcal{D}\) is the difference between the ranks of the two columns, and \(N\) is the length of each column.

Kendall’s tau is based on counting the number of \((i,j)\) pairs, for \(i<j\), that are concordant—that is, for which \({x}_{a,i}-{x}_{a,j}\) and \({y}_{b,i}-{y}_{b,j}\) have the same sign. The equation for Kendall’s tau includes an adjustment for ties in the normalizing constant and is often referred to as tau-b.

For column \({x}_{a}\) in matrix \(x\) and column \({y}_{b}\) in matrix \(y\), Kendall’s tau coefficient is defined as:

$$\tau = \frac{2K}{{N\left( {N - 1} \right)}}$$
(11)

where \(K=\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}{\zeta }^{*}({x}_{a,i},{x}_{a,j},{y}_{b,i},{y}_{b,j})\), and

$$\zeta^{*} \left( {x_{a,i} ,x_{a,j} ,y_{b,i} ,y_{b,j} } \right) = \left\{ {\begin{array}{*{20}c} { 1: if \left( {x_{a,i} - x_{a,j} } \right)\left( {y_{b,i} - y_{b,j} } \right) > 0} \\ { 0: if \left( {x_{a,i} - x_{a,j} } \right)\left( {y_{b,i} - y_{b,j} } \right) = 0} \\ { - 1: if \left( {x_{a,i} - x_{a,j} } \right)\left( {y_{b,i} - y_{b,j} } \right) < 0} \\ \end{array} } \right.$$

Each correlation coefficient lies between –1 and 1. A value of –1 indicates that one column’s ranking is the reverse of the other’s, whereas a value of +1 indicates that the two rankings are identical. A value of 0 indicates that the columns have no relationship.
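As a usage illustration (not part of the original paper), both rank correlations can be obtained with SciPy; scipy.stats.kendalltau returns the tie-adjusted tau-b variant described for Eq. (11), and the Eq. (10) shortcut can be checked directly when all ranks are distinct:

```python
import numpy as np
from scipy import stats

# Synthetic observed/forecast pair; continuous values, so ranks are
# distinct almost surely and Eq. (10) applies.
rng = np.random.default_rng(2)
y = rng.random(200)
y_hat = y + rng.normal(0, 0.1, 200)

rho, _ = stats.spearmanr(y, y_hat)     # Spearman's rho
tau, _ = stats.kendalltau(y, y_hat)    # Kendall's tau-b (default variant)
print(f"Spearman rho = {rho:.3f}, Kendall tau-b = {tau:.3f}")

# Manual check of Eq. (10): D is the rank difference per observation.
d = stats.rankdata(y) - stats.rankdata(y_hat)
n = len(y)
rho_manual = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(f"Eq. (10) gives {rho_manual:.3f}")
```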

Appendix B

2.1 The KC mathematical and complementary explanations

Let \(X\) represent the solar irradiance time series, and \(x\) denote a specific value within that series. The Kolmogorov complexity, denoted as \({K}_{c}(x)\), measures the length of the smallest program that, when run on a universal Turing machine \(U\), produces the object \(x\). Although \({K}_{c}(x)\) cannot be directly computed for arbitrary objects, it is approximated by the size of the most compressed version of \(x\) (Mihailović et al. 2019a, 2023).

In practice, the calculation of the KC of a time series \(X({x}_{1}, {x}_{2}, {x}_{3}, \ldots, {x}_{N})\) using the LZA algorithm involves several steps. First, the time series is encoded as a binary sequence \(S\) with elements \(s(i)\). Each element \(s(i)\), \(i=1, 2, \ldots, N\), is set to 0 if \({x}_{i}<{x}_{t}\) and to 1 if \({x}_{i}>{x}_{t}\), where \({x}_{t}\) is a threshold typically chosen as the mean value of the time series (Mihailović et al. 2019a, 2023). Other encoding schemes are also available (Dingle et al. 2023).

The next step is to calculate the complexity counter \(c(N)\), which represents the minimum number of distinct patterns present in the encoded sequence of length \(N\). The complexity counter is a function of the sequence length \(N\) and is bounded by \(b(N)=N/{\mathit{log}}_{2}N\) as \(N\) approaches infinity, denoted as \(c(N)=O(b(N))\).

Finally, the normalized information measure \({C}_{k}(N)\) is computed as \({C}_{k}(N)=c(N)/b(N)\). For nonlinear time series, \({C}_{k}(N)\) varies between 0 and 1, although it may exceed 1 for finite-size random sequences (Mihailović et al. 2019a, 2023). It is worth noting that a pattern refers to a unique and non-repetitive sequence within the encoded time series. A flow chart illustrating the calculation of the KC of a time series using the LZA algorithm is provided in (Mihailović et al. 2019a, 2023).
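A minimal Python sketch of this procedure is given below, assuming the mean-value threshold described above. It illustrates the pattern counting (here via the standard Kaspar–Schuster formulation of Lempel–Ziv parsing) and the normalization \({C}_{k}(N)=c(N)/b(N)\); it is not the authors’ code.

```python
import numpy as np

def lz_complexity(s: str) -> int:
    """Kaspar-Schuster counter c(N): minimum number of distinct
    patterns in the binary string s (Lempel-Ziv 1976 parsing)."""
    n, c = len(s), 1
    i, k, l, k_max = 0, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # reached the end inside a match
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:             # no earlier copy found: new pattern
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def kolmogorov_complexity(x: np.ndarray) -> float:
    """Normalized measure C_k(N) = c(N) / (N / log2 N) of a series,
    binarized around its mean value (the threshold described above)."""
    s = "".join('1' if v > x.mean() else '0' for v in x)
    n = len(s)
    return lz_complexity(s) / (n / np.log2(n))

# A random sequence should give C_k near 1; a periodic one, near 0.
rng = np.random.default_rng(0)
print(kolmogorov_complexity(rng.random(4096)))           # ~1
print(kolmogorov_complexity(np.tile([0.0, 1.0], 2048)))  # ~0
```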

2.2 The LE mathematical and complementary explanations

Quantitatively, the divergence of two trajectories in phase space with an initial separation \(\delta X\) is approximately given by \(\left|\delta X(t)\right|\approx {e}^{\lambda t}\left|\delta X(0)\right|\), where \(\lambda\) represents the LE. It should be noted that this approximation assumes a linear relationship in the divergence. Since the rate of separation can vary depending on the orientation of the initial separation vector, there exists a spectrum of LEs, with the largest value commonly referred to as the LE. A positive value of the LE typically indicates chaotic behavior in the system. In our study, we calculated the LE for the solar irradiance time series using the Rosenstein algorithm (Rosenstein et al. 1993) implemented in the MATLAB program (Mohammadi 2009). The LE, denoted as \(\lambda\), is obtained as the limit of the average logarithmic divergence rate as the time delay \(\tau\) approaches infinity and the separation \(\varepsilon\) approaches zero, given by \(\lambda =\underset{\tau \to \infty }{\mathit{lim}}\underset{\varepsilon \to 0}{\mathit{lim}}\frac{1}{\tau }ln(\frac{\left|x(\tau )-{x}_{\varepsilon }(\tau )\right|}{\varepsilon })\) where \(\left|x(0)-{x}_{\varepsilon }(0)\right|=\varepsilon\).

The Rosenstein algorithm is known for its efficiency, ease of application, and robustness to variations in embedding dimension, reconstruction delay, length of the time series, and noise level.
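As a toy illustration of the exponential-divergence definition (not the Rosenstein algorithm itself, which operates on phase-space reconstructions of measured data), the largest LE of the chaotic logistic map can be estimated by averaging the local log-divergence rate \(\ln|f{\prime}(x)|\) along an orbit:

```python
import numpy as np

r = 4.0                 # fully chaotic regime of the logistic map
x = 0.3
n_burn, n_steps = 1_000, 100_000

for _ in range(n_burn):                 # discard transients
    x = r * x * (1.0 - x)

lam = 0.0
for _ in range(n_steps):
    lam += np.log(abs(r * (1.0 - 2.0 * x)))   # local rate |f'(x)|
    x = r * x * (1.0 - x)

# The average converges to ln 2 ~ 0.693 > 0, indicating chaos.
print(lam / n_steps)
```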

Appendix C

In our analysis, we estimated the most reliable prediction horizon (LT) following the procedure described in (Mihailović et al. 2019a). However, since the complexity of the solar irradiance datasets used in this study is low, we used the simplified version of LT, as explained in Sect. 2.2.3, for predicting the time series. It is important to note that solar irradiance time series from other stations and countries may exhibit higher complexity levels (Mihailović et al. 2018, 2021).

In cases where solar irradiance time series exhibit higher complexity levels, the predictability can be influenced by randomness. To account for this, a randomization time (\({\Delta t}_{rand}\)) is introduced as \({\Delta t}_{rand}=1/{K}_{c}(x)\). This randomization time, denoted as KT, quantifies the time span beyond which randomness significantly affects predictability. The predictability horizon, LT, is then corrected for randomness and defined as the intersection of [0, \({\Delta t}_{lyap}\)] and [0, \({\Delta t}_{rand}\)]. This corrected LT takes into account the time window within the time series where complexity remains relatively unchanged.

Figure 14 illustrates the predictability of the solar irradiance data in hours, considering the LT corrected for randomness; the corrected LT is inversely proportional to the KC. The figure shows that for KCH, the prediction horizon \({\Delta t}_{rand}\) falls between two and four hours.

Fig. 14 Predictability of the solar irradiance datasets given by the LT corrected by randomness (in hours)

Figure 15 displays the relationship between LT and KT. The impact of randomness on reducing the LT can be observed by comparing Figs. 5, 14, and 15. According to Fig. 5, the longest predictability in LT units for all datasets is around 10 h. However, considering the presence of randomness as shown in Fig. 14, this predictability is reduced to 4 h.

Fig. 15 LT versus KT (in hours)
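Under the stated definition, the corrected horizon is simply the smaller of the two time scales. The sketch below assumes \({\Delta t}_{lyap}=1/\lambda\) (the usual Lyapunov-time convention, not stated explicitly above) and uses illustrative numbers echoing Figs. 5 and 14:

```python
def predictability_horizon(lyap_exp: float, kc: float) -> float:
    """Corrected LT: the intersection of [0, 1/lambda] and [0, 1/Kc],
    i.e. the smaller of the Lyapunov time and the randomization time (KT).
    Assumes dt_lyap = 1/lambda (an assumption of this sketch)."""
    dt_lyap = 1.0 / lyap_exp   # predictability limit from chaos (hours)
    dt_rand = 1.0 / kc         # randomization time KT = 1/Kc(x) (hours)
    return min(dt_lyap, dt_rand)

# Illustrative: lambda = 0.1 1/h gives an LT of ~10 h, but Kc = 0.25
# gives KT = 4 h, so randomness shortens the usable horizon to 4 h.
print(predictability_horizon(0.1, 0.25))  # -> 4.0
```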

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nikseresht, A., Amindavar, H. Hourly solar irradiance forecasting based on statistical methods and a stochastic modeling approach for residual error compensation. Stoch Environ Res Risk Assess 37, 4857–4892 (2023). https://doi.org/10.1007/s00477-023-02539-5
