Designing a long short-term memory network for short-term forecasting of global horizontal irradiance

Long short-term memory (LSTM) models, based on a specialized deep neural network architecture, have emerged as important models for forecasting time-series. However, the literature does not provide clear guidelines for design choices that affect forecasting performance. Such choices include the need for pre-processing techniques such as deseasonalization, the ordering of the input data, the network size, the batch size, and the forecasting horizon. We investigate these in the context of short-term forecasting of global horizontal irradiance, an accepted proxy for solar energy. In particular, short-term forecasting is critical because cloud conditions change at a sub-hourly scale and have large impacts on incident solar radiation. We conduct an empirical investigation based on data from three solar stations from two climatic zones of India over two seasons. From an application perspective, it may be noted that despite the thrust given to solar energy generation in India, the literature contains few instances of robust studies across climatic zones and seasons. The model thus obtained outperformed three recent benchmark methods based on random forest, recurrent neural network, and LSTM, respectively, in terms of forecasting accuracy. Our findings underscore the importance of considering the temporal order of the data, the lack of any discernible benefit from data pre-processing, and the effect of making the LSTM model stateful. It is also found that the number of nodes in an LSTM network, as well as the batch size, is influenced by the variability of the input data.


Introduction
Solar radiation is one of the most important components of alternative sources of energy [32,37]. Accurate prediction of solar radiation is essential for several tasks like planning power generation, matching peak demand, estimating surpluses, or even making purchases [12]. Generation of solar power has significant variability because of its strong dependence on atmospheric conditions [31,33,47]. In the context of India, energy demand has been continuously on the rise because of the rapid development and expansion of urban areas. India is among the top five countries in terms of solar energy potential, with the availability of sufficient solar hot-spots. Hence, research on solar energy is quite critical for India [40].
Most solar energy forecasting has been done using Numerical Weather Prediction (NWP) models [4], also referred to as physical models in the literature. Statistical models like Auto-Regressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH), as well as machine learning models, have also been widely applied. However, there is general disagreement in the literature on design choices for an LSTM, such as whether to preserve the temporal order of the data and the need for pre-processing. Apart from these two design issues, a few other factors, such as the batch size, the prediction horizon, and adjustments for inherent input data variability, can also impact model performance. In this paper, we consider the design questions listed below.
• Whether LSTM benefits from pre-processing steps such as seasonality removal.
• Whether to set up the problem as a Supervised or Non-Supervised problem and, in the latter case, whether it is necessary to consider dependency among batches.
• How does the forecasting performance of an LSTM change with an increase in the prediction horizon, especially in the context of the season?
• What is the effect of variability in input data on model complexity?
In this paper, we have attempted to investigate the above-listed questions systematically in the context of short-term intra-day forecasting of GHI using LSTM. The major contributions of this paper are listed as follows.
• An empirical study has been conducted for three solar stations, two seasons, and two climatic zones in India.
It may be noted that such a study is quite rare for India, despite its rich solar energy potential.
• The design questions listed above have been empirically evaluated, leading to important recommendations: consider the temporal order of the data (Non-Supervised setup), use no pre-processing, and preserve the dependency between batches.
• It has been established that the forecasting performance depends on the batch size and the variability of the input data.
• It has also been demonstrated that the number of nodes required by the LSTM network increases with the variability of the input data.
• The model obtained using these recommendations produces superior forecasting performance compared to benchmark models based on RF, RNN, and LSTM, respectively.
The rest of the paper is organized as follows. In Sect. 2, we have performed a detailed literature review of machine learning and deep learning for solar energy forecasting. The research efforts have been also categorized in terms of the length of the forecasting horizon. In Sect. 3, we have provided a brief outline of the LSTM architecture. In Sect. 4, we have discussed the materials and methods employed in setting up the experiment. In Sect. 5, the results of the forecasting models are presented along with a critical analysis. The paper concludes with a discussion in Sect. 6.

Related work
In this section, a brief overview of current research on the prediction of GHI is presented. It may be noted that the approach of building physical models for estimating GHI using classical equations [27] is outside the scope of this review. There is also a conscious effort to include studies conducted in India. In Fig. 1, the research papers have been categorized in terms of the type of forecasting model, the length of the forecast horizon, and the number of input variables of the model.

Statistical and machine learning models
Yang et al. [51] have analyzed three approaches for one hour ahead solar irradiation forecasting based on the exponential smoothing technique (ETS) applied to cloud cover. Kashyap et al. [21] have proposed an ANN-based model to forecast GHI over a one-hour horizon. Feng et al. [11] have developed a one hour ahead GHI forecasting model using an SVM classifier, with 9.75% nRMSE. Reikard et al. [42] have used their forecasting model for several horizons ranging from 15 minutes to three hours. They have reported that for a 15-minute horizon, the persistence model and regression model outperformed the frequency domain model. For a 45-minute horizon, the performance of the above three models was close. At the one-hour horizon, ARIMA achieved better accuracy when applied to the Clear Sky Index. For a two-hour horizon, the frequency domain model performed better than the others. Finally, for a three-hour horizon, the performances of the frequency-domain approach and ARIMA were similar. Alfadda et al. [2] have shown that a multi-layer perceptron (MLP) works better compared to SVR, k-nearest neighbors (kNN), and decision tree regression for one hour ahead irradiation forecasting. Fouilloy et al. [13] have proposed bagged regression tree and RF-based models to predict hourly GHI up to 6 hours ahead. Perveen et al. [35] have proposed an adaptive neural fuzzy inference system (ANFIS)-based multivariate solar power forecasting model for different sky conditions in India. Benali et al. [5] demonstrated that RF performs best for predicting GHI, beam normal irradiation (BNI), and diffuse horizontal irradiation (DHI) six hours ahead. Perveen et al. [34] have designed an ANFIS-based multivariate short-term solar power forecasting model for complex climatic conditions in India. Rana et al. [39] have designed a five minutes to three hours ahead univariate solar photovoltaic power forecasting model based on a unique re-sampling technique and have combined the predictions of multiple RF models for individual steps to design a single robust multi-step-ahead prediction model.

Deep learning-based models
Ahmad et al. [3] showed the efficacy of deep recurrent neural network-based models over other benchmark models when applied to solar energy data in Canada.
Qing et al. [38] have achieved better results using LSTM compared to neural networks trained with backpropagation. Caballero et al. [7] have designed an LSTM-based forecasting model to forecast solar irradiation over a window of five minutes. Mukherjee et al. [28] have proposed an LSTM-based multivariate solar forecasting model for Kharagpur, India. Caldas et al. [8] have designed a hybrid forecasting model that combines solar energy data and sky images to predict one to ten minutes ahead. Nikitidou et al. [29] have designed a 15-240 min ahead model for forecasting cloudiness. Ryu et al. [43] have reported a model to forecast 5 to 20 min ahead using a convolutional neural network (CNN)-based model on total sky images and lagged values of GHI. Abdel et al. [1] have proposed a univariate photovoltaic power forecasting model for hourly data based on an LSTM-RNN, experimenting with five different model architectures. Li et al. [26] have reported that an RNN-based solar power forecasting model outperformed the persistence method, backpropagation neural network (BPNN), radial basis function (RBF) neural network, SVM, and LSTM. Huang et al. [18] have proposed an hourly LSTM-MLP-based GHI forecasting model. Kumari et al. [23] have designed an hourly GHI forecasting model using an ensemble approach, in which extreme gradient boosting forest (XGBF) and deep neural networks (DNN) are used as the base learners and ridge regression combines their predictions.
It can be observed that the literature for India is limited even though the country has rich solar potential. Most available papers [8,18,26,28,29,43] consider prediction at a coarse time resolution or are limited to one geographical region or a particular solar power plant.
We have compared the performance of our proposed method with three recent methods for solar power forecasting: the RF of [39], the RNN of [26], and the LSTM developed in [1].
• In paper [39], the authors have used the same algorithm repetitively for multi-step prediction and have tuned model hyperparameters such as the number of trees and splits using Grid Search with 10-fold cross-validation.
• In paper [1], the authors have used a specific LSTM architecture for univariate solar power forecasting.
• In paper [26], the authors have reported an RNN for inter- and intra-day prediction.

Deep learning sequence model and LSTM
The feed-forward neural network (FNN) is the most common type of deep learning architecture and has demonstrated remarkable performance across application domains compared to traditional machine learning approaches. However, one of the limitations of FNNs is their inability to handle sequence data like text, video, and time-series. The RNN, with a memory component where the current output is a function of the current input as well as the previous step, can handle this issue. Though the RNN achieved reasonable success, one weakness later exposed was its inability to remember long-range dependencies because of the vanishing gradient problem [16]. The LSTM was proposed by Hochreiter and Schmidhuber [17] and can address vanishing and exploding gradients [16,36]. The LSTM is specially designed to memorize long-term temporal dependencies through memory cells containing several types of gates. Apart from that, the LSTM can learn nonlinearity. The detailed architecture of a specific LSTM memory cell is shown in the schematic diagram in Fig. 2. The mathematical equations associated with the different gates of the LSTM cell are discussed along with the description of the gates.
Suppose at time t the current input is x_t and the previous hidden state is h_{t−1}; then the current hidden state h_t and the current cell state c_t are computed as follows. Depending on the current input x_t and the previous hidden layer output h_{t−1}, the forget gate produces, through a sigmoid layer, a value between 0 and 1 for each component of the cell state:

f_t = σ(w_f · [h_{t−1}, x_t] + b_f)

A value close to 1 retains the memory information, while a value close to 0 discards it.

The input gate helps to decide on the new information to be added to the current cell state, based on the new candidate values ĉ_t:

i_t = σ(w_i · [h_{t−1}, x_t] + b_i)
ĉ_t = tanh(w_c · [h_{t−1}, x_t] + b_c)

The new cell state c_t depends on the previous cell state c_{t−1}: c_{t−1} * f_t is the fraction of the old cell state that is retained by the forget gate, while new information is added through ĉ_t * i_t. The sum of these two simultaneous updates is the current cell state:

c_t = f_t * c_{t−1} + i_t * ĉ_t

Finally, the result of the output gate is multiplied with the cell state passed through tanh to compute the value of the current hidden state:

o_t = σ(w_o · [h_{t−1}, x_t] + b_o)
h_t = o_t * tanh(c_t)

Here, w_f, w_i, w_c, and w_o are weight matrices, and b_f, b_i, b_c, and b_o are the biases of the individual gates. σ indicates the sigmoid activation function, * stands for element-wise multiplication, and + implies element-wise addition.
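The gate computations described above can be sketched as a single forward step in NumPy. This is an illustrative sketch only: the function name, the weight shapes (each matrix acting on the concatenation [h_{t−1}, x_t]), and the tiny random example are our own assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM memory cell, following the gate
    equations above. Each weight matrix acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["w_f"] @ z + params["b_f"])    # forget gate
    i_t = sigmoid(params["w_i"] @ z + params["b_i"])    # input gate
    c_hat = np.tanh(params["w_c"] @ z + params["b_c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_hat                    # cell state update
    o_t = sigmoid(params["w_o"] @ z + params["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                            # hidden state
    return h_t, c_t

# Tiny example: 2 input features, 3 hidden units (illustrative sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 2, 3
params = {f"w_{g}": 0.1 * rng.normal(size=(n_hid, n_hid + n_in)) for g in "fico"}
params.update({f"b_{g}": np.zeros(n_hid) for g in "fico"})
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_cell(rng.normal(size=n_in), h, c, params)
```

Because h_t is the product of a sigmoid output and a tanh, each of its components is bounded in (−1, 1), which is why the GHI values are normalized before being fed to the network.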
• The LSTM model is trained by selecting a continuous portion or window from the input data. Instead of taking all such windows for training at once, the data are often broken into batches.
• If the batches are considered independent of each other, such a model is called a stateless model, while if batch-to-batch dependency is taken into account, it is called a stateful model.
• Typically, when dealing with sequence data, the hidden layer nodes are the LSTM cells, as shown in Fig. 3.

Materials and methods
This section has five subsections. In the first subsection, the source of the data, the extraction process, the time-period, etc., are outlined. In the second subsection, the pre-processing steps are described briefly; these are needed to understand how the design issue of pre-processing is investigated in this paper. In the third subsection, the Supervised and Non-Supervised formulations of the forecasting problem are described. The fourth subsection details the LSTM architectures, and the fifth subsection describes the metrics used to evaluate the forecasting models.

Data collection
In 2011, the Indian Ministry of New and Renewable Energy (MNRE) initiated extensive solar and meteorological monitoring under the Solar Radiation Resource Assessment (SRRA) project [22]. The Indian Meteorological Department (IMD) divides the Indian climate into four seasons, namely Summer, Monsoon (rainy), Post-Monsoon, and Winter.
We have used the application programming interface (API) provided by the Centre for Wind Energy Technology (C-WET) to crawl raw solar irradiation data for SRRA stations across India. In this paper, data for 2016 were used for two climatic zones (Hot and Dry, and Hot and Humid) and three stations located at Chennai (Tamil Nadu), Howrah (West Bengal), and Ajmer (Rajasthan). Table 1 describes the details of the solar stations, date ranges, number of data elements, etc. For each of the stations, we have chosen one month each from the rainy and winter seasons. Typically, the rainy season is known for high variability in GHI compared to winter.
In Fig. 4, the distribution and variability of GHI are illustrated for each station-season combination. The plot shows that the variability of GHI is higher for Howrah and Ajmer in the rainy season, whereas it is relatively lower for the other cases. The box-plots also confirm the absence of any outliers. The data for Howrah in the rainy season show the maximum skew compared to the other stations.

Data pre-processing
In the pre-processing, firstly, the night hours are removed [19].
• As per standard practice for short-term solar forecasting, the resolution of GHI is converted from one minute to 5 min [39,41].
• For each day, we have retained the GHI between 7 AM and 7 PM. After removing the night hours, we have combined all days in a month to construct a single time-series.
• The GHI values have been normalized to lie in [0, 1] using Eq. 1.
In Eq. 1, GHI_t is the GHI at time-step t, GHI_min is the minimum value of the population, GHI_max is the maximum value of the population, and ĜHI_t is the normalized value of GHI at time-step t.
In some papers [44,52], the authors have removed the non-stationary part of the series before fitting the data into a deep learning model like LSTM. Our data display daily seasonality. Hence, we have deseasonalized it using the following algorithm.
ĜHI_t = (GHI_t − GHI_min) / (GHI_max − GHI_min)     (1)

• Thus, from the ith observation, we have subtracted the (i − 144)th observation to remove day-wise seasonality (there are 144 five-minute intervals between 7 AM and 7 PM) and then appended the differences sequentially. Here, GHI_s is the final series.
• The raw time-series and the pre-processed time-series are used as alternative inputs to the LSTM network and compared.
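The two pre-processing steps can be sketched as follows. The function name and the `deseasonalize` flag are our own; `steps_per_day=144` reflects the 5-min resolution between 7 AM and 7 PM stated above.

```python
import numpy as np

def preprocess(ghi, steps_per_day=144, deseasonalize=False):
    """Min-max normalize GHI to [0, 1] (Eq. 1) and optionally remove
    day-wise seasonality by differencing against the same time slot of
    the previous day."""
    ghi = np.asarray(ghi, dtype=float)
    ghi_norm = (ghi - ghi.min()) / (ghi.max() - ghi.min())
    if not deseasonalize:
        return ghi_norm
    # GHI_s(i) = GHI(i) - GHI(i - 144); the first day has no predecessor
    return ghi_norm[steps_per_day:] - ghi_norm[:-steps_per_day]
```

Note that differencing shortens the series by one day, since the first 144 observations have no same-slot predecessor to subtract.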

Supervised or non-supervised learning
In this section, we have outlined the experimental setup needed to investigate the design issue of whether to preserve the temporal dependency (Non-Supervised) or not (Supervised). For LSTM, the preparation of the data differs from that for traditional machine learning algorithms. The data should be formatted as a three-dimensional array, where the three dimensions are the batch size, the number of time-steps (window size), and the number of input features. In Fig. 5, the array is presented pictorially. The input features are denoted as Feature_1, Feature_2, Feature_3, ..., Feature_n, and the time-steps as T_1, T_2, T_3, ..., T_m.
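A minimal sketch of how a univariate series is cast into the (samples, time-steps, features) shape described above. The helper name and the one-step-ahead target convention are illustrative assumptions, not the paper's exact data pipeline.

```python
import numpy as np

def make_windows(series, window_size):
    """Slice a univariate series into overlapping input windows, in
    temporal order, with one-step-ahead targets. The result has shape
    (samples, time-steps, features) as an LSTM layer expects."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])  # window of past values
        y.append(series[i + window_size])    # next value to predict
    X = np.array(X)[..., np.newaxis]         # add the single-feature axis
    return X, np.array(y)

X, y = make_windows(np.arange(10.0), window_size=4)
```

Keeping the windows in their original order (rather than shuffling them, as a Supervised setup would allow) is what preserves the temporal dependency the Non-Supervised setup relies on.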

LSTM architectures
In this subsection, the details of LSTM networks are discussed.
• We have used a Sequential model from the Keras library [15] to design four sequentially arranged layers: an input layer, two LSTM hidden layers, and one output layer. The details are given in Table 2.
• We have used Adam as the optimizer with the learning rate set to 0.01. Hyper-parameters like the number of epochs, number of layers, batch size, learning rate, and the number of nodes in each hidden layer have been optimized using the Random Search approach with 5-fold cross-validation, with three repetitions of the process.
• Tanh activation has been used for each hidden layer.
• At the time of prediction, we have considered different batch sizes, namely 1, 9, 18, 36, and 72, to find the optimal choice.
• The stateful parameter is set to True and False alternatively to investigate the effect of preserving dependency between batches.
The stateful LSTM has been referred to as DSS-LSTM and stateless LSTM as DSSL-LSTM for subsequent discussions.
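A minimal Keras sketch of the stateful two-hidden-layer architecture described above. The window size, batch size, and node counts below are illustrative assumptions, not the tuned values from Table 2.

```python
from tensorflow import keras
from tensorflow.keras import layers

batch_size, window_size, n_features = 36, 20, 1  # illustrative values

model = keras.Sequential([
    # Stateful layers need a fixed batch shape so the cell state can be
    # carried over from one batch to the next.
    keras.Input(batch_shape=(batch_size, window_size, n_features)),
    layers.LSTM(50, activation="tanh", stateful=True, return_sequences=True),
    layers.LSTM(50, activation="tanh", stateful=True),
    layers.Dense(1),  # one-step-ahead GHI output
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mse")
```

With stateful=True, the carried-over cell state must be reset explicitly between independent passes over the data; with stateful=False (the stateless DSSL-LSTM), every batch instead starts from a zero state.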

Evaluation of forecasting model
We have used three evaluation metrics, namely the root mean square error (RMSE), the normalized root mean square error (nRMSE), and the Explained Variance Score. The following equations are used for calculating these metrics.

RMSE = √( (1/N) Σ_{t=1}^{N} (GHI_t − ĜHI_t)² )     (2)

In Eq. 2, GHI_t is the tth actual value and ĜHI_t is the corresponding predicted value. nRMSE can be a good measure of forecasting error when we want to forecast for multiple data-sets. It is defined as follows.

nRMSE = (RMSE / σ) × 100%     (3)

σ = √( (1/N) Σ_{t=1}^{N} (GHI_t − GHI_mean)² )     (4)

In Eqs. 3 and 4, σ is the standard deviation of the actual values of GHI, and GHI_mean is their mean. The Explained Variance Score is given by the equation below.

Explained Variance Score = 1 − Var(GHI − ĜHI) / Var(GHI)     (5)
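The three metrics can be computed directly in NumPy. This is an illustrative sketch following the standard definitions, with nRMSE normalized by the standard deviation of the actual values as stated above; the function names are our own.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted GHI."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def nrmse(actual, predicted):
    """RMSE normalized by the standard deviation of the actual values,
    reported as a percentage, so scores are comparable across data-sets."""
    return 100.0 * rmse(actual, predicted) / np.std(actual)

def explained_variance(actual, predicted):
    """Fraction of the variance of the actual series that the
    predictions explain; 1.0 is a perfect score."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return 1.0 - np.var(actual - predicted) / np.var(actual)
```

Note that the Explained Variance Score only penalizes errors beyond a constant offset, which is why both nRMSE and the Explained Variance Score are reported together.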

Results
This section has six subsections. In the first subsection, the performance of DSS-LSTM is evaluated on the raw and pre-processed time-series. In the second subsection, we have investigated whether to set up the time-series prediction problem as Supervised or Non-Supervised. In the third subsection, the effect of batch size on the forecasting performance of DSS-LSTM is examined. In the fourth subsection, the performance of DSS-LSTM is analyzed for different choices of prediction horizon. In the fifth subsection, the forecasting performance of DSS-LSTM is analyzed against the station-season specific variability of GHI. Finally, in the sixth subsection, the overall forecasting performance of DSS-LSTM is compared with the benchmark models.

Effect of pre-processing

Table 3 gives the overall forecasting accuracy in terms of nRMSE and the Explained Variance Score. It is observed that DSS-LSTM has a better nRMSE score when dealing with the raw time-series. With raw data and under all climatic conditions, DSS-LSTM better captures the data variability. For the data-sets corresponding to Howrah-Winter, Chennai-Rainy, and Chennai-Winter, the model is capable of explaining 20-24% more variability in terms of the Explained Variance Score compared to when the data are pre-processed.

Supervised or non-supervised?
Here, we have presented the comparison between SVR (Supervised), stateless LSTM (Non-Supervised within a batch), and stateful LSTM (Non-Supervised across batches). Figure 6 compares the RMSE scores for 20 steps ahead prediction of GHI. We have observed that:
• For all climatic zones, the performance of DSS-LSTM is more stable.
• SVR produced notably higher RMSE scores.
• In the rainy season, when the variability of GHI is high, DSS-LSTM has outperformed the other models.
• For both climatic zones, the stateful LSTM outperformed the stateless LSTM.
In Table 4, we have observed that, for all climatic zones, the Non-Supervised approach outperformed the Supervised approach. Figure 7 shows the forecasted GHI for the test set. For all climatic conditions, DSS-LSTM has outperformed the other models. In all cases, SVR has produced the worst predictions.

Effect of batch size
The test set has been split using alternative batch sizes of 1, 9, 18, 36, and 72. Table 5 compares the corresponding nRMSEs. Compared to batch sizes of 9, 18, and 36, a batch size of 72 has produced approximately 28.64%, 25.50%, and 24.47% better nRMSE. As illustrated in Fig. 8:
• For Chennai-Winter, Howrah-Winter, Ajmer-Winter, and Chennai-Rainy, which have lower variability of GHI, the nRMSE decreases as we increase the batch size, and we get the best nRMSE for a batch size of 72.
• However, in the case of Howrah-Rainy and Ajmer-Rainy, which have higher variability in GHI, the nRMSE decreases as we increase the batch size but saturates at a batch size of 36.
Hence, the above discussion suggests that in the case of solar forecasting, for stations with high variability of GHI, smaller batch size is recommended for LSTM. However, for stations with lower variability of GHI, a bigger batch size will give better forecasting performance.

Input variability vs Network complexity
Here, the complexity of the DSS-LSTM models, measured in terms of the number of hidden layer nodes, is analyzed in the context of variability in GHI. Out of the six input conditions, Howrah-Rainy and Ajmer-Rainy exhibit the maximum variability in GHI. To perform this analysis, we have increased the number of hidden layer nodes from 25 to 150, with a step size of 25.
As illustrated in Table 7 and Fig. 9:
• The cases of Chennai-Winter, Howrah-Winter, Ajmer-Winter, and Chennai-Rainy, having lower input variability, need fifty nodes for optimal performance measured in terms of nRMSE.
• However, the cases of Howrah-Rainy and Ajmer-Rainy, having higher variability in GHI, need one hundred nodes for optimal performance measured in terms of nRMSE.
This supports the existing knowledge that higher variability in solar data needs more model parameters or nodes to achieve an adequate forecasting performance.

Comparison to other prediction approaches
In Table 8, the overall prediction performance of DSS-LSTM has been compared to that of the methods suggested by Rana et al. [39], Abdel et al. [1], and Li et al. [26]. It has been observed that DSS-LSTM has produced a lower nRMSE score for all of the station-season combinations. For all data-sets, the methods of Abdel et al. [1], Li et al. [26], and Rana et al. [39] produced higher nRMSE compared to DSS-LSTM. DSS-LSTM has also achieved the lowest mean rank. In paper [13], it has been observed that for forecasting solar irradiation 1-6 h ahead at locations with less variability of solar irradiation, ARIMA and MLP performed better, with nRMSE scores varying from 18.35% to 33.69% and from 18.26% to 33.84%, respectively. On the other hand, at locations with high variability of solar irradiation, the Bagged Regression Tree and RF performed better, with nRMSE scores varying from 28.80% to 47.52% and from 28.76% to 48.34%, respectively. In paper [11], the authors have reported an overall nRMSE of 9.75%. In our work, DSS-LSTM has achieved an nRMSE of 2.25%. Therefore, the results show that DSS-LSTM produces better or very competitive results compared to papers [11,13], with a substantially lower value of nRMSE.

Conclusion
A stable short-term forecasting model for solar energy generation is critical because of the large variance due to sub-hourly cloud phenomena. The proposed LSTM network model is designed to be part of a grid integration software platform that produces reliable 25 and 30 steps ahead forecasts for grid operators and other stakeholders to use in the energy management system. In our current work, we have performed an empirical investigation based on data from three solar stations from two climatic zones of India over two seasons for intra-hour short-term solar forecasting using the LSTM network. Some of our key recommendations for a better LSTM design are as follows:
• Pre-processing: Using raw data for solar forecasting, the LSTM has been able to capture an average of 99% of the variability in terms of the Explained Variance Score. In comparison, the average variability explained by the LSTM applied to pre-processed data is 88%. Thus, we do not need to pre-process the data to remove seasonality.
• Supervised or Non-Supervised: The LSTM has performed better when we have preserved the order of the input data. Further, stateful LSTMs have produced better performance compared to stateless LSTMs.
• Batch size: It has been observed that the nRMSE decreases as we increase the batch size for stations with low variability in GHI, whereas, for the two stations where the variability is high, the nRMSE decreases and then saturates at a batch size of 36.
• Effect of prediction horizon: For the winter data, 25 and 30 steps ahead prediction leads to an nRMSE increase of 8.40% and 13.55%, respectively, compared to 20 steps ahead prediction for DSS-LSTM. For the rainy season, the nRMSE of DSS-LSTM has correspondingly increased by 25.03% and 50.21%.
• Input data variability and model complexity: It has been observed that input data variability and model complexity are associated. Howrah-Rainy and Ajmer-Rainy need twice the number of nodes compared with the other four station-season combinations because of their higher variability in GHI.
• Comparison to existing methods: DSS-LSTM has outperformed the methods of Rana et al. [39], Abdel et al. [1], and Li et al. [26] by 52.20%, 15.83%, and 36.09%, respectively, as measured by nRMSE. The model is also better in terms of mean rank.
For identifying a better design of LSTM networks, this work can be extended by including more input variables, including solar stations from other climatic zones.