1 Introduction

The future structure of modern power systems will depend heavily on renewable energy systems, which have become an essential part of the new power mix because of their low cost in comparison with conventional types of power generation [1]. However, renewable energy is intermittent, random and unpredictable, which affects the stability of the power system, degrades the control performance of the grid and has an economic impact. As a result, new power grid configurations are emerging that aim to optimize grid size and make use of the new tools available in smart grids.

Hybrid grid configurations are starting to replace large power grids; these hybrid grids include different types of energy generation and storage units to minimize the impact of the drawbacks of renewable energy resources, and they use new energy management techniques to optimize grid performance and maximize its economic benefit [2].

Accurate forecasting of PV power generation has become essential for solving the planning and operation issues of PV units, which can affect the stability and optimization of the entire power system as well as the economic aspects of system operation [3].

Different techniques are used for accurate PV power generation forecasting, ranging from physical modeling to statistical methods and artificial intelligence methods; the latter range from simple artificial neural networks (ANN) to more advanced networks such as LSTM and GRU and to more complicated hybrid networks and configurations. In [4], an ANN is used to predict the monthly average daily total solar irradiation. Adaptive feed-forward back-propagation ANN (AFFNN) models with different parameter configurations were introduced for short-term forecasting of PV production, and the proposed AFFNN shows better results than the classical ANN [5]. A combination of fuzzy logic tools with an RNN is used for long-term forecasting of PV generation in [6]. In [7], an LSTM-RNN model is used for PV generation forecasting and gives the best results compared with three other methods: multiple linear regression (MLR), bagged regression trees (BRT) and a basic NN. In [8], a differently structured LSTM network model is used for hour-ahead solar power forecasting and achieves good results with reduced computation time. A hybrid model based on a modified CNN and Bi-GRU, introduced in [9], is used for short-term PV power generation prediction with good prediction performance. A combined CNN-LSTM model is proposed for solar energy production forecasting and improves performance in comparison with traditional models [10]. An LSTM autoencoder model for day-ahead forecasting of PV power generation is introduced in [11], and an autoencoder based on a GRU network is used for short-term PV power generation forecasting in [12]. A convolutional autoencoder combined with a generative adversarial network is proposed for day-ahead PV power forecasting in [13].

In this paper, a new hybrid prediction model integrating a CNN and an LSTM autoencoder network is suggested for short-term PV power generation forecasting at different time horizons: 0.5 h, 1 h and 2 h. CNNs handle matrix-formatted input data of different dimensions, which is why they are widely used in computer vision applications. In the proposed model, the CNN layer extracts features describing the internal relationships between adjacent measured values within the same time sequence, making use of the powerful feature extraction capabilities of CNNs. LSTM networks, in turn, are highly capable of capturing the temporal relationships in sequential data, including stationary and non-stationary dynamics, by extracting date-time features together with the past history of the data through their recurrent architecture and memory units; this is why they are widely used in time series analysis. Combining the CNN and the LSTM with an autoencoder block yields better results, as shown and explained in the results section.

Besides, weather data (temperature and solar radiation) are taken into consideration for accurate PV power forecasting. Rather than taking the PV power data of the preceding days as a proxy for changes in weather conditions, as state-of-the-art techniques in the literature do, the actual values of PV power generation along with temperature and solar radiation data are given directly as input variables to train the proposed model. The performance of the proposed hybrid model is evaluated and compared with that of CNN-, LSTM- and GRU-based models using different metrics. All models are developed and simulated in Python using the TensorFlow and Keras libraries.

The contributions of this work can be summarized as follows:

  • The proposed hybrid model exploits both the spatial and the temporal features in time series data, in contrast to the competing single models.

  • The architecture of the proposed model permits the incorporation of weather data together with the PV power generation data, which contributes to accurate prediction results.

  • The proposed model reduces the errors between the predicted and the actual time sequence data in different metrics, and it also reduces the time elapsed in both the training and test phases.

The rest of this paper is organized as follows: Section 2 presents background on the deep networks and their structures that are utilized in our proposal. Section 3 explains the proposed model for PV power generation forecasting, its construction, the dataset and the performance evaluation metrics. Section 4 presents the simulation results. The conclusion and future work are given in Section 5.

2 The proposed networks for PV power forecasting based on deep learning

Different deep learning models are utilized in this study for PV power generation forecasting: LSTM, CNN, GRU and LSTM autoencoder networks, as well as models that combine a CNN with an LSTM and a CNN with a GRU. These networks are detailed in the following sections.

2.1 LSTM network

The LSTM network is an advanced version of the conventional recurrent neural network (RNN) that solves the long-range dependency issue of RNNs. Since it was first introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [14], the LSTM network has been widely used in different applications. One is text recognition, as in [15], where an LSTM network with up to ten hidden layers is used for handwriting recognition and achieves a significant performance improvement. Natural language processing (NLP) is another application, as in [16], where an improved NLP method based on an LSTM structure is presented. Time series forecasting is a further area of use: in [17], an RNN-LSTM model is used to forecast future values of the GOOGL and NKE assets, and in [18] an LSTM is employed as a predictor model for microclimate data inside a greenhouse. LSTM networks have also been used for PV generation forecasting, for example in [8] and [19].

The basic LSTM network structure consists of connected memory blocks, where each block contains different gates that determine the state of the memory block and its output; this architecture mitigates the vanishing and exploding gradient problems faced by RNNs, so that long-term information can be captured. The LSTM structure is shown in Fig. 1.

Fig. 1 LSTM block

The LSTM block includes the following gates and memory cells:

  • Input gate (\({i}_{k}\)): controls which input values are used to update the memory cell.

  • Forget gate (\({f}_{k}\)): determines which information and internal states are reset or discarded from the block.

  • Memory cell (\({C}_{k}\)): the main component of the LSTM block; its state is updated over time depending on the previous state (\({C}_{k-1}\)), which allows the gradient to pass across multiple time steps.

  • Output gate (\({O}_{k}\)): generates the output based on the input values, the previous output values and the state of the memory cell.

The activation function of each gate is the sigmoid function \(\sigma \), while the candidate cell state and the hidden output use the hyperbolic tangent function \({\text{tanh}}\).

The mathematical formulas describing the gate outputs in the LSTM model are as follows:

$${i}_{k}=\sigma \left({w}_{xi}{x}_{k}+{w}_{hi}{h}_{k-1}+{b}_{i}\right)$$
(1)
$${f}_{k}=\sigma \left({w}_{xf}{x}_{k}+{w}_{hf}{h}_{k-1}+{b}_{f}\right)$$
(2)
$${\widetilde{C}}_{k}={\text{tanh}}\left({w}_{xc}{x}_{k}+{w}_{hc}{h}_{k-1}+{b}_{c}\right)$$
(3)
$${C}_{k}={f}_{k}\cdot {C}_{k-1}+{i}_{k}\cdot {\widetilde{C}}_{k}$$
(4)
$${O}_{k}=\sigma \left({w}_{xo}{x}_{k}+{w}_{ho}{h}_{k-1}+{b}_{o}\right)$$
(5)
$${h}_{k}={O}_{k}\cdot {\text{tanh}}\left({C}_{k}\right)$$
(6)
$${y}_{k}=\sigma \left({w}_{y}{h}_{k}+{b}_{y}\right)$$
(7)
$$\sigma \left(x\right)=\frac{1}{1+{e}^{-x}}$$
(8)
$$\tanh \left(x\right)=\frac{{e}^{x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}$$
(9)

where \({x}_{k}\) is the input, \({h}_{k}\) is the hidden layer output, \({y}_{k}\) is the network output, the \(w\) terms are the weights and the \(b\) terms are the biases of the input gate, forget gate, output gate and cell.
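To make the data flow through the gates concrete, the following is a minimal sketch of a single LSTM time step following Eqs. (1) to (6), written for scalar inputs and states. The function name `lstm_step` and the parameter dictionary keys are illustrative assumptions rather than part of the original model, and a practical implementation would use weight matrices (for example through Keras) instead of scalars.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, Eq. (8)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_k, h_prev, c_prev, p):
    """One scalar LSTM step following Eqs. (1)-(6).

    `p` is a hypothetical dict holding the weights w_* and biases b_*.
    Returns the new hidden output h_k and the new cell state C_k.
    """
    i_k = sigmoid(p["w_xi"] * x_k + p["w_hi"] * h_prev + p["b_i"])        # Eq. (1)
    f_k = sigmoid(p["w_xf"] * x_k + p["w_hf"] * h_prev + p["b_f"])        # Eq. (2)
    c_tilde = math.tanh(p["w_xc"] * x_k + p["w_hc"] * h_prev + p["b_c"])  # Eq. (3)
    c_k = f_k * c_prev + i_k * c_tilde                                    # Eq. (4)
    o_k = sigmoid(p["w_xo"] * x_k + p["w_ho"] * h_prev + p["b_o"])        # Eq. (5)
    h_k = o_k * math.tanh(c_k)                                            # Eq. (6)
    return h_k, c_k


# Illustrative parameters only: with all weights and biases at zero,
# every gate evaluates to 0.5 and the candidate state to 0.
zero_params = {name: 0.0 for name in (
    "w_xi", "w_hi", "b_i", "w_xf", "w_hf", "b_f",
    "w_xc", "w_hc", "b_c", "w_xo", "w_ho", "b_o")}
h_k, c_k = lstm_step(x_k=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)
```

With zero parameters the forget gate halves the previous cell state, which illustrates how Eq. (4) blends the old memory with the new candidate.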

2.2 GRU network

The gated recurrent unit (GRU) network was first introduced by Cho et al. [20]. Its architecture is similar to that of the LSTM, and both have input and output structures similar to those of an ordinary RNN. Although the GRU has a simpler structure than the LSTM, its internal structure is more complicated than that of a plain RNN. The GRU has one gate fewer than the LSTM, which reduces the number of matrix multiplications and can therefore save considerable time without a noticeable impact on performance [21, 22]. The structure of the GRU block is shown in Fig. 2.

Fig. 2 GRU block

GRUs mitigate the vanishing gradient problem by using an update gate and a reset gate. The update gate handles the information that flows into memory, and the reset gate controls the information that flows out of memory. Both gates are trained to keep information from the past or to remove information that is irrelevant to the prediction.

Here, the update gate \({z}_{k}\), the reset gate \({r}_{k}\), the cell state \({C}_{k}\) and the new state \({h}_{k}\) are mathematically formulated as follows:

$${z}_{k}=\sigma \left({w}_{z}{h}_{k-1}+{U}_{z}{x}_{k}\right)$$
(10)
$${r}_{k}=\sigma \left({w}_{r}{h}_{k-1}+{U}_{r}{x}_{k}\right)$$
(11)
$${C}_{k}={\text{tanh}}\left({w}_{c}\left({r}_{k}\cdot {h}_{k-1}\right)+{U}_{c}{x}_{k}\right)$$
(12)
$${h}_{k}=\left({z}_{k}{C}_{k}\right)+\left({h}_{k-1}-{z}_{k}{h}_{k-1}\right)$$
(13)

where \({x}_{k}\) is the input, \({h}_{k}\) is the hidden layer output, the \(w\) terms are the weights applied to the previous hidden state and the \(U\) terms are the weights applied to the input.
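As an illustration of Eqs. (10) to (13), the following is a minimal scalar sketch of one GRU step. It assumes the standard GRU reading of Eq. (12), in which the reset gate scales the previous hidden state before the candidate state is computed; the function name `gru_step` and the parameter keys are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used by both gates."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_k, h_prev, p):
    """One scalar GRU step following Eqs. (10)-(13).

    `p` is a hypothetical dict with recurrent weights w_* and input weights u_*.
    """
    z_k = sigmoid(p["w_z"] * h_prev + p["u_z"] * x_k)             # update gate, Eq. (10)
    r_k = sigmoid(p["w_r"] * h_prev + p["u_r"] * x_k)             # reset gate, Eq. (11)
    # Assumption: the reset gate multiplies the previous hidden state (standard GRU).
    c_k = math.tanh(p["w_c"] * (r_k * h_prev) + p["u_c"] * x_k)   # Eq. (12)
    h_k = z_k * c_k + (h_prev - z_k * h_prev)                     # Eq. (13)
    return h_k


# Illustrative parameters only: with zero weights the update gate is 0.5
# and the candidate state is 0, so the new state is half the previous one.
zero_params = {name: 0.0 for name in ("w_z", "u_z", "w_r", "u_r", "w_c", "u_c")}
h_k = gru_step(x_k=1.0, h_prev=0.8, p=zero_params)
```

Eq. (13) is an interpolation between the previous state and the candidate state, controlled by the update gate, which is why the GRU needs no separate memory cell.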

2.3 CNN network

The convolutional neural network (CNN) is another class of deep learning techniques, introduced by Yann LeCun in the late 1980s [23]. CNNs have been successfully applied and have achieved good performance in different domains, such as image processing, recognition and classification [24, 25]. Natural language processing and speech recognition are other areas where CNNs are widely used with great success [26, 27]. CNNs have also gained interest in industrial applications such as autonomous mobile robots and self-driving cars [28], and lately they have been used extensively in computer vision applications [29, 30]. The architecture of a CNN is shown in Fig. 3.

Fig. 3 CNN network architecture

As shown in Fig. 3, the basic CNN architecture consists of several types of layers, such as convolution, pooling and fully connected layers; these layers are repeated and connected in different sizes and forms depending on the application. The different CNN layers are described as follows:

  • Convolution layer: a fundamental component of the CNN, which contains several convolution kernels that generate new feature maps. The convolution operation performs well in local feature extraction, and the kernel weights are shared across all input maps.

  • Pooling layer: usually used to reduce the in-plane dimensionality of the input, which decreases the number of learnable parameters and helps to avoid overfitting. Different pooling operations exist, such as max pooling and average pooling.

  • Fully connected layer: often used for high-level inference; it maps the features processed by the convolution and pooling layers to the output layer.

In addition, both the convolution and pooling layers are equipped with a nonlinear activation function, such as the hyperbolic tangent function (tanh) or the rectified linear unit (ReLU).
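The convolution, activation and pooling operations described above can be sketched for a one-dimensional series as follows. This is a minimal illustration with a hand-picked kernel, not the trained layer of the proposed model; the function names are hypothetical, and the "convolution" is implemented as the sliding dot product (cross-correlation) that deep learning libraries commonly use.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal without padding ('valid' mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# Illustrative input and kernel (assumed values, kernel size 3).
signal = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d_valid(signal, kernel)   # [-3.0, -1.0, 3.0, 1.0]
activated = relu(feature_map)                # [0.0, 0.0, 3.0, 1.0]
pooled = max_pool1d(activated, size=2)       # [0.0, 3.0]
```

The same kernel weights are reused at every position of the window, which is the weight sharing mentioned for the convolution layer, and pooling halves the length of the feature map.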

2.4 LSTM autoencoder network

The LSTM autoencoder is a recent approach to time sequence prediction; both its encoder and its decoder are implemented with LSTM cells, which can learn the temporal dependencies in sequential data. The input sequence is compressed by the encoder into a fixed-length vector (the code), which forms a bottleneck at the midpoint of the model, and from this vector the decoder reconstructs the output sequence, which is taken as the prediction [31,32,33,34]. The architecture of the LSTM autoencoder is shown in Fig. 4.

Fig. 4 Autoencoder model architecture using LSTM

3 Proposed hybrid CNN-LSTM autoencoder model

In this study, a new hybrid model using a CNN and an LSTM autoencoder network is proposed for short-term PV power generation forecasting. The hybrid model combines the advantages of the CNN in feature extraction with the ability of the LSTM autoencoder to learn patterns in data over long sequences: the CNN layers extract the main spatial features of the time series window, and the LSTM layers then learn the temporal trends and long-range dependencies in the time series. To our knowledge, the proposed hybrid model has not previously been used for PV power generation forecasting.

The proposed CNN-LSTM autoencoder model structure consists of the following:

  • Input layer

  • CNN hidden layer with 24 filters and a kernel size of 3

  • Max pooling layer

  • LSTM encoder and decoder layers (three layers)

  • Output layer

  • The total number of trainable parameters is 75,844, as given in Table 1

  • The activation function is ReLU and the optimization method is adaptive moment estimation (Adam), which computes an adaptive learning rate for each parameter using the first and second moments of the gradient. The Adam optimization algorithm requires little memory and performs well on large datasets compared with other methods. The architecture of the proposed hybrid CNN-LSTM autoencoder model is shown in Fig. 5.
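To illustrate how Adam uses the first and second moments of the gradient, the following is a minimal sketch of one update for a single scalar parameter. The function name `adam_step` is hypothetical, and the default hyperparameters (learning rate 0.001, decay rates 0.9 and 0.999) are the commonly used values, assumed here because the settings used in this study are not stated.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m and v are the running first and second moment estimates,
    and t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment (mean of squared gradients)
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Illustrative first step: parameter 1.0 with gradient 2.0 and zero initial moments.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

Because the step is scaled by the ratio of the two moment estimates, the first update has a magnitude close to the learning rate regardless of the size of the gradient, and only two extra values per parameter need to be stored.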

Table 1 CNN-LSTM autoencoder model parameters
Fig. 5 Architecture of the proposed hybrid CNN-LSTM autoencoder model

The following subsections present the dataset used to validate the performance of the proposed hybrid network and the performance evaluation metrics of the proposed hybrid CNN-LSTM autoencoder model for PV power generation forecasting.

3.1 Dataset

The solar PV generation dataset used in this study was collected from a real 5 MW solar farm located in the southern UK. The data resolution is 30 min, and the data cover the period from November 3, 2017, to December 17, 2019, for a total of 37,200 sample points. The PV power generation dataset is preprocessed and then normalized as follows:

$${x}_{{\text{norm}}}=\frac{x-{x}_{{\text{min}}}}{{x}_{{\text{max}}}-{x}_{{\text{min}}}}$$
(14)

where \({x}_{{\text{min}}}\) is the minimum value in the dataset, \({x}_{{\text{max}}}\) is the maximum value and \({x}_{{\text{norm}}}\) is the normalized value of \(x\).
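As a small worked example of Eq. (14), the following sketch scales a list of values to the range [0, 1]. The function name `min_max_normalize` and the sample values are illustrative assumptions; the handling of a constant series is also an assumption, since Eq. (14) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] following Eq. (14)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: map a constant series to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Illustrative PV power readings in arbitrary units (assumed values).
power = [2.0, 4.0, 6.0]
normalized = min_max_normalize(power)   # [0.0, 0.5, 1.0]
```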

As weather is an important component of energy systems, weather data are also used in this study. They include temperature and irradiance data extracted for the same period from several sites surrounding the solar farm using the MERRA-2 reanalysis data; the values were then averaged to obtain the weather conditions at the solar farm.

The proposed model is applied to the PV power generation data both with and without the weather data. A sliding window technique was used with a window of two days, and no classification by season was used. The dataset was divided so that 70% of the data were used for training the models (27,840 sample points) and 30% were used for testing the models (9,360 sample points).
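The sliding window and the chronological split can be sketched as follows. At a resolution of 30 min, a two-day window corresponds to 96 samples, and horizons of 0.5 h, 1 h and 2 h correspond to 1, 2 and 4 steps ahead. The function names, the single-value target and the split by fraction are assumptions made for illustration; the exact windowing used in this study may differ.

```python
def make_windows(series, window=96, horizon=1):
    """Build (input window, target) pairs from a time-ordered series.

    window:  number of past samples per input (96 = two days at 30 min).
    horizon: steps ahead of the last input sample to predict (1, 2 or 4).
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.7):
    """Split without shuffling so that the test data follow the training data in time."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Illustrative toy series with a short window so the result is easy to inspect.
series = list(range(10))
X, y = make_windows(series, window=3, horizon=2)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Keeping the split chronological avoids leaking future samples into the training set, which matters for time series forecasting.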

The scatter plots of power generation data versus both the solar radiation data and temperature data are shown in Figs. 6 and 7, respectively.

Fig. 6 Scatter plot of power generation data versus mean solar radiation

Fig. 7 Scatter plot of power generation data versus mean temperature

3.2 Performance evaluation metrics

The forecasting performance of the suggested hybrid CNN-LSTM autoencoder model and of the other models used in the comparison (LSTM, GRU, CNN-LSTM and CNN-GRU) is evaluated using metrics common in the literature: root-mean-square error (RMSE) and mean absolute error (MAE). RMSE and MAE are calculated as follows:

$${\text{RMSE}}=\sqrt{\frac{\sum_{k=1}^{n}{\left({y}_{k}-{Y}_{k}\right)}^{2}}{n}}$$
(15)
$${\text{MAE}}=\frac{\sum_{k=1}^{n}\left|{y}_{k}-{Y}_{k}\right|}{n}$$
(16)

Here, \({y}_{k}\) is the actual value, \({Y}_{k}\) is the predicted value and \(n\) is the number of points. The time elapsed in training each model was also recorded for the comparative models.
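Eqs. (15) and (16) can be computed directly, as in the following minimal sketch, which takes \({Y}_{k}\) to be the model prediction for point \(k\). The function names and the sample values are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error, Eq. (15)."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, Eq. (16)."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n


# Illustrative values: a single error of 2 among three points.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error_rmse = rmse(actual, predicted)   # sqrt(4 / 3)
error_mae = mae(actual, predicted)     # 2 / 3
```

Because RMSE squares the errors before averaging, it penalizes the single large error more heavily than MAE does, which is why the two metrics are usually reported together.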

4 Simulation results

The proposed hybrid CNN-LSTM autoencoder model for power generation forecasting is evaluated by comparison with LSTM, GRU, CNN-LSTM and CNN-GRU models at different time horizons (half an hour, one hour and two hours ahead) using the RMSE and MAE metrics. The simulation results of PV power generation forecasting using the proposed hybrid network and the other models, on PV generation data with and without the corresponding weather data, are given for the training and test stages in Tables 2–7 and are also shown in Fig. 8. The predicted values versus the actual values are shown in Figs. 10–15.

Table 2 Results for 0.5 h ahead forecasting using PV data only
Table 3 Results for 0.5 h ahead forecasting using PV and weather data
Table 4 Results for 1 h ahead forecasting using PV data only
Table 5 Results for 1 h ahead forecasting using PV and weather data
Table 6 Results for 2 h ahead forecasting using PV data only
Table 7 Results for 2 h ahead forecasting using PV and weather data
Fig. 8 Forecasting error (RMSE and MAE) for all models using PV power generation data: (a) RMSE without weather data, (b) MAE without weather data, (c) RMSE with weather data and (d) MAE with weather data

As given in Tables 2–7, the proposed CNN-LSTM autoencoder model achieves the best performance, with the lowest RMSE and MAE values, at all time horizons, both with and without weather data, compared with the other models. Furthermore, the RMSE and MAE of the proposed CNN-LSTM autoencoder model for 0.5, 1 and 2 h ahead forecasting are better when the weather data are taken into account (Tables 3, 5 and 7) than without weather data (Tables 2, 4 and 6); this indicates that the proposed model can handle larger data with better performance than the other models. The results in these tables also show that both RMSE and MAE increase for all models as the forecasting horizon increases.

Figure 8 shows the RMSE and MAE of the networks for each forecasting horizon. The proposed CNN-LSTM autoencoder gives the lowest error at all forecasting horizons, and the error is lowest for the shortest horizon. Using the weather data in the forecasting process also improves the prediction, as it provides additional input information. The proposed model works well regardless of the dimensions of the input data.

Figure 9 shows the loss curves of the proposed model during the training and test phases versus the number of epochs, with and without weather data, for all forecasting horizons.

Fig. 9 Loss curves over 100 epochs during the training and test phases of the proposed model at the different time horizons: (a) 0.5 h ahead forecasting without weather data, (b) 0.5 h ahead forecasting with weather data, (c) 1 h ahead forecasting without weather data, (d) 1 h ahead forecasting with weather data, (e) 2 h ahead forecasting without weather data and (f) 2 h ahead forecasting with weather data

The loss curve represents the mean square error (MSE) calculated at each epoch, during the training phase in which the parameters of the proposed CNN-LSTM autoencoder model are updated and during the test phase in which its performance is assessed. As shown in Fig. 9, the MSE increases with the forecasting horizon in both phases.

Figures 10–15 show the actual and predicted values for the proposed hybrid CNN-LSTM autoencoder and the comparative methods, with and without weather data, at the different time horizons.

Fig. 10 Actual and predicted PV power generation for 0.5 h ahead forecasting using PV power generation data only: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

Fig. 11 Actual and predicted PV power generation for 0.5 h ahead forecasting using PV power generation and weather data: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

Fig. 12 Actual and predicted PV power generation for 1 h ahead forecasting using PV power generation data only: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

Fig. 13 Actual and predicted PV power generation for 1 h ahead forecasting using PV power generation and weather data: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

Fig. 14 Actual and predicted PV power generation for 2 h ahead forecasting using PV power generation data only: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

Fig. 15 Actual and predicted PV power generation for 2 h ahead forecasting using PV power generation and weather data: (a) LSTM model, (b) CNN-LSTM model, (c) GRU model, (d) CNN-GRU model and (e) CNN-LSTM autoencoder model

The results in Figs. 10–15 show that PV power generation is close to zero at night, while all the prediction models tend to attain satisfactory forecasting accuracy during the rest of the day. The forecast power curves of the proposed hybrid CNN-LSTM autoencoder model are close to the actual PV power generation values at the different horizons, and its forecasting accuracy is higher than that of the other comparative models for 0.5 h, 1 h and 2 h ahead forecasting, both with and without weather data. These results suggest that the proposed method is able to forecast the required horizon under different weather conditions. The prediction for shorter horizons is better than that for longer horizons.

In addition, the proposed hybrid model is compared with a competitive approach from the literature. Table 8 presents the results of this work and those of the recent study [19] for PV power generation forecasting from 0.5 h ahead to 2 h ahead. The results given in [19] are rescaled to allow a fair comparison.

Table 8 Comparison of the proposed hybrid model with the model of G. Li [19]

The results indicate the superiority of the suggested hybrid model over the state-of-the-art method at all time horizons in terms of RMSE and MAE, whose values are lower, as given in Table 8.

5 Conclusion

In this paper, a new hybrid model combining a CNN and an LSTM autoencoder is suggested for forecasting PV power generation at different horizons. Instead of using prior information from adjacent days as a proxy for changes in weather conditions, as in the work introduced in the literature [8, 19], temperature and solar radiation data are given as input factors along with the actual values of PV power generation to train the proposed model for accurate forecasting. The results show that integrating a CNN block with either an LSTM or a GRU, and with the autoencoder blocks, significantly reduces the training time of the models, by almost 70%. The performance of the proposed model improves on that of the other compared models by 5–25%, and the CNN-LSTM autoencoder model provides the best performance. In addition, when weather data are used, the time taken to train the models for PV power generation prediction increases, but the performance of the models is better than when only the PV power generation data are used. Besides, the suggested hybrid CNN-LSTM autoencoder model outperforms the state-of-the-art models in the literature in terms of RMSE and MAE, achieving low values that are almost 40–80% compared with the other models in the literature, depending on the forecasting interval.

In future work, the proposed hybrid model is also recommended for forecasting power consumption, as it does not depend on the intrinsic properties of the sample data sequence.