Introduction

Photovoltaic (PV) power has proven to be one of the most promising renewable energy sources in recent years. The field has witnessed a significant increase in investment; the production capacity reached 227 GW in 2015 compared to 5.1 GW in 2005. But with the emergence of renewable energy as a necessary alternative to fossil energy, new challenges have emerged that require both producers and managers to change control methods, distribution methods and all the related logistics. The main challenge is still the safe integration of renewable energy into the existing grid, which is difficult because renewable power is volatile and uncertain, mostly as a result of weather conditions. In traditional grid management, the grid operator must maintain the balance between supply and demand at all times to avoid grid security problems and economic losses. The grid operator uses a schedule to ensure that power plants produce the right amount of electricity at the right time to meet electric demand consistently and reliably. Nowadays, the energy mix has changed to give more room to renewable sources, which has changed the structure of the power grid and all traditional control and scheduling procedures. Recently, photovoltaic power has begun to gain ground over other renewables because of its lower total cost of production. But from a grid management point of view, the variability of solar generation, caused mainly by clouds, makes it more difficult for the grid operator to predict how much additional electric generation will be required to ensure the balance between supply and demand. For that reason, renewable power forecasting has become a key solution for handling renewable energy efficiently in the power grid, and it must be properly accounted for in the complex decision-making processes required to balance supply and demand in the power system.

Nowadays, renewable power forecasting is a key activity for a number of reasons. It is used for monitoring the performance of the plant, detecting anomalies and faults, making reliable dispatching plans for grid operators, supporting operation and maintenance scheduling, etc.

In recent years, many research works have addressed the problem of PV power forecasting. The two main challenges of PV forecasting (which are also the cause of the poor penetration rate of PV systems) are variability and uncertainty: the output of PV modules varies at all time scales, and this variability is itself difficult to predict, which in turn makes the PV time series difficult to predict, as shown in [36]. According to the state of the art, PV power forecasting models can be divided into three types: physical models, statistical models and hybrid models (Fig. 1). Physical models [14] are mathematical models based on a physical analysis of the process being studied; such a model can contain a limited number of adjustable parameters, which have a physical meaning. In the case of photovoltaics, physical modeling uses mathematical equations that describe all the physical phenomena that govern PV conversion. Statistical models are used when there is not enough knowledge and information about the process and the parameters that influence it. Statistical modeling includes time series [1] and statistical learning models. Time series modeling aims to collect and study the past observations of a time series in order to fit a model that describes their internal structure (such as autocorrelation, trend or seasonality); the fitted model is then used to forecast future values of the series. Among the most used models are the AR, ARX and ARIMA models. A statistical learning model, also called a black box model, is established from a set of measured variables \(X_{k}\) (inputs) and a set of measurements \(Y_{k}\) (outputs). We suppose that there is a relation between the \(X_{k}\) and the \(Y_{k}\), and we try to determine a mathematical form of this relation; in other words, we try to establish a model of the process from the available measurements.
Among statistical learning tools, the artificial neural network is the most used technique due to its performance proven over time. In PV power forecasting, different neural network architectures have been used with various choices of input parameters; among them we can cite the Elman neural network (ENN), the generalized regression neural network (GRNN) [27], the radial basis function neural network (RBFNN), the dynamic recurrent neural network (DRNN) [20] and the feed-forward neural network (FFNN), which in most cases gives the best results [20, 27]. In the same category as neural networks, another statistical learning technique, the support vector machine (SVM), is gaining ground thanks to its generalization ability, demonstrated in several case studies; it has also been used in many solar power forecasting studies, both for classification [29, 30] and for regression [7, 36]. Hybrid models are a class of models that can be constructed from any combination of physical and statistical models; they can combine physical and statistical approaches [34] or be purely statistical, such as combining SOM and RBFNN [3], or SOM, SVR and fuzzy inference [35], or wavelet transform and RBFNN [17], and so on.

The choice of the appropriate technique depends on several parameters; in general, there is no fixed rule for choosing the technique to use. According to the current state of the art [26], the choice depends mostly on the forecast horizon: physical models are used for the medium term, statistical models for the very short and short terms, and hybrid models for the medium and long terms. Still, input parameters are also a very important factor that can change the final results, and different collections of inputs have been used in the literature. Research has shown that the main variables influencing PV power are global horizontal irradiation (GHI) at the PV generator surface [30], plate temperature [30] and aerosol index [16], but this does not exclude other parameters such as numerical weather predictions (NWP) [20, 27], meteorological measurements made at ground stations, satellite measurements of GHI and cloud coverage [27, 30], PV power measurements [7, 27], and variables related to solar geometry and time (zenith angle, light duration) [30].

In this work we combine the characteristics of time series models and statistical learning models in order to forecast short-term photovoltaic power. This combination is beneficial since it merges the simplicity of time series models with the non-linear character of black box models; the result of this fusion is a nonlinear time series model. This study has two aims. First, it assesses the performance of two supervised machine learning techniques for intra-day PV power forecasting: the feed forward neural network (FFNN) and least squares support vector regression (LSSVR). Second, it studies the influence and sufficiency of in-situ collected data as input parameters to the developed models. For this purpose, we compared the performances of several models in order to find the best off-line model for PV power forecasting; by off-line we designate a model capable of giving accurate short-term forecasts without the need for weather forecasts. This is interesting because the majority of existing models use meteorological parameters to forecast PV power, especially forecasted parameters obtained from numerical weather prediction (NWP) systems, such as solar irradiation forecasts [1, 36], ambient temperature forecasts [8], humidity [14, 27], cloud index [4, 30], wind speed [3, 16] and probability of precipitation [13, 35]. The problem is that NWP information is not accessible to everyone at all times, especially for isolated installations. Off-line models that use only locally collected information to forecast PV power are therefore of great importance for grid operators as well as for individuals who do not have access to weather data and forecasts. To assess the performance of our models, we compare them with two usual benchmark models: the persistence model and a multivariate polynomial model.

This paper is structured as follows. Section 3 gives a brief look at related work; in Sect. 4, we introduce the statistical learning techniques used. Section 6 presents the model development strategy, while result analysis and discussion are presented in Sect. 7, followed by a conclusion.

Fig. 1 Photovoltaic forecasting approaches

Related work

Photovoltaic power forecasting increasingly attracts the attention of researchers, and several PV power forecasting models have been developed in the last few years. In [1], an ARX model was used to forecast PV power output 6 h ahead using historical PV power output and forecast irradiation as inputs. In the same perspective, [4] uses a recurrent neural network to forecast PV power 24 h ahead, again using historical PV power together with forecast temperature. In [17], wavelet transformation and a radial basis function neural network (RBFNN) were combined to generate a one-hour-ahead PV power forecast; the RBFNN inputs included past PV power output, irradiation and temperature. The authors of [27] adopt a hybrid modeling approach, applying stepwise regression to select meteorological parameters that are strongly correlated with solar power; these variables were used to construct an FFNN model for 24-h ahead PV power forecasting. This model outperforms five other ones. The authors underline that average solar irradiation and average humidity are the two most significant parameters for forecasting PV power output. In [3], the authors analyse the performance of a 24-h ahead PV power forecasting tool based on a multilayer perceptron (MLP) neural network trained with the error back propagation (EBP) procedure; three types of inputs were used: weather forecasts provided by meteorological services, geographical coordinates of the site, and date and time to determine the correct position. They propose a procedure to validate the correctness of the data and highlight that the performance of the method is strictly related to the historical data pre-processing step and to the accuracy of the weather forecasts.

Another interesting approach, based on weather type classification and similar day detection to forecast PV power for a horizon of up to one day, is used in [8], where the authors use a recurrent neural network with structural elements for 24-h ahead PV power output forecasting. The inputs include clear sky irradiation and the forecast weather type for the forecast days. In [31], the historical power output is classified into several weather types using forecast irradiation, total cloud cover and low cloud cover as selection parameters; the authors use an RBFNN to produce PV power forecasts with a 24-h ahead horizon. In [12], forecasts of high, medium and low temperatures are used to classify historical PV power output into three weather types. After that, three feed forward neural networks (FFNN) are employed to generate 24-h ahead forecasts. In [35], the authors present a hybrid method to forecast PV power output one day ahead; the proposed method comprises three stages: a data classification stage, a training stage and a forecasting stage. The classification stage is developed using a self-organizing map (SOM) and learning vector quantization (LVQ); the objective is to classify the historical PV power data into five weather types according to the verbal weather forecast of the TCWB (Taiwan Central Weather Bureau). In the second stage, support vector regression (SVR) is used to construct five forecasting models, one for each weather type. In the last stage, a fuzzy inference algorithm is used to select an appropriate forecasting model to achieve more accurate results. The work presented in [13] proposes a hybrid model for one-day ahead hourly PV power forecasting; this work is an extension of [35]. The proposed method comprises three stages: a data classification stage, a training stage and a forecast updating stage.
The classification stage is developed using the fuzzy K-means clustering algorithm; the objective is to classify the historical PV power data into five weather types according to the verbal weather forecast of the TCWB (Taiwan Central Weather Bureau). In the second stage, an RBFNN is used to construct five forecasting models, one for each weather type, and a fuzzy inference algorithm is used to select an appropriate forecasting model. In the last stage, the forecasts are updated every 3 h to cope with possible fluctuations of PV power.

As can be seen from this brief state of the art, the majority of existing models use predicted inputs to forecast PV power, especially inputs obtained from NWP systems. NWP information is not accessible to everyone at all times, especially in Africa. For this reason, off-line models that use only past information to forecast PV power are of great importance. From this perspective, the first goal of this work is to present a short-term off-line forecasting model that uses only in-situ collected data. In addition, the performances of several pure non-linear auto-regressive models are compared with those of non-linear auto-regressive models with exogenous inputs. To this end, two well-known statistical learning techniques have been used, namely the feed forward neural network (FFNN) and least squares support vector regression (LSSVR).
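To make the distinction between a pure non-linear auto-regressive (NAR) model and one with exogenous inputs (NARX) concrete, the following Python sketch builds the input/target pairs for both cases from past measurements only. It is an illustration under stated assumptions, not the procedure used in this work: the function name `make_lagged_dataset`, the parameters `n_lags` and `horizon`, and the sample values are all hypothetical.

```python
def make_lagged_dataset(power, exog=None, n_lags=4, horizon=1):
    """Build (inputs, targets) for a NAR model, or a NARX model when exog is given.

    power: list of past PV power measurements, one per time step.
    exog:  optional list of tuples of in-situ measurements aligned with power
           (hypothetical example: (irradiance, module temperature)).
    """
    inputs, targets = [], []
    for t in range(n_lags - 1, len(power) - horizon):
        row = list(power[t - n_lags + 1 : t + 1])  # P(t-n_lags+1) ... P(t)
        if exog is not None:
            row.extend(exog[t])                    # exogenous values measured at time t
        inputs.append(row)
        targets.append(power[t + horizon])         # value to forecast
    return inputs, targets


# Illustrative (made-up) scaled measurements
power = [0.0, 0.1, 0.3, 0.6, 0.8, 0.7]
exog = [(0.0, 0.2), (0.2, 0.3), (0.4, 0.4), (0.7, 0.5), (0.9, 0.6), (0.8, 0.6)]

X_nar, y_nar = make_lagged_dataset(power, n_lags=2, horizon=1)
X_narx, y_narx = make_lagged_dataset(power, exog=exog, n_lags=2, horizon=1)
```

Both datasets share the same targets; the NARX rows simply carry the extra locally measured variables, so neither model needs a weather forecast at prediction time.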

Statistical learning techniques

Least squares support vector regression

The least squares support vector machine (LSSVM) algorithm is a modified version of the classical support vector machine (SVM) used to solve classification problems. Because the formulation uses equality constraints, the solution is obtained by solving a set of linear equations instead of the quadratic programming problem of the classical SVM. Vapnik’s SVM formulation [5] was modified in [31] into the following optimization problem, which underlies non-linear LSSVM training:

$$\begin{aligned} \underset{w,b,e}{\mathrm{min}} \quad \jmath \left( w,b,e \right) =\frac{1}{2}w^{T}w+\frac{\gamma }{2} \sum _{i=1}^{N}e_{i}^{2}. \end{aligned}$$
(1)

Subject to the equality constraints

$$\begin{aligned} y_{i}\left[ w^{T}\varphi \left( x_{i} \right) +b \right] =1-e_{i}. \end{aligned}$$
(2)

This formulation consists of equality instead of inequality constraints and takes into account a squared error with regularization term similar to ridge regression. The solution is obtained after constructing the Lagrangian:

$$\begin{aligned} {\mathcal {L}} \left( w,b,e,\alpha \right) =\jmath \left( w,b,e \right) -\sum _{i=1}^{N}\alpha _{i}\left\{ y_{i}\left[ w^{T}\varphi \left( x_{i} \right) +b \right] -1+e_{i} \right\} , \end{aligned}$$
(3)

where \(\alpha _{i}\in {\mathbb {R}}\) are Lagrange multipliers, which can be positive or negative because the constraints are equalities. From the conditions for optimality, one obtains the Karush–Kuhn–Tucker (KKT) system:

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \mathcal {L} }{\partial w}=0 \rightarrow w=\sum _{i=1}^{N}\alpha _{i}y_{i}\varphi \left( x_{i} \right) \\ \frac{\partial \mathcal {L} }{\partial b}=0 \rightarrow \sum _{i=1}^{N}\alpha _{i}y_{i}=0\\ \frac{\partial \mathcal {L}}{\partial e_{i}}=0 \rightarrow \alpha _{i}=\gamma e_{i}\\ \frac{\partial \mathcal {L} }{\partial \alpha _{i}}=0 \rightarrow y_{i}\left[ w^{T}\varphi \left( x_{i} \right) +b \right] -1+e_{i} =0 \end{array}\right. \end{aligned}$$
(4)

Note that sparseness is lost, which is clear from the condition \(\alpha _{i}=\gamma e_{i}\). As in the standard SVM, neither w nor \(\varphi \left( x_{i} \right)\) is computed explicitly. Eliminating w and e yields, according to [31], the linear system:

$$\begin{aligned} \begin{bmatrix} 0 & y^{T}\\ y & \varOmega +\gamma ^{-1}I \end{bmatrix}\begin{bmatrix} b\\ \alpha \end{bmatrix}=\begin{bmatrix} 0\\ 1_{v} \end{bmatrix} \end{aligned}$$
(5)

with \(y=\left[ y_{1},\ldots ,y_{N} \right]\), \(1_{v}=\left[ 1,\ldots ,1\right]\), \(e=\left[ e_{1}, \ldots , e_{N}\right]\) and \(\alpha =\left[ \alpha _{1}, \ldots , \alpha _{N} \right]\). Mercer’s condition is applied within the \(\varOmega\) matrix.

$$\begin{aligned} \varOmega _{ij}=y_{i}y_{j}\varphi \left( x_{i}\right) ^{T}\varphi \left( x_{j}\right) =y_{i}y_{j}K\left( x_{i},x_{j}\right) . \end{aligned}$$
(6)

For the kernel function \(K\left( .,.\right)\), here again, one typically has the following choices:

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm{Linear\, Kernel} : K\left( x_{i},x_{j} \right) =x_{i}^{T}x_{j}\\ \mathrm{Polynomial\, Kernel} : K\left( x_{i},x_{j} \right) =\left( 1+\frac{x_{i}^{T}x_{j}}{c} \right) ^{d}\\ \mathrm{RBF\, Kernel} : K\left( x_{i},x_{j} \right) = \mathrm{exp}\left\{ -\frac{\left\| x_{j}-x_{i} \right\| _{2}^{2}}{\sigma ^{2}} \right\} \\ \mathrm{MLP\, Kernel} : K\left( x_{i},x_{j} \right) =\tanh \left( \kappa x_{i}^{T}x_{j}+\theta \right) \end{array}\right. , \end{aligned}$$
(7)

where \(d, c, \sigma, \kappa\) and \(\theta\) are constants. For least squares support vector regression (LSSVR), the LSSVM formulation changes slightly. In this case we look for the best regression function of the form:

$$\begin{aligned} y\left( x\right) =w^{T}\varphi \left( x\right) +b, \end{aligned}$$
(8)

with \(x\in {\mathbb {R}}^{n}\),\(y\in {\mathbb {R}}\). Given a training set \(\left\{ x_{i},y_{i} \right\} _{i=1}^{N}\), in this case the optimization problem is given by:

$$\begin{aligned} \underset{w,b,e}{\mathrm{min}} \quad \jmath \left( w,b,e \right) =\frac{1}{2}w^{T}w+\frac{\gamma }{2} \sum _{i=1}^{N}e_{i}^{2}. \end{aligned}$$
(9)

Subject to the equality constraints

$$\begin{aligned} y_{i}= w^{T}\varphi \left( x_{i} \right) +b+e_{i}. \end{aligned}$$
(10)

The resulting dual problem in the case of regression will be:

$$\begin{aligned} \begin{bmatrix} 0 & 1_{v}^{T}\\ 1_{v} & \varOmega +\gamma ^{-1}I \end{bmatrix}\begin{bmatrix} b\\ \alpha \end{bmatrix}=\begin{bmatrix} 0\\ y \end{bmatrix}, \end{aligned}$$
(11)

with \(\varOmega _{ij}=\varphi \left( x_{i}\right) ^{T}\varphi \left( x_{j}\right) =K\left( x_{i},x_{j}\right)\). The final model will be:

$$\begin{aligned} y\left( x \right) =\sum _{i=1}^{N}\alpha _{i}K\left( x_{i},x \right) +b, \end{aligned}$$
(12)

with \(\alpha _{i}=\gamma e_{i}\).
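As an illustration of how Eqs. (11) and (12) can be used in practice, the following Python sketch builds the linear system of Eq. (11) with the RBF kernel of Eq. (7), solves it by Gaussian elimination, and evaluates the model of Eq. (12). It is a minimal sketch relying only on the standard library, not the LS-SVMlab implementation; the function names, the toy data and the values of \(\gamma\) and \(\sigma ^{2}\) are assumptions chosen for the example.

```python
import math


def rbf_kernel(a, b, sigma2):
    """RBF kernel of Eq. (7): exp(-||a - b||^2 / sigma^2)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / sigma2)


def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def lssvr_fit(X, y, gamma, sigma2):
    """Build and solve the linear system of Eq. (11); returns (b, alpha)."""
    N = len(X)
    A = [[0.0] + [1.0] * N]  # first row: [0, 1_v^T]
    for i in range(N):
        row = [1.0]  # first column: 1_v
        for j in range(N):
            k = rbf_kernel(X[i], X[j], sigma2)
            row.append(k + (1.0 / gamma if i == j else 0.0))  # Omega + I / gamma
        A.append(row)
    z = solve_linear_system(A, [0.0] + list(y))
    return z[0], z[1:]


def lssvr_predict(x, X, b, alpha, sigma2):
    """Evaluate the model of Eq. (12) at a new point x."""
    return sum(a * rbf_kernel(xi, x, sigma2) for a, xi in zip(alpha, X)) + b


# Toy data (assumed for illustration only)
X_train = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y_train = [math.sin(x[0]) for x in X_train]
b, alpha = lssvr_fit(X_train, y_train, gamma=100.0, sigma2=1.0)
y_hat = lssvr_predict([0.75], X_train, b, alpha, sigma2=1.0)
```

The first row of the system enforces \(\sum _{i}\alpha _{i}=0\), and the remaining rows reproduce \(\alpha _{i}=\gamma e_{i}\) at every training point, which is why every training sample contributes to the model and sparseness is lost.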

Feed forward neural network

A neural network is a black box that directly learns the internal relations of an unknown system, without guessing functions for describing cause-and-effect relationships. It has been widely used in PV power forecasting [16, 17, 20, 27]. A neural network has the power of a universal approximator [12, 21], i.e., it can realize an arbitrary mapping of one vector space onto another vector space. The main advantage of neural networks is that they are able to use some a priori unknown information hidden in data (but they are not able to extract it). The process of ‘capturing’ the unknown information is called ‘learning’ or training of the neural network. In mathematical formalism, to learn means to adjust the weight coefficients in such a way that some conditions are fulfilled [32]. To define a neural network, we first introduce the static linear model defined as:

$$\begin{aligned} g\left( x,w \right) =\sum _{i=1}^{P}w_{i}f_{i}\left( x \right) , \end{aligned}$$
(13)

where the vector w is the vector of the parameters of the model, and the functions \(f_{i}\left( x\right)\) are non-parameterized functions of the variable x. Neural networks belong to the category of models that are nonlinear in their parameters. The most common form of static neural network is a simple extension of the previous relation:

$$\begin{aligned} g\left( x,w \right) =\sum _{i=1}^{P}w_{i}f_{i}\left( x,w^{'} \right) , \end{aligned}$$
(14)

where \(f_{i}\left( x,w^{'} \right)\) are parameterized functions called ‘neurons’; a neuron is presented in Fig. 2. A neuron is a nonlinear, parameterized function with bounded values. The variables on which the neuron operates are usually called the inputs of the neuron, and the value of the function its output [19, 22]. The parameters \(w_{i}\) are called ‘weights’ or ‘synaptic weights’, because of the biological inspiration of neural networks. The output of a neuron is a nonlinear function of a combination of the variables \(x_{i}\) weighted by the parameters \(w_{i}\). The parameter \(w_{0}\) is a constant term called the ‘bias’. The function f is called the ‘activation function’. The output of a neuron is given by:

$$\begin{aligned} y=f\left[ w_{0} + \sum _{i=1}^{n} w_{i}x_{i}\right] . \end{aligned}$$
(15)

A neuron realizes a nonlinear function. The advantage of neurons lies in the properties that result from their association in networks, i.e., from the composition of the nonlinear functions computed by each neuron. There is a large variety of topologies for this kind of network; nevertheless, the most used topology is the so-called multi-layer perceptron (MLP), an example of which is shown in Fig. 3. In this type of neural network, the first layer is called the ‘input layer’ and the last layer is called the ‘output layer’. The layers in between are hidden layers; this network computes \(N_{c}\) algebraic functions of the N variables of the network.

Fig. 2 A simple neuron presentation

Fig. 3 Multi-layer perceptron neural network architecture

The MLP is mathematically represented by the expression:

$$\begin{aligned} \begin{aligned} g\left( x,w \right)&=\sum _{i=1}^{N_{c}}\left[ w_{N_{c}+1,i} f\left( \sum _{j=1}^{n}w_{ij} x_{j} + w_{i0} \right) \right] +w_{N_{c}+1,0}\\&= w_{2}f\left( W_{1} x\right) , \end{aligned} \end{aligned}$$
(16)

where x is the vector of variables (of dimension \(n+1\)), \(w_{2}\) is the vector of weights of the second layer (of dimension \(N_{c}+1\)) and \(W_{1}\) is the matrix of weights of the first layer (of dimension \(\left( N_{c}+1,n+1\right)\)). By convention, the parameter \(w_{ij}\) designates the weight of the connection from neuron j to neuron i. The model \(g\left( x,w \right)\) is a linear function of the parameters of the last layer, and a nonlinear function of the parameters of the first layer of connections.
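The first line of Eq. (16) can be evaluated directly, as the following Python sketch suggests. It is only an illustration: the function name `mlp_forward`, the storage layout (bias first in each weight list, one row of \(W_{1}\) per hidden neuron) and the numerical values are assumptions made for the example, and the hyperbolic tangent is used as one possible activation function.

```python
import math


def mlp_forward(x, W1, w2, activation=math.tanh):
    """Evaluate the first line of Eq. (16) for one input vector x.

    x:  list of the n input variables (the constant bias input is handled separately).
    W1: list of N_c rows [w_i0, w_i1, ..., w_in], bias first (assumed layout).
    w2: list [w_0, w_1, ..., w_Nc] of output-layer weights, bias first (assumed layout).
    """
    hidden = []
    for row in W1:
        # Argument of Eq. (15): bias plus weighted sum of the inputs
        pre_activation = row[0] + sum(w * xj for w, xj in zip(row[1:], x))
        hidden.append(activation(pre_activation))
    # Linear output neuron: bias plus weighted sum of the hidden outputs
    return w2[0] + sum(w * h for w, h in zip(w2[1:], hidden))


# Two inputs, two hidden neurons (illustrative values)
x = [1.0, 2.0]
W1 = [[0.0, 0.5, -0.25],
      [0.1, 0.0, 0.0]]
w2 = [0.3, 2.0, 1.0]
output = mlp_forward(x, W1, w2)
```

In this example the first hidden neuron receives a zero argument, so the output reduces to the output bias plus the contribution of the second hidden neuron, which shows that the model is linear in `w2` and nonlinear in `W1`.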

In this paper we used a feed forward neural network (FFNN), which is a type of MLP whose network of neurons contains no loops. In the FFNN, information flows from the inputs towards the outputs without ‘feedback’. It can be represented by an acyclic graph whose nodes are the neurons and whose edges are the ‘connections’ between them. Training the network means adjusting its parameters. There are two main types of training process: supervised and unsupervised training. In our case, we used a supervised learning process, which is based on varying the threshold coefficients \(w_{i0}\) (‘bias’) and the weight coefficients \(w_{ij}\) to minimize the sum of squared errors. This is accomplished by minimizing the objective function:

$$\begin{aligned} E=\sum _{o}\frac{1}{2}\left( x_{o} - \hat{x_{o}}\right) ^2, \end{aligned}$$
(17)

where \(x_{o}\) and \(\hat{x_{o}}\) are vectors composed of the computed and required values of the output neurons, and the summation runs over all output neurons o [12]. Training begins with arbitrary values of the weights; the network uses a training algorithm and a set of training data to adjust the weights in the direction that reduces the error, until an optimal set of values is reached. The hope is that the neural network so designed will generalize. A network is said to generalize well when it learns to correctly associate input patterns with output patterns, even for input–output patterns never used in the training stage [12].
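The idea of adjusting the weights in the direction that reduces the error of Eq. (17) can be sketched with plain online gradient descent on a single-hidden-layer network with one linear output neuron. This is deliberately not the Levenberg–Marquardt algorithm; it is a simpler stand-in chosen for readability, and the function name `train_ffnn`, the learning rate, the number of epochs and the toy target are assumptions made for the example.

```python
import math
import random


def train_ffnn(X, y, n_hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer tanh network by online gradient descent.

    Returns the weights and the value of the error of Eq. (17), summed over
    the training samples, recorded at each epoch.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    # Arbitrary (random) initial weights, bias stored first in each list
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in zip(X, y):
            hidden = [math.tanh(row[0] + sum(w * xj for w, xj in zip(row[1:], x)))
                      for row in W1]
            out = w2[0] + sum(w * h for w, h in zip(w2[1:], hidden))
            err = out - target
            total += 0.5 * err ** 2
            # Hidden-layer gradients use the output weights before they are updated
            for i, h in enumerate(hidden):
                delta = err * w2[i + 1] * (1.0 - h * h)  # derivative of tanh is 1 - h^2
                W1[i][0] -= lr * delta
                for j, xj in enumerate(x):
                    W1[i][j + 1] -= lr * delta * xj
            # Output-layer gradients
            w2[0] -= lr * err
            for i, h in enumerate(hidden):
                w2[i + 1] -= lr * err * h
        history.append(total)
    return W1, w2, history


# Toy regression problem (assumed): learn y = x^2 on [0, 1]
X = [[i / 10.0] for i in range(11)]
y = [x[0] ** 2 for x in X]
W1, w2, history = train_ffnn(X, y)
```

Comparing `history[0]` with `history[-1]` shows how far the training error has dropped; assessing generalization would additionally require data not used for training.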

Platform and statistical metrics

Platform and data

The data used in this research work were collected from a hybrid platform located at the Moroccan School of Engineering Sciences in Casablanca, Morocco (latitude = 33.5415060 and longitude = \(-7.6735389\)). The platform is composed of photovoltaic and wind installations. The photovoltaic installation is a 3.2 kW rooftop plant, facing south with an inclination of \(40^{\circ }\). It consists of 12 modules from the manufacturer Voltec Solar (six mono-crystalline and six poly-crystalline); each group of six modules is connected to an SMA SUNNY-BOY inverter. The two SMA SUNNY-BOY inverters are connected to a MultiPlus inverter/charger manufactured by Victron Energy, which allows us to control the charge/discharge of a 3 kW battery bank as well as the injection into the grid. The wind installation is composed of a 2.4 kW vertical axis wind turbine (Skystream 3.7) and a 2.5 kW Darrieus wind turbine (apple-wind AW). The platform also contains a small meteorological station based on the SMA Sunny SensorBox, which measures global horizontal irradiance (GHI), ambient and module temperature, and wind speed. The meteorological parameters are recorded every 15 min; all measurements are stored via an SMA WebBox. A detailed description of the platform is given in [9, 10, 33]. The characteristics of the PV plant are presented in Tables 1 and 2.

Table 1 PV cell characteristics
Table 2 Inverter characteristics

The database used consists of 6 months of records, from 01 July to 31 December 2014. The records of 5 months, from July to October, contain missing data, while the records of December are intact. To deal with the problem of missing data we use a gap filling procedure. In general, there is no definitive guide to replacing missing data in time series [2]. In the case of photovoltaic time series, choosing the appropriate method depends on different factors such as the length of the existing data, the availability of reliable meteorological data and the climate of the location [24]. Conventional interpolation methods are still the most used [2, 23, 28] because of their simplicity, but they are not always the most efficient. Many other methods are presented in the literature, such as regression, ARIMA, spline and polynomial fitting [2], more sophisticated methods like adaptive interpolation schemes (AISs) [2], the temperature based approach (TBA), singular spectrum analysis (SSA) and statistically adjusted solar radiation (SASR) methods [24], or special methods like the METSTAT (meteorological/statistical) solar radiation model [18]. Statistical learning approaches can also be used in this context, as in [15], where the authors adopt a support vector machine (SVM) to obtain a nonlinear weather-type classifier based on humidity and temperature as input variables; the SVM is used to choose the days with the same season type, and a missing value is imputed by the average over a specific set of those similar days. In this work we used the conventional interpolation method to fill gaps in solar and PV data; we chose this method because of its simplicity and because the gap length does not exceed 3 h.
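A gap filling step based on linear interpolation can be sketched as follows. This is an illustrative Python version, not the code used in this work: missing samples are assumed to be marked with `None`, the function name `fill_gaps_linear` and the parameter `max_gap` are hypothetical, and with 15-min records a limit of 3 h would correspond to `max_gap=12`.

```python
def fill_gaps_linear(values, max_gap=None):
    """Replace None entries by linear interpolation between the nearest valid neighbours.

    Gaps touching the start or end of the series, or longer than max_gap
    samples, are left untouched.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i                      # first missing index of the gap
            while i < n and filled[i] is None:
                i += 1
            end = i                        # first valid index after the gap (or n)
            gap_len = end - start
            if start == 0 or end == n:
                continue                   # no valid neighbour on one side
            if max_gap is not None and gap_len > max_gap:
                continue                   # gap too long to interpolate
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap_len + 1)
                filled[k] = left + frac * (right - left)
        else:
            i += 1
    return filled


# Illustrative series with two gaps
series = [1.0, None, None, 4.0, None, 6.0]
filled = fill_gaps_linear(series, max_gap=12)
```

Leaving long gaps and edge gaps untouched keeps the procedure from inventing values where there is no measured neighbour to interpolate from.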

Statistical metrics

To evaluate model accuracy, we must choose the right performance metrics, because modeling is an iterative process that consists of going back and forth between the output of the model and the desired value. Measuring forecasting error is important to validate the model, so it is necessary to use performance criteria that measure how close the outputs (forecasts) are to the eventual outcomes. For this purpose, well-known statistical metrics are used: the mean absolute error (MAE), mean bias error (MBE), mean squared error (MSE), root mean squared error (RMSE) and the coefficient of determination (\(R^{2}\)), also called R-squared. With \(y_{i}\) the measured value, \({\hat{y}}_{i}\) the forecast, \({\bar{y}}\) the mean of the measured values and n the number of samples, these metrics are defined as:

$$\begin{aligned} \mathrm{MAE}= & {} \frac{1}{n}\sum _{i=1}^{n}\left| {\hat{y}}_{i} -y_{i}\right| \end{aligned}$$
(18)
$$\begin{aligned} \mathrm{MBE}= & {} \frac{1}{n}\sum _{i=1}^{n}\left( {\hat{y}}_{i} -y_{i}\right) \end{aligned}$$
(19)
$$\begin{aligned} \mathrm{MSE}= & {} \frac{1}{n}\sum _{i=1}^{n}\left( {\hat{y}}_{i} -y_{i}\right) ^{2} \end{aligned}$$
(20)
$$\begin{aligned} \mathrm{RMSE}= & {} \sqrt{\frac{\sum _{i=1}^{n}\left( {\hat{y}}_{i} -y_{i}\right) ^{2}}{n}} \end{aligned}$$
(21)
$$\begin{aligned} R^{2}= & {} 1 - \frac{\sum _{i=1}^{n}\left( {\hat{y}}_{i}-y_{i} \right) ^{2}}{\sum _{i=1}^{n}\left( {\bar{y}}-y_{i} \right) ^{2}} \end{aligned}$$
(22)
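The metrics of Eqs. (18)–(22) translate directly into code. The Python sketch below is one possible implementation, assuming that \(y_{i}\) denotes the measured values, \({\hat{y}}_{i}\) the forecasts and \({\bar{y}}\) the mean of the measured values; the function name `forecast_metrics` and the sample numbers are illustrative.

```python
import math


def forecast_metrics(y_true, y_pred):
    """Compute MAE, MBE, MSE, RMSE and R^2 as in Eqs. (18)-(22)."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]   # forecast minus measurement
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n                              # sign shows over/under-forecasting
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot     # undefined if y_true is constant
    return {"MAE": mae, "MBE": mbe, "MSE": mse, "RMSE": rmse, "R2": r2}


# Illustrative values
metrics = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

In this example the positive and negative errors cancel in the MBE while the MAE does not, which is why the two metrics are reported together.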

Models development

Training procedures

In this work, we adopted a learning procedure composed of three stages: a pre-processing stage, a training and validation stage, and finally a test stage. The pre-processing stage consists of a gap filling procedure using linear interpolation and a scaling procedure that scales the data between zero and one. In the training stage we tried to find the best settings of the FFNN and LSSVR algorithms; for this purpose, we used a general workflow composed of a training algorithm combined with a 10-fold cross-validation procedure based on the mean squared error (MSE) as the judgment criterion. In the test stage, we use data that have not been used in the training stage to test the model performance. This procedure was adapted to the different algorithms and to all models. For LSSVR, in the training and validation stage, we used the sequential minimal optimization (SMO) algorithm to find the regularization parameter \(\gamma\) from (9) and the parameter \(\sigma ^{2}\) of the radial basis function (RBF) kernel from (7). The best model with the best parameters is used to calculate output forecasts. In the test stage the algorithm is fed with new data; the estimated outputs are compared with real outputs, and performance metrics are calculated to evaluate model accuracy. The best model is the one that gives the minimum forecasting error; Fig. 4 summarizes the procedure used. All simulations were done in the Matlab2015b environment using the standard library LS-SVMlab [25]. The obtained results are discussed in Sect. 7.
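The scaling and cross-validation steps of this procedure can be sketched in a few lines of Python. The sketch is a hedged illustration rather than the Matlab workflow used here: the function names are hypothetical, and the folds are assumed to be contiguous blocks of samples, since the way the folds were formed is not specified.

```python
def min_max_scale(values):
    """Scale a list of values linearly to the interval [0, 1].

    Returns the scaled values together with the minimum and maximum, which are
    needed to scale new data consistently. Assumes the values are not all equal.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Illustrative usage
scaled, lo, hi = min_max_scale([2.0, 4.0, 6.0])
folds = list(k_fold_indices(25, k=10))
```

Each candidate setting would then be trained on `train_idx`, scored with the MSE on `val_idx`, and the scores averaged over the ten folds before the best setting is evaluated on the held-out test data.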

In the case of the FFNN, training the neural network amounts to adjusting the synaptic weights \(w_{i}\), choosing the number of hidden layers and the number of neurons in each hidden layer, and choosing the right activation function. To find the optimal values of the synaptic weights, we used the Levenberg–Marquardt (LM) algorithm, which is an improvement of the classical gradient descent algorithm. For the number of hidden layers, we decided to use a single hidden layer FFNN since it is a universal approximator [12]; the FFNN therefore consists of an input layer, a hidden layer and an output layer. The number of neurons in the input layer depends on the number of parameters used in each model, while the output layer consists of a single neuron with a linear activation function. To find the best number of neurons in the hidden layer, a sensitivity analysis was performed using the procedure reported in [11].

The final parameter to find is the activation function. The choice of activation function is an important design issue: it is a vital part of the neural network, providing nonlinear mapping potential and helping to achieve fast convergence and good generalization performance. To choose the right activation function, we assess the performance of the three most used activation functions in the FFNN architecture, namely the radial basis function (RBF), the tangent sigmoid function (Tansig) and the logistic sigmoid function (Logsig), and we do so for each developed model.

To carry out this simulation, we consider a number of neurons varying between 1 and 160, and we used the MATLAB neural network toolbox in the Matlab2015b environment. The steps we followed are:

  1. Choose the maximum number of neurons in the hidden layer “p” (\(1 \le p \le 160\)).

  2. Initialize (init) the synaptic weights \(w_{i}\) randomly.

  3. Train the FFNN with those settings using the Levenberg–Marquardt algorithm and the 10-fold cross-validation procedure.

  4. Calculate the “\(n_{t}\)” forecasts (estimations) obtained after training using validation data.

  5. Calculate the Normalized Mean Absolute Error \(\mathrm{NMAE}_{\%}\) for each forecast:

    $$\begin{aligned} \mathrm{NMAE}_{\%} = \tfrac{1}{N\cdot C}\sum _{i=1}^{N}\left| P_{m} - P_{f}\right| \cdot 100, \end{aligned}$$
    (23)

    where \(P_{m}\) is the measured value of the output power, \(P_{f}\) is the estimated one, C is the net capacity of the plant and N is the number of samples.

  6. Repeat from step 2 a chosen number of times (in this study, we repeated the initialization 100 times).

  7. Calculate the sample mean \(\overline{\mathrm{NMAE}_{p}}\) as an estimator of all possible \(\mathrm{NMAE}_{\%}\) values:

    $$\begin{aligned} \overline{\mathrm{NMAE}_{p}} = \frac{1}{n_{t}}\sum _{i=1}^{n_{t}}\,\mathrm{NMAE}_{i,p}, \end{aligned}$$
    (24)

    where \(\mathrm{NMAE}_{i,p}\) is the \(\mathrm{NMAE}_{\%}\) calculated for the i-th trial performed by the FFNN with the p-th settings.

  8. Calculate the sample variance \({S_{p}}^{2}\) and the sample standard deviation \(S_{p}\):

    $$\begin{aligned} {S_{p}}^{2}= & {} \frac{1}{n_{t}-1}\sum _{i=1}^{n_{t}}\left( \mathrm{NMAE}_{i,p} - \overline{\mathrm{NMAE}_{p}}\right) ^{2} \end{aligned}$$
    (25)
    $$\begin{aligned} S_{p}= & {} \sqrt{\sum _{i=1}^{n_{t}}\frac{\left( \mathrm{NMAE}_{i,p} - \overline{\mathrm{NMAE}_{p}}\right) ^{2}}{n_{t}-1}} \end{aligned}$$
    (26)

  9. Construct a confidence interval (CI) to help estimate the unknown population mean \(\mu\), defined as:

    $$\begin{aligned} \mathrm{CI} = \overline{\mathrm{NMAE}_{p}}\,\pm \, \mathrm{ME}, \end{aligned}$$
    (27)

    with ME the margin of error, defined as:

    $$\begin{aligned} \mathrm{ME} = t_{\frac{\alpha }{2}}\left( \frac{S_{p}}{\sqrt{n_{t}}} \right) , \end{aligned}$$
    (28)

    where \(t_{\frac{\alpha }{2}}\) is the critical value of the Student's t distribution with \(n_{t}-1\) degrees of freedom.

  10. After choosing the best settings according to the \(\mathrm{NMAE}_{\%}\) score, we retrain the FFNN and use test data to evaluate its performance using the statistical metrics presented in Sect. 5. Figure 5 summarizes the procedure used to train the FFNN.
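The statistics in steps 5 to 9 can be sketched in Python as follows. This is an illustrative sketch only: the function names and the numeric trial values are hypothetical, the per-sample pairing of measured and forecast power is assumed, and the critical value \(t_{\frac{\alpha }{2}}\) is passed in by the caller (taken from a t table) rather than computed, to keep the example within the standard library.

```python
import math
import statistics
from typing import Sequence, Tuple


def nmae_percent(measured: Sequence[float], forecast: Sequence[float], capacity: float) -> float:
    """Eq. (23): normalized mean absolute error, in percent of plant capacity C."""
    n = len(measured)
    return 100.0 * sum(abs(m - f) for m, f in zip(measured, forecast)) / (n * capacity)


def summarize_trials(
    nmae_trials: Sequence[float], t_critical: float
) -> Tuple[float, float, Tuple[float, float]]:
    """Eqs. (24)-(28): sample mean, sample standard deviation and confidence interval.

    `t_critical` is the Student's t critical value for n_t - 1 degrees of freedom.
    """
    n_t = len(nmae_trials)
    mean = statistics.mean(nmae_trials)         # Eq. (24)
    s_p = statistics.stdev(nmae_trials)         # Eq. (26), divides by n_t - 1
    margin = t_critical * s_p / math.sqrt(n_t)  # Eq. (28)
    return mean, s_p, (mean - margin, mean + margin)  # Eq. (27)


# Hypothetical NMAE% values from three random initializations of one setting p.
# For 2 degrees of freedom and a 95% interval, the tabulated t value is about 4.303.
trials = [4.0, 5.0, 6.0]
mean_nmae, s_p, ci = summarize_trials(trials, t_critical=4.303)

# Choosing the best setting: the p with the smallest mean NMAE% (hypothetical values).
mean_by_p = {10: 5.0, 20: 4.2, 30: 4.6}
best_p = min(mean_by_p, key=mean_by_p.get)
```

With the 100 initializations used in the study, the number of degrees of freedom is 99, for which the tabulated 95% critical value is approximately 1.984.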

Fig. 4 The procedure used in the case of LSSVR training

Inputs selection

There are several statistical models that describe the photovoltaic phenomenon using different weather parameters that influence the conversion. According to the state of the art, the parameters that most influence PV forecasting are horizontal solar irradiation (Irr) [6, 30], cell temperature (Tc) [6, 30], ambient temperature [6], and aerosol index [16]. In this work, the data used to forecast photovoltaic power are: past PV power generation (P), past measured global horizontal solar irradiation (Irr) and past measured photovoltaic module temperature (Tc), collected via the SMA WebBox. The advantage of these data is that they are simple to collect locally and do not require a considerable investment. Mathematically, finding a one-step photovoltaic forecasting model amounts to finding a function of the form:

$$\begin{aligned} P_{t+1}=f\left( X \right) , \end{aligned}$$
(29)

where X is the vector of input parameters; it can be a vector of exogenous parameters or a vector of purely auto-regressive parameters. This gives rise to two types of models: a non-linear auto-regressive model with exogenous inputs and a pure non-linear auto-regressive model. In this study, we focused on the choice of X and its influence on the accuracy of the model (the vector X contains the locally (in-situ) measured parameters). To this end, we tested different combinations of the three locally measured parameters: the solar irradiation (Irr), the temperature of the cells (Tc) and the PV power (P). To compare the accuracy of the obtained models, statistical metrics were used. To give more meaning to the results, we also compared the performance of the models with two other statistical models used as benchmarks: the persistent model and a Multivariate Polynomial Regression model (MPR).

Fig. 5 The procedure used in the case of FFNN training

Results analysis and discussion

In this section, we discuss simulation results that describe the performance of several pure non-linear auto-regressive models (NAR) against that of non-linear auto-regressive models with exogenous inputs (NARX).

Nonlinear auto-regressive with exogenous inputs models (NARX)

In time series modeling, a nonlinear autoregressive with exogenous inputs model (NARX) is a nonlinear autoregressive model which has exogenous inputs. This type of model relates the current value of the output to both past values of the same output and current and past values of external inputs that influence the output of interest. Such a model can be formulated as:

$$\begin{aligned} Y_{t} = F\left( Y_{t-1},Y_{t-2},\ldots ,Y_{t-N};U_{t-1},U_{t-2},\ldots ,U_{t-N} \right) + \varepsilon _{t}, \end{aligned}$$
(30)

where F is some nonlinear function, Y is the variable of interest, U is the exogenous variable and \(\varepsilon _{t}\) is a forecasting error term. In this study, we used combinations of three in-situ measured parameters: the global horizontal solar irradiation (Irr) and the temperature of the PV modules (Tc) as exogenous inputs U, and the PV power (P) as the variable of interest Y. The first functions to evaluate are:

$$\begin{aligned} {\mathrm{MOD}}_{1}: P_{t+1}= & {} F\left( {\mathrm{Irr}}_{t}, \mathrm{Tc}_{t} \right) + \varepsilon \end{aligned}$$
(31)
$$\begin{aligned} {\mathrm{MOD}}_{2}: P_{t+1}= & {} F\left( {\mathrm{Irr}}_{t}, \mathrm{Tc}_{t}, P_{t} \right) + \varepsilon \end{aligned}$$
(32)
$$\begin{aligned} {\mathrm{MOD}}_{3}: P_{t+1}= & {} F\left( {\mathrm{Irr}}_{t}, \mathrm{Tc}_{t}, P_{t-1}, P_{t} \right) + \varepsilon \end{aligned}$$
(33)
$$\begin{aligned} {\mathrm{MOD}}_{4}: P_{t+1}= & {} F\left( {\mathrm{Irr}}_{t-1}, {\mathrm{Irr}}_{t}, \mathrm{Tc}_{t-1}, \mathrm{Tc}_{t}, P_{t-1}, P_{t} \right) + \varepsilon \end{aligned}$$
(34)
$$\begin{aligned} {\mathrm{MOD}}_{5}: P_{t+1}= & {} F\left( {\mathrm{Irr}}_{t}, \mathrm{Tc}_{t},P_{t-2}, P_{t-1}, P_{t} \right) + \varepsilon \end{aligned}$$
(35)
$$\begin{aligned} {\mathrm{MOD}}_{6}: P_{t+1}= & {} F\left( \mathrm{Irr}_{t}, \mathrm{Tc}_{t},P_{t-3},P_{t-2}, P_{t-1}, P_{t} \right) + \varepsilon , \end{aligned}$$
(36)

We used the FFNN and LSSVR approaches to find the most accurate function F for each of Eqs. (31)–(36). The simulation results are presented hereafter.
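The input vectors of the models defined in Eqs. (31)–(36) differ only in which lagged variables they contain. A minimal Python sketch of how such training pairs could be assembled is given below; the helper `build_lagged_dataset`, its dictionary-based interface and the sample values are hypothetical and are not taken from the original MATLAB implementation.

```python
from typing import Dict, List, Sequence, Tuple


def build_lagged_dataset(
    series: Dict[str, Sequence[float]],
    lags: Dict[str, Sequence[int]],
    target: str = "P",
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) for a one-step-ahead model.

    Each row of X holds series[name][t - lag] for every requested (name, lag),
    in the order given by `lags`; the matching entry of y is series[target][t + 1].
    """
    n = len(series[target])
    max_lag = max(lag for name_lags in lags.values() for lag in name_lags)
    X: List[List[float]] = []
    y: List[float] = []
    # t must leave room for the largest lag behind it and for t + 1 ahead of it.
    for t in range(max_lag, n - 1):
        X.append([series[name][t - lag] for name, name_lags in lags.items() for lag in name_lags])
        y.append(series[target][t + 1])
    return X, y


# Hypothetical scaled measurements.
data = {
    "Irr": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Tc": [1.0, 2.0, 3.0, 4.0, 5.0],
    "P": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# MOD_3, Eq. (33): inputs Irr_t, Tc_t, P_{t-1}, P_t.
X_mod3, y_mod3 = build_lagged_dataset(data, {"Irr": [0], "Tc": [0], "P": [1, 0]})

# MOD_1, Eq. (31): inputs Irr_t, Tc_t only.
X_mod1, y_mod1 = build_lagged_dataset(data, {"Irr": [0], "Tc": [0]})
```

A purely auto-regressive input vector is obtained the same way by listing lags for "P" only.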

Least square support vector regression

Using the procedure described in Sect. 6 and the sequential minimal optimization (SMO) algorithm to find the parameters of the radial basis function (RBF) as well as the parameters \(\gamma\) and \(\sigma ^{2}\), the best parameters found for the LSSVR–NARX models are presented in Table 3, and the corresponding results in Table 4.

Table 3 LSSVR–NARX parameters

The obtained results show interesting characteristics. First, all the models give forecasts with sufficient precision. The ranking of the models gives us an idea of the influence of each parameter on the precision of the results. After analyzing the simulation results, we found that the \({\mathrm{Irr}}_{t}\) and \(\mathrm{Tc}_{t}\) data alone are not sufficient, since the model \({\mathrm{MOD}}_{1}\) gives less precise results than the other models. Also, according to the results of the model \({\mathrm{MOD}}_{2}\), the combination of the parameters \(\mathrm{Irr}_{t}\), \(\mathrm{Tc}_{t}\) and \(P_{t}\) gives better results than those of \(\mathrm{MOD}_{1}\). The results obtained by the model \(\mathrm{MOD}_{3}\) confirm this observation, since adding the parameter \(P_{t-1}\) to the model \(\mathrm{MOD}_{2}\) increases the accuracy of the forecasts, whereas adding the \(\mathrm{Irr}_{t-1}\) and \(\mathrm{Tc}_{t-1}\) parameters to \(\mathrm{MOD}_{3}\) (which gives the model \(\mathrm{MOD}_{4}\)) reduces the accuracy and gives results that are almost equivalent to those of \(\mathrm{MOD}_{1}\). As a first remark, we can observe that the \(\mathrm{Irr}_{t}\) and \(\mathrm{Tc}_{t}\) parameters give good results, but adding historical power data greatly improves model accuracy. To examine this hypothesis, we created two further models with more auto-regressive inputs: adding the input \(P_{t-2}\) to the model \(\mathrm{MOD}_{3}\) gives \(\mathrm{MOD}_{5}\), and adding the input \(P_{t-3}\) to \(\mathrm{MOD}_{5}\) gives \(\mathrm{MOD}_{6}\). The two new models realize the functions (35) and (36).

Table 4 LSSVR–NARX models results
Fig. 6 PV power forecasts obtained by the LSSVR–NARX models

According to the simulation results summarized in Table 4, it can be concluded that adding the past PV power values \(P_{t-2}\) and \(P_{t-3}\) does help to increase the accuracy of the offline forecasting model. The comparison of the different models leads us to underline the importance of the parameters P in this kind of model. Figure 6 shows the graphical results of the models \(\mathrm{MOD}_{1}\) to \(\mathrm{MOD}_{6}\). From the results of the LSSVR–NARX models, the conclusion of this subsection is that for offline short-term PV power forecasting, the most influential parameter is the past PV power (\(P_{t},\ldots ,P_{t-i}\)), while the parameters \(\mathrm{Irr}_{t}\) and \(\mathrm{Tc}_{t}\) add precision to model forecasts. This is logical, since the parameters \(P_{t-i}\) implicitly contain information concerning the photovoltaic phenomenon, such as the effect of irradiation, temperature and even geographical parameters.

Feed-forward neural network

Training the FFNN amounts to adjusting the synaptic weights \(w_{i}\), as well as finding the best activation function and the best number of neurons in the hidden layer. For this purpose, we used the ANN sizing procedure to find the best configuration for each model (best activation function and best number of hidden neurons). First, for a given model, we choose an activation function and use the sizing procedure to find the best number of hidden neurons; we then change the activation function and repeat the procedure, and so on, until the best activation function and the best number of hidden neurons have been found for all developed models, each with its own input parameters. After choosing the best number of neurons in the hidden layer, the FFNN was retrained 100 times, each time with a different initialization of the synaptic weights \(w_{i}\). At the end of the sizing procedure, the FFNN that gives the minimum \(\mathrm{NMAE}_{\%}\) is considered the best model. This procedure gives satisfactory results, so we use it to choose the best configuration of each model. The obtained results are presented in Table 5. Fig. 7 shows an example of the evolution of the mean \(\mathrm{NMAE}_{\%}\) and its confidence interval (CI) during the training of the model \(\mathrm{MOD}_{5}\).

Fig. 7 Mean normalized mean absolute error \({\overline{\mathrm{NMAE}}}\) as function of number of neurons in the case of \(\mathrm{MOD}_{5}\)

Table 5 ANN sizing procedure results for NARX models

After analyzing all FFNN–NARX model results presented in Table 6 and Fig. 8, almost the same remarks as those for the LSSVR–NARX models can be made: the \(\mathrm{Irr}\) and \(\mathrm{Tc}\) inputs alone are not enough to make accurate forecasts. The \(\mathrm{MOD}_{1}\) model gives the worst results with an MSE = 0.0124. Adding past PV power as input considerably improves the accuracy of the models, since in \(\mathrm{MOD}_{2}\) the addition of the input \(P_{t}\) alone improved the MSE by almost 24.6% (MSE = 0.0089). According to the results of the models \(\mathrm{MOD}_{3}\), \(\mathrm{MOD}_{5}\) and \(\mathrm{MOD}_{6}\), we observed that the more past PV power values we use as inputs, the more the forecast accuracy increases. But \(\mathrm{MOD}_{4}\) constitutes an aberrant case; normally we tend to think that the more we increase the number of inputs, the more precision we add to our models. Here \(\mathrm{MOD}_{4}\) is just \(\mathrm{MOD}_{3}\) to which the \(\mathrm{Irr}_{t-1}\) and \(\mathrm{Tc}_{t-1}\) inputs have been added. We expected that \(\mathrm{MOD}_{4}\) would outperform the \(\mathrm{MOD}_{3}\) model, but the simulations show the opposite, with an increase in the forecast error of 20.2% and an MSE = 0.0099. Therefore, the results obtained are consistent with those of the LSSVR–NARX, except that the FFNN–NARX models demonstrate a slight superiority over the LSSVR–NARX models in 75% of the cases. For the FFNN–NARX models, the tangent sigmoid activation function gives the best results in 83% of the cases, with a coefficient of determination \(R^{2}=90.73\%\) given by \(\mathrm{MOD}_{6}\).

Table 6 FFNN–NARX models Results for the three activation functions
Fig. 8 PV power forecasts obtained by the FFNN–NARX models

Nonlinear autoregressive models

In this section, in order to choose the best offline PV power forecasting model and to analyze the influence of locally collected parameters on model accuracy, we investigate the results obtained from four pure nonlinear auto-regressive models (NAR). In time series modeling, the nonlinear auto-regressive model specifies that the output variable depends non-linearly on its own previous values and on a forecasting error term. This type of model can be formulated as:

$$\begin{aligned} Y_{t} = F\left( Y_{t-1},Y_{t-2},\ldots ,Y_{t-N}\right) + \varepsilon _{t}. \end{aligned}$$
(37)

To realize those models, we used only the past PV power values as input. In this case, we try to realize a pure non-linear auto-regressive model without exogenous parameters. To find the best function F, we used, as for the NARX models, the LSSVR and FFNN approaches. To conduct this study, we have developed four models, \(\mathrm{MOD}_{7}\), \(\mathrm{MOD}_{8}\), \(\mathrm{MOD}_{9}\) and \(\mathrm{MOD}_{10}\), which perform the following functions, respectively:

$$\begin{aligned} \mathrm{MOD}_{7}: P_{t+1}= & {} f\left( P_{t} \right) + \varepsilon , \end{aligned}$$
(38)
$$\begin{aligned} \mathrm{MOD}_{8}: P_{t+1}= & {} f\left( P_{t-1},P_{t}\right) + \varepsilon , \end{aligned}$$
(39)
$$\begin{aligned} \mathrm{MOD}_{9}: P_{t+1}= & {} f\left( P_{t-2},P_{t-1},P_{t}\right) + \varepsilon , \end{aligned}$$
(40)
$$\begin{aligned} \mathrm{MOD}_{10}: P_{t+1}= & {} f\left( P_{t-3},P_{t-2},P_{t-1},P_{t}\right) + \varepsilon , \end{aligned}$$
(41)

Here too, the \(\varepsilon\) term designates the forecasting error; it takes different values for the different models.

Least square support vector regression

This subsection presents the simulation results obtained using the LSSVR algorithm. After using the sequential minimal optimization (SMO), the best \(\gamma\) and \(\sigma ^{2}\) parameters found for the LSSVR–NAR models are presented in Table 7. The simulation results are summarized in Table 8.

Table 7 LSSVR–NAR best parameters
Table 8 LSSVR–NAR models results
Fig. 9 Forecasting results of LSSVR–NAR models

In the previous subsection, it was concluded that past PV generation is the most important parameter for a short-term PV power forecasting model. The best NARX model so far is \(\mathrm{MOD}_{6}\), which realizes the function given by (36). The results summarized in Table 8 and presented in Fig. 9 allow us to make the following remarks. First, all LSSVR–NAR models give good results; we also observe that the more past PV values we add as inputs, the more the forecast accuracy increases. The comparison between LSSVR–NAR models and LSSVR–NARX models revealed that LSSVR–NAR models give clearly better results. For example, the model \(\mathrm{MOD}_{7}\), which takes only the current power \(P_{t}\) as input, gives a satisfactory result with an \(\mathrm{MSE}=0.0092\), which is better than that of \(\mathrm{MOD}_{1}\). From the accuracy point of view, we notice that all LSSVR–NAR models give better results than the model \(\mathrm{MOD}_{6}\) (for example, \(\mathrm{MOD}_{8}\), with only two inputs, gives better results than \(\mathrm{MOD}_{6}\), which uses six inputs). Since accuracy increases with the number of past PV power terms, the model \(\mathrm{MOD}_{10}\), which realizes the function given by Eq. (41), gives the best results; Fig. 10 shows the PV power forecasts obtained by this model. These results are interesting from a practical point of view, since this type of model does not add significant cost: most converters in use can record these parameters, and retrieving them is easy and safe. In conclusion, the comparison between the different models leads us to underline the importance of past PV power as an input parameter in offline short-term PV power forecasting. Table 9 summarizes the results of all LSSVR-based models.

Table 9 LSSVR all results
Fig. 10 (a) PV power forecasts obtained by the model LSSVR–\(\mathrm{MOD}_{10}\) with error bars

Feed-forward neural network

As in the case of the FFNN–NARX models, the FFNN–NAR models were trained using the same ANN sizing procedure; Table 10 summarizes the results obtained, giving the best number of neurons in the hidden layer as well as the corresponding \(\mathrm{NMAE}_{\%}\).

Table 10 ANN sizing procedure results for NAR models

The results obtained by the FFNN–NAR models are summarized in Table 11 and presented in Fig. 11; they agree with and confirm, again, those reported for the LSSVR–NAR case. All NAR models demonstrate superiority compared to NARX models. Again, the FFNN results slightly surpass those obtained by the LSSVR, but the picture changes if we take execution time into account: the procedure takes a long time (almost 17 min) to find the best parameters of an FFNN model, while the LSSVR trained with SMO takes only 1 min and 20 s. Table 12 summarizes the results of all FFNN-based models. As for the FFNN–NAR activation function, the results show that the radial basis activation function gives the best results in 50% of cases, with a coefficient of determination \(R^{2}=92.03\%\) given by \(\mathrm{MOD}_{8}\).

Table 11 FFNN–NAR models Results
Fig. 11 PV power forecasts obtained by the FFNN–NAR models

Table 12 FFNN all results
Table 13 MPR and persistent versus LSSVR and FFNN

Benchmark models

In this subsection, we present the results obtained by the two statistical benchmark models: the multivariate polynomial regression model and the persistent model. The persistent model is regarded as a naive predictor (today equals tomorrow), and it is the most cost-effective forecasting model. It assumes that the conditions will not change; as a result, the PV power at time \(t+1\) will be equal to that at time t. In spite of its simplicity, it provides a good benchmark against more sophisticated models and is still the most popular reference model in short-term PV power forecasting. On the other hand, the Multivariate Polynomial Regression Model (MPR) is a more sophisticated model. It is an extension of ordinary polynomial regression, in which the relationship between the input variables x and the output variable y is modeled as an \(n^{\mathrm{th}}\) degree polynomial in x. Equation (42) presents an example of a second-order multiple polynomial regression:

$$\begin{aligned} y=\beta _{0}+\beta _{1} x_{1}+\beta _{2} x_{2}+\beta _{11} x_{1}^{2}+\beta _{22} x_{2}^{2}+\beta _{12} x_{1} x_{2}+\varepsilon . \end{aligned}$$
(42)

This can again be represented in matrix form as:

$$\begin{aligned} Y= \beta X + \varepsilon , \end{aligned}$$
(43)

where \(\beta\) is the matrix of weights, X is the matrix of input parameters and Y is the output. The two models are used as benchmarks to check the performance of the developed models and thereby to demonstrate the effectiveness of these models in short-term PV power forecasting. The simulation results are given in Table 13. It is observed that the MPR model gives results close to those of the FFNN and LSSVR models, while the persistent model gives an \(\mathrm{MSE} = 0.0092\). We use the persistent model performance, with an \(\mathrm{MSE} = 0.0092\), to identify the models that deserve to be used: since they are more complex, they have to give better results than the persistent model. Thus, the models that can be described as interesting are those that outperform the persistent model. According to this constraint, we observe that the LSSVR and FFNN are ranked first because their results outperform those of the persistent model, except for the models LSSVR–\(\mathrm{MOD}_{1}\) and FFNN–\(\mathrm{MOD}_{1}\). The MPR model also surpasses the persistent model in 70% of cases, but it remains less efficient than the LSSVR and FFNN. A simple comparison of the results shows the superiority of the two approaches, FFNN and LSSVR, in almost all models. So far the FFNN is the best, albeit with a very high calculation time. The MPR and the LSSVR take almost the same calculation time, but the LSSVR is more accurate. Another observation is that the persistent model gives excellent results in stable weather conditions (clear day), as can be seen in Fig. 12, whereas nonlinear models such as LSSVR and FFNN are more efficient under unstable weather conditions; Fig. 13, which presents the results obtained for a cloudy day, demonstrates this observation.
According to this comparison, we can conclude that the models proposed for PV forecasting show a superiority over the benchmark models, especially the NAR models. These results are very interesting given the importance of short-term forecasts for integrating photovoltaic sources into the energy mix and for guaranteeing grid stability.
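The persistence benchmark and the selection rule used above (a model is retained only if it beats the persistent model) can be sketched in a few lines of Python. The function names and the short power series are hypothetical; the two MSE values in the final comparison are the ones reported in the text for the persistent model and for LSSVR–\(\mathrm{MOD}_{10}\).

```python
from typing import List, Sequence


def persistence_forecast(power: Sequence[float]) -> List[float]:
    """Persistent model: the forecast for time t+1 is the measured power at time t.

    The returned list is aligned with the measured values power[1:].
    """
    return list(power[:-1])


def mean_squared_error(measured: Sequence[float], forecast: Sequence[float]) -> float:
    """MSE between measured values and forecasts of equal length."""
    return sum((m - f) ** 2 for m, f in zip(measured, forecast)) / len(measured)


def beats_persistence(model_mse: float, persistence_mse: float) -> bool:
    """A more complex model is worth keeping only if its MSE is lower."""
    return model_mse < persistence_mse


# Hypothetical scaled PV power samples.
power = [0.0, 0.2, 0.5, 0.4]
forecast = persistence_forecast(power)
mse_persistent = mean_squared_error(power[1:], forecast)

# Values reported in the text: persistent model 0.0092, LSSVR-MOD_10 0.0069.
keep_lssvr_mod10 = beats_persistence(0.0069, 0.0092)
```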

Fig. 12 (a) PV power forecasts obtained by all best models and (b) forecasting error

Fig. 13 PV power forecasts obtained by all best models in a cloudy day

Conclusion

In the present contribution, offline models have been proposed that allow us to forecast short-term PV power using only information collected from the local monitoring system, i.e., without the need for weather forecasts. The offline models are of interest to grid operators as well as to individuals because the majority of existing PV power forecasting models use NWP, and access to NWP information is not available to everyone, especially for isolated installations. To study the behavior of each model and each algorithm, we combined the simplicity of time series models (AR and ARX) with the non-linearity of statistical learning models (FFNN and LSSVR); we also used different combinations of collected data in order to analyze the influence of the different locally collected data on forecast accuracy.

During the simulations, it was observed that the FFNN gives different results each time the simulation is repeated, which is due to the initialization problem, whereas the LSSVR gives a unique solution, which is the optimal one (as long as the training database is sufficient). To improve the FFNN results and to avoid over-fitting and local-minima problems during FFNN learning, a sizing procedure has been proposed; using the ANN sizing procedure, we observe that the performance of the FFNN models improved. The choice of the right activation function is an important design issue. According to the simulation results, the tangent sigmoid function (Tansig) gives the best results in 70% of the cases, and even though the logistic sigmoid (Logsig) and radial basis (RBF) functions outperform the Tansig in 30% of the cases, the Tansig remains the best activation function in terms of overall performance. Prior to training the proposed models, the data used were subjected to a pre-processing procedure that consists of filling the gaps using linear interpolation and scaling the data between zero and one.

The comparison between all FFNN-based models and LSSVR-based models indicates that the FFNN algorithm slightly outperforms the LSSVR algorithm. The picture changes, however, if we take execution time into account: the procedure takes a long time (almost 17 min) to find the best parameters of an FFNN model, while the LSSVR trained with SMO takes only 1 min and 20 s.

To test the performance of the proposed models, the results obtained are compared with those of the persistent model and of a multivariate polynomial regression model (MPR) used as benchmarks. The comparison demonstrates the superiority of FFNN and LSSVR over the MPR and persistent models. Simulation results indicate that the FFNN–\(\mathrm{MOD}_{9}\) with RBF activation function and the LSSVR–\(\mathrm{MOD}_{10}\) give the best results and outperform all other models, with an \(\mathrm{MSE} = 0.0065\) and \(\mathrm{MSE} = 0.0069\), respectively. The results of the persistent technique and of the statistical techniques (MPR, FFNN and LSSVR) also offer evidence of the advantage of using non-linear forecasting models over a trivial forecast. So as not to underestimate the persistent model, we underline that it gives excellent results in stable weather conditions (clear day), whereas nonlinear models, in addition to performing well in stable conditions, are more efficient under unstable weather conditions.

A direct comparison of our results with other works would not be fair, since the data used and the weather conditions change from one country to another and from one installation to another. What we can do is compare our main findings with those of another work; to do so, we choose [1] as the main reference. According to the simulation results, the NAR models give better results than the NARX models. These results seem to contradict those of [1], in which ARX models outperform the AR models. The difference is that the authors of [1] use NWP of global solar irradiation as input for the NARX model, whereas in our work we avoid NWP parameters and use only locally collected data. Moreover, the present contribution demonstrates that using past photovoltaic power production as input improves the accuracy of forecasting models, and that past generated power data alone are enough to obtain accurate and acceptable short-term PV power forecasts. This result confirms the findings of [1], where the authors indicate that solar power is the most important input for forecasts with a horizon shorter than 2 h.

We must report that the length of the data used in this work does not allow the proposed models to adapt to all types of weather conditions; this causes a decrease in the performance of our models, especially on overcast days. Also, to increase the forecast horizon, locally collected data alone are not sufficient; in this case, the use of weather forecasts becomes necessary. These issues will be addressed in future work.