Short-term nonlinear autoregressive photovoltaic power forecasting using statistical learning approaches and in-situ observations

Owing to its low total cost of production, photovoltaic energy constitutes an important part of the renewable energy installed worldwide. However, photovoltaic energy is volatile by nature because it depends on weather conditions, which makes its integration, control and exploitation difficult for grid operators. In the traditional grid architecture, system operators have accumulated enough experience to determine, using statistical tools, how much operating reserve is required to maintain system reliability. With the introduction of renewable energy (wind and photovoltaic), however, the grid structure has changed, and to maintain grid stability it is becoming essential to know the state and production of renewable sources so that they can be combined with other, less variable and more predictable sources to satisfy energy demand. Renewable energy forecasting is therefore a straightforward way to integrate this kind of energy safely into the current electric grid. This holds especially for photovoltaic power forecasting, which is still at a relatively early stage compared with wind power forecasting, which has reached a relatively mature stage. The goal of this work is, first, to present a short-term offline forecasting model that uses only data collected in situ (locally). Second, the performance of several pure nonlinear autoregressive models is compared with that of nonlinear autoregressive models with exogenous inputs. For this purpose, two well-known statistical learning techniques, namely the feed-forward neural network and least squares support vector regression, are used. To test the performance of the models, the results are compared with those of two benchmark models: the persistent model and a multivariate polynomial regression model.


Introduction
Photovoltaic (PV) power has proven to be one of the most promising renewable energies in recent years. The field has witnessed a significant increase in investment; production capacity reached 227 GW in 2015 compared to 5.1 GW in 2005. But with the emergence of renewable energy as a necessary alternative to fossil energy, new challenges have emerged, which require both producers and managers to change control methods, distribution methods and all the related logistics. The main challenge is still the safe integration of renewable energy into the existing grid; it is difficult because of the volatile and uncertain nature of renewable power, which is generally caused by weather conditions. In traditional grid management, the grid operator must maintain the balance between supply and demand at all times to avoid grid security problems and economic losses. The grid operator uses a schedule to ensure that power plants produce the right amount of electricity at the right time to meet electric demand consistently and reliably. Nowadays, the energy mix has changed, giving more room to renewable sources, which has changed the structure of the power grid and all traditional control and scheduling procedures. Recently, photovoltaic power has begun to gain ground over other renewables because of its lower total cost of production. But from a grid management point of view, solar generation variability, generally caused by clouds, can make it more difficult for the grid operator to predict how much additional electric generation will be required to ensure the balance between supply and demand. For that reason, renewable power forecasting has become a key means of handling renewable energy efficiently in the power grid, and it must be properly accounted for in the complex decision-making processes required to balance supply and demand in the power system. Renewable power forecasting is now a key activity for a number of reasons: it is used for monitoring the performance of the plant, detecting anomalies and faults, making reliable dispatching plans for grid operators, supporting operation and maintenance scheduling, and so on.
In recent years, many research works have addressed the problem of PV power forecasting. The two main challenges of PV forecasting (which are also the cause of the poor penetration rate of PV systems) are variability and uncertainty: the output of PV modules varies at all time scales, and this variability is itself difficult to predict, which in turn makes the PV time series difficult to predict, as shown in [36]. According to the state of the art, PV power forecasting models can be divided into three types: physical models, statistical models and hybrid models (Fig. 1). Physical models [14] are mathematical models based on a physical analysis of the process being studied; such a model can contain a limited number of adjustable parameters, which have a physical meaning. In the case of photovoltaics, physical modeling uses mathematical equations that describe all the physical phenomena governing PV conversion. Statistical models are used when there is not enough knowledge and information about the process and the parameters that influence it. Statistical modeling includes time series [1] and statistical learning models. Time series modeling aims to collect and study the past observations of a time series in order to fit a model that describes their internal structure (such as autocorrelation, trend or seasonality); the fitted model is then used to forecast future values of the series. Among the most used models are the AR, ARX and ARIMA models. A statistical learning model, also called a black-box model, is established from a set of measured variables $X_k$ (inputs) and a set of measurements $Y_k$ (outputs). We suppose that there is a relation between the $X_k$ and the $Y_k$, and we try to determine a mathematical form of this relation; that is, we try to establish a model of the process from the available measurements.
Among statistical learning tools, the artificial neural network is the most used technique because of its performance, proven over time. For PV power forecasting, different neural network architectures have been used with a wide choice of input parameters. Among them we can cite the Elman neural network (ENN), the generalized regression neural network (GRNN) [27], the radial basis function neural network (RBFNN), the dynamic recurrent neural network (DRNN) [20] and the feed-forward neural network (FFNN), which, in most cases, gives the best results [20,27]. In the same category as neural networks, another statistical learning technique, the support vector machine (SVM), is gaining success thanks to its generalization ability, demonstrated in several case studies; it has also been used in many solar power forecasting studies, both for classification [29,30] and for regression [7,36]. Hybrid models are a class of models that can be constructed from any combination of physical and statistical models; they can combine physical and statistical approaches [34] or be purely statistical, for example combining SOM and RBFNN [3], SOM, SVR and fuzzy inference [35], or wavelet transform and RBFNN [17].
The choice of the appropriate technique depends on several parameters, and in general there is no fixed rule for making it. According to the current state of the art [26], the choice depends mainly on the forecast horizon: physical models are used for the medium term, statistical models for the very short and short terms, and hybrid models for the medium and long terms. Input parameters are also a very important factor that can change the final results, and different collections of inputs have been used in the literature. Research has shown that the main variables influencing PV power are global horizontal irradiation (GHI) at the PV generator surface [30], plate temperature [30] and aerosol index [16], but this does not exclude other parameters such as numerical weather predictions (NWP) [20,27], meteorological measurements made at ground stations, satellite measurements of GHI and cloud coverage [27,30], PV power measurements [7,27], and variables related to solar geometry and time (zenith angle, light duration) [30].
In this work we combine the characteristics of time series models and statistical learning models in order to forecast short-term photovoltaic power. This combination is beneficial because it merges the simplicity of time series models with the nonlinear character of black-box models; the result of this fusion is a nonlinear time series model. This study allows us, first, to assess the performance of two supervised machine learning techniques for intraday PV power forecasting: the feed-forward neural network (FFNN) and least squares support vector regression (LSSVR). Second, it studies the influence and sufficiency of in-situ collected data as input parameters to the developed models. For this purpose, we compared the performance of several models in order to find the best offline model for PV power forecasting; by offline we designate a model capable of giving accurate short-term forecasts without the need for weather forecasts. This is of interest because the majority of existing models use meteorological parameters to forecast PV power, especially forecast parameters obtained from numerical weather prediction (NWP) systems, such as solar irradiation forecasts [1,36], ambient temperature forecasts [8], humidity [14,27], cloud index [4,30], wind speed [3,16] and probability of precipitation [13,35]. The problem is that access to NWP information is not available to everyone at all times, especially for isolated installations. Offline models that use only locally collected information to forecast PV power are therefore of great importance for grid operators as well as for individuals who do not have access to weather data and forecasts. To assess the performance of our models, we compare them with two usual benchmark models, the persistent model and a multivariate polynomial model. This paper is structured as follows. Section 3 gives a brief look at related work; in Sect. 4, we introduce the statistical learning techniques used. Section 6 presents the model development strategy, while result analysis and discussion are presented in Sect. 7, followed by a conclusion.

Related work
Photovoltaic power forecasting increasingly attracts the attention of researchers, and in the last few years several PV power forecasting models have been developed. In [1], an ARX model was used to forecast PV power output 6 h ahead, using historical PV power output and forecast irradiation as model inputs. In the same vein, [4] uses a recurrent neural network to forecast PV power 24 h ahead, using historical PV power and forecast temperature. In [17], wavelet transformation and a radial basis function neural network (RBFNN) were combined to generate a one-hour-ahead PV power forecast; the RBFNN inputs included past PV power output, irradiation and temperature. The authors in [27] adopt a hybrid modeling approach by applying stepwise regression to select meteorological parameters that are strongly correlated with solar power; these variables were used to construct an FFNN model for 24-h ahead PV power forecasting. This model outperforms five other models. The authors underline that average solar irradiation and average humidity are the two most significant parameters for forecasting PV power output. In [3], the authors analyze the performance of a 24-h ahead PV power forecasting tool based on a multilayer perceptron (MLP) neural network trained with the error back propagation (EBP) procedure; three types of inputs were used: weather forecasts provided by meteorological services, the geographical coordinates of the site, and the date and time needed to determine the correct position. They propose a procedure to validate the correctness of the data and highlight that the performance of the method is strictly related to the historical data pre-processing step and to the accuracy of the weather forecasts.
Another interesting approach, based on weather type classification and similar-day detection to forecast PV power for a horizon of up to one day, is used in [8], where the authors use a recurrent neural network with structural elements for 24-h ahead PV power output forecasting. The inputs include clear-sky irradiation and the forecast weather type for the forecast days. In [31], the historical power output is classified into several weather types using forecast irradiation, total cloud cover and low cloud cover as selection parameters; the authors use an RBFNN to produce PV power forecasts with a 24-h ahead horizon. In [12], forecasts of high, medium and low temperatures are used to classify historical PV power output into three weather types, after which three feed-forward neural networks (FFNN) are employed to generate 24-h ahead forecasts. In [35], the authors present a hybrid method to forecast PV power output 1 day ahead; the proposed method comprises three stages: a data classification stage, a training stage and a forecasting stage. The classification stage is developed using a self-organizing map (SOM) and learning vector quantization (LVQ); the objective is to classify the historical PV power data into five weather types according to the verbal weather forecast of the TCWB (Taiwan Central Weather Bureau). In the second stage, support vector regression (SVR) is used to construct five forecasting models, one for each weather type. In the last stage, a fuzzy inference algorithm is used to select an appropriate forecasting model to achieve more accurate results. The work presented in [13] proposes a hybrid model for one-day ahead hourly PV power forecasting; this work is an extension of [35]. The proposed method comprises three stages: a data classification stage, a training stage and a forecast updating stage.
The classification stage is developed using the fuzzy K-means clustering algorithm; the objective is to classify the historical PV power data into five weather types according to the verbal weather forecast of the TCWB (Taiwan Central Weather Bureau). In the second stage, an RBFNN is used to construct five forecasting models, one for each weather type, and a fuzzy inference algorithm is used to select an appropriate forecasting model. In the last stage, the forecasts are updated every 3 h to cope with possible fluctuations of PV power.
As can be seen from this brief state of the art, the majority of existing models use predicted inputs to forecast PV power, especially inputs obtained from NWP systems. Access to NWP information is not available to everyone at all times, especially in the African region. For this reason, offline models that use only past information to forecast PV power are of great importance. From this perspective, the goal of this work is, first, to present a short-term offline forecasting model that uses only in-situ collected data. Second, the performance of several pure nonlinear autoregressive models is compared with that of nonlinear autoregressive models with exogenous inputs. To this end, two well-known statistical learning techniques, namely the feed-forward neural network (FFNN) and least squares support vector regression (LSSVR), are used.

Least squares support vector regression
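The least squares support vector machine (LSSVM) algorithm is a modified version of the classical support vector machine (SVM) used to solve classification problems. Because the formulation uses equality constraints, the solution is obtained by solving a set of linear equations instead of the quadratic programming problem of the classical SVM. Vapnik's SVM formulation [5] was modified in [31] into the following LSSVM optimization problem, which underlies nonlinear LSSVM training:

$$\min_{w,b,e} J(w,b,e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2 \qquad (1)$$

subject to the equality constraints

$$y_i \left[ w^T \varphi(x_i) + b \right] = 1 - e_i, \quad i = 1, \dots, N.$$

This formulation uses equality instead of inequality constraints and takes into account a squared error with a regularization term, similar to ridge regression. The solution is obtained after constructing the Lagrangian

$$\mathcal{L}(w,b,e;\alpha) = J(w,b,e) - \sum_{i=1}^{N} \alpha_i \left\{ y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i \right\},$$

where $\alpha_i \in \mathbb{R}$ are Lagrange multipliers, which can be positive or negative because the constraints are equalities. From the conditions for optimality, one obtains the Karush-Kuhn-Tucker (KKT) system:

$$\frac{\partial \mathcal{L}}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \qquad \frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0,$$

$$\frac{\partial \mathcal{L}}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \qquad \frac{\partial \mathcal{L}}{\partial \alpha_i} = 0 \Rightarrow y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i = 0.$$

Note that sparseness is lost, which is clear from the condition $\alpha_i = \gamma e_i$. As in the standard SVM, we calculate neither $w$ nor $\varphi(x_i)$. Therefore, we eliminate $w$ and $e$, which yields, according to [31],

$$\begin{bmatrix} 0 & y^T \\ y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_N \end{bmatrix}, \qquad \Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j).$$

For the kernel function $K(\cdot, \cdot)$, here again, one typically has the following choices:

$$K(x, x_i) = x_i^T x \ \text{(linear)}, \qquad K(x, x_i) = \left( 1 + x_i^T x / c \right)^d \ \text{(polynomial)}, \qquad K(x, x_i) = \exp\left( -\| x - x_i \|^2 / \sigma^2 \right) \ \text{(RBF)},$$

where $d$, $c$ and $\sigma$ are constants. In the case of least squares support vector regression (LSSVR), the LSSVM formulation changes slightly. We look for the best regression function of the form

$$y(x) = w^T \varphi(x) + b,$$

and the optimization problem is given by

$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2$$

subject to the equality constraints

$$y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \dots, N.$$

The resulting dual problem in the case of regression is

$$\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \qquad \Omega_{ij} = K(x_i, x_j),$$

and the final model is

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b.$$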
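To make the regression case concrete, the following is a minimal pure-Python sketch of LSSVR training as the solution of one linear system, with an RBF kernel of the form exp(-||x - z||^2 / sigma^2). The function names (`rbf_kernel`, `solve_linear_system`, `lssvr_fit`, `lssvr_predict`) and the toy values of gamma and sigma^2 are illustrative assumptions; this is not the LS-SVMlab implementation used in the paper.

```python
import math

def rbf_kernel(x, z, sigma2):
    """RBF kernel exp(-||x - z||^2 / sigma2) between two feature lists."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)

def solve_linear_system(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u

def lssvr_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge-like term on the diagonal
        A.append(row)
    sol = solve_linear_system(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvr_predict(x, X, b, alpha, sigma2):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X))

# Toy usage: fit a sine curve on five points (hypothetical data).
X_train = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y_train = [math.sin(v[0]) for v in X_train]
b, alpha = lssvr_fit(X_train, y_train, gamma=100.0, sigma2=1.0)
print(lssvr_predict([0.75], X_train, b, alpha, sigma2=1.0))
```

Because every training point receives a nonzero multiplier, the prediction sums over all samples, which illustrates the loss of sparseness noted above.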

Feed forward neural network
A neural network is a black box that directly learns the internal relations of an unknown system, without the need to postulate functions describing cause-and-effect relationships. It has been widely used in PV power forecasting [16,17,20,27]. A neural network has the power of a universal approximator [12,21], i.e., it can realize an arbitrary mapping of one vector space onto another vector space. The main advantage of neural networks is that they are able to use some a priori unknown information hidden in the data (but they are not able to extract it). The process of 'capturing' the unknown information is called 'learning' or 'training' of the neural network. In mathematical terms, to learn means to adjust the weight coefficients in such a way that some conditions are fulfilled [32]. To define a neural network, we first introduce the static linear model defined as

$$g(x, w) = \sum_{i=1}^{P} w_i f_i(x),$$

where the vector $w$ is the vector of the parameters of the model, and the functions $f_i(x)$ are non-parameterized functions of the variable $x$. Neural networks belong to the category of models that are nonlinear in their parameters. The most common form of static neural network is a simple extension of the previous relation,

$$g(x, w) = \sum_{i=1}^{P} w_i f_i(x, w'),$$

where the functions $f_i$ are now themselves parameterized by a vector $w'$; this form is presented in Fig. 2. A neuron is a bounded, parameterized nonlinear function. The variables on which the neuron operates are often called the inputs of the neuron, and the value of the function its output [19,22]. The parameters $w_i$ are called 'weights' or 'synaptic weights', because of the biological inspiration of neural networks. The output of a neuron is a nonlinear function of a combination of the variables $x_i$ weighted by the parameters $w_i$. The parameter $w_0$ is a constant term called the 'bias'. The function $f$ is called the 'activation function'. The output of a neuron is given by

$$y = f\left( w_0 + \sum_{i=1}^{n} w_i x_i \right).$$

A neuron thus realizes a nonlinear function.
The advantage of neurons lies in the properties that result from their association in networks, i.e., from the composition of the nonlinear functions computed by each neuron. There is a large variety of topologies for this kind of network; nevertheless, the most used topology is the so-called multi-layer perceptron (MLP), an example of which is represented in Fig. 3. In this type of neural network, the first layer is called the 'input layer' and the last layer is called the 'output layer'. The layers in between are hidden layers; this network carries out $N_c$ algebraic functions of the $N$ variables of the network.
The MLP is mathematically represented by the expression

$$g(x, w) = w_2 \cdot f(W_1 x),$$

where $x$ is the vector of variables (of dimension $n + 1$), $w_2$ is the vector of weights of the second layer (of dimension $N_c + 1$), $W_1$ is the matrix of weights of the first layer (of dimension $(N_c + 1, n + 1)$), and $f(\cdot)$ is the vector of functions computed by the bias and the hidden neurons. By convention, the parameter $w_{ij}$ designates the weight of the connection from neuron $j$ to neuron $i$. The model $g(x, w)$ is a linear function of the parameters of the last layer, and it is a nonlinear function of the parameters of the first layer of connections.
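As an illustration of the neuron equation and of the single-hidden-layer MLP, here is a minimal pure-Python sketch of a forward pass. It keeps the biases as separate arguments rather than folding them into an augmented input vector, and the names `neuron` and `mlp_forward`, the tanh activation and the numeric weights are assumptions chosen for the example.

```python
import math

def neuron(x, w, w0, f=math.tanh):
    """Output of one neuron: f(w0 + sum_i w_i * x_i)."""
    return f(w0 + sum(wi * xi for wi, xi in zip(w, x)))

def mlp_forward(x, W1, b1, w2, b2, f=math.tanh):
    """Single-hidden-layer MLP with a linear output neuron.

    W1[i] and b1[i] are the weights and bias of hidden neuron i;
    w2 and b2 are the weights and bias of the output neuron.
    """
    hidden = [neuron(x, W1[i], b1[i], f) for i in range(len(W1))]
    return b2 + sum(w * h for w, h in zip(w2, hidden))

# Toy usage: 2 inputs, 3 hidden neurons, 1 output (hypothetical weights).
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b1 = [0.0, 0.1, -0.1]
w2 = [1.0, -0.5, 0.25]
print(mlp_forward([0.6, 0.2], W1, b1, w2, b2=0.05))
```

The output is linear in `w2` and `b2` but nonlinear in `W1` and `b1`, which mirrors the distinction made in the text between the two layers of parameters.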
In this paper we used a feed-forward neural network (FFNN), which is a type of MLP whose network of neurons contains no loops. In the FFNN, information circulates from the inputs towards the outputs without 'feedback'. It can be represented by an acyclic graph whose nodes are neurons and whose edges are the 'connections' between them. Training the network means adjusting its parameters. There are two main types of training process: supervised and unsupervised training. In our case, we used a supervised learning process, which varies the threshold coefficients $w_{i0}$ ('bias') and the weight coefficients $w_{ij}$ to minimize the sum of the squared errors. This is accomplished by minimizing the objective function

$$E = \sum_{o} \frac{1}{2} \left( x_o - \hat{x}_o \right)^2,$$

where $x_o$ and $\hat{x}_o$ are vectors composed of the computed and required values of the output neurons, and the summation runs over all output neurons $o$ [12]. Training begins with arbitrary values of the weights; the network uses a training algorithm and a set of training data to adjust the weights in the direction that reduces the error, until the optimal set of values is reached. The hope is that the neural network so designed will generalize. A network is said to generalize well when it learns to associate input patterns with output patterns correctly, even for input-output patterns never used in the training stage [12].

Platform and data
The data used in this research work were collected from a hybrid platform located at the Moroccan School of and a 2.5 kW Darrieus wind turbine (apple-wind AW). The platform also contains a small meteorological station based on the SMA Sunny SensorBox, which measures global horizontal irradiance (GHI), ambient and module temperature, and wind speed. The meteorological parameters are recorded every 15 min; all measurements are stored via an SMA WebBox. A detailed description of the platform is given in [9,10,33]. The characteristics of the PV plant are presented in Tables 1 and 2.
The database used consists of 6 months of records, from 01 July to 31 December 2014. The records of 5 months, from July to October, contain missing data, while the records of December are intact. To deal with the problem of missing data, we use a gap-filling procedure. In general there is no definitive guide to replacing missing data in time series [2]. In the case of photovoltaic time series, choosing the appropriate method depends on different factors such as the length of the existing data, the availability of reliable meteorological data and the climate of the location [24]. Conventional interpolation methods are still the most used [2,23,28] because of their simplicity, but they are not always the most efficient. Many other methods are presented in the literature, such as regression, ARIMA, spline and polynomial fitting [2], more sophisticated methods such as adaptive interpolation schemes (AISs) [2], the temperature-based approach (TBA), singular spectrum analysis (SSA) and statistically adjusted solar radiation (SASR) methods [24], or special methods such as the METSTAT (meteorological/statistical) solar radiation model [18]. Statistical learning approaches can also be used in this context, as in [15], where the authors adopt a support vector machine (SVM) to obtain a nonlinear weather-type classifier based on humidity and temperature as input variables; the SVM is used to choose days with the same season type, and a missing value is imputed by the average over a specific set of those similar days. In this work we used conventional interpolation to fill gaps in the solar and PV data; we chose this method because of its simplicity and because the gap length does not exceed 3 h.
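The gap-filling step can be sketched as follows, assuming linear interpolation between the nearest valid neighbours and missing samples marked as `None`. With 15-min records, a 3 h gap corresponds to 12 samples, hence the default `max_gap=12`; the function name and the choice to leave longer or edge gaps untouched are illustrative assumptions, not a description of the authors' code.

```python
def fill_gaps_linear(series, max_gap=12):
    """Fill runs of None by linear interpolation between valid neighbours.

    Runs longer than max_gap samples, or runs touching either end of the
    series, are left as None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap (or n)
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # no valid neighbour on one side, or gap too long
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            filled[k] = left + frac * (right - left)
    return filled

# Toy usage: two missing 15-min samples between 0.0 and 3.0 (hypothetical).
print(fill_gaps_linear([0.0, None, None, 3.0]))
```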

Statistical metrics
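To evaluate model accuracy, we must choose the right performance metrics, because modeling is an iterative process that consists of going back and forth between the output of the model and the desired value. Measuring the forecasting error is important for validating the model, so it is necessary to use performance criteria that measure how close the outputs (forecasts) are to the eventual outcomes. For this purpose, well-known statistical metrics are used: the mean absolute error (MAE), the mean bias error (MBE), the mean squared error (MSE), the root mean squared error (RMSE) and the R-squared score, also called the coefficient of determination ($R^2$). With $P_{m,i}$ the measured value, $P_{f,i}$ the forecast value, $\bar{P}_m$ the mean of the measured values and $N$ the number of samples, these metrics are defined as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| P_{f,i} - P_{m,i} \right|, \qquad \mathrm{MBE} = \frac{1}{N} \sum_{i=1}^{N} \left( P_{f,i} - P_{m,i} \right),$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( P_{f,i} - P_{m,i} \right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( P_{m,i} - P_{f,i} \right)^2}{\sum_{i=1}^{N} \left( P_{m,i} - \bar{P}_m \right)^2}.$$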
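For readers who want to reproduce these scores, the following minimal sketch computes the five metrics with their standard textbook definitions. The function name `forecast_metrics` is hypothetical, and the sign convention for the MBE (forecast minus measured, so a positive value means over-forecasting) is an assumption.

```python
import math

def forecast_metrics(measured, forecast):
    """Return MAE, MBE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(measured)
    errors = [f - m for m, f in zip(measured, forecast)]  # forecast - measured
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MBE": mbe, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy usage with hypothetical power values in kW.
print(forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

Note that $R^2$ is undefined when the measured series is constant, since the denominator is then zero.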

Training procedures
In this work, we adopted a learning procedure composed of three stages: a pre-processing stage, a training and validation stage, and finally a test stage. In our case, the pre-processing stage consists of a gap-filling procedure using linear interpolation and a scaling procedure that scales the data between zero and one. In the training stage we tried to find the best settings of the FFNN and LSSVR algorithms; for this purpose, we used a general workflow composed of a training algorithm combined with a 10-fold cross-validation procedure, with the mean squared error (MSE) as the selection criterion. In the test stage, we use data that have not been used in the training stage to test the performance of the model. This procedure was adapted to the different algorithms and applied to all models. For LSSVR, in the training and validation stage, we used the sequential minimal optimization (SMO) algorithm to find the parameters of the radial basis function (RBF), used as the kernel function, namely the parameter $\gamma$ from (1) and $\sigma^2$ from (9). The best model with the best parameters is used to calculate the output forecasts. In the test stage the algorithm is fed with new data; the estimated outputs are compared with the real outputs, and performance metrics are calculated to evaluate model accuracy. The best model is the one that gives the minimum forecasting error; Fig. 4 summarizes the procedure. All simulations were done in the MATLAB R2015b environment, using the standard library LS-SVMlab [25]. The results obtained are discussed in Sect. 7.
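The scaling and cross-validation steps can be sketched as below. This is a minimal sketch under stated assumptions: min-max scaling to [0, 1], contiguous (unshuffled) folds, and a generic `fit`/`predict` pair supplied by the caller; whether the original study shuffled the samples is not stated, and all function names are hypothetical.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return the (min, max) used.

    Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def cross_val_mse(fit, predict, X, y, k=10):
    """Average validation MSE over k folds for a given fit/predict pair."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in val]
        scores.append(sum((p - y[i]) ** 2 for p, i in zip(preds, val)) / len(val))
    return sum(scores) / len(scores)

# Toy usage: a "model" that always predicts the training mean (hypothetical).
mean_fit = lambda X, y: sum(y) / len(y)
mean_predict = lambda model, x: model
X_toy = [[float(i)] for i in range(20)]
y_toy, _, _ = min_max_scale([float(i) for i in range(20)])
print(cross_val_mse(mean_fit, mean_predict, X_toy, y_toy, k=10))
```

Hyperparameter selection then amounts to calling `cross_val_mse` for each candidate setting and keeping the one with the lowest score.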
In the case of the FFNN, training the neural network amounts to adjusting the synaptic weights $w_i$, choosing the number of hidden layers and the number of neurons in each hidden layer, and choosing the right activation function. To find the optimal values of the synaptic weights, we used the Levenberg-Marquardt (LM) algorithm, which blends gradient descent with the Gauss-Newton method and improves on classical gradient descent. For the number of hidden layers, we decided to use a single-hidden-layer FFNN since it is a universal approximator [12]; the FFNN therefore consists of an input layer, a hidden layer and an output layer. The number of neurons in the input layer depends on the number of parameters used in each model, while the output layer consists of a single neuron with a linear activation function. To find the best number of neurons in the hidden layer, a sensitivity analysis was performed using the procedure reported in [11]. The final setting to choose is the activation function. This choice is an important design issue: the activation function is a vital part of the neural network, providing its nonlinear mapping capability and helping to achieve fast convergence and good generalization performance. To choose the right activation function, we assess, for each developed model, the performance of the three most used activation functions in the FFNN architecture: the radial basis function (RBF), the tangent sigmoid function (Tansig) and the logistic sigmoid function (Logsig).
To carry out this simulation, we considered a number of hidden neurons varying between 1 and 160, and we used the MATLAB Neural Network Toolbox under the MATLAB R2015b environment. The steps we followed are:
1. Choose the number of neurons in the hidden layer, $p$ ($1 \le p \le 160$).
2. Initialize the synaptic weights $w_i$ randomly.
3. Train the FFNN with those settings using the Levenberg-Marquardt algorithm and the 10-fold cross-validation procedure.
4. Calculate the $n_t$ forecasts (estimates) obtained after training using validation data.
5. Calculate the normalized mean absolute error $\mathrm{NMAE}_{\%}$ for each forecast:
$$\mathrm{NMAE}_{\%} = \frac{1}{N \cdot C} \sum_{i=1}^{N} \left| P_{m,i} - P_{f,i} \right| \cdot 100,$$
where $P_m$ is the measured value of the output power, $P_f$ the estimated one, $C$ the net capacity of the plant and $N$ the number of samples.
6. Repeat from step 2 a chosen number of times (in this study the initialization was repeated 100 times).
7. Calculate the sample mean $\overline{\mathrm{NMAE}}_p$ as an estimator of all possible $\mathrm{NMAE}_{\%}$ values:
$$\overline{\mathrm{NMAE}}_p = \frac{1}{n_t} \sum_{i=1}^{n_t} \mathrm{NMAE}_{i,p},$$
where $\mathrm{NMAE}_{i,p}$ is the $\mathrm{NMAE}_{\%}$ calculated for the $i$-th trial performed by the FFNN with the $p$-th settings.
8. Calculate the sample variance $S_p^2$ and the sample standard deviation $S_p$:
$$S_p^2 = \frac{1}{n_t - 1} \sum_{i=1}^{n_t} \left( \mathrm{NMAE}_{i,p} - \overline{\mathrm{NMAE}}_p \right)^2, \qquad S_p = \sqrt{S_p^2}.$$
9. Construct a confidence interval (CI) to help estimate the unknown population mean, defined as
$$\mathrm{CI} = \overline{\mathrm{NMAE}}_p \pm \mathrm{ME},$$
with ME a margin of error defined as
$$\mathrm{ME} = t \cdot \frac{S_p}{\sqrt{n_t}},$$
where $t$ is taken from Student's t distribution with $n_t - 1$ degrees of freedom.
10. After choosing the best settings according to the $\mathrm{NMAE}_{\%}$ score, retrain the FFNN and use the test data to evaluate its performance using the statistical metrics presented in Sect. 5. Figure 5 summarizes the procedure used to train the FFNN.
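The statistics used in the sensitivity analysis (NMAE, its sample mean and standard deviation over repeated trials, and the confidence interval) can be sketched as follows. The function names are hypothetical, and the critical value `t_value` is passed in by the caller: the default of 1.984 is the approximate two-sided 95% value of Student's t distribution for 99 degrees of freedom (100 trials), and the 95% level itself is an assumption, since the text does not state the confidence level used.

```python
import math
import statistics

def nmae_percent(measured, forecast, capacity):
    """Normalized mean absolute error in percent of the plant net capacity."""
    n = len(measured)
    abs_err = sum(abs(m - f) for m, f in zip(measured, forecast))
    return 100.0 * abs_err / (n * capacity)

def trial_summary(nmae_trials, t_value=1.984):
    """Mean, sample std and confidence interval of NMAE over repeated trials."""
    n_t = len(nmae_trials)
    mean = statistics.mean(nmae_trials)
    s = statistics.stdev(nmae_trials)  # sample std, divides by n_t - 1
    me = t_value * s / math.sqrt(n_t)  # margin of error
    return mean, s, (mean - me, mean + me)

# Toy usage: three trials for one hidden-layer size (hypothetical values).
print(nmae_percent([10.0, 20.0], [12.0, 18.0], capacity=100.0))
print(trial_summary([1.0, 2.0, 3.0], t_value=2.0))
```

Repeating `trial_summary` for each candidate number of hidden neurons gives one mean and one interval per setting, which can then be compared to pick the best one.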

Inputs selection
There are several statistical models that describe the photovoltaic phenomenon using different weather parameters that influence the conversion. According to the state of the art, the parameters that most influence PV forecasting are horizontal solar irradiation (Irr) [6,30], cell temperature (Tc) [6,30], ambient temperature [6] and aerosol index [16]. In this work, the data used to forecast photovoltaic power are past PV power generation (P), past measured global horizontal solar irradiation (Irr) and past measured photovoltaic module temperature (Tc), collected via the SMA WebBox. These data have the advantage of being simple to collect locally and of not requiring a considerable investment. Mathematically, finding a one-step photovoltaic forecasting model means finding a function of the form

$$P_{t+1} = F(X_t),$$

with $X$ the vector of input parameters, which can be a vector of exogenous parameters or a vector of pure autoregressive parameters. This gives rise to two types of models: a nonlinear autoregressive model with exogenous inputs and a pure nonlinear autoregressive model. In this study, we focused on the choice of $X$ and its influence on the accuracy of the model (the vector $X$ represents the locally, i.e. in-situ, measured parameters). We also tested different combinations of the three locally measured parameters: the solar irradiation (Irr), the temperature of the cells (Tc) and the PV power (P). To compare the accuracy of the models obtained, statistical metrics were used. In addition, to give more meaning to the results, we compared the performance of the models with two other statistical models used as benchmarks: the persistent model and a multivariate polynomial regression (MPR) model.
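For reference, the persistent benchmark is commonly understood as the forecast that simply repeats the last observation, i.e. the next-step power is assumed equal to the current power. The sketch below follows that common definition, which is an assumption here since the text does not spell it out; the function name is hypothetical.

```python
def persistence_forecast(power):
    """One-step persistence: the forecast for step t + 1 is the value at t.

    Returns forecasts aligned with power[1:], so forecast[k] targets power[k + 1].
    """
    return list(power[:-1])

# Toy usage with hypothetical 15-min power samples in kW.
power = [0.0, 0.4, 1.1, 1.6, 1.2]
forecast = persistence_forecast(power)
targets = power[1:]
mae = sum(abs(f - t) for f, t in zip(forecast, targets)) / len(targets)
print(forecast, mae)
```

Any candidate model should at least beat this baseline on the same test samples to justify its added complexity.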

Results analysis and discussion
In this section, we discuss the simulation results that describe the performance of several pure nonlinear autoregressive (NAR) models against that of nonlinear autoregressive models with exogenous inputs (NARX).

Nonlinear auto-regressive with exogenous inputs models (NARX)
In time series modeling, a nonlinear autoregressive model with exogenous inputs (NARX) is a nonlinear autoregressive model that has exogenous inputs. This type of model relates the current value of the output both to past values of the same output and to current and past values of the external inputs that influence the output of interest. Such a model can be formulated as

$$Y_t = F\left( Y_{t-1}, Y_{t-2}, \dots, U_t, U_{t-1}, U_{t-2}, \dots \right) + \varepsilon_t,$$

where $F$ is some nonlinear function, $Y$ is the variable of interest, $U$ is the exogenous variable and $\varepsilon_t$ is a forecasting error term. In this study we used combinations of three in-situ measured parameters: the global horizontal solar irradiation (Irr) and the temperature of the PV modules (Tc) as exogenous inputs $U$, and the PV power (P) as the variable of interest $Y$. The first functions to evaluate are:

$$\mathrm{MOD}_1: \quad P_{t+1} = F\left( Irr_t, Tc_t \right) \qquad (31)$$

$$\mathrm{MOD}_2: \quad P_{t+1} = F\left( Irr_t, Tc_t, P_t \right) \qquad (32)$$

$$\mathrm{MOD}_3: \quad P_{t+1} = F\left( Irr_t, Tc_t, P_t, P_{t-1} \right) \qquad (33)$$

$$\mathrm{MOD}_4: \quad P_{t+1} = F\left( Irr_t, Tc_t, Irr_{t-1}, Tc_{t-1}, P_t \right) \qquad (34)$$

We used the FFNN and LSSVR approaches to find the most accurate function $F$ given in Eqs. 31-36. The simulation results are presented hereafter.
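In practice, fitting such a function F requires turning the three measured series into input/target pairs with the chosen lags. The sketch below shows one way to do this, assuming a one-step-ahead target and equal-length, gap-free series; the function name `build_narx_samples` and its lag arguments are hypothetical, and the lag sets shown for each model follow the input combinations described in the results discussion.

```python
def build_narx_samples(irr, tc, p, exo_lags=(0,), power_lags=(0, 1)):
    """Build (X, y) pairs for a one-step-ahead NARX model.

    Each row of X holds Irr and Tc at t - lag for every lag in exo_lags,
    followed by P at t - lag for every lag in power_lags; y is P at t + 1.
    """
    max_lag = max(list(exo_lags) + list(power_lags) + [0])
    X, y = [], []
    for t in range(max_lag, len(p) - 1):
        row = []
        for lag in exo_lags:
            row += [irr[t - lag], tc[t - lag]]
        row += [p[t - lag] for lag in power_lags]
        X.append(row)
        y.append(p[t + 1])
    return X, y

# Toy usage with hypothetical series.
irr = [1.0, 2.0, 3.0, 4.0]
tc = [10.0, 20.0, 30.0, 40.0]
p = [100.0, 200.0, 300.0, 400.0]
X1, y1 = build_narx_samples(irr, tc, p, exo_lags=(0,), power_lags=())      # Irr_t, Tc_t
X3, y3 = build_narx_samples(irr, tc, p, exo_lags=(0,), power_lags=(0, 1))  # plus P_t, P_{t-1}
X4, y4 = build_narx_samples(irr, tc, p, exo_lags=(0, 1), power_lags=(0,))  # lagged Irr, Tc and P_t
print(X3, y3)
```

Passing an empty `exo_lags` tuple gives a pure NAR design matrix that contains only past power values.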

Least square support vector regression
Using the procedure described in Sect. 6 and the sequential minimal optimization (SMO) algorithm to find the parameters of the radial basis function (RBF) kernel as well as the parameters $\gamma$ and $\sigma^2$, the best parameters found for the LSSVR-NARX models are presented in Tables 3 and 4.
The obtained results show interesting characteristics. First, all the models give forecasts with sufficient precision.
The ranking of the models gives us an idea of the influence of the input parameters on the precision of the results. The simulation results show that the $Irr_t$ and $Tc_t$ data alone are not sufficient, since MOD 1 gives less precise results than the other models. According to the results of MOD 2, the combination of the parameters $Irr_t$, $Tc_t$ and $P_t$ gives better results than MOD 1. The results obtained by MOD 3 confirm this observation, since adding the parameter $P_{t-1}$ to MOD 2 increases the accuracy of the forecasts, whereas adding the $Irr_{t-1}$ and $Tc_{t-1}$ parameters to MOD 2 (which gives MOD 4) reduces the accuracy and gives results almost equivalent to those of MOD 1. As a first remark, the $Irr_t$ and $Tc_t$ parameters give good results, but adding historical power data greatly improves model accuracy. To examine this hypothesis, we created two further models that increase the number of auto-regressive inputs: adding the input $P_{t-2}$ to MOD 3 gives MOD 5, and further adding the input $P_{t-3}$ gives MOD 6. The two new models realize the functions given in Eqs. 35 and 36.
According to the simulation results summarized in Table 4, adding the past PV power values $P_{t-2}$ and $P_{t-3}$ does increase the accuracy of the offline forecasting model. The comparison of the different models underlines the importance of the parameters P in this kind of model. Figure 6 shows the graphical results of the models MOD 1 to MOD 6. From the results of the LSSVR-NARX models, the conclusion of this subsection is that, for offline short-term PV power forecasting, the most influential parameter is the past PV power ($P_t, \ldots, P_{t-i}$), while the parameters $Irr_t$ and $Tc_t$ add precision to the model forecasts. This is logical, since the parameters $P_{t-i}$ implicitly contain information about the photovoltaic phenomenon, such as the effect of irradiation, temperature and even geographical parameters.
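For readers unfamiliar with least squares support vector regression, the sketch below fits an LSSVR model with an RBF kernel by solving the standard LS-SVM linear system directly with NumPy. It is not the procedure used in this work, which relies on SMO and on the parameter search of Sect. 6. The function names, the kernel convention $\exp(-\lVert x - z\rVert^2 / (2\sigma^2))$ and the toy values of `gamma` and `sigma2` are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma2)) (assumed convention)."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def lssvr_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, sigma2, X_new):
    """Forecast f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# Toy regression problem (not PV data): one input, smooth target
x = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])
b, alpha = lssvr_fit(x, y, gamma=1e3, sigma2=0.05)
y_hat = lssvr_predict(x, b, alpha, 0.05, x)
print(float(np.max(np.abs(y_hat - y))))
```

Because the fit reduces to one linear system, a given pair of hyperparameters always yields the same solution, which matches the uniqueness property noted for LSSVR in the conclusion.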

Feed-forward neural network
Training the FFNN amounts to adjusting the synaptic weights $w_i$, as well as finding the best activation function and the best number of neurons in the hidden layer. For this purpose, we used the ANN sizing procedure to find the best configuration for each model. First, for a given model, we choose an activation function and use the sizing procedure to choose the best number of hidden neurons; we then change the activation function and repeat the procedure, and so on until the best activation function and the best number of hidden neurons have been found for all developed models, each with its own input parameters. After choosing the best number of neurons in the hidden layer, the FFNN was re-trained 100 times, each time with a different initialization of the synaptic weights $w_i$. At the end of the sizing procedure, the FFNN that gives the minimum NMAE% is considered the best model. This procedure gives satisfactory results, so we used it to choose the best configuration of each model. The obtained results are presented in Table 5, and Fig. 7 shows an example of the evolution of the mean NMAE% and its confidence interval (CI) during the training of the model MOD 5.
After analyzing all FFNN-NARX results presented in Table 6 and Fig. 8, almost the same remarks as for the LSSVR-NARX models can be made: the Irr and Tc inputs alone are not enough to make accurate forecasts. MOD 1 gives the worst results, with an MSE = 0.0124. Adding past PV power as input considerably improves the accuracy of the models: in MOD 2, adding the input $P_t$ alone improved the MSE by almost 24.6% (MSE = 0.0089).
According to the results of the models MOD 3, MOD 5 and MOD 6, the more past PV power values we use as inputs, the more accurate the forecasts become. MOD 4, however, is an anomalous case. One would normally expect that increasing the number of inputs adds precision to the models, and MOD 4 is just MOD 3 to which the $Irr_{t-1}$ and $Tc_{t-1}$ inputs have been added. We expected MOD 4 to outperform MOD 3, but the simulations show the opposite: the forecast error increases by 20.2%, with an MSE = 0.0099. The results obtained are therefore consistent with those of the LSSVR-NARX models, except that the FFNN-NARX models show a slight superiority over the LSSVR-NARX models in 75% of the cases. For the FFNN-NARX models, the tangent sigmoid activation function gives the best results in 83% of the cases, with a coefficient of determination $R^2$ = 90.73% given by MOD 6.
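The search over activation functions, hidden-layer sizes and weight initializations can be sketched as a simple grid with random restarts. This is a simplified stand-in, not the ANN sizing procedure of this work, whose details are given elsewhere in the paper. The small network trained by plain gradient descent, the grid values, the number of restarts and the NMAE normalization by the maximum target value are all assumptions chosen to keep the example short.

```python
import numpy as np

# Activation functions with their derivatives expressed via the activation output a.
ACTIVATIONS = {
    "tansig": (np.tanh, lambda a: 1.0 - a ** 2),
    "logsig": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda a: a * (1.0 - a)),
}

def train_ffnn(X, y, n_hidden, act, seed, epochs=500, lr=0.1):
    """One hidden layer, linear output, full-batch gradient descent on the MSE."""
    f, df = ACTIVATIONS[act]
    rng = np.random.default_rng(seed)          # the seed fixes the weight initialization
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = f(X @ W1 + b1)                     # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - y                  # output error, shape (n,)
        dH = np.outer(err, W2) * df(H)         # error back-propagated to the hidden layer
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: f(X_new @ W1 + b1) @ W2 + b2

def nmae_percent(y_true, y_pred):
    """Assumed normalization: maximum of the measured values."""
    return 100.0 * np.mean(np.abs(y_true - y_pred)) / np.max(y_true)

def size_ffnn(X_tr, y_tr, X_val, y_val, hidden_grid=(2, 4, 8), n_restarts=5):
    """Keep the configuration with the lowest validation NMAE%."""
    best = None
    for act in ACTIVATIONS:
        for n_hidden in hidden_grid:
            for seed in range(n_restarts):
                model = train_ffnn(X_tr, y_tr, n_hidden, act, seed)
                score = nmae_percent(y_val, model(X_val))
                if np.isfinite(score) and (best is None or score < best["nmae"]):
                    best = {"nmae": score, "act": act, "n_hidden": n_hidden,
                            "seed": seed, "model": model}
    return best

# Toy bell-shaped curve in [0, 1]; input P_t, target P_{t+1}
p = np.sin(np.linspace(0.0, np.pi, 200))
X, y = p[:-1, None], p[1:]
best = size_ffnn(X[::2], y[::2], X[1::2], y[1::2], hidden_grid=(2, 4), n_restarts=2)
print(best["act"], best["n_hidden"], best["seed"])
```

Repeating the training with different seeds is what exposes the dependence on initialization; selecting on a held-out set rather than on the training error is a design choice of this sketch.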

Nonlinear autoregressive models
In this section, in order to choose the best offline PV power forecasting model and to analyze the influence of locally collected parameters on model accuracy, we investigate the results obtained from four pure nonlinear auto-regressive models (NAR). In time series modeling, the nonlinear autoregressive model specifies that the output variable depends nonlinearly on its own previous values and on a forecasting error term. This type of model can be formulated as:
$$Y_t = F(Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots) + \varepsilon_t$$
To realize these models, we used only the past PV power values as input, without exogenous parameters. To find the best function F, we used, as for the NARX models, the LSSVR and FFNN approaches. We developed four models, MOD 7, MOD 8, MOD 9 and MOD 10, which realize the following functions, respectively:
$$\mathrm{MOD\ 7}: \quad P_{t+1} = f(P_t) + \varepsilon \qquad (38)$$
$$\mathrm{MOD\ 8}: \quad P_{t+1} = f(P_t, P_{t-1}) + \varepsilon \qquad (39)$$
$$\mathrm{MOD\ 9}: \quad P_{t+1} = f(P_t, P_{t-1}, P_{t-2}) + \varepsilon \qquad (40)$$
$$\mathrm{MOD\ 10}: \quad P_{t+1} = f(P_t, P_{t-1}, P_{t-2}, P_{t-3}) + \varepsilon \qquad (41)$$
Here also the term $\varepsilon$ designates the forecasting error; it has different values for the different models.

Least square support vector regression
This subsection presents the simulation results obtained with the LSSVR algorithm. After applying sequential minimal optimization (SMO), the best $\gamma$ and $\sigma^2$ parameters found for the LSSVR-NAR models are presented in Table 7. The simulation results are summarized in Table 8.
In the previous subsection, it was concluded that past PV generation is the most important parameter for a short-term PV power forecasting model. The best NARX model so far is MOD 6, which realizes the function given by (36). The results summarized in Table 8 and presented in Fig. 9 allow us to make the following remarks. First, all LSSVR-NAR models give good results; we also observe that the more past PV values we add as inputs, the more accurate the forecasts become. The comparison between LSSVR-NAR and LSSVR-NARX models reveals that the LSSVR-NAR models give clearly better results. For example, MOD 7, which takes only the current power $P_t$ as input, gives a satisfactory result with an MSE = 0.0092, which is better than that of MOD 1. In terms of accuracy, all LSSVR-NAR models give better results than MOD 6 (for example, MOD 8 with only two inputs gives better results than MOD 6, which uses six inputs). We also note that the more past PV power terms we add, the more accurate the forecasts become. As a result, MOD 10, which realizes the function given by Eq. (41), gives the best results; Fig. 10 shows the PV power forecasts obtained by this model. These results are interesting from a practical point of view, since this type of model does not add significant cost: most converters in use can record these parameters, and retrieving them is easy and safe. In conclusion, the comparison between the different models underlines the importance of past PV power as an input parameter in offline short-term PV power forecasting. Table 9 summarizes the results of all LSSVR-based models.

Feed-forward neural network
As in the case of the FFNN-NARX models, the FFNN-NAR models were trained using the same ANN sizing procedure; Table 10 summarizes the results obtained, giving the best number of neurons in the hidden layer as well as the corresponding NMAE%. The results obtained by the FFNN-NAR models are summarized in Table 11 and presented in Fig. 11; they agree with and confirm, again, those reported for the LSSVR-NAR case. All NAR models demonstrate superiority over the NARX models. Again, the FFNN algorithm

Benchmark models
In this subsection, we present the results obtained by the two statistical benchmark models: the multivariate polynomial regression model and the persistent model. The persistent model is regarded as a naive predictor (today equals tomorrow). It is the most cost-effective forecasting model and assumes that the conditions will not change; as a result, the PV power at time t + 1 is taken to be equal to that at time t. In spite of its simplicity, it provides a good benchmark against more sophisticated models and is still the most popular reference model in short-term PV power forecasting. The Multivariate Polynomial Regression model (MPR), on the other hand, is more sophisticated. It is an extension of ordinary polynomial regression in which the relationship between the input variables x and the output variable y is modeled as an n-th degree polynomial in x. Equation (42) presents an example of second-order multiple polynomial regression, which can also be written in matrix form in terms of a matrix of weights, the matrix of input parameters X and the output Y. The two models are used as benchmarks to check the performance of the developed models and thereby to demonstrate their effectiveness in short-term PV power forecasting. The persistent model gives excellent results under stable weather conditions (clear day), Fig. 12, whereas nonlinear models such as LSSVR and FFNN are more efficient under unstable weather conditions; Fig. 13, which presents the results obtained for a cloudy day, illustrates this observation. According to this comparison, we can conclude that the models proposed for PV forecasting are superior to the benchmark models, especially the NAR models. These results are very interesting given the importance of short-term forecasts for integrating photovoltaic sources into the energy mix and for guaranteeing grid stability.
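The two benchmark models can be illustrated with a few lines of NumPy. The sketch assumes a second-order polynomial with all squared and cross terms, fitted by ordinary least squares; the function names and the synthetic data are hypothetical, and the inputs actually fed to the MPR benchmark in this work are not restated in this section.

```python
import numpy as np

def persistence_forecast(p):
    """Naive forecast P_{t+1} = P_t; the output is aligned with p[1:]."""
    return np.asarray(p, float)[:-1]

def poly2_features(X):
    """Design matrix with columns 1, x_i, and x_i * x_j for i <= j."""
    X = np.asarray(X, float)
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def mpr_fit(X, y):
    """Least-squares estimate of the polynomial weights."""
    beta, *_ = np.linalg.lstsq(poly2_features(X), np.asarray(y, float), rcond=None)
    return beta

def mpr_predict(beta, X):
    return poly2_features(X) @ beta

# Synthetic check: data generated by a known second-order polynomial
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + 3.0 * X[:, 1] ** 2
beta = mpr_fit(X, y)
print(np.round(beta, 6))

# Persistence on a short toy power series
p = np.array([0.1, 0.3, 0.6, 0.5])
print(persistence_forecast(p), p[1:])
```

With two inputs the design matrix has six columns, ordered as intercept, $x_1$, $x_2$, $x_1^2$, $x_1 x_2$, $x_2^2$.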

Conclusion
In the present contribution, offline models have been proposed that allow us to forecast short-term PV power using only information collected from the local monitoring system, i.e., without the need for weather forecasts. Offline models are interesting for grid operators as well as for individuals, because the majority of existing PV power forecasting models use numerical weather prediction (NWP), and access to NWP information is not available to everyone, especially for isolated installations. To study the behavior of each model and each algorithm, we combined the simplicity of time series models (AR and ARX) with the nonlinearity of statistical learning models (FFNN and LSSVR); we also used different combinations of the collected data to analyze the influence of the different locally collected data on forecast accuracy.
During the simulations, it was observed that the FFNN gives different results each time the simulation is repeated, which is due to the initialization problem, whereas the LSSVR gives a unique solution, which is the optimal one (as long as the training database is sufficient). To improve the FFNN results and to avoid over-fitting and local-minima problems during FFNN learning, a sizing procedure has been proposed; with the ANN sizing procedure, the performance of the FFNN models improved. The choice of the right activation function is an important design issue: according to the simulation results, the tangent sigmoid function (Tansig) gives the best results in 70% of the cases, and even though the logistic sigmoid (Logsig) and radial basis (RBF) functions outperform Tansig in 30% of the cases, Tansig remains the best activation function in terms of overall performance. Prior to training the proposed models, the data were pre-processed by filling the gaps using linear interpolation and scaling the data between zero and one.
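The pre-processing step mentioned above (gap filling by linear interpolation, then scaling to the range zero to one) can be sketched as follows. The sketch assumes that missing samples are encoded as NaN and that the series is uniformly sampled; the function names are hypothetical.

```python
import numpy as np

def fill_gaps_linear(x):
    """Replace NaN samples by linear interpolation between valid neighbours.

    Gaps at the very start or end take the nearest valid value,
    which is how np.interp treats points outside the known range.
    """
    x = np.asarray(x, float).copy()
    idx = np.arange(len(x))
    ok = ~np.isnan(x)
    x[~ok] = np.interp(idx[~ok], idx[ok], x[ok])
    return x

def minmax_scale(x, lo=None, hi=None):
    """Scale to [0, 1]; return the bounds so the same mapping can be reused."""
    x = np.asarray(x, float)
    lo = float(np.min(x)) if lo is None else lo
    hi = float(np.max(x)) if hi is None else hi
    return (x - lo) / (hi - lo), lo, hi

raw = np.array([0.0, np.nan, 4.0, np.nan, np.nan, 10.0])
filled = fill_gaps_linear(raw)
scaled, lo, hi = minmax_scale(filled)
print(filled, scaled)
```

Returning `lo` and `hi` allows the bounds computed on the training data to be applied unchanged to the test data, which avoids leaking test information into the scaling.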
The comparison between all FFNN-based and LSSVR-based models indicates that the FFNN algorithm slightly outperforms the LSSVR algorithm. The picture changes, however, if execution time is taken into account: the procedure needs a considerable time (almost 17 min) to find the best parameters of an FFNN model, while the LSSVR trained with SMO needs only 1 min and 20 s to find its best parameters.
To test the performance of the proposed models, the results obtained are compared with those of the persistent model and of a multivariate polynomial regression model (MPR) used as benchmarks. The comparison demonstrates the superiority of FFNN and LSSVR over the MPR and persistent models. The simulation results indicate that FFNN-MOD 9 with the RBF activation function and LSSVR-MOD 10 give the best results and outperform all other models, with an MSE = 0.0065 and an MSE = 0.0069, respectively. The results of the persistent technique and of the statistical techniques (MPR, FFNN and LSSVR) also offer evidence of the advantage of nonlinear forecasting models over a trivial forecast.
The persistent model should not be underestimated, however: it gives excellent results under stable weather conditions (clear day), whereas the nonlinear models, in addition to performing well under stable conditions, are more efficient under unstable weather conditions.
A direct comparison of our results with other works would not be fair, since the data used and the weather conditions change from one country to another and from one installation to another. What we can do is compare our main findings with those of another work; for this purpose we chose [1] as the main reference. According to the simulation results, the NAR models give better results than the NARX models. These results seem to contradict those of [1], in which ARX models outperform AR models. The difference is that the authors of [1] use NWP of global solar irradiation as input for the NARX model, whereas in our work we avoid NWP parameters and use only locally collected data. Moreover, the present contribution demonstrates that using past photovoltaic power production as input improves the accuracy of the forecasting models, and that past generated power data alone are enough to obtain accurate and acceptable short-term PV power forecasts. This result confirms the findings of [1], where the authors indicate that solar power is the most important input for forecasts with horizons shorter than 2 h.
We must note that the length of the data set used in this work does not allow the proposed models to adapt to all types of weather conditions; this causes a decrease in the performance of our models, especially on overcast days. In addition, locally collected data alone are not sufficient to extend the forecast horizon; in that case, the use of weather forecasts becomes necessary. These issues will be addressed in future work.

Compliance with ethical standards
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.