Local-Global Methods for Generalised Solar Irradiance Forecasting

As the use of solar power increases, accurate and timely forecasts will be essential for smooth grid operation. Many methods have been proposed for forecasting solar irradiance and solar power production. However, many of these methods formulate the problem as a time series, relying on near real-time access to observations at the location of interest to generate forecasts. This requires both access to a real-time stream of data and enough historical observations for these methods to be deployed. In this paper, we propose the use of Global methods to train our models in a generalised way, enabling them to generate forecasts for unseen locations. We apply this approach to both classical ML and state-of-the-art methods. Using data from 20 locations distributed throughout the UK and widely available weather data, we show that it is possible to build systems that do not require access to this data. We utilise and compare both satellite and ground observations (e.g. temperature, pressure) of weather data. Leveraging weather observations and measurements from other locations, we show it is possible to create models capable of accurately forecasting solar irradiance at new locations. This could facilitate use planning and optimisation for both newly deployed solar farms and domestic installations from the moment they come online. Additionally, we show that training a single global model for multiple locations can produce a more robust model with more consistent and accurate results across locations.


Introduction
Power generators must accurately forecast their power output as any unplanned deviation put on the grid can push supply and demand away from equilibrium. In order to maintain stability, the grid operator is forced to intervene, taking action to balance the grid. The cost of this action, the balancing cost, is often passed onto the offending producer [1]. As renewable energy sources become more prevalent, due to their variability, having timely and accurate forecasts of production is vital for their effective use [2]. Predictions of just a few hours into the future can be used by generators to adjust their plans. In the case of the UK, generators can change their stated output (or by power) up to 1 hour before the generation period.
In the case of solar generation, irradiance (the power per unit area radiated from the sun [3]) is converted into electricity. Power output typically tracks the sun, peaking in the middle of the day. However, the amount of energy produced is dependent on total irradiance falling onto the panels and is susceptible to atmospheric interference [4]. Changes in weather conditions can cause output to fluctuate abruptly throughout the day.
In the literature, a wide array of techniques have been used for forecasting solar irradiance, from ARIMA and Support Vector Regression [5,6] to deep learning (DL) methods [7,8]. However, many of these approaches depend on observations of irradiance at the area of interest (AOI). For these methods to provide forecasts in "production", in addition to requiring enough historic data to effectively train a model, a feed of data from the AOI is required. Whilst new installations can be fitted with equipment capable of relaying irradiance observations, retrofitting existing plants can be cost-prohibitive, with domestic installations presenting an even larger challenge.
A solution to overcoming this data dependency is generating irradiance forecasts by regressing over weather data, a major factor in the variability of PV output. Doing so can also uncouple predictions from real-time irradiance observations. Access to point-based weather data (e.g. temperature, wind speed at a single location) is available from commercial providers who claim to offer accurate observations for virtually any point on the planet [9,10]. This approach has been used successfully by [11,12,13]. Additionally, real-time satellite imagery from providers such as [14] provides an alternative way to potentially capture a richer weather state, as demonstrated in [15]. The images capture features such as cloud cover; using a sequence of images, information such as their movements can be extracted and used in the forecasting models.
It has also been noted in the literature [16,17] that most existing methods focus on providing forecasts for a single location. Using this "Local" approach, each AOI would require its own model with enough corresponding historic data to be trained. When dealing with multiple AOIs, developing and maintaining a Local model for each is not practical, to say nothing of the challenges this approach would present if used on a domestic level.
For new installations, this data simply might not exist. Furthermore, applying this approach on a domestic level would result in potentially thousands of models, an outcome that seems fundamentally flawed.
Rather than focusing on a single AOI, one can attempt to generate forecasts for multiple locations. Taking this "Global" approach and creating a generalised model to predict for multiple AOIs eliminates the practical challenges of managing multiple Local models. Furthermore, the use of a Global model presents several advantages. A Global model can result in higher quality forecasts by learning from multiple locations' data [18,19]. One can even use a Global model to generate forecasts for an unseen AOI. This enables forecasting for locations regardless of whether historical data is available, such as in the case of a new AOI. In this case, as time goes on, data collected at the AOI could be used to refine the model further.
A limited number of works exist exploring the use of a Global model to generate irradiance forecasts for multiple AOIs [17,20]. Both works create Global models capable of generating robust forecasts for multiple locations. In [17], a Global DNN was compared to alternative Local methods, while [20] specifically focused on Global models for AOIs with no historic data. Neither explores the use of various possible input features (e.g., weather data, real-time irradiance) in both Local and Global approaches.
The main goal of this work is to provide practical, industry-applicable approaches useful for short to medium-range (intra-day) planning. We compare Local and Global models using several typical machine learning (ML) methods used for irradiance forecasting. We further extend the Global approach to circumvent some of the potential real-world data dependency issues and analyse their effect on performance: Global Plant Holdout (GPH) to handle a lack of historic data and Global Plant kNear (GPkN) to cope with missing real-time data. Additionally, we compare the use of satellite images to point-based ground weather data.
• We show that Global models can leverage data from multiple locations for improved forecasting performance and that they can generalise to unseen locations, removing the need for historical data to predict at unseen AOIs.
• We analyse the impact that the use of real-time irradiance has on forecasts and explore methods to uncouple irradiance predictions from real-time observations by substituting observations from nearby plants.
• We compare a number of standard ML methods commonly used for forecasting irradiance (Random Forests, DNN, LSTM and CNN).
• We show that using rich weather data from satellites can produce better forecasts.
We validate our proposed methods using irradiance measurements and corresponding weather data from 20 AOIs distributed across the UK (see Figure 3). We compare the performance of our models across four training schemes, Local, Global, GPH and GPkN, outlined in Section 3.2.
The remainder of this paper is structured as follows: Section 2 provides an overview of the literature. Section 3 describes our proposed methods while Section 4 outlines our experimental framework as well as gives details of the data used. In Section 5 we present our results and analyse the performance of our proposed methods. Finally, Section 6 concludes the paper.

Background
In this section, we provide background information on the problem domain and provide an overview of the various ML based methods used for irradiance forecasting. We start by formalising the irradiance forecasting problem in Section 2.1. In Section 2.2 we give an overview of the different kinds of weather data. Section 2.3 gives an overview of the various ML based irradiance forecasting methods, discussing their advantages and their main caveats.

Problem Definition
In its simplest form, our goal at time step $t_0$ is to predict future irradiance values $\theta_{t_0} = [i_{t_0+1}, \ldots, i_{t_0+fh}]$, where $fh$ is our forecast horizon. If we assume there exists a function that can map an input $X$ to irradiance forecasts, $f(X_{t_i}) = \theta_{t_i}$, we can define an ML problem to approximate the function $f$ using a model $m$ such that $m(X_{t_i}) \rightarrow \hat{\theta}_{t_i}$. Here $X$ represents any data that could be used by a model, such as weather data, historic irradiance observations, or even the time of day.

Weather Data
There are numerous types of weather data. We focus on two types, ground-based and satellite weather. Ground-based weather data is comprised of observations of a number of variables such as wind speed, temperature, etc., made at an observation station in a fixed location. In the UK there exists a large fleet of stations distributed all over the country. Both observations and forecasts for these locations are made easily accessible from commercial suppliers such as [9]. Satellite data consists of observations of several wavelengths of light and is typically presented as an image where each pixel covers an area on the order of $3\,\mathrm{km}^2$. The satellite observations can be further processed to extract estimates of weather state such as cloud cover or temperature. Satellite images covering the UK are made hourly by [14].

Machine learning methods
In the literature, one can find multiple ML based methods of irradiance forecasting. These can be loosely divided into two methods: regression-based (Section 2.3.1) and time-series based (Section 2.3.2). However, many newer methods combine elements from both approaches (Section 2.3.3).

Regression
Regression models use correlated values to make predictions. As previously noted, the predominant variable in how much solar irradiance makes it to the ground is the weather [2]. Since weather and irradiance are correlated, a regressive model can be learned to predict irradiance using weather data. Using the above notation, we formulate the problem as follows: given a set of weather values $W_{t_i} = [\text{wind}, \text{temp}, \text{pres}]$, we aim to create a model where $m(W_{t_i}) \rightarrow \hat{\theta}_{t_i}$. There are many examples throughout the literature of irradiance forecasts being created from regressing over weather data using a variety of techniques [21,22,23,13,24,25]. Although classical ML approaches such as Support Vector Regression and Decision Trees are actively used for irradiance forecasting [26,27], in recent years neural networks (NNs) have proved to be a highly effective tool for regression due to their power and flexibility as function estimators [28]. Accordingly, a number of NN based methods have been employed [12,22], with convolutional neural networks (CNNs) being used to regress over satellite images to produce predictions [15].
An advantage of modelling irradiance predictions as a regression problem using weather data is that predictions are uncoupled from real-time irradiance observations such as in [17]. However, regressive models typically map one set of input features to one output prediction [12,11]. In order to forecast irradiance, future weather states are needed. As such, regressive methods must rely on the use of externally provided weather forecasts.
The use of weather forecasts can introduce another level of uncertainty to the models. It also presents another challenge, as updated predictions can only be made when new weather forecasts are provided. As such, the forecast used can limit the timeliness of a model's predictions. Consider a model that produces an hourly forecast with a 12-hour horizon using weather data that updates once every 6 hours: while predictions are made for all steps, there are times when the forecast is stale and its useful horizon reduced. For example, if a prediction was made at 6 am, covering all steps until 6 pm, by 9 am there are only 9, dated, forecast steps remaining. As such, weather data sources used must be carefully considered as they can affect both the performance and capability of models.
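The staleness arithmetic in the example above can be sketched as follows. This is a minimal illustration only; the function name `remaining_steps` and its parameters are hypothetical, and it assumes hourly steps with both times falling on the same day.

```python
def remaining_steps(horizon_hours: int, issued_hour: int, current_hour: int) -> int:
    """Number of hourly forecast steps still ahead of `current_hour`.

    Assumes an hourly forecast issued at `issued_hour` that covers
    `horizon_hours` steps, with both hours on the same day.
    """
    elapsed = current_hour - issued_hour
    if elapsed < 0:
        raise ValueError("current_hour must not precede issued_hour")
    return max(0, horizon_hours - elapsed)


# Forecast issued at 6 am with a 12-hour horizon, inspected at 9 am.
print(remaining_steps(horizon_hours=12, issued_hour=6, current_hour=9))  # 9
```

With a weather source that refreshes every 6 hours, the same function gives 6 remaining steps just before the next refresh, which is the worst case for this configuration.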

Time series
Time-series modelling is an alternative method for building forecasting models. A time series is a sequence of data where the order of observations matters; typically there exists a sequential relationship between examples [29]. Time series forecasting methods use previous observations of the value the model intends to predict as input. In the case of our irradiance forecasting problem, observations from the last few hours would be used to predict the future values in the sequence. Using the notation from above, we can formalise the ML problem as follows: given a sequence of irradiance measures $I_t$, we aim to create a model $m$ such that $m(I_t) \rightarrow \hat{\theta}_{t_i}$. Numerous methods are used throughout the literature for time series forecasting on a wide array of problems [30,31,32,33,34]. We split these further into two approaches, autoregressive and sequence modelling.
Autoregressive approaches use a lagged window over a number of the previously seen examples to create a forecast [35]. There are several examples of purely autoregressive models being used for irradiance forecasting. These include classical statistical methods such as ARIMA [5], as well as deep neural networks (DNNs) [36,37]. The length of the window is a hyperparameter that can be tuned; however, the computational complexity will increase accordingly. While effective, autoregressive methods can only model sequences that fall within the lagged window. As such, they can fail to capture relationships that are beyond the length of the window [29].
Sequence modelling can solve this issue by modelling an evolving state. With the rise of deep learning, recurrent neural networks (RNNs) have become a popular way to do so. In order to create forecasts, RNNs process elements sequentially. At each step $t$, the model takes both the input features $X_t$ and its previous state $\alpha_{t-1}$ from the preceding step as input, outputs a prediction $\hat{\theta}_t$ and creates a new state $\alpha_t$.
By passing in its previous state, information can be transmitted from one time step to another. Long short-term memory (LSTM) architectures have proved to be a highly effective form of RNN; by selectively taking in their state from previous time steps they are able to model sequence dependencies [29,38]. Some examples of LSTMs being used to generate irradiance forecasts can be found in [7,8].
Despite their effectiveness, both autoregressive and sequence modelling approaches require access to real-time data in order to make predictions. From a practical standpoint, this limits the applicability of time series models to locations that can provide a feed of data.

Hybrid
While it is possible to use just the irradiance sequence or weather data to create a forecasting model, it is also possible to create a hybrid architecture that can leverage both kinds of data in a single model. Li et al. proposed combining an ARIMA-based model with extra weather data to improve performance [39]. Many of the more recent methods we have classed as time series are in fact hybrid, using both irradiance and weather data [8,33].

Local and Global Models
Local and Global methods are ways to create forecasting models with datasets containing multiple time series. The Local approach creates a model per series while the Global fits a single model to all of them [18]. In [19,20], the Global approach is referred to as cross-learning. In the context of irradiance forecasting, Local and Global approaches are different ways to manage the data and model(s) when there are multiple AOIs. A Local model is an individual model learned for a specific location (AOI). A Global model is generalised across locations. In our case, a Global model is defined as a single model that can predict all AOIs within the set.
While works such as [40] enable predictions for locations with limited history by selectively extracting extra data from correlated locations, they are still Local models and as such would struggle to generalise to unseen locations. As previously mentioned, to the best of our knowledge a limited number of works exist on Global methods for irradiance forecasting. In [17], by leveraging a combination of satellite observations and irradiance forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), a Global DNN model was trained. However, because it relies on ECMWF data, its ability to output timely forecasts is limited, as these forecasts update every 6 hours. To address these issues, we propose making use of standard, and widely commercially available, weather forecasts. Many of these update sub-hourly, removing any data dependency [9].
Bottieau et al. use weather forecasts to create a number of Global models using a number of ML methods to generate predictions at locations with no historic data [20]. Whilst their approach allows for predictions in locations without real-time irradiance observations, they achieve this simply by not including them. We explore this as well as alternative methods such as GPkN that circumvent the data dependency by substituting values.

Motivation and Methodology
In this section we discuss our proposed methodology, outlining how we address the irradiance forecasting problem defined in Section 2.1. In Section 3.1 we outline our motivations, while Section 3.2 describes our proposed data models to solve the issues outlined.

Motivation
Our goal is to present plausible methods, capable of being deployed in an industry setting, that provide timely, accurate forecasts of irradiance. To that end, our model must: (1) Produce forecasts at a frequency and with a forecast horizon to be of practical use; (2) Generalise across locations; (3) Uncouple irradiance forecasts from real-time observations. As such, when designing our methods, we must take into consideration the various possible technical limitations when selecting input features, i.e. availability, update frequency, timeliness, etc.
Given these objectives and limitations, for our study, we aim to create models capable of predicting with an hourly frequency with a forecast horizon of 6 hours. We show that predictions of steps after 6 hours are dominated by the weather data and as such in "production" will be limited by weather forecast accuracy.

Data Models / Training Schemes
The data available at both training and inference times dictate the overall design of any forecasting model. Broadly, we consider two classes of data when designing the models:
1. Historic observations - this is data that is guaranteed to be available at training time. This would be a database of weather and irradiance values for one or more AOIs spanning multiple years.
2. Real-time - the data used to make the forecasts. It consists of telemetry/observations transmitted, in a timely manner (on the order of minutes), to the model in order for it to produce useful forecasts. This could be an observation of irradiance measured at a given AOI or weather forecasts sourced from 3rd parties.
Both classes of data are needed to create a useful forecasting model. However, for any given AOI there may be limitations on data availability. Using the problem definition in Section 2.1, we outline four ways to train a forecasting model using various combinations of possibly available data. Each represents a conceivable dataset that could be available for a group of AOIs when attempting to build a forecasting model.
We use four data models: Local, Global, Global Plant Holdout (GPH) and Global Plant kNear (GPkN). Local is a typical approach, with Global the logical way to generalise across locations, solving some of the limitations of a Local model. GPH and GPkN are further extensions of the Global approach, each solving a data limitation. An overview of the four methods is shown in Figure 1.
In this context, an AOI's data consists only of irradiance values; it is assumed that weather data (ground-based or satellite) will always be available, both historically and in real time, for any AOI's location.

Local Models
Local Models, as their name suggests, are localised to a single AOI. As such they are only trained on data from and produce predictions for a given AOI. This approach is common in the literature [22] and would be a typical approach when there are a small number of AOIs. Since Local models are created by fitting a model using only data from the given AOI, one model must be created per AOI. While being a relatively straightforward approach, Local models have three key limitations:
• They require enough historic data for each AOI to effectively learn a model (the amount of historic data needed is dependent on the model and data being used).
• A model per AOI must be generated. If there are a large number of AOIs, such as in the case of a domestic solar fleet, computation and storage constraints could become an issue.
• If used, a real-time data feed of irradiance for all AOIs may be needed.

Global Model
In the case of the Global model, we create a generalised version of the Local method, a single model that can predict for all AOIs [17]. To do so, we train a single model on data for all AOIs. This is done by unioning all data for every AOI into a single composite dataset and using it to train a model. Global models solve two of the issues presented by Local models: (1) a single model artefact is created, greatly reducing the overhead of managing multiple models; (2) since they are trained on multiple AOIs, it is possible to create an effective model even if some AOIs have limited historical data.
This definition of a Global model still assumes the perfect case where there is access to at least partial historic data for all AOIs and, if used, a real-time feed of irradiance for all AOIs. Because of this, the Global approach is still, by definition, limited to AOIs with historic data. Additionally, if an AOI has missing real-time data, no predictions can be made.
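The difference between the Local and Global data models can be sketched in a few lines. This is an illustrative sketch only: the `Example` type, the function names and the toy data are hypothetical, and no particular ML library is assumed.

```python
from typing import Dict, List, Tuple

# One training example: (input features, target irradiance steps). Hypothetical layout.
Example = Tuple[List[float], List[float]]


def local_datasets(data: Dict[str, List[Example]]) -> Dict[str, List[Example]]:
    """Local scheme: one training set (and hence one model) per AOI."""
    return {aoi: list(examples) for aoi, examples in data.items()}


def global_dataset(data: Dict[str, List[Example]]) -> List[Example]:
    """Global scheme: union every AOI's examples into one composite training set."""
    combined: List[Example] = []
    for aoi in sorted(data):  # sorted only to make the order reproducible
        combined.extend(data[aoi])
    return combined


# Toy data: AOI "B" has very little history but still contributes to the union.
toy = {
    "A": [([0.1, 0.2], [0.3]), ([0.2, 0.3], [0.4])],
    "B": [([0.5, 0.6], [0.7])],
}
print(len(local_datasets(toy)))  # number of Local models to train
print(len(global_dataset(toy)))  # examples seen by the single Global model
```

In the toy data, the Local scheme would require two models, one of which sees a single example, whereas the Global scheme trains one model on all three examples.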

Global Plant Holdout (GPH)
We can extend the Global approach in an attempt to resolve these limitations. In the case of GPH, we eliminate the need for all AOIs to have historic data. Like the standard Global approach, we aim to produce a single model able to effectively predict for several AOIs; however, not all AOIs have historic data available. This would be the case e.g. for a new installation or a new domestic AOI without historic data. This can be useful as data from decommissioned plants can still be used to train the models.
For GPH, a single model is trained using the data from AOIs with available historic data (even if only partial). The model is then used to make predictions for all AOIs both with and without historic data, using their real-time data if needed. For this approach to be viable, any models created must be able to generalise well to new unseen AOIs.
To simulate this, a standard cross-validation approach is taken. Each AOI is randomly assigned to one of 5 folds. At train time, for each fold a model is created using data from all but the AOIs in the given fold. At test time only the AOIs in the fold are evaluated. This process is repeated for each fold, resulting in a worst-case prediction for every AOI in the training set.
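The hold-out procedure described above can be sketched as follows. The function names, the fixed seed and the choice to balance fold sizes are assumptions made for illustration; the text only states that AOIs are randomly assigned to one of 5 folds.

```python
import random
from typing import Dict, Iterator, List, Tuple


def assign_folds(aoi_ids: List[str], n_folds: int = 5, seed: int = 0) -> Dict[str, int]:
    """Randomly assign each AOI to one of `n_folds` folds of near-equal size."""
    shuffled = sorted(aoi_ids)
    random.Random(seed).shuffle(shuffled)
    return {aoi: i % n_folds for i, aoi in enumerate(shuffled)}


def gph_splits(aoi_ids: List[str], n_folds: int = 5,
               seed: int = 0) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_aois, held_out_aois) pairs, one per fold."""
    folds = assign_folds(aoi_ids, n_folds, seed)
    for k in range(n_folds):
        held_out = sorted(a for a, f in folds.items() if f == k)
        train = sorted(a for a, f in folds.items() if f != k)
        yield train, held_out


aois = [f"aoi_{i:02d}" for i in range(20)]  # 20 AOIs, as in this study
for train, held_out in gph_splits(aois):
    # A model would be fitted on `train` and evaluated only on `held_out`.
    print(len(train), len(held_out))
```

Because the split is over AOIs rather than over individual examples, every evaluated AOI is entirely unseen by the model that predicts for it.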

Global Plant kNear (GPkN)
This approach only applies when the use of real-time irradiance is needed. GPkN attempts to solve the worst-case scenario where we have neither access to historic irradiance measures nor real-time data for the AOI. While an unlikely scenario, it sets a baseline as the most challenging scenario. It could be a viable fallback strategy in the case of a sensor failure, used in a domestic setting, or to estimate the output of an AOI. For GPkN, we produce forecasts for multiple AOIs where historic and real-time data is only available for a subset of the AOIs. Like GPH, a single model is trained using data from the AOIs with historic data. To generate predictions for the AOIs with no data, we substitute the real-time values from the nearest AOI with data.
To evaluate GPkN, the same per-fold GPH model was used; however, the real-time irradiance values, used as an input feature, were substituted with those of the nearest plant not in the hold-out fold. Distances were calculated using the haversine function.
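The nearest-plant substitution can be sketched as below. The haversine formula itself is standard, but the function names, the Earth radius constant and the coordinates are assumptions chosen for illustration and are not taken from the dataset.

```python
import math
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_donor(target: str, coords: Dict[str, Tuple[float, float]],
                  held_out: Iterable[str]) -> str:
    """Closest AOI to `target` that is not in the hold-out fold."""
    excluded = set(held_out) | {target}
    candidates = [a for a in coords if a not in excluded]
    return min(candidates, key=lambda a: haversine_km(coords[target], coords[a]))


# Hypothetical coordinates, roughly London, Reading, Manchester and Edinburgh.
coords = {
    "london": (51.51, -0.13),
    "reading": (51.45, -0.97),
    "manchester": (53.48, -2.24),
    "edinburgh": (55.95, -3.19),
}
# Reading is also held out, so London borrows real-time values from Manchester.
print(nearest_donor("london", coords, held_out=["london", "reading"]))
```

Excluding the whole hold-out fold, rather than only the target AOI, mirrors the evaluation described above: the donor must be a plant the model could legitimately have data for.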

Experimental Framework
In this section, we present our experimental framework. In Section 4.1 we provide the details of the raw data used as well as any preprocessing that was done. In Section 4.2 we describe the models used and their configurations. Finally, Section 4.3 explains the error metrics and validation methods used.

Data Details
The raw data is comprised of data from three unique data sources all covering the period 2015-01-01T00:00 to 2021-01-01T00:00.
Irradiance data. We sourced irradiance values from the "MIDAS Open: UK hourly solar radiation data" dataset [41]. It consists of hourly irradiance ($\mathrm{kJ/m^2}$) from over 80 locations distributed throughout the UK. Each location consists of a time series with data for all or part of the period. We selected a subset of 20 locations to use as our primary AOIs; the locations were selected as they have the fewest missing values for the timespan. This was done to allow as fair a comparison as possible when evaluating Local models between AOIs.
Pre-processing - Any negative values in the dataset were replaced with 0. The values were then transformed using the function $I = \max(3, \ln(\text{irradiance} + 1)) - 4$. This was done to give an approximately normal distribution in the range $[-4, +4]$ centred on 0 in the daytime hours, with raw values of less than $\approx 20$ clipped out as we consider them night-time.
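The transform above can be written directly as a small function. This is a sketch of the stated formula only; the function name is hypothetical and the sample values are made up.

```python
import math


def transform_irradiance(raw: float) -> float:
    """Apply I = max(3, ln(raw + 1)) - 4 after replacing negative values with 0."""
    raw = max(0.0, raw)  # negative readings are replaced with 0
    return max(3.0, math.log(raw + 1.0)) - 4.0


for value in (-5.0, 0.0, 10.0, 100.0, 1000.0):
    print(value, round(transform_irradiance(value), 3))
```

As the formula is written, any raw value with $\ln(\text{raw} + 1) \le 3$ (that is, raw values below roughly 19.1) maps to the same floor value of $3 - 4 = -1$, which is how the night-time readings are clipped.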
Weather data. Historic, hourly, ground station weather observations for all locations within the irradiance dataset. The data was sourced from a commercial supplier [10] and contains the features outlined in Table 1. The weather features were normalised using the methods outlined in the table.
Satellite data. Hourly satellite imagery from EUMETSAT [14]. The raw data is comprised of 11 channels spanning wavelengths from visible to infrared and an additional 12th channel in the visible spectrum at a higher resolution. We process a set of 12 images, 500px X 500px, of the UK area [-12.0W,48.0S, 5.0E, 61.0N], with each pixel being approximately 3km by 3km. The area processed is shown in Figure 2 with the location of AOIs highlighted. An example for each wavelength at day and night is shown in Figure 3.
Pre-processing - The images are cropped so that they are centred on the given AOI.  2. These values help give the model context for when and where the forecast is being generated. As these values can be calculated simply from the AOI location and time, it is assumed they are always available.
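The calculated features in Table 2 include sine and cosine components of the solar azimuth. The sketch below illustrates why an angle is commonly encoded this way; it is an illustration under assumptions, with hypothetical function names and made-up azimuth values rather than positions computed with pvlib.

```python
import math
from typing import Tuple


def encode_angle(degrees: float) -> Tuple[float, float]:
    """Encode an angle as (sin, cos) so that 359 and 1 degrees end up close together."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def encoded_distance(a: float, b: float) -> float:
    """Euclidean distance between two angles in the (sin, cos) encoding."""
    return math.dist(encode_angle(a), encode_angle(b))


# Raw degrees suggest 359 and 1 are far apart; the encoding shows they are neighbours.
print(abs(359.0 - 1.0))                         # difference in raw degrees
print(round(encoded_distance(359.0, 1.0), 4))   # small distance after encoding
print(round(encoded_distance(90.0, 270.0), 4))  # opposite directions are furthest apart
```

Using both components avoids the discontinuity at the 0/360 degree boundary that a single raw angle would present to a model.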

Model Configurations
As mentioned in Section 2.1, our aim is to create a model that approximates the function that maps its inputs to future irradiance values. There are many ML methods that can be used to do this. In order to evaluate the different data models outlined in Section 3.2 and the various possible input features described in Section 4.1, we utilise four typical ML methods: Trees, LSTMs, DNNs and CNNs. While both the Trees and DNN are purely regressive methods, the CNN and LSTM contain elements of both regressive and time series approaches. Specifically, they contain explicit architectures to capture the sequential nature of the data and generate their forecasts autoregressively, using previous outputs as inputs to generate the next step.

Azimuth sin    E/W of the position of the sun in the sky
Azimuth cos    E/W of the position of the sun in the sky
Table 2: Calculated features, all solar positions were calculated using pvlib [42]

All methods are able to make use of both the real-time irradiance and the calculated values as input features. However, the LSTMs, DNNs, and Trees use the ground-based weather data, while the CNN uses the satellite data. An effort was made to ensure hyperparameters and architectural decisions will produce results indicative of the approach's possible performance; however, no full hyperparameter search was performed and better values or architectures may exist. The same model configuration was used regardless of the data model or input features where possible. The architecture and hyperparameters selected for each model are outlined below.

Trees
Trees, or tree-based models, are a classical ML approach. We use a random forest (RF), an ensemble model that trains multiple decision trees, each on a subset of the training data, and combines the predictions from each tree to produce the final output [43]. For the rest of this article, we use the term Trees interchangeably with Random Forests. Trees were chosen as they have been used effectively on numerous forecasting problems. Their use of an ensemble approach makes them highly robust to over-fitting. They are also easy to implement, with many standard library implementations.
The following hyper-parameters were used for all experiments:

Hyperparameter           Value
Number of Trees          20
Min Examples per Leaf    2
Max Depth                32
Table 3: Hyperparameters used by the random forests

Forecasting. While it is possible for a tree-based model to output multiple labels [44,45], a limitation of the implementation we used for our tree-based models is that they can only output a single value. This means that, unlike the DL approaches used, in order to produce forecasts a unique model was trained for each step in the forecast horizon $t_1, \ldots, t_{fh}$.
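Training one single-output model per forecast step can be sketched as follows. This is not the random forest implementation used in this work: `fit_mean_model` is a deliberately trivial, hypothetical stand-in regressor, used only to show how the per-step models are assembled into a full forecast.

```python
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def fit_mean_model(X: List[Sequence[float]], y: List[float]) -> Model:
    """Stand-in single-output regressor that always predicts the training mean."""
    avg = mean(y)
    return lambda features: avg


def fit_per_step(X: List[Sequence[float]], Y: List[Sequence[float]],
                 fit_one: Callable[..., Model] = fit_mean_model) -> List[Model]:
    """Fit one single-output model per forecast step (one per column of Y)."""
    horizon = len(Y[0])
    return [fit_one(X, [row[k] for row in Y]) for k in range(horizon)]


def forecast(models: List[Model], features: Sequence[float]) -> List[float]:
    """Assemble the full forecast by querying each step's model in turn."""
    return [m(features) for m in models]


X = [[0.0], [1.0]]                      # toy inputs
Y = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]  # toy targets with a 3-step horizon
models = fit_per_step(X, Y)
print(len(models), forecast(models, [0.5]))  # 3 [2.0, 3.0, 4.0]
```

Any single-output regressor with a fit-then-predict interface could be passed as `fit_one` in place of the stand-in.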

Deep neural network (DNN)
The DNN is a standard DL method. A DNN is typically comprised of an input layer, followed by a number of fully connected hidden layers, and finally an output layer that produces the final result. Between each layer is a non-linear activation function such as Sigmoid or ReLU. Each layer is comprised of several units, each of which outputs a weighted sum of all the previous layer's outputs. During the training phase, the weights are adjusted using backpropagation in order to minimise a loss function, such as mean square error in a regression context, with respect to training data.
In our case, the DNN takes as input the ground-based weather data, past irradiance and the calculated features. All inputs for every time step are stacked together as a single input vector. The DNN consists of 3 hidden layers, all 128 units wide, with a ReLU between each. The final output layer is as wide as the forecast horizon, producing all 6 outputs at the same time.

Long short-term memory (LSTM)
The LSTMs use the same input features as that of the DNN and Tree models. For the first 12 input steps, i.e. past observations where there are both weather and irradiance data, the LSTM is in a warm-up phase establishing its internal state. Once the past data has been consumed, the model enters the prediction phase, where for each prediction step $t_i$ the previous output of the model at $t_{i-1}$ is used in place of observed irradiance.
The topology of our LSTM network is as follows: the weather features are combined into a single vector and passed through a 3-layer MLP with 32 units. The output is then concatenated with the calculated features and passed through an LSTM layer with 128 units. The output of the LSTM layer is passed through another MLP with 2 hidden layers of 128 units with a final layer of 1 unit to produce the prediction. Of note is that all MLP layers share their weights across every time step.
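The warm-up and prediction phases can be sketched as a generic rollout loop. This is not the LSTM itself: `toy_step` is a hypothetical stand-in for one recurrent step, and the exact alignment of features to steps is an assumption made for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

# step(state, features, irradiance) -> (new_state, prediction); a stand-in for one RNN step.
Step = Callable[[Any, Sequence[float], float], Tuple[Any, float]]


def rollout(step: Step, past_features: List[Sequence[float]],
            past_irradiance: List[float], future_features: List[Sequence[float]],
            initial_state: Any = 0.0) -> List[float]:
    """Warm up on observed data, then forecast by feeding predictions back in."""
    state, previous = initial_state, 0.0
    # Warm-up phase: observed irradiance is available and drives the state.
    for features, observed in zip(past_features, past_irradiance):
        state, previous = step(state, features, observed)
    # Prediction phase: the previous output stands in for observed irradiance.
    forecasts = []
    for features in future_features:
        state, previous = step(state, features, previous)
        forecasts.append(previous)
    return forecasts


def toy_step(state: float, features: Sequence[float], irradiance: float):
    """Toy recurrence: the state is a running blend of the irradiance inputs."""
    new_state = 0.5 * state + 0.5 * irradiance
    return new_state, new_state + features[0]


forecasts = rollout(
    toy_step,
    past_features=[[0.0]] * 12,    # 12 warm-up steps, as described above
    past_irradiance=[1.0] * 12,
    future_features=[[0.0]] * 6,   # 6-step forecast horizon
)
print(len(forecasts))  # 6
```

The key point is that real-time irradiance is only consumed during the warm-up loop; every later step depends on the model's own previous output.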

Convolutional neural network (CNN)
Unlike the other DL based models, the CNNs use satellite images rather than ground-based weather data. However, they can still incorporate both the real-time irradiance and calculated values as input. The images for each time step are stacked into a 4D tensor of shape [timesteps, height, width, channels], resulting in a standard weather state input with shape (18, 16, 16, 12).
The 4D tensor is then passed into a CNN with a stack of three convolutional blocks, each made up of: a convolution with a kernel size of (3x3x3), convolving over both space and time, with 16 features; a max pool of (1,2,2), reducing over just the spatial dimensions; and finally, a ReLU activation.
The output of this is treated as a weather feature vector like those used in the other methods, combined with the calculated features and passed into a model with the same architecture as that of the LSTM. The LSTM was selected because preliminary results suggested that it performed better than the DNN.
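To make the tensor shapes concrete, the sketch below traces the input through the three blocks. It assumes 'same' padding with stride 1 for the convolutions and a pooling stride equal to the pool size; neither is stated above, so the resulting shapes are illustrative only.

```python
def conv_block_shape(shape, features=16, pool=(1, 2, 2)):
    """Shape after a 'same'-padded 3x3x3 convolution followed by max pooling.

    Assumes stride 1 for the convolution and a pooling stride equal to the
    pool size, so only the pooling step changes the spatial dimensions.
    """
    timesteps, height, width, _ = shape
    return (timesteps // pool[0], height // pool[1], width // pool[2], features)

shape = (18, 16, 16, 12)  # [timesteps, height, width, channels]
shapes = [shape]
for _ in range(3):
    shape = conv_block_shape(shape)
    shapes.append(shape)

# Flattened size of the weather feature vector available at each time step.
features_per_step = shape[1] * shape[2] * shape[3]
```

Under these assumptions the time axis keeps all 18 steps, so the CNN output can be consumed step by step by the recurrent part of the model.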

Training
In practice, we only use the last n values, as irradiance values at step t_0 do not depend on historical values ad infinitum. Depending on the forecast problem, a sensible value of n must be selected; in our case, we have selected n = 12.
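A minimal sketch of how such fixed-length examples could be cut from a single series is given below, pairing the last n = 12 values with the next 6 targets. The function name and the use of a plain list are assumptions made for the example.

```python
def make_windows(series, n_past=12, horizon=6):
    """Pair the last n_past values before t0 with the next `horizon` targets."""
    samples = []
    for t0 in range(n_past, len(series) - horizon + 1):
        past = series[t0 - n_past:t0]
        targets = series[t0:t0 + horizon]
        samples.append((past, targets))
    return samples

series = list(range(30))  # placeholder values standing in for hourly irradiance
samples = make_windows(series)
```

A series of 30 values yields 13 examples, since each one needs 12 past values and 6 future values.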
All the DL models were trained with the settings specified in Table 4.

Hyperparameter Value
Learning rate 0.0003

Metrics and Validation
There exist numerous metrics for evaluating the performance of regression models throughout the literature. They primarily provide a summary of the error distribution, where the error is defined as the difference between the observed and predicted value [46,47]. Additionally, there exist many ways to measure forecast accuracy [48]. A common feature of forecast error metrics is a scaling of the error, enabling comparison across distributions.
We define our error metrics as follows. For a dataset comprised of n examples, Y = [y_0, ..., y_n] represents the observed values and Ŷ = [ŷ_0, ..., ŷ_n] the predicted values. The following performance metrics are used to evaluate and compare the various models:
• Normalised root mean square error (nRMSE): the RMSE divided by the mean of the observed sequence.
• Forecast skill (S), where the dataset is split into p periods. U_w represents the error of the model's forecasts for a given period w, calculated as the mean square of the error scaled by the clear sky. V_w is a measure of the forecast difficulty for a given period, calculated as the average of the variability of the irradiance. S is in turn calculated as the average over all the periods within the dataset. We used a period size of 1 calendar month.
Both nRMSE and S are scale-invariant, enabling a nuanced comparison between models and various output distributions. nRMSE was selected as it has been widely used in the literature. While nRMSE is scale-invariant, absolute errors are punished the same regardless of the size of the target, i.e. a prediction of 15 for a true value of 10 results in the same error as a prediction of 105 for a true value of 100. S was proposed by [49] and is similar to mean absolute scaled error, adjusting in proportion to the size of the target sequence; however, the metric also factors in a measure of forecast difficulty.
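The point about absolute errors can be checked with a short sketch. It assumes nRMSE is the RMSE divided by the mean of the observed values, and uses a two-point sequence built from the numbers in the example above.

```python
from math import sqrt

def nrmse(observed, predicted):
    """RMSE normalised by the mean of the observed sequence."""
    mean_observed = sum(observed) / len(observed)
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return sqrt(mse) / mean_observed

observed = [10.0, 100.0]
# Miss the small target by 5, then miss the large target by 5.
error_on_small_target = nrmse(observed, [15.0, 100.0])
error_on_large_target = nrmse(observed, [10.0, 105.0])
```

Both calls return the same value, because the 5-unit miss contributes identically to the RMSE whichever target it falls on.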
A note on the error metrics and their interpretation: S and nRMSE are interpreted inversely to one another. In the case of nRMSE, a lower value is better, with 0 indicating the predictions are perfectly accurate. A value of 1 would mean the forecasts are very bad, as the RMSE is equal to the mean of the sequence; that is, the typical error is as large as the mean of the sequence. S, conversely, is interpreted with a higher value indicating better performance. S can fall in the range (−∞, 1], although a value less than 0 indicates poor performance. A more detailed interpretation is given in Table 5.

Value Interpretation
1 The prediction is perfectly accurate.
0 The prediction is no better than that of a persistence model using the ratio of the last observed irradiance to clear sky, ŷ = (I_{−1} / GHI_{−1}) · GHI_0.
< 0 Negative values indicate that the prediction is worse than the persistence model.

nRMSE punishes errors equally regardless of the size of the target. As such, a good nRMSE indicates that the magnitude of the errors is consistently small, e.g. ŷ = y ± 30.
A low nRMSE but poor forecast skill could indicate that the model performs poorly when the target irradiance values are low, i.e. early and late in the day, e.g. the model always predicts 50 above the true value. It could also be due to a low variance in targets relative to GHI, making it 'easy' to predict the sequence. Conversely, a higher nRMSE but good forecast skill could indicate that the model performed well during the day when target values are higher, e.g. the model was always 10% over the target value, or that the sequence was challenging to predict, leading to larger errors.
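The interpretation of S = 0 and S = 1 in Table 5 can be checked numerically. The sketch below assumes the common form S = 1 − U/V for a single period, with U the root mean square of the clear-sky-scaled error and V the root mean square of the step-to-step change in the clear-sky index. This exact form, the one-step-ahead setting and the sample values are assumptions made for illustration and may differ in detail from the definition used in our experiments.

```python
from math import sqrt

def forecast_skill(observed, predicted, clear_sky):
    """S = 1 - U/V for one period; index 0 only seeds the variability term."""
    k = [y / g for y, g in zip(observed, clear_sky)]  # clear-sky index
    n = len(observed) - 1
    scaled_errors = [
        (p - y) / g
        for y, p, g in zip(observed[1:], predicted[1:], clear_sky[1:])
    ]
    u = sqrt(sum(e ** 2 for e in scaled_errors) / n)
    v = sqrt(sum((k[t] - k[t - 1]) ** 2 for t in range(1, len(k))) / n)
    return 1.0 - u / v

def persistence(observed, clear_sky):
    """Carry the last clear-sky ratio forward: y_hat[t] = y[t-1] / GHI[t-1] * GHI[t]."""
    forecast = [observed[0]]
    for t in range(1, len(observed)):
        forecast.append(observed[t - 1] / clear_sky[t - 1] * clear_sky[t])
    return forecast

# Hypothetical one-day sample of clear-sky and observed irradiance.
clear_sky = [200.0, 400.0, 600.0, 700.0, 600.0, 400.0]
observed = [150.0, 320.0, 300.0, 630.0, 480.0, 200.0]

skill_persistence = forecast_skill(observed, persistence(observed, clear_sky), clear_sky)
skill_perfect = forecast_skill(observed, observed, clear_sky)
```

Under this definition the persistence forecast scores zero up to floating-point rounding, because its scaled error at each step equals the change in the clear-sky index, while a perfect forecast scores exactly one.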
All errors presented are calculated using only daytime values. We defined daytime as any point where the target irradiance y > 20 and GHI > 1. We use both conditions to minimise the risk of any sensor errors skewing the results.
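The daytime filter amounts to a simple mask applied before any metric is computed. A minimal sketch follows, with hypothetical sample values and function name.

```python
def daytime_pairs(observed, predicted, clear_sky, min_irradiance=20.0, min_clear_sky=1.0):
    """Keep (observed, predicted) pairs where y > 20 and GHI > 1."""
    return [
        (y, p)
        for y, p, g in zip(observed, predicted, clear_sky)
        if y > min_irradiance and g > min_clear_sky
    ]

observed = [0.0, 25.0, 300.0, 30.0]
predicted = [5.0, 20.0, 310.0, 10.0]
clear_sky = [0.0, 50.0, 500.0, 0.5]
kept = daytime_pairs(observed, predicted, clear_sky)
```

The last point is dropped even though the observed value exceeds 20, because the clear-sky value says it is night; this is the kind of sensor error the second condition guards against.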

Validation
The data was split into train and test partitions of roughly 70% and 30% respectively. The data was split on 1st May 2019, with all models being trained on data from before the split date and evaluated on data after it. The date was arbitrarily chosen from a previously used dataset.
For both GPH and GPkN, we train each model on the historic data of a subset of 16 AOIs and test only on the remaining 4. We use a standard cross-validation approach to generate predictions for all 20 AOIs in our test set.
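Holding out 4 of 20 AOIs at a time implies five folds. The sketch below shows one way such folds could be formed; the AOI names and the interleaved assignment of AOIs to folds are assumptions made for the example.

```python
def holdout_folds(aois, n_folds=5):
    """Split AOIs so that each one is held out for testing exactly once."""
    folds = []
    for i in range(n_folds):
        test = aois[i::n_folds]
        train = [aoi for aoi in aois if aoi not in test]
        folds.append((train, test))
    return folds

aois = [f"AOI_{i:02d}" for i in range(20)]  # hypothetical identifiers
folds = holdout_folds(aois)
held_out = sorted(aoi for _, test in folds for aoi in test)
```

Each fold trains on 16 AOIs and tests on the other 4, and the union of the test sets covers all 20 AOIs.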

Statistical Tests
We use two non-parametric statistical tests for hypothesis testing to give statistical support when analysing our results [50]. We use non-parametric tests as the initial conditions required for parametric tests to be reliable may not be met. For pairwise comparisons, we make use of the Wilcoxon test [51,52]. We assume a level of significance of α = 0.1. To evaluate our methods against one another, we use the Friedman Aligned-Ranks test [53] to identify statistical differences among them. We use the Holm post-hoc test to determine which algorithms have significant differences among the 1 × n comparisons [54].
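The Holm step-down procedure is short enough to sketch in full. It takes the p-values from the 1 × n comparisons against the control method; the p-values below are made up for illustration and the function name is hypothetical.

```python
def holm(p_values, alpha=0.1):
    """Holm's step-down rule: returns one reject (True) / keep (False) flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared against alpha / (m - k + 1).
        if p_values[idx] > alpha / (m - rank):
            break  # once one hypothesis is kept, all larger p-values are kept too
        reject[idx] = True
    return reject

p_values = [0.01, 0.04, 0.30, 0.03]  # illustrative values only
decisions = holm(p_values)
```

With α = 0.1 the thresholds are 0.025, 0.033, 0.05 and 0.1 for the sorted p-values, so the first three sorted hypotheses are rejected and the one with p = 0.30 is kept.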

A note on Weather Forecasts
In order to evaluate our models, we use historical observations in place of actual weather forecasts. This was done to remove a degree of uncertainty caused by any error in the weather forecast, as we attempt to understand the effect different input features can have. As such, the results presented are the best case for any given method; in production, using real weather forecasts, we would expect a drop in performance depending on the accuracy of the forecasts.

Analysis of Results
Here we present our results and analysis. In Section 5.1 we address which input features are the best. Using a single forecasting model, we analyse the effects of various combinations of inputs on both the Local and Global approaches. In Section 5.2, we explore the use of Local and Global methods, analysing the performance of models trained with each scheme as well as the effects of using different kinds of weather data. Then, in Section 5.3 we show the performance of the GPH and GPkN training schemes used to circumvent data limitations. We also revisit the effects different input features can have, with a specific focus on uncoupling real-time irradiance.

Effect of Inputs
In this section we analyse the effect that using different input features can have on model performance. To do so, we trained the DNN on various combinations of inputs. We focused on the DNN because its input features are easy to change and it is fast to train. As outlined in Section 4.1, there are a number of potential input features that could be used by the models. We group the various features into three classes: (1) calculated values such as location, time, solar position etc.; (2) observations of the last few irradiance values; (3) weather features, comprising past observations as well as forecasts of future states. We trained the DNN using four combinations of these inputs:
• All - Irradiance, weather and static values
• Irradiance - Irradiance and static values
• Weather - Weather and static values
• Static - Calculated values
The use of either irradiance or weather data improves over the static only, indicating that both features contain useful information the model can extract. Looking at the per-step errors in Figure 4, it is clear that using just irradiance has the largest effect on performance in the first few steps, before dropping towards that of the static only. Given a longer forecast horizon, we would expect this trend to continue until it eventually plateaus in line with the static as the importance of irradiance decreases. Conversely, the use of just weather data produces a consistent improvement at every step when compared to static only. This is unsurprising as, much like when using just static data, the information available for the model to produce the forecasts is the same for all steps.
While the models that use just irradiance outperform just weather at step one, they reach an inflexion point around step two or three. This explains the switching of rankings at steps one and six in Table 7. When the model uses both irradiance and weather data, the behaviour is similar to that of just irradiance, starting at an improved state before dropping off, but only to be in line with that of the model that made use of weather data.
Looking at the nRMSE errors per AOI in Figure 5b, there are strata with relatively consistent performance increases for each of the input groups at all locations. However, this trend does not apply to S, as can be seen in Figure 5d. This would indicate that the improvement is not consistent relative to the forecast difficulty of each AOI. Looking at the S per AOI for the static input, given the fairly consistent nRMSE values, the high variance would suggest that some locations are more challenging to predict for than others. This gives more evidence that the inclusion of both weather and irradiance produces better models, as the standard deviation of S drops from 0.08 to 0.06.
Given these results, we can see that the use of all features produces the best model. It is worth noting that in both the Local and Global approaches, the use of just weather data appeared to be competitive, only being clearly beaten in the first few time steps. This is especially evident when we look at the average forecast skill, with both Local and Global having a difference of less than 0.01.

Local vs Global
Here we compare the performance of the various learning methods trained using both the Local and Global data models. We aim to understand and compare the Local and Global approaches. We use all input features as, based on the results from the previous section, this produces the best models. Table 8 gives an overview of the nRMSE and S results for each model. We present both the overall average for all AOIs at every forecast step, 1-6, as well as the standard deviation. In italics, we emphasise whether the Local or Global approach produced the best result for each model, while in boldface is the best overall model. Figure 6 and Figure 7 show the spread of the nRMSE and S for each AOI and forecast step per model.
Looking at the average results in Table 8, it is clear that the Global approach outperforms the Local in all the DL methods (CNN, DNN, LSTM). However, for the Trees, performance is almost identical between the two approaches, with the Local flavour presenting a slightly better average. At steps one and six this difference is insignificant, with a p value > 0.2.
Looking at the per-step errors in Figure 7, we see that the Global approach improves results at every step for the DL methods. We can see from Figure 6 that for the CNN and DNN, in addition to improving the mean error, there is a lower variance per AOI when using the Global method. We suspect this is because the Global models are able to extract information from one AOI and apply it to another.

Does location affect performance?
In Figure 8 we show the error for every time step per AOI plotted against the AOI's latitude. This gives us an indication of whether there is any correlation between how far north or south an AOI is and its performance relative to the other AOIs. When looking at the nRMSE, there is a correlation between an AOI's error and its latitude, with AOIs further north appearing to perform worse: r = 0.72 and r = 0.68 for Local and Global respectively. However, when comparing the S error, the correlation is not as strong, with Local r = −0.43 and Global r = −0.30. We believe this is because the nRMSE values are normalised by the mean irradiance of the AOI; locations further north receive less irradiance throughout the year and as such have a lower average, exacerbating any forecast error relative to AOIs further south.
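The r values quoted above are correlation coefficients between per-AOI error and latitude. Assuming the Pearson coefficient, a minimal sketch of the calculation is shown below; the latitudes and errors are hypothetical and do not reproduce our data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

latitudes = [50.5, 52.0, 53.5, 55.0, 57.0]  # hypothetical AOI latitudes
errors = [0.30, 0.33, 0.31, 0.36, 0.38]     # hypothetical per-AOI nRMSE
r = pearson_r(latitudes, errors)
```

A positive r for nRMSE corresponds to errors growing with latitude, while the negative r reported for S reflects that a higher S is better.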
Overall, while location likely does play some part in the performance of any given AOI, we feel it is not as significant a factor relative to other factors that may affect model performance at any AOI.

Data Used / Learning methods
Of the methods that make use of ground-based point weather data (DNN, LSTM and Trees), performance is extremely consistent. This is especially true for the Global LSTMs and DNNs, whose performance is almost identical, as can be seen in Figure 7. It is also emphasised in Table 9, which shows the Global model rankings at every step. The fact that all three methods appear to perform comparably, while the CNNs show a significant improvement, could suggest that there is a limit on the amount of information that can be extracted from ground-based weather data for the forecasts. The use of the image data seems to break through this limit, as is supported by the fact that even the Local CNN outperforms the Global versions of all other methods. We suspect this is because the CNN receives a richer view of the weather, whereas with ground-based data there is a distance between the observation station and the AOI. However, this improved performance comes with a much larger computational cost to train and run the models. Table 10 gives an overview of the approximate training times for each method. The CNNs take significantly longer to train than the other methods.
As the time complexity for all the methods used is O(n), sequentially training a Local model for all AOIs or a single Global model will take approximately the same amount of time. Of course, in practice, it is possible to easily parallelise training under the Local approach. This is not possible for the Global approach, as the model needs to be trained on all the data. This is also one of the advantages of the Global approach and, we suspect, a reason it outperforms the Local: it sees more data. The same training time constraints are true for the Trees, as their training time grows in line with the amount of data they need to learn from. However, the Global approach does not seem to provide a performance gain.

Conclusion
When working with multiple AOIs, the Global approach is better for DL based methods. While the Global models take longer to train, they produce better and more consistent results. Additionally, in the case of the DL approaches, if new data become available the existing model could potentially be further refined by training it on the new data (transfer learning). For the Trees, or when there are only a few AOIs, there is minimal performance gain compared to using the Local version.

GPH and GPkN
We have shown in Section 5.1 that the use of real-time irradiance can improve accuracy for the first few steps of the predictions. However, until now we have presented results for the perfect case where data has been available for all AOIs. One of the main aims of our paper is to present an understanding of potential solutions for when there is no or limited access to data at the AOI. In Section 3.2 we presented two data models able to produce forecasts for AOIs with limited data: GPH for a lack of historic data and GPkN for a lack of real-time data at the AOI. In this section, we analyse the performance of models trained using these alternate data models compared to the Local and Global approaches.
In Figure 9 we show the distribution of error per AOI for each of the models trained using all four data models: Local, Global, GPH and GPkN. Figure 10 shows a breakdown of the S for each of the four models at every forecast step, while Table 11 shows the Friedman ranking for each method at forecast steps one and six.
From Figure 9 it is clear that for the DL methods both GPH and GPkN appear to be viable approaches, outperforming their Local counterparts. The GPH approach yields results in line with the Global approach. Looking at the per-step errors in Figure 10, it is clear they follow the same trend, with a number of steps being indistinguishable. This is supported by the rankings in Table 11, with Global and GPH consistently ranking in line with one another.
In the case of GPkN applied to the DL models, Figure 9 shows that they appear to fit between the Local and Global methods. Looking at the per-step error in Figure 10, it is clear that this is the result of GPkN under-performing relative to even the Local approach for the first few forecast steps. The GPkN eventually rises to be only marginally worse than the GPH. GPH can be omitted as, without irradiance as an input feature, the results are the same as GPkN.
Figure 11 shows the average S per AOI and per step of the Global and GPkN models trained both with and without real-time irradiance as an input feature. In the case of Global models, we clearly see that the inclusion of real-time irradiance improves performance for all the models. This is unsurprising, as we have already shown in Section 5.1 that the use of irradiance helps Global models. However, in the case of GPkN, where each AOI does not have access to real-time irradiance and instead uses a substitute value, there is not a clear performance benefit.
In fact, in the case of the CNN, performance appears to drop for the first time step when using the substituted irradiance. However, when performing the Wilcoxon test, this difference was not significant with p = 0.13, which would suggest a few AOIs may be skewing the results. This trend held for all the GPkN models: unlike the Global models, the difference when using irradiance is not statistically significant. In other words, when operating in the worst-case scenario of GPkN, the inclusion of irradiance has no real benefit. We suspect this is due to a poor correlation between the true real-time irradiance used by the Global models and the substituted value used in GPkN. With the average distance to the substituted AOI being 68 km, this is unsurprising. If the substituted AOIs were closer, we would expect to see performance improve.
As we observed in Section 5.1, for Global models, where the real-time irradiance is sourced from the AOI, its inclusion improves model performance for the first few forecast steps. However, for GPkN, where irradiance is not sourced from the AOI, the use of irradiance is of limited value and in some cases may even hinder model performance. As such, when using GPkN models, care must be taken to ensure a correlation of irradiance between the target AOI and the substituted AOI.
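Selecting the substitute AOI and checking how far away it is can be sketched with a great-circle distance. The coordinates, AOI names and the use of the haversine formula below are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def nearest_aoi(target, candidates):
    """Return the name of the closest candidate AOI and its distance in km."""
    name = min(candidates, key=lambda n: haversine_km(target, candidates[n]))
    return name, haversine_km(target, candidates[name])

# Hypothetical AOIs with real-time irradiance available.
known = {"A": (51.5, -0.1), "B": (53.5, -2.2), "C": (55.9, -3.2)}
name, distance = nearest_aoi((53.8, -1.5), known)
```

Returning the distance alongside the name makes it straightforward to fall back to a model without irradiance input when the nearest AOI is too far away to be expected to correlate well.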

Results Summary
Overall, Global models with the use of real-time irradiance perform the best. However, their need for a complete dataset can limit their usefulness in the real world. We have shown through the use of GPH that the Global models generalise well across locations, working well for unseen locations.
More generally, it would seem that the inclusion of real-time irradiance can improve overall performance for an AOI where it is available, regardless of the data and training model used. However, its advantage over just using weather data seems to be limited to the first few forecast steps, and for forecasts with horizons longer than a few hours, its use is not as necessary. We have shown it is possible to uncouple irradiance observations from forecasting models, although with marginally reduced performance.

Conclusion
In this work, we have explored various techniques for building irradiance forecasting models. We used a number of standard ML methods (RFs, DNNs, LSTMs and CNNs), each trained using four approaches: Local, Global, Global Plant Holdout (GPH) and Global Plant kNear (GPkN). The Local approach trains a model per location, while the Global approach combines data from all locations and trains a single model. GPH and GPkN are extensions to the Global approach used to circumvent data dependency issues that may occur at training time and when the model is in production. The GPH model generates forecasts for locations without historic data, while GPkN circumvents any real-time data dependency by substituting values from nearby locations. Furthermore, we analysed the effects that the use of different input features can have, specifically real-time irradiance and weather data. We also explored different weather formats: point-based and satellite data.
Experimentally, we have shown that the Global approach and its extensions are superior to the Local. While computationally more expensive to train, as it utilises all sequences to learn, the single Global model consistently outperformed its Local counterpart. Furthermore, the Global approach can be utilised to generate forecasts for locations with limited historical data.
Our experiments have shown that the use of real-time irradiance can improve forecasts for the first few steps; however, after 2-3 hours its importance diminishes and weather data is key. When attempting to forecast for locations without direct access to real-time data, it is possible to substitute irradiance values from other locations, but care must be taken. The greater the distance between the two locations, the more weakly their irradiance will correlate, and performance will be negatively impacted.
Additionally, our experimental results have shown that the use of satellite images works very well. While the RF, DNN, and LSTM perform in line with each other, the CNN using satellite imagery consistently outperforms all of them. In fact, the CNN operating in its worst case, GPkN, presents a significant improvement over its Global ground-based weather counterparts. While in practice, access to these kinds of forecasts may be challenging, this result would strongly suggest that the use of richer weather data, over single-point data, can significantly improve forecast accuracy.
We feel that these models can be integrated into a planning and optimisation system for use in the energy market. However, further exploration of the effect the use of richer weather data has on performance is needed. We plan further work to combine satellite and ground-based weather data and increase the temporal resolution to better capture intra-hour fluctuations. We also plan to extend the substitution approach used in GPkN by combining data from multiple locations rather than just using the closest location.

Figure 2: UK area of visible wavelengths processed to RGB with the locations of the AOIs highlighted in red
Figure 3: A sample of the satellite images covering the full UK area

• All - Irradiance, weather and static values
• Irradiance - Irradiance and static values
• Static - Calculated values

Figure 6: Distribution of Local and Global errors S per step for Global - higher is better

Figure 7: Distribution of error per step for each of the models comparing Local and Global

Figure 9: Overview of Local, Global, GPH and GPkN per AOI. The plot shows the average error for each AOI

Figure 11: A comparison of Global and GPkN for all the ML methods with and without irradiance used as an input feature

Table 1: Ground-based weather features used by the models.

Table 4: Non-architectural hyperparameters common to all DL models

Table 5: A description of forecast skill (S) interpretation

Table 8: Average of Global and Local results for steps 1-6 for each model. In italic is the best of either Local or Global for each method and metric. Bold face is the best overall method for the metric.

Table 9: Friedman rankings of the S error for each Global model per step
Figure 8: Error for all time steps 1-6 compared to AOI latitude (N/S)
same training time constraints are true for the Trees as their training time grows in line with the amount of data you need to learn from. However, the Global approach does not seem to provide a performance gain.

Table 10: Approximate run times to train the models. For the trees, this is the time to train 6 trees, one for each forecast step. A large part of the Local training time (30%) is spent on overheads such as filling caches etc. A single Local training epoch for the DNN took approximately 5 seconds.

Table 11: Friedman rankings of the different data models at steps one and six for both error metrics. The best-ranked method is in bold