
At the EPEX Spot exchange, electricity is traded and exchange members can submit bids for buying and selling energy (EPEX Spot SE, 2018). In Germany, bids for the next day must be submitted by noon. Normally, single hours are traded at the day-ahead market. The market clearing price emerges as the intersection of the offered and demanded amounts for each hour, and in the end every market member pays this price for the particular hour (next-Kraftwerke, 2018). The price process for January to June 2018 can be seen in Fig. 1.

Fig. 1 Time series of German day-ahead prices in 2018

In the context of the EFRE.NRW-funded project KundenoRientiert FlexibilisierungspoTenziale erschließen (KRaFT, "unlocking customer-oriented flexibilisation potentials"), we are interested in price and load profile forecasting. Today, most platforms focus on a single time series. When forecasting is applied to many sources, big data problems arise. One example of use is the analysis of load profiles for each electric vehicle charging station in a car park, so that optimal charging strategies can be applied to maximise profit. At the moment, the given time series can be analysed and predicted with R (R Core Team, 2018), but for these new use cases R alone is not sufficient.

In this work we analyse how multiple time series can be processed, using parallel computing with R and the cluster computing framework Apache Spark (Apache, 2018a). We analyse the feasibility of computing differentiated forecasts for several datasets in parallel, allowing complex forecast scenarios in a virtual power plant environment as addressed by the KRaFT project.

Time series method

This section provides a short summary of autoregressive-moving average (ARMA) models, which are used to predict day-ahead prices.

ARMA models are a mix of autoregressive (AR) and moving average (MA) models. AR models relate the current value \( \tilde{x} \) of a process to a finite linear combination of previous values of the process and a random noise term ω. They are abbreviated as AR(p), where p denotes the order. MA models, in contrast, represent \( \tilde{x} \) as linearly dependent on a finite number q of previous random noise terms ω (Box et al., 2008).

Mixing both model types yields more flexibility. This leads to ARMA models of order p and q, denoted ARMA(p, q) (Box et al., 2008):

$$ \tilde{x}_t = \underbrace{\sum_{i=1}^{p} \varphi_i \tilde{x}_{t-i}}_{\text{AR part}} + \underbrace{\sum_{j=1}^{q} \theta_j \omega_{t-j}}_{\text{MA part}} $$

where \( \varphi_i \) and \( \theta_j \) are the model coefficients.
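As an illustration (not part of the original paper), the following R sketch simulates an ARMA(3, 2) process with arbitrarily chosen coefficients and re-estimates \( \varphi_i \) and \( \theta_j \) from the data; arima.sim and arima are part of base R's stats package:

set.seed(42)

# Simulate an ARMA(3, 2) process; the coefficients below are example values
x <- arima.sim(model = list(ar = c(0.4, 0.2, 0.1), ma = c(0.3, -0.2)), n = 1000)

# Re-estimate the model: order = c(p, d, q) with d = 0 gives an ARMA(3, 2)
fit <- arima(x, order = c(3, 0, 2))
coef(fit)  # phi_1..phi_3, theta_1..theta_2 and the intercept (mean)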

Apache Spark

Apache Spark enables fast, general-purpose cluster computing. It can run computations in memory and is recommended for big data that is analysed in parallel. The three main components of Apache Spark are Spark Core with the basic functionality, Spark SQL for working with structured data, and Spark Streaming for processing live streams of data (Karau, 2015), such as day-ahead prices from EPEX Spot or real-time energy loads of photovoltaic systems and electric vehicle charging stations.

At the moment, the architecture in use stores daily processed day-ahead prices from the ENTSO-E Transparency Platform (ENTSO-E Transparency Platform, 2018) in a Hadoop Distributed File System (HDFS) on a cluster where Spark runs in standalone mode. For each time series analysis, the data is imported as an HDFS file and predicted in parallel with Spark. The output can be stored as HDFS files or in databases such as Apache Cassandra (Apache, 2018b), which provides better options for distributing multiple time series across the cluster.
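As a purely illustrative sketch (the paper does not show this step), persisting a SparkDataFrame of results to Cassandra from SparkR could look as follows, assuming the spark-cassandra-connector is available on the cluster and the keyspace and table already exist; results_sdf, energy and forecasts are hypothetical names:

# Hypothetical example: write a SparkDataFrame 'results_sdf' to Cassandra;
# requires the spark-cassandra-connector and an existing keyspace/table
write.df(results_sdf,
         source = "org.apache.spark.sql.cassandra",
         mode = "append",
         keyspace = "energy",   # assumed keyspace name
         table = "forecasts")   # assumed table name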

In the following sections we analyse an energy price time series with R and execute the analysis on a Spark cluster in parallel. To use Apache Spark, a session is needed; in R it is started with the following commands:

Listing 1 Starting a Spark session
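The original listing is not reproduced here; a minimal sketch of such a session start with the SparkR API might look like this, where the library path, master URL and application name are assumptions:

# Load SparkR from the Spark installation (library path is an assumption)
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Start or connect to a Spark session; master URL and app name are placeholders
sparkR.session(master = "spark://master:7077",
               appName = "dayAheadForecast")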

Time series analysis of energy prices

The data for this analysis is taken from the ENTSO-E Transparency Platform (ENTSO-E Transparency Platform, 2018). A script stores the day-ahead prices on the cluster. The time series of the price process can be seen in Fig. 1.

To compute a forecast, a Spark session is started first. Subsequently, the day-ahead prices of the German energy market are imported as an HDFS file and the time series is created; here the values for 2018 up to June 7th are used. The dataset is divided into training data and test data (the last day), which is predicted.
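A sketch of this preparation step could look as follows, assuming hourly prices in a CSV file on HDFS; the file path and the column name price are placeholders:

# Read the day-ahead prices from HDFS into a SparkDataFrame (path is assumed)
prices_sdf <- read.df("hdfs:///data/day_ahead_prices_2018.csv",
                      source = "csv", header = "true", inferSchema = "true")

# Collect to a local R data frame and build an hourly time series
prices <- collect(prices_sdf)
n <- nrow(prices)

# Training data: everything except the last day; test data: the last 24 hours
train <- ts(prices$price[1:(n - 24)], frequency = 24)
test  <- prices$price[(n - 23):n]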

There is no time series package for Apache Spark and R (R Core Team, 2018) on the market, so another way to perform forecasts with R on a Spark cluster is needed. The SparkR API provides the function spark.lapply() (Apache, 2018c). Similar to the well-known R functions apply, sapply and lapply, it runs a user-defined function over a list of elements: for each element, the R driver sends the function to an R worker, which executes it, and the results of all workers are returned to the driver as a list. In this way, the forecast logic can be written in R and performed in parallel for multiple time series on a Spark cluster. The command for running it is shown in Listing 2.

Listing 2 Command for spark.lapply
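Such a call may look like the following one-line sketch, using the names liste and getForecast from the text:

# Run the user-defined forecast function in parallel over all time series
results <- spark.lapply(liste, getForecast)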

The list liste contains multiple time series, which are available as distributed datasets in the cluster. The user-defined function getForecast (see Listing 3) computes the prediction for each element of liste.

Listing 3 Method used in spark.lapply
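The original listing is again an image; based on the surrounding description, a minimal sketch of getForecast might read as follows, with all details beyond the description being assumptions:

# Minimal sketch of getForecast; details are assumptions based on the text
getForecast <- function(series) {
  # All required packages must be loaded inside the function,
  # since it is executed on the R workers
  library(forecast)

  # Hourly time series: frequency 24 corresponds to one day
  ts_data <- ts(series, frequency = 24)

  # Fit a model automatically; for the day-ahead prices the text
  # reports an ARMA(3, 2) with non-zero mean
  model <- auto.arima(ts_data)

  # Predict the next 24 values (one day)
  forecast(model, h = 24)
}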

All required packages need to be loaded within the function. The model of the time series is created with auto.arima, a well-known R function (Hyndman & Khandakar, 2008). In this case, the time series of the day-ahead prices is modelled as an ARMA(3, 2) model with non-zero mean. With the R function forecast (Hyndman & Khandakar, 2008), a prediction of the next 24 values (1 day) is calculated based on the training process.

In Fig. 2 the forecast (red) is shown. Additionally, the historic values of 1 week (black), the real observations for the last day (green) and the confidence interval of the prediction (blue) are plotted.
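A plot of this kind could be produced with the forecast package's plot method; the snippet below is a sketch under the assumption that train, test and getForecast from the sketches above are available:

# Forecast the last day and plot it together with one week of history
fc <- getForecast(as.numeric(train))
plot(fc, include = 7 * 24)   # 'include' limits the plotted history to one week

# Overlay the real observations of the last day
lines(ts(test, start = end(train) + c(0, 1), frequency = 24), col = "green")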

Fig. 2 Forecast of day-ahead prices of June 7th 2018

Summary

To submit bids for electricity at the EPEX Spot, good knowledge of one's needs and a price forecast are recommended. Different methods can be used for this; for instance, a simple ARMA model can predict the first few hours of a day. Other models and a preceding clustering of the data may yield better results.

The first test of Apache Spark and R showed that, even though there is no dedicated package for time series analysis in R with Apache Spark, it is possible to run parallel forecasts on a Spark cluster in R. Thus, multiple time series of price processes or load profiles, such as those of electric vehicle charging stations in a car park, can be processed simultaneously.

In future work, the scalability of time series analysis with Apache Spark and R using the developed method will be investigated.