Modelling an energy market with Bayesian networks for non-normal data

Energy markets are typically characterized by high complexity, due to the large number of variables involved, their different natures, and their association structure. Estimating a statistical model that properly represents the dependencies among the variables is crucial for managing such complexity. In this paper, a simple energy market influenced by hydroelectric availability is studied by means of Bayesian networks. Since the variables of interest are quantitative but non-Gaussian, non-parametric strategies are used to infer the association structure of the Colombian energy market. We compare the UniNet learning algorithm and the Rank PC algorithm, both based on the normal copula assumption and on the Spearman correlation measure, in order to explore differences in the estimated models. Finally, the usability of the model for energy managers is shown through the discussion of some scenarios.


Introduction
The electric power system is a complex integrated structure arranged on three levels: production, transmission and distribution.
The complexity of the energy market depends on its own characteristics as well as on several constraints, such as the impossibility of storing power, the limitations of the transmission grid, and the fact that the transmission and distribution grids use alternating current. Hence, plants have to be synchronized in order to ensure the security of the national grid.
Generally, the cost of energy production is directly proportional to the energy demand: as demand increases, additional plants have to be brought into production, following an ordering among them (first the most efficient and least expensive plants, then the less efficient ones, which are more expensive and sometimes polluting). Therefore, the cost of energy production strongly depends on the availability and technical characteristics of the plants. Furthermore, the variability of energy demand, due to seasonal fluctuations, causes uncertainty in identifying the optimal amount of production and requires high flexibility on the supply side. As a consequence, the market is administered by the market managing authority, which collects the bids of power generators, optimizes quantities and costs and, finally, makes energy available to distributors.
As a matter of fact, handling an energy market can require different perspectives and expertise. Statistically speaking, the association structure among the variables of interest constitutes relevant information for energy managers. Consequently, Bayesian networks (BNs) can be a useful tool for estimating relationships and for evaluating different scenarios (Cinar and Kayakutlu 2010).
In this paper, we consider the Colombian energy market, a rather simple market where power production is highly dependent on hydroelectric plants. The identification of the other important factors setting the energy price level, given a fixed amount of hydroelectric production, is of interest. The variables involved in the analysis are quantitative and their marginal distributions are non-Gaussian; as a consequence, non-parametric strategies are required to properly handle the data. Two main approaches, both based on the assumption that the multivariate data can be suitably described by a joint normal copula, are adopted to learn the structure: 1. non-parametric Bayesian networks (NPBNs), leading to a model where the presence of an arc between two nodes stands for a conditional rank correlation between the corresponding variables greater than or equal to a given threshold; 2. Bayesian networks estimated by the Rank PC (RPC) algorithm, leading to a model where the presence of an arc between a pair of nodes is tested by iteratively verifying (marginal and conditional) rank correlations between the corresponding variables, given a certain significance level α.
The aim of this paper is thus two-fold: to estimate a model for managing the complexity of the energy market by following both approaches, and to compare the two methods. In detail, first an NPBN is learnt in order to find those factors influencing the energy price in addition to the hydroelectric production; then the RPC algorithm is run in order to compare the results. The practical usability of the estimated model for energy managers is then shown through the evaluation and discussion of some scenarios.
The paper is organized as follows: a brief introduction to BNs and their non-parametric version can be found in Sect. 2; Sect. 3 deals with BN structural learning, with a particular focus on non-normal data. The application to real data is discussed in Sect. 4. Conclusions are drawn in Sect. 5.

Basics on Bayesian networks
A Bayesian network (Cowell et al. 1999) represents the multivariate probability distribution of a set of variables by means of a directed acyclic graph (DAG), i.e., a mathematical object made of a finite set of nodes and directed edges arranged without producing direction-preserving cycles. The nodes of a DAG are associated with random variables and the arrows between nodes represent the direct relevance of one variable to another. A node, say $X_1$, is said to be a parent of another node, say $X_2$, if there is an arrow from $X_1$ to $X_2$; correspondingly, $X_2$ is said to be a child of $X_1$.
In a BN each node is associated with a conditional distribution given its parents, and the joint distribution represented by the DAG can be factorized as

$f(x_1, \ldots, x_p) = \prod_{j=1}^{p} f\big(x_j \mid \mathrm{pa}(x_j)\big),$

where $\mathrm{pa}(X_j)$ is the set of parents of $X_j$ in the DAG.
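As a toy illustration (not the paper's model), the factorization can be spelled out for a three-node chain A → B → C, where the joint probability is the product of one marginal table and two conditional tables:

```python
# Illustrative sketch: the BN factorization f(x1,...,xp) = prod_j f(x_j | pa(x_j))
# for a toy three-node chain A -> B -> C with binary variables.

p_a = {0: 0.6, 1: 0.4}                      # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},         # P(B | A)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},         # P(C | B)
               1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """Joint probability via the DAG factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factorization defines a proper distribution: the eight joint
# probabilities sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```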
A BN can be built manually by experts if the dependence structure is known.More realistically, the relationships among variables are unknown, or partially known, so the network has to be learnt directly from data.
BNs are appreciated in many applied contexts for their easy-to-read pictorial representation of complex problems and for their capability to evaluate scenarios, which makes them decision support systems. In fact, BNs can be provided with an inference engine that carries out what-if analysis by means of computationally efficient algorithms. This means that the impact of different scenarios can be evaluated in real time by inserting and propagating specific evidence sets throughout the network.
BNs can be used when the variables are continuous, discrete, or a mixture of the two. In the continuous case, inference is possible if the variables are normally distributed (in the mixed case, the conditional Gaussian distribution is assumed). However, in many practical problems, continuous variables are far from normal and their dependencies are substantially different from those encompassed by the Gaussian model. Different approaches can be used to tackle this issue. A common solution consists in discretizing the continuous variables. A more recent approach is given by non-parametric Bayesian networks, briefly introduced in the following section.

Non-parametric Bayesian networks
Non-parametric Bayesian networks are BNs where nodes can represent both quantitative and qualitative variables and their dependence is modelled through copulae. More specifically, for continuous BNs, each node is associated with a continuous variable with invertible distribution function, while each arc is associated with a (conditional) rank correlation. Through copulae, the representation of a continuous multivariate distribution is split into two components: -univariate marginal modelling; -dependence structure modelling.
Hence, a large number of variables can be handled without introducing any distributional assumption on the marginals.
Formally speaking, a d-dimensional copula C is the joint distribution function of random variables $U_1, \ldots, U_d$ uniformly distributed on the interval [0, 1], so that all of its univariate marginals are uniform on [0, 1]. The general representation of a multivariate distribution as a composition of a copula and its univariate marginals is due to Sklar (1959). By Sklar's theorem, any cdf F can be written in terms of its univariate marginals $F_1, \ldots, F_d$ as

$F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big)$

for some copula C.
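Sklar's decomposition can be exercised numerically: a sketch of sampling from a bivariate normal copula and pushing the uniform margins through arbitrary inverse cdfs (the exponential and lognormal marginals here are illustrative assumptions, not the paper's variables):

```python
# Illustrative sketch: building a bivariate distribution from a normal
# (Gaussian) copula plus arbitrary non-normal marginals, as in Sklar's theorem.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
rho = 0.7                                   # copula (product-moment) correlation

# 1. Draw from the joint normal copula: correlated normals -> uniforms.
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)
u = stats.norm.cdf(z)                       # each margin is Uniform(0, 1)

# 2. Apply inverse marginal cdfs F1^{-1}, F2^{-1} to get the target marginals.
x = stats.expon.ppf(u[:, 0], scale=2.0)
y = stats.lognorm.ppf(u[:, 1], s=0.5)

# Rank (Spearman) correlation is preserved by the strictly monotone transforms.
r_uniform = stats.spearmanr(u[:, 0], u[:, 1])[0]
r_margins = stats.spearmanr(x, y)[0]
print(round(r_uniform - r_margins, 6))  # 0.0
```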
Copulae are very flexible in the bivariate case but not in the multivariate one: for more than two variables, essentially only the elliptical copulae (normal and Student's t) are viable choices. Nevertheless, multivariate copulae can be decomposed into a cascade of bivariate copulae. This decomposition is named pair copula construction (PCC). A PCC can be represented through a graphical model called a vine (Cooke 1997; Bedford and Cooke 2002), i.e., a nested set of trees, where the edges of the j-th tree are the nodes of the (j + 1)-th tree and each tree has the maximum number of edges.
Each edge is associated with a conditional rank correlation (r), so that the joint distribution is portrayed as a vine equivalent to a BN.
For copulae different from the normal one, translating a rank correlation specification for a BN into one suitable for a vine may require additional computations, increasing complexity. NPBNs discard the assumption of joint normality and build a joint density using the joint normal copula; an important property of this copula, valid also for joint normal distributions, is that zero partial correlation is sufficient for conditional independence, i.e., zero partial correlation implies the absence of the corresponding edge. Furthermore, partial correlation is equal to conditional correlation, and zero conditional correlation implies conditional independence. Moreover, the product moment correlations (ρ) and the rank correlations (r) are related to each other by Pearson's transformation, $\rho = 2 \sin(\pi r / 6)$, which increases computational speed.
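Pearson's transformation and its inverse are cheap to evaluate, and the distortion between the two scales is small, which is what makes working on the rank scale computationally convenient:

```python
# The relation between product-moment correlation rho and Spearman's rank
# correlation r under the joint normal copula (Pearson's transformation):
#   r = (6 / pi) * arcsin(rho / 2)   and inversely   rho = 2 * sin(pi * r / 6)
import math

def rank_from_rho(rho):
    """Spearman's r implied by product-moment correlation rho."""
    return (6.0 / math.pi) * math.asin(rho / 2.0)

def rho_from_rank(r):
    """Product-moment correlation implied by Spearman's r."""
    return 2.0 * math.sin(math.pi * r / 6.0)

# The two maps are inverses of each other; for any rho in [-1, 1] the
# difference |r - rho| never exceeds about 0.018.
for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert abs(rho_from_rank(rank_from_rho(rho)) - rho) < 1e-12

print(round(rank_from_rho(0.5), 4))  # 0.4826
```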
It is also worth noting that the conditional rank correlations in a BN are algebraically independent and, together with the hierarchical structure and the marginal distributions, uniquely determine the joint distribution.
For further reading we refer to Hanea et al. (2006) and Kurowicka and Cooke (2010), as well as to Dalla Valle (2014), Dalla Valle and Kenett (2015) and Hanea et al. (2010) for applications. As far as computation is concerned, NPBN estimation, as shown above, can be performed with the UniNet software (Morales-Napoles et al. 2007).

Bayesian networks structural learning
When the relationships among the variables are unknown, or only partially known, the network has to be learnt directly from data by means of efficient algorithms. Techniques to perform such inference from data, namely structural learning, have been largely discussed in the literature (Buntine 1996; Neapolitan 2003) and are mainly developed according to two approaches: scoring-and-searching and constraint-based.
While scoring-and-searching techniques (Cooper and Herskovits 1992; Heckerman 1995) span the space of models and select those optimizing an information criterion, constraint-based algorithms carry out a sequence of independence tests and, according to their results, draw a DAG encoding the learnt conditional independence relations. Among the latter, one of the main methods for inferring a directed graph is the PC algorithm (Spirtes et al. 2000), described in more detail in the next section together with its rank version, useful in the non-Gaussian case.

PC algorithm and Rank PC algorithm
The PC algorithm proceeds along three main steps:
1. Identification of the skeleton of the graph (i.e., the underlying undirected graph). Starting from a complete undirected graph, i.e., a graph where all nodes are connected to each other, the algorithm recursively tests marginal and conditional independencies given a chosen significance level and a specific ordering of the variables. The statistical tests used to decide whether to remove or keep edges between pairs of nodes are usually chi-squared tests for categorical data and Pearson correlation tests for Gaussian data.
2. Identification of v-structures (three-node configurations X → Z ← Y, standing for conditional dependence between the two parent nodes given their common child) by inspecting the test results of the previous step.
3. Orientation of the remaining undirected edges without producing additional v-structures and/or directed cycles.
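The skeleton phase (step 1) can be sketched compactly for Gaussian data. The following is an illustrative implementation of ours, not the pcalg code, and omits steps 2 and 3 (v-structure identification and edge orientation):

```python
# Illustrative sketch of PC step 1 (skeleton search) for Gaussian data,
# using Fisher-z tests on sample partial correlations.
import itertools
import math

import numpy as np
from scipy import stats

def partial_corr(C, i, j, S):
    """Partial correlation of variables i and j given the set S, from the
    inverse of the relevant correlation submatrix."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])

def skeleton(data, alpha=0.001):
    """Return the adjacency sets of the estimated skeleton."""
    n, d = data.shape
    C = np.corrcoef(data, rowvar=False)
    crit = stats.norm.ppf(1 - alpha / 2)
    adj = {v: set(range(d)) - {v} for v in range(d)}
    level = 0                       # size of the conditioning set S
    while any(len(adj[i] - {j}) >= level for i in adj for j in adj[i]):
        for i, j in itertools.permutations(range(d), 2):
            if j not in adj[i]:
                continue
            for S in itertools.combinations(sorted(adj[i] - {j}), level):
                r = partial_corr(C, i, j, S)
                # Fisher z-transform of the partial correlation estimate.
                z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - level - 3)
                if abs(z) < crit:   # independence not rejected: drop the edge
                    adj[i].discard(j)
                    adj[j].discard(i)
                    break
        level += 1
    return adj

# Chain X -> Y -> Z: the spurious X - Z link should be dropped given Y.
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
y = x + rng.normal(size=2000)
z = y + rng.normal(size=2000)
adj = skeleton(np.column_stack([x, y, z]))
print(1 in adj[0], 2 in adj[1], 2 in adj[0])
```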
For multivariate normal observations, the PC algorithm has high-dimensional consistency properties, as discussed by Kalisch and Bühlmann (2007). More recently, Harris and Drton (2013) showed that the PC algorithm retains these consistency properties when rank-based measures of correlation are used to assess conditional independence. The modified algorithm, namely the Rank PC algorithm, differs from the traditional PC algorithm in the test statistics used in step 1 for checking conditional independencies: the standard Pearson-type empirical correlations are replaced by rank-based measures of correlation.
In detail, in the case of normal data, the PC algorithm tests conditional independence between two variables, say X and Y, given a separating set S by computing the partial correlation obtained from the bivariate normal conditional distribution of (X, Y) | S, denoted by $\rho_{X,Y \mid S}$. It holds that X is conditionally independent of Y given S if and only if $\rho_{X,Y \mid S} = 0$. For non-normal data, the RPC algorithm tests conditional independence between X and Y given S by computing the rank-based partial correlation estimate, denoted by $\hat{r}_{X,Y \mid S}$; conditional independence is accepted when $|\hat{r}_{X,Y \mid S}| \le \gamma$, where $\gamma \in [0, 1]$ is a fixed threshold. The consistency of the RPC algorithm is demonstrated by Harris and Drton (2013) under mild assumptions. It is shown that RPC performs comparably to the Pearson-based PC algorithm for normal data and considerably better for non-normal data, under the assumption that the joint distribution follows a normal copula model. The RPC algorithm is implemented in the pcalg R package (Kalisch et al. 2012).
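The rank-based statistic at the heart of the RPC test can be sketched as follows: pairwise Spearman correlations are mapped to the product-moment scale via Pearson's transformation, and the partial correlation is read off the inverse of the resulting matrix. This is our illustration, not the pcalg code (which applies a Fisher-z test at level α):

```python
# Sketch of the rank-based partial correlation used by the RPC test.
import numpy as np
from scipy import stats

def rank_partial_corr(data, i, j, S):
    """Rank-based partial correlation of columns i and j given columns S."""
    cols = [i, j] + list(S)
    sub = data[:, cols]
    d = len(cols)
    rho = np.eye(d)
    for a in range(d):
        for b in range(a + 1, d):
            r_ab = stats.spearmanr(sub[:, a], sub[:, b])[0]
            # Pearson's transformation to the product-moment scale.
            rho[a, b] = rho[b, a] = 2.0 * np.sin(np.pi * r_ab / 6.0)
    P = np.linalg.inv(rho)  # partial correlations sit in the precision matrix
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# X and Y are driven by a common factor Z, with heavily non-normal margins:
# the marginal rank correlation is large, while the one given Z is near zero.
rng = np.random.default_rng(1)
n = 5000
zc = rng.normal(size=n)
x = np.exp(zc + 0.5 * rng.normal(size=n))
y = (zc + 0.5 * rng.normal(size=n)) ** 3
data = np.column_stack([x, y, zc])

print(rank_partial_corr(data, 0, 1, []) > 0.5)        # strongly dependent
print(abs(rank_partial_corr(data, 0, 1, [2])) < 0.1)  # ~independent given Z
```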

Learning NPBNs
When learning an NPBN, the one-dimensional marginal distributions are taken directly from the data and, as far as the model is concerned, it is only assumed that the joint distribution has a normal copula. The model selection technique is based on two validation steps: 1. validating that the joint normal copula adequately represents the multivariate data; 2. validating that the NPBN is an adequate model for the saturated graph. This second step consists in checking that the proposed DAG does not significantly differ from the complete graph.
This learning algorithm is implemented in UniNet.
In order to perform validation under the copula model, an overall measure of multivariate dependence needs to be computed. The determinant of the correlation matrix (D) is used: it is equal to 1 if the variables are all uncorrelated and equal to 0 if they are linearly dependent. For this purpose, we recall the following theorem (Hanea et al. 2010):

Theorem 1 Let D be the determinant of an n-dimensional correlation matrix, with D > 0. For any partial correlation BN specification,

$D = \prod_{(i,j)} \big(1 - \rho^2_{ij;D_{ij}}\big),$

where $\rho_{ij;D_{ij}}$ is the partial correlation of the arc connecting nodes i and j, with conditioning set $D_{ij}$, and the product is taken over all arcs in the BN.
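Theorem 1 can be verified numerically for a small saturated BN. For a three-node net with arcs 1 → 2, 1 → 3 and 2 → 3, the arc correlations are $\rho_{12}$, $\rho_{13}$ and $\rho_{23;1}$ (the correlation values below are arbitrary illustrative choices):

```python
# Numerical check of Theorem 1 for a saturated 3-node BN:
#   det(R) = (1 - rho_12^2) * (1 - rho_13^2) * (1 - rho_23;1^2)
import numpy as np

r12, r13, r23 = 0.5, 0.3, 0.4
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])

# Partial correlation of (2, 3) given 1 via the usual recursion.
p23_1 = (r23 - r12 * r13) / np.sqrt((1 - r12**2) * (1 - r13**2))

D = np.linalg.det(R)
product = (1 - r12**2) * (1 - r13**2) * (1 - p23_1**2)
print(abs(D - product) < 1e-12)  # True
```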
The validation process implemented in UniNet is based on the following three determinants: -DER: the determinant of the empirical rank correlation matrix; -DNR: the determinant of the normal rank correlation matrix; -DBBN: the determinant of the rank correlation matrix of a BN, under the assumption of a joint normal copula. DNR will generally differ from DER, since DNR assumes the normal copula rather than the empirical copula. Moreover, for the saturated net it holds that DNR = DBBN; otherwise DBBN > DNR.
To perform the first validation phase, the sampling distribution of DNR is calculated by simulation; if DER falls within the 90% central confidence band of DNR, it is possible to assume that the joint normal copula is an adequate model for the data.
Only if this step is successfully concluded is it possible to proceed to step 2, where a parsimonious model is evaluated. This model has fewer edges than the saturated one, and it is obtained by removing those edges with a small conditional rank correlation (corresponding to a small partial correlation in the normal copula case). See Hanea et al. (2010) for details about this heuristic procedure.
To validate the proposed model, the sampling distribution of DBBN is determined by simulation; if DNR falls within the 90% central confidence band of DBBN, the model adequately approximates the complete graph. Once the structure has been estimated, the BN can be used for prediction via conditioning, thus simulating various scenarios. In fact, one of the main features of UniNet is its analytical conditioning capability: values can be assigned to one or more variables and the conditional distributions of the other variables are updated in a very short time, thanks to the joint normal copula assumption.
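The first validation step can be sketched as a simulation. The following is our reconstruction of the idea, not UniNet's exact routine, and uses synthetic lognormal data in place of the real sample (with the paper's sample size n = 72):

```python
# Reconstruction sketch of validation step 1: simulate the sampling
# distribution of DNR under the fitted normal copula and check whether
# DER falls inside the central 90% band.
import numpy as np
from scipy import stats

def rank_corr_det(sample):
    """Determinant of the empirical Spearman rank correlation matrix."""
    return float(np.linalg.det(stats.spearmanr(sample)[0]))

rng = np.random.default_rng(0)
n, d = 72, 3
true_corr = np.array([[1.0, 0.4, 0.2],
                      [0.4, 1.0, 0.3],
                      [0.2, 0.3, 1.0]])
# Synthetic stand-in data: a normal copula with lognormal margins.
data = np.exp(rng.multivariate_normal(np.zeros(d), true_corr, size=n))

der = rank_corr_det(data)                       # DER of the observed sample

# Normal-copula correlation matrix fitted from the sample rank correlations.
rho_hat = 2.0 * np.sin(np.pi * np.asarray(stats.spearmanr(data)[0]) / 6.0)
np.fill_diagonal(rho_hat, 1.0)

sims = [rank_corr_det(rng.multivariate_normal(np.zeros(d), rho_hat, size=n))
        for _ in range(1000)]
lo, hi = np.percentile(sims, [5, 95])
print("DER within 90% band:", lo <= der <= hi)
```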

An application to the Colombian energy market
Here a BN for the Colombian energy market is learnt using the methods in Sects. 3.1 and 3.2. The analysis is based on data provided by the Enel Group. The interest is two-fold: -estimating the relational structure in order to identify those variables that directly or indirectly influence the power price; -comparing the different methods for learning non-Gaussian BNs described in Sect. 3.

Data description
It is well known that Colombia has natural resources for power generation, such as water, coal and gas reserves, in amounts allowing a supply greater than the country's demand. Moreover, due to its geographical position, the main Colombian power source is hydropower (69% of total production in 2014), followed by thermal power (gas and coal, in that order). In 2014, the use of non-conventional resources (mainly renewables) was promoted by new regulations. The data given to us by the experts comprise the energy price and demand, the most important energy commodities and their costs. Table 1 reports the variables used in the analysis together with their descriptions. Hereafter, node names coincide with variable names and are written in italics. The data are monthly mean values spanning from 2010 to 2015, resulting in 72 observations.

Learning the structure
Subject-matter knowledge does not allow the network to be drawn manually; therefore, data-driven structural learning is performed. A logical ordering of the variables has been provided and has been taken into account while estimating the network.

Learning NPBNs by UniNet
According to the approach proposed by Kurowicka and Cooke (2006), the one-dimensional marginal distributions are taken directly from the data. By the normal copula assumption, the rank dependence structure of the variables is that of a joint normal distribution, and the conditional rank correlations are retrieved from the data.
The analysis has been carried out using the UniNet software. The structural learning procedure implemented in UniNet requires a preliminary ordering of the variables, usually defined on the basis of expert knowledge. In our case, it has been provided by the experts of Enel and is the one listed in Table 1.
The saturated NPBN model is represented in Fig. 1 where each edge is associated with its corresponding conditional rank correlation computed from data.
In Fig. 1 the nodes are replaced by monitors showing the marginal distribution of each associated node (variable) with its mean ± the standard deviation. The histograms clearly show that most of the distributions are far from normality. Notice that, with respect to the variables listed in Table 1, the network in Fig. 1 contains an additional node, named Tightness, a functional node defined as the difference between demand and renewables-based power production. With regard to step 1, in order to validate the joint normal copula assumption, the determinants DER and DNR have been compared. The sampling distribution of DNR is based on 10,000 simulations and its 90% confidence interval is [0.00056732, 0.0042003]. DER is 0.0007082, so it falls within the above interval. Hence the hypothesis that the data were generated from a joint normal copula is not rejected at the 10% level. For the saturated graph, the rank determinant is equal to DNR, since the NPBN uses the normal copula. Indeed, DNR = DBBN = 0.00197252.
Next, a parsimonious NPBN model adequately summarizing the complete graph is searched for. The UniNet procedure consists in eliminating from the saturated graph those arcs corresponding to very weak associations. Specifically, UniNet works by simultaneously deleting the edges whose conditional rank correlation is below a fixed threshold. This model selection procedure appears too naive, since the arcs are eliminated simultaneously without a step-by-step updating of the conditional correlations. Therefore, here a backward approach has been used: edges have been deleted one at a time and the conditional rank correlations have been recomputed at each step. The conditional rank correlation threshold used to stop the backward procedure has been set to 0.2, as conventionally done. The resulting NPBN model is shown in Fig. 2. In Fig. 3 the same network is shown, with nodes replaced by histograms representing the marginal empirical distributions of the associated nodes (variables).
To evaluate whether the selected model adequately summarizes the saturated network, the determinants DBBN and DNR have been compared. The sampling distribution of DBBN is based on 10,000 simulations and its 90% confidence interval is [0.0006274, 0.0044747]. DNR is 0.00197253, so it falls within the above interval. Hence the NPBN in Fig. 2 can be considered a good approximation of the complete model. The selected model in Fig. 2 clearly shows the variables directly affecting the target node Energy_Price. It is positively influenced by the Demand and by the amount of power produced by Coal plants; on the contrary, it is strongly negatively influenced by the amount of power produced by Hydro_Wind plants. Therefore, the higher the hydro production component, the lower the price.
In fact, the Colombian energy market is mainly dominated by hydroelectric power production, whose price is lower than that of gas and coal. Consequently, in dry periods more gas- and coal-based energy must be produced (as suggested by the negative sign of the conditional rank correlations between Hydro_Wind and Coal and Gas), driving the energy price up.
The net also suggests that peaks in the Demand can positively affect the level of Gas and Coal production, as well as their costs; on the contrary, demand cannot affect the renewables (Hydro_Wind), because their production depends only on climate conditions and seasonality.
Moreover, the dependence structure represented by the model in Fig. 2 does not reveal any direct effect of the commodity prices (Coal_Cost and Gas_Cost) on the energy price. The nodes Coal_Cost and Gas_Cost indirectly affect Energy_Price through Demand. In fact, a variation in the price of coal, say, impacts the overall energy price only if the power demand is so high that non-renewable power sources, such as coal, are necessary.

Learning NPBNs by RPC algorithm
We next proceed to estimate the structure using RPC algorithm.
According to the approach proposed by Harris and Drton (2013), we learn the structure by means of a modified version of the well-known PC algorithm (Spirtes et al. 2000), suitable when the observations follow a Gaussian copula model, since it uses rank-based measures of correlation to assess conditional independence. This analysis is performed using the pcalg R package.
Compared to the UniNet method, the main advantage of the RPC algorithm is that it uses conditional independence tests to infer a DAG from data. Therefore, the presence/absence of an arc is tested at a significance level α.
Since the hypothesis of a normal copula has already been tested in Sect. 4.2.1, we can directly apply the RPC algorithm to the dataset.
RPC is run using the function pc as implemented in the pcalg R package. Moreover, the option providing an order-independent output (see Colombo and Maathuis (2014) for details) has been used. The resulting graph for α = 0.05 is shown in Fig. 4.
Compared to the UniNet model in Fig. 2, we notice that Energy_Price is still directly influenced by Demand and Coal. Unlike in Fig. 2, here Hydro_Wind affects Energy_Price via Gas and Demand.
It is also remarkable that the RPC model, like the NPBN in Fig. 2, does not suggest any direct relationship between Energy_Price and the commodity prices (Coal_Cost and Gas_Cost). Hence, the fact that the cost of commodities affects the overall energy price only via the demand can be considered a robust path.

Choosing the structure
The model in Fig. 4 is comparable to the UniNet model in terms of number of edges, since it has only one edge fewer. If node cardinality is considered, the two models are also similar. A comparison between the networks in Fig. 2 and Fig. 4 has been made in terms of predictive capability. We focused on the target variable, i.e., Energy_Price, and analysed the receiver operating characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity). In particular, the area under the curve (AUC), an overall measure of model performance varying between 1 (perfect discriminant capacity) and 0.5 (null discriminant capacity), is considered. A sample of 800,000 units has been drawn from each of the two networks in Fig. 2 and Fig. 4. From each of these samples, all the variables have been discretized into four levels and the conditional probability tables have been estimated.
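A minimal sketch of the comparison criterion follows: the AUC can be computed directly from predicted scores via the Mann-Whitney rank statistic. The labels and scores below are synthetic placeholders, not the samples drawn from the two networks:

```python
# AUC for the event "Energy_Price in its high level", computed from scores.
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (positive, negative) pairs ranked correctly (ties count 1/2).
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(7)
y = rng.random(1000) < 0.3                    # synthetic "high price" labels
s = y + rng.normal(scale=0.8, size=1000)      # scores correlated with labels
print(0.5 < auc(y, s) < 1.0)  # True
```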

Conditioning and prediction
Once the structure has been estimated, the NPBN in Fig. 2 can be used to address a number of queries about the effect of different levels of energy demand, commodity costs and/or power production from different sources on the overall energy price. Various scenarios can be simulated by inserting and propagating the appropriate evidence (scenario) throughout the network. This operation, also called conditioning, can be performed in a very short time thanks to efficient algorithms. Note that the conditioning algorithm associated with NPBNs is computationally efficient thanks to the normal copula assumption (Kurowicka and Cooke 2006).
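The reason conditioning is analytical under the normal copula can be sketched in the bivariate case: evidence is mapped to the normal scale, the conditional normal distribution is obtained in closed form, and the result is mapped back through the inverse marginal cdf. This is our illustration, not UniNet's implementation; the marginals (lognormal and exponential) and the correlation are illustrative assumptions:

```python
# Sketch of NPBN conditioning under a bivariate normal copula.
import numpy as np
from scipy import stats

rho = 0.8                                        # normal-copula correlation
f1 = stats.lognorm(s=0.6)                        # assumed marginal of X1
f2 = stats.expon(scale=3.0)                      # assumed marginal of X2

def condition_x1_on_x2(v, q=0.5):
    """Quantile q of X1 | X2 = v under the normal copula."""
    z2 = stats.norm.ppf(f2.cdf(v))               # evidence on the normal scale
    mean, sd = rho * z2, np.sqrt(1 - rho**2)     # closed-form conditional normal
    z1_q = stats.norm.ppf(q, loc=mean, scale=sd)
    return f1.ppf(stats.norm.cdf(z1_q))          # back to the X1 scale

# Larger evidence on X2 shifts the conditional median of X1 upward.
print(condition_x1_on_x2(1.0) < condition_x1_on_x2(8.0))  # True
```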
Here, we evaluate how much a low supply of hydroelectric energy (occurring after long-lasting dry periods) may affect other power production and, in turn, the overall energy price.
It is also interesting to evaluate how possible technological or energy-policy changes, or variations in the cost of raw materials, could modify the energy price level.
In the following, three scenarios are analysed. The updated conditional distributions (for the variables not instantiated) are shown in black, while the unconditional distributions are in grey. The unconditional distributions of all variables are also shown in Fig. 3.
-First scenario Suppose that the amount of power produced by hydro and wind plants is low due to a long-lasting dry period. We condition on the node Hydro_Wind, setting it at its lowest level, as shown in Fig. 5.
Comparing the conditional distributions and parameters with the unconditional ones, it is clear that, when the renewables are set at their lowest level, the amount of gas production rises from 1220 to 1960 MWh and the amount of coal production increases from 479 to 771 MWh on average. As far as the costs are concerned, only Coal_Cost is influenced, decreasing from 134,000 (peso per ton) to 128,000 (peso per ton). The Energy_Price grows drastically from 183,000 (peso per MWh) to 459,000 (peso per MWh) and the whole updated distribution (in black) shifts to the right. As an obvious consequence, the Tightness also increases, being the deterministic difference between demand and renewables-based power production.
-Second scenario We now suppose that there is a peak in the coal cost. The node Coal_Cost is instantiated at its highest value, as shown in Fig. 6.
Comparing the conditional with the unconditional distributions, it emerges that higher costs of the coal commodity correspond to a larger energy demand (whose updated distribution is shifted toward larger values) as well as to larger renewables and coal production. This means that coal becomes extremely expensive when the energy demand is so high that hydro and wind energy are not sufficient to fully satisfy it and it is necessary to resort to high-priced power plants. Note, in fact, that the distribution of the tightness also concentrates remarkably on larger values. In such a situation Gas_Cost also increases. Consequently, the Energy_Price rises from 183,000 (peso per MWh) to 316,000 (peso per MWh) on average.
-Third scenario Consider the case where there is a peak in the cost of gas. We condition on the node Gas_Cost, instantiating it at its highest value, as shown in Fig. 7. A peak in the cost of gas affects the level of the energy price more than the coal cost does. In fact, the Energy_Price dramatically increases on average, rising from 183,000 (peso per MWh) to 899,000 (peso per MWh). The average tightness goes up as well, meaning that gas and coal power plants are necessarily used to satisfy the demand. Consequently, there is also a remarkable growth of Coal.
The above scenarios are examples of how NPBNs can be used for what-if analysis. It emerges that NPBNs constitute a real decision support tool for managers, allowing them to evaluate in real time different possible actions, such as different energy portfolio compositions according to commodity costs and seasonality-this last feature being crucial in Latin American countries such as Colombia.

Conclusions
In this paper we have shown, through an application, how Bayesian networks can help energy market managers in taking decisions. In particular, the Colombian market, strongly dominated by hydroelectric power production, has been analysed. In such a situation, water supply management, i.e., the amount of hydroelectric production, is a crucial point, since it is highly influenced by meteorological phenomena such as El Niño and La Niña, which may give rise to the alternation of long-lasting dry and wet periods. Drought is particularly worrying because the energy demand then has to be satisfied using more expensive and polluting power sources, such as gas and coal. Consequently, it can be useful to have a tool supporting decision makers in evaluating possible adverse situations and in predicting their consequences on the overall cost of energy. The dataset, provided by Enel, comprises variables that all have non-Gaussian distributions, while standard continuous Bayesian networks are limited to the Gaussian case.
Here we have overcome this limitation using two different methodologies able to perform structural learning without either resorting to a preliminary discretization of the variables of interest or introducing unrealistic assumptions such as multivariate normality. Specifically, it is assumed that the joint distribution has a normal copula.
The two learning algorithms are both based on conditional rank correlations: the first works in a backward fashion, the second within the PC-algorithm framework. The analysis has been carried out using the UniNet software and the pcalg R package. The resulting networks have been compared in terms of their capability to predict the target variable, the energy price: the ROC curve and the AUC have been computed, and the network learnt in UniNet has been chosen. The energy demand (Demand) plays a central role: it is not only directly related to the energy price, but it also behaves like a filter between the cost of commodities (nodes Coal_Cost and Gas_Cost) and the overall energy price (node Energy_Price). The UniNet software has also made it possible to quickly carry out what-if analyses on scenarios of interest by conditioning on fixed values for specific variables.

Fig. 3 The estimated NPBN with nodes replaced by histograms

Fig. 5 First Scenario with original distribution in grey and conditional distribution in black

Fig. 6 Second Scenario with original distribution in grey and conditional distribution in black

Fig. 7 Third Scenario with original distribution in grey and conditional distribution in black

Table 1 Description of the variables involved in the analysis