Abstract
We present a robust approximation of joint chance constrained DC optimal power flow in combination with a model-based prediction of uncertain power supply via R-vine copulas. It is applied to optimize the discrete curtailment of solar feed-in in an electrical distribution network and guarantees network stability under fluctuating feed-in. This is modeled by a two-stage mixed-integer stochastic optimization problem proposed by Aigner et al. (Eur J Oper Res (2022) https://doi.org/10.1016/j.ejor.2021.10.051). The solution approach is based on the approximation of chance constraints via robust constraints using suitable uncertainty sets. The resulting robust optimization problem has a known equivalent tractable reformulation. To compute uncertainty sets that lead to an inner approximation of the stochastic problem, an R-vine copula model is fitted to the distribution of the multi-dimensional power forecast error, i.e., the difference between the forecasted solar power and the measured feed-in at several network nodes. The uncertainty sets are determined by encompassing a sufficient number of samples drawn from the R-vine copula model. Furthermore, an enhanced algorithm is proposed to fit R-vine copulas which can be used to draw conditional samples for given solar radiation forecasts. The experimental results obtained for real-world weather and network data demonstrate the effectiveness of the combination of stochastic programming and model-based prediction of uncertainty via copulas. We improve the outcomes of previous work by showing that the resulting uncertainty sets are much smaller and lead to less conservative solutions while maintaining the same probabilistic guarantees.
1 Introduction
The proportion of renewable energy, such as solar and wind energy, in electrical distribution networks is constantly increasing. Due to these difficult to predict and highly fluctuating energy sources, the operational management of electrical networks becomes very challenging. Transmission system operators (TSO) have to control the feed-in and the power distribution in the network and have to meet safety requirements at the same time. If the network risks a system overload, the feed-in from renewables must be curtailed. However, the curtailed energy has to be minimized for financial and ecological reasons. Therefore, there is a high demand for the combination of advanced forecasting and optimization models. In this work, we show how these models can be applied and combined for the optimal curtailment of solar feed-in in an electrical distribution network.
The predominant model for optimizing the production and distribution of power in an electrical network is the Optimal Power Flow (OPF) model. In its classic version, originally introduced in Carpentier (1962), this is a non-linear non-convex optimization problem which is hard to solve. For a broad overview of the literature on OPF, we refer to Frank et al. (2012a) and Frank et al. (2012b). Due to the computational difficulty of the OPF problem, several approximation approaches have been proposed in the literature. One of the most frequently used approximations is the DC Optimal Power Flow (DC OPF), see Christie et al. (2000). It results in a power flow model including only linear constraints and can be solved efficiently with standard software. The DC OPF model is also used in this work for the optimization of power grids under uncertainty.
In applications to power grids, it is important to ensure that there is a sufficiently high probability (chosen beforehand) that all safety constraints like transmission limits are satisfied. This can be modeled with a two-stage stochastic optimization model incorporating joint chance constraints that enforce the simultaneous satisfaction of several constraints with a predefined probability. In the first stage, the nominal network operating solution, including generator output, (discrete) curtailment, power flows and voltage angles, is decided before the realization of uncertainty is revealed (here-and-now). After the uncertain parameters manifest themselves, the second-stage variables react to them. In the second stage, the network response to fluctuation ensures that there is a high probability of transmission limits being maintained. From a practical perspective, protection through probabilistic constraints is suitable because short-term overloads in the electrical network are acceptable. In the event of larger or longer lasting overloads, countermeasures will need to be taken, where a TSO will need to (re-)optimize interventions in order to stabilize the network. In our model, curtailment limits the output of renewable power production to a specific percentage of the installed power.
We approximate the probabilistic constraints in the optimization problem using robust constraints within a robust safe approximation, see Nemirovski (2012). By a suitable choice of the uncertainty set we can ensure that all robust feasible solutions are also feasible for the stochastic optimization problem. The constraints of the robust approximation thus lead to sufficient conditions for the chance constraints being satisfied. In particular, we use a mixed-integer linear reformulation for the approximation introduced in Aigner et al. (2021). Hence, by solving only one mixed integer optimization model to global optimality, a robust solution is computed that is feasible for the chance constrained problem. The respective uncertainty sets are computed with the procedure proposed in Margellos et al. (2014) based on the scenario approach (see Calafiore and Campi (2005)) of stochastic optimization, which uses samples from a suitably chosen probability distribution. The present paper proposes several enhancements of our previous work, which consist in the utilization of R-vine copulas (see e.g. Joe (2015)), a flexible parametric model to construct multivariate probability densities by decomposing them into several bivariate conditional (and univariate) densities to fit distributions to available data. Note that R-vine copulas contain the family of D-vine copulas as special case, which we used so far to model data from meteorology and solar power supply, see Schinke-Nendza et al. (2021); von Loeper et al. (2021). From the fitted R-vine copula model we draw samples in order to obtain the uncertainty sets with the help of the scenario approach mentioned above. Then, in a second step, we modify the R-vine copula model such that we can draw samples from conditional distributions. 
This allows us to determine uncertainty sets depending on weather forecasts provided by DWD (German Meteorological Service) which are significantly smaller and lead to a drastic reduction of conservatism and less costly curtailment with the same probabilistic guarantees.
There are many research activities regarding OPF under uncertainty. The goal is to determine an optimal network configuration that remains feasible under uncertainty where the approach considered in this paper uses methods and models from stochastic and robust optimization. We refer to Ben-Tal et al. (2009) and Prékopa (1995) for a broad overview of these two paradigms regarding optimization under uncertainty. Note that due to the non-convexity of the nominal AC OPF, only solutions that are approximately protected against uncertainty can be computed as in (Dall’Anese et al. 2017; Roald and Andersson 2018; Zhang and Li 2011) with robust or probabilistic constraints.
Essential for an algorithmically tractable treatment of uncertainty in optimization problems is the possibility to solve the underlying deterministic problems (without uncertainty) efficiently. This is why the linear DC OPF model is suitable and of great interest. Such uncertain optimization problems are usually solved by reformulating them under specific assumptions on the underlying probability distribution or by using approximation techniques from stochastic programming. Most chance constrained OPF problems considered in the literature have separate chance constraints for each engineering limit, including both generation and transmission limits. For example, the authors of Bienstock et al. (2014); Lubin et al. (2016) focus on OPF with individual probabilistic constraints under Gaussian distributions. Uncertainty probabilities for specific classes of probability distributions are considered robustly in Roald et al. (2015); Xie and Ahmed (2018). Furthermore, there is a limited number of papers that deal with joint chance constrained OPF models. They allow much stronger system security guarantees, but are much harder to solve, see Geng and Xie (2019). The most common solution methods are based on the Boolean or Bonferroni approximation (see e.g. Jia et al. (2021)) and on scenario approximations (see e.g. Peña-Ordieres et al. (2021)).
In addition, the curtailment of renewable power is used in practice to reduce the feed-in of renewable energy sources, maintaining network stability and avoiding overloads of transmission lines. The curtailment of uncertain feed-in from renewables has also been considered in several OPF models. Examples can be found in (Aigner et al. 2021; Roald et al. 2016; Qiu and Wang 2014; Wang et al. 2011; Dall’Anese et al. 2017). Note that there are two principal types of curtailment strategies, which are usually modeled by additional discrete or continuous decision variables or fixed parameters. The first and more common type of curtailment uses output capacities, which restrict the maximum possible power input. This limit cannot be exceeded and any potential power production above the limit is cut off. The second type of curtailment reduces the produced energy by a fixed value regardless of how high the feed-in amount is. Chance constraints in combination with curtailment are usually tackled by sampling techniques from stochastic optimization already mentioned above. In the present paper we use discrete curtailment levels as it is common practice in many industrial applications and set by law in Germany.
To construct parametric models for multivariate distributions, vine copulas are a versatile tool which has been used in the literature for similar problems. For example, in Guo et al. (2021); Khuntia et al. (2019); Xiao et al. (2020), copulas are applied for dependency modeling of wind power in conjunction with OPF. Furthermore, in Xu et al. (2021), Gaussian copulas are used to determine uncertainty sets for an OPF problem with chance constraints.
The main contribution of the present paper is an extension of the safe approximation of the joint chance constrained DC OPF model introduced in Aigner et al. (2021), by combining it with a model-based prediction of solar power supply via copulas. Furthermore, additional information gained from weather data can be integrated into the copula approach and thus conditional distributions of solar power supply can be modeled. However, with regard to conditional sampling, vine copulas have some restrictions as described in Cooke et al. (2015), i.e., when drawing conditional samples from a given vine copula model, only some components of the underlying vector data can be taken into account in the conditioning set. To resolve this issue, various algorithms for conditional sampling from D- and C-vine copulas have been considered in the literature, see Bevacqua et al. (2017). In the present paper we propose a modification of the fitting procedure for the more general class of R-vine copulas. This modification allows us to obtain a suitable R-vine copula for any set of components on which we want to condition. To the best of our knowledge, this modification has not yet been considered before.
The rest of this paper is structured as follows. Section 2 recalls the joint chance constrained DC OPF model considered in Aigner et al. (2021), together with its robust approximation using box uncertainty sets. Then, in Sect. 3, the modeling of the underlying multivariate probability distribution with the help of R-vine copulas is introduced, where suitable uncertainty sets are constructed via the novel combination of the scenario approach and the fitted R-vine copulas. The numerical results of case studies based on real-world data for the distribution network of N-ENERGIE GmbH are presented in Sect. 4. They demonstrate the benefit of combining stochastic programming with a model-based prediction of uncertainty via copulas. The computed solutions are robust and lead to a relatively small cost increase compared to the nominal optimization model that ignores uncertainty. The consideration of conditional probability distributions further improves the solution quality. Finally, Sect. 5 concludes.
2 Chance constrained DC optimal power flow model
In this section, we recall the chance constrained DC optimal power flow model with the possibility to curtail feed-in proposed in Aigner et al. (2021), which is based on Bienstock et al. (2014).
2.1 Nominal DC optimal power flow with curtailment
We model the electrical distribution network as an undirected graph \(\mathcal {G}=(\mathcal {N},\mathcal {L})\) where \(\mathcal {N}=\{1,\ldots ,n\}\) for some integer \(n>1\) represents the set of vertices and \(\mathcal {L}\subseteq \mathcal {N}\times \mathcal {N}\) denotes the set of edges. In the context of power system optimization, vertices are also called nodes or buses, and edges are called (transmission) lines. The set of those nodes that are connected with (continuously controllable) slack generators of higher network hierarchies is denoted by \(\mathcal {N}_\text {G}\subseteq \mathcal {N}\). Furthermore, for each \(k\in \mathcal {N}\) we denote the set of adjacent nodes with \(\mathcal {N}(k)\subseteq \mathcal {N}\). For notational ease, we assume that every node is connected to (discretely) controllable solar power generation units. The energy production on a bus without solar feed-in is set equal to zero.
In order to control the solar feed-in, discrete regulation decisions can be made at each node. Curtailment is realized by restricting the maximum feed-in to a certain fraction vector \(\beta =(\beta _1,\ldots ,\beta _{n})\in \mathcal {S}= \mathcal {S}_1\times \ldots \times \mathcal {S}_{n}\subset [0,1]^{n}\) of the installed capacity vector
\[P^{\text {I}}=(P^{\text {I}}_1,\ldots ,P^{\text {I}}_{n}).\]
Note that the installed capacity is the intended full-load sustained solar energy production at each node. In practice, sets of curtailment factors with a small number of levels are common. Typical sets of curtailment factors for single nodes are \(\{0,\,0.3,\,0.6,\,1.0\}\) or \(\{0,\,0.1,\,0.2,\,\ldots ,\,1.0\}\).
Thus, at a node \(k \in \mathcal {N}\), the power fed into the network cannot exceed \(\beta _k P^{\text {I}}_k\). Any potential feed-in above this value is cut off. We model the curtailed uncertain solar feed-in \(P^{\text {in}}_k\) based on a given solar power production \(P^{\text {PV}}_k\ge 0\) via
\[P^{\text {in}}_k= \begin{cases} P^{\text {PV}}_k, & \text {if } P^{\text {PV}}_k\le \beta _k P^{\text {I}}_k,\\ \beta _k P^{\text {I}}_k, & \text {otherwise,} \end{cases}\]
i.e., \(P^{\text {in}}_k =\min \{P^{\text {PV}}_k, \beta _k P^{\text {I}}_k\}\).
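As a small illustration of this cut-off rule, the following sketch evaluates \(\min \{P^{\text {PV}}_k, \beta _k P^{\text {I}}_k\}\) for all nodes at once. The function name and the three-node data are hypothetical and chosen only for illustration.

```python
import numpy as np

def curtailed_feed_in(p_pv, beta, p_installed):
    """Return P^in_k = min{P^PV_k, beta_k * P^I_k} for every node k."""
    p_pv = np.asarray(p_pv, dtype=float)
    cap = np.asarray(beta, dtype=float) * np.asarray(p_installed, dtype=float)
    return np.minimum(p_pv, cap)

# Hypothetical three-node example (values in MW, for illustration only).
p_pv = [4.0, 2.5, 0.0]         # potential solar production
beta = [0.6, 1.0, 0.3]         # curtailment factors from {0, 0.3, 0.6, 1.0}
p_installed = [5.0, 3.0, 2.0]  # installed capacities
p_in = curtailed_feed_in(p_pv, beta, p_installed)
```

Only the first node is affected here: its potential production exceeds the cap \(\beta _1 P^{\text {I}}_1\), so the excess is cut off, while the other two nodes feed in their full production.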
In the following, we briefly recall the DC optimal power flow model with discrete curtailment of solar feed-in proposed in Aigner et al. (2021), where Table 1 summarizes the notation used for decision variables and input parameters.
Decision variables are the vectors of generator outputs \(P^{\text {G}}= (P^{\text {G}}_k)_{k\in \mathcal {N}_\text {G}}\in [0,\infty )^{|\mathcal {N}_\text {G}|}\), voltage angles \(\theta = (\theta _1,\ldots ,\theta _n) \in [-\pi ,\pi ]^{n}\), power flows \(p = (p_{kl})_{(k,l)\in \mathcal {L}} \in \mathbbm {R}^{|\mathcal {L}|}\) and curtailment factors \(\beta \in \mathcal {S}\), where \(|\mathcal {N}_\text {G}|\), \(|\mathcal {L}|\) denote the cardinalities of the sets \(\mathcal {N}_\text {G}\) and \(\mathcal {L}\), respectively. The model reads as follows:
where the functions \(f_k:[0,\infty )\rightarrow [0,\infty )\) and \(c_k:[0,1]\rightarrow [0,\infty )\) model generator and curtailment costs, respectively.
The equality constraints (1b)–(1d) model the active power flow, which is determined by the power flow equations (1d) and Kirchhoff’s first law where we distinguish the two cases with and without generators, see (1b) and (1c) respectively. Note that the power at each node has to be balanced. This means that at each node \(k \in \mathcal {N}\) the active power production \(P^{\text {G}}_k + P^{\text {in}}_k \in [0,\infty )\) from generators and renewables equals the demand \(P^{\text {D}}_k \ge 0\) plus the active power sent to adjacent nodes \(\sum \nolimits _{l \in \mathcal {N}(k)} p_{kl} \in \mathbbm {R}\). The active power flow on transmission line \((k,l)\in \mathcal {L}\) is the product of voltage angle difference \(\theta _k-\theta _l\in [-2\pi ,2\pi ]\) and susceptance \(b_{kl}>0\). At the same time, the transmission limits considered in (1e) must not be exceeded. The vector of generator outputs \(P^{\text {G}}\) can be continuously controlled within the generator bounds considered in (1f). Furthermore, we assume that there is a bus \(k_0\in \mathcal {N}\) with a reference angle \(\theta _{k_0}=0\).
The optimization task consists in minimizing the objective function given in (1a) which is the sum of power generation costs (\(f_k\)) and curtailment costs (\(c_k\)) subject to the constraints mentioned above. Note that the functions \(f_k\) for all \(k\in \mathcal {N}_\text {G}\) and \(c_k\) for all \(k\in \mathcal {N}\) can be assumed to be linear or convex quadratic in the generator output. Since the minimum expressions in (1b) and (1c) can be linearized by introducing auxiliary variables and additional linear constraints (see e.g. Sherali and Adams (2013)), the optimization problem considered in (1) is a mixed-integer linear or convex quadratic program and can be solved efficiently to global optimality with standard techniques and software using, e.g., the Gurobi optimizer [23].
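Collecting the verbal description of (1a)–(1f) given above, the nominal model can be sketched as follows. The symbols \(\overline{p}_{kl}\), \(\underline{P}^{\text {G}}_k\) and \(\overline{P}^{\text {G}}_k\) for the transmission and generator limits are notational assumptions made here for illustration; the precise formulation is given in Aigner et al. (2021).

\[
\begin{aligned}
\min _{P^{\text {G}},\theta ,p,\beta }\quad & \sum _{k\in \mathcal {N}_\text {G}} f_k(P^{\text {G}}_k)+\sum _{k\in \mathcal {N}} c_k(\beta _k) && && \text {(1a)}\\
\text {s.t.}\quad & P^{\text {G}}_k+\min \{P^{\text {PV}}_k,\beta _k P^{\text {I}}_k\}=P^{\text {D}}_k+\sum _{l\in \mathcal {N}(k)}p_{kl}, && k\in \mathcal {N}_\text {G}, && \text {(1b)}\\
& \min \{P^{\text {PV}}_k,\beta _k P^{\text {I}}_k\}=P^{\text {D}}_k+\sum _{l\in \mathcal {N}(k)}p_{kl}, && k\in \mathcal {N}\setminus \mathcal {N}_\text {G}, && \text {(1c)}\\
& p_{kl}=b_{kl}(\theta _k-\theta _l), && (k,l)\in \mathcal {L}, && \text {(1d)}\\
& |p_{kl}|\le \overline{p}_{kl}, && (k,l)\in \mathcal {L}, && \text {(1e)}\\
& \underline{P}^{\text {G}}_k\le P^{\text {G}}_k\le \overline{P}^{\text {G}}_k, && k\in \mathcal {N}_\text {G}, && \text {(1f)}\\
& \theta _{k_0}=0,\quad \beta \in \mathcal {S}.
\end{aligned}
\]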
2.2 Uncertainty modeling
In practice, the vector of solar power production \(P^{\text {PV}}=(P^{\text {PV}}_1,\ldots ,P^{\text {PV}}_{n}) \in [0,\infty )^{n}\) is not known in advance. In addition, the production of renewable power can be subject to high fluctuations and is therefore an uncertain quantity. Using a network operating strategy that is computed by ignoring such uncertainties, a sudden fluctuation of renewable energy can lead to overloads in the electrical network. In the worst case, this can lead to failure of network elements owing to cascade effects. To prevent this, the optimization model explained in Sect. 2.1 has to be extended in order to take such fluctuations into account, and individual feed-in units may have to be regulated. In particular, we model the vector of produced solar power \(P^{\text {PV}}\) as the sum of a vector \(P^\mathrm{F}=(P^\mathrm{F}_1,\ldots ,P^\mathrm{F}_{n}) \in [0,\infty )^{n}\) of forecasted solar power and a random fluctuation vector \(X=(X_1,\ldots ,X_{n}):\Omega \rightarrow \mathbbm {R}^{n}\) defined on some probability space \((\Omega , \mathcal{F}, \mathbb {P})\), i.e.,
\[P^{\text {PV}}=P^\mathrm{F}+X. \tag{2}\]
However, in a first step, we need to determine a nominal operating solution \((P^{\text {G}}, \theta , p)\) together with a curtailment decision \(\beta\) that is feasible for the nominal feed-in vector \(P^\mathrm{F}\) (corresponding to \(X=0\)), i.e., the decision variables \(P^{\text {G}}, \theta , p,\beta\) have to fulfill the constraints (1b)–(1e), where \(P^{\text {PV}}\) is given in (2) with \(X=0\). In addition, we require that, with high probability, the network reaction to fluctuating feed-in remains feasible, see the chance constraint given in (6g) below. To model this kind of network reaction, we consider randomized duplicates \(P^{\mathrm{G}, X}:\Omega \rightarrow [0,\infty )^{|\mathcal {N}_\text {G}|},\theta ^X:\Omega \rightarrow [-\pi ,\pi ]^{n}\) and \(p^X:\Omega \rightarrow \mathbbm {R}^{|\mathcal {L}|}\) of the decision variables \(P^{\text {G}}, \theta , p\) introduced in Sect. 2.1, which depend on the realizations \(X(\omega )\) for \(\omega \in \Omega\) of the random fluctuation vector X. Note that realizations \(X(\omega )\not =0\) of X may lead to a changed distribution of power in the network and, therefore, to an imbalanced network. The generators then change their output to \(P^{\mathrm{G},X(\omega )}\) in order to balance the total active network power. Furthermore, the decision variables \(\theta ^X\) and \(p^X\) are adjusted correspondingly to ensure feasibility.
Thus, in the setting of the two-stage stochastic optimization problem described above (see also Sects. 2.3 and 2.4), the variables \(P^{\text {G}}, \theta , p\) refer to first-stage (or here-and-now) decisions. They must be decided for the nominal feed-in vector \(P^\mathrm{F}\) (corresponding to \(X=0\)), before uncertainty is revealed. Moreover, for fixed first-stage variables \(P^{\text {G}}, \theta , p\), any realization \(X(\omega )\not =0\) of X leads to a reaction of the network by choosing optimal second-stage (or wait-and-see) variables \(P^{\mathrm{G},X(\omega )},\theta ^{X(\omega )},p^{X(\omega )}\), where we assume that the power generation is balanced by the Automatic Generation Control, see Borkowska (1974). This means that the total power generation mismatch \(\Delta _X=\sum \nolimits _{k \in \mathcal {N}} (\min \{P^\mathrm{F}_k+X_k,\beta _kP^{\text {I}}_k\}-\min \{P^\mathrm{F}_k,\beta _kP^{\text {I}}_k\})\) is shared among all generators according to given participation factors \(\alpha _k \in [0,1]\) for every \(k\in \mathcal {N}_\text {G}\) such that \(\sum \nolimits _{k \in \mathcal {N}_\text {G}} \alpha _k =1\). More precisely, for each \(\omega \in \Omega\) we put
\[P^{\mathrm{G},X(\omega )}_k=P^{\text {G}}_k-\alpha _k\,\Delta _{X(\omega )}\quad \text {for all } k\in \mathcal {N}_\text {G}. \tag{3}\]
The vector of decision variables \(\theta ^X\) is adjusted in a way that the power balance equations
\[P^{\mathrm{G},X(\omega )}_k+\min \{P^\mathrm{F}_k+X_k(\omega ),\beta _kP^{\text {I}}_k\}=P^{\text {D}}_k+\sum _{l\in \mathcal {N}(k)}b_{kl}\bigl (\theta ^{X(\omega )}_k-\theta ^{X(\omega )}_l\bigr )\quad \text {for all } k\in \mathcal {N}_\text {G}, \tag{4a}\]
\[\min \{P^\mathrm{F}_k+X_k(\omega ),\beta _kP^{\text {I}}_k\}=P^{\text {D}}_k+\sum _{l\in \mathcal {N}(k)}b_{kl}\bigl (\theta ^{X(\omega )}_k-\theta ^{X(\omega )}_l\bigr )\quad \text {for all } k\in \mathcal {N}\setminus \mathcal {N}_\text {G} \tag{4b}\]
are fulfilled for each \(\omega \in \Omega\). Furthermore, for each \(\omega \in \Omega\) we put
\[p^{X(\omega )}_{kl}=b_{kl}\bigl (\theta ^{X(\omega )}_k-\theta ^{X(\omega )}_l\bigr )\quad \text {for all } (k,l)\in \mathcal {L}. \tag{5}\]
It can be shown, see Aigner et al. (2021), that for each realization \(X(\omega )\) of X the equation system given in (4a)–(4b) has a uniquely determined solution \(\theta ^{X(\omega )}\), i.e., the wait-and-see variables \(P^{\mathrm{G},X(\omega )},\theta ^{X(\omega )}\), and \(p^{X(\omega )}\) are uniquely determined by (3), (4a)–(4b), and (5).
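The network reaction described in this section can be traced numerically on a toy network. The sketch below is not the authors' implementation. It assumes that generators lower their output by \(\alpha _k\Delta _X\) when the feed-in rises by \(\Delta _X\), and that the reference bus has index 0. The function names and the three-node data are hypothetical.

```python
import numpy as np

def dc_angles(lines, b, injection, ref=0):
    """Solve sum_l b_kl (theta_k - theta_l) = injection_k with theta_ref = 0."""
    n = len(injection)
    lap = np.zeros((n, n))  # susceptance-weighted graph Laplacian
    for (k, l), b_kl in zip(lines, b):
        lap[k, k] += b_kl
        lap[l, l] += b_kl
        lap[k, l] -= b_kl
        lap[l, k] -= b_kl
    keep = [k for k in range(n) if k != ref]
    theta = np.zeros(n)
    theta[keep] = np.linalg.solve(lap[np.ix_(keep, keep)], injection[keep])
    return theta

def agc_response(p_gen, alpha, p_fc, x, beta, p_inst, p_dem, lines, b):
    """Second-stage generator outputs, angles and flows for a realization x."""
    feed_nom = np.minimum(p_fc, beta * p_inst)       # curtailed nominal feed-in
    feed_real = np.minimum(p_fc + x, beta * p_inst)  # curtailed realized feed-in
    delta = np.sum(feed_real - feed_nom)             # total mismatch Delta_X
    p_gen_x = p_gen - alpha * delta                  # assumed sign convention
    theta_x = dc_angles(lines, b, p_gen_x + feed_real - p_dem)
    flows_x = np.array([b_kl * (theta_x[k] - theta_x[l])
                        for (k, l), b_kl in zip(lines, b)])
    return delta, p_gen_x, theta_x, flows_x

# Hypothetical line network 0 - 1 - 2 with one slack generator at node 0.
lines, b = [(0, 1), (1, 2)], [10.0, 10.0]
p_dem = np.array([1.0, 2.0, 3.0])
p_fc = np.array([0.0, 1.5, 2.0])
p_inst = np.array([0.0, 4.0, 4.0])
beta = np.array([1.0, 1.0, 1.0])
p_gen = np.array([2.5, 0.0, 0.0])   # balances the nominal feed-in
alpha = np.array([1.0, 0.0, 0.0])   # participation factors, summing to 1
x = np.array([0.0, 0.5, -0.2])      # one realization of the forecast error
delta, p_gen_x, theta_x, flows_x = agc_response(
    p_gen, alpha, p_fc, x, beta, p_inst, p_dem, lines, b)
```

Because the participation factors sum to one, the adjusted generation exactly absorbs the mismatch, so the nodal injections again sum to zero and the reduced linear system for the angles is uniquely solvable once the reference angle is fixed.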
2.3 Chance constrained DC optimal power flow
By construction, the vectors \(p^X\) and \(P^{\text {G},X}\) of power flows and generator outputs are random variables that depend on the realization \(X(\omega )\) of the random fluctuation vector X and on the values of first-stage decision variables \(P^{\text {G}}, \theta , p,\beta\). Thus, we are searching for solutions \((P^{\text {G}}, \theta , p,\beta )\) which satisfy the limits of type (1e) and (1f) for power flows and generator outputs, respectively, with a probability of at least \(1-\varepsilon\) for some small number \(\varepsilon \in [0,1]\).
We model this requirement by a joint chance constraint in order to guarantee network stability. This means that the desired compliance probabilities for all power flows and generator outputs are simultaneously met. Thus, combining all modeling elements considered in the previous sections, we formulate the joint chance constrained DC optimal power flow problem with discrete curtailment as follows:
where the wait-and-see variables \(P^{\mathrm{G},X(\omega )}_k, p^{X(\omega )}_{kl}\) are defined in (3) and (5), respectively.
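To make the joint requirement concrete, the chance constraint (6g) can be sketched as follows. The limit symbols \(\overline{p}_{kl}\), \(\underline{P}^{\text {G}}_k\) and \(\overline{P}^{\text {G}}_k\) are notational assumptions made here for illustration, and the exact form in Aigner et al. (2021) may differ.

\[
\mathbb {P}\Bigl (\underline{P}^{\text {G}}_k\le P^{\mathrm{G},X}_k\le \overline{P}^{\text {G}}_k \ \text { for all } k\in \mathcal {N}_\text {G} \ \text { and } \ |p^{X}_{kl}|\le \overline{p}_{kl} \ \text { for all } (k,l)\in \mathcal {L}\Bigr )\ge 1-\varepsilon .
\]

The single probability covers all limits simultaneously. This is what distinguishes a joint chance constraint from a collection of individual chance constraints, each of which would bound the violation probability of one limit only.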
2.4 Safe approximation of the chance constraints
Chance constrained optimization problems like (6) are in general hard to solve and may not be algorithmically tractable. Therefore, a large number of approximation techniques can be found in the literature, see Prékopa (1995) for a broad overview of the paradigm of stochastic optimization.
Thus, following Nemirovski (2012), we will replace the chance constraint considered in (6g) by a strictly robust protection against a suitably chosen uncertainty set \(B\in {\mathcal {B}}(\mathbbm {R}^{n})\) that fulfills
\[\mathbb {P}(X\in B)\ge 1-\varepsilon , \tag{7}\]
where \({{\mathcal {B}}}(\mathbbm {R}^{n})\) denotes the \(\sigma\)-algebra of Borel sets in the n-dimensional Euclidean space \(\mathbbm {R}^{n}\).
The robust approximation of (6) is then given by
where \(P^{\mathrm{G},u}_k,p^u_{kl}\) are determined as in (3) and (5) replacing \(X(\omega )\) by u.
One can show that every feasible solution of the safe approximation (8) is feasible for (6), see Gorissen et al. (2015). To ensure that the safe approximation does not generate overly conservative solutions, the uncertainty set \(B\) should be chosen as small as possible, but as large as necessary.
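The reasoning behind this feasibility statement is short. As a notational assumption, let \(A(u)\) denote the statement that all generator and transmission limits hold when the fluctuation vector takes the value \(u\in \mathbbm {R}^{n}\), for fixed first-stage variables. If \(A(u)\) holds for every \(u\in B\), then the event \(\{X\in B\}\) is contained in the event \(\{A(X) \text { holds}\}\), and therefore

\[
\mathbb {P}\bigl (A(X) \text { holds}\bigr )\ \ge \ \mathbb {P}(X\in B)\ \ge \ 1-\varepsilon ,
\]

where the second inequality is the coverage condition imposed on \(B\). The gap between the two probabilities is the source of conservatism, which is why smaller sets \(B\) with the same coverage are preferable.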
Assuming that
\[B=[\ell _1,u_1]\times \ldots \times [\ell _{n},u_{n}] \tag{9}\]
for some \(\ell =(\ell _1,\ldots ,\ell _{{n}}),u=(u_1,\ldots ,u_{n}) \in \mathbbm {R}^{n}\) such that \(\ell _k<u_k\) for all \(k\in \mathcal {N}\), it has been shown in Aigner et al. (2021) that the optimization problem (8) possesses an equivalent mixed-integer linear reformulation which - although being NP-hard in general - can be solved e.g. with the Gurobi optimizer [23] within reasonable time also for huge instances.
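To illustrate what a box uncertainty set with a prescribed coverage looks like, the following sketch builds a box from samples and checks its empirical coverage. This is not the scenario-based procedure of Margellos et al. (2014) used in the paper. It is a simpler union-bound construction from componentwise empirical quantiles, which carries no rigorous finite-sample guarantee. The Gaussian sample data and all names are hypothetical.

```python
import numpy as np

def box_from_samples(samples, eps):
    """Componentwise quantile box; each side of each interval gets eps / (2 n)."""
    n = samples.shape[1]
    lower = np.quantile(samples, eps / (2 * n), axis=0)
    upper = np.quantile(samples, 1 - eps / (2 * n), axis=0)
    return lower, upper

def coverage(samples, lower, upper):
    """Fraction of samples lying inside the box [lower, upper]."""
    inside = np.all((samples >= lower) & (samples <= upper), axis=1)
    return inside.mean()

# Synthetic stand-in for forecast errors at n = 3 nodes (assumption).
rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
train = rng.multivariate_normal(np.zeros(3), sigma, size=20000)
fresh = rng.multivariate_normal(np.zeros(3), sigma, size=20000)

eps = 0.1
lower, upper = box_from_samples(train, eps)
cov_hat = coverage(fresh, lower, upper)  # should be close to or above 1 - eps
```

With exact marginal quantiles, the union bound gives a probability of at most \(\varepsilon\) of leaving the box. For dependent components this bound is typically not tight, so the box is larger than necessary, which is one motivation for modeling the joint distribution explicitly.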
3 Modeling the distribution of the random forecasting error
In order to solve the safe approximation (8) of the stochastic optimization problem (6) described in Sect. 2.3, a suitable uncertainty set \(B\subset \mathbbm {R}^n\) has to be determined such that (7) holds. For the novel construction of uncertainty sets with the help of copulas, we propose a method for modeling the multivariate probability distribution of the n-dimensional power forecasting error \(X = P^{\text {PV}}- P^\mathrm{F}\) introduced in (2). The model for the distribution of X is based on R-vine copulas, which are fitted to empirical data.
To make the paper self-contained, we first give a brief overview of some fundamentals of copula theory in Sect. 3.1. In Sects. 3.2 and 3.3 we explain how R-vine copulas are structured and how they can be fitted to empirical data. Once an R-vine copula is fitted for the distribution of the random fluctuation vector X, in Sect. 3.4 we explain how samples can be drawn from it, in order to determine an uncertainty set \(B\subset \mathbbm {R}^{n}\) of the form given in (9) which satisfies a slightly modified version of condition (7), see Sect. 3.6. Furthermore, in Sect. 3.5 we propose a modification of the fitting procedure for R-vine copulas in order to fit the distribution of the (2n)-dimensional random vector (S, X) to empirical data, where \(S:\Omega \rightarrow [0,\infty )^n\) models the forecasted solar radiation at the n nodes of the electrical network. This allows for an enhanced modeling of uncertainty sets \(B_s\in {{\mathcal {B}}}(\mathbbm {R}^n)\) conditioned on \(S=s\) for any given radiation forecast \(s\in [0,\infty )^n\).
3.1 Copulas: definition and Sklar’s representation formula
A bivariate copula \(C:[0,1]^2\rightarrow [0,1]\) is the cumulative distribution function (CDF) of a two-dimensional random vector \(U=(U_1,U_2):\Omega \rightarrow [0,1]^2\), where both marginal distributions (of \(U_1\) and \(U_2\)) are the standard uniform distribution on the unit interval [0, 1], i.e., it holds that \(C(u_1,u_2)=\mathbb {P}(U_1\le u_1, U_2\le u_2)\) with \(C(u_1,1)=u_1\) and \(C(1,u_2)=u_2\) for any \(u_1,u_2\in [0,1]\). Moreover, by the choice of the copula \(C:[0,1]^2\rightarrow [0,1]\) the mutual interdependence of the components \(U_1\) and \(U_2\) can be described. For example, the product copula, where
\[C(u_1,u_2)=u_1u_2\quad \text {for all } u_1,u_2\in [0,1], \tag{10}\]
models the case that \(U_1\) and \(U_2\) are independent random variables. On the other hand, if \(C(u_1,u_2)=\min \{u_1,u_2\}\) for all \(u_1,u_2\in [0,1]\), then \(\mathbb {P}(U_1=U_2)=1\), i.e., the components \(U_1\) and \(U_2\) are identical almost surely. Besides these two extreme cases, many further (parametric) families of bivariate copulas \(C:[0,1]^2\rightarrow [0,1]\) can be found in the literature, which model the case that \(U_1\) and \(U_2\) are neither independent nor identical. In particular, for the purposes of the present paper, the following bivariate copula families will be considered: Gaussian, Student t, Clayton, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8 and their rotations, see e.g. Joe (2015); Nelsen (2006) for details.
Note that the notion of a copula is not restricted to the bivariate case. For any integer \(m\ge 2\), the function \(C:[0,1]^m\rightarrow [0,1]\) is called a copula if it is the CDF of an m-dimensional random vector \(U=(U_1,\ldots ,U_m):\Omega \rightarrow [0,1]^m\) such that the (marginal) distributions of \(U_1, \ldots ,U_m\) are the standard uniform distribution on the unit interval [0, 1]. The importance of copulas results from Sklar’s representation formula, see Joe (2015); Nelsen (2006), which states that the CDF of any random vector \(Y=(Y_1,\ldots ,Y_m):\Omega \rightarrow \mathbbm {R}^m\) with arbitrary (not necessarily uniform) marginal distributions can be written as the superposition of the univariate CDFs of \(Y_1, \ldots ,Y_m\) and a certain copula \(C:[0,1]^m\rightarrow [0,1]\). More precisely, it holds that
\[F_{1,\ldots ,m}(y_1,\ldots ,y_m)=C\bigl (F_{1}(y_1),\ldots ,F_{m}(y_m)\bigr )\quad \text {for all } (y_1,\ldots ,y_m)\in \mathbbm {R}^m, \tag{11}\]
where \(F_{1,\ldots ,m}:\mathbbm {R}^m\rightarrow [0,1]\) with \(F_{1,\ldots ,m}(y_1,\ldots ,y_m)=\mathbb {P}(Y_1\le y_1,\ldots ,Y_m\le y_m)\) is the CDF of the m-dimensional random vector Y and \(F_{i}:\mathbbm {R}\rightarrow [0,1]\) with \(F_{i}(y_i)=\mathbb {P}(Y_i\le y_i)\) is the CDF of its ith component \(Y_i\) for each \(i\in \{1,\ldots ,m\}\). Vice versa, for any sequence \(F_{1},\ldots ,F_{m}\) of univariate CDFs and for any copula C, the superposition of \(F_{1},\ldots ,F_{m}\) and C considered on the right-hand side of (11) is the CDF of an m-dimensional random vector.
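The converse direction of Sklar's formula is what makes copulas useful for simulation: samples from a copula can be pushed through inverse marginal CDFs to obtain samples with the desired marginals and dependence. The sketch below does this for a bivariate Gaussian copula with an exponential and a uniform marginal. The choice of copula, marginals and parameters is purely illustrative and is not taken from the paper.

```python
import math
import numpy as np

def gaussian_copula_sample(corr, size, rng):
    """Draw samples U in [0,1]^m whose CDF is the Gaussian copula with matrix corr."""
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=size)
    # Standard normal CDF applied componentwise yields uniform marginals.
    phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return phi(z)

rng = np.random.default_rng(1)
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])
u = gaussian_copula_sample(corr, 20000, rng)

# Inverse marginal CDFs (illustrative choices):
y1 = -np.log1p(-u[:, 0]) / 2.0   # exponential distribution with rate 2
y2 = 2.0 * u[:, 1] - 1.0         # uniform distribution on [-1, 1]

u_means = u.mean(axis=0)                     # both should be near 0.5
u_corr = np.corrcoef(u[:, 0], u[:, 1])[0, 1]  # dependence inherited from corr
```

The marginals of `y1` and `y2` are fixed by the inverse CDFs alone, while their dependence is fixed by the copula alone. This separation is the modeling freedom that the representation formula provides.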
3.2 R-vine copulas
Note that the representation formula given in (11) cannot directly be used in order to fit multivariate probability distributions to data. For this, sufficiently simple and, simultaneously, flexible parametric families of multivariate copulas \(C:[0,1]^m\rightarrow [0,1]\) are needed. One possible way to construct such parametric copula families is given by so-called R-vine copulas (regular vines), which are a generalization of D-vine copulas recently applied, e.g. in Schinke-Nendza et al. (2021); von Loeper et al. (2021), to model data from meteorology and solar power supply.
The structure of R-vine copulas offers the advantage that the probability distribution of the m-dimensional random vector \(Y=(Y_1,\ldots ,Y_m)\) to be modelled can be expressed in terms of a number of bivariate copulas. Hereby the structure of an R-vine copula is given by a vector of trees \(\mathcal {R} = (\mathcal {T}_1, \ldots , \mathcal {T}_{m-1})\) with the following properties, see also Fig. 1:
1. \(\mathcal {T}_1=(\mathcal {V}_1,\mathcal {E}_1)\) consists of the set of vertices \(\mathcal {V}_1 = \{1, \ldots , m\}\) and some set of edges \(\mathcal {E}_1\subset \mathcal {V}_1\times \mathcal {V}_1\).
2. For the remaining trees \(\mathcal {T}_2=(\mathcal {V}_2,\mathcal {E}_2), \ldots , \mathcal {T}_{m-1}=(\mathcal {V}_{m-1},\mathcal {E}_{m-1})\), it holds that \(\mathcal {V}_i = \mathcal {E}_{i-1}\) for each \(i\in \{2,\ldots ,m-1\}\), i.e., the set of vertices \(\mathcal {V}_i\) of \(\mathcal {T}_i\) consists of the edge set of the previous tree \(\mathcal {T}_{i-1}\).
3. For each \(i \in \{ 1, \ldots , m-2\}\), two edges in tree \(\mathcal {T}_i\) are joined by an edge in tree \(\mathcal {T}_{i+1}\) only if these edges share one common vertex.
Let \(\mathcal {E}(\mathcal {R})\) denote the set of all edges in \(\mathcal {R}\), meaning that \(\mathcal {E}(\mathcal {R})=\mathcal {E}_1\cup \ldots \cup \mathcal {E}_{m-1}\). Furthermore, we need the following notation. First, for each \(e=\{v_1,v_2\}\in \mathcal {E}_1\) we define \(\mathcal {S}(e) = \emptyset\) and \(\mathcal {O}(e) =\{v_1,v_2\}\). Next, we iterate over \(i\in \{2,\ldots ,m-1\}\) and, for each \(e=\{v_1,v_2\}\in \mathcal {E}_i\), we define \(\mathcal {S}(e)=\mathcal {S}(v_1)\cup \mathcal {S}(v_2)\cup (\mathcal {O}(v_1)\cap \mathcal {O}(v_2))\) and \(\mathcal {O}(e)=(\mathcal {O}(v_1)\cup \mathcal {O}(v_2))\setminus \mathcal {S}(e)\). We call \(\mathcal {S}(e)\) the conditioning set and \(\mathcal {O}(e)\) the conditioned set of edge e. According to Kurowicka and Joe (2010), it holds that \(|\mathcal {O}(e)|=2\) for each \(e\in \mathcal {E}(\mathcal {R})\) and, for each pair of indices \(\{i,j\}\subset \{1,\ldots ,m\}\) with \(i\not = j\), there is exactly one edge \(e\in \mathcal {E}(\mathcal {R})\) such that \(\mathcal {O}(e)=\{i,j\}\). Thus, for each \(e\in \mathcal {E}(\mathcal {R})\), there are indices \(o_1,o_2\in \{1,\ldots ,m\}\) such that \(\{o_1,o_2\}=\mathcal {O}(e)\) and \(o_1< o_2\).
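The recursive definition of \(\mathcal {S}(e)\) and \(\mathcal {O}(e)\) can be checked on a small example. The sketch below encodes a D-vine on \(m=4\) variables, representing each edge as a set of its two end vertices, so that an edge of a higher tree is a set of two edges of the previous tree. The data structure and the function name are illustrative choices, not part of the paper.

```python
from itertools import combinations

def cond_sets(edge):
    """Return (conditioning set S(e), conditioned set O(e)) of an edge."""
    v1, v2 = tuple(edge)
    if isinstance(v1, int):  # edge of the first tree: end vertices are indices
        return frozenset(), frozenset({v1, v2})
    s1, o1 = cond_sets(v1)   # the end vertices are edges of the previous tree
    s2, o2 = cond_sets(v2)
    s = s1 | s2 | (o1 & o2)
    return s, (o1 | o2) - s

# D-vine on {1, 2, 3, 4}: the first tree is the path 1 - 2 - 3 - 4.
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
e13_2 = frozenset({e12, e23})      # tree 2: joins edges sharing vertex 2
e24_3 = frozenset({e23, e34})      # tree 2: joins edges sharing vertex 3
e14_23 = frozenset({e13_2, e24_3}) # tree 3: joins edges sharing vertex {2, 3}

edges = [e12, e23, e34, e13_2, e24_3, e14_23]
sets = {e: cond_sets(e) for e in edges}
conditioned = sorted(sorted(o) for _, o in sets.values())
all_pairs = sorted(list(c) for c in combinations(range(1, 5), 2))
```

In this example every conditioned set has two elements and `conditioned` coincides with `all_pairs`, in line with the result of Kurowicka and Joe (2010) quoted above: each pair of indices occurs as the conditioned set of exactly one edge.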
Suppose now that \(Y=(Y_1,\ldots ,Y_m)\) is a random vector with continuously differentiable CDF \(F_{1,\ldots ,m}:\mathbbm {R}^m\rightarrow [0,1]\), where the joint probability density of Y is denoted by \(f_{1,\ldots ,m}:\mathbbm {R}^m\rightarrow [0,\infty )\), and \(f_1,\ldots , f_m:\mathbbm {R}\rightarrow [0,\infty )\) are the marginal (univariate) densities of the components \(Y_1,\ldots ,Y_m\). Furthermore, let \(\mathcal {R} = (\mathcal {T}_1, \ldots , \mathcal {T}_{m-1})\) be a vector of trees with the properties mentioned above. Then, the following representation formula is true, see Czado (2019); Bedford and Cooke (2001); Joe (2015): For any \(y=(y_1,\ldots ,y_m)\in \mathbbm {R}^m\) such that \(f_{1,\ldots ,m}(y)>0\) it holds that
\[ f_{1,\ldots ,m}(y) = \prod _{k=1}^{m} f_k(y_k) \prod _{e\in \mathcal {E}(\mathcal {R})} c_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\bigl (F_{o_1\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_1}),\, F_{o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_2})\bigr ), \tag{12} \]
where \(Y_{\mathcal {S}(e)}\) denotes the random vector consisting of those components of \(Y=(Y_1,\ldots ,Y_m)\) the indices of which belong to the set \(\mathcal {S}(e)\subset \{1,\ldots ,m\}\), and, analogously, \(y_{\mathcal {S}(e)}\) is the corresponding subvector of \((y_1,\ldots ,y_m)\). Furthermore, \(c_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}:\mathbbm {R}^2\rightarrow [0,\infty )\) denotes the bivariate copula density of the conditional probability distribution of the two-dimensional random vector \((Y_{o_1},Y_{o_2})\) given that \(Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}\), and \(F_{o_j\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}:\mathbbm {R}\rightarrow [0,1]\) is the conditional CDF of \(Y_{o_j}\) given that \(Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}\), where \(j=1,2\).
Note that the right-hand side of (12) is the product of uni- and bivariate functions. Thus, in order to determine the multivariate probability density \(f_{1,\ldots ,m}\), we just have to determine the univariate (marginal) densities \(f_1,\ldots , f_m\), the (conditional) univariate CDFs \(F_{o_j\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\), and the (conditional) bivariate copula densities \(c_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\) for all \(e=(o_1,o_2)\in \mathcal {E}(\mathcal {R})\), where the recursion formulas (see Aas et al. (2009))
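\[ F_{o_1\mid Y_{\mathcal {S}(e)\cup \{o_2\}}=y_{\mathcal {S}(e)\cup \{o_2\}}}(y_{o_1}) = \frac{\partial \, C_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\bigl (F_{o_1\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_1}),\, F_{o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_2})\bigr )}{\partial \, F_{o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_2})} \tag{13} \]
and
\[ F_{o_2\mid Y_{\mathcal {S}(e)\cup \{o_1\}}=y_{\mathcal {S}(e)\cup \{o_1\}}}(y_{o_2}) = \frac{\partial \, C_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\bigl (F_{o_1\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_1}),\, F_{o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_2})\bigr )}{\partial \, F_{o_1\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_1})} \tag{14} \]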
are used in order to determine the univariate CDFs \(F_{o_j\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\) for \(j=1,2\).
3.3 Fitting R-vine copulas to empirical data
In this section we outline how the representation formula given in (12) can be utilized in order to fit an m-dimensional probability density \(f_{1,\ldots ,m}\) to empirical data, i.e., for a given sample of k realizations \(y^{(1)}=(y^{(1)}_1,\ldots ,y_m^{(1)}), \ldots , y^{(k)}=(y^{(k)}_1,\ldots ,y_m^{(k)}) \in \mathbbm {R}^m\) of the random vector \(Y=(Y_1,\ldots ,Y_m)\), where we use the sequential algorithm proposed in Dissmann et al. (2013). First, for each \(i\in \{1,\ldots ,m\}\), we use the sample \(y_i=(y^{(1)}_i,\ldots ,y_i^{(k)})\) to determine a kernel density estimator (KDE) \({\widehat{f}}_i:\mathbbm {R}\rightarrow (0,\infty )\), see Silverman (1986), for the marginal density \(f_i\) of the i-th component \(Y_i\) of Y, which is numerically integrated in order to obtain the univariate CDF \({\widehat{F}}_i:\mathbbm {R}\rightarrow [0,1]\). Then, in the next step, a valid tree \(\mathcal {T}_1=(\mathcal {V}_1,\mathcal {E}_1)\) with \(\mathcal {V}_1 = \{1, \ldots , m\}\) is chosen such that the expression
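\[ I(\mathcal {E}_1) = \sum _{e=(o_1,o_2)\in \mathcal {E}_1} \bigl |{\widehat{\tau }}(y_{o_1},y_{o_2})\bigr | \tag{15} \]
is maximized with respect to \(\mathcal {E}_1\), where \({\widehat{\tau }}\) denotes an empirical version of Kendall’s tau, which is defined for pairs of realizations \(\{(x_1, y_1),\ldots ,(x_n, y_n)\}\) of two random variables X and Y by
\[ {\widehat{\tau }}(x,y) = \frac{2}{n(n-1)} \sum _{1\le i<j\le n} \mathrm {sgn}(x_i-x_j)\,\mathrm {sgn}(y_i-y_j), \tag{16} \]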
where \(x=(x_1,\ldots ,x_n)\) and \(y=(y_1,\ldots ,y_n)\).
In other words, the edge set \(\mathcal {E}_1\) is chosen such that the sum of pairwise empirical correlations between \(Y_{o_1}\) and \(Y_{o_2}\) is maximized, where the sum extends over all edges \(e=(o_1,o_2) \in \mathcal {E}_1\). Subsequently, for each \(e=(o_1,o_2) \in \mathcal {E}_1\), a bivariate copula \({\widehat{C}}_e\) is fitted. For this, the independence of \(Y_{o_1}\) and \(Y_{o_2}\) is checked via a statistical test, see Dissmann et al. (2013). If the null hypothesis (stating that \(Y_{o_1}\) and \(Y_{o_2}\) are independent) is not rejected, then the product copula given in (10) is chosen for \({\widehat{C}}_e\). Otherwise, an (unconditional) bivariate copula \({\widehat{C}}_e\) and its parameters are fitted to the data vectors \(({\widehat{F}}_{o_1}(y^{(1)}_{o_1}),\ldots ,{\widehat{F}}_{o_1}(y^{(k)}_{o_1}))\) and \(({\widehat{F}}_{o_2}(y^{(1)}_{o_2}),\ldots ,{\widehat{F}}_{o_2}(y^{(k)}_{o_2}))\) with the help of a maximum likelihood method, see Joe (2015).
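The selection of the first tree can be sketched in a few lines of plain Python. The sketch assumes that the edge weights are the absolute values of empirical Kendall's tau, as in the sequential algorithm of Dissmann et al. (2013), and it uses Prim's algorithm as one possible way to find a maximum spanning tree. The function names and the toy data are hypothetical; the copula fitting step itself is not shown.

```python
from itertools import combinations


def sign(a):
    return (a > 0) - (a < 0)


def kendall_tau(x, y):
    """Empirical Kendall's tau of two equally long samples (no tie correction)."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))


def first_tree(columns):
    """Maximum spanning tree w.r.t. |tau| (Prim's algorithm); returns its edges."""
    m = len(columns)
    weight = {frozenset(p): abs(kendall_tau(columns[p[0]], columns[p[1]]))
              for p in combinations(range(m), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < m:
        # Pick the heaviest edge that connects the current tree to a new vertex.
        a, b = max(((a, b) for a in in_tree for b in range(m) if b not in in_tree),
                   key=lambda p: weight[frozenset(p)])
        in_tree.add(b)
        edges.append((a, b))
    return edges


# Toy data (hypothetical): three components with six realizations each.
cols = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [3, 1, 2, 6, 4, 5],
]
print(first_tree(cols))
```

Vertices are numbered from 0 in this sketch. For the higher trees the same selection would be repeated on the conditional pseudo-observations, restricted to edges that satisfy the proximity condition.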
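Now, analogously to (15), a valid tree \(\mathcal {T}_2=(\mathcal {V}_2,\mathcal {E}_2)\) with \(\mathcal {V}_2=\mathcal {E}_1\) is selected such that the following expression is maximized:
\[ I(\mathcal {E}_2) = \sum _{e\in \mathcal {E}_2} \Bigl |{\widehat{\tau }}\Bigl (\bigl ({\widehat{F}}_{o_1 \mid Y_{\mathcal {S}(e)}=y^{(\ell )}_{\mathcal {S}(e)}}(y^{(\ell )}_{o_1})\bigr )_{\ell =1,\ldots ,k},\, \bigl ({\widehat{F}}_{o_2 \mid Y_{\mathcal {S}(e)}=y^{(\ell )}_{\mathcal {S}(e)}}(y^{(\ell )}_{o_2})\bigr )_{\ell =1,\ldots ,k}\Bigr )\Bigr |, \qquad \{o_1,o_2\}=\mathcal {O}(e). \]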
Note that \(|\mathcal {S}(e)|=1\) for all \(e \in \mathcal {E}_2\). Thus, using (13) and (14), the conditional CDFs \({\widehat{F}}_{o_1 \mid Y_{\mathcal {S}(e)}=y^{(\ell )}_{\mathcal {S}(e)}}\) and \({\widehat{F}}_{o_2 \mid Y_{\mathcal {S}(e)}=y^{(\ell )}_{\mathcal {S}(e)}}\) for \(\ell \in \{1,\ldots ,k\}\) can directly be obtained from the (unconditional) bivariate copula \({\widehat{C}}_{o_1,o_2}\) and the (unconditional) CDFs \({\widehat{F}}_{o_1}\) and \({\widehat{F}}_{o_2}\), which are determined as described above. Then, for each \(e\in \mathcal {E}_2\) with \(\{o_1,o_2\}=\mathcal {O}(e)\), a bivariate copula \({\widehat{C}}_{o_1,o_2\mid \mathcal {S}(e)}\) and its parameters are fitted to the data vectors \(({\widehat{F}}_{o_j \mid Y_{\mathcal {S}(e)}=y^{(1)}_{\mathcal {S}(e)}}(y^{(1)}_{o_j}),\ldots ,{\widehat{F}}_{o_j \mid Y_{\mathcal {S}(e)}=y^{(k)}_{\mathcal {S}(e)}}(y^{(k)}_{o_j}))\) for \(j=1,2\), where the simplifying assumption is made that the copula \({\widehat{C}}_{o_1,o_2\mid \mathcal {S}(e)}={\widehat{C}}_{o_1,o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}\) does not depend on the given realization \(y_{\mathcal {S}(e)}\) of \(Y_{\mathcal {S}(e)}\), see e.g. Haff et al. (2010).
Finally, in the same way as described above, the trees \(\mathcal {T}_i=(\mathcal {V}_i,\mathcal {E}_i)\), the conditional CDFs \({\widehat{F}}_{o_j \mid Y_{\mathcal {S}(e)}=y^{(\ell )}_{\mathcal {S}(e)}}\) for \(j=1,2\) and \(\ell =1,\ldots ,k\), and the bivariate copulas \({\widehat{C}}_{o_1,o_2\mid \mathcal {S}(e)}\) are determined for all \(e\in \mathcal {E}_i\) and \(i=3,\ldots ,m-1\).
3.4 Sampling from multivariate probability densities
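In Sect. 3.3 we showed how the multivariate probability density \({\widehat{f}}:\mathbbm {R}^m\rightarrow [0,\infty )\) given by the representation formula
\[ {\widehat{f}}(y_1,\ldots ,y_m) = \prod _{k=1}^{m} {\widehat{f}}_k(y_k) \prod _{e\in \mathcal {E}(\mathcal {R})} {\widehat{c}}_{o_1,o_2\mid \mathcal {S}(e)}\bigl ({\widehat{F}}_{o_1\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_1}),\, {\widehat{F}}_{o_2\mid Y_{\mathcal {S}(e)}=y_{\mathcal {S}(e)}}(y_{o_2})\bigr ) \tag{17} \]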
for \((y_1,\ldots ,y_m)\in \mathbbm {R}^m\) can be fitted to empirical data. We now explain how samples can be drawn from the probability density given in (17).
Recall that the Rosenblatt transform, see Joe (2015), maps a sample \(y=(y_1, \ldots , y_m)\) of a random vector \(Y = (Y_1, \ldots , Y_m)\) with joint probability density \(f_{1,\ldots ,m}:\mathbbm {R}^m\rightarrow (0,\infty )\) onto a sample \(u=(u_1, \ldots , u_m)\) of a vector of independent and uniformly distributed random variables \(U = (U_1, \ldots , U_m):\Omega \rightarrow [0,1]^m\) such that
\[ u_1 = F_{Y_1}(y_1), \qquad u_i = F_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}(y_i) \quad \text {for } i=2,\ldots ,m, \]
where \(F_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}:\mathbbm {R}\rightarrow [0,1]\) denotes the (conditional) CDF corresponding to the conditional density \(f_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}:\mathbbm {R}\rightarrow (0,\infty )\) for \(i=1,\ldots ,m\) (with an empty condition for \(i=1\)). Assuming that the densities \(f_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}\) for \(i=1,\ldots ,m\) are positive, the CDFs \(F_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}\) are bijective for \(i=1,\ldots ,m\) and thus, by applying the inverse CDFs to both sides of the above equations, we obtain the inverse Rosenblatt transform:
\[ y_1 = F^{-1}_{Y_1}(u_1), \qquad y_i = F^{-1}_{Y_i \mid Y_1=y_1, \ldots , Y_{i-1}=y_{i-1}}(u_i) \quad \text {for } i=2,\ldots ,m, \]
which maps a sample \(u=(u_1, \ldots , u_m)\) of U onto a sample \(y=(y_1, \ldots , y_m)\) of Y. Note that the (inverse) Rosenblatt transform works for any permutation of the indices \(1, \ldots , m\).
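To make the transform concrete, the following sketch applies it to a case where all conditional CDFs are known in closed form: a standard bivariate normal vector with correlation `rho`. This toy distribution is our own choice for illustration and is not the model used in this paper; in the vine setting the conditional CDFs come from the fitted pair copulas instead.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def rosenblatt(y, rho):
    """Map (y1, y2) of a standard bivariate normal with correlation rho to (u1, u2)."""
    y1, y2 = y
    u1 = PHI.cdf(y1)                                   # F_{Y1}(y1)
    u2 = PHI.cdf((y2 - rho * y1) / sqrt(1 - rho**2))   # F_{Y2 | Y1 = y1}(y2)
    return u1, u2


def inverse_rosenblatt(u, rho):
    """Map independent uniforms (u1, u2) to a sample (y1, y2) of the bivariate normal."""
    u1, u2 = u
    y1 = PHI.inv_cdf(u1)
    y2 = rho * y1 + sqrt(1 - rho**2) * PHI.inv_cdf(u2)
    return y1, y2


y = inverse_rosenblatt((0.3, 0.8), rho=0.6)
print(y, rosenblatt(y, rho=0.6))
```

Feeding independent uniform random numbers into `inverse_rosenblatt` yields samples with the prescribed dependence, and `rosenblatt` recovers the uniforms.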
Now, consider some sequence of edges \(e^{(1)}, \ldots , e^{(m-1)}\) with \(e^{(i)} \in \mathcal {E}_i\) for \(i=1,\ldots ,m-1\) such that \(e^{(i)} \in e^{(i+1)}\) for \(i=1,\ldots ,m-2\). For the given edges, it follows from the third property of the trees \(\mathcal {T}_1,\ldots ,\mathcal {T}_{m-1}\) introduced in Sect. 3.2 that there is a permutation \((o_1, \ldots , o_m)\) of \((1,\ldots ,m)\) such that \(o_1 \in \mathcal {O}(e^{(1)})\) and \(o_{i+1} \in \mathcal {O}(e^{(i)})\) for \(i=1,\ldots ,m-1\). Thus, the inverse Rosenblatt transform can be used as follows, in order to draw a sample \((y_1,\ldots ,y_m)\) from the probability density \({\widehat{f}}_{1,\ldots ,m}\) given in (17):
\[ y_{o_1} = {\widehat{F}}^{-1}_{o_1}(u_1), \qquad y_{o_i} = {\widehat{F}}^{-1}_{o_i \mid Y_{\mathcal {S}(e^{(i-1)}) \cup \{o_{i-1}\}}=y_{\mathcal {S}(e^{(i-1)}) \cup \{o_{i-1}\}}}(u_i) \quad \text {for } i=2,\ldots ,m, \]
where \(u=(u_1, \ldots , u_m)\) is a sample of a vector of independent and uniformly distributed random variables \(U = (U_1, \ldots , U_m):\Omega \rightarrow [0,1]^m\), the (unconditional) CDF \({\widehat{F}}_{o_1}\) is given by an integrated kernel density estimator (KDE), and the (conditional) CDFs \({\widehat{F}}_{o_i \mid Y_{\mathcal {S}(e^{(i-1)}) \cup \{o_{i-1}\}}}\) for \(i=2,\ldots ,m\) are determined as described in Sect. 3.3.
Later on, in Sect. 4, the algorithms stated in Sects. 3.3 and 3.4 are applied to derive the numerical results presented in this paper, where the implementation provided by the Python library pyvinecopulib, see Nagler and Vatter (2021), is used.
3.5 Conditional sampling
In the previous section we described how to sample from a multivariate distribution with the help of the Rosenblatt transform. This method is used in Sect. 4 below in order to draw samples from the (unconditional) distribution of the forecasting error \(X = P^{\text {PV}}- P^\mathrm{F}\). Furthermore, to model the distribution of the random fluctuation vector X more accurately, we modify the approach considered in Sects. 3.3 and 3.4 such that we can draw samples from the conditional distribution of X for any given radiation forecast \(S=s\). For D-vine copulas, a similar conditional sampling algorithm can be found in Aas et al. (2021) and Bevacqua et al. (2017).
Let \(m,m'\ge 1\) be some integers with \(m'<m\). We first explain why the fitting and (unconditional) sampling approach considered in Sects. 3.3 and 3.4 has to be modified such that we can draw samples from arbitrary conditional distributions of a random vector \(Y = (Y_1, \ldots , Y_m)\), i.e., to draw samples \(y=(y_1, \ldots , y_m)\) from the conditional distribution of \(Y = (Y_1, \ldots , Y_m)\), given that \(Y_{i_1}=y_{i_1},\ldots ,Y_{i_{m'}}=y_{i_{m'}}\) for some subset of indices \(D=\{i_1,\ldots ,i_{m'}\} \subset \{1, \ldots , m \}\) and some vector \((y_{i_1},\ldots ,y_{i_{m'}})\in \mathbbm {R}^{m'}\).
Recall that the (direct and inverse) Rosenblatt transform considered in Sect. 3.4 works for arbitrary permutations of the sampling order provided that all conditional CDFs required for this transformation are known. Here, the sampling order refers to the order of the marginal dimensions from which samples are drawn. However, if we want to obtain these CDFs with the help of (13) and (14), the structure of the underlying R-vine copula restricts the choice of possible sampling orders. To understand why this is the case, note that sampling in any given order would require the construction of arbitrary (conditional) CDFs, the total number of which is equal to \(m 2^{m-1}\). However, an R-vine copula of dimension m consists of only \(\frac{m (m-1)}{2}\) bivariate copulas. With the help of (13) and (14), two (conditional) CDFs can be obtained from each bivariate copula, i.e., we can obtain \(m (m-1)\) (conditional) CDFs in total from a given R-vine copula, which limits the number of possible sampling orders.
Consider the R-vine copula in Fig. 1 which has (1, 2, 3, 5, 4) as a possible sampling order. To sample in this order with the inverse Rosenblatt transform, we obtain the required inverse CDFs \(F^{-1}_{Y_1}\), \(F^{-1}_{Y_2 \mid Y_1=y_1}\), \(F^{-1}_{Y_3 \mid Y_1=y_1, Y_2=y_2}\), \(F^{-1}_{Y_5 \mid Y_1=y_1, Y_2=y_2, Y_3=y_3}\) and \(F^{-1}_{Y_4 \mid Y_1=y_1, Y_2=y_2, Y_3=y_3, Y_5=y_5}\) from the marginal distribution \(\boxed {1}\) and the copulas \(\boxed {1,2}\), \(\boxed {1,3 \mid 2}\), \(\boxed {1,5 \mid 2,3}\) and \(\boxed {1,4 \mid 2,3,5}\) respectively. Note that this sampling order is possible because each copula corresponds to an edge connected to the previous copula or marginal distribution, e.g., \(\boxed {1,3 \mid 2}\) corresponds to an edge connected to \(\boxed {1,2}\) while \(\boxed {1,2}\) corresponds to the edge connected to \(\boxed {1}\). This ensures that a suitable copula for the next dimension in the sampling order exists.
Now consider the sampling order (1, 2, 3, 4, 5), which is impossible. Analogously to the previous sampling order the inverse CDFs \(F^{-1}_{Y_1}\), \(F^{-1}_{Y_2 \mid Y_1=y_1}\) and \(F^{-1}_{Y_3 \mid Y_1=y_1, Y_2=y_2}\) can be obtained. However, to obtain the \(4^{\hbox {th}}\) necessary inverse CDF \(F^{-1}_{Y_4 \mid Y_1=y_1, Y_2=y_2, Y_3=y_3}\) for the inverse Rosenblatt transform, the copulas \(\boxed {1,4 \mid 2, 3}\) or \(\boxed {3,4 \mid 1, 2}\) are required which do not exist within the considered R-vine copula.
As shown in Theorem 5.1 in Cooke et al. (2015), an R-vine copula of dimension m has only \(2^{m-1}\) possible sampling orders. This is due to the fact that every possible sampling order corresponds to a vector \(\lambda = (\lambda _1,\ldots ,\lambda _m) = (v_1, \ldots , v_{m-1}, e)\) with \(v_i \in \mathcal {T}_i\) and \(e \in \mathcal {E}_{m-1}\), i.e., the CDF used in the first equation of the Rosenblatt transform is the marginal CDF \(F_{v_1}\) whereas the conditional CDFs of the equations thereafter are given by the copulas corresponding to \(\lambda _2,\ldots ,\lambda _m\). Recall that for each equation of the Rosenblatt transformation the dimension of the condition of the corresponding conditional CDF grows by one. This restricts the choice of \(\lambda _{i+1}\) to copulas for which it holds that \(\lambda _i \in \lambda _{i+1}\) for all \(i \in \{1, \ldots , m-2\}\), i.e., \(\lambda _i \in \mathcal {T}_i\) must be a vertex of the edge \(\lambda _{i+1} \in \mathcal {T}_{i+1}\), because only then the copula corresponding to \(\lambda _{i+1}\) can be used to construct a conditional CDF with a valid condition for the \((i+1)\)-th equation of the Rosenblatt transform.
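The correspondence between chains \(\lambda _1 \in \lambda _2 \in \ldots \in \lambda _m\) and sampling orders can be checked by brute force on a small example. The sketch below enumerates all such chains for a hypothetical four-dimensional D-vine (again not the vine of Fig. 1) and reads off the sampling order from the index that is newly added in each step. The encoding of edges as nested `frozenset`s and all function names are assumptions made for illustration.

```python
def variables(node):
    """Set of original indices contained in a vertex or (nested) edge."""
    if isinstance(node, int):
        return {node}
    return set().union(*(variables(v) for v in node))


def sampling_orders(top_edge):
    """All sampling orders given by chains lambda_1 in ... in lambda_m = top_edge."""
    chains = [[top_edge]]
    # Extend every chain downwards until it starts with a vertex of the first tree.
    while not isinstance(chains[0][0], int):
        chains = [[v] + c for c in chains for v in c[0]]
    orders = set()
    for chain in chains:
        order, seen = [], set()
        for node in chain:
            new = variables(node) - seen   # one new index per step in a valid vine
            order.extend(new)
            seen |= new
        orders.add(tuple(order))
    return orders


# Hypothetical D-vine 1 - 2 - 3 - 4.
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
e13, e24 = frozenset({e12, e23}), frozenset({e23, e34})
e14 = frozenset({e13, e24})

orders = sampling_orders(e14)
print(len(orders), sorted(orders))
```

For this example the enumeration returns \(2^{4-1}=8\) distinct orders. The order (1, 2, 3, 4) is among them, whereas (1, 3, 2, 4) is not, because no copula in this vine links 1 and 3 unconditionally.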
We thus modify the fitting process for vine copulas presented in Sect. 3.3 such that a vector \(\lambda =(\lambda _1,\ldots ,\lambda _m)\) as described above exists for a given set of indices \(D=\{i_1,\ldots ,i_{m'}\} \subset \{1, \ldots , m \}\). For this, we consider \(\mathcal {T}_1^D = (\mathcal {V}_1^D, \mathcal {E}_1^D) = (D, \{e \in \mathcal {E}_1: e \subseteq D\})\), i.e., \(\mathcal {T}_1^D\) is a graph with vertex set D and edges \(e \in \mathcal {E}_1\) which connect two vertices in D. For \(i > 1\), we recursively define \(\mathcal {T}_i^D = (\mathcal {V}_i^D, \mathcal {E}_i^D) = (\mathcal {E}_{i-1}^D, \{e \in \mathcal {E}_i: e \subseteq \mathcal {E}_{i-1}^D\})\). Note that, in general, \(\mathcal {T}_i^D\) is not a tree but a forest. However, the vector \((\mathcal {T}_1^D, \ldots , \mathcal {T}_{m'}^D)\) is a valid R-vine copula only if all \(\mathcal {T}_i^D\) are trees. This is necessary to construct an inverse Rosenblatt transform for the dimensions in D, or more generally speaking, it is necessary for the construction of an inverse Rosenblatt transform for all dimensions \(\{1, \ldots , m \}\) where the dimensions in D occur at the beginning.
To ensure that there is a sampling order in which all indices in D are in successive order, we choose the graphs \(\mathcal {T}_i^D\) in the fitting process of the R-vine copula such that \(I(\mathcal {E}_i^D)\) in (15) is maximized (as in the unmodified fitting process considered in Sect. 3.3), where additionally it must hold that \(\mathcal {T}_i^D\) is a tree for all \(i \in \{1, \ldots , m'\}\) because only then can we choose a sampling order where \(\lambda _i \in \lambda _{i+1}\) holds for all \(i \in \{1, \ldots , m-1\}\).
Without loss of generality, we now assume that \(D=\{1,\ldots ,m'\}\). Thus, we omit the first \(m'\) equations of the inverse Rosenblatt transform and sample values \(y_{m'+1},\ldots ,y_m\) for the remaining \(m-m'\) components via
\[ y_{o_i} = {\widehat{F}}^{-1}_{o_i \mid Y_{\mathcal {S}(e^{(i-1)}) \cup \{o_{i-1}\}}=y_{\mathcal {S}(e^{(i-1)}) \cup \{o_{i-1}\}}}(u_i) \quad \text {for } i=m'+1,\ldots ,m. \]
As an example, consider again the R-vine copula in Fig. 1 and the set \(D = \{1, 2, 3\}\) to sample from the conditional distribution of \((Y_4, Y_5)\mid _{Y_1=y_1, Y_2=y_2, Y_3=y_3}\). The graphs \(\mathcal {T}_1^D\), \(\mathcal {T}_2^D\) and \(\mathcal {T}_3^D\) with the sets of vertices \(\{\boxed {1}, \boxed {2}, \boxed {3}\}\), \(\{\boxed {1,2}, \boxed {2,3}\}\), \(\{\boxed {1,3 \mid 2}\}\) and the corresponding edges correspond to the lower left part of the diagram. Since the graphs \(\mathcal {T}_1^D\), \(\mathcal {T}_2^D\) and \(\mathcal {T}_3^D\) are trees and \((\mathcal {T}_1^D, \mathcal {T}_2^D, \mathcal {T}_3^D)\) is a valid R-vine copula, sampling orders with 1, 2 and 3 at the beginning are possible.
Now consider \(D=\{1,2,4\}\), for which \(\mathcal {T}_1^D = (\{\boxed {1}, \boxed {2}, \boxed {4}\}, \{\{\boxed {1}, \boxed {2}\}\})\) is not a tree and the vector \((\mathcal {T}_1^D, \mathcal {T}_2^D, \mathcal {T}_3^D)\) is not an R-vine copula. Since \(\boxed {4}\) is not connected to \(\boxed {1}\) or \(\boxed {2}\) in \(\mathcal {T}_1^D\), there can be neither \(\boxed {1,4}\) nor \(\boxed {2,4}\) in \(\mathcal {T}_2^D\) and, in turn, there can be neither \(\boxed {2,4 \mid 1}\) nor \(\boxed {1,4 \mid 2}\) in \(\mathcal {T}_3^D\). Therefore, it is not possible to obtain the required inverse CDFs for an inverse Rosenblatt transform for which the sampling order begins with the elements of D.
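The tree condition on the induced graphs \(\mathcal {T}_i^D\) is easy to test programmatically. The sketch below does so for a hypothetical four-dimensional D-vine, since the full structure of the vine in Fig. 1 is not reproduced here. It relies on the fact that a subgraph of a tree is a forest, so it is a tree exactly when it has one edge fewer than vertices. The function name `allows_conditioning_on` is our own.

```python
def allows_conditioning_on(trees, D):
    """True if all graphs T_i^D induced by the index set D are trees."""
    vertices = set(D)
    for edges in trees[:len(D) - 1]:
        # Keep the edges of T_i whose two end points both lie in V_i^D.
        induced = [e for e in edges if e <= vertices]
        # A sub-forest is a tree if and only if |E| = |V| - 1.
        if len(induced) != len(vertices) - 1:
            return False
        vertices = set(induced)   # V_{i+1}^D = E_i^D
    return True


# Hypothetical D-vine 1 - 2 - 3 - 4.
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
e13, e24 = frozenset({e12, e23}), frozenset({e23, e34})
e14 = frozenset({e13, e24})
trees = [[e12, e23, e34], [e13, e24], [e14]]

print(allows_conditioning_on(trees, {1, 2, 3}))
print(allows_conditioning_on(trees, {1, 2, 4}))
```

In this example the set \(\{1,2,3\}\) passes the check, while \(\{1,2,4\}\) fails already in the first tree because vertex 4 is isolated there, which mirrors the argument given above for Fig. 1.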
In the following section, we explain how the construction of uncertainty sets is performed with the scenario approach from stochastic optimization. We then use the copula-based modeling from this section in order to construct high-quality uncertainty sets for given weather situations.
3.6 Scenario approach to determine a suitable uncertainty set
In order to determine a suitable uncertainty set of the form given in (9) which satisfies (a slightly modified version of) condition (7), we apply, as in Aigner et al. (2021), an idea described in Margellos et al. (2014) and formulate the estimation of the uncertainty set \(B=[\ell _1,u_1]\times \ldots \times [\ell _{n},u_{n}] \subset \mathbbm {R}^{n}\) as an auxiliary probabilistic optimization problem. Then, for this problem with chance constraints, we apply the scenario approach proposed in Campi and Garatti (2008), i.e., the chance constraints considered in (7) are replaced by constraints based on a sufficiently large number of samples drawn from the probability distribution of the random forecasting error \(X = P^{\text {PV}}- P^\mathrm{F}\). In this work this distribution is fitted to empirical data, using the algorithm described in Sect. 3.3, and simulation is performed with the technique described in Sect. 3.4.
The auxiliary optimization problem in its general form consists of a chance constraint model for the enclosure \(B\in {\mathcal B}(\mathbbm {R}^{n})\) of the probability mass of \(X=(X_1,\ldots ,X_n)\) satisfying the condition \(\mathbb {P} (\{\omega : \ X(\omega ) \in B\}) \ge 1-\varepsilon\) for some \(\varepsilon \in (0,1)\), see (7). At the same time, this problem aims for an uncertainty set B such that its size is as small as possible. Thus, in order to apply the scenario approach proposed in Campi and Garatti (2008) to determine an uncertainty box \(B=[\ell _1,u_1]\times \ldots \times [\ell _{n},u_{n}] \subset \mathbbm {R}^{n}\), we consider the probabilistic optimization problem
\[ \min _{\ell ,u} \ \sum _{k=1}^{n} (u_k-\ell _k) \tag{18a} \]
\[ \text {s.t.} \quad \mathbb {P} \bigl (\{\omega : \ X(\omega ) \in [\ell _1,u_1]\times \ldots \times [\ell _{n},u_{n}]\}\bigr ) \ge 1-\varepsilon , \tag{18b} \]
where the minimum in (18a) extends over all \(\ell =(\ell _1,\ldots ,\ell _n),u=(u_1,\ldots ,u_n)\in \mathbbm {R}^n\) with \(\ell _k<u_k\) for all \(k=1,\ldots ,n\).
Thus, to control the size of the set B, we minimize the sum of interval lengths \(u_k-\ell _k\). In contrast, if minimization of the box volume were used instead, this would lead to a non-convex objective. In this case, the scenario approach proposed in Campi and Garatti (2008) is no longer applicable. Although the solution of (18) does not necessarily minimize the box volume, the solution of the following scenario program does. This is why this choice of objective is suitable. We further explain this after introducing our scenario program.
Suppose that \(N>0\) samples \(x^1,\ldots ,x^N\) are independently drawn from the probability distribution of X. Instead of (18b), in our scenario approach we want to ensure that the samples \(x^1,\ldots ,x^N\) are included in the uncertainty set B. The resulting scenario program for computing \(B=[\ell ,u]\) is thus given by
\[ \min _{\ell ,u} \ \sum _{k=1}^{n} (u_k-\ell _k) \quad \text {s.t.} \quad \ell _k \le x_k^i \le u_k \ \text { for all } i=1,\ldots ,N \text { and } k=1,\ldots ,n. \tag{19} \]
The solution of this optimization problem can be written explicitly as \([\ell ^*,u^*]\), where \(\ell ^*_k=\min _{i=1,...,N} \{x_k^i\}\) and \(u_k^*=\max _{i=1,...,N}\{x_k^i\}\) for every vector component k. Moreover, the set \(B^*=[\ell ^*,u^*]\) also minimizes the volume over all sets \([\ell ,u]\) containing the samples \(x^1,\ldots ,x^N\). Although, in general, the solution of problem (18) does not calculate boxes with minimal volume, this is the case for the optimization problem given in (19).
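Since the scenario program has the explicit solution stated above, it can be computed without an optimization solver. A minimal sketch in plain Python, with hypothetical function names and toy samples:

```python
def scenario_box(samples):
    """Smallest axis-parallel box [l, u] that contains all samples.

    `samples` is a list of equally long tuples, one tuple per scenario.
    """
    lower = [min(column) for column in zip(*samples)]
    upper = [max(column) for column in zip(*samples)]
    return lower, upper


def box_size(lower, upper):
    """Objective of the scenario program: sum of the interval lengths."""
    return sum(u - l for l, u in zip(lower, upper))


# Toy samples (hypothetical) of a two-dimensional forecasting error.
samples = [(0.1, -1.0), (0.4, 0.5), (-0.2, 0.2)]
lower, upper = scenario_box(samples)
print(lower, upper, box_size(lower, upper))
```

Any box that contains all samples must contain the componentwise minima and maxima, which is why this box is optimal for both the sum of interval lengths and the volume.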
From the results presented in Campi and Garatti (2008), we know that the optimal solution \(B^*=[\ell ^*,u^*]\) of (19) fulfills condition (18b) with a confidence probability of at least \(1-\delta\) for some small \(\delta \in (0,1)\) if \(N>0\) is chosen such that
\[ \sum _{i=0}^{2n-1} \binom{N}{i} \varepsilon ^i (1-\varepsilon )^{N-i} \le \delta . \]
Note that in the latter inequality, the necessary number of samples \(N>0\) for a predefined confidence level \(1-\delta \in (0,1)\) is given implicitly. However, an explicit sufficient condition has been derived in Alamo et al. (2010), which reads as
\[ N \ge \frac{1}{\varepsilon }\, \frac{e}{e-1} \Bigl (\ln \frac{1}{\delta } + 2n - 1\Bigr ). \]
Furthermore, we determine the optimal solution \(B^*_s=[\ell ^*_s,u^*_s]\) of (19) based on samples drawn, as described in Sect. 3.5, from the conditional distribution of X for given radiation forecasts \(S=s\).
4 Numerical results
In order to derive the results presented in this section we used the library pyvinecopulib, see Nagler and Vatter (2021). Furthermore, we utilized Gurobi 9.1.2 [23] as solver for mixed-integer linear programs. The computations were carried out by means of a Python implementation on a cluster using 4 cores of a machine with two Xeon E3-1240 v6 “Kaby Lake” chips (4 cores, HT disabled) running at 3.7 GHz with 32 GB of RAM.
4.1 Data description
Data regarding power measurements as well as weather forecasts were provided by the distribution network operator N-ERGIE Netz GmbH (NNG) and the German weather service Deutscher Wetterdienst (DWD). In particular, NNG provided data of solar power supply at more than 150 feed-in points and corresponding active power measurements at 13 network nodes (buses) measured in 15 min intervals. Moreover, NNG provided data regarding the positions of network nodes (buses) and their connections through lines (branches) which include resistance values and transmission limits of each line in the distribution network. A fragment of the NNG distribution network with 34 nodes and 37 lines is visualized in Fig. 2. The solar power forecast \(P^\mathrm{F}\) is provided by a model proposed in Schinke-Nendza et al. (2021).
DWD provided hourly forecasts of global horizontal irradiation, which were generated by the ensemble system of the numerical weather prediction model COSMO-DE, called COSMO-DE-EPS, and statistically interpreted based on synoptic observations at weather stations by Ensemble-MOS of DWD, see Hess (2020). The weather forecasts are issued on a 20 km \(\times\) 20 km grid covering Germany and parts of the neighboring countries at every third hour. The forecasts of global horizontal irradiation were provided with forecast lead times up to 19 h, where the measurements and forecasts range over the months May, June and July of the years 2015–2017.
We split the data into a training set and a validation set. The training set is used to fit model parameters and consists of data from the years 2015 and 2016. Based on the validation set from 2017 the accuracy of the predictions generated by the fitted model is evaluated.
4.2 Fitting unconditional and conditional distributions of forecasting errors
In this section we discuss the fitting of R-vine copulas, as outlined in Sects. 3.3 and 3.5, in order to determine uncertainty sets \(B^*\) of the form introduced in Sect. 3.6. First we explain how to model the (unconditional) distribution of the n-dimensional random vector \(X = P^{\text {PV}}-P^\mathrm{F}\) of power forecasting errors at the n nodes of the electricity network considered in the present paper, where \(n=13\). Besides this, we additionally consider the random vector \(S=(S_1,\ldots ,S_n):\Omega \rightarrow [0,\infty )^n\), which describes the forecasted solar radiation at the n nodes of the electricity network, and we model the conditional distribution of X given that \(S=s\) for some \(s\in [0,\infty )^n\). Moreover, we consider two further types of conditional distributions of X under the condition that \({\overline{S}}={\overline{s}}\) and \(S_k=s_k\), respectively, for some \({\overline{s}}\ge 0\), \(s_k\ge 0\) and \(k\in \{1,\ldots ,n\}\), where
\[ {\overline{S}} = \frac{1}{n} \sum _{k=1}^{n} S_k . \]
As outlined in Sect. 3, copula theory allows for the modeling of the multivariate distribution of random vectors like the random power forecasting error \(X:\Omega \rightarrow \mathbbm {R}^{n}\). In order to estimate the univariate (marginal) CDFs \(F_{X_1},\ldots ,F_{X_n}\) we use numerically integrated KDEs, with a Gaussian kernel and a bandwidth being equal to the estimated standard deviations \(\sigma _k\) of \(X_k\) for \(k=1,\ldots ,n\), see the left column of Fig. 3. Once an R-vine copula is fitted to the distribution of X, as described in Sect. 3.3, we are able to draw realizations from the fitted distribution of X, with which the uncertainty set \(B^*\) can be determined as described in Sect. 3.6. This method results in one single uncertainty set \(B^*\) for all considered hours, since the fitted R-vine copula models the (unconditional) distribution of X, irrespective of other variables, which are possibly correlated with X. Thus, it is sensible to investigate if and to which extent the random vector X of power forecasting errors depends on various other variables, like the random vector S of forecasted solar radiations at the n nodes. For this reason, we also model various conditional distributions of X.
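The marginal step can be sketched as follows, assuming a Gaussian kernel with a bandwidth equal to the sample standard deviation, as stated above. Instead of integrating the KDE numerically, the sketch uses the closed-form CDF of a Gaussian mixture; this substitution, the function name `kde_cdf` and the toy sample are our own choices for illustration.

```python
from statistics import NormalDist, stdev


def kde_cdf(sample):
    """CDF of a Gaussian KDE whose bandwidth equals the sample standard deviation."""
    h = stdev(sample)
    kernels = [NormalDist(mu=y, sigma=h) for y in sample]
    # The CDF of the KDE is the average of the kernel CDFs.
    return lambda x: sum(k.cdf(x) for k in kernels) / len(kernels)


# Toy sample (hypothetical) of one component of the forecasting error.
errors = [-1.0, 0.0, 1.0]
F_hat = kde_cdf(errors)
pseudo_obs = [F_hat(x) for x in errors]   # values in (0, 1) passed on to the copula fit
print(pseudo_obs)
```

The pseudo-observations obtained in this way are the inputs to which the pair copulas are fitted.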
To condition on the forecasted solar radiation vector S, we consider the three cases mentioned above, i.e., \(S=s\), \({\overline{S}} = {\overline{s}}\), and \(S_k=s_k\) for some \(k\in \{1,\ldots ,n\}\). From a meteorological perspective, the network nodes in \(\mathcal {N}\) are in close geographical proximity and, therefore, the forecasted solar radiations \(S_{1}, \ldots , S_n\) at the n network nodes are highly correlated. Thus, it might be sufficient to consider either the average solar radiation \({\overline{S}}\) or the solar radiation \(S_k\) for one single node, instead of the random vector S, which reduces the complexity of the copula model without much loss of information.
As can be seen in Fig. 3, the power forecasting errors \(X_k,X_{k'}\) have unimodal distributions which are well approximated by KDEs. For the forecasted solar radiations, \(S_k,S_{k'}\), however, the values of the densities are significantly larger than zero at the distribution limits. Since the kernel of the KDE would cross the bounds of the distribution for data points close to those bounds, we first transform the components of S, as well as \({\overline{S}}\) and \(S_k\), using the mapping \(T:[a,b]\rightarrow [-\infty ,\infty ]\) with \(T(x) = F_{N(0,1)}^{-1}(F_{U(a,b)}(x))\) for each \(x\in [a,b]\), where \(F_{N(0,1)}\) is the CDF of the standard normal distribution and \(F_{U(a,b)}\) is the CDF of U(a, b), the uniform distribution for the interval [a, b] for some \(a,b\in \mathbbm {R}\) with \(a<b\). Thus, T maps the bounded interval [a, b] onto \(\mathbbm {R}\). Since the endpoints a and b are mapped to \(-\infty\) and \(\infty\), respectively, we choose them to be slightly outside the bounds of the solar radiation distribution such that T does not map any data point to \(\pm \infty\). The ranges of values of the transformed random variables T(S), \(T({\overline{S}})\) and \(T(S_k)\) are unbounded and we can apply kernel density estimators to their transformed data points T(s), \(T({\overline{s}})\) and \(T(s_k)\), where \(T(s)=(T(s_1), \ldots , T(s_n))\). Finally, we transform the density functions \({\hat{f}}_{T(S_i)}\) back to the interval [a, b] with \({\hat{f}}_{S}(x) = \frac{1}{c} {\hat{f}}_{T(S)}(T(x))\) for each \(x\in [a,b]\), where \(c>0\) is a normalizing constant.
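The mapping T and its inverse can be written down directly with the standard library. In the sketch below, the bounds a and b are hypothetical values chosen slightly outside an assumed data range \([0,1]\), and the helper name `make_transform` is our own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def make_transform(a, b):
    """Return T(x) = Phi^{-1}((x - a) / (b - a)) and its inverse for a < x < b."""
    def T(x):
        # inv_cdf is only defined on (0, 1), i.e. for a < x < b.
        return STD_NORMAL.inv_cdf((x - a) / (b - a))

    def T_inv(z):
        return a + (b - a) * STD_NORMAL.cdf(z)

    return T, T_inv


# Hypothetical bounds slightly outside the assumed data range [0, 1].
T, T_inv = make_transform(-0.01, 1.01)
print(T(0.0), T(0.5), T(1.0))
```

Because a and b lie strictly outside the data range, every data point is mapped to a finite value, and the kernel density estimator can then be applied to the transformed points.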
Once the densities of the marginal distributions of X and S, as well as the densities of \({\overline{S}}\) and \(S_k\) are determined, they are numerically integrated to obtain the corresponding CDFs with which an R-vine copula is fitted, as described in Sect. 3.3. Now we can draw samples from the (unconditional and conditional) R-vine copula model with which we construct uncertainty sets \(B^*\), as described in Sect. 3.6. Figure 4 shows the histograms of samples drawn from conditional R-vine copula models for different solar radiation forecasts and, in particular, how the conditional error distribution changes for different forecasted solar radiations.
To check how well the R-vine copula model captures the correlations of the dataset of forecasted radiations and power forecasting errors, we compare the values of empirical Kendall’s tau (see (16)) for all pairs of components of the vector \((S_1,\ldots ,S_{n},X_1,\ldots ,X_{n})\). It can be seen in Fig. 5 that the R-vine copula model manages to capture the correlation within the underlying dataset quite well, since the values of empirical Kendall’s tau computed from the dataset of forecasted radiations and power forecasting errors (left) and from simulated realizations of the R-vine copula model (right), respectively, show very similar correlation structures.
Note that we consider copulas with up to 26 dimensions while the available dataset contains only 180 data points. This makes it difficult to reliably assess the goodness of fit of the copula model. However, in the following we evaluate the entire model chain with various validation scores in order to assess the additional benefit of the copula model.
4.3 Analyzing the size of uncertainty sets
We now analyze the size of uncertainty sets for the robust approximation of chance constraints using the scenario approach described in Sect. 3.6. The resulting sets depend on the samples drawn from the unconditional probability distribution and the three conditional distributions of power forecasting errors, respectively, considered in Sect. 4.2. Note that the minimum number N of samples required for the scenario approach, determined by means of (20), ranges from \(N=48\) (for \(1-\varepsilon =0.01\)) through \(N=469\) (for \(1-\varepsilon =0.9\)) to \(N=4684\) (for \(1-\varepsilon =0.99\)). In practice, a coverage probability \(1-\varepsilon\) of about 0.9 is often the relevant choice, and therefore \(N=469\) samples are sufficient for the scenario approach with a confidence of \(1-\delta =0.99\).
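The sample sizes quoted above can be reproduced with a few lines of Python. The sketch assumes that the explicit bound of Alamo et al. (2010) has the form \(N \ge \frac{1}{\varepsilon }\frac{e}{e-1}(\ln \frac{1}{\delta } + d - 1)\) with \(d=2n=26\) decision variables (one lower and one upper bound per node); with \(\delta =0.01\) this form yields exactly the three values stated in the text. The second function checks the implicit binomial condition of Campi and Garatti (2008) under the same assumption on d. Both function names are hypothetical.

```python
from math import ceil, comb, e, log


def explicit_sample_size(eps, delta, d):
    """Smallest integer N satisfying the explicit sufficient condition."""
    return ceil((1 / eps) * (e / (e - 1)) * (log(1 / delta) + d - 1))


def implicit_condition_holds(N, eps, delta, d):
    """Check sum_{i=0}^{d-1} C(N, i) eps^i (1 - eps)^(N - i) <= delta."""
    tail = sum(comb(N, i) * eps**i * (1 - eps) ** (N - i) for i in range(d))
    return tail <= delta


d = 2 * 13   # n = 13 nodes, two bounds per node
for coverage in (0.01, 0.9, 0.99):
    print(coverage, explicit_sample_size(1 - coverage, 0.01, d))
# prints 48, 469 and 4684 for the three coverage levels
```

Because the explicit condition is only sufficient, the implicit condition is already satisfied for somewhat smaller N; the explicit form has the advantage that it requires no search.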
For the numerical results discussed in the present section, we use an average uncertainty set which is obtained from applying the scenario approach 500 times. In this way, our numerical results become reproducible because the average uncertainty set does not change significantly, when the procedure described above is repeated.
Figure 6 shows values of the size measure given in (18a), i.e. for the sum of interval lengths, of uncertainty sets computed exemplarily for a usual summer day at noon with an average hourly global horizontal irradiation of 0.63\(\frac{kWh}{m^2}\), in dependence of different values of the coverage probability \(1-\varepsilon\) with a confidence of \(1-\delta =0.99\). Note that smaller confidence levels would lead to smaller uncertainty sets, but the quality of these sets also decreases. In particular, there would no longer be a confidence probability of 0.99 that the computed uncertainty set covers the chosen probability mass of \(1-\varepsilon\).
The values displayed in Fig. 6 are normalized by the size of the largest uncertainty set, namely the unconditional uncertainty set for a coverage probability of 0.99. It can be seen that the sizes of the uncertainty sets increase with increasing probabilities \(1-\varepsilon\), as the confidence regions cover a larger set of realizations of the random vector X of power forecasting errors. For all coverage probabilities \(1-\varepsilon\), the unconditional distribution of X leads to larger uncertainty sets than the conditional probability distributions of X. Thus, with knowledge of the forecasted solar radiation, it is possible to adapt the uncertainty sets to the current weather situation, which leads to smaller sizes. Not surprisingly, the conditional distribution of X with given solar radiation at all n solar feed-in nodes yields the smallest uncertainty sets for all coverage probabilities \(1-\varepsilon\). However, the differences between these sizes and those obtained for the other two conditional settings with less complete information on the forecasted solar radiation, i.e. knowledge of the average solar radiation (avg) and of the solar radiation at one single node (one), are not too large. Furthermore, the size differences between the conditional settings ’avg’ and ’one’ are negligible.
The numerical results presented in the remaining part of this section concern the case \(1-\varepsilon = 0.9\), i.e. the practically most relevant value of the coverage probability \(1-\varepsilon\). For this safety margin, we analyze the uncertainty sets obtained for the four (unconditional and conditional) distributions of X described above and for each day in the validation dataset. In particular, we determine the empirical coverage probability by counting how often the realizations drawn from the respective distribution of the random vector X belong to the corresponding uncertainty set. Furthermore, we compute and compare the average size of the uncertainty sets, i.e. the sum of interval lengths, and their average volume, i.e. the product of interval lengths. The results are displayed in Table 2, where it can be seen that the four different settings lead to similar empirical coverage probabilities around the given level of 0.9. On the other hand, the reductions of size and volume of uncertainty sets implied by considering conditional distributions of the power forecasting error X are clearly visible. Again, the case with given solar radiation at all n solar feed-in nodes yields the smallest uncertainty sets, whereas the size differences between the conditional settings ’avg’ and ’one’ are negligible.
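The three validation quantities named above (empirical coverage, size and volume) are straightforward to compute once an uncertainty set is given as a list of intervals. The sketch below uses small hand-picked numbers that are purely illustrative and do not come from the dataset; the function names are hypothetical.

```python
import math


def in_box(x, box):
    """True if every coordinate of x lies inside its interval."""
    return all(low <= xi <= high for xi, (low, high) in zip(x, box))


def empirical_coverage(realizations, box):
    """Fraction of realizations that fall into the box."""
    return sum(in_box(x, box) for x in realizations) / len(realizations)


def box_size(box):
    """Size measure: sum of interval lengths."""
    return sum(high - low for low, high in box)


def box_volume(box):
    """Volume: product of interval lengths."""
    return math.prod(high - low for low, high in box)


# Illustrative two-dimensional box and four realizations (made-up values).
box = [(-1.0, 1.0), (-2.0, 2.0)]
realizations = [(0.0, 0.0), (0.5, 1.5), (2.0, 0.0), (0.0, -3.0)]

print(empirical_coverage(realizations, box))
print(box_size(box), box_volume(box))
```

The size grows additively and the volume multiplicatively with the interval lengths, so a moderate reduction of each interval has a much stronger effect on the volume than on the size when the number of feed-in nodes is large.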
To further analyze the impact of additional knowledge regarding solar radiation forecast on size and location of uncertainty sets, we determined uncertainty sets for a rather sunny day at noon with a high average solar radiation forecast of 0.76 \(\frac{kWh}{m^2}\) and a less sunny day at noon with a low average solar radiation forecast of \(0.18~\frac{kWh}{m^2}\). The results are shown in Fig. 7, where the uncertainty sets are plotted via their confidence intervals (in MW) for each solar feed-in point.
It turned out that the lengths of the confidence intervals significantly shrink by considering conditional distributions of the power forecasting error X, given a high average solar radiation forecast. More precisely, the lower endpoints of the confidence intervals are shifted upwards, i.e., negative power forecasting errors are less likely, whereas the upper endpoints remain almost unchanged, see Fig. 7 (left). On the other hand, for low average radiation forecast, the confidence intervals are shifted downwards by considering conditional distributions of the power forecasting error, but their lengths remain almost unchanged, see Fig. 7 (right).
Finally, we note that the results of the numerical experiments presented in Aigner et al. (2021) are also based on (measured) power feed-in data from NNG and forecasted radiation data from DWD. However, the database used there differs from that of the present paper, where, in addition, solar power forecast data provided by the forecasting model of Schinke-Nendza et al. (2021) are exploited. In this way, by modeling the multivariate probability distribution of solar power forecast data via R-vine copulas, it is possible to determine conditional uncertainty sets which meet the desired coverage probability of 0.9. They have significantly smaller sizes than the corresponding unconditional uncertainty sets from Aigner et al. (2021), which led to a larger empirical coverage of 0.98 although \(1-\varepsilon =0.9\) was required.
4.4 Robust curtailment
As important as the size of the computed uncertainty sets is the quality of solutions obtained by solving the robust approximation (8) of the chance constrained optimization problem described in (6). In order to solve (8), we use the network parameters given by the power network operator NNG. The curtailment options for the feed-in nodes in the electrical power network of NNG are \(\beta _k \in \{0,\,0.1,\,0.2,\,\ldots ,\,1.0\}\). Moreover, the participation factors of the generators are fixed values given by NNG (\(\alpha _{31} = \alpha _{34} = 0.05\), \(\alpha _{32} = \alpha _{33} = 0.45\)). There are no costs affiliated with the power transfer at the (slack-) generators on the boundary nodes. Hence, there are no generator production costs and the corresponding term in the objective function is given as \(\sum \nolimits _{k \in \mathcal {N}_\text {G}} f_k(P^{\text {G}}_k)\) with \(f_k(P^{\text {G}}_k)=0\) for each \(k \in \mathcal {N}_\text {G}\). The curtailment costs are modeled as \(\sum \nolimits _{k \in \mathcal {N}} c_k(\beta _k)\) with \(c_k(\beta _k)=P^{\text {I}}_k(1-\beta _k)\) for each \(k\in \mathcal {N}\). The minimization of this objective function leads to a minimum curtailment of solar feed-in.
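The curtailment cost term can be evaluated directly from its definition. The sketch below implements \(c_k(\beta _k)=P^{\text {I}}_k(1-\beta _k)\) with the discrete options for \(\beta _k\) stated above; the node labels and feed-in values are made-up illustrations, and the function name is hypothetical.

```python
import math

# Admissible values of beta_k: 0, 0.1, ..., 1.0
CURTAILMENT_LEVELS = [i / 10 for i in range(11)]


def curtailment_cost(feed_in, beta):
    """Sum over nodes k of P_k^I * (1 - beta_k).

    feed_in : dict mapping node -> forecasted feed-in P_k^I
    beta    : dict mapping node -> chosen value of beta_k
    """
    for node, value in beta.items():
        if not any(math.isclose(value, level) for level in CURTAILMENT_LEVELS):
            raise ValueError(f"beta for node {node} is not an admissible level")
    return sum(p * (1.0 - beta[node]) for node, p in feed_in.items())


# Illustrative data: two feed-in nodes with forecasts in MW (made-up values).
feed_in = {1: 4.0, 2: 2.5}
beta = {1: 1.0, 2: 0.8}
print(curtailment_cost(feed_in, beta))
```

With this cost function, \(\beta _k=1\) contributes no cost and \(\beta _k=0\) contributes the full feed-in \(P^{\text {I}}_k\), so \(1-\beta _k\) is the curtailed fraction at node k.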
Due to the balanced network situations in the historical data, there is no need to curtail the solar feed-in in the instances from the validation set. There is also no danger of overload, and the optimization leads to trivial solutions with a curtailed solar power equal to 0. Thus, in order to generate test cases with critical network situations (and non-trivial solutions), we artificially increased the solar power feed-in, whereas the network topology, transmission line parameters and the power demand remained unchanged. More precisely, based on the data of the validation set, we increased the installed solar power and the feed-in up to the total solar power capacities planned by NNG for the year 2022 and the total solar power increase planned for the year 2025. The corresponding scaling of power generation forecast and uncertainty sets creates an oversupply of renewable energy, and therefore it is more likely in these instances that a curtailment will be required. Furthermore, in addition to the up-scaled solar power, we simulated the impact of transmission line failures on the solution of our optimization problem.
Thus, we now discuss further details for the following experimental setups:
- A: Installed solar power as planned in 2025,
- B: Installed solar power as planned in 2022 with a failure of lines (6, 19) and (9, 30).
To obtain the results, a mixed-integer optimization problem was solved for each instance and each (unconditional and conditional) uncertainty set. The computing times are very low and, thus, solutions can be generated efficiently. Indeed, the average computing times for the two settings are 2.8s (setting A) and 1.1s (setting B), with a maximal run time of 8.2s (setting A) and 4.0s (setting B).
The robustness of a solution of (8) can be validated by checking whether the computed network configuration leads to an overload after the realization of uncertainty. The corresponding entries in Table 3 show that nominal solutions generated without probabilistic constraints (or, in other words, for \(1-\varepsilon =0\)) lead to overload in a large number of test instances. In contrast, at most three robust solutions lead to a constraint violation in each setting for the different probabilistic models. The relative frequency of such violations is therefore below the given threshold of \(\varepsilon =0.1\), which indicates the feasibility of the robust solutions for the chance constraints. It also shows that the robust protection against uncertainties is necessary and reasonable, since the number of technical constraint violations could be strongly reduced in the numerical experiments.
To further investigate the quality of solutions of (8), we computed the amount of curtailed solar power of the robust solution in comparison to the solution of the nominal problem (1) without protection against uncertainty. The increase in curtailed energy of the robust solutions in comparison to the nominal ones can be interpreted as the cost of robust protection, i.e., the amount by which the curtailment costs increase due to the protection against uncertainties. Figure 8 shows box plots for the increase of relative curtailment costs using the four (unconditional/conditional) types ’no’, ’avg’, ’one’ and ’all’ of uncertainty sets. One can see that, again, the addition of further knowledge about the solar radiation improves the performance in both settings. This corresponds to the size reduction of the uncertainty sets observed in Sect. 4.3. Overall, the relative cost increase in all experiments is relatively small. However, using the samples drawn from the three conditional distributions of power forecasting errors enables us to further reduce the amount of wasted energy under the same solution guarantees, where, again, the conditional settings ’avg’ and ’one’ have a similar impact. In comparison with the preliminary results obtained in Aigner et al. (2021), the amount of curtailed energy could be drastically reduced on average from about 13 to \(5\%\) under the same solution quality guarantees. This coincides with the reduction of uncertainty set size discussed at the end of Sect. 4.3.
In summary, the obtained results show that the scenario approach for the considered instances in combination with the copula-based stochastic modeling of power forecasting errors leads to high-quality solutions. The addition of further knowledge about the current weather situation allows us to construct more precise uncertainty sets. We are able to produce robust solutions with a relatively small increase of curtailment costs, while maintaining the same level of protection.
5 Conclusion
In this paper, we combine the robust approximation of chance constrained DC Optimal Power Flow with a probabilistic uncertainty model based on R-vine copulas to reduce the curtailment of solar power while keeping the power grid stable. The chance constrained DC Optimal Power Flow determines appropriate levels of curtailment based on a deterministic forecast for the expected solar power feed-in and uncertainty sets, i.e., multidimensional cuboids which contain the forecasting error with a given probability. These uncertainty sets are approximated with the help of the multivariate probability distribution of the forecasting error at all considered power grid nodes. This results in fewer curtailments and a more stable power grid compared to the results of a model without uncertainty sets.
To further improve upon these results, we incorporate knowledge about solar radiation in the solution process by considering the conditional forecasting error distribution for a given solar radiation forecast. This leads to sharper distributions, i.e., the forecasting error can be predicted with higher accuracy, which results in smaller uncertainty sets. Compared to the unconditional case, this leads to even fewer curtailments and improved stability of the power grid.
Our numerical results demonstrate the applicability of our procedure and the positive effects of incorporating a probabilistic model for the distribution of random solar radiation vectors. Future research can transfer our solution framework to other applications under uncertainty, such as those arising in energy network optimization.
Future research could add further features and investigate questions arising from the application, for example adding optimal transmission switching under uncertainty or including storage elements and unit commitment constraints over time. From a mathematical point of view, it would be interesting to study different geometries for uncertainty sets to further reduce the conservatism of the robust approximation. The major challenge is to find assumptions where an equivalent reformulation for the resulting problems is possible. In order to improve the copula-based sampling from conditional probability distributions, it might be promising to add more information (e.g. temperature, solar altitude, time) to the model.
References
Aas K, Czado C, Frigessi A, Bakken H (2009) Pair-copula constructions of multiple dependence. Insur Math Econ 44(2):182–198
Aas K, Nagler T, Jullum M, Løland A (2021) Explaining predictive models using Shapley values and non-parametric vine copulas. Depend Model 9(1):62–81
Aigner K-M, Clarner J-P, Liers F, Martin A (2022) Robust approximation of chance constrained DC optimal power flow under decision-dependent uncertainty. Eur J Oper Res. https://doi.org/10.1016/j.ejor.2021.10.051
Alamo T, Tempo R, Luque A (2010) On the sample complexity of randomized approaches to the analysis and design under uncertainty. In Proceedings of the 2010 American Control Conference, pp 4671–4676. IEEE
Bedford T, Cooke RM (2001) Probability density decomposition for conditionally dependent random variables modeled by vines. Ann Math Artif Intell 32(1):245–268
Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust optimization. Princeton University Press
Bevacqua E, Maraun D, Hobæk Haff I, Widmann M, Vrac M (2017) Multivariate statistical modelling of compound events via pair-copula constructions: analysis of floods in Ravenna (Italy). Hydrol Earth Syst Sci 21(6):2701–2723
Bienstock D, Chertkov M, Harnett S (2014) Chance-constrained optimal power flow: risk-aware network control under uncertainty. SIAM Rev 56(3):461–495
Borkowska B (1974) Probabilistic load flow. IEEE Trans Power Appar Syst 93(3):752–759
Calafiore G, Campi M (2005) Uncertain convex programs: randomized solutions and confidence levels. Math Program 102:25–46
Campi MC, Garatti S (2008) The exact feasibility of randomized solutions of uncertain convex programs. SIAM J Optim 19(3):1211–1230
Carpentier J (1962) Contribution à l’étude du dispatching économique. Bull Soc Française Electr 8:431–447
Christie RD, Wollenberg BF, Wangensteen I (2000) Transmission management in the deregulated environment. Proc IEEE 88(2):170–195
Cooke R, Kurowicka D, Wilson K (2015) Sampling, conditionalizing, counting, merging, searching regular vines. J Multivar Anal 138:4–18
Czado C (2019) Analyzing dependent data with vine copulas. Springer
Dall’Anese E, Baker K, Summers T (2017) Chance-constrained AC optimal power flow for distribution systems with renewables. IEEE Trans Power Syst 32(5):3427–3438
Dissmann J, Brechmann EC, Czado C, Kurowicka D (2013) Selecting and estimating regular vine copulae and application to financial returns. Comput Stat Data Anal 59:52–69
Frank S, Steponavice I, Rebennack S (2012) Optimal power flow: a bibliographic survey I. Energy Syst 3(3):221–258
Frank S, Steponavice I, Rebennack S (2012) Optimal power flow: a bibliographic survey II. Energy Syst 3(3):259–289
Geng X, Xie L (2019) Data-driven decision making in power systems with probabilistic guarantees: theory and applications of chance-constrained optimization. Annu Rev Control 47:341–363
Gorissen BL, Yanıkoğlu I, den Hertog D (2015) A practical guide to robust optimization. Omega 53:124–137
Guo R, Ye H, Song C, Gao W (2021) Research on optimal power flow calculation method for multi-wind power distribution network. In IOP Conference Series: Earth and Environmental Science, vol 811, pp 012017
Gurobi Optimization LLC (2021) Gurobi Optimizer Reference Manual. http://www.gurobi.com
Haff IH, Aas K, Frigessi A (2010) On the simplified pair-copula construction-simply useful or too simplistic? J Multivar Anal 101(5):1296–1310
Hess R (2020) Statistical postprocessing of ensemble forecasts for severe weather at Deutscher Wetterdienst. Nonlinear Processes Geophys 27:473–487
Jia M, Hug G, Shen C (2021) Iterative decomposition of joint chance constraints in OPF. IEEE Trans Power Syst 36(5):4836–4839
Joe H (2015) Dependence Modeling with Copulas. Chapman and Hall/CRC
Khuntia SR, Rueda JL, van der Meijden MA (2019) Risk-based security assessment of transmission line overloading considering spatio-temporal dependence of load and wind power using vine copula. IET Renew Power Gener 13(10):1770–1779
Kurowicka D, Joe H (eds) (2010) Dependence Modeling: Vine Copula Handbook. World Scientific Publishing Co
Lubin M, Dvorkin Y, Backhaus S (2016) A robust approach to chance constrained optimal power flow with renewable generation. IEEE Trans Power Syst 31(5):3840–3849
Margellos K, Goulart P, Lygeros J (2014) On the road between robust optimization and the scenario approach for chance constrained optimization problems. IEEE Trans Autom Control 59(8):2258–2263
Nagler T, Vatter T (2021) Documentation of the Pyvinecopulib package. https://vinecopulib.github.io/pyvinecopulib/
Nelsen R (2006) An introduction to copulas. Springer
Nemirovski A (2012) On safe tractable approximations of chance constraints. Eur J Oper Res 219(3):707–718
Peña-Ordieres A, Molzahn DK, Roald LA, Wächter A (2021) DC optimal power flow with joint chance constraints. IEEE Trans Power Syst 36(1):147–158
Prékopa A (1995) Stochastic programming. Springer
Qiu F, Wang J (2014) Chance-constrained transmission switching with guaranteed wind power utilization. IEEE Trans Power Syst 30(3):1270–1278
Roald L, Andersson G (2018) Chance-constrained AC optimal power flow: reformulations and efficient algorithms. IEEE Trans Power Syst 33(3):2906–2918
Roald L, Misra S, Chertkov M, Backhaus S, Andersson G (2016) Chance constrained optimal power flow with curtailment and reserves from wind power plants. In Proceedings of PSCC (Power Systems Computation Conference), arXiv:1601.04321
Roald L, Oldewurtel F, Van Parys B, Andersson G (2015) Security constrained optimal power flow with distributionally robust chance constraints. arXiv preprint arXiv:1508.06061
Schinke-Nendza A, von Loeper F, Osinski P, Schaumann P, Schmidt V, Weber C (2021) Probabilistic forecasting of photovoltaic power supply - a hybrid approach using D-vine copulas to model spatial dependencies. Appl Energy 304:117599
Sherali HD, Adams WP (2013) Reformulation-linearization techniques for discrete optimization problems. In: Pardalos PM, Du D-Z, Graham RL (eds) Handbook of Combinatorial Optimization. Springer, pp 2849–2896
Silverman BW (1986) Density estimation for statistics and data analysis. CRC Press
von Loeper F, Kirstein T, Idlbi B, Ruf H, Heilscher G, Schmidt V (2021) Probabilistic analysis of solar power supply using D-vine copulas based on meteorological variables. In: Goettlich S, Herty M, Milde A (eds) Mathematical Modeling, Simulation and Optimization for Power Engineering and Management, volume 34 of Mathematics in Industry. Springer, pp 51–68
Wang Q, Guan Y, Wang J (2011) A chance-constrained two-stage stochastic program for unit commitment with uncertain wind power output. IEEE Trans Power Syst 27(1):206–215
Xiao Q, Zhou S, Xiao H (2020) Probabilistic optimal power flow analysis incorporating correlated wind sources. Int Trans Electr Energy Syst 30(8):e12441
Xie W, Ahmed S (2018) Distributionally robust chance constrained optimal power flow with renewables: a conic reformulation. IEEE Trans Power Syst 33(2):1860–1867
Xu Y, Korkali M, Mili L, Valinejad J, Chen T, Chen X (2021) An iterative response-surface-based approach for chance-constrained AC optimal power flow considering dependent uncertainty. IEEE Trans Smart Grid 12(3):2696–2707
Zhang H, Li P (2011) Chance constrained programming for optimal power flow under uncertainty. IEEE Trans Power Syst 26(4):2417–2424
Acknowledgements
We are grateful to Rainer Bäsmann for many fruitful discussions on the operation of electricity networks. We also thank "Deutscher Wetterdienst", especially Reinhold Hess, and "N-ERGIE Netz GmbH" for providing data used in this study. We are thankful to Jan-Patrick Clarner for providing parts of the implementation. This research has been funded by the Federal Ministry of Education and Research of Germany (grants 05M18WEB and 05M18VUB). Furthermore, we would like to thank Deutsche Forschungsgemeinschaft (DFG) for their support within projects A05, B06, B07, and B10 of the Sonderforschungsbereich/Transregio 154 "Mathematical Modelling, Simulation and Optimization using the Example of Gas Networks". This work has been supported by grant 03El1036A from the Federal Ministry for Economic Affairs and Energy, Germany.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aigner, KM., Schaumann, P., Loeper, F.v. et al. Robust DC optimal power flow with modeling of solar power supply uncertainty via R-vine copulas. Optim Eng 24, 1951–1982 (2023). https://doi.org/10.1007/s11081-022-09761-0
DOI: https://doi.org/10.1007/s11081-022-09761-0