1 Introduction

As renewable energy, which is based on photovoltaic (PV) and wind turbine (WT) power, increasingly penetrates the distribution network [1, 2], its uncertainty and volatility bring great challenges to the optimal operation of the distribution network [3, 4]. The traditional deterministic optimization method is no longer suitable [5], and the uncertainty optimization method is created. To ensure the safety and reliability of the distribution network operation [6, 7] and to adapt to the uncertain development of renewable energy sources and loads, this study considers the optimal scheduling problem of the active distribution network (ADN), considering the uncertainties of the source and load.

Renewable distributed energy accesses the distribution network in large quantities, increasing the uncertainty factor and enriching the schedulable resources of the distribution network. The optimal dispatching model of an ADN with a distributed power supply and the optimal scheduling model of a traditional distribution network have changed greatly in optimization methods, optimization objectives, control variables and selection of grid constraints [8]. The traditional deterministic optimization method is no longer applicable, and the scheduling model shifts from deterministic to uncertain. The deterministic optimization method considers the predicted value of PV and WT power output to be a certain value, and the error between the actual and predicted values results in insufficient or excess distribution network power, which is compensated by the spare capacity in the system [9]. At present, when solving the problem of optimal scheduling of distribution networks with uncertainty in existing research, two kinds of uncertainty optimization methods are usually used: stochastic programming and robust optimization.

The stochastic programming method optimizes the confidence level of a given constraint to the target expected value, which is necessary to obtain accurate probability distribution data. The method has large scale and high cost of calculation. Because it is necessary to obtain a deterministic probability distribution, which may lead to inaccuracy, these models cannot reflect the actual situation [10, 11]. Reference [12] adopts the Monte Carlo test method to consider the uncertainty of PV output, WT output, and load power, generate scenarios and corresponding probabilities, and use the mixed-integer linear programming method to solve the model. Two-stage stochastic optimization models are proposed in [13, 14].

Compared to the stochastic programming approach, robust optimization [15] does not need to explicitly know the probability distribution of random variables; it only needs to obtain the range of the random variables. However, this method does not make full use of the available probability statistics, and thus, the resulting solution is too conservative [16, 17]. In recent years, two-stage robust optimization models and multi-stage robust optimization models have been introduced and studied, also called robust or adaptive optimization [18, 19]. Due to the improved modeling capabilities, the two-stage robust optimization method has become a popular decision-making tool for power system scheduling problems.

In practical problems, both risk and ambiguity should be considered when modeling an optimization problem under uncertainty. To solve the shortcomings of the stochastic programming and robust optimization methods, the two methods are organically combined to obtain a distributionally robust optimization method [20,21,22]. Scarf finds that the precise demand distribution is unknown but is characterized by its mean and variance. In recent years, many more studies have been conducted [23, 24]. Unfortunately, such problems are typically computationally intractable [25, 26]. To circumvent the intractability, an approach known as the linear decision rule is used. However, it has rarely been applied and changed in power system.

This paper establishes an ADN scheduling model that considers the uncertainties of the source and load based on a distributionally robust optimization method, and introduces an ambiguity set to capture the uncertainties of renewable energy outputs and load. It uses the generalized linear decision rule to approximate the two-stage model strictly to reduce the conservativeness of the optimal solution obtained by the robust optimization method. Finally, the simulations are performed on the improved IEEE 33-node and IEEE 118-node systems for comparison with the deterministic, stochastic programming, and robust optimization methods to verify the validity and superiority of the proposed model and method.

2 Two-stage distributionally robust optimization approach

The uncertain random variables involved in this paper include the PV output, WT output, and fluctuating load. Because there is a certain error between the predicted value and actual value, the random variable v is set to represent the prediction error of PV output, WT output, and load power at each time t. This paper combines the advantages of stochastic programming and traditional robust optimization approaches. It also proposes a two-stage distributionally robust optimization approach by introducing the probability distribution information of uncertain random variable v into an ambiguity set F to find the optimal solution under the worst-case distribution conditions to obtain a min–max problem. The model is as follows. Equation (1) represents a two-stage distributionally robust model.

$$\mathop {\hbox{min} }\limits_{x} C_{1} (\varvec{x}_{ 1} ) + \mathop{\hbox{max} }\limits_{ \, P \in F} E[Q(\varvec{x}_{ 2} ,\varvec{v})]$$
(1)
$$Q(\varvec{x}_{ 1} ) = \hbox{min}\, C_{1} (\varvec{x}_{ 1} )$$
(2)
$$\varvec{Ax}_{1} \le \varvec{b} \quad\quad \varvec{x}_{1} \in \mathbf{R}^{{N_{ 1} }}$$
(3)
$$Q(\varvec{x}_{ 2} ,\varvec{v}) = \hbox{min} \,C_{2} (\varvec{x}_{ 2} ,\varvec{v})$$
(4)
$$\varvec{Tx}_{ 1} + \varvec{Wx}_{ 2} \le \varvec{h}(\varvec{v}) \quad\quad \varvec{x}_{ 2} \in \mathbf{R}^{{N_{2} }}$$
(5)

where x1 and x2 are the decision variables of the first stage and second stage, respectively; E[Q(x2,v)] is the expected value of the minimum value of the second-stage cost Q(x2,v); P is the probability distribution of the random variable v; F is the ambiguity set representing the distribution information of the random variable v. In (2) and (4), C1(x1) and C2(x2,v) are the cost functions of the first stage and second stage, respectively. The inequality constraints of first stage and second stage are represented in (3) and (5), respectively. Matrix \(\varvec{A} \in {\mathbf{R}}^{{M_{1} \times N_{1} }}\), \(\varvec{T} \in {\mathbf{R}}^{{M_{2} \times N_{1} }}\), and \(\varvec{W} \in {\mathbf{R}}^{{M_{2} \times N_{2} }}\) where N1 and N2 represent the set of decision variables of the first stage and second stage, respectively; and M1 and M2 represent the set of constraints of the first stage and second stage, respectively.

From (5), it can be seen that the function h(v) changes with variation in the random variable v and is expressed by the affine function of (6).

$$\varvec{h}(\varvec{v}) = \varvec{h}^{0} + \sum\limits_{s \in S} {\varvec{h}_{s}^{v} \varvec{v}_{s} }$$
(6)

where S is the set of all random variables v; and \(\varvec{h}_{s}^{v} \in {\mathbf{R}}^{{M_{2} }}\) and \(\varvec{h}^{0} \in {\mathbf{R}}^{{M_{2} }}\) are constants.

3 Two-stage distributionally robust optimization model for ADN

3.1 Objective function

The objective function of the two-stage distributionally robust optimization model of the ADN is used to minimize the total expected cost and construct the model (1). This paper does not consider the distribution network reconfiguration and assumes local reactive power compensation.

The first-stage optimization model can be seen in (7). The actual values of the uncertain random variable v (e.g. WT, PV and load outputs) of ADN are unknown in the first-stage, they can be obtained by prediction. The decision variable x1 is represented as the active power of all kinds of controllable distributed power sources and flexible loads. Furthermore, the first-stage cost can be expressed as the sum of operating costs of schedulable distributed energy resources (DERs), such as microturbines (MTs) \(C^{\text{MT}}\), energy storage systems (ESSs) \(C^{\text{ESS}}\), and flexible loads (FLs) \(C^{\text{FL}}\); main grid purchase cost \(C^{\text{TR}}\); and line loss cost of the ADN \(C^{\text{Loss}}\).

$$C_{1} (\varvec{x}_{ 1} ) = \sum\limits_{t = 1}^{{N_{T} }} {(C_{t}^{\text{Loss}} + C_{t}^{\text{TR}} + C_{t}^{\text{MT}} + C_{t}^{\text{ESS}} + C_{t}^{\text{FL}} )}$$
(7)

where NT is the set of time periods.

The second-stage optimization model can be seen in (8). In the second stage, the actual input values of random variable v are known, which takes the deviation to the predicted value. The decision variable x2 is expressed as the active output of PV, WT, and load reduction. In (8), DG represents the PV and WT units, and the second-stage cost is set to the sum of the DG cut-down penalty cost \(C^{\text{DG}}\) and load loss cost \(C^{\text{L}}\).

$$C_{2} (\varvec{x}_{ 2} ) = \sum\limits_{t = 1}^{{N_{T} }} {(C_{t}^{\text{DG}} + C_{t}^{\text{L}} )}$$
(8)

3.2 Constraints

1) Equality constraints

The common power flow equations are as follows.

$$\left\{ \begin{aligned} &P_{j,t} = \sum\limits_{k \in \delta \left( j \right)} {P_{jk,t} - \sum\limits_{i \in \pi \left( j \right)} {\left( {P_{ij,t} - I_{ij,t}^{2} r_{ij} } \right)\;\;\;\;\;} } \hfill \\ &Q_{j,t} = \sum\limits_{k \in \delta \left( j \right)} {Q_{jk,t} - \sum\limits_{i \in \pi \left( j \right)} {\left( {Q_{ij,t} - I_{ij,t}^{2} x_{ij} } \right)\;\;\;\;\;} } \hfill \\ &V_{j,t}^{2} = V_{i,t}^{2} - 2\left( {P_{ij,t} r_{ij} + Q_{ij,t} x_{ij} } \right) + I_{ij,t}^{2} \left( {r_{ij}^{2} + x_{ij}^{2} } \right) \hfill \\ &\frac{{P_{ij,t}^{2} + Q_{ij,t}^{2} }}{{V_{j,t}^{2} }} = I_{ij,t}^{2} \;\;\;\;\; \hfill \\ \end{aligned} \right.\;\;\;\;\forall j \in B,{\kern 1pt} \forall ij \in E$$
(9)
$$P_{i,t} = P_{i,t}^{\text{MT}} + P_{i,t}^{\text{TR}} + P_{i,t}^{\text{PV}} + P_{i,t}^{\text{WT}} + \left( {P_{i,t}^{\text{Bc}} - P_{i,t}^{\text{Bd}} } \right) - P_{i,t}^{\text{L}}$$
(10)

where \(P_{j,t}\) and \(Q_{j,t}\) are the injecting active power and reactive power at bus j at time t, respectively; B and E are the sets of all buses and lines, respectively; \(\delta \left( j \right)\) and \(\pi \left( j \right)\) are the sets of all end and head branch nodes, respectively; \(P_{ij,t}\) and \(Q_{ij,t}\) are the active power and reactive power from i to j at time t, respectively; \(r_{ij}\) and \(x_{ij}\) are the line resistance and reactance; Iij,t is the line current at time t; \(V_{j,t}\) is the node voltage at bus j at time t; \(P_{i,t}^{\text{PV}}\) and \(P_{i,t}^{\text{WT}}\) are the predicted active power outputs of the PV and WT units, respectively; \(P_{i,t}^{\text{Bc}}\) and \(P_{i,t}^{\text{Bd}}\) are the charge and discharge powers of the ESS, respectively; and \(P_{j,t}^{\text{L}}\) is the load active power.

2) Inequality constraints

$$P_{i}^{\hbox{min} } \le P_{i,t} \le P_{i}^{\hbox{max} }$$
(11)
$$\Delta P_{i,t}^{\text{down}} \le P_{i,t} - P_{i,t - 1} \le \Delta P_{i,t}^{\text{up}}$$
(12)

where \(P_{i}^{\hbox{max} }\) and \(P_{i}^{\hbox{min} }\) are the upper and lower active power constraints of schedulable DERs; and \(\Delta P_{i,t}^{\text{up}}\) and \(\Delta P_{i,t}^{\text{down}}\) are the upper and lower climbing limits of MT in the first stage.

Equations (13)–(14) represent the PV, WT, and load-reduction constraints in the second stage:

$$\left\{ \begin{aligned} &0 \le P_{i,t}^{\text{PV}} \le \overline{P}_{i}^{\text{PV}} \quad \forall i \in B^{\text{PV}} \hfill \\ &0 \le P_{i,t}^{\text{WT}} \le \overline{P}_{i}^{\text{WT}} \quad \!\! \forall i \in B^{\text{WT}} \hfill \\ \end{aligned} \right.$$
(13)
$$\Delta P_{i,t}^{\text{L}} \le \omega_{i,t}^{\text{cut}} P_{i,t}^{\text{L}} \;\;\;\;i \in B^{\text{L}}$$
(14)

where \(\bar{P}_{i}^{\text{PV}}\)and \(\bar{P}_{i}^{\text{WT}}\) are the upper limit values of PV and WT generation, respectively; \(\omega_{i,t}^{\text{cut}}\) is the upper limit value of the active power ratio of the lost load, for which we use 0.1 in this paper.

3) Other constraints

To reduce the influence of the randomness of the output uncertain variables on the safe operation of the power grid, the node-voltage and node-imbalance constraints are set.

$$\underline{V}_{i} \le V_{i,t} \le \overline{V}_{i} \;\;\;\;\;i \in B$$
(15)
$$\left\{ \begin{aligned} \underline{P}_{i}^{\text{TR}} \le P_{i,t}^{\text{TR}} \le \overline{P}_{i}^{\text{TR}} \hfill \\ \underline{Q}_{i}^{\text{TR}} \le Q_{i,t}^{\text{TR}} \le \overline{Q}_{i}^{\text{TR}} \hfill \\ \end{aligned} \right.\;\;\;\;\;i \in B^{\text{TR}}$$
(16)

where \(\overline{V}_{i}\) and \(\underline{V}_{i}\) are the maximum and minimum values of voltage, respectively; \(\overline{P}_{i}^{\text{TR}}\) and \(\underline{P}_{i}^{\text{TR}}\) are the maximum and minimum values of active power, respectively; and \(\overline{Q}_{i}^{\text{TR}}\) and \(\underline{Q}_{i}^{\text{TR}}\) are the maximum and minimum values of reactive power, respectively.

3.3 Model linearization

It can be seen from the above model that the nonlinear term is contained in the constraint of (9), which is difficult to solve effectively. In this section, the constraint is processed by the second-order cone-relaxation (SOCR) technique [27, 28].

By the second-order cone-relaxation convex optimization process, (9) is converted into an inequality constraint, thereby eliminating the non-convexity of the original equality constraint.

\(\widetilde{I}_{ij,t} = I_{ij,t}^{2}\) and \(\tilde{V}_{ij,t} = V_{ij,t}^{2}\), thus:

$$\frac{{P_{ij,t}^{2} + Q_{ij,t}^{2} }}{{V_{j,t}^{2} }} \le I_{ij,t}^{2} \;\;\;\;\forall ij \in E$$
(17)
$$\left\| \begin{aligned} &2P_{ij,t} \hfill \\ &2Q_{ij,t} \hfill \\ &\widetilde{I}_{ij,t} - \tilde{V}_{ij,t} \hfill \\ \end{aligned} \right\|_{2} \le \widetilde{I}_{ij,t} + \tilde{V}_{ij,t} \;\;\;\;\forall t,\forall ij \in E$$
(18)

3.4 Model of uncertainty

The uncertainty set U of the distributionally robust optimization method is the same as the definition of the traditional robust optimization. The fluctuation range of each random variable \(\tilde{\varvec{v}}\) is located in a polyhedral uncertainty set U shown as follows.

$$\left\{ \begin{aligned} &U = \left\{ \tilde{\varvec{v}}: \begin{aligned} &\tilde{\varvec{v}} = (\tilde{v}_{r,t}^{\text{PV}} ,\tilde{v}_{r,t}^{\text{WT}} ,\tilde{v}_{r,t}^{\text{L}} )\in R^{|S|} \quad t \in T \hfill \\ &V_{r,t}^{\hbox{min} } \le \tilde{v}_{r,t} \le V_{r,t}^{\hbox{max} } \hfill \\ \end{aligned} \right\} \\ &\tilde{P}_{r,t} = \bar{P}_{r,t} + \tilde{v}_{r,t} \end{aligned} \right.$$
(19)

where \(\tilde{P}_{r,t}\), \(\bar{P}_{r,t}\), \(\tilde{v}_{r,t}\) are the actually value, predicted value and random variable, respectively; \(V_{r,t}^{\hbox{min} }\) and \(V_{r,t}^{\hbox{max} }\) are the lower and upper bounds of each random variable \(\tilde{\varvec {v}}\), respectively; R is a set of all nodes of PV, WT units, and volatility loads; S denotes all possible sets of scenarios; and T is a set of all time periods within the operating range.

Unlike the traditional robust optimization (RO) method, the distributionally robust optimization (DRO) method introduces some probability distribution information (such as covariance and moment [26,27,28]), which is easily obtained. Nonlinear functions are usually used to characterize the probability distribution information of the random variable \(\tilde{v}\). However, due to the large scale and high cost of the nonlinear function calculation [29], we consider using piecewise linear functions \(g_{k}\), defined as:

$$\left\{ \begin{aligned} &E_{P} \{ g_{k} (\tilde{\varvec{v}})\} \,=\, E_{P} \{ g_{j,r,t} (\tilde{\varvec{v}})\} { = }E_{P} (\hbox{max} \{ \tilde{v}_{r,t} - C_{j,r,t} ,0\} ) \le \sigma_{j,r,t} \hfill \\ &K = JRT \quad \forall r \in R,\forall j \in J,\forall t \in T \, \hfill \\ \end{aligned} \right.$$
(20)

where the probability distribution information of each random variable \(\tilde{v}\) is divided into j segments; J is a set of segmentation marks that describe all probability distributions; K is a set of functions describing the distribution of random variables; Cj,r,t is the cut-off constant of each segment, and the expected value of the positive part of vr,t-Cj,r,t is limited to below the constant \(\sigma_{j,r,t}\). The value of \(\mathop {\hbox{max} }\limits_{{\tilde{v} \in U}} g_{k} (\tilde{\varvec{v}})\) under the worst-case distribution with the corresponding random variable v for each segment j can be obtained by (18), which is a constant value.

F represents a set of distributions that contain the distribution P determined by the uncertainty variable v. Various forms of distribution sets have been proposed. Bental and Nemirovski [30] used the set P as a set containing a single-point distribution on the ambiguity set. Shapiro [31] considered the inclusion of a distribution set that satisfies the given support constraints. Scarf [32] and Prekopa [33] used some linear constraints on the properties of random variables that are satisfied when constructing P.

In this paper, the following standardized framework has been used. It is computationally attractive and may be able to characterize the dispersion of \(\tilde{\varvec{v}}\). The location of \(\tilde{\varvec{v}}\) can be specified using the affine expectation constraint, whereas the dispersion can be characterized by bounding the expectation of convex functions over \(\tilde{\varvec{v}}\).

Using the piecewise linear function to integrate the probability distribution information P of each random variable \(\tilde{\varvec{v}}\) into the ambiguity set F, the standardized framework of the ambiguity set F that represents the distribution family is as follows:

$$F = \left\{ P \in \rho_{0} (R^{S} ):\begin{aligned} \, \tilde{\varvec{v}} \in R^{S} \hfill \\ \, E_{P} \{ \tilde{\varvec{v}}\} = 0 \hfill \\ E_{P} \{ g_{k} (\tilde{\varvec{v}})\} \le \sigma_{k} \hfill \\ \, P\{ \tilde{\varvec{v}} \in U\} = 1 \hfill \\ \end{aligned} \right\}$$
(21)

where \(\rho_{0} (R^{S} )\) is the set of all probability distributions on RS; \(E_{P} \{ \tilde{\varvec{v}}\} = 0\) denotes the expected value of the random variable \(\tilde{v}\) is zero, and the function is used to characterize the probability distribution information of the random variable v. \(\, P\{ \tilde{\varvec{v}} \in U\} = 1\) indicates that all random variables v are in uncertainty set U.

Because the expected value \(E_{P} \{ g_{k} (\tilde{\varvec{v}})\}\) is difficult to estimate directly in the ambiguity set F. In this section, we introduce an auxiliary variable \(\tilde{\varvec{u}}\) (\(\tilde{\varvec{u}} \in R^{K}\)) to limit the upper bound of each piecewise function \(g_{k}\), and \(\tilde{\varvec{u}}\) is limited by the worst-case \(\mathop {\hbox{max} }\limits_{{\tilde{\varvec{v}} \in U}} g_{k} (\tilde{\varvec{v}})\). Then, the original uncertainty set \(U\) is expanded to set \(\bar{U}\) as follows.

$$\bar{U} = \left\{ \begin{aligned} &(\tilde{\varvec{v}},\tilde{\varvec{u}}) \in R^{S} \times R^{K} \hfill \\ &\varvec{v} \in U \hfill \\ &g_{k} (\varvec{v}) \le u_{k} \hfill \\ &u_{k} \le \mathop {\hbox{max} }\limits_{v \in U} g_{k} (\varvec{v}) \hfill \\ \end{aligned} \right\}$$
(22)

From (22) above, we can see that the original uncertainty set U is a set of second-order cone polyhedrons in which a piecewise linear function \(g_{k}\) is introduced. Therefore, the extended uncertainty set \(\bar{U}\) can be represented by a series of linear constraints, and its matrix form is as follows:

$$\bar{U} = \{ (\varvec{v},\varvec{u}) \in R^{S} \times R^{K} :\varvec{Cv} + \varvec{Du} \le \varvec{d}\}$$
(23)

At the same time, the original ambiguity set F is expanded to G, where Q represents the joint probability distribution of the random variable v and auxiliary variable u. When the third constraint of the extended set G of (24) is satisfied, the third constraint of the original ambiguity set F corresponding to (19) also holds.

$$G = \left\{ Q \in \rho_{0} (R^{S} \times R^{I} ): \begin{aligned} { (}\tilde{\varvec{v}},\tilde{\varvec{u}}) \in R^{S} \times R^{K} \hfill \\ \, E_{Q} \{ \tilde{\varvec{v}}\} = 0 \hfill \\ E_{Q} \{ \tilde{u}_{k} \} \le \sigma_{k} \hfill \\ \, Q\{ (\tilde{\varvec{v}},\tilde{\varvec{u}}) \in \bar{U}\} = 1 \hfill \\ \end{aligned} \right\}$$
(24)

This section constructs an ambiguity set F that incorporates the probability distribution information of random variables v. The auxiliary variable u is introduced into the original ambiguity set F, which imposes a tighter upper bound on the piecewise linear function \(g_{k}\); this not only increases the flexibility of the linear decision rules, but also helps to reduce the conservative degree of the optimization solution.

3.5 Model transformation using generalized linear decision rule

Based on the model of two-stage distributionally robust optimization (1), we can find that the solution is an NP-hard problem and the current solutions mainly use the linear decision rule methods [24, 34]. In this section, a generalized linear decision rule is used to approximate the model and transform it into a tractable robust counterpart. Compared with the traditional linear decision rule, it can approximate the two-stage robust optimization problem well and strictly obtain a slightly conservative and tractable solution.

Replacing the affine function x2(v) based on random variables v in the traditional linear decision rules. In this section, using the extended ambiguity G and the generalized linear decision rule, we propose an affine function x2(v, u) to encompass the random variables v and auxiliary variables u as well. An upper bound of the affine functions is as follows:

$$x_{2n} (\varvec{v}) = x_{2n}^{0} + \sum\limits_{{s \in S_{n} }} {x_{2ns}^{v} v_{s} } \quad \forall n \in N^{2}$$
(25)
$$x_{2n} (\varvec{v},\varvec{u}) = x_{2n}^{0} + \sum\limits_{{s \in S_{n} }} {x_{2ns}^{v} v_{s} + } \sum\limits_{{k \in K_{n} }} {x_{2nk}^{u} u_{k} } \quad \forall n \in N^{2}$$
(26)

where \(S_{n}\) is a subset of random variables \(\tilde{\varvec{v}}\); and \(K_{n}\) is a subset of auxiliary variables \(\tilde{\varvec{u}}\).

We can observe any function, \(\varvec{x}_{ 2} \in {\mathbf{R}}^{{M_{1} \times N_{2} }}\) satisfying (5) would be an upper bound of \(Q(\varvec{x})\), which is equivalent to the following function:

$$\varvec{x}_{ 2} (\varvec{v}) \in \arg \hbox{min} \left\{ {\varvec{d}^{\text{T}} \varvec{x}_{ 2} :\varvec{Tx}_{ 1} + \varvec{Wx}_{ 2} \le \left. {\varvec{h}(\varvec{v})} \right\}} \right.$$
(27)

Equation (4) shows the set of all decision variables x2 and random variable v when the total expected cost of the second stage is the minimum value.

For the new affine function x2(v, u), the original nonlinear model (1) is equivalent, and the approximate transformation of Q(x) is obtained as follows:

$$\bar{Q}(\varvec{x}) = \hbox{min} \mathop {\hbox{max} }\limits_{Q \in G} E_{Q} \{ \varvec{d}^{\text{T}} \varvec{x}_{ 2} (\varvec{v},\varvec{u})\}$$
(28)

s.t.

$$\varvec{Tx}_{ 1} + \varvec{Wx}_{ 2} (\varvec{v},\varvec{u}) \le \varvec{h}(\varvec{v}) \quad \forall (\varvec{v},\varvec{u}) \in \bar{U}$$
(29)

We use the principle of Lagrange duality to convert the min–max problem of (30) into a min-problem [26, 35].

The dual variables \(\varvec{\pi}_{ 0}\) and \(\varvec{\pi}_{m}\) are introduced to convert the uncertainty constraints (3) and (5) into their robust counterparts. After conversion, (28) is equivalent to the following robust counterpart problem, as shown in (30)–(32).

$$\left\{ \begin{aligned} &\hbox{min} \;r +\varvec{\sigma}^{\text{T}}\varvec{\lambda}\hfill \\ &\varvec{\lambda}\ge 0 \, \hfill \\ &r + \varvec{v}^{\text{T}}\varvec{\rho}+ \varvec{u}^{\text{T}}\varvec{\lambda}\ge \varvec{d}^{\text{T}} \varvec{x}_{ 2} (\varvec{v},\varvec{u})\quad\forall (\varvec{v},\varvec{u}) \in \bar{U} \hfill \\ \end{aligned} \right.$$
(30)
$$\left\{ \begin{aligned} &\varvec{\pi}_{ 0} \ge 0 \hfill \\ &r \ge \varvec{d}^{\text{T}} \varvec{x}_{ 2}^{0} -\varvec{\pi}_{ 0}^{\text{T}} \varvec{q} \hfill \\ &\varvec{\pi}_{ 0}^{\text{T}} \varvec{C}_{s} = \sum\limits_{{n \in N_{s}^{v} }} {d_{n} x_{2n,s}^{v} } - \rho_{s} \quad \forall s \in S \hfill \\ &\varvec{\pi}_{ 0}^{\text{T}} \varvec{D}_{k} = \sum\limits_{{n \in N_{i}^{u} }} {d_{n} x_{2n,k}^{u} } - \lambda_{k} \quad \forall k \in K \hfill \\ \end{aligned} \right.$$
(31)
$$\left\{ \begin{aligned} &\varvec{\pi}_{m} \ge 0\quad\forall m \in M^{2} \hfill \\ &\varvec{T}_{m}^{\text{T}} \varvec{x}_{ 1} + \varvec{W}_{m}^{\text{T}} \varvec{x}_{ 2}^{0} - h_{m}^{0} + \varvec{q}^{\text{T}}\varvec{\pi}_{m} \le 0 \hfill \\ &\varvec{\pi}_{m}^{\text{T}} \varvec{C}_{s} = \sum\limits_{{n \in N_{s}^{v} }} {W_{mn} x_{2n,s}^{v} } - h_{ms}^{v} \quad \forall m \in M^{2} \hfill \\ &\varvec{\pi}_{m}^{\text{T}} \varvec{D}_{k} = \sum\limits_{{n \in N_{k}^{u} }} {W_{mn} x_{2n,k}^{u} } \quad \forall k \in K, \, \forall m \in M^{2} \hfill \\ \end{aligned} \right.$$
(32)

where Cs is the sth column of matrix C; Dk is the \(k\)th column of matrix D; \(\varvec{T}_{m}^{\text{T}}\) and \(\varvec{W}_{m}^{\text{T}}\) are the mth row of matrix T and W, respectively; hms is the mth element of vector hs; and \(N_{s}^{v}\) and \(N_{k}^{u}\) are the sets of decision variables that depend on random variable v and auxiliary variable u in the second stage, respectively.

As the generalized linear decision rule incorporates random variables and auxiliary variables, the quality of the bound improves, albeit at the expense of increased model size [36]. In the section, we can obtain the cost of the second stage and combine the two stages to obtain the optimal decision. The whole process of the optimization model solution is shown in Fig. 1.

Fig. 1
figure 1

Model solution of two-stage distributionally robust optimization problem

4 Case studies

This paper uses the improved IEEE 33-bus and IEEE 118-bus systems to verify the feasibility and effectiveness of the proposed model and method. Figure 2 shows the prediction of total load, PV power and WT power. In Appendix A, the extended configuration in the system is shown in Table A1; Tables A2, A3 and A4 show the parameter settings of each unit; and the average outputs of load, PV, and WT with their maximum and minimum fluctuation range curves of ADN are shown in Appendix A Figs. A1, A2 and A3. We set the penalty cost coefficients for discarded light, abandoned wind power, and lost load as 0.8 $/kWh, 0.6 $/kWh, and 0.5 $/kWh, respectively, and set the network loss cost factor as 0.3 $/kWh. The transaction between the ADN and power grid use the electrical mechanism, for which the specific data is shown in Table 1 below.

Fig. 2
figure 2

Prediction of total load, PV power and WT power

Table 1 Time parameters of electricity price

All algorithm programs are modeled in the MATLAB R2014b-YALMIP environment and solved using the commercial solver CPLEX12.6.0. Simulation environment is a computer of Intel (R) Core (TM) i5-3337U CPU @ 1.80 GHz, 8 GB, operating system Win7 64-bit.

4.1 Related settings of model

The piecewise linear function is set according to the DRO method mentioned above. The corresponding piecewise linear function is used to characterize the prediction error v of PV, WT, and fluctuating load outputs, and is divided into three sections. The cut-off value of each section is set to 0, \(V_{r,t}^{\hbox{min} } /3\), and \(2V_{r,t}^{\hbox{min} } /3\). The constant value \(V_{r,t}\) is the lower bound of random variable v. The constant \(\sigma_{j,r,t}\) represents the distribution information of the uncertain random variables; we assume that the prediction error v complies with the β distribution based on historical data.

$$\begin{aligned} g_{k} (\tilde{\varvec{v}}) &= g_{j,r,t} (\tilde{\varvec{v}}) \hfill \\ &=\hbox{max} \{ \tilde{v}_{r,t} - C_{j,r,t} ,0\} ) \le \sigma_{j,r,t} \hfill \\ \end{aligned}$$
(33)
$$\begin{aligned} E_{P} \{ g_{k} (\tilde{\varvec{v}})\} = E_{P} \{ g_{j,r,t} (\tilde{\varvec{v}})\} \le \sigma_{j,r,t} \hfill \\ \, \forall j = 1,2,3,\;\forall r \in R, \, \forall t \in T \hfill \\ \end{aligned}$$
(34)

The uncertainty fluctuation range α of the uncertain variable is changed from 0.1 to 0.5. For the traditional RO method, we weigh the accuracy and calculation time using the Monte Carlo simulation test, and set the number of samples to 5000 times and the confidence level to 0.95. The Benders decomposition method is used to solve the problem.

4.2 Analysis of effects of different optimization models on operation results

In this section, D-model represents the deterministic model, SP-model represents the stochastic programming model, RO-model represents the robust optimization model, and DRO-model represents the proposed distributionally robust optimization model.

The cost of each item in the Table 2 is the maximum cost of each one in all scenarios. The maximum value of the total cost is not the sum of the individual costs in the worst-case scenarios. By analyzing the data in Table 2, the following conclusions can be drawn:

Table 2 Comparison of expected costs of the worst-case scenario with different α

1) In combination with the maximum fluctuation scenarios of PV, WT, and load shown in Fig. 1 and Figs. A1, A2 and A3 in Appendix A, it is found that under the RO-model, there are also some cases that abandon PV, WT, and loss of load, indicating that the RO-model and DRO-model can obtain a better degree of conservation in scheduling.

2) The D-model does not consider the fluctuation characteristics of PV, WT, and load outputs. Therefore, the optimization results in the worst-case scenario are inferior to the DRO methods and traditional RO methods.

3) RO can reduce the purchase of power from grid, and can further reduce the line loss cost. More schedulable DERs inputs can play a certain role in smoothing the equivalent load and provide a better reactive support effect.

4) In comparison of the two-stage cost of each model with increasing fluctuation range α, it is found that the operating cost of every model is continuously increasing. Based on comparing the total cost of each model, the DRO-model proposed in this paper can not only reduce the amount of abandoned PV, WT, and loss of load, but also reduce the expected cost to improve the economy.

4.3 Analysis of results of optimization methods under different fluctuation ranges α

We compare the expected total cost of the four models with different α.

1) DRO-model and D-model

Because the DRO-model considers the randomness of extreme scenes, it can effectively reduce the operating cost of the ADN, as seen from Table 3: with the increase in the fluctuation range α, the operating cost of the D-model is greater than that of the DRO-model; in terms of security, the DRO-model can guarantee the security of system operation better than the D-model; and the DRO-model considers the penalty cost of abandoning PV, abandoning WT, and losing load, which is superior to the D-model in economy.

Table 3 Comparison of average expected costs of different optimization models with different α

2) DRO-model and SP-model

The operating cost of SP-model is less than the DRO-model, but it is assumed that the probability distribution parameter estimation of uncertain variables (PV and WT predicted output) is inaccurate, and the scheduling result obtained by the SP-model will cause the security condition constraint to be unsatisfied, resulting in serious consequences. Therefore, for the possible extreme bad situation, the dispatcher adopts the DRO method, although it increases the operating cost of the system, as it can ensure the security of the system.

3) DRO-model and RO-model

By the expected cost of DRO-model and RO-model in Table 3 and Fig. 3, the operating cost of the DRO-model is always less than that of the RO-model as the fluctuation range α increases. When the minimum fluctuation range α = 0.1, the DRO-model is the most cost effective and the most meaningful.

Fig. 3
figure 3

Comparison of DRO method with RO method

There are two reasons for this. The first is that the scheduling decision from the RO method is based on some worst-case uncertainties, so it cannot effectively simulate changes in the distribution information, making the method too conservative and giving up too much optimality to ensure its robustness. The second reason is that the ambiguity set constructed in the DRO method is able to capture the distribution information of PV, WT, and load output changes. As the power variation of the uncertain variable decreases, it is less conservative based on the distribution information contained in the ambiguity set. Thus, the total solution cost of ADN scheduling decisions is reduced. Because of its conservatism, the RO-model ensures the security of the system, but reduces the economy of operation. Therefore, from the economic perspective, the DRO-model is superior to the RO-model. Therefore, the distributionally robust optimization dispatch method is superior to the other three methods in the study of renewable energy consumption in ADNs.

Table 4 shows the optimization time comparison of IEEE 33-bus and IEEE 118-bus systems in different fluctuation ranges. It can be seen that the calculation time of the DRO method is slightly higher than that of the RO method because the DRO method can model the change in uncertainty distributions to generate less conservative solutions that increase the solution scale. In the same worst-case scenario, the calculation time of IEEE 118-bus system is always higher than that of the IEEE 33-bus system. In the same system and model, with increasing fluctuation range α, the cost grows continuously.

Table 4 Comparison of computation time of algorithm

5 Conclusion

In this paper, a two-stage distributionally robust optimization model is established to minimize the total expected cost of the ADN under the worst-case distribution of the uncertainty variables. Because a part of the distribution information of the uncertainty variables is reasonably included in the distributionally robust optimization model, the conservativeness of the obtained solution is reduced compared with the traditional robust optimization model. The generalized linear decision rule is used to approximate the second-stage optimization problem, and the original model is transformed into a tractable mixed-integer linear model. The two-stage optimization model is a better and stricter approach that improves the performance of the solution.

The linear decision rules currently studied are only conservative approximations to the second-stage optimal decision making. In addition to using the piecewise linear functions introduced in an ambiguity set to better approximate the original two-stage problem, it is also possible to apply the proposed ambiguity set and reformulation techniques to evaluate scheduling risk [37]. Furthermore, it is also possible to study how different linear decision models can be used to approximate the two-stage optimization model and improve them.