1 Introduction

A key to the stable operation of future electricity grids is efficient demand response management (DRM). With the increasing share of renewables in the energy mix, production is becoming less and less controllable. At the same time, electricity consumption is becoming more controllable due to new types of loads and storage (e.g., electric vehicles, home-level or small business energy management solutions) and various intelligent appliances at end consumers. As a result, a gradual shift from the traditional “supply follows demand” paradigm to a new “demand follows supply” approach can be observed. The critical success factor for efficient DRM is an appropriate electricity tariff that motivates consumers to schedule their loads and manage their batteries in a way that contributes to grid stability.

This paper studies the problem of optimizing the electricity tariff offered by an electricity retailer to its customers in a game-theoretic setting. A bilevel programming approach is introduced, where the retailer is the leader and groups of end consumers act as multiple independent followers. The customers are modelled as “prosumers”, i.e., simultaneous producers and consumers of electricity, who look for the best tradeoff between maximizing their utility and minimizing their cost of electricity. An effective and computationally efficient solution method is proposed. The bilevel program is first transformed into an equivalent single-level optimization problem using a primal-dual reformulation, and then solved using a successive linear programming (SLP) algorithm.

After reviewing the related literature in Section 2, a formal definition of the tariff optimization problem is given in Section 3. The proposed approach is presented in detail in Section 4. The approach is illustrated on a small-scale example and evaluated in thorough computational experiments in Section 5. Finally, conclusions are drawn and possible directions for future research are suggested.

2 Literature review

2.1 Game-theoretic models for DRM

Game-theoretic models for DRM have received significant attention recently [1]. A fundamental classification of these models differentiates between “real-time” and “day-ahead” approaches. Real-time pricing (RTP) models often focus on the present time instant only, and ignore the interdependence between the present energy tariff and past or future consumption. Accordingly, these models concentrate on load curtailment, but fail to capture deferrable loads appropriately. This limitation can be lifted by applying multi-period models. Still, most of the earlier contributions focus on the RTP scenario with a single time instant: a multi-leader, multi-follower Stackelberg game is defined for DRM among independent electricity providers and consumers in [2]. A closed-form analytical solution is derived, which can be computed by a distributed algorithm. The management of consumer-to-grid systems is modelled as a Stackelberg game in [3], with a central power station acting as the leader, and consumers as multiple followers. Embedded into the Stackelberg game, the consumers play a generalized Nash game to establish their equilibrium strategies, and hence, to determine their response to the energy prices offered by the power station. A similar approach is applied to electric vehicle charging in [4]. Reference [5] investigates DRM on three levels of hierarchy (the grid operator, multiple service providers, and the consumers) with RTP, and proposes a two-loop Stackelberg game model. The existence of a unique Stackelberg equilibrium is proven by exploiting the strictly convex sub-problems of the individual players, and an iterative distributed algorithm is proposed for reaching it. A Stackelberg approach is investigated for DRM under load uncertainty in [6]. Again, an analytical solution was derived.

Reference [7] studies a Stackelberg game for RTP over multiple time periods with a profit-maximizing retailer and a single end consumer who looks for the best tradeoff between electricity cost and comfort in the heat management of a building. The problem is formulated as a bilevel program, and then converted to and solved as a single-level mixed-integer linear program (MILP) using the Karush–Kuhn–Tucker (KKT) conditions. The same paper shows that while RTP is highly efficient for load shifting, it can cause excessive and unpredictable payments for small consumers. Therefore, more predictable day-ahead pricing schemes are an attractive approach for households.

Despite this, the literature on “day-ahead” tariff optimization models is significantly scarcer. In [8], a Stackelberg model is proposed for energy pricing and dispatch in a multi-period day-ahead setting in two coupled stages. The first stage addresses price setting subject to demand response from consumers who minimize their energy cost and maximize their utility by scheduling their controllable loads. In the second stage, the retailer establishes the operation strategy for its storage unit and its energy contracts by solving a robust optimization problem considering uncertain market prices. The authors transform this problem into a single-level MILP by exploiting the KKT conditions and duality theory. In [9], a Stackelberg game is formulated and solved using an iterative heuristic approach. Two different games related to demand side management are studied in [10]: a Nash game between consumers equipped with batteries and a Stackelberg game between the utility provider and the consumers. A bilevel programming approach to the operation scheduling of a distribution network, with a cost-minimizing network operator as the leader and multiple profit-maximizing microgrids as followers, is considered in [11]. Again, KKT reformulation is applied to arrive at a single-level problem. A sophisticated Stackelberg game model is presented in [12] to capture the interplay of a retailer (leader) and various types of distributed energy resources (followers), including generators as well as consumers with different types of controllable load. Again, the problem is converted to a single-level MILP using the KKT reformulation. A similar problem, with power flow constraints and a retailer who also oversees the operation of distributed generators and batteries, is studied in [13].

2.2 Related problems in energy management and DRM

The consumers’ (followers’) problem in the above Stackelberg games corresponds to an energy management problem of minimizing cost and maximizing utility. Linear programming (LP) models limited to active power flow equations are commonly used in the literature for solving this problem [14]. More sophisticated, non-linear models can also capture reactive power and voltage magnitudes [15, 16], or describe the behavior of the energy system components (batteries, or heating, ventilation, and air conditioning in buildings) more realistically [17].

Obviously, Stackelberg games and bilevel programming are not the only possible approaches to DRM problems. Alternative methods are often based on statistical models of the grid-level load response to the variation of energy prices [18]. An iterative solution procedure that alternates between the optimization problems of the consumers (minimizing the cost) and the grid (maximizing the load factor) is presented in [19] for smart building-to-grid systems, using a sophisticated thermal model of the buildings. In [20], the problem of dynamic pricing for DRM is formulated as a Markov decision process, and reinforcement learning is used to solve it. In [21], demand response is modeled by directly quantifying the delay-tolerant demand and its dependence on price by linear, potential, exponential and logarithmic load functions.

Game-theoretic approaches to different but related problems in energy management include [22], where a Stackelberg approach is proposed for achieving a fair curtailment of renewable energy generation. A Stackelberg game model is investigated in [23] with a central production unit (leader) that sets the electricity price to maximize its profit subject to the response from an electricity service provider (follower), which accepts load curtailment and distributed generation bids from various microgrids in view of the central producer’s price. A supply-demand game is investigated in a smart grid setting in [24], with generators and loads acting as multiple followers, and a data center server as the virtual leader; a deep transfer Q-learning algorithm is applied for finding the equilibrium. The optimal operation of multi-carrier energy systems is modelled as a bilevel optimization problem in [25]. The upper-level problem of minimizing the total energy cost and the lower-level problem of minimizing the operation and dissatisfaction costs are solved through an iterative procedure.

2.3 Mathematical methodology

An introduction to the bilevel programming approach, including basic modelling and solution techniques, is given in [26, 27]. Approaches to transforming bilevel optimization problems into equivalent single-level models, including the optimal value and the KKT reformulations, are studied in [28]. A recent survey on bilevel programming for price setting problems is given in [29].

SLP has been applied in the smart grid community, e.g., to planning generators’ investments and transmission network extensions [30], or to tackling non-linear phenomena in variants of the optimal power flow problem [31]. However, to the best of our knowledge, this paper is the first to apply SLP to tariff optimization for demand response.

2.4 Positioning of current contribution

This section surveys the literature on game-theoretic models of electricity tariff optimization for DRM and related problems. The algorithmic techniques applied to Stackelberg tariff optimization problems are summarized in Table 1. The survey shows that although some simpler formulations, all focusing on a single time period, admit a solution that can be computed analytically in closed form, multi-period problems are computationally more challenging. This observation is also supported by a formal proof in [32], which states that multi-period models for DRM with controllable loads at the consumers are NP-hard. Accordingly, the vast majority of earlier contributions apply the KKT reformulation to arrive at a single-level MILP that can be solved using commercial solvers. However, this comes at the price of considerable computational effort, and a number of papers mention that the solution approach is applicable mostly to small-scale problems [13]. Other approaches use customized heuristics for solving the problem.

Table 1 Algorithmic techniques applied for solving Stackelberg tariff optimization models in the literature

The contribution of this paper is twofold. On the one hand, it defines a generic game-theoretic model for DRM that slightly extends the models discussed above (e.g., it captures both distributed battery storage and controllable loads characterized by a given utility function at the consumers), and formally proves its key properties (e.g., sufficient conditions for feasibility, computational complexity). On the other hand, it proposes an efficient solution approach based on well-established mathematical programming techniques, which first exploits duality for the followers’ model to convert the bilevel problem into a single-level quadratically constrained quadratic program (QCQP), and subsequently applies an SLP approach to solve it. Computational experiments show that the proposed approach outperforms earlier KKT-based methods regarding both solution quality and computational effort for practically relevant problem sizes.

This paper is a substantially extended version of the earlier conference paper [33]. In addition to a refined model that captures the profit-maximizing behavior of the retailer, the extensions are related to the core contributions of the present paper, i.e., the formal proofs of the fundamental characteristics of the model and the thorough computational experimentation for assessing solution quality and computational efficiency.

3 Problem definition

3.1 System architecture

This paper investigates DRM as an interaction between an electricity retailer and various prosumers, i.e., clients who can both produce and consume electricity, in a smart grid. In order to ensure the tractability of the problem over a large population, prosumers are classified into prosumer groups (PGs), where each PG consists of prosumers with similar electricity consumption and production profiles as well as storage capabilities. The system architecture is displayed in Fig. 1.

Fig. 1 System architecture with a retailer and multiple PGs

PGs are characterized by their uncontrollable production and consumption, their controllable load requirements, and their storage capabilities. The uncontrollable production \(C^+_{i,t}\) and consumption \(C^-_{i,t}\) of PG\(_i\) are fixed and given as input for each time period \(t=1, 2, ..., T\). In addition, PG\(_i\) needs to schedule a (potentially zero) total controllable load of \(M_i\) over the time horizon, of which at most \(\bar{L}_{i,t}\) can be scheduled in period t. Time windows can be defined for the controllable load by setting \(\bar{L}_{i,t} = 0\) for the appropriate periods t. The preferences of PG\(_i\) on the timing of the controllable load are encoded in utility values \(U_{i,t}\), where \(U_{i,t}\) captures the utility of scheduling a unit of controllable load in period t. Hence, if PG\(_i\) schedules a controllable load of \(L_{i,t}\) in each period t, this yields a utility of \(\sum \limits _{t=1}^T U_{i,t} L_{i,t}\) for the PG. Similar models of controllable load are frequently used in the literature [8].

PGs can further optimize their energy management by appropriately charging and discharging their battery storage. The battery is characterized by its capacity \(\overline{B}_i\), the maximum charge and discharge rates \(R^+_i\) and \(R^-_i\), the initial battery state-of-charge (SoC) \(b_{i,0}\), and its cycle efficiency \(\eta _i\). In order to safeguard against unexpected power outages, the prosumer wishes to retain a given, time-varying minimum SoC \(\underline{B}_{i,t}\) in the battery.
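To make the battery parameters concrete, the sketch below rolls out the SoC recursion that the model later uses in constraint (7), \(b_{i,t} = b_{i,t-1} + \eta_i r^+_{i,t} - r^-_{i,t}\), and checks the SoC window and the rate limits for a single PG. The function `simulate_soc` and its argument names are hypothetical; it is a plain-Python illustration, not part of the authors' implementation.

```python
def simulate_soc(b0, charge, discharge, eta, b_cap, b_min,
                 r_max_in, r_max_out, tol=1e-9):
    """Roll out b_t = b_{t-1} + eta * r+_t - r-_t for one PG.

    charge[t], discharge[t]: r+ and r- in period t + 1.
    b_min[t]: time-varying minimum SoC; b_cap: battery capacity.
    Raises ValueError on the first violated rate or SoC bound and
    otherwise returns the SoC trajectory b_1, ..., b_T.
    """
    soc = []
    b = b0
    for t, (r_in, r_out) in enumerate(zip(charge, discharge)):
        # Rate limits R+ and R- (and non-negativity of the rates).
        if not (0 <= r_in <= r_max_in + tol and 0 <= r_out <= r_max_out + tol):
            raise ValueError(f"rate limit violated in period {t + 1}")
        # Charged energy is scaled by the cycle efficiency eta.
        b = b + eta * r_in - r_out
        # SoC window: minimum SoC for outage safety, capacity on top.
        if not (b_min[t] - tol <= b <= b_cap + tol):
            raise ValueError(f"SoC bound violated in period {t + 1}")
        soc.append(b)
    return soc
```

For example, starting from \(b_{i,0} = 2\) with \(\eta_i = 0.9\), charging one unit in the first period and then discharging 0.5 and 1 unit yields the trajectory 2.9, 2.4, 1.4.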

Each individual PG schedules its controllable load \(L_{i,t}\) and determines its battery SoC \(b_{i,t}\) over time to optimize its own objective, composed of maximizing the utility and minimizing the electricity cost with regard to the energy tariff set by the retailer. This PG model is generic enough to capture the behavior of diverse types of prosumers, ranging from households or offices with uncontrollable consumption only (and therefore, unresponsive to the energy tariff), via prosumers equipped with renewable energy generation and/or storage devices, owners of electric vehicles, to complex microgrid systems. It should be noted that various alternative approaches for modeling prosumer behavior have been subjected to extensive research recently. Questions of special interest include addressing individual prosumers or organizing them into PGs as well as using deterministic or probabilistic models. A richer, probabilistic approach to characterize the responsiveness of prosumers to the variation of the electricity tariff is presented in [34], together with a review of the recent literature on the benefits and drawbacks of different approaches.

The retailer employs the same time-of-use electricity tariff for all prosumers. The tariff is specified in the form of day-ahead electricity purchase prices \(Q^+_t\) and feed-in prices \(Q^-_t\) offered to PGs for periods \(t = 1, 2, ..., T\). It is assumed that the tariff is regulated by an a priori agreement between the retailer and the prosumers, which bounds the electricity prices in the form of \(0 < \underline{Q} \le Q^-_t \le Q^+_t \le \overline{Q}\) and \(\frac{1}{T} \sum \limits _{t=1}^T Q^+_t \le \tilde{Q}\), where \(\underline{Q}\), \(\overline{Q}\), and \(\tilde{Q}\) denote the minimum, maximum, and maximum average electricity prices for prosumers, respectively. Such an agreement is necessary to prevent the profit-maximizing retailer from increasing purchase prices without limit [7].
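The a priori tariff agreement amounts to a few simple checks on a candidate tariff. The following is a minimal sketch of such a check; the function name `tariff_is_admissible` and the numerical tolerance are our own illustrative choices.

```python
from typing import Sequence


def tariff_is_admissible(q_buy: Sequence[float], q_feed: Sequence[float],
                         q_min: float, q_max: float, q_avg_max: float,
                         tol: float = 1e-9) -> bool:
    """Check the tariff agreement for purchase prices Q+_t and feed-in prices Q-_t.

    Requires 0 < q_min <= Q-_t <= Q+_t <= q_max for every period t and
    (1/T) * sum_t Q+_t <= q_avg_max.
    """
    if len(q_buy) != len(q_feed) or len(q_buy) == 0:
        raise ValueError("q_buy and q_feed must be non-empty and equally long")
    if q_min <= 0:
        return False
    for buy, feed in zip(q_buy, q_feed):
        # Feed-in price between the minimum and the purchase price,
        # purchase price below the maximum.
        if not (q_min - tol <= feed <= buy + tol and buy <= q_max + tol):
            return False
    # Cap on the average purchase price over the horizon.
    return sum(q_buy) / len(q_buy) <= q_avg_max + tol
```

A tariff that offers a feed-in price above the purchase price in some period, or whose average purchase price exceeds \(\tilde{Q}\), is rejected.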

The focus of this paper is on the problem faced by the retailer, who has to cover the (potentially negative) net consumption of the ensemble of all prosumers from the electricity purchased or sold on the wholesale market. This paper assumes a time-variant dual pricing scheme on the wholesale market, given in the form of purchase prices \(P^+_t\) and feed-in prices \(P^-_t\). It is noted that the same model can be naturally applied to markets with uniform pricing (purchase prices equal to selling prices) by letting \(P^+_t = P^-_t\). It is assumed that the retailer appears as a price-taker on the market, without any ability to influence the market prices.

By offering an appropriate electricity tariff to its prosumers, the retailer can initiate a demand response program that motivates the prosumers to purchase electricity in valley periods, when an ample amount of cheap energy is available on the market, and to sell their surplus energy in peak periods. In this way, the retailer can contribute to grid stability and maximize its profit at the same time. This paper addresses the maximization of the retailer’s profit.

3.2 Stackelberg game model and its basic characteristics

The following communication protocol is implemented among the stakeholders: the retailer first announces the day-ahead electricity tariff to all prosumers. The prosumers observe this tariff and optimize their consumption profile, i.e., the amount of electricity purchased from or fed into the grid over time. Then, the parties implement their actions as planned. It is assumed that the retailer knows the decision model and the parameters of the PGs. This leads to a “Stackelberg game” with the retailer as the “leader” and the PGs as “multiple followers”. The so-called “optimistic” assumption is adopted, i.e., if a follower has more than one optimal solution according to its own objective, then it chooses the optimal solution that is most favorable for the leader. The following additional assumptions are made, which guarantee the feasibility of the problem:

$$\begin{aligned}&\sum _{t=1}^T \bar{L}_{i,t} \ge M_i&\qquad&\forall \, i \end{aligned}\tag{1}$$
$$\begin{aligned}&\underline{B}_{i,t} \le \overline{B}_i&\forall \, i,t \end{aligned}\tag{2}$$
$$\begin{aligned}&b_{i,0} \le \overline{B}_i&\forall \, i \end{aligned}\tag{3}$$
$$\begin{aligned}&b_{i,0} + tR^+_i \ge \underline{B}_{i,t}&\forall \, i,t \end{aligned}\tag{4}$$

where T is the number of time periods.

These assumptions require that the bounds on the controllable load allow scheduling the required load over the horizon (1), that the bounds on the battery SoC are consistent (2), that the initial battery charge satisfies these bounds (3), and finally, that the charging rate of the battery allows satisfying the lower bounds on the SoC (4).
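Assumptions (1)–(4) can be verified per PG before solving the problem. The sketch below is a hypothetical helper (`check_assumptions` is not from the paper); following the verbal description of (4), it reads that assumption as \(b_{i,0} + tR^+_i \ge \underline{B}_{i,t}\), i.e., charging at the maximum rate from the start reaches the minimum SoC in time.

```python
def check_assumptions(l_max, m, b_min, b_cap, b0, r_max_in):
    """Report which of assumptions (1)-(4) hold for a single PG i.

    l_max[t] and b_min[t] are indexed t = 0..T-1 for periods 1..T.
    Returns a dict mapping the assumption number to a bool.
    """
    T = len(l_max)
    return {
        # (1) the per-period caps leave room for the total controllable load M_i
        1: sum(l_max) >= m,
        # (2) the minimum SoC never exceeds the battery capacity
        2: all(b_min[t] <= b_cap for t in range(T)),
        # (3) the initial SoC fits into the battery
        3: b0 <= b_cap,
        # (4) charging at rate R+ from period 1 reaches the minimum SoC in time
        4: all(b0 + (t + 1) * r_max_in >= b_min[t] for t in range(T)),
    }
```

A PG that fails one of these checks may still have a feasible problem, since Lemma 1 only states that the assumptions are sufficient.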

Lemma 1

(Existence of a solution) If assumptions (1)–(4) hold, then the followers’ problem is feasible for any electricity tariff \(Q^+_t\) and \(Q^-_t\) set by the leader.


Proof. Setting the battery SoC to the required minimum, i.e., \(b_{i,t} = \max \left( b_{i,t-1}, \underline{B}_{i,t} \right)\), and scheduling the controllable load as early as the bounds allow, i.e., \(L_{i,t} = \min \left( \bar{L}_{i,t}, M_i - \sum \limits _{u=1}^{t-1} L_{i,u} \right)\), results in a feasible solution for each follower i.
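The construction used in the proof of Lemma 1 can be written out directly for one PG, together with the resulting utility \(\sum_t U_{i,t} L_{i,t}\). The sketch below mirrors the two formulas of the proof step for step; the function names are hypothetical, and it does not re-check the rate limits on charging and discharging.

```python
def lemma1_construction(l_max, m, b0, b_min):
    """Build the schedule from the proof of Lemma 1 for one PG.

    Controllable load: L_t = min(l_max[t], M - sum of earlier L_u).
    Battery SoC:       b_t = max(b_{t-1}, b_min[t]).
    """
    load, scheduled = [], 0.0
    for cap in l_max:
        l_t = min(cap, m - scheduled)   # as early as the caps allow
        load.append(l_t)
        scheduled += l_t
    soc, b = [], b0
    for lower in b_min:
        b = max(b, lower)               # raise the SoC only when required
        soc.append(b)
    return load, soc


def utility(u, load):
    """Utility sum_t U_t * L_t of a controllable load schedule."""
    return sum(u_t * l_t for u_t, l_t in zip(u, load))
```

For instance, with caps \(\bar{L} = (2, 2, 2)\) and \(M_i = 3\), the load is scheduled as \((2, 1, 0)\).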

Lemma 2

(Independence of followers’ problems) The optimal demand response of an individual PG to a given energy tariff is independent of the response of other PGs.


Proof. The objectives of the individual PG, i.e., its energy cost and utility, depend solely on the energy tariff and the consumption profile of the given PG. Moreover, the feasibility of a consumption profile is also independent of other PGs’ responses, since the amount of electricity that the retailer can purchase or sell on the market to maintain the grid-level balance is unbounded.

It is emphasized that different PGs’ problems are still interconnected through the retailer’s problem, but for any fixed tariff set by the retailer, the PGs can optimize their behavior without considering the problems faced by fellow PGs. Hence, the problem can be modeled as a Stackelberg game with a single leader (the retailer) and multiple independent followers (the PGs). It is noted that when the optimal response of a follower is not unique, the response selected under the optimistic assumption (from the set of all optimal responses) can depend on other PGs’ responses.

Lemma 3

(Computational complexity) The above defined bilevel energy tariff optimization problem is NP-hard.


Proof. The simple multi-period energy tariff optimization problem (SMETOP) has been introduced as a minimal bilevel optimization model of energy tariff optimization for DRM, and it has been proven to be NP-hard in [32]. The problem investigated in this paper generalizes SMETOP: in addition to all features captured by SMETOP, it also handles batteries, uncontrollable energy production and consumption at the PGs, and bidirectional grid connections. Hence, the generalized problem investigated here is NP-hard, too.

4 Solution approach

4.1 Overview

This section presents a bilevel programming formulation of the above Stackelberg game model and proposes an efficient solution approach for it. First, the models of an individual follower and of the leader are formally defined. Then, the bilevel programming model obtained by combining the two parties’ problems is reformulated into a single-level QCQP, which is, in turn, solved using an SLP algorithm.

4.2 Prosumer groups’ (followers’) problem

The decision problem faced by an individual PG\(_i\) (the follower) is a parametric optimization problem, whose parameters encode the electricity tariff determined by the retailer (decision variables \(Q^+_t\) and \(Q^-_t\) controlled by the leader). The problem can be captured by the following LP, where the symbol \(\varphi _{i,t}^k\) in brackets to the right of each constraint denotes the dual variable associated with that constraint:

$$\begin{aligned} \min g_i(Q^+, Q^-) = \sum _{t=1}^T \left( Q^+_t x^+_{i,t} - Q^-_t x^-_{i,t} - U_{i,t} L_{i,t}\right) \end{aligned}\tag{5}$$
$$\begin{aligned}\text {s.t.}\quad &C^+_{i,t} - C^-_{i,t} + x^+_{i,t} - x^-_{i,t} - L_{i,t} = r^+_{i,t} - r^-_{i,t}&\qquad&\forall \, t,[\varphi _{i,t}^6] \end{aligned}\tag{6}$$
$$\begin{aligned}&\eta _i\, r^+_{i,t} - r^-_{i,t} = b_{i,t} - b_{i,t-1}\qquad& \forall \, t,[\varphi _{i,t}^7] \end{aligned}\tag{7}$$
$$\begin{aligned}&\sum _{t=1}^T L_{i,t} = M_i\qquad &[\varphi _{i}^8]&\end{aligned}\tag{8}$$
$$\begin{aligned}&L_{i,t} \le \bar{L}_{i,t}\qquad &\forall \, t,[\varphi _{i,t}^9] \end{aligned}\tag{9}$$
$$\begin{aligned}&\underline{B}_{i,t} \le b_{i,t}\qquad&\forall \, t,[\varphi _{i,t}^{10}] \end{aligned}\tag{10}$$
$$\begin{aligned}&b_{i,t} \le \overline{B}_i\qquad &\forall \, t,[\varphi _{i,t}^{11}] \end{aligned}\tag{11}$$
$$\begin{aligned}&r^+_{i,t} \le R^+_i\qquad &\forall \, t,[\varphi _{i,t}^{12}] \end{aligned}\tag{12}$$
$$\begin{aligned}&r^-_{i,t} \le R^-_i\qquad &\forall \, t,[\varphi _{i,t}^{13}] \end{aligned}\tag{13}$$
$$\begin{aligned}&0 \le x^+_{i,t}, x^-_{i,t}, r^+_{i,t}, r^-_{i,t}, L_{i,t}\qquad & \forall \, t&\end{aligned}\tag{14}$$

where \(x^+_{i,t}\) is the electricity purchased; \(x^-_{i,t}\) is the electricity fed into the grid; \(r^+_{i,t}\) is the electricity charged into the battery; \(r^-_{i,t}\) is the electricity discharged from the battery; and \(\varphi _{i,t}^k\) are the dual variables.

The follower’s objective (5) combines the total cost of energy, i.e., the cost of energy purchased minus the income from feeding energy into the grid, with the PG’s utility achieved by the timing of the controllable load, which enters with a negative sign. Constraint (6) maintains the energy balance at the PG. Equation (7) computes the battery SoC based on the charge and discharge rates. Constraints (8) and (9) ensure that the amount and the timing of the controllable load satisfy the requirements. Finally, inequalities (10)–(14) define the allowed range of the battery SoC, the charge and discharge rates, as well as the electricity purchase and feed-in rates at the PG.
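For a fixed tariff, the follower’s problem (5)–(14) is an ordinary LP, so a PG’s response can be computed with any LP solver. The following is a minimal sketch using SciPy’s `linprog` with the HiGHS backend; the function `solve_follower` and its argument names are hypothetical. The bounds (9)–(13) are passed as variable bounds rather than as explicit constraint rows (an equivalent encoding, though it no longer exposes the duals \(\varphi^9\)–\(\varphi^{13}\) one-to-one), and \(b_{i,0}\) enters the right-hand side of (7) for \(t = 1\).

```python
import numpy as np
from scipy.optimize import linprog


def solve_follower(q_buy, q_feed, c_prod, c_cons, u, l_max, m,
                   b0, b_min, b_cap, r_in, r_out, eta):
    """Solve the follower LP (5)-(14) of one PG for a fixed tariff.

    Variables are stacked in blocks of length T: x+, x-, r+, r-, L, b.
    Returns (x+, x-, L, optimal value of g_i).
    """
    T = len(q_buy)
    XP, XM, RP, RM, LD, SOC = (k * T for k in range(6))
    n = 6 * T
    c = np.zeros(n)
    c[XP:XP + T] = q_buy                 # cost of purchased energy
    c[XM:XM + T] = -np.asarray(q_feed)   # income from feed-in
    c[LD:LD + T] = -np.asarray(u)        # utility of the controllable load
    a_eq, b_eq = [], []
    for t in range(T):
        # (6) rearranged: x+ - x- - L - r+ + r- = C- - C+
        row = np.zeros(n)
        row[XP + t], row[XM + t], row[LD + t] = 1.0, -1.0, -1.0
        row[RP + t], row[RM + t] = -1.0, 1.0
        a_eq.append(row)
        b_eq.append(c_cons[t] - c_prod[t])
        # (7) rearranged: eta*r+ - r- - b_t + b_{t-1} = 0, with b_0 a parameter
        row = np.zeros(n)
        row[RP + t], row[RM + t], row[SOC + t] = eta, -1.0, -1.0
        if t > 0:
            row[SOC + t - 1] = 1.0
        a_eq.append(row)
        b_eq.append(-b0 if t == 0 else 0.0)
    # (8): the total controllable load equals M_i
    row = np.zeros(n)
    row[LD:LD + T] = 1.0
    a_eq.append(row)
    b_eq.append(m)
    bounds = ([(0, None)] * (2 * T)                      # x+, x-  (14)
              + [(0, r_in)] * T                          # r+      (12), (14)
              + [(0, r_out)] * T                         # r-      (13), (14)
              + [(0, l_max[t]) for t in range(T)]        # L       (9), (14)
              + [(b_min[t], b_cap) for t in range(T)])   # b       (10), (11)
    res = linprog(c, A_eq=np.array(a_eq), b_eq=np.array(b_eq),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    x = res.x
    return x[XP:XP + T], x[XM:XM + T], x[LD:LD + T], res.fun
```

With purchase prices of 0.1 and 0.3 in two periods, a PG without a battery schedules its whole controllable load in the cheaper first period. A PG with a battery instead buys in the first period and discharges in the second, which is exactly the load shifting that the retailer’s tariff is meant to induce.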

It is noted that all constraints in the followers’ model are linear, whereas the objective contains the leader’s variables as multipliers of the followers’ variables, making it a bilinear (quadratic) expression in the joint variables; for a fixed tariff, each follower solves an LP. The models of different followers are linked only through the leader’s decisions.

4.3 Retailer’s (leader’s) problem

The optimization problem faced by the retailer can be formulated as a bilevel program that contains the PGs’ problem as a nested sub-problem. This nested sub-problem, encoded as a constraint in the model, expresses that some of the variables (decision variables \(x^+_{i,t}\) and \(x^-_{i,t}\), corresponding to the amount of electricity purchased from and fed into the grid) are controlled by the followers, according to their known decision model:

$$\begin{aligned} \max f = \sum _{t=1}^T \left( \sum _{i=1}^N \left( Q^+_t x^+_{i,t} - Q^-_t x^-_{i,t} \right) \ -\ P^+_t y^+_t\ +\ P^-_t y^-_t \right) \end{aligned}\tag{15}$$
$$\begin{aligned}\text {s.t.}\quad &y^+_t - y^-_t = \sum _{i=1}^N(x^+_{i,t} - x^-_{i,t}),\quad y^+_t, y^-_t \ge 0&\qquad&\forall \, t \end{aligned}\tag{16}$$
$$\begin{aligned}&\underline{Q} \le Q^-_t \le Q^+_t \le \overline{Q}&\quad&\forall \, t \end{aligned}\tag{17}$$
$$\begin{aligned}&\frac{1}{T} \sum _{t=1}^T Q^+_t \le \tilde{Q}&\end{aligned}\tag{18}$$
$$\begin{aligned}&\left( \begin{aligned} x^+_{i,t} \\ x^-_{i,t} \end{aligned} \right) \in \arg \min \left\{ \ g_i(Q^+, Q^-) \ \middle|\ \text{(6)–(14)} \right\}&\quad \forall \, i \end{aligned}\tag{19}$$

where N is the number of PGs.

The leader’s objective (15) is to maximize its profit, calculated as its revenue from the prosumers, minus the cost of electricity purchased on the market, and plus the income from the electricity sold on the market. Equation (16) encodes the grid-level energy balance. Inequalities (17) and (18) define the valid range of the energy tariff variables. Finally, constraint (19) states that the electricity purchase and feed-in values of prosumers are determined using the above optimization model.
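Once the followers’ responses are known, the leader’s objective (15) is a simple sum. The sketch below evaluates it for given responses; `retailer_profit` is a hypothetical name. It assumes \(P^+_t \ge P^-_t\), so that buying and selling in the same period is never profitable and the retailer settles only the net position of (16), with \(y^+_t = \max(\text{net}_t, 0)\) and \(y^-_t = \max(-\text{net}_t, 0)\).

```python
def retailer_profit(q_buy, q_feed, x_buy, x_feed, p_buy, p_feed):
    """Evaluate the leader's objective (15) for a tariff and given responses.

    x_buy[i][t] / x_feed[i][t]: purchase / feed-in of PG i in period t.
    Assumes p_buy[t] >= p_feed[t], so only the net position of (16) is
    settled on the wholesale market.
    """
    profit = 0.0
    for t in range(len(q_buy)):
        # Payments from the PGs minus feed-in remuneration paid to them.
        revenue = sum(q_buy[t] * xb[t] - q_feed[t] * xf[t]
                      for xb, xf in zip(x_buy, x_feed))
        # Grid-level balance (16): net consumption of all PGs.
        net = sum(xb[t] - xf[t] for xb, xf in zip(x_buy, x_feed))
        y_buy, y_sell = max(net, 0.0), max(-net, 0.0)
        profit += revenue - p_buy[t] * y_buy + p_feed[t] * y_sell
    return profit
```

For example, with \(Q^+ = 0.3\), \(Q^- = 0.1\), and \(P^+ = 0.2\) in a single period, one PG buying 2 units and another feeding in 0.5 units yield a revenue of 0.55 and a wholesale cost of 0.3, i.e., a profit of 0.25.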

4.4 Single-level QCQP reformulation

When the complexity of a bilevel optimization problem does not allow an analytical solution, which is apparently the case here, the two candidate solution approaches are applying (meta-)heuristics directly to the bilevel problem, or reformulating it as a single-level problem. The considerable benefit of the latter technique is that it allows the application of theoretically well-founded, potentially even exact, mathematical programming approaches. For this reason, this paper adopts the reformulation approach and looks for a transformation of the bilevel problem (15)–(19) into a single-level mathematical program. The key to achieving this is modeling the followers’ optimality condition (19). The followers’ model (5)–(14) contains the bilinear term \(Q^+_t x^+_{i,t} - Q^-_t x^-_{i,t}\) in the expression of \(g_i(Q^+, Q^-)\), i.e., a product of the leader’s and the followers’ variables. Nevertheless, the model is linear in the followers’ variables, so LP duality can be exploited and a primal-dual reformulation of the followers’ problem can be applied: the optimality condition (19) is translated into the conjunction of the followers’ primal constraints (6)–(14), the dual constraints, and an equality constraint between the primal and the dual objectives. By LP duality, the ensemble of these constraints is satisfied if and only if the given instantiation of the variables is an optimal solution for the follower.
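The logic of the primal-dual reformulation can be illustrated on a small LP. For brevity, the sketch below assumes an LP in standard form, \(\min c^\top x\) s.t. \(Ax = b\), \(x \ge 0\), whose dual is \(\max b^\top y\) s.t. \(A^\top y \le c\). The follower’s LP (5)–(14) also contains inequality rows, which only changes the sign restrictions on the dual variables. A pair \((x, y)\) passing all three checks certifies that \(x\) is optimal; in the single-level model the same three ingredients appear, with the dual variables \(\varphi\) as additional decision variables. The function name is hypothetical.

```python
import numpy as np


def is_certified_optimal(c, a, b, x, y, tol=1e-9):
    """Primal-dual optimality test for the LP  min c@x  s.t.  a@x = b, x >= 0.

    Its dual is  max b@y  s.t.  a.T@y <= c  (y free). By LP duality, a primal
    point x is optimal iff some y passes all three checks below.
    """
    c, a, b, x, y = (np.asarray(v, dtype=float) for v in (c, a, b, x, y))
    # Primal constraints (the analogue of (6)-(14)).
    primal_feasible = np.allclose(a @ x, b, atol=tol) and bool(np.all(x >= -tol))
    # Dual constraints (the analogue of (21)-(27)).
    dual_feasible = bool(np.all(a.T @ y <= c + tol))
    # Equal primal and dual objectives (the analogue of (20)).
    equal_objectives = abs(c @ x - b @ y) <= tol
    return bool(primal_feasible and dual_feasible and equal_objectives)
```

For \(\min x_1 + 2x_2\) s.t. \(x_1 + x_2 = 1\), \(x \ge 0\), the pair \(x = (1, 0)\), \(y = 1\) passes all checks, whereas \(x = (0, 1)\) leaves a duality gap with any dual-feasible \(y\).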

The complete single-level reformulated problem is shown below. It consists of the leader’s objective (15), the leader’s constraints (16)–(18), the followers’ primal constraints (6)–(14), an equality relation between the followers’ primal and dual objectives (20), as well as the followers’ dual constraints corresponding to the primal variables for the battery charge rate \(r^+_{i,t}\) (21), discharge rate \(r^-_{i,t}\) (22), SoC \(b_{i,t}\) for \(t<T\) (23) and \(b_{i,T}\) (24), electricity purchase \(x^+_{i,t}\) (25), electricity feed-in \(x^-_{i,t}\) (26), and controllable load \(L_{i,t}\) (27).

$$\begin{aligned}&\max f = \sum _{t=1}^T \left( \sum _{i=1}^N \left( Q^+_t x^+_{i,t} - Q^-_t x^-_{i,t} \right) \ -\ P^+_t y^+_t\ +\ P^-_t y^-_t \right) \\&\text {s.t.} \quad \text{(6)–(14), (16)–(18)} \end{aligned}$$
$$\begin{aligned}&\sum _{t=1}^T \left( Q^-_t x^-_{i,t} - Q^+_t x^+_{i,t} + U_{i,t} L_{i,t}\right) = M_i \, \varphi _{i}^{8} - b_{i,0} \, \varphi _{i,1}^{7}&\\&\qquad + \sum _{t=1}^T \Big ( \big (C^-_{i,t} - C^+_{i,t}\big )\varphi _{i,t}^{6} + \bar{L}_{i,t} \, \varphi _{i,t}^{9} - \underline{B}_{i,t} \, \varphi _{i,t}^{10} + \overline{B}_i \, \varphi _{i,t}^{11} + R^+_i \, \varphi _{i,t}^{12} + R^-_i \, \varphi _{i,t}^{13} \Big ) \qquad \forall \, i&\end{aligned}\tag{20}$$
$$\begin{aligned}&-\varphi _{i,t}^{6} + \eta _i \varphi _{i,t}^{7} + \varphi _{i,t}^{12} \ge 0&\qquad&\forall \, i,t \end{aligned}\tag{21}$$
$$\begin{aligned}&\varphi _{i,t}^{6} - \varphi _{i,t}^{7} + \varphi _{i,t}^{13} \ge 0&\forall \, i,t \end{aligned}\tag{22}$$
$$\begin{aligned}&-\varphi _{i,t}^{7} + \varphi _{i,t+1}^{7} - \varphi _{i,t}^{10} + \varphi _{i,t}^{11} \ge 0&\forall \, i,t<T \end{aligned}\tag{23}$$
$$\begin{aligned}&-\varphi _{i,T}^{7} - \varphi _{i,T}^{10} + \varphi _{i,T}^{11} \ge 0&\forall \, i \end{aligned}\tag{24}$$
$$\begin{aligned}&\varphi _{i,t}^{6} \ge -Q^+_t&\forall \, i,t \end{aligned}\tag{25}$$
$$\begin{aligned}&-\varphi _{i,t}^{6} \ge Q^-_t&\forall \, i,t \end{aligned}\tag{26}$$
$$\begin{aligned}&-\varphi _{i,t}^{6} + \varphi _{i}^{8} + \varphi _{i,t}^{9} \ge U_{i,t}&\forall \, i,t \end{aligned}\tag{27}$$
$$\begin{aligned}&\varphi _{i,t}^{9},\ \varphi _{i,t}^{10},\ \varphi _{i,t}^{11},\ \varphi _{i,t}^{12},\ \varphi _{i,t}^{13} \ge 0&\forall \, i,t \end{aligned}$$

The primal-dual reformulation is particularly well suited to the problem at hand, since the leader’s variables \(Q^+_t\) and \(Q^-_t\) occur in the followers’ problem only in the primal objective, and consequently, only on the right-hand side of the dual constraints. As a result, the only non-linear term in the single-level reformulation is the payment from the PGs to the retailer, which appears both in the leader’s objective (15) and in the followers’ optimality constraint (20), and which is a bilinear expression: a product of the followers’ and the leader’s variables. All other constraints are linear.

4.5 SLP solution method

Since the above QCQP is non-convex, no efficient exact algorithm can be expected for solving it, and accordingly, (meta-)heuristic approaches are of interest. Therefore, we propose an SLP heuristic, which shows good convergence properties, especially on problems where most of the constraints are linear. SLP solves non-linear problems by iteratively constructing local LP approximations of the original problem, and solving each approximation using standard LP techniques [35, 36]. The algorithm departs from an initial solution \(X_0\), and in each iteration k, it builds a local linearization of the original problem around \(X_k\), denoted as LP\(_k\). Then, the optimal solution of LP\(_k\) is sought subject to a given step bound, \(-s \le X - X_k \le s\). If the optimal LP solution is feasible within a given tolerance, it is accepted as the next solution \(X_{k+1}\) (and s is possibly increased). Otherwise, \(X_{k+1} = X_k\) and s is decreased.
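The iteration described above can be sketched in Python, with SciPy’s `linprog` (HiGHS) standing in for the LP solver. The paper itself relies on the default SLP algorithm of FICO Xpress, which may differ in its details. `slp_maximize`, the toy bilinear problem, and all parameter values are hypothetical. Besides the feasibility test described in the text, the sketch accepts a candidate only if it strictly improves the true objective; this common safeguard is our own assumption. Linear constraints and variable bounds enter each LP\(_k\) exactly, and only the nonlinear constraints are linearized.

```python
import numpy as np
from scipy.optimize import linprog


def slp_maximize(f, grad_f, z0, lo, hi, a_ub=None, b_ub=None, nonlin=(),
                 s0=0.5, s_max=1.0, s_min=1e-6, tol=1e-6, max_iter=500):
    """Maximize f(z) by successive linear programming with a box step bound.

    nonlin: sequence of (g, grad_g) pairs for nonlinear constraints g(z) <= 0.
    Linear constraints a_ub @ z <= b_ub and bounds lo <= z <= hi are kept exact.
    """
    z = np.asarray(z0, dtype=float)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    s = s0
    for _ in range(max_iter):
        if s < s_min:
            break
        # LP_k in the step d = X - X_k: first-order models of g(z + d) <= 0 ...
        rows = [np.asarray(grad_g(z), dtype=float) for _, grad_g in nonlin]
        rhs = [-g(z) for g, _ in nonlin]
        # ... plus the linear constraints, which need no approximation.
        if a_ub is not None:
            a = np.asarray(a_ub, dtype=float)
            rows.extend(a)
            rhs.extend(np.asarray(b_ub, dtype=float) - a @ z)
        # Step bound -s <= d <= s, intersected with the variable bounds.
        bounds = list(zip(np.maximum(-s, lo - z), np.minimum(s, hi - z)))
        res = linprog(-np.asarray(grad_f(z), dtype=float),  # linprog minimizes
                      A_ub=np.array(rows) if rows else None,
                      b_ub=np.array(rhs) if rows else None,
                      bounds=bounds, method="highs")
        if not res.success:
            s /= 2
            continue
        cand = np.clip(z + res.x, lo, hi)
        feasible = all(g(cand) <= tol for g, _ in nonlin)
        if feasible and f(cand) > f(z):
            z, s = cand, min(2 * s, s_max)   # accept X_{k+1}, widen the step bound
        else:
            s /= 2                             # keep X_k, shrink the step bound
    return z


# Toy bilinear problem: max q*x - 0.5*x  s.t.  q + x <= 2,  0 <= q, x <= 2.
def toy_f(z):
    return z[0] * z[1] - 0.5 * z[1]


def toy_grad(z):
    return np.array([z[1], z[0] - 0.5])


z_star = slp_maximize(toy_f, toy_grad, z0=[0.5, 1.5], lo=[0, 0], hi=[2, 2],
                      a_ub=[[1.0, 1.0]], b_ub=[2.0])
```

On this toy problem, the iterates move along the constraint \(q + x = 2\) and settle at \(q = 1.25\), \(x = 0.75\). Adding a bilinear constraint, such as \(qx \le 0.5\), via `nonlin` makes the feasibility test active.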

The above SLP algorithm converges to a locally optimal solution of the QCQP, which may differ from the global optimum. To reduce the risk of getting stuck in a local optimum, the SLP algorithm is embedded into a randomized restart procedure that executes multiple SLP runs. The first run starts from \(Q^-_t = Q^+_t = \underline{Q}\); each subsequent run starts from a random perturbation of the best solution found so far. The implementation reported in this paper is based on the SLP package of FICO Xpress 7.8, using its default SLP algorithm, with the number of SLP runs set to 10 in all computational experiments.
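A minimal sketch of such a restart wrapper is given below, assuming maximization over a box-shaped search space. The callable `local_solve` is a hypothetical stand-in for one SLP run; the Gaussian perturbation scaled by `sigma` times the box width and the fixed `seed` are our own assumptions, since the text does not specify how the random perturbation is drawn.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
LocalSolver = Callable[[Vector], Tuple[Vector, float]]


def randomized_restart(
    local_solve: LocalSolver,
    x_init: Vector,
    lo: Vector,
    hi: Vector,
    n_runs: int = 10,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Run a local solver n_runs times (maximization). The first run starts at
    x_init; each later run starts from a random perturbation of the best
    solution found so far, clipped back into the box [lo, hi]."""
    rng = random.Random(seed)
    best_x, best_f = local_solve(list(x_init))
    for _ in range(n_runs - 1):
        start = [
            min(h, max(l, xi + rng.gauss(0.0, sigma * (h - l))))
            for xi, l, h in zip(best_x, lo, hi)
        ]
        x, fx = local_solve(start)
        if fx > best_f:  # keep only strict improvements
            best_x, best_f = x, fx
    return best_x, best_f


# Usage with a toy local solver that snaps to the nearest integer and scores
# the distance to 3 (a stand-in for one SLP run; purely illustrative).
def toy_local_solve(x: Vector) -> Tuple[Vector, float]:
    r = float(round(x[0]))
    return [r], -abs(r - 3.0)


best_x, best_f = randomized_restart(
    toy_local_solve, [0.0], [0.0], [10.0], n_runs=10, sigma=0.3, seed=1
)
```

For the tariff problem, `x_init` would hold all prices at \(\underline{Q}\), and `lo`/`hi` would be the contractual bounds \(\underline{Q}\) and \(\overline{Q}\).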

4.6 Discussion on possible extensions

While the bilevel model presented above captures the most important generic features of prosumers (production and consumption, controllable load, battery storage), it can be extended and refined in many ways. The most relevant directions include extending the prosumer model with features for specific types of equipment that induce elastic load (e.g., heating, ventilation, and air conditioning (HVAC) in buildings, or refined battery storage models capturing state-dependent charging properties and losses [17]), as well as extending the retailer model with generation or energy storage. The proposed solution method is directly applicable to the extended models as long as the prosumer model remains linear. The proposed reformulation also remains valid with binary variables in the retailer model (e.g., due to switchable generators), and commercial solvers offer algorithms for the resulting mixed-integer QCQP, although the computational efficiency of this approach would need to be verified for the given application.

Below, we review two minor refinements of the baseline bilevel model (15)–(19) that address behaviors that may be undesirable in some application scenarios. First, the baseline model may trigger inappropriate end-of-horizon effects, namely, the followers sell all the energy stored in their batteries to maximize their revenue. This can be avoided by subtracting the following term, which values the energy stored in the batteries at the end of the planning horizon at the average of the two final-period prices \(Q^+_T\) and \(Q^-_T\), from the followers’ objective (5):

$$\begin{aligned} \frac{Q^+_T + Q^-_T}{2}\, b_{i,T} \end{aligned}\tag{29}$$
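As a worked illustration of the valuation term, the sketch below uses hypothetical prices and assumes that \(Q^+_T\) is the price at which a PG buys energy and \(Q^-_T\) the price at which it sells, with \(Q^-_T \le Q^+_T\). The function name `terminal_battery_value` is our own. Under this reading, the midpoint valuation makes keeping the remaining charge worth more than selling it, but less than the cost of buying that energy just to store it.

```python
def terminal_battery_value(q_buy_T: float, q_sell_T: float, b_T: float) -> float:
    """Valuation of the charge b_T left at the end of the horizon, priced at
    the midpoint of the last-period purchase price Q^+_T and sale price Q^-_T."""
    return 0.5 * (q_buy_T + q_sell_T) * b_T


# Hypothetical numbers (not from the paper): Q^+_T = 10, Q^-_T = 4 c/kWh, 5 kWh left.
keep = terminal_battery_value(10.0, 4.0, 5.0)  # 35.0 c credited for keeping the charge
sell = 4.0 * 5.0                               # 20.0 c earned by selling it at Q^-_T
buy_to_store = 10.0 * 5.0                      # 50.0 c paid to buy 5 kWh at Q^+_T
```

Since `sell < keep < buy_to_store`, neither dumping the stored energy nor buying extra energy to store it pays off in the final period.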

Another example of a requirement that the above model does not readily capture is that, among different optimal solutions maximizing the retailer’s profit, a solution with smooth electricity purchase and/or sale over time is preferred. Unwanted oscillations of the energy purchased or sold on the wholesale market can be smoothed out by adding the following term to the retailer’s objective:

$$\begin{aligned} - \varepsilon \sum _{t=1}^{T}(y^+_t - y^-_t)^2 \end{aligned}\tag{30}$$

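This quadratic term penalizes the squared net energy traded on the wholesale market in each period; for a given total amount of traded energy, it is smallest when trading is spread evenly over time. Accordingly, adding it to the retailer’s objective with a small multiplier \(\varepsilon\) smooths out unnecessary oscillations, acting as a tie-breaker among alternative optimal solutions without affecting the payoffs of the players.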
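The effect of the smoothing term can be checked on two toy purchase profiles with the same total. The sketch assumes that \(y^+_t\) and \(y^-_t\) are the energy bought and sold on the wholesale market in period \(t\); the profiles, the value of \(\varepsilon\), and the function name `smoothing_term` are hypothetical.

```python
from typing import Sequence


def smoothing_term(y_buy: Sequence[float], y_sell: Sequence[float], eps: float) -> float:
    """Smoothing term: -eps * sum_t (y+_t - y-_t)^2, added to the retailer's objective."""
    return -eps * sum((b - s) ** 2 for b, s in zip(y_buy, y_sell))


# Two purchase profiles with the same total (8 kWh) and no sales.
no_sales = [0.0] * 4
oscillating = smoothing_term([4.0, 0.0, 4.0, 0.0], no_sales, eps=0.01)  # -0.32
flat = smoothing_term([2.0, 2.0, 2.0, 2.0], no_sales, eps=0.01)         # -0.16
```

The flat profile receives the smaller penalty, so among profit-equivalent solutions the retailer’s objective prefers it; a small \(\varepsilon\) keeps this preference from overriding profit differences.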

4.7 Discussion on KKT reformulation

As an alternative to the proposed solution approach, KKT reformulation and linearization can be applied to convert the bilevel model (15)–(19) into a single-level MILP. This approach is often considered the default choice for transforming bilevel problems into single-level ones. Moreover, the resulting MILP can, in theory, be solved to exact optimality by commercial solvers.

Converting the bilevel model into a single-level MILP requires linearizing the KKT complementary slackness conditions using big-M constraints over additional binary variables, as well as linearizing the quadratic term in the objective by expressing it from (20) and substituting it. However, as the computational experiments will show, this approach is computationally challenging due to the high number of binary variables and big-M constraints. In particular, linearizing the complementary slackness conditions requires introducing approximately \(22NT\) auxiliary binary variables into the model (one for each primal and dual variable), i.e., over 20,000 additional binary variables for \(N=20\) and \(T=48\). Moreover, big-M constraints typically yield weak LP relaxations, which makes the resulting MILP hard to solve. For further details on the KKT reformulation, the interested reader is referred to [26, 28]. Finally, even minor modifications of the bilevel model can prevent the linearization of the KKT reformulation, as is the case with the terms for the valuation of the remaining charge (29) or for smoothing (30).
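For a single complementarity pair, the standard big-M linearization reads as follows, where \(\lambda\) is a dual variable, \(g(x) \ge 0\) the corresponding primal constraint, \(u\) an auxiliary binary variable, and \(M\) a constant; these symbols are generic and not taken from the paper’s model. The equivalence holds only if \(M\) is a valid upper bound on both \(\lambda\) and \(g(x)\). The second line reproduces the binary-variable count quoted above.

$$\begin{aligned}&0 \le \lambda \perp g(x) \ge 0 \quad \Longleftrightarrow \quad \lambda \ge 0,\ \ g(x) \ge 0,\ \ \lambda \le M u,\ \ g(x) \le M(1-u),\ \ u \in \{0,1\}\\ &22 \cdot N \cdot T = 22 \cdot 20 \cdot 48 = 21\,120 \quad \text{for } N=20,\ T=48\end{aligned}$$

Choosing \(M\) too small can cut off optimal solutions, while choosing it too large weakens the LP relaxation, which is the difficulty noted above.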

In this paper, we use the KKT reformulation and the exact MILP solution approach in the computational experiments to assess the quality of the solutions found by the proposed SLP approach on small problem instances.

5 Experimental evaluation

5.1 Illustrative example

In this section, the proposed approach to DRM is demonstrated on a small-scale illustrative example with three PGs and a one-day horizon starting at 8:00, divided into hourly time periods. The PGs correspond to different types of consumers as follows:

1) PG1 represents an intelligent energy-positive street lighting microgrid system called E+grid [37, 38]. Since the lighting system is controlled according to local traffic and environmental conditions, as captured by motion sensors and a local weather station, its consumption varies dynamically over time. The grid-connected system is also equipped with photovoltaic (PV) power generation and battery storage, which enables it to perform active energy management using an optimization approach that corresponds to the PG model adopted in this paper. Real-life data originate from a physical prototype with 191 luminaires and 151.2 m\(^2\) of active PV surface area, and reflect the operation of the system on a sunny day in October. The E+grid PG is a net producer (up to 15 kW) during the day and a net consumer (up to 3.5 kW) during the night.

2) PG2 comprises owners of plug-in electric vehicles. The data used in the example correspond to three Nissan Leaf electric vehicles, each with a 24 kWh battery pack that has to be charged from 50% to 100% state of charge during the night. Individual vehicles are connected to the grid between 17:00 and 20:00 and disconnected between 6:00 and 8:00 in the morning. With the vehicle-to-grid (V2G) option ignored, this can be modeled as a controllable load of 36 kWh (3 × 24 kWh × 50%). It is assumed that the owners have a slight preference for charging the electric vehicles as early as possible, which is captured by utility values \(U_{2,t}\) linearly decreasing over time.

3) Finally, PG3 contains households with uncontrollable consumption only. This case study uses the data of 15 average Hungarian homes, with a peak consumption of 5.7 kW during the day and a minimum consumption of 3.8 kW during the night. Since this PG has no controllable load or battery storage, it cannot participate actively in DRM, and its consumption appears only as a time-varying bias in the grid-level consumption.

The retailer aims to maximize its profit by offering an appropriate time-of-use electricity tariff to the PGs, respecting an a priori contract that sets \(\underline{Q}=1\) c/kWh, \(\overline{Q}=100\) c/kWh, and \(\tilde{Q}=10\) c/kWh. For the sake of simplicity, the wholesale purchase price is assumed to take two values: 12 c/kWh during the day (between 8:00 and 21:00) and 6 c/kWh during the night. The feed-in price on the wholesale market is a constant 3 c/kWh.

Fig. 2 Solution with optimized tariff: energy purchase price and overall consumption over time

Fig. 3 Solution with optimized tariff: consumption and battery state of the E+grid lighting system PG over time

Fig. 4 Solution with optimized tariff: consumption and cumulated load of the electric vehicle PG over time

Fig. 5 Solution with optimized tariff: consumption of the household PG over time

The system-level optimum for this example is determined by the following characteristics: the overall grid is a net producer until 17:00 due to PV generation in the E+grid microgrid, whereas it is a net consumer afterwards. To avoid losses stemming from dual pricing on the wholesale market (energy is bought at 12 or 6 c/kWh but sold at only 3 c/kWh), the retailer should motivate the PGs to bring load forward and charge batteries before 17:00. Conversely, after 17:00, it should encourage the PGs to defer their load from the peak period lasting until 21:00 to the valley period afterwards.
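The dual-pricing loss can be quantified with the wholesale prices of this example. The sketch below assumes that the retailer’s position is netted over all PGs in each hourly period (index `t = 0` is 8:00), that positive net demand is bought at the two-step purchase price, and that net production is sold at the feed-in price; the function names and the 10 kWh quantities are hypothetical.

```python
from typing import List

FEED_IN = 3.0  # constant wholesale feed-in price, c/kWh


def wholesale_buy_prices(n_periods: int = 24, start_hour: int = 8) -> List[float]:
    """Hourly purchase price (c/kWh): 12 between 8:00 and 21:00, 6 at night."""
    return [12.0 if 8 <= (start_hour + t) % 24 < 21 else 6.0 for t in range(n_periods)]


def wholesale_cost(net_demand: List[float], buy: List[float]) -> float:
    """Retailer's wholesale cost (c) for per-period net demand in kWh.
    Positive net demand is bought at the purchase price; net production
    (negative values) is sold at the feed-in price and counts as revenue."""
    return sum(p * d if d > 0 else FEED_IN * d for d, p in zip(net_demand, buy))


prices = wholesale_buy_prices()
# 10 kWh surplus at 12:00 (t=4) and 10 kWh deficit at 17:00 (t=9) ...
unbalanced = [0.0] * 24
unbalanced[4], unbalanced[9] = -10.0, 10.0
# ... versus the same 10 kWh of load brought forward to 12:00, absorbing the surplus.
balanced = [0.0] * 24

loss = wholesale_cost(unbalanced, prices) - wholesale_cost(balanced, prices)
```

Selling 10 kWh at 3 c/kWh and buying it back at 12 c/kWh costs the retailer 90 c, which is why shifting load or battery charging into the net-producing hours is profitable.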

This sample instance was solved using the proposed approach, with the smoothing term (30) added to eliminate visually disturbing oscillations of the energy flow; apart from this, the term does not affect the payoffs of the players. The results displayed in Figs. 2, 3, 4, and 5 show that the proposed approach indeed reaches the system-level optimum described above. The diagrams compare the optimized consumption profile (red curve) to the baseline consumption (green curve) for the overall grid and for the individual PGs, where the baseline consumption is computed by scheduling the controllable loads to maximize utility (ignoring the electricity tariff) without using the batteries. The characteristic time periods are separated by dashed lines at 17:00 and 21:00. Finally, the optimized purchase tariff is also shown in the diagram of the overall grid: a constant low price (1 c/kWh) is applied while the system is a net producer until 17:00, whereas high, slightly decreasing prices are used afterwards (15.74 c/kWh at 18:00, decreasing by 0.05 c/kWh per hour).

At the level of individual PGs, the applied tariff motivated the E+grid PG to charge its battery while it is a net producer, reaching a fully charged state during 14:00–17:00, and to gradually discharge the battery during the rest of the peak period between 17:00 and 21:00. The controllable load of the EV PG was fully deferred to the valley period after 21:00. In that period, the slight decrease of the purchase price over time exactly compensated the PG for its linearly decreasing utility function; therefore, any schedule of the controllable load within this period was optimal for this PG. The household PG had no controllable variables. This tariff and consumption profile are globally optimal for the retailer, since no further load can be moved outside the peak period between 17:00 and 21:00.
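The indifference argument for the EV PG can be checked numerically. The sketch uses the tariff reported above (15.74 c/kWh at 18:00, decreasing by 0.05 c/kWh per hour) and a hypothetical utility that starts at `u0 = 20` c/kWh and decreases at the same slope; the paper does not report the values \(U_{2,t}\), so both `u0` and the equal slope are assumptions, as are the two charging schedules.

```python
from typing import Dict


def tariff(k: int) -> float:
    """Purchase price (c/kWh) k hours after 18:00: 15.74, decreasing by 0.05 per hour."""
    return 15.74 - 0.05 * k


def utility(k: int, u0: float = 20.0, slope: float = 0.05) -> float:
    """HYPOTHETICAL EV utility per kWh, decreasing linearly at the tariff's slope."""
    return u0 - slope * k


def schedule_value(schedule: Dict[int, float]) -> float:
    """Net benefit (utility minus payment) of a charging schedule {k: kWh}."""
    return sum(e * (utility(k) - tariff(k)) for k, e in schedule.items())


# Two ways to deliver the 36 kWh in the valley period after 21:00.
early = {3: 12.0, 4: 12.0, 5: 12.0}   # 21:00-23:00
late = {9: 12.0, 10: 12.0, 11: 12.0}  # 3:00-5:00
```

Because utility minus price equals 4.26 c/kWh in every hour under these assumptions, both schedules yield the same net benefit of about 153.36 c, so the PG has no reason to prefer one over the other.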

With 10 SLP runs within the randomized restart heuristic, the implementation of the proposed solution approach in FICO Xpress 7.8 solved the above problem instance in 9.8 seconds on a computer with a 2.70 GHz Intel i7 CPU and 16 GB of RAM.

5.2 Computational experiments

The evaluation of the proposed approach in computational experiments focuses on two questions: (1) the computational effort required by the proposed SLP solution approach; and (2) the quality of the solutions found. Large problem instances are generated by multiplying and randomly perturbing the data used in the illustrative example above. Table 2 displays the average computation time in seconds over 10 instances for different combinations of N (number of PGs) and T (number of time periods), obtained with the proposed model (15)–(19) and the algorithm using 10 SLP runs. The results show that the computation time increases moderately with problem size, and practically relevant problem sizes, e.g., \(N=20\) and \(T=48\), can be solved in a reasonable amount of time. In applications where a different tradeoff between solution quality and computation time is desired, the algorithm can be tuned by modifying the number of SLP runs.

Table 2 Average computation time in seconds by problem size for proposed SLP solution
Table 3 Comparison of exact KKT and proposed SLP solution

To evaluate the quality of the solutions found by the proposed approach, they are compared to the exact optimal solutions of the MILP model obtained by applying the KKT reformulation and linearization discussed in Section 4.7. Table 3 shows the results of the comparison, aggregated over 10 instances for each problem size. The Opt. (optimality) column gives the fraction of instances that could be solved to proven optimality using KKT, and the first Time column shows the average computation time required for this. The branch-and-bound search is aborted when the time limit of 600 s is reached, and the best integer solution found is recorded. The Min., Avg., and Max. columns display the minimum, average, and maximum gap between the SLP and the KKT solutions for the given problem size. Finally, the second Time column contains the average computation time of the SLP solution. The results show that the smallest instances, with \(N=3\), could be solved to proven optimality using KKT, with a single exception. Although SLP is not an exact solution approach, in practice it also finds close-to-optimal solutions on these instances, with an average gap of only 0.01%–0.1%. For the larger instances with \(N=5\), where KKT fails to find the optimal solution, SLP often constructs significantly better solutions, as indicated by negative gap values: it finds solutions up to 40% better than KKT, with computation times one to two orders of magnitude lower.
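The sign convention of the gap columns can be made explicit with a small helper. The text does not give the gap formula, so the definition below (relative to the best KKT objective of the maximization problem, in percent) is an assumption chosen to match the stated convention that negative values mean SLP found a better solution; the profit values are hypothetical and are not data from Table 3.

```python
from statistics import mean
from typing import List, Tuple


def relative_gap(f_kkt: float, f_slp: float) -> float:
    """Gap (%) of the SLP profit relative to the best KKT profit (maximization).
    Positive: SLP is worse; negative: SLP found a more profitable solution."""
    return 100.0 * (f_kkt - f_slp) / abs(f_kkt)


def summarize(gaps: List[float]) -> Tuple[float, float, float]:
    """Min., Avg., and Max. gap for one problem size."""
    return min(gaps), mean(gaps), max(gaps)


# Hypothetical (KKT, SLP) profits for three instances of one problem size.
pairs = [(100.0, 99.9), (200.0, 200.0), (50.0, 70.0)]
gaps = [relative_gap(k, s) for k, s in pairs]
g_min, g_avg, g_max = summarize(gaps)
```

With these numbers, the third instance mimics a time-limited KKT run whose incumbent is 40% worse than the SLP solution, giving a gap of -40%.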

Hence, it can be concluded that although KKT is an exact solution approach in theory, its applicability is limited to small problems, e.g., with \(N=3\). In contrast, the proposed primal-dual reformulation coupled with SLP scales much more favorably, and it computes high-quality solutions efficiently even for practically relevant problem sizes.

6 Conclusion

This paper introduces a bilevel programming approach to energy tariff optimization for DRM in smart grids. In the Stackelberg game model, the leader is a profit-maximizing retailer, who sets the energy tariff offered to its prosumers and purchases electricity for them from the wholesale market. The prosumers, who act as multiple independent followers, optimize their controllable load and their battery charging schedule to maximize their utility and minimize their cost of energy. A new solution approach is introduced, which exploits the primal-dual reformulation of the followers’ problem to arrive at a single-level QCQP equivalent of the bilevel problem. It has been shown that the resulting QCQP can be solved efficiently using an SLP algorithm. In particular, the computational experiments illustrate that the proposed approach outperforms the technique based on the KKT reformulation, which is the dominant approach for solving similar problems in the literature. Hence, the main contributions of the paper are a bilevel programming formulation of the tariff optimization problem, formal proofs of some basic properties, and the application of new and efficient mathematical programming techniques to solve this problem.

The proposed model can be readily extended to some more complex problems, e.g., with various types of controllable loads and storage devices for each PG, or with switchable generators and energy storage at the retailer. A more important and challenging direction for future research is the investigation of richer, non-linear prosumer models that can more realistically capture, e.g., thermal processes of HVAC in buildings or charging properties of batteries. The extension to a stochastic variant, accounting for uncertainties in consumption, production, and spot market prices, is also of interest.

Finally, it must be observed that while Stackelberg game models are becoming ubiquitous in the DRM literature, a critical precondition for their practical applicability is that the leader be able to identify the followers’ decision models and parameters. This is a challenging problem in application scenarios characterized by information asymmetry. A promising solution could be the application of inverse optimization, analogously to a case in inventory control [39]. Given historical pairs of a follower’s input (i.e., energy tariff) and response (consumption), the inverse optimization approach looks for parameters that make each response optimal for the corresponding input.