Inverse optimization approach to the identification of electricity consumer models

Stackelberg game models for demand response management in smart electricity grids have been studied extensively in the scientific literature. Still, a barrier to their practical applicability is the assumption that the retailer (leader in the game) has perfect knowledge about the consumers’ (followers’) decision model. This paper investigates the possibilities of reconstructing the consumers’ decision model from historic tariff and consumption data. For this purpose, it introduces an inverse optimization approach to eliciting the parameters of electricity consumer models formulated as linear programs from the historic samples. The inverse problem is first transformed into a quadratically constrained quadratic program, and then solved using successive linear programming techniques. The approach is demonstrated on a common consumer model with multiple types of deferrable loads behind a single smart meter. Experimental results are presented, and directions for future research are proposed.


Introduction
An utmost challenge in the operation of future smart electricity grids is developing effective practices for demand response management (DRM): with the increasing share of renewables in the electricity mix, power generation is becoming less and less flexible. This implies that the traditional supply follows demand approach, i.e., power plants at any point in time generating exactly as much electricity as required by consumers, seems less and less feasible. Fortunately, in parallel, with the appearance of new, flexible types of load (e.g., electric vehicle charging) and new ICT devices to control traditional loads, the alternative demand follows supply paradigm came into the limelight. The collective name of the practices for adjusting consumer behavior to better match available supply is demand response management (DRM, with a focus on short-term, e.g., intra-day behavior) or demand-side management (DSM, allowing a longer time horizon). The primary means to achieving DRM is an appropriate financial incentive, such as a time-of-use electricity tariff. However, designing a suitable electricity tariff requires a deep understanding of consumer behavior.
András Kovács (andras.kovacs@sztaki.hu), Institute for Computer Science and Control (SZTAKI), Budapest, Hungary
The natural and commonly applied mathematical model for DRM is a multifollower Stackelberg game (Kovács 2019a;Maharjan et al. 2016;Zugno et al. 2013). The leader in the game is an electricity retailer, who aims to motivate its consumers to adjust their electricity consumption to the system-level objectives. Consumers are multiple independent followers, who respond to the electricity tariff by scheduling their load to minimize their cost of electricity and to maximize their utility.
A common critical assumption of the above Stackelberg approaches is that the retailer has perfect information about the decision model and the parameters of the consumers. Obviously, this assumption cannot be met in practice: consumption is determined partly by hard-to-predict human behavior (which probably dominates for residential consumers), and partly by economic rationale (which prevails in case of industrial consumers). It is particularly challenging to characterize consumer responsiveness to electricity tariff, i.e., to find out how consumers will react to different candidate tariffs. At the same time, due to the ever wider availability of smart metering technologies (Avancini et al. 2019;Kabalci 2016), the retailer holds a huge amount of historic data about the behavior of its individual consumers, including corresponding electricity tariff and consumption time series. This paper investigates how these historic records can be exploited to reconstruct the decision model of the followers. For this purpose, it assumes that some of the consumer models frequently applied in the literature, with stationary parameters over time, capture consumer behavior with a suitable precision. Then, it applies an inverse optimization approach to elicit parameter values for that decision model from historic data. The proposed approach is applicable to arbitrary consumer models formulated as linear programs (LP), which is the most common representation in literature, see, e.g., (Sharma et al. 2016). The generic approach is illustrated on a typical consumer model, with multiple deferrable loads behind a single smart meter at the consumer. The effectiveness of the approach is investigated in computational experiments. The need for extending inverse optimization models typically addressed by the operations research community is also highlighted. This paper is a substantially extended version of the earlier conference paper (Kovács 2019b). 
Extensions include a proper positioning of the novel results in the state-of-the-art of inverse optimization, involving a formal demonstration that the problem at hand is non-convex, as well as a thorough experimental investigation of the proposed approach. An additional outlook to alternative modeling approaches is also given, including a mixed-integer linear program (MILP) that exploits a structural property of the consumer's optimal solutions.
The paper is structured as follows. A review on DRM and the relevant mathematical methodologies is presented in Sect. 2. The parameter elicitation problem is defined formally in Sect. 3. Then, the proposed solution approach is introduced (Sect. 4). The approach is validated in computational experiments in Sect. 5. A discussion on potential alternative solution methods is presented in Sect. 6. Finally, conclusions are drawn and directions for future research are discussed.

Demand response management in smart grids
Stackelberg game approaches are widely used to address optimization problems in DRM (Esther and Kumar 2016). In most such models, the leader is an electricity retailer, while the followers are its consumers, who aim to schedule deferrable (Yu and Hong 2016) or curtailable (Maharjan et al. 2016) loads, charging their batteries (Kovács 2019a) or electric vehicles (Tushar et al. 2012) taking into account the electricity tariff set by the retailer. The Stackelberg equilibrium can be computed analytically for some simpler models, typically those considering a single time period (Maharjan et al. 2016). At the same time, the solution of more sophisticated multi-period models requires search. The commonly applied solution method is transforming the corresponding bilevel program into a single-level MILP using the Karush-Kuhn-Tucker (KKT) conditions (Zugno et al. 2013;Tushar et al. 2012). Recently, successive linear programming (SLP) has also shown favorable performance on some models (Kovács 2019a).
The practical applicability of the above Stackelberg approaches has been criticized for two main reasons: (1) the assumption that the retailer has perfect information about its consumers; and (2) the simplistic models applied to characterize consumer behavior.
Indeed, the modeling of consumer behavior from the viewpoint of DRM has received significant attention recently. Approaches can be roughly classified as technological engineering models and econometric empirical studies (Vallés et al. 2018). The former group of methods build detailed models of the main load components, and aggregate these components to calculate the grid-level electricity consumption. This allows the investigation of power systems and their operation practices in simulated environments. However, the accuracy of the models is often disputed, and matching these formal models with observed consumption is a challenge. A particular difficulty is a mismatch in their granularity: apart from experimental scenarios, smart electricity meters measure the total consumer-level (e.g., household-level) consumption, whereas the load profile of individual appliances is not readily accessible. Load disaggregation addresses the decomposition of the total consumption to device-level load using machine learning and background knowledge to achieve non-intrusive appliance load monitoring (NIALM) (Miyasawa et al. 2019;Zhang et al. 2017).
Econometric empirical studies, in contrast, rely on statistical data obtained from measurements on the physical system, without a formal model of the individual load components. They look for correlation between the measured consumption, the electricity tariff, and other external variables using statistical methods. A probabilistic characterisation of the load flexibility of residential consumers is given in (Vallés et al. 2018). Price and volume signals are considered as incentives from the demand response provider to consumers, which offer monetary benefits for consumers in exchange for the modifications of their consumption in given time intervals. The DRM potential in electric vehicle charging has received special attention: while classical contributions depart from the fraction of the charging time and the connection time (Kara et al. 2015), some recent studies aim at composing a more sophisticated statistical characterization (Sadeghianpourhamami et al. 2018). A methodology for analyzing the reflectivity of electricity tariffs in simulation models is proposed in (Jargstorf et al. 2015) for residential consumers with photovoltaic (PV) generation and battery storage.

Inverse optimization
In mathematical programming terms, the problem of finding parameter values for an optimization problem that lead to a given optimal solution is called inverse optimization. Models and algorithms for inverse optimization are reviewed in (Heuberger 2004). The classical work of Ahuja and Orlin (2001) addresses inverse optimization problems where a cost vector is looked for that makes a given solution optimal, while it causes the smallest perturbation compared to an initial cost vector under the $L_1$ or the $L_\infty$ norm. The constraint coefficients are assumed to be fixed. With this restriction, polynomial algorithms are given for various inverse problems, including the inverse variants of linear programming and different graph problems. An outlook to the $L_2$ norm, with the same restriction, is given in (Amirkhanova et al. 2016). The paper (Schaefer 2009) investigates inverse integer programming, again, with unknown parameters in the objective only.
A substantial generalization where both the cost vector and the constraint coefficients can be varied within a closed polyhedron for inverse linear problems is studied in (Dempe and Lohse 2006). Given a desired solution $x^0$, parameter values and a modified solution $x$ are looked for that minimize the distance $\|x - x^0\|$ in the Euclidean norm. It is shown that in the general case, this problem is non-convex with multiple locally optimal solutions. Necessary and sufficient conditions of optimality are proven.
Many of the above papers highlight the relation of inverse optimization to (or its potential applications in) other fields of computer science, including parameter identification and machine learning. The recent survey (Ye et al. 2019) highlights the similarities between the challenges faced by the optimization and the machine learning communities in solving inverse problems, and investigates the possibilities of cross-fertilization. From among optimization approaches, this survey stresses the common application of iterative, gradient-based methods for solving non-linear problems.
A key novelty of the problem addressed in this paper compared to the above state-of-the-art is that suitable parameter values should be found for a large set of historical solutions, rather than a single desired solution. We are aware of a single earlier contribution from the literature where inverse optimization is used for parameter elicitation from a set of historic solutions to a lot sizing problem (Egri et al. 2014). Still, in that problem, all unknown parameters were located in the objective, and it was assumed that there exists a combination of parameters that renders all historical solutions feasible for the lot sizing problem (i.e., there is no noise on the historical data). The current problem lifts both of these assumptions.

Direct problem
This paper illustrates the proposed parameter elicitation technique on a common electricity consumer model with multiple controllable loads behind a single smart meter. This problem, solved by the consumer to schedule its loads subject to the time-of-use electricity tariff, constitutes the direct problem in our inverse optimization approach. A similar consumer model is used, e.g., in (Kovács 2018) and (Yu and Hong 2016).
In this model, the consumer schedules $N$ different types of controllable loads over a finite time horizon divided into $T$ time periods of equal length. For each type of load $i = 1, \ldots, N$, the total demand $M_i$ over the horizon is given. The load scheduled into period $t$ is bounded from above by $L_{i,t}$. Scheduling a unit of load of type $i$ into period $t$ incurs a utility of $U_{i,t}$ for the consumer. The unit price of electricity $Q_t$ also varies over time. Then, the consumer aims to maximize its total utility incurred and minimize the total cost of electricity. This problem can be formulated as an LP as follows. Symbols in brackets on the r.h.s. of the constraints represent the dual variables assigned to the constraint. The applied notation is displayed in Table 1.
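A formulation consistent with the description in this section can be sketched as follows. The layout, the equation numbers (1)-(3), and the assignment of the dual variables $\alpha_i$ and $\beta_{i,t}$ to the demand and the upper bound constraints are assumptions inferred from the surrounding text, not a verbatim copy of the original display.

```latex
\begin{align}
\max \quad & \sum_{i=1}^{N} \sum_{t=1}^{T} (U_{i,t} - Q_t)\, x_{i,t} \tag{1} \\
\text{s.t.} \quad & \sum_{t=1}^{T} x_{i,t} = M_i && \forall i \quad [\alpha_i] \tag{2} \\
& 0 \le x_{i,t} \le L_{i,t} && \forall i, t \quad [\beta_{i,t}] \tag{3}
\end{align}
```

Here $x_{i,t}$ denotes the load of type $i$ scheduled into period $t$, and $\beta_{i,t}$ is attached to the upper bound $x_{i,t} \le L_{i,t}$.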
In this LP formulation, the objective (1) states that the consumer maximizes its total utility minus the cost of electricity. Equality (2) declares that the total load of type $i$ must equal $M_i$, whereas constraint (3) bounds the load scheduled into each period $t$ from above by $L_{i,t}$.
A core assumption of the approach is that consumer behavior can be characterized sufficiently well by the above model with stationary parameters over time. Accordingly, $M_i$, $L_{i,t}$, and $U_{i,t}$ are common over all historic samples, where samples correspond to repetitive time intervals that are characteristic for the given type of load, such as days, workdays, or weekend days. To reflect that sufficiently good characterization is assumed instead of perfect characterization, we allow the realized total load of the consumer, $z_t^k$, to deviate from the load predicted by the model, i.e., $z_t^k \approx \sum_{i=1}^{N} x_{i,t}^k$. The extent of tolerable deviation will be analyzed in computational experiments.
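Since the periods are coupled only through the total demand of each load type, the direct problem separates by load type, and each part can be solved by filling periods in decreasing order of net utility $U_{i,t} - Q_t$. The Python sketch below illustrates this; the function name `solve_direct` and the numeric data are hypothetical, and this is not the Xpress-based implementation used in the paper.

```python
def solve_direct(U, L, M, Q):
    """Greedy solution of the direct problem for one tariff vector Q.

    U[i][t]: utility, L[i][t]: per-period load limit, M[i]: total demand.
    Returns x[i][t], the load of type i scheduled into period t.
    """
    T = len(Q)
    x = []
    for U_i, L_i, M_i in zip(U, L, M):
        if sum(L_i) < M_i - 1e-9:
            raise ValueError("total demand exceeds total capacity")
        # Visit periods from the highest to the lowest net utility U - Q.
        order = sorted(range(T), key=lambda t: U_i[t] - Q[t], reverse=True)
        x_i = [0.0] * T
        remaining = M_i
        for t in order:
            x_i[t] = min(L_i[t], remaining)
            remaining -= x_i[t]
            if remaining <= 0:
                break
        x.append(x_i)
    return x


# Illustrative data (not from the paper): one load type, three periods.
U = [[5.0, 3.0, 4.0]]
L = [[2.0, 2.0, 2.0]]
M = [3.0]
Q = [1.0, 1.0, 3.0]
x = solve_direct(U, L, M, Q)  # net utilities are 4, 2, 1
```

With these numbers the first period is filled to its limit and the rest of the demand goes to the second period, which has the next highest net utility.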

Inverse problem
Let us assume that the above direct problem captures the consumer's behavior with a reasonable accuracy. Then, the goal is to elicit the parameter values $M_i$, $L_{i,t}$, and $U_{i,t}$ applied by the consumer to schedule its loads.
The electricity retailer is aware of the electricity tariff $Q_t^k$ and the per period total consumption $z_t^k$ (with $z_t^k \approx \sum_{i=1}^{N} x_{i,t}^k$), i.e., the consumer's past demand responses to the variation of the electricity tariff from smart meter readings. At the same time, it is unable to directly observe the detailed, per device consumption $x_{i,t}^k$, and has no background information on the parameter values $M_i$, $L_{i,t}$, and $U_{i,t}$. Then, the inverse problem consists in determining these unknown parameter values from historic data. This inverse problem can be formulated as follows.
The objective (5) minimizes the misfit of the model, calculated as the absolute difference between the measured historic consumption and the consumption predicted by the model (6). The equilibrium constraint (7) states that the load values $x_{i,t}^k$ are derived from solving the parametric direct problem to optimality.
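The inverse problem described above can be sketched in the following form. The notation $\varepsilon_t^k$ for the per-period error and the equation numbers (5)-(7) follow the references in the text; the exact layout of the original display is an assumption.

```latex
\begin{align}
\min \quad & \sum_{k=1}^{K} \sum_{t=1}^{T} \varepsilon_t^k \tag{5} \\
\text{s.t.} \quad & \varepsilon_t^k \ge \Bigl| z_t^k - \sum_{i=1}^{N} x_{i,t}^k \Bigr| && \forall k, t \tag{6} \\
& x^k \in \arg\max_{x} \Bigl\{ \sum_{i,t} (U_{i,t} - Q_t^k)\, x_{i,t} \;:\;
  \sum_{t} x_{i,t} = M_i,\ 0 \le x_{i,t} \le L_{i,t} \Bigr\} && \forall k \tag{7}
\end{align}
```

The decision variables are the parameters $U_{i,t}$, $L_{i,t}$, $M_i$, the loads $x_{i,t}^k$, and the errors $\varepsilon_t^k$, whereas $Q_t^k$ and $z_t^k$ are given data.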

Problem characteristics
In the above formulation, the parameters to be elicited appear both in the objective and on the r.h.s. of the constraints of the direct problem (and accordingly, of the equilibrium constraint in the inverse problem), and hence, this problem does not fit into the classes of inverse problems typically investigated in the literature, with unknown parameters appearing only in the objective (Ahuja and Orlin 2001;Schaefer 2009). Moreover, this problem is non-convex, as it is shown in the lemma below.

Lemma 1 The above inverse optimization problem is non-convex.
Proof Consider an instance of the inverse problem with a single type of load, two time periods, and a single historic sample ($N = 1$, $T = 2$, and $K = 1$), where the sample contains one unit of load in the first time period, and no load in the second period ($z_1^1 = 1$, $z_2^1 = 0$). Observe that the following solutions are both optimal. In solution $S_1$, the utility of the second period is very high, but the maximum load is zero ($S_1$: $U_{1,1} = 1$, $U_{1,2} = 1000$, $L_{1,1} = 1$, $L_{1,2} = 0$, $M_1 = 1$); whereas in $S_2$, the maximum load is very high, but the utility is zero ($S_2$: $U_{1,1} = 1$, $U_{1,2} = 0$, $L_{1,1} = 1$, $L_{1,2} = 1000$, $M_1 = 1$). Both of these parameter combinations imply that all load is scheduled into the first time period, and therefore, these solutions incur zero error ($x_{1,1}^k = 1$, $x_{1,2}^k = 0$, $\sum_{k,t} \varepsilon_t^k = 0$). Now, a convex combination of $S_1$ and $S_2$ with identical weights has high utility and high maximum load in the second period ($S$: $U_{1,1} = 1$, $U_{1,2} = 500$, $L_{1,1} = 1$, $L_{1,2} = 500$, $M_1 = 1$). With these parameters, the consumer schedules all load into the second period ($x_{1,1}^k = 0$, $x_{1,2}^k = 1$), which incurs a positive error ($\sum_{k,t} \varepsilon_t^k = 2$). This contradicts convexity.
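The counterexample in the proof can be checked numerically. The Python sketch below assumes a flat tariff $Q = (0, 0)$, which the proof does not specify, and solves the direct problem with a greedy fill in decreasing order of net utility; the names `solve_direct`, `misfit`, and `average` are hypothetical.

```python
def solve_direct(U, L, M, Q):
    """Greedy solution of the direct problem (fill best periods first)."""
    T = len(Q)
    x = []
    for U_i, L_i, M_i in zip(U, L, M):
        order = sorted(range(T), key=lambda t: U_i[t] - Q[t], reverse=True)
        x_i = [0.0] * T
        remaining = M_i
        for t in order:
            x_i[t] = min(L_i[t], remaining)
            remaining -= x_i[t]
            if remaining <= 0:
                break
        x.append(x_i)
    return x


def misfit(U, L, M, Q, z):
    """Sum over periods of |measured load - load predicted by the model|."""
    x = solve_direct(U, L, M, Q)
    return sum(abs(z[t] - sum(x_i[t] for x_i in x)) for t in range(len(Q)))


def average(a, b):
    """Element-wise midpoint of two (possibly nested) lists of numbers."""
    if isinstance(a, list):
        return [average(p, q) for p, q in zip(a, b)]
    return 0.5 * (a + b)


Q = [0.0, 0.0]  # assumed flat tariff; not fixed by the proof
z = [1.0, 0.0]  # the single historic sample
S1 = {"U": [[1.0, 1000.0]], "L": [[1.0, 0.0]], "M": [1.0]}
S2 = {"U": [[1.0, 0.0]], "L": [[1.0, 1000.0]], "M": [1.0]}
S = {key: average(S1[key], S2[key]) for key in S1}

err_1 = misfit(Q=Q, z=z, **S1)
err_2 = misfit(Q=Q, z=z, **S2)
err_mid = misfit(Q=Q, z=z, **S)
```

Under these assumptions both endpoint solutions fit the sample exactly, while their midpoint does not, so the misfit is not a convex function of the parameters.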

Solution approach
The above defined inverse optimization problem is a mathematical program with an equilibrium constraint, which is not directly tractable using classical tools of operations research. Therefore, it is first reformulated to a quadratically constrained quadratic program (QCQP). Since the resulting QCQP is non-convex, SLP is applied to solving it.

Reformulation to QCQP
Reformulation to QCQP is performed by exploiting strong duality for the direct problem, formulated as an LP. Accordingly, in the resulting QCQP, the equilibrium constraint (7) is replaced by the following set of constraints: the primal constraints of the direct problem (constraints (12)-(13)); the dual constraints of the direct problem (14); and finally, a constraint stating that the primal and the dual objectives are equal (15). The complete QCQP reformulation consists of (9)-(16).
Quadratic terms in this formulation appear solely in constraint (15): $U_{i,t} x^k_{i,t}$ from the primal objective, and $M_i \alpha^k_i$ and $L_{i,t} \beta^k_{i,t}$ from the dual objective.
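A reconstruction of the QCQP that is consistent with the description above is sketched below. The roles of (12)-(15) follow the text; the content of (9)-(11) and (16), in particular the split of the absolute value into two linear inequalities and the exact set of sign constraints, is an assumption.

```latex
\begin{align}
\min \quad & \sum_{k=1}^{K} \sum_{t=1}^{T} \varepsilon_t^k \tag{9} \\
\text{s.t.} \quad
& \varepsilon_t^k \ge z_t^k - \sum_{i=1}^{N} x_{i,t}^k && \forall k, t \tag{10} \\
& \varepsilon_t^k \ge \sum_{i=1}^{N} x_{i,t}^k - z_t^k && \forall k, t \tag{11} \\
& \sum_{t=1}^{T} x_{i,t}^k = M_i && \forall i, k \tag{12} \\
& x_{i,t}^k \le L_{i,t} && \forall i, t, k \tag{13} \\
& \alpha_i^k + \beta_{i,t}^k \ge U_{i,t} - Q_t^k && \forall i, t, k \tag{14} \\
& \sum_{i,t} (U_{i,t} - Q_t^k)\, x_{i,t}^k
  = \sum_{i} M_i \alpha_i^k + \sum_{i,t} L_{i,t} \beta_{i,t}^k && \forall k \tag{15} \\
& x_{i,t}^k,\ \beta_{i,t}^k,\ \varepsilon_t^k \ge 0 && \forall i, t, k \tag{16}
\end{align}
```

The dual variables $\alpha_i^k$ belong to equality constraints and are therefore free in sign. Weak duality guarantees that the left-hand side of (15) never exceeds the right-hand side for any primal-dual feasible pair, so enforcing equality is equivalent to requiring optimality of $x^k$ in the direct problem.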

Solution by SLP
Since the above inverse optimization problem is non-convex (see Lemma 1), there is little hope for finding efficient exact solution approaches for solving it. For this reason, an SLP solution approach has been implemented, which has been successfully applied to similar problems in demand response management (Kovács 2019a). SLP solves non-linear problems by iteratively building local LP approximations of the original problem, and solving each approximation using standard LP techniques (Byrd et al. 2003;Palacios-Gomez et al. 1982). Departing from some initial solution X 0 , in each iterative step j, SLP builds a local linearization of the problem around X j , denoted by LP j . Then, it solves LP j within a bounded environment of X j . If the resulting LP solution is feasible for the original problem with a given tolerance, then it is accepted as X j+1 . Otherwise, another solution for LP j is looked for within a closer environment of X j . Since SLP is an iterative heuristic by nature, it may get stuck in local optima and return a sub-optimal solution. The quality of the solutions found and the computational efficiency on the problem at hand will be investigated in computational experiments.
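To illustrate the local linearization step, consider one bilinear term of constraint (15). The first-order expansion around the current iterate $(\bar U_{i,t}, \bar x_{i,t}^k)$ is shown below; the step bound $\Delta_j$ is an assumed symbol for the size of the bounded environment, and how the Xpress SLP package forms and bounds its approximations internally is not stated in the text.

```latex
\begin{align*}
U_{i,t}\, x_{i,t}^k
  &\approx \bar U_{i,t}\, \bar x_{i,t}^k
   + \bar x_{i,t}^k \,(U_{i,t} - \bar U_{i,t})
   + \bar U_{i,t} \,(x_{i,t}^k - \bar x_{i,t}^k), \\
|U_{i,t} - \bar U_{i,t}| &\le \Delta_j, \qquad
|x_{i,t}^k - \bar x_{i,t}^k| \le \Delta_j .
\end{align*}
```

The approximation error equals $(U_{i,t} - \bar U_{i,t})(x_{i,t}^k - \bar x_{i,t}^k)$ and is therefore at most $\Delta_j^2$ in absolute value, which is why looking for a solution in a closer environment of $X_j$ makes the LP solution more likely to satisfy the original constraint within the tolerance.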

Design of experiments
The effectiveness of the proposed approach was investigated in computational experiments on generated data. Experiments addressed whether a hypothetical retailer can predict the future behavior of its consumers based on model parameters elicited from historical samples using the proposed approach. For this purpose, consumer models were constructed with randomized parameter values $U^0_{i,t}$, $M^0_i$, and $L^0_{i,t}$. It was assumed that the retailer cannot observe these original parameter values, but it has access to a set of $K$ historic samples, each sample containing a corresponding vector of tariff and overall consumption values $(Q_t^k, z_t^k)_{t=1}^{T}$. These samples were generated by solving the direct problem with random tariff values $Q_t^k$. The resulting overall consumption was perturbed by noise to reflect that the model cannot give a perfect characterization of real consumer behavior, using the formula $z_t^k = U(1 - \pi, 1 + \pi) \sum_i x_{i,t}^k$. Here, $U(a, b)$ denotes the continuous uniform random distribution over the interval $[a, b]$. The level of relative noise was varied between $\pi = 0$ (i.e., no noise) and $\pi = 0.2$ (significant deviation from the model). Applying the proposed approach to these samples resulted in elicited parameter values $U^E_{i,t}$, $M^E_i$, and $L^E_{i,t}$. Then, the proposed approach was evaluated by comparing the solutions of the direct problem with the original ($U^0_{i,t}$, $M^0_i$, $L^0_{i,t}$) and the elicited ($U^E_{i,t}$, $M^E_i$, $L^E_{i,t}$) parameter values. A test set of randomized tariff values, independent of the historic samples used for elicitation, was used as additional input for the comparison.
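The sample generation procedure described above can be sketched in Python as follows. The tariff range, the parameter values, and the choice of drawing one independent noise factor per period are assumptions made for illustration; the source specifies only the multiplicative uniform noise. The direct problem is solved with a greedy fill, and all names are hypothetical.

```python
import random


def solve_direct(U, L, M, Q):
    """Greedy solution of the direct problem (fill best periods first)."""
    T = len(Q)
    x = []
    for U_i, L_i, M_i in zip(U, L, M):
        order = sorted(range(T), key=lambda t: U_i[t] - Q[t], reverse=True)
        x_i = [0.0] * T
        remaining = M_i
        for t in order:
            x_i[t] = min(L_i[t], remaining)
            remaining -= x_i[t]
            if remaining <= 0:
                break
        x.append(x_i)
    return x


def generate_samples(U, L, M, K, noise, rng, price_range=(0.0, 1.0)):
    """Return K pairs (Q, z) of tariff and noisy total consumption vectors."""
    T = len(L[0])
    samples = []
    for _ in range(K):
        Q = [rng.uniform(*price_range) for _ in range(T)]  # assumed tariff law
        x = solve_direct(U, L, M, Q)
        # Multiplicative noise factor drawn from U(1 - noise, 1 + noise).
        z = [rng.uniform(1 - noise, 1 + noise) * sum(x_i[t] for x_i in x)
             for t in range(T)]
        samples.append((Q, z))
    return samples


# Hypothetical "original" parameters: two load types, four periods.
U0 = [[0.9, 0.4, 0.7, 0.2], [0.1, 0.8, 0.3, 0.6]]
L0 = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.5, 2.0, 0.5]]
M0 = [2.0, 3.0]

clean = generate_samples(U0, L0, M0, K=25, noise=0.0, rng=random.Random(1))
noisy = generate_samples(U0, L0, M0, K=25, noise=0.2, rng=random.Random(1))
```

Without noise, the total consumption of every sample equals the sum of the total demands $M_i$; with noise, each period's consumption stays within the relative band set by $\pi$.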
Instances with different sizes were generated by selecting the number of load types from N ∈ {1, 3, 5} and the number of samples from K ∈ {25, 50, 100, 200}. The length of the time horizon was fixed to T = 12. The test set contained 50 randomized tariff vectors. All experiments were performed using an implementation of the QCQP (9)-(16) in Xpress 7.8 in the Mosel programming language, by applying the SLP package for solving the model. The experiments were run on a personal computer with Intel i7 2.70 GHz CPU and 16 GB RAM.

Results with a single type of load
For the special case with a single type of load ($N = 1$), the proposed approach enabled a rather successful prediction of consumer behavior. This is shown in Fig. 1, which compares the load curves over time with the original (blue) and the elicited (orange) parameter values, with a low number of samples ($K = 25$) and considerable noise ($\pi = 0.2$). Even better predictions could be achieved with a higher number of samples or less noise. The root mean square error (RMSE) of the predicted behavior with different values of $\pi$ and $K$ is depicted in Figs. 2, 3 and 4. Each diagram shows the mean RMSE over the 50 test instances (red dashed line), and the median RMSE (blue continuous line) with an 80% confidence interval around it (light blue area). Hence, out of the 50 test instances, the results of 40 tests lie in the blue area, whereas 5 lie above it and 5 below it.
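The error measures described above can be computed along the following lines. This is a minimal Python sketch: the function names are hypothetical, the band is taken as an empirical central range of the sorted values, and any normalization used to express the RMSE in percent is not specified in the text and is therefore omitted.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two load curves of equal length."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def central_band(values, coverage=0.8):
    """Bounds of the central range holding the given share of the values."""
    ordered = sorted(values)
    cut = round(len(ordered) * (1 - coverage) / 2)  # values dropped per side
    return ordered[cut], ordered[len(ordered) - 1 - cut]


# With 50 test values, 5 fall below the band, 5 above it, and 40 inside.
values = [float(v) for v in range(50)]
low, high = central_band(values)
inside = sum(low <= v <= high for v in values)
```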
It is emphasized that for instances without noise ($\pi = 0$), the consumer model could be fitted to the historic samples without any error ($\varepsilon_t^k = 0$, $\forall k, t$ in the inverse problem). Yet, the elicited parameter values do not match perfectly the original ones. The reason is that the consumer may have different motivations behind the same behavior pattern. For instance, scheduling no consumption into a time period $t$ may stem from a very low utility $U_{i,t}$ or a maximum load of zero, $L_{i,t} = 0$. These two possible motivations could be distinguished only by extreme tariff values that are not present in the available historic samples. Accordingly, for $\pi = 0$, the predicted behavior on the independent test set usually matches the model result with the original parameter values, which is illustrated by both a median RMSE of zero and a confidence interval width of zero. However, for a few outliers outside the 80% confidence interval, the predicted behavior was structurally different from the model behavior, resulting in non-zero mean RMSE. What is even more important, consumer behavior could be reconstructed from noisy samples as well, which is demonstrated by errors converging to zero as $K$ increases, independently from the value of $\pi$. The mean RMSE decreased to 1.15% (median: 0.27%) with $\pi = 0.1$ and to 0.91% (median: 0.54%) with $\pi = 0.2$ as the number of historic samples increased to 200.
Computation times are displayed in Table 2, where each row contains average computation time for solving the elicitation problem with a given number of samples K . Each average value is computed over instances with three different levels of perturbation, with five SLP runs for each instance starting from different random initial solutions, i.e., 15 SLP runs altogether. Solution times range from 0.17 s for small instances (K = 25) to 2.03 s for the largest problems investigated (K = 200). Instances without perturbation (π = 0) could be solved quickly, in 0.08 s for K = 25 and 0.49 s for K = 200, since SLP terminated after a few iterations with an objective value of zero. In contrast, there was no clear correlation between the computation times and the level of perturbation for π > 0.

Results with multiple types of load
Results for the generic case with multiple types of load ($N \ge 2$) are also promising, but still somewhat more ambiguous. The proposed approach could predict consumer behavior with a reasonable accuracy, as depicted by the power curves for the instance with $K = 25$ and $\pi = 0.2$ in Fig. 5, and the RMSE diagrams in Figs. 6, 7, 8, 9, 10 and 11. A typical error of 5-8% in the predicted consumption of an individual consumer, in itself, is acceptable in typical applications. Yet, these errors are an order of magnitude larger than with $N = 1$.
Moreover, the SLP solution approach could not reconstruct the original parameter values ($U^0_{i,t}$, $M^0_i$, $L^0_{i,t}$) even for instances without noise ($\pi = 0$), where these parameter values would incur an optimal solution with zero error. Here, the RMSE does not converge to zero as the number of samples $K$ increases. These negative results indicate that further research should be invested into developing more efficient algorithms for solving the inverse optimization problem.
Computation times for different problem sizes are shown in Table 2, with average values computed over 15 SLP runs. While small problems could be solved quickly, e.g., in 4.19 s for $N = 3$ and $K = 25$, larger instances required considerable computation time.

Lemma 2 Given an optimal solution to the investigated direct optimization problem, for any load type $i$, sample $k$, and for any two time periods $t_1$ and $t_2$, at least one of the following statements holds:
- No load is scheduled to period $t_1$, i.e., $x^k_{i,t_1} = 0$; or
- Period $t_2$ is saturated, i.e., $x^k_{i,t_2} = L_{i,t_2}$; or
- Rescheduling load from period $t_1$ to period $t_2$ does not increase the objective value, i.e., $U_{i,t_1} - Q^k_{t_1} \ge U_{i,t_2} - Q^k_{t_2}$.
Proof Assume that none of the above conditions hold in a given solution. Then, there is a nonzero amount of load, $\min(x^k_{i,t_1}, L_{i,t_2} - x^k_{i,t_2}) > 0$, that can be rescheduled from period $t_1$ to period $t_2$, and this causes an increase of the objective value for the consumer. Hence, the original solution cannot be an optimal one.
This lemma can be exploited directly by a mixed-integer linear programming (MILP) formulation with two sets of binary variables, one to indicate if a time period is empty ($x^k_{i,t} = 0$) and another one to denote that the period is full ($x^k_{i,t} = L_{i,t}$). This MILP was implemented in FICO Xpress, and it was used to validate the results of the proposed solution approach on small problem instances. However, this MILP does not scale up to larger instances, due to the large number of binary variables and the weak LP relaxation.
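The structural property of Lemma 2 can be checked numerically on schedules of a single load type. In the Python sketch below, the third alternative of the lemma is taken to be that period $t_1$ has a net utility $U - Q$ at least as high as period $t_2$; this reading is inferred from the proof and is an assumption. Optimal schedules are obtained with a greedy fill, and all names and random data are hypothetical.

```python
import random


def solve_direct(U, L, M, Q):
    """Greedy solution of the direct problem (fill best periods first)."""
    T = len(Q)
    x = []
    for U_i, L_i, M_i in zip(U, L, M):
        order = sorted(range(T), key=lambda t: U_i[t] - Q[t], reverse=True)
        x_i = [0.0] * T
        remaining = M_i
        for t in order:
            x_i[t] = min(L_i[t], remaining)
            remaining -= x_i[t]
            if remaining <= 0:
                break
        x.append(x_i)
    return x


def satisfies_lemma2(x_i, U_i, L_i, Q, tol=1e-9):
    """Check the three alternatives for every ordered pair of periods."""
    T = len(Q)
    for t1 in range(T):
        for t2 in range(T):
            empty = x_i[t1] <= tol                # nothing to move from t1
            full = x_i[t2] >= L_i[t2] - tol       # no room left in t2
            no_gain = U_i[t1] - Q[t1] >= U_i[t2] - Q[t2] - tol  # assumed
            if not (empty or full or no_gain):
                return False
    return True


rng = random.Random(0)
T = 6
checks = []
for _ in range(100):
    U = [[rng.uniform(0, 1) for _ in range(T)]]
    L = [[rng.uniform(0, 1) for _ in range(T)]]
    M = [0.5 * sum(L[0])]  # half of the capacity, so the instance is feasible
    Q = [rng.uniform(0, 1) for _ in range(T)]
    x = solve_direct(U, L, M, Q)
    checks.append(satisfies_lemma2(x[0], U[0], L[0], Q))

# A deliberately suboptimal schedule: load sits in the less valuable period.
bad = satisfies_lemma2([1.0, 0.0], [1.0, 2.0], [1.0, 1.0], [0.0, 0.0])
```

The check passes for every optimal schedule generated here and fails for the suboptimal one, which is the behavior a MILP encoding of the lemma relies on.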
An interesting alternative approach can be the application of stochastic optimization techniques motivated by machine learning, such as the stochastic gradient descent (SGD) algorithm or one of its descendants with reduced variance (Shalev-Shwartz and Zhang 2013;Ye et al. 2019). These iterative algorithms have been developed to learn from large data sets. They employ a small batch of data in each individual iteration to increase the speed of learning, while they have shown favorable convergence properties on the whole data set. Finally, the coupling of the inverse optimization approach with load disaggregation techniques is a promising direction for the multiple load types case.

Conclusions
This paper introduced a novel approach to elicit the parameters of electricity consumer models from historic time series based on inverse optimization for the purpose of DRM. The approach is applicable to consumer models formulated as an LP, and relies on converting the inverse LP model to a QCQP and solving it using SLP. This requires the generalization of the models commonly applied in inverse optimization in two different ways: by allowing unknown parameters both in the objective and the constraints, and by adopting multiple desired solutions.
The approach was illustrated on a consumer model with multiple types of deferrable load behind a single smart meter, which is a common model in the literature of DRM.
In computational experiments on generated data, the approach has shown promising performance. For instances with a single load type, the elicited parameters enable a nearly perfect prediction of future consumer behavior. The relative error of 5-8% is also a reasonable result for instances with multiple load types, but for this generic case, the convergence properties of the SLP algorithm did not satisfy our expectations.
Accordingly, future research has to pursue multiple directions: for improving the performance of the algorithm, the combination of the inverse optimization approach with stochastic optimization methods or load disaggregation using machine learning techniques has to be investigated. Moreover, the evaluation of the proposed approach on more generic consumer models and real measured data is of interest.