Abstract
The constantly increasing electricity and energy demand in residential buildings, together with the need for higher absorption rates of renewable energy sources, calls for increased flexibility on the part of end users. This need is further reinforced by the rising number of residential photovoltaic (PV) and battery-storage systems. In this context, flexibility can be viewed as the excess energy that can be charged to or discharged from a battery in response to a group objective shared by several such battery-storage systems (aggregation).
One such group objective considered in this paper is the marketing of flexibility (charging or discharging) in the Day-ahead (DA) spot market, which can provide both a) financial incentives to the owners of such systems, and b) an increase in the overall absorption rates of renewable energy. The agent responsible for marketing and offering such flexibility, herein the aggregator, directly controls the participating batteries in exchange for financial compensation paid to the owners of these batteries.
In this paper, we present an optimization framework that allows the aggregator to optimally offer the available flexibility to the DA market. The proposed scheme is based upon a reinforcement-learning approach, according to which the aggregator learns over time an optimal policy for bidding flexibility in the DA market. By design, the proposed scheme is flexible enough to accommodate erroneous forecasts (of weather, load or electricity price). Finally, we evaluate our approach on real-world data collected from currently installed battery-storage systems in Upper Austria.
Introduction
A recent trend in home automation is the constant increase in the number of battery-storage systems (Kairies et al. 2016). So far, such storage systems have mainly been used to maximize the on-site absorption of PV generation. Given the current need to increase the percentage of renewable energy fed into the grid, the available charging/discharging flexibility potential can also be used to react to price variations in the Day-ahead (DA) or the Intra-day (ID) electricity spot market. The responsible party (or aggregator) could decide on the specific use of the participating storage units so as to participate optimally in the DA and ID spot markets, with the resulting benefits being transferred to the owners of the participating batteries.
In this paper, we focus on the problem of optimal participation of a set of residential battery-storage systems in the DA market. The underlying assumption is that the aggregator (of a pool of batteries) may directly control the operation of the participating batteries if required, so that any charging/discharging flexibility potential can be extracted in real time. The proposed scheme is based upon an approximate-dynamic-programming (or reinforcement-learning) methodology. According to this scheme, an approximation function of the performance is trained (using historical data) and then used to generate optimal bids/schedules for the DA market. By design, the proposed scheme is flexible enough to accommodate erroneous forecasts as well as the need for re-optimizing in real time upon receipt of corrected/updated forecasts.
It is worth noting that the proposed framework is part of the Flex+ project (Flex+). In this project, several component pools, each consisting of a single technology (among heat pumps, boilers, battery-storage systems and e-cars), are enabled to participate in the short-term electricity markets, whilst also considering the interests of the consumers (e.g., comfort, eco-friendliness, greediness, etc.).
The remainder of this paper is organized as follows. In “Related work and contributions” section, we present related work and the main contributions of this paper. “Participation costs and notation” section presents the notion of participation costs in the flexibility extraction program. “System dynamics” section presents a detailed formulation of the overall system dynamics, and “Optimal activation” section addresses the problem of (instantaneous) optimal activation of a set of batteries, given a desirable amount of flexibility extraction (charging/discharging). Both the system dynamics and the instantaneous optimal activation are essential for formulating the problem of optimal flexibility bidding in the DA market (briefly, DA-optimization), which is presented in “Day-ahead (DA) optimization” section. In “Approximate dynamic programming (ADP) for DA optimization” section, we propose a reinforcement-learning scheme specifically tailored to the DA-optimization, the efficiency of which is evaluated on real-world data. Finally, “Conclusions and future work” section presents concluding remarks.
Related work and contributions
With the constantly increasing renewable generation, users need to be flexible in adjusting their energy consumption, giving rise to demand-response mechanisms. Demand response refers to the ability of each user to respond to certain requests issued by the network operator. This is usually performed either in the form of a commitment of the consumer to reduce load during peak hours (Ruiz et al. 2009; Chen et al. 2014) or by introducing financial incentives that affect prices during peak hours (Herter 2007; Triki and Violi 2009; Xu et al. 2016). For example, a commitment-based approach has been proposed by Chen et al. (2014), in which the operator distributes portions of its desired aggregated demand among the household users, using an average consensus algorithm. In particular, each household receives a local demand objective which may only be fulfilled through the adjustment of its own flexible loads. On the other hand, an incentive-based approach has been proposed by Xu et al. (2016), where each participating household communicates to the operator a bidding curve, that is, a function that provides the load adjustment that the user is willing to perform at a given price. Then, the group operator computes the clearing prices, so that the overall cost of the participating households is minimized while achieving the desired demand adjustment. An approach similar to the one proposed in this paper, which also includes battery charging/discharging control, is considered in Nguyen et al. (2015). However, there is no specific centralized operator objective for either production or consumption. Instead, the objective in Nguyen et al. (2015) is to drive all instantaneous energy demand close to the average demand of the network.
Apart from these approaches (commitment-based or incentive-based), there is an alternative methodology which can be considered a combination of the two, and it is the one employed in this paper. According to this methodology, an aggregator directly extracts the required flexibility from the participating equipment when necessary. In return, the aggregator offers the owners of the participating equipment an agreed financial compensation. This methodology is usually referred to as demand-response aggregation (Parvania et al. 2013). It has been employed in Parvania et al. (2013), where aggregators can activate load reduction in a set of consumers according to an agreed demand-response strategy for each consumer. Similar in spirit is the work in Iria et al. (2017) and Nan et al. (2018), where an aggregator directly controls a set of different types of loads in residential buildings to reduce total electricity consumption. A feature that distinguishes demand-response aggregation is the self-scheduling or activation optimization problem, that is, the problem of optimally utilizing the available flexibility (stemming from several households) over a future time horizon. Such a feature (of multiple households) is not usually considered in the context of participation in a wholesale electricity market (see, e.g., (Gomez-Villalva and Ramos 2003; Philpott and Pettersen 2006)).
In the context of battery-storage systems, demand-response aggregation (as discussed in the previous paragraph) has not yet been addressed in an effective and computationally efficient way. In this context, the aggregator wishes to compute an optimal (day-ahead) schedule for extracting flexibility (charge, discharge or do nothing) for each one of the participating batteries. So far, this optimization problem has mostly been addressed for a single battery system, e.g., (Mohsenian-Rad 2016; He et al. 2016). Existing methodologies also include a detailed modeling of the battery as well as a detailed description of the cycle-life costs of the battery due to frequent charging/discharging (He et al. 2016). They may include computations of optimal bids for the day-ahead electricity market, as in Mohsenian-Rad (2016); He et al. (2016), or the intra-day/hour-ahead electricity market, as in Jiang and Powell (2015). In order to effectively address the uncertainty of the initial/final state-of-charge of the battery, reference (Jiang and Powell 2015) also employs an approximate dynamic programming formulation.
In this paper, participation in the day-ahead wholesale electricity market is implemented by directly controlling the battery-storage systems, as in Mohsenian-Rad (2016); He et al. (2016); Jiang and Powell (2015). The specific contributions and novelty of this paper can be summarized as follows:

In comparison to the work on demand-response aggregation, the aggregator derives specific optimal scheduling (or activation) strategies for the set of participating batteries over the duration of the next day. The specifications and cost functions of the participating batteries are directly included in the optimization (contrary to a generic load reduction strategy, e.g., (Parvania et al. 2013)).

In comparison to the work on day-ahead optimal bidding of flexibility available in battery-storage systems, we consider aggregators that can directly control multiple battery-storage systems simultaneously. This extends the prior work presented in Mohsenian-Rad (2016); He et al. (2016); Jiang and Powell (2015), which is restricted to a single battery.

In order to address the complexity of deriving optimal bidding schedules for multiple battery-storage systems simultaneously, we introduce a novel optimization method that hierarchically decomposes the problem into a two-level optimization problem: a) at the upper level, we compute the optimal (aggregate) flexibility (charging/discharging) that can be extracted from the set of available batteries for each 15-min interval over the duration of the next day; b) at the lower level, we compute the optimal activation strategy of the participating batteries in order to generate the overall flexibility determined by the upper level (a) for each 15-min interval. This decomposition significantly reduces the computational complexity of the problem, in contrast to Iria et al. (2017); Nan et al. (2018), where these two computations are treated simultaneously.

The computation of the flexibility bids/schedules is based upon an Approximate Dynamic Programming (ADP) methodology specifically tailored to this problem. An ADP approach can naturally incorporate uncertainties or erroneous forecasts, and it provides optimal strategies (instead of direct schedules). Thus, the schedules can be updated directly whenever updated forecasts become available. An ADP optimization approach also offers attractive computational efficiency, as discussed in detail in the forthcoming “Discussion” section.
Participation costs and notation
The participation costs are the costs experienced by the owners of the battery-storage systems due to the intervention of the aggregator through the DA flexibility schedules. We will consider a generic form of participation costs that can be used to model the discomfort of the participants. Discomfort can be expressed with respect to the weight that the battery owner puts on certain preferences, e.g., autarky (i.e., maintaining a high state-of-charge at all times), eco-friendliness (i.e., priority on charging the battery only with PV generation), greediness (i.e., always selling available PV or battery energy), etc.
The optimization horizon (next day) is divided into time intervals of duration ΔT. We enumerate the resulting time intervals (T in total) using index t=1,...,T. Let N be the total number of participating battery-storage systems and let i∈{1,...,N} be a representative element of this set. The parameter u_{i}(t)∈[−1,1] will denote the activation factor of battery i at time t. If u_{i}(t)≥0, then u_{i}(t) expresses the percentage of the charging flexibility potential that is activated. Analogously, if u_{i}(t)≤0, then |u_{i}(t)| expresses the percentage of the discharging flexibility potential that is activated. The quantities v_{c,i}(t)≥0, v_{d,i}(t)≤0 will denote the charging and discharging flexibility potential that is available in household i, respectively. We will also denote by v_{c}(t)=[v_{c,i}(t)]_{i} and v_{d}(t)=[v_{d,i}(t)]_{i} the vectors of the available positive and negative flexibility potential, respectively. Finally, let \(V_{d}(t)\doteq \mathbf {1}^{\mathrm {T}}\mathbf {v}_{d}(t)\leq {0}\) and \(V_{c}(t)\doteq \mathbf {1}^{\mathrm {T}}\mathbf {v}_{c}(t)\geq {0}\) be the total discharging and charging flexibility potential, respectively.
The users’ participation costs can be expressed through a set of bidding curves. In particular, we introduce the positive constants β_{c,i}(t), β_{d,i}(t), which represent the cost that the provider pays to the participating household per unit of charging and discharging potential retrieved at time interval t, respectively. For example, if the user values autarky highly, then we can select a large β_{d,i}(t) when the state-of-charge of the battery is high, and a small β_{c,i}(t) when the state-of-charge is low. When a charging activation of user i is decided at time interval t, i.e., u_{i}(t)>0, the quantity u_{i}(t)v_{c,i}(t)β_{c,i}(t)>0 represents the cost of activation. Analogously, in case of a discharging activation of user i at time interval t, i.e., u_{i}(t)<0, the quantity u_{i}(t)v_{d,i}(t)β_{d,i}(t)>0 represents the cost of activation.
We will denote by E_{i}(t) the energy charged to or discharged from household i. We will also adopt the convention that the energy is positive if it is charged to the battery and negative otherwise. In other words, if u_{i}(t)≥0, then E_{i}(t)=u_{i}(t)v_{c,i}(t)≥0 (energy is charged to household i), and if u_{i}(t)≤0, then E_{i}(t)=−u_{i}(t)v_{d,i}(t)≤0 (energy is discharged from household i). In several cases, we will denote by E(t) the total energy charged to or discharged from all participating households, i.e., \(E(t)\doteq \sum _{i=1}^{N}E_{i}(t)\).
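The sign conventions above are easy to get wrong, so a minimal Python sketch may help. It evaluates E_{i}(t) and the activation cost of a single battery from u_{i}(t), v_{c,i}(t), v_{d,i}(t), β_{c,i}(t) and β_{d,i}(t). The function names are hypothetical and chosen only for this illustration.

```python
def energy_and_cost(u, v_c, v_d, beta_c, beta_d):
    """Return (E_i, cost_i) for one battery in one time interval.

    Conventions taken from the text: u in [-1, 1], v_c >= 0, v_d <= 0,
    beta_c > 0, beta_d > 0. The function name is illustrative only.
    """
    if not -1.0 <= u <= 1.0:
        raise ValueError("activation factor must lie in [-1, 1]")
    if u >= 0:
        energy = u * v_c           # charging: E_i >= 0
        cost = u * v_c * beta_c    # cost of a charging activation
    else:
        energy = -u * v_d          # discharging: E_i <= 0
        cost = u * v_d * beta_d    # two negative factors, so cost >= 0
    return energy, cost


def total_energy(energies):
    """E(t) as the sum of the individual E_i(t)."""
    return sum(energies)
```

For instance, with u=−0.5 and v_{d,i}=−2, the sketch returns E_{i}=−1, i.e., half of the discharging potential is extracted, and the cost remains positive.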
System dynamics
The SOC_{i}(t) (state-of-charge) of the participating battery-storage systems, i=1,...,N, together with the exogenous parameters P_{PV,i}(t) (PV power generation) and P_{load,i}(t) (load consumption), and the activation factors (or control parameters) u_{i}(t), are sufficient to determine the evolution of the state-of-charge through time. We will often refer to this update mechanism as the system dynamics, which are also visualized in Fig. 1. We use the notation \(\Delta {P}_{i}(t) \doteq P_{\text {PV},i}(t)-P_{\text {load},i}(t)\) to denote the excess PV generation. Furthermore, for any quantity x, the notation “\(\hat {x}\)” will denote either an estimated or a forecast quantity.
The system dynamics comprises three main procedures (operations), namely:

Baseline computation, which refers to the computation of the power from/to the battery under normal operation (i.e., when no additional charging or discharging potential is activated by the aggregator). This procedure is described by Algorithm 1.

Flexibility potential computation, which computes the total amount of energy that can be charged to or discharged from the participating households. This procedure is described by Algorithm 2.

State-of-charge update, which computes the new state-of-charge (at the beginning of the next time interval t+1) when the energy commitment level u_{i}(t) has been assigned. This procedure is described by Algorithm 3.
Regarding the baseline computation, we assume here that the participating battery-storage systems adopt the following simple strategy: a) if ΔP_{i}(t)≥0 (i.e., PV power generation is larger than the load consumption), then any excess PV power generation is first used to charge the battery, and any additional power is fed into the grid; b) if ΔP_{i}(t)≤0 (i.e., the PV power is less than the load consumption), then the excess load consumption is first covered by the battery and, if this is not sufficient, additional power is withdrawn from the grid. We may argue that this criterion values autarky and eco-friendliness, since at times of energy excess the priority is to charge the battery, while at times of energy shortage the priority is to use the energy available in the battery. Alternative criteria may also be employed, e.g., an economic optimum, where the objective would be the maximization of the monetary utility of the participant.
Algorithm 3 defines the system dynamics update (i.e., the update of the SOC_{i}(t)) and uses the results of Algorithms 1 and 2 for the baseline and flexibility computation. In summary, given the current state-of-charge SOC_{i}(t) (at the beginning of time interval t) and the planned flexibility extraction u_{i}(t) for that interval, updating the state-of-charge is performed as follows:
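The baseline strategy described above can be sketched in a few lines of Python. This is not a transcription of Algorithm 1. It is a simplified illustration that assumes a state-of-charge expressed as a fraction in [0,1], a usable capacity in kWh, and no conversion losses or power limits. All parameter names are hypothetical.

```python
def baseline_step(soc, capacity_kwh, delta_p_kw, dt_h):
    """Split the excess PV energy of one interval between battery and grid.

    Assumptions (not from the paper): soc is a fraction in [0, 1],
    no efficiency losses, no charging/discharging power limits.
    Returns (to_battery, to_grid) in kWh; to_battery > 0 means charging,
    to_grid > 0 means feed-in and to_grid < 0 means withdrawal.
    """
    excess_kwh = delta_p_kw * dt_h
    if excess_kwh >= 0:
        # a) surplus: charge the battery first, feed the rest into the grid
        headroom = (1.0 - soc) * capacity_kwh
        to_battery = min(excess_kwh, headroom)
    else:
        # b) shortage: cover it from the battery first, then from the grid
        stored = soc * capacity_kwh
        to_battery = -min(-excess_kwh, stored)
    to_grid = excess_kwh - to_battery
    return to_battery, to_grid
```

With a 10 kWh battery at 10% state-of-charge and a shortage of 2 kWh in the interval, the sketch discharges the stored 1 kWh and withdraws the remaining 1 kWh from the grid.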
Since we will be addressing a DA-optimization, we will particularly be interested in updating the total flexibility potential, which leads to a different ordering of the above sequence of equations. In particular, given v_{d,i}(t), v_{c,i}(t), SOC_{i}(t) and u_{i}(t), we have:
In this case, we will refer to the overall available potentials {V_{d}(t),V_{c}(t)} as the state variables of the overall process and u(t)=[u_{i}(t)]_{i} as the control variables. The terms \(\Delta {P}(t)\doteq [\Delta {P}_{i}(t)]_{i}\), which capture the excess power available in all households, are considered exogenous parameters or disturbances. To summarize, we can write the state dynamics of Eq. 2 more compactly as:
from which we can derive an aggregate flexibility update recursion of the form:
which updates the aggregate flexibility potential.
Optimal activation
In this section, we address the problem of optimally activating some given flexibility (positive or negative), E(t), at a time interval t∈{1,...,T}. That is, we wish to compute the set of batteries that should be activated to provide total energy E(t), as well as the amount of flexibility extracted from each activated battery. This optimization problem is solved centrally by an aggregator which tries to minimize the overall activation or participation cost (defined in “Participation costs and notation” section through the positive cost factors β_{c,i},β_{d,i}). The solution of this optimization problem is the basis of the upcoming day-ahead market optimization, and it has been presented in detail in Chasparis et al. (2019). It is evident that any flexibility E(t) extracted from the pool of batteries, at any given time interval t, should satisfy: V_{d}(t)≤E(t)≤V_{c}(t).
Forecast flexibility potential
The optimal activation problem is driven by our forecast energy potential for the next day. It is evident that we do not have knowledge of the actual flexibility potential during the time interval t of the next day, since this depends on the actual PV generation P_{PV,i}(t) and the actual load consumption P_{load,i}(t). For this reason, any optimization for computing the optimal activation for time interval t may only be based upon the forecast (or predicted) flexibility potential, \(\hat {v}_{c,i}(t)\) and \(\hat {v}_{d,i}(t)\). Analogously, we define the overall forecast charging and discharging flexibility potential as \(\hat {V}_{c}(t)\) and \(\hat {V}_{d}(t)\), respectively. Thus, the above activation constraint should be replaced by \(\hat {V}_{d}(t) \leq E(t) \leq \hat {V}_{c}(t).\)
Activation optimization
For each battery i=1,...,N, we wish to compute the activation factor u_{i}(t)∈[−1,1] for time interval t. This computation is provided by a function of the form \(\mathcal {O}_{\text {act},t}(E(t),\hat {\mathbf {v}}_{d}(t),\hat {\mathbf {v}}_{c}(t))\), which accepts as inputs the desired total activation E(t) and the available forecast flexibility potential of the participating batteries. The output of this function will be the optimal activation factors \(\{u^{*}_{i}(t)\}_{i}\) for each one of the participating batteries i=1,...,N at time interval t. In other words, the optimal activation optimization is summarized by \(\mathcal {O}_{\text {act},t}:[\hat {V}_{d}(t),\hat {V}_{c}(t)]\times \mathbb {R}_{-}\times \mathbb {R}_{+} \mapsto [-1,1]^{N}\), such that:
where u^{∗} denotes the optimal activation. At the optimal activation of a positive flexibility (E(t)>0), we should expect that u^{∗}≥0 (elementwise). In other words, given that a positive activation has a strictly positive activation cost, a positive flexibility can only be optimally extracted through positive activations. Analogously, when E(t)<0, then u^{∗}≤0 (elementwise). To this end, we can decompose the problem of optimal activation as follows:
In particular, let us first consider the case of a negative (or discharging) desirable energy activation, i.e., E<0. In this case, the optimal activation is the solution to the following optimization problem:
Analogously, for the case of positive (or charging) desirable activation E>0, the optimization takes on the following form:
Optimal activation algorithm
Reference (Chasparis et al. 2019) provides an algorithm for computing the optimal activation (cf., (Chasparis et al. 2019, Algorithm 1)), which takes the form of a merit-order optimization. We will present here only the case of discharging activation, i.e., when E(t)<0, since the charging activation is similar.
We first order the set of participating batteries in ascending order with respect to the cost coefficient β_{d,i}(t). In other words, we order the batteries as follows: β_{d,1}(t)≤β_{d,2}(t)≤…≤β_{d,N}(t). Define also the function \(\kappa _{d}:[\hat {V}_{d}(t),\hat {V}_{c}(t)]\times \mathbb {R}_{-}^{N}\mapsto \mathbb {N}\), such that κ_{d}+1 corresponds to the minimum number of batteries that should be activated in order to generate a total discharging flexibility of E(t)<0. In other words,
It is straightforward to show (using strong duality) that the optimal activation is given by Algorithm 4.
Proposition 1
(Proposition 7.2 in Chasparis et al. (2019)) The activation u^{∗} computed by Algorithm 4 is an optimal solution to optimization (7).
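To make the merit-order idea concrete, the following Python sketch handles the discharging case. It is an illustration written under the conventions of “Participation costs and notation” section, not a reproduction of Algorithm 4: batteries are sorted by β_{d,i}, the cheapest ones are fully activated, and at most one battery is partially activated to match E exactly. The function name and the numerical tolerance are assumptions.

```python
def merit_order_discharge(E, v_d, beta_d):
    """Activation factors u_i in [-1, 0] that deliver a total discharge E <= 0.

    v_d: discharging potentials (each <= 0); beta_d: positive cost factors.
    Illustrative sketch only; the tolerance 1e-12 is an arbitrary choice.
    """
    if E > 0 or E < sum(v_d) - 1e-12:
        raise ValueError("E must satisfy V_d <= E <= 0")
    u = [0.0] * len(v_d)
    remaining = -E                       # magnitude still to be discharged
    # visit batteries from the cheapest to the most expensive
    for i in sorted(range(len(v_d)), key=lambda k: beta_d[k]):
        if remaining <= 0:
            break
        available = -v_d[i]              # magnitude this battery can offer
        if available <= 0:
            continue
        take = min(available, remaining)
        u[i] = -take / available         # -1 means fully activated
        remaining -= take
    return u
```

For E=−3 with three batteries of potential −2 each and cost factors (0.3, 0.1, 0.2), the sketch fully activates the second battery and half of the third, leaving the most expensive one idle.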
Day-ahead (DA) optimization
Optimization under perfect forecasts
In this section, we present the optimization of flexibility extraction {E(t)}_{t} (positive or negative), during the next day t=1,...,T, that would be optimal for the aggregator to provide to the DA market. The aggregator would like to exploit the variations in the DA price over the next day in order to increase its revenues.
In this subsection, we will assume that the aggregator has perfect knowledge of the dynamics of the day-ahead flexibility potential of the participating batteries, i.e., all the required forecasts are assumed perfectly known. In this case, the optimization problem for computing the optimal activations, given some exogenous sequence of the clearing DA prices, denoted by {p_{DA}(t)}_{t}, will be of the following form:
where,
represents the utility of the aggregator when it provides flexibility E(t) (positive/negative) at time t, where \(\mathbb {I}_{A}:\mathbb {R}\mapsto \{0,1\}\) denotes the indicator function of a set \(A\subseteq \mathbb {R}\), i.e.,
Note that, when E(t)>0, the aggregator is charged with −p_{DA}(t)E(t) (since this energy is purchased from the DA market), and when E(t)<0, the aggregator is credited with −p_{DA}(t)E(t) (since this energy is sold to the DA market). The variable u^{∗}(t) is a solution to the activation optimization \(\mathcal {O}_{\text {act},t}\) presented in “Optimal activation” section, i.e., it is a function of the overall offered flexibility E(t). In other words, in order to evaluate the utility of an offered flexibility E(t), we should know the corresponding optimal activation (i.e., which batteries should contribute to generate E(t)). We write u^{∗}(t) instead of u^{∗}(E(t),v_{d}(t),v_{c}(t)) to simplify notation.
Due to the equilibrium constraints that should be satisfied by u^{∗} summarized in the constraint \(\mathbf {u}^{*}(t) \in \mathcal {O}_{\text {act},t}(E(t),\mathbf {v}_{d}(t),\mathbf {v}_{c}(t))\), optimization \(\mathcal {O}_{\text {DA}}\) corresponds to a mathematical program with equilibrium constraints (MPEC). The outcome of the DA optimization will be a sequence of optimal flexibilities E^{∗}(t) over the next day that the aggregator should commit to offer to the DA market.
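As a rough illustration of how the stage utility combines the market term with the participation costs, consider the Python sketch below. It assumes that the utility equals the market term −p_{DA}(t)E(t) minus the activation costs of “Participation costs and notation” section, with the charging costs applying when E(t)>0 and the discharging costs when E(t)<0. This form is an assumption inferred from the surrounding description, and the function name is hypothetical.

```python
def aggregator_utility(E, p_da, u_star, v_c, v_d, beta_c, beta_d):
    """Assumed stage utility: market term minus activation costs.

    E: offered flexibility; p_da: DA price; u_star: activation factors
    that realize E. The exact expression in the paper may differ.
    """
    market = -p_da * E    # E > 0: purchase (debit); E < 0: sale (credit)
    if E > 0:
        activation = sum(u * vc * bc
                         for u, vc, bc in zip(u_star, v_c, beta_c))
    elif E < 0:
        activation = sum(u * vd * bd
                         for u, vd, bd in zip(u_star, v_d, beta_d))
    else:
        activation = 0.0
    return market - activation
```

The sketch makes explicit why the utility of an offer E(t) cannot be evaluated without the corresponding activation u^{∗}(t): the second term depends on which batteries are used.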
Optimization under imperfect forecasts
In the presence of imperfect forecasts, the aggregator can still address the optimization problem (10) using forecasts of the flexibility potential, \(\hat {\mathbf {v}}_{d}(t)\) and \(\hat {\mathbf {v}}_{c}(t)\), and the DA price \(\hat {p}_{\text {DA}}(t)\). However, possible discrepancies between actual and estimated quantities will lead to imbalances between the promised commitment E^{∗}(t) (calculated using forecasts) and the available one. This can lead to imbalance costs.
Assuming that the imbalance price is p_{imb}(t) at time interval t, the actual objective function that the aggregator is facing when implementing E^{∗}(t) is instead:
where we recall that u^{∗}(t) is the optimal activation under E^{∗}(t). In other words, the above utility function is the one realized by the aggregator after the completion of the optimization horizon, when the actual measured data (v_{d}(t), v_{c}(t) and p_{DA}(t)) are revealed. The last two terms of the actual utility function correspond to the penalty that the aggregator pays due to the resulting energy imbalance.
Ideally, we would like the actual utility of Eq. (11) to replace the perfect-forecast utility of optimization (10), so that we also incorporate the expected costs of faulty forecasts. However, in order to incorporate these costs, an accurate distribution of the forecast errors must be available. Usually, such a distribution of errors is not available. For this reason, in the following section, we propose a reinforcement-learning methodology, where, due to an averaging effect, the trained policy will incorporate the possibility of forecast errors.
Approximate dynamic programming (ADP) for DA optimization
ADP background and algorithm
We will consider a version of ADP that can be used to compute optimal policies for dynamic optimization problems of the form (10). The proposed scheme is based upon Monte-Carlo simulations and least-squares approximation.
ADP methodologies are based on the notion of cost-to-go, or more appropriately here, utility-to-go. That is, for each time interval t∈{1,2,...,T}, we consider the objective function of the subproblem starting from t onwards (until the end of the optimization horizon T). The utility-to-go at time t will be denoted by \(J_{t}^{\mu _{t}}\), defined as \(J_{t}^{\mu _{t}}\doteq \sum _{\tau =t}^{T}\tilde {g}_{\tau }\), which is a function of the current state variables V_{d}(t), V_{c}(t) and the exogenously defined DA prices \(\{p_{\text {DA}}(\tau)\}_{\tau =1}^{T}\) during the optimization horizon. In other words, \(J_{t}^{\mu _{t}}=J_{t}^{\mu _{t}}\left (E(t),V_{d}(t),V_{c}(t),\{p_{\text {DA}}(\tau)\}_{\tau =1}^{T}\right)\). The superscript μ_{t} refers to the policy implemented at time t, which captures the reasoning based on which actions are selected as a function of the currently available information. The optimal activation u^{∗}(t) of the selected flexibility E(t) will not be directly included as a parameter of \(J_{t}^{\mu _{t}}\), but it is indirectly taken into account when E(t) is implemented. The utility-to-go \(J_{t}^{\mu _{t}}\) also depends on the forecast PV generation and load consumption of the remaining optimization horizon; however, we suppress this dependence to simplify notation.
For finite-horizon dynamic programming problems (of the form (10)), Bellman’s Dynamic Programming principle (cf., (Bertsekas 2000, Proposition 1.3.1)) states that a policy \(\mu _{t}^{*}\) that maximizes \(J_{t}^{\mu _{t}}\) at time t is also an optimal policy for the original optimization problem (10) at time t. However, as discussed in “Optimization under imperfect forecasts” section, due to forecast errors, the actual realized utility is not g but \(\tilde {g}\) (possibly incorporating imbalance costs). Thus, the explicit form of \(J_{t}^{\mu _{t}}\) is not known a priori, and therefore an exact solution (i.e., a closed-form expression for the optimal policy μ_{t}) cannot be computed in practice. To this end, we introduce instead approximation functions of the form \(Q_{t}(E(t),\hat {V}_{d}(t),\hat {V}_{c}(t),\{\hat {p}_{\text {DA}}(\tau)\}_{\tau =1}^{T})\) (so-called Q-factors) that approximate the utility-to-go function \(J_{t}^{\mu _{t}}\) based on the available forecasts. A different Q-factor applies to each time interval t, and it can be used to generate an approximate policy \(\hat {\mu }_{t}\) for this interval. In particular, the approximate policy \(\hat {\mu }_{t}\) can be computed as:
which is a function of the available information at time t, \(\hat {V}_{d}(t),\hat {V}_{c}(t),\{\hat {p}_{\text {DA}}(\tau)\}_{\tau =1}^{T}\).
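Assuming that the approximate policy selects the flexibility that maximizes the Q-factor over the feasible interval \([\hat {V}_{d}(t),\hat {V}_{c}(t)]\), it can be evaluated numerically with a simple grid search, as in the Python sketch below. The grid resolution and the callable `q_factor` (a function of E with all other arguments fixed) are assumptions made for this illustration; a closed-form maximizer may exist for specific Q-factor shapes.

```python
def greedy_flexibility(q_factor, v_d_hat, v_c_hat, n_grid=201):
    """Return the grid point E in [v_d_hat, v_c_hat] that maximizes q_factor.

    q_factor: callable E -> approximate utility-to-go at the current
    interval (forecasts and prices are assumed to be bound inside it).
    """
    if n_grid < 2 or v_d_hat > v_c_hat:
        raise ValueError("need n_grid >= 2 and v_d_hat <= v_c_hat")
    best_E, best_q = v_d_hat, float("-inf")
    for k in range(n_grid):
        # evenly spaced candidates, including both end points
        E = v_d_hat + (v_c_hat - v_d_hat) * k / (n_grid - 1)
        q = q_factor(E)
        if q > best_q:
            best_E, best_q = E, q
    return best_E
```

Because the search is restricted to the forecast potentials, the returned commitment always respects the activation constraint of “Forecast flexibility potential” section.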
A common approach used to compute the Q-factors is Monte-Carlo simulation, which proceeds as follows. Prior to any day-ahead optimization, we first generate a time-series forecast (sample path) for the overall PV generation, the overall load consumption and the DA price. Then, for each time interval t of this sample path, we compute the optimal flexibility commitment E^{∗}(t) using the approximate policy of Eq. (12). Then, at the end of the optimization horizon, we evaluate the performance of the approximate policy \(\hat {\mu }_{t}\) by computing the actual utility-to-go \(J_{t}^{\hat {\mu }_{t}}\) using the actual measurements. By utilizing the approximation error \(J_{t}^{\hat {\mu }_{t}}-Q_{t}\), we can improve our approximation functions Q_{t}.
The specific form of the Q-factors considered here is the following:
for \(E \in [\hat {V}_{d}(t),\hat {V}_{c}(t)]\), where
is the average DA price in the future time intervals, and
is the average DA price in the previous time intervals. Furthermore, for some real number x, we denote \([x]_{+}\doteq \max \{x,0\}\) and \([x]_{}\doteq \min \{x,0\}\). Finally, the constants {α_{0,t},α_{1,t},...,α_{6,t}} for each interval t are the unknown parameters. We employ also the constraint α_{1,t},...,α_{6,t}≥0.
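The auxiliary quantities entering the Q-factors can be computed as in the following Python sketch. The operators \([x]_{+}\) and \([x]_{-}\) follow the definitions above. For the two price averages, the sketch assumes that "future" means the intervals strictly after t and "previous" means the intervals strictly before t, and it falls back to the current price when the corresponding window is empty. Both choices are assumptions, as are the function names and the 0-based indexing.

```python
def pos_part(x):
    """[x]_+ = max{x, 0}."""
    return max(x, 0.0)


def neg_part(x):
    """[x]_- = min{x, 0}."""
    return min(x, 0.0)


def mean_future_price(prices, t):
    """Average forecast DA price over the intervals after index t (0-based).

    Assumption: returns the current price if no future interval remains.
    """
    future = prices[t + 1:]
    return sum(future) / len(future) if future else prices[t]


def mean_past_price(prices, t):
    """Average DA price over the intervals before index t (0-based).

    Assumption: returns the current price if no past interval exists.
    """
    past = prices[:t]
    return sum(past) / len(past) if past else prices[t]
```

Comparing the current price with these two averages is what allows the Q-factors to anticipate whether charging or discharging now is likely to pay off later in the day.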
The Qfactors of Eq. (13) capture the anticipated utility/losses in the remaining optimization steps. In particular, the first part (multiplying α_{1,t}) captures the anticipated revenues when purchasing electricity (i.e., E>0), due to a future increase in the DA price. The fourth part (multiplying α_{4,t}) captures the anticipated revenues when selling electricity (i.e., E<0) due to a current increase in the DA price. The remaining parts intend to minimize different forms of anticipated opportunity costs, which may or may not eventually be part of the actual utilitytogo \(J_{t}^{\hat {\mu }_{t}}\). For example, the sixth term (multiplying α_{6,t}) captures the anticipated revenues of discharging the battery when anticipating a drop in the future electricity price (thus, giving the opportunity to charge the battery at later stages).
As discussed in detail in (Bertsekas and Tsitsiklis (1996), Section 6.2.1), an issue critical to the success of this method is whether the assumed form of the Qfactors is rich enough to capture the actual utilitytogo. Furthermore, the simulations should generate sample paths that are persistently exciting is order for the (leastsquares) approximation to converge (cf., (Ljung 1999)). As we will see in the forthcoming experimental evaluation, the considered form (13) will be sufficient to generate accurate approximations.
The details of the ADP algorithm implemented for the DA optimization are presented in Algorithm 5. It consists of two main processes, namely a) forward simulation (Step 3), and b) backward evaluation (Step 4). In the forward simulation, for each sample path \(s\in \mathcal {S}\) (i.e., each simulation day), we simulate the action-selection process (according to Step 3a) using the current approximation factors \(\{Q^{(s)}_{t}\}_{t}\). In the backward evaluation, starting from the last interval of each sample day, we evaluate the utility-to-go performance of the offered flexibility on the actual/realized data. This evaluation can then be used to better approximate the Q-factors (Step 4c). We expect that, as the number of tested sample paths increases, the approximation error of the utility-to-go approaches zero.
After the Q-factors have been trained as described in Algorithm 5, we can employ the trained factors at the beginning of each day to compute the optimal flexibility bids, as described in Step 3 of Algorithm 5.
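The two processes can be sketched for a single sample path as follows. This is a toy illustration under strong assumptions and not a transcription of Algorithm 5: the feature map is hypothetical and is not Eq. (13), the stage utility is assumed to be -p·E (paying when charging, earning when discharging), the feasible interval [V_d(t), V_c(t)] is passed in as given rather than derived from battery dynamics, and the commitment is chosen by a grid search.

```python
import numpy as np

def features(E, p_now, p_future_avg):
    """Hypothetical features (NOT Eq. (13)): bias, buy-before-rise, sell-before-drop."""
    buy = max(E, 0.0) * max(p_future_avg - p_now, 0.0)
    sell = max(-E, 0.0) * max(p_now - p_future_avg, 0.0)
    return np.array([1.0, buy, sell])

def forward_simulation(theta, p_forecast, v_d, v_c, n_grid=11):
    """Per interval, pick the commitment on a grid that maximizes the linear Q-factor."""
    p_forecast = np.asarray(p_forecast, dtype=float)
    T = len(p_forecast)
    E = np.zeros(T)
    for t in range(T):
        # Average forecast price over the remaining intervals; for the last
        # interval there is no future, so the current price is used (assumption).
        p_future = p_forecast[t + 1:].mean() if t + 1 < T else p_forecast[t]
        grid = np.linspace(v_d[t], v_c[t], n_grid)
        q = [theta[t] @ features(e, p_forecast[t], p_future) for e in grid]
        E[t] = grid[int(np.argmax(q))]  # ties resolve to the lowest grid point
    return E

def backward_evaluation(E, p_actual):
    """Utility-to-go on realized prices, assuming stage utility -p * E."""
    stage = -np.asarray(p_actual, dtype=float) * E
    return np.cumsum(stage[::-1])[::-1]

# Hypothetical 4-interval day with symmetric bounds and unit parameters.
T = 4
theta = np.ones((T, 3))
p_forecast = [30.0, 50.0, 20.0, 40.0]
p_actual = [32.0, 48.0, 22.0, 41.0]
v_d, v_c = -np.ones(T), np.ones(T)

E = forward_simulation(theta, p_forecast, v_d, v_c)
J = backward_evaluation(E, p_actual)
```

In a full training loop, the difference between J and the Q-factor values at the selected commitments would feed the parameter update for each interval.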
Discussion
The reasons for considering the proposed hierarchical and ADP-based optimization are the following: i) the original problem formulation of Eq. (10) constitutes a dynamic-programming (DP) optimization problem (due to the battery dynamics); ii) the original problem is subject to equilibrium constraints (for each quantity of extracted flexibility there is an optimal activation); iii) the original problem involves integer (activation) variables; iv) an exact DP solution to the combined (integer) optimization of optimal bidding and optimal activation of the batteries is practically infeasible; v) by separating the decisions over the optimal energy commitment (Step 3a of Algorithm 5) from the (precomputed) optimal activation (Step 3b of Algorithm 5), the overall scheme avoids the computational complexity of the combined scheduling problem; vi) the ADP optimization provides strategies (rather than explicit schedules), thus the solution can be updated whenever updated forecasts become available; vii) the learning-based nature of the algorithm allows the effects of imperfect forecasts to be captured through averaging.
Experimental evaluation
Algorithm 5 was implemented on real-world data collected from N=30 battery-storage systems located in the state of Upper Austria, over a period of approximately one year. The Q-factors were trained on approximately 4000 simulation days (each recorded day was used more than once to improve the training performance of the implemented recursive-least-squares filter). For these evaluations, we considered ΔT=15 min, which implies that each day consists of T=96 time intervals. The available data include recordings of PV power generation and load consumption.
At the beginning of each simulation day, the trained Q-factors are used to derive the optimal flexibility commitments over the upcoming day using forecast values of the unknown data (\(\widehat {\Delta {P}}\) and \(\hat {p}_{\text {DA}}\)). At the end of the upcoming day, when the actual data ΔP and p_{DA} have been revealed, the performance of these flexibility commitments is evaluated and the approximation errors over t=1,...,T are used to train the Q-factors. The approximation error of the utility-to-go function during training is depicted in Fig. 2, where we see that the considered Q_{1}-factors (of the first time interval) can quite accurately approximate the actual utility-to-go J_{1}.
When the training of the Q-factors has been completed, we evaluate the trained factors on a test data set of sample paths (days) that have not been used during training. The outcomes of these test evaluations are depicted in Fig. 3. Finally, Fig. 4 demonstrates the real-time implementation of the computed optimal flexibility over the next day. Note that discrepancies between the offered flexibility and the actual flexibility potential can still be observed due to forecast inaccuracies.
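For readers unfamiliar with recursive least squares, a minimal sketch of the standard update for a linear-in-parameters model is given below. The class name, the initialization value p0 and the toy data are assumptions for illustration; the non-negativity constraint on the parameters is omitted, and the filter settings actually used in the experiments are not stated in the text. The example also shows a small data set being passed through the filter several times, analogous to reusing simulation days.

```python
import numpy as np

class RecursiveLeastSquares:
    """Standard RLS filter for y ~ theta^T phi (illustrative, unconstrained)."""

    def __init__(self, n_params, forgetting=1.0, p0=1e3):
        self.theta = np.zeros(n_params)   # parameter estimate
        self.P = p0 * np.eye(n_params)    # scaled inverse-correlation matrix
        self.lam = forgetting             # forgetting factor (1.0 = none)

    def update(self, phi, target):
        phi = np.asarray(phi, dtype=float)
        error = target - self.theta @ phi                  # a priori error
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, phi @ self.P)) / self.lam
        return error

# Toy data generated by y = 2 + 3 x, presented in three passes.
rls = RecursiveLeastSquares(n_params=2)
samples = [(0.0, 2.0), (1.0, 5.0), (2.0, 8.0), (3.0, 11.0)]
for _ in range(3):
    for x, y in samples:
        rls.update([1.0, x], y)
```

After these passes, rls.theta is close to (2, 3). The large initial P makes the filter behave like an only lightly regularized least-squares fit.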
Conclusions and future work
This work presented an optimization framework for optimal participation of an aggregator in the DA spot market through the direct control of a set of residential battery-storage systems. The aggregator optimizes the amount of flexibility (the energy that can be charged/discharged in the participating batteries) that can be offered to the DA market during the next day. The optimization is based upon forecast PV generation and load consumption of the participating households over the next day. Given the expected forecast errors, as well as the complexity of the involved optimization, we proposed a reinforcement-learning methodology that trains over time to adapt to forecast inaccuracies and provides an optimal strategy for each time interval of the next day. Given that the outcome of the optimization is a strategy (rather than a specific schedule), we can immediately readjust the optimal schedules (e.g., in case of updated forecasts) without the need for a full re-optimization.
References
Bertsekas, D (2000) Dynamic Programming and Optimal Control. 2nd edn. Athena Scientific, Belmont.
Bertsekas, D, Tsitsiklis J (1996) Neuro-Dynamic Programming. Athena Scientific, Belmont.
Chasparis, G, Pichler M, Natschläger T (2019) A Demand-Response Framework in Balance Groups through Direct Battery-Storage Control. In: 18th European Control Conference, 1392–1397, Napoli.
Chen, C, Wang J, Kishore S (2014) A Distributed Direct Load Control Approach for Large-Scale Residential Demand Response. IEEE Trans Power Syst 29(5):2219–2228.
Flex+. https://www.flexplus.at. https://doi.org/10.1109/TPWRS.2009.2016607.
Gomez-Villalva, E, Ramos A (2003) Optimal energy management of an industrial consumer in liberalized markets. IEEE Trans Power Syst 18(2):716–723. https://doi.org/10.1109/TPWRS.2003.811197.
Herter, K (2007) Residential implementation of critical-peak pricing of electricity. Energy Policy 35(4):2121–2130.
He, G, Chen Q, Kang C, Pinson P, Xia Q (2016) Optimal Bidding Strategy of Battery Storage in Power Markets Considering Performance-Based Regulation and Battery Cycle Life. IEEE Trans Smart Grid 7(5):2359–2367. https://doi.org/10.1109/TSG.2015.2424314.
Iria, JP, Soares FJ, Matos MA (2017) Trading small prosumers flexibility in the day-ahead energy market. In: 2017 IEEE Power & Energy Society General Meeting, 1–5. IEEE, Chicago. https://doi.org/10.1109/PESGM.2017.8274488.
Jiang, DR, Powell WB (2015) Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage Using Approximate Dynamic Programming. INFORMS J Comput 27(3):525–543. https://doi.org/10.1287/ijoc.2015.0640.
Kairies, K, Haberschusz D, Ouwerkerk J, Strebel J, Wessels O, Magnor D, Badeda J, Sauer U (2016) Wissenschaftliches Mess- und Evaluierungsprogramm Solarstromspeicher 2.0. Jahresbericht. Technical Report, Institut für Stromrichtertechnik und Elektrische Antriebe, RWTH Aachen.
Ljung, L (1999) System Identification: Theory for the User. 2nd edn. Prentice Hall Ptr, Upper Saddle River.
Mohsenian-Rad, H (2016) Optimal Bidding, Scheduling, and Deployment of Battery Systems in California Day-Ahead Energy Market. IEEE Trans Power Syst 31(1):442–453. https://doi.org/10.1109/TPWRS.2015.2394355.
Nan, S, Zhou M, Li G (2018) Optimal residential community demand response scheduling in smart grid. Appl Energy 210:1280–1289. https://doi.org/10.1016/j.apenergy.2017.06.066.
Nguyen, HK, Song JB, Han Z (2015) Distributed Demand Side Management with Energy Storage in Smart Grid. IEEE Trans Parallel Distrib Syst 26(12):3346–3357. https://doi.org/10.1109/TPDS.2014.2372781.
Parvania, M, Fotuhi-Firuzabad M, Shahidehpour M (2013) Optimal Demand Response Aggregation in Wholesale Electricity Markets. IEEE Trans Smart Grid 4(4):1957–1965. https://doi.org/10.1109/TSG.2013.2257894.
Philpott, AB, Pettersen E (2006) Optimizing Demand-Side Bids in Day-Ahead Electricity Markets. IEEE Trans Power Syst 21(2):488–498. https://doi.org/10.1109/TPWRS.2006.873119.
Ruiz, N, Cobelo I, Oyarzabal J (2009) A Direct Load Control Model for Virtual Power Plant Management. IEEE Trans Power Syst 24(2):959–966.
Sayed, A (2003) Fundamentals of Adaptive Filtering. John Wiley & Sons, Inc., New Jersey.
Triki, C, Violi A (2009) Dynamic pricing of electricity in retail markets. 4OR 7(1):21–36.
Xu, Y, Li N, Low SH (2016) Demand Response With Capacity Constrained Supply Function Bidding. IEEE Trans Power Syst 31(2):1377–1394.
About this supplement
This article has been published as part of Energy Informatics Volume 2 Supplement 1, 2019: Proceedings of the 8th DACH+ Conference on Energy Informatics. The full contents of the supplement are available online at https://energyinformatics.springeropen.com/articles/supplements/volume-2-supplement-1.
Funding
This work has been supported by the Austrian Research Agency FFG through the research project Flex+ (FFG # 864996). It has also been partially supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH. Publication of this supplement was funded by the Austrian Federal Ministry for Transport, Innovation and Technology.
Availability of data and materials
The data used in this paper have been extracted from 30 battery-storage systems located in the state of Upper Austria. Due to privacy issues and non-disclosure agreements, these data are not publicly available.
Author information
Contributions
GC and MP have designed and simulated the ADP-based optimization framework for the optimal activation and bidding of flexibility potential in the Day-ahead electricity market. JS and TE have provided feedback and data with respect to the operation of the Day-ahead electricity market and the participation of pools of flexible loads. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chasparis, G.C., Pichler, M., Spreitzhofer, J. et al. A cooperative demand-response framework for day-ahead optimization in battery pools. Energy Inform 2 (Suppl 1), 29 (2019). https://doi.org/10.1186/s42162-019-0087-x
Keywords
Demand-response
Aggregation
Reinforcement learning
Day-ahead spot market