1 Introduction

1.1 Problem statement

Traditionally, trading wind energy has relied on a two-step approach. It starts with the predictive modeling of future energy generation (within either a deterministic or probabilistic framework). Such forecasts are subsequently used as input to expected utility maximization strategies or, alternatively, to more general forms of optimization problems, e.g., within a stochastic framework accommodating risk aversion. Although fruitful, these methodologies may be computationally expensive. As a representative recent example, for a scenario-based stochastic optimization setup to offer in electricity markets, Kraft et al (2023) mention that computational costs may reach 3 h for a single trading instance. In addition, the value of the final decisions is highly affected by the quality of the forecasts employed. This fact was examined for the general case of newsvendor problems (which are the type of stochastic optimization problems at hand here) by Maggioni et al (2019), while a detailed investigation of the impact of forecast quality on optimization in electricity markets (though not exactly for market participation problems) was given by Ordoudis and Pinson (2016). As a consequence, it may be beneficial to integrate the forecasting and decision-making steps within a so-called prescriptive analytics framework (Bertsimas and Kallus 2019). In parallel, electricity markets are amid rapid transformations towards increased granularity and shorter lead times, facilitating the integration of non-dispatchable energy sources but increasing the computational and adaptability requirements of the offering algorithms.

In a data-rich and nonstationary environment, approaches relying on online learning and online convex optimization are of direct relevance. For a comprehensive introduction to these topics, the reader is referred to Shalev-Shwartz et al (2012). On the one hand, online learning algorithms free the decision-maker from most assumptions about the wind or market dynamics, since they do not require specific probabilistic forecasts or models of such dynamics. This is more generally the case for a broad range of prescriptive analytics approaches that bypass the forecasting step. On the other hand, online learning algorithms are typically efficient methods capable of meeting increasing computational needs (as will be illustrated by the numerical case study in this paper). Furthermore, online learning analysis is based on regret, as opposed to the classical maximization of expected utility, possibly allowing one to derive additional insights into the properties of trading strategies.

1.2 Status quo with trading wind energy and underlying newsvendor problems

Most wind energy is traded in wholesale electricity markets (referred to as forward markets in this paper), where an offer is submitted prior to the actual delivery of energy. However, the stochastic nature of wind energy entails incurring deviations from the original offer. There are countless ways of approaching this problem depending on the market structure and how uncertainty is accommodated; it is therefore infeasible to fully cover such a vast literature, but we provide an overview in the following. As a starting point, and since there is no single authoritative review covering renewable energy offering in electricity markets, we refer the reader to Morales et al (2014), where the authors study different market variants and strategies within a classical stochastic programming framework, as well as Conejo et al (2010), which introduces general concepts of decision-making under uncertainty in electricity markets. We deal, in particular, with markets with a dual-price settlement for imbalances, under which there is no possibility of benefiting from a deviation and where imbalance penalties are asymmetric.

Early works in this area proposed an optimal quantile strategy based on probabilistic forecasts of wind energy production (Bremnes 2004). Specifically, Pinson et al (2007) showed that, in its simplest version of a risk-neutral wind farm without any other assets (e.g., storage, conventional generation), the offering problem necessarily takes the form of a newsvendor problem. Various generalizations were explored by others. Zugno et al (2013a) proposed constraining the offer in both power and probability spaces in order to accommodate risk aversion and behavioral aspects of trading (e.g., anchoring effects towards traditional single-valued forecasts). In parallel, Mazzi and Pinson (2016) devised and tested a reinforcement learning algorithm to track the optimal quantile in a nonstationary environment. Dent et al (2011) revisited the problem by accounting for the possibility of a population-based price-making behavior. For more complex versions of the offering problem, one can resort to a stochastic programming setup (Morales et al 2010), for instance, owing to inter-temporal constraints or risk aversion. For market offering problems where renewable energy producers are not price-takers (i.e., their decisions can then affect market outcomes), Baringo and Conejo (2013), as well as Zugno et al (2013b), have proposed approaches based on bilevel optimization. Recently, Kakhbod et al (2021) have investigated the population effect of renewable energy producers and how it affects their offering strategies. Even though these varied approaches explore alternative angles to generalizing the underlying newsvendor problems in wind energy offering, they still require a two-step procedure (i.e., “predict, then optimize”). In contrast, a prescriptive approach does not require a forecasting step, since it goes directly from input data to decisions. Consequently, there is no need to describe future wind power generation and market quantities, and hence no assumption is made about their dynamics.

Inspired by new advances in decision-making under uncertainty in data-rich environments, this problem has regained interest in recent years within a prescriptive analytics framework (hence, by integrating forecasting and optimization steps). As a representative example, Stratigakos et al (2022) used an ensemble of decision trees that accounts for the objective function when estimating energy production. From the modeling perspective, the work of Muñoz et al (2020) is one of the closest to ours, also aligned with the new stream of research that utilizes features to produce context-specific decisions in a fully data-driven environment. They built upon recent advances in data-driven newsvendor problems (Ban and Rudin 2019) and proposed an approach that iteratively solves a linear optimization problem to update offering decisions. Although relatively inexpensive, the computation time involved may become an issue in electricity markets like the Australian NEM, where trading and dispatching are based on 5-minute time steps and updates. Moreover, this approach is redundant in the sense that the complete optimization problem is solved at each and every trading session, even though consecutive training sets may only differ by one or a few samples. Such pitfalls motivate our proposal to explore alternative approaches to wind energy offering in electricity markets.

1.3 From optimization to online learning

Instead of using optimization directly, we introduce an offering approach within an online learning paradigm. Online learning can be seen as a special case of online convex optimization (OCO, which considers convex loss functions only) where, instead of tracking optimal decisions, one adaptively and recursively estimates the parameters of decision rules (often also referred to as policies). Decision rules are functions that yield decisions based on the values of relevant input features. For an introduction to online optimization, we refer the reader to the surveys of Shalev-Shwartz et al (2012) and Hazan et al (2016). In addition, for the case of online learning, a recent extensive textbook-like coverage is given by Orabona (2022).

Within OCO, we place emphasis on algorithms that continuously update variables based on gradients (or subgradients) of a convex objective function. Whenever new values of input features and outcomes become available, these algorithms make a step along the gradient, towards the optimum. They ideally accommodate problems for which a closed-form expression to evaluate the subgradient exists (and is fast to compute) (Duchi et al 2011; Zheng 2011). The well-known online gradient descent approach can be traced back to Zinkevich (2003) and inspired many further developments. Among those are numerous applications within power system operation and electricity markets (Gan and Low 2016; Hauswirth et al 2017; Colombino et al 2019; Guo et al 2021; Yuan et al 2022). These methods offer long-term regret guarantees (Orabona 2022).

Within the frame of decision-making under uncertainty, the strategy followed by online gradient descent algorithms is in sharp contrast with optimization approaches. These latter approaches solve an independent optimization problem with a different training set (a batch of data) every time a decision has to be updated, e.g., the parameter of the decision rule in Muñoz et al (2020). Under convexity assumptions, an optimal solution can be found to each optimization problem, meaning that no single decision can ever achieve better performance on average in that training set. However, there is no certainty that the out-of-sample performance of such a decision enjoys the same privilege in finite sample sets. Instead, only probability guarantees can be offered even if the samples are i.i.d. (Van Parys et al 2021).

Indeed, when the underlying data generating processes are nonstationary, the out-of-sample performance can be very poor. This issue can be partly compensated by using a rolling window setting (Bashir and Lehtonen 2018) that updates the variables frequently. However, there can also be substantial changes within the training set. In that case, the performance of batch optimization approaches may be affected by old samples that do not reflect current conditions. On the contrary, online gradient descent algorithms update the parameters of decision rules through a point-wise update that involves the most recent information only, which enables capturing changes in the characteristics of the underlying data generating processes. Therefore, online gradient methods do not only offer computational advantages. They may also outperform established approaches, e.g., linear programming with contextual information (even when using a sliding window scheme). This is illustrated by the toy model examples in Sect. 4, as well as the case study in Sect. 5. Their superiority is eventually twofold: (i) better tracking of the optimal solution within a nonstationary environment, and (ii) an increase in market revenues.

1.4 Contributions and structure

The Australian NEM is an example of the existing trend towards shortening lead times and increasing granularity in electricity markets. These developments reduce operational and forecast uncertainty, hence facilitating the integration of stochastic renewable energy sources. At the same time, they increase computational needs and require methodologies that adapt to changes in a rapid manner. To face these new challenges, we propose an algorithm that combines a feature-driven newsvendor model inspired by Ban and Rudin (2019) with a variant of the online gradient descent algorithm presented in Zeiler (2012). We conceive a case study in which we analyze an hourly forward market that closes just before the start of the next period. It relies on actual data from the Danish Transmission System Operator (TSO), Energinet, and provides a relevant test bench to illustrate and discuss the salient features of our approach. To the best of our knowledge, this is the first paper that analyzes the problem of trading wind energy in an online learning setting. The contributions of our work are threefold:

  • we develop an online offering algorithm within an online learning framework. Results show that this algorithm is computationally inexpensive and achieves substantial economic profits;

  • we propose a new nonstationary regret benchmark against which we empirically compare our algorithm;

  • we showcase the ability of the proposed algorithm to adapt to nonstationary scenarios through a concise illustrative example. In addition, we analyze the superior economic performance and computational efficiency of our approach based on a case study using real-world data (published by the Danish TSO, Energinet) for a period of more than five years.

The remainder of the manuscript is structured as follows: Sect. 2 introduces the problem of a wind farm offering in the forward market, for which imbalances are settled under a two-price imbalance settlement. Section 3 develops a new offering algorithm based on an adaptive gradient descent algorithm and explores several performance metrics. Section 4 is built upon two illustrative examples that investigate the behavior of the alternative online implementations and the dynamic response of the algorithm in comparison with previous rolling window approaches. Section 5 empirically analyzes the performance of our proposed algorithm in a case study based on real data retrieved from the Danish TSO, Energinet. Finally, conclusions and perspectives for future work are gathered in Sect. 6.

2 Preliminaries

2.1 Mathematical notations

We introduce here some of the most relevant mathematical notations used throughout the paper. These are placed into context when further describing the optimization and learning problems at hand in the following. In terms of indices and sets, we use j as an index for features and auxiliary information, while t is an index for time periods (hours in practice, or programme time units in the electricity market of interest). These time indices are gathered within two sets \({\mathcal {T}}^{\textrm{in}}\) and \({\mathcal {T}}^{\text{oos}}\), which are for training (in-sample) and testing (out-of-sample), respectively.

When looking at newsvendor problems and offering in electricity markets, key parameters include \(\psi ^{+}_t\), the marginal opportunity cost for overproduction at hour t (€/MWh), and \(\psi ^{-}_t\), the marginal opportunity cost for underproduction at hour t (€/MWh). These are defined based on \(\lambda ^{\textrm{F}}, \lambda ^{\text {UP}}\), and \(\lambda ^{\text {DW}} \in {\mathbb {R}}\), which are the forward, up-regulation and down-regulation prices, respectively. In terms of the renewable energy producer, the asset or portfolio at hand has a nominal capacity \({\overline{E}}\), also translating to a maximum offer in terms of energy in the market for each and every programme time unit (hence, we express \({\overline{E}}\) in MWh eventually). The decision variable is then the energy bid \(E^{\textrm{F}}_t\) (MWh) for that time, while the amount of energy actually produced is \(E_t\) (MWh). Within our data-driven framework, that decision is based on a vector \({\textbf{x}}_t\) of auxiliary information (i.e., features), associated with a decision rule vector \({\textbf{q}}_t\).

Finally, for the type of online learning approach described in the following, the method and resulting algorithm rely on the gradient or subgradient of the objective function at hand, which we denote by \({\textbf{g}}_t\), as well as a dynamic learning vector \(\varvec{\eta }_t\). We write \(g_{t,j}\) and \(\eta _{t,j}\) for the \(j{\text {th}}\) components of these vectors. The algorithm involves a number of hyperparameters, i.e., \(\mu\) as a forgetting factor to temporally smooth the marginal opportunity costs \(\psi ^{+}_t\) and \(\psi ^{-}_t\), \(\eta\) to control the learning rate, \(\alpha\) to smooth the discontinuity in the derivative of the pinball loss function, and \(\rho\) as a decay constant that controls the adaptation to new gradients. A strictly positive, though small, constant \(\epsilon\) is used in the definition of the dynamic learning vector \(\varvec{\eta }_t\) in order to avoid dividing by 0.

2.2 Newsvendor problem on a rolling time-window

We first introduce the problem of a wind farm offering in a forward market, which is cleared some time before the actual production is realized. Therefore, the producer is likely to suffer deviations from her offer. These are settled ex-post in a real-time (balancing) market under a two-price imbalance settlement mechanism. Furthermore, the offer is assumed to be always accepted, as the marginal operational cost of wind farms is close to zero and this technology is therefore usually prioritized in scheduling. The eventual market revenue \(\rho \in {\mathbb {R}}\) of a wind farm is given by the summation of the amounts obtained in the forward (\(\rho ^{\textrm{F}}\)) and balancing (\(\rho ^{\textrm{B}}\)) markets, i.e.,

$$\begin{aligned} \rho = \rho ^{\textrm{F}} + \rho ^{\textrm{B}} = \lambda ^{\textrm{F}} E^{\textrm{F}} - \lambda ^{\text {UP}} \left( E^{\textrm{F}} - E\right) ^{+} + \lambda ^{\text {DW}} \left( E- E^{\textrm{F}}\right) ^{+} \,, \end{aligned}$$
(1)

where \((a)^{+}= \max (a,0)\). In addition, the unknown parameters \(\lambda ^{\textrm{F}}, \lambda ^{\text {UP}}\), and \(\lambda ^{\text {DW}} \in {\mathbb {R}}\) are the forward, up-regulation and down-regulation prices, respectively. The key decision variable for the wind farm is her offer \(E^{\textrm{F}} \in {\mathbb {R}}^+\) at the forward stage. Note that \(E \in {\mathbb {R}}^+\) denotes the actual realization of her stochastic energy production, which is obviously unknown at the forward stage. In accordance with (1), the revenue (\(\lambda ^{\textrm{F}} E^{\textrm{F}}\)) from the forward stage is then altered when the producer deviates from her offer \(E^{\textrm{F}}\). When the production exceeds her offer (\(E \ge E^{\textrm{F}}\)), the producer sells the excess generation \(E - E^{\textrm{F}} > 0\) at the down-regulation price \(\lambda ^{\text {DW}}\). On the contrary, if she produces less than her forward offer (\(E \le E^{\textrm{F}}\)), the wind farm has to buy the missing energy \(E^{\textrm{F}} - E > 0\) at the up-regulation price \(\lambda ^{\text {UP}}\). Under a two-price imbalance settlement, one has \(\lambda ^{\text {UP}} \ge \lambda ^{\textrm{F}}\) and \(\lambda ^{\text {DW}} \le \lambda ^{\textrm{F}}\), with at most one of them different from \(\lambda ^{\textrm{F}}\) (Morales et al 2014, Ch. 7). In accordance with this description, let \(\psi ^{+}, \psi ^{-} \in {\mathbb {R}}^+\) denote the penalties for over- and under-production, defined as

$$\begin{aligned} \psi ^{+}&= \lambda ^{\textrm{F}} - \lambda ^{\text {DW}}, \end{aligned}$$
(2)
$$\begin{aligned} \psi ^{-}&= \lambda ^{\text {UP}} - \lambda ^{\textrm{F}} \, . \end{aligned}$$
(3)

Using (2) and (3) and the equivalence \(E - E^{\textrm{F}} = (E - E^{\textrm{F}})^{+} - (E^{\textrm{F}} - E)^{+}\), we reformulate (1) as

$$\begin{aligned} \rho = \lambda ^{\textrm{F}} E - \left( \psi ^{+} \left( E - E^{\textrm{F}}\right) ^{+} + \psi ^{-} \left( E^{\textrm{F}} - E\right) ^{+} \right) \,. \end{aligned}$$
(4)
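As a quick numerical check of the equivalence between (1) and (4), the following Python snippet evaluates both expressions for an up-regulation hour; the prices and energies are illustrative values of our own choosing.

```python
# Check that the settlement (1) and its penalty form (4) coincide.
# Prices (EUR/MWh) and energies (MWh) are illustrative values only.
lam_F, lam_UP, lam_DW = 50.0, 65.0, 50.0  # up-regulation hour: lam_UP > lam_F = lam_DW
E, E_F = 80.0, 100.0                      # producer delivers less than offered

rho_1 = lam_F * E_F - lam_UP * max(E_F - E, 0.0) + lam_DW * max(E - E_F, 0.0)  # (1)
psi_p, psi_m = lam_F - lam_DW, lam_UP - lam_F                                  # (2)-(3)
rho_4 = lam_F * E - (psi_p * max(E - E_F, 0.0) + psi_m * max(E_F - E, 0.0))    # (4)

assert abs(rho_1 - rho_4) < 1e-9  # both evaluate to 3700.0 EUR
```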

Note that the first term of (4) is out of the control of the price-taker wind farm, as both \(\lambda ^{\textrm{F}}\) and E are uncertain parameters. Therefore, the profit-maximizing offer \(E^{\textrm{F}^{*}}\) of the wind farm in the forward market can be computed by minimizing the expected deviation cost as

$$\begin{aligned} E^{\textrm{F}^{*}} = \mathop {\mathrm {arg\,min}}\limits _{E^{\textrm{F}} \in [0, {\overline{E}}]} {\mathbb {E}} \left[ \psi ^{+}\left( E - E^{\textrm{F}}\right) ^{+} + \psi ^{-}\left( E^{\textrm{F}} - E\right) ^{+} \right] , \end{aligned}$$
(5)

where \({\mathbb {E}}[\cdot ]\) is the expectation operator. The optimization program (5) is an instance of the very well-studied newsvendor model (Qin et al 2011). Under a price-taker scenario, i.e., when the market participant’s decisions are assumed not to affect market outcomes, an analytical solution to (5) is given by (Bremnes 2004; Pinson et al 2007)

$$\begin{aligned} E^{\textrm{F}^{*}} = F^{-1}_{E}\left( \frac{{\bar{\psi }}^{+}}{{\bar{\psi }}^{+}+{\bar{\psi }}^{-}}\right) , \end{aligned}$$
(6)

where \(F^{-1}_{E}(\cdot )\) is the inverse cumulative distribution function (quantile function) of the renewable energy production and the overline denotes the expected value of the random variable (estimated as the average over available data). The reader is referred to Maggioni et al (2019) for a discussion about the value of the right distribution in newsvendor applications.
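To make (6) concrete, a minimal Python sketch follows (function and variable names are ours): the critical ratio is estimated from average historical penalties, and the unknown inverse cdf is replaced by an empirical quantile of past production.

```python
import numpy as np

def quantile_offer(E_hist, psi_p_hist, psi_m_hist, E_max):
    """Analytical offer (6): empirical quantile of past production, evaluated
    at the critical ratio formed by the average penalties. Inputs are 1-D
    arrays of historical observations (illustrative placeholders)."""
    psi_p_bar = np.mean(psi_p_hist)              # estimate of the average psi^+
    psi_m_bar = np.mean(psi_m_hist)              # estimate of the average psi^-
    ratio = psi_p_bar / (psi_p_bar + psi_m_bar)  # critical ratio in (6)
    offer = np.quantile(E_hist, ratio)           # stands in for F_E^{-1}
    return float(np.clip(offer, 0.0, E_max))     # keep the offer feasible
```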

On top of the fact that the true distribution of the wind production and the critical ratio in (6) are generally unknown, (6) suffers from another major drawback, which is its inability to directly profit from additional information that may be available, e.g., wind energy forecasts for neighboring areas, or additional information about the state of the electricity market. In fact, it is usually the case that the wind farm operator has access to a vector of auxiliary information, also known as features, \({\textbf{x}} \in {\mathcal {X}} \subseteq {\mathbb {R}}^p\), where p denotes the dimension of the feature vector. This feature vector may help explain the behavior of the uncertain parameters in (5). As proposed by Ban and Rudin (2019), this information can be exploited in newsvendor instances by assuming that the optimal offer follows a linear decision rule of the form \(E^{\textrm{F}}: {\mathcal {X}} \rightarrow {\mathbb {R}}\), \(E^{\textrm{F}} = {\textbf{x}}^\top {\textbf{q}}\), with \({\textbf{q}} \in {\mathbb {R}}^p\) being a decision vector that parameterizes the linear model. This decision rule can easily accommodate an intercept by setting a component of the feature vector \({\textbf{x}}\) equal to one. Then, considering that a set of historical samples \(\left\{ (E_t, \psi _t^{-}, \psi _t^{+}, {\textbf{x}}_t), \, \forall t\in {\mathcal {T}}^{\textrm{in}} \right\}\) is available, we compute the best decision \({\textbf{q}}^{\text {LP}}\) for this set by solving the following linear program:

$$\begin{aligned} {\textbf{q}}^{{\text {LP}}^{*}} = \mathop {\mathrm {arg\,min}}\limits _{{\textbf{q}}}&\frac{1}{|{\mathcal {T}}^{\textrm{in}}|} \sum _{t\in {\mathcal {T}}^{\textrm{in}}} \psi _{t}^{+}\left( E_t - {\textbf{x}}^\top _t {\textbf{q}} \right) ^{+} + \psi _{t}^{-}\left( {\textbf{x}}^\top _t {\textbf{q}} - E_t\right) ^{+} \end{aligned}$$
(7a)
$$\begin{aligned} \text {s.t.}\,&0 \le {\textbf{x}}^\top _t {\textbf{q}} \le {\overline{E}}, \, \forall t\in {\mathcal {T}}^{\textrm{in}}, \end{aligned}$$
(7b)

where \(|\cdot |\) denotes the cardinality of a set. Note that this model does not implicitly assume a price-taker scenario. In fact, correlations between penalties and wind features may be captured in systems with high wind power penetration. Although the linear structure of the mapping may seem restrictive, more complex relationships can be obtained by transforming the feature space, e.g., using a Taylor approximation (Ban and Rudin 2019) or a spline basis. Next, by defining the box projection

$$\begin{aligned} \pi ({\textbf{x}}, {\textbf{q}}) = \min \left(\max (0, {\textbf{x}}^\top {\textbf{q}}), {\overline{E}} \right), \end{aligned}$$
(8)

the optimal offer derived from new contextual information \({\textbf{x}}_{t'}\) can be computed as \(E^{\textrm{F}}_{t'} = \pi ({\textbf{x}}_{t'}, {\textbf{q}}^{\text {LP}})\). As discussed in Muñoz et al (2020), when new points are incorporated into the dataset \({\mathcal {T}}^{\textrm{in}}\), problem (7) can be iteratively re-solved to update the value of \({\textbf{q}}^{\text {LP}}\). In the remainder of the manuscript, we refer to this approach as LP (from Linear Programming).
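For concreteness, a minimal sketch of the LP approach follows, assuming the cvxpy modeling library is available (the function names are ours; the paper does not prescribe a specific implementation):

```python
import numpy as np
import cvxpy as cp

def fit_q_lp(X, E, psi_p, psi_m, E_max):
    """Solve the feature-driven newsvendor LP (7) over a training window.

    X: (T, p) feature matrix, E: (T,) realized productions, psi_p / psi_m:
    (T,) nonnegative penalties. Returns the optimal decision vector q^LP."""
    T = X.shape[0]
    q = cp.Variable(X.shape[1])
    over = cp.pos(E - X @ q)     # (E_t - x_t^T q)^+ in objective (7a)
    under = cp.pos(X @ q - E)    # (x_t^T q - E_t)^+
    cost = cp.sum(cp.multiply(psi_p, over) + cp.multiply(psi_m, under)) / T
    cons = [X @ q >= 0, X @ q <= E_max]   # constraint (7b)
    cp.Problem(cp.Minimize(cost), cons).solve()
    return q.value

def box_offer(x, q, E_max):
    """Offer rule (8): clip the linear decision rule into [0, E_max]."""
    return float(np.clip(x @ q, 0.0, E_max))
```

A new offer is then cast by applying box_offer to the fitted vector; in a rolling window scheme, fit_q_lp is simply re-run whenever the training window is refreshed.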

3 Online learning in newsvendor problems

In the Online Convex Optimization (OCO) framework, a decision-maker faces an online learning problem where iterative decisions are to be made. The cost of each decision is determined by a convex loss function \(f_t: {\mathbb {R}}^{d_z} \rightarrow {\mathbb {R}}\) that is unknown beforehand. After a decision \({\textbf{z}}_t \in Z \subseteq {\mathbb {R}}^{d_z}\) is made, the decision-maker learns \(f_t\) and pays \(f_t({\textbf{z}}_t)\). Within OCO, the Online Gradient Descent (OGD) algorithm, introduced by Zinkevich (2003), has proven to be very effective and versatile (Gan and Low 2016; Narayanaswamy et al 2012; Hauswirth et al 2016; Nonhoff and Müller 2020; Wood et al 2021). Starting from an initial value, OGD performs iterative updates of \({\textbf{z}}_t\) based on (sub)gradients of \(f_t\), denoted \({\textbf{g}}_t\) from hereon. The magnitude of the step is controlled through a variable learning rate \(\eta _t\). On each round, the updated vector is forced to lie within the feasible region Z through a Euclidean projection. OGD relies on just the most recent sample to obtain a gradient, thus resulting in a computationally inexpensive method, especially if the gradient and projection can be computed through closed-form expressions.

The selection of the learning rate is of paramount importance. The original proposal by Zinkevich (2003) presents two main alternatives, namely, a variable and a fixed learning rate. In a dynamic environment, the classical choice \(\eta _t \in {\mathbb {R}}^+\), \(\eta _t \propto t^{-1/2}\), where \(\propto\) denotes proportionality, is not suitable because \(\lim _{t \rightarrow \infty } \eta _t = 0\), reducing the ability to track changes as t increases. Alternatively, one could select a fixed value \(\eta _t = \eta\) that keeps this capacity unaltered but may lose the fast convergence that the initial high values of \(\eta _t\) provide. Regardless of the selection, both choices are scale-dependent and treat each component of the gradient vector equally. To tackle this, McMahan and Streeter (2010) and Duchi et al (2011) propose to use a component-wise adaptive rate \(\varvec{\eta }_t \in {\mathbb {R}}^p\) with \(\eta _{t,j} = \eta (\sum _{k=1}^t g_{k,j}^2)^{-1/2}\), where \(g_{t,j}\) is a component of the gradient vector \({\textbf{g}}_t = [g_{t,1},..., g_{t,j},..., g_{t,p}]^{\top }\). As in the case of \(\eta _t \propto t^{-1/2}\), the previous expression is monotonically decreasing (component-wise), again limiting the long-term ability to learn. Aware of this limitation, Zeiler (2012) suggests an exponentially decaying average of the squared gradients to modulate the learning rate based on the most recent information. We employ this gradient descent variant to implement our algorithm in Sect. 3.1.

In the online learning community, the de facto metric to evaluate the performance of a series of decision vectors \({\textbf{z}}_1,..., {\textbf{z}}_T\) is the regret \({\mathcal {R}}_T \in {\mathbb {R}}\). The regret provides a versatile and, in a sense, normalized metric to compare an algorithm across different problems, with the advantage that few assumptions are made about the oracle that generates the decisions. Traditionally, the benchmark used to compute regret is the best single action in hindsight, which can be obtained as the solution to an offline optimization problem under perfect information. However, in a dynamic environment, this benchmark can be beaten easily. In Sect. 3.3 we propose an alternative benchmark more suitable for the nonstationary context of the wind energy problem.

3.1 Online newsvendor

In this section, we particularize the gradient descent introduced in the previous paragraphs to the context of a wind farm offering in a forward market, incorporating elements of the rolling window problem presented in Sect. 2. We name the resulting algorithm OLNV (from OnLine NewsVendor). Contrary to the rolling window approach, the OLNV algorithm updates \({\textbf{q}}\) based on the information provided by the last realization only. When the set \({\mathcal {T}}^{\textrm{in}}\) reduces to one sample, the objective function (7a) becomes

$$\begin{aligned} NV_t({\textbf{q}}) = \psi _{t}^{+}\left( E_t - {\textbf{x}}^\top _t {\textbf{q}} \right) ^{+} + \psi _{t}^{-}\left( {\textbf{x}}^\top _t {\textbf{q}} - E_t\right) ^{+}. \end{aligned}$$
(9)

The OLNV method requires computing a gradient of the objective function, for which we analyze two alternative procedures in the following paragraphs.

The first approach is inspired by the work of Zheng (2011) on the pinball loss, a particular case of the objective function found in newsvendor models. Since the pinball loss is not differentiable everywhere, that work proposes a smooth approximation to ensure that computing gradients is always possible. Note that in our case the objective function (9) is not differentiable at \(E_t = {\textbf{x}}_t^\top {\textbf{q}}\). Therefore, we first propose to circumvent this issue by extending the approach in Zheng (2011) to the more general expression (9), which considers arbitrary (positive) penalties, as

$$\begin{aligned} NV_{t, \alpha }({\textbf{q}})&= \psi ^{+}_t \left( E_t - {\textbf{x}}^\top _t {\textbf{q}}\right) + \alpha \left( \psi ^{+}_t + \psi ^{-}_t\right) \log \left( 1 + e^{ - (E_t - {\textbf{x}}^\top _t {\textbf{q}}) / \alpha }\right) \, , \end{aligned}$$
(10)

where \(\alpha > 0\) is a parameter that controls the approximation, with higher values resulting in smoother functions. The function \(NV_{t, \alpha }\) is convex in \({\textbf{q}}\) and upper bounds \(NV_t\) for any value of \({\textbf{q}}\), as proven in Propositions 1 and 2 in the Appendix, respectively. Then, we derive a closed-form expression for the gradient of (10), yielding

$$\begin{aligned} \nabla NV_{t, \alpha }({\textbf{q}})&= \bigg (-\psi ^{+}_t + (\psi ^{+}_t + \psi ^{-}_t) \frac{1}{1 + e^{(E_t - {\textbf{x}}^\top _t {\textbf{q}}) / \alpha }} \bigg ) {\textbf{x}}_t \, . \end{aligned}$$
(11)

The second approach deals directly with the objective function as formulated in (9). Even though the original objective is not differentiable everywhere, a variant of the OLNV algorithm is readily applicable to subdifferentiable functions, provided that a subgradient can be computed instead (Orabona 2022). In this case, the subdifferential of (9) is given by

$$\begin{aligned} \partial NV_t({\textbf{q}}) = {\left\{ \begin{array}{ll} - \psi ^{+}_t {\textbf{x}}_t, &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} > 0, \\ \psi ^{-}_t {\textbf{x}}_t, &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} < 0, \\ {[}-\psi ^{+}_t {\textbf{x}}_t, \psi ^{-}_t {\textbf{x}}_t], &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} = 0 \, . \\ \end{array}\right. } \end{aligned}$$
(12)

Note that, when \(E_t - {\textbf{x}}^\top _t {\textbf{q}} = 0\), any value in the interval \([- \psi ^{+}_t {\textbf{x}}_t, \psi ^{-}_t {\textbf{x}}_t]\) is a legitimate subgradient belonging to \(\partial NV_t({\textbf{q}})\). For the sake of simplicity and reproducibility, the implementation of our algorithm returns zero whenever this condition is fulfilled.
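Both update ingredients admit compact implementations; a sketch follows (our naming) of the smooth gradient (11) and of a subgradient from (12), returning the zero vector at the kink as described above.

```python
import numpy as np

def grad_smooth(q, x, E, psi_p, psi_m, alpha):
    """Gradient (11) of the smooth surrogate NV_{t,alpha} at q."""
    u = E - x @ q
    return (-psi_p + (psi_p + psi_m) / (1.0 + np.exp(u / alpha))) * x

def subgrad_nv(q, x, E, psi_p, psi_m):
    """A subgradient (12) of the original loss NV_t at q; by the convention
    described above, the zero vector is returned on the kink."""
    u = E - x @ q
    if u > 0:
        return -psi_p * x
    if u < 0:
        return psi_m * x
    return np.zeros_like(x)
```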

Once a gradient as in (11) or a subgradient as in (12) has been computed, the key step of OLNV is to update \({\textbf{q}}_t\) using a multidimensional learning rate \(\varvec{\eta }_t \in {\mathbb {R}}^p\) through

$$\begin{aligned} {\textbf{q}}_{t+1}&= \Pi \left( {\textbf{q}}_{t} - \varvec{\eta }_t \circ {\textbf{g}}_t, {\textbf{x}}_t\right) \, , \end{aligned}$$
(13)

where \(\circ\) denotes the element-wise product, \({\textbf{g}}_t = \nabla NV_{t, \alpha }({\textbf{q}}_t)\) or \({\textbf{g}}_t = \partial NV_t({\textbf{q}}_t)\) depending on the implementation of OLNV, and \(\Pi\) is a projection operator defined as \(\Pi : {\mathbb {R}}^p \times {\mathcal {X}} \rightarrow {\mathbb {R}}^p\). Precisely, \(\Pi\) maps its arguments into the solution of the following optimization problem:

$$\begin{aligned} \Pi ({\textbf{o}}, {\textbf{x}})&= \mathop {\mathrm {arg\,min}}\limits _{{\textbf{q}} \in Q({\textbf{x}})} \frac{1}{2} \left| \left| {\textbf{q}} - {\textbf{o}}\right| \right| _2 \, ,\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad ({\mathcal {P}}) \end{aligned}$$

where \({\textbf{o}}\) represents a candidate update of the decision vector and is computed as \({\textbf{o}} = {\textbf{q}}_{t} - \varvec{\eta }_t \circ {\textbf{g}}_t\). The feasible set in \(({\mathcal {P}})\) is defined by the set-valued mapping \(Q: {\mathcal {X}} \rightrightarrows {\mathbb {R}}^p\), \(Q({\textbf{x}}) = \{{\textbf{q}}: 0 \le {\textbf{x}}^\top {\textbf{q}} \le {\overline{E}}\}\). Note that, for any input \({\textbf{x}}\), the output of Q is a convex region bounded by two parallel hyperplanes. As the Euclidean norm is used, a unique solution is guaranteed to exist for any instance of \(({\mathcal {P}})\). In general, the Euclidean projection of a point onto a convex set requires solving a convex optimization problem; however, the definition of Q allows us to find a closed-form expression, yielding

$$\begin{aligned} \Pi ({\textbf{o}}, {\textbf{x}}) = {\left\{ \begin{array}{ll} {\textbf{o}}, &{} 0 \le {\textbf{x}}^\top {\textbf{o}} \le {\overline{E}} \, , \\ {\textbf{o}} + \frac{{\overline{E}} - {\textbf{x}}^\top {\textbf{o}}}{\left| \left| {\textbf{x}}\right| \right| ^2_2} {\textbf{x}}, &{} {\textbf{x}}^\top {\textbf{o}} >{\overline{E}} \, , \\ {\textbf{o}} + \frac{ - {\textbf{x}}^\top {\textbf{o}}}{\left| \left| {\textbf{x}}\right| \right| ^2_2} {\textbf{x}}, &{} {\textbf{x}}^\top {\textbf{o}} < 0 \, . \end{array}\right. } \end{aligned}$$
(14)

This reduces the resolution of the optimization problem \(({\mathcal {P}})\) to evaluating the above expression. Even though the operator \(\Pi\) guarantees the feasibility of \({\textbf{q}}_{t}\) under the realization \({\textbf{x}}_{t-1}\), we need to resort to (8), setting \(E^{\textrm{F}}_t = \pi ({\textbf{x}}_{t}, {\textbf{q}}_t)\), to guarantee that \(E^{\textrm{F}}_t\) remains feasible for any new arbitrary \({\textbf{x}}_t\).

The last remaining aspect is to compute the vector \(\varvec{\eta }_t\) following the ideas in Zeiler (2012). Let \({\textbf{g}}_t = [g_{t,1},..., g_{t,j},..., g_{t,p}]^{\top }\) be a gradient or subgradient vector computed through (11) or (12). Then, we can define the squared running average of each component as

$$\begin{aligned} {\overline{g}}_{t,j}^2 = \rho {\overline{g}}_{t-1,j}^2 + \left( 1 - \rho \right) g_{t,j}^2 \, , \end{aligned}$$
(15)

where \(\rho \in [0, 1)\) is a decay constant and \({\overline{g}}_{0,j}^2 = 0\). The auxiliary variable \({\overline{g}}_{t,j}^2\) is then used to compute the independent learning rate applied to the associated decision vector component following

$$\begin{aligned} \eta _{t,j} = \frac{\eta }{\sqrt{{\overline{g}}_{t,j}^2 + \epsilon }} \, , \end{aligned}$$
(16)

where \(\epsilon \in {\mathbb {R}}^+\) helps better condition the denominator (by avoiding division by 0) and \(\eta > 0\) is a constant. We use the update given by (15) and (16) in the proposed OLNV algorithm with the values \(\epsilon = 10^{-6}\) and \(\rho =0.95\), as originally suggested in Zeiler (2012). The benefits of this update are twofold. On the one hand, OLNV adapts each learning rate component to the scale of the incumbent feature. On the other hand, OLNV tracks the most recent dynamics between the uncertain vector \([E_t, \psi _t^+, \psi _t^-]^\top\) and the feature vector \({\textbf{x}}_t\). The OLNV algorithm for the feature-driven wind energy trading problem is compiled in Algorithm 1.

Algorithm 1 The OLNV algorithm
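Since the compiled listing appears only as a figure in the typeset version, a minimal Python sketch of the loop is given below (the variable names and data interface are ours): it chains the offer rule (8), the penalty smoothing with forgetting factor \(\mu\) introduced in Sect. 3.2, the subgradient (12), the running average (15), the component-wise learning rate (16), and the projected update (13) with (14).

```python
import numpy as np

def project_box(o, x, E_max):
    """Closed-form Euclidean projection (14) onto Q(x) = {q : 0 <= x^T q <= E_max}."""
    v = x @ o
    if v > E_max:
        return o + (E_max - v) / (x @ x) * x
    if v < 0:
        return o - v / (x @ x) * x
    return o

def olnv(stream, E_max, q0, eta=0.005, rho=0.95, eps=1e-6,
         mu=1.0, psi_p_bar=1.0, psi_m_bar=1.0):
    """Sketch of the OLNV loop (Algorithm 1); names are ours.

    stream yields tuples (x_t, E_t, psi_p_t, psi_m_t) in chronological order;
    the offer for period t is cast before (E_t, psi_t^+, psi_t^-) are revealed."""
    q = np.asarray(q0, dtype=float)
    g2_bar = np.zeros_like(q)           # running average (15), initialized at 0
    offers = []
    for x, E, psi_p, psi_m in stream:
        offers.append(float(np.clip(x @ q, 0.0, E_max)))  # offer rule (8)
        # once the outcome is revealed: smoothed penalties (17)-(18)
        psi_p = mu * psi_p + (1.0 - mu) * psi_p_bar
        psi_m = mu * psi_m + (1.0 - mu) * psi_m_bar
        # subgradient (12) of NV_t at q_t (zero vector at the kink)
        u = E - x @ q
        g = -psi_p * x if u > 0 else (psi_m * x if u < 0 else np.zeros_like(x))
        # component-wise adaptive learning rate, (15) and (16)
        g2_bar = rho * g2_bar + (1.0 - rho) * g ** 2
        eta_t = eta / np.sqrt(g2_bar + eps)
        # projected subgradient step, (13) with projection (14)
        q = project_box(q - eta_t * g, x, E_max)
    return offers, q
```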

Despite the fact that we have considered a single wind farm in the derivation, the proposed OLNV algorithm is general enough to be exploited for an aggregation of wind farms or, more generally, for a portfolio of diverse renewable energy sources with uncertain production, just by combining the capacity and generation of the assets. Equally, potential spatial correlation among the production of wind farms does not affect the feasible region of the newsvendor model, and therefore does not complicate the OLNV algorithm. On the contrary, adding storage to the generation portfolio forces the model to include inter-temporal constraints that dramatically reshape the feasible region, implying that the current decision will affect future outcomes. In this case, the decision-maker can resort to classical dynamic programming (Hargreaves and Hobbs 2012) or to more advanced learning algorithms such as budget-constrained online learning (Liakopoulos et al 2019; Sherman and Koren 2021) or reinforcement learning (Kuznetsova et al 2013; Sutton and Barto 2018).

Finally, even if a population effect may be present for renewables in electricity markets (i.e., even if price-takers individually, the sum of the individual actions of these producers may impact market outcomes), several wind power producers within the same region can effectively use the OLNV algorithm to improve the profitability of their offers. The fact that each competing producer has different contextual information available, and may process it in alternative ways, mitigates possible increases in the volatility of their outcomes that could arise from correlated generation.

3.2 Regularization through average penalty anchoring

In an electricity market with a two-price imbalance settlement scheme, it is common that \(\psi ^{+}_t=\psi ^{-}_t = 0\) over a significant number of hours, meaning that load and generation are close to being balanced. In this situation, it follows from (9) that the producer experiences no cost regardless of her deviation from the actual production. Moreover, the gradients computed from (9) are zero and therefore the decision vector \({\textbf{q}}_t\) is not updated, wasting information about the relationship between \(E^{\textrm{F}}_t\) and \({\textbf{x}}_t\). And, when the penalties are different from zero, they typically exhibit random behavior with sharp spikes representing highly imbalanced situations, which, in turn, yields destabilizing updates of the vector \({\textbf{q}}_t\). To tackle both issues, we propose the following convex transformation of the original penalties:

$$\begin{aligned} \psi ^{+'}_t&= \mu \psi ^{+}_t + (1 - \mu ) {\overline{\psi }}^{+} \, , \end{aligned}$$
(17)
$$\begin{aligned} \psi ^{-'}_t&= \mu \psi ^{-}_t + (1 - \mu ) {\overline{\psi }}^{-} \, , \end{aligned}$$
(18)

where \(0 \le \mu \le 1\) and \({\overline{\psi }}^{+}, {\overline{\psi }}^{-} \in {\mathbb {R}}^+\) are the historical average penalties. This convex transformation is inspired by the concept of constraining the optimal offer around the point forecast, proposed by Zugno et al (2013a). In contrast though, we do not impose hard constraints on the decision vector \({\textbf{q}}_t\). Instead, we smooth the objective function using as anchor the sample average optimal market quantile, determined by the average market penalties \({\overline{\psi }}^{+}\) and \({\overline{\psi }}^{-}\). To do so, we consider a convex combination of the original objective function (7a) with an additional term whose minimum is attained at that quantile,

$$\begin{aligned} NV_t^{\textrm{R}} =\,&\mu \psi _{t}^{+}\left( E_t - {\textbf{x}}^\top _t {\textbf{q}} \right) ^{+} +\mu \psi _{t}^{-}\left( {\textbf{x}}^\top _t {\textbf{q}} - E_t\right) ^{+} \nonumber \\&+ (1 - \mu ) {\overline{\psi }}^{+}\left( E_t - {\textbf{x}}^\top _t {\textbf{q}} \right) ^{+} + (1 - \mu ) {\overline{\psi }}^{-}\left( {\textbf{x}}^\top _t {\textbf{q}} - E_t\right) ^{+} \, . \end{aligned}$$
(19)

Then, using (17) and (18), the original objective structure is recovered, i.e.,

$$\begin{aligned} NV_t^{\textrm{R}}&= \psi ^{+'}_t \left( E_t - {\textbf{x}}^\top _t {\textbf{q}} \right) ^{+} + \psi ^{-'}_t\left( {\textbf{x}}^\top _t {\textbf{q}} - E_t\right) ^{+} \, . \end{aligned}$$
(20)

Therefore, by replacing \(\psi ^{+}_t, \psi ^{-}_t\) with \(\psi '^{+}_t, \psi '^{-}_t\) in the original objective function, we regularize the learning procedure at no extra computational cost. When the available samples are not sufficient to provide reliable estimates of the true \({\overline{\psi }}^{+}\) and \({\overline{\psi }}^{-}\), the producer can resort to assuming a balanced market with penalties \({\overline{\psi }}^{+} = {\overline{\psi }}^{-} = 1\). Thus, with \(\mu < 1\), and provided that \({\overline{\psi }}^{+}, {\overline{\psi }}^{-} > 0\), the algorithm utilizes the information contained in samples with both penalties equal to zero, potentially accelerating convergence and obtaining smoother gradient updates. The same reasoning applies to the smooth objective function.
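In code, the transformation is a two-line convex combination, already used inside the OLNV loop sketched in Sect. 3.1 (names are ours):

```python
def regularize_penalties(psi_p, psi_m, mu, psi_p_bar=1.0, psi_m_bar=1.0):
    """Convex transformation (17)-(18): anchor the noisy penalties to their
    historical averages; mu = 1 recovers the original penalties, and the
    default anchors assume a balanced market with unit average penalties."""
    return (mu * psi_p + (1.0 - mu) * psi_p_bar,
            mu * psi_m + (1.0 - mu) * psi_m_bar)
```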

3.3 Performance evaluation

In order to assess the economic performance of our algorithm over a set of testing samples \(\left\{ (E_t, \psi _t^{-}, \psi _t^{+}, {\textbf{x}}_t), \forall t\in {\mathcal {T}}^{\text{oos}} \right\}\), we use the average deviation cost. To lighten the notation, we write \(T = |{\mathcal {T}}^{\text{oos}}|\). Consider that we have obtained successive offers \(E_1^{\textrm{F}},..., E_{T}^{\textrm{F}}\) over the test set, by using (7) and (8) or from Algorithm 1, after iteratively going through all the samples belonging to the test set \({\mathcal {T}}^{\text{oos}}\). We then calculate the average deviation cost as

$$\begin{aligned} NV^{\text{oos}} = \frac{1}{T} \sum _{t \in {\mathcal {T}}^{\text{oos}}} \psi ^{+}_{t} \left( E_{t} - E^{\textrm{F}}_{t}\right) ^+{+} \psi ^{-}_{t} \left( E^{\textrm{F}}_{t} - E_{t}\right) ^+ \, . \end{aligned}$$
(21)

The value of this metric gives limited information about how a particular method is performing. A natural benchmark is the score obtained when a forecast of the wind energy production (in the sense of minimizing the root mean square error) is directly used as an offer in the market. We refer to this method as \(\text {FO}\) (from FOrecast). Let \(NV^{\text{oos}}_{\text {FO}}\) be the deviation cost incurred by \(\text {FO}\). We then redefine the original metric in relative terms, i.e.,

$$\begin{aligned} NV^{\text{oos}} (\%) = 100 \, \frac{NV^{\text{oos}}_{FO} - NV^{\text{oos}}}{NV^{\text{oos}}_{FO}} \, . \end{aligned}$$
(22)

Consequently, the metric expresses an improvement (as a percentage), where a value of 100% means perfect performance with zero deviation cost.
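Both metrics are immediate to evaluate; a short sketch (our naming) under the conventions of (21) and (22):

```python
import numpy as np

def nv_oos(E, E_F, psi_p, psi_m):
    """Average deviation cost (21) over a test set (arrays of equal length)."""
    return float(np.mean(psi_p * np.maximum(E - E_F, 0.0)
                         + psi_m * np.maximum(E_F - E, 0.0)))

def nv_oos_pct(E, E_F, E_F_fo, psi_p, psi_m):
    """Relative improvement (22) over the forecast-as-offer baseline FO."""
    base = nv_oos(E, E_F_fo, psi_p, psi_m)
    return 100.0 * (base - nv_oos(E, E_F, psi_p, psi_m)) / base
```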

For online learning problems the customary performance measure is the regret. Traditionally, the regret compares a sequence of decisions \({\textbf{q}}_1,..., {\textbf{q}}_{T}\) against the best single vector in hindsight \({\textbf{q}}^{{\mathcal {H}}}\). The latter is computed ex-post by solving a problem analogous to (7) once the whole collection of samples belonging to \({\mathcal {T}}^{\text{oos}}\) is known. Let \(Q^{{\mathcal {H}}} = \{{\textbf{q}}: 0 \le {\textbf{x}}_t^\top {\textbf{q}} \le {\overline{E}}, \, \forall t \in {\mathcal {T}}^{\text{oos}}\}\) be the intersection of all feasible sets \(Q({\textbf{x}}_t)\). The static regret is

$$\begin{aligned} {\mathcal {R}}_T^s&= \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{q}}_t) - \min _{{\textbf{q}} \in Q^{{\mathcal {H}}}} \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{q}}) \, . \end{aligned}$$
(23)

Given the assumption of a nonstationary environment, outperforming a constant \({\textbf{q}}^{{\mathcal {H}}}\) can be a relatively easy task even though it is determined under perfect information. Alternatively, one may consider the worst-case regret (Besbes et al 2015) interchanging the sum and minimum, i.e.,

$$\begin{aligned} {\mathcal {R}}_T^w&= \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{q}}_t) - \sum _{t \in {\mathcal {T}}^{\text{oos}}} \min _{{\textbf{q}} \in Q({\textbf{x}}_t)} NV_t({\textbf{q}}) \, , \end{aligned}$$
(24)

where the second term of (24) gives the best individual decision \({\textbf{q}}_t^{{\mathcal {H}}} \in \mathop {\mathrm {arg\,min}}_{{\textbf{q}} \in Q({\textbf{x}}_t)} NV_t({\textbf{q}})\). The regret computed in this way can be very pessimistic and unrealistic. Note that in the context of the wind farm, it is always possible to find a value for \({\textbf{q}}\) such that \(E_t - {\textbf{x}}_t^{\top } {\textbf{q}} = 0\), and therefore (24) readily reduces to the summation of the original objective function \({\mathcal {R}}^w_T = \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{q}}_t)\). Alternatively, Zinkevich (2003) proposed to compare the performance of online algorithms against a sequence of arbitrary decisions \({\textbf{u}}_1,..., {\textbf{u}}_T\), \({\textbf{u}}_t \in Q({\textbf{x}}_t)\),

$$\begin{aligned} {\mathcal {R}}_T^d&= \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{q}}_t) - \sum _{t \in {\mathcal {T}}^{\text{oos}}} NV_t({\textbf{u}}_t) \, . \end{aligned}$$
(25)

We refer to this approach as dynamic regret. This formulation allows defining a metric with adjustable difficulty, in between the two previous benchmarks. Note that (23) and (24) are special cases of (25) with \({\textbf{u}}_t = {\textbf{q}}^{{\mathcal {H}}}, \forall t\) and \({\textbf{u}}_t = {\textbf{q}}_t^{{\mathcal {H}}}, \forall t\), respectively. Then, the question is how to choose a reasonable series of reference benchmarks \({\textbf{u}}_t\) against which to compare OLNV. To this end, we propose dividing \({\mathcal {T}}^{\text{oos}}\) into k adjacent partitions of equal length l, except possibly the last one. Without loss of generality, by assuming \(T - k l = 0\), we have \({\mathcal {T}}^{\text{oos}}_i = \{t: (i-1) l + 1 \le t \le i l\}, i = 1,..., k\). Let us define the feasible sets \(Q^{{\mathcal {H}}}_i = \{{\textbf{q}}: 0 \le {\textbf{x}}_t^\top {\textbf{q}} \le {\overline{E}}, \, \forall t \in {\mathcal {T}}^{\text{oos}}_i\}\). Accordingly, we can compute \({\textbf{q}}^{{\mathcal {H}}}_i = \mathop {\mathrm {arg\,min}}\nolimits _{{\textbf{q}} \in Q^{{\mathcal {H}}}_i} \sum _{t \in {\mathcal {T}}^{\text{oos}}_i} NV_t({\textbf{q}})\). Finally, the sequence of reference benchmarks that we propose to use in this paper is \({\textbf{u}}_t = {\textbf{q}}^{{\mathcal {H}}}_i, \forall t \in {\mathcal {T}}^{\text{oos}}_i\). We will empirically investigate the regret performance of OLNV in the case study presented in Sect. 5.
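The piecewise benchmark is computed block by block; a sketch follows, assuming cvxpy for the per-partition hindsight problems (all names are ours):

```python
import numpy as np
import cvxpy as cp

def hindsight_q(X, E, psi_p, psi_m, E_max):
    """Best fixed decision over one partition: problem (7) restricted to the
    partition's samples, yielding q_i^H (sketch, assuming cvxpy)."""
    q = cp.Variable(X.shape[1])
    loss = cp.sum(cp.multiply(psi_p, cp.pos(E - X @ q))
                  + cp.multiply(psi_m, cp.pos(X @ q - E)))
    cp.Problem(cp.Minimize(loss), [X @ q >= 0, X @ q <= E_max]).solve()
    return q.value

def nv_loss(q, x, E, psi_p, psi_m):
    """Per-period newsvendor loss NV_t(q), cf. (9)."""
    u = E - x @ q
    return psi_p * max(u, 0.0) + psi_m * max(-u, 0.0)

def dynamic_regret(Q, X, E, psi_p, psi_m, E_max, l):
    """Dynamic regret (25) of the decision sequence Q (T, p) against the
    piecewise benchmark u_t = q_i^H, partitions of length l (last may be shorter)."""
    T, regret = len(E), 0.0
    for s in range(0, T, l):
        e = min(s + l, T)
        u = hindsight_q(X[s:e], E[s:e], psi_p[s:e], psi_m[s:e], E_max)
        for t in range(s, e):
            regret += (nv_loss(Q[t], X[t], E[t], psi_p[t], psi_m[t])
                       - nv_loss(u, X[t], E[t], psi_p[t], psi_m[t]))
    return regret
```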

4 Illustrative examples

This section analyzes several illustrative examples to gain insight into the behavior of OLNV. The first example compares the two alternative implementations introduced in Sect. 3.1 and discusses their main properties. As a result of this analysis, we select the subgradient implementation as the default procedure to perform the update of \({\textbf{q}}_t\) in OLNV. One of the key features of online learning algorithms is their tracking ability, given the chronological order in which the updates are performed. In the second illustrative example, we deal with alternating penalty scenarios, showing the salient ability of OLNV to adapt to a changing environment.

4.1 Comparing the smooth and subgradient implementations

This illustrative example aims to elucidate whether the smooth approximation presented in (10) provides any advantage over the direct subgradient implementation of OLNV. This will allow us to determine which implementation to use for further numerical experiments.

We consider a simplified setting with a single feature, a forecast of the wind power generation that we also use as the baseline for the FO method, and a single regressor \(q_t \in {\mathbb {R}}\). No intercept is considered, to ease the representation and analysis of \(q_t\). We sample the feature from a uniform distribution \(x_t \sim U[10, 90]\) (MW) and the true wind generation series is built by adding Gaussian noise, \(E_t = x_t + \epsilon _t\) with \(\epsilon _t \sim {\mathcal {N}}(0, 6)\) (MW). We generate a dataset of 1-year duration (8760 samples, as if of hourly temporal resolution). Given that the penalties \(\psi _t^{+}\) and \(\psi _t^{-}\) are difficult to simulate, we compute them based on real day-ahead and regulation prices of the Danish DK1 bidding zone. We retrieve data corresponding to the year 2017 from the data portal of the Danish TSO, Energinet. Four implementations of Algorithm 1 are executed, three of them computing gradients of the smooth objective function through (11) with \(\alpha = 0.05\), 5 and 20, and the last one using subgradients of the original cost mapping as in (12), to which we refer as \(\partial\). All instances are initialized with \(q_1 = 1\), which means that the first offers produced by FO and OLNV are the same. In this section we do not use any convex transformation of the penalties, i.e., \(\mu = 1\), and we set \(\eta = 0.005\). We run the OLNV algorithm throughout the dataset, performing updates of \(q_t\) every hour.
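The synthetic part of this setup can be reproduced in a few lines; the penalties, taken from real DK1 prices, are not simulated here, and we read \({\mathcal {N}}(0, 6)\) as a standard deviation of 6 MW (our assumption).

```python
import numpy as np

rng = np.random.default_rng(seed=1)    # the seed is arbitrary
T = 8760                               # one year of hourly samples
x = rng.uniform(10.0, 90.0, size=T)    # feature x_t ~ U[10, 90] (MW)
E = x + rng.normal(0.0, 6.0, size=T)   # production E_t = x_t + noise (MW)
# psi_p, psi_m would be computed from the 2017 DK1 day-ahead and regulation
# prices retrieved from Energinet's data portal (not simulated here)
```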

In this section, we accompany the numerical results with some theoretical analysis. The function \(NV_{t,\alpha }({\textbf{q}})\) approximates the original function \(NV_t({\textbf{q}})\) well when \(\left| E_t - {\textbf{x}}^\top _t {\textbf{q}}\right| \rightarrow \infty\), as shown in Proposition 3 in the Appendix. An interesting point of analysis then relates to the behavior of both functions in the neighborhood of \(E_t - {\textbf{x}}^\top _t {\textbf{q}} = 0\), defined by \(\varphi = \{{\textbf{q}}: - \delta \le E_t - {\textbf{x}}^\top _t {\textbf{q}} \le \delta \}\) with \(\delta > 0\). Let \({\textbf{q}}_1\) and \({\textbf{q}}_2\) be two vectors with \(E_t - {\textbf{x}}^\top _t {\textbf{q}}_1 \le 0\), \(E_t - {\textbf{x}}^\top _t {\textbf{q}}_2 \ge 0\) and \({\textbf{q}}_1, {\textbf{q}}_2 \in \varphi\). The subgradient that OLNV computes changes substantially between the two vectors, with \(\partial NV_t({\textbf{q}}_1) = \psi ^{-}_t {\textbf{x}}_t\) and \(\partial NV_t({\textbf{q}}_2) = - \psi ^{+}_t {\textbf{x}}_t\), which may result in very different updates of the vector \({\textbf{q}}\) for similar values of \({\textbf{x}}_t\) or \({\textbf{q}}_t\). Conversely, \(NV_{t,\alpha }\) is everywhere differentiable, which ensures a smooth change of \(\nabla NV_{t,\alpha }({\textbf{q}})\) for similar values of \({\textbf{q}}_t\) and \({\textbf{x}}_t\).

Figure 1 shows a sample of \(\partial NV_{t}\) and \(\nabla NV_{t,20}\), which correspond to the subgradient and to the gradient of the smooth objective function with \(\alpha = 20\). Only these two implementations are represented, for the sake of clarity. Most of the spikes in the case of \(\nabla NV_{t, 20}\) are comparatively lower due to the aforementioned smoothing effect in the neighborhood of \(E_t - {\textbf{x}}^\top _t {\textbf{q}} = 0\). This is aligned with the decreasing value of the standard deviation \(\sigma\) of the (sub)gradients collated in Table 1 as \(\alpha\) increases.

Fig. 1 Sample of \(\partial NV_{t}\) and \(\nabla NV_{t,20}\) computed on the dataset of the illustrative example

Table 1 Average absolute value \(\left| {\overline{g}}\right|\) and standard deviation \(\sigma\) of the (sub)gradients, and the metric \(NV^{\text{oos}}(\%)\), computed for three smooth (\(\alpha\)) and one subgradient (\(\partial\)) implementations of OLNV

On the contrary, the mean absolute value of the (sub)gradients, denoted \(\left| {\overline{g}}\right|\), follows the opposite evolution. To understand the rationale behind this evolution, we provide Fig. 2, showing three instances of the original and smooth losses. In all cases, we see that \({NV}_{t, \alpha }\) upper bounds \({NV}_{t}\) by a finite amount, as expressed in Proposition 2 (with a proof available in the Appendix). However, Fig. 2a shows that the minimum of \({NV}_{t, \alpha }\) is not aligned with the minimum of the original pinball loss function. This is true whenever \(\psi ^{+}_t \ne \psi ^{-}_t\) (i.e., asymmetric penalties in the market), a common situation in markets with a two-price imbalance settlement. Furthermore, when one penalty is equal to zero, the minimum is never attained.

Consequently, the gradient computed through (11) almost always deviates from the true value returned by (12). This error is given by the following expression:

$$\begin{aligned} \nabla {NV}_{t, \alpha }({\textbf{q}}) - \partial NV_t({\textbf{q}}) = {\left\{ \begin{array}{ll} (\psi ^{+}_t + \psi ^{-}_t) \left( 1 + e^{(E_t - {\textbf{x}}^\top _t {\textbf{q}}) / \alpha }\right) ^{-1} {\textbf{x}}_t, &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} > 0 \, , \\ - (\psi ^{+}_t + \psi ^{-}_t) \left( 1 + e^{-(E_t - {\textbf{x}}^\top _t {\textbf{q}}) / \alpha }\right) ^{-1} {\textbf{x}}_t, &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} < 0 \, , \\ {[}-\frac{\psi ^{+}_t + \psi ^{-}_t}{2} {\textbf{x}}_t, \frac{\psi ^{+}_t + \psi ^{-}_t}{2} {\textbf{x}}_t], &{} E_t - {\textbf{x}}^\top _t {\textbf{q}} = 0 \, . \\ \end{array}\right. } \end{aligned}$$
(26)

The imperfect approximation of \({NV}_{t, \alpha }\) distorts the magnitude and even the sign of the gradients, causing a long-term drift of \(q_t\) that increases with the smoothing parameter \(\alpha\) as shown in Fig. 3.

Finally, the last row of Table 1 presents the \(NV^{\text{oos}}(\%)\) obtained by each implementation with respect to FO. One sees that \(NV^{\text{oos}}(\%)\) deteriorates as \(\alpha\) increases. The smooth approach increasingly dampens the evolution of the decision vector for higher values of \(\alpha\), but at the expense of a biased \(q_t\) and non-negligible economic losses. Therefore, the smooth approximation does not provide any substantial advantage over the subgradient implementation in this application, given that the producer is neutral to risk and volatility (only being concerned with expected profits), and there is no technical constraint that encourages a smooth evolution of q. As a consequence, we will only use subgradients to implement the OLNV method throughout the remainder of the manuscript.

Fig. 2 Different instances of the original \(NV\) and smooth \(NV_{0.3}\) objective functions, with \(\alpha =0.3\) and \(u = E_t - x_t q\)

Fig. 3 Example of the evolution of the coefficient q for different implementations of OLNV

4.2 Dynamic behavior

In this illustrative example, we compare the tracking ability of the OLNV and LP approaches in a nonstationary environment. Similar to the previous case, we assume that the producer has access to a unique feature and considers a model with a single regressor. Again, we sample the forecast from a uniform distribution \(x_t \sim U(10, 90)\) (MW) and the true wind power generation series is obtained by adding normal noise, \(E_t = x_t + \epsilon _t\) with \(\epsilon _t \sim {\mathcal {N}}(0, 6)\) (MW). Instead of the real DK1 data, we consider two possible scenarios with penalties \(\psi _t^+=1, \psi _t^-=3\) and \(\psi _t^+=3, \psi _t^-=1\), alternating every two months. This process yields 8 months of data (5760 h), with the last 4 months (2880 h) used as the test set. The start of the test set is aligned with the beginning of a two-month scenario with \(\psi _t^+=1\) and \(\psi _t^-=3\). The rolling window approach is implemented by solving the optimization problem (7) with a set of historical samples \({\mathcal {T}}^{\textrm{in}}\). Then, we use (8) to cast an offer based on the context, \(E^{\textrm{F}}_t = \pi (x_{t}, q^{\text {LP}}_t)\). The coefficient \(q^{\text {LP}}_t\) is refreshed every 24 h by solving problem (7) over a rolling window. The reason for a 24-hour update is twofold: it is aligned with the original proposal in Muñoz et al (2020), and we empirically checked that there was little economic gain to be obtained with more frequent updates. The computing time in the case of an hourly update, for example, was 24 times longer. As will be shown in the following, the LP approach based on a rolling window only produces small changes between consecutive training sets, resulting in similar \(q^{\text {LP}}_t\). We train four versions of the LP model with \(|{\mathcal {T}}^{\text{in}}| = 720\), 1440, 2160 and 2880 samples (1, 2, 3, or 4 months), denoted LP-1M to LP-4M, respectively. We use the first four months of the dataset to construct the initial training sets. Although the concept of training is not strictly the same for OLNV (since it always learns on the fly, as new samples become available), only the last month of the training set is used to update the value of \(q_t\), initialized with \(q_1 = 1\), to resemble a model that has been operating for some time. A sketch of the data-generating process is given below.
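This synthetic setup can be reproduced as follows (we again read \({\mathcal {N}}(0, 6)\) as a standard deviation of 6 MW, and take one month as 720 h; names are ours):

```python
import numpy as np

rng = np.random.default_rng(seed=2)          # the seed is arbitrary
T = 5760                                     # 8 months of hourly samples
x = rng.uniform(10.0, 90.0, size=T)          # feature x_t ~ U(10, 90) (MW)
E = x + rng.normal(0.0, 6.0, size=T)         # production E_t = x_t + noise
# penalty scenarios alternating every two months (1440 h):
block = (np.arange(T) // 1440) % 2           # 0, 1, 0, 1 over the horizon
psi_p = np.where(block == 0, 1.0, 3.0)       # psi_t^+ = 1, 3, 1, 3
psi_m = np.where(block == 0, 3.0, 1.0)       # psi_t^- = 3, 1, 3, 1
train, test = slice(0, 2880), slice(2880, T) # last 4 months as the test set
```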

Figure 4 depicts the evolution of the single regressor \(q_t\) over the test set, together with the optimal \(q^*\) for each penalty scenario. Over the first two months, the higher value of \(\psi _t^-\) penalizes offers above the true production \(E^{\textrm{F}}_t > E_t\) and, consequently, the optimal strategy is to underestimate \(E^{\textrm{F}}_t\) with \(q^* < 1\). Over the final months, we observe the opposite.

As one may expect, the evolution of the decision vector of the LP models is smoother than that of OLNV, given that the former considers many historical samples at once to perform the update. However, Fig. 4 also shows that the trajectory of \(q_t\) produced by the rolling-window models LP-1M to LP-4M is substantially lagged with respect to the change in the penalty scenario (emphasized by different background colors). This delay increases with the length of the training set, to the point that LP-4M overlooks the change completely. Note that the length of the training set of LP-4M and the period of the penalty scenarios are identical. Therefore, the numbers of samples penalizing under- and overproduction are equal and remain constant. As a result, LP-4M has no incentive to overestimate or underestimate the forecast, yielding the same offers as FO (up to slight deviations due to the finite sample and noise).

Figure 4 additionally shows that OLNV tracks the optimal \(q^*\) substantially faster. The LP problem (7) determines the decision \(q_t\) with the best average performance over the training set, assuming that all samples in the set are equally probable representations of future outcomes. OLNV, in contrast, uses only the most recent information to perform a point-wise update that swiftly captures changes in the environment.

The tracking capability of both approaches has an impact on their economic performance. Table 2 summarizes the out-of-sample \(NV^{\text{oos}}(\%)\) obtained by each approach in the test set. In line with the previous analysis, LP-4M obtains the same performance as FO. The other three LP methods see their \(NV^{\text{oos}}(\%)\) decrease as the length of the training set, and hence the lag of \(q_t\), increases. Finally, the adaptability of OLNV allows it to outperform all LP approaches.

Table 2 Out-of-sample \(NV^{\text{oos}}\) (%) obtained in the test set of the illustrative example

In this simplified example, we could have analyzed LP models with a shorter training set, probably resulting in reduced lag and better performance. However, in a realistic situation with a large feature space and random penalties, months of data are typically required to capture the underlying relationships and generalize well out of sample (Muñoz et al 2020). The length of the training set of the LP models must therefore be selected as a trade-off: enough data is required to learn a policy that generalizes well, yet shorter sets track dynamics better. In contrast, the OLNV approach avoids this dichotomy altogether, providing a fast and effective method that adapts to the uncertain parameters generated by a nonstationary environment.

Fig. 4 Evolution of \(q\) produced by five models over the test set. The blue and orange shaded periods correspond to the penalty scenarios \(\psi _t^+=1, \psi _t^-=3\) and \(\psi _t^+=3, \psi _t^-=1\), respectively. The entry \(q^{*}\) corresponds to the best single vector for each penalty scenario

5 Case study

Electricity markets are in the midst of a rapid development towards reducing the time between market transactions and the actual exchange of electricity. Examples of this transformation include the reduction of lead times (the Australian NEM or the Californian CAISO) and the development of new intraday markets (the OMIE intraday markets or NordPool ELBAS). Inspired by this trend, we analyze a case study that considers an online forward market that takes place every hour, followed by a balancing market with a two-price imbalance settlement. The gate closure of the forward market happens just before the start of the next period. We assume that the wind farm participates continuously in the market and that her offer is always accepted.

In the following, we first describe the data used in this case study. Then, several benchmark methods are proposed to compare against OLNV. Finally, we analyze the numerical results obtained in terms of regret, economic performance, and computational cost.

5.1 Data and experimental setup

This case study is based on historical data compiled by the Danish TSO, Energinet.dk, since it includes market prices as well as several wind power forecasts that can be employed as input features. We collect the realized production and the day-ahead forecasts issued by Energinet for the on- and offshore wind power production of the DK1 and DK2 Danish bidding zones, together with the day-ahead and regulation prices of DK1, for the period 07/01/2015 to 06/04/2021 (mm/dd/yyyy). The day-ahead spot and regulation prices are mapped into hourly penalties through equations (2) and (3), and some small negative values, caused by rounding errors, are filtered out.

The raw wind power forecast series are also processed to suit our needs. Given that the installed capacity of the four wind categories shown in Table 3 evolves differently over the dataset, we independently normalize each series to lie between 0 and 100 MW, a figure that can easily represent the capacity of a large wind farm. According to the Danish TSO, the raw wind power forecasts are issued between 12 and 36 h ahead, although the exact issuing time is difficult to establish because no timestamp is provided. To overcome this issue, we use a standard ordinary least squares regression model to produce enhanced forecasts with an accuracy comparable to an hour-ahead forecast and, therefore, suitable for our case study. We feed each raw wind power forecast into an independent linear regression model together with the last three lags of the realized production of the pertaining series. Finally, we use the first 6 months of our dataset to independently train each of the four predictive models, one per column of Table 3.
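A minimal sketch of this forecast-enhancement step, assuming a plain OLS fit on the raw forecast and the three most recent lags (the function and its names are ours):

```python
import numpy as np

def improve_forecast(raw_fc, actual, n_train):
    """Fit OLS mapping the raw TSO forecast and the last three lags of the
    realized production to the next hour, and return the enhanced
    hour-ahead forecast for the hours after the training split."""
    T = len(actual)
    # Row for hour t: [1, raw_fc[t], actual[t-1], actual[t-2], actual[t-3]]
    X = np.column_stack([np.ones(T - 3), raw_fc[3:],
                         actual[2:-1], actual[1:-2], actual[:-3]])
    y = actual[3:]
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    return X[n_train:] @ beta      # out-of-sample enhanced forecast
```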

Table 3 Installed capacity in MW by bidding zone and technology

Table 4 compares the root mean square error (RMSE) of the original and improved out-of-sample forecasts against the naive benchmark provided by the first lag of each series (the wind power production of the previous hour), known in the literature as persistence. Results show that the improved hour-ahead series significantly outperform both the original forecasts and persistence. As a byproduct, note that the wind power forecasts issued by the Danish TSO have quality metrics (e.g., RMSE) consistent with expectation: offshore conditions are harder to predict than onshore conditions, and DK2 shows lower predictability owing to its smaller capacity and coverage area.

Table 4 Average RMSE (MWh) of the original forecast, persistence (naive 1-h lag) and the improved hour-ahead forecast, computed on the out-of-sample period 07/01/2015 to 06/04/2021 with a normalized generation capacity of 100 MW

Having processed the wind power production series, we now explain how they are used in our case study. The power generation of the wind farm offering in the market is simulated using the normalized onshore time series of the Danish DK1 bidding zone, which is consistent with the bidding zone of the imbalance penalties utilized. The four hour-ahead forecasts of the wind power production of DK1 and DK2 are available to the producer as contextual information. Although additional wind power forecasts of neighboring bidding zones could have been used as features, we restrict ourselves to those produced by the Danish TSO to avoid potential inconsistencies regarding the issuing time that could cast doubt on the results obtained (Muñoz et al 2020).

Given that our goal is to reduce the imbalance cost incurred by the wind farm, we also consider several price-related features as contextual information. To this end, we include the first lag of the imbalance penalties, \(\psi ^+_{t-1}\) and \(\psi ^-_{t-1}\), in the vector of contextual information. As commented in Sect. 2, it is well known that the ratio between the penalties provides valuable information about the optimal decision of the newsvendor model; therefore, we add the series \(r_{t-1} = \psi ^+_{t-1} / ( \psi ^+_{t-1} + \psi ^-_{t-1} + \upsilon )\), where \(\upsilon = 10^{-5}\) is a small constant that keeps the denominator well conditioned. Finally, we add a column of ones that enables one of the regressors to act as an intercept, completing our feature set.

In summary, let \(E_t^{on1}, E_t^{of1}, E_t^{on2}, E_t^{of2}\) denote the hour-ahead wind power forecasts of DK1 onshore, DK1 offshore, DK2 onshore and DK2 offshore, respectively. Then, at the moment of submitting the offer, the producer has available the feature vector \({\textbf{x}}_t = [1, E_t^{on1}, E_t^{of1}, E_t^{on2}, E_t^{of2}, \psi ^+_{t-1}, \psi ^-_{t-1}, r_{t-1}]^\top\) to infer the optimal offer \(E_t^{\textrm{F}}\).
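Assembling this vector is straightforward; the sketch below mirrors the construction described above (names are illustrative):

```python
import numpy as np

UPSILON = 1e-5                                    # conditioning constant

def feature_vector(E_on1, E_of1, E_on2, E_of2, psi_p_lag, psi_m_lag):
    """Contextual feature vector x_t available at offer time."""
    r_lag = psi_p_lag / (psi_p_lag + psi_m_lag + UPSILON)
    return np.array([1.0,                         # intercept column
                     E_on1, E_of1, E_on2, E_of2,  # hour-ahead forecasts
                     psi_p_lag, psi_m_lag,        # lagged penalties
                     r_lag])                      # penalty-ratio feature
```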

5.2 Benchmark methods and implementation details

In this section, we describe several benchmark methods against which the performance of OLNV is compared. The first benchmark is the enhanced hourly forecast of DK1 itself, produced through the ordinary least squares regression model described above. Although a prediction that minimizes the RMSE may seem a naive offer, one can expect the deviation cost incurred by the producer to vanish as the RMSE of the forecast approaches zero. Therefore, an hour-ahead forecast is expected to perform relatively well. We also use this hour-ahead forecast as the baseline to compute the metric \(NV^{\text{oos}}(\%)\) for the rest of the approaches, in the way described in Sect. 3.3.

The second benchmark is that of Muñoz et al (2020), based on a two-step approach using two variants of (7). In the first step, the model considers only wind-related features plus the intercept and sets \(\psi ^{+}_t = \psi ^{-}_t = 1, \, \forall t\). The series resulting from this model can be interpreted as an enhanced forecast of the wind energy production with a reduced mean absolute error. In the second step, this enhanced forecast is fed into (7), this time considering the true historical penalties \(\psi ^{+}_t\) and \(\psi ^{-}_t\) to correct for market patterns but neglecting the capacity constraint (7b). The training set is updated following a rolling window, adding new samples and removing an equal number of the oldest ones. We replicate this method, called LP2 (Linear Programming 2-steps), considering the four hour-ahead enhanced wind power forecasts of DK1 and DK2 as the input of the first step, that is, \({\textbf{x}}_t = [1, E_t^{on1}, E_t^{of1}, E_t^{on2}, E_t^{of2}]^{\top }\). In line with their findings, we choose a training set of \(|{\mathcal {T}}^{\text{in}}| = 4320\) (6 months) and a rolling-window step of 24 h.

In addition, we analyze a rolling-window model, called LP, that solves (7) and (8) exactly, using the full vector of available contextual information. This method is the one from the illustrative example in Sect. 4.2, but with different inputs. Given the similarities with the other rolling-window approach, LP2, we also choose a training set length of 6 months and a rolling-window step of 24 h.
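Although problem (7) is stated earlier in the paper, the sketch below reconstructs it from the surrounding description, as an empirical newsvendor cost minimization with the capacity constraint (7b); cvxpy is used here only as an illustrative modeling layer:

```python
import cvxpy as cp
import numpy as np

def fit_lp(X, E, psi_plus, psi_minus, cap=100.0):
    """Rolling-window fit: choose q minimizing the empirical newsvendor
    cost over the training window, keeping offers within plant capacity."""
    T, d = X.shape
    q = cp.Variable(d)
    offer = X @ q                                  # E_t^F = x_t' q
    cost = (psi_plus @ cp.pos(E - offer)           # underestimation cost
            + psi_minus @ cp.pos(offer - E))       # overestimation cost
    prob = cp.Problem(cp.Minimize(cost),
                      [offer >= 0, offer <= cap])  # capacity constraint (7b)
    prob.solve()
    return q.value
```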

Finally, we discuss a benchmark that cannot be implemented in practice, inspired by the static regret metric defined in (23). We assume perfect information about the whole out-of-sample dataset and solve (7) to compute the best linear model in hindsight, determined by the vector \({\textbf{q}}^{{\mathcal {H}}}\). Once this optimal single vector is computed, the whole sequence of offers is determined through \(E^{\textrm{F}}_{t} = \pi ({\textbf{x}}_{t}, {\textbf{q}}^{{\mathcal {H}}})\). We name this benchmark FX (for FiXed).

Next, we discuss the implementation of OLNV in this case study. The OLNV algorithm does not need to solve an optimization problem, but it requires initializing two parameters. To choose \(\mu\) and \(\eta\), we perform an offline grid search on the chunk of data spanning 07/01/2015 to 12/31/2015. As candidate values, we consider \([0, 0.1, \ldots , 1]\) for \(\mu\) and \([10^{-2}, 10^{-3}, 10^{-4}]\) for \(\eta\). The grid search is carried out by executing \(3 \times 11 = 33\) independent instances of the OLNV algorithm, each time initializing the OLNV regressor associated with the onshore DK1 forecast to 1 and the remaining entries to 0.01. The average \(NV^{\text{oos}}(\%)\) obtained by each instance is collated in Table 5. After analyzing the results, we select the combination \(\mu = 0.7\) and \(\eta = 0.001\), which achieves the highest \(NV^{\text{oos}}(\%)\). Although a grid search was used here for the sake of clarity, more sophisticated cross-validation techniques (Refaeilzadeh 2009) could be used instead, including repeating this process periodically to update \(\mu\) and \(\eta\) after a change in the environment.
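The grid search itself is a simple harness. In the sketch below, run_olnv is a hypothetical stand-in for one full pass of the OLNV algorithm over the tuning chunk; it is replaced by a dummy score so the snippet executes end to end:

```python
import itertools

def run_olnv(mu, eta):
    """Hypothetical stand-in returning the average NV^oos(%) of one OLNV
    run over 07/01/2015 to 12/31/2015 (dummy score peaking at the values
    selected in the paper)."""
    return -(abs(mu - 0.7) + abs(eta - 1e-3))

mus  = [round(0.1 * k, 1) for k in range(11)]    # [0, 0.1, ..., 1]
etas = [1e-2, 1e-3, 1e-4]

scores = {(mu, eta): run_olnv(mu, eta)           # 3 x 11 = 33 instances
          for mu, eta in itertools.product(mus, etas)}
best_mu, best_eta = max(scores, key=scores.get)  # (0.7, 0.001) in the paper
```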

Table 5 Out-of-sample \(NV^{\text{oos}}\) (%) for different combinations of the parameters \(\mu\) and \(\eta\) over the span 07/01/2015 to 12/31/2015

In this case study, we assume a balanced penalty anchor \({\overline{\psi }}^+ = {\overline{\psi }}^- = 1\). Again, we initialize the OLNV regressor associated with the onshore DK1 forecast to 1 and the remaining entries to 0.01. In other words, we start the online offering with a strategy very close to FO, relying mainly on the forecast of the wind energy production. We then use the next 6 months (01/01/2016 to 06/30/2016) to warm-start \({\textbf{q}}_{\text {OL}}\), ensuring a fair comparison against LP and LP2.

The performance of all the methods presented in this section is evaluated on the dataset spanning 07/01/2016 to 06/04/2021 (5 years, 43,200 samples). The optimization models LP, LP2, and FX are implemented with the Python package Pyomo (Bynum et al 2021) and solved with CPLEX, whereas OLNV is implemented by the authors using standard Python packages and made available in an open repository.

5.3 Numerical results

Next, we discuss the results obtained in this case study. We start by examining the regret incurred by OLNV over the aforementioned out-of-sample dataset of length \(D=43{,}200\) hours (60 months). Let \({\mathcal {T}}_j^{\text{oos}} = \cup ^{j}_{i=1} {\mathcal {T}}^{\text{oos}}_i\) and recall that \({\textbf{u}}_t = {\textbf{q}}^{{\mathcal {H}}}_i, \, \forall t \in {\mathcal {T}}^{\text{oos}}_i\). We assess the average dynamic regret \(R_{T}^d/T\) for each sequence \({\mathcal {T}}_j^{\text{oos}}, j=1,\ldots,D/l\), with partition lengths \(l=2160\), 4320, 8640 hours (3, 6, 12 months). As an additional case, we compute the evolution of the static regret for a sequence \({\mathcal {T}}_j^{\text{oos}}, j=1,\ldots,20\), with a step of \(l=2160\) hours (3 months). At each step, we refresh the best single action in hindsight as \({\textbf{q}}^{{\mathcal {H}}}_j = \mathop {\mathrm {arg\,min}}\nolimits _{{\textbf{q}} \in Q^{{\mathcal {H}}}_j} \sum _{t\in {\mathcal {T}}_j^{\text{oos}}} NV_t({\textbf{q}})\) and set \({\textbf{u}}_t = {\textbf{q}}^{{\mathcal {H}}}_j, \, \forall t\).
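The following sketch shows how the average dynamic regret can be computed from precomputed per-round losses; the static-regret series is the special case in which the comparator losses are recomputed over the growing prefix (array names are ours):

```python
import numpy as np

def avg_dynamic_regret(loss_alg, loss_comp, l):
    """Average dynamic regret R_T^d / T on growing prefixes T_j^oos made
    of the first j blocks of length l.
    loss_alg[t]:  NV_t(q_t), per-round loss of the online algorithm;
    loss_comp[t]: NV_t(u_t), loss of the best single vector in hindsight
                  of the block containing hour t."""
    loss_alg, loss_comp = np.asarray(loss_alg), np.asarray(loss_comp)
    D = len(loss_alg)
    return [(loss_alg[:j * l].sum() - loss_comp[:j * l].sum()) / (j * l)
            for j in range(1, D // l + 1)]
```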

The four aforementioned regret series are depicted in Fig. 5. As expected, the average dynamic regret incurred by OLNV deteriorates quickly as \(l\) decreases, since lower values of \(l\) translate into a more challenging benchmark, closer to the worst-case regret defined in (24). Nevertheless, Fig. 5 clearly shows that OLNV achieves sublinear static regret, i.e., \(\limsup _{T \rightarrow \infty } {\mathcal {R}}^s_T / T \le 0\). This is also the case for the dynamic regret with partitions of length \(l \ge 6\) months, demonstrating the ability of OLNV to track dynamic environments.

Fig. 5 Average dynamic regret \(R^d_{T} / T\) for \(l=3, 6, 12\) months and static regret \(R^s_{T} / T\) updated every 3 months (denoted as s) of the OLNV method

The economic gains obtained by each method are assessed through the \(NV^{\text{oos}}(\%)\). The average values achieved over the evaluation dataset are collated in Table 6. First, note that all methods outperform the naive FO strategy of offering the DK1 forecast, obtaining positive values and demonstrating that this set of features contributes to reducing the deviation cost.

The LP2 method was developed in a context where recent lags of the penalties were not available. Indeed, the lack of penalty-related features translates into a modest score, showing the evident benefit of disclosing recent information in electricity markets, i.e., of reducing lead times. Even though FX determines the optimal \({\textbf{q}}^{{\mathcal {H}}}\) in hindsight (i.e., under perfect information), its choice is limited to a single vector for the whole horizon. The fact that several approaches perform better than FX confirms the dynamic behavior of the uncertain parameters and the need to update the decision vector. It therefore comes as no surprise that LP improves on the first two approaches, as it relies on the full vector of features and periodically updates \({\textbf{q}}^{\text {LP}}_t\). However, the superior adaptability of OLNV allows it to obtain the best score, achieving an additional 7.6% compared to LP and a total 38.6% deviation cost reduction compared to FO. The latter figure translates into an extra 25,930.22 €/year on average for a wind farm with a capacity of 100 MW.

Finally, the last row of Table 6 summarizes the computational time of the four approaches. The FX method requires little time, as it solves a single optimization problem for the whole horizon. This contrasts with the significant amount of time required by the constant re-optimization of LP and LP2. It is noteworthy that even though OLNV produces 24 times more updates of the vector \({\textbf{q}}_t\), the time invested is several orders of magnitude lower. In conclusion, OLNV is up to the challenge of the ongoing electricity market transformation, achieving significant cost reductions together with exceptional computational performance.

Table 6 Out-of-sample \(NV^{\text{oos}}\) (%) and execution time (s) over the span 07/01/2016 to 06/04/2021

6 Conclusions

This paper develops a new algorithm, named OLNV, combining a variant of online gradient descent with recent advances that extend the newsvendor model to directly consider contextual information. The component-wise update of the learning rate enables the seamless use of features with different scales. In nonstationary environments, conventional stochastic approaches may consider misleading old samples in their training sets. In contrast, our algorithm tracks the most recent information in the gradients, adapting the learning rate to follow the dynamics of the uncertain parameters and potentially obtaining higher profits. The closed-form expressions derived to compute the projection onto the feasible region and a (sub)gradient of the objective function yield an efficient algorithm that can be used in computationally demanding problems. We envision the use of OLNV in future electricity markets that evolve toward continuous offering with reduced lead times. In particular, we apply this algorithm to the problem of a wind farm offering in an hourly forward market with a dual-price settlement for imbalances.

Several numerical experiments are carried out to assess the properties of the proposed OLNV algorithm. In the first illustrative example, we compare the behavior of two alternative implementations, namely, a subgradient approach and a smooth approximation of the original newsvendor function. The numerical and theoretical analysis provided in this example indicates that computing subgradients of the original objective function proves more profitable, since it avoids the update errors that may be introduced by the smooth approximation. Consequently, we determined that the subgradient implementation was the most suitable for this application and used it throughout the rest of the numerical experiments. Nevertheless, the smooth approximation could be useful in other applications where technical concerns advise a smooth update.

The second example shows the adaptability of the OLNV algorithm to nonstationary environments, clearly outperforming other stochastic approaches that optimize (using mathematical programming techniques) over a training set of past information. This superior performance is explained by the point-wise update, which uses only the most recent information. Our case study, built upon real data from the Danish TSO Energinet, shows that OLNV achieves a 38.6% cost reduction relative to offering a point forecast and a 7.6% reduction compared to a state-of-the-art method. These significant improvements contribute to accelerating the integration of renewable energy technologies. Furthermore, we empirically analyze several dynamic definitions of regret, showing the desired sublinear convergence against most benchmarks.

Although this research focused on wind energy producers, OLNV is readily applicable to managing a portfolio of variable renewable energies with zero marginal cost, including wind, solar and other technologies. Similar algorithms can be developed when the producer's portfolio includes other assets such as loads, thermal power plants, or energy storage facilities, replacing the aggregated source of uncertainty, i.e., the variable net energy production, with a linear decision rule. In this case, the projection step onto the feasible region would likely involve solving a quadratic optimization program which, provided the feasible region is convex, can still be solved efficiently with modern solvers. Another attractive front is extending the OLNV algorithm to address inter-temporal constraints, subject to a similar caveat regarding the feasible region. This may, however, first require generalizing the newsvendor framework for offering in electricity markets.

Future work also includes delving into the theoretical guarantees that this algorithm offers in terms of regret. On a different front, a wealth of other algorithms from the field of online learning can be applied to this problem, potentially bringing additional benefits such as faster convergence rates or improved performance. Similarly, variable selection techniques could help determine the subset of available feature streams that provides the most economic value, whereas nonlinear mappings, e.g., kernels or generalized additive models (GAMs), can extend the regression capabilities of the method. Another exciting line of research concerns the risk analysis of the producer, where metrics other than the expected value can be used to create risk-averse strategies.