Ant colony optimization for feasible scheduling of step-controlled smart grid generation

The electrical energy grid is currently experiencing a paradigm shift in control. In the future, small and decentralized energy resources will have to responsibly perform control tasks like frequency or voltage control. For many use cases, scheduling of energy resources is necessary. In the multi-dimensional discrete case–e.g., for step-controlled devices–this is an NP-hard problem if some sort of intermediate energy buffer is involved. Systematically constructing feasible solutions during optimization, hence, becomes a difficult task. We prove the NP-hardness for the example of co-generation plants and demonstrate the multi-modality of systematically designing feasible solutions. For the example of day-ahead scheduling, a model-integrated solution based on ant colony optimization has already been proposed. By using a simulation model for deciding on feasible branches, artificial ants construct the feasible search graphs on demand. Thus, the exponential growth of the graph in this combinatorial problem is avoided. We present in this extended work additional insight into the complexity and structure of the underlying the feasibility landscape and additional simulation results.


Introduction
The share of large, mostly fossil-fueled power plants is constantly declining. Thus, responsibility for the safe operation of the electric power grid has to be transferred to small and volatile renewable energy resources that are replacing conventional plants (Farhangi 2010). The desired growth in the share of cleaner energy resources, on the other hand, entails new challenges for control strategies within the grid. Meanwhile, solutions are in place for decades for finding an optimal assignment of generation schedules for traditional power plants (Kerr et al. 1966;Lowery 1966;Sheble and Fahd 1994;Padhy 2004). Renewable energy resources are small and volatile by nature. Individual and hardly generalizable operational constraints render similar scheduling tasks and proper modeling of flexibilities way more challenging for renewables (Bremer et al. 2011;Su et al. 2013;Förderer et al. 2018).
Different types of scheduling tasks frequently occur in the smart grid. Among others, such tasks comprise predictive/day-ahead scheduling like planning production based on forecasts for the next day. Ancillary services for power conditioning and supply security like voltage or frequency control are another example (Saboori et al. 2011;Nieße et al. 2012). Day-ahead scheduling tasks in applications for the future smart grid have been investigated for years and many solutions are available. Most of these solutions are only suitable for devices in which the power control is non-discrete. Many solutions are furthermore limited in the number of periods that may be considered.
A problem arises when devices with some (e.g., thermal) storage are considered for time-ahead scheduling, especially if they can only be controlled by step-control.
Step-control allows altering power output-or input respectively-in a few discrete steps (power levels). Many devices may even be limited to be switched on or off only. An attached thermal buffer store with naturally limited capacity constraints the choice of possible power levels in each time interval. The limitation in each time interval also depends on power level decisions for previous time intervals; at least for a recent subset of intervals. Due to this entanglement, the decision for a specific time interval cannot be made independently from the other decisions. A special case of such a constrained discrete scheduling problem has been proven to be NP-hard (number of planned time intervals) even for a single device (Bremer 2006). Another problem is the non-separability in the time domain. Schedules may not be optimized for each time interval individually due to the dependency of the different search space dimensions; possible choices depend on previously taken decisions (e.g., minimum down time). Thus, scheduling here always needs to be treated as a multivariate optimization problem.
In (Bremer and Lehnhoff 2020b), ant colony optimization (ACO) (Dorigo and Di Caro 1999;Dorigo and Stützle 2003) has been applied to solve the problem of finding an optimal schedule for a given energy resource under model-predicted technical operational constraints. Different objectives defining optimality are supported. As an example, the optimality of an operation schedule may be calculated by integrating over the product of generated electricity and some given energy prices or by the similarity to (correlation with) some targeted generation profiles. Time-varying energy tariffs and load balancing are two traditional examples (Faria and Vale 2011;Wedde et al. 2008).
Ensuring the feasibility of solutions found by new control algorithms is crucial for a wide acceptance by different stakeholders in the energy domain (Bremer and Lehnhoff 2016). This holds especially true if heuristics are used. We extend the findings from Bremer and Lehnhoff (2020b) by a landscape analysis of the constraint-handling part. Fitness landscape analysis is a tool for scrutinizing the structure and characteristics of-especially, non-linear-optimization problems in advance (Watson 2010). Reformulating constraints to make them quantifiable allows an adoption of techniques from fitness landscape analysis to scrutinize non-linear constraints and the related difficulty of ensuring the feasibility by design. Designing solution candidates (operation schedules in this case) that are provably feasible in advance, on the other hand, guarantees feasible solutions and enables optimization with anytime property; meaning that heuristics return the best-so-far feasible solution whenever they are stopped. As many real-world problems need to adhere to some given deadline, this is an important property for heuristics.
The rest of the paper is organized as follows. The paper starts with a brief overview of related work concerning the continuous problem version. After formalizing the discrete problem and proving single device NP-hardness using the example of co-generation, a landscape analysis of the concurrent problem of ensuring the feasibility is conducted. We recap the proposed solution based on ant colony optimization with the integrated simulation model from Bremer and Lehnhoff (2020b) and present some additional results.

Related work
Within the future smart grid, many use cases will require the adaptation of operation, especially of small-sized energy resources to some given electric production or consumption profile (Ramchurn et al. 2012;Behrangrad 2015;Ruiz-Romero et al. 2014). In this way, a schedule has to be determined for a controllable energy resource. The term schedule often refers to a real-valued vector x ∈ ℝ d with each element x i denoting the mean active power (or equivalently the amount of energy) generated or consumed during the ith time interval. Negative values denote consumption. In the discrete step-control case, a schedule x ∈ ℕ d will denote the indices of the respective power levels (operation modes)-with zero denoting off-setting. Exemplary use cases are demand-side management (Gellings and Parmenter 2016), microgrid control (Nosratabadi et al. 2017), demand response (Palensky and Dietrich 2011), or virtual power plants (Saboori et al. 2011).
To provide the necessary flexibility for grid operation, adaptive behavior will be necessary for generation units like small co-generation plants, for controllable consumers like cool storages, as well as for prosumers like batteries. Applications comprise decentralized production planning, ancillary services like frequency or voltage control, virtual power plants, micro-grid control, or grid-compliant charging of electric vehicles (Lu et al. 2018;Baharlouei and Hashemi 2013;Deng et al. 2015;Mukherjee and Gupta 2014;Ramchurn et al. 2012). Orchestration of decentralized energy resources might be achieved by direct control or by indirect control via incentives like varying energy prices (Khan et al. 2016;Sonnenschein et al. 2006). Both cases can be attacked with the same algorithm.
The number of algorithms applied to (in-house) energy management problems is numerous, but solutions to devices with step-control are scarce (Beaudin and Zareipour 2015). Many approaches consider only continuous power level variations; examples can be found in (Boynuegri et al. 2013;Koch et al. 2009). In (Capone et al. 2013), an example with binary decision variables and a mixed integer non-linear program can be found, but without including appliances with storage constraint characteristics. In contrast, for the continuous case, several solutions are available that have been designed to handle constraints imposed by some integrated buffer stores, for example, by decoders as constraint-handling technique . Examples are given in (Bremer and Lehnhoff 2016;Li et al. 2017;Yu et al. 2013;Bremer et al. 2009). We will now consider the stepcontrol case that constitutes the discrete optimization problem. A good overview can also be found in (Beaudin and Zareipour 2015). Dethlefs et al. (2015) presented an ant colony-based solution to a closely related combinatorial problem. The authors propose a solution for finding discrete starting times for shiftable loads in demand-side management. A solution to the step-control case with discrete power levels is given in (Hinrichs et al. 2014;Hinrichs and Sonnenschein 2017). The problem is solved with an agent-based approach. Each agent represents the flexibility of its controlled energy resource by a fixed set of a priori model-predicted schedules. The problem is then treated as a (distributed) constraint optimization problem. This approach does not consider large portions of the real flexibility of the energy resources as it uses just a small subset of example instances. Bremer and Lehnhoff (2020b) integrated the prediction model directly into the algorithm and hence harnessed the full potential and flexibility of the controlled energy resource. Applications of more recent ant systems like the max-min ant system (Stützle and Hoos 2000) within the smart grid domain are rather scarce. An example for energy savings within the transport sector is given in (Ke et al. 2009).
Most works propose using heuristics as an appropriate means for attacking scheduling problems within the smart grid. In (Nieße et al. 2017), a systematic analysis of the occurrence of local minima for decentralized predictive scheduling problems has been conducted. The used approach of fitness landscape analysis is especially suitable for analytically difficult, non-linear objectives, decentralized problems, or complex relations between objective and constraints. Different approaches for fitness landscape analysis can be found in (Pitzer and Affenzeller 2012;Watson 2010). Surprisingly, not many a priori studies on practical problems are published. Some examples can be found in (Tavares et al. 2008;Rapp 2019;Nieße et al. 2017).
We applied fitness landscape analysis to scrutinize the complexity of ensuring the feasibility of solutions. Feasibility by design in energy management can also be achieved by metamodeling the flexibility as a multi-dimensional state space of feasible operations of an energy resource. Examples of machine learning-based a priori modeling can be found in (Bremer et al. 2011;Schiendorfer 2014;Bremer and Sonnenschein 2013;Förderer et al. 2018;Förderer and Schmeck 2019;Pinto et al. 2017). Although being quite accurate, guarantees cannot be given by any stochastic approach. In addition, these methods often cannot guarantee a full coverage of the complete flexibility.

Problem statement
The problem from the smart grid domain that we consider here has some specific characteristics that make its solution hard. Thus, we first analyze these characteristics.
We start by explaining the underlying planning problem from the smart grid domain, which is to find an operation schedule for a given energy device. Such schedule for some given, future planning horizon denotes the operation of the device such that individual constraints are satisfied and some given goal is met. Often, such a schedule is given as a real-valued vector denoting the mean generated or consumed power during the respective time interval. A typical scenario comprises 96 dimensions for all 15-minute intervals of a day. Several solutions exist for the real-valued case. We extend the planning problem to the case of step-controlled devices with discretized power output levels.
When the planning procedure starts, the following information is available: starting time t 0 , the initial operational state of the device at t 0 , and some cost c 1 , … , c d for d discrete time intervals T = {T 1 , … , T d } (not necessarily equidistant). For the period of one-time interval, costs are considered constant. Examples are monetary costs for electricity given by time-varying prices, or set points given by some external smart grid control strategy (Sarstedt et al. 2019). In the latter case, costs are defined by the deviation from the desired energy feed-in.
The step-controller of a device may choose among a limited number of m discrete power levels P = {P 1 , P 2 , … , P m } denoting operated mean active power, for the duration of each time interval. The relation between power levels and the index of the operation mode is given by a device-specific mapping where u i ∈ {1, … , m}, P i ∈ P; Negative values for active power (as a result of ) are allowed and denote consumption instead of generation. With this we follow the active sign convention from electrical engineering for defining the sign of electric power in an electric circuit (Goswami 2004). We do not consider switching of power levels during a time interval. Equivalently, to mean active power, the amount of energy consumed or produced during T i could be used instead. T i denotes the ith time interval from time t i−1 to time t i . Time t i denotes the absolute time.
Controlling devices with some integrated (energy) storage like cooling devices, batteries, co-generation plants, or similar lead to a dependency of possible choices at later planning intervals on previous choices, but allow on the other hand for additional flexibility. Let R be the controlled variable, e.g., the state of charge of a thermal buffer store (represented by the temperature), controlled by heating through a co-generation plant and disturbed by thermal demand and hot water usage. Keeping R close to the set point or at least within an allowed range may be achieved with different operation schedules, as heat can be intermediately stored in the buffer store. In this case, the temperature within the thermal store may vary within a given range. In general, operation modes for a co-generation plant have to be chosen in a way that for every point in time: R ∈ [R min , R max ].
Which operation modes P T i ⊆ P specifically are possible during time interval T i depends on the operational state of the device at the beginning of an interval T i and thus depends on the previous operations. P T i denotes the set where ⊕ denotes the operation that determines R T i in interval i from R T i−1 and operated power (if not given by the initial state) and from the chosen operation mode. R denotes the set of all feasible values of R and depends on the technical specification of the controlled device.
When looking at a single device, the problem can be defined as follows. Assuming constant power input (or feed-in) the amount of energy consumed or generated during time interval T i is given by E i = T i P i , with T i = t i − t i−1 . With this notation, we are open to future scenarios with different operational and pricing time intervals with different and unequal lengths. Thus, the cost for the whole planning horizon of a single device is given by: In case all intervals have the same length , the objective function for g concurrently controlled devices can be simplified and expressed with operation mode schedules x ∈ {1, … , m} d : with • denoting the Hadamard product. F i denotes the feasible region specific to the ith energy resource and with respect to individual technical restrictions; x i denotes the operational mode schedule of the ith device.
In the following, we consider only cases with equidistant time intervals, and if not stated otherwise, with a single device. Constraints are given by the controlled variable R that has to stay within the given range: A set of device-specific operational constraints further restrict possible operations. Examples of such constraints are: the allowed power range, buffer restriction, min./max. on/off times, or ramping (Bremer and Lehnhoff 2020a;De Angelis et al. 2013;Nieße et al. 2016). ( For each time interval, an assignment of electrical power has to be found such that the controlled variable R stays within the allowed range and technical, operational constraints are fulfilled. In the case of a co-generation plant, R relates to the buffer store temperature. Due to the interconnection (induced by the buffer) of possible power levels in each time interval, they cannot be chosen independently. Each subset of possible feasible power levels for time interval T i depends (in the worst case) on all choices for T 1 , … , T i−1 . This is the major reason for the computational complexity, not the (rather simple) objectives. For the special case of in-house energy management and a two-step controlled single fridge, the problem has shown to be NP-hard with the growing planning horizon (Bremer 2006).

Complexity analysis
When examining the complexity of the scheduling problem and the proposed solution, we have to distinguish between actual scheduling and ensuring the feasibility by design. The complexity of the scheduling problem can be analytically attacked. The operations for constructing feasible solutions, on the other hand, involve individually parameterized simulation models with unknown starting conditions. Thus, we split examinations and start with a formal analysis of the NP-hardness of the scheduling part prior to a fitness landscape analysis of the feasibility part. The feasibility analysis cannot be derived formally and needs simulation to evaluate the feasibility.

Scheduling complexity
The scheduling problem sketched above is NP-hard already for a single device; with the problem, size depending on the number of intervals in the planning horizon. NP-hardness has already been proven in the case of a fridge (Bremer 2006). In the context of a co-generation plant with the attached thermal buffer store used in this study, the NP-hardness of the problem can be derived as follows.
As a model, we used a simulation model that takes an electric schedule and calculates the temperature profile within the buffer store by repeatedly evaluating a difference equation derived from a differential equation describing the physical model. The feasibility of the electric schedule is determined by checking different technical constraints: the allowed temperature range, minimum on and off times, ramping, allowed electric power range, and fulfillment of heat demand.

Proof
For showing NP-equivalence of the scheduling problem, NP-completeness of the related decision problem has to be demonstrated. We define the decision problem SATCHP as follows: Given: a vector of fixed individual costs c for each time interval, a value for the total cost c tot , and an instance of the scheduling problem for a co-generation plant. Question: does any combination of feasible power levels exist such that for the total cost In a first step, we show that ∈ NP . It can be easily seen that this is the case: An assignment B = (v 1 , … , v d ) of power levels to the d time intervals (with v i ∈ P ) can be guessed in polynomial time; actually in O(d) . Checking whether this non-deterministically found assignment is a solution to SATCHP can also be done in polynomial time using equation For checking the feasibility, the simulation model is initialized with the initial state (assignment of variables in constant time). For each time interval, the follow-up state of the model is calculated using a difference equation for the new buffer store temperature (constant time). Additional constraints are checked by simple condition checking (e.g., temperature range). Thus, checking the feasibility is also just linear with the number of time intervals. Consequently, ∈ NP holds. Now, we can show NP-completeness of SATCHP by reducing the NP-complete problem SUBSET SUM to SATCHP. SUBSET SUM is defined as follows (Schöning 2001): Given: a set of natural numbers a 1 , a 2 , … , a n ∈ ℕ as well as b ∈ ℕ.
The reduction ≤ p can be achieved by the following mapping: We choose a co-generation plant with only two modes (off and on) resulting in relative power levels 0 and 1 (100% of rated power). Moreover, let the length of all time intervals be = 1 , then there exists a subset

Hardness of ensuring the feasibility
As has been argued, checking the feasibility for a single schedule can be done in linear time. Nevertheless, systematically searching for a feasible schedule (even an arbitrary one that is not necessarily optimal) already entails some challenges. For gaining more insight into the complexity of ensuring the feasibility by design already during optimization, we adapted a technique for fitness landscape analysis to feasibility landscapes. One major aspect of fitness landscape analysis is analyzing the structure of a fitness landscape in optimization in order to derive characteristics like shape, number, and distribution of (local) minima, flat valleys, and so forth. These characteristics in turn give evidence on the expected complexity of the optimization problem. Many different procedures have so far been proposed for analyzing the landscape of fitness (Pitzer and Affenzeller 2012). As a starting point, we used an approach by (Vassilev et al. 2000) that has already been used in energy management-related questions (Nieße et al. 2017). Given an optimization problem, a landscape L is defined by the search space S , a neighborhood relation N and an error function (equivalently: the fitness) . Plotting overall x ∈ S yields the landscape that is explored by an optimization algorithm in search of an optimal point. The structure of this landscape determines (premature) convergence and thus whether the algorithm succeeds with a sufficiently small budget or might get stuck within some local optimum. On the other hand, information about the expected number of objective evaluations before reaching promising regions may be derived.
Fitness landscape analysis usually works on landscapes spanned over the parameter space and ignores infeasible parts. As we want to analyze the hardness of ensuring the feasibility as a side task of our actual optimization problem, we need a quantifiable criterion. The feasibility usually is binary, that is, a solution is either feasible or infeasible. But, we can define partial feasibility f (x) by measuring the share of feasible intervals of a d-dimensional schedule x of power levels instead of absolute feasibility. We use F p denotes a function that counts the number of feasible intervals that can be operated without violating any constraint. Practically, the number is determined by simulating the operation until the first error occurs. Relating this count to the length d of the schedule yields the measure for the partial feasibility. One way to analyze the structure by scrutinizing the partial feasibility correlation of neighboring candidate solutions (Vassilev et al. 2000). Neighboring solutions from flat regions of the landscape exhibit a higher correlation than solutions from rugged parts of the landscape. Thus, the correlation can be seen as some measure that estimates the ruggedness of L.
We study the autocorrelation of random paths on the landscape (random walk through S ) (Weinberger 1990). Let x be a solution to the scheduling problem Eq. (3). Each element denotes a schedule and each element of a schedule represents the index of the operated power level during the respective time interval i. Let {f k } k=1 be a sequence of feasibility quantifications sampled as follows: starting from a randomly chosen solution x 0 ∈ S , successive, neighboring solutions x k+1 are generated according to Merkuryeva and Bolshakovs (2011) by altering each element in x k by adding or subtracting 1 (switch to neighboring power level) with a probability of 1/3 each: where x k+1 is the neighbor solution generated from x k and r is real number uniformly randomly sampled from the interval [0, 1].
The series {f k } k=1 now contains values f k = f (x k ) . Next, we calculate the autocorrelation for a given path length , with E[f k ] and V[f k ] denoting expectation and variance, respectively. Moreover, Weinberger (1990) defines the correlation length = − 1 ln( (1)) , denoting the mean distance (in the sense of neighboring hops in N ) after which the majority of the solutions is no longer correlated (Vassilev et al. 2000). The correlation length can also be interpreted as the expected maximum width of flat valleys in the landscape.
Another aspect of fitness landscape analysis is information analysis. Analyzing the correlation of random paths on the landscape gives an impression of the structure. In (Vassilev et al. 2000) an extended analysis based on entropy measures on {f k } k=1 is proposed. Founded on ideas from algorithmic information theory (Chaitin 1987) and the Shannon entropy (Shannon 1948), a characterization of the distribution and number of local optima along the path is given as a measure of the complexity of the fitness landscape. For the following indicators, random paths on the landscape are seen as ensemble of basic elements. Three element types (token) can be distinguished: -flat areas (neighboring points have similar fitness), -isolated points (surrounded solely by better, or solely by worse points), and -slope points (neither isolated nor flat).
In a first step, each path is transformed in a sequence of tokens S( ) = s 1 s 2 … s o over the alphabet {1, 0, 1} by for a given ∈ [0, max f k ] (cf. Vassilev et al. 2000). A string S( ) then contains information on the structure of the landscape along a randomly chosen path. Now one can define an object by two successive tokens in the string. For example, the sequence 11 denotes a change from downslope to upslope and thus a trough. The entropy measure for such an ensemble of objects can be determined according to (Vassilev et al. 2000): with P [pq] denoting the frequency of the occurrence of the sequence pq in S( ) . Also, the modality of the objective function can be derived from S( ) . The entropy is a measure of the diversity of objects along the path. In contrast, the modality must be measured by a classification of objects in order to determine the number of (local) optima. First, the partial information content is determined (Vassilev et al. 2000). To achieve this, the string S( ) is transformed into S � ( ) = o 1 o 2 … o over the alphabet {1, 1} . S � ( ) is determined in a way that it contains the shortest string that represents the alternations from uphill to downhill changes along the path. This can be done recursively. The partial information content is then defined as (Vassilev et al. 2000): A value of 1 denotes the maximum modality. The absolute number of (local) minima (according to a given ) can be derived by ⌊ (n ⋅ M( )) −2 ⌋ . All these measures are sensitive to the choice of . Small values lead to a higher sensitivity to changes in fitness between neighboring solutions. The smallest value of that lets all differences vanish is called information stability (Vassilev et al. 2000) (fully flat feasibility quantification function).
For analyzing the complexity of finding feasible solutions, we conducted several experiments. In particular, paths of length 2000 were studied. Each experiment was repeated 1000 times with random starting solutions. Figure 1a shows a first result demonstrating the differences induced by the dimensionality d of the schedules. The correlation length grows with longer schedules. This fact is also reflected by the respective steepness of the absolute correlation along the path of solution evaluations in Fig. 1b. The absolute correlation depicts the decline of the correlation between solutions with growing distance. The correlation length condenses this fact by denoting the mean number of operations after which any correlation disappears. The distance between two solutions in this context is measured as the number of mutation operations Eq. (8) that have transformed one solution into another. The entropy is a measure of diversity along the path and also drops with growing dimension giving evidence-together with the growing correlation length-of larger valleys. The partial information content denotes the relative modality along the path. The result in Fig. 1a reflects a co-generation plant with given fixed hot water drawing profile leading to a general correlation of feasible schedules. Figure 2a shows the same result for a co-generation model with fully random hot water drawing. Obviously, there is almost no impact that is induced by model internal correlation. On the other hand, using 30 power levels changes the game. Figure 2b shows this situation. Having more power levels to choose from simplifies the optimization problem. A similar effect for ensuring the feasibility can be observed at least for growing schedule dimension.
Looking at the range of different interval lengths with a constant number of planning intervals reveals, that the main difference in complexity here is reflected by a growing diversity of paths. The interval length varies from 10 sec to 30 min. The diversity increases mainly during one step between 2 and 3 minutes interval length. From a mathematical point of view, the reason is that one sequence pq is always present (cf. Eq. 9). The root cause lies in the used co-generation model. Due to the constant number of time intervals, a different total time horizon is covered. Starting from an interval length of approximately 80 sec, a sufficiently large period is covered to incorporate morning heating and showering as an additional heat sink. Hence, the feasible region is significantly narrowed. The information content grows due to more active constraints. The standard deviations of the correlation lengths in Fig. 3 show that coarser planning obviously leads to a higher dependency on the initial solution candidate. Some starting solutions lead to a search within rather flat valleys, others do not. Thus, we can conclude that the choice of the starting solution has a major impact on the optimization performance as the feasibility has to be ensured several times when generating candidate solutions.

Creating feasibility by design
As a result of the previous complexity analysis, we can now motivate the necessity of an integrated means for ensuring the feasibility by design when attacking the scheduling problem. NP-hardness motivates the use of a heuristic as large problem instances that are to be expected in practical scenarios can hardly be tackled otherwise. We identified ant colony optimization as a suitable approach for the discrete problem at hand. This algorithm will work with an integrated systematic design of feasible solutions.
When optimizing the schedule for an energy resource (or an ensemble of devices), the recursive dependencies on prior decisions turn out to be problematic for the definition of a neighborhood relation between different solutions. The domain of x i depends on the  Fig. 2b, the number of power levels was increased from 4 to 30. The graphs show the means of (dimensionless) landscape characteristics Fig. 3 Impact of the interval length T on the mean number of correlated succeeding solution candidates (correlation length) and the (dimensionless) entropy of the fitness landscape analysis as measures for the dependency on the initial solution candidate. The number of intervals is kept constant for each experiment assignments of (x 1 , … , x i−1 ) , because the operation of the sub-schedule (x 1 , … , x i−1 ) has an impact on the feasible operational phase-space of the device for x i . It is an NP-hard problem in itself to construct the whole feasible search graph in advance.
Thus, we used an ACO approach in which each artificial ant is equipped with a simulation model of the respective device as a means to ask for possible branches at each node. In this way, the search graph can be generated locally on demand. Figure 4a shows the general idea of the used search graph.
Nodes represent different power levels operated during each planning interval and are organized in layers. Each layer represents the start of a new time period. Each ant has to make its way from the beginning of the planning horizon to the end. Each layer represents the set of all existing power levels independent of the actual operational state; the power level that is to be operated during the next time interval is chosen at the beginning of the interval and fixed for the duration of the interval. Costs for a time interval-cf. Eq. 3-are calculated on the basis of the chosen power level and some global cost information; e.g., on energy prices during the interval. Edges are allowed only in between neighboring layers and are existent just virtually.
For the construction of a solution, each ant starts at time t 0 with the power level given by the initial operation state of the energy resource. At each layer, the path taken so far is passed to the simulation model to calculate the feasible choices for the current decision on a new power level and thus on the next edge. Each ant walks from layer to layer and knows the so far covered path; the simulation model calculates-based on initial state and path-a selection of feasible power levels (subset of feasible edges) for the next time interval; the ant chooses from this selection and moves on to the next layer. Edges materialize only after the simulation model acknowledges feasibility based on the ants previous path. In this way, the feasibility of constructed paths (representing an operation schedule for the energy resource) is ensured. Fig. 4 Path construction by artificial ants for an operational schedule (a single resource; b multiple constraint-coupled resources, 2-dimensional example with i; j meaning ith power level of the first and jth power level of the second unit); cf. (Bremer and Lehnhoff 2020b) The graph of feasible schedules is obviously not complete as often the case in ACO algorithms. The graph of feasible schedules is a d-partite graph (cf. Fig. 4) whose edges all point towards immediate future time intervals: Each node set V i consists of different compositions of power levels that are operable during interval T i for devices G 1 to G g , where V i ⊆ M is a subset of the set of all theoretically possible compositions M: M corresponds to the set of virtual nodes. Which nodes V i become available for ant A and are thus reachable, must always be evaluated based on the current operational state and the so far covered path (course of previously operated power levels). Each time an ant has decided on a power level for a planning interval T it moves along this edge and then involves the simulation model to discover available feasible branches for time interval T +1 ; cf. Fig. 4a. The current partial path ( P T 1 , … , P T ) is passed as input to the simulation model, which is already parametrized with the initial state at T 0 . The model returns a set of feasible power modes (or, respectively, a set of possible absolute power values). This set constitutes possible new edges (i, j) ∈ V pointing from the ith power level in time interval T to the jth power level in T +1 . The weight of each of these new edges results from the electric power and the given cost for this time interval. A decision for an edge from the set of power levels is made by calculating a probability for each edge (i, j): Taking into account the amount of pheromone associated with (i, j) and additionally, a priority rule, intensifies the search within the local neighborhood. For an example, choosing the highest power level as in a greedy approach could be used as a priority rule. This would promise the highest profit. A weighting of both types of information is given by and . According to these probabilities, the next edge is chosen by random wheel selection. With a given probability, an alternative rule is used . This rule uses the maximum product of priority and pheromone ( max j∈I V i ( ij ⋅ ij ) ) as the criterion and allows for searching the immediate neighborhood.
It may occur that an ant finds itself in a dead-end, and the simulation model is not able to find any feasible extension of the current path. In this case, something in the already generated path has to be changed in order to be able to proceed. For this reason, we integrated a classical backtracking as a rollback. In case of a dead-end, the last edge is removed for the ant's path and from the previous set of feasible power levels. Then, another decision is made with the reduced set but with the same mechanisms. If the set is empty, this is again treated as a dead-end and triggers another step backward.
After every ant has constructed a path (each representing an operation schedule denoting electric power for each time interval), all paths are evaluated. The best ants are selected to deposit pheromones on the trail: Parameter controls pheromone evaporation. As the search graph is not known in advance, we cannot use a static data structure for deposition. Instead, we use maps for each time interval containing the edges as key-value-pairs, with a default value of zero if an edge has so far not been present in the map. The key for edge identification is composed of both power-levels i and j at the start and the end of the edge (i, j).
In case the given problem instance comprises more than one energy resource and each energy resource cannot be optimized individually due to a joint constraint, the approach can easily be extended to a higher-dimensional graph for an ensemble of more than just one energy resource, as shown in Fig. 4b for two dimensions. For each time interval, the cross-product of power levels from individual energy units is taken. In this way, the concept of power levels is expanded to tuples of power levels with the rest of the procedure left unchanged.

Simulation results
For testing the algorithm, a simulation model of a co-generation plant has been used. The model has already been evaluated and proven useful in several smart grid-related projects (Neugebauer et al. 2015;Hinrichs et al. 2013;Nieße and Sonnenschein 2015). This model comprises a micro co-generation plant with 4.7 kW of rated electrical power (12.6 kW thermal power) and is bundled with a thermal buffer store. Constraints restrict power band, buffer charging, gradients, min. on and off times, and satisfaction of thermal demand. Thermal demand is determined by simulating losses of a detached house (including hot water drawing) according to given weather profiles. Electric power feed-in is either zero (off) or between 1.3 and 4.7 kW. This operational range is subdivided into m equidistant discrete power levels resulting in m + 1 operation modes in total.
The parameters of our algorithm (weighting of pheromone and priority rule , ; priority rule share , share of depositing ants , evaporation , min. pheromone ) have been tuned in advance by using a Halton sequence (Halton and Smith 1964;Kuipers and Niederreiter 2006) for a random search. A Halton sequence generates a high-dimensional, low discrepancy pseudo-random series. Each value is from the interval [0, 1] and can easily be scaled to the respective parameter domain. A series of 1000 experiments were used as an experimental design to find the best values for the parameters. In each experiment, we considered a load balancing scenario. As a scenario size, we choose to use 96-dimensional schedules. Scenarios with shorter schedules are a subproblem covering just the early part of the longer scenarios and can be solved with the same parameterization. In each experiment, we used the model of a co-generation plant with a randomly initialized state of charge (of the thermal buffer store). The number of operation modes was set to 4 because this covers a wide range of today's co-generation controllers. For each experiment, 20 optimization runs were conducted to find the mean results for evaluation. We tuned the parameters once and used the found values for all later experiments: = 0.82, = 0.87, = 0.5, = 0.145, = 0.51 and = 0.12 . If not otherwise stated, 10 ants were used in each run. We use two scenarios: (1) load balancing and (2) variable energy pricing. In scenario one, the objective is to minimize the distance between the operation schedule x ∈ ℕ d and a (possibly market given) target schedule ∈ ℝ d . As (x) and denote the generated and the wanted amount of energy (or equivalently the mean electrical power) for d successive time intervals, any distance measure would do. We used | ⋅ | 2 during optimization. In scenario two, a time-varying tariff c ∈ ℝ d is given denoting the energy price for each of the d time intervals. Thus, the objective is to minimize the overall cost or for generators (to maximize profit) For all simulations, a randomly initialized model of the co-generation plant is used by the algorithm. Figure 5 shows as a first example the convergence of different runs of a 96-dimensional load balancing scenario. The given budget was 1000 iterations resulting in a max. 10.000 objective evaluations. The residual error denotes the mean absolute error in power feed-in. For a better assessment, this is an additional domain-specific criterion. A deviation from the committed energy feed-in (at the market) would be responsible for possible break-up fees.
Co-generation plants with a controller that allows for a more fine-grained power control are easier to plan. On the one hand, the search space grows significantly with the number of power levels per time interval. But, the relation of feasible to infeasible schedules becomes advantageous with more power levels. Thus, as the tricky part is ensuring the feasibility of solutions, planning becomes easier with fine-grained control. In fact, with a continuous controller, the problem is no longer NP-hard. Table 1 shows the impact of the number of power levels on the solution quality with a fixed budget of evaluations. Table 2 shows the impact of the number of used ants (with a fixed number of iterations).

Fig. 5
Convergence of different runs for a 96-dimensional loadbalancing problem for a 4-modes co-generation plant (known optimum: zero); cf. (Bremer and Lehnhoff 2020b). The residual error is given by the Euclidean distance | ⋅ | 2 between wanted schedule and solution schedule Next, we compared the ACO approach with the covariance matrix adaptation evolution strategy (CMA-ES) (Ostermeier et al. 1994;Hansen 2006). CMA-ES is a well-known evolution strategy for solving multimodal black-box problems and has demonstrated excellent performance (Hansen 2016), especially for non-linear, non-convex black-box problems. CMA-ES improves its operations by harnessing lessons learned from previously successful evolution steps for future search directions. A new population of solution candidates is sampled from a multivariate normal distribution N(0, C) with covariance matrix C , which is adapted such that it maximizes the occurrence of improving steps according to previously seen distributions for good solution candidates. Sampling offspring is weighted by a selection of solutions from the parent generation. In a way, the method learns a second-order model of the objective function and exploits it for structure information and for reducing calculations of objective evaluations. In order to use CMA-ES for the non-linear discrete step-control problem, we relaxed it to an equi-dimensional continuous search space and rounded the results back to ℕ . The constraints for co-generation plant operation were integrated using a classical penalty approach (Smith and Coit 1997). We added a penalty term to the objective that reflects the inverse length of the feasible part of a solution schedule. A second penalty reflecting infeasible power levels improved the result. The weighting for the scalarization of the objectives has again been tuned using Halton sequences. For algorithm parametrization, we relied on Smith and Coit (1997) giving recommendations for a wide range of applications. Figure 6 shows two results with budgets of 20000 (a) and 10 6 (b) objective evaluations. Both algorithms have been tested on load balancing problems with different dimensions. The number of power levels was set to 5. As can be seen, ACO performs better; probably because ACO constructs feasible solutions in a systematic manner, Table 1 Impact of the number of power levels that the device controller may pilot on the residual error after optimization (for a 32-dimensional load balancing scenario and a budget of 1000 iterations); cf. (Bremer and Lehnhoff 2020b) No. of op. modes Error ( | ⋅ | 2 of schedules) Mean abs. error (kW) 2 4.500 × 10 −1 ± 4.218 × 10 −2 5.850 × 10 −1 ± 5.484 × 10 −2 3 9.438 × 10 −2 ± 7.016 × 10 −2 1.354 × 10 −1 ± 9.757 × 10 −2 4 2.423 × 10 −2 ± 3.841 × 10 −2 2.853 × 10 −2 ± 4.521 × 10 −2 5 1.020 × 10 −2 ± 2.502 × 10 −2 9.247 × 10 −3 ± 2.282 × 10 −2 6 4.557 × 10 −3 ± 1.288 × 10 −2 4.310 × 10 −3 ± 1.109 × 10 −2 7 3.989 × 10 −3 ± 1.401 × 10 −2 2.748 × 10 −3 ± 9.461 × 10 −3 3.168 × 10 −2 ± 2.675 × 10 −2 3.572 × 10 −2 ± 3.121 × 10 −2 50 3.419 × 10 −2 ± 2.266 × 10 −2 3.928 × 10 −2 ± 2.836 × 10 −2 whereas CMA-ES is just guided by the degree of feasibility calculated afterward. This is also reflected in the results from Table 3 showing the respective feasibility of the found results. The ACO approach achieves a share of 100% feasible results regardless of the dimension of the problem. CMA-ES achieves an acceptable share of feasible results only for low-dimensional problems. Significantly increasing the budget for objective evaluations improves the result, but the penalty approach still fails to ensure the feasibility for higher-dimensional problems. For the quite usual case of planning one day with 15-minute resolution (96 dimensions), CMA-ES fails completely. The problem, obviously, is the discrete nature of the problem. Increasing the number of power levels lets CMA-ES become very competitive. We tested two scenarios (with a dimension of 48 and otherwise the setting from Fig. 6b). With 100 power levels CMA-ES scores with a residual error of 0.0864 ± 0.0701 and for 1000 power levels the result was 0.0573 ± 0.0405 . The remaining issue still is the feasibility of the schedules. The share (in percent) in both experiments was 88.75 ± 19.83 and 98.96 ± 4.66 respectively. Thus, CMA-ES actually does a good job. The crucial part of the scheduling problem is ensuring the feasibility. We have tested other standard heuristics, but with even worse results. Again, the reason is that each time interval entails independent constraints. This effect has also been observed in other studies .   Figure 7 shows some example runs for a multi-plant scenario. Here, 10 co-generation plants are controlled by artificial ants at the same time. Finally, two example results on time-varying energy prices are shown in Figs. 8; 8a shows an example with two peak prices in the morning and the evening, Fig. 8b shows the example of a photovoltaics dominated regime with the aim of shifting feed-in from co-generation to the dark hours. In both cases, feed-in is successfully shifted to more profitable hours. An exception is the early hours of the day because the empty buffer store had to be charged first to gain enough flexibility for the rest of the day.
In addition to (Bremer and Lehnhoff 2020b), we conducted experiments regarding the effort in ensuring flexibility. Ideally, the ants may simply have a walkover when constructing a feasible schedule, but often constraints are too tight, and the ants get stuck in a situation where no feasible move to the next time interval is left. In such situations, a backtracking mechanism steps in. Figure 9a shows the results of an experiment counting the backtracking steps that eventually are responsible for slowing down the feasibility design process and thus the optimization. We counted the taken backtracking steps per ant while the optimization was run with a fixed budget of 10000 objective evaluations for all ants in  Figure a shows the results of a scenario with two price peaks in the morning and in the evening; b shows a scenario with high photovoltaics feed-in resulting in low prices during the sunny hours; cf. (Bremer and Lehnhoff 2020b) 1 3 total. The figure displays the means as well as the positions of the lower and upper quantiles and the inter-quantile distance as an area between the plots. As expected, the number of backtracking steps grows with growing schedule length and thus growing problem size. But more importantly, this is obviously not an exponential growth. The experiment in Fig. 9a has been conducted with a setup of three power levels. Having more levels eases the optimization, but also eases the systematic construction of feasible schedules. This can be seen in Fig. 9b. Here, the number of backtracking steps is counted for different numbers of power levels for a fixed problem size of 96 dimensions. Thus, it is highly desirable to have some more power levels for optimization if the problem size grows beyond 48 intervals. Nevertheless, increasing the number of power levels that an engine controller might operate seems to be the most effective countermeasure to keep the NP-hard optimization process efficient for larger problem sizes.

Conclusions
The paradigm shift in control that can currently be seen in energy supply systems entails new challenges to algorithmic control. Regarding the smart grid, small and distributed energy resources have to be integrated into the grid. Optimizing and orchestrating stepcontrolled devices in day-ahead scheduling is NP-hard already for single devices with a growing number of time intervals within the planning horizon. Providing ancillary services for power conditioning (like voltage or frequency control) requires planning in short periods resulting in higher-dimensional optimization problems in the future. New types of generation units and controllable consumers entail new difficulties in constraint-handling, especially if some buffer technology for intermediate energy storage does not allow for an independent consideration of different time intervals. If this is the case, the problem has to be solved by multi-dimensional optimization.
These challenges render heuristics the most promising as a possible solution. On the other hand, a wide acceptance of such approaches in practice demands some guarantee on the feasibility of solutions. Ensuring the feasibility by design in the sense of systematically Fig. 9 Mean number of necessary backtracking steps for constructing the feasibility (solid lines) and interquantile distance of the variability (filled area) depending on the dimension of the schedules (a) and on the number of power levels (b) constructing new solution candidates by operators, a priori as a feasible solution is an important precondition for an application within a critical infrastructure like the energy supply system.
In Bremer and Lehnhoff (2020b) we already proposed an ant colony-based approach with model integration for constraint-handling. By integrating a model of the planned energy resource, artificial ants may decide on feasible further directions while already underway and constructing their path in the search graph. The feasible graph does not have to be known in advance. A priori construction would also be NP-hard.
Here, we applied fitness landscape analysis to answer the question regarding the complexity of finding feasible solutions. We showed that ensuring the feasibility-as a sub-task of the actual optimization problem-is an additional, non-trivial problem. Integrating a model of the energy resource into the ant algorithm and allowing backtracking mechanisms as fallback guarantees that the ants at all times only consider feasible solutions-at least if there exists one. Due to the landscape analysis, some first insights into prerequisites for accelerating the feasibility operator have already been derived. Simulation results rendered this approach valid, competitive, and sufficiently fast.
So far we have used the classical ant system. In future work, we will also implement integration into more recent algorithms like the max-min ant system and expect a further improvement of the results.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.