Learning-Assisted Optimization for Transmission Switching

The design of new strategies that exploit methods from Machine Learning to facilitate the resolution of challenging and large-scale mathematical optimization problems has recently become an avenue of prolific and promising research. In this paper, we propose a novel learning procedure to assist in the solution of a well-known computationally difficult optimization problem in power systems: The Direct Current Optimal Transmission Switching (DC-OTS) problem. The DC-OTS problem consists in finding the configuration of the power network that results in the cheapest dispatch of the power generating units. With the increasing variability in the operating conditions of power grids, the DC-OTS problem has lately sparked renewed interest, because operational strategies that include topological network changes have proved to be effective and efficient in helping maintain the balance between generation and demand. The DC-OTS problem includes a set of binaries that determine the on/off status of the switchable transmission lines. Therefore, it takes the form of a mixed-integer program, which is NP-hard in general. In this paper, we propose an approach to tackle the DC-OTS problem that leverages known solutions to past instances of the problem to speed up the mixed-integer optimization of a new unseen model. Although our approach does not offer optimality guarantees, a series of numerical experiments run on


Introduction
Power systems are colossal and complex networks engineered to reliably supply electricity where it is needed at the lowest possible cost. For this, operational routines based on the Optimal Power Flow (OPF) problem are executed daily and in real time to guarantee the most cost-efficient dispatch of power generating units that satisfy the grid constraints. In particular, the way power flows through a power network is determined by the so-called Kirchhoff's laws. These laws are responsible for the fact that switching off a transmission line in the grid can actually result in a lower electricity production cost (a type of "Braess' Paradox") and have provided power system operators with a complementary control action, namely, changes in the grid topology, to reduce this cost even further. The possibility of flexibly exploiting the topological configuration of the grid was first suggested in [1] and later formalized in [2] into what we know today as the Optimal Transmission Switching (OTS) problem. Essentially, the OTS problem is the OPF problem enriched with a whole new set of on/off variables that model the status of each switchable transmission line in the system. The OPF formulation we use as a basis to pose the OTS problem is built on the widely used direct current (DC) linear approximation of the power flow equations. Even so, the resulting formulation of the OTS problem, known as DC-OTS, takes the form of a mixed-integer program, which has been proven to be NP-hard for general network classes [3,4].
Thus, the DC-OTS problem consists in finding the configuration of the power network that results in the cheapest dispatch of the power generating units subject to constraints such as thermal limits on transmission lines, generating units' capacity bounds, and network connectivity conditions. To date, the resolution of the DC-OTS has been approached from two distinct methodological points of view, namely, by means of exact methods and by way of heuristics. The former exploit techniques from mixed-integer programming such as bounding, tightening, and the generation of valid cuts to solve the DC-OTS to (certified) global optimality, while the latter seek to quickly identify good solutions of the problem, potentially forgoing optimality and even at the risk of suggesting infeasible grid topologies.
Among the exact methods, we highlight the works in [3], [4], [5], and [6]. More specifically, the authors in [3] propose a cycle-based formulation of the DC-OTS problem, which results in a mixed-integer linear program. They prove the NP-hardness of the DC-OTS even if the power grid takes the form of a series-parallel graph with only one generation-demand pair, and derive classes of strong valid inequalities for a relaxation of their formulation that can be separated in polynomial time. In [4], the authors work instead with the mixed-integer linear formulation of the DC-OTS that employs a big-M to model the disjunctive nature of the equation linking the power flow through a switchable line and the voltage angles at the nodes the line connects. This is the formulation of the DC-OTS we also consider in this paper. The big-M must be a valid upper bound of the maximum angle difference when the switchable line is open. In [4], it is proven that determining this maximum is NP-hard and, consequently, the authors propose to set the big-M to the shortest path between the nodes concerned over a spanning subgraph that is assumed to exist. The authors in [5] conduct a computational study of a mixed-integer linear reformulation of the DC-OTS problem alternative to that considered in [4]. This reformulation makes use of the so-called power transfer distribution factors (PTDFs) and the notion of flow-cancelling transactions to model open lines. They argue that this reformulation offers significant computational advantages, especially for large systems and when the number of switchable lines is relatively small. Finally, a family of cutting planes for the DC-OTS problem is developed in [6]. These cutting planes are derived from the polyhedral description of the integer hull of a certain constraint set that appears in the DC-OTS problem. Specifically, this constraint set is made up of a nodal power balance equation together with the power flow limits of the associated incident lines. The limits that correspond to switchable lines are multiplied by the respective binary variable.
In practice, though, the complexity and size of real-life power grids often render exact solutions computationally infeasible. Therefore, heuristics, or approximate solution methods, become essential for tackling the DC-OTS efficiently. Among the heuristic methods that have been proposed in the technical literature, we can distinguish two main groups. The first group includes the heuristic approaches that do not rely on the solutions of previous instances of the OTS problem. For example, some heuristics trim down the computational time by reducing the number of lines that can be switched off [7][8][9]. While these approaches do not reach the maximum cost savings, the reported numerical studies show that the cost increase with respect to the optimal solution is small in most cases. Other related works maintain the original set of switchable lines and determine their on/off status using greedy algorithms [10,11]. They use dual information of the OPF problem to rank the lines according to the impact of their status on the operational cost. Finally, the authors of [12] propose solving the OTS problem in parallel with heuristics that generate good candidate solutions to speed up conventional MIP algorithms. The second group comprises data-based heuristic methods that require information about the optimal solution of past OTS problems. For instance, the authors of [13] use a K-nearest neighbor strategy to drastically reduce the search space of the integer solution to the DC-OTS problem. In particular, given a collection of past instances of the problem (whose solution is assumed to be known and available), they restrict the search space to the K integer solutions of those instances which are the closest to the one to be solved in terms of the problem parameters (for example, nodal demands). They then provide as solution to the instance of the DC-OTS problem under consideration the one that results in the lowest cost. This last step requires solving K linear programs, one per candidate integer solution. Data-driven methods other than the K-nearest neighbor strategy have also been explored to enhance the solution of the DC-OTS problem. For example, references [14][15][16] present sophisticated methodologies to learn the status of switchable lines using neural networks.
Against this background, in this paper, we propose a novel method to address the DC-OTS by exploiting known solutions to past instances of the problem. Indeed, according to [17,18], our approach aligns with machine learning strategies that extract valuable insights from prior solutions of an optimization problem, subsequently applying this knowledge to address new, unseen instances. Specifically, our approach leverages information from previous instances in two distinct yet potentially synergistic ways. First, from these past solutions, we infer those switchable lines that are most likely to be operational (resp. inoperative) in the current instance of the problem (the one we want to solve). Mathematically, this translates into fixing a few binaries to one (resp. zero), an apparently small action that brings, however, substantial benefits in terms of computational speed. Second, beyond the speed-up that one can expect from simply reducing the number of binaries in a MILP, this strategy also allows us to leverage the shortest-path-based argument invoked in [4] to further tighten the big-Ms in the problem formulation, with the consequent extra computational gain.
Alternatively, we also investigate the potential of directly inferring the big-M values from past solutions to the problem, eliminating the need for the shortest-path calculation. In any case, the inference of the binaries to be fixed and/or the values of the big-Ms to be used is conducted through a Machine Learning algorithm of the decision-maker's choice. In this paper, we have opted for the K-nearest neighbors methodology due to its simplicity, its interpretability, and the low computational time required for the training task. Besides, this approach has demonstrated success in mitigating the complexity of related challenges, such as the widely studied DC Unit Commitment problem, as evidenced by prior works [19,20].
Importantly, while our proposal is not endowed with theoretical guarantees of optimality (and thus, belongs to the group of heuristics discussed above), the role that Machine Learning plays in it is supportive rather than surrogative (we still need to solve the MILP problem), which results in significantly lower rates of infeasibility and suboptimality, as demonstrated in the numerical experiments.
The remainder of this paper is structured as follows. Section 2 introduces the DC-OTS problem mathematically and discusses how to equivalently reformulate it as a mixed-integer linear program (MILP) through the use of large enough constants (the so-called big-Ms). Section 3 describes the different methods we consider in this paper to identify the most cost-efficient grid topology of a power system, including those we propose and those we use for benchmarking. A series of numerical experiments run on a 118-bus power system typically used in the context of the DC-OTS problem are presented and discussed in Section 4. Finally, conclusions and directions for further research are given in Section 5.

Optimal transmission switching
We start this section by introducing the standard and well-known formulation of the Direct Current Optimal Transmission Switching problem (DC-OTS), which will serve as a basis to construct and motivate its mixed-integer linear reformulation immediately after.
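Consider a power network consisting of a collection of nodes $\mathcal{N}$ and transmission lines $\mathcal{L}$. To lighten the mathematical formulation of the DC-OTS, we assume w.l.o.g. that there is one generator and one power load per node $n \in \mathcal{N}$. The power dispatch of the generator and the power consumed by the power load are denoted by $p_n$ and $d_n$, respectively. Each generator is characterized by a minimum and maximum power output, $\underline{p}_n$ and $\overline{p}_n$, and a marginal production cost $c_n$. We represent the power flow through the line $(n,m) \in \mathcal{L}$ connecting nodes $n$ and $m$ by $f_{nm}$, with $f_{nm} \in [-\overline{f}_{nm}, \overline{f}_{nm}]$. For each node $n$ we distinguish between the set of transmission lines whose power flow enters the node, $\mathcal{L}^+_n$, and the set of transmission lines whose power flow leaves it, $\mathcal{L}^-_n$. The power network includes a subset $\mathcal{L}_S \subseteq \mathcal{L}$ of lines that can be switched on/off. If the line $(n,m) \in \mathcal{L}_S$, its status is determined by a binary variable $x_{nm}$, which takes value 1 if the line is fully operational, and 0 when disconnected. In a DC power network, the flow $f_{nm}$ through an operational line is given by the product of the susceptance of the line, $b_{nm}$, and the difference of the voltage angles at nodes $n$ and $m$, i.e., $\theta_n - \theta_m$. We use bold symbols to denote vectors. With this notation in place, the DC-OTS problem can be formulated as follows:
\begin{align}
\min_{p_n, f_{nm}, \theta_n, x_{nm}} \quad & \sum_{n \in \mathcal{N}} c_n p_n \tag{1a}\\
\text{s.t.} \quad & \underline{p}_n \le p_n \le \overline{p}_n, \quad \forall n \in \mathcal{N} \tag{1b}\\
& \sum_{(n,m) \in \mathcal{L}^-_n} f_{nm} - \sum_{(m,n) \in \mathcal{L}^+_n} f_{mn} = p_n - d_n, \quad \forall n \in \mathcal{N} \tag{1c}\\
& f_{nm} = x_{nm}\, b_{nm} (\theta_n - \theta_m), \quad \forall (n,m) \in \mathcal{L}_S \tag{1d}\\
& f_{nm} = b_{nm} (\theta_n - \theta_m), \quad \forall (n,m) \in \mathcal{L} \setminus \mathcal{L}_S \tag{1e}\\
& -x_{nm} \overline{f}_{nm} \le f_{nm} \le x_{nm} \overline{f}_{nm}, \quad \forall (n,m) \in \mathcal{L}_S \tag{1f}\\
& -\overline{f}_{nm} \le f_{nm} \le \overline{f}_{nm}, \quad \forall (n,m) \in \mathcal{L} \setminus \mathcal{L}_S \tag{1g}\\
& x_{nm} \in \{0, 1\}, \quad \forall (n,m) \in \mathcal{L}_S \tag{1h}\\
& \theta_1 = 0 \tag{1i}
\end{align}
The objective is to minimize the electricity generation cost, expressed as in (1a). For this, the power system operator essentially decides the lines that are switched off and the power output of generating units, which must lie within the interval $[\underline{p}_n, \overline{p}_n]$, as imposed in (1b). The flows through the transmission lines are governed by the so-called Kirchhoff's laws, which translate into the nodal power balance equations (1c) and the flow-angle relationship stated in (1d) and (1e). In the case of a switchable line, this relationship must be enforced only when the line is in service. This is why the binary variable $x_{nm}$ appears in (1d). Naturally, $x_{nm} = 0$ must imply $f_{nm} = 0$. Constraints (1f) and (1g) impose the capacity limits of the switchable and non-switchable lines, respectively. Constraint (1h) states the binary character of variables $x_{nm}$, while equation (1i) arbitrarily sets one of the nodal angles to zero to avoid solution multiplicity.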
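Problem (1) is a mixed-integer nonlinear programming problem due to the product $x_{nm}(\theta_n - \theta_m)$ in (1d). This problem has been proven to be NP-hard even when the power network includes a spanning subnetwork connected by non-switchable lines only [4] or takes the form of a series-parallel graph with a single generator/load pair [3]. The disjunctive nature of Equation (1d) allows for a linearization of Problem (1) at the cost of introducing a pair of large enough constants $\underline{M}_{nm}, \overline{M}_{nm}$ per switchable line [21]. Indeed, Equation (1d) can be replaced by the inequalities
\begin{equation}
b_{nm}(\theta_n - \theta_m) - \overline{M}_{nm}(1 - x_{nm}) \le f_{nm} \le b_{nm}(\theta_n - \theta_m) - \underline{M}_{nm}(1 - x_{nm}), \quad \forall (n,m) \in \mathcal{L}_S \tag{2}
\end{equation}
provided that the large constants $\underline{M}_{nm}, \overline{M}_{nm}$ respectively constitute a lower and an upper bound of $b_{nm}(\theta_n - \theta_m)$ when the line $(n,m)$ is disconnected ($x_{nm} = 0$), that is, $\underline{M}_{nm} \le \underline{M}^{\mathrm{OPT}}_{nm}$ and $\overline{M}_{nm} \ge \overline{M}^{\mathrm{OPT}}_{nm}$, where
\begin{align}
\underline{M}^{\mathrm{OPT}}_{nm} &= \min \; b_{nm}(\theta_n - \theta_m) \quad \text{s.t. (1b)-(1i)},\; x_{nm} = 0 \tag{3a}\\
\overline{M}^{\mathrm{OPT}}_{nm} &= \max \; b_{nm}(\theta_n - \theta_m) \quad \text{s.t. (1b)-(1i)},\; x_{nm} = 0 \tag{3b}
\end{align}
If $x_{nm} = 1$, the two sides of (2) coincide and the flow-angle relationship $f_{nm} = b_{nm}(\theta_n - \theta_m)$ is recovered. Otherwise, i.e., if $x_{nm} = 0$, Equation (1f) leads to $f_{nm} = 0$, which, together with (2), results in
\begin{equation*}
\underline{M}_{nm} \le b_{nm}(\theta_n - \theta_m) \le \overline{M}_{nm}.
\end{equation*}
Finally, by Equation (3), we have
\begin{equation*}
\underline{M}_{nm} \le \underline{M}^{\mathrm{OPT}}_{nm} \le b_{nm}(\theta_n - \theta_m) \le \overline{M}^{\mathrm{OPT}}_{nm} \le \overline{M}_{nm},
\end{equation*}
so the inequalities (2) do not exclude any feasible operating point in which the line is open.
First of all, for (3) to be of any use, $\underline{M}^{\mathrm{OPT}}_{nm}$ and $\overline{M}^{\mathrm{OPT}}_{nm}$ must be finite. As proven in [4], this is not the case in power systems where switching off lines can result in disconnected subnetworks. The possibility of islanding renders the minimization (3a) and the maximization (3b) unbounded. Consequently, the linearization of the DC-OTS problem based on (2) is not equivalent to its original nonlinear mixed-integer formulation (1) in this case. However, in practice, islanding in power grids is to be avoided in general for many reasons other than the minimization of the operational cost (e.g., due to reliability and security standards). Consequently, in what follows, we assume that the set of switchable lines $\mathcal{L}_S$ is such that the connectivity of the whole power network is always guaranteed. In this setting, it is ensured that there exist finite valid large constants as stated in (3), namely, those corresponding to the longest path between nodes $n$ and $m$ on the undirected graph represented by the power grid. This already gives us an idea of how difficult the calculation of these constants is. In this vein, the authors in [4] show that, even when $\underline{M}^{\mathrm{OPT}}_{nm}$ and $\overline{M}^{\mathrm{OPT}}_{nm}$ are finite, computing them is as hard as solving the original DC-OTS problem. Therefore, we are obliged to be content with a lower and an upper bound. The choice of these bounds, or rather, of the large constants $\underline{M}_{nm}, \overline{M}_{nm}$ (for all $(n,m) \in \mathcal{L}_S$) is of utmost importance, because it has a major impact on the relaxation bound of the mixed-integer linear program that results from replacing (1d) with the inequalities (2), that is,
\begin{equation}
\min_{p_n, f_{nm}, \theta_n, x_{nm}} \; \sum_{n \in \mathcal{N}} c_n p_n \quad \text{s.t. (1b), (1c), (1e)-(1i), (2)} \tag{4}
\end{equation}
Tighter constants $\underline{M}_{nm}, \overline{M}_{nm}$ lead to stronger linear relaxations of (4), which, in turn, are expected to improve the performance of the branch-and-cut algorithm used to solve it. Let us define $\mathbf{d} = [d_n, n \in \mathcal{N}]$ and $\mathbf{M} = [(\underline{M}_{nm}, \overline{M}_{nm}), (n,m) \in \mathcal{L}_S]$. We also define the lower and upper bounds of the binary decision variables as $\underline{\mathbf{x}} = [\underline{x}_{nm}, (n,m) \in \mathcal{L}_S]$ and $\overline{\mathbf{x}} = [\overline{x}_{nm}, (n,m) \in \mathcal{L}_S]$, respectively. Then, we denote as $\mathbf{x} = \mathrm{OTS}(\mathbf{d}, \mathbf{M}, \underline{\mathbf{x}}, \overline{\mathbf{x}})$ the solution of model (4) with the additional constraint $\underline{\mathbf{x}} \le \mathbf{x} \le \overline{\mathbf{x}}$. In the general case, $\underline{\mathbf{x}} = \mathbf{0}$ and $\overline{\mathbf{x}} = \mathbf{1}$. However, these bounds may change if the status of some switchable lines is fixed through learning.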
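The role of the two constants can be checked numerically. The following is a minimal sketch, assuming the standard big-M form b_nm(θ_n − θ_m) − M_up(1 − x) ≤ f_nm ≤ b_nm(θ_n − θ_m) − M_low(1 − x); the function names and all numbers are hypothetical and serve only as an illustration.

```python
def bigm_flow_interval(x, b, theta_n, theta_m, m_lower, m_upper):
    """Interval [lo, hi] that the two big-M inequalities leave for f_nm."""
    angle_flow = b * (theta_n - theta_m)
    lo = angle_flow - m_upper * (1 - x)
    hi = angle_flow - m_lower * (1 - x)
    return lo, hi


def open_line_is_allowed(b, theta_n, theta_m, m_lower, m_upper):
    """With x = 0 the capacity limit forces f_nm = 0, so 0 must lie in the interval."""
    lo, hi = bigm_flow_interval(0, b, theta_n, theta_m, m_lower, m_upper)
    return lo <= 0.0 <= hi


# Hypothetical data: b = 2 and an angle difference of 0.25, so b * (theta_n - theta_m) = 0.5.
closed = bigm_flow_interval(1, 2.0, 0.25, 0.0, -1.0, 1.0)      # interval collapses to a point
valid = open_line_is_allowed(2.0, 0.25, 0.0, -1.0, 1.0)        # constants bracket 0.5
too_tight = open_line_is_allowed(2.0, 0.25, 0.0, -1.0, 0.25)   # upper constant below 0.5
```

With the line closed, the interval reduces to the flow-angle equation. With the line open, the operating point survives only if the constants bracket b_nm(θ_n − θ_m), which is why constants that are too small can cut off feasible solutions while constants that are too large weaken the relaxation.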
On the assumption that the power network includes a spanning tree comprising non-switchable lines, the authors in [4] propose the following symmetric bound:
\begin{equation}
-\underline{M}_{nm} = \overline{M}_{nm} = b_{nm} \sum_{(k,l) \in SP^0_{nm}} \frac{\overline{f}_{kl}}{b_{kl}}, \quad \forall (n,m) \in \mathcal{L}_S \tag{5}
\end{equation}
where $SP^0_{nm}$ is the shortest path between nodes $n$ and $m$ through said spanning tree. Note, however, that the shortest path between two nodes can be modified if some of the switchable lines are known to be connected. In that case, the resulting bounds are reduced. Therefore, for a given status of the switchable lines $\mathbf{x}$, we denote by $SP_{nm}(\mathbf{x})$ the updated shortest path, with $SP^0_{nm} = SP_{nm}(\mathbf{0})$. Besides, the bounds obtained using Equation (5) with the updated shortest paths $SP_{nm}(\mathbf{x})$ are referred to as $\mathbf{M} = \mathrm{FAT}(\mathbf{x})$. This symmetric bound can be computed in polynomial time using Dijkstra's algorithm [22].
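The shortest-path computation behind FAT(x) can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes that each line (k, l) contributes a weight equal to its capacity divided by its susceptance, and that the bound for a switchable line (n, m) is b_nm times the length of the shortest path over the lines known to be connected. The data layout, function names and the four-node network are hypothetical.

```python
import heapq


def shortest_path_lengths(nodes, weights, source):
    """Dijkstra on an undirected graph; weights maps (n, m) -> non-negative weight."""
    adjacency = {n: [] for n in nodes}
    for (n, m), w in weights.items():
        adjacency[n].append((m, w))
        adjacency[m].append((n, w))
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def fat_bounds(nodes, lines, switchable, fixed_on=frozenset()):
    """Symmetric big-M per unfixed switchable line.

    lines maps (n, m) -> (susceptance b, capacity fmax). Paths may use the
    non-switchable lines plus the switchable lines in fixed_on (status known to be 1).
    """
    usable = {line: fmax / b for line, (b, fmax) in lines.items()
              if line not in switchable or line in fixed_on}
    bounds = {}
    for (n, m) in switchable:
        if (n, m) in fixed_on:
            continue  # a line fixed to 1 needs no big-M
        b, _ = lines[(n, m)]
        bounds[(n, m)] = b * shortest_path_lengths(nodes, usable, n)[m]
    return bounds


# Hypothetical network: chain 1-2-3-4 of non-switchable lines, two switchable chords.
nodes = [1, 2, 3, 4]
lines = {(1, 2): (4.0, 1.0), (2, 3): (4.0, 1.0), (3, 4): (4.0, 1.0),
         (1, 3): (8.0, 1.0), (1, 4): (2.0, 1.0)}
switchable = {(1, 3), (1, 4)}
base = fat_bounds(nodes, lines, switchable)                 # spanning tree only
updated = fat_bounds(nodes, lines, switchable, {(1, 3)})    # line (1, 3) known to be on
```

In this toy example, knowing that line (1, 3) is connected halves the bound of line (1, 4), which is the effect exploited when binaries are fixed through learning.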
In this paper, we propose and test a simple but effective data-driven scheme based on nearest neighbors to estimate lower bounds on $\underline{M}^{\mathrm{OPT}}_{nm}$ and upper bounds on $\overline{M}^{\mathrm{OPT}}_{nm}$. This scheme is also used to fix some of the binaries $x_{nm}$ in (4). While the inherent sampling error of the proposed methodology precludes optimality guarantees, our numerical experiments show that it is able to identify optimal or nearly-optimal solutions to the DC-OTS problem very fast.

Solution methods
In this section, we present the different methods we consider to solve the DC-OTS problem. First, we describe the exact method proposed in [4], which we use as a benchmark. Second, we explain a direct learning-based approach that utilizes the K nearest neighbors technique and the learning-based heuristic approach investigated in [23]. Finally, we introduce the data-based methodologies proposed in this paper.
Suppose that the DC-OTS problem (4) has been solved using the big-M values suggested in [4] for different instances to form a training set $\mathcal{T} = \{(\mathbf{d}_t, \mathbf{x}_t, \boldsymbol{\theta}_t),\; t = 1, \ldots, |\mathcal{T}|\}$, where $\mathbf{d}_t = [d^t_n, n \in \mathcal{N}]$ is the vector of nodal demands of instance $t$; $\mathbf{x}_t = [x^t_{nm}, (n,m) \in \mathcal{L}_S]$ is the vector of optimal binary variables, which determine whether line $(n,m)$ in instance $t$ is connected or not; and $\boldsymbol{\theta}_t = [\theta^t_n, n \in \mathcal{N}]$ is the vector of optimal voltage angles. For notation purposes, we use $C(\mathbf{d}_t, \mathbf{x}_t)$ to denote the value of the objective function (1a) when model (1) is solved for demand values $\mathbf{d}_t$ and the binary variables fixed to $\mathbf{x}_t$. This function can be evaluated for any vector of binary variables $\mathbf{x}_t$ by solving a linear programming problem. If this linear problem is infeasible, then $C(\mathbf{d}_t, \mathbf{x}_t) = \infty$. Additionally, for a given subset of instances $\mathcal{T}' \subseteq \mathcal{T}$, we define $\mathbf{x}(\mathcal{T}')$ as the component-wise average of the binary variables corresponding to the instances in $\mathcal{T}'$.
In what follows, we present different strategies to solve the DC-OTS problem for an unseen test instance $\hat{t}$ with demand values $\mathbf{d}_{\hat{t}}$. The goal is to employ the information from the training set, $\mathcal{T}$, to reduce the computational burden of solving the DC-OTS reformulation (4) for the test instance $\hat{t}$. Note that, depending on the strategy that is applied, the response variable of the test instance to be learned can be $\mathbf{x}_{\hat{t}}$, $\boldsymbol{\theta}_{\hat{t}}$ or the tuple $(\mathbf{x}_{\hat{t}}, \boldsymbol{\theta}_{\hat{t}})$.

Exact benchmark approach
In the benchmark approach (Bench), the optimal solution of the test DC-OTS problem is obtained using the proposal in [4]. Particularly, problem (4) is solved using the big-M values computed according to Equation (5). This strategy is an exact approach that does not make use of previously solved instances of the problem, but guarantees that its global optimal solution is eventually retrieved. Nevertheless, the computational time employed by this approach may be extremely high. Algorithm 1 shows a detailed description of this approach.

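Algorithm 1 Bench
Input: load vector for test instance $\hat{t}$, $\mathbf{d}_{\hat{t}}$.
1) Compute $\mathbf{M} = \mathrm{FAT}(\mathbf{0})$.
2) Solve $\mathbf{x}_{\hat{t}} = \mathrm{OTS}(\mathbf{d}_{\hat{t}}, \mathbf{M}, \mathbf{0}, \mathbf{1})$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.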

Existing learning-based approaches
In this subsection we present two existing learning approaches based on the K nearest neighbors technique [23]. The first approach is a pure machine-learning strategy that directly predicts the binary variables of the test instance using the information of the K closest training data. Such closeness is measured in terms of the $\ell_2$ distance between the load values of the training and test points, that is, by computing $\|\mathbf{d}_t - \mathbf{d}_{\hat{t}}\|_2$ for $t = 1, \ldots, |\mathcal{T}|$. For each test instance $\hat{t}$, the set of K closest instances is denoted as $\mathcal{T}^K_{\hat{t}}$. This method is referred to as Direct since it directly predicts the value of all binary variables from the data.
In the particular case of the DC-OTS problem, we adapt the Knn strategy as follows: for a fixed number of neighbors K, we fix the binary variables of the test problem (1) to the rounded mean of the binary decision variables of such K nearest neighbors. Once all binary variables are fixed, model (1) becomes a linear programming problem that can be rapidly solved. Algorithm 2 shows a detailed explanation of the procedure. Note that, in this strategy, we only need the information about the load vector and the optimal binary variables in the training data, i.e., we only need $\{(\mathbf{d}_t, \mathbf{x}_t),\; t = 1, \ldots, |\mathcal{T}|\}$. This approach is very simple and fast. However, fixing the binary variables using a rounding procedure may yield a non-negligible number of infeasible and suboptimal problems.

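Algorithm 2 Direct
1) Find the set $\mathcal{T}^K_{\hat{t}}$ of the K training instances closest to $\mathbf{d}_{\hat{t}}$.
2) Compute the binary variables $\mathbf{x}_{\hat{t}} = \lceil \mathbf{x}(\mathcal{T}^K_{\hat{t}}) \rfloor$, where $\lceil \cdot \rfloor$ denotes the component-wise nearest integer function.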
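The two steps of the Direct method can be sketched in a few lines. This is a minimal illustration with hypothetical data and function names; it assumes that a mean of exactly 0.5 is rounded up to 1, a tie-breaking choice that the text does not specify.

```python
import math


def k_nearest(train_loads, test_load, k):
    """Indices of the k training load vectors closest to test_load in l2 distance."""
    order = sorted(range(len(train_loads)),
                   key=lambda t: math.dist(train_loads[t], test_load))
    return order[:k]


def direct_prediction(train_loads, train_status, test_load, k):
    """Rounded component-wise mean of the neighbours' binary line statuses."""
    neighbours = k_nearest(train_loads, test_load, k)
    n_lines = len(train_status[0])
    means = [sum(train_status[t][j] for t in neighbours) / k for j in range(n_lines)]
    # Assumption: ties at 0.5 are rounded up to 1.
    return [1 if mean >= 0.5 else 0 for mean in means]


# Hypothetical training set: four instances, two nodes, two switchable lines.
train_loads = [[1.0, 1.0], [1.1, 0.9], [3.0, 3.0], [1.2, 1.0]]
train_status = [[1, 0], [1, 1], [0, 0], [1, 0]]
prediction = direct_prediction(train_loads, train_status, [1.0, 1.0], k=3)
```

The predicted status vector would then be plugged into model (1), which becomes a linear program; that last step needs an LP solver and is left out of the sketch.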
The second learning-based methodology explained in this subsection is proposed in [13] and also employs the Knn technique. As in the previous strategy, the authors assume that the set $\{(\mathbf{d}_t, \mathbf{x}_t),\; t = 1, \ldots, |\mathcal{T}|\}$ is given. In short, their proposal works as follows: for a fixed value of K, the K closest instances to the test point are saved in the set $\mathcal{T}^K_{\hat{t}}$. Then, we evaluate function $C(\mathbf{d}_{\hat{t}}, \mathbf{x}_t)$ for each $t \in \mathcal{T}^K_{\hat{t}}$ by solving K linear problems. The optimal binary variables for the test instance, $\mathbf{x}_{\hat{t}}$, are set to those $\mathbf{x}_t$ that lead to the lowest value of $C(\mathbf{d}_{\hat{t}}, \mathbf{x}_t)$. This approach is denoted as Linear and more details about it are provided in Algorithm 3.

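Algorithm 3 Linear
1) Find the set $\mathcal{T}^K_{\hat{t}}$ of the K training instances closest to $\mathbf{d}_{\hat{t}}$.
2) Select $t^* = \arg\min_{t \in \mathcal{T}^K_{\hat{t}}} C(\mathbf{d}_{\hat{t}}, \mathbf{x}_t)$ and set $\mathbf{x}_{\hat{t}} = \mathbf{x}_{t^*}$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.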
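The selection step of the Linear method can be sketched as below. The cost evaluation is passed in as a callable because it requires solving a linear program; `toy_cost` is a hypothetical stand-in used only to make the example runnable and does not model a power network.

```python
import math


def linear_selection(candidates, cost_fn):
    """Return the candidate status vector with the lowest cost, and that cost.

    cost_fn(x) must return math.inf when the LP with the binaries fixed to x is
    infeasible. If every candidate is infeasible, (None, math.inf) is returned.
    """
    best_x, best_cost = None, math.inf
    for x in candidates:
        cost = cost_fn(x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


def toy_cost(x):
    """Hypothetical cost: infeasible if the first line is open, else grows with closed lines."""
    return math.inf if x[0] == 0 else 10.0 + 2.0 * sum(x)


# Status vectors of the K = 3 nearest neighbours (hypothetical).
neighbour_status = [(0, 1), (1, 1), (1, 0)]
best_x, best_cost = linear_selection(neighbour_status, toy_cost)
```

Returning `None` when all K candidates are infeasible makes the failure mode of the heuristic explicit to the caller.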
Note that the value of K strongly affects the speed of the algorithm as well as the number of suboptimal or infeasible problems. Larger values of K imply taking into account more training points to get the estimation of the test response. As a consequence, a larger number of LPs must be solved, and the computational burden increases. However, the probability of having suboptimal or, even worse, infeasible solutions is reduced. On the contrary, lower values of K diminish the computational time of the procedure but increase the risk of obtaining suboptimal or infeasible solutions.

Proposed learning-based approaches
In this subsection, we propose two improved methodologies which combine the benefits of exact and learning methods. Both approaches start by finding the K closest training points to the test instance $\hat{t}$ and fixing those binary variables that reach the same value for all nearest neighbors according to a unanimous vote. The two proposed approaches also find, in a different fashion, lower values of the big-Ms than those computed in [4]. Since some binary variables may have been fixed to one thanks to the neighbors' information, the first approach we propose consists in recomputing the shortest paths and the corresponding big-M values using (5). In contrast, the second methodology proposed in this paper directly sets the big-M values to the maximum and minimum values of the angle differences observed in the closest DC-OTS instances. Either way, smaller big-Ms are obtained, and hence, the associated feasible region of the DC-OTS problem is tighter. As a consequence, we solve a single MILP with a tighter feasible region and a smaller number of binary variables.
More specifically, in the first proposed approach (denoted as FixB-FatM) the binary variables of the test instance are set to 1 (resp. to 0) if all the training instances in $\mathcal{T}^K_{\hat{t}}$ concur that the value should be 1 (resp. 0). On the other hand, for those binary variables that are not fixed, the corresponding big-M values are updated using the information of the previously fixed variables. In particular, these fixed binaries are used to recompute the shortest path that determines the big-M values in Equation (5). In essence, the computation of the new shortest path involves not only the non-switchable lines from the original spanning tree but also those switchable lines with a learned status equal to 1. This update could result in even shorter paths, leading to improved big-M bounds and a more tightly defined feasible region. This strategy relies on the unanimity of all the nearest neighbors and, therefore, this learning-based approach is expected to be quite conservative, especially for high values of K.
In order to further assess the computational savings yielded by this approach, we also investigate two variations. First, we denote by FixB the approach in which binary variables are fixed but big-M values are computed using only the information from the original spanning tree. We also consider the FatM approach, which does not fix any binary decision variable but only uses the information of the closest neighbors to recompute the shortest paths and update the big-M values with Equation (5). In other words, while none of the binary variables are fixed in this method, the learned status of switchable lines can still be utilized to decrease the big-M values. By comparing the computational burden of these three approaches, we can analyze whether the numerical improvements are caused by the lower number of binary variables or the tighter values of the big-M parameters. Algorithms 4, 5 and 6 show a detailed description of the methods FixB, FatM and FixB-FatM, respectively.
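2) Compute $\underline{\mathbf{x}}_{\hat{t}} = \lfloor \mathbf{x}(\mathcal{T}^K_{\hat{t}}) \rfloor$ and $\overline{\mathbf{x}}_{\hat{t}} = \lceil \mathbf{x}(\mathcal{T}^K_{\hat{t}}) \rceil$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.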
FixB and FixB-FatM can be slightly modified to relax the unanimity condition required to fix binary variables. To do so, we introduce a threshold parameter $\tau < 0.5$. The binary variables are then fixed according to the following rules:
- If the predicted status for a particular line falls in $[0, \tau]$, the binary variable is fixed to 0.
- If the predicted status for a particular line falls in $[1 - \tau, 1]$, the binary variable is fixed to 1.
- If the predicted status for a particular line falls in $(\tau, 1 - \tau)$, the binary variable is left unfixed.
This can be implemented by replacing, respectively, step 2) in Algorithm 4 and step 3) in Algorithm 6 by:
2) Compute $\underline{\mathbf{x}}_{\hat{t}} = \lfloor \mathbf{x}(\mathcal{T}^K_{\hat{t}}) \rfloor$ and $\mathbf{M}_{\hat{t}} = \mathrm{FAT}(\underline{\mathbf{x}}_{\hat{t}})$.
3) Determine $\underline{\mathbf{x}}_{\hat{t}} = \lfloor \mathbf{x}(\mathcal{T}^K_{\hat{t}}) \rfloor$ and
Output: Network configuration $\mathbf{x}_{\hat{t}}$.
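The fixing rules above, including the unanimous vote as the special case of a zero threshold, can be sketched as follows. The function name and the example values are hypothetical; the function returns the lower and upper bounds that would be imposed on each binary variable.

```python
def binary_bounds(mean_status, tau=0.0):
    """Lower and upper bounds on each binary, given the neighbours' mean status.

    tau = 0 reproduces the unanimous vote: a binary is fixed only if the mean
    is exactly 0 or exactly 1. Larger tau (below 0.5) fixes more binaries.
    """
    if not 0.0 <= tau < 0.5:
        raise ValueError("tau must satisfy 0 <= tau < 0.5")
    lower, upper = [], []
    for mean in mean_status:
        if mean <= tau:            # predicted status in [0, tau]: fix to 0
            lower.append(0)
            upper.append(0)
        elif mean >= 1.0 - tau:    # predicted status in [1 - tau, 1]: fix to 1
            lower.append(1)
            upper.append(1)
        else:                      # predicted status in (tau, 1 - tau): leave free
            lower.append(0)
            upper.append(1)
    return lower, upper


# Hypothetical mean statuses of five switchable lines over the K nearest neighbours.
means = [0.0, 0.1, 0.5, 0.95, 1.0]
unanimous = binary_bounds(means)            # only the first and last lines are fixed
relaxed = binary_bounds(means, tau=0.1)     # four of the five lines are fixed
```

A binary is fixed whenever its lower and upper bounds coincide; the remaining ones stay as decision variables of the MILP.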
The value of K also plays an important role in these approaches. Low values of K increase the chances of unanimous consensus of the nearest neighbors and, therefore, a higher number of binary variables are expected to be fixed, and tighter big-M values are obtained. This way, the computational burden of the OTS problem is reduced at the expense of increasing the risk of obtaining infeasible or suboptimal problems. In the extreme case, if K = 1, all binary variables are fixed to the values of the closest instance of the training set. On the contrary, large values of K increase the computational burden but the resulting problems have a high chance of being feasible. In the extreme case, if the whole training set is considered, very few binary variables are expected to be fixed and the computational savings are reduced.
The three methodologies presented above compute the big-M values using past observed data through the shortest path algorithm. However, as can be derived from Equation (3), the values $\underline{M}_{nm}$ and $\overline{M}_{nm}$ for a switchable line are just the minimum and maximum values, respectively, of the difference between the voltage angles at nodes $n$ and $m$ multiplied by $b_{nm}$. Therefore, following this idea, the second data-driven approach that we propose (denoted as FixB-AngM) estimates the big-M values using the information of historic observed angles as follows:
\begin{equation}
\underline{M}_{nm} = b_{nm} \min_{t \in \mathcal{T}} \left(\theta^t_n - \theta^t_m\right), \qquad \overline{M}_{nm} = b_{nm} \max_{t \in \mathcal{T}} \left(\theta^t_n - \theta^t_m\right) \tag{6}
\end{equation}
Using (6) to compute the bound values for a set of past instances $\mathcal{T}$ is denoted as $\mathbf{M} = \mathrm{ANG}(\mathcal{T})$ for notation purposes. It is important to clarify that computing the big-M values using (3) and (6) involves significant differences. The problems in (3) identify the tightest valid bounds by solving mixed-integer problems, which are as challenging as the original OTS problem. In contrast, Equation (6) efficiently approximates these bounds using observed angles from the historical dataset. Consequently, the bounds derived from (6) are consistently tighter than those obtained from (3), potentially excluding feasible solutions to the original OTS problem if the training set lacks sufficient representativeness. In fact, this strategy is riskier than the one used in FixB-FatM since it leads to much tighter feasible regions, which significantly reduces the computational burden of solving the OTS problem, but also increases the chances of yielding infeasible problems. To avoid using too tight big-M values that could cut off the optimal solution, the learned bounds obtained through (6) can be multiplied by a security factor $\lambda \ge 1$.
For the sake of comparison, we also consider the approach AngM in which no binary variables are fixed and big-M values are set using the observed angle differences. More details about the approaches FixB-AngM and AngM are provided in Algorithms 7 and 8, respectively. It is worth noticing that while the big-M values computed by (5) are symmetric, those derived by Algorithms 7 and 8 are not.
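4) Solve $\mathbf{x}_{\hat{t}} = \mathrm{OTS}(\mathbf{d}_{\hat{t}}, \mathbf{M}, \underline{\mathbf{x}}_{\hat{t}}, \overline{\mathbf{x}}_{\hat{t}})$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.
Output: Network configuration $\mathbf{x}_{\hat{t}}$.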
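The angle-based estimation of the big-M values can be sketched as follows. This is an illustrative reading of the description above: for each switchable line, the lower and upper constants are the smallest and largest observed values of b_nm(θ_n − θ_m) over a set of past instances, each multiplied by the security factor λ. The data layout and the function name are hypothetical.

```python
def ang_bounds(angle_history, susceptance, lam=1.0):
    """Data-driven (lower, upper) big-M pair for each switchable line.

    angle_history: list of dicts mapping node -> optimal voltage angle, one per instance.
    susceptance:   dict mapping switchable line (n, m) -> b_nm.
    lam:           security factor, expected to be >= 1.
    """
    if lam < 1.0:
        raise ValueError("the security factor must be at least 1")
    bounds = {}
    for (n, m), b in susceptance.items():
        observed = [b * (theta[n] - theta[m]) for theta in angle_history]
        # Plain multiplication by lam, as described in the text. It widens the
        # interval only when the lower value is negative and the upper is positive.
        bounds[(n, m)] = (lam * min(observed), lam * max(observed))
    return bounds


# Hypothetical history: three past instances of a two-node system.
history = [{1: 0.0, 2: -0.25}, {1: 0.0, 2: 0.5}, {1: 0.0, 2: -0.5}]
learned = ang_bounds(history, {(1, 2): 2.0})
padded = ang_bounds(history, {(1, 2): 2.0}, lam=1.5)
```

Unlike the shortest-path bounds, the resulting pair is in general not symmetric. Restricting `angle_history` to the K nearest neighbours or using the whole training set is a design choice left to the caller.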
To sum up, Table 1 provides a brief description of the different methods explained throughout Section 3. The first column of the table includes the name of each strategy. The second column shows whether the final problem to be solved is a linear program (LP) or a mixed-integer linear program (MILP). In the third column, the total number of problems to be solved is indicated. Column four shows the number of binary decision variables of the MILPs to be solved. Particularly, original means that the number of variables is exactly the same as the one from the original OTS formulation (4). In contrast, reduced means that the number of binary variables of the resulting MILP has been reduced compared to the original formulation. Finally, the last column indicates how the big-M values have been computed. The entry shortest (spanning) indicates that the bounds are computed by means of the shortest path method using only the information from the original spanning subgraph. On the contrary, the entry shortest (update) means that the shortest paths needed to compute the big-M values have been updated with the information provided by the closest neighbors. Finally, the entry historic angles implies that the bounds are computed using the voltage angle information of previously solved instances.

Numerical simulations
In this section, we present the computational results of the different methodologies discussed in Section 3 for a realistic network. In particular, we compare all approaches using a 118-bus network that includes 186 lines [24]. This network is large enough to render the instances nontrivial for current algorithms, yet not so large as to make them computationally intractable. Indeed, this is the most commonly used network to test OTS solving strategies in the literature [2-4, 6, 13]. As justified in Section 2, we consider a fixed connected spanning subgraph of 117 lines, while the remaining 69 lines can be switched on or off to minimize the operation cost. The spanning subgraph has been chosen in order to obtain sufficiently challenging problems. For this network, we generate 500 different instances of the OTS problem that differ in the nodal demand $d_n$ using probability distributions centered at the baseline demand $\widehat{d}_n$.
Since the demand variability may significantly affect the performance of the compared methodologies, we consider the following three cases:
- Unif10: The demand levels are sampled using independent uniform distributions in the range [0.9 d_n, 1.1 d_n].
- Unif20: The demand levels are sampled using independent uniform distributions in the range [0.8 d_n, 1.2 d_n].
- Normal: The demand levels are sampled using a multinormal distribution with the correlation matrix obtained from the demand time series available at [25].
The three database files can be downloaded from [26]. We use a leave-one-out cross-validation technique under which all the available data except for one data point is used as the training set, and the left-out data point is used as the test set. Consequently, the number of nearest neighbors K ranges from 1 to 499. This process is repeated for all data points and the resulting performance metrics are averaged to get an estimate of the model's generalization performance.
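The sampling and leave-one-out procedure described above can be illustrated with a minimal sketch. It assumes a hypothetical three-node baseline demand vector, uses only the uniform sampling schemes, and ranks neighbors by Euclidean distance between demand vectors; the function names are illustrative and do not come from the released databases.

```python
import math
import random

def sample_uniform_demands(baseline, spread, n_instances, seed=0):
    """Sample each nodal demand independently from U[(1-spread)*d, (1+spread)*d]."""
    rng = random.Random(seed)
    return [
        [rng.uniform((1 - spread) * d, (1 + spread) * d) for d in baseline]
        for _ in range(n_instances)
    ]

def nearest_neighbors(demands, test_idx, k):
    """Leave-one-out: return the k training instances closest (l2 norm) to the test one."""
    target = demands[test_idx]
    training = [i for i in range(len(demands)) if i != test_idx]
    training.sort(key=lambda i: math.dist(demands[i], target))
    return training[:k]

# Hypothetical baseline demands (not taken from the 118-bus data).
baseline = [50.0, 20.0, 35.0]
demands = sample_uniform_demands(baseline, spread=0.10, n_instances=500)  # Unif10-like
neighbors = nearest_neighbors(demands, test_idx=0, k=5)
```

Setting spread=0.20 would mimic the Unif20 case; the Normal case would additionally require the correlation matrix derived from [25], which is not reproduced here.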
All optimization problems have been solved using GUROBI 9.1.2 [27] on a Linux-based server with CPUs clocked at 2.6 GHz, using 1 thread and 8 GB of RAM. In all cases, the optimality gap has been set to 0.01% and the time limit to 1 hour.
The simulation results are presented in two subsections. In Subsection 4.1, a comprehensive comparison is conducted for all learning strategies introduced in Section 3 using the Unif10 database. Subsection 4.2 utilizes the Unif20 and Normal databases to explore the impact of increased demand variability and correlation on the computational performance of these methodologies.

Base case study
All simulation results presented in this subsection correspond to the Unif10 database. To illustrate the economic advantages of disconnecting some lines, Figure 1 depicts a histogram of the relative difference between the DC-OTS cost obtained when model (4) is solved by the benchmark approach described in Section 3.1 and the cost obtained when all 186 lines are connected. This second cost is computed by fixing the binary variables x_nm to one and solving model (1) as a linear programming problem. Figure 1 does not include the instances for which this linear problem is infeasible. As observed, the cost savings are significant in most instances, and in the most favorable cases they reach 15%. The average cost savings for this particular network and the 500 instances considered is 13.2%.
On the other hand, solving model (4) is computationally hard. To show this, Figure 2 plots the number of problems solved as a function of the computational time. For illustration purposes, the left plot shows the 439 problems solved in less than 100 seconds ("easy" instances) and the right plot the remaining "hard" instances that require a longer time. The average time over all instances is 145s, while the average time of the hard instances amounts to 1085s, which demonstrates the difficulty of solving model (4) to certified optimality. In addition, the benchmark approach is unable to solve 12 of the 500 instances to global optimality within one hour (with a maximum mip-gap equal to 2.46%), even though model (4) "only" includes 69 binary variables associated with the 69 switchable lines. This means that, for these 12 instances, this method has not been able to certify the optimality of the best integer solution found within the time limit, due to the poor relaxation bound originating from excessively large big-M values. We have thoroughly examined the simulations of this case study and verified that, for all instances, the best integer solution identified by the benchmark consistently matches the
best solution discovered by all the other (learning-based) approaches. This leads us to conjecture that the benchmark does find the optimal solution for all instances in the Unif10 database. Therefore, throughout this section, we compare the different methodologies with the best integer solution found in one hour by the Bench approach.
Next, we discuss the results provided by the Direct approach described in Section 3.2, where the binary variables are simply fixed to the values predicted by the nearest neighbor technique. Table 2 collates, for different numbers of neighbors K, the number of instances in which Direct delivers the same solution obtained by the benchmark (# opt), the number of instances with a suboptimal solution (# sub), as well as the average and maximum relative gap with respect to the benchmark approach (gap-ave, gap-max). Note that the metrics # opt, # sub, gap-ave and gap-max are computed with respect to the best solution found within one hour, which may not correspond to the true optimum. Finally, the results are also compared in terms of the average computational time, which can be seen in the last column of the table. Unsurprisingly, this approach is extremely fast and its computational time is negligible. On the other hand, the vast majority of the instances only attain suboptimal solutions for any number of neighbors K, and the maximum gap is above 8% in all cases. These results illustrate that the use of machine-learning approaches to directly predict the value of the binary variables of mixed-integer problems is likely to be extremely fast but potentially suboptimal.
Now we run similar experiments using the Linear approach described in Section 3.2 and proposed in [13]. The corresponding results are presented in Table 3.
Logically, the Linear approach solves a higher number of LP problems for different combinations of the binary variables and, therefore, some instances are solved to optimality, especially for large values of K. Although this methodology could be parallelized, Table 3 reports the sum of the computational times required to solve all the LP problems and, therefore, this time increases with K. It is worth clarifying that the computational time required to find the nearest neighbors is below 1ms in all cases. Although the computational burden is insignificant compared with the benchmark, the number of suboptimal cases and the maximum gap are still considerable.
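The metrics reported in Tables 2 and 3 can be computed from the costs returned by the benchmark and by a learning-based method, as in the following sketch. Two choices here are assumptions rather than statements from the paper: an instance is counted as suboptimal when its relative gap exceeds the 0.01% optimality tolerance, and gap-ave is averaged over all instances. The cost values are hypothetical.

```python
def gap_metrics(bench_costs, method_costs, tol=1e-4):
    """Return (# opt, # sub, gap-ave in %, gap-max in %) relative to the benchmark.

    Assumption: an instance is suboptimal if its relative gap exceeds `tol`
    (1e-4 corresponds to the 0.01% optimality gap used for the solver).
    """
    gaps = [100.0 * (c - b) / b for b, c in zip(bench_costs, method_costs)]
    n_sub = sum(1 for g in gaps if g > 100.0 * tol)
    n_opt = len(gaps) - n_sub
    return n_opt, n_sub, sum(gaps) / len(gaps), max(gaps)

# Hypothetical costs for three instances.
bench = [100.0, 200.0, 400.0]
method = [100.0, 202.0, 400.0]
n_opt, n_sub, gap_ave, gap_max = gap_metrics(bench, method)
```

Since the benchmark solution is only the best one found within one hour, a negative gap would simply indicate that the method improved on it.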
We continue this numerical study by comparing approaches FixB, FatM and FixB-FatM discussed in Section 3.3. For simplicity, Table 4 provides the results for K = 50 (10% of the training data) and τ = 0. Unlike Direct and Linear, these three approaches lead to the optimal solution for all instances, which confirms their robustness for a sufficiently high number of neighbors. Therefore, although these approaches require a higher computational burden than Direct and Linear, they still involve significant computational savings with respect to the benchmark, while reducing the probability of returning suboptimal solutions. Table 4 also shows that approaches FixB, FatM and FixB-FatM differ in terms of their computational burden. The FatM approach reports higher times than FixB, which allows us to conclude that fixing some binary variables involves higher computational savings than tightening the big-M constants. Notwithstanding this, the highest computational gains are obtained if both effects are combined under the FixB-FatM approach. Figure 3 plots the number of problems solved as a function of time. In the left subplot, the x-axis ranges from 0 to 100s, while in the right subplot the x-axis goes from 100s to 3600s. In the left subplot we can observe that approaches FixB and FixB-FatM are able to solve most of the instances in less than 100s, while approach FatM performs similarly to the benchmark. In the right subplot we see that the hardest instance solved by FixB and FixB-FatM requires 1645s and 296s, respectively. In contrast, although FatM outperforms the benchmark, this approach is not able to solve all instances in less than one hour.
Table 4: Performance of FixB, FatM, FixB-FatM for K = 50 and τ = 0
It is also relevant to point out that the higher the value of K, the lower the chances of achieving unanimity on the status of switchable lines, and thus, the lower the number of binary variables that are fixed in the OTS problem. To illustrate this fact, Table 5 collects the
results of approach FixB-FatM for τ = 0 and for different values of K, including the average number of binary variables fixed to one or zero using the training data (# bin). For K = 5, 28 binary variables (out of the original 69) are fixed on average, leading to low computational times but a larger number of suboptimal instances. For K = 499, only 8 binary variables are fixed on average and no suboptimal solutions are obtained, but the computational time increases. Figure 4 also illustrates the impact of K on the performance of the FixB-FatM approach. Note that setting K equal to 5 yields the lowest computational times, and all instances are solved in less than 100s. However, this setting leads to 47 suboptimal solutions. On the other hand, if K is set to 499, the maximum time reaches 400s but all instances are solved to optimality.
Fig. 3: Computational burden of FixB, FatM, FixB-FatM for K = 50 and τ = 0
Table 5: Impact of K on the performance of FixB-FatM for τ = 0
While in the previous simulations τ was set to zero in all cases, increasing its value has the potential to fix a greater number of binary variables, thereby decreasing the time to solve the resulting OTS problem. This, however, comes at the cost of potentially increasing the number of infeasible and/or suboptimal instances. For K = 50, Table 6 presents the simulation results for the FixB-FatM method with various values of the threshold parameter τ. The last column of this table (# bin) shows the average number of fixed binary variables, which logically rises with increasing values of τ. However, the reduction in computational time is arguably minor and may not justify the trade-off, as the gap values and the number of suboptimal instances increase significantly. Therefore, relaxing the unanimity condition in the proposed learning-based methods may not be worthwhile.
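The interplay between K, τ and the number of fixed binaries can be illustrated with a minimal sketch of the fixing rule. The precise definition of τ is given in Section 3.3; here we assume, purely for illustration, that a line is fixed to one when the share of neighbors with that line switched on is at least 1 − τ, and to zero when that share is at most τ, so that τ = 0 recovers the unanimity condition. The line labels and statuses are hypothetical.

```python
def fix_binaries(neighbor_status, tau=0.0):
    """Fix the status of lines on which the K neighbors (nearly) agree.

    neighbor_status: list of K dicts mapping a line label to its 0/1 status.
    Assumed rule (tau < 0.5): fix to 1 if the share of 'on' is >= 1 - tau,
    fix to 0 if the share of 'on' is <= tau, and leave the line free otherwise.
    """
    k = len(neighbor_status)
    fixed = {}
    for line in neighbor_status[0]:
        share_on = sum(status[line] for status in neighbor_status) / k
        if share_on >= 1.0 - tau:
            fixed[line] = 1
        elif share_on <= tau:
            fixed[line] = 0
    return fixed

# Hypothetical optimal statuses of three switchable lines in K = 4 neighbors.
status = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 0, "c": 0},
]
fixed_unanimous = fix_binaries(status, tau=0.0)   # line "c" stays free
fixed_relaxed = fix_binaries(status, tau=0.25)    # line "c" is also fixed
```

Under this assumed rule, adding neighbors can only break agreement, which is consistent with the drop in # bin as K grows, while raising τ fixes more variables at the risk of excluding the optimal topology.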
Fig. 4: Impact of K on the computational burden of FixB-FatM for τ = 0
Table 6: Impact of threshold τ on FixB-FatM approach
Next, we analyze the results of the two remaining approaches: the FixB-AngM approach, which uses the nearest neighbors to fix some binary variables and all the elements in the training set to learn the big-M values as explained in Section 3.3, and the AngM approach described in the same section. The results of these two methods for λ = 1 are provided in Table 7 and allow us to draw some interesting conclusions. First, both approaches lead to suboptimal solutions for some instances. This is understandable since, as explained in Section 3.3, these methods set the big-M constants fully relying on the maximum angle difference observed in the training set. Therefore, if the training set does not include an instance in which the actual maximum angle difference is realized, then the learned values of the big-Ms may leave the optimal solution out of the feasible region. In other words, while this strategy usually leads to very tight big-M values, it also increases the probability of having suboptimal or even infeasible solutions. Second, this strategy is substantially different from approaches FatM and FixB-FatM, which learn shorter paths of connected lines based on the optimal solution of the OTS problem for the training data and recompute the big-M constants using (5). Since shorter paths are only updated under the unanimity of the nearest neighbors, the latter strategy leads to more conservative big-M values and, consequently, larger feasibility regions and computational times. These facts are confirmed by comparing Tables 5 and 7. For instance, for K = 50, FixB-FatM solves all instances to optimality and takes 12.33s on average, whereas FixB-AngM yields five suboptimal solutions but reduces the average computational time to only 0.7s. The third relevant fact arises from the comparison of the average computational times of the two approaches in Table 7. As observed, these times are
particularly similar for all values of K. This leads us to conclude that the obtained big-M constants are so tight that fixing some binary variables does not have a significant impact on the computational burden. For completeness, Figure 5 compares, for λ = 1, the number of problems solved over time by FixB-AngM with 50 neighbors and by AngM against the benchmark. Notice that these two methodologies are able to solve most instances in less than 5 seconds, while only 250 instances are solved by the benchmark in that time. This figure also confirms that fixing the binary variables has a negligible effect on the computational savings. To reduce the number of suboptimal instances, AngM can be run with values of the multiplying factor λ higher than 1.
To further illustrate the performance of the two data-driven strategies to learn the big-M constants, Table 9 provides, for ten of the switchable lines, the big-M values for approaches Bench, FixB-FatM with K = 50 and τ = 0, and AngM with λ = 1. For the first two methods, the two big-M values of each line are symmetric, whereas approach AngM computes asymmetric values as explained in Section 3.3. Since the learned big-M constants may change for each instance, Table 9 includes value ranges. Thanks to the status of switchable lines of the nearest neighbors, the FixB-FatM approach is able to reduce the shortest paths used in (5) and significantly decrease the values of the big-Ms for some lines. For lines 2, 58 and 103, however, these values remain unaltered. The approach AngM learns from the observed angle differences and, therefore, the big-Ms are tightened even further. In fact, for lines 58, 85, 135 and 164, this methodology is able to infer the direction of the power flow through these lines and, consequently, one of the big-M values is set to 0.
This bound reduction effectively tightens the DC-OTS model (4) and significantly reduces its computational burden.
After this in-depth analysis of the simulation results for the Unif10 database, we can conclude that the most promising approaches are Linear with K = 499, FixB-FatM with K = 50 and τ = 0, and AngM with λ = 1.1.
Table 10 summarizes the computational results of these approaches. The Linear approach is the fastest, but returns 373 suboptimal instances, a maximum gap of 0.71% and an average gap that is four times the target value of 0.01%. On the other hand, FixB-FatM and AngM achieve the optimal solution for all instances. Between these two, AngM reports the lower computational time, which is in fact only slightly above that of the Linear approach.
To conclude this section, we remark that the primary goal of these learning procedures is to swiftly generate solutions needed for online applications. However, it is crucial to note that the rapid solutions obtained are not directly included in the training data. As new demand levels materialize over time, each instance must undergo an offline optimization using the benchmark approach to achieve optimality before its corresponding solution is integrated into the expanding training set.
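The way AngM derives asymmetric bounds from historic angles, including the case in which one bound collapses to 0, can be sketched as follows. The exact rule is the one given in Section 3.3; this illustration assumes that, for each switchable line, the two bounds are the largest positive and the largest negative angle difference observed in the training set, each scaled by λ, and it works directly in angle-difference units (any scaling by the line susceptance is omitted). The line labels and angle values are hypothetical.

```python
def learn_angle_bounds(angle_diffs, lam=1.0):
    """Learn asymmetric bounds from observed angle differences theta_n - theta_m.

    angle_diffs: dict mapping a line label to the list of differences observed
    in previously solved instances. Returns line -> (m_lo, m_up), both >= 0.
    Assumption: bounds are the extreme observed differences scaled by lam.
    """
    bounds = {}
    for line, diffs in angle_diffs.items():
        m_up = lam * max(0.0, max(diffs))    # largest positive difference
        m_lo = lam * max(0.0, -min(diffs))   # largest negative difference
        bounds[line] = (m_lo, m_up)
    return bounds

# Hypothetical observations: the first line always shows a positive difference,
# so the opposite bound is learned as 0; the second line changes sign.
observed = {
    "line_58": [0.02, 0.05, 0.01],
    "line_2": [-0.03, 0.04],
}
bounds = learn_angle_bounds(observed, lam=1.1)
```

A value of λ above 1 enlarges every nonzero bound, which is the safeguard against training sets that miss the largest angle difference; note that, in this sketch, a bound learned as 0 stays at 0 regardless of λ.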

Impact of demand variability and correlation
As mentioned earlier, the variability and correlation of nodal demand levels can influence the performance of the learning-based methods compared in this paper. Specifically, an increase in demand variability relative to nominal values is expected to reduce the accuracy of any learning method, given the same size of the training dataset. Conversely, a higher correlation among demand levels at different nodes in the network simplifies the learning task, thanks to a more pronounced data structure.
Table 11 compiles the simulation results of various methods for the Unif20 database, which has a higher variability than Unif10. While none of the methods applied to the Unif10 database result in any infeasible instances, this is not the case for the Unif20 dataset. The fifth column of the table indicates the number of infeasible instances for each approach. It is worth noting that, for this dataset, the benchmark fails to achieve optimality for 43 instances within one hour, resulting in an average mip-gap of 0.50% and a maximum mip-gap of 2.40%. The average time required by the benchmark method is 510.9s. For the Unif20 database, which includes the most challenging instances, we do observe a few cases where some of the learning-based methods produce slightly improved integer solutions compared to Bench. However, for consistency, the reported gaps in this case study are calculated using the solutions identified by the Bench approach as optimal.
The simulation results in Table 11 yield noteworthy observations. Firstly, as anticipated, increasing the variability of demand levels leads to a rise in the number of suboptimal and infeasible instances. For instance, FixB-FatM with K = 5 produced 47 suboptimal instances for the Unif10 database. However, for the Unif20 database, this method resulted in 153 suboptimal instances and 4 infeasible problems. The maximum gap for this approach has also increased from 1.92% to 5.93%. Secondly, augmenting the number of closest neighbors diminishes the number of infeasible instances, as binary variables are fixed only under the unanimity condition. Indeed, the FixB-AngM approach exhibits no infeasible instances when K is increased from 5 to 50. Similarly, the Linear approach avoids any infeasible instance for a value of K = 499. This suggests that these approaches are not particularly suitable for high variability in parameters or a low number of training instances. Thirdly, the Linear approach is very fast, but involves large average and maximum gap
values, even for K = 499. Finally, considering the number of suboptimal instances, the average and maximum gaps, and the average computational time, it can be concluded that AngM with λ = 1.1 exhibits superior performance for the Unif20 database.
Despite the insightful findings presented in Table 11, one could argue that electricity demand in real power systems exhibits a higher spatial correlation. Therefore, utilizing uncorrelated probability distributions for the nodal demands may diverge from reality. To address this concern, in Table 12 we present results analogous to those in Table 11 where demand levels are randomly sampled from a multinormal distribution with a correlation matrix computed using data from [25]. For the Normal dataset, the benchmark approach fails to solve 30 instances within one hour, yielding an average mip-gap of 0.49% and a maximum mip-gap of 2.45%. Besides, the benchmark approach takes an average time of 289.5 seconds. As with the Unif10 database, none of the learning-based approaches improves the solution found by the Bench approach in one hour for any of the 500 instances of the Normal database. In this more realistic setting, we observe that there are no infeasible instances for any of the methods, while most methods result in some suboptimal instances. Notably, the computational times required by Linear, FixB-FatM and AngM are of the same order of magnitude. However, while the Linear approach returns suboptimal instances for the three values of K, the proposed methodologies FixB-FatM with K = 499 and τ = 0, and AngM with λ = 1.1 are able to solve the 500 instances to global optimality. This underscores the efficacy of learning-based procedures in delivering rapid solutions that closely approximate the original solution for the OTS problem, even with realistic demand level variability.

Conclusions and further research
In the field of power systems, the optimal transmission switching (OTS) problem determines the on/off status of transmission lines to reduce the operating cost. The OTS problem can be formulated as a mixed-integer linear program (MILP) that includes sufficiently large (big-M) constants. This problem belongs to the NP-hard class and its computational burden is, consequently, significant even for small networks. While pure end-to-end learning approaches can solve the OTS problem extremely fast, the obtained solutions are usually suboptimal, or even infeasible. Alternatively, we propose in this paper some learning-based approaches that reduce the computational burden of the MILP model by leveraging information from previously solved instances. These computational savings arise from the fact that some binary variables are fixed and tighter big-M values are found. Numerical simulations on a 118-bus power network show that the first proposed approach is able to solve all instances to optimality in less than 300 seconds, while the benchmark approach is unable to solve all of them in 3600 seconds. The second approach we propose is more aggressive and solves all instances in less than 10 seconds, but 1% of them do not reach the optimal solution. We also assess the performance of the proposed learning-based approaches under increased demand variability and correlation.
All the learning approaches presented in this paper utilize the K-nearest neighbors (KNN) algorithm and the ℓ2-norm distance. The exploration of different machine learning methods and/or distances is left as a potential avenue for future research. In this paper, we introduce a machine learning approach that leverages the structural patterns observed in past DC-OTS instances to speed up the solution of new instances. However, the solver hyperparameters are set to default values. Future research could explore utilizing the data not only to exploit the problem structure but also to fine-tune solver hyperparameters, as demonstrated in [28,29]. Additionally, our study assumes the use of DC approximations for power flow equations. A potential research direction involves addressing the more challenging AC-OTS problem, considering data-driven strategies to simplify it into a DC-OTS format, akin to approaches presented in [18].

Table 1: Summary of the methods explained in Section 3.

Table 2: Performance of the Direct approach

Table 3: Performance of the Linear approach

Table 7: Performance of approaches FixB-AngM and AngM for λ = 1

Table 8 compiles the simulation results for AngM with various values of λ. It is observed that a slight increase in the big-M values above those learned from historical data has a minimal impact on computational time, but reduces the number of suboptimal instances. Remarkably, even for λ = 1.1, all instances are solved optimally by AngM.

Table 8: Impact of factor λ on AngM approach

Table 10: Summary of computational results for the Unif10 database

Table 11: Computational results for the Unif20 database

Table 12: Computational results for the Normal database