Distribution network optimization: predicting computation times to design scenario analysis for network operators

The increased use of renewable energies promotes decarbonization and raises the load on power distribution networks, forcing responsible distribution network operators to re-evaluate and re-design their networks. Infrastructure planners employ a rolling-horizon planning procedure with frequent recalculations to face informational uncertainty, which require solving multiple scenarios. Keeping complexity manageable is particularly challenging as distribution network areas may span multiple cities and counties. In this study, we focus on infrastructural decomposition, where the distribution network is decomposed into multiple parts and planning problems, which are then optimized separately. However, infrastructure planners lack the knowledge of how they should design a scenario analysis for a subnetwork to account for informational uncertainties subject to limited planning time and computing resources. Based on empirical requirements from literature and discussions with experts, we present a novel mixed integer linear optimization model that allows to use exact solution approaches for realistic large-scale distribution networks. Our approach considers the primary and secondary distribution network in an integrated way and designs a flexible topology for high reliability power distribution. We perform extensive computational experiments and a sensitivity analysis to determine correlations between the values of model parameters and computation times required to solve the resulting model instances to optimality. The results of the sensitivity analysis indicate that the combination of the number of buses, lines and the considered action scope have a considerable influence on the solving time. In contrast, a higher number of available transformers led to a better solvability of the model. From these computational insights, we derive implications for infrastructure planners who wish to perform scenario analysis for planning their power distribution networks.

in their values of exogenous model parameters, including future energy needs. (3) Planning time and computing resources are limited. (4) The solution space has many local optima, and complexity increases rapidly with increasing network size [21]. Keeping complexity manageable is particularly challenging as distribution network areas may span multiple cities and counties. For instance, the distribution network operator Westfalen Weser Netz serves 658,000 customers in an area of 6500 km 2 around Paderborn, Germany. This involves 29,251 km of lines, 108 distribution substations, and 7477 transformer stations.
In general, there are several options for addressing the computational complexity of DNO problem instances: (1) design and application of heuristic procedures, (2) model simplification, and (3) infrastructure decomposition. Although heuristic approaches are popular due to their computational efficiency, a key issue with most heuristic procedures (not only for the DNO problem) lies in the unknown quality of generated partial and complete solutions due to the lack of computationally or analytically identified bounds [1,11,61]. DNO model simplification usually involves the simplification of technical assumptions or the application of linearization techniques in order to apply exact solution methods. These approaches often neglect or abstract from the low-voltage part of the distribution network, including voltage quality and the network's behavior during load peaks. However, due to the increasing penetration of low-voltage networks, the consideration of these issues is important [6,19,59]. When infrastructure decomposition is applied, the distribution network is decomposed into multiple parts and planning problems, which are then optimized separately. Although merging optimal solutions of subproblems do not necessarily lead to an optimal solution of the overall problem when subproblems are connected to each other, using exact procedures to solve (sub)problems resulting from decomposition allows determining bounds and the quality of found solutions for each subproblem. Thus, the decomposition approach is often useful and addressed in this article.
An infrastructure planner needs to answer two key questions when performing infrastructure decomposition: (1) the type of decomposition: How can the network be decomposed? (2) The scenario analysis design: given a decomposition of a network into subnetworks, how should a scenario analysis for a subnetwork be designed to account for informational uncertainties subject to limited planning time and computing resources? This study focuses on the latter question, which has often been neglected in the literature. When addressing this question, one key issue of computational scenario analysis is the discretization of value ranges for (uncertain) exogenous parameters, such as the future load, available assets, or investment opportunities. Associated with this issue, a second issue is the question of how parameter values should be combined to form a set of scenarios to be computationally analyzed. For example, it may turn out that a full factorial design is computationally intractable due to limitations in time and computing resources.
To guide our scenario analysis design, we study correlations between parameter values of scenarios with their solution times and pursue the following research question: How do scenario characteristics affect the model's computational solution time? Methodologically, we perform extensive computational experiments and a statistical analysis of computation times.
The contributions of our work consist of (1) the formulation of a linear optimization model which considers distribution networks with their technical details, (2) the conduction of extensive computational experiments for determining correlations between parameter values and computation times, and (3) the provision of recommendations for infrastructure planners who wish to perform scenario analysis for planning their power distribution networks.
The remainder of this paper proceeds as follows. In Sect. 2, we survey relevant literature. We provide a problem description and requirements which need to be considered in Sect. 3. We present a novel MILP formulation in Sect. 4 and evaluate our approach in Sect. 5. Then, we discuss our results in Sect. 6 before we conclude in Sect. 7.

Literature
Distribution network optimization has received a lot of attention in the literature. In Sect. 2.1, we present related research and show the extent to which optimization models ensure supply reliability as well as the effectiveness in terms of the size of solved distribution networks. In Sect. 2.2, we conclude research gaps and situate this study within the existing literature.

Recent research
To present the literature, we use the taxonomy of Georgilakis and Hatziargyriou [20], who classify studies according to their objective, the problem type, the planning period, and distribution levels. The distinction according to the objective indicates the optimization goals. The problem type can either design a new distribution network or expand an existing one and adapt it to future needs. The planning period can be a static single-stage period, where a solution is determined for a certain demand. A dynamic planning period designs a successive expansion plan for the distribution network, divided into multiple stages. The distribution level indicates whether a primary or secondary distribution network is considered, which maintains medium and low voltage, respectively.
To better address the need for further research, we add the categories of uncertainty management and the network size, that are part of the taxonomy of Saboori et al. [46]. In the area of uncertainty management, we distinguish between reliability and demand forecasting. The reliability indicates whether the distribution network design takes supply interruptions (e.g., due to damaged lines) into account. The demand forecast describes whether the model considers unknown future demand. The network size highlights the number of nodes, that are used for the evaluation in the respective studies. Table 1 summarizes the considered literature.
Distribution network optimization pursues various objectives like minimizing investment and operation cost, decreasing energy losses and emissions, and increasing reliability and social benefit. However, multi-criteria decision-making is not necessary since the different goals can be valued monetarily. All studies reviewed in Table 1 consider investment costs. Operational costs are sometimes neglected [17,26,44,52] or included in the investment costs [3,29]. Asensio et al. [4], El-Fouly et al. [17], Haffner et al. [22], Jabr [26], Jooshaki et al. [27], Santos et al. [47] and Shen et al. [51] pay special attention to energy production and loss costs. Energy production and loss costs refer to the conversion of energy, e.g., when substations change the voltage level, or generators and storages feed in energy. Asensio et al. [4], Haffner et al. [22], Jooshaki et al. [27], Lotero and Contreras [33], Santos et al. [47] and Shen et al. [51] also consider the cost of unserved energy. These costs are also called value of lost load and refer to expenses associated with a disruption of supply. Santos et al. [47], Wong et al. [57] and Zidan et al. [63] price CO 2 -emissions and consider them to penalize investments in polluting generators. Asensio et al. [4] pay special attention to the social benefit of a solution in relation to the network's consumers. They design their objective function to consider the consumer's present costs as an available budget for investment and operative costs. The unused budget is maximized to prevent additional costs for consumers. GA genetic algorithm, PSO particle swarm optimization While few papers in the past newly construct distribution networks [17,29,30,36,37], most studies extend existing infrastructure. Arias et al. [3] keep a given initial solution and extend the distribution network. In addition to the expansion, Asensio et al. [4] allow the model to upgrade existing substations. Shen et al. [51] consider substations and lines and allow the replacement of existing parts of the distribution. This makes it possible to also scale back the distribution network and to better adapt it to the requirements.
Jabr [26], Li et al. [32], Paiva et al. [40], Santos et al. [47], Shen et al. [51] and Shu et al. [52] perform a static optimization. They calculate a distribution network for one certain time in the future. Prioritization and order of the expansion are left to the decision maker. Approaches with multiple time stages are more complex but offer the advantage of creating a more detailed expansion plan [3,4,17,22,27,33].
Mendoza et al. [36], Nazar et al. [38] and Paiva et al. [40] state that the design of the secondary distribution network has an effect on the optimality of the overall design. Synergy effects between the two networks have a strong impact on investment costs. Thus, they consider an integral planning of the primary and secondary network. Cossi et al. [12] and Mendoza et al. [37] focus their work on the secondary distribution network. All other studies in Table 1 evaluate their approaches with primary network instances only and neglect the secondary network.
Another issue of distribution network optimization is the design of a reliable distribution network to prevent supply interruptions. To consider the reliability of the distribution network, the cost of unserved energy can be penalized to prevent unreliable networks [4,18,29,30,37,38,42,44,47,49,62]. Jooshaki et al. [27] and Lotero and Contreras [33] calculate a system average interruption frequency index to measure reliability and assign an objective coefficient. Instead of ensuring low supply interruptions, reliability can also be ensured by a flexible topology. A flexible topology is characterized by a redundant supply routes: if an interruption of supply occurs in a part of the network, it can be bypassed by changing the circuit and supplying the affected area via other feeders. For example, if one transformer of the network fails, the circuit of the network can be changed and affected customers can be supplied via other transformers. Shen et al. [51] formulate constraints to allow such a flexible topology. In the event of a supply interruption in the network, the topology can thus be flexibly switched and sections affected by supply interruptions can be covered via other routes. However, aforementioned approaches encourage reliability but not enforce it. Failures in the network do not receive sufficient attention, if the predicted costs of interruptions are too low. For this reason, they only partially meet the reliability criterion (cf. Table 1).
To account for different future situations, Asensio et al. [4] formulate their model as a scenario-based stochastic programming model. Arias et al. [3] introduce chance constraints to model uncertain conventional loads and the demand for electric vehicles. Koutsoukis et al. [29], Santos et al. [47], Shen et al. [51] and Shu et al. [52] use scenarios to map different cases of the future. The solutions of the future scenarios are collected into a solution pool. Experts can analyze this pool to make a selection for a final expansion strategy. Lotero and Contreras [33] calculate a solution for each year of the planning period to also provide an orientation in time. However, they provide no distinctive support for the decision maker.
Many studies choose a nonlinear model (QP/NLP) as formulation to model current flows with physical precision [12,18,42,49,62,63]. A nonlinear model formulation results in a high problem complexity [55]. Therefore, heuristic methods are applied predominately to networks with a large number of load nodes. For example, Mendoza et al. [37], Nazar et al. [38] and Ziari et al. [62] use heuristic approaches to evaluate distribution networks with 812, 5000, and 205 load nodes, respectively. In the work in Table 1, exact methods are applied only on smaller networks with 136 load nodes or less [26].

Research gaps
Recent research places various emphases in the area of distribution network optimization (cf. Sect. 2.1). We identify three research gaps that we address in our work: (1) solving large networks with exact optimization methods, (2) integrating the primary and secondary distribution network, and (3) designing flexible topologies for high reliability. (1) The presented studies focus on solving distribution networks with few load nodes exactly or apply heuristic methods when there are many load nodes. However, there are no model formulations that are solved with exact methods for networks with a high number of load nodes. (2) Considering networks with a high number of load nodes is important to also model the secondary network and enable integrated expansion planning. However, the secondary network is predominantly neglected, even though this is just as important for transporting energy to network customers [12,40]. (3) Previous work has evaluated reliability using costs of undelivered energy. However, no study requires the design of a flexible topology.
To the best of our knowledge, there is no linear formulation that considers the primary and secondary networks in an integrated way, imposes a flexible topology, and can be solved exactly. We wish to extend the research of network optimization and close this research gap by presenting a novel MILP formulation. Our goal is to minimize investment and operation costs of the primary and secondary distribution network using an exact solution approach. In addition to physical requirements, the constraints must also ensure a flexible topology in order to ensure practically applicable results. We want to focus on a static planning horizon; dealing with an uncertain future is still possible by computing multiple future scenarios and consolidating an expansion plan by experts, as shown in other studies.

Problem description and requirements
The distribution network consists of different feed-in, demand, and switch nodes, and is divided into a primary and a secondary distribution network. As Fig. 1 shows, one node connects the primary network with the external grid and serves as the source of electricity. The primary network delivers the electricity under medium voltage (MV). It also distributes electricity to secondary networks, which operates a lower voltage (LV) level. Nodes represent producers (e.g., small power plants, photovoltaic panels), consumers (e.g., households, businesses), or junctions (e.g., distribution boards), that are associated with a demand. Lines and transformers are assets. Lines connect different nodes that have the same voltage level. Transformers connect nodes from different voltage levels. Hence, they represent connections from the primary to the secondary network and vice versa. For connecting two nodes, different transformer types and line types can be installed in the distribution network. Transformer types differ in their capacity of power they can supply to the distribution network, based on the number of windings and other characteristics. Similarly, line types differ in current flow limitations or voltage losses, based on their material and cross-section: For example, a line type made of aluminum with a cross-section of 150 mm 2 can carry a higher current flow and has a better voltage loss behavior than a line type of the same material with a smaller cross-section.
Literature provides requirements for realistic calculation and usability in practice [20,55]. In Table 2, we present and explain the requirements for our model and distinguish between physical and topological requirements. We discussed the requirements and supplemented them in a workshop with practical users to ensure relevance and completeness.  Physical requirements (I-IV) affect the current flow and voltage level: according to Kirchhoff's Current Law there has to be a power flow equality (I), such that the current entering a node is exactly equal to the current leaving it. The voltage level (II) has to be within specified limits and the voltage drop has to be considered. Also, the current flow through a line must not exceed its limit (III). Transformers must not be overloaded (IV) with current outside permissible limits, since this can lead to outages [3,32,55] Topological requirements (V-VIII) influence the design of the network: A distribution network's design may be meshed or radial. A meshed network has multiple power sources and uses loops in its circuitry, resulting in better voltage maintenance and reliability. A radial network has only one energy source and distributes the energy using a tree structure. The majority of the world's power distribution networks are operated radially, because they are easier to plan, operate and maintain [13]. In our work, we consider the design of a German distribution network layout that uses a meshed topology, but is operated radially by using sectionalizing switches (V). The distribution network must satisfy the demand of every network customer (VI). For the network customer's supply, we require a high reliability (VII). A high reliability can be achieved by designing a redundant supply, such that the topology can be switched flexibly [32]. A flexible topology allows affected transformers and lines to be bypassed in the event of supply interruptions. Figure 2 illustrates an example of a reliable network whose circuit can be switched in the event of a transformer failure. Finally, logical constraints (VIII) must be met to ensure the operability of the network, e.g., only one asset type can be chosen, and adjacent lines must be compatible [55].

Optimization model
In this section, we introduce our mathematical model for the DNO. The distribution network is modelled as a directed graph G = (V, E) . Nodes V = {1, … , n} represent points of the network and the set E = {e 1 , … , e m } ⊂ V × V defines edges e ij ∶= (i, j) between two nodes i and j.
To model the use of different line and transformer types, we introduce node and edge qualifications: node qualifications provide information about the nodes' voltage levels, so we can consider whether two nodes can be connected with a line or a transformer type. The function c ∶ V → V q = {EG, MV, LV} defines the node qualifications and provides information about whether the node belongs to the primary MV network or the secondary LV network. At least one node is part of the medium voltage network and connected to the external grid (EG) and thus serves as a source for energy. Edge qualifications provide information about the line and transformer types that can be used on an edge. The function d ∶ E → Q = Q ∪ Q provides the set of available qualifications for a particular edge. Therefore, the codomain Q can be subdivided into available line types Q and transformer types Q . This means that the function d(e) reflects the set of potentially installable line and transformation types q ∈ Q for an edge e ∈ E : If an

Mixed-Integer Linear Problem Formulation of the Distribution Network Optimization Problem
The decision variable j)) . The Objective Function (1) minimizes the associated capital expenditures c ijq . The maintenance of lines and transformers takes place at regular intervals and is usually independent of the amount of energy transported, which is why these costs can be estimated before an investment and included in the capital expenditures c ijq . Table 4 elucidates which requirements are considered by which constraint. To design a flexible topology and therefore maintain reliability (Req. VII), the set of failure scenario S lists edges s = (i, j) ∈ S ⊆ E , whose potential failure must be compensated. Variable f ijs returns the flow of current from node i to j in the case that the edge s is unavailable. Hence, Constraint 2 sets the flow f ijs equal to zero for s = (i, j) , which means that any flow from node i to j is prohibited when this particular edge is unavailable. If f ijs is not restricted by s, the current flow is limited to the bound u c q of its qualification q ∈ d((i, j)) as stated in Constraint 3 (Req. III and IV). A negative flow of f ijs means that the current flows in the opposite direction, i.e. from node j to i.
To maintain radiality (Req. V), Constraint 4 forces a variable ijs to indicate whether there is a positive current flow from node i to j during failure scenario s. In combination, Constraint 5 limits the sum of ingoing edges ijs for a certain node j to one. This causes that each node is supplied by only one predecessor, which results in a radial supply.
Concerning flow balance (Req. I), Constraint 6 matches the incoming flow f jis and outgoing flow f ijs with the node's demand d i for each node i ∈ V . The flow balance does not apply for source nodes as they can serve an arbitrarily high demand. Parameter ij models current losses in the case that edge (i, j) is serviced by a transformer. This procedure is explained in more detail in Appendix A. Constraint 6 also ensures the connectivity (Req. VI) of all nodes with a demand. For rare real-world cases where a node does not have a demand d i , a minimal demand can be assumed to maintain its connectivity.
The voltage value v i of node i has to be within its bounds for each failure scenario s ∈ S , as stated in Constraint 7 (Req. II). The model considers the voltage drop that occurs when current flows through an edge. For lines, the model considers a voltage drop factor Δ ijq (see Appendix B) that occurs within an edge (i, j) ∈ E depending on the qualifying type of line q ∈ Q . The voltage level v is of node i minus the voltage drop Δ ijq f ijs has to be equal to the voltage level of the adjacent node j for each failure scenario s ∈ S (Constraints 8 and 9). This constraint only applies if qualifier q ∈ Q is active ( x ijq = 1 ) and if there is a current flow f ijs through edge (i, j) ∈ E in failure scenario s ∈ S ( ijs = 1 ∨ jis = 1 ). Otherwise, another qualifier q ∈ Q is active or there is no current flow, so the constraint's right-hand side is unbounded.
For transformers, Constraints 10 and 11 sets the output voltage v js of node j ∈ V within a certain voltage interval determined by the coefficients l q and u q (see Appendix A). Similar to Constraints 8 and 9, the constraints 10 and 11 are unbounded if qualifier q ∈ Q is not active or if there is no current flow f ijs in failure scenario s ∈ S.
As a logical restriction, Constraint 12 defines the limitation that only one qualification q can be built between two nodes i and j at the same time. To meet requirement VIII, set K ⊂ P(E) represents a set of distinct subsets of E. For each set in K, the same qualification has to be active (Constraint 14). To summarize, the model considers all imposed requirements. To guarantee the admissibility of computed solutions and their practicality, we present a simulative validation in Sect. 5.2.
The model contains binary as well as continuous variables, and all constraints are convex and linear. Thus, the optimization model is a mixed integer linear problem (MILP). To determine the size of the optimization model, the cardinality of vertices |V| , edges |E| , qualifications |Q| , failure scenarios |S| and line cluster's union � ⋃ K� are relevant. The total number of variables is 3|E| ⋅ |S| + |E| ⋅ |Q| + |V| ⋅ |S| . The number of constraints equals

Computational experiments
In this section, we present our computational experiments. In Sect. 5.1, we introduce different instances based on empirical data. Solutions are checked for validity in Sect. 5.2. Section 5.3 presents the execution times and a sensitivity analysis of the execution times depending on different distribution network characteristics. The problem is solved with Gurobi 9.1.0 under standard parameterization on a Linux CentOS 7 operating system with an Intel Xeon Gold 6148 CPU with 16 × 2.4 GHz and 190 GB main memory.

Instances and data generation
We create a set of instances with different characteristics to investigate their effects on solvability and execution time. A direct choice of the model's sets and parameters (cf.

Network parameter
As network parameters, we consider (1) the city network structure cn, (2) the number of buses bu, (3) transformers tr per 100 buses, (4) line and transformer types lt, (5) the load factor lf, and (6) the action scope as, see Table 5.
(1) To construct the distribution network, we use data from the OpenStreetMap service [23] of ten different city networks cn in Germany. Since lines align along roads, we use the road network for locating the secondary network lines. (2) To allocate buses, we extract five different random subsections per city network with up to bu = 5000 buildings and waypoints. Studies predict a change of the future load by − 17 to 34% compared to today [9,48]. Therefore, we vary a load factor lf between 0.83 and 1.34, and multiply it with the assigned loads for each bus. (6) When planning a real distribution network expansion, an infrastructure planner has to distinguish whether lines and transformer types must or may be replaced, or must remain in place: Assets must be replaced, e.g., when they are worn out. Some assets must not be replaced, e.g., because their costs are still being amortized. All other assets may be replaced. The action scope as reflects the relative amount of assets that must and may be modified and is varied between 20 and 50%.
To summarize, we distinguish ten city layouts with five network sizes. For each network size bu, we generate instances with three different load factors lf, three different transformer distributions tr, three settings of line and transformer types lt and three different sizes for the action scope as. That is a total of 4050 different instances.

Correspondence of network and model parameters
This section explains the relationship between network and model parameters. Therefore, we explain how the graph G = (V, E) as well as the sets S, Q and K from the mathematical model result from the network parameters. Table 6 provides an overview of the relationships. The set V contains one node for each bus bu, one associated node for each transformer tr and one for the external grid. The load of a bus (cf. Sect. 5.1) results in the node's demand d i of the mathematical model. For the bounds of the voltage l v i and u v i of the mathematical model we follow German specifications: The voltage must be between l v i 18000 and u v i 22000 for nodes of the primary network and between l v i = 360 and u v i = 440 for nodes of the secondary network. The set E contains one edge for each bus bu. Additionally, two edges for each transformer tr represent the transformer itself as well as the feeder for the transformer in the primary network. Depending on the topology (e.g., due to intersections or junctions) there are few more edges to model crossings and branches.
The set Q contains the qualifications for each edge. The edge qualifications Q depend on the selected number of line and transformer types lt. The associated parameters of a line or transformer type result in the model parameters c ijq , u c q , l q , u q and Δ ijq . Appendix C presents the specific model parameters and explains how they are connected to the different line and transformer types.
The set of failure scenarios S contains all edges that are transformers, as we require all transformers to be failure-safe.
The set K contains the sets of edges that must be assigned with the same line type. In our instances, we decide to group lines between two intersections or branches. Thus, the set K is smaller than E. The exact cardinality depends on the given topology of the city network cn.

Simulative validation of solutions
As emphasized in the introduction, it is particularly important that solutions can be implemented in practice. To ensure the feasibility of solutions in practice, we simulate all solutions using the current flow simulation library pandapower, which is described by Thurner et al. [54]. During simulation, we consider the failure scenarios to verify the reliability of the solution: we successively deactivate the distribution network's transformers to simulate their failure. In this case, the affected transformer cannot be used, and its load is redistributed to other transformers. Appendix E lists detailed simulation results. Overall, 3476 out of 4050 instances are Table 6 Correspondence of network and model parameters Correspondence Description |V| = bu + tr + 1 V contains one node for each bus bu, one associated node for each transformer tr and one for the external grid |E| = bu + 2tr + E contains one edge for each bus bu and two edges for each transformer tr. Due to the topology of the city network, E can contain further edges Since the simulation calculates current and voltage differently, few violations of the given bounds may occur in some cases. They are of no real consequence due to the fact that in practice, a decision maker could adjust the transformer settings, set the bounds for the optimization more narrowly or, if necessary, specifically strengthen the solution obtained at the affected network parts.

Results
In this section, we present the results of our computational experiments. First, we provide an overview of the optimization's computing time for the instances. Then, we consider the influence of the exogenous network characteristics on the computation time and solution quality using an ordinary least square regression. Finally, we investigate the effect size of the network size on the required computation time using a logistic regression. Figure 3 presents a cactus plot of the optimization's computing time (wall time) of the instances. The instances are divided into five groups of 810 instances each, according to their number of buses. The cactus plot provides an overview about how efficiently the instances were solved depending on their number of buses. The x-axis shows the computing time in minutes, while the y-axis indicates the number of instances solved optimally in the respective time. We consider an instance to be solved optimally if its gap is 0.5% or less.

Fig. 3 Cactus plot of solved instances
To understand the influence of exogenous network characteristics on computing times and solution quality more precisely, we perform a sensitivity analysis using a logistic and an ordinary least square regression: first, for the logistic regression, we introduce the dependent variable solved, which is 1, if the solver could calculate an optimal solution during the specified solution time, and 0 otherwise. The logistic regression provides insights into which network characteristics caused an instance to be solved to optimality during a specified time window. We use the network characteristics bu, tr, lt and as as independent variables (cf. Table 5) and include the number of lines nl as well as the total demand td of the network. These values correspond with the mathematical model's sets and parameters as described in Sect. 5.1.2.
To avoid multicollinearities, nl and td are set in relation to the number of buses. The probability of solved given bu, nl bu , tr, td bu , lt and as is stated in Equation (16) with (z) = 1 1+e −z as the sigmoid function: Table 7 shows the results of the regression. To prevent distortion of the analysis, the instances were checked for outliers according to the Bonferroni correction (error < 5% ) and no outliers could be detected.
Second, we use 3473 of 4050 instances which could be solved optimally and perform an OLS regression on the logarithmically transformed computing time time as the dependent variable as shown in Eq. (17). Complementary to logistic regression, OLS regression examines the effect size of network characteristics of solved instances on time required. We excluded 11 outliers according to the Bonferroni correction (error < 5% ) to prevent distortion of the analysis. Table 8 presents the results of the regression. Since the effect of transformers per 100 buses tr has no statistical significance, we refine the regression analysis (16) Prob solved = 1|bu, nl bu , tr, td bu , lt, ds and group the instances according to the distribution density of transformers. The regression models generated in this process can be found in Appendix D. 1 Appendix E provides detailed results for every instance.

Discussion
In this section, we discuss the results of the computational experiments. First, we interpret our results and discuss the statistical significance and quality of our sensitivity analysis. Second, we derive implications for practitioners from our findings.

Result interpretations
In this section, we address our research question about how scenario characteristics do affect the model's computation time. We discuss the results of the computation time analysis, the logistic regression and the OLS regression. In terms of computation time, Fig. 3 shows that as the instance size increases, the average computation time increases and fewer instances can be solved. At the same time, it demonstrates that other network characteristics are also decisive for computing a solution. Within the given computation time limit and hardware resources, all instances with 1000 buses and almost every (98.3%) instance with 2000 buses were solved optimally. With more buses, the number of optimally solved instances decreases to 77.4% and 60.7% for instances with 4000 and 5000 nodes, respectively. These results indicate that networks with more buses tend to be computationally more challenging. However, looking at instances with a computation time of less than 6 h, some instances with 5000 buses were solved in less time than instances with lower bus count. This means that other characteristics besides the number of buses influence the computational complexity of the model. We find that the logistic regression results are suitable to guide the decision of whether to consider an instance in a scenario analysis. For example, the decision maker can decide to include an instance that is expected to be solved close to optimality with an expected gap of 0.5% or less. On the other hand, they can decide to omit instances of an insufficiently expected solution quality. The regression results show statistically significant influences on the solvability of an instance for all considered independent variables. To evaluate the model's quality, we calculated McFadden's pseudo R 2 -value. Analogous to the R 2 -index in OLS regressions, McFadden's pseudo R 2 evaluates the predictive capacity of a logistic regression model [53]. A pseudo R 2 of 0.4125 underlines the model's quality, since values between 0.2 and 0.4 are considered as an indicator of an excellent model fit [34, p.54]. The coefficients of the regression model in Table 7 provide insights into the effects of network characteristics on the probability of obtaining an optimal result within 48 hours. When interpreting the coefficients, they need to be considered as a parameter of the sigmoid function (z) = 1 1+e −z . For example, for a network with bu = 5000 buses, nl bu = 1 line per bus, and an action scope of as = 0.3 , the probability of solving the network within our experimental setting is (35.2395 − 0.002 ⋅ 5000 − 17.811 ⋅ 1.0 − 19.5555 ⋅ 0.3) ≈ 0.83 (irrespective of the other network characteristics). If the number of lines per bus nl bu increases to 1.2 or the action space as raises to 0.5, the probability reduces to 0.12 and 0.09, respectively. If both variables increase simultaneously, obtaining a solution is hardly expected with a probability of 0.002. Decreasing the network size to 2000 or 1000 would increase the probability to 0.52 and 0.89, respectively, and allows consideration of high line density and action scope again. This behavior can be explained by the fact that with a smaller number of lines, there are fewer options for reliably routing the current flows through the network. Similarly, a smaller action scope allows several variables to be fixed beforehand. Conversely, as the number of lines or action scope increases, the size of the solution space increases rapidly, requiring increased computation time and computing resources.
Looking at the OLS regression, we find that the combination of the number of buses bu, lines per bus nl bu , and the action scope as have a considerable impact on the computation time. The OLS regression in Table 8 evaluates the influence of network characteristics on the expected computation time and finds statistically significant effects. The R 2 = 0.773 indicates, that the regression model explains 77.3% of the variation in the dependent variable [15, p.18]. Its coefficients show the expected logarithmic computation time depending on the variables. Therefore, the actual computation time is obtained by using the exponential function: ). An increase of the lines per bus by 20% to nl bu = 1.2 leads to an expected computation time of 26,743 s ( = 7.42 h, an increase by the factor of 21.8). Thus, the example illustrates that the choice and combination of individual parameters can greatly change the computation time. The increase in line and transformer types have only a weak effect on the computation time due to a small effect size and a low coefficient, which is why we do not go into detail on this network characteristic. We also neglect the density of the transformers and the load per bus due to their weak significance. The results confirm the logistic regression in the sense that the lines per bus, number of buses, number of lines and transformer types as well as the action space influence the optimization model's complexity. However, for the share of transformers in the network and the load per bus, the OLS regression confirms only weakly significant influences.
Since the value P > |t| is least significant for transformers, we divided the data into instances with low, medium, and high density of transformers in the network and created three additional OLS regression models, which are presented in detail in Appendix D. These regression models show that no statistical significance can be found for the action scope when focusing on low transformer density ( tr ≤ 0.9 ), while this feature significantly affects the computation time for instances with high density of transformers ( tr > 1.1 ). The opposite is found for the variable of the load per node, which is only weakly significant at low transformer densities and not significant otherwise. Compared to the OLS regression in Table 8, the coefficients reveal a higher effect size of the action scope (3.7135) and lines per bus (16.2187). The coefficients of the other variables do not change or show weaker effects. Overall, the results complement the logistic regression findings and provide further insight into the impact of individual network characteristics on computation time. Due to the high quality of the models and the statistical significance of their variables, we derive implications for practitioners in the following section.

Implications for practitioners
Our optimization model enables practitioners to calculate optimal distribution networks that can be deployed in practice. The results of our computational experiments provide important insights into the design of scenario analyses for distribution network planning. The implications are summarized in Table 9 and discussed in this section.
We find that decomposition should divide distribution networks into small contiguous areas (cf. Implication P1). When infrastructure planners use decomposition techniques, they are guided by geographical contexts, e.g., cities, city districts or neighborhoods. Although the regression results for the number of buses show the smallest coefficients (0.0017), due to the large effect sizes with up to 5000 buses, this network characteristic has a considerable influence on the computation time. Figure 3 additionally illustrates the increase in complexity and shows that with increasing network size, fewer instances were solved to optimality. We recommend infrastructure planners to decompose the networks as far as possible into city districts or neighborhoods. These small, but connected network areas have the advantage that the complexity is kept low without neglecting dependencies and synergy effects between neighboring distribution areas of different transformers.
We find that the action scope has a great influence on the solvability of the distribution network's model (cf. Implication P2). When infrastructure planners consider a scenario, they have to determine an action scope, which means that they need to decide which assets will be subject of expansion planning. Typically, ailing lines have to be replaced, while recent purchases are still being written off and therefore remain unchanged. For other assets, the infrastructure planners are free to decide whether they should be part of the action scope. According to the results, the action scope has the greatest influence on the solvability of a scenario. If the expected solvability is too improbable, we recommend reducing the scope of action if possible.
We find that the ratio of lines to buses is significant for the computation time and has the greatest influence on it (cf. Implication P3). For example, increasing the ratio of lines to buses by one percent would increase computation time by almost 17%. An increase in the ratio of 10% would result in an increase in computation time of almost 470%. Due to the strong effect size, the necessary computation time can quickly exceed the time budget. Since the lines per bus are given exogenously, the Table 9 Implications for practitioners # Implication for practitioners P1 When network planners use decomposition techniques, an orientation towards geographical contexts is usually used. The results show that the number of buses has a considerable influence on the solution time. We recommend network planners to decompose the networks as far as possible into city districts or neighborhoods. This ensures that interdependencies between neighboring supply areas are respected and that complexity is kept low at the same time P2 When network planners consider a scenario, they have to determine an action scope, which means that they need to decide which assets will be subject of expansion planning. Results show that the action scope disrupts the solvability of the model the most. Therefore, we recommend network planners to set the action scope cautiously P3 When network planners analyze a network, the ratio of lines to buses is highly relevant to the computing time budget. Since the lines per bus are given exogenously, the network planner cannot reduce the number of lines per bus. We recommend network planners to choose a suitable decomposition or an appropriate scope for action to counteract an unacceptably high computing time, when there is a high amount of lines per bus P4 When network planners assemble potential new investments, the results show a reciprocal relationship between the number of transformers and the expected solvability or computing time of the model. This means that additional transformers are conducive to solvability. Therefore, we recommend network planners to give extensive thought to possible locations for new transformers in order to benefit from the positive effects mentioned P5 When network planners consider which asset types for lines and transformers to use in the future, the results of the regression analyzes show that the effect sizes of the bus load and the number of alternatives for asset types have only a minor impact on solvability and solving time. An increase in the load per node and the asset types would only extend the expected computing time slightly. We recommend network planners to consider the possible options for asset types comprehensively in order to create a suitable basis for determining expansion solutions infrastructure planner cannot reduce the number of lines per bus. To avoid an unacceptably high computation time in case of a too high density of lines, infrastructure planners may choose a suitable decomposition or an appropriate scope for action (cf. Implications P1 and P2). We find that a higher number of alternatives for transformers had a reciprocal relationship between the number of transformers and the expected solvability or computation time of the model (cf. Implication P4). Contrary to the expectation that an increase in the number of alternatives for transformers and the accompanying increase in the solution space create a more complex model, it is shown here that additional transformers are conducive to solvability. The foremost cause for this rather contradictory result is likely the fact that additional transformers prevent expensive expansion of the lines, so the solver can exclude uneconomical solutions more quickly. We recommend infrastructure planners to give extensive thought to possible uses for new transformers to benefit from the positive effects mentioned.
We find that the number of asset types and the load per bus is of less importance to ensure solvability or fast computation time (cf. Implication P5). When infrastructure planners consider which asset types for lines and transformers to use in the future, the results of the regression analyzes show that the effect sizes of the bus load and the number of alternatives for asset types have only a minor impact on solvability and computation time. Due to the disproportionately low ratio, it is recommendable to not restrict the number of asset types and bus load in the scenario design to reduce complexity.

Conclusion
In this section, we summarize our work and provide an outlook on future research avenues.

Summary
This study addresses the challenge of planning future power distribution networks and designing scenario analyses in order to account for informational uncertainties subject to limited planning time and computing resources. We pursue the research question of how scenario characteristics affect the model's computation time to analyze correlations between parameter values of scenarios with their computation times. Based on requirements from literature and discussions with infrastructure planners, we propose a linear model for the distribution network optimization problem. Using an exact state-of-the-art solver, we conduct extensive computational experiments on distribution network instances based on empirical data, varying several parameters to adjust different network characteristics. We perform a simulative validation to confirm the solution's applicability in practice. We also examine the influence of network characteristics on the computation time in more detail with a sensitivity analysis. From the results, we derive implications for practitioners to guide them in how to design a scenario analysis with respect to limited time and computing resources. This enables infrastructure planners to consider the uncertain development of the future energy consumption and to determine a cost-effective expansion strategy for their distribution networks.

Research agenda
We are aware of the limitations of our work and advise avenues for future research. As a first research path, we recommend integrating electrical storage systems (ESS) into the model and thereby contribute to identify potentials for smart distribution network expansion. For example, the optimization of distribution networks including ESS is considered in the studies of Asensio et al. [4] or Mariut and Helerea [35]. Since our approach is tailored for German distribution system operators, who are not yet authorized to use flexible assets due to German legislation, we neglected ESS in this study. However, ESS could be integrated as controllable loads in our model with their charging and discharging as a new decision variable. Load losses and voltage behavior would have to be taken into account in the constraints.
A second research path may extend our approach to include planning periods, thus open up the possibility of designing transformation plans for distribution networks in temporal context. The integration of time intervals can be found in the work of Arias et al. [3] or Shen et al. [51]. To model the temporal intervals, the decision variables can be extended by a temporal dimension. Adding another dimension would increase the number of variables and thus the complexity of the model; on the other hand, adding another dimension could result in a better handling of the uncertain future: Considering planning periods, a rolling horizon approach could be applied so that the optimization problem is solved for a specified forecast period and a solution is implemented. A new optimization for the subsequent planning period can then be performed at a later point in time with adjusted demands based on better knowledge about the change in load demand.
A third research path may deal with the computing resources to be used. From an experimental point of view, our study was limited to the effects of network characteristics. Supplementary, further research can determine how varying computational resources, e.g., memory and threads, affect the solver's performance. This approach would extend our computational experiments by varying not only the network characteristics, but also the resources available. An evaluation of solution times as well as solution quality in dependence on the computational resources could provide further information about which requirements should be placed on systems in order not to acquire expensive resources unnecessarily.
A fourth research path could shed light on the nonlinear current behavior. While we formulated a linear model and simulatively validated our results for it, other researchers like Wang et al. [56] or Aldarajee et al. [2] formulate nonlinear models. The nonlinear consideration accurately maps the current behavior, but worsens the performance of the solution calculation. More in-depth experiments can address the question to what extent the optimum of the linear model formulation deviates from the optimum of the nonlinear one and whether consequences for the decision recommendation arise from this.
Finally, as a fifth research path, our results should be further analyzed with real distribution networks. For example, in the work of Bagheri et al. [5] and Paiva et al. [40], a real distribution network is used for evaluation. In a case study, our approach could be used to analyze how an infrastructure decomposition can be used to enforce large distribution networks in a cost-minimizing way to meet the future needs. These insights can guide infrastructure planners on how to decompose their networks to enable coherent planning of different sub-networks that, when combined, represent the entire service area.