1 Introduction

Simulations based on standard models are often used as part of the engineering design process to examine or test the innovative concepts, novel algorithms or control schemes proposed by researchers. In the electric power industry, studies based on the IEEE benchmark systems [1] are too numerous to list. For example, the effectiveness of the proposed algorithm for intentional islanding was tested on the PG&E 69-node system in our previous work [2]. It would be more convincing to evaluate the performance of such an algorithm on a large number of test cases. However, the number of standard models is very small (fewer than one hundred), even including other open source real-world power system models [3]. There is a lack of power grid models to simulate a range of phenomena in power systems [4–7]. In order to provide a statistically significant number of examples to explore the variability of phenomena under study, a fundamental requirement is to generate a large number of test power systems featuring the same topology characteristics as realistic or standard test systems.

Connectivity is the most important property of power system topologies because the primary function is to supply energy to customers. A power system is not always in a normal operation state. When faults occur, it is possible that the transmission systems are disconnected and the distribution systems are broken into several regions by the fault elements. From the point of view of graph theory, the topology of a power system under a fault condition is a graph with multiple connected components. The reconfiguration of distribution feeders will create a large number of combinations of switch states in a post-fault system, which means a great variety of distribution network structures and corresponding graphs. However, there is no appropriate random graph which is capable of fully exhibiting such characteristics of distribution networks.

Sparsity is another main feature of power grid topologies. The average degree (i.e. the average number of lines connected to each node) of a realistic power grid ranges from 2 to 5, normally between 2 and 3, and does not scale with the grid size [8]. For example, the average degrees of the power grids in northern China and western US are 2.23 [9] and 2.67 [10], respectively, while the average degree of medium/low voltage grids in the Netherlands is 2.18 [11]. Generally, it is difficult to generate a connected random graph with a very low average degree, because a sparser graph has a lower probability of being connected. Yet the distribution of node degrees [12] of generated random graphs must fit real-world power grid topologies, and this must be achieved without consuming excessive time in the process of generating random graphs.

It is necessary to research how to generate in a reasonable time frame a large number of sparse random graphs with prescribed numbers of connected components, and with the appropriate distribution of node degree, to substitute for real-world power grid topologies.

A series of synthetic power system models were designed to assist the study of power system blackouts by Carreras et al. [4–6]. The purpose was to avoid incomplete and inaccurate historical blackout data affecting the credibility of the analysis results. Synthetic power system models with tree structure, quadrilateral structure and hexagonal structure were implemented to simulate large power outages. Parashar et al. [7] designed a ring model of the power grid to test the proposed continuum model of electromechanical dynamics in large-scale power systems. They attempted to use a simple ring structure to reflect the essential characteristics of electromechanical wave propagation in the power system. However, real-world power systems are not uniform, and may contain a large number of irregular structures, so the ideal models used in [4–7], including tree, ring or other regular networks, are all incapable of fully reflecting the topological properties of realistic power grids. The results obtained using such regular network models may be biased or incomplete. Therefore, we need to establish a more rigorous model of power grid topology.

The Erdős–Rényi (ER) random graph [13] is often used as a test case to substitute for a real network topology. In the ER model, each pair of nodes is connected by a line independently with probability p. Thus, in a random graph with n nodes, the expected number of lines is p[n(n − 1)/2], and the expected average degree is p(n − 1). We can adjust the input value p to obtain the average degree that we need. In [14, 15], an improved model called ZER was developed based on the ER model. It can shorten the computation time by skipping some specific lines. Although the model is more advanced than the ER model, both of them have the disadvantage that they give no assurance about the connectivity of the generated random graphs. When n → ∞, if p is larger than a critical value \(p_{c}\) (\(p_{c} \propto \ln n/n\)), then the generated random graphs are connected in most cases. Conversely, when p is small, the graphs are hardly ever connected. As a result, the generation process needs to be repeated many times until a connected random graph is generated, which is inefficient.
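The connectivity issue described above can be illustrated with a minimal Python sketch. It is not taken from [13–15]; the function names (`er_graph`, `is_connected`, `connected_fraction`) and the chosen values of n and <k> are assumptions made for illustration only. The sketch sets p = <k>/(n − 1) and estimates how often an ER sample turns out to be connected.

```python
import math
import random


def er_graph(n, p, rng):
    """Return an adjacency list of an ER random graph with n nodes."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each of the n(n - 1)/2 candidate lines exists with probability p.
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def is_connected(adj):
    """Depth-first search from node 0; connected if every node is reached."""
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def connected_fraction(n, mean_degree, trials, seed=0):
    """Fraction of ER samples that are connected for a target average degree."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)  # expected average degree is p(n - 1)
    hits = sum(is_connected(er_graph(n, p, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = 100
    print("ln(n)/n =", math.log(n) / n)
    for k in (2.5, 12.0):  # a sparse, grid-like value and a dense value
        print("<k> =", k, "connected fraction =", connected_fraction(n, k, 20))
```

For average degrees in the range typical of power grids, most samples in such an experiment are expected to be disconnected, which is why a generate-and-retry loop becomes expensive.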

The work by Watts and Strogatz [10] first statistically modeled the power grid as a small-world graph, which is generated by starting with a regular ring lattice, then rewiring some local lines with a probability p to arbitrary nodes chosen uniformly at random from the entire graph. A small-world model of a power transmission system was established in [16] to evaluate a disturbance in it. Although the small-world characteristic of the power grid is highlighted, the average degree of the Watts-Strogatz small-world model can only be an even number, which is not in accord with the node degree heterogeneity of the real-world power grid.

Wang et al. [17] proposed a random topology generation algorithm for studying the scale of communication needs and the performance of the combined electric power control and communication networks. The algorithm consists of three steps: (a) randomly place n nodes in a fixed area, using a normal or Poisson distribution; (b) given the distance constraint d, connect each node to all its neighboring nodes within the distance d; (c) check whether the generated topology is connected, and if not, repeat steps (a) and (b). The average degree of the random topology generated by this algorithm depends on the distance d and the distribution of the nodes, and is not controllable. Based on the above model, an improved hierarchical random topology model was proposed in [18], as an analytical tool to examine the efficiency of any networked control architecture in smart grid applications. The generation algorithm consists of two steps: (a) generate a number of sub-networks, where in each sub-network every node is connected with k nodes within the distance d (where k is a positive integer greater than one); then determine which lines need to be rewired by adjusting three parameters α, β, and \(p_{rw}\); (b) connect all the sub-networks generated in step (a). The average degree of the above random topology is k, which can only be an integer. The topological characteristics depend on the four parameters d, α, β, and \(p_{rw}\).

The above works demonstrate the importance of generating random topologies for power system studies. However, the existing random topology models of power grids have several shortcomings: (a) the generated topologies are not necessarily connected; (b) the average degree of the generated topologies cannot be controlled precisely; (c) the generated topologies differ from real-world power grid topologies in the node degree distribution; (d) the number of connected components contained in the generated topology is equal to one. Therefore, in this paper we focus on the above problems, and propose a dual-stage constructed random graph (DSCRG) generation algorithm to model power grid topologies, which is superior to the existing algorithms. The proposed algorithm can guarantee the connectivity of the generated random graph through forming a spanning tree; the number of lines in the generated random graph is completely determined by the required average degree; the node degree distribution fits well with real-world power grid topologies; and the number of connected components in the generated random graph is controllable. The effectiveness of the proposed algorithm is validated by experimental studies.

The remainder of this paper is structured as follows. The next section introduces the definitions and notation used in this paper. Section 3 describes the DSCRG algorithm. In Section 4, we conduct an extensive empirical comparison of the performance of the proposed algorithm and several existing algorithms. We conclude in Section 5.

2 Definitions and notation

  1. 1)

    Graph: A graph G is a representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by mathematical abstractions called nodes (also called vertices or points), and the links that connect some pairs of nodes are called lines (also called arcs or edges).

  2. 2)

    Undirected graph: An undirected graph G (V, E) is a graph in which lines have no orientation; V is the set of nodes in G; E is the set of lines.

  3. 3)

    Connected graph: In an undirected graph, an unordered pair of nodes {x, y} is called connected if a path leads from x to y. Otherwise the unordered pair is called disconnected. A connected graph is an undirected graph in which every unordered pair of nodes in the graph is connected. Otherwise, it is called a disconnected graph.

  4. 4)

    Connected component: A connected component is a connected sub-graph of graph G. A connected graph has only one connected component, that is, itself, while an unconnected graph has multiple connected components.

  5. 5)

    Node degree: The degree of a node is the total number of lines connected to the node.

  6. 6)

    Average degree: The average degree (denoted by <k>) of a graph G relates the number of lines in set E (denoted by e) to the number of nodes in set V (denoted by n). Because each line is incident to two nodes and counts in the degree of both nodes, the average degree and the number of lines of a graph are related by:

    $$\langle k\rangle = 2e/n \Leftrightarrow \;e = n\langle k\rangle /2$$
    (1)
  7. 7)

    Node degree distribution: The node degree distribution (degree distribution for short) of a graph can be described by a distribution function P(k), which represents the probability that the degree of an arbitrary node is exactly k. A completely random graph has a distinctive characteristic that the degree distribution is similar to a Poisson distribution with a mean value <k> [19].

  8. 8)

    Kullback–Leibler divergence: In probability theory and information theory, the Kullback–Leibler divergence [20] (also information divergence, information gain, relative entropy, KLIC, or KL divergence) is a measure of the difference between two probability distributions P and Q. It is not symmetric in P and Q. In applications, P typically represents the “true” distribution of data, observations, or a precisely calculated theoretical distribution, while Q typically represents a theory, model, description, or approximation of P. Specifically, the Kullback–Leibler divergence of Q from P, denoted \(D_{KL}(P||Q)\), is a measure of the information gained when one revises one’s beliefs from the prior probability distribution Q to the posterior probability distribution P. For discrete probability distributions P and Q, the Kullback–Leibler divergence of Q from P is defined to be:

    $$D_{KL} (P||Q) = \sum\limits_{x \in X} {P(x)\log \frac{P(x)}{Q(x)}}$$
    (2)
  9. 9)

    Complete graph: A complete graph is a graph in which each pair of graph nodes is connected by a line. The complete graph with n graph nodes is denoted by K n . The average degree of K n denoted by <k Kn > is:

    $$\langle k_{Kn} \rangle = {{2 \cdot \frac{n(n - 1)}{2}} \mathord{\left/ {\vphantom {{2 \cdot \frac{n(n - 1)}{2}} n}} \right. \kern-0pt} n} = n - 1$$
    (3)
  10. 10)

    Forest: A forest (denoted by F) is an acyclic graph (i.e., a graph without any graph cycles). Forests therefore consist only of (possibly disconnected) trees, hence the name “forest”. The line number of a forest with n nodes and c connected components is n − c. The average degree of a forest denoted by <k F > is:

    $$\langle k_{F} \rangle = 2(n - c)/n$$
    (4)
  11. 11)

    Tree: A tree (denoted by T) is a connected acyclic graph. Thus, each component of a forest is a tree, and any tree is a connected forest. The line number of a tree with n nodes is n − 1. The average degree of a tree, denoted by \(\langle k_{T}\rangle\), is:

    $$\langle k_{T} \rangle = 2(n - 1)/n$$
    (5)
  12. 12)

    Spanning forest: A spanning forest is a forest that contains every node of G such that two nodes are in the same tree of the forest when there is a path in G between these two nodes.

  13. 13)

    Spanning tree: A spanning tree of a connected graph G is a maximal set of lines of G that contains no cycle, or equivalently a minimal set of lines that connects all nodes.

  14. 14)

    Node-branch incidence matrix: If a power network has n nodes and m branches, its node-branch incidence matrix A(n × m), can be formulated as follows:

    $$\varvec{A} = \left\{ \begin{aligned} & A_{i,b} = 1;\;A_{j,b} = - 1 && {\text{if}}\;b^{th}\;{\text{branch}}\;{\text{is}}\;{\text{between}}\;{\text{node}}\;i\;{\text{and}}\;j \\ & A_{k,b} = 0 && {\text{if}}\;k \ne i,j \\ \end{aligned} \right.$$
    (6)
  15. 15)

    Node admittance matrix: Also network admittance matrix, is defined as follows:

    $$\varvec{Y} = \varvec{AY}_{\text{b}} \varvec{A}^{T}$$
    (7)

    where \(\varvec{Y}_{\text{b}}\) represents the branch admittance matrix, which can be derived from the branch impedance matrix, and A represents the node-branch incidence matrix defined in (6).
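Definitions (6) and (7) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: nodes are numbered from 0, the branch admittance matrix is taken to be diagonal (no mutual coupling), shunt elements are ignored, and the function names `incidence_matrix` and `admittance_matrix` are hypothetical.

```python
def incidence_matrix(n, branches):
    """Node-branch incidence matrix A (n x m) following (6).

    branches is a list of (i, j) node pairs, with 0-based node numbers
    (an assumption of this sketch).
    """
    m = len(branches)
    A = [[0] * m for _ in range(n)]
    for b, (i, j) in enumerate(branches):
        A[i][b] = 1    # branch b leaves node i
        A[j][b] = -1   # branch b enters node j
    return A


def admittance_matrix(n, branches, impedances):
    """Node admittance matrix Y = A * Yb * A^T following (7).

    Yb is assumed diagonal, with entries 1/z for each branch impedance z.
    """
    A = incidence_matrix(n, branches)
    yb = [1 / z for z in impedances]
    m = len(branches)
    return [[sum(A[r][b] * yb[b] * A[c][b] for b in range(m))
             for c in range(n)]
            for r in range(n)]


if __name__ == "__main__":
    # Three nodes in a chain with purely reactive branch impedances.
    Y = admittance_matrix(3, [(0, 1), (1, 2)], [0.1j, 0.2j])
    for row in Y:
        print(row)
```

Because shunt elements are omitted in this sketch, every row of the resulting Y sums to zero.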

3 DSCRG algorithm

In this section, we firstly propose an algorithm to generate a connected random graph (that contains only one connected component), then generalize the algorithm to generate random graphs with prescribed numbers (one or multiple) of connected components. We call the algorithm “dual-stage constructed random graph” (DSCRG for short).

As an artificial network, the main task of a power system is to deliver electrical energy to customers, which means connecting the power sources and loads through power lines. Hence the connectivity is a key feature of power system topologies. Reviewing the history of power systems, we find the power system topologies in the past can only meet the requirement of connectivity. From the point of view of graph theory, the earliest power system topology is a tree-like structure. We can easily define the power source and the loads as the root node and the leaf nodes, respectively. There is only one path between the root node and each leaf node. As power system technology was developed, and customers required more reliable electricity, the topology of power systems began to change. Transmission systems are designed as reticular structures. A number of tie lines or tie switches are installed in distribution systems, which are implemented as backup lines or switches to transfer loads when faults occur. Through the above analysis, we design an algorithm to obtain connected random topologies for simulating real-world power grids. Firstly, we stochastically generate a spanning tree that contains all the nodes in the power grid, and then add a certain number of lines to meet the requirement of the average degree of the power grid topology.

However, the power system is not always in its normal operation state. When faults occur, it is possible for transmission systems to be disconnected and the distribution systems will be broken into several regions by the fault elements. From the point of view of graph theory, the topology of power system in a fault condition is a graph with multiple connected components. Fault conditions of power systems need more attention by system operators and researchers, but random topologies with only one connected component are not adequate for simulating a fault condition. We need to develop an approach for generating random graphs with prescribed numbers (one or multiple) of connected components.

3.1 Generation of random graph with one connected component

3.1.1 Computations

Consider a connected random graph G, for which the node number is n and the average degree is <k>, where n > 1. The value of <k> lies between the average degree of a spanning tree with n nodes, \(\langle k_{T}\rangle\) (derived from (5)), and the average degree of a complete graph with n nodes, \(\langle k_{Kn}\rangle\) (derived from (3)), which means

$$2(n - 1)/n \le \langle k\rangle \le n - 1$$
(8)

Thus, a connected graph with n nodes can be regarded as a spanning tree plus a certain number of expanded lines. This number, denoted by S, equals the difference between e and the line number of the spanning tree. By (1), e can be calculated, and the line number of the spanning tree is n − 1. Thus, the number of lines to be expanded S is

$$S = \langle k\rangle \cdot n/2 - (n - 1)$$
(9)

In this way, the process of generating a connected random graph can be divided into two stages: randomly generating a spanning tree with n nodes and expanding the spanning tree by randomly adding a certain number of lines. In the second stage, the number of optional lines denoted by R is equal to the line number of K n minus the line number of the spanning tree, which is

$$R = n(n - 1)/2 - (n - 1)$$
(10)

Thus, in the second stage, we need to select S lines randomly from R lines.
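As a worked illustration of (8), (9) and (10), the following Python sketch computes S and R for a given n and <k>. The function name `expansion_counts` is hypothetical, and rounding <k>·n/2 to the nearest integer is an assumption of this sketch for the case where the requested average degree does not correspond to a whole number of lines.

```python
def expansion_counts(n, mean_degree):
    """Return (S, R): lines to add to the spanning tree and candidate lines."""
    lower = 2 * (n - 1) / n   # average degree of a spanning tree, from (5)
    upper = n - 1             # average degree of the complete graph, from (3)
    if not lower <= mean_degree <= upper:
        raise ValueError("average degree violates inequality (8)")
    e = round(mean_degree * n / 2)       # total number of lines, from (1)
    S = e - (n - 1)                      # equation (9)
    R = n * (n - 1) // 2 - (n - 1)       # equation (10)
    return S, R


if __name__ == "__main__":
    # 10 nodes with <k> = 2.4 gives e = 12 lines: 9 tree lines plus S = 3.
    print(expansion_counts(10, 2.4))
```

For n = 10 and <k> = 2.4, the graph has 12 lines in total, so S = 3 lines are chosen from the R = 36 node pairs not used by the spanning tree.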

3.1.2 Procedure

We propose an algorithm to generate a random graph with one connected component. The procedure is explained below.

3.1.2.1 First stage: Randomly generating a spanning tree with n nodes
  1. 1)

    A random graph is represented by a node adjacency matrix A(n × n). The elements of matrix A are 0 or 1, where “0” means that no line exists between two nodes while “1” is the contrary. Assign A to a zero matrix at the beginning.

  2. 2)

    The serial numbers of the n nodes are {1, 2,…, n}. Pick one node from the n nodes at random, and put it into set X. X contains the nodes that belong to the spanning tree.

  3. 3)

    Put the remaining (n − 1) nodes into set Y. Y contains the nodes that do not yet belong to the spanning tree.

  4. 4)

    Determine whether Y is empty. If so, proceed to step 1 of the second stage; otherwise proceed to step 5.

  5. 5)

    Randomly select one node from each of set X and Y, denoted by x and y, respectively.

  6. 6)

    Connect x and y with a line, i.e., set A(x, y) = 1 and A(y, x) = 1.

  7. 7)

    Remove node y from set Y, and put it into set X, then return to step 4.
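The first stage can be sketched in Python as follows. This is a minimal illustration rather than the original MATLAB implementation: nodes are numbered from 0, the adjacency matrix is a list of lists, and the function name `random_spanning_tree` and the optional `rng` argument are assumptions introduced here.

```python
import random


def random_spanning_tree(n, rng=None):
    """Grow a random spanning tree on n nodes; return its adjacency matrix."""
    rng = rng or random.Random()
    A = [[0] * n for _ in range(n)]      # step 1: zero adjacency matrix
    nodes = list(range(n))
    rng.shuffle(nodes)
    X = [nodes[0]]                       # step 2: nodes already in the tree
    Y = nodes[1:]                        # step 3: nodes not yet in the tree
    while Y:                             # step 4: stop when Y is empty
        x = rng.choice(X)                # step 5: random node from X ...
        y = Y.pop(rng.randrange(len(Y)))  # ... and from Y (removed, step 7)
        A[x][y] = A[y][x] = 1            # step 6: connect x and y
        X.append(y)                      # step 7: y now belongs to the tree
    return A


if __name__ == "__main__":
    tree = random_spanning_tree(8, random.Random(1))
    print(sum(map(sum, tree)) // 2, "lines")  # a tree on 8 nodes has 7 lines
```

Each pass of the loop attaches one new node to the tree, so the loop runs n − 1 times and the result is connected and acyclic by construction.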

3.1.2.2 Second stage: Randomly adding the correct number of lines to the spanning tree
  1. 1)

    Compute S by (9).

  2. 2)

    Compute R by (10).

  3. 3)

    Utilize i and j to represent two different nodes, and let i = 1, j = 2.

  4. 4)

    Determine whether there is no line between i and j, i.e., A(i, j) = 0. If so, proceed to step 5, otherwise proceed to step 10.

  5. 5)

    Generate a uniform random number θ that is between 0 and 1.

  6. 6)

    Determine whether θ is less than the ratio of S and R, and if so, then proceed to step 7, otherwise proceed to step 8.

  7. 7)

    Connect i and j with a line, i.e., set A(i, j) = 1 and A(j, i) = 1. Let S = S − 1.

  8. 8)

    Let R = R − 1.

  9. 9)

    Determine whether S is zero. If so, then exit the loop, and matrix A is the resulting generated random graph, otherwise proceed to step 10.

  10. 10)

    Let i = i + 1, j = j + 1.

  11. 11)

    Determine whether i equals n − 1. If so, then exit the loop, and matrix A is the resulting generated random graph, otherwise proceed to step 5.
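The second stage amounts to selecting S of the R candidate lines, accepting each candidate with probability S/R and updating both counters as the scan proceeds. A minimal Python sketch is given below. It assumes that the scan visits every unordered node pair (i, j) with i < j, which is one reading of the index update in steps 10 and 11; it also counts R directly from the matrix, which coincides with (10) when the input is a spanning tree. The function name `add_random_lines` is hypothetical.

```python
import random


def add_random_lines(A, S, rng=None):
    """Add exactly S new lines to adjacency matrix A (modified in place)."""
    rng = rng or random.Random()
    n = len(A)
    # Number of candidate lines, i.e. node pairs that are not yet connected.
    R = sum(1 for i in range(n) for j in range(i + 1, n) if A[i][j] == 0)
    if S > R:
        raise ValueError("more lines requested than free node pairs")
    for i in range(n):
        for j in range(i + 1, n):
            if S == 0:                   # step 9: all lines have been placed
                return A
            if A[i][j] == 1:             # step 4: pair already connected
                continue
            if rng.random() < S / R:     # steps 5 and 6: accept with prob. S/R
                A[i][j] = A[j][i] = 1    # step 7: add the line
                S -= 1
            R -= 1                       # step 8: one candidate fewer
    return A


if __name__ == "__main__":
    n = 6
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):               # a path used as the spanning tree
        A[i][i + 1] = A[i + 1][i] = 1
    add_random_lines(A, 3, random.Random(1))
    print(sum(map(sum, A)) // 2, "lines")
```

Because the acceptance probability reaches 1 whenever the remaining candidates equal the remaining lines to place, a single pass always adds exactly S lines.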

3.2 Generation of random graph with a prescribed number of connected components

Based on the algorithm presented in section 3.1, we propose a more generalized algorithm to generate a random graph with a prescribed number (one or multiple) of connected components. To obtain a random graph with n nodes and c connected components, we generate the connected components one by one, and then combine them into a whole graph. The first task is to determine the node number and average degree of each connected component.

3.2.1 Computations

  1. 1)

    The value of node number for each connected component

    The sum of the node numbers of all the connected components is n. Let n i be the node number of connected component i, then we have (11):

    $$\left\{ \begin{aligned} n = \sum\limits_{i = 1}^{c} {n_{i} } \hfill \\ n_{i} \ge 2\; \hfill \\ \end{aligned} \right.\;\;\;\;\;\;\;\;i = 1,2, \ldots ,c$$
    (11)

    We randomly generate n i for all the connected components with the constraint of (11).

  2. 2)

    The value of average degree for each connected component

    We attempt to make the average degree of each connected component be the same as that of the whole graph, but the node number of each connected component is generated randomly, which may cause some component to have a small n i , so that it is impossible for its average degree to reach the value of <k>, i.e.

    $$\left\{ \begin{aligned} \exists i \in \{ 1,\;2, \ldots ,\;c\} \hfill \\ {\text{s}} . {\text{t}} .\;\;\;n_{i} - 1 < \langle k\rangle \hfill \\ \end{aligned} \right.$$
    (12)

    Meanwhile some other components may have larger n j . This means we have to increase the number of lines in the larger components to make up the shortage of lines in the smaller components, so that the average degree of the entire graph can reach the given value of <k>. If the given <k> is out of an appropriate range, then the average degree of the generated random graph may not be the required value. For example, given that n = 6, c = 2, and <k> = 7/3, then there needs to be 7 lines in the random graph. First, we randomly generate the node number of each connected component, and assume that the result is n 1 = 3, and n 2 = 3. At this moment, there can be 6 lines at most in the two components, thus the average degree can be 2 at most, which doesn’t reach the given value 7/3. In order to avoid that situation, we discuss the appropriate range of the given value <k>. Hence, we present a Proposition to restrict the range of <k>.
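The random choice of component sizes under constraint (11), and the counting argument in the example above, can be sketched in Python. The text does not specify how the sizes are drawn, so the scheme used here (give every component two nodes, then hand out the remaining nodes one at a time uniformly at random) is purely an assumption, as are the function names `random_component_sizes` and `max_lines`.

```python
import random


def random_component_sizes(n, c, rng=None):
    """Split n nodes into c components with at least 2 nodes each, as in (11)."""
    rng = rng or random.Random()
    if n < 2 * c:
        raise ValueError("need n >= 2c so that every component has 2 nodes")
    sizes = [2] * c
    for _ in range(n - 2 * c):
        sizes[rng.randrange(c)] += 1     # assumed distribution, see lead-in
    return sorted(sizes)


def max_lines(sizes):
    """Largest possible line number: every component is a complete sub-graph."""
    return sum(s * (s - 1) // 2 for s in sizes)


if __name__ == "__main__":
    print(random_component_sizes(20, 4, random.Random(0)))
    # Example from the text: n = 6, c = 2, sizes 3 and 3 allow at most 6 lines,
    # fewer than the 7 lines required by <k> = 7/3.
    print(max_lines([3, 3]))
```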

Proposition

The average degree of a random graph with n nodes and c connected components has to satisfy the following inequality:

$$2(n - c)/n \le \langle k\rangle \le n/c - 1$$
(13)

  1. a)

    Prove the right side of (13)

When every connected component is a complete sub-graph, the line number of the entire graph is maximized, denoted by e Kn :

$$e_{Kn} = \sum\limits_{i = 1}^{c} {n_{i} (n_{i} - 1)/2}$$
(14)

Due to (11), e Kn is minimized when the node number of each connected component is equal, i.e.

$$n_{i} = n/c,\;i = 1,2, \ldots ,c$$
(15)

Substitute (15) into (14), and we obtain (16) as follows:

$$\hbox{min} \;(e_{Kn} ) = c \cdot \frac{n}{c}(\frac{n}{c} - 1)/2 = n(n - c)/2c$$
(16)

According to (8), the upper bound of <k> is the minimum value of <k Kn >, which is obtained by substituting (16) into (1). So the upper bound of <k> is:

$$\langle k\rangle \le \hbox{min} (\langle k_{Kn} \rangle ) = \frac{{2\hbox{min} (\it{e_{Kn}} )}}{n} = \frac{2n(n - c)}{2cn} = \frac{n}{c} - 1$$
(17)

This proves the right side of (13).

  1. b)

    Prove the left side of (13)

The spanning forest of a random graph has the minimum number of lines. In other words, when all of the connected components are trees, the line number of the random graph is minimized. The line number is denoted by \(e_{F}\), where

$$e_{F} = \sum\limits_{i = 1}^{c} {\left( {n_{i} - 1} \right)}$$
(18)

Substitute (11) into (18), and then we obtain (19):

$$e_{F} = n - c$$
(19)

According to (8), the lower bound of <k> is <k F >, which is obtained by substituting (19) into (1). So the lower bound of <k> is

$$\langle k\rangle \ge \langle k_{F} \rangle = 2e_{F} /n = 2(n - c)/n$$
(20)

This proves the left side of (13), and the Proposition is proved through (a) and (b).
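The range given by (13) is easy to check numerically. The following Python sketch uses exact rational arithmetic so that boundary cases are not affected by rounding; the function names `mean_degree_range` and `is_feasible` are hypothetical.

```python
from fractions import Fraction


def mean_degree_range(n, c):
    """Lower and upper bounds on <k> from inequality (13)."""
    lower = Fraction(2 * (n - c), n)   # every component is a tree, see (20)
    upper = Fraction(n, c) - 1         # equal-sized complete components, see (17)
    return lower, upper


def is_feasible(n, c, mean_degree):
    """True if the requested average degree satisfies (13)."""
    lower, upper = mean_degree_range(n, c)
    return lower <= mean_degree <= upper


if __name__ == "__main__":
    print(mean_degree_range(6, 2))
    # The example with n = 6, c = 2 and <k> = 7/3 lies outside the range.
    print(is_feasible(6, 2, Fraction(7, 3)))
```

With c = 1 the bounds reduce to those of inequality (8).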

Inequality (13) shows the value range of average degree of a random graph with n nodes and c connected components. Based on (13), we calculate the average degrees of all the components as follows:

  1. i)

    ∀ i ∈ {1, 2,…, c} s.t. \(n_{i}\) ≥ <k> + 1, let

    $$\langle k_{i} \rangle = \langle k\rangle$$
    (21)
  2. ii)

    ∃ j ∈ {1, 2,…, c} s.t. \(n_{j}\) < <k> + 1, let the j th connected component be a complete sub-graph. Then the average degree of the j th connected component is

    $$\langle k_{j} \rangle = n_{j} - 1$$
    (22)
  3. iii)

    The number of lines of all of the remaining connected components except for the j th component is denoted by \(e_{\text{R}}\), and equals the line number of the entire random graph minus the line number of the j th connected component, i.e.

    $$e_{\text{R}} = \langle k\rangle \cdot n/2 - n_{j} (n_{j} - 1)/2$$
    (23)
  4. iv)

    The average degree of all of the remaining components except for the j th component is denoted by <k R>,

    $$\langle k_{\text{R}} \rangle = \frac{{2e_{\text{R}} }}{{n - n_{j} }} = \frac{{\langle k\rangle \cdot n - n_{j}^{2} + n_{j} }}{{n - n_{j} }}$$
    (24)
  5. v)

    Continue to determine whether the node numbers of the remaining components except for component j are less than <k R> + 1, until the average degrees of all of the components are obtained.
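One way to read steps i) to v) is as an iterative allocation: components that are too small to reach the current target become complete sub-graphs, their lines and nodes are removed from the budget, and the target for the remaining components is recomputed as in (24). The Python sketch below follows that reading. It is an interpretation rather than the authors' implementation; the function name `allocate_mean_degrees` is hypothetical, and the sketch assumes that <k> already satisfies (13).

```python
from fractions import Fraction


def allocate_mean_degrees(sizes, mean_degree):
    """Return a list of (n_i, <k_i>) pairs for component sizes n_i."""
    sizes = sorted(sizes)
    n_rem = sum(sizes)                          # nodes not yet allocated
    e_rem = Fraction(mean_degree) * n_rem / 2   # lines not yet allocated, (1)
    k_rem = Fraction(mean_degree)               # current target average degree
    degrees = [None] * len(sizes)
    start = 0
    while start < len(sizes):
        stop = start
        # Components with n_i < <k> + 1 cannot reach the target, see (12).
        while stop < len(sizes) and sizes[stop] < k_rem + 1:
            stop += 1
        if stop == start:
            break                               # all remaining ones can reach it
        for m in range(start, stop):
            degrees[m] = Fraction(sizes[m] - 1)             # complete, (22)
            e_rem -= Fraction(sizes[m] * (sizes[m] - 1), 2)  # as in (23)
            n_rem -= sizes[m]
        start = stop
        if n_rem > 0:
            k_rem = 2 * e_rem / n_rem           # as in (24)
    for m in range(start, len(sizes)):
        degrees[m] = k_rem                      # as in (21)
    return list(zip(sizes, degrees))


if __name__ == "__main__":
    # n = 10, c = 2, <k> = 3: the 2-node component becomes complete and the
    # 8-node component takes the remaining 14 of the 15 lines.
    print(allocate_mean_degrees([2, 8], 3))
```

Since the sizes are sorted in ascending order and the recomputed target never decreases, the components that must be complete always form a prefix of the list, which keeps the scan simple.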

3.2.2 Procedure

Consider that the number of connected components is c (c is a positive integer); the node number is n (n ≥ 2c); and the average degree is <k>, whose value range is restricted by (13). The procedure of the generalized algorithm is:

  1. 1)

    Randomly generate the node numbers of all of the connected components, and sort them into ascending order, denoted by n 1, n 2, …, n c .

  2. 2)

    Let j (0) = 0, l = 1, <k (l)> = <k>, and l is the number of iterations.

  3. 3)

    Let i = j (l − 1) + 1, j (l) = 0.

  4. 4)

    Determine whether n i is less than <k (l)> + 1. If so, proceed to step 5, otherwise proceed to step 8.

  5. 5)

    Let j (l) = i.

  6. 6)

    Let i = i + 1.

  7. 7)

    Determine whether i is more than c. If so, proceed to step 8, otherwise return to step 4.

  8. 8)

    Determine whether j (l) is more than 0. If so, proceed to step 9, otherwise proceed to step 13.

  9. 9)

    Compute the number of remaining lines in the l th iteration:

    $$e_{{}}^{(l)} = e_{{}}^{(l - 1)} - \sum\limits_{m = 1}^{{j^{\;(l)} }} {n_{m} (n_{m} - 1)/2}$$
    (25)
  10. 10)

    Compute the number of remaining nodes in the l th iteration:

    $$n_{{}}^{(l)} = n_{{}}^{(l - 1)} - \sum\limits_{m = 1}^{{j^{\;(l)} }} {n_{m} }$$
    (26)
  11. 11)

    Compute the average degree of the remaining part of the graph in the l th iteration:

    $$\langle k^{(l)} \rangle = 2e_{{}}^{(l)} /n^{(l)}$$
    (27)
  12. 12)

    Let l = l + 1, then return to step 3.

  13. 13)

    Generate complete sub-graphs for the first to (j (l − 1))th connected component.

  14. 14)

    Generate random sub-graphs for the (j (l − 1) + 1)th to c th connected component with <k (l − 1)> using the algorithm in section 3.1.2. Then, exit the loop, and the random graph with c components is obtained by putting all of the connected components together.

3.3 Electrical parameters

After generating the random power grid topology, electrical parameters need to be determined. In this paper, we discuss two issues: bus type assignment and line impedance assignment.

3.3.1 Bus type assignment

In power transmission networks, there are three types of buses: generation (G), load (L) and connection (C). Different power systems have varying bus type ratios. The generation buses may comprise 10–40% of the buses in the grid; the load buses 40–60%; and the connection buses 10–20% [21]. Assigning bus types at random according to the bus type ratios alone may lead to unrealistic electrical characteristics. In this paper, we apply the method presented in [22] to assign the bus types in the randomly generated power grid topologies.

3.3.2 Line impedance assignment

As indicated in (7), the node admittance matrix Y can be expressed as a function of the node-branch incidence matrix and the branch impedance matrix. If line impedances are determined, then Y can be derived. The study in [18] showed that the line impedance in the power grid is heavy-tailed and can be captured quite accurately by a clipped double Pareto lognormal (DPLN) distribution.

After the assignment of bus types, we can define six types of lines: GG, GL, GC, LL, LC and CC. For example, GC denotes a line connecting a generation bus and a connection bus, and similarly for the other pairs. From the bus types and the node adjacency matrix, we can determine the type of each line. The distribution of line impedance is not the same for each type, so a single DPLN distribution is not adequate for all six types of lines. We will comprehensively discuss this issue in our future work.
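Determining the line types from the bus types and the node adjacency matrix can be sketched in a few lines of Python. The data layout (a list of one-letter bus types indexed by 0-based node number) and the function names `line_type` and `classify_lines` are assumptions of this illustration.

```python
BUS_ORDER = "GLC"  # fixes the spelling of mixed pairs: GL, GC, LC


def line_type(bus_a, bus_b):
    """Return one of GG, GL, GC, LL, LC, CC for the two end-bus types."""
    first, second = sorted((bus_a, bus_b), key=BUS_ORDER.index)
    return first + second


def classify_lines(A, bus_types):
    """Map every line (i, j), i < j, of adjacency matrix A to its line type."""
    n = len(A)
    return {(i, j): line_type(bus_types[i], bus_types[j])
            for i in range(n) for j in range(i + 1, n) if A[i][j]}


if __name__ == "__main__":
    A = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
    print(classify_lines(A, ["G", "L", "C"]))
```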

4 Experimental studies

The experiments were run on a Core 2 Duo 3.3 GHz CPU machine with 6 GB RAM under Windows 7. All algorithms were implemented in MATLAB R2012b. We measure the execution time as user time, averaging the results over 100 runs.

4.1 Instructions of existing algorithms

Two existing random topology generation algorithms are introduced in this section, consisting of small-world [10] and RT-nested-Smallworld [18].

4.1.1 Small-world networks

Watts–Strogatz small-world models (“small-world” for short) are often used for modeling power grid topologies. The small-world network is generated starting from a regular ring lattice, then using a small probability p, by rewiring some local links to an arbitrary node chosen uniformly at random in the entire network (to make it a small-world network).
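The construction described above can be sketched in Python. This is an illustrative reading of the Watts–Strogatz procedure and not the MATLAB code used in the experiments; the function names, the `max_tries` limit, and the decision to rewire only the far end of each lattice link while avoiding self-loops and duplicate links are assumptions of this sketch.

```python
import random


def watts_strogatz(n, k, p, rng=None):
    """Ring lattice on n nodes with even degree k, rewired with probability p."""
    rng = rng or random.Random()
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    adj = [set() for _ in range(n)]
    for i in range(n):                        # regular ring lattice
        for s in range(1, k // 2 + 1):
            j = (i + s) % n
            adj[i].add(j)
            adj[j].add(i)
    for s in range(1, k // 2 + 1):            # visit each lattice link once
        for i in range(n):
            j = (i + s) % n
            if j in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)  # uniform over the whole network
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def is_connected(adj):
    """Depth-first search from node 0; connected if every node is reached."""
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def connected_watts_strogatz(n, k, p, rng=None, max_tries=1000):
    """Regenerate until the small-world network is connected."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        adj = watts_strogatz(n, k, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected sample found within max_tries")
```

Rewiring moves links but never adds or removes them, so the average degree stays at the even value k of the initial lattice.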

  1. 1)

    Connectivity check

    Small-world networks are not always connected. Their connectivity depends on the average degree and the rewiring probability p. Therefore, we need to add a loop to check connectivity each time a small-world network is generated.

  2. 2)

    The rewiring probability

    To make the small-world networks match the characteristics of realistic power grid topologies, we need to choose an appropriate rewiring probability p. In Fig. 1, we present the degree distributions of connected small-world networks with n = 3000, <k> = 2, 4, and p = 0.1, 0.5, 0.9. The horizontal and vertical axes represent the node degree and the probability mass function (PMF), respectively.

    Fig. 1 Degree distributions of small-world networks with different rewiring probability p

It can be seen that, as the value of p increases, the degree distributions of small-world networks broaden and approach a Poisson distribution. In other words, the rewiring probability p determines the degree of randomness of small-world networks: a larger p means a more randomized network.
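The generate-and-check loop and the degree PMF discussed in this subsection can be sketched as follows. This is a minimal standard-library illustration, not the MATLAB code used in the experiments; the function names, the seed, the retry limit `max_tries` and the small example size (n = 50 rather than 3000) are all assumptions made to keep the example fast.

```python
import random
from collections import Counter, deque

def watts_strogatz(n, k, p, rng):
    """Ring lattice with k/2 neighbours per side, then rewiring with probability p.

    Assumes k is even and n > k. Returns a list of neighbour sets.
    """
    adj = [set() for _ in range(n)]
    half = k // 2
    for i in range(n):
        for step in range(1, half + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for step in range(1, half + 1):
        for i in range(n):
            j = (i + step) % n
            if rng.random() < p and len(adj[i]) < n - 1:
                # Pick a new endpoint uniformly at random, avoiding
                # self-loops and duplicate lines.
                new = rng.randrange(n)
                while new == i or new in adj[i]:
                    new = rng.randrange(n)
                adj[i].remove(j)
                adj[j].remove(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj

def is_connected(adj):
    """Breadth-first search from node 0."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def connected_small_world(n, k, p, seed=0, max_tries=1000):
    """Regenerate the network until it is connected (the loop described above)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        adj = watts_strogatz(n, k, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected network found within max_tries")

def degree_pmf(adj):
    """Empirical probability mass function of the node degrees."""
    counts = Counter(len(neigh) for neigh in adj)
    return {d: c / len(adj) for d, c in sorted(counts.items())}

adj = connected_small_world(n=50, k=4, p=0.1)
pmf = degree_pmf(adj)
```

Because rewiring only moves lines, the number of lines stays at nk/2, so the average degree of the result is always the even integer k; only the shape of the PMF changes with p.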

4.1.2 RT-nested-Smallworld

There is no specific instruction about how to choose the parameters of RT-nested-Smallworld in the literature, so we provide some guidance.

  1) Connectivity check

    We first review the procedure of the algorithm. The RT-nested-Smallworld network is formed by a hierarchical process: first, connected sub-networks are formed, with their size limited by the connectivity requirement; then the sub-networks are connected through lattice connections. The experiments in [18] have shown that, to make sure the generated sub-networks are connected, the sub-network size should be no greater than 30 for <k> = 2 to 3, and no greater than 300 for <k> = 4 to 5. Therefore, the first step of this model is to select the size of the sub-networks according to the connectivity limitation. In this paper, we call the sub-network “RT-Smallworld”.

  2) Parameter selection

    Four parameters must be determined when an RT-Smallworld network is formed: the neighborhood range d, the Markov transition probabilities (α, β), and the rewiring probability p rw. In the experiment, we investigate four groups of RT-Smallworld sub-networks: {n = 30, <k> = 2}; {n = 30, <k> = 3}; {n = 300, <k> = 4}; {n = 300, <k> = 5}. The parameters are varied as follows: d = {<k>, 2<k>}; α, β, and p rw each range from 0.1 to 1 in steps of 0.1.

Firstly, we determine the value of d by varying it alone. With α, β, and p rw held fixed, we record the ratio of the execution time with d = 2<k> to that with d = <k>. Each group contains 1000 such time ratios, one for each of the 10 × 10 × 10 combinations of α, β, and p rw. The four groups of time ratios are shown in Fig. 2. The red, pink and light blue dots mean that the ratio is over 10, between 1 and 10, and less than 1, respectively. The ratios indicate that the algorithm with d = <k> is generally faster than the one with d = 2<k>. Hence, the value of d is set to <k>.

Fig. 2 Execution time ratios between d = 2<k> and d = <k> for generating RT-Smallworld networks

Secondly, the values of α, β, and p rw are to be determined. The Kullback–Leibler divergence was defined in section 2, and it can serve as an index of the difference between two degree distributions. Based on D KL, we propose an offline method to determine the parameters α, β, and p rw for RT-Smallworld.

In (2), P(x) and Q(x) represent the degree distribution of the realistic power grid topology to be simulated and the generated RT-Smallworld network, respectively. The purpose is to select the RT-Smallworld network with the most similar degree distribution to the realistic power grid topology.

For this purpose, we obtain the degree distributions of RT-Smallworld networks under different α, β, and p rw in advance, and use them as an offline database. When the degree distribution of the targeted power grid topology P(x) is given, we compare the values of D KL(P||Q) and choose the RT-Smallworld network with the smallest D KL(P||Q). Then the corresponding values of α, β, and p rw are obtained.
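The offline selection described above can be sketched as a lookup that minimises D KL(P||Q). The sketch below assumes the standard definition D KL(P||Q) = Σ P(x) log(P(x)/Q(x)), represents each degree distribution as a dictionary from degree to probability, and floors zero entries of Q at a small constant to avoid log(0); the flooring, the function names and the two-entry database are illustrative assumptions, not details taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P||Q) for PMFs given as dicts {degree: probability}."""
    total = 0.0
    for x, px in p.items():
        if px > 0:
            # Assumption: floor Q(x) at eps so that degrees absent from Q
            # give a large but finite penalty instead of a division by zero.
            qx = max(q.get(x, 0.0), eps)
            total += px * math.log(px / qx)
    return total

def select_parameters(target_pmf, database):
    """Return the (alpha, beta, p_rw) key whose PMF is closest to the target."""
    return min(database, key=lambda params: kl_divergence(target_pmf, database[params]))

# Hypothetical data: a target degree distribution and a tiny offline database.
target = {1: 0.2, 2: 0.5, 3: 0.3}
database = {
    (0.1, 0.1, 0.1): {1: 0.1, 2: 0.6, 3: 0.3},
    (0.5, 0.5, 0.5): {1: 0.2, 2: 0.5, 3: 0.3},
}
best = select_parameters(target, database)
```

In the setting of this section the database would hold one entry per combination of α, β, and p rw, so a lookup costs at most 1000 divergence evaluations per target topology.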

4.2 Typical power system cases

Four typical power system test cases are introduced in this section, which are the IEEE-300 test system [1], the GB-2224 power system [3], the PL-2383 power system [3] and the EU-2869 power system [23]. These power grid topologies are summarised in Table 1 and are all connected.

Table 1 Four typical power system cases

4.3 Comparison of simulation results

It should be noted that neither small-world nor RT-nested-Smallworld is capable of generating random topologies with multiple connected components. In section 4.3.1 we compare the performance of the algorithms for generating random topologies with only one connected component, and in section 4.3.2 we present the results of generating random topologies with prescribed numbers of connected components by using DSCRG.

4.3.1 Results of generating random topologies with one connected component

The four typical power system cases introduced in section 4.2 are simulated by the Small-world (SW), the RT-nested-Smallworld based on the D KL offline database (PreRTNSW) and the DSCRG algorithms. The comparison results of the three algorithms are presented in Table 2. <k real> represents the realistic average degree of each power grid topology, while <k ob> is the average degree of the obtained random topology. We introduce an index ε to denote the relative error between <k real> and <k ob>, where ε = |<k real> − <k ob>|. The smaller ε is, the better the corresponding algorithm performs. The numbers in parentheses are the parameters of the corresponding algorithm. For SW, the parameters are (<k>, p); for PreRTNSW, (<k>, α, β, p rw); for DSCRG, (<k>, c).

Table 2 Simulation results of SW, PreRTNSW and DSCRG for four typical power systems

There are four columns under each algorithm name: the execution time, the average degree of the generated random topology, the relative error ε, and the D KL between the degree distributions of the realistic power grid topology and the generated random topology. Four different values of n are used: the first is equal to the number of nodes of the realistic power grid, and the others are 1000, 3000 and 5000. The best performances are listed in bold font.

From Table 2, we observe that DSCRG is the fastest algorithm for generating all the random topologies. The execution time of DSCRG is only related to n; for SW and PreRTNSW, it also depends on <k>, and more time is required to check connectivity when <k> is smaller.

The obtained average degree <k ob> of DSCRG is identical to the realistic average degree <k real>, because the number of lines is precisely controlled in the second stage of DSCRG. By contrast, the values of <k ob> for SW are even numbers, and those for PreRTNSW are close to integers, because the process of generating the regular ring lattice limits the value of <k>. The values of <k real> are between 2 and 3, so the <k> of SW can only be 2, and the <k> of PreRTNSW needs to be rounded to the nearest integer.
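As a worked illustration of this argument, recall the standard relation between the average degree, the number of lines m and the number of nodes n. The value 2.67 below is the western US figure quoted in the introduction; it is used only as an example and is not taken from Table 2, and the PreRTNSW value is approximate because its average degree is only close to an integer.

```latex
\langle k \rangle = \frac{2m}{n}
\quad\Longrightarrow\quad
m = \frac{n \langle k \rangle}{2}

% Example with \langle k_{\mathrm{real}} \rangle = 2.67:
\varepsilon_{\mathrm{SW}} = |2.67 - 2| = 0.67, \qquad
\varepsilon_{\mathrm{PreRTNSW}} \approx |2.67 - 3| = 0.33, \qquad
\varepsilon_{\mathrm{DSCRG}} = 0
\ \text{(when } m \text{ equals the realistic number of lines)}
```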

The relative error ε shown in Table 2 is the absolute difference between <k real> and <k ob>. For DSCRG, ε is zero, which is clearly less than for SW and PreRTNSW. This shows that the average degree of the random topology generated by DSCRG can be precisely controlled.

The values of D KL of DSCRG are the smallest for all the generated random topologies, which means the degree distributions of DSCRG are the most similar to the realistic ones. The values of D KL of PreRTNSW are the second smallest, because of the offline method we proposed for selecting parameters. However, the degree distribution is somewhat affected by the average degree, so the D KL of PreRTNSW is greater than that of DSCRG.

We provide bar charts of the degree distributions in Fig. 3, from which it can be seen that SW networks differ the most from realistic power grid topologies. As shown in Fig. 3(a), the node proportion of DSCRG is closer to that of IEEE-300 than the node proportion of PreRTNSW when the node degree is 1, while the opposite is true when the node degree is 3.

Fig. 3 Comparisons of degree distributions

Fig. 3(b) shows that the performance of DSCRG is better than that of PreRTNSW when the node degree is from 1 to 3. By contrast, when the node degree is over 3, the performance of PreRTNSW is slightly better than that of DSCRG. Some nodes with degree greater than 14 are generated by PreRTNSW, but there is no such node in GB-2224.

Fig. 3(c) shows that the degree distribution of DSCRG is closer to that of PL-2383 than the degree distribution of PreRTNSW is, except when the node degree is 2. Finally, Fig. 3(d) shows that the degree distributions of DSCRG and PreRTNSW are nearly the same when the node degree is below 12. Nodes with degree 14 and 15 are not generated by the former, while nodes with degree greater than 15 (which do not exist in EU-2869) are generated by the latter.

From the above analyses, we conclude that DSCRG is superior to SW and PreRTNSW with respect to execution time, average degree and degree distribution for simulating realistic power grid topologies.

4.3.2 Results of generating random graphs with prescribed numbers of connected components

An experimental study was conducted to generate random topologies with prescribed numbers of connected components by applying DSCRG only. With n = {1000, 2000, 3000, 4000, 5000}, <k> = {2, 3, 4, 5, 6, 7, 8, 9, 10} and c = {1, 3, 5, 7, 9}, 225 (n × <k> × c = 5 × 9 × 5) different kinds of random graphs were generated. The execution times are shown in Fig. 4. All the random graphs can be generated within 2 seconds. The execution time increases linearly with the graph size. The average degrees are shown in Fig. 5. We observe that the obtained average degrees are the same as the given value of <k> for any c and n.

Fig. 4 Execution times of generated random graphs with prescribed numbers of connected components

Fig. 5 Average degrees of generated random graphs with prescribed numbers of connected components
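To illustrate why a two-stage construction yields exactly the prescribed average degree and number of connected components, the following is a minimal sketch of the general idea only: build one random tree per component, then add lines inside components until the target line count is reached. It is not the authors' DSCRG implementation. The near-equal split of nodes into components, the random-attachment tree, the rejection sampling in the second stage and all function names are assumptions made for this example.

```python
import random

def two_stage_graph(n, avg_degree, c, rng):
    """Return a set of lines (u, v), u < v, with exactly c connected components."""
    if c < 1 or c > n:
        raise ValueError("c must be between 1 and n")
    m = round(n * avg_degree / 2)              # target number of lines
    nodes = list(range(n))
    rng.shuffle(nodes)
    groups = [nodes[i::c] for i in range(c)]   # assumption: near-equal sizes
    capacity = sum(len(g) * (len(g) - 1) // 2 for g in groups)
    if m < n - c or m > capacity:
        raise ValueError("infeasible combination of n, avg_degree and c")

    edges = set()
    # Stage 1: one random tree per group (each node attaches to a random
    # earlier node of its group), giving n - c lines and c components.
    for g in groups:
        for idx in range(1, len(g)):
            u, v = g[idx], g[rng.randrange(idx)]
            edges.add((min(u, v), max(u, v)))

    # Stage 2: add lines inside existing components until m lines exist.
    group_of = {v: gi for gi, g in enumerate(groups) for v in g}
    while len(edges) < m:
        u = rng.randrange(n)
        v = rng.choice(groups[group_of[u]])
        if u != v:
            edges.add((min(u, v), max(u, v)))  # duplicates are ignored by the set
    return edges

def count_components(n, edges):
    """Count connected components with a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

edges = two_stage_graph(n=1000, avg_degree=3, c=5, rng=random.Random(0))
obtained_avg_degree = 2 * len(edges) / 1000
```

Because the first stage already fixes the components and the second stage only stops at the target line count, no connectivity check or regeneration loop is needed in this sketch.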

5 Conclusion

Numerous studies on power systems call for the generation of random graphs. In order to substitute generated random graphs for realistic power grid topologies in simulations, we have studied several existing random graph generation algorithms and pointed out their shortcomings. To effectively model power grid topologies, we have presented a novel algorithm named “dual-stage constructed random graph” (DSCRG). In the first stage, a spanning tree with a given number of nodes is randomly constructed; in the second stage, the spanning tree is expanded to the required random graph by randomly adding a certain number of lines to it. To simulate power grid topologies under fault conditions, the algorithm has been generalized to generate random graphs with prescribed numbers (one or multiple) of connected components. Empirical comparisons have shown that:

  1) When the number of connected components is prescribed to be one, the proposed DSCRG can guarantee the connectivity of the generated random graph. Unlike the existing algorithms, there is no need to check connectivity repeatedly.

  2) The number of lines in the random graph generated by DSCRG can be precisely controlled to ensure that there is no difference between the obtained and the realistic average degrees.

  3) Similar node degree distributions to real-world power grid topologies have been obtained by DSCRG.

  4) Random graphs with a prescribed number of connected components can be generated by DSCRG within a shorter time frame than existing algorithms.

In general, the proposed DSCRG is a preferable algorithm that is capable of quickly generating a large number of sparse random graphs with prescribed numbers of connected components and the appropriate degree distribution to substitute for real-world power grid topologies.