1 Introduction

Belief propagation (BP) is an algorithm known from statistical physics [1, 2], computer science, artificial intelligence and information science (for a review see, for example, [3]). It is also known as ‘message passing’. In some cases, BP provides an exact rearrangement of the original computational objective, while in general it computes a mean-field approximation [1,2,3,4] to it. The BP method has two established main advantages over traditional algorithms (such as least-squares and quasi-Newton methods [5, 6]). It is fast even for large networks, as the computation time scales linearly with the system size, and it is robust against large differences in the input parameters. This robustness avoids the convergence issues associated with the traditional approach [7]. These properties make it particularly suitable for dealing with large datasets and for frequent, large-scale network analyses (if possible online), which are increasingly required for supply networks.

In electric power grids, BP is in principle applicable to optimization problems and state estimation (in Sect. 3 we give a concrete implementation for state estimation). Optimization problems refer to cost-efficient and low-risk performance under diverse sources of uncertainty. State estimation is the procedure of using measurement data to infer an estimate of state variables such as power flows and phase angles as accurately as possible. State estimation is important to convert system measurements into reliable information on the true network state and to ensure the stability of operation. In [8,9,10], BP-based algorithms were found to outperform traditional (least-squares) approaches to state estimation. In particular, the speed and robustness of the BP-based algorithms enable state estimation in real time [8, 11, 12] and statistical analyses of large networks [9, 10], even if data are partially missing [10]. Thus, state estimation combined with BP is an important step toward improving supervisory control and planning decisions in running power grids whenever a fast online estimate of the true state of the grid is required.

A second area of application of BP is gas networks, which we consider in more detail in the appendix. Natural gas is still one of the important energy resources worldwide. A reliable and efficient operation of gas-pipeline networks becomes increasingly important due to the liberalization of the European gas market. In [5], the impact of injecting alternative gas supplies at different locations is studied to facilitate decisions on the allowable amount and composition of alternative gas sources such as hydrogen and biogas.

BP has been applied to further supply networks [13,14,15,16], other than power or gas networks. The authors of [13] and [14] used BP to identify faults and contamination sources in water networks. In [15], a BP algorithm was shown to effectively optimize public transport for urban planning and telecommunication networks, while [16] applied BP to a generic nonlinear resource allocation problem.

For a given network with an associated graph, the BP algorithm makes use of an additional auxiliary graph called the factor graph. The novel contribution of this paper is the assignment of an appropriate factor graph to supply networks. If the factor graph itself has a tree structure, BP is known to be exact [17]. In general, BP implements an approximation corresponding to Bethe mean-field theory [2,3,4]. In practice, BP is accurate if there are only a few short loops in the factor graph.

A naively assigned factor graph directly reflects the basic variables on the original network and their mutual dependencies. In the supply networks that we consider in this paper these interdependencies result from Kirchhoff’s laws, corresponding to the conservation of flow at the vertices of the network (first law) and the constraint that the drop in variables like the voltage or pressure sums up to zero around elementary loops (second law). In general, these laws impose nonlinear constraints on the flows, and these constraints are responsible for additional loops in a naively assigned factor graph, even if the original network is a tree. It is these loops that we want to avoid by a suitable assignment of factor graphs. The loops are numerous as the constraints between the variables are omnipresent in the network. The mathematical structure of these constraints is found in many supply networks, so that our proposed new algorithm applies to all of them. Although our method succeeds in avoiding the many additional small loops in the factor graphs which result from constraints as mentioned before, it does not address the possible challenges which result from loops in the original supply networks. In the concrete test cases that we consider, the resulting algorithm still shows an excellent performance without addressing these loops. In cases where these loops do impede the performance of BP, our method should be combined with additional approaches. Several such improvements have been proposed and implemented, see, for example, [8, 18,19,20,21].

Such a combination is possible because our method, which relies on a type of clustering (to be defined below), changes only the factor graph rather than the BP algorithm itself. In contrast to other clustering methods, we propose a systematic way of clustering factor graphs in terms of clusters which by construction depend only on a few variables. This way the price to pay for clustering remains moderate. The resulting method then improves the speed and convergence of the BP algorithm.

The paper is organized as follows. In Sect. 2, we describe the clustering procedure, which assigns a factor graph that differs from a straightforward assignment but, by construction, prevents additional loops in the factor graph. In Sect. 3, we illustrate the application of the procedure with state estimation for the artificial IEEE-300 electrical grid. We discuss the accuracy and performance of the algorithm. In Sect. 4, we give an outlook on other applications of our algorithm to power grids. The conclusions are summarized in Sect. 5. In Appendix A, we point to further possible supply networks which share the essential structure of the equations; in particular, we work out the case of natural gas-pipeline networks with an example from the steady-state analysis of two realistic GasLib benchmark networks. Here our method is what makes BP applicable at all: with our method, BP converges exceedingly fast, while BP with a naive factor graph assignment does not converge at all.

2 Assigning factor graphs to supply networks

We consider supply networks consisting of vertices and links and assume the following generic description:

  • To each link in the supply network, a flow of some quantity is assigned which traverses the link; flows from vertex i to vertex j and vice-versa are denoted as \(f_{ij}\) and \(f_{ji}\), respectively.

  • Associated with each vertex i is a variable \(v_i\); examples are the voltage in electric circuits, or the pressure in fluid networks.

  • The flow \(f_{ij}\) through a link (ij) is determined by the variables \(v_i\) and \(v_j\) at the vertices at either end of the link (e.g., Ohm's law in electric circuits). The flow is thus a function \(f_{ij}(v_i,v_j)\) of the vertex variables \(v_i\) and \(v_j\).

  • Flows are conserved at each vertex: The sums of incoming and outgoing flows are equal. Denoting external injections into a vertex i as \(g_i\), we thus have \(g_i = \sum _{j \in N(i)} f_{ij}(v_i, v_j)\), where N(i) is the set of vertices adjacent to vertex i. The injection can thus be written as a function \(g_i(v_i, v_j: j \in N(i))\) of the vertex variables \(\{v_i, v_j: j \in N(i)\}\).
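The generic description above can be made concrete with a minimal sketch. It assumes, purely for illustration, a three-vertex chain with a linear (Ohm-like) flow law; the names `resistance`, `flow`, `neighbors` and `injection` and all numerical values are hypothetical and not part of the formulation above.

```python
# Hypothetical three-vertex chain 1-2-3 with an assumed linear flow law.
links = [(1, 2), (2, 3)]
resistance = {(1, 2): 2.0, (2, 3): 4.0}   # assumed link parameters
v = {1: 10.0, 2: 6.0, 3: 2.0}             # assumed vertex variables v_i


def flow(i, j):
    """Flow f_ij(v_i, v_j) from vertex i to vertex j (assumed linear law)."""
    r = resistance.get((i, j), resistance.get((j, i)))
    return (v[i] - v[j]) / r


def neighbors(i):
    """N(i): the vertices adjacent to vertex i."""
    return [b if a == i else a for (a, b) in links if i in (a, b)]


def injection(i):
    """g_i = sum over j in N(i) of f_ij(v_i, v_j)."""
    return sum(flow(i, j) for j in neighbors(i))


g = {i: injection(i) for i in v}
print(g)  # {1: 2.0, 2: -1.0, 3: -1.0}
```

With this antisymmetric flow law the injections sum to zero, as expected when everything that enters the network also leaves it.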

The class of problems that we study on these supply networks is defined by the criterion that the state is determined in terms of a probability distribution \(P(\varvec{v})\), depending on the vertex variables \(\varvec{v}\) of the following form:

$$\begin{aligned} P(\varvec{v})=\prod _i H_i(v_i, v_j:j\in N(i))\times \prod _{(ij)} H_{(ij)}(v_i,v_j),\nonumber \\ \end{aligned}$$
(1)

where the products run over all vertices i and all links (ij) of the network, respectively, and N(i) is the set of vertices j that are connected to i via a link. The function \(H_i\) depends on the variables \(\{v_i, v_j:j\in N(i)\}\), which can in particular incorporate a mutual dependence on \(v_i\) and on the injection \(g_i(v_i, v_j: j \in N(i))\). The \(H_{ij}\) are functions depending on pairs of variables \(v_i,v_j\), which may incorporate a dependence on the flows \(f_{ij}(v_i,v_j)\) and \(f_{ji}(v_j,v_i)\). In Sect. 3 and Appendix A we will study the state estimation of power grids and a network analysis of gas pipelines, where we will explicitly show that these problems reduce to the marginalization of a probability distribution of the form of Eq. 1.

Applications of Eq. 1. Although Eq. 1 may look rather specific, it is in fact a general formulation that comprises the following cases:

  • Uncertainty in the state variables: The probability distribution may express the uncertainty in the values of the vertex variables \(v_i\), the flows \(f_{ij}\) and/or the injections \(g_i\). Such uncertainty may arise from fluctuating injections (for example, in renewable energy generation), or from uncertainties in the measurements of these quantities (as in the Bayesian state estimation problem studied in Sect. 3). (If the value of a quantity is certain, this can be incorporated as a delta function in the distribution. If nothing about a quantity is known, the associated factor is assigned a very high variance.)

  • Optimization problems: For a given cost function \(C(\varvec{v})\), one considers the distribution

    $$\begin{aligned} P_T(\varvec{v})\propto \exp (-C(\varvec{v})/T) \,. \end{aligned}$$
    (2)

    In the limit \(T \rightarrow 0\) the probability distribution peaks at the minimal costs \(C(\varvec{v})\). The exponential turns sums into products, such that the distribution of Eq. 1 incorporates a minimization of a sum of costs on the vertex variables, the flows and the injections. Constraints can be implemented by setting the cost function equal to infinity whenever the constraints are violated. It is possible to explicitly take this limit in the BP Eqs. 4–6, thereby converting them to a form that is more convenient for optimization (called the min-sum algorithm). We refer to [3, 22] for details.

  • Constraint satisfaction problems (i.e., finding configurations \(\varvec{v}\) that satisfy a number of constraints): These can be included by studying products of Dirac delta functions, where each delta function incorporates a constraint. Alternatively, a cost function is optimized that assigns a penalty to each violated constraint. We will use this option to analyze the steady state of gas-pipeline networks in Appendix A.

  • Optimization under uncertainty: If costs need to be minimized in an inherently fluctuating environment [23], decisions on the production, for example, may lead to stochastic rather than deterministic costs. For power grids such a situation has been investigated in [24, 25], where fluctuations are due to uncertain power injections by renewable resources.
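For the optimization case in the list above, the concentration of \(P_T\) of Eq. 2 on the minimal cost as \(T \rightarrow 0\) can be checked numerically. The following is a minimal sketch on a hypothetical discrete cost table; `configs`, `cost` and `boltzmann` are illustrative names and values only.

```python
import math

# Hypothetical cost function C(v) on a small discrete set of configurations.
configs = [0, 1, 2, 3]
cost = {0: 3.0, 1: 1.0, 2: 1.5, 3: 4.0}


def boltzmann(T):
    """Normalized P_T(v) proportional to exp(-C(v)/T), as in Eq. 2."""
    c_min = min(cost.values())  # shifting the cost keeps the exponentials well scaled
    weights = {c: math.exp(-(cost[c] - c_min) / T) for c in configs}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


for T in (10.0, 1.0, 0.01):
    p = boltzmann(T)
    print(T, {c: round(p[c], 4) for c in configs})
```

At high temperature the distribution is nearly flat, while at the lowest temperature essentially all weight sits on the configuration with minimal cost (here configuration 1).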

Evaluating the distribution of Eq. 1 to gain insight into the probability of individual variables amounts to a marginalization of this joint probability distribution, that is, to summing or integrating over a subset of the variables. This is where BP enters: it performs these sums or integrals in a very efficient way.

2.1 The choice of factor graphs

BP can be used to efficiently calculate marginals of probability distributions such as those of Eq. 1. For this purpose, BP makes use of a graphical representation of the probability distribution in terms of a factor graph. The factor graph is a bipartite graph made of two types of nodes: variable nodes, represented by circles, and factor nodes, represented by squares. The assignment of variables and factors is not unique and is a matter of convenience. Assigning a factor graph to any probability distribution \(P(\varvec{v})\) proceeds in the following steps:

  1. Partition the vector \(\varvec{v}\) into new ‘variables’ \(\{x_{I}\}\), where each \(x_I\) is a disjoint subset of \(\varvec{v}\) (i.e., possibly containing multiple \(v_i\)).

  2. For each \(x_I\), draw a circle. This defines a variable node of the factor graph. As a special case, the correspondence between variables on the original grid and the variable nodes on factor graphs may be one to one in a straightforward assignment (which we refer to as the naive assignment).

  3. Define factors \(W_a(\varvec{x}_a)\) such that \(P(\varvec{v}) = \prod _a W_a(\varvec{x}_a)\), where \(\varvec{x}_a\) are (in general overlapping) sets of some of the new variables \(\{x_I\}\).

  4. For each factor \(W_a\), draw a square. This is a factor node of a factor graph.

  5. Each factor \(W_a(\varvec{x}_a)\) depends on \(\varvec{x}_a\), which contains multiple variable nodes \(x_I\). Draw an edge between the factor node \(W_a(\varvec{x}_a)\) and each variable node \(x_I \in \varvec{x}_a\) on which it depends.

Ambiguities in the assignment of the factor graph result from steps (1) and (3).

Example of a straightforward choice of a factor graph. Let us give a concrete example, considering a simple building block of three vertices \(\{1,2,3\}\) of a larger network, connected by links (1, 2) and (2, 3) (shown in Fig. 1a). The distribution (Eq. 1) we are interested in is then given by:

$$\begin{aligned} P(v_1,v_2,v_3)= & {} \big [H_1(v_1,v_2) \cdot H_2(v_1,v_2,v_3) \cdot H_3(v_2,v_3) \big ] \nonumber \\&\times \big [H_{(1,2)}(v_1,v_2) \cdot H_{(2,3)}(v_2,v_3) \big ] \,. \end{aligned}$$
(3)

Following the steps (1)–(5), the most straightforward way of assigning a factor graph to this distribution is to assign a variable node to each \(v_i\), and a factor node to each \(H_i\) and each \(H_{(ij)}\). The factor graph corresponding to the distribution (3) is shown in Fig. 1b.

In particular, it is important to note that this straightforward assignment has a number of loops (four in this case, one of them indicated in blue), although the original network (Fig. 1a) has none. The factors \(H_i\) are responsible for these loops: Without them, the topology of the factor graph would directly reflect the topology of the original supply network (for each vertex it contains a variable node \(v_i\), and for each link it contains a factor node \(H_{ij}\) connecting variable nodes \(v_i\) and \(v_j\)). Moreover, if \(H_i\) depended only on \(v_i\) (rather than on all \(\{v_i, v_j: j \in N(i)\}\)), the factor nodes corresponding to \(H_i\) would be leaf nodes and create no loops. The clustering applied to this factor graph (to be discussed later) refers to the variable nodes. In Fig. 1b, we indicate two clusters with dotted lines, overlapping in variable node \(v_2\). In Fig. 2, we show how to deal with the overlaps.

Fig. 1

a Simple building block of a supply network, which gives the probability distribution of Eq. 3. b The factor graph assigned in a straightforward way (Sect. 2.1) to Eq. 3, representing the simple network shown in (a). Note that the four elementary loops (one of them indicated in blue) would be absent if each \(H_i\) depended only on \(v_i\) rather than on all \(\{v_i, v_j: j \in N(i)\}\). The dotted lines indicate two clusters of vertices, which the approach described in Sect. 2.3 will use to eliminate the loops (after dealing with the overlap at vertex \(v_2\))

Number of extra loops in a naive assignment. The total number of extra loops can be determined as follows. In general, the number of independent loops in a graph, here the factor graph, is given by (\( \sharp \;\text {connected components} + \sharp \;\text {edges} - \sharp \;\text {vertices} \)) [26]. The naively assigned factor graph contains factor nodes \(H_i\) and \(H_{ij}\) and variable nodes \(v_i\) as well as edges connecting these nodes. As mentioned before, if we consider only the variable nodes, the factor nodes \(H_{ij}\) and the edges connecting them, the resulting structure directly reflects the topology of the original supply network; in particular, it has the same number of loops. The number of extra loops in the factor graph (as compared to the supply network) can thus be calculated as (\(\sharp \;\text {additional edges due to factor nodes}\;\{H_i\} - \sharp \;\text {additional nodes which are of type}\;H_i\)). On the factor graph, each \(H_i\) must be connected by an edge to all variable nodes in \(\{v_i, v_j: j \in N(i) \}\): The number of extra edges is thus given by \(\sum _i |\{v_i, v_j: j \in N(i)\}| = \sum _i(1 + |N(i)|)\). The number of extra nodes is simply \(\sum _i \,1\) (one factor node for each \(H_i\)). In total, the number of extra loops in the factor graph is thus given by \(\sum _i (1 + |N(i)|) - \sum _i 1 = \sum _i |N(i)| = 2\cdot \sharp \; \text{ links }\). In the example of Fig. 1b this means that there are \(2 \cdot 2 =4\) extra loops (where the number of links can be found from Fig. 1a). The dependence of \(H_i\) on further variables from N(i), due to ubiquitous constraints on all of the network variables, thus leads to a proliferation of loops in the factor graph.
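The counting argument can be verified with a short sketch that builds the naive factor graph for a given network and evaluates the loop formula quoted above. The function names `cyclomatic_number` and `naive_factor_graph` and the node labels are hypothetical helpers introduced only for this illustration.

```python
def cyclomatic_number(nodes, edges):
    """Number of independent loops: components + edges - nodes."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    components = len(nodes)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components + len(edges) - len(nodes)


def naive_factor_graph(vertices, links):
    """Variable node per vertex, factor node per H_i and per H_(ij)."""
    nbrs = {i: set() for i in vertices}
    for i, j in links:
        nbrs[i].add(j)
        nbrs[j].add(i)
    nodes = [("v", i) for i in vertices]
    edges = []
    for i, j in links:
        nodes.append(("H_link", (i, j)))
        edges.append((("H_link", (i, j)), ("v", i)))
        edges.append((("H_link", (i, j)), ("v", j)))
    for i in vertices:
        nodes.append(("H_vertex", i))
        edges.append((("H_vertex", i), ("v", i)))
        edges += [(("H_vertex", i), ("v", j)) for j in sorted(nbrs[i])]
    return nodes, edges


# The chain of Fig. 1a: three vertices, two links.
vertices, links = [1, 2, 3], [(1, 2), (2, 3)]
nodes, edges = naive_factor_graph(vertices, links)
extra = cyclomatic_number(nodes, edges) - cyclomatic_number(vertices, links)
print(extra, 2 * len(links))  # prints "4 4"
```

The same check can be run on a network that already contains a loop, for example a triangle, where the naive factor graph has seven independent loops against one in the network.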

2.2 Sketch of the BP algorithm

To fix the notation, we summarize the basic steps of the BP algorithm. For a given factor graph with variable nodes \(\{x_I\}\) and factor nodes \(W_a(\varvec{x}_a)\) (such that \(P(\varvec{x}) = \prod _a W_a(\varvec{x}_a) \)), BP can be used to calculate marginals \(P_I(x_I) \equiv \int \prod _{J \ne I}\mathrm {d} x_J P(\varvec{x})\) and \(P_a(\varvec{x}_a) \equiv \int \prod _{J: x_J \notin \varvec{x}_a}\mathrm {d} x_J P(\varvec{x})\). Here we will give the basic BP-algorithm, for which several extensions exist [8, 18,19,20,21], as mentioned in the introduction. The steps of the BP-algorithm for the marginalization are the following:

Initialization. For each factor-variable pair (aI) that is connected on the factor graph (that is, for which \(x_I \in \varvec{x}_a\)), messages \(\{m_{I \rightarrow a}(x_I), m_{a \rightarrow I}(x_I)\}\) are initialized uniformly: At time \(t=0\), the messages are set to \(m_{I \rightarrow a}^{t = 0}(x_I) \propto 1\) and \(m_{a \rightarrow I}^{t = 0}(x_I) \propto 1\). The messages are functions of the variables. If the variables are discrete or if the factors are Gaussian (implying also Gaussian messages), the messages can be parameterized by a few real numbers. Otherwise one needs to find an approximation, such as a discretization of the messages [22, 27] or the basis function expansions considered in [28]. In the case of Gaussian distributions, we initialize the messages with zero mean and large variance to approximate a uniform distribution.
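To illustrate why Gaussian messages need only a few real numbers, the following minimal sketch multiplies one-dimensional Gaussian messages parameterized by mean and variance. It relies on the standard fact that a product of Gaussian densities in the same variable is again Gaussian up to normalization; the function name `gaussian_product` and the numbers are hypothetical.

```python
def gaussian_product(messages):
    """Combine Gaussian messages given as (mean, variance) pairs.

    Precisions (inverse variances) add, and the resulting mean is the
    precision-weighted average of the individual means.
    """
    precision = sum(1.0 / var for _, var in messages)
    mean = sum(mu / var for mu, var in messages) / precision
    return mean, 1.0 / precision


informative = [(1.0, 2.0), (3.0, 2.0)]
print(gaussian_product(informative))           # (2.0, 1.0)

# A zero-mean message with very large variance acts as a nearly uniform
# initialization: it leaves the product almost unchanged.
flat = (0.0, 1e12)
print(gaussian_product(informative + [flat]))
```

Multiplying messages as in a product over incoming messages then reduces to adding a handful of numbers per message.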

Updates. At each step t, we keep track of approximations \(\{b_I^t(x_I)\}\) and \(\{b^t_a(\varvec{x}_a)\}\) (with I and a running over all variable and factor nodes respectively), which for large t are supposed to converge to the marginals of \(P(\varvec{x})\) according to \(b_I(x_I)\rightarrow P_I(x_I)\) and \(b_a(\varvec{x}_a)\rightarrow P_a(\varvec{x}_a)\). The approximations are given in terms of the messages \(\{m_{I \rightarrow a}^t(x_I), m_{a \rightarrow I}^t(x_I) \}\), which are updated according to:

$$\begin{aligned} m_{J \rightarrow a}^{t+1}(x_J)= & {} \prod _{b \ne a; x_J \in \varvec{x}_b} m_{b \rightarrow J}^t(x_J) \end{aligned}$$
(4)
$$\begin{aligned} m_{a \rightarrow I}^{t+1}(x_I)= & {} \int \big [\, \prod _{J \ne I; x_J \in \varvec{x}_a} \quad m_{J \rightarrow a}^{t+1}(x_J) \big ] \nonumber \\&\times \big [W_a(\varvec{x}_a) \big ] \times \quad \prod _{J \ne I; x_J \in \varvec{x}_a} \quad \mathrm {d}\, x_J \,, \end{aligned}$$
(5)
$$\begin{aligned} b_I^{t+1}(x_I)\propto & {} \prod _{a: x_I \in \varvec{x}_a} m_{a \rightarrow I}^{t+1}(x_I) \,. \end{aligned}$$
(6)
$$\begin{aligned} b_a^{t+1}(\varvec{x}_a)\propto & {} W_a(\varvec{x}_a) \prod _{I: x_I \in \varvec{x}_a} m^{t+1}_{I \rightarrow a}(x_I) \,. \end{aligned}$$
(7)

Note that the notation \(x_I \in \varvec{x}_a\) means that variable node I and factor node a are connected on the factor graph. The updates are repeated until a reasonable stopping criterion is met, for example, when the changes \(b^{t+1}_I(x_I) - b^{t}_I(x_I)\) or \(b^{t+1}_a(\varvec{x}_a) - b^{t}_a(\varvec{x}_a)\) fall below some desired tolerance.

Output. The sets \(\{b_I(x_I)\}\) and \(\{b_a(\varvec{x}_a)\}\) then give the approximations of the marginals \(\{P_I(x_I)\}\) and \(\{P_a(\varvec{x}_a)\}\), respectively, that we want to determine.
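The three stages can be sketched for discrete variables, where the integrals of Eq. 5 become sums. The following is a minimal illustration on a hypothetical tree factor graph with binary variables, not the implementation used later in the paper; the factor tables, the names `run_bp` and `exact_marginals`, and the normalization of messages after each update (harmless, since messages are only defined up to proportionality) are assumptions made for this example.

```python
import itertools

# Hypothetical tree factor graph: x1 - Wa - x2 - Wb - x3, plus a leaf factor Wc on x2.
states = [0, 1]
factors = {
    "Wa": (("x1", "x2"), lambda x1, x2: 2.0 if x1 == x2 else 1.0),
    "Wb": (("x2", "x3"), lambda x2, x3: 1.0 if x2 == x3 else 3.0),
    "Wc": (("x2",), lambda x2: 1.0 + x2),
}
variables = sorted({x for scope, _ in factors.values() for x in scope})


def normalize(m):
    z = sum(m.values())
    return {s: w / z for s, w in m.items()}


def run_bp(n_steps=10):
    # Initialization: all messages uniform.
    m_vf = {(I, a): {s: 1.0 for s in states}
            for a, (scope, _) in factors.items() for I in scope}
    m_fv = {(a, I): {s: 1.0 for s in states} for (I, a) in m_vf}
    for _ in range(n_steps):
        # Eq. 4: variable-to-factor messages from the other incoming factor messages.
        new_vf = {}
        for (J, a) in m_vf:
            msg = {s: 1.0 for s in states}
            for (b, K) in m_fv:
                if K == J and b != a:
                    for s in states:
                        msg[s] *= m_fv[(b, J)][s]
            new_vf[(J, a)] = normalize(msg)
        m_vf = new_vf
        # Eq. 5: factor-to-variable messages, summing out all other variables.
        new_fv = {}
        for (a, I) in m_fv:
            scope, W = factors[a]
            msg = {s: 0.0 for s in states}
            for assignment in itertools.product(states, repeat=len(scope)):
                weight = W(*assignment)
                for J, s in zip(scope, assignment):
                    if J != I:
                        weight *= m_vf[(J, a)][s]
                msg[assignment[scope.index(I)]] += weight
            new_fv[(a, I)] = normalize(msg)
        m_fv = new_fv
    # Eq. 6: beliefs at the variable nodes.
    beliefs = {}
    for I in variables:
        b = {s: 1.0 for s in states}
        for (a, K) in m_fv:
            if K == I:
                for s in states:
                    b[s] *= m_fv[(a, I)][s]
        beliefs[I] = normalize(b)
    return beliefs


def exact_marginals():
    """Brute-force marginals of P(x) = product of all factors, for comparison."""
    marg = {I: {s: 0.0 for s in states} for I in variables}
    for assignment in itertools.product(states, repeat=len(variables)):
        x = dict(zip(variables, assignment))
        p = 1.0
        for scope, W in factors.values():
            p *= W(*(x[J] for J in scope))
        for I in variables:
            marg[I][x[I]] += p
    return {I: normalize(m) for I, m in marg.items()}


print(run_bp())
print(exact_marginals())
```

Because this factor graph is a tree, the beliefs agree with the brute-force marginals up to floating-point error after a few sweeps; a fixed number of steps is used here in place of a tolerance-based stopping criterion to keep the sketch short.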

2.3 Systematic clustering of short loops in factor graphs

Importantly, in relation to our application to supply networks, we should distinguish between the topology of the original network such as the power grid and the topology of the associated factor graph. As we have seen in Sect. 2.1, even if the graphical representation of the original network is a tree, a naively assigned factor graph may unavoidably contain loops. Our statements below refer always to the topology of the factor graph.

Clustering is a method to find factor graph representations with a reduced number of loops by aggregating multiple variable nodes and/or factor nodes into a reduced set of variable or factor nodes. For variable nodes this corresponds to treating a subset of the variables as a new single variable node, while for factor nodes their clustering corresponds to multiplying factors together to obtain a new, aggregated factor. The catch in the choice of factor graphs is that the larger the subsets \(\varvec{x}_a\), the more difficult is the calculation of messages resulting from Eqs. 4–5. In the extreme case, where the whole distribution is clustered into a single factor, the algorithm simply returns the original marginalization problem \(b_I(x_I) = \int P(\varvec{x}) \prod _{J \ne I} \mathrm {d} x_J\), such that BP is exact but of no advantage anymore. Thus it is important to find a clustering that 1. guarantees a high accuracy, and 2. avoids difficulties in computing the messages via Eqs. 4–5.

Fig. 2

a A simple tree network. b The striped ellipses indicate the clusters proposed as new variable nodes: Each cluster is assigned to a link and consists of the vertex variables at each end of that link. c Copying process of the vertex variable nodes on the factor graph to avoid overlapping. d Adding factor nodes \(H_i\) between those variable nodes that enter \(H_i\) of Eq. 8, including the \(\delta \)-constraints, and attaching leaf nodes (dashed squares) (one for each variable node) that represent the \(H_{ij}\)-terms of Eq. 8. Note that (d) has the same tree structure as (a), where the links of (a) with attached vertices become variable nodes in (d) and the vertices of (a) have their counterpart in \(H_i,\delta \)-factor nodes with edges correspondingly attached. The links of the original network furthermore give an additional leaf node on the factor graph

2.3.1 Generating the clustered factor graph

In the clustered factor graph that we propose, we assign a variable node to each link of the original network. Thus, each of these variable nodes is a tuple consisting of the vertex variables at each end of that link. For a simple tree network, as in Fig. 2a, this is indicated in Fig. 2b. In Fig. 2b, the tuples of vertex variables are shown that are supposed to make up the variable nodes of the clustered factor graph. For each link (ij), the variable node at this link consists of the two vertex variables \((v_i,v_j)\) at each end of the link. However, the variable nodes are overlapping in the sense that each vertex variable \(v_i\) is contained in multiple variable nodes: The variable node \((v_i, v_j)\) for each link \((ij): j \in N(i)\) connected to vertex i contains the variable \(v_i\). When copying the vertex variables to decouple the clusters (Fig. 2c), one has to compensate the copying by introducing \(\delta \)-constraints which enforce that all copies of a given vertex variable remain equal. This copying procedure is a generalization of the Shafer–Shenoy algorithm [29] (in the sense that the Shafer–Shenoy algorithm requires the so-called ’running intersection property’, while the copying procedure here does not). Mathematically this amounts to an identity operation, but it allows us to improve the performance of BP by eliminating loops from the factor graph. Connecting the clustered variable nodes to the factor nodes then leads to a tree-like factor graph as in Fig. 2d.

The number of clusters to which a given vertex variable belongs is equal to the number of links connected to the vertex. We thus have to make a copy for each of those links. Accordingly, we define a probability distribution which depends on all the copies of v-variables, while the original v-variables are integrated out by delta constraints:

$$\begin{aligned} P_c(\varvec{v}^c) \equiv \int P(\varvec{v}) \prod _{i} \Big ( \Big [ \prod _{j \in N(i)} \delta (v^c_{ij} - v_i) \Big ] \mathrm {d} v_i \Big ) \,, \end{aligned}$$

where \(\varvec{v}^c \equiv \{v^c_{ij}: j \in N(i)\}\) is the set of all copies of vertex variables \(v_i\) kept for those links (ij) that are connected to vertex i (thus, \(v_{ij}^c\) is a copy of \(v_i\)). Stated differently, each link connecting vertices i and j keeps a copy of \(v_i\) and \(v_j\) at its ends, and the delta functions constrain the copies to remain equal to the original variables. Figure 3 shows the replication of the vertex variable \(v_i\) into three copies, carried by the incident links toward vertices j, k and l. From the definitions it is clear that calculating marginals in \(P_c\) is equivalent to calculating marginals in the original distribution \(P(\varvec{v})\). The advantage is that \(P_c\) contains no extra loops: if the supply network itself has no loops, \(P_c\) can be represented as a loop-free factor graph, for which BP is exact. Writing out \(P_c\) explicitly, we get

$$\begin{aligned} P_c(\varvec{v}^c)= & {} \Big [ \prod _{i} \int \mathrm {d} v_i \, \Big \{H_i(v_i, v^c_{ji}: j \in N(i))\prod _{j \in N(i)} \nonumber \\&\quad \delta (v^{c}_{ij} - v_i) \Big \}\Big ] \Big [ \prod _{(i,j)} H_{(ij)}\big (v^c_{ij}, v^{c}_{ji}\big ) \Big ] \,. \end{aligned}$$
(8)

In the first product, we have explicitly kept the v-integration to include the \(\delta -\)constraints, while the v-integration has been carried out in the second product. Note that each \(H_{ij}\) depends on a single variable on the new factor graph, while \(H_i\) may depend on several variables on the factor graph.
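As a worked illustration, Eq. 8 can be written out for the three-vertex chain of Fig. 1a with links (1, 2) and (2, 3). The copies are \(v^c_{12}\) (copy of \(v_1\)), \(v^c_{21}\) and \(v^c_{23}\) (copies of \(v_2\)), and \(v^c_{32}\) (copy of \(v_3\)). The arguments of \(H_i\) are ordered as in Eq. 8, with the vertex's own variable first:

$$\begin{aligned} P_c(\varvec{v}^c) ={}& \Big [ \int \mathrm {d} v_1 \, H_1(v_1, v^c_{21})\, \delta (v^c_{12} - v_1) \Big ] \\ &\times \Big [ \int \mathrm {d} v_2 \, H_2(v_2, v^c_{12}, v^c_{32})\, \delta (v^c_{21} - v_2)\, \delta (v^c_{23} - v_2) \Big ] \\ &\times \Big [ \int \mathrm {d} v_3 \, H_3(v_3, v^c_{23})\, \delta (v^c_{32} - v_3) \Big ] \\ &\times H_{(1,2)}(v^c_{12}, v^c_{21})\, H_{(2,3)}(v^c_{23}, v^c_{32}) \,. \end{aligned}$$

Carrying out the integrals gives the vertex factors \(H_1(v^c_{12}, v^c_{21})\), \(H_2(v^c_{21}, v^c_{12}, v^c_{32})\, \delta (v^c_{23} - v^c_{21})\) and \(H_3(v^c_{32}, v^c_{23})\). Only the factor at vertex 2 couples the two tuples \((v^c_{12}, v^c_{21})\) and \((v^c_{23}, v^c_{32})\); all other factors depend on a single tuple.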

Now we are ready to define the clustered factor graph. As anticipated already in Fig. 2a–d, we are able to assign a factor graph to \(P_c\) which has the same number of loops as the original network.

Fig. 3

Copying procedure of the vertex variables as already used in Fig. 2. The variable \(v_i\) is copied to \(v^c_{ij}, v^c_{ik}\) and \(v^c_{il}\), one copy for each incident link

Variable nodes. As variable nodes we use the tuples \((v^c_{ij}, v^c_{ji})\), one for each link (ij) of the original network (Fig. 2a).

Factor nodes. Each vertex i of the original network gives a factor node \(\int \mathrm {d} v_i \, \Big \{H_i(v_i, v^c_{ji}: j \in N(i)) \prod _{j \in N(i)} \delta (v^{c}_{ij} - v_i) \Big \}\), abbreviated as \(H_i\cdot \delta \), see Fig. 2d. Each link (ij) of the original network gives a factor node \(H_{(ij)}\big (v^c_{ij}, v^{c}_{ji}\big )\).

The new factor graph is illustrated in Fig. 2d. The new variable nodes partition \(\varvec{v}^c\). Multiplying all the factors together gives the full distribution \(P_c(\varvec{v}^c)\) of Eq. 8. This is thus a valid factor graph. Note that here the new variable nodes \((v^c_{ij}, v^{c}_{ji})\) are assigned to the links rather than to the vertices of the original network. The reason is that the copied variables always enter in pairs, one vertex variable for each end of the link. The original variables can be retrieved from the corresponding copied variables at any of the links entering the vertex from the various directions.

Claim: The resulting factor graph has exactly as many loops as the original supply network. The factor at each vertex i depends on \(\{(v^c_{ij}, v^c_{ji}): j \in N(i) \}\); thus on the factor graph we connect the factor nodes \((H_i\cdot \delta )\) at each vertex i to all variables \((v^c_{ij}, v^c_{ji})\) representing links incident to i. Given only variables \((v^c_{ij}, v^c_{ji})\) and factors at the vertices of the supply network, the factor graph thus has the same topology as the supply network, with a variable for each link and a factor for each vertex, see Fig. 2d. The other factors, associated with each link, depend on only one variable node each and are thus leaf nodes. The topology of the factor graph thus equals the topology of the supply network with the addition of some leaf nodes. In particular, it therefore has the same number of loops. This is graphically seen in Fig. 2d.
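The claim can be checked numerically with a minimal sketch that builds the clustered factor graph (one variable node per link, one \(H_i\cdot \delta \) factor per vertex, one leaf factor per link) and compares loop counts with the underlying network. The helper names `count_loops` and `clustered_factor_graph` and the node labels are hypothetical.

```python
def count_loops(nodes, edges):
    """Independent loops of a graph: components + edges - nodes."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    components = len(nodes)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components + len(edges) - len(nodes)


def clustered_factor_graph(vertices, links):
    """Variable node per link, H_i*delta factor per vertex, leaf factor per link."""
    nodes = [("x", link) for link in links]
    edges = []
    for link in links:
        nodes.append(("H_link", link))            # leaf factor H_(ij)
        edges.append((("H_link", link), ("x", link)))
    for i in vertices:
        nodes.append(("H_delta", i))              # factor H_i * delta at vertex i
        edges += [(("H_delta", i), ("x", link)) for link in links if i in link]
    return nodes, edges


examples = {
    "chain": ([1, 2, 3], [(1, 2), (2, 3)]),
    "triangle": ([1, 2, 3], [(1, 2), (2, 3), (1, 3)]),
    "square with diagonal": ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]),
}
for name, (vertices, links) in examples.items():
    nodes, edges = clustered_factor_graph(vertices, links)
    print(name, count_loops(vertices, links), count_loops(nodes, edges))
```

In each of these example networks the two counts coincide (0, 1 and 2 loops, respectively), in line with the argument above.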

2.3.2 The flow-only factor graph as a special case

Before comparing the results of BP on the straightforward and the clustered factor graphs (Sects. 2.1 and  2.3.1), let us first discuss a special case of our clustering. The clustering simplifies if the whole distribution (Eq. 1) can be written in terms of flows \(f_{ij}(v_i,v_j)\) rather than vertex variables:

$$\begin{aligned} P_f(\varvec{f}) \equiv \Big [ \prod _{i} H_i(\sum _{j \in N(i)}f_{ij})\Big ] \times \Big [ \prod _{(i,j)} H_{ij}(f_{ij}) \Big ] \,, \end{aligned}$$
(9)

and additionally we assume that \(f_{ij} = -f_{ji}\) to implement flow conservation. If one explicitly keeps the dependence on \(\varvec{v}\), writing the flows as \(f_{ij}(v_i,v_j)\), one sees that the distribution still corresponds to a distribution of the form of Eq. 1. If we assign a factor graph to this distribution \(P_f\) in a straightforward way, using as variables \(f_{ij}\) (\(= - f_{ji}\)) for each (ij), and factors \(H_i\) for each vertex and \(H_{ij}\) for each link, the factor graph has exactly the same topology as the clustered factor graph we proposed. The advantage is that the expressions are simpler because the dependence on \(v_i,v_j\) is absent.

Not all distributions of the form of Eq. 1 can be written as in Eq. 9 in terms of flow only. For example, the distribution of Eq. 9 cannot constrain the flows to Kirchhoff’s second law, so one requirement is that the original network is a tree. In this case, the flows are exclusively determined by the conservation law \(g_i = \sum _{j \in N(i)} f_{ij}\), and do not receive extra constraints from the vertex variables \(\varvec{v}\). Another requirement is that the choice of the distribution of Eq. 1 does not involve the vertex variables \(\varvec{v}\) directly, but only indirectly through \(\{g_i\}\) and \(\{f_{ij}\}\). If the network does not satisfy these requirements, the flow-only distribution (Eq. 9) can still be used as an approximation, where we ignore the constraints that the vertex variables \(\varvec{v}\) induce on the flows. In the case of an electric power grid, this corresponds to ignoring Kirchhoff’s second law, taking only power flow conservation into account. We call this approximation the “flow-only”-approximation, which we considered in [10]. We will further compare this approximation to our clustering in Sect. 3.
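The statement that on a tree the flows are fixed by the conservation law alone can be illustrated with a minimal sketch. It assumes a hypothetical star-shaped tree with balanced injections (summing to zero) and antisymmetric flows \(f_{ij} = -f_{ji}\); the function name `tree_flows` and all values are illustrative.

```python
def tree_flows(links, g):
    """Solve g_i = sum_j f_ij on a tree by repeatedly eliminating leaf vertices."""
    remaining = set(links)
    residual = dict(g)   # injection at each vertex not yet assigned to a flow
    f = {}
    while remaining:
        degree = {}
        for i, j in remaining:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        leaf = next(i for i, d in degree.items() if d == 1)
        link = next(l for l in remaining if leaf in l)
        other = link[0] if link[1] == leaf else link[1]
        f[(leaf, other)] = residual[leaf]    # the leaf has only this link left
        f[(other, leaf)] = -residual[leaf]   # antisymmetry f_ij = -f_ji
        residual[other] += residual[leaf]    # remaining links of 'other' must carry this
        residual[leaf] = 0.0
        remaining.remove(link)
    return f


# Hypothetical tree: vertex 2 connected to vertices 1, 3 and 4.
links = [(1, 2), (2, 3), (2, 4)]
g = {1: 2.0, 2: -1.0, 3: 1.5, 4: -2.5}   # assumed injections, summing to zero
flows = tree_flows(links, g)
for i in g:
    print(i, sum(val for (a, _), val in flows.items() if a == i), g[i])
```

No vertex variables enter this computation. On a network with loops the same elimination would stall because no leaf is left, which is where the additional constraints from the vertex variables become necessary.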

2.3.3 Comparison of factor graphs

We defined three different ways of constructing a factor graph from a probability distribution of the form of Eq. 1:

  • the naively assigned factor graph (Sect. 2.1), from here on denoted as \(F_v\),

  • the factor graph clustered according to our procedure of Sect. 2.3.1, denoted as \(F_c\),

  • the flow-only factor graph (Sect. 2.3.2), completely ignoring the vertex variables (according to Eq. 9), denoted as \(F_f\).

Figure 4 shows what these factor graphs look like for the simple network of Fig. 1. \(F_c\) and \(F_f\) have the same number of loops as the original network, in this case none. \(F_v\) has an increased number of loops, where the difference in the number of loops is given by \(\sum _i |N(i)|\), as argued before in Sect. 2.1. These factor graphs lead to three different sets of BP equations (following Eqs. 4–7).

Fig. 4

a A simple network, which gives the probability distribution of Eq. 3. b The factor graph assigned in the straightforward way (Sect. 2.1) to Eq. 3. c The factor graph assigned to the same distribution by our clustering procedure. d The factor graph assigned to the flow-only approximation of the same distribution, as described in Sect. 2.3.2

3 A concrete implementation: Bayesian inference for state estimation in power grids

The previous section contains the general formulation of our clustering method; it gives an improved BP for supply networks by considerably reducing the number of loops in the factor graph. In this section we will use a realistic implementation for state estimation in power grids as a concrete test case to show the improvement over the naive assignment of factor graphs. The goal in state estimation problems is to reliably retrieve the underlying state of the system in terms of its variables. The variables are correlated by the power flow equations and are in principle accessible to measurements, but the measurements are affected by errors, so some care is needed to reliably estimate the state.

In AC-power grids, the power flow equations restrict the measured values for active (reactive) power injections at vertex i, the active (reactive) flows between vertex i and vertex j, as well as the voltages, given the conductances and susceptances of the transmission lines. The DC-approximation to the AC-equations, which we consider in more detail, corresponds to a linearization of the AC-equations which is justified for high-voltage grids when angle differences are small and Ohmic power losses can be neglected. In this case, the DC-approximated AC-equations read

$$\begin{aligned} f_{ij}=B_{ij}(\theta _i-\theta _j)\;,\qquad g_i= \sum _{j\in N(i)}f_{ij} \end{aligned}$$
(10)

with \(\theta _i\) the phase angles at the vertices i, \(g_i\) representing active power injections at vertices i, \(f_{ij}\) the active power flow. The susceptances \(B_{ij}\) are provided in the data sets of the considered grids and N(i) denotes the set of vertices directly connected to vertex i by a transmission line. In [10] we ignored the first of Eq. 10 and considered only the flows as variables characterizing the state of the system. This corresponds to the flow-only approximation discussed in Sect. 2.3.2. In principle, the injections \(g_i\), the flows \(f_{ij}\) and more recently also the angles (via phasor measurement units (PMUs)) are accessible to direct measurements. However, one can do better than taking these direct measurements for the state estimation and use Bayes’ theorem in the form of

$$\begin{aligned} P(\varvec{x}|\varvec{z}) = \frac{P(\varvec{z}|\varvec{x}) P_{\text {pr}}(\varvec{x})}{P_{\text {pr}}(\varvec{z})} \,, \end{aligned}$$
(11)

where \(P(\varvec{z}|\varvec{x})\) is the probability that a state \(\varvec{x}\) would give data \(\varvec{z}\) and \(P_{\text {pr}}(\varvec{x})\) is the prior belief that the state is \(\varvec{x}\). If no prior knowledge exists about \(\varvec{x}\), it will be chosen as a uniform distribution. The prior belief over data, \(P_{\text {pr}}(\varvec{z})\), is independent of \(\varvec{x}\); hence it only provides a normalization constant and is irrelevant for our purposes. Bayes’ theorem is then used for state estimation and real-time processing of measurement data.

The injections \(\{g_i\}\) and flows \(\{f_{ij}\} = \{-f_{ji}\}\) are restricted by the DC-approximated power-flow equations (Eqs. 10), so we can write \(f_{ij}\) and \(g_i\) as functions of the angles, \(f_{ij}(\theta _i, \theta _j)\) and \(g_i(\theta _i, \theta _j: j \in N(i))\). Denoting a given measurement as \(z_a\) (always a scalar), and the subset of variables which enter the measurement as \(\varvec{\theta _a}\), we then assume \(z_a = f(\varvec{\theta }_a) + \xi _a\), where \(\xi _a\) are generated independently from Gaussian distributions with known standard deviations \(\sigma _a\). The function f represents the scalar quantity which is measured, specified as a function of the angle variables. If we consider direct measurements of state variables (\(g_i\), \(\theta _i\) or \(f_{ij}\)), then we have \(f(\varvec{\theta _a})\) equal to \(g_i(\theta _i, \theta _j: j\in N(i))\), \(\mathbbm {1}(\theta _i)\) or \(f_{ij}(\theta _i, \theta _j)\), respectively.
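The measurement model described above can be illustrated with a minimal sketch. The 4-vertex tree network, the susceptance values, the angles and the noise level below are hypothetical choices made only for this illustration; they are not taken from the benchmark data.

```python
import numpy as np

# Hypothetical 4-vertex tree network: (i, j, B_ij). Values are illustrative only.
links = [(0, 1, 10.0), (1, 2, 5.0), (1, 3, 8.0)]
n_vertices = 4

def dc_flows(theta, links):
    """Flows f_ij = B_ij * (theta_i - theta_j), first relation of Eq. 10."""
    return np.array([b * (theta[i] - theta[j]) for i, j, b in links])

def dc_injections(theta, links, n_vertices):
    """Injections g_i = sum_{j in N(i)} f_ij, using f_ji = -f_ij."""
    g = np.zeros(n_vertices)
    for (i, j, _), f in zip(links, dc_flows(theta, links)):
        g[i] += f   # f_ij leaves vertex i
        g[j] -= f   # f_ji = -f_ij leaves vertex j
    return g

def draw_measurements(theta, links, n_vertices, sigma, rng):
    """z_a = f(theta_a) + xi_a with independent Gaussian errors xi_a."""
    z_f = dc_flows(theta, links) + rng.normal(0.0, sigma, size=len(links))
    z_g = dc_injections(theta, links, n_vertices) + rng.normal(0.0, sigma, size=n_vertices)
    return z_f, z_g

rng = np.random.default_rng(0)                      # fixed seed, for reproducibility
theta_true = np.array([0.0, -0.02, -0.05, -0.03])   # assumed "true" angles
z_f, z_g = draw_measurements(theta_true, links, n_vertices, sigma=0.03, rng=rng)
```

Because every flow enters two injections with opposite sign, the noise-free injections sum to zero, which is a quick consistency check for such a sketch.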

Measurement errors of this form thus give \(P(z_a|\varvec{\theta }_a) \sim N(f(\varvec{\theta _a}), \sigma _a)\), which contribute to the joint probability distribution \(P(\varvec{z}|\varvec{\theta }) = \prod _a P(z_a|\varvec{\theta }_a)\). If no direct measurement is available, it is convenient to write this as a measurement with \(\sigma _a \rightarrow \infty \) (and \(z_a\) arbitrary). With Bayes’ theorem we get:

$$\begin{aligned}&P(\varvec{\theta }\vert \varvec{z}_g, \varvec{z}_f, \varvec{z}_\theta ) \nonumber \\&\quad \propto P(\varvec{z}_g, \varvec{z}_f, \varvec{z}_\theta \vert \varvec{\theta }) P_{\text {pr}}(\varvec{\theta })\nonumber \\&\quad = \Big [ \prod _{i} P(z_{g_i}\vert g_i(\theta _i, \theta _j: j \in N(i))) \times P(z_{\theta _i}\vert \theta _i) \Big ]\nonumber \\&\qquad \times \Big [ \prod _{(i,j)} P(z_{f_{ij}}\vert f_{ij}(\theta _i, \theta _j)) \Big ] \times P_\text {pr}(\varvec{\theta })\,, \end{aligned}$$
(12)

where \(\varvec{z}_g, \varvec{z}_f\) and \(\varvec{z}_\theta \) denote the set of power injection, flow and angle measurements, respectively. Thus, this distribution gives the probability that the true state is \(\varvec{\theta }\), given the measurements in \( \varvec{z}_g, \varvec{z}_f, \varvec{z}_\theta \). In view of the state estimation problem, we are interested in calculating marginals of Eq. 12 such as \(P_{(ij)}(\theta _i, \theta _j)\equiv \int \prod _{k\not =i,j} d\theta _k P(\varvec{\theta }| \varvec{z}_g, \varvec{z}_f, \varvec{z}_\theta )\) to calculate likely values of the flow \(f_{ij}\) and the corresponding phase angles (and similarly for other quantities). To calculate the marginals, we have to deal with a number of integrals of large products over all vertices and transmission lines.

3.1 BP for power grid state estimation: performance in terms of speed and accuracy

The distribution \(P(\varvec{\theta }| \varvec{z}_g, \varvec{z}_f, \varvec{z}_\theta )\), as a function of \(\varvec{\theta }\), is of the form of the general distribution given in Eq. 1 (assuming a uniform prior). We can thus use BP to solve the state estimation problem and compare the results for the different factor graphs.

In the following, the subscript x indicates which factor graph \(F_x\) is evaluated: \(F_v\) (the straightforward factor graph), \(F_c\) (the clustered factor graph), or \(F_f\) (the flow-only factor graph). We use BP to solve the state estimation problem on the IEEE-300 benchmark network [30]. The IEEE-300 network is a realistic and heterogeneous benchmark network with 300 vertices, 411 links and 112 loops; it is described in more detail in Appendix C. As explained in Sects. 1 and 2.3, the benefit of the new algorithm lies in avoiding extra loops in the associated factor graph. We do not modify the 112 loops of the IEEE-grid (for example, by clustering some vertices), but keep them and compare the performance with and without the additional loops in the assigned factor graphs. Note that the number of additional loops in the naively assigned factor graph would be \(\sum _i\vert N(i)\vert =2\cdot \sharp \; \text{ links }=2\times 411=822\) on top of the 112 loops of the IEEE-300 grid, and all these additional loops would be short.
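The loop counts quoted above can be checked with a few lines of arithmetic. The sketch assumes that the network is connected, so that the number of independent loops equals the number of links minus the number of vertices plus one; the function name is ours.

```python
def independent_loops(n_vertices, n_links, n_components=1):
    """Number of independent loops (cycle rank) of a graph.

    Assumes the graph has `n_components` connected components;
    one component is assumed for the IEEE-300 grid.
    """
    return n_links - n_vertices + n_components

n_vertices, n_links = 300, 411          # IEEE-300 sizes quoted in the text
grid_loops = independent_loops(n_vertices, n_links)

# Each link (ij) is counted once in N(i) and once in N(j),
# so sum_i |N(i)| = 2 * (number of links).
extra_loops_naive = 2 * n_links

print(grid_loops, extra_loops_naive)
```

Under the connectivity assumption this reproduces the 112 loops of the grid and the 822 additional loops of the naive assignment.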

We will consider measurements of the power flows, measurements of the power injections, and, if PMUs are assumed to be present, measurements of the phase angles. We discuss two situations: one where all of the variables are measured (measurement devices at every vertex and transmission line), and one—the more realistic case—where only the flows and injections are measured (i.e., without any PMUs). The flow and injection measurements are assumed to have an error \(\xi _a\) with variance \(\sigma ^2_a = 10^{-3}\), while the angle measurements, if present, are assumed to have an error \(\xi _a\) with variance \(10^{-6}\). Using these values, we randomly draw the measurements \(\varvec{z}\) following the description in the previous section, and use BP to find estimates of the state variables by calculating marginals of \(P(\varvec{\theta }|\varvec{z})\).

For comparison, we first make use of a “damping” method proposed in [8] to improve the convergence of BP on \(F_v\) (since numerical simulations have shown that naively running BP on \(F_v\) gives diverging estimates). According to this damping procedure of [8], for each Gaussian message \(m_{x \rightarrow y}^t \sim N(\mu _{x\rightarrow y}^t, (\sigma ^2)_{x \rightarrow y}^t)\) of Eqs. 4 and 5 one chooses with probability 1/2 either \(\delta =0\) or \(\delta =1\) (\(P(\delta = 0) = P(\delta = 1) = 1/2\)), and updates:

$$\begin{aligned} \mu _{x\rightarrow y}^{t + 1} = \delta \times {\hat{\mu }}_{x\rightarrow y}^{t + 1} + (1-\delta ) \times 1/2 \times \Big ({\hat{\mu }}_{x\rightarrow y}^{t + 1} + \mu _{x\rightarrow y}^{t} \Big ),\nonumber \\ \end{aligned}$$
(13)

where \({\hat{\mu }}^{t+1}_{x\rightarrow y}\) is the mean of the message that would have been calculated at step \(t+1\) without damping. The variance \(\sigma ^2_{x \rightarrow y}\) is damped equivalently. Thus, with probability 1/2 the message is updated as usual, and otherwise damped by a factor of 1/2. Testing the method according to Eq. 13 for different damping parameters and comparing it to the damping algorithm proposed in [19], we indeed find that damping according to [8] (Eq. 13) improves the convergence the most. For the results for \(F_v\) we will use this “damped” version of BP and use it as the best existing alternative, which is still outperformed by our method. For \(F_c\) and \(F_f\), the algorithm converges without problem also without damping, so we will consider their undamped versions.
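As an illustration of Eq. 13, the probabilistic damping of a single message mean can be sketched as follows. The function name and the toy update rule in the usage example are hypothetical; the variance would be treated in the same way.

```python
import random

def damped_update(mu_old, mu_hat, rng=random):
    """One damped update of a message mean following Eq. 13.

    mu_old : mean of the message at step t
    mu_hat : mean that undamped BP would give at step t + 1
    With probability 1/2 (delta = 1) the undamped value is used,
    otherwise (delta = 0) the average of old and new value.
    """
    delta = rng.choice((0, 1))
    return delta * mu_hat + (1 - delta) * 0.5 * (mu_hat + mu_old)

# Toy usage: a hypothetical contracting update mu -> 0.5 * mu + 1
# with fixed point 2. Damping leaves the fixed point unchanged.
rng = random.Random(0)
mu = 0.0
for _ in range(200):
    mu = damped_update(mu, 0.5 * mu + 1.0, rng)
```

In this toy case both branches of the update contract towards the same fixed point, which reflects the general property that damping changes the path of the iteration but not its fixed points.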

3.2 Results for the factor graphs \(F_v\), \(F_c\) and \(F_f\)

We present our results for the predictions from the three factor graphs \(F_v\), \(F_c\) and \(F_f\). (Details on the implementation are given in Appendix D.) We focus here on the estimation of the flows \(\{f_{ij}\}\) (in \(F_c\) and \(F_v\) these can be retrieved as \(B_{ij}(\theta _i - \theta _j)\)). BP on \(F_v\), \(F_c\) and \(F_f\) will produce different estimates of the marginals \(\{P_{(ij)}(f_{ij})\}\), which we will denote by \(\{b_i^v(f_{ij})\}\), \(\{b_i^c(f_{ij})\}\) and \(\{b_i^f(f_{ij})\}\), respectively. To assess their accuracy, we need a way to compare them with the ‘true’ marginals \(P_{(ij)}(f_{ij})\).

Note that the variables of the factor graph \(F_c\) are tuples \(\theta _{ij}^c,\theta _{ji}^c\), and the beliefs resulting from \(F_c\) depend on \(\theta _i,\theta _j\). Averages or variances of the flows \(B_{ij}(\theta _i-\theta _j)\) can therefore be calculated directly, using these beliefs as the probability distribution \(P_{(ij)}(\theta _i,\theta _j)\) according to Eq. 6 when calculating expectation values. In contrast, the variables of \(F_v\) are \(\theta _i\), and the resulting beliefs depend on each \(\theta _i\) separately. In this case, the variance of the flow \(f_{ij}\) cannot simply be obtained as \(B_{ij}^2\) times the sum of the variances of \(\theta _i\) and \(\theta _j\), since \(\theta _i\) and \(\theta _j\) are correlated. The marginal distribution of \(f_{ij} = B_{ij} (\theta _i - \theta _j)\), in particular its variance, must be calculated from Eq. 7.
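The role of the correlation can be made explicit with a small sketch. It assumes that a joint Gaussian belief over \((\theta _i,\theta _j)\) is available as a mean vector and a \(2\times 2\) covariance matrix; the numerical values are hypothetical.

```python
import numpy as np

def flow_moments(b_ij, mean, cov):
    """Mean and variance of f_ij = B_ij * (theta_i - theta_j).

    mean : length-2 array with the means of (theta_i, theta_j)
    cov  : 2x2 covariance matrix of (theta_i, theta_j)
    """
    a = np.array([b_ij, -b_ij])       # f_ij = a . (theta_i, theta_j)
    return a @ mean, a @ cov @ a      # linear map of a Gaussian

# Hypothetical joint belief with positively correlated angles.
mean = np.array([0.0, -0.02])
cov = np.array([[1.0e-4, 0.8e-4],
                [0.8e-4, 1.0e-4]])
f_mean, f_var = flow_moments(10.0, mean, cov)

# Ignoring the covariance would overestimate the variance here.
naive_var = 10.0**2 * (cov[0, 0] + cov[1, 1])
```

With these numbers the correct variance is \(B_{ij}^2(\sigma _i^2+\sigma _j^2-2\,\mathrm {cov}_{ij}) = 4\times 10^{-3}\), whereas neglecting the covariance gives \(2\times 10^{-2}\).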

Since we assume all factors are Gaussian, the marginal distributions of the flows are Gaussian as well, so we denote them by their mean and standard deviation as \(P_{(ij)}(f_{(ij)}) \sim N(\mu _{(ij)}, \sigma _{(ij)})\). Here and in the following, the subscript (ij) labels the flow variables. We calculate the means \(\{\mu _{(ij)}\}\) via the least-squares approach and the standard deviations \(\{\sigma _{(ij)}\}\) by a matrix inversion. The means and standard deviations calculated in this way serve as the accurate benchmark against which the different implementations of BP are compared. Each implementation of the factor graph with \(x\in \{v,f,c\}\) gives an estimate \(b_{f_{(ij)}}^x(f_{(ij)}) \sim N(\mu _{(ij)}^x, \sigma _{(ij)}^x)\), which should be close to the benchmark values \(\mu _{(ij)}, \sigma _{(ij)}\). For a chosen method x we summarize the estimates into the average square error of \(\{\mu _{(ij)}^x\}\) and \(\{\sigma _{(ij)}^x\}\) by defining:

$$\begin{aligned} \varDelta _\mu\equiv & {} \frac{1}{411}\sum _{(ij)}(\mu _{(ij)}^x - \mu _{(ij)})^2 \end{aligned}$$
(14)
$$\begin{aligned} \varDelta _\sigma\equiv & {} \frac{1}{411}\sum _{(ij)}(\sigma _{(ij)}^x - \sigma _{(ij)})^2 \,, \end{aligned}$$
(15)

where the sum runs over all 411 links of the IEEE-300 network. We use these deviations to assess the accuracy of the estimates provided by the different implementations of BP. Since random generation of the measurement data \(\varvec{z}\) leads to differing errors, we repeat the procedure 100 times. For a fair comparison, we note that the wall-clock time per BP iteration for a native Python 3.7 implementation (on an Intel i5-2400 processor) of \(F_f\), \(F_c\) and \(F_v\) is 0.01 s, 0.03 s, and 0.1 s, respectively. The implementation is given in the supplementary material. Native Python is comparatively slow (up to two orders of magnitude slower than faster implementations) and the calculations can be massively parallelized, so these time-scales can be significantly reduced (the relative speed of BP on the different factor graphs is expected to remain more or less unchanged).
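The benchmark and the error measures of Eqs. 14 and 15 can be sketched for a linear Gaussian model as follows. This is not the implementation from the supplementary material: the function names, the 3-vertex chain and the matrices `H` and `A` are assumptions for illustration. One angle is fixed as reference (\(\theta _0 = 0\)) so that the weighted least-squares problem has a unique solution when no angles are measured.

```python
import numpy as np

def benchmark_flow_marginals(H, z, sigma, A):
    """Weighted least-squares benchmark for z = H @ theta + noise.

    H     : measurement matrix (full column rank assumed)
    z     : measurement vector
    sigma : standard deviations of the measurement errors
    A     : matrix mapping theta to the flows, f = A @ theta
    Returns means and standard deviations of the flow marginals.
    """
    w = 1.0 / sigma**2
    precision = H.T @ (w[:, None] * H)        # H^T W H
    cov_theta = np.linalg.inv(precision)      # the "matrix inversion"
    theta_hat = cov_theta @ (H.T @ (w * z))   # least-squares estimate
    mu = A @ theta_hat
    std = np.sqrt(np.diag(A @ cov_theta @ A.T))
    return mu, std

def mean_square_error(estimate, benchmark):
    """Average square deviation over all links (form of Eqs. 14 and 15)."""
    return np.mean((np.asarray(estimate) - np.asarray(benchmark))**2)

# Hypothetical chain 0 - 1 - 2 with B_01 = 10, B_12 = 5 and theta_0 = 0.
A = np.array([[-10.0, 0.0],      # f_01 = 10 * (0 - theta_1)
              [5.0, -5.0]])      # f_12 = 5 * (theta_1 - theta_2)
H = np.vstack([A, [15.0, -5.0]]) # two flows plus the injection g_1 = f_12 - f_01
sigma = np.full(3, np.sqrt(1e-3))
theta_true = np.array([-0.02, -0.05])
z = H @ theta_true               # noise-free data, for a simple check
mu, std = benchmark_flow_marginals(H, z, sigma, A)
```

For noise-free data the benchmark means coincide with the true flows, and the additional injection measurement makes the flow standard deviations smaller than the error of a single flow measurement.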

Figure 5 shows how the error on the variance \(\varDelta _\sigma \) saturates for BP on the different factor graphs. For all situations discussed above, the estimated variances converge fast to small values. The estimates provided by \(F_c\) are significantly more accurate than those provided by \(F_v\), which are again significantly more accurate than those provided by \(F_f\). Note that \(F_v\) corresponds to the standard implementation of BP with a naive factor graph assignment.

Fig. 5

The saturation of the variance predicted by BP on the different factor graphs, as measured by the average square error \(\varDelta _\sigma \) (see Eq. 15), showing that the clustered version of BP is most accurate: a with angle measurements, b without angle measurements. The variances predicted by \(F_c\) and \(F_f\) do not depend on the values of the chosen measurements. The variances predicted by \(F_v\) are slightly different every time BP is run because of the probabilistic damping (Eq. 13); here an average over 100 random measurement sets is shown. Note that \(F_v\) corresponds to the standard factor graph assignment and serves as the best version of existing alternatives

Focusing on the estimates for the mean, the convergence of the estimates for the different scenarios is shown in Fig. 6. Figure 6a shows the situation where PMU measurements are included. After convergence, the means predicted by \(F_v\) and \(F_c\) are both exact (as is in general true for means predicted by Gaussian BP [31, 32]). However, \(F_c\) converges in far fewer iterations than \(F_v\), by around a factor of 400. For the mean estimates in the situation without PMUs, shown in Fig. 6b, the picture is similar: \(F_c\) converges quickly to the exact answer. Although it is not shown here, experiments indicate that eventually the mean predicted by \(F_v\) does converge. However, the time scale over which it converges is so much larger (about \(10^6\) iterations) that it renders the final estimate practically irrelevant. In practice, in the absence of angle measurements even \(F_f\) performs better than \(F_v\).

Fig. 6

The convergence of the mean predicted by BP on the different factor graphs, as measured by the average square error \(\varDelta _\mu \) (Eq. 14), as a function of the number of BP iterations, showing that BP with only flows (\(F_f\)) is fastest but not very accurate, while clustered BP (\(F_c\)) converges reasonably fast and gives the exact answer (up to machine precision). When un-clustered BP (\(F_v\)) does converge it also gives the exact answer; this takes, however, very long: a with angle measurements, b without angle measurements. The predictions depend on the values of the measurements; here the lines give an average of \(\varDelta _\mu \) over 100 sets of random measurements. The filled region gives the standard deviation of \(\varDelta _\mu \) between different random sets of measurements

To summarize Figs. 5 and 6, it should be emphasized that our comparison refers to the performance of different BP-algorithms, differing by the assigned factor graph, among which the newly proposed assignment \(F_c\) performs best. The general superiority of BP-based algorithms over other approaches such as least-squares or quasi-Newton methods was already demonstrated in [8,9,10], as mentioned in the introduction.

4 Outlook to other applications of the algorithm to power grids

Our clustering rule for loopy factor graphs can be used for any distribution of the form of Eq. 1. It thus applies for BP to a variety of supply networks, if the flows are conserved at the vertices and are determined by variables at the vertices (\(\varvec{v}\), in our notation). In the following we mention different versions of power flow in electricity grids and discuss applications to gas-pipeline networks and fluid flow networks in the appendix. Problems that are studied in relation to power flow include power flow analysis, i.e., solving the power flow equations, similar to what we analyze for the gas pipe network in Appendix A, optimal power flow [22], state estimation [8, 33], as considered in Sect. 3, and optimization under uncertainty [24, 25]. In these applications, typically three different power flow equations are distinguished:

  • The DC-approximation as used in Sect. 3 as an approximation of the AC-equations, valid for high voltages at low power losses. Note that despite the name, the application of the DC-approximation is to AC-networks.

  • In a Direct Current (DC)-network, the size of the current \(I_{ij}\) between vertices i and j is given by Ohm’s law \(I_{ij} = (V_i - V_j)/R_{ij}\), where \(V_i\) and \(V_j\) are the voltages at vertices i and j and \(R_{ij}\) is the resistance of the link connecting vertices i and j. DC is used in low-voltage distribution grids and in very long distance transmission [34].

  • Alternating Current (AC) is typically used for high-voltage long-distance transmission [34]. The network voltages \(\{V_i(t)\}\) oscillate at a constant frequency \(\omega \), such that \(V_i(t) = \sqrt{2}|V_i|\sin (\omega t + \theta _i)\). Here \(|V_i|\) is the voltage magnitude. Together with the phase angles \(\theta _i\), they can be determined from the equations

    $$\begin{aligned} f^P_{ij}&= |V_i| |V_j|\big [G_{ij} \text{ cos }(\theta _i - \theta _j) + B_{ij}\text{ sin }(\theta _i - \theta _j)\big ] \nonumber \\&\quad -|V_i|^2G_{ij}\,, \end{aligned}$$
    (16)
    $$\begin{aligned} f^Q_{ij}&= |V_i| |V_j|\big [G_{ij} \text{ sin }(\theta _i - \theta _j) - B_{ij}\text{ cos }(\theta _i - \theta _j)\big ] \nonumber \\&\quad + |V_i|^2B_{ij}, \end{aligned}$$
    (17)

    where \(f^P_{ij}\) and \(f^Q_{ij}\) are the active and reactive power flow, each of which is conserved at the vertices. Solving the equations requires the specification of the quantities \(B_{ij}\) and \(G_{ij}\) for each transmission line, known as the susceptance and the conductance, respectively, which can be calculated from the impedance and resistance of the transmission line [34]. To implement AC in our framework, the tuples (\(\theta _i\), \(|V_i|\)) should be considered as the vertex variables \(v_i\). Due to the nonlinearity of Eqs. 16 and 17, BP should be combined, for example, with the Gauss–Newton (GN) method, as in [33].
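The relation between the AC-equations (Eqs. 16 and 17) and the DC-approximation (Eq. 10) can be checked numerically with a short sketch. The function name and the parameter values are hypothetical; the formulas are transcribed directly from Eqs. 16 and 17.

```python
import math

def ac_flows(v_i, v_j, theta_i, theta_j, g_ij, b_ij):
    """Active and reactive flow on link (ij) following Eqs. 16 and 17.

    v_i, v_j         : voltage magnitudes |V_i|, |V_j|
    theta_i, theta_j : phase angles
    g_ij, b_ij       : conductance G_ij and susceptance B_ij
    """
    d = theta_i - theta_j
    f_p = v_i * v_j * (g_ij * math.cos(d) + b_ij * math.sin(d)) - v_i**2 * g_ij
    f_q = v_i * v_j * (g_ij * math.sin(d) - b_ij * math.cos(d)) + v_i**2 * b_ij
    return f_p, f_q

# DC limit: unit voltage magnitudes, no Ohmic losses (G_ij = 0), small angle
# difference. The active flow then approaches B_ij * (theta_i - theta_j).
f_p, f_q = ac_flows(1.0, 1.0, 0.01, 0.0, g_ij=0.0, b_ij=10.0)
dc_flow = 10.0 * (0.01 - 0.0)
```

In this limit the active flow differs from the DC value only at third order in the angle difference, and the reactive flow is of second order, which is the sense in which Eq. 10 linearizes the AC-equations.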

For power grids, a situation with uncertain costs due to fluctuating power injections from renewable resources was investigated in [24, 25]. In this case, each possible production assignment leads to a different distribution of the form of Eq. 1. Uncertain power injections may enter the probability distribution \(P(\varvec{v})\) of Eq. 1 as a product over all vertices \(\prod _i\exp {\big \{-[g_i-\sum _{j\in N(i)}f_{ij}(v_i,v_j)]^2/(2\sigma _i^2)\big \} }\) if the production fluctuates according to such a Gaussian distribution with variance \(\sigma _i^2\) around some mean injection \(g_i\), while fixed and controllable production at vertices k would multiply this term by \(\prod _k\delta (g_k-\sum _{j \in N(k)}f_{kj}(v_k,v_j))\). \(P(\varvec{v})\) induces a distribution of flows \(P(\{f_{ij}\})\) which directly indicates possible overflows of transmission lines. BP may then be used to calculate average costs induced by the production \(g_k\) at the set of controllable production vertices. These costs can furthermore include probabilistic constraints on \(\{g_k\}\) which enforce, for example, that severe link overloads are rare [24]. In [23], it is shown that such an optimization can be performed with BP using the fact that marginals of the distribution satisfy the BP equations (to be given below). This is mathematically similar to the ‘survey propagation’ studied in [1] and to the investigation of large deviations given in [35].

5 Conclusions

For applications of BP, we considered a new assignment method of factor graphs which avoids the generation of additional loops as compared to the original supply networks. Our method applies to state estimation or optimization problems whose state can be described by a probability distribution that factorizes over the vertices and over the links of the network. If these distributions are summed or integrated over upon marginalization, BP provides an efficient way of organizing the sum or integrals over these products. In the naive assignment of a factor graph, the variables on the original supply network are chosen as variable nodes on the factor graph, while the factors of the probability distribution determine the factor nodes on the factor graph. Constraints and physical laws between the variables on the original grid may then induce additional loops on the factor graph that are detrimental for the convergence speed of BP. Our main goal was to avoid these loops.

This means that our algorithm does not address the handling of loops in the graphical representation of the original supply networks, which may differ in size and reflect the original network architecture. Such loops will survive our factor graph representation and may impede the convergence or accuracy of BP. Our method is supposed to complement other methods such as loop expansions and additional clustering rather than replacing them. For practical applications, we have furthermore focussed on cases in which the messages themselves are Gaussian functions so that the messages reduce to a few real numbers to be sent. In particular, we have assumed Gaussian distributed errors in the state estimation problems, and used successive Gaussian approximations in the steady-state analyses.

In the cases we considered, additional loops in the factor graphs result from constraints between the variables which are analogous to the two Kirchhoff laws in power grids, one corresponding to flow conservation at vertices (the flow in general being electricity, gas, water, air, soil, traffic), the second one restricting a quantity related to energy (voltage, pressure, time, other costs) along loops in the grid. The shared mathematical structure of these constraints explains the wide range of applicability of our algorithm. When the resistances of the transmission lines of the original grid depend on the flow, the analogue of Kirchhoff’s second law amounts to a nonlinear relation (unlike Ohm’s law). In this case an additional iterative method such as the Gauss–Newton method is required, and BP can be applied in the intermediate steps.

While our algorithm enhances the accuracy and improves the convergence speed, the additional computational effort is moderate. We have explicitly worked out this approach for the state estimation on a benchmark power grid. In the appendix we describe the state determination on two benchmark gas-networks with nonlinear flow relations, the latter case detailed in 2.3.1. We compared the performance of three factor graphs: the naively assigned, the newly proposed clustered factor graph, and the flow-only factor graph. The naively assigned one can lead to accurate results, but at the price of slow convergence, if it converges at all. The flow-only factor graph ignores constraints from (the analogue of) Kirchhoff’s second law; it is thus less accurate, but fast and useful for a first estimate. The clustered factor graph is both fast and accurate and, combined with an iterative procedure in case of nonlinear constraints, it is widely applicable. Further applications to other indicated supply networks should be worked out in the future.