A-optimal designs for state estimation in networks

We consider two models for estimating the expected states of nodes in networks where the observations at nodes are given by random states and measurement errors. In the first model, we assume independent successive observations at the nodes and the design question is how often the nodes should be observed to obtain a precise estimation of the expected states. In the second model, all nodes are observed simultaneously and the design question is to determine the nodes which need larger precision of the measurements than other nodes. Both models lead to the same design problem. We derive explicitly A-optimal designs for the most simple network with star configuration. Moreover, we consider the network with wheel configuration and derive some conditions which simplify the numerical calculation of the corresponding A-optimal designs.


Introduction
The design problem addressed in this paper is motivated by a cooperation with electrical engineers who study electrical power distribution grids of medium and low-voltage levels. In a specific distribution grid, the question arises where measurements of the electrical power should be taken and how precise these measurements should be in order to get a precise estimation of the state of the grid. Due to high costs, it is not possible to use sensors for measuring the electrical power at each position of the grid, and at some positions so-called pseudo measurements have to be used instead, see e.g. Muscas et al. (2014), Schlösser et al. (2014) or Schurtz (2020). These pseudo measurements are typically obtained from historical data or weather data to estimate, for example, the needed heating energy of a household or the produced energy of a photovoltaic system. These pseudo measurements are less precise than the measurements of sensors. Moreover, they can vary in their precision. For example, a temperature measurement close to a node of the grid is a more precise estimate of the electrical power at this node than a temperature measurement further away. However, the more precise a measurement is, the more expensive it is. Hence, under cost constraints, the design problem is to determine the necessary measurement precision at the nodes, which is connected to the problem of sensor allocation.
The problem of sensor allocation in distributed systems has often been addressed in research during the past 50 years, see for example the surveys in Kubrusly and Malebranche (1985), Uciński (2022), Duan et al. (2022) or the book of Uciński (2004). In most of the considered cases, methods are developed to minimize a function of the covariance matrix of an appropriate estimator of the system, see e.g. Uciński (2000) and Singhal and Michailidis (2008). In particular, Patan and Patan (2005) use partial differential equations and their simplification to non-linear models in combination with a steepest descent method to find optimal weights of given support points (i.e. the sensor positions), whereas Uciński (2022) addresses the best selection of sensors in order to obtain a proper estimation of subsets of unknown parameters of a spatiotemporal system modelled by a partial differential equation. The experimental design problem for state estimation in electrical power grids is treated, for example, by Li et al. (2011), Xygkis et al. (2018), and Cao et al. (2022). However, they all solve the problem of allocating F sensor or generator positions out of E > F possible positions by greedy algorithms, since the number of positions is high. Azhdari and Ardakan (2022) modify this problem by allocating E components into F groups where each group belongs to a node of the network.
All of these approaches deal with large networks so that approximate solutions can only be found numerically. Moreover, the aim in electrical grids is to estimate unknown expected states at certain positions of the grid. In the present paper, we simplify this state estimation problem so that exact optimal solutions can be found. For this purpose, we consider two specific models: In a first scenario, called Model A, we assume independent univariate observations given by random states and additive measurement errors at given nodes of the grid where the variances of the random states are equal and the same holds for the measurement errors. In this situation, the design problem is given by the question of allocating the observations at the different nodes. However, this approach does not treat the possibility of different precision of measurements at nodes. Moreover, the assumption of independent univariate observations is unrealistic in electrical grids. As soon as several sensors (including pseudo measurements) exist, one would use the simultaneous observations at the sensors placed at the different nodes. Hence, we consider a second scenario, called Model B, where independent simultaneous observations are available at the given nodes of the network for the different observation time points. Consequently, a single observation is a vector given by the random states of the nodes with additive measurement errors consisting of variances depending on the nodes. Here, the design question is at which nodes the variance of the measurement could be high and where not. This concerns the question at which nodes less precise pseudo measurements are sufficient, and where more precise measurements of sensors are necessary. In the following, we show that the design problem of Model B coincides with the design problem in Model A if nonrandom states are assumed in Model A.
The paper is organized as follows. Section 2 presents the two simple models and how they are related to each other. Section 3 shows how a general result concerning A-optimal designs with minimum support can be used to derive A-optimal designs in the two models analytically. This result is applied in Sect. 4 to the most simple network, a so-called star network, which nevertheless is often considered in studies for electrical power distribution grids, see e.g. Su and Wang (2020) and Azhdari and Ardakan (2022). In particular, in Sects. 4.2 and 4.3 we study the situation where the whole expected state vector is not identifiable. In Sect. 5, we consider an extension of the star network, given by a wheel. In particular, we derive sufficient conditions for the identifiability of the state vector, which can also be used to reduce the numerical complexity for that type of network. Finally, some extensions of the presented approach are discussed in Sect. 6.

Simple models for state estimation in networks
We consider a network with $I+1$ nodes $0, \ldots, I$, where node 0 denotes a central node or outgoing node of the electrical power distribution grid. The expected observations $Y_0, Y_1, \ldots, Y_I$ at these nodes depend on the unknown expected states $s_0, s_1, \ldots, s_I$ of the different nodes in the network. The aim is the estimation of these states, contained in the state vector $s = (s_0, s_1, \ldots, s_I)^\top \in \mathbb{R}^{I+1}$, or of an appropriate linear aspect $Ls$ with $L \in \mathbb{R}^{q \times (I+1)}$, using the observation vector $Y = (Y_0, Y_1, \ldots, Y_I)^\top$. In the situation under consideration, the expected observation $Y_i$ at a particular node $i$ is influenced both by the corresponding expected state $s_i$ and by the expected states $s_j$ of other nodes $j \neq i$ that are connected to node $i$ ($i = 0, \ldots, I$). More precisely, let $x_{ij}$ be the influence of the state $s_j$ at node $j$ on the expected observation $Y_i$ taken at a particular node $i$ ($i = 0, \ldots, I$) and denote the matrix storing these influences by $X = (x_{ij})_{i,j=0,\ldots,I} \in \mathbb{R}^{(I+1)\times(I+1)}$. Then the expected observation vector is given by

$$Y = X s. \tag{1}$$

The matrix $X \in \mathbb{R}^{(I+1)\times(I+1)}$ is called the influence matrix of the network, as it describes the influence of the states on the observations at the different nodes. Note that $X$ is strongly connected to the adjacency matrix of a network with weighted edges: if the diagonal elements of $X$ are removed, the resulting matrix describes the structure of the network, where two nodes $i \neq j$ are connected by an edge weighted by $x_{ij}$ if $x_{ij} \neq 0$. Denoting the $(i+1)$-th unit vector in $\mathbb{R}^{I+1}$ by $u_i$, the expected observation $Y_i$ at node $i$ can be rewritten as $u_i^\top X s$.
Later in the paper, we restrict ourselves to the case where the influence of the state $s_i$ on the observation at node $i$ is given by $a > 0$, whereas the influence of the states $s_j$ ($j \neq i$) of the adjacent nodes on the expected observation at node $i$ is equal to $b > 0$. Then, the influence matrix $X$ is of the form

$$X = a\, I_{(I+1)} + b\, A,$$

where $I_{(I+1)}$ denotes the $(I+1)$-dimensional identity matrix and $A \in \{0,1\}^{(I+1)\times(I+1)}$ is the adjacency matrix of the considered (unweighted) network. Moreover, the expected observation $Y_i$ at node $i$ can be written as

$$Y_i = u_i^\top X s = a\, s_i + b \sum_{j \,:\, j \text{ adjacent to } i} s_j .$$

Example 1

1. Star-Network. Let node 0 be the center of the network, which is connected to the other nodes $1, \ldots, I$ (see left panel of Fig. 1 for $I = 4$). Let $a$ be the influence of the state $s_i$ on the expected observation at the corresponding node $i$ ($i = 0, \ldots, I$), whereas $b$ denotes the influence of the states $s_j$ ($j \neq i$) of the adjacent nodes on the expected observation taken at node $i$. Using (1), the influence matrix $X$ is given by

$$X = \begin{pmatrix} a & b\, 1_I^\top \\ b\, 1_I & a\, I_I \end{pmatrix},$$

where $1_I = (1, \ldots, 1)^\top \in \mathbb{R}^I$ and $I_I$ denotes the $I$-dimensional identity matrix. Consequently, the expected observation $Y_0$ obtained at the central node 0 is given by

$$Y_0 = u_0^\top X s = a\, s_0 + b \sum_{i=1}^{I} s_i,$$

whereas the expected observations at the non-central nodes are of the form

$$Y_i = u_i^\top X s = b\, s_0 + a\, s_i, \quad i = 1, \ldots, I.$$

2. Wheel-Network. Let node 0 again be the center of the network, which is connected to all other nodes of the network. Moreover, each of the remaining nodes is connected to two other non-central nodes (see right panel of Fig. 1 for the case $I = 4$). As for the star network, let $a$ be the influence of the state $s_i$ on the expected observation at the corresponding node $i$ ($i = 0, \ldots, I$), whereas $b$ denotes the influence of the states $s_j$ ($j \neq i$) of the adjacent nodes on the expected observation taken at node $i$. Using (1), the influence matrix $X$ is given by

$$X = \begin{pmatrix} A & B^\top \\ B & \tilde{X} \end{pmatrix},$$

where the matrices $A \in \mathbb{R}^{2\times 2}$ and $B \in \mathbb{R}^{(I-1)\times 2}$ are of the form

$$A = \begin{pmatrix} a & b \\ b & a \end{pmatrix}, \qquad B = \Big( b\, 1_{I-1},\; b\,\big(u^{I-1}_1 + u^{I-1}_{I-1}\big) \Big),$$

where $u^{I-1}_j$ denotes the $j$-th unit vector in $\mathbb{R}^{I-1}$.

The matrix $\tilde{X} \in \mathbb{R}^{(I-1)\times(I-1)}$ is a tridiagonal matrix with main diagonal elements equal to $a$, whereas the lower and upper diagonal elements are equal to $b$, that is

$$\tilde{X} = \begin{pmatrix} a & b & & \\ b & a & \ddots & \\ & \ddots & \ddots & b \\ & & b & a \end{pmatrix}.$$

Based on the structure of the network and on the notation introduced beforehand, the expected observation at the central node is again given by

$$Y_0 = u_0^\top X s = a\, s_0 + b \sum_{i=1}^{I} s_i,$$

whereas the expected observations at the non-central nodes are of the form

$$Y_1 = u_1^\top X s = b s_0 + b s_2 + b s_I + a s_1,$$
$$Y_i = u_i^\top X s = b s_0 + b s_{i-1} + b s_{i+1} + a s_i, \quad i = 2, \ldots, I-1,$$
$$Y_I = u_I^\top X s = b s_0 + b s_{I-1} + b s_1 + a s_I .$$
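The construction of the influence matrix in Example 1 can be illustrated with a short sketch. It assumes the form $X = aI + bA$ with an unweighted adjacency matrix $A$; the function names are illustrative and do not come from the paper.

```python
# Sketch: influence matrix X = a*I + b*A for the star and the wheel network.
# All function names are hypothetical helpers introduced for illustration.

def star_adjacency(I):
    """Adjacency matrix of a star: node 0 is linked to nodes 1..I."""
    A = [[0] * (I + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        A[0][i] = A[i][0] = 1
    return A

def wheel_adjacency(I):
    """Adjacency matrix of a wheel: a star plus the rim cycle 1-2-...-I-1."""
    A = star_adjacency(I)
    for i in range(1, I + 1):
        j = i + 1 if i < I else 1   # next node on the rim, wrapping around
        A[i][j] = A[j][i] = 1
    return A

def influence_matrix(A, a, b):
    """X = a * identity + b * adjacency."""
    n = len(A)
    return [[(a if i == j else 0) + b * A[i][j] for j in range(n)]
            for i in range(n)]

def expected_observations(X, s):
    """Expected observation vector Y = X s."""
    return [sum(x * sj for x, sj in zip(row, s)) for row in X]

X_star = influence_matrix(star_adjacency(4), a=2, b=1)
X_wheel = influence_matrix(wheel_adjacency(4), a=2, b=1)
s = [10, 1, 2, 3, 4]
print(expected_observations(X_star, s))
print(expected_observations(X_wheel, s))
```

For the star network each terminal row contains only $b$ (central node) and $a$ (own state), while in the wheel network each rim node additionally picks up $b$ times the states of its two rim neighbours.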
In practice, observations of the form Y given in (1) are not available: On the one hand the expected observations Y might be corrupted by random measurement errors, on the other hand the states at the different nodes might not be fixed to s, but also random. Consequently, the vector s only describes the expected state of the network. Nevertheless, the aim of the present paper is to estimate the unknown expected state vector s or a linear aspect L s of it using random observations Y 1 , . . . , Y N at the different nodes of the network. For this purpose, we introduce two different models, called Model A and Model B.

Model A
In the first scenario, we assume that at each time point n ∈ {1, . . . , N }, one observation Y n at one particular node i(n) ∈ {0, . . . , I } is available, where Y n is a linear combination of the random states vector S n = (S 0,n , . . . , S I ,n ) of the network at that time point and an additive measurement error E n . Furthermore, the distance between two consecutive time points is assumed to be sufficiently large, so that we assume that Y 1 , . . . , Y N are successive independent univariate observations at different nodes of the network.
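Under the assumption that the random state vector $S_n$ at time point $n$ is of the form

$$S_n = s + Z_n, \quad n = 1, \ldots, N,$$

where $s$ is the expected state vector of the network and $Z_1, \ldots, Z_N$ are independent random vectors with mean $0$ and covariance matrix $\rho_Z \sigma^2 I_{(I+1)\times(I+1)}$, $\rho_Z \ge 0$, $\sigma^2 \ge 0$, the $n$-th observation $Y_n$ at the node $i(n)$ is given by

$$Y_n = u_{i(n)}^\top X S_n + E_n = u_{i(n)}^\top X s + u_{i(n)}^\top X Z_n + E_n, \tag{7}$$

where $X$ is the influence matrix, $u_i$ is the $(i+1)$-th unit vector, and the independent measurement errors $E_1, \ldots, E_N$ have mean $0$ and variance $\rho_E \sigma^2$. The random elements $E_1, \ldots, E_N$ and $S_1, \ldots, S_N$ are also assumed to be independent. Choosing either $\rho_E = 0$ or $\rho_Z = 0$, we obtain either a model without measurement errors or a model with non-random states at the different nodes, respectively. The variance of an observation $Y_n$ in model (7) is given by

$$\operatorname{Var}(Y_n) = \sigma^2\, \sigma^2_{i(n)},$$

where the variance factor $\sigma^2_{i(n)}$ at node $i(n)$ is of the form

$$\sigma_i^2 = \rho_Z\, u_i^\top X X^\top u_i + \rho_E, \quad i = 0, \ldots, I.$$

Let $D := \operatorname{diag}(\sigma_0, \ldots, \sigma_I)$ denote the diagonal matrix with diagonal elements $\sigma_0, \ldots, \sigma_I$. Using $\frac{1}{\sigma_{i(n)}} u_{i(n)}^\top X s = u_{i(n)}^\top D^{-1} X s$, we define transformed random variables $\tilde{Y}_n$ by

$$\tilde{Y}_n = \frac{1}{\sigma_{i(n)}} Y_n = u_{i(n)}^\top D^{-1} X s + \tilde{E}_n, \tag{9}$$

where $\tilde{E}_1, \ldots, \tilde{E}_N$ are independent with mean $0$ and variance $\sigma^2$. Note that the model given in (9) is a linear model with homoscedastic errors, where the experimental condition at time point $n$ is given by node $i(n)$, $n = 1, \ldots, N$. Hence, with $\tilde{Y} = (\tilde{Y}_1, \ldots, \tilde{Y}_N)^\top$, $\tilde{E} = (\tilde{E}_1, \ldots, \tilde{E}_N)^\top$, $U_d = (u_{i(1)}, \ldots, u_{i(N)})^\top$ and $X_d = U_d D^{-1} X$, we obtain

$$\tilde{Y} = X_d\, s + \tilde{E}, \tag{10}$$

where the best linear unbiased estimator for an aspect $Ls$ in (10) is given by

$$L\hat{s}(\tilde{Y}) = L\, (X_d^\top X_d)^{-} X_d^\top \tilde{Y}.$$

The corresponding covariance matrix of that estimator is

$$\operatorname{Cov}\big(L\hat{s}(\tilde{Y})\big) = \sigma^2 L\, (X_d^\top X_d)^{-} L^\top = \frac{\sigma^2}{N}\, L\, \big(X^\top D^{-1} D_\delta D^{-1} X\big)^{-} L^\top,$$

where $D_\delta = \operatorname{diag}(\delta_0, \ldots, \delta_I)$ and $\delta_i = n_i / N$ with $n_i$ the number of observations taken at node $i$, $i = 0, \ldots, I$. Note that $\delta_i$ is equal to the relative amount of observations taken at node $i$, $i = 0, \ldots, I$. In order to use the established methods of optimal design theory for approximate designs, we further relax the condition on the values of $\delta_0, \delta_1, \ldots, \delta_I$ and assume that

$$\delta = (\delta_0, \ldots, \delta_I)^\top \in \Delta, \tag{11}$$

where the set

$$\Delta = \Big\{ \delta \in [0,1]^{I+1} : \sum_{i=0}^{I} \delta_i = 1 \Big\}$$

denotes the set of all approximate designs $\delta$ with support at the nodes $0, \ldots, I$.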
If an approximate design $\delta$ is given and $N$ observations can be taken, a rounding procedure is applied to obtain integers $n_0, \ldots, n_I$ from the not necessarily integer-valued quantities $\delta_i N$ (see Pukelsheim and Rieder (1992)). Then, the design problem reduces to the determination of an approximate design $\delta = (\delta_0, \ldots, \delta_I)^\top \in \Delta$ such that the covariance matrix $\operatorname{Cov}(L\hat{s}(\tilde{Y}))$ becomes small in some sense. Since the interest lies in estimating $Ls$, we are interested in determining the widely used A-optimal designs. More precisely, following Pukelsheim (2006), p. 137, a design $\delta^* \in \Delta$ is called A-optimal if it minimizes the trace of the covariance matrix, i.e.

$$\delta^* \in \arg\min_{\delta \in \Delta}\; \operatorname{tr}\Big( L\, \big(X^\top D^{-1} D_\delta D^{-1} X\big)^{-} L^\top \Big) \tag{12}$$

with $D_\delta = \operatorname{diag}(\delta_0, \ldots, \delta_I)$. In the case of non-random states at the different nodes (i.e. $\rho_Z = 0$), we set $\rho_E = 1$ without loss of generality and the design problem stated in (12) reduces to

$$\delta^* \in \arg\min_{\delta \in \Delta}\; \operatorname{tr}\Big( L\, \big(X^\top D_\delta X\big)^{-} L^\top \Big). \tag{13}$$
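The rounding step from an approximate design $\delta$ to integer sample sizes $n_0, \ldots, n_I$ can be sketched as follows. The code follows one common formulation of efficient rounding (multiplier $N - \ell/2$ for $\ell$ support points, followed by one-by-one corrections); this formulation and the tie-breaking rule (first index wins) are assumptions of the sketch and are not taken from the present text.

```python
import math

def efficient_rounding(delta, N):
    """Round design weights delta (summing to 1) to integers n_i with sum N.

    Assumed scheme: start with ceil((N - l/2) * delta_i) on the l support
    points, then add or remove single observations until the total is N.
    """
    support = [i for i, w in enumerate(delta) if w > 0]
    nu = N - len(support) / 2
    n = [math.ceil(nu * w) if w > 0 else 0 for w in delta]
    while sum(n) != N:
        if sum(n) < N:
            # add an observation where n_i / delta_i is smallest
            j = min(support, key=lambda i: n[i] / delta[i])
            n[j] += 1
        else:
            # remove an observation where (n_i - 1) / delta_i is largest
            k = max(support, key=lambda i: (n[i] - 1) / delta[i])
            n[k] -= 1
    return n

print(efficient_rounding([1/3, 1/6, 1/6, 1/6, 1/6], 12))
```

Nodes with weight zero receive no observations, so the rounded design keeps the support of the approximate design.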

Model B
For electrical power distribution grids, it is more realistic to assume that for each time point $n \in \{1, \ldots, N\}$, the observation $Y_n$ consists of simultaneous observations at all nodes $i = 0, 1, \ldots, I$ of the network. Hence, $Y_n$ is an $(I+1)$-dimensional random vector. If the distance between two consecutive time points is sufficiently large, we can still assume that $Y_1, \ldots, Y_N$ are independent random vectors. With the notation of the previous section, the $n$-th observation is an $(I+1)$-dimensional random vector $Y_n$ of the form

$$Y_n = X S_n + E_n = X s + X Z_n + E_n,$$

where the measurement errors $E_1, \ldots, E_N$ and random effects $Z_1, \ldots, Z_N$ are independent $(I+1)$-dimensional random vectors with mean vector $0_{I+1}$.
Additionally, we assume that the components of the measurement error E n are independent, whereas the entries of the random effect Z n might be correlated indicating a dependence between the states of different nodes. More precisely, the covariance matrices of E n and Z n are assumed to be of the form Cov where the entries of D E indicate the different accuracies with which the states are measured at the different nodes. Then, the covariance matrix of of the observation Y n is given by Since Y 1 , . . . , Y N are independent, the covariance matrix of the vector of all available where ⊗ denotes the Kronecker product and I N ×N is the N × N identity matrix.
Writing $\Sigma_Y$ for the covariance matrix of a single observation $Y_n$ and transforming the vector of observations by the inverse square root of its covariance matrix, we obtain a linear model with homoscedastic errors. Hence, the best linear unbiased estimator for $Ls$ is given by

$$L\hat{s}(Y) = L\, \big(X^\top \Sigma_Y^{-1} X\big)^{-} X^\top \Sigma_Y^{-1}\, \frac{1}{N}\sum_{n=1}^{N} Y_n .$$

The corresponding covariance matrix is of the form

$$\operatorname{Cov}\big(L\hat{s}(Y)\big) = \frac{1}{N}\, L\, \big(X^\top \Sigma_Y^{-1} X\big)^{-} L^\top .$$

If the influence matrix $X$ of the network is non-singular, the covariance matrix further reduces to

$$\operatorname{Cov}\big(L\hat{s}(Y)\big) = \frac{1}{N}\, L\, X^{-1} \Sigma_Y (X^{-1})^\top L^\top .$$

The covariance $\operatorname{Cov}(L\hat{s}(Y))$ directly depends on the matrix $D_E$ in (15), whose diagonal entries indicate the inaccuracy of the applied measurement procedures at the different nodes. More precisely, if the applied measurement procedure is precise at node $i$, the variance $\sigma^2_{iE}$ will be small ($i = 0, \ldots, I$). In the following, we assume that the quantitative relation of all available measurement procedures to one reference measurement procedure is known, i.e. the constants $c_i = \sigma^2_{iE} / \sigma^2_*$, $i = 0, \ldots, I$, are known, where $\sigma^2_*$ is the variance of the reference measurement procedure. In the context of electrical power distribution grids, this means that the practitioner does not know the exact precision of a particular measurement procedure, but has knowledge about its relative precision compared to the best available procedure based on sensors. We now address the design problem of allocating the different available measurement procedures at the different nodes such that the resulting covariance matrix $\operatorname{Cov}(L\hat{s}(Y))$ becomes small in some sense and such that the estimation of the linear aspect $Ls$ is precise. For that purpose, we define the precision $\delta_i$ of the measurement procedure applied at node $i$ as the reciprocal of the corresponding diagonal entry of $D_E$, so that $D_\delta = D_E^{-1}$. We further assume that the sum of these precisions is bounded by a possibly unknown constant $K < \infty$, i.e. $\sum_{i=0}^{I} \delta_i \le K < \infty$. Note that this can be achieved under the condition that the constants $c_0, \ldots, c_I$ are known. Since $D_\delta = D_E^{-1}$ and since the covariance matrix $\operatorname{Cov}(E_n)$ is formulated in terms of an overall variance $\sigma^2$ (cf. (15)), we can assume that $\sum_{i=0}^{I} \delta_i \le 1$ without loss of generality.
As the optimal design will be located at the boundary of that condition, we can restrict ourselves to the side condition $\sum_{i=0}^{I} \delta_i = 1$ so that the set of admissible designs is given by

$$\tilde{\Delta} = \Big\{ \delta \in (0,1)^{I+1} : \sum_{i=0}^{I} \delta_i = 1 \Big\},$$

which is a subset of the set $\Delta$ for Model A introduced in (11). The reformulation in terms of $\delta$ leads to the design problem

$$\delta^* \in \arg\min_{\delta \in \tilde{\Delta}}\; \operatorname{tr}\Big( L\, \big(X^\top D_\delta X\big)^{-1} L^\top \Big), \tag{16}$$

which is similar to the design problem (13) in Model A with non-random states ($\rho_Z = 0$). Note that, in contrast to the situation of (13), the regularity of the influence matrix $X$ and the restriction to $\tilde{\Delta}$ are necessary to define the design problem stated in (16).
Note that a solution of the design problem stated in (16) might not exist, because the set $\tilde{\Delta}$ is not compact (the boundaries are excluded).

A general result for A-optimal designs in Models A and B
Models A and B lead to design problems of the form

$$\min_{\delta}\; \operatorname{tr}\Big( L\, \big(X^\top D_\delta X\big)^{-} L^\top \Big),$$

where $X$ is an $(I+1)\times(I+1)$ matrix. Since $X$ is a square matrix, the problem at hand is a design problem with minimum support. It is easy to see that the D-optimal design for $Ls = s$ in this case is given by $\delta_0 = \delta_1 = \ldots = \delta_I = \frac{1}{I+1}$. However, the A-optimal designs are of a different form. We now assume that $X$ is a non-singular matrix so that its inverse $X^{-1}$ exists. Then the following proposition holds.

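Proposition 1 If $X$ is non-singular, then the design $\delta^* = (\delta^*_0, \ldots, \delta^*_I)^\top$ with

$$\delta^*_i = \frac{\sqrt{v_i}}{\sum_{j=0}^{I} \sqrt{v_j}}, \qquad v_i = u_i^\top (X^{-1})^\top L^\top L\, X^{-1} u_i, \quad i = 0, \ldots, I, \tag{18}$$

is A-optimal for estimating $Ls$.

Proof According to the General Equivalence Theorem for A-optimality, see Pukelsheim (2006), Theorem 7.19, with $p = -1$ and $K = L^\top$, a design $\delta^*$ is A-optimal if and only if the inequality

$$x_i^\top M(\delta^*)^{-1} L^\top L\, M(\delta^*)^{-1} x_i \;\le\; \operatorname{tr}\big( L\, M(\delta^*)^{-1} L^\top \big) \tag{19}$$

is satisfied for all nodes $i = 0, 1, \ldots, I$, where $M(\delta) = X^\top D_\delta X$ and the vector $x_i$ is given by $x_i = X^\top u_i$ in the situation under consideration.

Using that $X$ is non-singular, we obtain

$$\operatorname{tr}\big( L\, X^{-1} D_{\delta^*}^{-1} (X^{-1})^\top L^\top \big) = \sum_{j=0}^{I} \frac{v_j}{\delta^*_j} = \Big( \sum_{j=0}^{I} \sqrt{v_j} \Big)^2$$

for the right-hand side of (19), whereas the left-hand side reduces to

$$u_i^\top D_{\delta^*}^{-1} (X^{-1})^\top L^\top L\, X^{-1} D_{\delta^*}^{-1} u_i = \frac{v_i}{(\delta^*_i)^2} = \Big( \sum_{j=0}^{I} \sqrt{v_j} \Big)^2 .$$

Consequently, equality holds in (19) for all $i = 0, \ldots, I$ and the equivalence theorem for A-optimality is satisfied. That provides the assertion.

Note that the $v_i$ defined in (18) are the diagonal elements of the matrix $(X^{-1})^\top L^\top L\, X^{-1}$.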
By setting , and (16), respectively, are given by where D = diag(σ 0 , σ 1 , . . . , σ I ) and The assertion for Model A with non-random states follows by ρ Z = 0. Note again that the A-optimal design for Model B can be obtained by the A-optimal design for Model A by setting ρ Z = 0, ρ E = 1, which means that the states are nonrandom. Then the assertion for Model B follows as well.
Hence, as soon as the inverse $X^{-1}$ of the influence matrix is determined, the A-optimal design is available. For large complex networks, this inverse can only be determined numerically. However, for some simple networks, $X^{-1}$ can be calculated analytically. This is the case, for example, for the star network introduced in Example 1, as shown in the next section.
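For networks where the inverse is only available numerically, the computation can be sketched as follows. The sketch assumes that the A-optimal weights are proportional to $\sigma_i \sqrt{v_i}$, with $v_i$ the $i$-th diagonal element of $(X^{-1})^\top L^\top L X^{-1}$ and $\sigma_i^2 = \rho_Z\, u_i^\top X X^\top u_i + \rho_E$ (so $\sigma_i = 1$ for non-random states and for Model B). The function names and the plain Gauss-Jordan inversion are illustrative choices.

```python
import math

def invert(M):
    """Inverse of a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("influence matrix is numerically singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def a_optimal_design(X, L=None, rho_Z=0.0, rho_E=1.0):
    """Weights proportional to sigma_i * sqrt(v_i); L defaults to the identity."""
    n = len(X)
    if L is None:
        L = [[float(i == j) for j in range(n)] for i in range(n)]
    Xinv = invert(X)
    # v_i = squared norm of the i-th column of L X^{-1}
    LX = [[sum(L[r][k] * Xinv[k][i] for k in range(n)) for i in range(n)]
          for r in range(len(L))]
    v = [sum(LX[r][i] ** 2 for r in range(len(L))) for i in range(n)]
    # sigma_i^2 = rho_Z * (squared norm of row i of X) + rho_E
    sigma2 = [rho_Z * sum(x * x for x in X[i]) + rho_E for i in range(n)]
    roots = [math.sqrt(sigma2[i] * v[i]) for i in range(n)]
    total = sum(roots)
    return [r / total for r in roots]

# Star network with I = 4, a = 1, b = 1 (non-singular since a^2 != I b^2)
X_star = [[1, 1, 1, 1, 1],
          [1, 1, 0, 0, 0],
          [1, 0, 1, 0, 0],
          [1, 0, 0, 1, 0],
          [1, 0, 0, 0, 1]]
print(a_optimal_design(X_star))
```

The closed form follows from the Cauchy-Schwarz inequality: the criterion equals $\sum_i \sigma_i^2 v_i / \delta_i$, which is minimized over weights summing to one exactly when $\delta_i \propto \sigma_i \sqrt{v_i}$.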

A-optimal designs in a star network
Networks with a star configuration, star networks for short, are simple but realistic networks for electrical power distribution grids, as e.g. Su and Wang (2020) and Azhdari and Ardakan (2022) pointed out. They consist of a central or outgoing node 0 which is connected to all other nodes $i = 1, \ldots, I$ of the network, whereas the other nodes are terminal nodes that are only connected to the central node 0. Therefore, we now concentrate on the situation introduced in the first part of Example 1 with the influence matrix $X$ of the star network given by

$$X = \begin{pmatrix} a & b\, 1_I^\top \\ b\, 1_I & a\, I_I \end{pmatrix}. \tag{20}$$

Note again that $a$ describes the influence of the state on the observation at the respective node, whereas $b$ denotes the amount of influence of the states at the adjacent nodes on that observation. We are now interested in the analytic determination of the corresponding A-optimal designs if the influence matrix is given by (20). For that purpose, Theorem 1 is only applicable if the influence matrix $X$ is non-singular. Therefore, we state an equivalent condition for the non-singularity of $X$ given in (20) in the following lemma.

Lemma 1 For the influence matrix in (20), it holds:
a) $u_0^\top X X^\top u_0 = a^2 + I b^2$ and $u_i^\top X X^\top u_i = a^2 + b^2$ for $i = 1, \ldots, I$.
b) $X$ is non-singular if and only if $b^2 \neq \frac{a^2}{I}$.

Proof The statement in a) directly follows. b) Note that the determinant of $X$ is given by

$$\det(X) = a^{I-1}\,\big(a^2 - I b^2\big),$$

which is non-zero if and only if $b^2 \neq \frac{a^2}{I}$.
Contrary to what one might expect, the influence matrix $X$ stays non-singular if the total influence of the non-central nodes on the central node is equal to the influence of the central node itself, i.e. $Ib = a$. Instead, the matrix $X$ becomes singular if $I b^2 = a^2$, although this combination has no obvious effect on the structure of the star network. Nevertheless, the non-singularity of the influence matrix $X$ has a direct impact on the availability of an appropriate estimator of the complete expected state vector $s$: $s$ is identifiable, and thus estimable, if and only if the influence matrix $X$ is non-singular. In the next section, we derive the A-optimal design for the complete state vector $s$ under the assumption of identifiability.
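The singularity condition for the star network can be checked exactly on small instances. The sketch below assumes the determinant formula $\det(X) = a^{I-1}(a^2 - I b^2)$, which follows from the Schur complement of the block $a I_I$; the helper names are illustrative.

```python
from fractions import Fraction

def star_influence(I, a, b):
    """Star influence matrix with exact rational entries."""
    X = [[Fraction(0)] * (I + 1) for _ in range(I + 1)]
    for i in range(I + 1):
        X[i][i] = Fraction(a)
    for i in range(1, I + 1):
        X[0][i] = X[i][0] = Fraction(b)
    return X

def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n, det = len(M), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det          # a row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return det

def star_determinant_formula(I, a, b):
    """Assumed closed form a^(I-1) * (a^2 - I b^2)."""
    a, b = Fraction(a), Fraction(b)
    return a ** (I - 1) * (a * a - I * b * b)

# I b^2 = a^2 (singular) versus I b = a (still non-singular)
print(determinant(star_influence(4, 2, 1)), determinant(star_influence(4, 4, 1)))
```

With $I = 4$, the choice $a = 2$, $b = 1$ satisfies $I b^2 = a^2$ and gives a singular matrix, whereas $a = 4$, $b = 1$ satisfies $Ib = a$ and leaves $X$ invertible.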

A-Optimal designs for the complete state vector s under identifiability
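Lemma 2 If $b^2 \neq \frac{1}{I} a^2$, then the inverse of the influence matrix $X$ given in (20) is

$$X^{-1} = \frac{1}{a^2 - I b^2} \begin{pmatrix} a & -b\, 1_I^\top \\ -b\, 1_I & \frac{a^2 - I b^2}{a}\, I_I + \frac{b^2}{a}\, 1_{I\times I} \end{pmatrix},$$

where $1_{I\times I} \in \mathbb{R}^{I\times I}$ is the $I \times I$ matrix consisting of ones.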
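Proof It is well known that for symmetric matrices $A$ and $C$, where $C$ and $E = A - B C^{-1} B^\top$ are non-singular, it holds

$$\begin{pmatrix} A & B \\ B^\top & C \end{pmatrix}^{-1} = \begin{pmatrix} E^{-1} & -E^{-1} B C^{-1} \\ -C^{-1} B^\top E^{-1} & C^{-1} + C^{-1} B^\top E^{-1} B C^{-1} \end{pmatrix}$$

(see, e.g., Rencher (1998), p. 407).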
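Applying this with $A = a$, $B = b\, 1_I^\top$ and $C = a\, I_I$ gives $E = \frac{a^2 - I b^2}{a}$ and thus, after collecting terms, the entries of $X^{-1}$.

Hence Theorem 1 and Lemma 1 a) provide the following theorem:

Theorem 2 If $b^2 \neq \frac{1}{I} a^2$, then the A-optimal design $\delta^* = (\delta^*_0, \delta^*_1, \ldots, \delta^*_I)^\top$ for estimating the expected state vector $s$ in the star network is given by

$$\delta^*_0 = \frac{\sqrt{w_0}}{\sqrt{w_0} + I \sqrt{w_1}}, \qquad \delta^*_i = \frac{\sqrt{w_1}}{\sqrt{w_0} + I \sqrt{w_1}}, \quad i = 1, \ldots, I,$$

with $w_0 = v_0 \big( (a^2 + I b^2) \rho_Z + \rho_E \big)$ and $w_1 = v_1 \big( (a^2 + b^2) \rho_Z + \rho_E \big)$ in Model A, and by

$$\delta^*_0 = \frac{\sqrt{v_0}}{\sqrt{v_0} + I \sqrt{v_1}}, \qquad \delta^*_i = \frac{\sqrt{v_1}}{\sqrt{v_0} + I \sqrt{v_1}}, \quad i = 1, \ldots, I,$$

in Model B, where

$$v_0 = a^2 + I b^2, \qquad v_1 = b^2 + \frac{\big(a^2 - (I-1) b^2\big)^2 + (I-1)\, b^4}{a^2} .$$

Proof Setting $L = I_{(I+1)\times(I+1)}$, we obtain by Theorem 1 that only the terms $u_i^\top X X^\top u_i\, \rho_Z + \rho_E$ and $u_i^\top (X^{-1})^\top X^{-1} u_i$ are needed. Lemma 1 a) provides the additional terms $u_i^\top X X^\top u_i\, \rho_Z + \rho_E$ in Model A. The common terms in both models are $u_i^\top (X^{-1})^\top X^{-1} u_i$, where the inverse $X^{-1}$ is given by Lemma 2. At first note that the factor $\frac{1}{a^2 - I b^2}$ of $X^{-1}$ cancels out in $\delta^*_i$ so that we only have to consider $(a^2 - I b^2)^2\, u_i^\top (X^{-1})^\top X^{-1} u_i$. Here, we get $a^2 + I b^2$ for $i = 0$ and

$$b^2 + \frac{\big(a^2 - (I-1) b^2\big)^2 + (I-1)\, b^4}{a^2}$$

for $i = 1, \ldots, I$. Hence these are the common terms for $w$ and $v$ in Model A and Model B, respectively.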
Remark 1 Note that $\delta^*_1 = \ldots = \delta^*_I$, i.e. the terminal nodes are treated equally, whereas the central node 0 obtains a different value $\delta^*_0$ in general. A special case of the star network is $b = 0$, where the states of the adjacent nodes do not influence the observation at a particular node. Then the terms for the central node and for the terminal nodes coincide in both models so that $\delta^*_0 = \delta^*_1 = \ldots = \delta^*_I = \frac{1}{I+1}$. Hence, the A-optimal design is equal to the A-optimal design obtained in the classical model, where $I + 1$ independent levels of one factor are considered.
If the star network only consists of two nodes, i.e. $I = 1$, these terms also coincide in both models so that $\delta^*_0 = \delta^*_1 = \frac{1}{2}$ for any $b$ with $b^2 \neq a^2$. Hence the design does not depend on the adjacent effect $b$. This is not the case for $I > 1$, which will be considered in detail in the following example.
Note that the A-optimal design according to the formula stated in Theorem 2 can also be calculated in the case $b^2 = \frac{1}{I} a^2$, i.e. in the case of a singular matrix $X$, since the factor $\frac{1}{a^2 - I b^2}$ in $X^{-1}$ cancels out and appears neither in the formula nor in the proof. Nevertheless, the whole state vector $s$ is not identifiable for $b^2 = \frac{1}{I} a^2$, which is shown in the next section.

Example 2
We investigate the behaviour of the A-optimal design in dependence on different values of $a$ and $b$ in the situation of Model A, where either non-random states are given, i.e. $\rho_Z = 0$, or no measurement errors occur, i.e. $\rho_E = 0$. The A-optimal designs only depend on the relationship between $a$ and $b$, and we can set $a = 1$ without loss of generality. Hence, Fig. 2 shows the optimal values for $\delta^*_0$ depending on the quantity $b$ for a star network with $I = 4, 9, 25$ terminal nodes and $a = 1$ for Model A with non-random states given by $\rho_Z = 0$ (left-hand side) and for Model A with no measurement errors given by $\rho_E = 0$ (right-hand side). Note that the designs for Model B coincide with those of Model A with non-random states if $X$ is non-singular. In particular, Fig. 2 shows that the special case $b^2 = \frac{1}{I}$, where the state vector $s$ is not identifiable, leads to a smooth continuation of the case $b^2 \neq \frac{1}{I}$. Furthermore, if the influence $b$ of the central node 0 goes to infinity, then the optimal weight $\delta^*_0$ at the central node 0 goes to zero. This means that only a small proportion of observations should be taken at the central node if it has a big influence on its adjacent nodes and vice versa. Surprisingly, the optimal weight $\delta^*_0$ increases for $b^2 < 1/I$ and decreases for $b^2 > 1/I$, i.e., it changes direction exactly at the value of $b$ where the state vector $s$ is not identifiable. Moreover, Model A with random states and no measurement errors provides larger weights $\delta^*_0$ at the central node than Model A with non-random states and measurement errors. Probably, this is caused by the increased uncertainty given by the random states.
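The behaviour described in Example 2 can be reproduced with a short sketch. It assumes that the weights are proportional to $\sigma_i \sqrt{v_i}$, with the squared column norms of $X^{-1}$ (up to the common factor that cancels) obtained from the block inverse of the star matrix; these closed forms are derived here and are not copied from a displayed formula of the paper.

```python
import math

def star_design(I, a, b, rho_Z=0.0, rho_E=1.0):
    """A-optimal weights (delta_0, delta_terminal) for estimating s in a star.

    rho_Z = 0, rho_E = 1 corresponds to non-random states (and to Model B).
    """
    # Squared column norms of X^{-1}, up to the factor 1 / (a^2 - I b^2)^2
    v0 = a**2 + I * b**2
    v1 = b**2 + ((a**2 - (I - 1) * b**2) ** 2 + (I - 1) * b**4) / a**2
    # Variance factors sigma_i^2 = rho_Z * u_i' X X' u_i + rho_E
    s0 = rho_Z * (a**2 + I * b**2) + rho_E
    s1 = rho_Z * (a**2 + b**2) + rho_E
    r0, r1 = math.sqrt(s0 * v0), math.sqrt(s1 * v1)
    return r0 / (r0 + I * r1), r1 / (r0 + I * r1)

# Weight at the central node for I = 4, a = 1 and several values of b
for b in (0.0, 0.4, 0.5, 0.6, 2.0, 10.0):
    print(b, round(star_design(4, 1.0, b)[0], 4))
```

For $I = 4$ and non-random states, the central weight rises from $1/5$ at $b = 0$ to its maximum at $b^2 = 1/I$ and then decays towards zero as $b$ grows.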

Nonidentifiability in a star network
As mentioned in Remark 1, the state vector s is not identifiable, as soon as the influence matrix X becomes singular. In the case of the star network, this is equivalent to the case where b 2 = 1 I a 2 , where the influence matrix Moreover, the often used aspect , where the central node 0 is considered as control level, is not identifiable, since However, the aspect

A-optimal designs for the always identifiable aspect L s
The aim is to determine the A-optimal designs for estimating the aspect L s given by (22) in case of identifiability and nonidentifiability of s so that, according to (12), the design problem in the general Model A is given by with At first, we consider the case of nonidentifiability, i.e. b 2 = 1 I a 2 , where only Model A makes sense. For this case, we are now going to prove that the A-optimal design δ * = (δ * 0 , δ * 1 , . . . , δ * I ), i.e. a solution of (23), is given by δ * 0 = 0 and δ * 1 = . . . = δ * I = 1 I using the equivalence theorem of A-optimality. Hence, it is sufficient to consider the information matrix for δ * and the corresponding generalized inverse (since the information matrix is not invertible for δ * ). At first, note that Lemma 1 provides for Lemma 3 Set α := 1 I 1 a 2 I +1 I ρ Z +ρ E , then the information matrix I (δ * ) in Model A is given by Moreover, a generalized inverse of I (δ * ) is given by . (24), we obtain We are now going to show that the matrix I (δ * ) − proposed in (25) is a generalized inverse of I (δ * ) by checking whether I (δ * ) I (δ * ) − I (δ * ) = I (δ * ). Since a and α are multiplicative constants, we do not need to consider them. Without them, we get 1 Note that the generalized inverse calculated in Lemma 3 is the Moore-Penrose generalized inverse. We are now able to prove the A-optimality of the design δ * in case of non-identifiability. (20). If b 2 = 1 I a 2 and L = 1 √ I 1 I , I I ×I , the design δ * = (δ * 0 , δ * 1 , . . . , δ * I ) is A-optimal for estimating L s (i. e. a solution of (23)) if and only if δ * 0 = 0 and δ * 1 = δ * 2 = . . . = δ * I = 1 I .

Proof
According to the General Equivalence Theorem for A-optimality, see Pukelsheim (2006), Theorem 7.19, with for p = −1, K = L, a design δ * is A-optimal if and only if the inequality is satisfied for all nodes i = 0, 1, . . . , I , where the vector x i is given by The design matrix containing x 1 , . . . , x n is of the form where σ 2 1 = . . . = σ 2 I = a 2 I +1 I ρ z + ρ E = 1 I α according to (24) and u i is the i-th unit vector in R I . First, we consider the nodes {1, . . . , I } with vectors x i = 1 σ i (b, a u i ) with i = 1, . . . , I and check the inequality given by (26). In this situation, we get For the central node i = 0, we have x 0 = 1 σ 0 (a, b1 I ) . Using similar arguments, we obtain for the left hand side of inequality (26) where the last inequality follows by the fact that Hence, the equivalence theorem for A-optimality provides the assertion.
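The structure behind the nonidentifiable case can be checked exactly on a small instance. The sketch reads the aspect matrix as $L = \big(\tfrac{1}{\sqrt{I}} 1_I,\; I_{I\times I}\big)$ and uses $I = 4$, $a = 2$, $b = 1$, so that $b^2 = a^2/I$ and $\sqrt{I} = 2$ is exact; both the reading of $L$ and the variable names are assumptions of this illustration.

```python
from fractions import Fraction

I, a, b = 4, Fraction(2), Fraction(1)      # singular case: b^2 = a^2 / I

# Star influence matrix: row 0 is the central node, rows 1..I are terminal nodes
X = [[a] + [b] * I] + [
    [b] + [a if j == i else Fraction(0) for j in range(I)] for i in range(I)
]

# Row 0 is (b/a) times the sum of the terminal rows, so X has rank I
row0_from_terminals = [
    b / a * sum(X[i][j] for i in range(1, I + 1)) for j in range(I + 1)
]

# Aspect matrix L = (1/sqrt(I) * 1_I, identity); here 1/sqrt(I) = 1/2
L = [[Fraction(1, 2)] + [Fraction(int(j == i)) for j in range(I)]
     for i in range(I)]

# The rows of L are the terminal rows of X divided by a
terminal_rows_scaled = [[x / a for x in X[i]] for i in range(1, I + 1)]

print(row0_from_terminals == X[0], L == terminal_rows_scaled)
```

In this instance $Ls$ equals the vector of expected observations at the terminal nodes divided by $a$, which is why it stays identifiable from observations at the terminal nodes alone, while the central node adds no linearly independent information.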
In the remaining part of this section, we derive the A-optimal designs for $Ls$ when $s$ is identifiable, i.e., when $b^2 \neq \frac{1}{I} a^2$. This result is applicable in both Model A and Model B, as the influence matrix $X$ is non-singular.
Again, Lemma 1 provides the terms $u_i^\top X X^\top u_i\, \rho_Z + \rho_E$. Hence only $u_i^\top (X^{-1})^\top L^\top L\, X^{-1} u_i$ has to be calculated. Again, the factor $\frac{1}{a^2 - I b^2}$ of $X^{-1}$ cancels out in $\delta^*_i$ so that we only have to consider $(a^2 - I b^2)^2\, u_i^\top (X^{-1})^\top L^\top L\, X^{-1} u_i$.

Remark 2 Contrary to the estimation of the complete vector $s$ of states in Sect. 4.1, the A-optimal designs for estimating $Ls$ in the case $b^2 \neq \frac{1}{I} a^2$ cannot be extended to the case $b^2 = \frac{1}{I} a^2$, since the values $v$ and $w$ are then equal to zero. However, as in Sect. 4.1, in the case of no influence of adjacent nodes, i.e. $b = 0$, the terms for the central node and for the terminal nodes coincide in both models, so that the A-optimal design is again given by $\delta^*_0 = \delta^*_1 = \ldots = \delta^*_I = \frac{1}{I+1}$.
so that $\delta^*_1 = \frac{1}{2}$ for any $a$ and $b$ with $b^2 \neq a^2$. This design coincides with the A-optimal design for estimating the complete state vector $s$ (see Remark 1). In the case of more than one terminal node, i.e. $I > 1$, the A-optimal design for estimating the linear aspect $Ls$ depends on the relationship between $a$ and $b$, as shown in the following example.

Example 3
We consider a situation similar to that of Example 2. Therefore, we can set $a = 1$ without loss of generality, since the A-optimal designs are only influenced by the ratio of the values $a$ and $b$. Figure 3 shows the optimal values for $\delta^*_1$ for estimating $Ls$ depending on the quantity $b$ for $I = 4, 9, 25$ terminal nodes and $a = 1$. It shows in particular that the special case $b^2 = \frac{1}{I}$, where the state vector $s$ is not identifiable and the A-optimal design is given by $\delta^*_0 = 0$ and $\delta^*_1 = \ldots = \delta^*_I = \frac{1}{I}$, is not a smooth continuation of the case $b^2 \neq \frac{1}{I}$. Furthermore, if the influence $b$ of the central node 0 goes to infinity, the A-optimal weights $\delta^*_1 = \ldots = \delta^*_I$ at the terminal nodes $i = 1, \ldots, I$ go to $\frac{1}{I}$, i.e. they converge to the A-optimal weight in the case of no identifiability of $s$. This again means that only a small proportion of observations should be taken at the central node if it has a big influence on its adjacent nodes and vice versa. As for estimating the complete state vector $s$, Model A with random states and no measurement errors provides larger weights $\delta^*_0$ at the central node, and thus smaller weights at the terminal nodes, than Model A with non-random states and measurement errors. Nevertheless, the differences are relatively small.
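The limiting behaviour described in Example 3 can be illustrated numerically. The sketch reads the aspect matrix as $L = \big(\tfrac{1}{\sqrt{I}} 1_I,\; I_{I\times I}\big)$ and assumes weights proportional to $\sigma_i \sqrt{v_i}$, where $v_i$ is the squared norm of the $i$-th column of $L X^{-1}$ up to the common factor that cancels; the closed forms in the code are derived from the block inverse of the star matrix and are not copied from the paper.

```python
import math

def star_design_Ls(I, a, b, rho_Z=0.0, rho_E=1.0):
    """A-optimal weights (delta_0, delta_terminal) for estimating L s in a star."""
    if math.isclose(a**2, I * b**2):
        raise ValueError("b^2 = a^2 / I: s is not identifiable, formula not applicable")
    root_I = math.sqrt(I)
    c = (a**2 - I * b**2) / a      # coefficient of u_i in a terminal column
    d = b**2 / a                   # coefficient of the all-ones block
    e = d - b / root_I
    v0 = (a - root_I * b) ** 2     # central node
    v1 = (I - 1) * e**2 + (e + c) ** 2   # each terminal node
    s0 = rho_Z * (a**2 + I * b**2) + rho_E
    s1 = rho_Z * (a**2 + b**2) + rho_E
    r0, r1 = math.sqrt(s0 * v0), math.sqrt(s1 * v1)
    return r0 / (r0 + I * r1), r1 / (r0 + I * r1)

# Terminal weight for I = 4, a = 1: approaches 1/I = 0.25 as b grows
for b in (0.0, 1.0, 10.0, 1000.0):
    print(b, round(star_design_Ls(4, 1.0, b)[1], 5))
```

Both $v_0$ and $v_1$ vanish at $b^2 = a^2/I$, which is why the sketch refuses that input instead of returning a ratio of zeros.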

A-optimal designs in a wheel network
In this section, we consider the other network introduced in Example 1, namely the wheel network with $I + 1$ nodes. In that case, the network consists of the central node 0, which is connected to all remaining nodes, whereas each of the remaining nodes is connected to the central node and to two other nodes. More precisely, we consider the influence matrix $X$ of the form where the matrices $A \in \mathbb{R}^{2 \times 2}$ and $B \in \mathbb{R}^{(I-1) \times 2}$ are of the form and $u_j^{I-1}$ denotes the $j$-th unit vector in $\mathbb{R}^{I-1}$. The matrix $\tilde{X} \in \mathbb{R}^{(I-1) \times (I-1)}$ contained in $X$ is a Toeplitz-tridiagonal matrix with main diagonal elements equal to $a$, whereas the lower and upper diagonal elements are equal to $b$, that is,

$$\tilde{X} = \begin{pmatrix} a & b & & \\ b & a & \ddots & \\ & \ddots & \ddots & b \\ & & b & a \end{pmatrix}. \tag{29}$$

We are now interested in determining the A-optimal design for estimating the complete state vector $s$. In that case, the influence matrix $X$ given by (27) must be non-singular, otherwise the state vector will not be identifiable. The following Lemma 5 contains conditions on the influencing values $a > 0$ and $b > 0$ of the network, which ensure the non-singularity of $X$.
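Lemma 5 Let $X \in \mathbb{R}^{(I+1) \times (I+1)}$ be of the form (27) with corresponding matrices $A$ and $B$ of the form given by (28). Then the following statements hold: (a) The eigenvalues of the Toeplitz-tridiagonal matrix $\tilde{X}$ are given by

$$\lambda_i = a + 2b \cos\left(\tfrac{i}{I}\pi\right), \quad i = 1, \dots, I - 1. \tag{30}$$

The eigenvector corresponding to $\lambda_i$ is of the form

$$v_i = \sqrt{\tfrac{2}{I}} \left( \sin\left(\tfrac{i}{I}\pi\right), \dots, \sin\left(\tfrac{i(I-1)}{I}\pi\right) \right)^\top, \quad i = 1, \dots, I - 1. \tag{31}$$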
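Part (a) is easy to check numerically. The following Python sketch builds the Toeplitz-tridiagonal matrix for illustrative values of $I$, $a$ and $b$ (these values and the function names are assumptions chosen for the example) and compares it with the closed-form eigenvalues $\lambda_i = a + 2b\cos(\frac{i}{I}\pi)$ and the sine eigenvectors.

```python
import numpy as np


def toeplitz_tridiagonal(n, a, b):
    """n x n matrix with a on the main diagonal and b on both off-diagonals."""
    return a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))


def closed_form_eigen(I, a, b):
    """Eigenvalues and orthonormal eigenvectors of the (I-1) x (I-1) matrix."""
    i = np.arange(1, I)
    lam = a + 2.0 * b * np.cos(i * np.pi / I)
    # Column i-1 of V holds v_i with entries sqrt(2/I) * sin(i*j*pi/I), j = 1..I-1.
    V = np.sqrt(2.0 / I) * np.sin(np.outer(i, i) * np.pi / I)
    return lam, V


if __name__ == "__main__":
    I, a, b = 9, 1.0, 0.7  # illustrative values only
    X_tilde = toeplitz_tridiagonal(I - 1, a, b)
    lam, V = closed_form_eigen(I, a, b)
    # Largest entry of X_tilde v_i - lambda_i v_i over all i.
    residual = np.abs(X_tilde @ V - V * lam).max()
    print("max residual:", residual)
    print("orthonormal:", np.allclose(V.T @ V, np.eye(I - 1)))
```

Since the eigenvalues are distinct for $b \neq 0$, at most one of them can vanish for a given pair $(a, b)$.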
Proof (a). The statement is a well-known result used in many applications, for instance, in order to solve specific types of differential equations. The result can be found in Noschese et al. (2013) among others. (b). The matrix $\tilde{X}$ becomes singular if and only if one of its eigenvalues is equal to zero. The first statement of (b) follows by setting equation (30) equal to zero. The second statement of (b) follows from the fact that $\tilde{X}$ has $I - 1$ distinct eigenvalues. Consequently, at most one eigenvalue can be equal to zero, and in that case the eigenvectors corresponding to the non-zero eigenvalues span a space of dimension $I - 2$.
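(c). Under the assumption that $\tilde{X}$ is non-singular, the determinant of the influence matrix $X$ can be reformulated in terms of the Schur complement of $\tilde{X}$, that is,

$$\det(X) = \det(\tilde{X}) \, \det\left(A - B^\top \tilde{X}^{-1} B\right)$$

(see Harville (1997), Theorem 13.3.8). Since $\tilde{X}$ is non-singular, it holds that $\det(\tilde{X}) \neq 0$.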
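Consequently, $X$ is singular if and only if $\det(A - B^\top \tilde{X}^{-1} B) = 0$. We first concentrate on determining $B^\top \tilde{X}^{-1} B$. Using part (a), the inverse $\tilde{X}^{-1}$ can be represented in terms of the eigenvalues and eigenvectors of $\tilde{X}$, that is,

$$\tilde{X}^{-1} = \sum_{i=1}^{I-1} \frac{1}{\lambda_i} \, v_i v_i^\top, \tag{33}$$

where $\lambda_i$ and $v_i$ are given by (30) and (31), respectively.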
Due to the structure of the matrix $B$ given in (28), we obtain where the value $c_1$ contains the sum of all elements of the matrix $\tilde{X}^{-1}$, $c_2$ is the sum of the elements of the first and last rows of $\tilde{X}^{-1}$, and $c_3$ is the sum of the upper right, upper left, bottom right and bottom left elements of $\tilde{X}^{-1}$. Using the formula we obtain the expressions for $c_1$, $c_2$ and $c_3$ given in part (c).
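The two ingredients of the proof of part (c), the spectral representation of $\tilde{X}^{-1}$ and the Schur complement identity for the determinant, can be illustrated with a short Python sketch. The blocks `A` and `B` below are placeholders (a fixed $2 \times 2$ matrix and random entries), not the matrices given in (28); the values of $I$, $a$ and $b$ are likewise chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I, a, b = 9, 1.0, 0.4  # illustrative values; all eigenvalues of X_tilde are non-zero
n = I - 1

# Toeplitz-tridiagonal block and its closed-form eigen-decomposition.
X_tilde = a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))
i = np.arange(1, I)
lam = a + 2.0 * b * np.cos(i * np.pi / I)
V = np.sqrt(2.0 / I) * np.sin(np.outer(i, i) * np.pi / I)

# Spectral inverse: sum over i of v_i v_i^T / lambda_i.
X_tilde_inv = (V / lam) @ V.T

# Placeholder blocks (assumptions, NOT the matrices in (28)).
A = np.array([[a, b], [b, a]])
B = rng.normal(size=(n, 2))

# Partitioned matrix X = [[A, B^T], [B, X_tilde]] and its Schur complement.
X = np.block([[A, B.T], [B, X_tilde]])
schur = A - B.T @ X_tilde_inv @ B

det_direct = np.linalg.det(X)
det_schur = np.linalg.det(X_tilde) * np.linalg.det(schur)
print(det_direct, det_schur)
```

The same check applies to any blocks of matching dimension, as long as the Toeplitz-tridiagonal block is non-singular.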

Remark 3
If the matrices $\tilde{X}$ and $X$ are non-singular, Theorem 1 can be used to calculate the A-optimal design for estimating $s$. In particular, the inverse $X^{-1}$, which is needed for the determination of the A-optimal weights $\delta_0, \delta_1, \dots, \delta_I$, can be calculated by using formula (21) and the inverse of $\tilde{X}$ given by (33). Note that the non-singularity of the Toeplitz-tridiagonal matrix $\tilde{X}$ is not a necessary condition for the non-singularity of the influence matrix $X$. If the values $a$ and $b$ result in a singular $\tilde{X}$, the influence matrix $X$ can still be non-singular and Theorem 1 can still be applicable. In case of $\tilde{X}$ being singular, the influence matrix $X$ should be partitioned in the following way: where the matrices $\hat{A} \in \mathbb{R}^{3 \times 3}$ and $\hat{B} \in \mathbb{R}^{(I-2) \times 3}$ are of the form and the matrix $\hat{X}$ is a Toeplitz-tridiagonal matrix of dimension $I - 2$. Note that parts (a) and (b) of Lemma 5 also hold for the matrix $\hat{X}$. In particular, it follows that $\hat{X}$ is non-singular, as the fixed values $a, b > 0$ cannot result in a zero eigenvalue of both $\tilde{X}$ and $\hat{X}$. If $\hat{X}$ is non-singular, $X$ is non-singular if and only if $\det(\hat{A} - \hat{B}^\top \hat{X}^{-1} \hat{B}) \neq 0$. Note that the inverse of $\hat{X}$ can be determined by using part (a) of Lemma 5.
We conclude this section by considering an example of a wheel network, which is similar to Example 2. To determine the values of $b$ for which the influence matrix $X$ is singular, we use Lemma 5 and Remark 3. For each $I$ and the corresponding remaining values of $b$, the A-optimal designs are calculated numerically, again using Remark 3. First, we observe that the numerically calculated A-optimal weights $\delta_1^*, \dots, \delta_I^*$ of the non-central nodes coincide in the wheel network, that is, $\delta_1^* = \delta_2^* = \dots = \delta_I^*$, for $I = 4, 9, 25$. The weight of the central node is then given by $\delta_0^* = 1 - I \, \delta_1^*$. Note that the A-optimal weights for the non-central nodes also coincide in the star network (see Theorem 4), and it seems that the non-central nodes obtain equal weights as soon as they have a similar adjacency structure. Figure 4 shows the A-optimal values for $\delta_1^*$ depending on the quantity $b$ for the wheel network with $I = 4, 9, 25$ non-central nodes and $a = 1$ for Model A with nonrandom states given by $\rho_Z = 0$ (panel (a)) and for Model A with no measurement errors given by $\rho_E = 0$ (panel (b)). The vertical lines indicate the values of $b$ for which the influence matrix is singular and which thus result in a non-identifiable state vector $s$ (independently of the selected design). For $I = 4$, we obtain a singular influence matrix $X$ for $b = 0.5 \in [0, 3]$. For $I = 9$, the matrix $X$ is not invertible if $b \in \{0.464, 0.532, 1\} \subset [0, 3]$. If the wheel has $I = 25$ non-central nodes, the influence matrix $X$ is singular if $b$ is contained in the set $\{0.244, 0.504, 0.538, 0.618, 0.784, 1.174, 2.668\} \subset [0, 3]$. Similar to the star network, we observe that the weight $\delta_1^*$ is not monotonically increasing with $b$. Moreover, for $b$ going to infinity, the A-optimal weight $\delta_1^*$ goes again to $\frac{1}{I}$ in each of the considered cases. Consequently, the weight $\delta_0^*$ tends to zero, and a small number of observations at the central node is sufficient in the wheel network if the influence of the other nodes is large, and vice versa.
Comparing the A-optimal weight for Model A with random states to the corresponding weight for Model A with nonrandom states, the curves are similar for all numbers of nodes $I = 4, 9, 25$ under consideration. In particular, we observe that the weights $\delta_1^*$ are slightly smaller if random states are considered instead of nonrandom states. Note again that the designs for Model B coincide with those of Model A with nonrandom states, so that the left-hand side of Fig. 4 also describes the A-optimal weight $\delta_1^*$ in Model B.
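The values of $b$ at which the influence matrix becomes singular can also be located through the eigenvalues of the adjacency matrix of the wheel. The following Python sketch is a minimal illustration under the assumption that the influence matrix equals $a$ times the identity plus $b$ times the adjacency matrix of the wheel; this form is inferred from the verbal description of the network and is not a quotation of (27). Under this assumption, $X$ is singular exactly when $b = -a/\mu$ for a negative adjacency eigenvalue $\mu$. The function names are hypothetical.

```python
import numpy as np


def wheel_adjacency(I):
    """Adjacency matrix of a wheel: node 0 is central, nodes 1..I form a cycle."""
    adj = np.zeros((I + 1, I + 1))
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    for i in range(1, I + 1):
        j = i % I + 1  # next node on the rim, wrapping from I back to 1
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    return adj


def singular_b_values(I, a=1.0, b_max=3.0):
    """Values b in (0, b_max] for which a*Id + b*Adj is singular (assumed model)."""
    mu = np.linalg.eigvalsh(wheel_adjacency(I))
    negative = mu[mu < -1e-9]      # only negative eigenvalues give b > 0
    b = -a / negative
    b = b[b <= b_max]
    return sorted(set(np.round(b, 3)))  # rounding merges repeated eigenvalues


if __name__ == "__main__":
    print(singular_b_values(25))
```

For $I = 25$ and $a = 1$, the output can be compared with the set of singular values of $b$ listed above.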

Discussion
We have derived a general characterization of A-optimal designs for networks where all adjacent nodes have the same influence on the state of a node. For the simplest network, a network with star configuration, we derived the A-optimal designs explicitly. Moreover, we showed that the expected states are not always all identifiable, and we derived A-optimal designs for an aspect of the states which is always identifiable.
Moreover, we considered a more complex network with wheel configuration and derived analytical conditions on the influences of the states which ensure the identifiability of all states.
The star and the wheel configuration lead to similar results: depending on the influence of the states at adjacent nodes, the A-optimal design puts more or less weight at the central node than at the non-central nodes, while the non-central nodes always get equal weights. The higher the influence of the adjacent nodes, the smaller the weight at the central node should be. This means in particular that less precise measurements can be used at the central node, and the more precise measurements should be used at the non-central nodes, when the influence of adjacent nodes is high.
These results can also be used to simplify the numerical calculation of the A-optimal design, which is necessary due to the numerical instability of the original design problem. For instance, the numerical calculation for a mixture of a star and a wheel network can be simplified if the results about the A-optimal designs of the individual networks are used. The remaining problem can then be solved by several optimization algorithms, such as the multiplicative algorithm developed by Yu (2010) or a particle swarm optimization algorithm (see Kennedy and Eberhart (1995) among many others).
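To illustrate the multiplicative approach, the following Python sketch implements a generic multiplicative update for A-optimality, $w_i \leftarrow w_i d_i^{1/2} / \sum_j w_j d_j^{1/2}$ with $d_i = x_i^\top M(w)^{-2} x_i$. It assumes a design problem with finitely many candidate points, regression vectors $x_i$ stored as the rows of a hypothetical matrix `F`, and information matrix $M(w) = \sum_i w_i x_i x_i^\top$. It is a sketch of this class of algorithms only, not the implementation used for the results above, and it does not exploit any of the network-specific simplifications.

```python
import numpy as np


def a_criterion(F, w):
    """Trace of the inverse information matrix M(w) = sum_i w_i x_i x_i^T."""
    M = F.T @ (w[:, None] * F)
    return np.trace(np.linalg.inv(M))


def a_optimal_multiplicative(F, n_iter=200, power=0.5):
    """Multiplicative iteration for A-optimal weights (generic sketch).

    F     : array of shape (m, p); row i is the regression vector x_i
    power : exponent of the update; 0.5 is a common choice for A-optimality
    """
    m = F.shape[0]
    w = np.full(m, 1.0 / m)  # start from the uniform design
    for _ in range(n_iter):
        M = F.T @ (w[:, None] * F)
        Minv_x = np.linalg.solve(M, F.T)   # column i is M^{-1} x_i
        d = np.sum(Minv_x ** 2, axis=0)    # d_i = x_i^T M^{-2} x_i (M symmetric)
        w = w * d ** power
        w /= w.sum()                       # renormalise to a probability vector
    return w


if __name__ == "__main__":
    # Small illustrative example with three candidate points (hypothetical values).
    F = np.array([[1.0, 0.3, 0.3],
                  [0.3, 1.0, 0.0],
                  [0.3, 0.0, 1.0]])
    w = a_optimal_multiplicative(F)
    print(w, a_criterion(F, w))
```

If `F` is square and non-singular, the criterion reduces to $\sum_i c_i / w_i$ with $c_i$ the squared norm of the $i$-th column of $F^{-1}$, so the optimal weights are proportional to $\sqrt{c_i}$; this gives a simple check of the iteration.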
For the considered networks, we obtained equal A-optimal weights at nodes that have a similar adjacency structure, in particular, they have the same number of adjacent nodes. The proof of that observation will be addressed in further research. Moreover, the presented approach is based on the assumption that the influence of adjacent nodes is the same and known. It is an open problem how to derive optimal designs if this is not the case.
Funding
Open Access funding enabled and organized by Projekt DEAL.

Conflicts of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.