A measure-theoretic representation of graphs

Inspired by the notion of action convergence in graph limit theory, we introduce a measure-theoretic representation of matrices, and we use it to define a new pseudo-metric on the space of matrices. Moreover, we show that this pseudo-metric is a metric on the subspace of adjacency or Laplacian matrices of graphs. Hence, in particular, we obtain a metric for isomorphism classes of graphs. Additionally, we study how some properties of graphs translate into this measure representation, and we show how our analysis contributes to a simpler understanding of action convergence of graphops.


Introduction
In the last 20 years, the study of complex networks has permeated many areas of social and natural sciences. Important examples are computer, telecommunication, biological, cognitive, semantic and social networks. Network structures are usually modeled using graph theory to represent pairwise interactions between elements of a network. For this reason, it is particularly important to find advantageous ways to represent and compare graphs. Many graph distances have been proposed from both an applied and a theoretical perspective. In applications, the most common pseudo-distances are inspired by local comparison (e.g. Hamming distance, Jaccard distance) and/or global spectral methods. For an overview of the most commonly used graph pseudo-distances to compare empirical networks in practice, see [11]. From a theoretical point of view, the selection of a metric on the space of graphs is related to graph limit theory. This is a very active field of mathematics that connects graph theory with many other mathematical areas such as stochastic processes, ergodic theory, spectral theory and several branches of analysis and topology. In fact, in mathematical terms, one is interested in finding a metric/topology on the space of graphs, and a completion of this space with respect to that topology. Traditionally, this field grew in two distinct directions: limits of graph sequences of bounded degree on the one hand (Benjamini-Schramm convergence [2] and the stronger notion of local-global convergence [3,12]), and limits of dense graph sequences on the other hand (graphons and the related notion of cut-metric [9,15,17]). For a complete treatment of these topics, see the monograph by L. Lovász [16]. More recently, the challenging intermediate case of sparse graph sequences with unbounded degree has attracted a lot of interest, as this case covers the vast majority of networks in applications. In fact, real networks are usually sparse, but not very sparse, and heterogeneous. For instance, there is the relaxation of graphons to less dense graph sequences, namely L^p-graphons [8,7]. In a recent paper, A. Backhausz and B. Szegedy introduced a new functional-analytic/measure-theoretic notion of convergence [1] that not only covers the intermediate degree case, but also unifies the graph limit theories that we mentioned previously. Other works in this direction are [5,6,13,14,19].
In this paper, we contribute to the study of the representation and comparison of graphs by introducing and investigating the following measure-theoretic representation of matrices.
Definition 1. Let A be an n×n matrix and let x = (x_i)_i ∈ R^n. The measure generated by A and x is
µ_{p,(A,x)} := Σ_{i=1}^n p_i δ_{(x_i, (Ax)_i)} ∈ P(R^2),
where δ denotes the Dirac measure, and p_1, …, p_n ∈ R_{>0} such that Σ_{i=1}^n p_i = 1 are fixed. Let also p := (p_i)_i ∈ R^n. We will simply denote by µ_{(A,x)} the measure µ_{p,(A,x)} where there is no risk of confusion about the given probability vector p. The family of measures generated by A is (µ_{(A,x)})_{x∈R^n}. The set of measures generated by A is Z_A := {µ_{(A,x)} : x ∈ R^n}. We will drop the A in the subscript of µ_{(A,x)} and Z_A, writing µ_x and Z, respectively, whenever the dependence on the matrix A is obvious.
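As an illustration, the measure µ_{p,(A,x)} can be tabulated explicitly for a small matrix. The following is a minimal sketch in Python (the function name `generated_measure` is ours, not from the paper); atoms that coincide accumulate their masses.

```python
import numpy as np

def generated_measure(A, x, p=None):
    """Return mu_{p,(A,x)} as a dict mapping atoms (x_i, (Ax)_i) to masses."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    if p is None:                     # default: the uniform probability vector
        p = np.full(n, 1.0 / n)
    Ax = A @ x
    mu = {}
    for i in range(n):
        atom = (x[i], Ax[i])
        mu[atom] = mu.get(atom, 0.0) + p[i]   # coinciding atoms accumulate mass
    return mu

# the 2x2 "swap" matrix: atoms (1, 2) and (2, 1), each of mass 1/2
mu = generated_measure([[0, 1], [1, 0]], [1.0, 2.0])
```

The returned dictionary is exactly the discrete measure of Definition 1, viewed as a weighted point set in R^2.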
We use this representation to define a new pseudo-metric on the space of matrices. Moreover, we show that, for this pseudo-metric, the distance between two matrices A and B is zero if and only if A and B are switching equivalent, i.e., there exists a permutation matrix P such that A = P B P^T. Formally:
Theorem 2. Let K ∈ R_{≥1} and let S be the set of n×n matrices A such that ||A||_{∞→1} ≤ K. Then, a matrix A ∈ S is determined, up to switching equivalence, by the set Z_A of measures generated by A, where we consider the measures relative to the uniform probability measure.
For example, if two matrices A and B are the adjacency (or Laplacian) matrices of two graphs G_1 and G_2, respectively, then they are switching equivalent if and only if G_1 and G_2 are isomorphic. As a consequence, the above framework allows us to define a metric on the set of isomorphism classes of graphs. Additionally, we study how some properties of graphs, such as the spectrum, the vertex degrees, and some homomorphism numbers, translate into this measure representation.
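For small n, switching equivalence (and hence graph isomorphism, via adjacency matrices) can be tested naively by enumerating all permutation matrices. The following is a hedged sketch (brute force, O(n!) time, only for tiny matrices; the function name is ours):

```python
import itertools
import numpy as np

def switching_equivalent(A, B):
    """Check whether A = P B P^T for some permutation matrix P."""
    A, B = np.asarray(A), np.asarray(B)
    n = A.shape[0]
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]             # permutation matrix for perm
        if np.array_equal(A, P @ B @ P.T):
            return True
    return False

# two different labellings of the path on 3 vertices
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path 1-2-3
B = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])   # path 2-1-3
```

Here `switching_equivalent(A, B)` returns True, since the two matrices describe isomorphic graphs.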
This representation and metric are inspired by the notion of action convergence in graph limit theory [1], but they are simpler. For this reason, our analysis also contributes to a simpler understanding of the limit notion for graphs. Our results show that, for discrete probability measures with finite support, this distance has the same expressive power as the more complex action convergence metric [1, Definition 2.6]. However, for general measures, it remains open whether this metric has strictly less expressive power or not.

Structure of the paper
This work is structured as follows. In Section 2 we introduce some relevant definitions and notations, in Section 3 we give some preliminary results, and in Section 4 we prove Theorem 2. Moreover, in Section 5 we relate properties of matrices and graphs with the measure representation, and in Section 6 we underline the relationship with action convergence. Finally, in Section 7, we present some open questions as well as future directions.

Basic definitions and notations
Throughout the paper we fix n ∈ N_{≥2}, we let P denote the set of permutation matrices of order n, and we let e_i denote the i-th vector of the canonical basis of R^n, for i = 1, …, n. We use δ to denote a Dirac measure and P(R^p) to represent the space of probability measures on R^p.
Definition 3. Let ρ be a probability measure on R^2 whose support is given by exactly n points. A pair of vectors (x, y) ∈ R^n × R^n with x_1 ≤ … ≤ x_n is an ordered support of ρ if
ρ = Σ_{i=1}^n p_i δ_{(x_i, y_i)}
for some p_1, …, p_n ∈ R_{>0} such that Σ_{i=1}^n p_i = 1. Observe that, in the above definition, if the entries of x are all pairwise distinct, then there is a unique ordered support of ρ.
From here on, we also fix p_1, …, p_n ∈ R_{>0} such that Σ_{i=1}^n p_i = 1, and we let p := (p_i)_i ∈ R^n.
In the following, we will mainly focus on the case where p is the uniform distribution on [n]. However, we point out that, in some cases, other probability measures can be more appropriate. For example, if we consider the family of measures generated by the transition probability matrix of a reversible Markov chain, an appropriate choice for p would be the stationary distribution of the Markov chain.
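As a toy instance of this choice, consider a reversible two-state Markov chain; its stationary distribution can then serve as the probability vector p. The chain below is an assumed example, not taken from the paper:

```python
import numpy as np

# Transition matrix of a reversible two-state Markov chain (assumed example)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Its stationary distribution pi solves pi T = pi; here pi = (2/3, 1/3).
pi = np.array([2.0 / 3.0, 1.0 / 3.0])

# Using p = pi, the measure generated by T and x weights the atom
# (x_i, (Tx)_i) by the stationary mass of state i.
x = np.array([1.0, 2.0])
atoms = list(zip(x, T @ x))
masses = pi
```

One can check reversibility directly: pi_1 T_{12} = pi_2 T_{21} = 1/15.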
Given an n×n matrix A and a vector x = (x_i)_i ∈ R^n, we define the marginal with respect to the first variable as the discrete measure µ^M_{p,x} such that
µ^M_{p,x}({q_1}) = µ_{p,x}({q_1} × R)
for every q_1 ∈ R. Being discrete, the measure is completely characterized by the previous formula. Also in this case, we will denote by µ^M_x the measure µ^M_{p,x} when there is no risk of confusion. We also let
Z_x := {µ_{Px} ∈ Z : P ∈ P and Pp = p}.
We observe that Z_x has finitely many elements, since P is a finite set, and that Z_x = Z_{Px} for all P ∈ P.

Example 4. Let n = 2 and let
A := ( a_11 a_12 ; a_21 a_22 ).
Then, for x = (x_1, x_2) ∈ R^2,
µ_{(A,x)} = p_1 δ_{(x_1, a_11 x_1 + a_12 x_2)} + p_2 δ_{(x_2, a_21 x_1 + a_22 x_2)}.
In particular, µ^M_x = p_1 δ_{x_1} + p_2 δ_{x_2}.
The following vectors will play an important role in the proof of our main result. Therefore, we give them a name.
Definition 5. Let A be an n×n matrix and let (µ_x)_{x∈R^n} be the family of measures generated by A. A vector v ∈ R^n is p-irreducible for A if the following condition holds. For every P_1 ∈ P and for every vector y ∈ R^n, µ^p_{P_1 v} = µ^p_y if and only if there exists P_2 ∈ P such that y = P_2 P_1 v, P_2 A = AP_2 and P_2 p = p. We call it irreducible for A if it is u-irreducible for A, where u is the uniform n-dimensional probability vector.
Notice that, if a vector is irreducible for A, then it is p-irreducible for every probability vector p. For this reason, we will just consider irreducible vectors for A in our following arguments.
We consider the following Example 6. For the matrix, the vector is irreducible, while the vector is not an irreducible vector.
We now want to be able to compare measures. For this reason, we recall the following well-known metric.
Definition 7 (Lévy-Prokhorov metric). The Lévy-Prokhorov metric d_LP on the space of probability measures P(R^p) is
d_LP(µ, ν) := inf{ε > 0 : µ(U) ≤ ν(U^ε) + ε and ν(U) ≤ µ(U^ε) + ε for all U ∈ B_p},
where B_p is the Borel σ-algebra on R^p and U^ε is the set of points that have distance smaller than ε from U.
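For finitely supported measures, d_LP can be computed by brute force: it suffices to test the defining inequalities on subsets of the support atoms, and to bisect on ε. The following is a minimal sketch (the function names are ours), feasible only for very small supports since it enumerates all subsets:

```python
import itertools
import numpy as np

def _ok(mu, nu, eps):
    # Check mu(U) <= nu(U^eps) + eps for every U contained in supp(mu).
    # For discrete measures it suffices to test subsets of the support atoms.
    atoms_mu, atoms_nu = list(mu), list(nu)
    for r in range(1, len(atoms_mu) + 1):
        for U in itertools.combinations(atoms_mu, r):
            mass_mu = sum(mu[a] for a in U)
            # nu(U^eps): nu-mass at distance strictly smaller than eps from U
            mass_nu = sum(nu[b] for b in atoms_nu
                          if min(np.hypot(a[0] - b[0], a[1] - b[1]) for a in U) < eps)
            if mass_mu > mass_nu + eps + 1e-12:
                return False
    return True

def levy_prokhorov(mu, nu, tol=1e-6):
    """Brute-force d_LP between finitely supported measures on R^2,
    given as dicts {atom: mass}, via bisection on eps."""
    lo, hi = 0.0, 1.0   # d_LP between probability measures is at most 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _ok(mu, nu, mid) and _ok(nu, mu, mid):
            hi = mid
        else:
            lo = mid
    return hi

# two Dirac measures at Euclidean distance 0.5
d = levy_prokhorov({(0.0, 0.0): 1.0}, {(0.0, 0.5): 1.0})
```

For two Dirac measures at distance 0.5 the computed value converges to 0.5, as expected.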
The above metric metrizes the weak/narrow convergence of measures. Now, we want to be able to compare sets of measures. We therefore introduce the following
Definition 8 (Hausdorff metric). Given X, Y ⊂ P(R^p), their Hausdorff distance is
d_H(X, Y) := max{ sup_{µ∈X} inf_{ν∈Y} d_LP(µ, ν), sup_{ν∈Y} inf_{µ∈X} d_LP(µ, ν) }.
In particular, d_H(X, Y) = 0 if and only if cl(X) = cl(Y), where cl is the closure in d_LP. It follows that d_H is a pseudometric for all subsets of P(R^p), and it is a metric for closed sets.
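The Hausdorff construction only needs pairwise distances, so it can be sketched generically; below it is illustrated on finite sets of real numbers with a user-supplied distance function (in the paper, the sets are sets of measures and the distance is d_LP):

```python
def hausdorff(X, Y, dist):
    """Hausdorff distance between two finite sets X, Y for a distance dist."""
    d_xy = max(min(dist(x, y) for y in Y) for x in X)   # sup_x inf_y
    d_yx = max(min(dist(x, y) for x in X) for y in Y)   # sup_y inf_x
    return max(d_xy, d_yx)

# toy example on the real line
d = hausdorff([0.0, 1.0], [0.0, 3.0], lambda a, b: abs(a - b))   # = 2.0
```

Each point of one set is matched to its nearest point of the other; the Hausdorff distance is the worst such mismatch, here driven by the point 3.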
By definition, the Lévy-Prokhorov distance between probability measures is upper-bounded by 1 and, therefore, the Hausdorff metric for sets of measures is upper-bounded by one. Now, we define the L∞ and the L1 norm of a vector v ∈ R^n as
||v||_∞ := max_{i=1,…,n} |v_i| and ||v||_1 := Σ_{i=1}^n p_i |v_i|,
respectively.
Additionally, we define the (∞ → 1)-operator norm of an n×n matrix A as
||A||_{∞→1} := sup_{||v||_∞ ≤ 1} ||Av||_1.
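This norm can be computed exactly for small n: ||Av||_1 is convex in v, so the supremum over the cube ||v||_∞ ≤ 1 is attained at a sign vector. The sketch below assumes (consistently with Example 13 below and with the L^1(Ω) norm of Section 6) that the L1 norm is weighted by the probability vector p:

```python
import itertools
import numpy as np

def norm_inf_to_1(A, p=None):
    """(infinity -> 1)-operator norm, computed by enumerating sign vectors.
    Assumes the probability-weighted norm ||w||_1 = sum_i p_i |w_i|."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if p is None:
        p = np.full(n, 1.0 / n)              # uniform probability vector
    return max(float(np.sum(p * np.abs(A @ np.array(s))))
               for s in itertools.product([-1.0, 1.0], repeat=n))

# complete graph K_3: the norm equals n - 1 = 2
K3 = np.ones((3, 3)) - np.eye(3)
val = norm_inf_to_1(K3)
```

For the complete graph K_3 the value is 2 = n − 1, matching the bound of Example 13.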

Preliminary results
In this section we prove some preliminary results that will be needed for the proof of the main one.
Lemma 9. Let A be an n×n matrix and let (µ_x)_{x∈R^n} be the family of measures generated by A. Then, x, y ∈ R^n are such that µ_x = µ_y if and only if there exists P ∈ P such that y = Px, Pp = p and x ∈ ker(PA − AP).
Proof. By definition, µ_x = µ_y if and only if
Σ_{i=1}^n p_i δ_{(x_i, (Ax)_i)} = Σ_{i=1}^n p_i δ_{(y_i, (Ay)_i)},
hence, if and only if there exists P ∈ P such that (Px, PAx) = (y, Ay) and Pp = p. This happens if and only if Pp = p, y = Px and PAx = APx, hence, if and only if Pp = p, y = Px and x ∈ ker(PA − AP).
An immediate consequence is the following
Corollary 10. Let A be an n×n matrix and let (µ_x)_{x∈R^n} be the family of measures generated by A. For each P ∈ P such that PA = AP and Pp = p, we have that, for all x ∈ R^n, µ_x = µ_{Px}.
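Corollary 10 can be checked numerically: a circulant matrix commutes with the cyclic shift, so µ_x and µ_{Px} have the same atoms with the same uniform weights. A small sketch (the helper name `atoms` is ours):

```python
import numpy as np

def atoms(A, x):
    """Sorted multiset of atoms (x_i, (Ax)_i) of mu_x (uniform weights)."""
    x = np.asarray(x, dtype=float)
    Ax = np.asarray(A, dtype=float) @ x
    return sorted(zip(x, Ax))

A = np.array([[0, 1, 2],
              [2, 0, 1],
              [1, 2, 0]])              # circulant matrix
P = np.eye(3)[[1, 2, 0]]              # cyclic permutation matrix
assert np.array_equal(P @ A, A @ P)   # P commutes with A (and Pp = p for uniform p)

x = np.array([3.0, 1.0, 2.0])
a1, a2 = atoms(A, x), atoms(A, P @ x)   # mu_x = mu_{Px}
```

Since all atoms have the same uniform mass 1/3, comparing the sorted atom lists is enough to compare the two measures.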
We now prove a preliminary lemma that will be needed for the proof of the next theorem.
Lemma 11. Let K_1, …, K_N be non-zero n×n matrices. Then there exists a vector v ∈ R^n such that v ∉ ker(K_i) for every i ∈ [N].
Proof. We prove the claim by induction over N. For N = 1, the claim is trivially true, as the matrix is non-zero.
We now assume the statement to be true for N − 1, and we prove it for N. From the inductive hypothesis, there exists v ∈ R^n such that v ∉ ker(K_i) for every i ∈ [N − 1]. We also know, from the base case, that there exists w ∈ R^n such that w ∉ ker(K_N). We now observe that we can choose α > 0 such that v + αw ∉ ker(K_i) for every i ∈ [N]. In fact, for every i ∈ [N], using linearity and the reverse triangle inequality, we have
||K_i(v + αw)|| ≥ | ||K_i v|| − α ||K_i w|| |.
We now notice that, for every i ∈ [N], ||K_i v|| and ||K_i w|| cannot both be zero, as a consequence of the above discussion. We can therefore choose α such that the line through the origin with slope α does not pass through any of the points with coordinates (||K_i w||, ||K_i v||), so that the right-hand side above is strictly positive for every i ∈ [N].
We also notice that, in the previous lemma, we could more generally have considered bounded operators on a normed vector space instead of only square matrices. Even more generally, we could have considered countably many bounded operators, instead of finitely many, on a complete normed vector space. In fact, the proof idea of Lemma 11, rewritten in set-theoretic language, reads: the kernel of a non-zero operator is nowhere dense and, therefore, the countable union of the kernels is a set of first category; its complement is then dense in the complete normed vector space (in particular, non-empty) as a consequence of the Baire category theorem.
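The Baire-category remark also suggests a practical recipe: each kernel is a proper subspace, so a random Gaussian vector avoids all of them with probability 1. A minimal sketch (the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def common_nonkernel_vector(mats, tries=100):
    """Find v with K v != 0 for every non-zero matrix K in mats.
    Each kernel is a proper subspace (nowhere dense), so a random
    Gaussian vector avoids all of them with probability 1."""
    n = mats[0].shape[1]
    for _ in range(tries):
        v = rng.standard_normal(n)
        if all(np.linalg.norm(K @ v) > 1e-9 for K in mats):
            return v
    raise RuntimeError("unexpected: random vectors kept hitting a kernel")

# two non-zero matrices with large kernels
K1 = np.diag([1.0, 0.0, 0.0])
K2 = np.diag([0.0, 0.0, 1.0])
v = common_nonkernel_vector([K1, K2])
```

Here any vector with non-zero first and last coordinates works, and a random draw produces one immediately.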
The next theorem ensures the existence of irreducible vectors with pairwise distinct entries, for any given matrix.
Theorem 12. Let A be an n×n matrix and let (µ_x)_{x∈R^n} be the family of measures generated by A. Then:
1. There exists an irreducible vector v = (v_i)_i ∈ R^n for A.
2. We can always assume that the entries of v are pairwise distinct, that is, v_i ≠ v_j whenever i ≠ j.
Proof. If PA = AP for every P ∈ P, then every vector is an irreducible vector, and there is nothing to show. Therefore, we consider the case in which there exists at least one P ∈ P such that PA ≠ AP. For every such P, there exists a non-zero vector v_P ∉ ker(PA − AP). Therefore, for every P_1 ∈ P, the matrix (PA − AP)P_1 is also non-zero, since (PA − AP)P_1 (P_1^{-1} v_P) ≠ 0, where we are using the fact that P_1 is invertible, since it is a permutation matrix. Now, by Lemma 11 we can choose v ∈ R^n such that v ∉ ker((PA − AP)P_1) for all P ∈ P such that PA ≠ AP and all P_1 ∈ P, where we consider all matrices (PA − AP)P_1. Therefore, given P_1, P_2 ∈ P, we have that P_1 v ∈ ker(P_2 A − AP_2) if and only if P_2 A = AP_2. Now, by Lemma 9, we have that µ_{P_1 v} = µ_y if and only if there exists P_2 ∈ P such that y = P_2 P_1 v and P_1 v ∈ ker(P_2 A − AP_2). Hence, by the above observation, µ_{P_1 v} = µ_y if and only if there exists P_2 ∈ P such that y = P_2 P_1 v and P_2 A = AP_2. This proves the first claim.
To prove the second claim, assume that v_i = v_j for some i, j such that i ≠ j.

Consider the vector
v + ξ e_i,
where the absolute value of ξ ∈ R is small enough that v + ξ e_i ∉ ker((PA − AP)P_1), for all P, P_1 ∈ P such that PA ≠ AP. This is possible since each such kernel is closed, and therefore its complement is open. Repeating this perturbation finitely many times, choosing ξ so that no new coincidences among the entries are created, we obtain an irreducible vector with pairwise distinct entries.
From here on in this section, we fix a constant K ∈ R_{≥1} and we let S be the set of n×n matrices A such that ||A||_{∞→1} ≤ K. Moreover, given a vector x ∈ R^n, an index i ∈ [n] and d > 0, we let
x^{i,d} := x + d e_i.
We observe that, if x satisfies the conditions of Theorem 12, then x^{i,d} also satisfies them, for every d > 0 small enough.
Example 13. If K ≥ n − 1, then S contains all adjacency matrices associated to graphs on n nodes.
We now recall the following result from [1], since it will be needed in the proof of Lemma 15 below.
Lemma 14. Let X, Y be two jointly distributed R^k-valued random variables, with distributions X_# and Y_#. Then,
d_LP(X_#, Y_#) ≤ (E[||X − Y||_1])^{1/2}.
We apply the above lemma to prove the following
Lemma 15. Fix A ∈ S and let x ∈ R^n. Let ν_1 ∈ Z_x and write ν_1 = µ_{Px} for some P ∈ P. Then, the measure ν_2 := µ_{P(x^{i,d})} ∈ Z_{x^{i,d}} satisfies d_LP(ν_1, ν_2) ≤ ((1 + K) d)^{1/2}.
Proof. We have that the random vectors X(ω) := ((Px)_ω, (APx)_ω) and Y(ω) := ((P x^{i,d})_ω, (AP x^{i,d})_ω) differ by Y − X = d ((Pe_i)_ω, (APe_i)_ω), and
E[||X − Y||_1] = d (||Pe_i||_1 + ||APe_i||_1) ≤ d (1 + K).
The claim then follows by letting µ_{Px} and µ_{P(x^{i,d})} be the distributions X_# and Y_# of the R^2-valued random variables X and Y from Lemma 14.
Next, we prove a theorem that allows us to associate, to any measure in Z_v, a unique measure in Z_{v^{i,ε}}, whenever v is irreducible and ε is small enough.
Theorem 16. Fix A ∈ S and let v ∈ R^n be an irreducible vector for A whose entries are pairwise distinct. Let ε > 0 be such that:
1. 2((1 + K)ε)^{1/2} < min{d_LP(ν, ν′) : ν, ν′ ∈ Z_v, ν ≠ ν′}, where the minimum exists since Z_v is finite;
2. ε is small enough that the vectors v^{i,ε}, for i = 1, …, n, still have pairwise distinct entries.
Let also ν_1 ∈ Z_v and write ν_1 = µ_{Pv} for some P ∈ P. Then, the measure ν_2 := µ_{P(v^{i,ε})} is the unique measure in Z_{v^{i,ε}} at d_LP-distance at most ((1 + K)ε)^{1/2} from ν_1.
Proof. First, we prove that Z_v has cardinality larger than or equal to that of Z_{v^{i,ε}}. Fix two distinct elements of Z_{v^{i,ε}} and write them as µ_{P_1(v^{i,ε})} and µ_{P_2(v^{i,ε})}, for some P_1, P_2 ∈ P. If we show that µ_{P_1 v} ≠ µ_{P_2 v}, we are done. Assume that µ_{P_1 v} = µ_{P_2 v}. Then, since v is irreducible, there exists P_3 ∈ P such that P_2 v = P_3 P_1 v and P_3 A = AP_3. By Corollary 10 and by the choice of ε, this implies that µ_{P_1(v^{i,ε})} = µ_{P_2(v^{i,ε})}, which is a contradiction, since we are assuming that µ_{P_1(v^{i,ε})} and µ_{P_2(v^{i,ε})} are distinct. Therefore, Z_v has cardinality larger than or equal to that of Z_{v^{i,ε}}.
Now, we illustrate the rest of the proof in Figure 1. By Lemma 15, we know that ν_1 and ν_2 are such that d_LP(ν_1, ν_2) ≤ ((1 + K)ε)^{1/2}. Fix any ν_1′ ∈ Z_v such that ν_1′ ≠ ν_1, and consider the corresponding measure ν_2′ ∈ Z_{v^{i,ε}} as in Lemma 15, so that d_LP(ν_1′, ν_2′) ≤ ((1 + K)ε)^{1/2}. By the choice of ε, we have that d_LP(ν_1, ν_1′) > 2((1 + K)ε)^{1/2} and now, applying the reverse triangle inequality, it is easy to see that d_LP(ν_1, ν_2′) > ((1 + K)ε)^{1/2}.
Figure 1: An illustration of the proof of Theorem 16.

Main result
We are now ready to prove Theorem 2.
Proof of Theorem 2. If we know the set Z_A of measures generated by A, then for any x ∈ R^n we also know the set Z_{(A,x)}. Hence, we can choose a vector v for which the cardinality of Z_{(A,v)} is maximal. It is easy to see that v must be an irreducible vector for A and, up to a small perturbation, we can choose v such that all its entries are pairwise distinct. Now, fix ε > 0 as in Theorem 16, and choose it also small enough that:
1. For i ∈ {1, …, n}, the vectors v^{i,ε} satisfy the conditions of Theorem 12, so that also these vectors are irreducible for A;
2. The entries of each v^{i,ε} are pairwise distinct.
Fix now ν_1 ∈ Z_{(A,v)}. Then, we know that ν_1 = µ_{(A, P_1 v)} for some P_1 ∈ P, but we don't know A and P_1. However, we know that there exists P_2 ∈ P such that the unique ordered support of ν_1 is
(P_2 P_1 v, P_2 A P_1 v)   (4.1)
(cf. Example 17 below), and we do know the resulting pair in (4.1). In particular, since we know both v and P_2 P_1 v, we can also reconstruct P_2 P_1. Now, fix i ∈ {1, …, n} and observe that, since P_1, P_2 ∈ P, there exists j ∈ {1, …, n} such that e_j = P_1^{-1} P_2^{-1} e_i. By applying Theorem 16 to ν_1, we know that the measure ν_2 := µ_{(A, P_1(v^{j,ε}))} is the unique measure in Z_{(A, v^{j,ε})} close to ν_1. Hence, we can identify ν_2 and therefore also its unique ordered support, that we can write as
(P_2 P_1 v^{j,ε}, P_2 A P_1 v^{j,ε}).   (4.2)
Taking the difference between (4.1) and (4.2) leads to
(ε P_2 P_1 e_j, ε P_2 A P_1 e_j) = (ε e_i, ε (P_2 A P_2^T) e_i),
therefore we are able to reconstruct the i-th column of P_2 A P_2^T. Since we can do this for every i ∈ {1, …, n}, we can reconstruct the entire matrix P_2 A P_2^T. This proves the claim.
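The mechanism of the proof, recovering a column of A from the ordered supports of µ_v and µ_{v^{i,ε}}, can be replayed numerically. Below is a sketch in the simplest situation, where the entries of v are already increasing, so that the sorting permutations are the identity; the helper name is ours:

```python
import numpy as np

def ordered_support(A, x):
    """Ordered support of mu_{(A,x)}: atoms (x_j, (Ax)_j) sorted by the
    first coordinate (unique when the entries of x are pairwise distinct)."""
    Ax = np.asarray(A) @ x
    order = np.argsort(x)
    return x[order], Ax[order]

A = np.array([[1.0, 4.0, 2.0],
              [0.0, 3.0, 5.0],
              [7.0, 1.0, 6.0]])
v = np.array([0.1, 0.5, 0.9])      # pairwise distinct, increasing entries
eps = 1e-3
i = 1                              # recover column i of A

_, y1 = ordered_support(A, v)
_, y2 = ordered_support(A, v + eps * np.eye(3)[i])
# For eps small, both supports are ordered by the same permutation of the
# indices, so the difference of second coordinates reveals eps * A[:, i].
col = (y2 - y1) / eps
```

The recovered vector `col` coincides with the i-th column of A, illustrating how the family of generated measures determines the matrix up to the sorting permutation.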
We illustrate the first part of the proof of Theorem 2 with an example.
Example 17. Let n = 3, so that |P| = 6, and consider a matrix A and a vector v with pairwise distinct entries. Figure 2(b) shows the support of ν_1 := µ_{(A, P_1 v)}, and it is clear from the picture that the unique ordered support of ν_1 can be read off directly, as in (4.1).

We note that, if we consider measures µ_x relative to a probability measure p that is not the uniform probability measure, then a matrix A as in Theorem 2 will be characterized by a stronger notion than switching equivalence. In particular, in this case Z_A = Z_B if and only if A = P B P^T, where P is such that Pp = p. An interesting particular case is when the vector p has pairwise distinct entries, p_i ≠ p_j for i ≠ j. In this case, the only permutation matrix satisfying Pp = p is the identity, so Z_A = Z_B if and only if A = B.

Properties of matrices and graphs from the generated measure

We now discuss how some properties of matrices and graphs directly translate in terms of the measures generated by the matrices.
We first notice that, for an n×n matrix A and a scalar λ ∈ R, some measure in Z_A other than the trivial measure µ_0 = δ_{(0,0)} is supported on the graph of the linear function x ↦ λx, graph(λx) := {(x, λx) ∈ R^2 : x ∈ R}, if and only if λ is an eigenvalue of the matrix A. Moreover, we notice that the measure µ_1 generated by the constant one vector is completely supported on the vertical line {1} × R. For this reason, the set {x ∈ R : (1, x) ∈ supp(µ_1)} corresponds to the set of row sums of the matrix A.
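Both observations are easy to verify numerically: an eigenvector places all atoms on the line y = λx, and the constant-one vector places them on the vertical line x = 1, at heights equal to the row sums. A minimal sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# x is an eigenvector of A with eigenvalue 2: every atom (x_i, (Ax)_i)
# of mu_x lies on the line {(t, 2t) : t in R}.
x = np.array([1.0, 0.0])

# The constant-one vector puts all atoms on the vertical line {1} x R;
# the second coordinates are exactly the row sums of A.
row_sums = A @ np.ones(2)
```

Here the row sums are 3 and 2, so the atoms of µ_1 are (1, 3) and (1, 2).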
We now want to relate our measure representation of matrices to graphs. In order to do this, we introduce several matrix representations of graphs. We start with the simplest possible choice.
Definition 18. Let G = (V, E) be a graph on vertices v_1, …, v_N. The adjacency matrix of G is the N×N matrix A whose entries are A_{ij} = 1 if {v_i, v_j} ∈ E, and A_{ij} = 0 otherwise.
Another possible matrix representation that can be advantageous in some cases is the following.
Definition 19. The Kirchhoff Laplacian matrix of G is the N×N matrix L := D − A, where D is the diagonal matrix of the degrees.
We remark that both the adjacency matrix and the Kirchhoff Laplacian matrix are such that two graphs are isomorphic if and only if the respective matrices M_1 and M_2 are related by the relationship M_1 = P M_2 P^T for some permutation matrix P.
Additionally, we recall the following
Definition 20. The normalized Laplacian is the N×N matrix L := Id − D^{-1} A, where Id is the N×N identity matrix. For i ≠ j, we have
L_{ij} = − A_{ij} / deg(v_i),
which is minus the probability that a random walker goes from v_i to v_j.
We refer to [10,18] for more details on the normalized Laplacian and its spectral theory.
We now consider the above matrices in relation to graphs, and we observe how properties of graphs translate into the family of measure representations.
We consider a graph G = (V, E) and the related adjacency matrix A. We already discussed how to extract the spectral information of a matrix directly. Additionally, as the degrees of the graph G correspond to the row sums of the adjacency matrix, we can directly determine the degrees from Z_A, as presented above.
The spectrum and the degree distribution determine many graph properties. For example, the spectrum of the normalized Laplacian determines the number of connected components, and whether the graph is bipartite. Additional properties are characterized by this information. We now recall the following definition, and we refer to [4] for more details on this topic.
Definition 21. For two graphs F = (V(F), E(F)) and G = (V(G), E(G)), a graph homomorphism from F to G is a function φ : V(F) → V(G) such that {v, w} ∈ E(F) implies {φ(v), φ(w)} ∈ E(G). We denote by hom(F, G) the number of homomorphisms from F to G.
We now give some examples of homomorphism numbers that can be extracted directly from the measure-theoretic representation. Notice that, in general, it is not easy to reconstruct general information about the adjacency matrix A and its powers directly, because the measure representation does not keep track of permutations of vectors.

Relationship with action convergence metric
In this section, we briefly present the notion of action convergence from [1], and we underline its relationship with the metric defined in this work. We start by recalling the following
Definition 24. A P-operator is a linear operator of the form A : L^∞(Ω) → L^1(Ω) such that ||A||_{∞→1} := sup_{||v||_∞ ≤ 1} ||Av||_1 is finite, where (Ω, F, P) is a generic probability space. We denote by B(Ω) the set of all P-operators on Ω.
We show, with the following example, that a matrix is a P-operator.
Example 25. Let Ω = [n] := {1, …, n} and let P = (p_ω)_{ω∈[n]} be the probability measure on Ω relative to the probability vector p. In this case, both L^∞(Ω) and L^1(Ω) can be identified with R^n, and B(Ω) is the set of all n×n matrices. Thus, every matrix A ∈ R^{n×n} is a P-operator.
We now consider a P-operator A ∈ B(Ω) and a function X ∈ L^∞(Ω). This is a bounded random variable, and AX ∈ L^1(Ω) is a random variable with finite expectation. We can therefore define the 2-dimensional random vector (X, AX). More generally, for all random variables Z_1, Z_2, …, Z_k ∈ L^∞(Ω), we can define the 2k-dimensional random vector (Z_1, …, Z_k, AZ_1, …, AZ_k). We show, with the following example, how the 2-dimensional random vectors constructed above relate to the measures generated by A defined in Section 2.
Example 26. For the probability space Ω = [n] with probability measure P = (p_ω)_{ω∈[n]}, a matrix A ∈ R^{n×n} and a vector X = (X(ω))_{ω∈[n]} ∈ R^n, the law of the 2-dimensional random vector (X, AX) is
Σ_{ω∈[n]} p_ω δ_{(X(ω), (AX)(ω))}.
These, in fact, are the measures generated by A according to Definition 1.
We now define the k-profile of A as
S_k(A) := cl{ law(X_1, …, X_k, AX_1, …, AX_k) : X_1, …, X_k ∈ L^∞(Ω), ||X_i||_∞ ≤ 1 },
where cl denotes the closure with respect to the weak topology. Considering two P-operators A : L^∞(Ω_1) → L^1(Ω_1) and B : L^∞(Ω_2) → L^1(Ω_2), we finally define the action convergence metric.
Definition 27 (Metrization of action convergence). For the two P-operators A, B, the action convergence distance is
d_M(A, B) := Σ_{k=1}^∞ 2^{-k} d_H(S_k(A), S_k(B)).
This metric has some nice compactness properties, as stated in the following
Theorem 28 (Theorem 2.9 in [1]). Let (A_i)_{i∈N} be a convergent sequence of P-operators with uniformly bounded ||·||_{p→q} norms. Then there is a P-operator A such that lim_{i→∞} d_M(A_i, A) = 0, and ||A||_{p→q} ≤ sup_{i∈N} ||A_i||_{p→q}.
Moreover, action convergence unifies several approaches to graph limit theory. In particular, consider the sequence of adjacency matrices A_n of graphs G_n, and let v_n be the number of vertices of G_n. Then:
• The action convergence of the sequence A_n/v_n coincides with graphon convergence [1, Theorem 8.2 and Lemma 8.3];
• The action convergence of the sequence A_n coincides with local-global convergence [1, Theorem 9.2].
We have that the following special class of P-operators is important in graph limit theory, as it naturally generalizes the notion of adjacency matrix of a graph.
Definition 29. A positivity-preserving and self-adjoint P-operator is called a graphop.

Consider two P-operators A : L^∞(Ω_1) → L^1(Ω_1) and B : L^∞(Ω_2) → L^1(Ω_2). We can now define the simplified metric, namely the 1-profile metric d_S of Definition 30. Notice that, on the space of isomorphism classes of finite graphs, the 1-profile metric and the action convergence metric both induce the discrete topology and, therefore, we have the following
Corollary 33. The 1-profile metric and the action convergence metric are topologically equivalent on the space of isomorphism classes of graphs with at most n vertices.
This is no longer clear for general P-operators, as the topology induced by the two metrics is no longer discrete.

Future directions
In future work, we aim to better understand when the action convergence metric is topologically equivalent to the simplified 1-profile metric we introduced. This would contribute to a better understanding of action convergence, potentially giving new insight into the difference between the convergence of dense graph sequences and sparse/bounded-degree sequences. The 1-profile metric could be related to weaker notions of local-global convergence for bounded-degree graph sequences, such as the notion of Benjamini-Schramm convergence. A better understanding of these types of metrics could also help to understand whether limiting the number of colorings in the notion of local-global convergence to 2 or more actually changes the notion of convergence, as asked in [12].

hence, in particular, v is irreducible for A. The support of µ_{(A,v)} is illustrated in Figure 2(a).

Example 22. Let S_k denote the star graph with k nodes. Then, for any graph G on n nodes,
hom(S_k, G) = Σ_{i=1}^n d_i^{k−1},
where d_1, …, d_n are the degrees of G.
Example 23. Let C_k denote the cycle graph on k nodes, and again let G be any graph on n nodes. Then
hom(C_k, G) = Σ_{i=1}^n λ_i^k,
where λ_1, …, λ_n are the eigenvalues of the adjacency matrix of G.
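Both formulas are easy to verify on small graphs; note that Σ_i λ_i^k is just trace(A^k), the number of closed k-walks. A minimal sketch (function names ours):

```python
import numpy as np

def hom_star(A, k):
    """hom(S_k, G) = sum_i d_i^(k-1): place the center at vertex i,
    then each of the k-1 leaves maps to any neighbour of i."""
    degrees = np.asarray(A).sum(axis=1)
    return int(np.sum(degrees ** (k - 1)))

def hom_cycle(A, k):
    """hom(C_k, G) = sum_i lambda_i^k = trace(A^k), the number of
    closed walks of length k in G."""
    return int(round(np.trace(np.linalg.matrix_power(np.asarray(A), k))))

# triangle K_3
K3 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
```

For the triangle, hom(S_3, K_3) = 3 · 2^2 = 12 and hom(C_3, K_3) = 2^3 + (−1)^3 + (−1)^3 = 6.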

Definition 30 (1-profile metric). For the two P-operators A, B, the 1-profile distance is
d_S(A, B) := d_H(S_1(A), S_1(B)).
We now compare the 1-profile metric introduced in this work with the action convergence metric. We have the following
Lemma 31. d_S(A, B) ≤ 2 d_M(A, B).
Proof. d_S(A, B) = d_H(S_1(A), S_1(B)) ≤ d_H(S_1(A), S_1(B)) + Σ_{k=2}^∞ 2^{1−k} d_H(S_k(A), S_k(B)) = 2 d_M(A, B).
Moreover, a direct corollary of Theorem 2 is the following
Corollary 32. Let K ∈ R_{≥1} and let S be the set of n×n matrices A such that ||A||_{∞→1} ≤ K. Then, for matrices A, B ∈ S, we get that d_M(A, B) = 0 if and only if d_S(A, B) = 0.