1 Introduction

Weighted graphs are a common data structure in many real-world scenarios. Recently, persistent homology has become a widespread tool for data analysis, classification, comparison, and retrieval. However, this technique is by its very nature limited to the analysis of weighted simplicial complexes. Although a graph is a one-dimensional complex, relevant information is not always carried by its topology, but, for instance, by graph-theoretical structures. A common way to overcome this issue is to associate auxiliary simplicial complexes with the graph; see, for instance, Bergomi et al. (2020). This strategy has been successfully applied in many settings, e.g. (Petri et al. 2014; Lord et al. 2016; Reimann et al. 2017; Rieck et al. 2018; Sizemore et al. 2018; Chowdhury and Mémoli 2018; Port et al. 2018; Blevins and Bassett 2020; Anand et al. 2020).

It is possible to define and compute persistence in categories other than simplicial complexes or topological spaces; see Bergomi and Vertechi (2020), Bergomi et al. (2021) and, in a different sense, Patel (2018), McCleary and Patel (2020), Kim and Mémoli (2021), and McCleary and Patel (2022). We introduce a further class of indexing-aware persistence functions (ip-functions), defined on \((\mathbb {R}, \le )\)-indexed diagrams in a given category, that can be described via persistence diagrams. Additionally, we present a specific way of building ip-functions for filtered graphs and digraphs, introducing the concepts of steady and ranging sets.

We are rather far from the categorifications of Bubenik and Scott (2014), Lesnick (2015), Oudot (2015), de Silva et al. (2018): we aim to provide a simple and agile tool that can be applied directly to graphs (i.e., without mapping graphs to simplicial complexes), and possibly to other structures arising naturally from applications. The constructions derived from the framework we propose have a topological counterpart obtainable considering the simplicial complex associated with a poset (see Rem. 1 Bergomi et al. (2021)). Here, we show how to bypass that topological construction.

Section 1.1 briefly recalls the classical notions of persistence diagram and bottleneck distance. Section 2 focuses on graphs. First, we define ip-functions and balanced ip-functions, and discuss their stability. Then, we introduce steady and ranging sets as swift generators of ip-functions based directly on graph-theoretical features. These constructions are the theoretical core of the work. Thereafter, we apply them to study persistent Eulerian sets and monotone features on some elementary graphs. Section 2.5 showcases how the steady and ranging constructions can be leveraged in hub-detection tasks. Concrete applications follow in Sect. 3: we compute steady and ranging hubs in a network of airports, the character co-occurrence networks of Les Misérables and Game of Thrones, and a set of languages. Section 4 extends the theory developed in the previous sections to weighted digraphs. The code for the applications is available as a Python package at the repository https://github.com/MGBergomi/hubpersistence.git. The Appendix contains examples showing that most ip-functions of the paper are not balanced.

1.1 Persistence diagrams

The main objects of study in persistent homology (Edelsbrunner and Harer 2008) are filtered spaces, i.e. pairs (X, f) where X is a topological space (e.g., the space of a simplicial complex) and \(f:X \rightarrow \mathbb {R}\) is a map called the filtering function: sublevel sets \(X_u= f^{-1}\big ((-\infty , u]\big )\) are compared through homology morphisms induced by inclusion, in particular through the so-called Persistent Betti Number functions. From such a function a persistence diagram (see Definition 1) can be built (Cohen-Steiner et al. 2007, Sect. 2). In turn, Persistent Betti Number functions can be recovered from the persistence diagram (Cohen-Steiner et al. 2007).

Persistence diagrams are the most widely used “fingerprints” of filtered spaces. The bottleneck distance between persistence diagrams yields an effective lower bound to distances between filtered spaces. This makes persistence diagrams a powerful tool in shape classification, analysis and retrieval. The strategic advantage of the generalisation started in Bergomi and Vertechi (2020), Bergomi et al. (2021) consists in the fact that also categorical persistence functions (Definition 4) can be represented by persistence diagrams: see Bergomi and Vertechi (2020, Sec. 3.9).

In \(\mathbb {R}\times (\mathbb {R}\cup \{+\infty \})\) set \(\varDelta =\{(u, v) \, | \, u=v\}\), \(\varDelta ^+=\{(u,v) \, | \, u<v \}\) and \(\bar{\varDelta }^+ = \varDelta \cup \varDelta ^+\). In a multiset, the multiplicity of an element will be the number of times that the element appears.

Definition 1

A persistence diagram D is a multiset of points of \(\bar{\varDelta }^+\) where every point of the diagonal \(\varDelta \) appears with infinite multiplicity.

The points of D belonging to \(\varDelta ^+\) are called cornerpoints; they are said to be proper if both their coordinates are finite, and cornerpoints at infinity otherwise. A persistence diagram is said to be finite if its set of cornerpoints is finite. We shall only consider finite persistence diagrams.

Definition 2

Given persistence diagrams \(D, D'\), let \(\Gamma \) be the set of all bijections between D and \(D'\). We define the bottleneck (formerly matching) distance as the real number

$$\begin{aligned} d(D, D') = \inf _{\gamma \in \Gamma } \sup _{p\in D} \Vert p-\gamma (p)\Vert _\infty \end{aligned}$$

First, for a given matching, either between cornerpoints of the two diagrams or between cornerpoints and their projections on the diagonal \(\varDelta \), this distance function checks the maximum displacement between corresponding points. Then, the minimum among these maxima is computed. Minima and maxima are actually attained because of the required finiteness.
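The two-step description above (maximum displacement per matching, then minimum over matchings) can be illustrated by a brute-force sketch for very small diagrams. The function name `bottleneck_distance`, the encoding of diagonal points as `None`, and the restriction to proper cornerpoints are assumptions made here for illustration; the sketch is not the algorithm used in the companion package, and it is exponential in the number of cornerpoints.

```python
from itertools import permutations


def bottleneck_distance(D1, D2):
    """Brute-force bottleneck distance between two small finite diagrams.

    D1, D2: lists of proper cornerpoints (u, v) with u < v, both finite.
    Diagonal points are encoded as None (illustrative convention).
    """
    def cost(p, q):
        if p is None and q is None:
            return 0.0  # a diagonal point matched with a diagonal point
        if p is None or q is None:
            u, v = q if p is None else p
            return (v - u) / 2.0  # max-norm distance from (u, v) to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    # Pad each diagram with as many diagonal points as the other has cornerpoints,
    # so that every cornerpoint may also be matched with the diagonal.
    A = list(D1) + [None] * len(D2)
    B = list(D2) + [None] * len(D1)
    if not A:
        return 0.0
    # Minimum over all matchings of the maximum displacement.
    return min(
        max(cost(a, b) for a, b in zip(A, perm))
        for perm in permutations(B)
    )
```

For instance, for the single cornerpoints (0, 4) and (1, 5) the direct matching costs 1, whereas sending both to the diagonal costs 2, so the sketch returns 1.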

2 Graph-theoretical persistence

Let Graph be the category having finite simple undirected graphs as objects and injective simplicial maps as morphisms, seen as a subcategory of the category of finite simplicial complexes. In what follows, a graph will be considered as the pair of its vertex set and edge set, i.e. \(G=(V, E)\), \(G'=(V', E')\) and so on.

Definition 3

An \((\mathbb {R}, \le )\)-indexed diagram is any functor from the category \((\mathbb {R}, \le )\) to an arbitrary category \(\mathbf {C}\). \((\mathbb {R}, \le )\)-indexed diagrams form a category, \(\mathbf {C}^{(\mathbb {R}, \le )}\). The \((\mathbb {R}, \le )\)-indexed diagram is said to be monic if all morphisms of its image are monomorphisms of \(\mathbf {C}\).

We consider \((\mathbb {R}, \le )\)-indexed diagrams in Graph that are constant on a finite set of left-closed, right-open intervals. Because of the choice of monomorphisms as the only acceptable morphisms, every such \((\mathbb {R}, \le )\)-indexed diagram is monic, see Definition 3, and can be seen, up to natural isomorphisms, as a filtration of a graph G coming from a filtering function \(f:V\cup E \rightarrow \mathbb {R}\cup \{+\infty \}\). Moreover, we shall limit our study to \((\mathbb {R}, \le )\)-indexed diagrams whose associated filtration has no isolated vertices at any level. In other words, the filtering function f takes value \(+\infty \) if a vertex is isolated, and the minimum of its values on the edges incident to the vertex, otherwise. Thus, f is determined by its restriction to E; therefore the weighted graphs considered here are pairs (G, f) with \(f:E \rightarrow \mathbb {R}\). By construction, the subgraphs of the corresponding filtrations are induced by their edge sets.
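As a minimal sketch of this convention, the following Python fragment builds the subgraph at a given level from the edge weights alone and recovers the value of f on a vertex. The encoding of a weighted graph as a dictionary from `frozenset` edges to weights, and the names `sublevel_graph` and `vertex_value`, are assumptions introduced for illustration only.

```python
def sublevel_graph(edges, u):
    """Return (V_u, E_u): the subgraph induced by the edges of weight <= u.

    edges: dict mapping frozenset({a, b}) -> f(e)  (hypothetical encoding).
    Since the subgraph is induced by its edge set, it has no isolated vertices.
    """
    E_u = {e for e, weight in edges.items() if weight <= u}
    V_u = set().union(*E_u) if E_u else set()
    return V_u, E_u


def vertex_value(edges, x):
    """Value of f on vertex x: minimum over incident edges, +inf if isolated."""
    incident = [weight for e, weight in edges.items() if x in e]
    return min(incident) if incident else float("inf")


# A path a - b - c whose two edges enter at levels 1 and 2.
example = {frozenset({"a", "b"}): 1.0, frozenset({"b", "c"}): 2.0}
```

With this example, the subgraph at level 1.5 consists of the single edge between a and b, and vertex c enters the filtration at level 2.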

Definition 4

Let \({\bar{\mathbf {C}}}\) be a category. A lower-bounded function \(p:\text {Morph}({\bar{\mathbf {C}}}) \rightarrow \mathbb {Z}\) is a categorical persistence function if, for all \(u_1 \rightarrow u_2 \rightarrow v_1 \rightarrow v_2\), the following inequalities hold:

  1. \(p(u_1\rightarrow v_1) \le p(u_2\rightarrow v_1)\) and \(p(u_2\rightarrow v_2) \le p(u_2\rightarrow v_1)\).

  2. \(p(u_2\rightarrow v_1) - p(u_1\rightarrow v_1) \ge p(u_2\rightarrow v_2) - p(u_1\rightarrow v_2)\).
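When the category is \((\mathbb {R}, \le )\), the two inequalities of Definition 4 can be checked numerically on a finite set of levels. The sketch below does so for strictly increasing chains \(u_1<u_2<v_1<v_2\); the checker `satisfies_definition_4` and the two test functions are hypothetical names chosen here, and lower-boundedness is not tested.

```python
from itertools import combinations


def satisfies_definition_4(p, levels):
    """Check conditions 1 and 2 of Definition 4 on all chains u1 < u2 < v1 < v2."""
    for u1, u2, v1, v2 in combinations(sorted(set(levels)), 4):
        cond1 = p(u1, v1) <= p(u2, v1) and p(u2, v2) <= p(u2, v1)
        cond2 = p(u2, v1) - p(u1, v1) >= p(u2, v2) - p(u1, v2)
        if not (cond1 and cond2):
            return False
    return True


# Illustrative function: number of intervals [b, d) containing [u, v].
bars = [(0, 3), (1, 5)]


def count_bars(u, v):
    return sum(1 for b, d in bars if b <= u and v < d)


def length(u, v):
    # Counter-example: violates condition 1, since it decreases as u grows.
    return v - u


levels = [0, 0.5, 1, 2, 3, 4, 5, 6]
```

On these levels `count_bars` passes the check, while `length` fails it.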

Remark 1

Such a function is categorical in the sense that it yields the same result to morphisms obtained from each other by composition with a \({\bar{\mathbf {C}}}\)-isomorphism. For instance, we can retrieve the framework of classical topological persistence by setting \({\bar{\mathbf {C}}}= \mathbf {Vect}\) and p as the rank operator, i.e. the dimension of the image.

In what follows we focus on \({\bar{\mathbf {C}}}=(\mathbb {R}, \le )\). In this case a morphism \(u\rightarrow v\) is simply the relation \(u\le v\), which is represented as the point (u, v) in the persistence diagrams.

Definition 5

Let p be a map assigning to each monic \((\mathbb {R}, \le )\)-indexed diagram M in a category \(\mathbf {C}\) a categorical persistence function \(p_M\) on \((\mathbb {R}, \le )\), such that \(p_{M} = p_{M'}\) whenever a natural isomorphism between M and \(M'\) exists. All the resulting categorical persistence functions \(p_M\) are called indexing-aware persistence functions in \(\mathbf {C}\) (ip-functions for brevity). The map p itself is called an ip-function generator.

Remark 2

An ip-function generator is actually a categorical function (in the sense of Remark 1) on the functor category \(\mathbf {C}^{(\mathbb {R}, \le )}\) .

An ip-function in Graph (Definition 5) \(p_M\), where M is an \((\mathbb {R}, \le )\)-indexed diagram, will be denoted \(p_{(G, f)}\), where M corresponds to the filtration produced by the weighted graph (G, f). The associated persistence diagram will be denoted by D(f), for the sake of simplicity and if no confusion may occur.

Fig. 1 A weighted graph (left) and its Persistent Betti Number functions in degree 0 (middle) and 1 (right)

We can now observe that ip-functions are a particular case of categorical persistence functions in the category Graph. We recall that categorical persistence functions generalise Persistent Betti Number (PBN) functions. The difference between any of the categorical persistence functions introduced in Bergomi et al. (2021) and an ip-function defined here is that the former comes from a functor defined on Graph, while the latter strictly depends on the filtration, so comes from a functor defined on \((\mathbb {R}, \le )\).

Remark 3

The graph depicted in Fig. 1 shall be our running toy example throughout the manuscript. In the figure, we report the PBN functions of degree 0 and 1 to allow the reader to compare those classical results with the ones we shall obtain through ip-functions.

In Sect. 4, we extend the notions introduced above to the category of directed graphs.

2.1 Balanced ip-functions

The categorical functions introduced in Bergomi et al. (2021) are stable, i.e. the bottleneck distance between their persistence diagrams is a lower bound for their interleaving distance. The same does not automatically hold for ip-functions. However, we shall state a condition (Definition 6) which implies stability (as proved in Theorem 1). This condition corresponds to d’Amico et al. (2010, Proposition 10): there, it is proved for 0-degree PBNs, and from it the stability theorem (d’Amico et al. (2010), Theorem 29) follows through a sequence of lemmas; here, it is postulated.

Definition 6

Let p be an ip-function generator on Graph. The map p itself and the resulting ip-functions are said to be balanced if the following condition is satisfied. Let (G, f) and \((G', f')\) be any two weighted graphs, and \(p_{(G, f)}\), \(p_{(G', f')}\) their associated ip-functions. If an isomorphism \(\psi :G\rightarrow G'\) and a positive real number h exist, such that \(\sup _{e\in E} |f(e)-f'\big (\psi (e)\big )|\le h\), then for all \((u, v)\in \varDelta ^+\) the inequality \(p_{(G, f)}(u-h, v+h)\le p_{(G', f')}(u, v)\) holds.

Let (G, f), \((G', f')\) be as above. Let also \(\mathcal {H}\) be the (possibly empty) set of graph isomorphisms between G and \(G'\). We can now carry over to Graph some definitions given in Frosini and Mulazzani (1999), d’Amico et al. (2010), and Lesnick (2015).

Definition 7

The natural pseudodistance of (G, f) and \((G', f')\) is

$$\begin{aligned} \delta \big ((G, f), (G', f')\big ) = \left\{ \begin{array}{l l} +\infty &{} \text {if} \ \ \ \mathcal {H}=\emptyset \\ \inf _{\phi \in \mathcal {H}}\sup _{e\in E} |f(e) - f'\big (\phi (e)\big )| \ \ &{} \text {otherwise} \end{array} \right. \end{aligned}$$

Some simple adjustments of the proof of d’Amico et al. (2010, Theorem 29) and of its preceding lemmas yield the following theorem.

Theorem 1

(Stability) Let p be a balanced ip-function generator in Graph and \((G, f), (G', f')\) be two weighted graphs. Then we have

$$\begin{aligned} d\big (D(f), D(f')\big ) \le \delta \big ((G, f), (G', f')\big ), \end{aligned}$$

where D(f) and \(D(f')\) are the persistence diagrams realized by the ip-functions \(p_{(G, f)}\) and \(p_{(G', f')}\) respectively. \(\square \)

Through Frosini et al. (2019, Theorem 5.8), this also implies stability with respect to the interleaving distance. Universality (Lesnick (2015), Sec. 5.2) is generally not granted for stable persistence functions: it needs ad hoc constructions.

When discussing stability above, we introduced two distinct graphs. However, the following proposition describes stability when considering a single graph and two filtering functions. This result will be useful in the remainder of the paper.

Proposition 1

The ip-function generator p is balanced if and only if the following condition is satisfied. Let \(G=(V, E)\) be any graph, f and g be two filtering functions on G, and \(p_{(G, f)}\) and \(p_{(G, g)}\) their ip-functions. If a positive real number h exists, such that \(\sup _{e\in E}|f(e)-g(e)|\le h\), then for all \((u, v)\in \varDelta ^+\) the inequality \(p_{(G,f)}(u-h, v+h) \le p_{(G, g)}(u, v)\) holds.

Proof

One of the two implications is immediate. The other is proved by the fact that \(p_{(G, g)} = p_{(G', f')}\) where \(g = f'\circ \psi \), with the notation of Definition 6. \(\square \)

Remark 4

The condition is symmetric: if it holds as in the statement of Proposition 1, then also \(p_{(G, g)}(u-h, v+h) \le p_{(G, f)}(u, v)\) holds for all \((u, v)\in \varDelta ^+\).

2.2 Steady and ranging sets

Definition 8

Given a graph \(G = (V, E)\), any function \(\mathcal {F}:2^{V\cup E} \rightarrow \{true, false\}\) is called a feature. We call \(\mathcal {F}\)-set any \(X\subset V\cup E\) such that \(\mathcal {F}(X)= true\). Given a weighted graph (G, f) and a real number u, we denote by \(G_u\) the subgraph of G induced by the edge set \(f^{-1}\big ((-\infty , u]\big )\). We shall say that \(X\subset V\cup E\) is an \(\mathcal {F}\)-set at level \(w\in \mathbb {R}\) if it is an \(\mathcal {F}\)-set of the subgraph \(G_w\).

Definition 9

Let \(\mathcal {F}\) be a feature of G. We define the maximal feature \(m\mathcal {F}\) associated with \(\mathcal {F}\) as follows: for any \(X \subseteq (V \cup E)\), \(m\mathcal {F}(X) = true\) if and only if \(\mathcal {F}(X)=true\) and there is no \(Y \subseteq (V \cup E)\) such that \(X \subset Y\) and \(\mathcal {F}(Y)=true\).

Definition 10

Let \(\mathcal {F}\) be a feature. A set \(X\subseteq V\cup E\) is a steady \( \mathcal {F}\)-set (s\(\mathcal {F}\)-set for brevity) at \((u, v) \in \varDelta ^+\) if it is an \(\mathcal {F}\)-set at all levels w with \(u\le w \le v\). We call X a ranging \(\mathcal {F}\)-set (r\(\mathcal {F}\)-set) at (u, v) if there exist levels \(w\le u\) and \(w'\ge v\) at which it is an \(\mathcal {F}\)-set.

Let \(S^\mathcal {F}_{(G, f)}(u,v)\) be the set of s\(\mathcal {F}\)-sets at (u, v) and let \(R^\mathcal {F}_{(G, f)}(u,v)\) be the set of r\(\mathcal {F}\)-sets at (u, v).
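Definition 10 can be tested directly on small examples. Since the filtration only changes at edge weights, it suffices to inspect level u together with the edge weights in (u, v] for steadiness, and level u (resp. v) together with the smaller (resp. larger) edge weights for the ranging condition. The sketch below follows this idea; the dictionary encoding of the weighted graph, all function names, and the feature "singleton of a vertex of odd degree" are assumptions chosen only for illustration, and features are taken to be false on the empty subgraph, in agreement with the last claim of Lemma 1.

```python
def sublevel_graph(edges, w):
    """(V_w, E_w) induced by the edges of weight <= w (edges: frozenset -> weight)."""
    E_w = {e for e, weight in edges.items() if weight <= w}
    V_w = set().union(*E_w) if E_w else set()
    return V_w, E_w


def holds_at(feature, edges, X, w):
    """True if X is an F-set of the non-empty subgraph G_w."""
    V_w, E_w = sublevel_graph(edges, w)
    return bool(E_w) and X <= (V_w | E_w) and feature(X, V_w, E_w)


def is_steady(feature, edges, X, u, v):
    # G_w changes only at edge weights: check u and the weights in (u, v].
    levels = [u] + sorted({w for w in edges.values() if u < w <= v})
    return all(holds_at(feature, edges, X, w) for w in levels)


def is_ranging(feature, edges, X, u, v):
    low = [u] + [w for w in edges.values() if w < u]
    high = [v] + [w for w in edges.values() if w > v]
    return (any(holds_at(feature, edges, X, w) for w in low)
            and any(holds_at(feature, edges, X, w) for w in high))


def odd_degree_vertex(X, V, E):
    """Hypothetical feature: X is a singleton {x}, with x a vertex of odd degree."""
    if len(X) != 1:
        return False
    (x,) = X
    return x in V and sum(1 for e in E if x in e) % 2 == 1


def sigma(feature, edges, candidates, u, v):
    return sum(is_steady(feature, edges, X, u, v) for X in candidates)


def rho(feature, edges, candidates, u, v):
    return sum(is_ranging(feature, edges, X, u, v) for X in candidates)


# Star centred at b whose edges enter at levels 1, 2 and 3.
edges = {frozenset({"a", "b"}): 1.0,
         frozenset({"b", "c"}): 2.0,
         frozenset({"b", "d"}): 3.0}
candidates = [frozenset({x}) for x in "abcd"]
```

In this example the degree of b is 1, 2 and 3 at levels 1, 2 and 3, so {b} is a ranging but not a steady set at (1.5, 3.5); counting over the four singletons gives the value 1 for the steady count and 2 for the ranging count at that point.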

Remark 5

Intuitively, the adjective “steady” stresses that a steady set enjoys a given feature \(\mathcal {F}\) throughout the entire interval [u, v]. “Ranging”, instead, refers to the fact that a ranging set spans, with feature \(\mathcal {F}\), the range [u, v] although possibly with gaps. Of course, steady implies ranging. This implication is granted by the “\(\le \)” and “\(\ge \)” signs in the definitions. With strict inequalities the implication fails. There are features for which steady is equivalent to ranging, e.g., features for which a set can be an \(\mathcal {F}\)-set only in a (possibly unbounded) interval. A simple example is the feature \(\mathcal {F}\) which assigns true only to singletons consisting of a vertex of a fixed degree.

Lemma 1

If \(u\le u' < v' \le v\), then

  1. \(S^\mathcal {F}_{(G, f)}(u,v) \subseteq S^\mathcal {F}_{(G, f)}(u',v')\)

  2. \(R^\mathcal {F}_{(G, f)}(u,v) \subseteq R^\mathcal {F}_{(G, f)}(u',v')\)

where the equalities hold if \(G_u = G_{u'}\) and \(G_v = G_{v'}\). Moreover \(S^\mathcal {F}_{(G, f)}(u,v) = \emptyset = R^\mathcal {F}_{(G, f)}(u, v)\) if \(G_u =\emptyset \).

Proof

By the definitions themselves of steady and ranging \(\mathcal {F}\)-set. \(\square \)

Definition 11

Let \(\mathcal {F}\) be a feature. For any graph G, for any filtering function \(f:E \rightarrow \mathbb {R}\), we define \(\sigma ^\mathcal {F}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {Z}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|S^\mathcal {F}_{(G, f)}(u,v)|\) and \(\varrho ^\mathcal {F}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {Z}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|R^\mathcal {F}_{(G, f)}(u,v)|\). We denote by \(\sigma ^\mathcal {F}\) and \(\varrho ^\mathcal {F}\) the maps assigning \(\sigma ^\mathcal {F}_{(G, f)}\) and \(\varrho ^\mathcal {F}_{(G, f)}\) respectively to the \((\mathbb {R}, \le )\)-indexed diagram corresponding to (G, f).

Proposition 2

The maps \(\sigma ^\mathcal {F}\) and \(\varrho ^\mathcal {F}\) are ip-function generators.

Proof

We prove conditions 1 and 2 of Definition 4, recalling that the source category is \((\mathbb {R}, \le )\), so the existence of a morphism \(u \rightarrow v\) (with \(u\ne v\)) simply means that \(u<v\). Assume \(u_1<u_2<v_1<v_2\). Let (G, f) be any weighted graph.

  • (Condition 1 for \(\sigma ^\mathcal {F}\)) By Lemma 1, \(S^\mathcal {F}_{(G, f)}(u_1, v_1) \subseteq S^\mathcal {F}_{(G, f)}(u_2, v_1)\), so \(|S^\mathcal {F}_{(G, f)}(u_1, v_1)| \le |S^\mathcal {F}_{(G, f)}(u_2, v_1)|\). Also \(S^\mathcal {F}_{(G, f)}(u_2, v_2) \subseteq S^\mathcal {F}_{(G, f)}(u_2, v_1)\) and \(|S^\mathcal {F}_{(G, f)}(u_2, v_2)| \le |S^\mathcal {F}_{(G, f)}(u_2, v_1)|\).

  • (Condition 2 for \(\sigma ^\mathcal {F}\)) By Lemma 1, \(S^\mathcal {F}_{(G, f)}(u_1, v_1) \subseteq S^\mathcal {F}_{(G, f)}(u_2, v_1)\), so \(|S^\mathcal {F}_{(G, f)}(u_2, v_1)| - |S^\mathcal {F}_{(G, f)}(u_1, v_1)|\) is the number of s\(\mathcal {F}\)-sets at \((u_2, v_1)\) which fail to be \(\mathcal {F}\)-sets at some w with \(u_1\le w \le u_2\). Analogously for \(|S^\mathcal {F}_{(G, f)}(u_2, v_2)| - |S^\mathcal {F}_{(G, f)}(u_1, v_2)|\). Now, every s\(\mathcal {F}\)-set at \((u_2, v_2)\) which fails to be an \(\mathcal {F}\)-set at w with \(u_1\le w \le u_2\) is also an s\(\mathcal {F}\)-set at \((u_2, v_1)\) failing at the same w. So \(S^\mathcal {F}_{(G, f)}(u_2, v_1) - S^\mathcal {F}_{(G, f)}(u_1, v_1) \supseteq S^\mathcal {F}_{(G, f)}(u_2, v_2) - S^\mathcal {F}_{(G, f)}(u_1, v_2)\) and \(|S^\mathcal {F}_{(G, f)}(u_2, v_1)| - |S^\mathcal {F}_{(G, f)}(u_1, v_1)| \ge |S^\mathcal {F}_{(G, f)}(u_2, v_2)| - |S^\mathcal {F}_{(G, f)}(u_1, v_2)|\).

  • (Condition 1 for \(\varrho ^\mathcal {F}\)) The argument is the same as for \(\sigma ^\mathcal {F}\).

  • (Condition 2 for \(\varrho ^\mathcal {F}\)) By Lemma 1, \(R^\mathcal {F}_{(G, f)}(u_1, v_1) \subseteq R^\mathcal {F}_{(G, f)}(u_2, v_1)\), so \(|R^\mathcal {F}_{(G, f)}(u_2, v_1)| - |R^\mathcal {F}_{(G, f)}(u_1, v_1)|\) is the number of r\(\mathcal {F}\)-sets at \((u_2, v_1)\) which fail to be \(\mathcal {F}\)-sets at all levels w with \(w \le u_1\). Analogously for \(|R^\mathcal {F}_{(G, f)}(u_2, v_2)| - |R^\mathcal {F}_{(G, f)}(u_1, v_2)|\). Now, every r\(\mathcal {F}\)-set at \((u_2, v_2)\) which fails to be an \(\mathcal {F}\)-set at all levels w with \(w \le u_1\) is also an r\(\mathcal {F}\)-set at \((u_2, v_1)\) failing at the same levels w. So \(R^\mathcal {F}_{(G, f)}(u_2, v_1) - R^\mathcal {F}_{(G, f)}(u_1, v_1) \supseteq R^\mathcal {F}_{(G, f)}(u_2, v_2) - R^\mathcal {F}_{(G, f)}(u_1, v_2)\) and \(|R^\mathcal {F}_{(G, f)}(u_2, v_1)| - |R^\mathcal {F}_{(G, f)}(u_1, v_1)| \ge |R^\mathcal {F}_{(G, f)}(u_2, v_2)| - |R^\mathcal {F}_{(G, f)}(u_1, v_2)|\).

\(\square \)

The value of both functions \(\sigma ^\mathcal {F}_{(G, f)}\) and \(\varrho ^\mathcal {F}_{(G, f)}\) at a point P on a vertical (resp. horizontal) discontinuity line is the same as the value at the points in a right (resp. upper) neighbourhood of P.

Of course, there are many features which give valid ip-functions: e.g. the features \(\mathcal {F}\) such that, if X is an \(\mathcal {F}\)-set at level u, then it is an \(\mathcal {F}\)-set also at level v for all \(v>u\).

We do not yet know which general hypothesis on \(\mathcal {F}\) would imply that \(\sigma ^\mathcal {F}\) or \(\varrho ^\mathcal {F}\) are balanced ip-function generators (Definition 6). Such features exist: Sect. 2.4 presents a whole class of features giving rise to balanced ip-functions.

2.3 Steady and ranging persistence on Eulerian sets

We now give an example of the framework presented in Sect. 2.2. Given any graph G, we define \(\mathcal{EU}: 2^{V\cup E} \rightarrow \{true, false\}\) to yield true on a set A if and only if A is a set of vertices whose induced subgraph of G is nonempty, Eulerian and maximal with respect to these properties; in that case A is said to be an \(\mathcal{EU}\)-set of G. \(\mathcal{EU}\) is then the maximal version of a feature we are not going to deal with. Let now (G, f) be a weighted graph. We apply Definition 10 to the feature \(\mathcal{EU}\).

Definition 12

For any real number w, the subset \(A\subseteq V\) is an \(\mathcal{EU}\)-set at level w if it is an \(\mathcal{EU}\)-set of the subgraph \(G_w\). It is a steady \(\mathcal{EU}\)-set (an s\(\mathcal{EU}\)-set) at \((u, v) \in \varDelta ^+\) if it is an \(\mathcal{EU}\)-set at all levels w with \(u\le w \le v\). It is a ranging \(\mathcal{EU}\)-set (an r\(\mathcal{EU}\)-set) at (u, v) if there exist levels \(w\le u\) and \(w'\ge v\) at which it is an \(\mathcal{EU}\)-set.

\(S^{\mathcal{EU}}_{(G, f)}(u, v)\) and \(R^{\mathcal{EU}}_{(G, f)}(u, v)\) are respectively the sets of s\(\mathcal{EU}\)-sets and of r\(\mathcal{EU}\)-sets at (u, v). We define \(\sigma ^{\mathcal{EU}}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {R}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|S^{\mathcal{EU}}_{(G, f)}(u,v)|\) and \(\varrho ^{\mathcal{EU}}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {R}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|R^{\mathcal{EU}}_{(G, f)}(u,v)|\).

We denote by \(\sigma ^{\mathcal{EU}}\) and \(\varrho ^{\mathcal{EU}}\) the maps assigning \(\sigma ^{\mathcal{EU}}_{(G, f)}\) and \(\varrho ^{\mathcal{EU}}_{(G, f)}\) respectively to the \((\mathbb {R}, \le )\)-indexed diagram corresponding to (G, f). By Proposition 2, \(\sigma ^{\mathcal{EU}}\) and \(\varrho ^{\mathcal{EU}}\) are ip-function generators.

Consider the example displayed in Fig. 1. In that particular example, the functions \(\sigma ^{\mathcal{EU}}_{(G, f)}\) and \(\varrho ^{\mathcal{EU}}_{(G, f)}\) are the same. Furthermore, they also coincide with the PBN function in degree 1 shown in the same figure. We show that this is not always the case in Fig. 2.

Neither \(\sigma ^{\mathcal{EU}}\) nor \(\varrho ^{\mathcal{EU}}\) is balanced (see the Appendix).

2.4 Monotone features

For a given graph \(G = (V, E)\), we shall consider as subgraphs only the ones induced by sets of edges. The next definition is a variation on the notion of monotone (sometimes dubbed hereditary) property defined in Alon and Shapira (2008).

Definition 13

We say that a feature \(\mathcal {F}\) is monotone if

  • For any graphs \(G' =(V', E')\subset G''=(V'', E'')\), and any \(X \subseteq (V' \cup E')\), \(\mathcal {F}(X) = true\) in \(G''\) implies \(\mathcal {F}(X)= true\) in \(G'\).

  • In any graph \(\overline{G}=(\overline{V}, \overline{E})\), for any \(Y \subset X \subseteq \overline{V} \cup \overline{E}\), \(\mathcal {F}(X) = true\) implies \(\mathcal {F}(Y)= true\).

A paradigmatic monotone feature is independence: independent (or stable) sets and matchings are examples of sets of vertices, respectively of edges, with monotone features.

For the remainder of this section, let (G, f) be a weighted graph, \(G=(V, E)\), and \(\mathcal {F}\) a monotone feature in G. By Proposition 2, \(\sigma ^\mathcal {F}\) and \(\varrho ^\mathcal {F}\) are ip-function generators.

Lemma 2

Let \(X \subseteq (V \cup E)\). Then, either there is no value u for which \(\mathcal {F}(X)=true\) in \(G_u\), or \(\mathcal {F}(X)=true\) in \(G_u\) for all \(u \in [u_1, v_1)\), where \(u_1\) is the lowest value u such that in the subgraph \(G_u=(V_u, E_u)\) one has \(X \subseteq (V_u \cup E_u)\), and \(v_1\) is either the lowest value v for which \(\mathcal {F}(X)=false\) in \(G_v\) or \(+\infty \).

Fig. 2 A weighted graph (H, h) (left) and the corresponding functions \(\sigma ^{\mathcal{EU}}_{(H, h)}\) (middle) and \(\varrho ^{\mathcal{EU}}_{(H, h)}\) (right)

Proof

Assume that \(\mathcal {F}(X)=true\) in \(G_u\) for at least one value u. If \(\mathcal {F}(X)=true\) in \(G_u\), then \(\mathcal {F}(X)=true\) in \(G_{u'}=(V_{u'}, E_{u'})\) for all \(u'<u\) such that \(X\subseteq (V_{u'} \cup E_{u'})\) by Definition 13. \(\square \)

The interval \([u_1, v_1)\) of Lemma 2, i.e. the widest interval for which \(\mathcal {F}(X)=true\) in (G, f), is called the \(\mathcal {F}\)-interval of X in (G, f).
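For the independence feature mentioned above, the \(\mathcal {F}\)-interval of a vertex set can be computed explicitly: \(u_1\) is the first level at which all vertices of X are present, and \(v_1\) is the first level at which an edge joins two vertices of X. The following sketch illustrates this under the same hypothetical dictionary encoding of a weighted graph (edges as `frozenset` keys); the function name `independence_interval` is an assumption made for this example.

```python
import math


def independence_interval(edges, X):
    """F-interval (u1, v1), meaning [u1, v1), of the vertex set X for the
    independence feature; None if X is never an independent set of some G_u.

    edges: dict mapping frozenset({a, b}) -> f(e)  (hypothetical encoding).
    """
    X = set(X)
    if not X:
        return None
    births = []
    for x in X:
        incident = [w for e, w in edges.items() if x in e]
        if not incident:
            return None  # x is isolated, so it never enters the filtration
        births.append(min(incident))
    u1 = max(births)  # lowest level at which every vertex of X is present
    inner = [w for e, w in edges.items() if e <= X]
    v1 = min(inner) if inner else math.inf  # first edge inside X, if any
    return (u1, v1) if u1 < v1 else None


# Triangle whose edges enter at levels 1, 2 and 3.
triangle = {frozenset({"a", "b"}): 1.0,
            frozenset({"b", "c"}): 2.0,
            frozenset({"a", "c"}): 3.0}
```

In the triangle, the set {a, c} is independent exactly on [2, 3), the set {a, b} is never independent, and the singleton {a} is independent on \([1, +\infty )\).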

Proposition 3

\(\sigma ^\mathcal {F} =\varrho ^\mathcal {F}\)

Proof

By Lemma 2. \(\square \)

Let now g be another filtering function on G; in order to avoid confusion, for each real number u, we denote by \(G_{f, u}\) (resp. \(G_{g, u}\)) the subgraph of G induced by the edge set \(f^{-1}\big ((-\infty , u]\big )\) (resp. \(g^{-1}\big ((-\infty , u]\big )\)).

Lemma 3

Assume that there exists a positive real h such that \(\sup _{e\in E}|f(e)-g(e)|\le h\). Assume also that \(X \subseteq (V \cup E)\) exists, such that \([u_1, v_1)\) is its \(\mathcal {F}\)-interval in (G, f), with \(u_1+2h<v_1<+\infty \). Then there is a non-empty \(\mathcal {F}\)-interval \([u_2, v_2)\) of X in (G, g), and \(|u_1-u_2|\le h, |v_1-v_2|\le h\).

Proof

Assume that, for \(e\in E\), \(f(e)=u\); then \(g(e)\le u+h\). This proves that, for each u, \(G_{f, u}\) is a subgraph of \(G_{g, u+h}\). Swapping the roles, also \(G_{g, u}\) is a subgraph of \(G_{f, u+h}\).

Therefore, if X exists in \(G_{f, u}\) it also exists in \(G_{g, u+h}\). Symmetrically, if X exists in \(G_{g, u}\) it also exists in \(G_{f, u+h}\). Recalling, by Lemma 2, the meaning of \(u_1\) and, correspondingly, \(u_2\), we obtain that \(|u_1-u_2|\le h\).

If \(\mathcal {F}(X)=true\) in \(G_{f, u+h}\), then \(\mathcal {F}(X)=true\) also in the subgraph \(G_{g, u}\) because \(\mathcal {F}\) is monotone. Analogously, \(\mathcal {F}(X)=true\) in \(G_{g, u+h}\) implies \(\mathcal {F}(X)=true\) in \(G_{f, u}\). Recalling, by Lemma 2, the meaning of \(v_1\) and, correspondingly, of \(v_2\), we obtain that \(|v_1-v_2|\le h\). \(\square \)

Proposition 4

The ip-function generators \(\sigma ^\mathcal {F} =\varrho ^\mathcal {F}\) are balanced.

Proof

We shall prove for \(\sigma ^\mathcal {F}\) (and consequently for \(\varrho ^\mathcal {F}\), by Proposition 3) the property stated in Proposition 1. With the notation and the assumptions of Lemma 2, assume that for \(u<v\) we have \(\sigma ^\mathcal {F}_{(G, f)}(u-h, v+h)>0\) (if it vanishes the claim is trivially true). We want to show that \(\sigma _{(G, f)}^\mathcal {F}(u-h, v+h)\le \sigma ^\mathcal {F}_{(G, g)}(u, v)\). Let \(X \subseteq (V \cup E)\) be such that \(\mathcal {F}(X)= true\) in \(G_{f, w}\) for all \(w \in [u-h, v+h]\). Then, for the \(\mathcal {F}\)-interval \([u_1, v_1)\) of X in (G, f) we have \(u_1\le u-h\), \(v+h<v_1\). The \(\mathcal {F}\)-interval of the same X in (G, g) is \([u_2, v_2)\), with \(|u_1-u_2|\le h\), \(|v_1-v_2|\le h\) by Lemma 3. So, \(u_2\le u_1+h \le u-h+h = u\) and \(v= v+h-h < v_1-h \le v_2\), i.e. [u, v] is contained in the \(\mathcal {F}\)-interval of X in (G, g) and \(\mathcal {F}(X)= true\) in \(G_{g, w}\) for all \(w\in [u,v]\). Therefore, an injective map exists from \(S^\mathcal {F}_{(G,f)}(u-h, v+h)\) to \(S^\mathcal {F}_{(G, g)}(u, v)\), proving that \(\sigma ^\mathcal {F}_{(G, f)}(u-h, v+h) \le \sigma ^\mathcal {F}_{(G, g)}(u, v)\). \(\square \)

Fig. 3 A weighted graph (G, f) (left) and its functions \(\sigma ^{m\mathcal {I}}_{(G, f)}\) (middle) and \(\varrho ^{m\mathcal {I}}_{(G, f)}\) (right)

Monotone features, although balanced, often give rise to extremely rich persistence diagrams. For this reason, it is possible to consider instead the maximal version (which may fail to be balanced) of those features. In Fig. 3, we show how maximal independent sets give rise to complex persistence diagrams, even for our running toy example (the graph shown originally in Fig. 1). For the monotone feature \(\mathcal {I}\) which identifies independent sets of vertices, \(m\mathcal {I}\) is not balanced (see the Appendix).

However, the maximal version of the feature \(\mathcal {M}\), which identifies matchings, produces balanced ip-function generators (Proposition 5). See Fig. 4 for the functions \(\sigma ^{m\mathcal {M}}\) and \(\varrho ^{m\mathcal {M}}\) of the usual example of Fig. 1.

Proposition 5

The ip-function generators \(\sigma ^{m\mathcal {M}}\) and \(\varrho ^{m\mathcal {M}}\) coincide and are balanced.

Proof

If the edge set X is a matching in a graph, it is a matching in all supergraphs. In a weighted graph (G, f), the set of levels w such that an edge set X is a maximal matching in \(G_w = (V_w, E_w)\) is either empty or the interval \([u_2, v_2)\) where \(u_2\) is the left end-point of the \(\mathcal {M}\)-interval of X and \(v_2\) is either \(+\infty \) or the left end-point of the \(\mathcal {M}\)-interval of a matching Y containing X. This proves that \(\sigma ^{m\mathcal {M}}_{(G, f)} = \varrho ^{m\mathcal {M}}_{(G, f)}\).

Let now g be another filtering function on G, such that \(\sup _{e\in E}|f(e)-g(e)|\le h\), with \(h>0\). Assume that the interval \([u_2, v_2)\) on which X is a maximal matching is such that \(u_2+2h< v_2 < +\infty \). Then, by Lemma 3, for the left end-point \(u_3\) of the \(\mathcal {M}\)-interval of X in (G, g) and the left end-point \(v_3\) of the \(\mathcal {M}\)-interval of Y in (G, g) one has \(|u_2-u_3|\le h, |v_2-v_3|\le h\). So, if X belongs to \(S^{m\mathcal {M}}_{(G,f)}(u-h, v+h)\), it also belongs to \(S^{m\mathcal {M}}_{(G, g)}(u, v)\), proving that \(\sigma ^{m\mathcal {M}}_{(G, f)}(u-h, v+h) \le \sigma ^{m\mathcal {M}}_{(G, g)}(u, v)\). \(\square \)

Fig. 4 A weighted graph (G, f) (left) and its functions \(\sigma ^{m\mathcal {M}}_{(G, f)} = \varrho ^{m\mathcal {M}}_{(G, f)}\) (right)

2.5 Hubs

Although the informal concept of hub is intuitively clear, it is not as easy to formalize in graph-theoretical terms. The simple idea of a vertex with (locally) maximum degree is not entirely satisfactory: in a social network it is common to find users with a lot of contacts, with whom, however, they interact poorly. Even a high sum of traffic intensities (e.g. the number of messages exchanged between a user and their connections) is not enough to bestow a vertex the central role implied by the word hub.

There is an important line of research on a probabilistic concept of “persistent hubs” based on degree maximality (Dereich and Mörters 2009; Galashin 2016; Banerjee and Bhamidi 2021) with some intersection with what we are proposing.

We shall use local degree prevalence as a feature for building two ip-function generators: for any graph G we define \(\mathcal {H}:2^{V\cup E} \rightarrow \{true, false\}\) to yield true only on singletons containing a vertex whose degree is greater than that of each of its neighbours. Such a vertex is called an \(\mathcal {H}\)-vertex or simply a hub. This feature, combined with the indexing-aware persistence framework and the notions of steady and ranging sets, allows for the identification of those vertices whose role is indeed central throughout the filtration of a given weighted graph (G, f).

Importantly, we preserve the flexibility granted in the realm of classical persistence: as one of the many possible variations, we could consider a vertex to be a hub if the sum of the values of f on the edges incident to it (instead of the degree) is greater than the corresponding sum at each of its neighbours.
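The feature \(\mathcal {H}\) and the weighted variation just mentioned can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the experiments: the edge-list representation, the names `sublevel_adjacency`, `is_hub` and `is_weighted_hub`, and the convention that \(G_w\) consists of the edges e with \(f(e) \le w\) together with their endpoints are choices made here for the example.

```python
from collections import defaultdict


def sublevel_adjacency(edges, w):
    """Adjacency map of G_w; each neighbour is stored with the edge weight.

    Assumption: G_w is made of the edges (a, b, weight) with weight <= w
    and of their endpoints; `edges` lists each undirected edge once.
    """
    adj = defaultdict(dict)
    for a, b, weight in edges:
        if weight <= w:
            adj[a][b] = weight
            adj[b][a] = weight
    return adj


def is_hub(adj, v):
    """True if the degree of v strictly exceeds that of each neighbour."""
    if v not in adj:  # assumption: a vertex outside G_w is not a hub
        return False
    return all(len(adj[v]) > len(adj[n]) for n in adj[v])


def is_weighted_hub(adj, v):
    """Variation: compare sums of incident edge weights, not degrees."""
    if v not in adj:
        return False

    def strength(x):
        return sum(adj[x].values())

    return all(strength(v) > strength(n) for n in adj[v])


# Hypothetical toy graph, not taken from the paper.
edges = [("c", "a", 1), ("c", "b", 2), ("a", "b", 3)]
# At level 2, c has degree 2 while a and b have degree 1.
print(is_hub(sublevel_adjacency(edges, 2), "c"))
# At level 3 the weight sums are a = 4, b = 5, c = 3.
print(is_weighted_hub(sublevel_adjacency(edges, 3), "b"))
```

At level 3 the toy graph is a triangle, so no vertex has degree prevalence, while the weighted variation still singles out b: the two features can disagree on the same sublevel graph.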

Our proposal is to build persistence diagrams in our generalized framework, and thereafter use the selection procedure presented in Kurlin (2016) (see Sect. 3.1) to identify relevant cornerpoints, thus identifying the “persistent” hubs (with a different meaning of the adjective than in Dereich and Mörters (2009), Galashin (2016), Banerjee and Bhamidi (2021)) of a given weighted graph.

Definition 14

For any real number w, a vertex is a hub (or \(\mathcal {H}\)-vertex) at level w if it is an \(\mathcal {H}\)-vertex of the subgraph \(G_w\). It is a steady hub (or s\(\mathcal {H}\)-vertex) at \((u, v)\in \varDelta ^+\) if it is an \(\mathcal {H}\)-vertex at all levels w with \(u\le w\le v\). It is a ranging hub (or r\(\mathcal {H}\)-vertex) at \((u, v)\in \varDelta ^+\) if there exist levels \(w \le u\) and \(w'\ge v\) at which it is an \(\mathcal {H}\)-vertex.

\(S^\mathcal {H}_{(G, f)}(u, v)\) and \(R^\mathcal {H}_{(G, f)}(u, v)\) are respectively the sets of s\(\mathcal {H}\)-vertices and of r\(\mathcal {H}\)-vertices at \((u, v)\). We define \(\sigma ^\mathcal {H}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {R}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|S^\mathcal {H}_{(G, f)}(u,v)|\) and \(\varrho ^\mathcal {H}_{(G, f)}: \varDelta ^+ \rightarrow \mathbb {R}\) as the function which assigns to \((u, v) \in \varDelta ^+\) the number \(|R^\mathcal {H}_{(G, f)}(u,v)|\).

We denote by \(\sigma ^\mathcal {H}\) and \(\varrho ^\mathcal {H}\) the maps assigning \(\sigma ^\mathcal {H}_{(G, f)}\) and \(\varrho ^\mathcal {H}_{(G, f)}\) respectively to the \((\mathbb {R}, \le )\)-indexed diagram corresponding to (G, f).
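The sets \(S^\mathcal {H}_{(G, f)}(u, v)\) and \(R^\mathcal {H}_{(G, f)}(u, v)\) of Definition 14 can be computed naively as in the following sketch. It assumes that \(G_w\) consists of the edges of weight at most w and their endpoints, so that the set of hubs changes only when w crosses an edge weight; the function names and the toy graph are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def hubs_at(edges, w):
    """Set of hubs of G_w (edges of weight <= w and their endpoints)."""
    adj = defaultdict(set)
    for a, b, weight in edges:
        if weight <= w:
            adj[a].add(b)
            adj[b].add(a)
    return {v for v in adj
            if all(len(adj[v]) > len(adj[n]) for n in adj[v])}


def steady_hubs(edges, u, v):
    """Vertices that are hubs at every level w with u <= w <= v."""
    # G_w changes only at edge weights, so it suffices to test u and
    # every edge weight in (u, v].
    levels = [u] + sorted({wt for _, _, wt in edges if u < wt <= v})
    return set.intersection(*(hubs_at(edges, t) for t in levels))


def ranging_hubs(edges, u, v):
    """Vertices that are hubs at some level <= u and some level >= v."""
    weights = sorted({wt for _, _, wt in edges})
    below = set().union(*(hubs_at(edges, t) for t in weights if t <= u))
    above = hubs_at(edges, v).union(
        *(hubs_at(edges, t) for t in weights if t > v))
    return below & above


# Hypothetical toy graph: c is a hub exactly at the levels in
# [2, 3) and [4, +inf).
edges = [("c", "a", 1), ("c", "b", 2), ("a", "b", 3), ("c", "d", 4)]
print(steady_hubs(edges, 2, 4))   # c is not a hub at level 3
print(ranging_hubs(edges, 2, 4))  # c is a hub at levels 2 and 4
```

The values \(\sigma ^\mathcal {H}_{(G, f)}(u, v)\) and \(\varrho ^\mathcal {H}_{(G, f)}(u, v)\) are then the cardinalities of the two returned sets. In the toy graph the vertex c is a ranging hub at (2, 4) without being a steady one, which is the kind of difference between the two notions that the text describes.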

Figure 5 shows the two ip-functions \(\sigma ^\mathcal {H}_{(G, f)}\) and \(\varrho ^\mathcal {H}_{(G, f)}\) for the usual example of Fig. 1. The generators \(\sigma ^\mathcal {H}\) and \(\varrho ^\mathcal {H}\) are not balanced either (see the Appendix).

Fig. 5

A weighted graph (G, f) (left) and its functions \(\sigma ^\mathcal {H}_{(G, f)}\) (middle) and \(\varrho ^\mathcal {H}_{(G,f)}\) (right). The topmost vertex is a hub at all levels in \([2, 3) \cup [4, 5)\)

3 Persistent hubs

In this Section we present a first approach to hub detection implementable on real-world graphs. We consider this work in progress a sort of exploration of the meaning of steady and ranging hubs in different contexts; however, we will not compare our results to a ground truth.

In the following examples, instead of the functions \(\sigma ^\mathcal {H}_{(G, f)}\) and \(\varrho ^\mathcal {H}_{(G, f)}\), we will only show the corresponding persistence diagrams, to make the selection procedure clearer.

3.1 A selection procedure

It is well known in persistence that noise is represented by cornerpoints close to the diagonal \(\varDelta \). However, not all cornerpoints close to \(\varDelta \) necessarily represent noise, so how wide a strip along \(\varDelta \) should be discarded? A smart, simple answer is offered in Kurlin (2016), where a remarkable application to segmentation of very noisy data is given. We summarize it here for a given persistence diagram D.

Call diagonal gap a maximal region of the form \(\{(u,v) \in \varDelta ^+ \, | \, a<v-u<b\}\) where no cornerpoints of D lie; \(b-a\) is its width. We can then form a hierarchy of diagonal gaps by decreasing width; out of it we get a hierarchy of sets of cornerpoints: we can consider the cornerpoints lying above the first, widest gap as the most relevant. Empirically, we may decide that the cornerpoints sitting above the second, or the third widest gap are also relevant, and so on. Equivalently, the cornerpoints below the chosen gap are ignored as a possible result of noise. In Fig. 6 it is possible to observe how the selection of cornerpoints above the widest diagonal gap makes it possible to trace back those maxima (or classes of maxima, depending on the multiplicity of the cornerpoints) that are more relevant with respect to the trend of the time series.
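The selection just described can be sketched as follows. This is a simplified reading of the procedure and not the code of Kurlin (2016): it assumes that all cornerpoints are finite pairs (u, v), that the diagonal itself (persistence 0) bounds the lowest gap from below, and that ties between gaps of equal width are broken by position; the names `diagonal_gaps` and `select_above_gap` and the sample diagram are hypothetical.

```python
def diagonal_gaps(cornerpoints):
    """Diagonal gaps (a, b) of a diagram, widest first."""
    # Assumption: the diagonal (persistence 0) bounds the lowest gap.
    levels = sorted({0} | {v - u for u, v in cornerpoints})
    gaps = zip(levels, levels[1:])
    return sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)


def select_above_gap(cornerpoints, rank=1):
    """Cornerpoints lying above the rank-th widest diagonal gap."""
    gaps = diagonal_gaps(cornerpoints)
    if rank > len(gaps):
        return []
    _, b = gaps[rank - 1]
    return [(u, v) for u, v in cornerpoints if v - u >= b]


# Hypothetical diagram with persistences 1, 1.5, 5 and 6.
diagram = [(0, 1), (1, 2.5), (0, 5), (2, 8)]
print(diagonal_gaps(diagram)[0])   # widest gap: 1.5 < v - u < 5
print(select_above_gap(diagram))   # the two cornerpoints above it
```

Passing `rank=2` or `rank=3` corresponds to the empirical choice, mentioned above, of also accepting the cornerpoints that sit above the second or third widest gap.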

Fig. 6

Selecting maxima in a time series. Left: flow of the Nile from 1871 to 1970. Data freely available at vincentarelbundock.github.io. Right: cornerpoints selected by considering the widest diagonal gap (in yellow) (color figure online)

In the next Sections we apply this selection criterion to the persistence diagrams corresponding to the functions \(\sigma ^\mathcal {H}_{(G, f)}\) and \(\varrho ^\mathcal {H}_{(G, f)}\), computed for some networks and some filtering functions. The vertices identified by the cornerpoints selected in this way will be called persistent hubs, in particular persistent steady hubs or persistent ranging hubs.

3.2 Airports

A first search for relevant hubs was carried out on a set of 44 major North-American cities (41 in the US, three in Canada; the ones in capital letters in the Amtrak railway map; see Table 1). The edges connect cities between which there were flights in a randomly chosen but fixed week (June 11–17, 2018). Flight data were obtained from Google Flights by selecting direct flights with Business Class; distances were found at Prokerala.com. A single vertex was considered for each city with more than one airport.

Table 1 The towns considered as vertices and the respective degrees in the graph

As filtering functions we used:

  • Distance

  • Number of flights in the fixed week

  • Their product

and their opposites (each added to its maximum). For each such choice we looked for steady and ranging hubs, for a total of twelve different persistence diagrams. Note that the same vertex can contribute to several cornerpoints of the persistence diagram of \(\sigma ^\mathcal {H}_{(G, f)}\), whereas this cannot happen for \(\varrho ^\mathcal {H}_{(G, f)}\).
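As an illustration, the six filtering functions can be assembled from per-edge data as in the sketch below. The numbers are made up for the example and are not the flight data of this Section; the helper names are hypothetical, and taking the opposite of the product as "maximum of the product minus the product" is one possible reading of the list above.

```python
def shifted_opposite(f):
    """Opposite of f added to its maximum: e -> max(f) - f(e)."""
    top = max(f.values())
    return {edge: top - value for edge, value in f.items()}


def product(f, g):
    """Edge-wise product of two filtering functions on the same edges."""
    return {edge: f[edge] * g[edge] for edge in f}


# Made-up values on two edges, for illustration only.
distance = {("A", "B"): 100, ("B", "C"): 300}
flights = {("A", "B"): 7, ("B", "C"): 2}

filterings = {
    "distance": distance,
    "flights": flights,
    "product": product(distance, flights),
}
# Add the shifted opposite of each of the three functions.
for name in list(filterings):
    filterings["max - " + name] = shifted_opposite(filterings[name])

print(sorted(filterings))
```

Each of the six functions is then used twice, once for steady and once for ranging hubs, which gives the twelve diagrams.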

Next, we report results whose interest lies in the identification of hubs that do not rank very high by degree. In particular, we do not find it especially interesting that Atlanta, Dallas, Chicago and Houston often turn out to be persistent ranging or steady hubs, since they have the highest degrees in the graph (42, 41, 40 and 40 respectively).

Fig. 7

Filtering function: distance; steady hubs. Persistent steady hubs above the widest diagonal gap: two cornerpoints represent Atlanta, one Dallas and one Seattle

The first occurrence of a persistent hub which is rather far from having highest degrees is with the filtering function distance: Seattle is just twelfth in the degree rank, but appears above the widest diagonal gap as a steady hub (Fig. 7). Persistent steady hubs are: Atlanta (with two cornerpoints), Dallas, Seattle.

Surprisingly, if we use the opposite of distance (added to the maximum distance, for ease of representation), the cornerpoints corresponding to the vertices with highest degrees are located under the widest diagonal gap (Fig. 8). Persistent steady hubs are: Los Angeles, San Francisco, Seattle.

Fig. 8

Filtering function: max distance minus distance; steady hubs. Persistent steady hubs above the widest diagonal gap: Los Angeles, San Francisco, Seattle

New York City has the eighth highest degree (35, together with Detroit, Phoenix and San Francisco). Still, we would expect it to appear as a hub, in the common sense of the term. In fact, it occurs as one of the few ranging hubs when the filtering functions (max minus number of flights) and distance\(\cdot \)(max minus number of flights) are used.

Ranging hubs for (max minus number of flights): Atlanta, Chicago, Dallas, New York.

Ranging hubs for the product filtering function are Atlanta, Chicago, Dallas, New York, Vancouver.

3.3 Characters co-occurrence in a novel

A classical benchmark for the analysis of hubs in co-occurrence graphs is given by Les Misérables. The network representing the co-occurrence of its characters is freely available at Graphistry. The graph has 77 major characters as vertices; each of the 254 edges joins two characters which appear together in at least one scene; the weight on an edge is the number of common occurrences. We used the inverse of the weight as a filtering function. We compare our results with the ones of Rieck et al. (2018), where the notion of clique-community centrality was used to spot particularly important characters: Table 2.

Table 2 Hubs in Les Misérables characters co-occurrence

Our method spots Cosette as a hub, whereas clique-community centrality does not. On the contrary, our technique misses Gavroche and Fantine. Both methods miss Javert. We are particularly puzzled by the result of Kurlin’s selection method: above the second widest diagonal gap (the first obviously isolates Jean Valjean) we find only Enjolras.

3.4 Time-varying hubs

Weighted graphs can represent discrete dynamics in time-varying processes. It is possible to keep track of persistent hubs, obtaining a concise representation of the relative importance of each hub in time. We considered the character co-occurrence in five consecutive books of the Game of Thrones saga, and applied the algorithm mentioned above for the analysis of character co-occurrence in Les Misérables. In this case, however, characters evolve throughout the books. A global analysis, i.e., computing hubs on the graphs obtained by considering summary statistics on the five books, hardly carries dynamical information. On the contrary, persistent hubs yield an easily visualizable summary of the characters’ roles in time. See Fig. 9.

Fig. 9

Evolution of Game of Thrones hub characters throughout five books. The legend reports the first six hubs and their persistence values per book

3.5 Languages

The website TerraLing.com contains a large amount of information, organised into 165 properties, about several languages. It was used in an interesting study (Port et al. 2018) on persistent cycles in language families. Unfortunately, the amount of information varies quite a lot from language to language. We analysed the mutual relations of 19 languages (18 of the European Union plus Turkish: Table 3) for which at least 50% of the 165 properties are checked. The graph is the complete one with 19 vertices. The filtering function defined on each edge is the opposite of the normalised quantity of common properties of the two languages that it connects. Ranging and steady hubs coincide and are: Castilian, Catalan, Dutch, English, Portuguese, Swedish.

Table 3 The 19 considered languages

Apart from the presence of English, which might also be biased by the great quantity of information available, we have no key for interpreting these results. For this and for the previous applications, we would very much like to set up a collaboration with domain experts.

4 Digraph persistence

In this section, let (G, f), with \(G=(V, A)\), be any weighted digraph. Given a feature \(\mathcal {F}: 2^{V\cup A} \rightarrow \{true, false\}\), it is straightforward to extend the definitions of balanced ip-function (Definition 6), of natural pseudodistance (Definition 7), the stability theorem (Theorem 1) and the definitions of steady and ranging sets (Definition 10) and of the ip-function generators \(\sigma ^\mathcal {F}\) and \(\varrho ^\mathcal {F}\) (Definition 11, Proposition 2) to this setting.

Fig. 10

Examples of digraphs and ip-functions. a The eight tournaments on three vertices, with \(\{1, 2, 3\}\)-valued filtering functions. b The ip-functions corresponding to the digraphs of Fig. 10a with respect to features \(\mathcal{DH}\) and \(\mathcal {K}\); for the correspondence see Tables 4 and 5

We define \(\mathcal{DH}: 2^{V\cup A} \rightarrow \{true, false\}\) to yield true only on singletons containing a vertex whose outdegree is greater than that of each of its neighbours. Also in this case, there are many possible variations of this feature: we recover the notions of hub, steady hub and ranging hub and the ip-function generators \(\sigma ^{\mathcal{DH}}\) and \(\varrho ^{\mathcal{DH}}\) as in Sect. 2.5.
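A minimal sketch of the directed feature follows. It rests on two assumptions that are made here only for the example: the sublevel digraph at level w consists of the arcs of weight at most w with their endpoints, and the neighbours of a vertex are taken in the underlying undirected graph (in- and out-neighbours alike). The function name `out_hubs` and the two weighted tournaments are hypothetical.

```python
from collections import defaultdict


def out_hubs(arcs, w):
    """Vertices whose outdegree exceeds that of each neighbour at level w.

    `arcs` is a list of (tail, head, weight) triples. Assumption:
    neighbours are taken in the underlying undirected graph.
    """
    outdeg = defaultdict(int)
    nbrs = defaultdict(set)
    for tail, head, weight in arcs:
        if weight <= w:
            outdeg[tail] += 1
            nbrs[tail].add(head)
            nbrs[head].add(tail)
    return {v for v in nbrs
            if all(outdeg.get(v, 0) > outdeg.get(n, 0) for n in nbrs[v])}


# Two hypothetical weighted tournaments on three vertices.
transitive = [(1, 2, 1), (1, 3, 2), (2, 3, 3)]
cyclic = [(1, 2, 1), (2, 3, 2), (3, 1, 3)]
print(out_hubs(transitive, 3))  # outdegrees 2, 1, 0
print(out_hubs(cyclic, 3))      # all outdegrees equal 1
```

In a complete transitive tournament the source has outdegree prevalence, while in a directed 3-cycle no vertex does.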

Figure 10a presents all tournaments on three vertices, with injective functions with values in the set \(\{1, 2, 3\}\). Figure 10b shows the values of some ip-functions. The correspondence between weighted tournaments and functions is given in Table 4. On these digraphs, \(\sigma ^{\mathcal{DH}}\) and \(\varrho ^{\mathcal{DH}}\) yield coinciding functions. However, this is not always the case, as shown in Fig. 11.

Table 4 The correspondence between the weighted digraphs of Fig. 10a and the diagrams of Fig. 10b for feature \(\mathcal{DH}\)

Fig. 11

A weighted digraph (G, f) (left) and its functions \(\sigma ^{\mathcal{DH}}_{(G, f)}\) (middle) and \(\varrho ^{\mathcal{DH}}_{(G, f)}\) (right)

There are two opposite definitions of a kernel of a digraph; we shall consider the one given in Morgenstern and Neumann (1953). However, alternative definitions (see, e.g., Galeana-Sánchez and Hernández-Cruz (2014)) also give rise to admissible features in our framework. We define the feature \(\mathcal {K}: 2^{V \cup A} \rightarrow \{true, false\}\) to yield true only on kernels, i.e. independent sets X of vertices such that for every vertex \(w \in V-X\) there exists at least one arc \(a \in A\) with tail w and head in X, where independence is defined with respect to the underlying undirected graph. Then \(\sigma ^\mathcal {K}\) and \(\varrho ^\mathcal {K}\) are ip-function generators. The correspondence between weighted tournaments and functions is given in Table 5.
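The kernel condition stated above can be checked directly on a single digraph, as in the following sketch. The function name `is_kernel` and the two example tournaments are hypothetical; the test covers only the definition adopted here (arcs from every outside vertex into X), not the alternative ones.

```python
def is_kernel(vertices, arcs, X):
    """True if X is an independent set absorbing every other vertex.

    `arcs` is a list of (tail, head) pairs on `vertices`.
    """
    X = set(X)
    # Independence in the underlying undirected graph: no arc joins
    # two vertices of X, whatever its orientation.
    independent = not any(t in X and h in X for t, h in arcs)
    # Every vertex outside X is the tail of an arc with head in X.
    absorbing = all(
        any(t == w and h in X for t, h in arcs)
        for w in set(vertices) - X
    )
    return independent and absorbing


# Hypothetical examples: a transitive and a cyclic tournament.
transitive = [(1, 2), (1, 3), (2, 3)]
cyclic = [(1, 2), (2, 3), (3, 1)]
print(is_kernel({1, 2, 3}, transitive, {3}))  # the sink absorbs 1 and 2
print(any(is_kernel({1, 2, 3}, cyclic, {v}) for v in (1, 2, 3)))
```

In a tournament every two vertices are adjacent, so only singletons can be independent; the second call therefore checks all candidate kernels of the directed 3-cycle.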

Table 5 The correspondence between the weighted digraphs of Fig. 10a and the diagrams of Fig. 10b for feature \(\mathcal {K}\)

None of the ip-function generators \(\sigma ^{\mathcal{DH}}\), \(\varrho ^{\mathcal{DH}}\), \(\sigma ^\mathcal {K}\), \(\varrho ^\mathcal {K}\) is balanced (see the Appendix).

5 Conclusions

We introduced ip-functions in a fairly general setting and studied their stability. We have then restricted our scope to the categories of graphs and digraphs, where we have defined steady and ranging sets according to features relative to the given (di)graphs.

We showed how graph-theoretical features can be used directly to obtain a concise representation of weighted undirected and directed graphs as persistence diagrams. In particular, we believe that the steady and ranging ip-function generators allow for a more streamlined analysis of graphs and networks bypassing the construction of auxiliary simplicial complexes. Although the steady and ranging sets yield equivalent results in some cases, persistence diagrams associated with ranging sets are generally simpler than the ones derived from steady sets, so the information is represented in a more condensed way. This is not the only reason for considering both representations. In our applications, we focused on the notion of hub. There, we showcased how the ranging representation of hubs is relevant for hub detection: a vertex might be relevant for the global dynamics of a network if it has local degree prevalence at far enough levels. For example, in a graph whose vertices represent users of a social network, edges represent “friendship”, and weights represent geographical distance, we conjecture that high-persistence ranging hubs might be crucial for the diffusion of “viral” documents. Analogously, we thought that an airport might have a key role if it has a sort of centrality both at a regional and international level, but not necessarily at all intermediate ones.

Fig. 12

\(\sigma ^{\mathcal{EU}}\) is not balanced: filtering function f left, g right