A novel graph matrix representation: sequence of neighbourhood matrices with an application

Abstract

In the study of network optimization, finding the shortest path minimizing time/distance/cost from a source node to a destination node is one of the fundamental problems. Our focus here is to find the shortest path between any pair of nodes in a given undirected unweighted simple graph with the help of the sequence of powers of neighbourhood matrices. The authors recently introduced the concept of neighbourhood matrix as a novel representation of graphs using the neighbourhood sets of the vertices. In this article, an extension of the above work is presented by introducing a sequence of matrices, referred to as the sequence of powers of \({\mathcal {NM}}(G)\). It is denoted it by \({\mathcal {NM}}^{\{l\}}(G) = [\eta _{ij}^{\{l\}}], 1\le l \le k(G)\), where k(G) is called the iteration number, \(k(G)=\lceil \log _{2} {\hbox {diameter}}(G)\rceil\). As this sequence of matrices captures the distance between the nodes profoundly, we further develop the technique and present several characterizations. Based on the theoretical results, we present an algorithm to find the shortest path between any pair of nodes in a given graph. The proposed algorithm and the claims therein are formally validated through simulations on synthetic data and the real network data from Facebook. The empirical results are quite promising with our algorithm having best running time among all the existing well-known shortest path algorithms for the considered graph classes.

Introduction

With the advent of modern technology and electronic devices, the occurrence of large-scale networks is widespread and ubiquitous. Especially in the field of computer science, these networks are omnipresent in various forms such as communication networks and social networks. These networks require cooperation between a large number of individual components/parts. Major challenges in this field include predicting the behaviour of the large-scale network, and information propagation within the network. For every large-scale network that arises in daily life, their understanding and mathematical description are crucial and achieved with the aid of an intricate graph representation that encodes the interaction between the network’s components.

One of the fundamental problems in the study of network optimization is the shortest path problem that deals with finding a path with minimum distance, time or cost from the source to the destination. Most of the communication network problems, such as information propagation/exchange, were modelled as designing/finding multiple routes between nodes (if it exists) and finding shortest paths between nodes. Two classical approaches for the single-source shortest path problem are attributable to Dijkstra [8] and Bellman–Ford [2]. For more, see [7]. Since then, many researchers have been working on finding the shortest path by reducing the involved time complexity. For a survey of the shortest path algorithms, we refer the readers to [14, 18, 26, 30]. Further, in the literature, many variations and special cases of the shortest path problem were proposed. For instance, we refer to the recent works on massively parallel systems [5], dynamic algorithms for hypergraphs [12], random networks with uncertain arc lengths [13, 25] and in fuzzy graphs [9, 11]. There are numerous algorithms and techniques proposed by many researchers; see [1, 4, 10, 15, 19, 22,23,24] for probabilisitic approach and for large graphs, we refer to [20, 21], and [29] exploits the symmetry in graphs, for more related literature.

In this paper, our focus is in developing algorithm for the single-source shortest path (SSSP) problem over large-scale graphs in a much efficient manner using algebraic graph theory and theory of matrices. That is, given an undirected graph G of a network, design an efficient algorithm to find shortest paths between a pairs of nodes by using the matrix introduced in [16]. This novel graph matrix associated with a graph was called neighbourhood matrix \({\mathcal {NM}}(G)\) and is defined as,

$$\begin{aligned} \eta _{ij}={\left\{ \begin{array}{ll} -|N_{G}(i)|, &{} \quad {\text {if}}\, i=j\\ |N_{G}(j)-N_{G}(i)|, &{} \quad {\text {if}}\, (i,j)\in E(G)\\ -|N_{G}(i) \cap N_{G}(j)|, &{} \quad {\text {if}}\, (i,j)\notin E(G) \\ \end{array}\right. }, \end{aligned}$$

where \(N_G(v)\) denote the set of all neighbours of the vertex v in G. In the same paper, the authors also have discussed various properties and characterizations of \({\mathcal {NM}}(G)\).

We present an extension of the work mentioned above and introduce a scheme for associating a finite sequence of matrices to an undirected graph G. As an application of this scheme, we design an algorithm to compute the shortest path between any pair of vertices. We then discuss the running time and efficiency of the algorithm by supporting empirical results. The proposed algorithm and the claims therein are formally validated through simulations on synthetic data where sampling-based computations are performed in large scale and with large-sized graphs. The empirical results are quite promising, wherein the proposed algorithm has least running time among all the existing well-known shortest path algorithms such as Dijkstra, Bellman–Ford and breadth-first search (BFS). The \({\mathcal {NM}}\)-ShortestPath algorithm can be used effectively in real-life situations involving communication networks where disruption in the network are not tolerated and computing the alternate routes in the network needs to be done within a micro-fraction of a second irrespective of the weights involved in them.

This paper is organized as follows: Rest of this section describes all the basic definitions, notations and also introduce the novel concept of sequence of powers of neighbourhood matrix. Section 2 discusses the methodology with the mathematical setting of the graph characterizations from the matrix entries and various results on the sequence of powers of neighbourhood matrix. Following this, we present the algorithm and the pseudocode (written in MATLAB) to find the shortest path between any two vertices in a graph and analyse the complexity of the proposed algorithms. In Sect. 3, we present numerical results supporting our theory with simulations and empirical results. Conclusions with future directions follow in Sect. 4. Lastly, for the sake of completion, proofs of all the results discussed in this paper are presented in “Appendix”.

Neighbourhood matrix

In this subsection, we present all the required notations, definitions of the neighbourhood matrix \({\mathcal {NM}}(G)\). For all basic notations and definitions of graph theory, we follow the books by Bondy and Murty [3] and West [28]. All the graphs considered are undirected, unweighted and simple. Let G(V(G), E(G)) be a graph with vertex set V(G) and edge set E(G) . For a vertex \(v\in V(G)\), let \(N_{G}(v)\) denote the set of all neighbours of v. The degree of a vertex v is given by \(d_{G}(v)\) or \(|N_{G}(v)|\). The diameter of the graph is denoted as diameter(G) and the shortest path distance between two vertices i and j in G is denoted by \(d_{G}(i,j) , i,j\in V(G)\). Let \(A_{G}, D(G)\) and C(G) denote the adjacency matrix, degree matrix and the Laplacian/admittance matrix of the graph G, respectively. For readers interested in the proofs of the results in this section, we refer to [16].

Definition 1

[16] Given a graph G, the neighbourhood matrix, denoted by \({\mathcal {NM}}(G) =(\eta _{ij})\) is defined as

$$\begin{aligned} \eta _{ij}={\left\{ \begin{array}{ll} -|N_{G}(i)|, &{} \quad {\text {if}}\, i=j\\ |N_{G}(j)-N_{G}(i)|, &{} \quad {\text {if}}\, (i,j)\in E(G)\\ -|N_{G}(i) \cap N_{G}(j)|, &{} \quad {\text {if}}\, (i,j)\notin E(G) \end{array}\right. } \end{aligned}$$

The following lemma facilitates us with a way of computing the neighbourhood matrix.

Lemma 1

[16] The \({\mathcal {NM}}(G)\) can also be obtained from the product of adjacency matrix and Laplacian matrix of a graph G.

For the computation of the \({\mathcal {NM}}(G)\), we often resort to the above lemma. Several efficient algorithms have been proposed in the literature to compute product/multiplication of two matrices; see [6, 27]. We make use of the algorithm described in [6].

Another concept which we require before proceeding to the main result is the level decomposition of a graph from a source node, which is defined by the breadth-first search traversal technique.

Breadth-first search (BFS) is a graph traversal technique [3] where a (source) node and its neighbours are visited first and then the neighbours of neighbours. The algorithm returns not only a search tree rooted at the source node but also a function \(l: V\rightarrow {\mathbb {N}}\), which records the level of each vertex in the tree, that is, the distance of each vertex from the source node. In simple terms, the BFS algorithm traverses level-wise from the source. First, it traverses level 1 nodes (direct neighbours of source node) and then level 2 nodes (neighbours of neighbours of the source node) and so on. We refer to such a level representation as to the level decomposition from the source node.

Lemma 2

[16] Given a graph G and a vertex v, the entries of any row of \({\mathcal {NM}}(G)\) correspond to the subgraph with vertices from the first two levels of the level decomposition of the graph rooted at v with all the edges connecting the vertices lying in different levels.

Remark 1

[16] The proof of the above lemma (given in “Appendix”) also reveals interesting facts about the neighbourhood matrix which justifies its terminology and further helps us in proving the following theorem.

Theorem 1

[16] If at least one row of \({\mathcal {NM}}(G)\) has no zero entries, then the graph G has diameter at most 4.

Remark 2

[16] Note the converse of the above theorem need not hold. For example, consider the hypercube \(Q_3\), whose diameter is 3 while every row of the \({\mathcal {NM}}(Q_3)\) matrix (see Fig. 1) contains a zero entry.

Fig. 1
figure1

The neighbourhood matrix of the hypercube \(Q_3\)

Theorem 2

[16] The matrix \({\mathcal {NM}}(G)\) has no zero entries if and only if the graph G has diameter at most 2.

Sequence of neighbourhood matrices

Theorem 2 captures the essence of the vertices which are placed at a distance at most 2. We extend this idea and introduce in this article an iterative procedure to generate a sequence of graphs and consequently, a sequence of \({\mathcal {NM}}\) matrices associated with these graphs. Such a sequence of \({\mathcal {NM}}\) matrices has proven to be useful in reducing the computational time involved for finding various properties of a given graph.

In the next section, we show one such application of this process in finding the shortest path between any pair of vertices.

Definition 2

Let G be an undirected unweighted simple graph on n vertices. The sequence of neighbourhood matrices \(\mathbf {<{\mathcal {NM}}^{\{l\}}>}\) is recursively defined as follows: Let \(G^{\{1\}}=G\) and \({\mathcal {NM}}^{\{1\}} = {\mathcal {NM}}(G^{\{1\}})\). For \(l>1\), let \(G^{\{l\}}\) be the graph constructed from the graph \(G^{\{l-1\}}\) as follows: \(V(G^{\{l\}})=V(G^{\{l-1\}})\) and \(E(G^{\{l\}}) = E(G^{\{l-1\}}) \cup \{(i,j): \eta_{ij}^{\{l-1\}}<0, i\ne j\}\). Then, the matrix \({\mathcal {NM}}^{\{l\}}\) is defined to be \({\mathcal {NM}}(G^{\{l\}})\) and can be constructed from \(G^{\{l\}}\) using Definition 1 or Lemma 1.

Remark 3

Note that the adjacency matrix of \(A(G^{\{l\}})\) is given by:

$$\begin{aligned} A(G^{\{l\}}) = (a^{\{l\}}_{ij}) ={\left\{ \begin{array}{ll} 1 ,&{} \quad {\text {if}}\, \eta _{ij}^{\{l-1\}}\ne 0, i\ne j\\ 0 ,&{} \quad {\text {Otherwise}} \end{array}\right. } \end{aligned}$$

Definition 3

Let k be the smallest possible integer, such that, \({\mathcal {NM}}^{\{k\}}\) has no zero entry or the number of nonzero entries of \({\mathcal {NM}}^{\{k\}}\) and \({\mathcal {NM}}^{\{k+1\}}\) are the same. The number \(k(G):=k\) is called the iteration number of G.

Remark 4

When \(G = K_n\), the complete graph, \({\mathcal {NM}}(K_n)\) has no zero entries. Hence for \(K_n\), the iteration number is \(k(K_n) = 1\). Further, for a graph G with diameter 2, \({\mathcal {NM}}(G)\) has no zero entries, hence iteration number \(k(G) = 1\)

Let us illustrate the above definitions with a randomly generated graph G on 12 vertices and 14 edges given in Fig. 2.

Fig. 2
figure2

A graph G on 12 vertices and 14 edges

The corresponding sequence of \({\mathcal {NM}}\)-matrices of G is given in Fig. 3. Note that when \(k=3\), the matrix \({\mathcal {NM}}^{\{3\}}(G)\) has no zero entry and hence the iteration number \(k(G):=3\).

Fig. 3
figure3

Sequence of \({\mathcal {NM}}\) matrices corresponding to G

Methodology

In this section, we state some important results connecting the concept of distance between any two vertices with the entries of the matrix \({\mathcal {NM}}^{\{l\}}, 1\le l\le k\). The proofs of the mathematical results from this section are in “Appendix”. These results will be useful for the algorithms stated in the next section.

Mathematical foundation

Theorem 3

Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. If \(\eta _{ij}^{\{l\}}< 0\) for some \(1\le l \le k(G)\), then

  1. 1.

    \(\eta _{ij}^{\{p\}}=0, 1\le p \le l-1\) and \(\eta _{ij}^{\{q\}}>0, l+1\le q\le k(G)\). Converse is also true.

  2. 2.

    There exists a vertex x such that \(\eta _{ix} ^{\{l\}} >0\) and \(\eta _{xj} ^{\{l\}}>0\). Converse need not be true.

Example

For the graph G in Fig. 2 and its matrices in Fig. 3, let us consider the vertices \(i=2\) and \(j=5\). Note that \(\eta _{2,5}^{\{2\}} = -\,4 <0\). Hence \(\eta _{2,5}^{\{1\}} = 0\) and \(\eta _{2,5}^{\{3\}} = 4 >0\).

Theorem 4

Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. For \(1 \le l \le k(G)\), the off-diagonal elements of \({\mathcal {NM}}^{\{l\}}\) can be characterized as follows, for \(1\le i\ne j\le n\)

  1. 1.

    \(\eta ^{\{l\}}_{ij}=0\) if and only if \(d_{G}(i,j)> 2^{l}\).

  2. 2.

    \(\eta ^{\{l\}}_{ij}>0\) if and only if \(0 < d_{G}(i,j)\le 2^{l-1}\).

  3. 3.

    \(\eta ^{\{l\}}_{ij}<0\) if and only if \(2^{l-1}<d_{G}(i,j) \le 2^l\).

Example

For the graph G in Fig. 2 and its associated matrices in Fig. 3, let the vertices be \(i=2\) and \(j=5\). Note that the distance between 2 and 5 is \(3 <2^2\) and that \(\eta _{2,5}^{\{2\}} = -\,4 < 0\), where \(l=2\). Similarly, \(\eta _{1,10}^{\{3\}} \ne 0\) and \(2^2< d(1,10) = 5 \le 2^3\).

Theorem 5

Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. If \(\eta _{ij}^{\{l\}}=-\,1,\) for \(1\le l \le k(G)\), then \(d_{G}(i,j)=2^{l}\).

Remark 5

Note the converse of the above theorem need not hold always. For example, consider the cycle \(C_4\) and its neighbourhood matrix \({\mathcal {NM}}(C_4)\) for \(l = 1\). For non-adjacent vertices i and j, \(\eta _{ij}^{\{1\}} = -\,2\) while \(d_{C_4}(i,j) = 2\).

From the above theorem, we define the parameter called range of distance between any pair of vertices in a graph as follows:

Definition 4

Given two vertices i and j, \(rd_{G}(i,j)\), is defined as follows:

$$\begin{aligned} rd_{G}(i,j)={\left\{ \begin{array}{ll} 1, &{} \quad {\text {if }}\, (i,j)\,\in E(G)\\ p, &{} \quad {\text {if }}\, i,j \,{\text {are connected and}} \eta ^{\{p\}}_{ij}<0\\ \infty , &{} \quad {\text {if }}\, i,j\, {\text {are not connected}} \\ \end{array}\right. } \end{aligned}$$

Remark 6

For any two connected non-adjacent vertices \(i,j\in V(G)\), we have \(2^{rd_{G}(i,j)-1}<d_{G}(i,j)\le 2^{rd_{G}(i,j)}\)

Theorem 6

A graph G is connected if and only if the corresponding matrix \({\mathcal {NM}}^{\{k\}}\) has no zero entries. In addition, the iteration number k(G) of the graph \(G\ne K_{n}\) is given by \(k(G)=\lceil \log _{2}(\hbox {diameter}(G))\rceil\). (For \(G=K_{n}, k(G)=1\) and \({\mathcal {NM}}^{\{1\}}={\mathcal {NM}}(G)=-\,C(G)\))

Corollary 1

A graph G is disconnected if and only if \(z < n^{2}\). Further, the iteration number k(G) of G is given by \(k(G)=\lceil \log _{2}S\rceil\), where S is the maximum over the diameter of components of G.

Remark 7

For any connected graph G, it is immediate from the above discussion that \(1\le k(G)\le \lceil \log _2 (n-1)\rceil\), that is, the iteration number is maximum when \(\hbox {diameter}(G)\) is maximum. It is well-known that the path graph on n vertices has the largest diameter among all graphs on n vertices (with diameter \(n-1\)), while any graph with diameter at most 2 has \(k(G) =1\).

An efficient shortest path algorithm

In this section, we present a brief description of the proposed algorithm to find the shortest path between any pair of vertices in a given graph G along with the pseudocodes in MATLAB of the algorithm.

Description of the algorithm

The objective of this work is to reduce the computational time involved in finding the shortest path between any given pair of vertices. The algorithm is based on the construction of the sequence of graphs and its corresponding matrices \({\mathcal {NM}}^{\{l\}}, 1\le l \le k(G)\). Using this sequence of matrices and the results from the previous section, we find the shortest path and also show that the computational time taken is very less for graphs of large size. The algorithm is presented in three modules:

In the first module (Algorithm 1), we compute the \({\mathcal {NM}}(G)\) matrix and the sequence of k(G) matrices, namely powers of \({\mathcal {NM}} (G)\) [denoted by SPG( : ,  : ,  : ) ] associated with G. The algorithm is named as SP-\({\mathcal {NM}}(A)\), where A is the adjacency matrix of G.

figurea

In the main module (Algorithm 2), given a graph G, SPG array and a pair of vertices i and j, the shortest path between i and j is obtained with recursive calls to the module CN and itself. The name of the algorithm is \({\mathcal {NM}}-\) ShortestPath(ijkSPG).

Here, first we check if the vertices i and j are connected and estimate the range of distance between i and j, that is, we find \(l:= rd_{G}(i,j)\) such that, \(2^{l-1}<d_{G}(i,j)\le 2^{l}\) by applying the idea from Theorems 3 and 4. In particular, we find least l, for which \(\eta _{ij}^{\{l\}}\ne 0\). This is computed through steps 1–6 of Algorithm 2.

figureb

When \(l= 1\): If \(\eta _{ij}^{\{1\}} >0 \Rightarrow d_G(i,j) = 1\), then \(<i,j>\) is the shortest path and if \(\eta _{ij}^{\{1\}}<0 \Rightarrow d_G(i,j) = 2\), then we have \(<i,x,j>\), where x is a common neighbour of i and j, to be the shortest path. This is computed through steps 7–12 of Algorithm 2.

figurec

For \(l \ge 2\), we proceed to find a vertex y such that \(d_{G}(i,y)=2^{l-1}\) and \(d_{G}(y,j)\) is minimum by invoking the subroutine stated as Algorithm 3. In the module (Algorithm 3), we find the vertex y such that \(d_{G}(i,y)=2^{l-1}\) and \(d_{G}(y,j)\) is minimum. The algorithm is named as CN(ijlSPG). Now, for \(l \ge 2\), we have \(\eta _{ij}^{\{l\}} < 0\) and hence we find the set \(X = \{x: \eta _{ix}^{\{l-1\}} < 0 \}\), where x is the vertex such that \(2^{l-2} < d_{G}(i,x)\le 2^{l-1}\) and \(d_{G}(x,j)\le 2^{l-1}\) (Theorem 4). From this set, we want to choose one y such that \(d_{G}(i,y) = 2^{l-1}\) and \(d_{G}(y,j)\le 2^{l-1}\). Compute \(ct:= \min \{rd_{G}(x,j): x \in X\}\). Note that \(1\le ct \le l-1\). Let \(Z = \{x\in X: rd_{G}(x,j)= ct\}\). Recursively, we calculate the distance between vertices in Z and j. Let y be the vertex such that, \(d_{G}(y,j) \le d_{G}(z,j)\), for all \(z\in Z\), that is the distance \(d_G(y,j)\) is minimum and that y satisfies \(d_{G}(i,y)=2^{l-1}\) and \(d_{G}(y,j)\) is minimum.

Upon repeated application of this iterative process to the pairs \(\{i,y\}\) and \(\{y,j\}\), we get all the vertices lying on the shortest path between i and j. (rest of Algorithm 2)

In general, the algorithm requires computation of k(G) matrices and storing it for further purpose, hence there is a huge requirement of storage space and the space complexity spans \(\textit{O}(k(G)n^2)\).

Example

Consider the graph in Fig. 2 and having computed the matrices in Fig. 3 using Algorithm 1, let us illustrate Algorithms 2 and 3 for \(i = 2\) and \(j= 11\).

First we find that \(\eta _{2,11}^{\{3\}} = SPG(2,11,3)= -\,6 \ne 0\). Therefore, \(l = rd(2, 11)\) \(= 3\). Since \(l\ge 2\), we find a vertex y by invoking Algorithm 3, such that \(d _G(2,y) = 4\) and \(d_G(y,11)\) is minimum. First, we get the set \(X = \{5,7,8,10\}\) by scanning the second row of \({\mathcal {NM}}^{\{2\}}(G):=SPG(2, :, 2)\).

Next, we compute \(ct = \min \{rd(x,11): x\in X\} = 1\) using steps 5–7 and r1 stores those vertices from X which had \(ct =1\). Since \(ct = 1\), by step 10, choosing positions of \(SPG(r1,11,1) > 0\) directly, we get \(y = 10\) and the distance between 10 and 11 is 1. Hence, we get the vector \(S_P = [2~10~11]\) as the output at this step.

Now recursively, repeating the procedure for the vertex pair (2, 10): \(\eta _{2,10}^{\{2\}} =\) \(SPG(2,10,2) = -\,1,\) we have \(l = rd(2,10) = 2\). Since \(l=2\), we find a vertex y by invoking Algorithm 3. Here, \(X= \{3,6\}\) is obtained by scanning the second row of \({\mathcal {NM}}^{\{1\}}(G):= SPG(2, :, 1)\). From steps 5–7, compute \(ct= 1\) and by steps 9–11, we get \(y = 6\), with distance between 6 and 10 equals 2. So we get the vector [2 6 10 11].

Now repeating the process with the vertex pairs (2, 6) and (6, 10), we get [2 4 6] and [6 8 10]. Finally, combining the solutions, we get [2 4 6 8 10 11] as the shortest path.

Time complexity

As stated before, the proposed algorithm has three modules and first, we discuss their time complexity individually.

  1. 1.

    In the first module (Algorithm 1), we compute the sequence of k(G) matrices, \({\mathcal {NM}}^{\{k\}}(G)\). The running time of this module is \({{\mathcal {O}}}([k+1]n^{2.3737})\) as we apply the product of two matrices namely adjacency A(G) and the Laplacian C(G) to obtain the \({\mathcal {NM}}(G)\) and we do this for k(G) times, for where \(k(G)= \lceil \log _{2}(diameter(G))\rceil\), while G is connected and \(k(G) + 1\) times while G is disconnected. We use Coppersmith–Winograd algorithm [6] for matrix multiplication which has a \({{\mathcal {O}}}(n^{2.3737})\) running time.

  2. 2.

    For the module in Algorithm 3, the worst case running time is \({{\mathcal {O}}}(k^{2}n)\), where \(k(G)\le \lceil \log _{2}(diameter(G))\rceil \le \lceil \log _{2}n\rceil\) and the best case running time is \({{\mathcal {O}}}(n)\).

  3. 3.

    For the main module (Algorithm 2) \({\mathcal {NM}}-\) ShortestPath(ijkSPG), the best case running time is \({{\mathcal {O}}}(n)\) when \(k =1\) while the worst case running time of this algorithm is \({{\mathcal {O}}}(2^{k}k^{2}n)\), \(k(G) = \lceil \log _{2} (diameter(G))\rceil \le \lceil \log _{2}n\rceil\), which turns out to be \({{\mathcal {O}}}[(n\log _2 n)^2]\). Note that for most of the choices of i and j, whose distance lies between \(2^{l-1}< d(i,j) \le 2^{l}\) for \(l\le k\), this algorithm runs only in \({{\mathcal {O}}}(2^{l}l^{2}n)\), \(1\le l\le k\).

Given any graph G and a pair of vertices i and j in G, the computational time required by our algorithms in the worst case is \({{\mathcal {O}}}([k+1]n^{2.3737}) + {{\mathcal {O}}}(2^k\cdot k^2 \cdot n)\), where the computation of \({\mathcal {NM}}^{\{l\}}\) is a one-time procedure for each graph. In addition, to find all possible shortest paths from a single source node i, we require \({{\mathcal {O}}}([k+1]n^{2.3737}) + {{\mathcal {O}}}(2^k\cdot k^2 \cdot n(n-1))\), k is the iteration number of G. Moreover, for different pairs of vertices within the same graph, the bound given here is too large.

As it is well-known, the notion of the running time of the algorithm always refer to worst-case running time, which do not always resemble the algorithms’ behaviour on real-life instances. However, algorithms with large worst-case running times do not inevitably perform poorly in practice.

Hence, in the next section, we empirically analyse the algorithms on various datasets to justify the bound.

Results and discussion

In this section, we consider various datasets of graphs and illustrate the results from previous section.

For this purpose, we use special graph classes and compare the proposed shortest path algorithm with the existing algorithms such as Dijkstra, BFS and Bellman–Ford to find the shortest path between any given pair of vertices. We have tested our algorithm on various randomly generated collection of graphs using MATLAB.

We would like to recall from Theorem 6 that the iteration number for any connected graph is given by \(1 \le k(G) \le \lceil \log _{2}(diameter(G))\rceil \le \lceil \log _{2}(n-1)\rceil\). Since path graph has diam \(n-1\) and requires computation of \(\lceil \log _{2}(n-1)\rceil -{\mathcal {NM}}^{\{l\}}\) matrices, maximum number of iterations in the worst case. Hence, to ensure connectivity, sparseness and the worst cases possible, we sample our datasets by adding randomly edges to a path graph \(P_n\).

On the other hand, it is well-established fact in the complex network analysis that any real-world network exhibits sparseness and a behaviour of small world network whose average diameter is approximately 6. In this regard, we have also found that all the considered real-world networks required computation of 3 or 4 of the \({\mathcal {NM}}^{\{l\}}\) matrices giving us the iteration number as 3 or 4.

We have experimentally tested our theory on various datasets. However, here for brevity, we present four such datasets.

\(\underline{DS_{1}:}\) This dataset contains 1100 graphs where each graph is connected and contains 100 vertices. The first graph of this collection is \(P_{100}\cup \{e\}\), where e is any edge. Consecutively, a graph \(G_{m} \in DS_1\), for \(1 \le m \le 1100\) contains \(P_{100} \cup \{e_{1}, e_{2},\ldots ,e_{m}\}\) for a random generated set of \(m-\)distinct edges. In particular, \(G_{m}\) contains 100 vertices and \(99 + m\) edges.

\(\underline{DS_{2}:}\) Similar to the dataset \(DS_1\), this dataset contains 1100 graphs where each graph is connected and has 200 vertices. The first graph of this collection is \(P_{200}\cup \{e\}\), where e is any edge. A graph \(G_{m} \in DS_2\), for \(1 \le m \le 1100\) contains \(P_{200} \cup \{e_{1}, e_{2},\ldots ,e_{m}\}\) for a randomly generated set of \(m-\)distinct edges. In particular, \(G_{m}\) contains 200 vertices and \(199+m\) edges.

\(\underline{DS_{3}:}\) This dataset also contains 1100 graphs, where each graph is connected and has 300 vertices. The first graph of this collection is \(P_{300}\cup \{e\}\), where e is any arbitrary edge. A graph \(G_{m} \in DS_3\), for \(1 \le m \le 1100\) contains \(P_{300} \cup \{e_{1}, e_{2},\ldots ,e_{m}\}\) for a randomly generated set of \(m-\)distinct edges. In particular, \(G_{m}\) contains 300 vertices and \(299+m\) edges.

Further, to implement our algorithm and test our claims on a large-scale real-world network, we have considered the Facebook graph obtained from SNAP [17].

\(\underline{{DS_{4}:}}\) This dataset contains one graph with 4039 vertices, 88,234 edges and has a diameter of 8. (Hence, we see that the iteration number is 3).

Fig. 4
figure4

Running times for dataset \(DS _{1}\)

Fig. 5
figure5

Running times for dataset \(DS _{2}\)

Fig. 6
figure6

Running times for dataset \(DS _{3}\)

Fig. 7
figure7

Time taken by \({\mathcal {NM}}\) Shortestpath on different datasets \(DS_{1},DS_{2}\) and \(DS_{3}\)

Figures 45 and 6 represent the total time required to compute the shortest path between distinct pairs of vertices for each of the graphs given in the considered datasets \(DS_{1}\), \(DS_{2}\), \(DS_{3}\), respectively, by all the four shortest path algorithms, namely \({\mathcal {NM}}\) shortest path algorithm, BFS, Bellman–Ford and Dijkstra.

Figure 4 represents the computational time for \(G_{20}\) to \(G_{1100}\), while the inner box of Fig. 4 contains the computational time required for \(G_{1}\) to \(G_{50}\) to compute the shortest path between all pairs of vertices. As the scale has a huge variation at the beginning, to visualize the plots and to give a clear distinction of the computational time required by all the four algorithms, the plots are presented this way. It is immediately seen that the computational time of \({\mathcal {NM}}\)-Shortestpath algorithm for (nearly) path graphs is too large and the plots become difficult to visualize at the beginning. From this figure, we observe that the running time of the \({\mathcal {NM}}\) shortest path algorithm decreases exponentially and rapidly, with increase in number of edges in the graph. There is a steep and sharp peak at the beginning, which is due to the fact that iteration number is large for graphs which are almost path graphs. With increasing number of edges, since the graphs are connected, the graph’s diameter decreases.

In particular, we have that from \(G_{20}\) to \(G_{1100}\) in \(DS_{1}\), as the diameter of the graph decreases, the iteration number \(k(G_m)\) also decreases as m runs from 20 to 1100. Hence, the computation of sequence of \({\mathcal {NM}}\) matrices and the shortest path between any pair of vertices decreases, while the other three algorithms steadily increases with the increase in number of edges in the graph. However, we conclude that less computational time is required by \({\mathcal {NM}}-\)ShortestPath when compared to the existing algorithms.

A similar behaviour is observed with the other two datasets \(DS_2\) and \(DS_3\) when the computational time of the four algorithms is compared. In fact, a similar pattern of plots are obtained which is evident in Figs. 5 and 6. Hence, we conclude that our algorithm performs extremely well when the graph is not too sparse specifically, not a nearly path graph. In fact, on an average, we find that for the graphs with edge density greater than 25%, our proposed algorithm runs correctly with very high efficiency.

Further, Fig. 7 represents the comparative analysis of the computational time of the proposed algorithm for the considered datasets \(DS_{1}\), \(DS_{2}\), \(DS_{3}\). Here, we observe that as n increases, the average computational time increases but decreases exponentially as the density of the graphs increases. Moreover, they form a similar pattern showing that irrespective of the order of the graph as the density of the graph increases and diameter of the graph decreases, the running time is comparatively minimal. This leaves a positive note for studying dense graphs in minimal amount of time.

Table 1 Computation time (in s) of the three known shortest path algorithms on \(DS_{4}\)
Table 2 Computation time (in s) of the \({\mathcal {NM}}\) shortest path algorithm on \(DS_{4}\)

For the Facebook graph considered in \(DS _{4}\), we have \(k(G)=3\), that is, the graph is connected and has diameter 8. As there are 8,158,780 possible pairs of vertices, we first randomly choose a set of pairs to test the efficiency of the existing algorithms, namely the Dijkstra, Bellman–Ford and BFS with the proposed \({\mathcal {NM}}\)-ShortestPath algorithm. Tables 1 and 2 represent the total running time of the algorithms to compute the shortest path between randomly chosen pairs of vertices, namely 50,000 pairs, 100,000 pairs, all the pairs of vertices which are at maximum distance from each other, that is, the diametrically opposite pairs of vertices (there were 15,620 such pairs) and finally, we considered all the possible pairs of vertices (namely the 81,587,80 pairs).

In Table 1, the data represents the total computational time required and the mean computational time per pair for the three existing algorithms. Table 2 represents the runtime of \({\mathcal {NM}}\) shortest path algorithm, where the run time for computing the sequence of matrices and the shortest paths are given in columns 2 and 3 along with total computational time, given in column 4. Here, we observe that there is a overhead computational time (of approx 115 s) involved to compute all the k-matrices of \({\mathcal {NM}}\) sequence. Having computed that finding the shortest path takes less than 0.0003 s per pair.

From the tables, it is evident that for 50,000 pairs of vertices, the \({\mathcal {NM}}\) shortest path algorithm takes only 155.5888 s including 115.9333 s to compute the sequence of \({\mathcal {NM}}\) matrices as preprocessing and 39.6555 s to compute the actual shortest path between the given pairs. While for the same pairs, the BFS algorithm has taken 3840.4749 s, Dijkstra’s takes 3992.4677 s and Bellman–Ford requires a maximum time of 4705.6922 s.

Similarly, for the 100,000 randomly chosen pairs of vertices, we get that \({\mathcal {NM}}\) shortest path algorithm takes 194.8794 s only, while the other algorithms takes around 7600 to 9300 s.

Now considering the pairs of diametrically opposite vertices, we anticipate that the running time for the \({\mathcal {NM}}\) shortest path algorithm will be large for such pairs when compared to other possible pairs. Row 3 in Tables 1 and 2 represents the running time for these 15,620 pairs of vertices which have a shortest path distance exactly 8. Our expectations were justified and proven to be true as we can see that the mean computational time for a pair of vertices with diameter 8 takes maximum time of (0.0082), while the average is 0.0004 over all the possible pairs.

Row 4 represents the total running time to compute all-pairs shortest path of the graph. As we can see the \({\mathcal {NM}}\) shortest path algorithm takes only 3126.1592 s (which is less than an hour) to find all pair shortest path or an average 0.0004 s to compute the running time of any given pair in the graph. It is immediate to see that the running time of existing shortest path algorithms for all pairs takes more than 158 h (estimated based on the mean time obtained for random pairs collection).

The above plots and tables clearly indicate that the proposed \({{\mathcal {NM}}}\) shortest path algorithm is efficient in terms of accuracy and time involved than the existing shortest path algorithms. Moreover, for increasing size of graphs, the difference in the running time of an existing algorithm and our algorithm is also very high, with our algorithm performing well almost every time. The only drawback is that when the graph’s diameter is close to n, the computation of the sequence of matrices takes more time and also finding shortest path takes considerably larger amount of time.

Conclusion

In this paper, our focus is to design an efficient algorithm to find the shortest path between any given pair of vertices in a graph. To achieve this, we proposed the concept of sequence of powers of \({\mathcal {NM}}(G)\) matrix that can be associated with a graph to reveal more information. With the help of these matrices, we presented an efficient algorithm (\({\mathcal {NM}}\)-ShortestPath) to compute the shortest path on any graph G, with the computational time of \({{\mathcal {O}}}([k+1]n^{2.3737}) + {{\mathcal {O}}}(2^k\cdot k^2 \cdot n)\), in the worst case, where the computation of \({\mathcal {NM}}^{\{l\}}\) is a one-time procedure for each graph. Further, for different pairs of vertices within the same graph, the bound given here is found to be too large.

An empirical study of the algorithm was undertaken and found to be satisfying about the efficiency and accuracy of the same. Thus, demonstrated that \({\mathcal {NM}}\)-ShortestPath algorithm is expeditious and highly effective than the existing algorithms when the graph has small diameters. The only drawback is that when the graph’s diameter is close to n, the computation of the sequence of matrices takes more time and also finding shortest path takes a considerably larger amount of time. Further, in real-life instances, the networks pertain to simulate to be “Small-World” with very small diameter when compared to the size of the network and hence the proposed algorithm performs very well in such cases.

Future directions

In this paper, only undirected unweighted graphs were considered. As a future direction, one could extend this technique to weighted graphs and directed graphs. Also from a theoretical perspective, there is a lot of scope to explore and study the matrix properties of \({\mathcal {NM}}\) such as the spectral analysis from both theoretical and computational aspects. Further, with the association of such a sequence of matrices, we conjecture that many graph related problems could be solved with less time complexity.

References

  1. 1.

    Abraham I, Bartal Y, Chan THH, Dhamdhere KD, Gupta A, Kleinberg J, Neiman O, Slivkins A (2005) Metric embeddings with relaxed guarantees. In: Proceedings of the 46th annual IEEE symposium on foundations of computer science, FOCS’05. IEEE Computer Society, Washington, DC, pp 83–100. https://doi.org/10.1109/SFCS.2005.51

  2. 2.

    Bellman E (1958) On a routing problem. Q Appl Math 16(1):87–90

    Article  Google Scholar 

  3. 3.

    Bondy J, Murty U (1976) Graph theory with applications. Macmillan, Berlin

    Google Scholar 

  4. 4.

    Brunch T, Cornelissen K, Manthey B, Roglin H, Rosner C (2015) Smoothed analysis of the successive shortest path algorithm. SIAM J Comput 44(6):1798–1819

    MathSciNet  Article  Google Scholar 

  5. 5.

    Chakaravarthy VT, Checconi F, Murali P, Petrini F, Sabharwal Y (2017) Scalable single source shortest path algorithms for massively parallel systems. IEEE Trans Parallel Distrib Syst 28(7):2031–2045

    Article  Google Scholar 

  6. 6.

    Coppersmith D, Winograd S (1990) Matrix multiplication via arithmetic progressions. J Symb Comput 9(3):251–280

    MathSciNet  Article  Google Scholar 

  7. 7.

    Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. Prentice-Hall, New York

    Google Scholar 

  8. 8.

    Dijkstra EW (1959) A note on two problems in connection with graphs. Numer Math 1(1):269–271

    MathSciNet  Article  Google Scholar 

  9. 9.

    Dou Y, Zhu L, Wang HS (2012) Solving the fuzzy shortest path problem using multi-criteria decision method based on vague similarity measure. Appl Soft Comput 12:1621–1631

    Article  Google Scholar 

  10. 10.

    Duncan A (2004) Powers of the adjacency matrix and the walk matrix. Collect Univ Malta 9:4–11

    Google Scholar 

  11. 11.

    Ebrahimnejad A, Tavana M, Alrezaamiri H (2016) A novel artificial bee colony algorithm for shortest path problems with fuzzy arc weights. Meas J Int Meas Confed (IMEKO) 93:48–56

    Google Scholar 

  12. 12.

    Gao J, Zhao Q, Ren W, Swami A, Ramanathan R, Bar-Noy A (2015) Dynamic shortest path algorithms for hypergraphs. IEEE/ACM Trans Netw 23(6):1805–1817

    Article  Google Scholar 

  13. 13.

    Gao Y (2011) Shortest path problem with uncertain ARC lengths. Comput Math Appl 62:2591–2600

    MathSciNet  Article  Google Scholar 

  14. 14.

    Goldberg AV, Harrelson C (2005) Computing the shortest path: a search meets graph theory. In: Proceedings of the 16th annual ACM-SIAM symposium on discrete algorithms, SODA’05. Society for Industrial and Applied Mathematics, Philadelphia, pp 156–165

  15. 15.

    Johnson DB (1977) Efficient algorithms for shortest paths in sparse networks. J ACM 24(1):1–13

    MathSciNet  Article  Google Scholar 

  16. 16.

    Karunakaran S, Selvaganesh L (2019) An unique and novel graph matrix for efficient extraction of structural information of networks. CoRR arXiv:1903.05341, pp 1–10

  17. 17.

    Leskovec J (2012) Social circles: Facebook. https://snap.stanford.edu/data/egonets-~Facebook.html

  18. 18.

    Madkour A, Aref WG, Rehman FU, Rahman MA, Basalamah S (2017) A survey of shortest-path algorithms. arXiv:1705.02044 v1 [cs.DS], pp 1–26

  19. 19.

    Mulmuley K, Shah P (2001) A lower bound for the shortest path problem. J Comput Syst Sci 63(2):253–267. https://doi.org/10.1006/jcss.2001.1766

    MathSciNet  Article  MATH  Google Scholar 

  20. 20.

    Nikolova E, Kelner JA, Brand M, Mitzenmacher M (2006) Stochastic shortest paths via quasi-convex maximization. In: Azar Y, Erlebach T (eds) Algorithms—ESA 2006. Springer, Berlin, pp 552–563

    Google Scholar 

  21. 21.

    Potamias M, Bonchi F, Castillo C, Gionis A (2009) Fast shortest path distance estimation in large networks. In: Proceedings of the 18th ACM conference on information and knowledge management, CIKM’09. ACM, New York, pp 867–876. https://doi.org/10.1145/1645953.1646063

  22. 22.

    Roditty L, Shapira A (2011) All-pairs shortest paths with a sublinear additive error. ACM Trans Algorithms 7(4):45:1–45:12. https://doi.org/10.1145/2000807.2000813

    MathSciNet  Article  MATH  Google Scholar 

  23. 23.

    Seidel R (1995) On the all-pairs-shortest-path problem in unweighted undirected graphs. J Comput Syst Sci 51(3):400–403

    MathSciNet  Article  Google Scholar 

  24. 24.

    Sen S (2009) Approximating shortest paths in graphs. In: Proceedings of the 3rd international workshop on algorithms and computation, WALCOM’09. Springer, Berlin, pp 32–43. https://doi.org/10.1007/978-3-642-00202-1_3

  25. 25.

    Sheng Y, Gao Y (2016) Shortest path problem of uncertain random network. Comput Ind Eng 99:97–105

    Article  Google Scholar 

  26. 26.

    Sommer C (2014) Shortest-path queries in static networks. ACM Comput Surv 46(45):1–31

    Article  Google Scholar 

  27. 27.

    Thabet K, Al-Ghuribi S (2012) Matrix multiplication algorithms. Int J Comput Sci Netw Secur 12(2):74–79

    Google Scholar 

  28. 28.

    West D (2000) Introduction to graph theory. Pearson, London

    Google Scholar 

  29. 29.

    Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, EDBT’09. ACM, New York, pp 493–504. https://doi.org/10.1145/1516360.1516418

  30. 30.

    Zwick U (2001) Exact and approximate distances in graphs—a survey. In: Proceedings of the 9th annual European symposium on algorithms, ESA’01. Springer, London, pp 33–48. http://dl.acm.org/citation.cfm?id=647911.740642

Download references

Acknowledgements

The authors would like to acknowledge and thank DST-SERB Young Scientist Scheme, India (Grants No. SB/FTP/MS-050/2013), for their support to carry out this research at SRM Research Institute, SRM Institute of Science and Technology. Mr. Sivakumar Karunakaran would like to thank SRM Research Institute and the Department of Mathematical Science, IIT (BHU), for their support during the preparation of this manuscript. He would also like to thank Mr. Ashish Kumar Maurya and Mr. Amit Biswas, research scholars of IIT (BHU) for fruitful discussions.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Lavanya Selvaganesh.

Ethics declarations

Competing interest

The authors declare that they have no competing interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this article is also available in the arXiv with the identifier 1903.05377.

Appendix

Appendix

In this section, we present all the necessary proofs for the theorems discussed in Sect. 2.

Proof of Theorem 3

Theorem 3 Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. If \(\eta _{ij}^{\{l\}}< 0\) for some \(1\le l \le k(G)\), then

  1. 1.

    \(\eta _{ij}^{\{p\}}=0, 1\le p \le l-1\) and \(\eta _{ij}^{\{q\}}>0, l+1\le q\le k(G)\). Converse is also true.

  2. 2.

    There exists a vertex x such that \(\eta _{ix} ^{\{l\}} >0\) and \(\eta _{xj} ^{\{l\}}>0\). Converse need not be true.

Proof

  1. 1.

    For some, l \(1\le l \le k(G)\), given \(\eta _{ij}^{\{l\}}<0\implies\) by the Definition 1, i and j are not adjacent in \(G^{\{l\}}\). In fact, i and j are not adjacent in \(G^{\{p\}}\), for any \(p, 1\le p \le l\), that is, \(a_{ij}^{\{p\}}=0\). By the Definition 1, \(a_{ij}^{\{p\}}=0\) implies \(\eta ^{\{p-1\}}_{ij}\) must have been zero. Hence, we have \(\eta _{ij}^{\{p\}}=0\), for \(1 \le p \le l-1\). Further, \(\eta _{ij}^{\{l\}}<0\implies a_{ij}^{\{l+1\}}=1\), that is, i and j become adjacent in \(G ^{\{l+1\}}\). Hence, \(\eta _{ij}^{\{l+1\}} >0\) (By Definition 1). In addition, we have \(G^{\{l+1\}}\) is a subgraph of \(G^{\{q\}}\) for every \(q \ge l+1\). Hence by definition of \({\mathcal {NM}}(G)\), \(\eta _{ij}^{\{q\}}>0\), for every \(q, q\ge l+1\).

    Conversely, suppose \(\eta _{ij}^{\{p\}}=0, 1\le p< l\) and \(\eta _{ij}^{\{q\}}>0\), \(q\ge l+1\) holds, then in particular, for \(p= l-1\) and for \(q = l+1\), we have,

    $$\begin{aligned} \eta _{ij}^{\{l-1\}}=0 \implies a_{ij}^{\{l\}}=0\implies \eta _{ij}^{\{l\}}\le 0 \end{aligned}$$
    (1)
    $$\begin{aligned} \eta _{ij}^{\{l+1\}} > 0 \implies a_{ij}^{\{l+1\}} = 1\implies \eta _{ij}^{\{l\}}\ne 0 \end{aligned}$$
    (2)

    Equations (1) and (2) together implies \(\eta _{ij}^{\{l\}}<0\).

  2. 2.

    If \(\eta ^{\{l\}}_{ij} <0\), then by the Definition 1, i and j are not adjacent in \(G^{\{l\}}\) and there exist \(|N_{G^{\{l\}}}(i)\cap N_{G^{\{l\}}}(j)|\) common neighbours between i and j in \(G^{\{l\}}\). Let x be any such common neighbour of i and j in \(G^{\{l\}}\), then \(\eta _{ix}^{\{l\}}>0\) and \(\eta _{xj}^{\{l\}}>0\).

    To disprove the converse, consider the case when three vertices \(\{i,x,j\}\) are mutually adjacent in \(G^{\{l\}}\), then by Definition 1 we have, \(\eta _{ix}^{\{l\}}>0, \eta _{xj}^{\{l\}}>0\) and \(\eta _{ij}^{\{l\}}>0\) which contradicts the conclusion \(\eta _{ij}^{\{l\}}<0\). \(\square\)

Proof of Theorem 4

Theorem 4 Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. For \(1 \le l \le k(G)\), the off-diagonal elements of \({\mathcal {NM}}^{\{l\}}\) can be characterized as follows, for \(1\le i\ne j\le n\)

  1. 1.

    \(\eta ^{\{l\}}_{ij}=0\) if and only if \(d_{G}(i,j)> 2^{l}\)

  2. 2.

    \(\eta ^{\{l\}}_{ij}>0\) if and only if \(0 < d_{G}(i,j)\le 2^{l-1}\)

  3. 3.

    \(\eta ^{\{l\}}_{ij}<0\) if and only if \(2^{l-1}<d_{G}(i,j) \le 2^l\)

Proof

Let us prove the theorem by induction on l. Basic step: When \(l=1\), by Definition 1 and Remark 1, we know that \(\eta _{ij}^{\{1\}}\ne 0\iff d_{G}(i,j)\le 2\). In particular, we have

$$\begin{aligned} \eta _{ij}^{\{1\}}=0\iff & {} d_{G}(i,j)>2 \end{aligned}$$
(3)
$$\begin{aligned} \eta _{ij}^{\{1\}}>0\iff & {} d_{G}(i,j)=1 \end{aligned}$$
(4)
$$\begin{aligned} \eta _{ij}^{\{1\}} <0\iff & {} d_{G}(i,j)=2 \end{aligned}$$
(5)

Let us assume the theorem is true for any integer \(p < k(G)\), that is,

$$\begin{aligned} \eta ^{\{p\}}_{ij}=0\iff & {} d_{G}(i,j)> 2^{p} \end{aligned}$$
(6)
$$\begin{aligned} \eta ^{\{p\}}_{ij}>0\iff & {} 0 < d_{G}(i,j)\le 2^{p-1} \end{aligned}$$
(7)
$$\begin{aligned} \eta ^{\{p\}}_{ij}<0\iff & {} 2^{p-1}<d_{G}(i,j) \le 2^p \end{aligned}$$
(8)

Let us now prove for \(p+1\):

  1. 1.

    \(\eta _{ij}^{\{p+1\}}=0 \iff \eta _{ij}^{\{l\}}\) is zero for every l, \(1\le l \le p+1\) (Which is shown in the proof of Theorem 3 (1)). Therefore, by induction, \(d_{G}(i,j) > \underbrace{2\cdot 2 \cdots 2}_{p+ 1 {\text { times }}} = 2 ^{p+1}\).

  2. 2.

    \(\eta _{ij}^{\{p+1\}}>0 {\mathop {\Longleftrightarrow }\limits ^{{\text {By Def }}\,1}} i\) and j are adjacent in \(G^{\{p+1\}}\), that is, \(a_{ij}^{\{p+1\}}=1 {\mathop {\Longleftrightarrow }\limits ^{\text {By Def }\,2}} \eta _{ij}^{\{p\}}\ne 0\). Hence, by Eqs. (7) and (8), we have \(0 \le d_{G}(i,j)\le 2^{p}\).

  3. 3.

    Suppose \(\eta _{ij}^{\{p+1\}}<0\) then by Theorem 3 (1) we have \(\eta _{ij}^{\{p\}}=0\). Therefore, by induction, we have \(d_{G}(i,j)>2^{p}\). Further by Theorem 3 (2), we know that there exists a vertex x such that \(\eta _{ix}^{\{p+1\}}>0\) and \(\eta _{xj}^{\{p+1\}}>0\). Hence, by induction on the nonnegative entries, we have \(d_{G}(i,x) \le 2^p\) and \(d_{G}(x,j)\le 2^{p}\). Therefore \(\eta _{ij}^{\{p+1\}}<0 \implies 2^{p}<d_{G}(i,j)\le d_{G}(i,x)+d_{G}(x,j)\le 2^{p+1}\). Conversely, let \(2^{p}<d_{G}(i,j)\le 2^{p+1}\). By induction on the nonnegative entries in the \(\eta _{ij}^{\{p+1\}}\) namely using (1.) and (2.) above we have \(d_{G}(i,j)> 2^p \implies \eta _{ij}^{\{p\}}=0\), and \(d_{G}(i,j)\le 2^{p+1} \implies \eta _{ij}^{\{p+2\}}>0\). Since \(\eta _{ij}^{\{p\}}=0\) and \(\eta _{ij}^{\{p+2\}}>0\), by Theorem 3 (1), we have \(\eta _{ij}^{\{p+1\}}<0\).

Hence, the theorem is true for all integers \(l, 1\le l \le k(G)\). \(\square\)

Proof of Theorem 5

Theorem 5 Let G be an undirected unweighted simple graph on n vertices and let k(G) be the iteration number of G. If \(\eta _{ij}^{\{l\}}=-\,1,\) for \(1\le l \le k(G)\), then \(d_{G}(i,j)=2^{l}\).

Proof

By the definition of \({\mathcal {NM}}\) matrix, \(\eta _{ij}^{\{l\}}=-\,1\), implies that there exist exactly one common neighbour, say x, between i and j in \(G^{\{l\}}\) and by Theorem 4, we have \(2^{l-1}<d_{G}(i,j)=d_{G}(i,x)+d_{G}(x,j)\le 2^{l}\). Next, we claim that x is such a vertex that satisfies \(d_{G}(i,x) = 2^{l-1}\) and \(d_{G}(x,j)= 2^{l-1}\) and therefore, \(d_{G}(i,j)\) becomes \(2^l\).

Claim: \(d_{G}(i,x) = 2^{l-1}\) and \(d_{G}(x,j)= 2^{l-1}\)

Suppose \(f_{1}\) is the vertex on the path connecting i and j such that \(d_{G}(i,f_{1})=2^{l-1}\) and \(d_{G}(f_{1},j)\le 2^{l-1}\). By Theorem 4, we have \(\eta _{if_{1}}^{\{l-1\}} < 0\) and \(\eta _{f_{1}j}^{\{l-1\}}\ne 0\implies \eta _{if_{1}}^{\{l\}}>0\) and \(\eta _{f_{1}j}^{\{l\}}> 0\) \(\implies\) \(f_{1}\) is the common neighbour of i and j in \(G^{\{l\}}\). Since x is the only common neighbour of i and j, we have \(f_{1} = x\), that is, \(d_{G}(i,x)=2^{l-1}.\)

Similarly suppose \(f_{2}\) is the vertex on the path connecting i and j, such that \(d_{G}(f_{2},j)=2^{l-1}\) and \(d_{G}(i,f_{2})\le 2^{l-1}\), then by same argument as before, we get that \(f_{2}=x\). Hence, \(d_{G}(x,j)=2^{l-1}\).

Therefore, \(d_{G}(i,j)=d_{G}(i,x)+d_{G}(x,j)=2^{l-1}+2^{l-1}=2^{l}\). \(\square\)

Proof of Theorem 6

Theorem 6 A graph G is connected if and only if the corresponding matrix \({\mathcal {NM}}^{\{k\}}\) has no zero entries. In addition, the iteration number k(G) of the graph \(G\ne K_{n}\) is given by \(k(G)=\lceil \log _{2}(diameter(G))\rceil\). (For \(G=K_{n}, k(G)=1\) and \({\mathcal {NM}}^{\{1\}}={\mathcal {NM}}(G)=-\,C(G)\))

Proof

Let the number of nonzero entries in \({\mathcal {NM}}^{\{k\}}\) be denoted by z. Note that \(z\le n^2\).

$$\begin{aligned} z=n^{2}\iff & {} {\mathcal {NM}}^{\{k\}} {\text {has no zero entries}}\\\iff & {} \eta _{ij}^{\{k\}}\ne 0, \forall i,j\in V(G)\\\iff & {} 0<d_{G}(i,j)\le 2^{k},i\ne j,\forall i,j\in V(G)\, ({\hbox {Theorem 4}}) \end{aligned}$$

Hence, G is connected and \(diameter(G) \le 2^k\). Since \(G\ne K_{n},\) we have \(diameter(G)>1\), implies that \(k(G)\ge \log _{2}(diameter(G)).\) Since the iteration number is defined to be the smallest possible integer satisfying the above condition, we obtain \(k(G)=\lceil \log _{2}(diameter(G))\rceil .\) \(\square\)

Proof of Corollary 1

Corollary 1 A graph G is disconnected if and only if \(z < n^{2}\). Further, the iteration number k(G) of G is given by \(k(G)=\lceil \log _{2}S\rceil\), where S is the maximum over the diameter of components of G.

Proof

Follows by contrapositive of above Theorem, since \(z\le n^{2}\) and the observation that, if the \((i,j){\mathrm{th}}\) entry of \({\mathcal {NM}}^{\{k\}}\) is zero then the \((i,j){\mathrm{th}}\) entry of \({\mathcal {NM}}^{\{q\}}\) is zero, \(\forall q\ge 1\). By definition, the iteration number k(G) is the smallest number such that the number of nonzero entries in \({\mathcal {NM}}^{\{l\}}\) remains same for \(l=k(G)\) and \(l=k+1\). In particular, k(G) will be obtained when the distance between every pair of vertices in every component has been accounted. Hence, \(k(G)=\lceil \log _{2}S\rceil\), where S is the maximum over the diameter of all the components of G. \(\square\)

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Karunakaran, S., Selvaganesh, L. A novel graph matrix representation: sequence of neighbourhood matrices with an application. SN Appl. Sci. 2, 944 (2020). https://doi.org/10.1007/s42452-020-2635-1

Download citation

Keywords

  • Shortest path problem
  • Graph matrix
  • Neighbourhood matrix
  • Matrix multiplication
  • Complex network analysis

Mathematics Subject Classification

  • 05C50
  • 05C62
  • 05C82
  • 05C85