1 Introduction

In the modern world, diffusion phenomena taking place on networks are ubiquitous and have incurred huge losses to human society. Typical examples include computer virus propagation (Wang et al. 2014a), disease spreading (Zhang et al. 2018) and rumor diffusion (Hosseini and Azgomi 2016). It is of great theoretical and practical significance to develop effective strategies to control such harmful diffusion processes (Yu et al. 2017). As one of the significant measures, propagation source locating has attracted widespread attention, and many effective methods have been proposed in recent years (Jiang et al. 2017; Paluch et al. 2020). These methods can provide solutions for many important real-world problems, including locating the source(s) of SARS (Brockmann and Helbing 2013), COVID-19 (Tian et al. 2020) and cholera (Li et al. 2021), identifying the source of delay in public transportation networks (Manitz et al. 2017), and estimating the source of foodborne disease (Horn and Friedrich 2019).

It is well known that, when a diffusion process occurs on a network, there exists a spanning tree corresponding to the first time each node gets infected (Shah and Zaman 2011; Pinto et al. 2012; Tang et al. 2018). Reconstructing this spanning tree helps to locate the propagation source (Yang et al. 2020). However, the commonly used breadth-first search (BFS) heuristic (Shah and Zaman 2011; Pinto et al. 2012; Yang et al. 2016) may not be an effective strategy (Tang et al. 2018; Yang et al. 2020). In this paper, we introduce a graph traversal method termed relaxed direction-induced search (DIS), which was developed in our previous work (Yang et al. 2020). By utilizing the diffusion direction information of the observers, the relaxed DIS can approximate the spanning tree corresponding to the first time each node gets infected. Based on the relaxed DIS, we further utilize the infection time information of the observers to define two kinds of observers-based similarity measures: (1) Infection Time Similarity, which measures the similarity between the observation infection time and the measuring infection time of the given observers; and (2) Infection Time Order Similarity, which measures the similarity between two sorted sequences of the given observers, one obtained by sorting the observers in ascending order of their observation infection time and the other by sorting them in ascending order of their measuring infection time. With the two similarity measures and the relaxed DIS, we propose a novel source locating method that considers both the diffusion direction information and the infection time information. Experiments on a series of synthetic and real networks show that the proposed method is feasible and effective in accurately locating the propagation source.

Fig. 1 Sub-Fig. 1a shows the SI model (\(\beta =1\)) diffusing on a given network. The infection is initiated by node 1 (with “red” color). All the “red” paths form a diffusion tree (rooted at node 1) of the network. The arrows attached to the “red” paths represent the actual diffusion direction; the infection diffused along this actual diffusion tree. The nodes with “pink” color are in the infectious state. Sub-Fig. 1b shows a relaxed DIS spanning tree (rooted at node 1) of the network in Sub-Fig. 1a. This tree consists of the edges with “red” color and is generated by the relaxed DIS algorithm with three observers (nodes 2, 3 and 6, with “green” color). The nodes with “gray” color cannot be observed. The pair of values next to each observer represents the recorded Diffusion Direction information and Diffusion Timing information (color figure online)

Table 1 Notation summarization

The rest of this paper is organized as follows. Existing related works are briefly reviewed in Sect. 2. We introduce the direction-induced search (DIS) in Sect. 3. Our method is proposed in Sect. 4. The performance of the proposed method is validated in Sect. 5. We conclude this work in Sect. 6.

2 Related work

For unweighted networks, a systematic method for propagation source locating was pioneered by Shah and Zaman (2011); they constructed a source estimator based on a novel topological quantity termed Rumor Centrality (RC). Some researchers extended the RC to more complex settings, such as utilizing multiple observations to locate the source (Wang et al. 2014b) and locating multiple sources (Luo et al. 2013; Wang et al. 2015). Zhu and Ying (2016) developed a sample path-based method termed Jordan Center (JC). Several improved methods based on the JC were developed to locate the source(s) with sparse observations (Zhu and Ying 2014; W.Luo et al. 2014; Jiang et al. 2018). Meanwhile, many source locating methods based on various ideas were developed for unweighted networks, including the Dynamic Message Passing-based method (Lokhov et al. 2014), the Belief Propagation-based method (Altarelli et al. 2014), the Minimum Description Length-based method (Prakash et al. 2014), the Monte Carlo-based method (Antulov-Fantulin et al. 2015), the Rationality Observation-based method (Yang et al. 2016) and the Time Aggregated Graph-based method (Chai et al. 2021). The above methods are effective in unweighted networks. However, in reality, we have to consider various significant weights associated with the edges in networks, such as the traffic and the propagation delay.

For weighted networks, Brockmann and Helbing (2013) modeled the Global Mobility Network as a weighted graph and proposed a source locating method based on a novel effective distance. This method has been extended to more complex settings, including identifying multiple sources (Jiang et al. 2015) and identifying the source of delay in public transportation networks (Manitz et al. 2017). However, the effective distance-based methods require complete knowledge of the node states. Meanwhile, there are several source locating methods based on various ideas for weighted networks (Cai et al. 2018; Chang et al. 2020; Feizi et al. 2019), but these methods also require complete knowledge of the node states. In reality, it is often the case that the states of only a limited number of nodes can be observed (Caputo et al. 2019). To address this problem, many methods were developed to locate the source with limited observers. Shen et al. (2016) developed a time-reversal backward spreading (TRBS) algorithm, but this algorithm may not work if the locatability condition is violated. Hu et al. (2019) proposed a greedy optimization algorithm to reduce the number of observers for TRBS. Tang et al. (2018) proposed a source estimation algorithm based on the Gromov matrix. However, the Gromov matrix may not be the optimal heuristic for source locating. Meanwhile, Fu et al. (2016) proposed a backward diffusion-based method for locating multiple sources. Wang (2019) and Xu et al. (2019) identified the diffusion source based on the Spearman’s coefficient. Wang and Sun (2020) proposed a sequential neighbor filtering (SNF) algorithm for heterogeneous propagation models. Wang et al. (2021) proposed three source locating algorithms by defining the estimated mean and standard deviation of the propagation delay. However, the methods using limited observers considered only the infection time information and ignored the diffusion direction information.

Table 2 The time complexity of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms
Table 3 The parameters for generating the synthetic networks

The Gaussian estimator (Pinto et al. 2012) first located the source with limited observers by utilizing the diffusion direction information of the observers, and its time complexity can be reduced by ignoring the observers with low-quality information (Paluch et al. 2018). However, the diffusion direction information is only used on tree graphs. In our previous work (Yang et al. 2020), a relaxed direction-induced search (DIS) was proposed by utilizing the diffusion direction information. With the relaxed DIS, the accuracy of the Gaussian estimator on general graphs is improved. Different from the previous work, in this paper we first introduce the relaxed DIS (Yang et al. 2020), which uses the diffusion direction information of the observers to approximate the actual diffusion tree on a network. Based on the relaxed DIS, we further utilize the infection time information of the observers to define two kinds of similarity measures, namely the Infection Time Similarity and the Infection Time Order Similarity. With the two similarity measures and the relaxed DIS, we propose a novel source locating method that combines the diffusion direction information and the infection time information. The feasibility and effectiveness of this method are validated on a series of synthetic and real networks.

Table 4 Real networks
Table 5 The topological properties of the used networks

3 Preliminaries

A network is modeled as an undirected and weighted graph \(\mathcal {G}=\left( \mathcal {V}, \mathcal {E}, {\varvec{\theta }}\right) \), where \(\mathcal {V}\) and \(\mathcal {E}\) represent the node set and the edge set, respectively. \({\varvec{\theta }}=\left\{ \theta _{uv}\right\} \), where \(\theta _{uv}\) denotes the random propagation delay associated with the edge connecting nodes u and v, \(u, v\in \mathcal {V}\), \(uv\in \mathcal {E}\). The random variables \(\theta _{uv}\) for different edges uv have a known, arbitrary joint distribution.

Diffusion model Similar to the references (Zhu and Ying 2016; W.Luo et al. 2014; Lokhov et al. 2014; Yang et al. 2016), the diffusion process on \(\mathcal {G}\) is discrete. We adopt a simple Susceptible-Infectious (SI) model. With the SI model, each node in \({\mathcal {V}}\) is in exactly one of two states: (1) susceptible, if it has not been infected so far, or (2) infectious, if it has been infected by one of its neighbors. The diffusion process on \(\mathcal {G}\) is initiated by a single propagation source (denoted by \(s^*\)) at an unknown time \(t^*\). Initially, all nodes are susceptible except \(s^*\), which is infectious. A diffusion is possible from an infected node to a susceptible node if and only if there is an edge between them. Once infected, a node stays in the infectious state forever. Let \(\mathcal {N}\left( v\right) \) denote the neighbor set of node v. Suppose v is infected by a neighbor w at time \(t_{v}\); then v will attempt to infect each susceptible neighbor \(u\in \mathcal {N}\left( v\right) \) (except for w) along the weighted edge vu with propagation ratio \(\beta \). If two or more infected neighbors have the same propagation delay to u, u is first infected by only one of them. Without loss of generality, the diffusion process terminates when there are no susceptible nodes left in \(\mathcal {G}\).

Fig. 2 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (1). Each sub-figure is obtained by 100 runs
Fig. 3 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (2). Each sub-figure is obtained by 100 runs
Fig. 4 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (3). Each sub-figure is obtained by 100 runs
Fig. 5 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (4). Each sub-figure is obtained by 100 runs
Fig. 6 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (5). Each sub-figure is obtained by 100 runs
Fig. 7 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on BA model (6). Each sub-figure is obtained by 100 runs

Let \(\mathcal {O}=\left\{ o_k\right\} ^{\mathcal {K}}_{k=1}\subseteq \mathcal {V}\) denote the set of \(\mathcal {K}\) observable nodes on \(\mathcal {G}\), termed the observers set, whose locations in \(\mathcal {G}\) are known. Generally, \(\mathcal {K}\ll \vert \mathcal {V}\vert \). Similar to the references (Pinto et al. 2012; Yang et al. 2020), each \(o_k\in \mathcal {O}\) can provide two types of information: (1) the Diffusion Direction information, i.e., the direction from which the infection arrives at \(o_k\), and (2) the Infection Timing information, i.e., the time at which the infection arrives at \(o_k\).
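The diffusion model and the observer records described above can be illustrated with a minimal sketch for the special case \(\beta =1\) (the setting of Fig. 1). Under that assumption every infection attempt succeeds, so the first infection time of a node equals the minimum total delay from the source, and the tree of first infections can be obtained with Dijkstra's algorithm. The function names, the adjacency-dict representation and the requirement of positive delays are assumptions made for this illustration, not part of the original method; the case \(\beta <1\) is not covered.

```python
import heapq


def simulate_si_beta_one(adj, source, t_start=0.0):
    """First-infection times and parents under the SI model with beta = 1.

    adj maps each node to a dict {neighbor: propagation delay} (delays > 0).
    With beta = 1 every attempt succeeds, so a node is first infected along
    the path of minimum total delay from the source.
    """
    times = {source: t_start}
    parent = {source: None}
    heap = [(t_start, source)]
    done = set()
    while heap:
        t, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for u, delay in adj[v].items():
            arrival = t + delay
            # Strict "<" keeps the earlier infector when two arrivals tie.
            if u not in done and arrival < times.get(u, float("inf")):
                times[u] = arrival
                parent[u] = v
                heapq.heappush(heap, (arrival, u))
    return times, parent


def observe(times, parent, observers):
    """Each observer reports (neighbor the infection came from, infection time)."""
    return {o: (parent[o], times[o]) for o in observers}


# Hypothetical 3-node network; the source is node 1 and t* = 1.
adj = {1: {2: 3.0, 3: 7.0}, 2: {1: 3.0, 3: 2.0}, 3: {1: 7.0, 2: 2.0}}
times, parent = simulate_si_beta_one(adj, source=1, t_start=1.0)
records = observe(times, parent, observers=[3])
```

In this toy network node 3 is reached through node 2 rather than directly from node 1, so its record carries the direction "from node 2" together with its infection time.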

figure a

Direction-induced search (DIS) We introduce a graph traversal method termed relaxed direction-induced search (DIS), which was developed in our previous work (Yang et al. 2020). The relaxed DIS is summarized in Algorithm 1. The \(E\left( \mathcal {T}\right) \) declared in line 2 is an edge set. Lines 3–19 traverse \(\mathcal {G}\) with s as root, which requires \(O\left( \vert \mathcal {V}\vert ^2\right) \) computations in the worst case. In lines 10–16, if a node is an observer, its infection direction is determined by the Diffusion Direction information recorded in this node; if the node is a non-observer, its infection direction is assumed to be its current parent node. In lines 20–23, a DIS spanning tree \(\mathcal {T}_{{\text {DIS}},s}\) is generated if and only if “\(\vert E\left( \mathcal {T} \right) \vert ==\vert \mathcal {V}\vert -1\)”, where line 20 requires \(O\left( \vert \mathcal {V}\vert +\vert \mathcal {E}\vert \right) \) computations. Finally, taking the loop in line 1 into account, the time complexity of Algorithm 1 is \(O\left( \vert \mathcal {V}\vert ^3\right) \). In this way, Algorithm 1 generates a relaxed DIS spanning tree by utilizing the Diffusion Direction information recorded in \(\mathcal {O}\).
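To make the role of the Diffusion Direction information concrete, the following is a simplified sketch of a direction-constrained breadth-first traversal. It is not Algorithm 1 of Yang et al. (2020), whose listing is not reproduced in this text. It only mirrors the two rules stated above: an observer may be entered only from the neighbor it recorded, a non-observer accepts its current parent, and a tree is accepted only if it has \(\vert \mathcal {V}\vert -1\) edges. The function name and the treatment of a root that is itself an observer (its record is ignored) are assumptions of this sketch.

```python
from collections import deque


def direction_constrained_tree(adj, root, direction):
    """Return a parent map for a spanning tree rooted at `root`, or None.

    adj:       dict node -> iterable of neighbors (undirected graph)
    direction: dict observer -> neighbor from which the infection arrived
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in parent:
                continue
            # An observer may only be entered from its recorded neighbor.
            if v in direction and direction[v] != u:
                continue
            parent[v] = u
            queue.append(v)
    n_edges = len(parent) - 1
    # Accept the tree only if it spans all nodes: |E(T)| == |V| - 1.
    return parent if n_edges == len(adj) - 1 else None


# Hypothetical triangle graph; observer 3 recorded "infected from node 2".
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
tree = direction_constrained_tree(adj, root=1, direction={3: 2})

# Hypothetical path 1-2-3; observer 2 recorded "infected from node 3",
# so no spanning tree rooted at node 1 is consistent with that record.
path = {1: [2], 2: [1, 3], 3: [2]}
no_tree = direction_constrained_tree(path, root=1, direction={2: 3})
```

A plain BFS from node 1 on the triangle would attach node 3 directly to node 1; the recorded direction forces the edge 2 to 3 instead, which is the kind of correction the direction information is meant to provide.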

Table 6 The average error hop of the six algorithms on different networks

Frequently used notations are summarized in Table 1.

4 The proposed method

Given an arbitrary \(\mathcal {G}\) and an arbitrary \(\mathcal {O}\), we locate the propagation source by measuring the similarity between the \(\mathcal {O}\) in the actual diffusion tree (corresponding to the first time each node gets infected) and the \(\mathcal {O}\) in a spanning tree of \(\mathcal {G}\), which can be described as an estimator that maximizes the similarity.

$$\begin{aligned} \hat{s}=\mathop {\arg \max }_{s\in \mathcal {V}}\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_s}\right) \end{aligned}$$
(1)

where \(\mathcal {T}_{s^*}\) denotes the actual diffusion tree with source \(s^*\) as root, and \(\mathcal {T}_s\) denotes a tree that spans all nodes in \(\mathcal {G}\) with node s as root. \(\mathcal {O}_{\mathcal {T}_{s^*}}\) and \(\mathcal {O}_{\mathcal {T}_s}\) denote the given \(\mathcal {O}\) in \(\mathcal {T}_{s^*}\) and \(\mathcal {T}_s\), respectively. \(\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_s}\right) \) denotes the similarity between \(\mathcal {O}_{\mathcal {T}_{s^*}}\) and \(\mathcal {O}_{\mathcal {T}_s}\).

Theoretically, we have to evaluate \(\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_s}\right) \) in Eq. 1 for all spanning trees of \(\mathcal {G}\) and then select the tree with the maximal similarity, whose root is taken as the estimate of \(s^*\). However, the number of spanning trees of \(\mathcal {G}\), and hence the cost of generating them, increases exponentially with the number of nodes. Therefore, we introduce an approximation by assuming that the actual diffusion tree is a relaxed DIS spanning tree (obtained by Algorithm 1), which requires only \(O\left( \vert \mathcal {V}\vert ^3\right) \) time. Then, Eq. 1 can be modified as follows.

$$\begin{aligned} \hat{s}=\mathop {\arg \max }_{s\in \mathcal {V}}\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_{{\text {DIS}},s}}\right) \end{aligned}$$
(2)

where \(\mathcal {T}_{{\text {DIS}},s}\) is a relaxed DIS spanning tree of \(\mathcal {G}\) with a node s as root. \(\mathcal {O}_{\mathcal {T}_{{\text {DIS}},s}}\) denotes the given \(\mathcal {O}\) in \(\mathcal {T}_{{\text {DIS}},s}\). \(\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_{{\text {DIS}},s}}\right) \) denotes the similarity between \(\mathcal {O}_{\mathcal {T}_{s^*}}\) and \(\mathcal {O}_{\mathcal {T}_{{\text {DIS}},s}}\).

Since \(\mathcal {K}<\vert \mathcal {V}\vert \), \(\mathcal {T}_{{\text {DIS}},s}\) may not be unique, and a given \(\mathcal {T}_{{\text {DIS}},s}\) may not correspond to the actual diffusion tree. Thus, the relaxed DIS is a sub-optimal heuristic.

4.1 Observers-based similarity measures

In this subsection, we first define two kinds of observers-based similarity measures by utilizing the infection time information of observers. One is Infection Time Similarity; another is Infection Time Order Similarity.

Definition 1

Observation Infection Time. Given an arbitrary \(\mathcal {G}=\left( \mathcal {V},\mathcal {E},{\varvec{\theta }}\right) \) and an observers set \(\mathcal {O}=\left\{ o_k\right\} ^{\mathcal {K}}_{k=1}\), the Observation Infection Time of \(\mathcal {O}\) is defined as \({{\textbf {T}}}_{\mathcal {O}}=\left\{ t_{o_k}\right\} ^{\mathcal {K}}_{k=1}\), where \(t_{o_k}\) denotes the Infection Timing information recorded in \(o_k\).

Fig. 8 The IQR (box-plot) of error hop of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms on the BA models generated by different powers of the preferential attachment. The orange line and red line in each box denote the median and average error hop (also shown in Table 6), respectively (color figure online)

Table 7 The average error delay of the six algorithms on different networks

Definition 2

Measuring Infection Time. Given an arbitrary \(\mathcal {G}=\left( \mathcal {V},\mathcal {E},{\varvec{\theta }}\right) \) and an observers set \(\mathcal {O}=\left\{ o_k\right\} ^{\mathcal {K}}_{k=1}\), the Measuring Infection Time of \(\mathcal {O}\) in \({\mathcal {T}_{{\text {DIS}},s}}\) is defined as:

$$\begin{aligned} {{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}=\left\{ t_{v_{o_k}}\right\} ^{\mathcal {K}}_{k=1} \end{aligned}$$
(3)

where \(\mathcal {T}_{{\text {DIS}},s}\) is a relaxed DIS spanning tree of \(\mathcal {G}\) with a node s as root, \(s\in \mathcal {V}\). \(t_{v_{o_k}}\) denotes the Measuring Infection Time of node \(v_{o_k}\); \(v_{o_k}\) is the node with the node number corresponding to the observer \(o_k\).

$$\begin{aligned} t_{v_{o_k}}= {\left\{ \begin{array}{ll} t^*,&{}v_{o_k}=s\\ t^*+{\sum }_{j\in p\left( s,v_{o_k}\right) }\theta _j,&{}v_{o_k}\ne s \end{array}\right. } \end{aligned}$$
(4)

where \(t^*\) is the unknown start time, s is the root of \({\mathcal {T}_{{\text {DIS}},s}}\), \(p\left( s,v_{o_k}\right) \) denotes the path from s to \(v_{o_k}\) in \({\mathcal {T}_{{\text {DIS}},s}}\), and \(\theta _j\) denotes the propagation delay of edge j on this path.
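Eq. 4 amounts to accumulating edge delays along the tree path from the root to each observer. A minimal sketch follows, assuming the tree is given as a parent map and the delays as a dict keyed by unordered node pairs. Both representations and the function name are choices made for this illustration, and which delay values are plugged in (for example sampled values or distribution means) is not specified in this section.

```python
def measuring_infection_times(parent, delay, root, observers, t_star):
    """Eq. 4: t* plus the sum of edge delays on the tree path root -> observer.

    parent: dict node -> parent in the tree (root maps to None)
    delay:  dict frozenset({u, v}) -> propagation delay of edge uv
    """
    result = {}
    for o in observers:
        total, v = 0.0, o
        while v != root:           # walk up the tree until the root is reached
            p = parent[v]
            total += delay[frozenset((p, v))]
            v = p
        result[o] = t_star + total  # equals t* when the observer is the root
    return result


# Hypothetical path-shaped tree 1 -> 2 -> 3 with delays 3 and 2, and t* = 1.
parent = {1: None, 2: 1, 3: 2}
delay = {frozenset((1, 2)): 3.0, frozenset((2, 3)): 2.0}
measured = measuring_infection_times(parent, delay, root=1,
                                     observers=[1, 2, 3], t_star=1.0)
```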

Fig. 9 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on WS model (1). Each sub-figure is obtained by 100 runs

In the current fixed \({\mathcal {T}_{{\text {DIS}},s}}\) with s as root, s is assumed to be the propagation source. Thus, \(t^*\) is chosen to minimize the difference between \({{\textbf {T}}}_{\mathcal {O}}\) and \({{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\), and it can be estimated by the following function.

$$\begin{aligned} \begin{aligned} \hat{t^*}=&\arg \min \sum _{k=1}^\mathcal {K}\left( t_{v_{o_k}}-t_{o_k}\right) ^2\\ =&\arg \min \sum _{k=1}^\mathcal {K}\left( t^*+\sum _{j\in p\left( s,v_{o_k}\right) }\theta _j-t_{o_k}\right) ^2 \end{aligned} \end{aligned}$$
(5)

where \(t^*\in \left[ 0,z\right] \), \(z\le \sum \theta \), \(\theta \in {\varvec{\theta }}\). \(t_{v_{o_k}}\) denotes the Measuring Infection Time of \(v_{o_k}\), and \(t_{o_k}\) denotes the Infection Timing information recorded in \(o_k\).
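If \(t^*\) is treated as a continuous variable on \(\left[ 0,z\right] \), the objective in Eq. 5 is a convex quadratic in \(t^*\), so its minimizer is the mean of the residuals \(t_{o_k}-\sum _{j\in p\left( s,v_{o_k}\right) }\theta _j\), clipped to the interval. The sketch below implements this closed form. The continuity of \(t^*\) is an assumption of this illustration (the complexity term involving z in Sect. 4.2 suggests the authors may instead scan candidate values), and the function name is hypothetical.

```python
def estimate_start_time(path_delays, observed_times, z):
    """Least-squares estimate of t* in Eq. 5, restricted to [0, z].

    path_delays:    dict observer -> sum of delays on the tree path from s
    observed_times: dict observer -> recorded infection time
    """
    residuals = [observed_times[o] - path_delays[o] for o in observed_times]
    unconstrained = sum(residuals) / len(residuals)
    # For a convex quadratic, clipping the unconstrained optimum to the
    # interval yields the constrained optimum.
    return min(max(unconstrained, 0.0), z)


# Values consistent with the example in Sect. 4.1 (root node 1):
# observed times 4, 7, 6 and measured times 4, 7, 6 with t* = 1
# imply path delays 3, 6, 5 for observers 2, 3, 6.
t_hat = estimate_start_time({2: 3, 3: 6, 6: 5}, {2: 4, 3: 7, 6: 6}, z=100)
```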

Fig. 10 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on WS model (2). Each sub-figure is obtained by 100 runs
Fig. 11 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on WS model (3). Each sub-figure is obtained by 100 runs

Definition 3

Infection Time Similarity is defined as:

$$\begin{aligned} \begin{aligned} \varvec{\mathcal {S}}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) =&\frac{1}{1+\mathcal {D}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) }\\ =&\frac{1}{1+\left( \sum \nolimits _{k=1}^\mathcal {K}\vert t_{v_{o_k}}-t_{o_k}\vert ^{2}\right) ^{1/2}} \end{aligned} \end{aligned}$$
(6)

where \(\mathcal {D}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \) denotes the Euclidean distance (Rui and Wunsch 2005) between the \({{\textbf {T}}}_{\mathcal {O}}\) (Definition 1) and the \({{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\) (Definition 2). \(\varvec{\mathcal {S}}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \in \left( 0,1\right] \).
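Eq. 6 can be written as a short function. The sketch below assumes both time sets are given as dicts keyed by observer; the function name is hypothetical.

```python
import math


def infection_time_similarity(observed, measured):
    """Eq. 6: 1 / (1 + Euclidean distance between the two time vectors)."""
    dist = math.sqrt(sum((measured[o] - observed[o]) ** 2 for o in observed))
    return 1.0 / (1.0 + dist)


# Identical times give the maximal similarity 1; a distance of 5 gives 1/6.
same = infection_time_similarity({2: 4, 3: 7, 6: 6}, {2: 4, 3: 7, 6: 6})
apart = infection_time_similarity({"a": 0, "b": 0}, {"a": 3, "b": 4})
```

Because the distance is non-negative, the value always lies in \(\left( 0,1\right] \), in line with the range stated above.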

Definition 4

Observation Infection Time Order. Given an arbitrary \(\mathcal {G}=\left( \mathcal {V},\mathcal {E},{\varvec{\theta }}\right) \) and an observers set \(\mathcal {O}=\left\{ o_k\right\} ^{\mathcal {K}}_{k=1}\), the Observation Infection Time Order of \(\mathcal {O}\) is defined as an ordered observers sequence \({{\textbf {TO}}}_{\mathcal {O}}=\left\langle o_i\right\rangle ^\mathcal {K}_{i=1}\), in which the observers \(o_i\) are sorted in ascending order of the Infection Timing information (denoted by \(t_{o_i}\)) recorded in \(o_i\). For any pair of consecutive observers \(o_i,o_{i+1}\in {{\textbf {TO}}}_{\mathcal {O}}\), we have \(t_{o_i}\le t_{o_{i+1}}\).

Definition 5

Measuring Infection Time Order. Given an arbitrary \(\mathcal {G}=\left( \mathcal {V},\mathcal {E},{\varvec{\theta }}\right) \) and an observers set \(\mathcal {O}=\left\{ o_k\right\} ^{\mathcal {K}}_{k=1}\), the Measuring Infection Time Order of \(\mathcal {O}\) on \(\mathcal {T}_{{\text {DIS}},s}\) is defined as an ordered nodes sequence \({{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}=\left\langle v_{o_k}\right\rangle ^\mathcal {K}_{k=1}\), in which the nodes \(v_{o_k}\in {{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\) are sorted in ascending order of the Measuring Infection Time \(t_{v_{o_k}}\) (Definition 2). For any pair of consecutive nodes \(v_{o_k},v_{o_{k+1}}\in {{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\), we have \(t_{v_{o_k}}\le t_{v_{o_{k+1}}}\).

Definition 6

Infection Time Order Similarity is defined as:

$$\begin{aligned} \varvec{\mathcal {S}}\left( {{\textbf {TO}}}_{\mathcal {O}},{{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) =\frac{1+\tau \left( {{\textbf {TO}}}_{\mathcal {O}},{{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) }{2} \end{aligned}$$
(7)

where \(\tau \) denotes the correlation coefficient defined in reference (Kendall 1938); the details can be found in Appendix A. \(\tau \) is mainly used to measure the concordance between the \({{\textbf {TO}}}_{\mathcal {O}}\) (Definition 4) and the \({{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\) (Definition 5). \(\varvec{\mathcal {S}}\left( {{\textbf {TO}}}_{\mathcal {O}},{{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \in \left[ 0,1\right] \).
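As an illustration of Eq. 7, the sketch below computes Kendall's \(\tau \) from concordant and discordant pairs of two orderings of the same observers and maps it to \(\left[ 0,1\right] \). It assumes the tie-free form of \(\tau \) (often called tau-a), since Appendix A is not reproduced here. The convention of returning 1 when there are fewer than two observers is also an assumption, as are the function names.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (no ties)."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x precedes y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0


def infection_time_order_similarity(order_obs, order_meas):
    """Eq. 7: (1 + tau) / 2, which maps tau in [-1, 1] to [0, 1]."""
    return (1.0 + kendall_tau(order_obs, order_meas)) / 2.0


# Orders from the example in Sect. 4.1 and two perturbed variants.
identical = infection_time_order_similarity([2, 6, 3], [2, 6, 3])
reversed_ = infection_time_order_similarity([2, 6, 3], [3, 6, 2])
one_swap = infection_time_order_similarity([2, 6, 3], [2, 3, 6])
```

With three observers, swapping one adjacent pair leaves two concordant pairs and one discordant pair, so \(\tau =1/3\) and the similarity is 2/3.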

The properties related to the Infection Time Similarity (Definition 3) and the Infection Time Order Similarity (Definition 6) can be found in Appendix B.

Example: In Fig. 1, Fig. 1b shows a relaxed DIS spanning tree of the network shown in Fig. 1a. \(\mathcal {O}=\left\{ o_1, o_2, o_3\right\} \), \(o_1\), \(o_2\) and \(o_3\) correspond to nodes 2, 3 and 6, respectively. According to Definition 1, for \({{\textbf {T}}}_{\mathcal {O}}\), \(t_2=4\), \(t_3=7\), \(t_6=6\). When the current root is node 1, according to Eq. 5, \(t^*=1\). According to Definition 2, for \({{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\), \(t_2=4\), \(t_3=7\), \(t_6=6\). Then, according to Definition 3, we have \(\varvec{\mathcal {S}}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) =1\). According to Definition 4, \({{\textbf {TO}}}_{\mathcal {O}}=\left\langle 2,6,3\right\rangle \). According to Definition 5, \({{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}=\left\langle 2,6,3\right\rangle \). Further, with Definition 6, we have \(\varvec{\mathcal {S}}\left( {{\textbf {TO}}}_{\mathcal {O}},{{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) =1\).

Fig. 12 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on WS model (4). Each sub-figure is obtained by 100 runs
Fig. 13 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on WS model (5). Each sub-figure is obtained by 100 runs
Fig. 14 The IQR (box-plot) of error hop of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms on the WS models generated by different rewiring probabilities. The orange line and red line in each box denote the median and average error hop (also shown in Table 6), respectively (color figure online)

4.2 Locating the propagation source

By combining the Infection Time Similarity (Definition 3) and the Infection Time Order Similarity (Definition 6), the source estimator in Eq. 2 can be written as follows:

$$\begin{aligned} \begin{aligned} \hat{s}&=\mathop {\arg \max }_{s\in \mathcal {V}}\varvec{\mathcal {S}}\left( \mathcal {O}_{\mathcal {T}_{s^*}}, \mathcal {O}_{\mathcal {T}_{{\text {DIS}},s}}\right) \\&=\mathop {\arg \max }_{s\in \mathcal {V}}\left( \varvec{\mathcal {S}}\left( {{\textbf {T}}}_{\mathcal {O}}, {{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \times \varvec{\mathcal {S}}\left( {{\textbf {TO}}}_{\mathcal {O}}, {{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \right) \end{aligned} \end{aligned}$$
(8)

where \(\varvec{\mathcal {S}}\left( {{\textbf {T}}}_{\mathcal {O}},{{\textbf {T}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \) and \(\varvec{\mathcal {S}}\left( {{\textbf {TO}}}_{\mathcal {O}},{{\textbf {TO}}}_{\mathcal {T}_{{\text {DIS}},s}}\right) \) are defined in Definition 3 and Definition 6, respectively.
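The estimator in Eq. 8 can be sketched end to end once, for every candidate root s, the path delays from s to the observers in \(\mathcal {T}_{{\text {DIS}},s}\) are available. The sketch below takes those delays as input and does not construct the trees. It assumes a continuous \(t^*\) fitted by least squares and clipped to \(\left[ 0,z\right] \), and a Kendall \(\tau \) in which tied pairs count as neither concordant nor discordant. All names are hypothetical, and this is not a reproduction of Algorithm 2.

```python
import math
from itertools import combinations


def score_candidate(path_delays, observed, z):
    """Product in Eq. 8 for one candidate root.

    path_delays: dict observer -> sum of tree-path delays from the candidate
    observed:    dict observer -> recorded infection time
    """
    obs = list(observed)
    # Start time: least-squares fit of Eq. 5, clipped to [0, z] (assumption).
    t_star = sum(observed[o] - path_delays[o] for o in obs) / len(obs)
    t_star = min(max(t_star, 0.0), z)
    measured = {o: t_star + path_delays[o] for o in obs}
    # Infection Time Similarity (Eq. 6).
    dist = math.sqrt(sum((measured[o] - observed[o]) ** 2 for o in obs))
    time_sim = 1.0 / (1.0 + dist)
    # Infection Time Order Similarity (Eq. 7); ties count for neither side.
    concordant = discordant = 0
    for a, b in combinations(obs, 2):
        sign = (observed[a] - observed[b]) * (measured[a] - measured[b])
        concordant += sign > 0
        discordant += sign < 0
    pairs = len(obs) * (len(obs) - 1) // 2
    tau = (concordant - discordant) / pairs if pairs else 1.0
    return time_sim * (1.0 + tau) / 2.0


def locate_source(candidate_delays, observed, z):
    """Eq. 8: the candidate root with the largest combined similarity."""
    return max(candidate_delays,
               key=lambda s: score_candidate(candidate_delays[s], observed, z))


# Observed times from the example in Sect. 4.1; candidate 1 reproduces them,
# while the hypothetical candidate 5 reverses the infection order.
observed = {2: 4, 3: 7, 6: 6}
candidates = {1: {2: 3, 3: 6, 6: 5}, 5: {2: 6, 3: 1, 6: 2}}
best = locate_source(candidates, observed, z=100)
```

Multiplying the two similarities means a candidate must agree with the observers both in absolute timing and in infection order to score highly; a fully reversed order drives the product to zero regardless of the timing fit.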

Based on Eq. 8, we propose a novel source locating method, termed the OSDIS algorithm, which is summarized in Algorithm 2.

Algorithm 2 analysis: The \(E\left( \mathcal {T}\right) \) declared in line 2 is an edge set. Lines 3–19 are used for traversing the \(\mathcal {G}\) by the relaxed DIS with current node s as root and recording the eligible edges into \(E\left( \mathcal {T}\right) \). Lines 3–19 require \(O\left( \vert \mathcal {V}\vert ^2\right) \) computations in the worst case. Line 20 requires \(O\left( \vert \mathcal {V}\vert +\vert \mathcal {E}\vert \right) \) computations, which can be reduced to \(O\left( \vert \mathcal {E}\vert \right) \). In lines 21–22, the \(\mathcal {T}\) obtained in line 20 will be marked as a relaxed DIS spanning tree \(\mathcal {T}_{{\text {DIS}},s}\) if and only if \(\vert E\left( \mathcal {T}\right) \vert ==\vert \mathcal {V}\vert -1\). Line 23 requires \(O\left( \mathcal {K}\right) \) computations. Line 24 requires \(O\left( \vert \mathcal {V}\vert ^2+z\vert \mathcal {V}\vert \mathcal {K}\right) \) computations (z can be found in Eq. 5). Lines 25–26 require \(O\left( \vert \mathcal {V}\vert \right) \) and \(O\left( \mathcal {K}^2\right) \) computations, respectively. Both lines 27 and 28 require \(O\left( \mathcal {K}\log \mathcal {K}\right) \) computations. Line 29 requires \(O\left( \mathcal {K}^2\right) \) computations. Finally, each node \(s\in \mathcal {V}\) will be used as root to construct different \(\mathcal {T}_{{\text {DIS}},s}\). Thus, the time complexity of Algorithm 2 is \(O\left( \vert \mathcal {V}\vert ^3+z\vert \mathcal {V}\vert ^2\mathcal {K}\right) \).

figure b
Fig. 15 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on Dolphins network. Each sub-figure is obtained by 62 runs
Fig. 16 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on Lesmis network. Each sub-figure is obtained by 77 runs
Fig. 17 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on PDZBase network. Each sub-figure is obtained by 100 runs
Fig. 18 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on USAirlines network. Each sub-figure is obtained by 100 runs
Fig. 19 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on NetScience network. Each sub-figure is obtained by 100 runs
Fig. 20 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on Celegans network. Each sub-figure is obtained by 100 runs
Fig. 21 The results of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms applied on Euroroads network. Each sub-figure is obtained by 100 runs

5 Experimental evaluation

To validate the feasibility and effectiveness of the OSDIS algorithm, we compare it with four other state-of-the-art methods on a series of synthetic and real networks. The four methods are the Gauss algorithm (Pinto et al. 2012), the GSSI algorithm (Tang et al. 2018), the TRBS algorithm (Shen et al. 2016) and the SNF algorithm (Wang and Sun 2020). Besides, since the OSDIS algorithm is based on the relaxed DIS heuristic, to show its advantage we also define an algorithm, denoted by OSBFS, in which the relaxed DIS heuristic is replaced by the breadth-first search (BFS) heuristic. In total, six algorithms are compared in the experiments. Their time complexity is shown in Table 2. Similar to the reference (Yang et al. 2020), the performance of a source locating algorithm is mainly evaluated by the precision (the precise locating ratio, i.e., the proportion of 0 error hop), the average error hop and the average error delay. For the precision, the higher the value, the better the algorithm. For the average error hop and the average error delay, the smaller the value, the better the algorithm.
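The precision and the average error hop can be computed as sketched below. The sketch assumes that the error hop of a run is the hop distance (number of edges on a shortest path) between the estimated and the actual source, which is a common reading but is not defined explicitly in this section. The function names are hypothetical, and the error delay is not covered.

```python
from collections import deque


def hop_distance(adj, a, b):
    """Number of edges on a shortest path from a to b (BFS); None if none."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return None


def precision_and_average_error_hop(adj, runs):
    """runs: list of (actual source, estimated source) pairs on a connected graph."""
    hops = [hop_distance(adj, s, e) for s, e in runs]
    precision = sum(h == 0 for h in hops) / len(hops)  # share of 0-error-hop runs
    return precision, sum(hops) / len(hops)


# Hypothetical path graph 1-2-3 and four runs.
adj = {1: [2], 2: [1, 3], 3: [2]}
precision, avg_hop = precision_and_average_error_hop(
    adj, [(1, 1), (1, 3), (2, 3), (3, 3)])
```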

Running environment Hardware: Dell R740 with 2 Intel® Xeon® Gold 6254 CPUs, 1 TB RAM. Software: Cygwin 3.0.7 + Eclipse Cpp2019 + igraph C 0.7.1 + Eigen/Dense (used for running algorithms). R 64 \(\times \) 3.3.3 + igraph R 1.2.1 (used for generating synthetic networks).

Datasets The six algorithms are evaluated on a series of synthetic and real networks. The synthetic networks include the scale-free (BA) model (Barabasi and Albert 1999) and the small-world (WS) model (Watts and Strogatz 1998). In total, six BA models with different powers of the preferential attachment and five WS models with different rewiring probabilities are generated. The detailed parameters for generating these synthetic networks are shown in Table 3. The real networks are selected from different fields and are freely available from the Koblenz Network Collection (Kunegis 2013) and the Network Data Repository (Rossi and Ahmed 2015). All the real networks are shown in Table 4. The topological properties of the used networks are shown in Table 5.

Parameters setting Given an arbitrary graph \(\mathcal {G}\), the propagation delays in \({\varvec{\theta }}\) are independent and identically distributed (i.i.d.) random variables following a Gaussian distribution \(\mathcal {N}\left( \mu , \sigma ^2 \right) \), where \(\mu \) and \(\sigma ^2\) are known (Pinto et al. 2012; Paluch et al. 2018). We set \(\mu /\sigma =4\). The diffusion model follows the one introduced in Sect. 3. To investigate the impact of different propagation ratios (denoted by \(\beta \)) on the performance of the source locating algorithms, we set \(\beta =0.25\), \(\beta =0.50\) and \(\beta =0.75\), respectively. Additionally, to compare with the GSSI algorithm (Tang et al. 2018), we set \(s^*\notin \mathcal {O}\). In reality, to save cost, the number of observers is generally far smaller than the size of \(\mathcal {G}\). Thus, we randomly select \(5\%\) of the nodes as observers in each network.
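The parameter setting can be sketched as follows, assuming Python's standard `random` module as the random source. The function names, the rounding rule for the \(5\%\) share and the floor of one observer are assumptions of this illustration. The sketch also does not truncate the Gaussian, so with \(\mu /\sigma =4\) a negative delay is possible, although rare; how the original experiments handle that case is not stated here.

```python
import random


def draw_delays(edges, mu, ratio=4.0, rng=random):
    """i.i.d. Gaussian edge delays with mean mu and sigma = mu / ratio."""
    sigma = mu / ratio
    return {frozenset(e): rng.gauss(mu, sigma) for e in edges}


def pick_observers(nodes, source, fraction=0.05, rng=random):
    """Randomly pick a share of the nodes as observers, excluding the source."""
    nodes = list(nodes)
    pool = [v for v in nodes if v != source]       # enforces s* not in O
    k = max(1, round(fraction * len(nodes)))       # at least one observer
    return rng.sample(pool, k)


# Hypothetical 100-node ring with source node 7 and a fixed seed.
rng = random.Random(0)
ring = [(i, (i + 1) % 100) for i in range(100)]
delays = draw_delays(ring, mu=4.0, rng=rng)
observers = pick_observers(range(100), source=7, rng=rng)
```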

Table 8 The average running time ratios between the other five algorithms and the OSDIS on different networks

5.1 Experimental results on the synthetic networks

Figures 2, 3, 4, 5, 6, 7 show the precision (the precise locating ratio, i.e., the proportion of estimates with 0 error hop) of the six algorithms on the six BA models. From Figs. 2, 3, 4, 5, 6, 7, we can see that, for \(\beta =0.25\), \(\beta =0.5\) and \(\beta =0.75\), the OSDIS generally achieves the best precision on all six BA models, i.e., it has a higher proportion of 0 error hop than the other five algorithms. The only exception is BA model (1) with \(\beta =0.75\), where the OSDIS is inferior to the GSSI and TRBS but still outperforms the other three algorithms. From Table 6, we see that, for all three values of \(\beta \), the OSDIS is generally better than the other five algorithms in the average error hop on the six BA models. The only exception is BA model (5) with \(\beta =0.5\), where the OSDIS is a little inferior to the OSBFS but superior to the other four algorithms. Meanwhile, in Fig. 8, we plot the interquartile range (IQR) to show the distribution of the error hop of the six algorithms on the BA models. Additionally, from Table 7, we can see that, for all three values of \(\beta \), the OSDIS generally achieves the best average error delay on the six BA models. The only exception is BA model (1) with \(\beta =0.75\), where the OSDIS is inferior to the TRBS but outperforms the other four algorithms. In summary, on the BA models, the OSDIS is generally better than the other five algorithms in the precision, the average error hop and the average error delay.

Figures 9, 10, 11, 12, 13 show the precision (the precise locating ratio, i.e., the proportion of estimates with 0 error hop) of the six algorithms on the five WS models. From Figs. 9, 10, 11, 12, 13, we can see that, when \(\beta =0.25\), the OSDIS is superior to the other five algorithms in the precision on WS models (1)–(4) (i.e., it has a higher proportion of 0 error hop), and is inferior only to the TRBS and GSSI on WS model (5). When \(\beta =0.5\) and \(\beta =0.75\), the OSDIS is generally inferior to the TRBS and GSSI in the precision on WS models (1)–(5), but superior to the other three algorithms. The only exception is WS model (5) with \(\beta =0.5\), where the OSDIS achieves the best precision. From Tables 6 and 7, we see that, when \(\beta =0.25\) and \(\beta =0.5\), the OSDIS is generally better than the other five algorithms in the average error hop and the average error delay on all five WS models. The only exception is WS model (2) with \(\beta =0.5\), where the OSDIS is inferior to the TRBS. Tables 6 and 7 also show that, when \(\beta =0.75\), the OSDIS is always inferior to the GSSI and TRBS in the average error hop and the average error delay, but outperforms the other three algorithms. In Fig. 14, we further plot the interquartile range (IQR) to show the distribution of the error hop of the six algorithms on the WS models. In summary, on the WS models, the OSDIS is generally superior to the other five algorithms in the precision when \(\beta =0.25\), and generally performs better in the average error hop and the average error delay when \(\beta =0.25\) and \(\beta =0.5\). Thus, the OSDIS is better than the other five algorithms in most cases. Comparing the two families of synthetic networks, the OSDIS performs better on the BA models than on the WS models.
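The evaluation quantities discussed above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the error hop of one run is the shortest-path hop distance between the estimated and the true source, and that the precision, the average error hop and the IQR are taken over repeated runs; the names `hop_distance` and `summarize`, the adjacency-dictionary format and the toy data are hypothetical.

```python
from collections import deque
from statistics import mean, quantiles


def hop_distance(adj, start, goal):
    """Shortest-path length in hops between two nodes (BFS); None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def summarize(error_hops):
    """Precision (share of runs with 0 error hop), mean error hop and IQR.

    Needs at least two runs, because statistics.quantiles requires it.
    """
    q1, _, q3 = quantiles(error_hops, n=4)
    return {
        "precision": sum(1 for h in error_hops if h == 0) / len(error_hops),
        "avg_error_hop": mean(error_hops),
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    # Toy undirected path graph 0 - 1 - 2 - 3 as an adjacency dictionary.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    # Hypothetical (true source, estimated source) pairs from four runs.
    runs = [(0, 0), (1, 1), (2, 3), (0, 2)]
    hops = [hop_distance(adj, true, est) for true, est in runs]
    print(hops, summarize(hops))
```

The quartiles here use the default method of `statistics.quantiles`; a box plot drawn with another tool may use a slightly different quartile convention.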

5.2 Experimental results on the real networks

In this subsection, we further validate the performance of the six algorithms on the real networks. Figures 15, 16, 17, 18, 19, 20, 21 show the precision (the precise locating ratio, i.e., the proportion of estimates with 0 error hop) of the six algorithms on the real networks. When \(\beta =0.25\), Figs. 15, 16, 17, 18, 19, 20, 21 show that the OSDIS achieves the best precision (i.e., the highest proportion of 0 error hop) on all the real networks except the Euroroads network, on which it is inferior only to the GSSI and superior to the other four algorithms. Tables 6 and 7 show that the OSDIS is also superior to the other five algorithms in the average error hop and the average error delay on all the real networks except the Euroroads network, on which it is again inferior only to the GSSI. When \(\beta =0.5\), Figs. 15, 16, 17, 18, 19, 20, 21 show that the OSDIS achieves the best precision on the Dolphins, Lesmis, USAirlines and Celegans networks. On the PDZBase, NetScience and Euroroads networks, the OSDIS is inferior only to the GSSI in the precision and superior to the other four algorithms. Tables 6 and 7 show that the OSDIS generally outperforms the other five algorithms in the average error hop and the average error delay on the real networks; only on the USAirlines network is it inferior to the GSSI and TRBS in the average error delay. When \(\beta =0.75\), Figs. 15, 16, 17, 18, 19, 20, 21 show that, except on the Euroroads network, the OSDIS is generally inferior to the TRBS or GSSI in the precision on the real networks. Tables 6 and 7 show that the OSDIS outperforms the other five algorithms in the average error hop and the average error delay on the Dolphins, Lesmis, PDZBase, NetScience and Euroroads networks, but is inferior to the GSSI and TRBS on the USAirlines and Celegans networks.
Meanwhile, in Appendix C, Figs. 22, 23, 24, 25, 26, 27, 28, we plot the interquartile range (IQR) to further show the distribution of the error hop of the six algorithms on the real networks. In summary, on the real networks, the OSDIS is generally superior to the other five algorithms in the precision, the average error hop and the average error delay when \(\beta =0.25\) and \(\beta =0.5\), but is generally inferior to the GSSI or TRBS in the precision when \(\beta =0.75\). Thus, the OSDIS is better than the other five algorithms in most cases.

Overall, in terms of the precision, the average error hop and the average error delay, the OSDIS generally outperforms the other five algorithms on the BA models and, in most cases, on the WS models and the real networks. In most of the remaining cases, the OSDIS is inferior only to the GSSI and TRBS, and still superior to the other three algorithms. Thus, the OSDIS is a feasible and effective method for accurately locating the propagation source. Meanwhile, on the BA models, the WS models and the real networks, the OSDIS clearly outperforms the OSBFS, which indicates that the relaxed DIS heuristic outperforms the BFS heuristic in locating the propagation source.

The average error hops of the six algorithms on the different networks are shown in Table 6, and the average error delays are shown in Table 7. The average running time ratios between the other five algorithms and the OSDIS on all the networks are shown in Table 8. From Table 8, we can see that the OSDIS is less efficient than the TRBS and SNF, comparable to the OSBFS, and more efficient than the Gauss and GSSI.

Fig. 22

The IQR (box-plot) of error hop of Gauss, GSSI, TRBS, SNF, OSBFS and OSDIS algorithms on Dolphins network. The orange line and red line in each box denote the median and average error hop, respectively (color figure online)

6 Conclusion

In this paper, we locate the propagation source by utilizing both the diffusion direction information and the infection time information of the observers. We introduce a relaxed direction-induced search (DIS) that uses the diffusion direction information of the observers to approximate the actual diffusion tree on a network. Based on the relaxed DIS, we further use the infection time information of the observers to define two kinds of observer-based similarity measures, namely the Infection Time Similarity and the Infection Time Order Similarity. With these two similarity measures and the relaxed DIS, a source locating method termed OSDIS is proposed. The feasibility and effectiveness of the OSDIS are validated on a series of synthetic and real networks. The experimental results also show that the relaxed DIS heuristic outperforms the BFS heuristic in propagation source locating. The current OSDIS is developed only for single-source locating. In future work, we will study an OSDIS-based multi-source locating method.