Towards efficient top-k reliability search on uncertain graphs
Abstract
Uncertain graphs have been widely used to represent graph data with inherent structural uncertainty. Reliability search is a fundamental problem in uncertain graph analytics. This paper investigates a new problem with broad real-world applications, the top-k reliability search problem on uncertain graphs: finding the k vertices v with the highest reliabilities of connection from a source vertex s to v. The existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm that solves the top-k reliability search problem efficiently. The algorithm adopts two important techniques, namely the BFS sharing technique and the offline sampling technique. The BFS sharing technique exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. The offline sampling technique samples possible worlds offline and stores them in a compact structure. The algorithm also takes advantage of bit vectors and bitwise operations to improve efficiency. In addition, we generalize the top-k reliability search problem from the single-source case to the multi-source case and show that the multi-source case can be equivalently converted to the single-source case. Moreover, we define two types of reverse top-k reliability search problems with different semantics on uncertain graphs and propose appropriate solutions for both. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1–2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph.
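The combination of offline sampling, BFS sharing and bit vectors described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `sample_edge_bits` and `top_k_reliability`, the directed edge dictionary, and the use of Python integers as bit vectors are all assumptions made for illustration. Each edge stores one bit per sampled possible world, and one worklist traversal propagates, for every vertex, the set of worlds in which it is reachable from the source.

```python
import heapq
import random
from collections import defaultdict, deque


def sample_edge_bits(edges, num_worlds, rng):
    """Offline step (illustrative): pack each edge's presence in every
    sampled possible world into one integer used as a bit vector."""
    bits = {}
    for (u, v), p in edges.items():
        mask = 0
        for i in range(num_worlds):
            if rng.random() < p:  # edge exists in world i with probability p
                mask |= 1 << i
        bits[(u, v)] = mask
    return bits


def top_k_reliability(edge_bits, source, k, num_worlds):
    """Estimate the k vertices with the highest reliability from `source`
    using one shared traversal over all sampled worlds."""
    adj = defaultdict(list)
    for (u, v), mask in edge_bits.items():
        adj[u].append((v, mask))

    # reach[v] has bit i set iff v is reachable from source in world i.
    reach = defaultdict(int)
    reach[source] = (1 << num_worlds) - 1
    queue = deque([source])
    queued = {source}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        for v, mask in adj[u]:
            # Worlds where u is reached, the edge exists, and v is not yet reached.
            new_bits = reach[u] & mask & ~reach[v]
            if new_bits:
                reach[v] |= new_bits
                if v not in queued:
                    queue.append(v)
                    queued.add(v)

    # Reliability estimate: fraction of sampled worlds in which v is reachable.
    scores = {
        v: bin(b).count("1") / num_worlds
        for v, b in reach.items()
        if v != source and b
    }
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical toy graph: (tail, head) -> existence probability.
    edges = {("s", "a"): 0.9, ("a", "b"): 0.5, ("s", "b"): 0.2, ("b", "c"): 0.7}
    worlds = 1000
    edge_bits = sample_edge_bits(edges, worlds, random.Random(42))
    print(top_k_reliability(edge_bits, "s", 2, worlds))
```

In this sketch a vertex is re-queued whenever it becomes reachable in additional worlds, so the traversal ends once no reachability bit vector changes. The sampled `edge_bits` can be computed once and reused across queries with different sources, which is the role the abstract assigns to offline sampling.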
Keywords
Uncertain graphs, Reliability search, BFS sharing, Offline sampling
Acknowledgments
This work was partially supported by the 973 Program of China (No. 2012CB036202), the National Natural Science Foundation of China (Nos. 61532015 and 61173023) and the HIT-Tencent Open Research Fund.