Abstract
Reverse k-nearest neighbor (\(\hbox {R}k\hbox {NN}\)) query on graphs returns the data objects that take a specified query object q as one of their k-nearest neighbors. It plays an important role in many real-life applications, including resource allocation and profile-based marketing. However, to the best of our knowledge, there is little previous work on \(\hbox {R}k\hbox {NN}\) search over uncertain graph data, even though many complex networks such as traffic networks and protein–protein interaction networks are often modeled as uncertain graphs. In this paper, we systematically study the problem of reverse k-nearest neighbor search on uncertain graphs (\(\hbox {UG-R}k\hbox {NN}\) search for short), where graph edges contain uncertainty. First, to address \(\hbox {UG-R}k\hbox {NN}\) search, we propose three effective heuristics, i.e., GSP, EGR, and PBP, which reduce the original large uncertain graph to a much smaller essential uncertain graph, cut down the number of possible graphs via the newly introduced graph conditional dominance relationship, and reduce the validation cost of data nodes in order to improve query efficiency. Then, we present an efficient algorithm, termed SDP, to support \(\hbox {UG-R}k\hbox {NN}\) retrieval by seamlessly integrating the three heuristics. In view of the high complexity of \(\hbox {UG-R}k\hbox {NN}\) search, we further present a novel algorithm called TripS, which relies on an adaptive stratified sampling technique. Extensive experiments using both real and synthetic graphs demonstrate the performance of our proposed algorithms.
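To make the query semantics concrete, the following is a minimal sketch and not one of the algorithms proposed in the paper (GSP, EGR, PBP, SDP, and TripS are not reproduced here). It assumes an undirected weighted graph, breaks distance ties in favor of q, and, for the uncertain case, assumes that edges exist independently with given probabilities, so that the probability of a data node being an \(\hbox {R}k\hbox {NN}\) of q can be obtained by brute-force enumeration of all possible graphs. All function names (`build_adj`, `rknn`, `rknn_probabilities`) are hypothetical.

```python
import heapq
from itertools import product

INF = float("inf")


def build_adj(edges):
    """Build an undirected adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def shortest_distances(adj, source):
    """Dijkstra's algorithm; returns distances to all reachable nodes."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rknn(adj, data_nodes, q, k):
    """Data nodes that have q among their k nearest neighbors.

    Assumption: a node's neighbor candidates are the other data nodes
    plus q, and q wins distance ties (strict '<' below).
    """
    result = []
    for o in data_nodes:
        if o == q:
            continue
        dist = shortest_distances(adj, o)
        dq = dist.get(q, INF)
        if dq == INF:
            continue  # q is unreachable from o in this graph
        closer = sum(
            1 for p in data_nodes
            if p != o and p != q and dist.get(p, INF) < dq
        )
        if closer < k:
            result.append(o)
    return result


def rknn_probabilities(uncertain_edges, data_nodes, q, k):
    """Exact RkNN probability per data node by enumerating all 2^m
    possible graphs of (u, v, weight, prob) edges.

    Assumption: edges exist independently. Only feasible for tiny m.
    """
    probs = {o: 0.0 for o in data_nodes if o != q}
    for mask in product([False, True], repeat=len(uncertain_edges)):
        world_prob = 1.0
        present = []
        for keep, (u, v, w, p) in zip(mask, uncertain_edges):
            world_prob *= p if keep else (1.0 - p)
            if keep:
                present.append((u, v, w))
        for o in rknn(build_adj(present), data_nodes, q, k):
            probs[o] += world_prob
    return probs


# Toy deterministic graph: q-a (1), a-b (1), b-c (1), q-c (5).
toy_adj = build_adj([("q", "a", 1.0), ("a", "b", 1.0),
                     ("b", "c", 1.0), ("q", "c", 5.0)])
print(rknn(toy_adj, ["a", "b", "c"], "q", 1))

# Toy uncertain graph: edge q-a exists with probability 0.5.
toy_uncertain = [("q", "a", 1.0, 0.5), ("a", "b", 1.0, 1.0)]
print(rknn_probabilities(toy_uncertain, ["a", "b"], "q", 1))
```

The enumeration visits a number of possible graphs that is exponential in the number of uncertain edges, which illustrates why reducing the graph, pruning possible graphs, and sampling are attractive.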
References
Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying of sets of possible worlds. In: SIGMOD, pp. 34–48 (1987)
Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse \(k\)-nearest neighbor estimation. Informatik-Forschung und Entwicklung 21(3–4), 179–195 (2007)
Adar, E., Ré, C.: Managing uncertainty in social networks. IEEE Data Eng. Bull. 30(2), 15–22 (2007)
Asthana, S., King, O.D., Gibbons, F.D., Roth, F.P.: Predicting protein complex membership using probabilistic network reliability. Genome Res. 14(6), 1170–1175 (2004)
Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)
Cheema, M.A., Lin, X., Wang, W., Zhang, W., Pei, J.: Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Trans. Knowl. Data Eng. 22(4), 550–564 (2010)
Cheema, M.A., Zhang, W., Lin, X., Zhang, Y., Li, X.: Continuous reverse \(k\) nearest neighbors queries in Euclidean space and in spatial networks. VLDB J. 21(1), 69–95 (2012)
Chen, L., Wang, C.: Continuous subgraph pattern search over certain and uncertain graph streams. IEEE Trans. Knowl. Data Eng. 22(8), 1093–1109 (2010)
Choudhury, F.M., Culpepper, J.S., Sellis, T., Cao, X.: Maximizing bichromatic reverse spatial and textual \(k\) nearest neighbor queries. PVLDB 9(6), 456–467 (2016)
Emrich, T., Kriegel, H.P., Niedermayer, J., Renz, M., Suhartha, A., Züfle, A.: Exploration of Monte-Carlo based probabilistic query processing in uncertain graphs. In: CIKM, pp. 2728–2730 (2012)
Gao, Y., Liu, Q., Miao, X., Yang, J.: Reverse \(k\)-nearest neighbor search in the presence of obstacles. Inf. Sci. 330, 274–292 (2016)
Gao, Y., Zheng, B., Chen, G., Lee, W.C., Lee, K.C., Li, Q.: Visible reverse \(k\)-nearest neighbor query processing in spatial databases. IEEE Trans. Knowl. Data Eng. 21(9), 1314–1327 (2009)
Gu, Y., Gao, C., Cong, G., Yu, G.: Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans. Knowl. Data Eng. 26(5), 1117–1130 (2014)
Hung, H.J., Yang, D.N., Lee, W.C.: Social influence-aware reverse nearest neighbor search. In: DSAA, pp. 223–229. IEEE (2014)
Jin, R., Liu, L., Aggarwal, C.C.: Discovering highly reliable subgraphs in uncertain graphs. In: SIGKDD, pp. 992–1000 (2011)
Jin, R., Liu, L., Ding, B., Wang, H.: Distance-constraint reachability computation in uncertain graphs. PVLDB 4(9), 551–562 (2011)
Kollios, G., Potamias, M., Terzi, E.: Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25(2), 325–336 (2013)
Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD, pp. 201–212 (2000)
Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084), 637–643 (2006)
Lee, K.C., Zheng, B., Lee, W.C.: Ranked reverse nearest neighbor search. IEEE Trans. Knowl. Data Eng. 20(7), 894–910 (2008)
Levin, R., Kanza, Y.: Stratified-sampling over social networks using MapReduce. In: SIGMOD, pp. 863–874 (2014)
Li, G., Li, Y., Li, J., LihChyun, S., Yang, F.: Continuous reverse \(k\) nearest neighbor monitoring on moving objects in road networks. Inf. Syst. 35(8), 860–883 (2010)
Li, J., Zou, Z., Gao, H.: Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. VLDB J. 21(6), 753–777 (2012)
Li, R.H., Yu, J.X., Mao, R., Jin, T.: Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In: ICDE, pp. 892–903 (2014)
Li, R.H., Yu, J.X., Mao, R., Jin, T.: Recursive stratified sampling: a new framework for query evaluation on uncertain graphs. IEEE Trans. Knowl. Data Eng. 28(2), 468–482 (2016)
Lian, X., Chen, L.: Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. VLDB J. 18(3), 787–808 (2009)
Lian, X., Chen, L., Huang, Z.: Keyword search over probabilistic RDF graphs. IEEE Trans. Knowl. Data Eng. 27(5), 1246–1260 (2015)
Liu, G., Wong, L., Chua, H.N.: Complex discovery from weighted PPI networks. Bioinformatics 25(15), 1891–1897 (2009)
Liu, Z., Wang, C., Wang, J.: Aggregate nearest neighbor queries in uncertain graphs. World Wide Web 17(1), 161–188 (2014)
Melaniphy, J.C.: The restaurant location guidebook: a comprehensive guide to selecting restaurant & quick service food locations. International Real Estate Location Institute (2007)
Moustafa, W.E., Kimmig, A., Deshpande, A., Getoor, L.: Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In: ICDE, pp. 904–915 (2014)
Mukherjee, A.P., Xu, P., Tirthapura, S.: Mining maximal cliques from an uncertain graph. In: ICDE, pp. 243–254 (2015)
Ning, K., Ng, H.K., Srihari, S., Leong, H.W., Nesvizhskii, A.I.: Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology. BMC Bioinform. 11(1), 1 (2010)
Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: The pursuit of a good possible world: extracting representative instances of uncertain graphs. In: SIGMOD, pp. 967–978 (2014)
Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: Uncertain graph processing through representative instances. ACM Trans. Database Syst. 40(3), 20 (2015)
Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. PVLDB 3(1), 997–1008 (2010)
Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans. Knowl. Data Eng. 27(5), 1369–1382 (2015)
Rice, J.: Mathematical statistics and data analysis. Cengage Learning (2006)
Safar, M., Ibrahimi, D., Taniar, D.: Voronoi-based reverse nearest neighbor query processing on spatial networks. Multimedia Syst. 15(5), 295–308 (2009)
Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)
Stanoi, I., Agrawal, D., El Abbadi, A.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD, pp. 44–53 (2000)
Suratanee, A., Plaimas, K.: Identification of inflammatory bowel disease-related proteins using a reverse \(k\)-nearest neighbor search. J. Bioinf. Comput. Biol. 12(04), 1450017 (2014)
Tao, Y., Papadias, D., Lian, X.: Reverse \(k\)NN search in arbitrary dimensionality. In: VLDB, pp. 744–755 (2004)
Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)
Wackerly, D., Mendenhall, W., Scheaffer, R.: Mathematical statistics with applications. Nelson Education (2007)
Wang, S., Cheema, M.A., Lin, X.: Efficiently monitoring reverse \(k\)-nearest neighbors in spatial networks. Comput. J. 58(1), 40–56 (2015)
Wang, S., Cheema, M.A., Lin, X., Zhang, Y., Liu, D.: Efficiently computing reverse \(k\) furthest neighbors. In: ICDE, pp. 1110–1121 (2016)
Wu, W., Yang, F., Chan, C.Y., Tan, K.L.: Finch: Evaluating reverse \(k\)-nearest-neighbor queries on location data. PVLDB 1(1), 1056–1067 (2008)
Xu, C., Gu, Y., Chen, L., Qiao, J., Yu, G.: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. In: ICDE, pp. 170–181 (2013)
Yang, S., Cheema, M.A., Lin, X., Wang, W.: Reverse \(k\) nearest neighbors query processing: experiments and analysis. PVLDB 8(5), 605–616 (2015)
Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: Slice: Reviving regions-based pruning for reverse \(k\) nearest neighbors queries. In: ICDE, pp. 760–771 (2014)
Yiu, M.L., Papadias, D., Mamoulis, N., Tao, Y.: Reverse nearest neighbors in large graphs. IEEE Trans. Knowl. Data Eng. 18(4), 540–553 (2006)
Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient subgraph similarity search on large probabilistic graph databases. PVLDB 5(9), 800–811 (2012)
Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient keyword search on uncertain graph data. IEEE Trans. Knowl. Data Eng. 25(12), 2767–2779 (2013)
Yuan, Y., Wang, G., Chen, L., Wang, H.: Graph similarity search on large uncertain graph databases. VLDB J. 24(2), 271–296 (2015)
Yuan, Y., Wang, G., Wang, H., Chen, L.: Efficient subgraph search over large uncertain graphs. PVLDB 4(11), 876–886 (2011)
Zhang, W., Lin, X., Zhang, Y., Zhu, K., Zhu, G.: Efficient probabilistic supergraph search. IEEE Trans. Knowl. Data Eng. 28(4), 965–978 (2016)
Zou, Z., Li, J., Gao, H., Zhang, S.: Finding top-\(k\) maximal cliques in an uncertain graph. In: ICDE, pp. 649–652 (2010)
Zou, Z., Li, J., Gao, H., Zhang, S.: Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22(9), 1203–1218 (2010)
Acknowledgements
This work was supported in part by the 973 Program of China Grant Nos. 2013CB336500 and 2015CB352502, NSFC Grant Nos. 61522208, 61379033, and 61472348, and the NSFC-Zhejiang Joint Fund Grant No. U1609217. We also thank the anonymous reviewers for their valuable and helpful comments, which improved the technical quality and presentation of this paper.
Appendix
A Proof of Theorem 1
Proof
Let \(t_i\) be the ratio of the number of samples (\(N_i\)) to the population size (\(L_i\)) for stratum i, as defined in Eq. 10. Based on the theory of mathematical statistics [38], for stratum i, the variance \(S_i^2\) and the variance of the simple random sampling estimator \({\text{ Var }}(\hat{F}_i)\) are given in Eqs. 11 and 12, respectively.
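For reference, the textbook forms of these three quantities for simple random sampling without replacement are sketched below. They are stated here as an assumption and may differ in notation from Eqs. 10–12 of the paper; the symbols \(F_{ij}\) (the value of the j-th population item in stratum i) and \(\bar{F}_i\) (the mean of stratum i) are introduced only for this illustration. The last form agrees with the expression \({\text{ Var }}(\hat{F}_\mathrm{MC}) = (\frac{1}{N}-\frac{1}{L})S^2\) used in the proof of Theorem 2.

```latex
\begin{align*}
t_i &= \frac{N_i}{L_i}, \\
S_i^2 &= \frac{1}{L_i - 1} \sum_{j=1}^{L_i} \bigl(F_{ij} - \bar{F}_i\bigr)^2, \\
\mathrm{Var}(\hat{F}_i) &= \Bigl(\frac{1}{N_i} - \frac{1}{L_i}\Bigr) S_i^2
  = \frac{(1 - t_i)\, S_i^2}{N_i}.
\end{align*}
```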
Based on these three equations, let T denote the total number of strata in stratified sampling, \(\hat{F}_i\) the estimator of the true value F in stratum i, and \(\pi _i=L_i/L\). Then, we have
\(\square \)
B Proof of Lemma 10
Proof
Using the identity \({\text{ Var }}(\hat{F})=\sum _{i=1}^{T} \pi _i^{2}{\text{ Var }}(\hat{F}_{i})\) [38], we have
\(\square \)
C Proof of Theorem 2
Proof
On the one hand,
On the other hand, \({\text{ Var }}(\hat{F}_\mathrm{MC}) = (\frac{1}{N}-\frac{1}{L})S^2\),
Hence, we have
Therefore,
\(\square \)
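The two variance expressions that appear in the appendix, \({\text{ Var }}(\hat{F})=\sum _{i=1}^{T} \pi _i^{2}{\text{ Var }}(\hat{F}_{i})\) for stratified sampling and \({\text{ Var }}(\hat{F}_\mathrm{MC}) = (\frac{1}{N}-\frac{1}{L})S^2\) for plain Monte Carlo sampling, can be compared numerically on a toy finite population. The sketch below is only an illustration under stated assumptions: it takes \({\text{ Var }}(\hat{F}_i) = (\frac{1}{N_i}-\frac{1}{L_i})S_i^2\) with \(S_i^2\) computed using the \(L_i-1\) denominator, uses proportional allocation, and picks strata whose means differ clearly. The function names and the data are hypothetical, and the adaptive allocation of TripS is not reproduced.

```python
from statistics import variance


def mc_variance(population, n):
    """Variance of the sample mean under simple random sampling
    without replacement: (1/N - 1/L) * S^2."""
    size = len(population)
    return (1.0 / n - 1.0 / size) * variance(population)


def stratified_variance(strata, allocation):
    """Variance of the stratified estimator:
    sum_i pi_i^2 * (1/N_i - 1/L_i) * S_i^2, with pi_i = L_i / L."""
    total_size = sum(len(s) for s in strata)
    total = 0.0
    for stratum, n_i in zip(strata, allocation):
        pi_i = len(stratum) / total_size
        total += pi_i ** 2 * (1.0 / n_i - 1.0 / len(stratum)) * variance(stratum)
    return total


# Toy 0/1 outcomes (e.g., "is the node an answer in this possible graph?").
# Stratum means are 0.25 and 0.75, so stratification is informative.
strata = [[0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]]
population = [x for s in strata for x in s]

var_mc = mc_variance(population, 4)              # N = 4 samples overall
var_strat = stratified_variance(strata, [2, 2])  # proportional allocation

print(var_mc, var_strat)
```

On this example the stratified variance (1/32) is smaller than the Monte Carlo variance (1/28) for the same total sample size. The gap depends on how much the stratum means differ, so the numbers here should not be read as representative of the experimental results.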
Cite this article
Gao, Y., Miao, X., Chen, G. et al. On efficiently finding reverse k-nearest neighbors over uncertain graphs. The VLDB Journal 26, 467–492 (2017). https://doi.org/10.1007/s00778-017-0460-y