On efficiently finding reverse k-nearest neighbors over uncertain graphs

Gao, Yunjun; Miao, Xiaoye; Chen, Gang; Zheng, Baihua; Cai, Deng; Cui, Huiyong

doi:10.1007/s00778-017-0460-y

Regular Paper
Published: 17 March 2017

Volume 26, pages 467–492 (2017)

The VLDB Journal

Yunjun Gao^1,2,3,
Xiaoye Miao^1,
Gang Chen^1,2,
Baihua Zheng^4,
Deng Cai^1,3 &
Huiyong Cui^1

925 Accesses
18 Citations

Abstract

A reverse k-nearest neighbor ($\hbox {R}k\hbox {NN}$) query on graphs returns the data objects that have a specified query object q as one of their k-nearest neighbors. It plays an important role in many real-life applications, including resource allocation and profile-based marketing. However, to the best of our knowledge, there is little previous work on $\hbox {R}k\hbox {NN}$ search over uncertain graph data, even though many complex networks, such as traffic networks and protein–protein interaction networks, are often modeled as uncertain graphs. In this paper, we systematically study the problem of reverse k-nearest neighbor search on uncertain graphs ($\hbox {UG-R}k\hbox {NN}$ search for short), where graph edges contain uncertainty. First, to address $\hbox {UG-R}k\hbox {NN}$ search, we propose three effective heuristics, i.e., GSP, EGR, and PBP, which reduce the original large uncertain graph to a much smaller essential uncertain graph, cut down the number of possible graphs via the newly introduced graph conditional dominance relationship, and reduce the validation cost of data nodes in order to improve query efficiency. Then, we present an efficient algorithm, termed SDP, that supports $\hbox {UG-R}k\hbox {NN}$ retrieval by seamlessly integrating the three heuristics. In view of the high complexity of $\hbox {UG-R}k\hbox {NN}$ search, we further present a novel algorithm called TripS, which relies on an adaptive stratified sampling technique. Extensive experiments using both real and synthetic graphs demonstrate the performance of our proposed algorithms.
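To make the query definition in the first sentence concrete, the following is a minimal brute-force sketch of an $\hbox {R}k\hbox {NN}$ query on a single deterministic weighted graph, i.e., on one possible graph. It is not the SDP or TripS algorithm. The adjacency-dictionary representation, the function names `shortest_distances` and `reverse_knn`, the toy graph, and the tie-breaking rule (q counts as one of the k-nearest neighbors of u when fewer than k other data nodes are strictly closer to u) are assumptions made for illustration only.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_knn(graph, data_nodes, q, k):
    """Return the data nodes that have q among their k nearest neighbors.

    Assumption: q competes with the other data nodes as a candidate neighbor,
    and ties are resolved in favor of q.
    """
    result = []
    for u in data_nodes:
        if u == q:
            continue
        dist = shortest_distances(graph, u)
        if q not in dist:
            continue  # q is unreachable from u, so it cannot be a neighbor
        closer = sum(
            1
            for v in data_nodes
            if v not in (u, q) and dist.get(v, float("inf")) < dist[q]
        )
        if closer < k:
            result.append(u)
    return result

# Hypothetical undirected path graph: q -1- a -2- b -1- c
graph = {
    "q": {"a": 1.0},
    "a": {"q": 1.0, "b": 2.0},
    "b": {"a": 2.0, "c": 1.0},
    "c": {"b": 1.0},
}
print(reverse_knn(graph, ["a", "b", "c"], "q", 1))  # ['a']
```

On an uncertain graph, the same check would have to be aggregated over many possible graphs, which is the cost that the heuristics and the sampling technique described above aim to reduce.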

References

1. Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying of sets of possible worlds. In: SIGMOD, pp. 34–48 (1987)
2. Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse $k$-nearest neighbor estimation. Informatik-Forschung und Entwicklung 21(3–4), 179–195 (2007)
3. Adar, E., Ré, C.: Managing uncertainty in social networks. IEEE Data Eng. Bull. 30(2), 15–22 (2007)
4. Asthana, S., King, O.D., Gibbons, F.D., Roth, F.P.: Predicting protein complex membership using probabilistic network reliability. Genome Res. 14(6), 1170–1175 (2004)
5. Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)
6. Cheema, M.A., Lin, X., Wang, W., Zhang, W., Pei, J.: Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Trans. Knowl. Data Eng. 22(4), 550–564 (2010)
7. Cheema, M.A., Zhang, W., Lin, X., Zhang, Y., Li, X.: Continuous reverse $k$ nearest neighbors queries in Euclidean space and in spatial networks. VLDB J. 21(1), 69–95 (2012)
8. Chen, L., Wang, C.: Continuous subgraph pattern search over certain and uncertain graph streams. IEEE Trans. Knowl. Data Eng. 22(8), 1093–1109 (2010)
9. Choudhury, F.M., Culpepper, J.S., Sellis, T., Cao, X.: Maximizing bichromatic reverse spatial and textual $k$ nearest neighbor queries. PVLDB 9(6), 456–467 (2016)
10. Emrich, T., Kriegel, H.P., Niedermayer, J., Renz, M., Suhartha, A., Züfle, A.: Exploration of Monte-Carlo based probabilistic query processing in uncertain graphs. In: CIKM, pp. 2728–2730 (2012)
11. Gao, Y., Liu, Q., Miao, X., Yang, J.: Reverse $k$-nearest neighbor search in the presence of obstacles. Inf. Sci. 330, 274–292 (2016)
12. Gao, Y., Zheng, B., Chen, G., Lee, W.C., Lee, K.C., Li, Q.: Visible reverse $k$-nearest neighbor query processing in spatial databases. IEEE Trans. Knowl. Data Eng. 21(9), 1314–1327 (2009)
13. Gu, Y., Gao, C., Cong, G., Yu, G.: Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans. Knowl. Data Eng. 26(5), 1117–1130 (2014)
14. Hung, H.J., Yang, D.N., Lee, W.C.: Social influence-aware reverse nearest neighbor search. In: DSAA, pp. 223–229. IEEE (2014)
15. Jin, R., Liu, L., Aggarwal, C.C.: Discovering highly reliable subgraphs in uncertain graphs. In: SIGKDD, pp. 992–1000 (2011)
16. Jin, R., Liu, L., Ding, B., Wang, H.: Distance-constraint reachability computation in uncertain graphs. PVLDB 4(9), 551–562 (2011)
17. Kollios, G., Potamias, M., Terzi, E.: Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25(2), 325–336 (2013)
18. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD, pp. 201–212 (2000)
19. Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084), 637–643 (2006)
20. Lee, K.C., Zheng, B., Lee, W.C.: Ranked reverse nearest neighbor search. IEEE Trans. Knowl. Data Eng. 20(7), 894–910 (2008)
21. Levin, R., Kanza, Y.: Stratified-sampling over social networks using MapReduce. In: SIGMOD, pp. 863–874 (2014)
22. Li, G., Li, Y., Li, J., LihChyun, S., Yang, F.: Continuous reverse $k$ nearest neighbor monitoring on moving objects in road networks. Inf. Syst. 35(8), 860–883 (2010)
23. Li, J., Zou, Z., Gao, H.: Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. VLDB J. 21(6), 753–777 (2012)
24. Li, R.H., Yu, J.X., Mao, R., Jin, T.: Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In: ICDE, pp. 892–903 (2014)
25. Li, R.H., Yu, J.X., Mao, R., Jin, T.: Recursive stratified sampling: a new framework for query evaluation on uncertain graphs. IEEE Trans. Knowl. Data Eng. 28(2), 468–482 (2016)
26. Lian, X., Chen, L.: Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. VLDB J. 18(3), 787–808 (2009)
27. Lian, X., Chen, L., Huang, Z.: Keyword search over probabilistic RDF graphs. IEEE Trans. Knowl. Data Eng. 27(5), 1246–1260 (2015)
28. Liu, G., Wong, L., Chua, H.N.: Complex discovery from weighted PPI networks. Bioinformatics 25(15), 1891–1897 (2009)
29. Liu, Z., Wang, C., Wang, J.: Aggregate nearest neighbor queries in uncertain graphs. World Wide Web 17(1), 161–188 (2014)
30. Melaniphy, J.C.: The restaurant location guidebook: a comprehensive guide to selecting restaurant & quick service food locations. International Real Estate Location Institute (2007)
31. Moustafa, W.E., Kimmig, A., Deshpande, A., Getoor, L.: Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In: ICDE, pp. 904–915 (2014)
32. Mukherjee, A.P., Xu, P., Tirthapura, S.: Mining maximal cliques from an uncertain graph. In: ICDE, pp. 243–254 (2015)
33. Ning, K., Ng, H.K., Srihari, S., Leong, H.W., Nesvizhskii, A.I.: Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology. BMC Bioinform. 11(1), 1 (2010)
34. Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: The pursuit of a good possible world: extracting representative instances of uncertain graphs. In: SIGMOD, pp. 967–978 (2014)
35. Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: Uncertain graph processing through representative instances. ACM Trans. Database Syst. 40(3), 20 (2015)
36. Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. PVLDB 3(1), 997–1008 (2010)
37. Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans. Knowl. Data Eng. 27(5), 1369–1382 (2015)
38. Rice, J.: Mathematical statistics and data analysis. Cengage Learning (2006)
39. Safar, M., Ibrahimi, D., Taniar, D.: Voronoi-based reverse nearest neighbor query processing on spatial networks. Multimedia Syst. 15(5), 295–308 (2009)
40. Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)
41. Stanoi, I., Agrawal, D., El Abbadi, A.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD, pp. 44–53 (2000)
42. Suratanee, A., Plaimas, K.: Identification of inflammatory bowel disease-related proteins using a reverse $k$-nearest neighbor search. J. Bioinf. Comput. Biol. 12(04), 1450017 (2014)
43. Tao, Y., Papadias, D., Lian, X.: Reverse $k$NN search in arbitrary dimensionality. In: VLDB, pp. 744–755 (2004)
44. Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)
45. Wackerly, D., Mendenhall, W., Scheaffer, R.: Mathematical statistics with applications. Nelson Education (2007)
46. Wang, S., Cheema, M.A., Lin, X.: Efficiently monitoring reverse $k$-nearest neighbors in spatial networks. Comput. J. 58(1), 40–56 (2015)
47. Wang, S., Cheema, M.A., Lin, X., Zhang, Y., Liu, D.: Efficiently computing reverse $k$ furthest neighbors. In: ICDE, pp. 1110–1121 (2016)
48. Wu, W., Yang, F., Chan, C.Y., Tan, K.L.: Finch: Evaluating reverse $k$-nearest-neighbor queries on location data. PVLDB 1(1), 1056–1067 (2008)
49. Xu, C., Gu, Y., Chen, L., Qiao, J., Yu, G.: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. In: ICDE, pp. 170–181 (2013)
50. Yang, S., Cheema, M.A., Lin, X., Wang, W.: Reverse $k$ nearest neighbors query processing: experiments and analysis. PVLDB 8(5), 605–616 (2015)
51. Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: Slice: Reviving regions-based pruning for reverse $k$ nearest neighbors queries. In: ICDE, pp. 760–771 (2014)
52. Yiu, M.L., Papadias, D., Mamoulis, N., Tao, Y.: Reverse nearest neighbors in large graphs. IEEE Trans. Knowl. Data Eng. 18(4), 540–553 (2006)
53. Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient subgraph similarity search on large probabilistic graph databases. PVLDB 5(9), 800–811 (2012)
54. Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient keyword search on uncertain graph data. IEEE Trans. Knowl. Data Eng. 25(12), 2767–2779 (2013)
55. Yuan, Y., Wang, G., Chen, L., Wang, H.: Graph similarity search on large uncertain graph databases. VLDB J. 24(2), 271–296 (2015)
56. Yuan, Y., Wang, G., Wang, H., Chen, L.: Efficient subgraph search over large uncertain graphs. PVLDB 4(11), 876–886 (2011)
57. Zhang, W., Lin, X., Zhang, Y., Zhu, K., Zhu, G.: Efficient probabilistic supergraph search. IEEE Trans. Knowl. Data Eng. 28(4), 965–978 (2016)
58. Zou, Z., Li, J., Gao, H., Zhang, S.: Finding top-$k$ maximal cliques in an uncertain graph. In: ICDE, pp. 649–652 (2010)
59. Zou, Z., Li, J., Gao, H., Zhang, S.: Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22(9), 1203–1218 (2010)

Acknowledgements

This work was supported in part by the 973 Program of China Grant Nos. 2013CB336500 and 2015CB352502, NSFC Grant Nos. 61522208, 61379033, and 61472348, and the NSFC-Zhejiang Joint Fund Grant No. U1609217. We would also like to thank the anonymous reviewers for their valuable and helpful comments, which improved the technical quality and presentation of this paper.

Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, Hangzhou, China
Yunjun Gao, Xiaoye Miao, Gang Chen, Deng Cai & Huiyong Cui
The Key Lab of Big Data Intelligent Computing of Zhejiang Province, Zhejiang University, Hangzhou, China
Yunjun Gao & Gang Chen
State Key Laboratory of CAD&CG, College of Computer Science, Zhejiang University, Hangzhou, China
Yunjun Gao & Deng Cai
School of Information Systems, Singapore Management University, Singapore, Singapore
Baihua Zheng

Corresponding author

Correspondence to Gang Chen.

Appendix

A Proof of Theorem 1

Proof

Let $t_i$ be the ratio of the number of samples ($N_i$) to the population size ($L_i$) for stratum i, as defined in Eq. 10. Based on the sampling theory in [38], for stratum i, the sample variance $S_i^2$ and the variance of the estimator under simple random sampling, ${\text{Var}}(\hat{F}_i)$, are given in Eqs. 11 and 12, respectively.

$$t_i = \frac{N_i}{L_i} \tag{10}$$

$$S_i^2 = \frac{\sum _{j=1}^{L_i}(F_{ij}-\hat{F}_i)^2}{L_i-1} \tag{11}$$

$${\text{Var}}(\hat{F}_i) = \frac{1-t_i}{N_i}\cdot S_i^2 \tag{12}$$

Based on these three equations, let T denote the total number of strata in stratified sampling, let $\hat{F}_i$ denote the estimator of the true value F in stratum i, and let $\pi _i=L_i/L$. Then, we have

$\square $

B Proof of Lemma 10

Proof

Using the identity ${\text{Var}}(\hat{F})=\sum _{i=1}^{T} \pi _i^{2}\,{\text{Var}}(\hat{F}_{i})$ [38], we have

$\square $

C Proof of Theorem 2

Proof

On the one hand,

On the other hand, ${\text{ Var }}(\hat{F}_\mathrm{MC}) = (\frac{1}{N}-\frac{1}{L})S^2$,

Hence, we have

Therefore,

$\square $
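As a numerical illustration of the quantities used in these proofs, the sketch below evaluates the per-stratum variance of Eq. 12, the stratified variance $\sum _{i=1}^{T} \pi _i^{2}{\text{Var}}(\hat{F}_{i})$ with $\pi _i=L_i/L$, and the expression $(\frac{1}{N}-\frac{1}{L})S^2$ for ${\text{Var}}(\hat{F}_\mathrm{MC})$. It does not reproduce the derivation steps of the proofs. The function names, the two strata, and the overall variance value are hypothetical, and N, L, and $S^2$ are assumed to denote the total sample size, the total population size, and the overall variance.

```python
def srs_variance(s2, n, pop):
    """Variance of a simple-random-sampling estimator: (1 - n/pop) / n * s2.

    With s2 = S_i^2, n = N_i, pop = L_i this is Eq. 12; with the overall
    values it equals (1/N - 1/L) * S^2.
    """
    t = n / pop  # sampling ratio, as in Eq. 10
    return (1.0 - t) / n * s2

def stratified_variance(strata):
    """Sum of pi_i^2 * Var(F_i) over strata given as (L_i, N_i, S_i^2)."""
    total = sum(pop_i for pop_i, _, _ in strata)
    return sum(
        (pop_i / total) ** 2 * srs_variance(s2_i, n_i, pop_i)
        for pop_i, n_i, s2_i in strata
    )

# Hypothetical strata: (L_i, N_i, S_i^2)
strata = [(100, 10, 4.0), (300, 30, 1.0)]
var_stratified = stratified_variance(strata)

# Hypothetical overall variance S^2 = 3.0 with N = 40 samples out of L = 400
var_mc = srs_variance(3.0, 40, 400)

print(var_stratified, var_mc)
```

With these made-up inputs the stratified variance is about 0.0394 and the Monte Carlo variance is about 0.0675. The numbers only illustrate how the formulas are evaluated and are not results from the paper.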

About this article

Cite this article

Gao, Y., Miao, X., Chen, G. et al. On efficiently finding reverse k-nearest neighbors over uncertain graphs. The VLDB Journal 26, 467–492 (2017). https://doi.org/10.1007/s00778-017-0460-y

Received: 13 April 2016
Revised: 01 February 2017
Accepted: 24 February 2017
Published: 17 March 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s00778-017-0460-y
