On efficiently finding reverse k-nearest neighbors over uncertain graphs

  • Regular Paper
The VLDB Journal

Abstract

A reverse k-nearest neighbor (\(\hbox {R}k\hbox {NN}\)) query on graphs returns the data objects that take a specified query object q as one of their k-nearest neighbors. It plays an important role in many real-life applications, including resource allocation and profile-based marketing. However, to the best of our knowledge, there is little previous work on \(\hbox {R}k\hbox {NN}\) search over uncertain graph data, even though many complex networks such as traffic networks and protein–protein interaction networks are often modeled as uncertain graphs. In this paper, we systematically study the problem of reverse k-nearest neighbor search on uncertain graphs (\(\hbox {UG-R}k\hbox {NN}\) search for short), where graph edges contain uncertainty. First, to address \(\hbox {UG-R}k\hbox {NN}\) search, we propose three effective heuristics, i.e., GSP, EGR, and PBP, which reduce the original large uncertain graph to a much smaller essential uncertain graph, cut down the number of possible graphs via the newly introduced graph conditional dominance relationship, and reduce the validation cost of data nodes, in order to improve query efficiency. Then, we present an efficient algorithm, termed SDP, to support \(\hbox {UG-R}k\hbox {NN}\) retrieval by seamlessly integrating the three heuristics. In view of the high complexity of \(\hbox {UG-R}k\hbox {NN}\) search, we further present a novel algorithm called TripS, which relies on an adaptive stratified sampling technique. Extensive experiments using both real and synthetic graphs demonstrate the performance of our proposed algorithms.

References

  1. Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying of sets of possible worlds. In: SIGMOD, pp. 34–48 (1987)

  2. Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse \(k\)-nearest neighbor estimation. Informatik-Forschung und Entwicklung 21(3–4), 179–195 (2007)

  3. Adar, E., Ré, C.: Managing uncertainty in social networks. IEEE Data Eng. Bull. 30(2), 15–22 (2007)

  4. Asthana, S., King, O.D., Gibbons, F.D., Roth, F.P.: Predicting protein complex membership using probabilistic network reliability. Genome Res. 14(6), 1170–1175 (2004)

  5. Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)

  6. Cheema, M.A., Lin, X., Wang, W., Zhang, W., Pei, J.: Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Trans. Knowl. Data Eng. 22(4), 550–564 (2010)

  7. Cheema, M.A., Zhang, W., Lin, X., Zhang, Y., Li, X.: Continuous reverse \(k\) nearest neighbors queries in Euclidean space and in spatial networks. VLDB J. 21(1), 69–95 (2012)

  8. Chen, L., Wang, C.: Continuous subgraph pattern search over certain and uncertain graph streams. IEEE Trans. Knowl. Data Eng. 22(8), 1093–1109 (2010)

  9. Choudhury, F.M., Culpepper, J.S., Sellis, T., Cao, X.: Maximizing bichromatic reverse spatial and textual \(k\) nearest neighbor queries. PVLDB 9(6), 456–467 (2016)

  10. Emrich, T., Kriegel, H.P., Niedermayer, J., Renz, M., Suhartha, A., Züfle, A.: Exploration of Monte-Carlo based probabilistic query processing in uncertain graphs. In: CIKM, pp. 2728–2730 (2012)

  11. Gao, Y., Liu, Q., Miao, X., Yang, J.: Reverse \(k\)-nearest neighbor search in the presence of obstacles. Inf. Sci. 330, 274–292 (2016)

  12. Gao, Y., Zheng, B., Chen, G., Lee, W.C., Lee, K.C., Li, Q.: Visible reverse \(k\)-nearest neighbor query processing in spatial databases. IEEE Trans. Knowl. Data Eng. 21(9), 1314–1327 (2009)

  13. Gu, Y., Gao, C., Cong, G., Yu, G.: Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans. Knowl. Data Eng. 26(5), 1117–1130 (2014)

  14. Hung, H.J., Yang, D.N., Lee, W.C.: Social influence-aware reverse nearest neighbor search. In: DSAA, pp. 223–229. IEEE (2014)

  15. Jin, R., Liu, L., Aggarwal, C.C.: Discovering highly reliable subgraphs in uncertain graphs. In: SIGKDD, pp. 992–1000 (2011)

  16. Jin, R., Liu, L., Ding, B., Wang, H.: Distance-constraint reachability computation in uncertain graphs. PVLDB 4(9), 551–562 (2011)

  17. Kollios, G., Potamias, M., Terzi, E.: Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25(2), 325–336 (2013)

  18. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD, pp. 201–212 (2000)

  19. Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084), 637–643 (2006)

  20. Lee, K.C., Zheng, B., Lee, W.C.: Ranked reverse nearest neighbor search. IEEE Trans. Knowl. Data Eng. 20(7), 894–910 (2008)

  21. Levin, R., Kanza, Y.: Stratified-sampling over social networks using MapReduce. In: SIGMOD, pp. 863–874 (2014)

  22. Li, G., Li, Y., Li, J., LihChyun, S., Yang, F.: Continuous reverse \(k\) nearest neighbor monitoring on moving objects in road networks. Inf. Syst. 35(8), 860–883 (2010)

  23. Li, J., Zou, Z., Gao, H.: Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. VLDB J. 21(6), 753–777 (2012)

  24. Li, R.H., Yu, J.X., Mao, R., Jin, T.: Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In: ICDE, pp. 892–903 (2014)

  25. Li, R.H., Yu, J.X., Mao, R., Jin, T.: Recursive stratified sampling: a new framework for query evaluation on uncertain graphs. IEEE Trans. Knowl. Data Eng. 28(2), 468–482 (2016)

  26. Lian, X., Chen, L.: Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. VLDB J. 18(3), 787–808 (2009)

  27. Lian, X., Chen, L., Huang, Z.: Keyword search over probabilistic RDF graphs. IEEE Trans. Knowl. Data Eng. 27(5), 1246–1260 (2015)

  28. Liu, G., Wong, L., Chua, H.N.: Complex discovery from weighted PPI networks. Bioinformatics 25(15), 1891–1897 (2009)

  29. Liu, Z., Wang, C., Wang, J.: Aggregate nearest neighbor queries in uncertain graphs. World Wide Web 17(1), 161–188 (2014)

  30. Melaniphy, J.C.: The restaurant location guidebook: a comprehensive guide to selecting restaurant & quick service food locations. International Real Estate Location Institute (2007)

  31. Moustafa, W.E., Kimmig, A., Deshpande, A., Getoor, L.: Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In: ICDE, pp. 904–915 (2014)

  32. Mukherjee, A.P., Xu, P., Tirthapura, S.: Mining maximal cliques from an uncertain graph. In: ICDE, pp. 243–254 (2015)

  33. Ning, K., Ng, H.K., Srihari, S., Leong, H.W., Nesvizhskii, A.I.: Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology. BMC Bioinform. 11(1), 1 (2010)

  34. Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: The pursuit of a good possible world: extracting representative instances of uncertain graphs. In: SIGMOD, pp. 967–978 (2014)

  35. Parchas, P., Gullo, F., Papadias, D., Bonchi, F.: Uncertain graph processing through representative instances. ACM Trans. Database Syst. 40(3), 20 (2015)

  36. Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. PVLDB 3(1), 997–1008 (2010)

  37. Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans. Knowl. Data Eng. 27(5), 1369–1382 (2015)

  38. Rice, J.: Mathematical statistics and data analysis. Cengage Learning (2006)

  39. Safar, M., Ibrahimi, D., Taniar, D.: Voronoi-based reverse nearest neighbor query processing on spatial networks. Multimedia Syst. 15(5), 295–308 (2009)

  40. Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)

  41. Stanoi, I., Agrawal, D., El Abbadi, A.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD, pp. 44–53 (2000)

  42. Suratanee, A., Plaimas, K.: Identification of inflammatory bowel disease-related proteins using a reverse \(k\)-nearest neighbor search. J. Bioinf. Comput. Biol. 12(04), 1450017 (2014)

  43. Tao, Y., Papadias, D., Lian, X.: Reverse \(k\)NN search in arbitrary dimensionality. In: VLDB, pp. 744–755 (2004)

  44. Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)

  45. Wackerly, D., Mendenhall, W., Scheaffer, R.: Mathematical statistics with applications. Nelson Education (2007)

  46. Wang, S., Cheema, M.A., Lin, X.: Efficiently monitoring reverse \(k\)-nearest neighbors in spatial networks. Comput. J. 58(1), 40–56 (2015)

  47. Wang, S., Cheema, M.A., Lin, X., Zhang, Y., Liu, D.: Efficiently computing reverse \(k\) furthest neighbors. In: ICDE, pp. 1110–1121 (2016)

  48. Wu, W., Yang, F., Chan, C.Y., Tan, K.L.: Finch: Evaluating reverse \(k\)-nearest-neighbor queries on location data. PVLDB 1(1), 1056–1067 (2008)

  49. Xu, C., Gu, Y., Chen, L., Qiao, J., Yu, G.: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. In: ICDE, pp. 170–181 (2013)

  50. Yang, S., Cheema, M.A., Lin, X., Wang, W.: Reverse \(k\) nearest neighbors query processing: experiments and analysis. PVLDB 8(5), 605–616 (2015)

  51. Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: Slice: Reviving regions-based pruning for reverse \(k\) nearest neighbors queries. In: ICDE, pp. 760–771 (2014)

  52. Yiu, M.L., Papadias, D., Mamoulis, N., Tao, Y.: Reverse nearest neighbors in large graphs. IEEE Trans. Knowl. Data Eng. 18(4), 540–553 (2006)

  53. Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient subgraph similarity search on large probabilistic graph databases. PVLDB 5(9), 800–811 (2012)

  54. Yuan, Y., Wang, G., Chen, L., Wang, H.: Efficient keyword search on uncertain graph data. IEEE Trans. Knowl. Data Eng. 25(12), 2767–2779 (2013)

  55. Yuan, Y., Wang, G., Chen, L., Wang, H.: Graph similarity search on large uncertain graph databases. VLDB J. 24(2), 271–296 (2015)

  56. Yuan, Y., Wang, G., Wang, H., Chen, L.: Efficient subgraph search over large uncertain graphs. PVLDB 4(11), 876–886 (2011)

  57. Zhang, W., Lin, X., Zhang, Y., Zhu, K., Zhu, G.: Efficient probabilistic supergraph search. IEEE Trans. Knowl. Data Eng. 28(4), 965–978 (2016)

  58. Zou, Z., Li, J., Gao, H., Zhang, S.: Finding top-\(k\) maximal cliques in an uncertain graph. In: ICDE, pp. 649–652 (2010)

  59. Zou, Z., Li, J., Gao, H., Zhang, S.: Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22(9), 1203–1218 (2010)

Acknowledgements

This work was supported in part by the 973 Program of China Grant Nos. 2013CB336500 and 2015CB352502, NSFC Grant Nos. 61522208, 61379033, and 61472348, and the NSFC-Zhejiang Joint Fund Grant No. U1609217. We would also like to thank the anonymous reviewers for their valuable and helpful comments, which improved the technical quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Gang Chen.

Appendix

A Proof of Theorem 1

Proof

Let \(t_i\) be the ratio of the number of samples (\(N_i\)) to the population size (\(L_i\)) of stratum i, as defined in Eq. 10. Based on standard results in mathematical statistics [38], for stratum i, the variance \(S_i^2\) and the variance of the simple random sampling estimator \({\text{ Var }}(\hat{F}_i)\) are given in Eqs. 11 and 12, respectively.

$$\begin{aligned}&t_i = \frac{N_i}{L_i} \end{aligned}$$
(10)
$$\begin{aligned}&S_i^2 = \frac{\sum _{j=1}^{L_i}(F_{ij}-\hat{F}_i)^2}{L_i-1} \end{aligned}$$
(11)
$$\begin{aligned}&{\text{ Var }}(\hat{F}_i) = \frac{1-t_i}{N_i}\cdot S_i^2 \end{aligned}$$
(12)

Based on these three equations, let T represent the total number of strata in stratified sampling, and let \(\hat{F}_i\) denote the estimator of the true value F in stratum i, with \(\pi _i=L_i/L\). Then, we have

\(\square \)
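Eqs. 10–12 can be evaluated numerically for a single stratum. The following is a minimal sketch, not the authors' implementation: the function name, the toy data, and the choice to centre Eq. 11 on the mean of the stratum's \(L_i\) values are all assumptions made for illustration.

```python
from statistics import variance


def stratum_variance_terms(values, n_samples):
    """Return (t_i, S_i^2, Var(F_hat_i)) for one stratum.

    values    -- all L_i values F_ij of the stratum (hypothetical toy data)
    n_samples -- N_i, the number of samples drawn without replacement
    """
    L_i = len(values)
    if L_i < 2 or not 1 <= n_samples <= L_i:
        raise ValueError("need L_i >= 2 and 1 <= N_i <= L_i")
    t_i = n_samples / L_i                    # Eq. 10: sampling ratio
    s2_i = variance(values)                  # Eq. 11: divisor L_i - 1 (assumed centred on the stratum mean)
    var_hat = (1 - t_i) / n_samples * s2_i   # Eq. 12: variance of the stratum estimator
    return t_i, s2_i, var_hat
```

For example, `stratum_variance_terms([0.0, 0.0, 1.0, 1.0], 2)` uses half of the stratum, and sampling the whole stratum (`n_samples=4`) drives the estimator variance to zero, as the factor \(1-t_i\) in Eq. 12 suggests.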

B Proof of Lemma 10

Proof

Using the identity \({\text{ Var }}(\hat{F})=\sum _{i=1}^{T} \pi _i^{2}{\text{ Var }}(\hat{F}_{i})\)  [38], we have

\(\square \)
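As a worked step under the definitions of Appendix A (a sketch only, not necessarily the form in which the lemma is stated), substituting Eq. 12 and then Eq. 10 into the identity \({\text{ Var }}(\hat{F})=\sum _{i=1}^{T} \pi _i^{2}{\text{ Var }}(\hat{F}_{i})\) gives

$$\begin{aligned} {\text{ Var }}(\hat{F}) = \sum _{i=1}^{T} \pi _i^{2}\cdot \frac{1-t_i}{N_i}\cdot S_i^2 = \sum _{i=1}^{T} \pi _i^{2}\left( \frac{1}{N_i}-\frac{1}{L_i}\right) S_i^2 \end{aligned}$$

The second form follows from \(\frac{1-t_i}{N_i} = \frac{1}{N_i}\left( 1-\frac{N_i}{L_i}\right) = \frac{1}{N_i}-\frac{1}{L_i}\).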

C Proof of Theorem 2

Proof

On the one hand,

On the other hand, \({\text{ Var }}(\hat{F}_\mathrm{MC}) = (\frac{1}{N}-\frac{1}{L})S^2\),

Hence, we have

Therefore,

\(\square \)
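The two variance expressions that appear in these proofs can be compared on a small example. The sketch below is illustrative only: the theorem statement is not reproduced in this section, and the function names, the toy population, and the equal per-stratum allocation are assumptions. It evaluates \((\frac{1}{N}-\frac{1}{L})S^2\) for plain Monte Carlo sampling and \(\sum _{i} \pi _i^{2}\frac{1-t_i}{N_i}S_i^2\) for stratified sampling with the same total sample budget.

```python
from statistics import variance


def mc_variance(population, n):
    """(1/N - 1/L) * S^2 for simple random sampling without replacement."""
    L = len(population)
    return (1 / n - 1 / L) * variance(population)


def stratified_variance(strata, allocation):
    """sum_i pi_i^2 * (1 - t_i) / N_i * S_i^2, with pi_i = L_i / L and t_i = N_i / L_i."""
    L = sum(len(values) for values in strata)
    total = 0.0
    for values, n_i in zip(strata, allocation):
        L_i = len(values)
        t_i = n_i / L_i
        total += (L_i / L) ** 2 * (1 - t_i) / n_i * variance(values)
    return total


# Hypothetical toy population: two strata whose values are well separated.
strata = [[0.0, 0.1, 0.2, 0.1], [0.9, 1.0, 0.8, 0.9]]
population = [x for values in strata for x in values]

var_mc = mc_variance(population, 4)            # N = 4 samples from the whole population
var_st = stratified_variance(strata, [2, 2])   # same budget, 2 samples per stratum
print(f"MC: {var_mc:.6f}  stratified: {var_st:.6f}")
```

On this toy data the stratified value is more than an order of magnitude smaller, because each stratum is far more homogeneous than the population as a whole. Other populations or allocations may behave differently.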

About this article

Cite this article

Gao, Y., Miao, X., Chen, G. et al. On efficiently finding reverse k-nearest neighbors over uncertain graphs. The VLDB Journal 26, 467–492 (2017). https://doi.org/10.1007/s00778-017-0460-y

  • DOI: https://doi.org/10.1007/s00778-017-0460-y
