Distributed and Parallel Databases, Volume 34, Issue 2, pp 259–287

Probabilistic nearest neighbor query processing on distributed uncertain data

  • Daichi Amagata
  • Yuya Sasaki
  • Takahiro Hara
  • Shojiro Nishio


A nearest neighbor (NN) query, which returns the object most similar to a user-specified query object, plays an important role in a wide range of applications and has therefore received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, with the ever-increasing generation of massive datasets, distributed databases that manage such data objects are becoming more important. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, in which each local site forwards its own database to the central server, is communication-expensive, so the communication cost of NN object retrieval has to be minimized. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data demonstrate that our algorithms significantly reduce communication cost.
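To make the straightforward approach concrete, the following is a minimal sketch of the centralized baseline the abstract describes, not the paper's algorithms. It assumes a discrete uncertainty model in which each object is a list of (point, probability) instances, Euclidean distance, and brute-force enumeration of possible worlds. The function names `nn_probabilities` and `naive_top_k` and the instance-count cost measure are hypothetical choices made for illustration.

```python
from itertools import product
from math import dist


def nn_probabilities(objects, query):
    """Return {object_id: probability that the object is the NN of query}.

    Assumed model: objects maps an id to a list of (point, probability)
    instances whose probabilities sum to 1. Every possible world (one
    instance per object) is enumerated, so this is exponential in the
    number of objects and only suitable for tiny inputs.
    """
    ids = list(objects)
    probs = {oid: 0.0 for oid in ids}
    for world in product(*(objects[oid] for oid in ids)):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        # Position of the nearest instance in this world (ties: first wins).
        nearest = min(range(len(ids)), key=lambda i: dist(world[i][0], query))
        probs[ids[nearest]] += world_prob
    return probs


def naive_top_k(sites, query, k):
    """Baseline: every site ships all of its objects to the server.

    Returns the k objects with the highest NN probability and the number
    of instances shipped, used here as a stand-in for communication cost.
    """
    merged = {}
    shipped = 0
    for site in sites:
        merged.update(site)
        shipped += sum(len(instances) for instances in site.values())
    probs = nn_probabilities(merged, query)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], shipped


# Two hypothetical sites holding three uncertain objects in total.
sites = [
    {"a": [((1.0, 0.0), 0.6), ((4.0, 0.0), 0.4)]},
    {"b": [((2.0, 0.0), 1.0)], "c": [((10.0, 0.0), 1.0)]},
]
top, shipped = naive_top_k(sites, (0.0, 0.0), 2)
```

In this toy input, object "c" can never be the nearest neighbor, yet the baseline still ships it to the server. That kind of unnecessary transfer is what a communication-efficient algorithm would try to avoid.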


  • Probabilistic nearest neighbor query
  • Uncertain databases
  • Distributed query processing



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Daichi Amagata (1)
  • Yuya Sasaki (2)
  • Takahiro Hara (1)
  • Shojiro Nishio (1)
  1. Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Suita, Japan
  2. Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan
