Skip to main content
Log in

Threshold-based probabilistic top-k dominating queries

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying sets of possible worlds. In: SIGMOD (1987)

  2. Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., Widom, J.: Trio: a system for data, uncertainty, and lineage. In: VLDB (2006)

  3. Antova, L., Koch, C., Olteanu, D.: \({10^{10^6}}\) worlds and beyond: Efficient representation and processing of incomplete information. In: ICDE (2007)

  4. Barbara D., Garcia-Molina H., Porter D.: The management of probabilistic data. IEEE TKDE 4(5), 487–502 (1992)

    Google Scholar 

  5. Bekales, G., Soliman, M.A., Ilyas, I.F.: Efficient search for the top-k probable nearest neighbors in uncertain databases In: VLDB (2008)

  6. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)

  7. Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using r-trees. In: SIGMOD (1993)

  8. Chan, C., Jagadish, H., Tan, K.-L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: SIGMOD (2006)

  9. Cheng, R., Chen, J., Mokbel, M.F., Chow, C.-Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)

  10. Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)

  11. Cheng, R., Xia, Y., Prabhakar, S., Shah, R., Vitter, J.S.: Efficient indexing methods for probabilistic threshold queries over uncertain data. In: VLDB (2004)

  12. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  13. Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)

  14. Dalvi, N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)

  15. Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)

  16. Dubhashi, D., Panconesi, A.: Concentration of measure for the analysis of randomised algorithms, p. 12. http://citeseer.ist.psu.edu/old/dubhashi98concentration.html (1998)

  17. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)

  18. Fagin R., Lotem A., Naor M.: Optimal aggregation algorithms for middleware. JCSS 66, 614–656 (2003)

    MATH  MathSciNet  Google Scholar 

  19. Gilks W., Richardson S., Spiegelhalter D.: Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1996)

    MATH  Google Scholar 

  20. Goldreich, O.: Randomized Methods in Computation, Lecture 2. http://www.wisdom.weizmann.ac.il/~oded/rnd.html (2001)

  21. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)

  22. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: A probabilistic threshold approach. In: SIGMOD (2008)

  23. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD (2008)

  24. Imielinski T., Lipski W.: Incomplete information in relational databases. JACM 31(4), 761–791 (1984)

    Article  MATH  MathSciNet  Google Scholar 

  25. Kalos M.H., Whitlock P.A.: Monte Carlo Methods. Wiley Interscience, London (1986)

    Book  MATH  Google Scholar 

  26. Kriegel, H.P., Kunath, P., Pfeifle, M., Renz, M.: Probabilistic similarity join on uncertain data. In: DASFAA (2006)

  27. Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)

  28. Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: KDD (2005)

  29. Lakshmanan L.V.S., Leone N., Ross R., Subrahmanian V.S.: Probview: a flexible probabilistic database system. ACM TODS 22(3), 419–469 (1997)

    Article  Google Scholar 

  30. Ngai, W.K., Kao, B., Cheng, C.K.C.R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: ICDM (2006)

  31. Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient olap operations in spatial data warehouses. In: SSTD (2001)

  32. Papadias, D., Mamoulis, N., Theodoridis, Y.: Processing and optimization of multiway spatial joins using R-trees. In: PODS (1999)

  33. Papadias D., Tao Y., Greg F., Seeger B.: Progressive skyline computation in database systems. ACM TODS 30(1), 41–82 (2003)

    Article  Google Scholar 

  34. Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skyline on uncertain data. In: VLDB (2007)

  35. Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)

  36. Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE (2005)

  37. Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)

  38. Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)

  39. Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In: ICDE (2007)

  40. Tan, K.-L., Eng, P.-K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB (2001)

  41. Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB (2005)

  42. Yi K., Li F., Kollios G., Srivastava D.: Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng. 20(12), 1669–1682 (2008)

    Article  Google Scholar 

  43. Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: VLDB (2007)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xuemin Lin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, W., Lin, X., Zhang, Y. et al. Threshold-based probabilistic top-k dominating queries. The VLDB Journal 19, 283–305 (2010). https://doi.org/10.1007/s00778-009-0162-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-009-0162-1

Keywords

Navigation