Advertisement

The VLDB Journal

, Volume 19, Issue 2, pp 283–305 | Cite as

Threshold-based probabilistic top-k dominating queries

  • Wenjie Zhang
  • Xuemin LinEmail author
  • Ying Zhang
  • Jian Pei
  • Wei Wang
Regular Paper

Abstract

Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.

Keywords

Uncertain objects Top k Dominating relation 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying sets of possible worlds. In: SIGMOD (1987)Google Scholar
  2. 2.
    Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., Widom, J.: Trio: a system for data, uncertainty, and lineage. In: VLDB (2006)Google Scholar
  3. 3.
    Antova, L., Koch, C., Olteanu, D.: \({10^{10^6}}\) worlds and beyond: Efficient representation and processing of incomplete information. In: ICDE (2007)Google Scholar
  4. 4.
    Barbara D., Garcia-Molina H., Porter D.: The management of probabilistic data. IEEE TKDE 4(5), 487–502 (1992)Google Scholar
  5. 5.
    Bekales, G., Soliman, M.A., Ilyas, I.F.: Efficient search for the top-k probable nearest neighbors in uncertain databases In: VLDB (2008)Google Scholar
  6. 6.
    Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)Google Scholar
  7. 7.
    Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using r-trees. In: SIGMOD (1993)Google Scholar
  8. 8.
    Chan, C., Jagadish, H., Tan, K.-L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: SIGMOD (2006)Google Scholar
  9. 9.
    Cheng, R., Chen, J., Mokbel, M.F., Chow, C.-Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)Google Scholar
  10. 10.
    Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)Google Scholar
  11. 11.
    Cheng, R., Xia, Y., Prabhakar, S., Shah, R., Vitter, J.S.: Efficient indexing methods for probabilistic threshold queries over uncertain data. In: VLDB (2004)Google Scholar
  12. 12.
    Cormen T.H., Leiserson C.E., Rivest R.L., Stein C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)zbMATHGoogle Scholar
  13. 13.
    Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)Google Scholar
  14. 14.
    Dalvi, N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)Google Scholar
  15. 15.
    Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)Google Scholar
  16. 16.
    Dubhashi, D., Panconesi, A.: Concentration of measure for the analysis of randomised algorithms, p. 12. http://citeseer.ist.psu.edu/old/dubhashi98concentration.html (1998)
  17. 17.
    Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)Google Scholar
  18. 18.
    Fagin R., Lotem A., Naor M.: Optimal aggregation algorithms for middleware. JCSS 66, 614–656 (2003)zbMATHMathSciNetGoogle Scholar
  19. 19.
    Gilks W., Richardson S., Spiegelhalter D.: Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1996)zbMATHGoogle Scholar
  20. 20.
    Goldreich, O.: Randomized Methods in Computation, Lecture 2. http://www.wisdom.weizmann.ac.il/~oded/rnd.html (2001)
  21. 21.
    Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)Google Scholar
  22. 22.
    Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: A probabilistic threshold approach. In: SIGMOD (2008)Google Scholar
  23. 23.
    Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD (2008)Google Scholar
  24. 24.
    Imielinski T., Lipski W.: Incomplete information in relational databases. JACM 31(4), 761–791 (1984)zbMATHCrossRefMathSciNetGoogle Scholar
  25. 25.
    Kalos M.H., Whitlock P.A.: Monte Carlo Methods. Wiley Interscience, London (1986)zbMATHCrossRefGoogle Scholar
  26. 26.
    Kriegel, H.P., Kunath, P., Pfeifle, M., Renz, M.: Probabilistic similarity join on uncertain data. In: DASFAA (2006)Google Scholar
  27. 27.
    Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)Google Scholar
  28. 28.
    Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: KDD (2005)Google Scholar
  29. 29.
    Lakshmanan L.V.S., Leone N., Ross R., Subrahmanian V.S.: Probview: a flexible probabilistic database system. ACM TODS 22(3), 419–469 (1997)CrossRefGoogle Scholar
  30. 30.
    Ngai, W.K., Kao, B., Cheng, C.K.C.R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: ICDM (2006)Google Scholar
  31. 31.
    Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient olap operations in spatial data warehouses. In: SSTD (2001)Google Scholar
  32. 32.
    Papadias, D., Mamoulis, N., Theodoridis, Y.: Processing and optimization of multiway spatial joins using R-trees. In: PODS (1999)Google Scholar
  33. 33.
    Papadias D., Tao Y., Greg F., Seeger B.: Progressive skyline computation in database systems. ACM TODS 30(1), 41–82 (2003)CrossRefGoogle Scholar
  34. 34.
    Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skyline on uncertain data. In: VLDB (2007)Google Scholar
  35. 35.
    Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)Google Scholar
  36. 36.
    Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE (2005)Google Scholar
  37. 37.
    Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)Google Scholar
  38. 38.
    Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)Google Scholar
  39. 39.
    Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In: ICDE (2007)Google Scholar
  40. 40.
    Tan, K.-L., Eng, P.-K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB (2001)Google Scholar
  41. 41.
    Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB (2005)Google Scholar
  42. 42.
    Yi K., Li F., Kollios G., Srivastava D.: Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng. 20(12), 1669–1682 (2008)CrossRefGoogle Scholar
  43. 43.
    Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: VLDB (2007)Google Scholar

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Wenjie Zhang
    • 1
  • Xuemin Lin
    • 1
    Email author
  • Ying Zhang
    • 1
  • Jian Pei
    • 2
  • Wei Wang
    • 1
  1. 1.The University of New South Wales and NICTASydneyAustralia
  2. 2.Simon Fraser UniversityBurnabyCanada

Personalised recommendations