Threshold-based probabilistic top-k dominating queries

Zhang, Wenjie; Lin, Xuemin; Zhang, Ying; Pei, Jian; Wang, Wei

doi:10.1007/s00778-009-0162-1

Threshold-based probabilistic top-k dominating queries

Regular Paper
Published: 04 September 2009

Volume 19, pages 283–305, (2010)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Wenjie Zhang¹,
Xuemin Lin¹,
Ying Zhang¹,
Jian Pei² &
…
Wei Wang¹

230 Accesses
45 Citations
Explore all metrics

Abstract

Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying sets of possible worlds. In: SIGMOD (1987)
Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., Widom, J.: Trio: a system for data, uncertainty, and lineage. In: VLDB (2006)
Antova, L., Koch, C., Olteanu, D.: \({10^{10^6}}\) worlds and beyond: Efficient representation and processing of incomplete information. In: ICDE (2007)
Barbara D., Garcia-Molina H., Porter D.: The management of probabilistic data. IEEE TKDE 4(5), 487–502 (1992)
Google Scholar
Bekales, G., Soliman, M.A., Ilyas, I.F.: Efficient search for the top-k probable nearest neighbors in uncertain databases In: VLDB (2008)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)
Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using r-trees. In: SIGMOD (1993)
Chan, C., Jagadish, H., Tan, K.-L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: SIGMOD (2006)
Cheng, R., Chen, J., Mokbel, M.F., Chow, C.-Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)
Cheng, R., Xia, Y., Prabhakar, S., Shah, R., Vitter, J.S.: Efficient indexing methods for probabilistic threshold queries over uncertain data. In: VLDB (2004)
Cormen T.H., Leiserson C.E., Rivest R.L., Stein C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)
MATH Google Scholar
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)
Dalvi, N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)
Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS (2007)
Dubhashi, D., Panconesi, A.: Concentration of measure for the analysis of randomised algorithms, p. 12. http://citeseer.ist.psu.edu/old/dubhashi98concentration.html (1998)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Fagin R., Lotem A., Naor M.: Optimal aggregation algorithms for middleware. JCSS 66, 614–656 (2003)
MATH MathSciNet Google Scholar
Gilks W., Richardson S., Spiegelhalter D.: Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1996)
MATH Google Scholar
Goldreich, O.: Randomized Methods in Computation, Lecture 2. http://www.wisdom.weizmann.ac.il/~oded/rnd.html (2001)
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: A probabilistic threshold approach. In: SIGMOD (2008)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD (2008)
Imielinski T., Lipski W.: Incomplete information in relational databases. JACM 31(4), 761–791 (1984)
Article MATH MathSciNet Google Scholar
Kalos M.H., Whitlock P.A.: Monte Carlo Methods. Wiley Interscience, London (1986)
Book MATH Google Scholar
Kriegel, H.P., Kunath, P., Pfeifle, M., Renz, M.: Probabilistic similarity join on uncertain data. In: DASFAA (2006)
Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)
Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: KDD (2005)
Lakshmanan L.V.S., Leone N., Ross R., Subrahmanian V.S.: Probview: a flexible probabilistic database system. ACM TODS 22(3), 419–469 (1997)
Article Google Scholar
Ngai, W.K., Kao, B., Cheng, C.K.C.R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: ICDM (2006)
Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient olap operations in spatial data warehouses. In: SSTD (2001)
Papadias, D., Mamoulis, N., Theodoridis, Y.: Processing and optimization of multiway spatial joins using R-trees. In: PODS (1999)
Papadias D., Tao Y., Greg F., Seeger B.: Progressive skyline computation in database systems. ACM TODS 30(1), 41–82 (2003)
Article Google Scholar
Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skyline on uncertain data. In: VLDB (2007)
Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)
Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE (2005)
Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)
Sen, P., Deshpande, A.: Representing and querying correlated tuples in probabilistic databases. In: ICDE (2007)
Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In: ICDE (2007)
Tan, K.-L., Eng, P.-K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB (2001)
Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB (2005)
Yi K., Li F., Kollios G., Srivastava D.: Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng. 20(12), 1669–1682 (2008)
Article Google Scholar
Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: VLDB (2007)

Download references

Author information

Authors and Affiliations

The University of New South Wales and NICTA, Sydney, Australia
Wenjie Zhang, Xuemin Lin, Ying Zhang & Wei Wang
Simon Fraser University, Burnaby, Canada
Jian Pei

Authors

Wenjie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xuemin Lin
View author publications
You can also search for this author in PubMed Google Scholar
Ying Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jian Pei
View author publications
You can also search for this author in PubMed Google Scholar
Wei Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xuemin Lin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, W., Lin, X., Zhang, Y. et al. Threshold-based probabilistic top-k dominating queries. The VLDB Journal 19, 283–305 (2010). https://doi.org/10.1007/s00778-009-0162-1

Download citation

Received: 10 May 2008
Revised: 05 May 2009
Accepted: 27 June 2009
Published: 04 September 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s00778-009-0162-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Threshold-based probabilistic top-k dominating queries

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

Novel picture fuzzy power partitioned Hamy mean operators with Dempster-Shafer theory and their applications in MCDM

Parameterized Approximation Algorithms and Lower Bounds for k-Center Clustering and Variants

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Threshold-based probabilistic top-k dominating queries

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

Novel picture fuzzy power partitioned Hamy mean operators with Dempster-Shafer theory and their applications in MCDM

Parameterized Approximation Algorithms and Lower Bounds for k-Center Clustering and Variants

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation