Ranking queries on uncertain data

  • Regular Paper
  • The VLDB Journal

Abstract

Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top-k queries), a popular type of query on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain data using three parameters: rank threshold k, probability threshold p, and answer set size threshold l. Systematically, we identify four types of ranking queries on uncertain data. First, a probability threshold top-k query computes the uncertain records whose probability of being in the top-k list is at least p. Second, a top-(k, l) query returns the top-l uncertain records whose probabilities of being ranked among the top-k are the largest. Third, the p-rank of an uncertain record is the smallest number k such that the record has a probability of at least p of being ranked in the top-k list; a rank threshold top-k query retrieves the records whose p-ranks are at most k. Last, a top-(p, l) query returns the top-l uncertain records with the smallest p-ranks. To answer such ranking queries, we present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation-based algorithm. To answer top-(k, l) queries and top-(p, l) queries, we propose PRist+, a compact index. We develop an efficient index construction algorithm and effective query answering methods for PRist+. An empirical study using real and synthetic data sets verifies the effectiveness of the probabilistic ranking queries and the efficiency of our methods.
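To make the four query types concrete, the following is a minimal sketch, not the algorithms of the paper. It assumes a simplified model in which every uncertain record appears independently with a given membership probability and all scores are distinct; dependencies among records are not modeled. All names (`topk_cdfs`, `pt_k`, `top_k_l`, `p_rank`, `rt_k`, `top_p_l`, `sampled_topk_prob`) are hypothetical and chosen for illustration. The exact top-k probabilities are obtained with a simple dynamic program over the number of higher-ranked records that appear, and a naive possible-world sampler is included only to illustrate the idea behind a sampling estimate.

```python
import random
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float, float]  # (id, score, membership probability)


def topk_cdfs(records: List[Record]) -> Dict[str, List[float]]:
    """Map each id to c, where c[k-1] = Pr(record appears and ranks in the top-k).

    Assumption: records appear independently and scores are distinct.
    """
    ordered = sorted(records, key=lambda r: -r[1])  # highest score first
    n = len(ordered)
    dist = [1.0]  # dist[j] = Pr(exactly j of the records scanned so far appear)
    cdfs: Dict[str, List[float]] = {}
    for rid, _score, prob in ordered:
        acc, cdf = 0.0, []
        for j in range(n):
            if j < len(dist):
                acc += prob * dist[j]  # record appears and is ranked (j+1)-th
            cdf.append(acc)
        cdfs[rid] = cdf
        # Fold this record into the distribution of the appearance count.
        nxt = [0.0] * (len(dist) + 1)
        for j, d in enumerate(dist):
            nxt[j] += d * (1.0 - prob)
            nxt[j + 1] += d * prob
        dist = nxt
    return cdfs


def pt_k(cdfs: Dict[str, List[float]], k: int, p: float) -> List[str]:
    """Probability threshold top-k: records whose top-k probability is at least p."""
    return [r for r, c in cdfs.items() if c[k - 1] >= p]


def top_k_l(cdfs: Dict[str, List[float]], k: int, l: int) -> List[str]:
    """Top-(k, l): the l records with the largest top-k probabilities."""
    return sorted(cdfs, key=lambda r: -cdfs[r][k - 1])[:l]


def p_rank(cdf: List[float], p: float) -> Optional[int]:
    """Smallest k with top-k probability >= p, or None if no such k exists."""
    for k, c in enumerate(cdf, start=1):
        if c >= p:
            return k
    return None


def rt_k(cdfs: Dict[str, List[float]], k: int, p: float) -> List[str]:
    """Rank threshold top-k: records whose p-rank is at most k."""
    ranks = {r: p_rank(c, p) for r, c in cdfs.items()}
    return [r for r, rank in ranks.items() if rank is not None and rank <= k]


def top_p_l(cdfs: Dict[str, List[float]], p: float, l: int) -> List[str]:
    """Top-(p, l): the l records with the smallest p-ranks (None counts as infinite)."""
    def key(r: str) -> float:
        rank = p_rank(cdfs[r], p)
        return float("inf") if rank is None else rank
    return sorted(cdfs, key=key)[:l]


def sampled_topk_prob(records: List[Record], rid: str, k: int,
                      trials: int = 20000, seed: int = 0) -> float:
    """Estimate the top-k probability of rid by sampling possible worlds."""
    rng = random.Random(seed)
    ordered = sorted(records, key=lambda r: -r[1])
    hits = 0
    for _ in range(trials):
        world = [r[0] for r in ordered if rng.random() < r[2]]
        if rid in world[:k]:
            hits += 1
    return hits / trials


records = [("a", 10.0, 0.5), ("b", 8.0, 0.75), ("c", 5.0, 1.0)]
cdfs = topk_cdfs(records)
print(cdfs["b"])                           # [0.375, 0.75, 0.75]
print(pt_k(cdfs, 2, 0.7))                  # ['b']
print(top_k_l(cdfs, 2, 2))                 # ['b', 'c']
print(top_p_l(cdfs, 0.5, 2))               # ['a', 'b']
print(sampled_topk_prob(records, "b", 2))  # close to the exact value 0.75
```

In this sketch `k` must lie between 1 and the number of records. Because the top-k probability of a record never decreases as k grows, `rt_k` and `pt_k` select the same records for the same k and p under these assumptions; the two differ in which quantity (the p-rank or the top-k probability) is reported and ordered on.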

Author information

Corresponding author

Correspondence to Ming Hua.

Additional information

The authors are grateful to the anonymous reviewers and Dr. Michael Böhlen, the associate editor, for their constructive and insightful advice on the paper. This research was supported by an NSERC Discovery Grant and an NSERC Discovery Accelerator Supplement Grant. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

About this article

Cite this article

Hua, M., Pei, J. & Lin, X. Ranking queries on uncertain data. The VLDB Journal 20, 129–153 (2011). https://doi.org/10.1007/s00778-010-0196-4

  • DOI: https://doi.org/10.1007/s00778-010-0196-4
