Journal of Intelligent Information Systems, Volume 38, Issue 1, pp 1–39

Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods

  • Bin Jiang
  • Jian Pei
  • Xuemin Lin
  • Yidong Yuan

Abstract

Uncertain data are inherent in several important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on them, how to conduct advanced analysis on uncertain data remains largely an open problem. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model in which each uncertain object has a probability of being in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1). Computing probabilistic skylines on large uncertain data sets is challenging. We systematically develop a bounding-pruning-refining framework and three algorithms. The bottom-up algorithm computes the skyline probabilities of selected instances of uncertain objects and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets and prunes subsets and objects aggressively. Combining the advantages of the bottom-up and top-down algorithms, we develop a hybrid algorithm that further improves performance. Our experimental results on both the real NBA player data set and benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our algorithms are efficient on large data sets.
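The model described in the abstract can be illustrated with a small sketch. The abstract does not give the formal definitions, so the following is written under stated assumptions: each uncertain object is a finite set of equally likely instances, objects are independent, and smaller values are preferred in every dimension. Under those assumptions, an instance's skyline probability is the product, over every other object, of the fraction of that object's instances that do not dominate it, and an object's skyline probability is the average over its instances. The functions `p_skyline_pruned` and `box_bounds` are simplified, hypothetical stand-ins for the bounding and pruning ideas behind the bottom-up and top-down algorithms; they are not the authors' algorithms, and every name here is illustrative.

```python
from fractions import Fraction
from math import prod
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption (illustrative only): objects are independent, instances equally likely,
# and smaller values are better in every dimension.
Point = Tuple[float, ...]
Objects = Dict[str, List[Point]]  # object name -> its instances


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_prob(u: Point, owner: str, objects: Objects,
                  others: Optional[Sequence[str]] = None) -> Fraction:
    """Probability that instance u of `owner` is dominated by no other object.

    If `others` names only some of the other objects, the result is an upper bound,
    because every factor in the product lies in [0, 1].
    """
    names = [n for n in (objects if others is None else others) if n != owner]
    return prod((1 - Fraction(sum(dominates(v, u) for v in objects[n]), len(objects[n]))
                 for n in names), start=Fraction(1))


def skyline_prob(name: str, objects: Objects) -> Fraction:
    """Skyline probability of an object: the average over its instances."""
    insts = objects[name]
    return sum((instance_prob(u, name, objects) for u in insts), Fraction(0)) / len(insts)


def p_skyline(objects: Objects, p: float) -> List[str]:
    """All objects whose skyline probability is at least p (0 < p <= 1)."""
    return sorted(n for n in objects if skyline_prob(n, objects) >= p)


def p_skyline_pruned(objects: Objects, p: float, probe: Sequence[str]) -> List[str]:
    """Hypothetical bound-prune-refine loop: a cheap upper bound computed against the
    `probe` objects only discards hopeless candidates before the exact computation."""
    result = []
    for name, insts in objects.items():
        upper = sum((instance_prob(u, name, objects, probe) for u in insts),
                    Fraction(0)) / len(insts)
        if upper < p:
            continue  # pruned: even the optimistic bound misses the threshold
        if skyline_prob(name, objects) >= p:  # refine with the exact value
            result.append(name)
    return sorted(result)


def box_bounds(insts: List[Point], owner: str, objects: Objects) -> Tuple[Fraction, Fraction]:
    """(lower, upper) bounds on instance_prob shared by every instance in a group.

    Whatever dominates the box's min corner dominates every instance in the box (upper
    bound); whatever dominates an instance also dominates the max corner (lower bound).
    """
    lo = tuple(map(min, zip(*insts)))
    hi = tuple(map(max, zip(*insts)))
    return instance_prob(hi, owner, objects), instance_prob(lo, owner, objects)


# Toy data (made up for illustration): three objects with two instances each.
objects = {
    "A": [(1, 4), (2, 2)],
    "B": [(3, 3), (4, 1)],
    "C": [(5, 5), (6, 6)],
}
print(skyline_prob("B", objects))                   # 3/4
print(p_skyline(objects, 0.5))                      # ['A', 'B']
print(p_skyline_pruned(objects, 0.5, probe=["A"]))  # ['A', 'B']
```

Exact `Fraction` arithmetic keeps threshold comparisons such as 3/4 ≥ 0.75 free of rounding error; on large data sets one could switch to floats and use a spatial index to count dominating instances.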

Keywords

Uncertain data · Skyline queries · Probabilistic queries · Algorithms

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
  2. School of Computer Science and Engineering, The University of New South Wales and NICTA, Sydney, Australia