Efficiently Computing and Querying Multidimensional OLAP Data Cubes over Probabilistic Relational Data

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6295)

Abstract

Motivated by emerging database application scenarios in which datasets increasingly arise in uncertain and imprecise formats, this paper proposes a novel framework for efficiently computing and querying multidimensional OLAP data cubes over probabilistic data, which capture such datasets well. The models and algorithms supported by the framework are formally presented and described in detail. They build on well-understood statistical and probabilistic tools and converge on the definition of so-called probabilistic OLAP data cubes, the most prominent result of our research. Finally, we complete our analytical contribution by introducing an innovative Probability Distribution Function (PDF)-based approach for efficiently querying probabilistic OLAP data cubes.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cuzzocrea, A., Gunopulos, D. (2010). Efficiently Computing and Querying Multidimensional OLAP Data Cubes over Probabilistic Relational Data. In: Catania, B., Ivanović, M., Thalheim, B. (eds) Advances in Databases and Information Systems. ADBIS 2010. Lecture Notes in Computer Science, vol 6295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15576-5_12

  • DOI: https://doi.org/10.1007/978-3-642-15576-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15575-8

  • Online ISBN: 978-3-642-15576-5

  • eBook Packages: Computer Science (R0)