The VLDB Journal, Volume 16, Issue 1, pp 123–144

OLAP over uncertain and imprecise data

  • Doug Burdick
  • Prasad M. Deshpande
  • T. S. Jayram
  • Raghu Ramakrishnan
  • Shivakumar Vaithyanathan
Special Issue Paper

Abstract

We extend the OLAP data model to represent data ambiguity, specifically imprecision and uncertainty, and introduce an allocation-based approach to the semantics of aggregation queries over such data. We identify three natural query properties and use them to shed light on alternative query semantics. While there is much work on representing and querying ambiguous data, to our knowledge this is the first paper to handle both imprecision and uncertainty in an OLAP setting.
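To make the idea of allocation concrete, here is a minimal sketch, not the authors' algorithm. It assumes an imprecise fact is one recorded at a coarser level of a dimension hierarchy (a state instead of a city), and that allocation spreads such a fact over the leaf cells it could belong to with weights summing to 1. The uniform weighting, the `HIERARCHY` table, and the function names are all illustrative assumptions; the abstract does not specify an allocation policy.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy: each state maps to its leaf-level cities.
HIERARCHY = {"NY": ["Albany", "Buffalo"], "CA": ["Fresno"]}


def allocate_uniform(facts):
    """Split each fact across the leaf cells it could belong to.

    A precise fact names a city and keeps weight 1.0. An imprecise fact names
    a state and is spread evenly over that state's cities (assumed policy).
    """
    allocated = []
    for location, amount in facts:
        cells = HIERARCHY.get(location, [location])
        weight = 1.0 / len(cells)
        allocated.extend((cell, weight, amount) for cell in cells)
    return allocated


def weighted_sum(allocated):
    """Answer a SUM query per cell by weighting each measure by its allocation."""
    totals = defaultdict(float)
    for cell, weight, amount in allocated:
        totals[cell] += weight * amount
    return dict(totals)


# One precise fact in Albany, one imprecise fact known only to be in NY,
# and one precise fact in Fresno.
facts = [("Albany", 100.0), ("NY", 60.0), ("Fresno", 40.0)]
totals = weighted_sum(allocate_uniform(facts))
```

Because the weights for each fact sum to 1, the grand total over all cells equals the total of the original measures, so the imprecise fact is counted exactly once.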

Keywords

Aggregation, Imprecision, Uncertainty, Ambiguous

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  • Doug Burdick (1)
  • Prasad M. Deshpande (2)
  • T. S. Jayram (2)
  • Raghu Ramakrishnan (1)
  • Shivakumar Vaithyanathan (2)
  1. Madison, USA
  2. San Jose, USA
