Acta Informatica, Volume 40, Issue 8, pp 529–584

On the equivalence and rewriting of aggregate queries

  • Stéphane Grumbach (email author)
  • Maurizio Rafanelli
  • Leonardo Tininini
Abstract.

We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and multiply), which is well suited for the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with a tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation. We then consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query (namely, single-block aggregate group-by queries without a HAVING clause). Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries, such as averages, reduces to it. Deciding whether two queries are isomorphic modulo a product is shown to be NP-complete. We then analyze the equivalence problem in the case of aggregate conjunctive queries with comparisons. We introduce the concept of cross isomorphic linear expansions, which generalizes isomorphism modulo a product, and we show that equivalence reduces to it and that it can be decided in PSPACE. Finally, we show that the problem of complete rewriting of count queries using count views is NP-complete, and we introduce new rewriting techniques, based on the isomorphism modulo a product, to recover the values of counts from the views by complex arithmetical computation.
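
As informal intuition for why a product can matter when comparing average queries, the following minimal Python sketch checks one well-known fact: taking the Cartesian product of a relation with an unrelated, non-empty relation multiplies every group's count by the same factor and therefore leaves per-group averages unchanged. The relations `sales` and `tags` and both helper functions are hypothetical, chosen only for illustration. This is not the paper's formal definition of isomorphism modulo a product, nor its decision procedure.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data (not from the paper): (region, amount) pairs.
sales = [("north", 10.0), ("north", 30.0), ("south", 20.0)]
# An unrelated, non-empty relation with three tuples.
tags = ["a", "b", "c"]


def avg_by_group(rows):
    """Group (key, value) pairs by key and return the average per key."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}


def count_by_group(rows):
    """Group (key, value) pairs by key and return the row count per key."""
    counts = {}
    for key, _ in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


# Cartesian product with the unrelated relation: each sales row now
# appears once per tuple of `tags`, i.e. three times.
crossed = [row for row, _ in product(sales, tags)]

avg_plain = avg_by_group(sales)
avg_crossed = avg_by_group(crossed)
count_plain = count_by_group(sales)
count_crossed = count_by_group(crossed)

# Averages agree, while counts are scaled by len(tags).
print(avg_plain == avg_crossed)
print(count_plain, count_crossed)
```

In this toy case the two average queries return the same answer even though their bodies are not isomorphic as plain conjunctive queries, whereas the corresponding count queries differ by the constant factor `len(tags)`.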

Keywords

Linear Expansion, Expressive Power, Aggregation Operator, Isomorphic Linear, Arithmetical Computation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin/Heidelberg 2004

Authors and Affiliations

  • Stéphane Grumbach (1), email author
  • Maurizio Rafanelli (2)
  • Leonardo Tininini (2)

  1. INRIA, Le Chesnay, France
  2. CNR-IASI, Roma, Italy
