Using Genetic Algorithms to Optimize Redundant Data

  • Iwona Szulc
  • Krzysztof Stencel
  • Piotr Wiśniewski
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 716)

Abstract

Analytic queries can exhaust resources of the DBMS at hand. Since the nature of such queries can be foreseen, a database administrator can prepare the DBMS so that it serves such queries efficiently. Materialization of partial results (aggregates) is perhaps the most important method to reduce the resource consumption of such queries. The number of possible aggregates of a fact table is exponential in the number of its dimensions. The administrator has to choose a reasonable subset of all possible materialized aggregates. If an aggregate is materialized, it may produce benefits during a query execution but also instigate a cost during data maintenance (not to mention the space needed). Thus, the administrator faces an optimisation problem: knowing the workload (i.e. the queries and updates to be performed), what is the subset of all aggregates that gives the maximal net benefit? In this paper we present a cost model that defines the framework of this optimisation problem. Then, we compare two methods to compute the optimal subset of aggregates: a complete search and a genetic algorithm. We tested these meta-heuristics on a fact table with 30 dimensions. The results are promising. The genetic algorithm runs significantly faster while yielding solutions within 10% margin of the optimal solution found by the complete search.

Keywords

Materialized views Aggregations Complete search Genetic algorithm 

References

  1. 1.
    Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009). http://www.vldb.org/pvldb/2/vldb09-10years.pdf Google Scholar
  2. 2.
    Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of data, Santa Barbara, CA, USA, 21–24 May 2001, pp. 211–222 (2001). http://doi.acm.org/10.1145/375663.375686
  3. 3.
    Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: a decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, 23–27 September 2007, pp. 3–14 (2007). http://www.vldb.org/conf/2007/papers/special/p3-chaudhuri.pdf
  4. 4.
    Deshpande, A., Ives, Z.G., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). http://dx.doi.org/10.1561/1900000001 CrossRefMATHGoogle Scholar
  5. 5.
    Flexviews: Incrementally refreshable materialized views for MySQL, January 2012. http://code.google.com/p/flexviews/
  6. 6.
    Gawarkiewicz, M., Wiśniewski, P.: Partial aggregation using hibernate. In: Kim, T., Adeli, H., Slezak, D., Sandnes, F.E., Song, X., Chung, K., Arnett, K.P. (eds.) FGIT 2011. LNCS, vol. 7105, pp. 90–99. Springer, Heidelberg (2011). doi:10.1007/978-3-642-27142-7_11 CrossRefGoogle Scholar
  7. 7.
    Gawarkiewicz, M., Wiśniewski, P., Stencel, K.: Granular indices for HQL analytic queries. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2014. CCIS, vol. 424, pp. 30–39. Springer, Cham (2014). doi:10.1007/978-3-319-06932-6_4 CrossRefGoogle Scholar
  8. 8.
    Hindshaw, F., Metzger, J., Zane, B.: Optimized Database Appliance, Patent No. U.S. 7,010,521 B2, Assignee: Netezza Corporation, Framingham, MA, issued 7 March 2006Google Scholar
  9. 9.
    Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003). http://www.vldb.org/conf/2003/papers/S02P01.pdf
  10. 10.
    Ivanova, M., Kersten, M.L., Nes, N.J., Goncalves, R.: An architecture for recycling intermediates in a column-store. ACM Trans. Database Syst. 35(4), 24 (2010). http://dx.doi.org/10.1145/1862919.1862921 CrossRefGoogle Scholar
  11. 11.
    Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 395–406 (2004). http://doi.acm.org/10.1145/1007568.1007613
  12. 12.
    Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, 2–4 June 1998, Seattle, Washington, USA, pp. 106–117 (1998). http://doi.acm.org/10.1145/276304.276315
  13. 13.
    Kalyvianaki, E., Wiesemann, W., Vu, Q.H., Kuhn, D., Pietzuch, P.: SQPR: stream query planning with reuse. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, 11–16 April 2011, Hannover, Germany, pp. 840–851 (2011). http://dx.doi.org/10.1109/ICDE.2011.5767851
  14. 14.
    Markl, V., Raman, V., Simmen, D.E., Lohman, G.M., Pirahesh, H.: Robust query processing through progressive optimization. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 659–670 (2004). http://doi.acm.org/10.1145/1007568.1007642
  15. 15.
    Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. In: SIGMOD Conference, pp. 100–111 (1997)Google Scholar
  16. 16.
    Salem, K., Beyer, K., Lindsay, B., Cochrane, R.: How to roll a join: asynchronous incremental view maintenance. SIGMOD Rec. 29(2), 129–140 (2000). http://doi.acm.org/10.1145/335191.335393 CrossRefGoogle Scholar
  17. 17.
    Slezak, D., Synak, P., Borkowski, J., Wroblewski, J., Toppin, G.: A rough-columnar RDBMS engine - a case study of correlated subqueries. IEEE Data Eng. Bull. 35(1), 34–39 (2012). http://sites.computer.org/debull/A12mar/infobright1.pdf Google Scholar
  18. 18.
    Slezak, D., Synak, P., Wojna, A., Wroblewski, J.: Two database related interpretations of rough approximations: data organization and query execution. Fundam. Inform. 127(1–4), 445–459 (2013). http://dx.doi.org/10.3233/FI-2013-920 Google Scholar
  19. 19.
    Slezak, D., Wroblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008). http://www.vldb.org/pvldb/1/1454174.pdf Google Scholar
  20. 20.
    Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. In: CS&P, pp. 457–468 (2013)Google Scholar
  21. 21.
    Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. Fundam. Inform. 135(4), 537–551 (2014). http://dx.doi.org/10.3233/FI-2014-1139 Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Iwona Szulc
    • 1
  • Krzysztof Stencel
    • 1
  • Piotr Wiśniewski
    • 1
  1. 1.Faculty of Mathematics and Computer ScienceNicolaus Copernicus UniversityToruńPoland

Personalised recommendations