Using Genetic Algorithms to Optimize Redundant Data
- Cite this paper as:
- Szulc I., Stencel K., Wiśniewski P. (2017) Using Genetic Algorithms to Optimize Redundant Data. In: Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham
Analytic queries can exhaust the resources of the DBMS at hand. Since the nature of such queries can be foreseen, a database administrator can prepare the DBMS so that it serves them efficiently. Materialization of partial results (aggregates) is perhaps the most important method of reducing the resource consumption of such queries. The number of possible aggregates of a fact table is exponential in the number of its dimensions, so the administrator has to choose a reasonable subset of all possible materialized aggregates. A materialized aggregate may produce benefits during query execution, but it also incurs a cost during data maintenance (not to mention the space needed). Thus, the administrator faces an optimisation problem: knowing the workload (i.e. the queries and updates to be performed), which subset of all aggregates gives the maximal net benefit? In this paper we present a cost model that defines the framework of this optimisation problem. Then, we compare two methods of computing the optimal subset of aggregates: a complete search and a genetic algorithm. We tested both methods on a fact table with 30 dimensions. The results are promising. The genetic algorithm runs significantly faster while yielding solutions within a 10% margin of the optimal solution found by the complete search.
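The optimisation problem described above can be illustrated with a minimal Python sketch. It is not the cost model or the genetic algorithm from the paper: the dimension cardinalities, the workload, the `UPDATE_WEIGHT` maintenance factor, the size estimate, and all function names are assumptions chosen only to keep the example small. Each aggregate is a bitmask of grouping dimensions, a query may use any materialized aggregate that retains all of its dimensions, and the net benefit is the query savings minus the maintenance cost.

```python
import itertools
import random

# Hypothetical toy instance (all numbers are assumptions, not from the paper).
CARD = [10, 20, 50]                    # cardinality of each dimension
FACT_ROWS = 5_000                      # assumed fact-table size
N_DIMS = len(CARD)
AGGS = list(range(1 << N_DIMS))        # aggregate = bitmask of grouped dimensions
# Assumed workload: (bitmask of grouping dimensions, query frequency).
QUERIES = [(0b001, 30), (0b011, 20), (0b100, 10), (0b110, 5)]
UPDATE_WEIGHT = 0.5                    # assumed maintenance cost per aggregate row


def size(mask):
    """Estimated row count of an aggregate, capped by the fact-table size."""
    rows = 1
    for d in range(N_DIMS):
        if (mask >> d) & 1:
            rows *= CARD[d]
    return min(rows, FACT_ROWS)


def net_benefit(selection):
    """Query savings minus maintenance cost for a tuple of 0/1 flags over AGGS."""
    chosen = [a for a, bit in zip(AGGS, selection) if bit]
    saved = 0
    for q, freq in QUERIES:
        # A query can use any materialized aggregate that keeps all its dimensions;
        # otherwise it falls back to scanning the fact table.
        best = min([size(a) for a in chosen if (a & q) == q], default=FACT_ROWS)
        saved += freq * (FACT_ROWS - best)
    return saved - UPDATE_WEIGHT * sum(size(a) for a in chosen)


def complete_search():
    """Evaluate every subset of aggregates; feasible only for tiny instances."""
    return max(itertools.product((0, 1), repeat=len(AGGS)), key=net_benefit)


def genetic_search(pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Simple genetic algorithm: elitism, tournament selection, one-point crossover."""
    rng = random.Random(seed)
    n = len(AGGS)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=net_benefit, reverse=True)
        next_pop = pop[:2]                       # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=net_benefit)   # tournament of three
            p2 = max(rng.sample(pop, 3), key=net_benefit)
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=net_benefit)


if __name__ == "__main__":
    exact = complete_search()
    approx = genetic_search()
    print("complete search:  ", exact, net_benefit(exact))
    print("genetic algorithm:", approx, net_benefit(approx))
```

With three dimensions there are only 8 aggregates and 256 candidate subsets, so the complete search is instantaneous here. The number of subsets doubles with every additional aggregate, which is why a heuristic such as a genetic algorithm becomes attractive on larger instances.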