Abstract
We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), a low-cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives that combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that uses adaptive virtual partitioning to redistribute the load to the replicas. We validate our solution experimentally through its implementation in the SmaQSS DBC middleware prototype. Experimental results using the TPC-H benchmark on a 32-node cluster show very good speedup.
Additional information
Communicated by Ladjel Bellatreche.
Cite this article
Lima, A.A.B., Furtado, C., Valduriez, P. et al. Parallel OLAP query processing in database clusters with data replication. Distrib Parallel Databases 25, 97–123 (2009). https://doi.org/10.1007/s10619-009-7037-8
DOI: https://doi.org/10.1007/s10619-009-7037-8