
Parallel OLAP query processing in database clusters with data replication

Distributed and Parallel Databases

Abstract

We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low-cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives that combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of adaptive virtual partitioning to redistribute the load to the replicas. We validate our solution experimentally with an implementation on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.
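As a rough illustration of what virtual partitioning means in practice, the sketch below splits a key range into contiguous sub-ranges and rewrites one query into one sub-query per node. It is not the SmaQSS implementation: the function names `virtual_partitions` and `rewrite_query`, the `{predicate}` placeholder, and the static equal-width split are assumptions made for this example. The adaptive variant mentioned in the abstract would adjust interval sizes at run time, which this sketch does not attempt.

```python
def virtual_partitions(lo, hi, n_nodes):
    """Split the half-open key interval [lo, hi) into n_nodes contiguous
    sub-intervals whose sizes differ by at most one key."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    base, extra = divmod(hi - lo, n_nodes)
    intervals = []
    start = lo
    for i in range(n_nodes):
        # The first `extra` intervals absorb the remainder, one key each.
        size = base + (1 if i < extra else 0)
        intervals.append((start, start + size))
        start += size
    return intervals


def rewrite_query(template, key, interval):
    """Fill the hypothetical {predicate} placeholder with a range
    predicate that restricts the query to one virtual partition."""
    start, end = interval
    predicate = f"{key} >= {start} AND {key} < {end}"
    return template.format(predicate=predicate)


# Example: one aggregate over the TPC-H lineitem table, split four ways.
template = "SELECT SUM(l_extendedprice) FROM lineitem WHERE {predicate}"
sub_queries = [
    rewrite_query(template, "l_orderkey", interval)
    for interval in virtual_partitions(1, 11, 4)
]
```

Each sub-query can run on a different node that holds a replica of the table, and the partial sums are then added to obtain the final result. Aggregates that do not combine by simple addition (an average, for instance) would need a different composition step.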




Correspondence to Alexandre A. B. Lima.

Additional information

Communicated by Ladjel Bellatreche.


Cite this article

Lima, A.A.B., Furtado, C., Valduriez, P. et al. Parallel OLAP query processing in database clusters with data replication. Distrib Parallel Databases 25, 97–123 (2009). https://doi.org/10.1007/s10619-009-7037-8


  • DOI: https://doi.org/10.1007/s10619-009-7037-8
