An Efficient Pipelined Parallel Join Algorithm on Heterogeneous Distributed Architectures

Hassan, Mohamad Al Hajj; Bamha, Mostafa

doi:10.1007/978-3-642-05201-9_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 47)

Included in the following conference series:

International Conference on Software and Data Technologies

Abstract

Pipelined parallelism has been widely studied and successfully implemented in several join algorithms on shared-nothing machines, assuming ideal load balancing between processors and no data skew. Pipelining aims to allow flexible resource allocation while avoiding unnecessary disk input/output for intermediate join results when processing multi-join queries.
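
As a rough illustration of pipelining in general (not the algorithm presented in this paper), the sketch below chains two hash joins using Python generators, so tuples produced by the first join are consumed by the second without ever being stored as an intermediate relation. The function name `hash_join` and the relations `R`, `S`, and `T` are hypothetical, and the code runs on a single process.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Lazily yield joined tuples: build a hash table on `build`, stream `probe`."""
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)
    for row in probe:
        # Each probe tuple is matched and emitted immediately (no buffering).
        for match in table.get(row[probe_key], ()):
            yield match + row

# Hypothetical relations; the join attribute is in position 0.
R = [(1, "r1"), (2, "r2")]
S = [(1, "s1"), (1, "s2"), (3, "s3")]
T = [(1, "t1"), (2, "t2")]

# (R join S) join T: the first join is a generator feeding the second,
# so the intermediate result R join S is never materialized.
rs = hash_join(R, S, 0, 0)
result = list(hash_join(T, rs, 0, 0))
```

Only the final `list(...)` call materializes tuples; a disk-based system would gain by skipping the write and re-read of the intermediate result.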

The main drawback of pipelining in existing algorithms is that communication and load balancing are limited to static approaches, fixed during the query optimization phase, that rely on hashing to redistribute data over the network. Such approaches cannot solve the problems of data skew and load imbalance on heterogeneous multi-processor architectures, where the load of each processor may vary dynamically and unpredictably.
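
To make the weakness of static hashing concrete, here is a minimal sketch that counts how many tuples each processor receives when join attribute values are assigned by a fixed hash function. The function `static_hash_partition`, the modulo hash, and the data sets are assumptions chosen for illustration; they are not taken from the paper.

```python
def static_hash_partition(keys, p):
    """Return the number of tuples sent to each of p processors."""
    load = [0] * p
    for k in keys:
        load[k % p] += 1  # stand-in for a fixed hash function (assumption)
    return load

# Uniform data: every join attribute value appears once.
uniform = list(range(1000))
# Skewed data: the value 7 appears in 70% of the tuples.
skewed = [7] * 700 + list(range(300))

uniform_load = static_hash_partition(uniform, 4)
skewed_load = static_hash_partition(skewed, 4)

print(uniform_load)  # [250, 250, 250, 250]
print(skewed_load)   # [75, 75, 75, 775]
```

Because every tuple with the same attribute value hashes to the same processor, a frequent value overloads one processor regardless of how the hash function is chosen in advance.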

In this paper, we present a pipelined parallel algorithm for multi-join queries that solves the data skew problem while guaranteeing perfect balancing properties on heterogeneous multi-processor shared-nothing architectures. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model.
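
For readers unfamiliar with BSP, the standard cost of a single superstep in that model is recalled below. The notation follows the usual presentation of the model and is not quoted from this paper: $w_i$ is the local computation time of processor $i$, $h_i$ is the larger of the number of words it sends and the number it receives, $g$ is the communication cost per word, and $l$ is the cost of the barrier synchronization.

```latex
\[
  T_{\text{superstep}} \;=\; \max_{i} w_i \;+\; \max_{i} h_i \cdot g \;+\; l,
  \qquad h_i = \max\left(h_i^{+},\, h_i^{-}\right)
\]
```

Since both terms take a maximum over processors, a single overloaded processor determines the cost of the whole superstep, which is why load balancing and skew handling matter in this model.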

Author information

Authors and Affiliations

LIFO, University of Orléans, BP 6759, 45067, Orléans cedex 2, France
Mohamad Al Hajj Hassan & Mostafa Bamha

Editor information

Editors and Affiliations

Institute for Systems and Technologies of Information, Control and Communication (INSTICC), Rua do Vale de Chaves, Estefanilha, 2910-761, Setúbal, Portugal
José Cordeiro
Interdisciplinary Institute for Collaboration and Research on Enterprise Systems and Technology – IICREST, P.O. Box 104, 1618, Sofia, Bulgaria
Boris Shishkov
INSTICC, Avenida D. Manuel I, 2910, Setúbal, Portugal
AlpeshKumar Ranchordas
School of Computing, Dublin City University, Dublin 9, Ireland
Markus Helfert

Cite this paper

Hassan, M.A.H., Bamha, M. (2009). An Efficient Pipelined Parallel Join Algorithm on Heterogeneous Distributed Architectures. In: Cordeiro, J., Shishkov, B., Ranchordas, A., Helfert, M. (eds) Software and Data Technologies. ICSOFT 2008. Communications in Computer and Information Science, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05201-9_10

DOI: https://doi.org/10.1007/978-3-642-05201-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05200-2
Online ISBN: 978-3-642-05201-9
eBook Packages: Computer Science, Computer Science (R0)
