An Efficient Pipelined Parallel Join Algorithm on Heterogeneous Distributed Architectures

  • Conference paper
Software and Data Technologies (ICSOFT 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 47)

Abstract

Pipelined parallelism has been widely studied and successfully implemented in several join algorithms on shared-nothing machines, but only under ideal conditions: balanced load between processors and no data skew. The aim of pipelining is to allow flexible resource allocation while avoiding unnecessary disk input/output for intermediate join results when processing multi-join queries.
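
As an illustration of why pipelining avoids storing intermediate join results, here is a minimal single-process sketch. It is not the algorithm of this paper: the function names, the convention that the join key is the first field of each tuple, and the use of in-memory hash tables are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # assumption: field 0 of every row is the join key


def build_index(relation: Iterable[Row]) -> Dict[object, List[Row]]:
    """Build an in-memory hash table on the join key (field 0)."""
    index: Dict[object, List[Row]] = defaultdict(list)
    for row in relation:
        index[row[0]].append(row)
    return index


def probe(stream: Iterable[Row], index: Dict[object, List[Row]]) -> Iterator[Row]:
    """Yield each joined row as soon as it is produced instead of storing it."""
    for row in stream:
        for match in index.get(row[0], []):
            yield row + match[1:]


def pipelined_multi_join(r: Iterable[Row], s: Iterable[Row], t: Iterable[Row]) -> Iterator[Row]:
    """Compute (R join S) join T; the inner result is streamed, never materialized."""
    return probe(probe(r, build_index(s)), build_index(t))


if __name__ == "__main__":
    R = [(1, "a"), (2, "b")]
    S = [(1, "x"), (3, "y")]
    T = [(1, "p"), (1, "q")]
    for row in pipelined_multi_join(R, S, T):
        print(row)
```

Because `probe` is a generator, each tuple of the first join flows straight into the second join, so the intermediate relation is never written out.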

The main drawback of pipelining in existing algorithms is that communication and load balancing rely on static, hash-based approaches (fixed during the query optimization phase) to redistribute data over the network. They therefore cannot handle data skew or load imbalance between processors on heterogeneous multi-processor architectures, where the load of each processor may vary dynamically and unpredictably.
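
The effect of skew on a static hash redistribution can be sketched as follows. This is a toy example under stated assumptions: join keys are small non-negative integers, the destination processor is simply `key % num_processors`, and the key distribution is invented for illustration. It models skew only, not the varying processor speeds of a heterogeneous machine.

```python
from collections import Counter
from typing import Iterable, List


def static_hash_partition(keys: Iterable[int], num_processors: int) -> List[int]:
    """Count how many tuples each processor receives under a fixed key -> processor map."""
    load: Counter = Counter()
    for key in keys:
        # Assumption: the static "hash" is the key modulo the processor count.
        load[key % num_processors] += 1
    return [load[i] for i in range(num_processors)]


if __name__ == "__main__":
    # Hypothetical skewed input: one key value (7) accounts for 90% of the tuples.
    keys = [7] * 900 + list(range(100))
    print(static_hash_partition(keys, 4))
```

With this input, every tuple carrying the frequent key lands on the same processor, and no choice made before execution can split that load.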

In this paper, we present a pipelined parallel algorithm for multi-join queries that solves the problem of data skew while guaranteeing perfect balancing properties on heterogeneous multi-processor shared-nothing architectures. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model.
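
For readers unfamiliar with BSP, the standard cost of one superstep is the maximum local computation plus g times the maximum number of words any processor sends or receives, plus the barrier cost l. The sketch below computes that generic formula only; it does not reproduce the cost analysis of this paper, and the function names and numeric values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def superstep_cost(work: Sequence[float], sent: Sequence[float],
                   received: Sequence[float], g: float, l: float) -> float:
    """Standard BSP superstep cost: max_i w_i + (max_i h_i) * g + l."""
    w = max(work)                                       # slowest local computation
    h = max(max(s, r) for s, r in zip(sent, received))  # largest per-processor traffic
    return w + h * g + l


def program_cost(supersteps: List[Tuple[Sequence[float], Sequence[float], Sequence[float]]],
                 g: float, l: float) -> float:
    """Total cost is the sum of the costs of the individual supersteps."""
    return sum(superstep_cost(work, sent, received, g, l)
               for work, sent, received in supersteps)


if __name__ == "__main__":
    # Hypothetical two-processor superstep with g = 2 and l = 50.
    print(superstep_cost([100, 400], [10, 30], [20, 5], g=2, l=50))
```

Since the cost depends on the maximum over processors, a single overloaded processor dominates the whole superstep, which is why balancing matters in this model.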

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hassan, M.A.H., Bamha, M. (2009). An Efficient Pipelined Parallel Join Algorithm on Heterogeneous Distributed Architectures. In: Cordeiro, J., Shishkov, B., Ranchordas, A., Helfert, M. (eds) Software and Data Technologies. ICSOFT 2008. Communications in Computer and Information Science, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05201-9_10

  • DOI: https://doi.org/10.1007/978-3-642-05201-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05200-2

  • Online ISBN: 978-3-642-05201-9

  • eBook Packages: Computer Science, Computer Science (R0)
