Dataflow query execution in a parallel main-memory environment

Abstract

This paper studies the performance and characteristics of executing various join-trees on a parallel DBMS. The results are a step toward the design of a query optimization strategy suited to the parallel execution of complex queries.

Synchronization issues, among other factors, are identified as limiting the performance gain from parallelism. A new hash-join algorithm, the Pipelining hash-join, is introduced that has fewer synchronization constraints than the known hash-join algorithms. The behavior of individual join operations in a join-tree is also studied in a simulation experiment. The results show that the Pipelining hash-join algorithm yields better performance for multi-join queries. The shape of the optimal join-tree appears to depend on the size of the join operands: a multi-join between small operands performs best with a bushy schedule, whereas larger operands are better off with a linear schedule. The simulation results are confirmed with an analytic model for dataflow query execution.
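The abstract does not spell out how the Pipelining hash-join reduces synchronization. A minimal sketch follows, assuming the symmetric scheme commonly associated with this algorithm: a hash table is kept for each operand, and every arriving tuple is inserted into its own table and immediately probed against the other, so neither operand has to be consumed in full before results start to flow. The function name, the `("L", row)` / `("R", row)` stream encoding, and the key parameters are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def pipelining_hash_join(
    stream: Iterable[Tuple[str, Row]],
    left_key: str,
    right_key: str,
) -> Iterator[Tuple[Row, Row]]:
    """Join two interleaved inputs, emitting matches as tuples arrive.

    `stream` yields ("L", row) or ("R", row) in arrival order
    (a hypothetical encoding chosen for this sketch).
    """
    left_table = defaultdict(list)   # join key -> left rows seen so far
    right_table = defaultdict(list)  # join key -> right rows seen so far

    for side, row in stream:
        if side == "L":
            key = row[left_key]
            left_table[key].append(row)            # build on own side
            for match in right_table.get(key, ()):  # probe the other side
                yield (row, match)
        else:
            key = row[right_key]
            right_table[key].append(row)
            for match in left_table.get(key, ()):
                yield (match, row)


if __name__ == "__main__":
    arrivals = [
        ("L", {"id": 1, "a": "x"}),
        ("R", {"id": 1, "b": "y"}),  # matches the left row already stored
        ("R", {"id": 2, "b": "z"}),
        ("L", {"id": 2, "a": "w"}),  # matches the right row already stored
    ]
    for left_row, right_row in pipelining_hash_join(arrivals, "id", "id"):
        print(left_row, right_row)
```

Because each match is produced by whichever tuple arrives second, every result pair is emitted exactly once, and a downstream join in a join-tree can start consuming output before either input is exhausted. The cost of this design is that both hash tables are held in memory, which fits the main-memory setting the paper assumes.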

Cite this article

Wilschut, A.N., Apers, P.M.G. Dataflow query execution in a parallel main-memory environment. Distrib Parallel Databases 1, 103–128 (1993). https://doi.org/10.1007/BF01277522

Keywords

  • parallel query processing
  • multi-join queries
  • simulation
  • analytical modeling