Supercomputing pp 939-953

Fast parallel algorithms for processing of joins

  • Dennis Shasha
  • Paul Spirakis
Session 10: Algorithms, Architectures And Performance III
Part of the Lecture Notes in Computer Science book series (LNCS, volume 297)


We present and analyze some innovative techniques for processing a join (or a semi-join) in a parallel computing environment. Our algorithms employ perfect hashing and, in some cases, copying of data in a group of processors, or filtering the data as they move through the network. By using the combinatorial properties of hashing, we are able to prove almost optimal speedup, with high probability, when some uniformity assumptions hold for the data. Even in the absence of these assumptions, our techniques achieve sub-optimal speedup and can be used as practical heuristics.
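As a rough illustration of the general idea of hash-based join processing across several processors, the sketch below simulates a generic hash-partitioned equi-join and a semi-join filter in Python. It is not the authors' algorithm: it uses Python's built-in `hash` with a modulus rather than perfect hashing, it runs the "processors" sequentially in one process, and all names (`partition`, `local_join`, `partitioned_join`, `semi_join`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def partition(rows, key_index, num_procs):
    """Assign each row to a simulated processor by hashing its join key.

    Assumption: ordinary modular hashing, not the perfect hashing
    mentioned in the abstract.
    """
    buckets = [[] for _ in range(num_procs)]
    for row in rows:
        buckets[hash(row[key_index]) % num_procs].append(row)
    return buckets


def local_join(r_rows, s_rows, r_key, s_key):
    """Join the two fragments held by one processor using a hash table."""
    table = defaultdict(list)
    for r in r_rows:
        table[r[r_key]].append(r)
    # Probe with each S row; matching R rows are concatenated with it.
    return [r + s for s in s_rows for r in table.get(s[s_key], [])]


def partitioned_join(r, s, r_key, s_key, num_procs):
    """Equi-join R and S by partitioning both on the join key.

    Rows with equal keys land on the same simulated processor, so each
    processor can join its fragments independently of the others.
    """
    r_parts = partition(r, r_key, num_procs)
    s_parts = partition(s, s_key, num_procs)
    result = []
    for p in range(num_procs):  # sequential stand-in for parallel work
        result.extend(local_join(r_parts[p], s_parts[p], r_key, s_key))
    return result


def semi_join(r, s, r_key, s_key):
    """Keep only the R rows whose join key appears somewhere in S."""
    s_keys = {row[s_key] for row in s}
    return [row for row in r if row[r_key] in s_keys]


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c")]
    S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(partitioned_join(R, S, 0, 0, num_procs=4)))
    print(semi_join(R, S, 0, 0))
```

If the join keys are spread evenly over the processors, each simulated processor handles roughly a `1/num_procs` share of the rows, which is the intuition behind the speedup claims; skewed keys would concentrate work on a few processors.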


Query Processing; Query Optimization; Communication Step; Proof Sketch; Relational Database Management System





Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Dennis Shasha (1)
  • Paul Spirakis (1, 2)
  1. Courant Institute, NYU, USA
  2. Computer Technology Institute, Greece
