Abstract
We present and analyze some innovative techniques for processing a join (or a semi-join) in a parallel computing environment. Our algorithms employ perfect hashing and, in some cases, copying of data within a group of processors or filtering of the data as they move through the network. By using the combinatorial properties of hashing, we are able to prove almost optimal speedup, with high probability, when some uniformity assumptions hold for the data. Even in the absence of these assumptions, our techniques achieve sub-optimal speedup and can be used as practical heuristics.
This research was supported in part by the NSF grant DCR 8503497 and by the Ministry of Industry, Energy and Technology of Greece.
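The general idea of a hash-partitioned join, which the abstract builds on, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses Python's built-in `hash` rather than perfect hashing, simulates the processors as in-memory buckets processed one after another, and omits the data copying and network filtering mentioned above. The function names (`partition`, `local_join`, `parallel_hash_join`, `semi_join`) and the tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict

def partition(relation, key_index, num_procs):
    """Route each tuple to a bucket (a simulated processor) by hashing its join key."""
    buckets = [[] for _ in range(num_procs)]
    for row in relation:
        buckets[hash(row[key_index]) % num_procs].append(row)
    return buckets

def local_join(r_part, s_part, r_key, s_key):
    """Join one bucket locally: build a hash table on R, then probe it with S."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [r + s for s in s_part for r in table.get(s[s_key], [])]

def parallel_hash_join(R, S, r_key, s_key, num_procs):
    """Equi-join R and S. Tuples with equal keys land in the same bucket,
    so each bucket can be joined independently (here, sequentially)."""
    r_parts = partition(R, r_key, num_procs)
    s_parts = partition(S, s_key, num_procs)
    out = []
    for i in range(num_procs):
        out.extend(local_join(r_parts[i], s_parts[i], r_key, s_key))
    return out

def semi_join(R, S, r_key, s_key):
    """Keep the tuples of R whose join key appears somewhere in S."""
    keys = {s[s_key] for s in S}
    return [r for r in R if r[r_key] in keys]

# Hypothetical sample relations; the join key is the first field of each tuple.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(parallel_hash_join(R, S, 0, 0, num_procs=4))
reduced = semi_join(R, S, 0, 0)
```

If the join keys spread evenly over the buckets, each simulated processor handles roughly a 1/p share of the tuples, which is the kind of uniformity condition under which near-optimal speedup is plausible. A skewed key distribution would overload some buckets.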
Copyright information
© 1988 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shasha, D., Spirakis, P. (1988). Fast parallel algorithms for processing of joins. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds) Supercomputing. ICS 1987. Lecture Notes in Computer Science, vol 297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-18991-2_55
DOI: https://doi.org/10.1007/3-540-18991-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18991-6
Online ISBN: 978-3-540-38888-3
eBook Packages: Springer Book Archive