Parallelism for high performance query processing
We present a new method for a type of processing required in database management systems. The method efficiently determines the relevance of a given query value to each of many (target) sets of data. By using a new type of data structure, the method allows complete parallelism both for operations on different target sets and for those within each target set. The method never generates a false drop (i.e., never indicates that an irrelevant target set is relevant to the query) and always identifies all relevant target sets. This eliminates the overhead of reading each selected target set to ensure that the selection was not a false drop. A good deterministic bound on the system's performance is established.
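To make the notion of a false drop concrete, the following is a minimal sketch that assumes a deliberately crude toy signature (one bit per word length). It is not the data structure proposed in the paper; the function names and the signature scheme are hypothetical and chosen only so that a false drop is easy to see next to an exact membership test.

```python
def toy_signature(items, bits=8):
    """Superimpose one bit per item; the bit is chosen by word length (toy scheme)."""
    sig = 0
    for item in items:
        sig |= 1 << (len(item) % bits)
    return sig


def signature_says_relevant(target_sig, query_item, bits=8):
    """Lossy test: may report relevance for an item the target set does not contain."""
    return bool(target_sig & (1 << (len(query_item) % bits)))


def exact_says_relevant(target_items, query_item):
    """Exact test: never reports a false drop and never misses a relevant set."""
    return query_item in target_items


target = {"cat", "horse"}
sig = toy_signature(target)

# "dog" shares a word length with "cat", so the toy signature wrongly matches.
false_drop = signature_says_relevant(sig, "dog") and not exact_says_relevant(target, "dog")
print(false_drop)
```

With a lossy signature, every reported match has to be confirmed by reading the target set itself; a test that is exact by construction avoids that second pass, which is the overhead the abstract refers to.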
With $O(\ln N_v + \ln \ln M)$ processors, the relevance of any target set can be completely determined in $O(1)$ time against a query consisting of a subset of $N_v$ vocabulary items. The space complexity is $O(N_i (\ln N_v + \ln \ln N_v))$ bits, where $N_i$ is the number of items relevant to target set $i$. As a concrete example, for a database using 64-byte keys, having a 100,000-word vocabulary (potentially valid keys), and in which a target set can have up to 64 distinct relevant elements, the relevance of a target set can be determined in 2 parallel operations using 6 processors. In other words, with 64K processors a database of one million target sets can be processed in 184 parallel operations. No probability distribution assumptions are necessary.
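The figure of 184 parallel operations can be reproduced with a short calculation. The sketch below assumes that "64K" means 65,536 processors, that each target set is handled by its own group of 6 processors, and that the target sets are processed in successive rounds of 2 operations each; the function name and this round-based schedule are illustrative assumptions, not details stated in the abstract.

```python
import math


def parallel_operations(total_processors, processors_per_set, ops_per_set, num_target_sets):
    """Total parallel operations under an assumed round-based schedule."""
    # How many target sets can be examined at the same time.
    sets_per_round = total_processors // processors_per_set
    # How many rounds are needed to cover every target set.
    rounds = math.ceil(num_target_sets / sets_per_round)
    return rounds * ops_per_set


total = parallel_operations(64 * 1024, 6, 2, 1_000_000)
print(total)  # 184
```

Under these assumptions, 65,536 processors cover 10,922 target sets per round, so one million target sets need 92 rounds of 2 operations each.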
Keywords: Signatures, Searching, Retrieval, Parallel Algorithms, Complexity