Abstract
Free text retrieval is an important problem that can benefit significantly from a parallel architecture. Signature methods have been proposed to answer text retrieval queries on parallel machines [Sta88, LF92], under the assumption that main memory is large enough to hold the entire signature file. We propose the use of a Parallel Bit-Sliced Signature File method on a SIMD machine architecture when the size of the signature file exceeds the available memory. We argue that it is not necessary to examine all the bit slices; instead, we use a partial-fetch slice-swapping algorithm. With this method, performance degrades gracefully as the database size grows. We provide formulae for the optimal number of signature slices to fetch and match against the query signature. Numerical examples show that our method can handle a 128 GB database with a 2-second response time on a machine with the characteristics of the Connection Machine.
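To make the idea of a bit-sliced signature file with partial slice fetching concrete, the following is a minimal sequential sketch in Python; it is not the authors' implementation and does not model the SIMD machine. The signature length `F`, the number of bits per word `M`, the SHA-256-based hash, and all function names are assumptions chosen for illustration. Partial fetching is approximated by examining only the first `max_slices` slices selected by the query signature; the paper's formulae for the optimal number of slices are not reproduced here.

```python
import hashlib

F = 64  # signature length in bits (assumed value)
M = 3   # number of bits each word sets (assumed value)

def word_bits(word):
    """Return M distinct bit positions in 0..F-1 for a word, derived from a hash."""
    positions = []
    i = 0
    while len(positions) < M:
        digest = hashlib.sha256(f"{word}:{i}".encode()).digest()
        pos = int.from_bytes(digest[:4], "big") % F
        if pos not in positions:
            positions.append(pos)
        i += 1
    return positions

def signature(words):
    """Superimposed coding: the union (bitwise OR) of the bit patterns of all words."""
    bits = set()
    for w in words:
        bits.update(word_bits(w))
    return bits

def build_slices(docs):
    """Bit-sliced layout: slices[j] is an int whose bit d is 1 iff document d has signature bit j set."""
    slices = [0] * F
    for d, words in enumerate(docs):
        for j in signature(words):
            slices[j] |= 1 << d
    return slices

def query(slices, n_docs, words, max_slices=None):
    """AND together the slices selected by the query signature.

    If max_slices is given, only that many slices are examined (partial fetch).
    Fewer slices can only add false drops; qualifying documents are never lost.
    """
    needed = sorted(signature(words))
    if max_slices is not None:
        needed = needed[:max_slices]
    candidates = (1 << n_docs) - 1  # start with every document as a candidate
    for j in needed:
        candidates &= slices[j]
    return [d for d in range(n_docs) if (candidates >> d) & 1]

docs = [
    ["parallel", "text", "retrieval"],
    ["signature", "file"],
    ["parallel", "signature"],
]
slices = build_slices(docs)
full = query(slices, len(docs), ["parallel"])                   # examine all query slices
partial = query(slices, len(docs), ["parallel"], max_slices=1)  # examine one slice only
print(full, partial)
```

In this sketch the partial result is always a superset of the full result, which is the property that lets a system trade a higher false-drop rate for fewer slice fetches; candidates would still need to be checked against the actual text.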
This research was sponsored partially by the Institute for Advanced Computer Studies (UMIACS), by the National Science Foundation under the grants IRI-8719458, IRI-8958546 and IRI-9205273, by a donation by EMPRESS Software Inc., and by a donation by Thinking Machines Inc.
References
Stavros Christodoulakis and Christos Faloutsos. Design Considerations for a Message File Server. IEEE Transactions on Software Engineering, 10(2):201–210, March 1984.
Christos Faloutsos. Signature-Based Text Retrieval Methods: A Survey. IEEE Data Engineering, pages 25–32, March 1990.
Christos Faloutsos and Stavros Christodoulakis. Description and Performance Analysis of Signature File Methods for Office Filing. ACM Transactions on Office Information Systems, 5(3):237–257, July 1987.
Christos Faloutsos and Raphael Chan. Fast Text Access Methods for Optical Disks: Designs and Performance Comparison. In Proceedings of the 14th International Conference on Very Large Databases, pages 280–293, Long Beach, California, August 1988.
Christos Faloutsos and H. V. Jagadish. Hybrid Index Organizations for Text Databases. Technical Report UMIACS-TR-91-33 and CS-TR-2621, Department of Computer Science, University of Maryland, March 1991.
R. Haskin. Special-Purpose Processors for Text Retrieval. Database Engineering, 4(1):16–29, September 1981.
Zheng Lin and Christos Faloutsos. Frame Sliced Signature Files. IEEE Transactions on Knowledge and Data Engineering, 4(3):281–289, June 1992. Also available as UMD CS-TR-2146 and UMIACS-TR-88-88.
Zheng Lin. CAT: An Execution Model for Concurrent Full Text Search. In PDIS, 1992.
D. L. Lee and C. W. Leng. Partitioned Signature File: Designs and Performance Evaluation. ACM Transactions on Office Information Systems, 7(2):158–180, April 1989.
George Panagopoulos. Bit-Sliced Signature Files for Very Large Databases on a Parallel Machine Architecture. Technical Report CSC-809, Department of Computer Science, University of Maryland, April 1992.
Ron Sacks-Davis. Two Level Superimposed Coding Scheme for Partial Match Retrieval. Information Systems, 8(4):273–280, 1983.
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
Craig Stanfill. Parallel Computing for Information Retrieval: Recent Developments. Technical Report DR88-1, Thinking Machines Corporation, Cambridge, Mass., January 1988.
Simon Stiassny. Mathematical Analysis of Various Superimposed Coding Methods. American Documentation, 11(2):155–169, February 1960.
Harold S. Stone. Parallel Querying of Large Databases: A Case Study. IEEE Computer, 20(10):11–21, October 1987.
D. Tsichritzis and S. Christodoulakis. Message Files. ACM Transactions on Office Information Systems, 1(1):88–98, January 1983.
Thinking Machines Corporation, Cambridge, Mass. Parallel Instruction Set, Version 5.2, October 1989.
G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, MA, 1949.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Panagopoulos, G., Faloutsos, C. (1994). Bit-sliced signature files for very large text databases on a parallel machine architecture. In: Jarke, M., Bubenko, J., Jeffery, K. (eds) Advances in Database Technology — EDBT '94. EDBT 1994. Lecture Notes in Computer Science, vol 779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57818-8_65
DOI: https://doi.org/10.1007/3-540-57818-8_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57818-5
Online ISBN: 978-3-540-48342-7
eBook Packages: Springer Book Archive