Bit-sliced signature files for very large text databases on a parallel machine architecture

  • George Panagopoulos
  • Christos Faloutsos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 779)

Abstract

Free text retrieval is an important problem that can benefit significantly from a parallel architecture. Signature methods have been proposed to answer text retrieval queries on parallel machines [Sta88, LF92], under the assumption that main memory is large enough to hold the entire signature file. We propose the use of a Parallel Bit-Sliced Signature File method on a SIMD machine architecture when the size of the signature file exceeds the available memory. Rather than examining all the bit slices, we use a partial-fetch slice-swapping algorithm. With this method, performance degrades gracefully as the database size grows. We provide formulae for the optimal number of signature slices to fetch and match against the query signature. Arithmetic examples show that our method can handle a 128 GB database with a 2 sec response time on a machine with the characteristics of the Connection Machine.
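
The idea of a bit-sliced signature file with a partial fetch can be illustrated with a small, sequential Python sketch. This is not the authors' algorithm or their SIMD implementation: the signature length F, the number of bits per word M, the SHA-256 based hashing, and the function names word_bits, doc_signature, build_slices, and query are all assumptions made for illustration. Each slice is stored as one integer whose bit d tells whether document d has that signature bit set; a query ANDs together only the slices that its own signature selects, optionally capped at max_slices.

```python
import hashlib

F = 64  # signature length in bits (assumed value, for illustration only)
M = 3   # bits set per word (assumed value, for illustration only)

def word_bits(word, f=F, m=M):
    """Return the m distinct bit positions a word sets in an f-bit signature."""
    positions = set()
    i = 0
    while len(positions) < m:
        digest = hashlib.sha256(f"{word}:{i}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % f)
        i += 1
    return positions

def doc_signature(words):
    """Superimpose (OR) the word signatures of one document."""
    bits = set()
    for w in words:
        bits |= word_bits(w)
    return bits

def build_slices(docs, f=F):
    """Slice j is an integer whose bit d is 1 iff document d has bit j set."""
    slices = [0] * f
    for d, words in enumerate(docs):
        for j in doc_signature(words):
            slices[j] |= 1 << d
    return slices

def query(slices, n_docs, words, max_slices=None):
    """AND together the slices selected by the query, at most max_slices of them.

    Examining fewer slices never loses a qualifying document; it can only
    admit more false drops, which must be filtered out afterwards.
    """
    positions = sorted(set().union(*(word_bits(w) for w in words)))
    if max_slices is not None:
        positions = positions[:max_slices]
    candidates = (1 << n_docs) - 1  # start with every document as a candidate
    for j in positions:
        candidates &= slices[j]
    return [d for d in range(n_docs) if candidates >> d & 1]

docs = [["parallel", "text"], ["signature", "file"], ["parallel", "signature"]]
slices = build_slices(docs)
print(query(slices, len(docs), ["parallel"]))                # all slices
print(query(slices, len(docs), ["parallel"], max_slices=1))  # partial fetch
```

In this sketch the candidate list from a partial fetch is always a superset of the list obtained from all query slices, which is the trade-off the abstract refers to: fewer slices fetched in exchange for more false drops to resolve.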

References

  1. [CF84]
    Stavros Christodoulakis and Christos Faloutsos. Design Considerations for a Message File Server. IEEE Transactions on Software Engineering, 10(2):201–210, March 1984.
  2. [Fal90]
    Christos Faloutsos. Signature-Based Text Retrieval Methods: A Survey. IEEE Data Engineering, pages 25–32, March 1990.
  3. [FC87]
    Christos Faloutsos and Stavros Christodoulakis. Description and Performance Analysis of Signature File Methods for Office Filing. ACM Transactions on Office Information Systems, 5(3):237–257, July 1987.
  4. [FC88]
    Christos Faloutsos and Raphael Chan. Fast Text Access Methods for Optical Disks: Designs and Performance Comparison. In Proceedings of the 14th International Conference on Very Large Databases, pages 280–293, Long Beach, California, August 1988.
  5. [FJ91]
    Christos Faloutsos and H. V. Jagadish. Hybrid Index Organizations for Text Databases. Technical Report UMIACS-TR-91-33 and CS-TR-2621, Department of Computer Science, University of Maryland, March 1991.
  6. [Has81]
    R. Haskin. Special-Purpose Processors for Text Retrieval. Database Engineering, 4(1):16–29, September 1981.
  7. [LF92]
    Zheng Lin and Christos Faloutsos. Frame Sliced Signature Files. IEEE Transactions on Knowledge and Data Engineering, 4(3):158–180, June 1992. Also available as UMD CS-TR-2146 and UMIACS-TR-88-88.
  8. [Lin92]
    Zheng Lin. CAT: An Execution Model for Concurrent Full Text Search. In PDIS, 1992.
  9. [LL89]
    D. L. Lee and C. W. Leng. Partitioned Signature File: Designs and Performance Evaluation. ACM Transactions on Office Information Systems, 7(2):158–180, April 1989.
  10. [Pan92]
    George Panagopoulos. Bit-Sliced Signature Files for Very Large Databases on a Parallel Machine Architecture. Technical Report CSC-809, Department of Computer Science, University of Maryland, April 1992.
  11. [SD83]
    Ron Sacks-Davis. Two Level Superimposed Coding Scheme for Partial Match Retrieval. Information Systems, 8(4):273–280, 1983.
  12. [SM83]
    G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
  13. [Sta88]
    Craig Stanfill. Parallel Computing for Information Retrieval: Recent Developments. Technical Report DR88-1, Thinking Machines Corporation, Cambridge, Mass., January 1988.
  14. [Sti60]
    Simon Stiassny. Mathematical Analysis of Various Superimposed Coding Methods. American Documentation, 11(2):155–169, February 1960.
  15. [Sto87]
    Harold S. Stone. Parallel Querying of Large Databases: A Case Study. IEEE Computer, 20(10):11–21, October 1987.
  16. [TC83]
    D. Tsichritzis and S. Christodoulakis. Message Files. ACM Transactions on Office Information Systems, 1(1):88–98, January 1983.
  17. [Thi89]
    Thinking Machines Corporation, Cambridge, Mass. Parallel Instruction Set, Version 5.2, October 1989.
  18. [Zip49]
    G. K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, MA, 1949.

Copyright information

© Springer-Verlag 1994

Authors and Affiliations

  • George Panagopoulos (1)
  • Christos Faloutsos (1)
  1. Department of Computer Science and Institute for Systems Research (ISR), University of Maryland, College Park