A performance study of four index structures for set-valued attributes of low cardinality

Abstract.

The efficient retrieval of data items on set-valued attributes is an important research topic that has attracted little attention so far. We studied and modified four index structures (sequential signature files, signature trees, extendible signature hashing, and inverted files) for fast retrieval of sets with low cardinality. We compared the index structures by implementing them and subjecting them to extensive experiments that investigated the influence of query set size, database size, domain size, and data distribution (synthetic and real). The experimental results clearly indicate that inverted files exhibit the best overall behavior of all tested index structures.
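To make the idea of an inverted file over set-valued attributes concrete, the following is a minimal sketch, not the implementation evaluated in the study. It assumes an in-memory index and a containment query (find all stored sets that include every element of the query set); the names `build_inverted_file` and `containment_query` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_file(sets_by_id):
    """Map each element to the sorted list of ids of the stored sets containing it.

    `sets_by_id` is an assumed input format: a dict from set id to a set of elements.
    """
    postings = defaultdict(list)
    for set_id in sorted(sets_by_id):
        for element in sets_by_id[set_id]:
            postings[element].append(set_id)
    return dict(postings)


def containment_query(postings, query):
    """Return the ids of stored sets that contain every element of `query`."""
    if not query:
        raise ValueError("query set must be non-empty in this sketch")
    # Fetch one posting list per query element; an unknown element yields no ids.
    lists = [postings.get(element, []) for element in query]
    # Start from the shortest list so intermediate results stay small.
    lists.sort(key=len)
    result = set(lists[0])
    for posting_list in lists[1:]:
        result &= set(posting_list)
        if not result:
            break  # no stored set can match any more
    return sorted(result)


# Illustrative data only, not taken from the study.
data = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "b", "c"}}
index = build_inverted_file(data)
matches = containment_query(index, {"a", "b"})
```

With low-cardinality query sets only a few posting lists have to be intersected, which is one plausible reading of why this structure suits the setting described above; the study itself should be consulted for the actual design and measurements.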

Author information

Correspondence to Sven Helmer.

Additional information

Received: 0 May 2000, Accepted: 18 October 2000, Published online: 17 September 2003

Edited by E. Bertino

About this article

Cite this article

Helmer, S., Moerkotte, G. A performance study of four index structures for set-valued attributes of low cardinality. VLDB 12, 244–261 (2003). https://doi.org/10.1007/s00778-003-0106-0

Keywords:

  • Database management systems
  • Physical design
  • Access methods
  • Index structures
  • Set-valued attributes