The VLDB Journal, Volume 12, Issue 3, pp 244–261

A performance study of four index structures for set-valued attributes of low cardinality

  • Sven Helmer
  • Guido Moerkotte


The efficient retrieval of data items on set-valued attributes is an important research topic that has attracted little attention so far. We studied and modified four index structures (sequential signature files, signature trees, extendible signature hashing, and inverted files) for fast retrieval of sets with low cardinality. We compared the index structures by implementing them and subjecting them to extensive experiments that investigated the influence of query set size, database size, domain size, and data distribution (synthetic and real). The results clearly indicate that inverted files exhibit the best overall behavior of all tested index structures.
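
To make the idea of an inverted file over set-valued attributes concrete, here is a minimal in-memory sketch of how such an index could answer a containment query (find all items whose set includes every element of the query set). It is not the authors' implementation; the class name `InvertedFile`, its methods, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class InvertedFile:
    """Toy in-memory inverted file over set-valued attributes (illustrative only)."""

    def __init__(self):
        # element -> set of ids of the data items whose set contains that element
        self.postings = defaultdict(set)

    def insert(self, item_id, values):
        """Register a data item under every element of its set-valued attribute."""
        for v in values:
            self.postings[v].add(item_id)

    def containing(self, query):
        """Return ids of items whose set is a superset of `query`."""
        query = set(query)
        if not query:
            raise ValueError("query set must be non-empty")
        # Intersect the shortest posting lists first to keep intermediates small.
        lists = sorted((self.postings.get(v, set()) for v in query), key=len)
        result = set(lists[0])
        for plist in lists[1:]:
            result &= plist
            if not result:
                break  # no item can match once the intersection is empty
        return result


# Hypothetical sample data: three items with low-cardinality sets.
index = InvertedFile()
index.insert(1, {"a", "b", "c"})
index.insert(2, {"b", "c"})
index.insert(3, {"c", "d"})
```

With this data, `index.containing({"b", "c"})` yields the ids of items 1 and 2. A disk-based index would store the posting lists in pages and might compress them; the sketch ignores storage layout entirely.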


Database management systems, Physical design, Access methods, Index structures, Set-valued attributes





Copyright information

© Springer-Verlag Berlin/Heidelberg 2003

Authors and Affiliations

  1. Lehrstuhl für Praktische Informatik III, Universität Mannheim, Mannheim, Germany
