A Hybrid Index Structure for Set-Valued Attributes Using Itemset Tree and Inverted List

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

The use of set-valued objects is becoming increasingly commonplace in modern application domains such as multimedia, genetics, and the stock market. Recent research on set indexing has focused mainly on containment joins and data mining, without considering basic set operations on set-valued attributes. In this paper, we propose a novel indexing scheme for processing superset, subset and equality queries on set-valued attributes. The proposed index structure is a hybrid of an itemset-transaction set tree of “frequent items” and an inverted list of “infrequent items” that takes advantage of developments in itemset research in data mining. In this hybrid scheme, the expectation is that basic set operations on frequent, low-cardinality sets will yield superior retrieval performance while avoiding the high cost of constructing and maintaining an itemset tree for infrequent, large itemsets. We demonstrate, through extensive experiments, that the proposed method performs as expected and yields superior overall performance compared to the state-of-the-art indexing scheme for set-valued attributes, i.e., inverted lists.
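To make the three query types concrete, the following is a minimal Python sketch of the baseline the abstract compares against: a plain inverted list over set-valued records. It is not the hybrid index proposed in the paper. The class name `InvertedIndex` and the method names `containing`, `contained_in` and `equal_to` are assumptions introduced here for illustration; they are named by what they return because the terms "subset query" and "superset query" are used with differing conventions in the literature.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted list: maps each item to the ids of records containing it.

    Illustrative only; all names here are hypothetical, not from the paper.
    """

    def __init__(self, records):
        # records: dict mapping record id -> iterable of items
        self.records = {rid: frozenset(items) for rid, items in records.items()}
        self.postings = defaultdict(set)
        for rid, items in self.records.items():
            for item in items:
                self.postings[item].add(rid)

    def containing(self, query):
        """Ids of records that are supersets of the query (query <= record)."""
        query = frozenset(query)
        if not query:
            return set(self.records)
        # Intersect posting lists, shortest first, to keep intermediates small.
        lists = sorted((self.postings.get(i, set()) for i in query), key=len)
        result = set(lists[0])
        for plist in lists[1:]:
            result &= plist
        return result

    def contained_in(self, query):
        """Ids of records that are subsets of the query (record <= query)."""
        query = frozenset(query)
        hits = defaultdict(int)
        for item in query:
            for rid in self.postings.get(item, ()):
                hits[rid] += 1
        # A record qualifies when every one of its items was matched.
        result = {rid for rid, n in hits.items() if n == len(self.records[rid])}
        # Empty records never appear in a posting list but are subsets of any set.
        result |= {rid for rid, items in self.records.items() if not items}
        return result

    def equal_to(self, query):
        """Ids of records whose item set equals the query."""
        query = frozenset(query)
        return {
            rid
            for rid in self.containing(query)
            if len(self.records[rid]) == len(query)
        }


if __name__ == "__main__":
    index = InvertedIndex({1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"b"}, 4: {"c", "d"}})
    print(index.containing({"a", "b"}))
    print(index.contained_in({"a", "b"}))
    print(index.equal_to({"a", "b"}))
```

In this sketch, a query over items with long posting lists has to touch many record ids before any can be discarded, which gives some intuition for why an index might treat frequent items differently from infrequent ones.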

This research was partially supported by National Science Foundation grants CNS 0521454 and IIS 0612203.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hossain, S., Jamil, H. (2010). A Hybrid Index Structure for Set-Valued Attributes Using Itemset Tree and Inverted List. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_30

  • DOI: https://doi.org/10.1007/978-3-642-15364-8_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15363-1

  • Online ISBN: 978-3-642-15364-8

  • eBook Packages: Computer Science, Computer Science (R0)
