Abstract
The use of set-valued objects is becoming increasingly commonplace in modern application domains such as multimedia, genetics, and the stock market. Recent research on set indexing has focused mainly on containment joins and data mining, without considering basic set operations on set-valued attributes. In this paper, we propose a novel indexing scheme for processing superset, subset and equality queries on set-valued attributes. The proposed index structure is a hybrid of an itemset-transaction set tree of “frequent items” and an inverted list of “infrequent items”, and it takes advantage of developments in itemset research in data mining. The expectation is that this hybrid scheme yields superior retrieval performance for basic set operations on frequent, low-cardinality sets, while avoiding the high cost of constructing and maintaining an itemset tree for infrequent, large itemsets. We demonstrate, through extensive experiments, that the proposed method performs as expected and yields superior overall performance compared to the state-of-the-art indexing scheme for set-valued attributes, namely inverted lists.
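To make the three query types concrete, the following is a minimal Python sketch, not the authors' structure. It only mirrors the split between frequent and infrequent items described in the abstract and uses a plain dictionary of posting sets for both parts, whereas the paper keeps frequent items in an itemset-transaction set tree. The class name `HybridSetIndex`, the `min_support` threshold, and the method names `containing`, `contained_in` and `equal_to` are all hypothetical. The method names were chosen to sidestep the question of which direction "subset" and "superset" refer to, since this preview does not define them.

```python
from collections import Counter, defaultdict


class HybridSetIndex:
    """Toy index (illustrative only): items are split by frequency into
    two posting maps. The frequent side is a stand-in for the paper's
    itemset tree, which is NOT implemented here."""

    def __init__(self, records, min_support=2):
        # records: dict mapping record id -> iterable of items
        self.records = {rid: frozenset(items) for rid, items in records.items()}
        counts = Counter(item for s in self.records.values() for item in s)
        # Hypothetical rule: an item is "frequent" if it occurs in at
        # least min_support records.
        self.frequent = {i for i, c in counts.items() if c >= min_support}
        self.freq_postings = defaultdict(set)
        self.rare_postings = defaultdict(set)
        for rid, s in self.records.items():
            for item in s:
                target = self.freq_postings if item in self.frequent else self.rare_postings
                target[item].add(rid)

    def _postings(self, item):
        # Look up the ids of records containing item (empty set if unseen).
        source = self.freq_postings if item in self.frequent else self.rare_postings
        return source.get(item, set())

    def containing(self, query):
        """Ids of records whose set is a superset of query."""
        query = frozenset(query)
        if not query:
            return set(self.records)
        # Intersect posting sets, shortest first, stopping early when empty.
        lists = sorted((self._postings(i) for i in query), key=len)
        result = set(lists[0])
        for postings in lists[1:]:
            result &= postings
            if not result:
                break
        return result

    def contained_in(self, query):
        """Ids of records whose set is a subset of query."""
        query = frozenset(query)
        hits = Counter()
        for item in query:
            for rid in self._postings(item):
                hits[rid] += 1
        # A record qualifies when every one of its items was hit.
        result = {rid for rid, n in hits.items() if n == len(self.records[rid])}
        # Empty records are subsets of any query.
        result |= {rid for rid, s in self.records.items() if not s}
        return result

    def equal_to(self, query):
        """Ids of records whose set equals query."""
        query = frozenset(query)
        return {rid for rid in self.containing(query)
                if len(self.records[rid]) == len(query)}
```

In this sketch the frequency split brings no performance benefit by itself; it only marks where a tree over frequent items would replace the first dictionary.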
This research was partially supported by National Science Foundation grants CNS 0521454 and IIS 0612203.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hossain, S., Jamil, H. (2010). A Hybrid Index Structure for Set-Valued Attributes Using Itemset Tree and Inverted List. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_30
DOI: https://doi.org/10.1007/978-3-642-15364-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science, Computer Science (R0)