BitCube: A Three-Dimensional Bitmap Indexing for XML Documents
XML is a new standard for exchanging and representing information on the Internet. Documents can be represented hierarchically by XML elements. In this paper, we propose that an XML document collection be represented and indexed using a bitmap indexing technique. We define similarity and popularity operations suitable for bitmap indexes. We also define two statistical measurements in the BitCube: center and radius. Based on these measurements, we describe a new bitmap-index-based technique for clustering XML documents. The clustering techniques are motivated by the fact that the bitmap indexes are expected to be very sparse.
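The abstract names the operations without defining them, so the following is only a minimal sketch under stated assumptions, not the paper's definitions. It assumes each document is a row of bits (one bit per XML element path), stored as a Python integer; that similarity is one minus the normalized Hamming distance; that popularity is the number of documents with a given bit set; that the center is the majority-vote bit vector; and that the radius is the largest Hamming distance from the center. All function names are hypothetical.

```python
from typing import List


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two rows differ."""
    return popcount(a ^ b)


def similarity(a: int, b: int, width: int) -> float:
    """Assumed definition: 1 minus the Hamming distance normalized by row width."""
    return 1.0 - hamming(a, b) / width


def popularity(rows: List[int], col: int) -> int:
    """Assumed definition: how many documents have bit `col` set."""
    return sum((r >> col) & 1 for r in rows)


def center(rows: List[int], width: int) -> int:
    """Assumed definition: a bit is set if a strict majority of rows set it."""
    c = 0
    for col in range(width):
        if 2 * popularity(rows, col) > len(rows):
            c |= 1 << col
    return c


def radius(rows: List[int], width: int) -> int:
    """Assumed definition: largest Hamming distance from any row to the center."""
    c = center(rows, width)
    return max(hamming(r, c) for r in rows)


# Three hypothetical documents over four element paths (bit 0 is the rightmost).
docs = [0b1011, 0b1001, 0b0011]
```

Under these assumptions, a cluster of documents can be summarized by its center and radius alone, which stays cheap when rows are sparse because only XOR and bit counting are involved.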
Furthermore, the two-dimensional bitmap index is extended to a three-dimensional bitmap index, called the BitCube. Sophisticated querying of XML document collections can be performed using primitive operations such as slice, project, and dice. Experiments show that the BitCube can be created efficiently and that the primitive operations run more efficiently on the BitCube than on the alternatives tested.
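The three primitive operations can be illustrated with a small sketch. The abstract does not state what the three axes are or how the operations are defined, so this example assumes axes of (document, element path, word), stores only the set bits because the index is expected to be sparse, and uses OLAP-style meanings: slice fixes one axis to a single value, project collapses one axis with a logical OR, and dice restricts each axis to a subset of values. The class name mirrors the paper's term, but the methods are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Cell = Tuple[int, int, int]  # assumed axes: (document, element path, word)


class BitCube:
    """Sparse 3-D bitmap: only the coordinates of set bits are stored."""

    def __init__(self, cells: Iterable[Cell] = ()) -> None:
        self.cells: Set[Cell] = set(cells)

    def set_bit(self, doc: int, path: int, word: int) -> None:
        self.cells.add((doc, path, word))

    def slice(self, axis: int, index: int) -> Set[Tuple[int, int]]:
        """Fix one axis to `index`; return the 2-D bitmap of the other two axes."""
        return {c[:axis] + c[axis + 1:] for c in self.cells if c[axis] == index}

    def project(self, axis: int) -> Set[Tuple[int, int]]:
        """Collapse one axis: a 2-D bit is set if any bit along that axis is set."""
        return {c[:axis] + c[axis + 1:] for c in self.cells}

    def dice(
        self,
        docs: Optional[Set[int]] = None,
        paths: Optional[Set[int]] = None,
        words: Optional[Set[int]] = None,
    ) -> "BitCube":
        """Keep only bits whose coordinates fall in the given subsets (None = all)."""
        keep = (docs, paths, words)
        return BitCube(
            c for c in self.cells
            if all(k is None or v in k for k, v in zip(keep, c))
        )


# Hypothetical collection: four set bits across three documents.
cube = BitCube([(0, 0, 0), (0, 1, 2), (1, 1, 2), (2, 0, 1)])
by_word = cube.slice(axis=2, index=2)        # which (doc, path) pairs contain word 2
doc_path = cube.project(axis=2)              # the plain document-by-path bitmap
sub = cube.dice(docs={0, 1}, words={2})      # sub-cube for two documents and one word
```

Storing coordinates in a set is one design choice among several; a dense array of machine words would make bulk AND/OR faster at the cost of memory when most bits are zero.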