Journal of Intelligent Information Systems, Volume 17, Issue 2–3, pp 241–254

BitCube: A Three-Dimensional Bitmap Indexing for XML Documents

  • Jong P. Yoon
  • Vijay Raghavan
  • Venu Chakilam
  • Larry Kerschberg

Abstract

XML is a new standard for exchanging and representing information on the Internet. Documents can be represented hierarchically by XML elements. In this paper, we propose that an XML document collection be represented and indexed using a bitmap indexing technique. We define similarity and popularity operations suitable for bitmap indexes. We also define two statistical measurements in the BitCube: center and radius. Based on these measurements, we describe a new bitmap-indexing-based technique for clustering XML documents. The clustering techniques are motivated by the fact that the bitmap indexes are expected to be very sparse.
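
As a rough illustration of the ideas in the paragraph above, the following minimal sketch builds a 2-dimensional bitmap index with one row per document and one bit per element path. The abstract does not give the formulas, so everything here is an assumption made for illustration: the PATHS vocabulary, the Jaccard-style similarity, the popularity ratio, the majority-vote center, and the Hamming-distance radius are hypothetical stand-ins and may differ from the definitions in the paper.

```python
# Hypothetical element-path vocabulary; one bit position per path.
PATHS = ["/article/title", "/article/author", "/article/abstract", "/book/title"]

def encode(doc_paths):
    """Pack the set of element paths present in a document into an int bitmap."""
    bits = 0
    for i, path in enumerate(PATHS):
        if path in doc_paths:
            bits |= 1 << i
    return bits

def popcount(bits):
    """Number of 1 bits in the bitmap."""
    return bin(bits).count("1")

def similarity(a, b):
    """Assumed measure: shared bits (AND) over all set bits (OR)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0

def popularity(rows, i):
    """Assumed measure: fraction of documents whose bit i is set."""
    return sum((r >> i) & 1 for r in rows) / len(rows)

def center(rows):
    """Assumed center: bit i is set when at least half of the rows set it."""
    c = 0
    for i in range(len(PATHS)):
        if 2 * sum((r >> i) & 1 for r in rows) >= len(rows):
            c |= 1 << i
    return c

def radius(rows):
    """Assumed radius: largest Hamming distance (XOR popcount) from the center."""
    c = center(rows)
    return max(popcount(r ^ c) for r in rows)

docs = [
    encode({"/article/title", "/article/author"}),
    encode({"/article/title", "/article/abstract"}),
    encode({"/book/title"}),
]
print(similarity(docs[0], docs[1]), popularity(docs, 0), center(docs), radius(docs))
```

Every measure reduces to AND, OR, XOR and bit counting, which is why bitmap rows suit this kind of comparison. A production index for sparse data would likely use a compressed bitmap rather than plain integers.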

Furthermore, a 2-dimensional bitmap index is extended to a 3-dimensional bitmap index, called the BitCube. Sophisticated querying of XML document collections can be performed using primitive operations such as slice, project, and dice. Experiments show that the BitCube can be created efficiently and the primitive operations can be performed more efficiently with the BitCube than with other alternatives.
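
The three primitive operations can be pictured with a small sketch. The abstract does not name the three dimensions or spell out the semantics of each operation, so this example assumes axes of (document, element path, word) and uses the usual data-cube reading of slice, project, and dice. The function names and the sparse set-of-coordinates storage are hypothetical choices, not the paper's implementation.

```python
# Sparse 3-D bitmap: store only the (document, path, word) cells whose bit is 1.
cube = {
    ("d1", "/article/title", "xml"),
    ("d1", "/article/abstract", "bitmap"),
    ("d2", "/article/title", "bitmap"),
    ("d2", "/article/abstract", "index"),
}

def drop_axis(cell, axis):
    """Return the cell's coordinates with one axis removed."""
    return tuple(c for i, c in enumerate(cell) if i != axis)

def slice_(cube, axis, value):
    """Fix one axis to a single value, leaving a 2-D bitmap over the other two."""
    return {drop_axis(cell, axis) for cell in cube if cell[axis] == value}

def project(cube, axis):
    """Collapse one axis with OR: a pair is set if any value along that axis sets it."""
    return {drop_axis(cell, axis) for cell in cube}

def dice(cube, docs=None, paths=None, words=None):
    """Keep a sub-cube by restricting each axis to a set of allowed values."""
    allowed = (docs, paths, words)
    return {cell for cell in cube
            if all(a is None or v in a for v, a in zip(cell, allowed))}

# Which (document, path) pairs contain the word "bitmap"?
print(sorted(slice_(cube, 2, "bitmap")))
# Which words occur anywhere in each document, ignoring the path?
print(sorted(project(cube, 1)))
# Restrict to document d1 and two words of interest.
print(sorted(dice(cube, docs={"d1"}, words={"xml", "index"})))
```

Storing only the set cells reflects the expectation that the index is very sparse. A dense bit array would represent the same cube at a much higher memory cost.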

XML document retrieval, document clustering, bitmap indexing, bit-wise operations

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Jong P. Yoon (1)
  • Vijay Raghavan (1)
  • Venu Chakilam (1)
  • Larry Kerschberg (2)

  1. Center for Advanced Computer Studies, University of Louisiana, Lafayette, USA
  2. E-Center for E-Business and Department of Information and Software Engineering, George Mason University, Fairfax, USA
