Flexible Workload-Aware Clustering of XML Documents
We investigate workload-directed physical data clustering in native XML database and repository systems. We present a practical algorithm for clustering XML documents, called XC, which is based on Lukes' tree partitioning algorithm. XC carefully approximates certain aspects of Lukes' algorithm to substantially reduce memory and time usage. XC can operate with varying degrees of precision, even in memory-constrained environments. Experimental results indicate that XC produces better partitions than a workload-directed depth-first scan-and-store scheme, at only a slight cost in running time. We demonstrate that XC is substantially faster than the exact Lukes' algorithm, with only a minimal loss in clustering quality. Results also indicate that XC can exploit application workload information to generate XML clustering solutions that lead to a major reduction in page faults for the workload under consideration.
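To make the underlying problem concrete, here is a minimal sketch of an exact Lukes-style tree partitioning dynamic program, the kind of computation that XC approximates. It is not XC itself, and the formulation is an assumption for illustration. Each XML node has an integer storage size, each parent-child edge carries a weight (assumed here to be a workload-derived traversal frequency), and the capacity stands in for the page size. The goal is to split the tree into connected subtrees of total size at most the capacity while maximizing the weight of edges kept inside a subtree, or equivalently minimizing the weight of edges that cross pages. The names `lukes_partition`, `children`, `node_size`, and `capacity` are hypothetical.

```python
from typing import Dict, List, Tuple

# State: (kept edge weight, nodes in v's still-open component, closed components).
# Storing whole partitions per table entry mirrors the exact method's high memory use.
State = Tuple[int, Tuple[int, ...], Tuple[Tuple[int, ...], ...]]


def lukes_partition(
    root: int,
    children: Dict[int, List[Tuple[int, int]]],  # node -> [(child, edge_weight)]
    node_size: Dict[int, int],  # hypothetical: each size assumed <= capacity
    capacity: int,  # hypothetical stand-in for the page size
) -> Tuple[int, List[List[int]]]:
    def solve(v: int) -> Dict[int, State]:
        # Table keyed by the total size of the component containing v.
        table: Dict[int, State] = {node_size[v]: (0, (v,), ())}
        for u, edge_w in children.get(v, []):
            child_table = solve(u)
            # Best way to finish u's subtree when edge (v, u) is cut.
            best = max(child_table.values(), key=lambda s: s[0])
            cut_value, cut_closed = best[0], best[2] + (best[1],)
            new_table: Dict[int, State] = {}

            def offer(k: int, state: State) -> None:
                # Keep only the best state for each component size k.
                if k not in new_table or state[0] > new_table[k][0]:
                    new_table[k] = state

            for k1, (val1, open1, closed1) in table.items():
                # Option 1: cut (v, u); u's component becomes its own partition.
                offer(k1, (val1 + cut_value, open1, closed1 + cut_closed))
                # Option 2: keep (v, u); merge u's open component into v's.
                for k2, (val2, open2, closed2) in child_table.items():
                    if k1 + k2 <= capacity:
                        offer(k1 + k2, (val1 + val2 + edge_w,
                                        open1 + open2, closed1 + closed2))
            table = new_table
        return table

    value, open_comp, closed = max(solve(root).values(), key=lambda s: s[0])
    return value, [list(c) for c in closed + (open_comp,)]


if __name__ == "__main__":
    # Toy tree: node 0 has children 1 and 2; node 1 has child 3.
    children = {0: [(1, 6), (2, 1)], 1: [(3, 4)]}
    sizes = {0: 2, 1: 2, 2: 1, 3: 1}
    value, parts = lukes_partition(0, children, sizes, capacity=4)
    print(value, parts)  # prints 6 [[3], [2], [0, 1]]
```

Each per-node table has up to `capacity` entries, and merging a child costs roughly the product of two table sizes, so both time and memory grow with the page size. That growth is the kind of cost an approximate variant would aim to reduce. The specific approximations XC uses are not reproduced here.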
Keywords: Memory Usage, Optimal Partition, Chunk Size, Page Fault, XPath Query
1. Bohannon, P., Freire, J., Roy, P., Simeon, J.: From XML Schema to Relations: A Cost-Based Approach to XML Storage. In: Proceedings of the 18th IEEE International Conference on Data Engineering, pp. 64–80 (2002)
2. Bordawekar, R., Shmueli, O.: Flexible Workload-aware Clustering of XML Documents. Technical report, IBM T. J. Watson Research Center (May 2004)
3. Fiebig, T., Helmer, S., Kanne, C., Mildenberger, J., Moerkotte, G., Schiele, R., Westmann, T.: Anatomy of a Native XML Database System. Technical report, University of Mannheim (2002)
4. Florescu, D., Kossmann, D.: Storing and Querying XML Data using an RDBMS. IEEE Data Engineering Bulletin 22(3), 27–34 (1999)
6. Gerlhof, C.A., Kemper, A., Kilger, C., Moerkotte, G.: Partition-Based Clustering in Object Bases: From Theory to Practice. In: Lomet, D.B. (ed.) FODO 1993. LNCS, vol. 730, pp. 301–316. Springer, Heidelberg (1993)
7. Johnson, D.S., Niemi, K.A.: On Knapsacks, Partitions, and a New Dynamic Programming Technique for Trees. Mathematics of Operations Research 8(1) (1983)
8. Kanne, C., Moerkotte, G.: Efficient Storage of XML Data. In: Proceedings of the 16th International Conference on Data Engineering, IEEE Computer Society, Los Alamitos (March 2000)