Abstract
XML documents are a special kind of data with a hierarchical structure, and typical clustering algorithms do not meet the requirements that the analysis of such data imposes. This paper presents a novel clustering method dedicated to XML documents, called Multilevel clustering of XML documents (ML). The method clusters feature vectors that encode XML documents at different levels of their structure. The paper proposes applying the Conditional Fuzzy C-Means algorithm within the ML method, and it discusses and proves the advantage of this fuzzy approach over the hard approach to the ML algorithm. It also discusses an application of the ML method to accelerating query execution on XML documents. Experiments on two data sets with different characteristics show that the proposed multilevel conditional fuzzy clustering of XML documents outperforms hard multilevel clustering.
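To make the fuzzy building block concrete, the following is a minimal pure-Python sketch of Conditional Fuzzy C-Means in the form introduced by Pedrycz (1996): each point k carries a condition value f_k in [0, 1], and its memberships across all clusters sum to f_k rather than to 1. It is an illustration under stated assumptions, not the paper's implementation. The function name `conditional_fcm`, the toy bit vectors, and the condition values are hypothetical, and the abstract does not say how the ML method derives conditions between structure levels.

```python
import math
import random


def conditional_fcm(points, conditions, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Conditional Fuzzy C-Means sketch (hypothetical helper, not the paper's code).

    points:     list of equal-length feature vectors
    conditions: list of f_k in [0, 1]; memberships of point k sum to f_k
    Returns (centers, u) where u[i][k] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise centers with randomly chosen data points (an assumption).
    centers = [list(points[i]) for i in rng.sample(range(len(points)), n_clusters)]
    u = [[0.0] * len(points) for _ in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership update: u_ik = f_k / sum_j (d_ik / d_jk)^(2/(m-1)).
        for k, x in enumerate(points):
            d = [math.dist(x, c) for c in centers]
            zeros = [i for i, di in enumerate(d) if di == 0.0]
            for i in range(n_clusters):
                if zeros:
                    # Point coincides with one or more centers: split f_k among them.
                    u[i][k] = conditions[k] / len(zeros) if i in zeros else 0.0
                else:
                    u[i][k] = conditions[k] / sum((d[i] / dj) ** exponent for dj in d)
        # Center update: mean of the points weighted by u_ik^m.
        shift = 0.0
        for i in range(n_clusters):
            w = [uik ** m for uik in u[i]]
            total = sum(w)
            if total == 0.0:
                continue  # no point supports this cluster; keep the old center
            new_c = [sum(wk * x[j] for wk, x in zip(w, points)) / total for j in range(dim)]
            shift = max(shift, math.dist(new_c, centers[i]))
            centers[i] = new_c
        if shift < tol:
            break
    return centers, u


# Toy bit-encoded "documents" and made-up condition values.
docs = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
f = [1.0, 0.8, 1.0, 0.5]
centers, u = conditional_fcm(docs, f, n_clusters=2)
```

Setting every condition value to 1 reduces the update to ordinary Fuzzy C-Means, which is why the condition vector is the natural place to inject context from another clustering level.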
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozielski, M. (2007). Multilevel Conditional Fuzzy C-Means Clustering of XML Documents. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_55
DOI: https://doi.org/10.1007/978-3-540-74976-9_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9
eBook Packages: Computer Science (R0)