A Framework for Clustering and Dynamic Maintenance of XML Documents

  • Ahmed Al-Shammari (corresponding author)
  • Chengfei Liu
  • Mehdi Naseriparsa
  • Bao Quoc Vo
  • Tarique Anwar
  • Rui Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10604)


Web data clustering has been widely studied in the data mining community. However, dynamically maintaining web data clusters remains a challenging task. In this paper, we propose a novel framework, XClusterMaint, that supports both the clustering and the maintenance of XML documents. For clustering, we take both structure and content into account and propose an efficient solution that groups documents by a combination of structural and content similarity. For maintenance, we propose an incremental approach that dynamically maintains the existing clusters as new XML documents arrive. Since dynamic cluster maintenance is computationally expensive, we also propose an improved approach that uses a lazy maintenance scheme to improve maintenance performance. Experimental results on real datasets verify the efficiency of the proposed clustering and maintenance model.


Keywords: Clustering XML documents · Structure and content similarity · Dynamic maintenance
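The abstract does not give formulas, so the following Python sketch only illustrates the general shape of the three ideas it names, under assumptions that are ours and not the authors'. Each document is summarized as a set of root-to-node tag paths (structure) and a bag of text terms (content); structure similarity is assumed to be Jaccard over paths, content similarity cosine over term counts, and the two are mixed with a hypothetical weight `alpha`. A new document joins the cluster with the highest average combined similarity if that value reaches a hypothetical `threshold`, otherwise it starts a new cluster. The lazy variant simply buffers arrivals and runs that assignment once a hypothetical `batch_size` of documents is waiting. All names (`Doc`, `LazyClusterMaintainer`, and so on) are illustrative, not XClusterMaint's actual interface.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical XML document summary: tag paths for structure, term counts for content."""
    doc_id: str
    paths: frozenset
    terms: Counter


def structure_sim(a: Doc, b: Doc) -> float:
    """Assumed structural measure: Jaccard similarity of the tag-path sets."""
    union = a.paths | b.paths
    return len(a.paths & b.paths) / len(union) if union else 1.0


def content_sim(a: Doc, b: Doc) -> float:
    """Assumed content measure: cosine similarity of term-frequency vectors."""
    dot = sum(cnt * b.terms[t] for t, cnt in a.terms.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.terms.values()))
    norm_b = math.sqrt(sum(c * c for c in b.terms.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_sim(a: Doc, b: Doc, alpha: float = 0.5) -> float:
    """Weighted mix of structure and content similarity (alpha is a hypothetical weight)."""
    return alpha * structure_sim(a, b) + (1 - alpha) * content_sim(a, b)


class LazyClusterMaintainer:
    """Hypothetical incremental clusterer that defers maintenance until a batch is buffered."""

    def __init__(self, threshold: float = 0.5, alpha: float = 0.5, batch_size: int = 3):
        self.threshold = threshold
        self.alpha = alpha
        self.batch_size = batch_size
        self.clusters = []  # each cluster is a list of Doc objects
        self.pending = []   # documents received but not yet assigned

    def add(self, doc: Doc) -> None:
        """Lazy step: buffer the document and maintain clusters only when the buffer is full."""
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Incremental step: put each buffered document in its most similar cluster or a new one."""
        for doc in self.pending:
            best, best_sim = None, -1.0
            for cluster in self.clusters:
                # Average combined similarity between the new document and the cluster's members.
                sim = sum(combined_sim(doc, m, self.alpha) for m in cluster) / len(cluster)
                if sim > best_sim:
                    best, best_sim = cluster, sim
            if best is not None and best_sim >= self.threshold:
                best.append(doc)
            else:
                self.clusters.append([doc])
        self.pending.clear()


def make(doc_id, paths, text):
    """Build a toy Doc from a list of tag paths and whitespace-separated text."""
    return Doc(doc_id, frozenset(paths), Counter(text.split()))


# Toy usage with made-up documents.
m = LazyClusterMaintainer(threshold=0.5, batch_size=3)
m.add(make("d1", ["/flight", "/flight/from", "/flight/to"], "melbourne sydney flight"))
m.add(make("d2", ["/flight", "/flight/from", "/flight/to"], "sydney melbourne flight"))
print(len(m.clusters))  # 0: both documents are still buffered
m.add(make("d3", ["/airport", "/airport/name"], "tullamarine airport"))
print([[d.doc_id for d in c] for c in m.clusters])  # [['d1', 'd2'], ['d3']]
```

Deferring work in `add` trades freshness for throughput: between flushes, up to `batch_size - 1` documents are not yet reflected in any cluster, so a caller would invoke `flush()` at the end of a stream or before reading the clusters. For simplicity the sketch compares each new document against every member of a cluster; a practical implementation would more likely keep a compact per-cluster summary.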



This work was partially supported by the ARC Discovery Project under Grant No. DP170104747 and the Iraqi Ministry of Higher Education and Scientific Research.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ahmed Al-Shammari, Chengfei Liu, Mehdi Naseriparsa, Bao Quoc Vo, Tarique Anwar, Rui Zhou: Swinburne University of Technology, Melbourne, Australia