FXProj – A Fuzzy XML Documents Projected Clustering Based on Structure and Content

  • Tengfei Ji
  • Xiaoyuan Bao
  • Dongqing Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7120)


XML documents are inherently semi-structured, combining structural and content features. Most existing methods for clustering XML documents consider only one of these two aspects. In this paper, we propose a fuzzy projected clustering algorithm for XML documents that clusters them efficiently by combining structural and content features. A further contribution is the use of fuzzy techniques, such that each frequent induced substructure has a fuzzy parameter associated with each cluster. Experimental results on both synthetic and real datasets demonstrate the algorithm's effectiveness, especially when applied to large schemaless XML document collections.
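
The abstract does not give the algorithm's formulas, so the following is only an illustrative sketch of the general idea it describes: each frequent substructure carries a per-cluster fuzzy parameter, and a document is scored against a cluster by combining a structural score with a content score. All names here (`structural_score`, `content_score`, `assign`, the mixing weight `alpha`) and the scoring formulas themselves are assumptions made for illustration, not the authors' definitions.

```python
from math import sqrt
from typing import Dict, Set

# Assumption: fuzzy parameters are stored as weights[cluster][substructure] in [0, 1].
FuzzyWeights = Dict[str, Dict[str, float]]
TermVector = Dict[str, float]


def structural_score(doc_subs: Set[str], cluster_weights: Dict[str, float]) -> float:
    """Share of a cluster's total fuzzy weight covered by the document's substructures."""
    total = sum(cluster_weights.values())
    if total == 0:
        return 0.0
    covered = sum(w for sub, w in cluster_weights.items() if sub in doc_subs)
    return covered / total


def content_score(doc_terms: TermVector, centroid: TermVector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * centroid.get(term, 0.0) for term, v in doc_terms.items())
    norm_doc = sqrt(sum(v * v for v in doc_terms.values()))
    norm_cen = sqrt(sum(v * v for v in centroid.values()))
    if norm_doc == 0 or norm_cen == 0:
        return 0.0
    return dot / (norm_doc * norm_cen)


def assign(doc_subs: Set[str], doc_terms: TermVector, weights: FuzzyWeights,
           centroids: Dict[str, TermVector], alpha: float = 0.5) -> str:
    """Return the cluster with the highest combined score.

    alpha is a hypothetical mixing weight between structure and content.
    """
    def combined(cluster: str) -> float:
        s = structural_score(doc_subs, weights[cluster])
        c = content_score(doc_terms, centroids[cluster])
        return alpha * s + (1 - alpha) * c

    return max(weights, key=combined)


# Toy data, invented for this example only.
weights = {"A": {"a/b": 1.0, "a/c": 0.5}, "B": {"x/y": 1.0}}
centroids = {"A": {"db": 1.0}, "B": {"web": 1.0}}
chosen = assign({"a/b"}, {"db": 2.0}, weights, centroids)  # "A"
```

In this toy run the document shares a substructure and a term only with cluster `A`, so `A` wins for any `alpha` between 0 and 1. A real projected clustering would also update the fuzzy parameters iteratively, which this sketch omits.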


Synthetic Dataset, Content Feature, Subspace Cluster, Fuzzy Parameter, Fuzzy Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tengfei Ji (1)
  • Xiaoyuan Bao (1)
  • Dongqing Yang (1)
  1. Peking University, Beijing, China
