A New Sequential Mining Approach to XML Document Clustering*

  • Jeong Hee Hwang
  • Keun Ho Ryu
Conference paper

DOI: 10.1007/978-3-540-31849-1_27

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3399)
Cite this paper as:
Hwang J.H., Ryu K.H. (2005) A New Sequential Mining Approach to XML Document Clustering*. In: Zhang Y., Tanaka K., Yu J.X., Wang S., Li M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg

Abstract

XML has recently become very popular for representing semi-structured data and has become a standard for data exchange over the web because it is applicable across a wide range of applications. XML documents therefore form an important data mining domain. In this paper, we propose a new XML document clustering technique that uses a sequential pattern mining algorithm. Our approach first extracts representative structures, in the form of frequent patterns, from schemaless XML documents by using a sequential pattern mining algorithm. Then, unlike most previous document clustering methods, we apply a clustering algorithm for transactional data that does not require a measure of pairwise similarity, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of that transaction. We evaluated our clustering algorithm experimentally by comparing it with previous methods. The experimental results show that the proposed method is effective in terms of performance and produces clusters with higher cluster cohesion.
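
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the mining or clustering procedure, so everything here is an assumption made for illustration. Frequent structures are approximated by contiguous tag subpaths that occur in at least `min_support` documents (a simplification of sequential pattern mining), and clustering is a greedy single pass that compares each transaction with a cluster's accumulated item profile rather than with other documents. The names `tag_paths`, `contiguous_subpaths`, `frequent_structures`, `cluster_transactions`, and the `min_overlap` threshold are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-leaf tag paths (as tuples) in one document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, ())
    return paths


def contiguous_subpaths(path, min_len=2):
    """All contiguous tag subsequences of a path with at least min_len tags."""
    n = len(path)
    return {path[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def frequent_structures(docs, min_support):
    """Turn each document into a transaction of frequent subpaths.

    Assumption: a subpath is 'frequent' if it occurs in at least
    min_support documents. This stands in for a real sequential
    pattern mining algorithm.
    """
    per_doc = []
    for text in docs:
        subs = set()
        for path in tag_paths(text):
            subs |= contiguous_subpaths(path)
        per_doc.append(subs)
    support = Counter(s for subs in per_doc for s in subs)
    frequent = {s for s, count in support.items() if count >= min_support}
    return [subs & frequent for subs in per_doc]


def cluster_transactions(transactions, min_overlap=0.5):
    """Greedy single-pass clustering without pairwise document similarity.

    Each transaction joins the cluster whose item profile covers the
    largest share of its items, or opens a new cluster when that share
    is below min_overlap (a hypothetical threshold).
    """
    clusters = []  # each cluster: {"items": Counter, "members": [doc index]}
    for idx, items in enumerate(transactions):
        best, best_score = None, 0.0
        for cluster in clusters:
            if not items:
                break
            covered = sum(1 for it in items if it in cluster["items"])
            score = covered / len(items)
            if score > best_score:
                best, best_score = cluster, score
        if best is None or best_score < min_overlap:
            best = {"items": Counter(), "members": []}
            clusters.append(best)
        best["items"].update(items)
        best["members"].append(idx)
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author></book>",
        "<book><title/><author><name/></author><year/></book>",
        "<order><item><price/></item></order>",
        "<order><item><price/><qty/></item></order>",
    ]
    transactions = frequent_structures(docs, min_support=2)
    print(cluster_transactions(transactions))  # prints [[0, 1], [2, 3]]
```

Comparing a document only with cluster-level item profiles keeps the cost proportional to the number of clusters rather than the number of document pairs, which is the general motivation for avoiding a pairwise similarity measure.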

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jeong Hee Hwang (1)
  • Keun Ho Ryu (1)
  1. Database Laboratory, Chungbuk National University, Korea
