A New Sequential Mining Approach to XML Document Clustering*
- Cite this paper as:
- Hwang J.H., Ryu K.H. (2005) A New Sequential Mining Approach to XML Document Clustering*. In: Zhang Y., Tanaka K., Yu J.X., Wang S., Li M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg
XML has recently become very popular for representing semi-structured data, and it has become a standard for data exchange over the web because it applies to a wide range of applications. XML documents therefore form an important data mining domain. In this paper, we propose a new XML document clustering technique based on sequential pattern mining. Our approach first uses a sequential pattern mining algorithm to extract representative structures, in the form of frequent patterns, from schemaless XML documents. Then, unlike most previous document clustering methods, we apply a clustering algorithm for transactional data that requires no pairwise similarity measure: each XML document is treated as a transaction, and the frequent structures extracted from it are treated as the items of that transaction. We evaluated our clustering algorithm experimentally by comparing it with previous methods. The experimental results show that the proposed method is effective both in performance and in producing clusters with higher cohesion.
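The two-step pipeline described above can be sketched in Python under several simplifying assumptions. First, each document is reduced to its root-to-node tag paths, and a path counts as a "frequent structure" if it occurs in at least `min_support` documents. This is a much-simplified stand-in for a full sequential pattern miner such as PrefixSpan, not the paper's extraction step. Second, for clustering transactional data without pairwise similarity, the sketch uses CLOPE, a published transactional clustering algorithm that maximizes a global profit based on item occurrences per distinct item in each cluster. The abstract does not name the algorithm the authors used, so this choice is an assumption. The function names (`root_paths`, `to_transactions`, `clope`) and the repulsion parameter `r` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_paths(xml_text):
    """Return every root-to-node tag path of one document, e.g. 'book/title'."""
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add("/".join(path))
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def to_transactions(docs, min_support):
    """Simplified 'frequent structure' step: keep paths found in >= min_support docs."""
    per_doc = [root_paths(d) for d in docs]
    support = Counter(p for paths in per_doc for p in paths)
    frequent = {p for p, c in support.items() if c >= min_support}
    return [paths & frequent for paths in per_doc]  # one transaction per document


class Cluster:
    def __init__(self):
        self.occ = Counter()  # item -> occurrences in this cluster
        self.size = 0         # S(C): total item occurrences
        self.n = 0            # |C|: number of transactions

    def add(self, t):
        self.occ.update(t)
        self.size += len(t)
        self.n += 1

    def remove(self, t):
        for item in t:
            self.occ[item] -= 1
            if self.occ[item] == 0:
                del self.occ[item]  # keep len(self.occ) equal to the width W(C)
        self.size -= len(t)
        self.n -= 1

    def delta_add(self, t, r):
        """CLOPE gain of adding t: S'(n+1)/W'^r - S*n/W^r."""
        s_new = self.size + len(t)
        w_new = len(self.occ) + sum(1 for i in t if i not in self.occ)
        if w_new == 0:
            return 0.0
        gain = s_new * (self.n + 1) / w_new ** r
        if self.n > 0 and self.occ:
            gain -= self.size * self.n / len(self.occ) ** r
        return gain


def _choose(clusters, t, r, current=None):
    """Best cluster index for t; len(clusters) means 'open a new cluster'."""
    gains = [c.delta_add(t, r) for c in clusters] + [Cluster().delta_add(t, r)]
    best = max(range(len(gains)), key=gains.__getitem__)
    if current is not None and gains[current] >= gains[best]:
        return current  # stay put on ties so refinement cannot oscillate
    return best


def clope(transactions, r=2.0, max_passes=10):
    """Cluster transactions by global profit; no pairwise similarity is computed."""
    clusters, labels = [], []
    for t in transactions:  # phase 1: greedy initial placement
        i = _choose(clusters, t, r)
        if i == len(clusters):
            clusters.append(Cluster())
        clusters[i].add(t)
        labels.append(i)
    for _ in range(max_passes):  # phase 2: move transactions while profit rises
        moved = False
        for k, t in enumerate(transactions):
            cur = labels[k]
            clusters[cur].remove(t)
            i = _choose(clusters, t, r, current=cur)
            if i == len(clusters):
                clusters.append(Cluster())
            clusters[i].add(t)
            if i != cur:
                labels[k], moved = i, True
        if not moved:
            break
    ids = {old: new for new, old in enumerate(dict.fromkeys(labels))}
    return [ids[label] for label in labels]  # renumber clusters 0, 1, 2, ...


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><year/></book>",
        "<movie><title/><director/></movie>",
        "<movie><title/><director/><actor/></movie>",
    ]
    print(clope(to_transactions(docs, min_support=2), r=2.0))  # [0, 0, 1, 1]
```

In this sketch a document is never compared with another document directly. Each placement only changes one cluster's ratio of item occurrences to distinct items. A larger `r` penalizes clusters that accumulate many distinct paths more heavily, which tends to produce more, tighter clusters.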