XML Clustering Based on Common Neighbor

Lv, Tian-yang; Zhang, Xi-zhe; Zuo, Wan-li; Wang, Zheng-xuan

doi:10.1007/11610496_18


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3842)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Clustering XML documents is an important task, but selecting appropriate parameter values for clustering algorithms is difficult. By integrating outlier detection with clustering, this paper takes a new approach to analyzing XML documents by structural distance. After defining the XML tree distance, the paper proposes a new clustering algorithm that uses outlier information to stop clustering automatically and needs only one parameter, whose appropriate value range can be determined during the outlier mining process. The algorithm is compared with other clustering algorithms on XML datasets with differing structures and on other real-life datasets.
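
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of clustering by common neighbors with outliers set aside first. It is not the paper's method. Everything in it is an assumption made for illustration: the precomputed pairwise distance matrix `dist` (standing in for an XML tree distance), the single threshold `eps`, the rule that an item with no neighbor within `eps` is an outlier, and the rule that two neighboring items are linked when they share at least one other neighbor.

```python
from itertools import combinations


def neighbor_sets(dist, eps):
    """For each item i, collect the other items whose distance to i is <= eps."""
    n = len(dist)
    return [{j for j in range(n) if j != i and dist[i][j] <= eps} for i in range(n)]


def cluster_by_common_neighbors(dist, eps):
    """Illustrative only: group items that are neighbors and share a neighbor.

    Assumptions (not taken from the paper):
      - dist is a symmetric matrix of precomputed structural distances
      - an item with no neighbor within eps is treated as an outlier
      - two items are linked if they are neighbors with >= 1 common neighbor
    """
    n = len(dist)
    nbrs = neighbor_sets(dist, eps)
    outliers = {i for i in range(n) if not nbrs[i]}

    # Union-find over the non-outlier items.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if i in outliers or j in outliers:
            continue
        if j in nbrs[i] and nbrs[i] & nbrs[j]:
            parent[find(i)] = find(j)

    # Collect clusters; the pass ends once every pair has been examined,
    # so no target number of clusters has to be supplied.
    groups = {}
    for i in range(n):
        if i not in outliers:
            groups.setdefault(find(i), []).append(i)
    return sorted(groups.values()), sorted(outliers)


# Toy data: 1-D points stand in for documents; distance is absolute difference.
points = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(a - b) for b in points] for a in points]
clusters, outliers = cluster_by_common_neighbors(dist, eps=2)
print(clusters, outliers)
```

In this sketch `eps` is the only tunable value, which mirrors the single-parameter design described above, though how the paper actually derives its parameter range from outlier mining is not stated in the abstract.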

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Changchun, China
Tian-yang Lv, Xi-zhe Zhang, Wan-li Zuo & Zheng-xuan Wang
College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Tian-yang Lv

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
School of Computer Science and Technology, Heilongjiang University, P.O. Box, 150080, Harbin, China
Jinbao Li
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 80 Dongchuan Road, 200240, Shanghai, China
Minglu Li
Department of Computer Science, College of Liberal Arts and Science, University of Iowa, 52242, Iowa City, IA, USA
Jun Ni
UNC Chapel Hill
Wei Wang

About this paper

Cite this paper

Lv, Ty., Zhang, Xz., Zuo, Wl., Wang, Zx. (2006). XML Clustering Based on Common Neighbor. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_18

DOI: https://doi.org/10.1007/11610496_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31158-4
Online ISBN: 978-3-540-32435-5
eBook Packages: Computer Science, Computer Science (R0)
