CK-Modes Clustering Algorithm Based on Node Cohesion in Labeled Property Graph

Wang, Da-Wei; Cui, Wan-Qiu; Qin, Biao

doi:10.1007/s11390-019-1966-0

Regular Paper
Published: 06 September 2019

Journal of Computer Science and Technology, Volume 34, pages 1152–1166 (2019)

Da-Wei Wang¹,
Wan-Qiu Cui² &
Biao Qin¹

Abstract

The choice of the cluster number K and of the initial centroids is essential for the K-modes clustering algorithm. However, most improved methods based on K-modes specify the value of K manually and generate the initial centroids randomly, which makes the clustering result heavily dependent on human decisions and the number of iterations unstable. To overcome this limitation, we propose a cohesive K-modes (CK-modes) algorithm that generates the cluster number K and the initial centroids automatically. Specifically, we construct a labeled property graph based on index-free adjacency to capture both the global and the local cohesion of the nodes in a sample of the input datasets. The cohesive nodes, computed from property similarity, are used to split the graph into a K-node tree that determines the value of K, and the initial centroids are then selected from the split subtrees. Since the property graph construction and the cohesion calculation are performed only once, they account for a small share of the execution time of a clustering run with multiple iterations, yet they significantly accelerate convergence. Experimental validation on both real-world and synthetic datasets shows that the CK-modes algorithm outperforms state-of-the-art algorithms.
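To make the dependence on K and on the initial centroids concrete, the following is a minimal sketch of the baseline K-modes loop that the abstract takes as its starting point. It is not the authors' CK-modes procedure: the graph construction and cohesion calculation are not specified in this preview, so the sketch simply takes the centroids as an argument. The function names (`hamming`, `mode_of`, `k_modes`) and the toy records are assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Attribute-wise most frequent value of a non-empty list of records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, centroids, max_iter=100):
    """Baseline K-modes. K is len(centroids); both are chosen by the caller,
    which is exactly the manual step the abstract wants to automate."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each record joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: hamming(r, centroids[j]))
            for r in records
        ]
        # Update step: each centroid becomes the mode of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [r for r, label in zip(records, labels) if label == j]
            new_centroids.append(mode_of(members) if members else old)
        if new_centroids == centroids:  # converged: no centroid moved
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter


# Hypothetical categorical records with three attributes each.
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]

# Seeds taken from two visibly different groups.
labels, modes, iterations = k_modes(records, [records[0], records[3]])

# Seeds taken from the same group, as random initialization can produce.
bad_labels, bad_modes, bad_iterations = k_modes(records, [records[0], records[1]])
```

With the first seeding the six records separate into two groups of three; with the second, both seeds fall in the same group and the partition comes out differently. This sensitivity to the starting point is what the proposed automatic selection of K and of the initial centroids is meant to remove.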

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Da-Wei Wang & Biao Qin
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Wan-Qiu Cui

Corresponding author

Correspondence to Biao Qin.

About this article

Cite this article

Wang, DW., Cui, WQ. & Qin, B. CK-Modes Clustering Algorithm Based on Node Cohesion in Labeled Property Graph. J. Comput. Sci. Technol. 34, 1152–1166 (2019). https://doi.org/10.1007/s11390-019-1966-0

Received: 29 July 2018
Revised: 25 July 2019
Published: 06 September 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s11390-019-1966-0
