Deep K-Means: A Simple and Effective Method for Data Clustering

Huang, Shudong; Kang, Zhao; Xu, Zenglin

doi:10.1007/978-981-15-7670-6_23

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1265)

Included in the following conference series:

International Conference on Neural Computing for Advanced Applications

Abstract

Clustering is one of the most fundamental techniques in statistics and machine learning. Owing to its simplicity and efficiency, k-means is the most frequently used clustering method. Over the past decades, k-means and its many extensions have been proposed and successfully applied to practical data mining problems. However, previous clustering methods are typically designed as single-layer formulations, so the mapping between the original data and the low-dimensional representation they produce may contain rather complex hierarchical information. In this paper, a novel deep k-means model is proposed to learn such hidden representations with respect to different implicit lower-level characteristics. By using a deep structure to perform k-means hierarchically, the hierarchical semantics of the data are learned layer by layer. Data points from the same class are drawn closer together at each layer, which benefits the subsequent learning task. Experiments on benchmark data sets illustrate the effectiveness of the method.
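For orientation, the single-layer baseline that the abstract starts from can be sketched in a few lines. The block below is a minimal illustration of standard k-means (Lloyd's algorithm) in plain Python, not the deep model proposed in the paper. The function names, the choice of the first k points as initial centroids, and the toy data are all assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Standard single-layer k-means (Lloyd's algorithm).

    Assumption for this sketch: centroids are initialised from the first k
    points so that the result is deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups in the plane.
    data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids = kmeans(data, k=2)
    print(labels)
    print(centroids)
```

Each point here is mapped to its cluster in a single assignment step, which is the single-layer formulation the abstract contrasts with a hierarchical, layer-by-layer one.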

Notes

1. For simplicity, the layer sizes (dimensionalities) of layers 1 to r are denoted as [\(k_1 \cdots k_r\)] in the experiments.
2. https://archive.ics.uci.edu/ml/datasets.html

Acknowledgments

This work was partially supported by the National Key Research and Development Program of China under Contract 2017YFB1002201, the National Natural Science Fund for Distinguished Young Scholar under Grant 61625204, the State Key Program of the National Science Foundation of China under Grant 61836006, and the Fundamental Research Funds for the Central Universities under Grant 1082204112364.

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, China
Shudong Huang
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Zhao Kang & Zenglin Xu
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
Zenglin Xu
Center for Artificial Intelligence, Peng Cheng Lab, Shenzhen, 518055, China
Zenglin Xu

Corresponding author

Correspondence to Shudong Huang.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Haijun Zhang
Hefei University of Technology, Hefei, China
Zhao Zhang
Chongqing University, Chongqing, China
Zhou Wu
South China Normal University, Guangzhou, China
Tianyong Hao

About this paper

Cite this paper

Huang, S., Kang, Z., Xu, Z. (2020). Deep K-Means: A Simple and Effective Method for Data Clustering. In: Zhang, H., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2020. Communications in Computer and Information Science, vol 1265. Springer, Singapore. https://doi.org/10.1007/978-981-15-7670-6_23

DOI: https://doi.org/10.1007/978-981-15-7670-6_23
Published: 13 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7669-0
Online ISBN: 978-981-15-7670-6
eBook Packages: Computer Science, Computer Science (R0)