Abstract
Maximum margin clustering (MMC) is a recent approach that applies the margin-maximization principle from supervised learning to unsupervised learning, aiming to partition the data into clusters with high discrimination. Recently, extreme learning machine (ELM) has been applied to MMC (called iterative ELM clustering, or ELMCIter), which maximizes the data discrimination by iteratively training a weighted extreme learning machine (W-ELM). In this way, ELMCIter achieves a substantial reduction in training time and provides a unified model for both binary and multi-class clustering. However, two issues remain in ELMCIter: (1) the random feature mappings it adopts cannot reliably produce high-quality discriminative features for clustering, and (2) a large model is usually required because its performance depends on the number of hidden nodes, and training such a model becomes relatively slow. In this paper, the hidden layer in ELMCIter is encoded by an approximate empirical kernel map (AEKM) rather than random feature mappings, in order to resolve these two issues. AEKM is generated from a low-rank approximation of the kernel matrix, which is derived from the input data through a kernel function. Our proposed method is called iterative AEKM for clustering (AEKMCIter), and its contributions are: (1) AEKM extracts discriminative and robust features from the kernel matrix, so that AEKMCIter consistently achieves better performance, and (2) AEKMCIter requires an extremely small number of hidden nodes, yielding low memory consumption and fast training. Detailed experiments verify the effectiveness and efficiency of our approach. As an illustration, on the MNIST10 dataset, AEKMCIter improves the clustering accuracy over ELMCIter by up to 5%, while reducing the training time to as little as 1/7 and the memory consumption (i.e., the number of hidden nodes) to as little as 1/20.
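The abstract describes AEKM as features obtained from a low-rank approximation of the kernel matrix. The sketch below illustrates one standard way to build such a map, the Nyström method: sample a small set of landmark points, form the landmark kernel block W and the cross-kernel block C, and map each sample to C·W^(-1/2). This is a minimal illustration, not the paper's exact construction; the RBF kernel, uniform landmark sampling, and all parameter values are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Pairwise RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nystrom_features(X, n_landmarks=40, gamma=0.1, seed=0):
    """Low-rank empirical kernel map: Z such that Z @ Z.T approximates K."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    landmarks = X[idx]
    W = rbf_kernel(landmarks, landmarks, gamma)   # m x m landmark kernel
    C = rbf_kernel(X, landmarks, gamma)           # n x m cross kernel
    # W^{-1/2} via eigendecomposition, clipping tiny/negative eigenvalues
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt                         # n x m feature matrix

# The m-dimensional features Z can then play the role of the ELM hidden layer,
# so m (here 40) replaces a much larger number of random hidden nodes.
X = np.random.default_rng(1).normal(size=(200, 8))
Z = nystrom_features(X, n_landmarks=40)
K_approx = Z @ Z.T                                # low-rank kernel approximation
```

Because the feature dimension equals the number of landmarks rather than the number of random hidden nodes, the downstream linear solve is over a much smaller matrix, which is consistent with the memory and training-time reductions the abstract reports.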
Acknowledgements
The work is partially supported by the following Grants: MYRG2018-00138-FST, MYRG2016-00134, University of Macau; 273/2017/A, Fundo de Ciencia e Tecnologia, Macau.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Chen, C., Vong, CM., Wong, PK. et al. Approximate empirical kernel map-based iterative extreme learning machine for clustering. Neural Comput & Applic 32, 8031–8046 (2020). https://doi.org/10.1007/s00521-019-04295-6