Approximate empirical kernel map-based iterative extreme learning machine for clustering

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Maximum margin clustering (MMC) is a recent approach that applies the margin-maximization principle of supervised learning to unsupervised learning, aiming to partition data into clusters with high discrimination. Recently, the extreme learning machine (ELM) has been applied to MMC (yielding iterative ELM clustering, ELMCIter), which maximizes data discrimination by iteratively training a weighted extreme learning machine (W-ELM). In this way, ELMCIter achieves a substantial reduction in training time and provides a unified model for both binary and multi-class clustering. However, two issues remain in ELMCIter: (1) the random feature mappings it adopts cannot reliably produce high-quality discriminative features for clustering, and (2) a large model is usually required because performance depends on the number of hidden nodes, and training such a model becomes relatively slow. In this paper, the hidden layer in ELMCIter is encoded by an approximate empirical kernel map (AEKM) rather than random feature mappings in order to resolve these two issues. AEKM is generated from a low-rank approximation of the kernel matrix, which is derived from the input data through a kernel function. The proposed method, called iterative AEKM for clustering (AEKMCIter), offers two contributions: (1) AEKM extracts discriminative and robust features from the kernel matrix, so that AEKMCIter consistently achieves better performance, and (2) AEKMCIter requires only an extremely small number of hidden nodes, yielding low memory consumption and fast training. Detailed experiments verify the effectiveness and efficiency of our approach. As an illustration, on the MNIST10 dataset, AEKMCIter improves clustering accuracy over ELMCIter by up to 5%, while reducing training time and memory consumption (i.e., the number of hidden nodes) to as little as 1/7 and 1/20 of those of ELMCIter, respectively.
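
The abstract describes two mechanisms: building an approximate empirical kernel map from a low-rank (Nyström-style) approximation of the kernel matrix, and iteratively retraining an ELM output layer on its own cluster assignments. The full article is not reproduced here, so the following is only a minimal sketch of those two ideas under stated assumptions: an RBF kernel, uniformly sampled landmark points for the low-rank approximation, and a plain (unweighted) ridge-regression output layer in place of the authors' weighted ELM. All function and parameter names (aekm_features, iterative_elm_clustering, n_landmarks, gamma, reg) are illustrative, not taken from the paper.

```python
# Minimal sketch, NOT the authors' code:
# (1) AEKM-style features from a Nystrom low-rank approximation of an RBF kernel,
# (2) an iterative ELM-style clustering loop that refits a ridge-regression
#     output layer on its own predicted cluster labels until they stabilize.
import numpy as np

def aekm_features(X, n_landmarks=50, gamma=1.0, seed=0):
    """Map X (n x d) to low-dimensional features via a Nystrom approximation."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    L = X[idx]                                    # landmark points
    def rbf(A, B):                                # RBF kernel between row sets
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    C = rbf(X, L)                                 # n x m cross-kernel
    W = rbf(L, L)                                 # m x m landmark kernel
    # Nystrom features phi = C U diag(lambda^-1/2), so phi phi^T ~ C W^+ C^T ~ K
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10
    return C @ vecs[:, keep] @ np.diag(vals[keep] ** -0.5)

def iterative_elm_clustering(H, k=2, reg=1e-2, n_iter=20, seed=0):
    """Alternate between ELM-style output weights and cluster reassignment."""
    rng = np.random.default_rng(seed)
    y = rng.integers(k, size=len(H))              # random initial cluster labels
    for _ in range(n_iter):
        T = np.eye(k)[y]                          # one-hot targets
        # ridge-regression output weights: beta = (H^T H + reg I)^-1 H^T T
        beta = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T)
        y_new = (H @ beta).argmax(axis=1)         # reassign each point
        if np.array_equal(y_new, y):              # stop once labels stabilize
            break
        y = y_new
    return y

# Toy run on two well-separated Gaussian blobs
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])
labels = iterative_elm_clustering(aekm_features(X, n_landmarks=30, gamma=0.5), k=2)
print(np.bincount(labels))
```

On this toy two-blob data the loop typically converges within a few iterations and the printed cluster sizes are close to 100/100; the point of the sketch is only that the hidden-layer representation is a small kernel-derived feature matrix (here 30 columns) rather than a wide random-feature layer.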

Notes

  1. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

  2. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.

  3. http://www.escience.cn/people/fpnie/papers.html.

  4. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#usps.

  5. https://www.mathworks.com/matlabcentral/fileexchange/41459-6-functions-for-generating-artificial-datasets.

  6. http://manifold.cs.uchicago.edu/manifold_regularization/data.html.


Acknowledgements

This work is partially supported by the following grants: MYRG2018-00138-FST and MYRG2016-00134 from the University of Macau, and 273/2017/A from the Fundo de Ciencia e Tecnologia, Macau.

Author information

Corresponding author

Correspondence to Chi-Man Vong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, C., Vong, CM., Wong, PK. et al. Approximate empirical kernel map-based iterative extreme learning machine for clustering. Neural Comput & Applic 32, 8031–8046 (2020). https://doi.org/10.1007/s00521-019-04295-6
