Abstract
Soft competitive learning is a k-means-like clustering approach that overcomes several severe drawbacks of k-means, such as initialization dependence and convergence to poor local minima. It achieves lower distortion error than k-means and has shown very good performance in clustering complex data sets using various metrics or kernels. While effective, it does not scale to large data sets, and this problem is even more severe in the kernel case due to the dense prototype model. In this paper, we propose a novel soft competitive learning algorithm based on core-sets that significantly accelerates the original method in practice while yielding naturally sparse models. It deals effectively with very large data sets of up to several million points. Our method also provides an alternative, fast kernelization of soft competitive learning. In contrast to many other clustering methods, the obtained model is based on only a few prototypes and is naturally sparse; it is the first naturally sparse kernelized soft competitive learning approach. Numerical experiments on synthetic and benchmark data sets demonstrate the efficiency of the proposed method.
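The soft competitive learning scheme the abstract builds on is the neural gas algorithm of Martinetz et al.: every sample attracts all prototypes, weighted by a rank-based neighborhood factor that is annealed toward hard (winner-take-all) competition. The following is a minimal illustrative sketch of plain neural gas, not of the core-set accelerated method proposed in the paper; all parameter names and the annealing schedule are standard textbook choices, not taken from this paper.

```python
import numpy as np

def neural_gas(X, n_prototypes=5, n_epochs=20, eps0=0.5, eps_final=0.01,
               lam_final=0.1, seed=0):
    """Minimal neural gas (soft competitive learning) sketch.

    Each presented sample pulls *all* prototypes toward it, weighted by
    exp(-rank / lambda), where rank 0 is the closest prototype. Both the
    step size eps and the neighborhood range lambda decay exponentially
    over training, so the update approaches hard competition at the end.
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes on randomly chosen data points.
    W = X[rng.choice(len(X), n_prototypes, replace=False)].astype(float)
    lam0 = n_prototypes / 2.0
    t, t_max = 0, n_epochs * len(X)
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            frac = t / t_max
            eps = eps0 * (eps_final / eps0) ** frac   # annealed step size
            lam = lam0 * (lam_final / lam0) ** frac   # annealed neighborhood
            d = np.linalg.norm(W - x, axis=1)
            ranks = np.argsort(np.argsort(d))         # rank 0 = closest
            W += (eps * np.exp(-ranks / lam))[:, None] * (x - W)
            t += 1
    return W
```

Because the neighborhood cooperation spreads updates over all prototypes early on, the result is far less sensitive to initialization than plain k-means; the kernelized and core-set variants discussed in the paper replace the Euclidean distances and the dense prototype representation, respectively.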
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Schleif, FM., Zhu, X., Hammer, B. (2013). Soft Competitive Learning for Large Data Sets. In: Pechenizkiy, M., Wojciechowski, M. (eds) New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32518-2_14
DOI: https://doi.org/10.1007/978-3-642-32518-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32517-5
Online ISBN: 978-3-642-32518-2