Clustering as an Example of Optimizing Arbitrarily Chosen Objective Functions

  • Conference paper
Advanced Methods for Computational Collective Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 457)

Abstract

This paper reflects on the common practice of solving various types of learning problems by optimizing arbitrarily chosen criteria in the hope that they correlate well with the criterion actually used to assess the results. The issue is investigated using clustering as an example. A unified view of clustering as an optimization problem is therefore proposed first, stemming from the belief that typical design choices in clustering, such as the number of clusters or the similarity measure, can be (and often are) suboptimal, including from the point of view of the clustering quality measures later used to compare and rank algorithms. To illustrate this point, we propose a generalized clustering framework and provide a proof of concept using standard benchmark datasets, with two popular clustering methods for comparison.
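The core idea of the abstract is that design choices such as the number of clusters and the similarity measure can be treated as decision variables. These variables are then tuned against the same quality measure later used for comparison. The sketch below illustrates this under several assumptions, none taken from the paper:

- A Lloyd-style partitional algorithm with farthest-first initialisation.
- Euclidean distance paired with mean centres (k-means) and Manhattan distance paired with median centres (k-medians).
- The Davies-Bouldin index as the assessment criterion.
- A plain grid search standing in for a more capable optimizer.

The names `lloyd`, `davies_bouldin` and `tune` are hypothetical. This is not the author's framework.

```python
import math
import random
from itertools import product
from statistics import median


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def median_point(points):
    return tuple(median(c) for c in zip(*points))


# Each similarity measure is paired with the centre update that minimises it:
# the mean for (squared) Euclidean distance, the coordinate-wise median for L1.
METRICS = {"euclidean": (euclidean, mean_point), "manhattan": (manhattan, median_point)}


def farthest_first(points, k, dist, seed):
    """Start from a seeded random point, then repeatedly add the point
    farthest from all centres chosen so far."""
    centres = [random.Random(seed).choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def lloyd(points, k, metric, seed=0, iters=100):
    """Lloyd-style partitional clustering; returns one cluster label per point."""
    dist, centre = METRICS[metric]
    centres = farthest_first(points, k, dist, seed)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = centre(members)
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better), always computed with Euclidean
    distances, independently of the measure used for clustering."""
    by_label = {}
    for p, lab in zip(points, labels):
        by_label.setdefault(lab, []).append(p)
    groups = list(by_label.values())
    if len(groups) < 2:
        return math.inf
    cents = [mean_point(g) for g in groups]
    scatter = [sum(euclidean(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    total = 0.0
    for i in range(len(groups)):
        ratios = []
        for j in range(len(groups)):
            if j != i:
                d = euclidean(cents[i], cents[j])
                ratios.append(math.inf if d == 0 else (scatter[i] + scatter[j]) / d)
        total += max(ratios)
    return total / len(groups)


def tune(points, ks=range(2, 7), metrics=("euclidean", "manhattan"), seeds=range(3)):
    """Treat k, the similarity measure and the seed as decision variables and
    keep the combination that scores best on the assessment criterion itself."""
    best = None
    for k, metric, seed in product(ks, metrics, seeds):
        labels = lloyd(points, k, metric, seed)
        score = davies_bouldin(points, labels)
        if best is None or score < best[0]:
            best = (score, k, metric, labels)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: three Gaussian blobs, not one of the paper's benchmarks.
    rng = random.Random(42)
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    score, k, metric, _ = tune(data)
    print(f"selected k={k}, measure={metric}, Davies-Bouldin={score:.3f}")
```

The search minimizes the Davies-Bouldin index directly, so the selected configuration is, by construction, good under that index. Under a different assessment criterion, a different number of clusters or similarity measure could win. That is exactly the mismatch between optimized and assessed criteria which the abstract describes.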

Author information

Correspondence to Marcin Budka.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Budka, M. (2013). Clustering as an Example of Optimizing Arbitrarily Chosen Objective Functions. In: Nguyen, N., Trawiński, B., Katarzyniak, R., Jo, GS. (eds) Advanced Methods for Computational Collective Intelligence. Studies in Computational Intelligence, vol 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34300-1_17

  • DOI: https://doi.org/10.1007/978-3-642-34300-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34299-8

  • Online ISBN: 978-3-642-34300-1

  • eBook Packages: Engineering, Engineering (R0)