
Optimizing the Cauchy-Schwarz PDF Distance for Information Theoretic, Non-parametric Clustering

  • Conference paper
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2005)

Abstract

This paper addresses the problem of efficient information theoretic, non-parametric data clustering. We develop a procedure for adapting the cluster memberships of the data patterns in order to maximize the recently proposed Cauchy-Schwarz (CS) probability density function (pdf) distance measure. Each pdf corresponds to a cluster. The CS distance is estimated analytically and non-parametrically by means of the Parzen window technique for density estimation. The resulting form of the cost function makes it possible to develop an efficient adaptation procedure based on constrained gradient descent, using stochastic approximation of the gradients. The computational complexity of the algorithm is O(MN), where N is the total number of data patterns and M is the number of data patterns used in the stochastic approximation. We show that the new algorithm is capable of performing well on several odd-shaped and irregular data sets.

This work was partially supported by NSF grants ECS-9900394 and EIA-0135946.
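The abstract's central quantity, the Cauchy-Schwarz pdf distance between two cluster densities, has the closed form D_CS(p1, p2) = -log[(∫p1 p2)² / (∫p1² ∫p2²)], and each integral becomes a double sum over Gaussian kernels when the densities are Parzen estimates with Gaussian windows. The sketch below illustrates that analytic estimator only; it is not the authors' full membership-adaptation algorithm, and the function names and the kernel width `sigma` are illustrative assumptions.

```python
import numpy as np

def gaussian_cross_sum(X, Y, sigma):
    """Sum of Gaussian kernels G(x - y; 2*sigma^2 I) over all pairs (x, y).

    With Parzen windows of width sigma, the convolution of two Gaussian
    windows is a Gaussian of variance 2*sigma^2, which is what makes the
    integral of the product of two Parzen estimates analytic.
    """
    d = X.shape[1]
    diff = X[:, None, :] - Y[None, :, :]        # shape (n_x, n_y, d)
    sq = np.sum(diff ** 2, axis=-1)             # pairwise squared distances
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    return norm * np.sum(np.exp(-sq / (2.0 * var)))

def cs_divergence(X1, X2, sigma):
    """Estimated Cauchy-Schwarz divergence between the Parzen pdfs of
    two clusters X1 and X2: -log of (cross term)^2 / (self1 * self2)."""
    n1, n2 = len(X1), len(X2)
    cross = gaussian_cross_sum(X1, X2, sigma) / (n1 * n2)   # estimates ∫ p1 p2
    self1 = gaussian_cross_sum(X1, X1, sigma) / (n1 * n1)   # estimates ∫ p1^2
    self2 = gaussian_cross_sum(X2, X2, sigma) / (n2 * n2)   # estimates ∫ p2^2
    return -np.log(cross ** 2 / (self1 * self2))
```

By the Cauchy-Schwarz inequality the estimate is non-negative, and it grows as the two clusters' densities overlap less; the paper's clustering procedure adapts memberships so as to maximize this quantity, using only M of the N points per gradient step to reach the stated O(MN) cost.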




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jenssen, R., Erdogmus, D., Hild, K.E., Principe, J.C., Eltoft, T. (2005). Optimizing the Cauchy-Schwarz PDF Distance for Information Theoretic, Non-parametric Clustering. In: Rangarajan, A., Vemuri, B., Yuille, A.L. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2005. Lecture Notes in Computer Science, vol 3757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11585978_3


  • DOI: https://doi.org/10.1007/11585978_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30287-2

  • Online ISBN: 978-3-540-32098-2

  • eBook Packages: Computer Science (R0)
