Skip to main content

Spectral Clustering: An Explorative Study of Proximity Measures

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2011)

Abstract

Spectral clustering algorithms recently gained much interest in research community. This surge in interest is mainly due to their ease of use, their applicability to a variety of data types and domains as well as the fact that they very often outperform traditional clustering algorithms. These algorithms consider the pair-wise similarity between data objects and construct a similarity matrix to group data into natural subsets, so that the objects located in the same cluster share many common characteristics. Objects are then allocated into clusters by employing a proximity measure, which is used to compute the similarity or distance between the data objects in the matrix. As such, an early and fundamental step in spectral cluster analysis is the selection of a proximity measure. This choice also has the highest impact on the quality and usability of the end result. However, this crucial aspect is frequently overlooked. For instance, most prior studies use the Euclidean distance measure without explicitly stating the consequences of selecting such measure. To address this issue, we perform a comparative and explorative study on the performance of various existing proximity measures when applied to spectral clustering algorithm. Our results indicate that the commonly used Euclidean distance measure is not always suitable, specifically in domains where the data is highly imbalanced and the correct clustering of boundary objects are critical. Moreover, we also noticed that for numeric data type, the relative distance measures outperformed the absolute distance measures and therefore, may boost the performance of a clustering algorithm if used. As for the datasets with mixed variables, the selection of distance measure for numeric variable again has the highest impact on the end result.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Luxburg, U.: A Tutorial on Spectral Clustering. Statistics and Computing 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  2. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)

    Article  Google Scholar 

  3. Bach, F.R., Jordan, M.I.: Learning Spectral Clustering, with Application to Speech Separation. J. Mach. Learn. Res. 7, 1963–2001 (2006)

    MathSciNet  MATH  Google Scholar 

  4. Paccanaro, A., Casbon, J.A., Saqi, M.A.: Spectral Clustering of Protein Sequences. Nucleic Acids Res. 34(5), 1571–1580 (2006)

    Article  Google Scholar 

  5. Ng, A.Y., Jordan, M.I., Weiss, Y.: On Spectral Clustering: Analysis and an Algorithm. In: Dietterich, T.G., Ghahramani, S.B. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 849–856 (2001)

    Google Scholar 

  6. Verma, D., Meila, M.: A Comparison of Spectral Clustering Algorithms (2001)

    Google Scholar 

  7. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: a Review. ACM Computing Surveys 31(3), 264–323 (1999)

    Article  Google Scholar 

  8. Bach, F.R., Jordan, M.I.: Learning Spectral Clustering. In: Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, pp. 305–312 (2003)

    Google Scholar 

  9. Everitt, B.S.: Cluster Analysis, 2nd edn. Edward Arnold and Halsted Press (1980)

    Google Scholar 

  10. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience (2005)

    Google Scholar 

  11. Meila, M., Shi, J.: A Random Walks View of Spectral Segmentation. In: International Conference on Artificial Intelligence and Statistics (AISTAT), pp. 8–11 (2001)

    Google Scholar 

  12. Webb, A.R.: Statistical Pattern Recognition, 2nd edn. John Wiley & Sons (2002)

    Google Scholar 

  13. Larose, D.T.: Discovering Knowledge in Data: An Introduction to Data Mining. Wiley-Interscience (2004)

    Google Scholar 

  14. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco (2006)

    MATH  Google Scholar 

  15. Costa, I.G., de Carvalho, F.A.T., de Souto, M.C.P.: Comparative Study on Proximity Indices for Cluster Analysis of Gene Expression Time Series. Journal of Intelligent and Fuzzy Systems: Applications in Engineering and Technology 13(2-4), 133–142 (2002)

    Google Scholar 

  16. Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. In: KDD Workshop on Text Mining (2000)

    Google Scholar 

  17. Kubat, M., Holte, R.C., Matwin, S.: Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning 30(2-3), 195–215 (1998)

    Article  Google Scholar 

  18. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann (2005)

    Google Scholar 

  19. Japkowicz, N., Shah, M.: Performance Evaluation for Classification A Machine Learning and Data Mining Perspective (in progress): Chapter 6: Statistical Significance Testing (2011)

    Google Scholar 

  20. Heinz, G., Peterson, L.J., Johnson, R.W., Kerk, C.J.: Exploring Relationships in Body Dimensions. Journal of Statistics Education 11(2) (2003)

    Google Scholar 

  21. Asuncion, A., Newman, D.: UCI Machine Learning Repository (2007)

    Google Scholar 

  22. Lee, S.-W., Verri, A. (eds.): SVM 2002. LNCS, vol. 2388. Springer, Heidelberg (2002)

    MATH  Google Scholar 

  23. Abou-Moustafa, K.T., Ferrie, F.P.: The Minimum Volume Ellipsoid Metric. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) DAGM 2007. LNCS, vol. 4713, pp. 335–344. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  24. Filzmoser, P., Garrett, R., Reimann, C.: Multivariate Outlier Detection in Exploration Geochemistry. Computers and Geosciences 31(5), 579–587 (2005)

    Article  Google Scholar 

  25. Aiello, M., Andreozzi, F., Catanzariti, E., Isgro, F., Santoro, M.: Fast Convergence for Spectral Clustering. In: ICIAP 2007: Proceedings of the 14th International Conference on Image Analysis and Processing, pp. 641–646. IEEE Computer Society, Washington, DC (2007)

    Google Scholar 

  26. Fischer, I., Poland, J.: New Methods for Spectral Clustering. Technical Report IDSIA-12-04, IDSIA (2004)

    Google Scholar 

  27. Teknomo, K.: Similarity Measurement, http://people.revoledu.com/kardi/tutorial/Similarity/

  28. Boslaugh, S., Watters, P.A.: Statistics in a Nutshell. O.Reilly & Associates, Inc., Sebastopol (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Azam, N.F., Viktor, H.L. (2013). Spectral Clustering: An Explorative Study of Proximity Measures. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-37186-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37185-1

  • Online ISBN: 978-3-642-37186-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics