Unsupervised Learning and Clustering

  • Derek Greene
  • Pádraig Cunningham
  • Rudolf Mayer
Part of the Cognitive Technologies book series (COGTECH)


Unsupervised learning is very important in the processing of multimedia content as clustering or partitioning of data in the absence of class labels is often a requirement. This chapter begins with a review of the classic clustering techniques of k-means clustering and hierarchical clustering. Modern advances in clustering are covered with an analysis of kernel-based clustering and spectral clustering. One of the most popular unsupervised learning techniques for processing multimedia content is the self-organizing map, so a review of self-organizing maps and variants is presented in this chapter. The absence of class labels in unsupervised learning makes the question of evaluation and cluster quality assessment more complicated than in supervised learning. So this chapter also includes a comprehensive analysis of cluster validity assessment techniques.


Cluster Algorithm Unsupervised Learn Spectral Cluster Model Vector Cluster Validation 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25(6):821–837, 1964.MathSciNetGoogle Scholar
  2. 2.
    A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. In Proceedings of the 7th Pacific Symposium on Biocomputing (PSB 2002), pp. 6–17, Lihue, HI, January 2002.Google Scholar
  3. 3.
    J. C. Bezdek and N. R. Pal. Cluster validation with generalized dunn’s indices. In ANNES ’95: Proceedings of the 2nd New Zealand Two-Stream International Conference on Artificial Neural Networks and Expert Systems, p. 190, Washington, DC, USA, 1995. IEEE Computer Society.Google Scholar
  4. 4.
    J. Blackmore and R. Miikkulainen. Incremental grid growing: encoding high-dimensional structure into a two-dimensional feature map. In Proceedings of the ICNN’93, International Conference on Neural Networks, Vol. I, pp. 450–455, Piscataway, NJ, 1993. IEEE Service Center.Google Scholar
  5. 5.
    N. Bolshakova and F. Azuaje. Cluster validation techniques for genome expression data. Technical Report TCD-CS-2002-33, Trinity College Dublin, September 2002.Google Scholar
  6. 6.
    M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the 9th International Workshop on AI and Statistics, January 2003.Google Scholar
  7. 7.
    T. Calinski and J. Harabasz. A dendrite method for cluster analysis. Communications in Statistics, 3:1–27, 1974.MathSciNetGoogle Scholar
  8. 8.
    N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines: and Other Kernel-Based Learning Methods. Cambridge University Press, New York, NY, USA, 2000.Google Scholar
  9. 9.
    D. L. Davies and W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2):224–227, 1979.CrossRefGoogle Scholar
  10. 10.
    A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society, 39:1–38, 1977.zbMATHMathSciNetGoogle Scholar
  11. 11.
    I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Knowledge Discovery and Data Mining, pp. 269–274, 2001.Google Scholar
  12. 12.
    I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the 2004 ACM SIGKDD International conference on Knowledge Discovery and Data Mining, pp. 551–556. New York, NY, 2004. ACM Press.Google Scholar
  13. 13.
    C. Ding and X. He. Cluster merging and splitting in hierarchical clustering algorithms. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), p. 139. Washington, DC, 2002. IEEE Computer Society.Google Scholar
  14. 14.
    W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17:420–425, 1973.zbMATHMathSciNetCrossRefGoogle Scholar
  15. 15.
    R. C. Dubes. How many clusters are best? – an experiment. Pattern Recognition, 20(6):645–663, 1987.CrossRefGoogle Scholar
  16. 16.
    J. C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3:32–57, 1974.Google Scholar
  17. 17.
    J. C. Dunn. Well separated clusters and optimal fuzzy-partitions. Journal of Cybernetics, 4:95–104, 1974.MathSciNetGoogle Scholar
  18. 18.
    M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.MathSciNetGoogle Scholar
  19. 19.
    B. Fischer and J. M. Buhmann. Path-based clustering for grouping of smooth curves and texture segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions, 25(4):513–518, April 2003.Google Scholar
  20. 20.
    E. W. Forgy. Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics, 21:768–769, 1965.Google Scholar
  21. 21.
    E. B. Fowlkes and C. L. Mallow. A method for comparing two hierarchical clusterings. Journal of American Statistical Association, 78:553–569, 1983.zbMATHCrossRefGoogle Scholar
  22. 22.
    B. Fritzke. Growing cell structures—a self-organizing network in k dimensions. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks, 2, Vol. II, pp. 1051–1056, Amsterdam, Netherlands, 1992. North-Holland.Google Scholar
  23. 23.
    B. Fritzke. Growing grid – a self-organizing network with constant neighborhood range and adaptation strength. Neural Processing Letters, 2(5):9–13, 1995.CrossRefGoogle Scholar
  24. 24.
    J. Ghosh. Scalable clustering methods for data mining. In N. Ye, editor, Handbook of Data Mining, chapter 10. Mahwah, NJ, 2003. Lawrence Erlbaum.Google Scholar
  25. 25.
    C. D. Giurcaneanu and I. Tabus. Cluster structure inference based on clustering stability with applications to microarray data analysis. EURASIP Journal on Applied Signal Processing, 1:64–80, 2004.CrossRefGoogle Scholar
  26. 26.
    S. Guha, R. Rastogi, and K. Shim. CURE: an efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 73–84, 1998.Google Scholar
  27. 27.
    K. M. Hall. An r-dimensional quadratic placement algorithm. Management Science, 17(3):219–229, November 1970.zbMATHGoogle Scholar
  28. 28.
    V. Hautamäki, S. Cherednichenko, I. Kärkkäinen, T. Kinnunen, and P. Fränti. Improving k-means by outlier removal. In Image Analysis, 14th Scandinavian Conference, SCIA 2005, pp. 978–987, 2005.Google Scholar
  29. 29.
    L. J. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985.CrossRefGoogle Scholar
  30. 30.
    L. J. Hubert and J. R. Levin. A general statistical framework for accessing categorical. Psychological Bulletin, 83:1072–1082, 1976.CrossRefGoogle Scholar
  31. 31.
    P. Jaccard. The distribution of flora in the alpine zone. New Phytologist, 11(2):37–50, 1912.CrossRefGoogle Scholar
  32. 32.
    S. Kaski, J. Kangas, and T. Kohonen. Bibliography of self-organizing map (SOM) papers 1981–1997. Neural Computing Surveys, 1(3&4):1–176, 1998.Google Scholar
  33. 33.
    B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291–308, 1970.Google Scholar
  34. 34.
    Y. Kluger, R. Basri, J. T. Chang, and M. Gerstein. Spectral biclustering of microarray data: coclustering genes and conditions. Genome Research, 13:703–716, April 2003.CrossRefGoogle Scholar
  35. 35.
    T. Kohonen. Self-Organizing Maps. Springer-Verlag, New York, NY, 2001.zbMATHGoogle Scholar
  36. 36.
    T. Kohonen, E. Oja, O. Simula, A. Visa, and J. Kangas. Engineering applications of the self-organizing map. Proceedings of the IEEE, 84(10):1358–1384, October 1996.Google Scholar
  37. 37.
    T. Lange, V. Roth, M. L. Braun, and J. M. Buhmann. Stability-based validation of clustering solutions. Neural Computation, 16(6):1299–1323, 2004.zbMATHCrossRefGoogle Scholar
  38. 38.
    B. Larsen and C. Aone. Fast and effective text mining using linear-time document clustering. In KDD ’99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 16–22, New York, NY, USA, 1999. ACM Press.Google Scholar
  39. 39.
    M. Law and A. K. Jain. Cluster validity by bootstrapping partitions. Technical Report MSU-CSE-03-5, University of Washington, February 2003.Google Scholar
  40. 40.
    E. Levine and E. Domany. Resampling method for unsupervised estimation of cluster validity. Neural Computation, 13(11):2573–2593, 2001.zbMATHCrossRefGoogle Scholar
  41. 41.
    M. Meila. Comparing clusterings. Technical Report 418, University of Washington, 2002.Google Scholar
  42. 42.
    D. Merkl. Exploration of text collections with hierarchical feature maps. In Research and Development in Information Retrieval, pp. 186–195, 1997.Google Scholar
  43. 43.
    R. Miikkulainen. Script recognition with hierarchical feature maps. Connection Science, 2(1&2):83–101, 1990.Google Scholar
  44. 44.
    G. W. Milligan and M. C. Cooper. An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2):159–179, 1985.CrossRefGoogle Scholar
  45. 45.
    A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Proceedings of the Advances in Neural Information Processing, 2001.Google Scholar
  46. 46.
    M. Oja, S. Kaski, and T. Kohonen. Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Computing Surveys, 3:1–156, 2003.Google Scholar
  47. 47.
    A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal of Mathematical Analysis and Applications, 11(3):430–452, 1990.zbMATHCrossRefMathSciNetGoogle Scholar
  48. 48.
    W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(66):846–850, 1971.CrossRefGoogle Scholar
  49. 49.
    A. Rauber and D. Merkl. The SOMLib digital library system. In Proceedings of the 3rd European Conference on Research and Advanced Technology for Digital Libraries (ECDL’99), Lecture Notes in Computer Science (LNCS 1696), pp. 323–342, Paris, France, September 22-24 1999. Springer.Google Scholar
  50. 50.
    A. Rauber, D. Merkl, and M. Dittenbach. The growing hierarchical self-organizing map: Exploratory analysis of high-dimensional data. IEEE Transactions on Neural Networks, 13(6):1331–1341, November 2002.CrossRefGoogle Scholar
  51. 51.
    P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1):53–65, 1987.zbMATHCrossRefGoogle Scholar
  52. 52.
    J. W. Sammon Jr. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401–409, May 1969.CrossRefGoogle Scholar
  53. 53.
    B. Schölkopf, A. Smola, and K-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.CrossRefGoogle Scholar
  54. 54.
    J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR ’97), pp. 731–737. Huntsville, AL, 1997. IEEE Computer Society.Google Scholar
  55. 55.
    J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(8):888–905, August 2000.CrossRefGoogle Scholar
  56. 56.
    M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In Proceedings of KDD Workshop on Text Mining 2000, 2000.Google Scholar
  57. 57.
    A. Strehl and J. Ghosh. Cluster ensembles – a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, December 2002.CrossRefMathSciNetGoogle Scholar
  58. 58.
    R. Tibshirani, G. Walther, D. Botstein, and P. Brown. Cluster validation by prediction strength. Technical report, Statistics Department, Stanford University, 2001.Google Scholar
  59. 59.
    D. Verma and M. Meila. A comparison of spectral clustering algorithms. Technical report, University of Washington, 2003.Google Scholar
  60. 60.
    S. X. Yu and J. Shi. Multiclass spectral clustering. In Proceedings of the 9th IEEE International Conference on Computer Vision, p. 313, October 2003.Google Scholar
  61. 61.
    T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 103–114, 1996.Google Scholar
  62. 62.
    Y. Zhao and G. Karypis. Criterion functions for document clustering: experiments and analysis. Technical Report 01-040, University of Minnesota, November 2001.Google Scholar
  63. 63.
    Y. Zhao and G. Karypis. Evaluation of hierarchical clustering algorithms for document datasets. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 515–524. New York, NY, 2002. ACM Press.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Derek Greene
    • 1
  • Pádraig Cunningham
    • 1
  • Rudolf Mayer
    • 2
  1. 1.University College DublinIreland
  2. 2.Vienna University of TechnologyAustria

Personalised recommendations