Cascade Evaluation of Clustering Algorithms

  • Laurent Candillier
  • Isabelle Tellier
  • Fabien Torre
  • Olivier Bousquet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


This paper addresses the evaluation of the results of clustering algorithms, and the comparison of such algorithms. We propose a new method based on enriching a set of independent labeled datasets with the results of clustering, and on using a supervised method to evaluate the benefit of adding this new information to the datasets.
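
A minimal sketch of this evaluation idea, not the authors' implementation: it assumes scikit-learn, and uses k-means and a decision tree on the iris data as stand-ins for the clustering algorithm under evaluation and the supervised evaluator. For simplicity the clustering is run once on the whole dataset; a stricter protocol would recluster inside each cross-validation fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Cluster the data without looking at the labels (k-means stands in
# for whatever clustering algorithm is being evaluated).
n_clusters = 3
clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

# Enrich the dataset with the cluster assignments, one-hot encoded so the
# added columns play the role of the base learner's outputs in a cascade.
X_enriched = np.column_stack([X, np.eye(n_clusters)[clusters]])

clf = DecisionTreeClassifier(random_state=0)
base = cross_val_score(clf, X, y, cv=5).mean()
enriched = cross_val_score(clf, X_enriched, y, cv=5).mean()

# The clustering is judged useful if it improves supervised accuracy.
print(f"without clustering: {base:.3f}  with clustering: {enriched:.3f}")
```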

We thus adapt the cascade generalization paradigm [1] to the case where an unsupervised and a supervised learner are combined. We also consider the case where independent supervised learners are trained on the different groups of data objects created by the clustering [2], as sketched below.
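
A minimal sketch of this second scheme, under the same assumptions as above (scikit-learn, with k-means and decision trees as stand-ins): one supervised model is trained per cluster, and each test point is routed to the model of its cluster.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

# One classifier per cluster, trained only on that cluster's objects.
models = {}
for c in np.unique(km.labels_):
    mask = km.labels_ == c
    models[c] = DecisionTreeClassifier(random_state=0).fit(X_tr[mask], y_tr[mask])

# Route each test point through the model of the cluster it falls into.
preds = np.array([models[c].predict(x.reshape(1, -1))[0]
                  for x, c in zip(X_te, km.predict(X_te))])
print(f"per-cluster accuracy: {(preds == y_te).mean():.3f}")
```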

We then conduct experiments using different supervised algorithms to compare various clustering algorithms, and show that our proposed method exhibits coherent behavior, indicating, for example, that clustering algorithms based on complex probabilistic models outperform those based on simpler models.


Keywords: Clustering Algorithm · Data Object · Supervised Learning · Combination Method · Subspace Clustering


References

  1. Gama, J., Brazdil, P.: Cascade generalization. Machine Learning 41, 315–343 (2000)
  2. Apte, C.V., Natarajan, R., Pednault, E.P.D., Tipu, F.A.: A probabilistic estimation framework for predictive modeling analytics. IBM Systems Journal 41 (2002)
  3. Ali, K.M., Pazzani, M.J.: Error reduction through learning multiple descriptions. Machine Learning 24, 173–202 (1996)
  4. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
  5. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Int. Conf. on Machine Learning, pp. 148–156 (1996)
  6. Breiman, L.: Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California (1996)
  7. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
  8. Meir, R., Rätsch, G.: An introduction to boosting and leveraging. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS, vol. 2600, pp. 118–183. Springer, Heidelberg (2003)
  9. Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
  10. Parsons, L., Haque, E., Liu, H.: Evaluating subspace clustering algorithms. In: Workshop on Clustering High Dimensional Data and its Applications, SIAM Int. Conf. on Data Mining, pp. 48–56 (2004)
  11. Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 1895–1923 (1998)
  12. Alpaydin, E.: Combined 5×2 cv F test for comparing supervised classification learning algorithms. Neural Computation 11, 1885–1892 (1999)
  13. Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: SIAM Int. Conf. on Data Mining (2004)
  14. Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SSC: Statistical Subspace Clustering. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS, vol. 3587, pp. 100–109. Springer, Heidelberg (2005)
  15. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38 (1977)
  16. Candillier, L., Tellier, I., Torre, F., Bousquet, O.: SuSE: Subspace Selection embedded in an EM algorithm. In: Miclet, L. (ed.) Actes de la huitième Conférence d'Apprentissage (CAp) (2006)
  17. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
  18. Quinlan, R.: Data Mining Tools See5 and C5.0 (2004)
  19. Webb, G.I., Agar, J.W.M.: Inducing diagnostic rules for glomerular disease with the DLG machine learning algorithm. Artificial Intelligence in Medicine 4, 419–430 (1992)
  20. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR) 6, 1453–1484 (2005)
  21. Blake, C., Merz, C.: UCI repository of machine learning databases (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Laurent Candillier (1, 2)
  • Isabelle Tellier (1)
  • Fabien Torre (1)
  • Olivier Bousquet (2)

  1. GRAppA, Charles de Gaulle University, Lille 3
  2. Pertinence, Paris
