Ensemble of Dissimilarity Based Classifiers for Cancerous Samples Classification

  • Ángela Blanco
  • Manuel Martín-Merino
  • Javier de las Rivas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)


DNA microarray technology allows us to identify cancerous tissues by considering gene expression levels across a collection of related samples.

Several classifiers, such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) and Diagonal Linear Discriminant Analysis (DLDA), have been applied to this problem. However, they are usually based on Euclidean distances, which fail to accurately reflect the proximities between samples. Several classifiers have been extended to work with non-Euclidean dissimilarities, but none outperforms the others, because each one misclassifies a different set of patterns.
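
To make the contrast between Euclidean and non-Euclidean dissimilarities concrete, here is a minimal Python sketch. The measures (Euclidean, Manhattan, and 1 minus the Pearson correlation) and the `knn_predict` helper are illustrative assumptions, not the dissimilarities or code used in the paper. The point is that k-NN needs only a dissimilarity function, so swapping the measure can change which training sample counts as nearest.

```python
import math
from collections import Counter

# Each sample is a gene-expression profile: one float per gene.
# These dissimilarities are illustrative choices (assumptions); the abstract
# does not list the exact measures the authors use.

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_dissimilarity(x, y):
    """1 - Pearson correlation: close to 0 for profiles with the same shape.

    Assumes neither profile is constant (otherwise the denominator is 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def knn_predict(train, labels, query, d, k=1):
    """Hypothetical k-NN that relies only on the dissimilarity function d."""
    ranked = sorted(range(len(train)), key=lambda i: d(query, train[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy profiles for illustration only, not real microarray data.
train = [[1.0, 2.0, 3.0], [10.0, 10.5, 10.2]]
labels = ["tumor", "normal"]
query = [11.0, 12.0, 13.0]
print(knn_predict(train, labels, query, euclidean))              # normal
print(knn_predict(train, labels, query, pearson_dissimilarity))  # tumor
```

The query has the same shape as the `tumor` profile but an expression level close to the `normal` one, so the Euclidean and correlation-based classifiers disagree on it.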

In this paper, we combine different kinds of dissimilarity-based classifiers to reduce misclassification errors. Diversity among the classifiers is induced by considering a set of complementary dissimilarities for three different types of models. The experimental results suggest that the proposed algorithm helps to improve on classifiers based on a single dissimilarity and on a widely used combination strategy such as Bagging.
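
The combination idea can be sketched as follows, assuming 1-NN base classifiers and a plain majority vote. The abstract does not specify the dissimilarities, the three model types, or the combination rule, so `euclidean`, `pearson_dissimilarity`, `cosine_dissimilarity`, `nearest_neighbor` and `ensemble_predict` are hypothetical names for illustration, and the sketch uses a single model type instead of three. Unlike Bagging, which trains copies of one classifier on bootstrap resamples of the training set, this ensemble gets its diversity only from the choice of dissimilarity.

```python
import math
from collections import Counter

# Assumed dissimilarities; not necessarily the ones used in the paper.

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_dissimilarity(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)  # assumes non-constant profiles

def cosine_dissimilarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)  # assumes non-zero profiles

def nearest_neighbor(train, labels, query, d):
    """Hypothetical 1-NN base classifier driven only by the dissimilarity d."""
    best = min(range(len(train)), key=lambda i: d(query, train[i]))
    return labels[best]

def ensemble_predict(train, labels, query, dissimilarities):
    """One base classifier per dissimilarity, combined by majority vote.

    Ties go to the label that was voted first (Counter.most_common order).
    Returns the winning label and the individual votes.
    """
    votes = [nearest_neighbor(train, labels, query, d) for d in dissimilarities]
    return Counter(votes).most_common(1)[0][0], votes

# Toy profiles for illustration only, not real microarray data.
train = [[1.0, 2.0, 3.0], [10.0, 10.5, 10.2]]
labels = ["tumor", "normal"]
query = [11.0, 12.0, 13.0]
label, votes = ensemble_predict(
    train, labels, query,
    [euclidean, pearson_dissimilarity, cosine_dissimilarity])
print(votes)  # ['normal', 'tumor', 'normal']
print(label)  # normal
```

Here the Euclidean and cosine members vote `normal`, the correlation member votes `tumor`, and the vote returns `normal`. The toy data only show the mechanics of the vote; they say nothing about which dissimilarities suit real microarray data.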

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ángela Blanco (1)
  • Manuel Martín-Merino (1)
  • Javier de las Rivas (2)
  1. Universidad Pontificia de Salamanca, C/Compañía 5, 37002, Salamanca, Spain
  2. Cancer Research Center of Salamanca (CIC), Salamanca, Spain