Identification of Breast Cancer Subtypes Using Multiple Gene Expression Microarray Datasets

  • Alexandre Mendes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7106)


This work is motivated by the need for consensus clustering methods that use multiple datasets and are applicable to microarray data. It introduces a new unsupervised method for clustering samples with similar genetic profiles, using information from two or more datasets. The method was tested on two breast cancer gene expression microarray datasets, with 295 and 249 samples and 12,325 genes in common. Four subtypes with similar genetic profiles were identified in both datasets. Analysis of the clinical information for these subtypes confirmed different levels of tumour aggressiveness, measured by the time to metastasis, indicating a connection between genetic profile and prognosis. Finally, the subtypes identified were compared to already established subtypes of breast cancer. The comparison indicates that the new approach detected similar gene expression patterns across the two datasets without any a priori knowledge. The two datasets used in this work, as well as all the figures, are available for download from the website
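
As a rough illustration of what working across two datasets involves, the sketch below restricts two toy datasets to their common genes and pairs up sample groups whose mean expression profiles are most correlated. It is not the method introduced in the paper: the gene names, sample values, group labels and helper functions are all hypothetical, the grouping of samples is assumed to have been produced already by some clustering step, and Pearson correlation of centroids is simply one plausible way to compare profiles.

```python
from math import sqrt

def common_genes(genes_a, genes_b):
    """Genes present in both datasets, kept in the order of genes_a."""
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]

def centroid(samples, genes):
    """Mean expression of a group of samples over the given genes."""
    rows = [[s[g] for g in genes] for s in samples]
    return [sum(col) / len(rows) for col in zip(*rows)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def match_groups(groups_a, groups_b, genes):
    """Map each group in A to the group in B with the most correlated centroid."""
    cents_b = {lb: centroid(s, genes) for lb, s in groups_b.items()}
    return {
        la: max(cents_b, key=lambda lb: pearson(centroid(s, genes), cents_b[lb]))
        for la, s in groups_a.items()
    }

# Hypothetical toy data: each sample maps gene name -> expression value.
genes_a = ["G1", "G2", "G3", "G9"]
genes_b = ["G3", "G1", "G2", "G7"]
groups_a = {
    "A-high": [{"G1": 5, "G2": 1, "G3": 4, "G9": 0}, {"G1": 6, "G2": 2, "G3": 5, "G9": 1}],
    "A-low": [{"G1": 1, "G2": 5, "G3": 2, "G9": 3}, {"G1": 2, "G2": 6, "G3": 1, "G9": 2}],
}
groups_b = {
    "B-x": [{"G3": 1.5, "G1": 1, "G2": 4, "G7": 0}, {"G3": 2.5, "G1": 2, "G2": 5, "G7": 1}],
    "B-y": [{"G3": 9, "G1": 10, "G2": 3, "G7": 2}, {"G3": 8, "G1": 11, "G2": 4, "G7": 0}],
}

shared = common_genes(genes_a, genes_b)
matches = match_groups(groups_a, groups_b, shared)
print(shared)
print(matches)
```

Correlation is used here rather than a raw distance because the two toy datasets are on different scales, as expression values from different microarray platforms often are.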


Bioinformatics, breast cancer, data mining, genetic algorithms



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Alexandre Mendes (1)
  1. Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, School of Electrical Engineering and Computer Science, Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, Australia