Identifying Informative Genes for Prediction of Breast Cancer Subtypes

  • Iman Rezaeian
  • Yifeng Li
  • Martin Crozier
  • Eran Andrechek
  • Alioune Ngom
  • Luis Rueda
  • Lisa Porter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)


Breast cancer is not a single disease but a collection of distinct diseases arising at one site, which can be distinguished in part by characteristic gene expression signatures. Accurate diagnosis of the specific subtype is critical for ensuring the best possible patient response to therapy. Currently, therapeutic direction is determined by the expression of characteristic receptors; while cost-effective, this method is not robust and reliably predicts only a small number of subtypes. Using the original five subtypes of breast cancer, we hypothesized that machine learning techniques would offer many benefits for feature selection. Unlike existing gene selection approaches, we propose a tree-based approach that performs gene selection and builds the classifier simultaneously. We conducted computational experiments to select the minimal number of genes that would reliably predict a given subtype. Our results indicate that this modified approach to gene selection yields a small subset of genes that can predict subtypes with greater than 95% overall accuracy. In addition to providing a valuable list of targets for diagnostic purposes, the gene ontologies of the selected genes suggest that these methods have isolated a number of genes potentially involved in breast cancer biology and etiology, and potentially relevant to novel therapeutics.


Keywords: breast tumor subtype, gene selection, classification
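
As a rough illustration of what it can mean to select genes while building a classifier, the sketch below grows a chain of one-vs-rest nodes, where each node picks the single gene and threshold that best split one subtype from the remaining samples. This is not the authors' algorithm: the single-gene threshold rule, the greedy ordering, the synthetic expression data, the placeholder labels S1 to S5, and all function names are assumptions made purely for the example.

```python
import random


def best_stump(X, y, target):
    """Return (errors, gene, threshold, sign) for the single-gene rule that
    best separates `target` samples from all other samples."""
    best = None
    for g in range(len(X[0])):
        values = sorted({row[g] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for sign in (1, -1):
                errors = sum(
                    (sign * (row[g] - t) > 0) != (label == target)
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, g, t, sign)
    return best


def build_cascade(X, y):
    """Greedily build a chain of one-vs-rest nodes; every node selects its
    own gene, so selection and classifier construction happen together."""
    nodes = []
    X, y = list(X), list(y)
    while len(set(y)) > 1:
        # Split off the subtype that is currently easiest to separate.
        errors, g, t, sign, target = min(
            best_stump(X, y, c) + (c,) for c in sorted(set(y))
        )
        nodes.append((g, t, sign, target))
        keep = [i for i, label in enumerate(y) if label != target]
        X, y = [X[i] for i in keep], [y[i] for i in keep]
    return nodes, y[0]  # the last remaining subtype is the default leaf


def predict(nodes, default, row):
    """Walk the chain and return the first subtype whose rule fires."""
    for g, t, sign, target in nodes:
        if sign * (row[g] - t) > 0:
            return target
    return default


# Synthetic data (assumption): 5 placeholder subtypes, 20 genes, and
# gene k is shifted upward so that it marks subtype k.
rng = random.Random(0)
subtypes = ["S1", "S2", "S3", "S4", "S5"]
n_genes = 20
X, y = [], []
for k, name in enumerate(subtypes):
    for _ in range(10):
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[k] += 10
        X.append(row)
        y.append(name)

nodes, default = build_cascade(X, y)
selected = sorted({g for g, _, _, _ in nodes})
accuracy = sum(predict(nodes, default, r) == l for r, l in zip(X, y)) / len(y)
print("selected genes:", selected, "training accuracy:", accuracy)
```

The accuracy printed here is measured on the same synthetic samples used to build the chain, so it says nothing about the results reported in the paper; a realistic evaluation would use real expression data and held-out samples.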



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Iman Rezaeian
    • 1
  • Yifeng Li
    • 1
  • Martin Crozier
    • 2
  • Eran Andrechek
    • 3
  • Alioune Ngom
    • 1
  • Luis Rueda
    • 1
  • Lisa Porter
    • 2
  1. School of Computer Science, University of Windsor, Windsor, Canada
  2. Department of Biological Sciences, University of Windsor, Windsor, Canada
  3. Department of Physiology, Michigan State University, East Lansing, United States
