Biological Knowledge Integration in DNA Microarray Gene Expression Classification Based on Rough Set Theory

  • D. Calvo-Dmgz
  • J. F. Galvez
  • Daniel Glez-Peña
  • Florentino Fdez-Riverola
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 154)


DNA microarrays have contributed to the exponential growth of genetic data in recent years. One possible application of this large amount of gene expression data is the diagnosis of diseases such as cancer using classification methods. At the same time, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge into the classification process using Rough Set Theory, making the classification more effective. In addition, the proposed model is able to indicate which parts of the biological knowledge were used to build the model and to classify new samples.
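The abstract does not spell out the rough-set machinery, so the following is only a minimal sketch of the classic Pawlak notions such a classifier builds on, not the chapter's own model (which may use a different rough-set variant). It computes basic categories (equivalence classes of the indiscernibility relation), the lower and upper approximations of a class, and the dependency degree of the decision on a set of attributes. It assumes each attribute is a discretized summary of one hypothetical gene group (`setA`, `setB`); the toy decision table is invented for illustration and is not data from the chapter.

```python
from collections import defaultdict

# Hypothetical decision table: each attribute is a discretized summary of one
# (invented) gene group; "class" is the decision attribute.
TABLE = {
    "s1": {"setA": "high", "setB": "low", "class": "tumor"},
    "s2": {"setA": "high", "setB": "low", "class": "tumor"},
    "s3": {"setA": "low", "setB": "low", "class": "normal"},
    "s4": {"setA": "low", "setB": "high", "class": "tumor"},
    "s5": {"setA": "low", "setB": "high", "class": "normal"},
    "s6": {"setA": "high", "setB": "high", "class": "tumor"},
}


def basic_categories(table, attributes):
    """Equivalence classes of samples that are indiscernible on `attributes`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Lower and upper approximations of the sample set `target`."""
    lower, upper = set(), set()
    for category in basic_categories(table, attributes):
        if category <= target:   # category lies entirely inside the target
            lower |= category
        if category & target:    # category overlaps the target
            upper |= category
    return lower, upper


def dependency(table, attributes, decision):
    """Dependency degree gamma: |positive region| / |universe|."""
    positive = set()
    for label in {row[decision] for row in table.values()}:
        target = {o for o, row in table.items() if row[decision] == label}
        lower, _ = approximations(table, attributes, target)
        positive |= lower
    return len(positive) / len(table)


if __name__ == "__main__":
    for attrs in (["setA", "setB"], ["setA"], ["setB"]):
        print(attrs, round(dependency(TABLE, attrs, "class"), 3))
```

On this toy table the dependency degree is 0.667 with both gene groups, 0.5 with `setA` alone and 0 with `setB` alone. Because every attribute stands for a named gene group in this sketch, comparisons of this kind show which groups a rough-set model actually relies on, which is in the spirit of the abstract's claim about indicating the biological knowledge used.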


Keywords: DNA microarray classification · Biological knowledge · Principal Component Analysis · Discriminant Fuzzy Pattern · Rough Sets · Basic Category
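The keywords list Principal Component Analysis alongside biological knowledge. One plausible way to turn a group of functionally related genes into a single attribute for a rough-set decision table is to project the group's expression profiles onto their first principal component and then discretize the scores. This is an assumption for illustration, not a description of the chapter's confirmed procedure; the function names, the power-iteration approach and the two-level median split are all hypothetical choices, and only the Python standard library is used.

```python
import math
import statistics


def first_pc_scores(samples, iterations=200):
    """Project samples (rows) onto the first principal component of one
    hypothetical gene group (columns = genes in the group)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication by cov converges to its
    # dominant eigenvector (assumes the start vector is not orthogonal to it).
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in this gene group
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


def discretize(scores):
    """Hypothetical two-level discretization around the median score."""
    median = statistics.median(scores)
    return ["high" if s > median else "low" for s in scores]


if __name__ == "__main__":
    group = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # invented expression values
    scores = first_pc_scores(group)
    print(scores, discretize(scores))
```

The sign of a principal component is arbitrary, so in a sketch like this only the ordering of samples along the component is meaningful; the median split then yields symbolic values of the kind a rough-set decision table expects.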


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • D. Calvo-Dmgz (1)
  • J. F. Galvez (1)
  • Daniel Glez-Peña (1)
  • Florentino Fdez-Riverola (1)
  1. ESEI: Escuela Superior de Enxeñería Informática, University of Vigo, Ourense, Spain
