A gene expression analysis system for medical diagnosis

  • Dimitris Maroulis
  • Dimitris Iakovidis
  • Ilias Flaounas
  • Stavros Karkanis
Part of the IFIP International Federation for Information Processing book series (IFIPAICT, volume 204)


In this paper we present a novel system that utilizes molecular-level information for medical diagnosis. It accepts as input high-dimensional vectors of gene expressions, quantified by means of microarray image analysis. The proposed system incorporates various data pre-processing methods, such as missing value estimation and data normalization. For diagnostic purposes, it adopts a novel approach to the classification of gene expression vectors into multiple classes, which embodies various gene selection methods. The proposed system has been extensively tested on various publicly available datasets. We demonstrate its performance for prostate cancer diagnosis and compare it with a well-established multiclass classification scheme. The results show that the proposed system could prove to be a valuable diagnostic aid in medicine.
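
The two pre-processing steps named above, missing value estimation and data normalization, can be illustrated with a minimal sketch. The abstract does not say which estimators the system uses, so the row-mean imputation and per-gene z-score scaling below are assumptions chosen for brevity, and the function names and toy matrix are hypothetical.

```python
import math


def impute_row_mean(matrix):
    """Replace None entries in each gene row with the mean of that row's observed values."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a gene with no observed values at all is filled with 0.0.
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def zscore_rows(matrix):
    """Scale each gene row to zero mean and unit (population) standard deviation."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        # Constant rows carry no information; map them to zeros instead of dividing by 0.
        out.append([(v - mean) / std if std > 0 else 0.0 for v in row])
    return out


# Hypothetical toy matrix: rows are genes, columns are samples; None marks a missing spot.
expr = [
    [1.0, None, 3.0],
    [2.0, 4.0, 6.0],
]
normalized = zscore_rows(impute_row_mean(expr))
```

Each column of `normalized` is then one gene expression vector of the kind a classifier would take as input. Neighbour-based imputation is a common alternative to the row mean when many values are missing.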


Gene Expression Data; Prostate Cancer Diagnosis; Classification Error Rate; Gene Expression Matrix; Gene Expression Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© International Federation for Information Processing 2006

Authors and Affiliations

  • Dimitris Maroulis (1)
  • Dimitris Iakovidis (1)
  • Ilias Flaounas (1)
  • Stavros Karkanis (2)
  1. Dept. of Informatics and Telecommunications, Panepistimiopolis, University of Athens, Ilisia, Greece
  2. Dept. of Informatics and Computer Technology, Lamia Institute of Technology, Lamia, Greece
