Bioinformatics pp 363-377

Combinatorial Optimization Models for Finding Genetic Signatures from Gene Expression Datasets

  • Regina Berretta
  • Wagner Costa
  • Pablo Moscato
Part of the Methods in Molecular Biology™ book series (MIMB, volume 453)


This chapter presents combinatorial optimization models and techniques for the analysis of microarray datasets. It illustrates the application of a novel objective function that guides the search for high-quality solutions to the problem of sequentially ordering expression profiles. The approach is unsupervised, and a metaheuristic method (a memetic algorithm) is used to find high-quality solutions. For the problem of selecting discriminative groups of genes, we use a supervised method that has provided good results on a variety of datasets. The chapter illustrates the application of these models to an Alzheimer's disease microarray dataset.

Key words:

Combinatorial optimization, integer programming, gene selection, feature selection, gene ordering, microarray data analysis, Alzheimer's disease.



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Regina Berretta (1)
  • Wagner Costa (2)
  • Pablo Moscato (3)
  1. Centre of Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, Australia
  2. School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
  3. ARC Centre of Excellence in Bioinformatics, and Centre of Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, Australia
