Abstract
This article demonstrates the use of the Multiple Iterative Constraint Satisfaction Learning (MICSL) process for inducing gene markers from microarray gene-expression profiles. MICSL adopts a supervised learning-from-examples framework and proceeds by optimizing an evolving zero-one optimization model with constraints. After a data-discretization pre-processing step, each example sample is transformed into a corresponding constraint. Extra constraints are added to guarantee mutual exclusiveness between gene (feature) and assigned phenotype (class) values. The objective function corresponds to the learning outcome and strives to minimize the number of genes used, following an iterative constraint-satisfaction mode that finds solutions of increasing complexity. Standard (C4.5-like) pruning and rule-simplification processes are also incorporated. MICSL is applied to several well-known microarray datasets, where it outperforms other established algorithms, providing evidence that the approach is suited to the discovery of biomarkers from microarray experiments. Implications of the approach for the biomedical informatics domain are also discussed.
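The idea of discretizing the profiles, turning each sample into a constraint, and searching for solutions of increasing complexity can be illustrated with a toy sketch. This is not the authors' MICSL formulation: the function names, the single cut point per gene, the made-up expression values, and the brute-force subset search (standing in for a zero-one optimization solver) are all assumptions made for illustration only.

```python
from itertools import combinations

def discretize(profiles, thresholds):
    """Map each expression value to 1 (above the gene's cut point) or 0."""
    return [tuple(int(v > t) for v, t in zip(row, thresholds)) for row in profiles]

def separates(genes, samples, labels):
    """True if no two samples with different labels agree on every selected gene."""
    seen = {}
    for row, label in zip(samples, labels):
        key = tuple(row[g] for g in genes)
        # setdefault stores the first label seen for this value pattern
        if seen.setdefault(key, label) != label:
            return False
    return True

def smallest_marker_set(samples, labels):
    """Try gene subsets of increasing size; return the first one that
    satisfies every sample constraint (hypothetical stand-in for a solver)."""
    n_genes = len(samples[0])
    for size in range(1, n_genes + 1):
        for genes in combinations(range(n_genes), size):
            if separates(genes, samples, labels):
                return genes
    return None

# Made-up expression values for 5 samples x 4 genes (illustration only).
PROFILES = [
    [2.1, 0.3, 5.0, 1.0],
    [1.9, 0.4, 4.8, 3.2],
    [2.2, 0.2, 1.1, 1.1],
    [0.5, 0.3, 0.9, 3.0],
    [0.4, 0.1, 4.5, 2.9],
]
LABELS = ["tumor", "tumor", "normal", "normal", "normal"]
THRESHOLDS = [1.0, 1.0, 3.0, 2.0]  # assumed per-gene cut points

SAMPLES = discretize(PROFILES, THRESHOLDS)
MARKERS = smallest_marker_set(SAMPLES, LABELS)
print(MARKERS)
```

On this toy data no single gene separates the two classes, so the search moves on to pairs and stops at the first pair that does. Exhaustive enumeration is only feasible for a handful of genes; real microarray data with thousands of genes would need an actual integer-programming or constraint solver.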
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Potamias, G., Koumakis, L., Kanterakis, A., Moustakis, V. (2010). Towards the Discovery of Reliable Biomarkers from Gene-Expression Profiles: An Iterative Constraint Satisfaction Learning Approach. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science, vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_27
DOI: https://doi.org/10.1007/978-3-642-12842-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12841-7
Online ISBN: 978-3-642-12842-4
eBook Packages: Computer Science (R0)