Abstract
The risk of developing a complex disease is likely to be determined by single nucleotide polymorphisms (SNPs), the most common form of DNA variation. Rapidly developing genotyping technologies have made it possible to assess the influence of SNPs on a particular disease. The aim of this paper is to identify the risk/protective factors of a disease, which are modeled as the subset of SNPs (with specified alleles) that has the maximum odds ratio. Building on this definition and on the relationship between nucleotides and amino acids, we propose two novel types of factor, called k-relaxed risk/protective factors and weighted-relaxed risk/protective factors, to capture more complex disease-associated SNPs. However, the enormous number of possible SNP interactions presents a mathematical and computational challenge. We use the Bayesian Optimization Algorithm (BOA) to search for the risk/protective factors of a particular disease. Because determining the Bayesian network (BN) structure is NP-hard, binary particle swarm optimization is used to determine the BN structure. The proposed algorithm was tested on four datasets. The experimental results indicate that it is a promising method for discovering SNP interactions that cause or prevent diseases.
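To make the objective concrete, the following minimal sketch computes the odds ratio of one candidate factor, that is, a set of SNPs with specified alleles, from case-control data. It is an illustration under stated assumptions and not the implementation used in the paper: the function name `odds_ratio`, the data layout (one sequence of allele codes per individual) and the 0.5 continuity correction applied when a cell of the 2x2 table is empty are all choices made here for the example.

```python
from typing import Mapping, Sequence


def odds_ratio(
    genotypes: Sequence[Sequence[int]],
    is_case: Sequence[bool],
    factor: Mapping[int, int],
    correction: float = 0.5,
) -> float:
    """Odds ratio of carrying every allele in `factor`, cases vs. controls.

    genotypes[i][j] is the allele code of individual i at SNP j (assumed layout).
    factor maps a SNP index to the allele code required at that SNP.
    """
    # 2x2 contingency table: exposed means the individual matches the whole factor.
    exposed_cases = unexposed_cases = 0
    exposed_controls = unexposed_controls = 0
    for row, case in zip(genotypes, is_case):
        exposed = all(row[snp] == allele for snp, allele in factor.items())
        if case and exposed:
            exposed_cases += 1
        elif case:
            unexposed_cases += 1
        elif exposed:
            exposed_controls += 1
        else:
            unexposed_controls += 1

    cells = [exposed_cases, unexposed_controls, exposed_controls, unexposed_cases]
    # Assumption of this sketch: add 0.5 to every cell (Haldane-Anscombe style)
    # only when some cell is zero, so the ratio stays finite.
    if any(cell == 0 for cell in cells):
        cells = [cell + correction for cell in cells]
    a, d, b, c = cells
    return (a * d) / (b * c)
```

A search method such as the BOA would call a function like this as its fitness measure, scoring each candidate subset of SNPs. The number of candidate subsets grows combinatorially with the number of SNPs, which is why an exhaustive scan is impractical.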
This article is published with open access at Springerlink.com
Cite this article
Wei, B., Peng, Q., Chen, X. et al. Bayesian optimization algorithm-based methods searching for risk/protective factors. Chin. Sci. Bull. 58, 2828–2835 (2013). https://doi.org/10.1007/s11434-012-5475-6
DOI: https://doi.org/10.1007/s11434-012-5475-6