Abstract
In every cell type, the transcription of genes into expressed transcripts is tightly controlled by the temporally dynamic orchestration of transcription factor binding activities. Given a set of known binding sites (BSs) of a given transcription factor (TF), computational transcription factor binding site (TFBS) screening offers a cost-efficient, large-scale strategy that complements experimental techniques. Computational TFBS prediction algorithms fall into two major classes, based on the tertiary and the primary structure, respectively. A tertiary-structure-based algorithm tries to calculate the binding affinity between a query DNA fragment and the tertiary structure of the given TF. Because only a limited number of TF tertiary structures are available, primary-structure-based TFBS prediction is a necessary complementary technique for large-scale TFBS screening. This study proposes a novel evolutionary algorithm that randomly mutates the weights of the different positions in the binding motif of a TF so that the overall TFBS prediction accuracy is optimized. Compared with the most widely used algorithm, the Position Weight Matrix (PWM), our algorithm performs better than or at the same level as PWM on all performance measures, including sensitivity, specificity, accuracy and the Matthews correlation coefficient. Our data also suggest that it is necessary to drop the widely used assumption of independence between motif positions. The supplementary material is available at http://www.healthinformaticslab.org/supp/
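The abstract does not specify the chapter's scoring formula, mutation operator or fitness function, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a log2-odds PWM built with a pseudocount of 0.5 and a uniform background, one non-negative weight per motif position multiplying that column's score (all weights equal to 1.0 give the plain PWM), the best Matthews correlation coefficient over all score thresholds as the fitness, and a simple (1+1) evolution strategy with Gaussian mutation as the evolutionary step. All function names (build_pwm, score, fitness, evolve_weights, and so on) are hypothetical. Like the plain PWM, this sketch still sums positions independently, so it does not model the inter-position dependence the authors argue for.

```python
import math
import random

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length, aligned binding sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(seq, pwm, weights):
    """Weighted PWM score; all weights equal to 1.0 reproduce the plain PWM score.
    Positions are still summed independently. Raises KeyError outside ACGT."""
    return sum(w * column[base] for w, column, base in zip(weights, pwm, seq))


def confusion(pos, neg, pwm, weights, threshold):
    """(TP, FN, FP, TN) when score >= threshold is called a binding site."""
    tp = sum(score(s, pwm, weights) >= threshold for s in pos)
    fp = sum(score(s, pwm, weights) >= threshold for s in neg)
    return tp, len(pos) - tp, fp, len(neg) - fp


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy and MCC from a confusion matrix."""
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / (tp + fn + fp + tn),
        "MCC": mcc(tp, fn, fp, tn),
    }


def fitness(pos, neg, pwm, weights):
    """Best MCC over every observed score used as the decision threshold."""
    scores = [score(s, pwm, weights) for s in pos + neg]
    return max(mcc(*confusion(pos, neg, pwm, weights, t)) for t in scores)


def evolve_weights(pos, neg, pwm, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: Gaussian mutation of all position weights,
    keeping the child whenever its fitness is at least the parent's."""
    rng = random.Random(seed)
    parent = [1.0] * len(pwm)  # start from the unweighted PWM
    parent_fit = fitness(pos, neg, pwm, parent)
    for _ in range(generations):
        child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in parent]
        child_fit = fitness(pos, neg, pwm, child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    pos = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
    neg = ["ACGTACG", "GGGCCCA", "TTTTAAA", "CAGTGCA"]
    pwm = build_pwm(pos)
    weights, best = evolve_weights(pos, neg, pwm, generations=50)
    print([round(w, 2) for w in weights], round(best, 3))
```

Because the parent is only replaced by a child that is at least as fit, the evolved weights can never score worse than the unweighted PWM on the data used for fitting. For a fair comparison of the kind the abstract describes, the weights would need to be evaluated on binding sites held out from the optimization (for example, by cross-validation).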
Zhao Zhang and Miaomiao Zhao contributed equally to this paper.
Acknowledgments
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040400), Shenzhen Peacock Plan (KQCX20130628112914301), Shenzhen Research Grant (ZDSY20120617113021359), China 973 program (2010CB732606), the MOE Humanities Social Sciences Fund (No. 13YJC790105) and Doctoral Research Fund of HBUT (No. BSQD13050). Computing resources were partly provided by the Dawning supercomputing clusters at SIAT CAS.
Copyright information
© 2015 Shanghai Jiaotong University Press, Shanghai and Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Zhang, Z., Wang, Z., Mai, G., Luo, Y., Zhao, M., Zhou, F. (2015). Evolutionary Optimization of Transcription Factor Binding Motif Detection. In: Wei, D., Xu, Q., Zhao, T., Dai, H. (eds) Advance in Structural Bioinformatics. Advances in Experimental Medicine and Biology, vol 827. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9245-5_15
DOI: https://doi.org/10.1007/978-94-017-9245-5_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9244-8
Online ISBN: 978-94-017-9245-5
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)