Evolutionary Optimization of Transcription Factor Binding Motif Detection

  • Zhao Zhang
  • Ze Wang
  • Guoqin Mai
  • Youxi Luo
  • Miaomiao Zhao
  • Fengfeng Zhou
Chapter
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 827)

Abstract

In all cell types, the transcription of genes into expressed transcripts is strictly controlled by the temporally dynamic orchestration of transcription factor binding activities. Given a set of known binding sites (BSs) of a transcription factor (TF), computational screening for TF binding sites (TFBSs) is a cost-efficient, large-scale strategy that complements experimental techniques. Computational TFBS prediction algorithms fall into two major classes, based on the tertiary and the primary structure of the TF, respectively. A tertiary-structure-based algorithm attempts to calculate the binding affinity between a query DNA fragment and the tertiary structure of the given TF. Because only a limited number of TF tertiary structures are available, primary-structure-based TFBS prediction algorithms are a necessary complement for large-scale TFBS screening. This study proposes a novel evolutionary algorithm that randomly mutates the weights of different positions in the binding motif of a TF so that the overall TFBS prediction accuracy is optimized. A comparison with the most widely used algorithm, the Position Weight Matrix (PWM), suggests that our algorithm performs better than or at the same level as PWM in all performance measurements, including sensitivity, specificity, accuracy and the Matthews correlation coefficient. Our data also suggest that it is necessary to remove the widely used assumption of independence between motif positions. The supplementary material is available at http://www.healthinformaticslab.org/supp/.
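The abstract does not specify how the evolved weights enter the scoring function or which evolutionary operators are used. The following Python example is therefore only a minimal sketch under stated assumptions, not the authors' algorithm. It makes three hypothetical choices:

- Each motif position's log-odds PWM term is multiplied by a non-negative weight.
- A sequence is predicted to be a binding site when its weighted score reaches a threshold.
- The weights are tuned by a simple (1+1)-style evolutionary loop. Each step mutates one weight with Gaussian noise and keeps the child if accuracy does not decrease.

All function names (`build_pwm`, `weighted_score`, `metrics`, `evaluate`, `evolve_weights`) are illustrative. So are the pseudocount, the uniform 0.25 background, the threshold and the toy sequences. Setting every weight to 1.0 recovers the plain PWM score, which makes the comparison against PWM direct.

```python
import math
import random

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5):
    """Log-odds PWM from aligned, equal-length sites (uniform 0.25 background assumed)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def weighted_score(pwm, seq, weights):
    """Hypothetical weighted PWM score: each position's log-odds term is scaled by its weight."""
    return sum(w * col[base] for w, col, base in zip(weights, pwm, seq))

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

def evaluate(pwm, weights, positives, negatives, threshold):
    """Classify by thresholding the weighted score, then compute the four measures."""
    tp = sum(weighted_score(pwm, s, weights) >= threshold for s in positives)
    fp = sum(weighted_score(pwm, s, weights) >= threshold for s in negatives)
    return metrics(tp, fp, len(negatives) - fp, len(positives) - tp)

def evolve_weights(pwm, positives, negatives, threshold=0.0,
                   generations=200, sigma=0.2, seed=0):
    """(1+1)-style loop: mutate one position weight, keep the child if accuracy does not drop."""
    rng = random.Random(seed)
    weights = [1.0] * len(pwm)  # all-ones weights reproduce the plain PWM score
    best_acc = evaluate(pwm, weights, positives, negatives, threshold)[2]
    for _ in range(generations):
        child = list(weights)
        i = rng.randrange(len(child))
        child[i] = max(0.0, child[i] + rng.gauss(0.0, sigma))  # keep weights non-negative
        acc = evaluate(pwm, child, positives, negatives, threshold)[2]
        if acc >= best_acc:
            weights, best_acc = child, acc
    return weights, best_acc

if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    positives = ["TGACGTCA", "TGACGTCA", "TGACGCAA", "TTACGTCA"]
    negatives = ["AAAAAAAA", "GCGCGCGC", "TGAAAAAA", "CCCCGTCA"]
    pwm = build_pwm(positives)
    print("plain PWM:", evaluate(pwm, [1.0] * len(pwm), positives, negatives, 0.0))
    weights, acc = evolve_weights(pwm, positives, negatives)
    print("evolved weights:", [round(w, 2) for w in weights], "accuracy:", acc)
```

Two design choices are worth flagging.

- **Independence is still assumed.** The weighted score remains a sum of per-position terms, so, like PWM, it treats positions as independent. Capturing the dependence between positions that the abstract argues for would need extra terms, for example dinucleotide terms.
- **No held-out data.** The toy example tunes and evaluates on the same sequences. A real comparison of sensitivity, specificity, accuracy and MCC would use held-out data, such as cross-validation folds.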

Keywords

Binding sites; Transcription factor; Position weight matrix; Motif

Notes

Acknowledgments

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040400), Shenzhen Peacock Plan (KQCX20130628112914301), Shenzhen Research Grant (ZDSY20120617113021359), China 973 program (2010CB732606), the MOE Humanities Social Sciences Fund (No.13YJC790105) and Doctoral Research Fund of HBUT (No. BSQD13050). Computing resources were partly provided by the Dawning supercomputing clusters at SIAT CAS.


Copyright information

© Shanghai Jiaotong University Press, Shanghai and Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Zhao Zhang (1, 2)
  • Ze Wang (1)
  • Guoqin Mai (2)
  • Youxi Luo (2, 3)
  • Miaomiao Zhao (2)
  • Fengfeng Zhou (2)
  1. School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, China
  2. Shenzhen Institutes of Advanced Technology and Key Laboratory for Health Informatics, Chinese Academy of Sciences, Shenzhen, China
  3. School of Science, Hubei University of Technology, Wuhan, China
