Gibbs sampler

Bioinformatics and the Cell

Abstract

The Gibbs sampler is a method for de novo motif discovery. Suppose we have a set of sequences, each containing a regulatory motif at a different position, but we do not know what the motif looks like or where it lies within each sequence. The Gibbs sampler will find such a motif if it is well represented in these sequences. For example, given a set of yeast intron sequences, each containing a branchpoint site (BPS) somewhere, but without knowing what the BPS looks like or where it lies along the intron, the Gibbs sampler will find these BPSs. Another scenario is the discovery of protein-binding sites (e.g., transcription factor binding sites) in a set of sequences from ChIP-Seq. Each of these sequences has a short segment with affinity for a protein, but we do not know what the segment looks like or where it is located within the sequence. The Gibbs sampler excels at discovering such protein-binding sites. This chapter opens the black box of the Gibbs sampler and numerically illustrates each of its computational steps, including the site sampler (which assumes that each input sequence harbors a signal motif) and the motif sampler (which is used when some sequences may contain multiple signal motifs and some none).

Postscript

The Gibbs sampler for motif discovery illustrates the magic of a random process guided by a selection process. We start with a random set of motifs and apply a selection process that favors certain motifs over others, based on the criterion that the chosen motifs should contribute to a site-specific frequency distribution with a larger F, as defined in Eq. (4.1). The starting set of random motifs, shaped by the selection process, eventually converges to a final set of motifs with a strong nonrandom pattern. Sometimes the sequences contain two or more different types of signal motifs. If we then run the Gibbs sampler with different starting sets of random motifs, it may converge to different sets of highly informative motifs. Thus, the same random process coupled with the same selection process may generate quite different outcomes.
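The interplay of random start and selection described here can be made concrete with a minimal site-sampler sketch. It is an illustration under stated assumptions, not the chapter's implementation: Eq. (4.1) is not reproduced on this page, so the `score` function below uses a log-likelihood ratio against background frequencies as a stand-in for F. The function names, the pseudocount of 0.5, and the iteration count are all hypothetical choices. The sketch also assumes DNA sequences over A, C, G, T, each at least as long as the motif width, with exactly one site per sequence.

```python
import math
import random

ALPHABET = "ACGT"

def count_matrix(seqs, starts, width, skip=None, pseudo=0.5):
    """Site-specific counts (plus pseudocounts) from every sequence except `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (s, p) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][s[p + j]] += 1
    return counts

def background(seqs):
    """Overall nucleotide frequencies (add-one smoothed), used as the null model."""
    total = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for ch in s:
            total[ch] += 1
    n = sum(total.values())
    return {b: total[b] / n for b in ALPHABET}

def score(seqs, starts, width, bg):
    """Log-likelihood ratio of the chosen sites versus background.
    Assumed stand-in for F; not necessarily the form of Eq. (4.1)."""
    total = 0.0
    for col in count_matrix(seqs, starts, width):
        n = sum(col.values())
        for b in ALPHABET:
            total += col[b] * math.log((col[b] / n) / bg[b])
    return total

def site_sampler(seqs, width, iterations=200, seed=0):
    """Return (best score, best start positions) seen during sampling."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Random process: begin with a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best = (score(seqs, starts, width, bg), list(starts))
    for _ in range(iterations):
        for i, s in enumerate(seqs):
            # Build the site-specific frequencies without sequence i.
            counts = count_matrix(seqs, starts, width, skip=i)
            n = sum(counts[0].values())
            # Selection process: weight each candidate site in sequence i
            # by its odds under the motif model versus background.
            weights = []
            for p in range(len(s) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= (counts[j][s[p + j]] / n) / bg[s[p + j]]
                weights.append(w)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        current = score(seqs, starts, width, bg)
        if current > best[0]:
            best = (current, list(starts))
    return best
```

Calling `site_sampler(seqs, width=6, seed=k)` for several values of `k` is one way to observe the behavior noted above: different random starting sets may settle on different sets of sites, especially when the sequences carry more than one type of signal.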

My Christian friends often assert that Darwinian evolutionary theory is all wrong because random collisions of molecules cannot generate highly structured patterns. Random collisions of molecules are indeed limited in their potential to generate highly structured patterns. However, Darwinian evolution is not a random process. In fact, Darwin’s most significant contribution to biology is the substantiation of a particular force that he named natural selection. A random process guided by this particular force can work miracles in generating biodiversity of all colors and shades. It was when Darwin visualized the miracles generated by this ubiquitous force that he proclaimed, “There is grandeur in this view of life.”

This force has been with us from time immemorial and continues to shape all forms of life, including the lives of those who deny its existence.

Copyright information

© 2018 Springer Science+Business Media LLC

About this chapter

Cite this chapter

Xia, X. (2018). Gibbs sampler. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_4
