Finding Protein Binding Sites Using Volunteer Computing Grids

  • Travis Desell
  • Lee A. Newberg
  • Malik Magdon-Ismail
  • Boleslaw K. Szymanski
  • William Thompson
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 144)


This paper describes initial work on the DNA@Home volunteer computing project, which aims to use Gibbs sampling to identify and locate DNA control signals in full-genome-scale data sets. Most current research on sequence analysis for these control signals involves significantly smaller data sets; volunteer computing, however, can provide the computational power needed to make full-genome analysis feasible. A fault-tolerant, asynchronous implementation of Gibbs sampling using the Berkeley Open Infrastructure for Network Computing (BOINC) is presented, and it is currently being used to analyze the intergenic regions of the Mycobacterium tuberculosis genome. In only three months of limited operation, over 1,800 volunteered computing hosts have participated in the project, and for the Mycobacterium tuberculosis data set it obtains the number of samples required for analysis over 400 times faster than a single average computing host would. We feel that these preliminary results provide a strong argument for both the feasibility of, and public interest in, a volunteer computing project for this type of bioinformatics.
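To make the idea of Gibbs sampling for locating control signals concrete, the following is a minimal sketch of a textbook-style site sampler, written under stated assumptions: exactly one motif occurrence per sequence, a fixed motif width, a uniform background, and sequences over the alphabet ACGT. It is not the DNA@Home implementation, and the names `profile_from` and `gibbs_motif_sampler` are hypothetical.

```python
import random

BASES = "ACGT"


def profile_from(sites, width, pseudocount=1.0):
    """Column-wise base probabilities estimated from the given motif sites."""
    profile = []
    for col in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def gibbs_motif_sampler(sequences, width, iterations=2000, seed=0):
    """Return one sampled motif start position per sequence.

    Assumes one site per sequence and a uniform background (0.25 per base).
    """
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        # Hold one sequence out and build the motif model from the others.
        i = rng.randrange(len(sequences))
        others = [
            s[p:p + width]
            for j, (s, p) in enumerate(zip(sequences, positions))
            if j != i
        ]
        profile = profile_from(others, width)
        # Weight every candidate start by its motif-to-background odds.
        seq = sequences[i]
        weights = []
        for start in range(len(seq) - width + 1):
            w = 1.0
            for col in range(width):
                w *= profile[col][seq[start + col]] / 0.25
            weights.append(w)
        # Sample the new site in proportion to those weights.
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


if __name__ == "__main__":
    seqs = ["TTGACGTCAATT", "CCTGACGTCAGG", "ATATGACGTCAC"]
    print(gibbs_motif_sampler(seqs, width=8))
```

Each call is a single sampling run determined by its seed. Runs with different seeds do not need to communicate, which is one property (an assumption of this sketch, not a statement from the abstract) that would let such work be split across many hosts.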


Keywords (added by machine, not by the authors): Intergenic Region · Full Genome · Motif Model · Yersinia Pestis · Gibbs Sampling Algorithm




Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • Travis Desell (1, corresponding author)
  • Lee A. Newberg (2)
  • Malik Magdon-Ismail (2)
  • Boleslaw K. Szymanski (2)
  • William Thompson (3)

  1. University of North Dakota, Grand Forks, USA
  2. RPI, Troy, USA
  3. Center for Computational Molecular Biology, Department of Applied Mathematics, Brown University, Providence, USA
