Abstract
This paper describes initial work in the development of the DNA@Home volunteer computing project, which aims to use Gibbs sampling to identify and locate DNA control signals in full-genome-scale data sets. Most current research that analyzes sequences for these control signals involves significantly smaller data sets; volunteer computing, however, can provide the computational power needed to make full-genome analysis feasible. We present a fault-tolerant, asynchronous implementation of Gibbs sampling built on the Berkeley Open Infrastructure for Network Computing (BOINC), which is currently being used to analyze the intergenic regions of the Mycobacterium tuberculosis genome. In only three months of limited operation, over 1,800 volunteered computing hosts have participated in the project. For the Mycobacterium tuberculosis data set, the project obtains the number of samples required for analysis more than 400 times faster than an average computing host would. We believe these preliminary results provide a strong argument for both the feasibility of, and public interest in, a volunteer computing project for this type of bioinformatics.
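The sketch below is for readers unfamiliar with Gibbs sampling for motif finding. It shows a minimal version of the classic single-site Gibbs motif sampler, in the spirit of Lawrence et al. (1993). It is not the DNA@Home implementation, and it rests on these assumptions:

- every sequence contains exactly one motif occurrence of fixed width;
- a single background base distribution is estimated once from all input bases;
- pseudocounts are 0.5.

The function names (`background`, `build_profile`, `gibbs_motif_sampler`) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def background(seqs, pseudo=0.5):
    """Background base frequencies from all input bases (a simplifying assumption)."""
    counts = [pseudo] * len(ALPHABET)
    for seq in seqs:
        for base in seq:
            counts[ALPHABET.index(base)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def build_profile(seqs, sites, width, skip, pseudo=0.5):
    """Position-specific base frequencies from every current site except the one in seqs[skip]."""
    counts = [[pseudo] * len(ALPHABET) for _ in range(width)]
    for k, (seq, start) in enumerate(zip(seqs, sites)):
        if k == skip:
            continue
        for j in range(width):
            counts[j][ALPHABET.index(seq[start + j])] += 1
    return [[c / sum(row) for c in row] for row in counts]


def gibbs_motif_sampler(seqs, width, iterations=1000, seed=0):
    """Return one sampled motif start position per sequence after `iterations` updates."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Random initial alignment: one candidate site per sequence.
    sites = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled this step
        profile = build_profile(seqs, sites, width, skip=i)
        seq = seqs[i]
        # Log-odds of each candidate window: motif profile versus background.
        scores = []
        for start in range(len(seq) - width + 1):
            scores.append(sum(
                math.log(profile[j][ALPHABET.index(seq[start + j])]
                         / bg[ALPHABET.index(seq[start + j])])
                for j in range(width)))
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        # Draw the new site with probability proportional to its likelihood ratio.
        sites[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return sites


if __name__ == "__main__":
    # Toy data: random sequences with a planted motif (illustrative only).
    rng = random.Random(1)
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(40)) for _ in range(8)]
    seqs = [s[:10] + "TTGACA" + s[16:] for s in seqs]
    print(gibbs_motif_sampler(seqs, width=6, seed=2))
```

A chain depends only on its input sequences and its seed. Independent chains with different seeds could therefore, in principle, run on separate volunteer hosts, with their samples pooled afterwards. That is an illustrative assumption about how such work might be partitioned, not a description of how DNA@Home schedules its work units.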
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
Desell, T., Newberg, L.A., Magdon-Ismail, M., Szymanski, B.K., Thompson, W. (2012). Finding Protein Binding Sites Using Volunteer Computing Grids. In: Gaol, F., Nguyen, Q. (eds) Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science. Advances in Intelligent and Soft Computing, vol 144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28314-7_52
DOI: https://doi.org/10.1007/978-3-642-28314-7_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28313-0
Online ISBN: 978-3-642-28314-7
eBook Packages: Engineering, Engineering (R0)