European Conference on Parallel Processing

Euro-Par 2011: Parallel Processing Workshops, pp 23–32

Enabling Data and Compute Intensive Workflows in Bioinformatics

  • Gaurang Mehta,
  • Ewa Deelman,
  • James A. Knowles,
  • Ting Chen,
  • Ying Wang,
  • Jens Vöckler,
  • Steven Buyske &
  • Tara Matise

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7156)

Abstract

Accelerated growth in the field of bioinformatics has resulted in large data sets being produced and analyzed. With this rapid growth has come the need to analyze these data in a quick, easy, scalable, and reliable manner on a variety of computing infrastructures, including desktops, clusters, grids, and clouds. This paper presents the application of workflow technologies, and specifically Pegasus WMS, a robust scientific workflow management system, to a variety of bioinformatics projects, ranging from RNA sequencing and proteomics to data quality control in population studies using GWAS data.

Keywords

  • workflows
  • bioinformatics
  • sequencing
  • epigenetics
  • proteomics

Author information

Authors and Affiliations

  1. USC Information Sciences Institute, USA

    Gaurang Mehta, Ewa Deelman & Jens Vöckler

  2. Keck School of Medicine of USC, USA

    James A. Knowles

  3. University of Southern California, USA

    Ting Chen & Ying Wang

  4. Rutgers University, USA

    Steven Buyske & Tara Matise

  5. Xiamen University, P.R. China

    Ying Wang

Editor information

Editors and Affiliations

  1. Scilytics, Koellnerhofgasse 3/15A, 1010, Vienna, Austria

    Michael Alexander

  2. ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy

    Pasqua D’Ambra

  3. University of Amsterdam, 1090, Amsterdam, Netherlands

    Adam Belloum

  4. Innovative Computing Laboratory, The University of Tennessee, US

    George Bosilca

  5. Department of Experimental Medicine and Clinic, University Magna Græcia, 88100, Catanzaro, Italy

    Mario Cannataro

  6. Computer Science Department, University of Pisa, Italy

    Marco Danelutto

  7. Second University of Naples, Italy

    Beniamino Di Martino

  8. TU München, Boltzmannstr. 3, 85748, Garching, Germany

    Michael Gerndt

  9. Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Emmanuel Jeannot & Raymond Namyst

  10. Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Jean Roman

  11. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831-6164, Oak Ridge, TN, USA

    Stephen L. Scott

  12. Department of Scientific Computing, University of Vienna, Nordbergstr. 15/3C, 1090, Vienna, Austria

    Jesper Larsson Träff

  13. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA

    Geoffroy Vallée

  14. Technische Universität München, Germany

    Josef Weidendorfer

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mehta, G. et al. (2012). Enabling Data and Compute Intensive Workflows in Bioinformatics. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_4

  • DOI: https://doi.org/10.1007/978-3-642-29740-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29739-7

  • Online ISBN: 978-3-642-29740-3

  • eBook Packages: Computer Science, Computer Science (R0)
