A Grid-Enabled Modular Framework for Efficient Sequence Analysis Workflows

  • Olga T. Vrousgou
  • Fotis E. Psomopoulos (email author)
  • Pericles A. Mitkas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 517)

Abstract

In the era of Big Data in the Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an increasingly daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights into the function and relationships of the entities involved. It is also one of the most common computational bottlenecks in bioinformatics workflows. We have designed and implemented a time-efficient, distributed, modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, utilizing the European Grid Infrastructure. Optimal utilization of the Grid with regard to the respective modules allowed us to achieve significant speedups, on the order of 1400%.

Keywords

Bioinformatics; Grid computing; Comparative genomics; Sequence alignment; Protein clustering; Phylogenetic profiles; Parallel processing; Modular software engineering

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Olga T. Vrousgou (1)
  • Fotis E. Psomopoulos (1, 2), email author
  • Pericles A. Mitkas (1)
  1. Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. Center for Research and Technology Hellas, Thessaloniki, Greece
