Two level parallelism and I/O reduction in genome comparisons

Torreno, Oscar; Trelles, Oswaldo

doi:10.1007/s10586-017-0873-9

Published: 22 April 2017

Cluster Computing, Volume 20, pages 1925–1936 (2017)

Abstract

Genome comparison poses important computational challenges, especially in CPU time, memory allocation and I/O operations. Parallel implementations of multiple sequence comparison algorithms already exist, but they are significantly limited in the length of the input sequences they can handle. GECKO was introduced as a computationally and memory-efficient method to overcome this limitation. However, its performance could be greatly increased by applying parallel strategies and I/O optimisations. We have applied two different strategies to accelerate GECKO while producing the same results. The first is a two-level parallel approach: each independent internal pairwise comparison is parallelised at the first level, and the GECKO modules are parallelised at the second level. The second consists of a complete rewrite of the original code to reduce I/O. Both strategies outperform the original code, which was already faster than equivalent software. As a result, pairwise and multiple genome comparisons can be performed much faster, which is particularly important given the ever-growing list of available genomes.
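
The two-level structure described above can be illustrated with a minimal Python sketch, assuming a thread-pool model. The outer pool runs each independent pairwise comparison of an all-vs-all set as one task (first level). Inside each comparison, an inner pool runs the independent per-sequence steps concurrently (second level). Intermediate results are passed in memory instead of through files, which loosely mirrors the I/O-reduction idea. Everything here is hypothetical: the names `build_dictionary`, `find_hits`, `compare_pair`, `all_vs_all` and `K`, and the toy exact k-mer matching, stand in for GECKO's modules and are not GECKO's actual algorithm or code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

K = 4  # hypothetical seed (k-mer) length, chosen only for illustration


def build_dictionary(seq, k=K):
    """Hypothetical dictionary module: map every k-mer of seq to its start positions."""
    index = {}
    for pos in range(len(seq) - k + 1):
        index.setdefault(seq[pos:pos + k], []).append(pos)
    return index


def find_hits(dict_x, dict_y):
    """Hypothetical hits module: every (pos_x, pos_y) pair that shares a k-mer."""
    hits = []
    for kmer, xs in dict_x.items():
        ys = dict_y.get(kmer, [])
        hits.extend((x, y) for x in xs for y in ys)
    return sorted(hits)


def compare_pair(seq_x, seq_y, inner_pool):
    # Second level: the two dictionaries are independent, so build them concurrently.
    fut_x = inner_pool.submit(build_dictionary, seq_x)
    fut_y = inner_pool.submit(build_dictionary, seq_y)
    # Intermediate results are handed over in memory rather than via files.
    return find_hits(fut_x.result(), fut_y.result())


def all_vs_all(genomes, outer_workers=2, inner_workers=4):
    pairs = list(combinations(range(len(genomes)), 2))
    # Separate pools: outer tasks block on inner futures, so sharing one bounded
    # pool could deadlock when every worker is an outer task waiting.
    with ThreadPoolExecutor(max_workers=inner_workers) as inner_pool, \
            ThreadPoolExecutor(max_workers=outer_workers) as outer_pool:
        # First level: each independent pairwise comparison is a separate task.
        futures = {
            (i, j): outer_pool.submit(compare_pair, genomes[i], genomes[j], inner_pool)
            for i, j in pairs
        }
        return {pair: fut.result() for pair, fut in futures.items()}


if __name__ == "__main__":
    genomes = ["ACGTACGT", "TTACGTAA", "GGGGCCCC"]
    results = all_vs_all(genomes)
    print(results[(0, 1)])  # [(0, 2), (1, 3), (3, 1), (4, 2)]
```

In CPython, CPU-bound threads are limited by the global interpreter lock. The sketch therefore only shows how the tasks are organised and does not demonstrate real speedup. A performance-oriented version would more likely map the two levels onto processes, native threads or MPI ranks.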

Acknowledgements

This work has been partially supported by the European projects Mr.Symbiomath (Grant No. 324554) and ELIXIR-EXCELERATE (Grant No. 676559), and the Spanish national projects “Plataforma de Recursos Biomoleculares y Bioinformáticos” (ISCIII-PT13.0001.0012) and RIRAAF (ISCIII-RD12/0013/0006).

Author information

Authors and Affiliations

Computer Architecture Department, University of Málaga, Bulevar Louis Pasteur 35, 29071, Málaga, Spain
Oscar Torreno & Oswaldo Trelles

Corresponding author

Correspondence to Oscar Torreno.

About this article

Cite this article

Torreno, O., Trelles, O. Two level parallelism and I/O reduction in genome comparisons. Cluster Comput 20, 1925–1936 (2017). https://doi.org/10.1007/s10586-017-0873-9

Received: 09 December 2016
Revised: 10 April 2017
Accepted: 17 April 2017
Published: 22 April 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10586-017-0873-9
