ReneGENE-Novo: Co-designed Algorithm-Architecture for Accelerated Preprocessing and Assembly of Genomic Short Reads

Natarajan, Santhi; KrishnaKumar, N.; Anuchan, H. V.; Pal, Debnath; Nandy, S. K.

doi:10.1007/978-3-319-78890-6_45

Santhi Natarajan (ORCID: orcid.org/0000-0002-6701-826X), N. KrishnaKumar (ORCID: orcid.org/0000-0002-2385-9606), H. V. Anuchan, Debnath Pal & S. K. Nandy

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10824)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

Sufficiently long genome strings, permitting adequate overlaps, are key to producing a quality genome assembly with minimal error rates and high coverage. Next Generation Sequencing (NGS) platforms produce large volumes (terabytes) of short raw genomic strings, or reads (150–600 genomic alphabets, or bases), with minimal error rates. If the lengths of raw short reads can be increased computationally before assembly, the full potential of NGS short reads and de novo assembly can be harnessed. The large data redundancy offered by billions of such raw reads, compounded by target genome lengths of billions of bases, calls for a complex big data engineering solution. This paper presents a co-designed algorithm-architecture model for ReneGENE de novo assembly (part of a larger ReneGENE-GI Genome Informatics pipeline). This module takes randomly presented short reads from NGS platforms and extends them iteratively to an appropriate length by identifying overlaps among them, aiding high-coverage assembly with minimal error rates. The task is parallelized across multiple processes to allow parallel read assembly with scalable performance. Supported by parallel algorithms, multi-dimensional data structures and fine-grain synchronization, the module realizes irregular computing for de novo assembly. A single-FPGA realization of this model with 128 de novo compute elements shows a 48.69x improvement in performance compared to an 8-core implementation on a standard workstation based on Intel Core i7-4770 processors.
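The iterative read-extension step described in the abstract can be illustrated with a minimal greedy sketch in Python. This is not the ReneGENE-Novo algorithm or its FPGA design: it assumes exact suffix-prefix overlaps of at least `min_overlap` bases, extends a seed read only to the right, and stops once a hypothetical `target_len` is reached or no read overlaps the current end. The function names `longest_overlap` and `extend_read` and the toy reads are illustrative only.

```python
from typing import List, Optional, Tuple


def longest_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` equal to a prefix of `b`.

    Only overlaps of at least `min_overlap` bases count, and the overlap must
    be shorter than `b`, so that appending `b` always adds new bases.
    Returns 0 when no qualifying exact overlap exists.
    """
    max_len = min(len(a), len(b) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def extend_read(seed: str, reads: List[str], min_overlap: int, target_len: int) -> str:
    """Greedily extend `seed` to the right until it has at least `target_len`
    bases or no remaining read overlaps its end.

    Each read is used at most once. At every step the read with the longest
    suffix-prefix overlap wins (ties go to the earliest read in the list).
    The result may exceed `target_len` by part of the last appended read.
    """
    contig = seed
    unused = list(reads)  # local copy: the caller's pool is never modified
    while len(contig) < target_len:
        best: Optional[Tuple[int, int]] = None  # (overlap length, index in unused)
        for i, read in enumerate(unused):
            k = longest_overlap(contig, read, min_overlap)
            if k and (best is None or k > best[0]):
                best = (k, i)
        if best is None:
            break  # nothing overlaps the current end: stop extending
        k, i = best
        contig += unused.pop(i)[k:]  # append only the non-overlapping tail
    return contig


if __name__ == "__main__":
    # Toy reads, presented in arbitrary order, sampled from the hypothetical
    # sequence ATGGCGTACGTTAGC.
    reads = ["GTTAGC", "GCGTACG", "TACGTTA"]
    print(extend_read("ATGGCG", reads, min_overlap=3, target_len=15))
    # prints ATGGCGTACGTTAGC
```

Because each call works on its own copy of the read pool, extensions from different seeds are independent and could, in principle, be distributed across worker processes (for example with Python's `concurrent.futures.ProcessPoolExecutor`). A variant in which workers consume reads from one shared pool would need the kind of fine-grain synchronization the abstract refers to. The brute-force overlap scan is chosen for readability and does not model the accelerator's multi-dimensional data structures.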

Author information

Authors and Affiliations

Indian Institute of Science, Bangalore, 560012, India
Santhi Natarajan, N. KrishnaKumar, Debnath Pal & S. K. Nandy
National Institute of Technology Karnataka, Surathkal, 575025, India
H. V. Anuchan

Corresponding author

Correspondence to Santhi Natarajan.

Editor information

Editors and Affiliations

Technological Educational Institute of Western Greece, Antirrio, Greece
Nikolaos Voros
Ruhr-Universität Bochum, Bochum, Germany
Michael Huebner
Technological Educational Institute of Western Greece, Antirrio, Greece
Georgios Keramidas
Technische Universität Dresden, Dresden, Germany
Diana Goehringer
Technological Educational Institute of Western Greece, Antirrio, Greece
Christos Antonopoulos
INESC-ID, Lisbon, Portugal
Pedro C. Diniz

About this paper

Cite this paper

Natarajan, S., KrishnaKumar, N., Anuchan, H.V., Pal, D., Nandy, S.K. (2018). ReneGENE-Novo: Co-designed Algorithm-Architecture for Accelerated Preprocessing and Assembly of Genomic Short Reads. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_45

DOI: https://doi.org/10.1007/978-3-319-78890-6_45
Published: 08 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78889-0
Online ISBN: 978-3-319-78890-6
eBook Packages: Computer Science, Computer Science (R0)