ReneGENE-Novo: Co-designed Algorithm-Architecture for Accelerated Preprocessing and Assembly of Genomic Short Reads

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2018)

Abstract

Sufficiently long genome strings that permit adequate overlaps are key to producing a quality genome assembly with minimal error rates and high coverage. Next Generation Sequencing (NGS) platforms produce large volumes (terabytes) of short raw genomic strings, or reads (150–600 bases, the letters of the genomic alphabet), with minimal error rates. If the lengths of raw short reads can be increased computationally before assembly, the full potential of short reads from NGS and de novo assembly can be harvested. The large data redundancy offered by billions of such raw reads, compounded by a target genome length of billions of bases, requires a complex big data engineering solution. This paper presents a co-designed algorithm-architecture model for ReneGENE de novo assembly (part of the larger ReneGENE-GI Genome Informatics pipeline). The module takes randomly presented short reads from NGS platforms and extends them iteratively to an appropriate length by identifying overlaps among them, aiding high-coverage assembly with minimal error rates. The task is parallelized across multiple processes, allowing parallel read assembly with scalable performance. Supported by parallel algorithms, multi-dimensional data structures and fine-grain synchronization, the module realises irregular computing for de novo assembly. A single FPGA realization of this model with 128 de novo compute elements shows a 48.69x improvement in performance when compared to an 8-core implementation on a standard workstation based on Intel Core i7-4770 processors.
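The idea of extending a read iteratively through overlaps can be illustrated with a small sketch. This is not the ReneGENE-Novo algorithm or its FPGA architecture, whose details are not given in the abstract; it is a plain greedy extension on exact suffix-prefix overlaps, run sequentially. The function names (`suffix_prefix_overlap`, `extend_read`) and the parameters `min_overlap` and `target_len` are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def extend_read(seed: str, reads: List[str], min_overlap: int, target_len: int) -> str:
    """Greedily extend `seed` to the right (hypothetical helper).

    Stops when the extended string reaches `target_len` bases or when no
    unused read overlaps its end by at least `min_overlap` bases.
    """
    contig = seed
    unused = [r for r in reads if r != seed]
    while len(contig) < target_len:
        best: Optional[Tuple[int, int]] = None  # (overlap length, index in unused)
        for i, read in enumerate(unused):
            k = suffix_prefix_overlap(contig, read, min_overlap)
            # Only reads that contribute new bases count as an extension.
            if k and len(read) > k and (best is None or k > best[0]):
                best = (k, i)
        if best is None:
            break  # no remaining read extends the string any further
        k, i = best
        contig += unused.pop(i)[k:]
    return contig


if __name__ == "__main__":
    # Toy input: four 6-base reads presented in random order.
    reads = ["GTACCT", "ATGCGT", "ACCTGA", "GCGTAC"]
    print(extend_read("ATGCGT", reads, min_overlap=3, target_len=12))
```

With the toy reads above, the seed `ATGCGT` is extended through three overlaps of four bases each. A realistic version would also have to tolerate sequencing errors, handle reverse complements, and distribute the overlap search across many compute elements, none of which this sketch attempts.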

Author information

Corresponding author

Correspondence to Santhi Natarajan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Natarajan, S., KrishnaKumar, N., Anuchan, H.V., Pal, D., Nandy, S.K. (2018). ReneGENE-Novo: Co-designed Algorithm-Architecture for Accelerated Preprocessing and Assembly of Genomic Short Reads. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_45

  • DOI: https://doi.org/10.1007/978-3-319-78890-6_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78889-0

  • Online ISBN: 978-3-319-78890-6

  • eBook Packages: Computer Science, Computer Science (R0)
