Big Data Technologies for DNA Sequencing

Synonyms

Next-generation sequencing

Definitions

DNA sequencing is a technique for determining the precise order of nucleotides within a DNA molecule. Its use in the life sciences generates huge amounts of raw data.

Overview

Genome analyses play an important role in life science applications ranging from animal breeding to personalized medicine. Technological advances in DNA sequencing have led to vast amounts of genome data being produced and processed daily. This chapter provides an overview of the big data challenges in DNA sequencing and discusses several data management solutions.

Next-generation sequencing (NGS) technologies enable life scientists to produce huge amounts of DNA sequence data in a short time (Stephens et al. 2015). In recent years, these technologies have been used to collect thousands of genomes and short DNA sequence reads for humans, plants, animals, and microbes...

Big Data Technologies for DNA Sequencing, Fig. 1

References

  • Becker H (2011) Pflanzenzüchtung. UTB basics. UTB GmbH

  • Bonfield JK, Mahoney MV (2013) Compression of FASTQ and SAM format sequencing data. PLoS One 8(3):e59190

  • Cao MD, Ganesamoorthy D, Elliott AG, Zhang H, Cooper MA, Coin LJ (2016) Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION™ sequencing. GigaScience 5(1):32

  • Carlson R (2003) The pace and proliferation of biological technologies. Biosecur Bioterror Biodefense Strategy Pract Sci 1(3):203–214

  • Christley S, Lu Y, Li C, Xie X (2008) Human genomes as email attachments. Bioinformatics 25(2):274–275

  • Chung WC, Chen CC, Ho JM, Lin CY, Hsu WL, Wang YC, Lee DT, Lai F, Huang CW, Chang YJ (2014) CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce. PLoS One 9(6):e98146

  • Dorok S, Breß S, Teubner J, Läpple H, Saake G, Markl V (2017) Efficiently storing and analyzing genome data in database systems. Datenbank-Spektrum 17(2):139–154

  • Fiannaca A, La Rosa M, La Paglia L, Messina A, Urso A (2016) BioGraphDB: a new GraphDB collecting heterogeneous data for bioinformatics analysis. In: Proceedings of BIOTECHNO

  • Have CT, Jensen LJ (2013) Are graph databases ready for bioinformatics? Bioinformatics 29(24):3107

  • Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M (2015) Improved data analysis for the MinION nanopore sequencer. Nat Methods 12(4):351–356

  • Loman NJ, Watson M (2015) Successful test launch for nanopore sequencing. Nat Methods 12(4):303

  • Martínez H, Barrachina S, Castillo M, Tárraga J, Medina I, Dopazo J, Quintana-Ortí ES (2015) Scalable RNA sequencing on clusters of multicore processors. Trustcom/BigDataSE/ISPA 3:190–195

  • Mielczarek M, Szyda J (2016) Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genet 57(1):71–79. https://doi.org/10.1007/s13353-015-0292-7

  • Mushtaq H, Liu F, Costa C, Liu G, Hofstee P, Al-Ars Z (2017) SparkGA: a Spark framework for cost effective, fast and accurate DNA analysis at scale. In: Proceedings of the 8th ACM international conference on bioinformatics, computational biology, and health informatics. ACM, pp 148–157

  • Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14(4):417–419

  • Pedersen E, Bongo LA (2016) Big biological data management. In: Pop F, Kolodziej J, Martino BD (eds) Resource management for big data platforms. Computer communications and networks. Springer, Heidelberg, pp 265–277

  • Popitsch N, von Haeseler A (2012) NGC: lossless and lossy compression of aligned high-throughput sequencing data. Nucleic Acids Res 41(1):e27

  • Salavert Torres J, Blanquer Espert I, Tomás Domínguez A, Hernández V, Medina I, Tárraga J, Dopazo J (2012) Using GPUs for the exact alignment of short-read genetic sequences by means of the Burrows-Wheeler transform. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 9(4):1245–1256

  • Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE (2015) Big data: astronomical or genomical? PLoS Biol 13(7):e1002195

  • Taylor RC (2010) An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinform 11(12):S1

Author information

Corresponding author

Correspondence to Lena Wiese.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Wiese, L., Schmitt, A.O., Gültas, M. (2019). Big Data Technologies for DNA Sequencing. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_32
