Incremental DNA Sequence Analysis in the Cloud

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7338)

Abstract

In this paper, we propose to demonstrate a “stream-as-you-go” approach that minimizes the data transfer time of data- and compute-intensive scientific applications deployed in the cloud, by making them incrementally processable. We describe a system that implements this approach based on the IBM InfoSphere Streams computing platform deployed over Amazon EC2. The functionality, performance, and usability of the system will be demonstrated through two DNA sequence analysis applications.
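
The core idea of the abstract is to overlap data transfer with computation by processing input incrementally as it arrives, rather than only after a complete upload. A minimal Python sketch of that idea follows, under loudly stated assumptions. It is not the authors' system, which is built on IBM InfoSphere Streams over Amazon EC2. The chunked "transfer", the nucleotide counting that stands in for real DNA sequence analysis, and every function name below are hypothetical illustrations.

```python
import queue
import threading
from collections import Counter
from typing import Iterable, Iterator, List

# Hypothetical stand-in for a slow upload: yields DNA reads in fixed-size chunks.
def transfer_in_chunks(reads: List[str], chunk_size: int) -> Iterator[List[str]]:
    for start in range(0, len(reads), chunk_size):
        yield reads[start:start + chunk_size]

# Hypothetical per-chunk analysis: counts nucleotides as a placeholder for real work.
def process_chunk(chunk: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for read in chunk:
        counts.update(read)  # counts each character (base) in the read
    return counts

_DONE = object()  # sentinel marking the end of the transfer

def stream_as_you_go(reads: List[str], chunk_size: int = 2) -> Counter:
    """Process each chunk as soon as it arrives instead of waiting for the full upload."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=4)

    def producer() -> None:
        # Simulated transfer thread: pushes chunks into a bounded queue.
        for chunk in transfer_in_chunks(reads, chunk_size):
            q.put(chunk)
        q.put(_DONE)

    sender = threading.Thread(target=producer)
    sender.start()
    total: Counter = Counter()
    while True:
        item = q.get()
        if item is _DONE:
            break
        # The consumer works on this chunk while later chunks are still "in transfer".
        total += process_chunk(item)
    sender.join()
    return total

def batch(reads: List[str]) -> Counter:
    """Baseline: wait until all data is present, then process it in one pass."""
    return process_chunk(reads)

if __name__ == "__main__":
    sample = ["ACGT", "AACC", "GGTT", "TTAA", "CAGT"]
    print(stream_as_you_go(sample) == batch(sample))
```

The incremental result matches the batch baseline here only because per-chunk results can be merged by summing counts. An analysis that needs the whole data set at once would require a different decomposition before it could be processed this way.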

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kienzler, R., Bruggmann, R., Ranganathan, A., Tatbul, N. (2012). Incremental DNA Sequence Analysis in the Cloud. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_50

  • DOI: https://doi.org/10.1007/978-3-642-31235-9_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31234-2

  • Online ISBN: 978-3-642-31235-9

  • eBook Packages: Computer Science, Computer Science (R0)