Abstract
In this paper, we propose to demonstrate a “stream-as-you-go” approach that minimizes the data transfer time of data- and compute-intensive scientific applications deployed in the cloud, by making them incrementally processable. We describe a system that implements this approach based on the IBM InfoSphere Streams computing platform deployed over Amazon EC2. The functionality, performance, and usability of the system will be demonstrated through two DNA sequence analysis applications.
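As a rough illustration of what "incrementally processable" can mean, the following minimal Python sketch consumes sequence reads chunk by chunk and updates a result after each read, rather than waiting for the whole input to arrive. It is not the authors' system and does not use the InfoSphere Streams API. The function names, the one-read-per-line input format, and the base-counting task are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def iter_reads(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one read (one line) at a time from text chunks as they arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the trailing partial line buffered.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line:
                yield line
    if buffer:
        # Flush a final read that was not newline-terminated.
        yield buffer


def running_base_counts(chunks: Iterable[str]) -> Iterator[Counter]:
    """Yield an updated base tally after each read (hypothetical analysis step)."""
    counts: Counter = Counter()
    for read in iter_reads(chunks):
        counts.update(read)
        yield counts.copy()


# Hypothetical chunks, split at arbitrary points as a network transfer might.
chunks = ["ACGT\nAC", "GG\nTT", "A\n"]
snapshots = list(running_base_counts(chunks))
final = snapshots[-1]
```

Because both functions are generators, a partial result is available as soon as the first read is complete, so transfer and computation can overlap instead of running back to back.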
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kienzler, R., Bruggmann, R., Ranganathan, A., Tatbul, N. (2012). Incremental DNA Sequence Analysis in the Cloud. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_50
DOI: https://doi.org/10.1007/978-3-642-31235-9_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)