Genome sequence analysis with MonetDB

Cijvat, Robin; Manegold, Stefan; Kersten, Martin; Klau, Gunnar W.; Schönhuth, Alexander; Marschall, Tobias; Zhang, Ying

doi:10.1007/s13222-015-0198-x

A case study on Ebola virus diversity

Focus article (Schwerpunktbeitrag)
Published: 12 October 2015

Datenbank-Spektrum, Volume 15, pages 185–191 (2015)

Robin Cijvat¹, Stefan Manegold², Martin Kersten¹,², Gunnar W. Klau², Alexander Schönhuth², Tobias Marschall³ & Ying Zhang¹,²

Abstract

Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing a genome takes little time and money, yet it yields terabytes of data that must be stored and analyzed. Biologists often face excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
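
The general idea of analyzing alignment data inside a DBMS can be sketched as follows. This is only an illustration under stated assumptions, not the MonetDB/BAM interface: it uses Python's built-in sqlite3 module as a stand-in database, a hypothetical table named alignments whose columns mirror the eleven mandatory SAM fields, and made-up sample records with a hypothetical reference name EBOV. The actual schema and loader functions of MonetDB/BAM are not shown on this page and may differ.

```python
import sqlite3

# Made-up sample records in SAM text layout (11 mandatory tab-separated fields:
# QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL).
SAM_LINES = [
    "r1\t0\tEBOV\t101\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tEBOV\t150\t20\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGCC\tIIIII",
    "r4\t0\tEBOV\t230\t42\t5M\t*\t0\t0\tCCATG\tIIIII",
]


def load_alignments(conn, lines):
    """Create the (hypothetical) alignments table and load SAM records into it."""
    conn.execute(
        "CREATE TABLE alignments ("
        "qname TEXT, flag INTEGER, rname TEXT, pos INTEGER, mapq INTEGER, "
        "cigar TEXT, rnext TEXT, pnext INTEGER, tlen INTEGER, seq TEXT, qual TEXT)"
    )
    rows = []
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")[:11]  # ignore optional tag fields
        rows.append((f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10]))
    conn.executemany(
        "INSERT INTO alignments VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows
    )


def high_quality_mapped(conn, min_mapq):
    """Return (qname, pos, mapq) of mapped reads with MAPQ >= min_mapq."""
    cur = conn.execute(
        "SELECT qname, pos, mapq FROM alignments "
        "WHERE (flag & 4) = 0 AND mapq >= ? "  # FLAG bit 0x4 marks unmapped reads
        "ORDER BY pos",
        (min_mapq,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_alignments(conn, SAM_LINES)
result = high_quality_mapped(conn, 30)
print(result)  # [('r1', 101, 60), ('r4', 230, 42)]
```

Once the records are in a table, a filter that would otherwise need a custom script over the file becomes a single declarative query, which is the kind of simplification the abstract argues for.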

Notes

https://www.monetdb.org/
https://www.monetdb.org/bam/
For this use case, we do not benefit from the read-oriented storage that MonetDB/BAM uses. However, [2] presents many use cases that do.

References

Beerenwinkel N et al (2012) Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Front Microbiol 3:329
Cijvat R (2014) Bridging the gap between big genome data analysis and database management systems. Master’s thesis, CWI and Utrecht University
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. Paper presented at the 6th Symposium on Operating System Design and Implementation, San Francisco, December 2004
Dorok S et al (2014) Toward Efficient Variant Calling Inside Main-Memory Database Systems. BIOKDD-DEXA Workshops, pp. 41–45
Gire SK et al (2014) Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345(6202):1369–1372
Kargin Y, Kersten ML, Manegold S, Pirk H (2015) The DBMS—your big data sommelier. Proceedings of IEEE International Conference on Data Engineering 2015 (ICDE 31)
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
Li H et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
Manegold S et al (2009) Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2):1648–1653
Pavlo A et al (2009) A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD, pp. 165–178
Quinlan A, Hall I (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842
Röhm U, Blakeley JA (2009) Data management for high-throughput genomics. CIDR, pp. 97–111
Schapranow MP, Plattner H (2013) HIG - An in-memory database platform enabling real-time analyses of genome data. BigData, pp. 691–696
Schatz MC, Langmead B (2013) The DNA data deluge. IEEE Spectrum 50(7):28–33
Toepfer A et al (2014) Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 10(3):e1003515
Volchkov VE et al (1999) Characterization of the L gene and 5′ trailer region of Ebola virus. J Gen Virol 80(Pt 2):355–362
Wolstencroft K et al (2013) The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res 41(Web Server issue):W557–W561

Author information

Authors and Affiliations

MonetDB Solutions, Amsterdam, The Netherlands
Robin Cijvat, Martin Kersten & Ying Zhang
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Stefan Manegold, Martin Kersten, Gunnar W. Klau, Alexander Schönhuth & Ying Zhang
Saarland University & Max Planck Institute for Informatics, Saarbrücken, Germany
Tobias Marschall

Corresponding author

Correspondence to Robin Cijvat.

About this article

Cite this article

Cijvat, R., Manegold, S., Kersten, M. et al. Genome sequence analysis with MonetDB. Datenbank Spektrum 15, 185–191 (2015). https://doi.org/10.1007/s13222-015-0198-x

Issue Date: November 2015
DOI: https://doi.org/10.1007/s13222-015-0198-x
