Abstract
Research labs often need to customise the software at every step of a bioinformatics workflow, but it has traditionally been difficult to obtain both a high degree of customisability and good performance. Performance-sensitive tools are often highly monolithic, which makes them hard to adapt for research. We present a novel set of software development principles and a bioinformatics framework, Friedrich, which is currently in early development. Friedrich applications support both early-stage experimentation and late-stage batch processing, since they combine good performance with a high degree of flexibility and customisability. These benefits are obtained in large part by basing Friedrich on the multiparadigm programming language Scala. We present a case study consisting of a basic genome assembler and its extension with new functionality. Our architecture has the potential to greatly increase the overall productivity of software developers and researchers in bioinformatics.
Keywords
- Open Framework
- Scala Code
- Bioinformatics Application
- Short Read Data
- Early Stage Experimentation
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keeble-Gagnère, G., Nyström-Persson, J., Bellgard, M.I., Mizuguchi, K. (2012). An Open Framework for Extensible Multi-stage Bioinformatics Software. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2012. Lecture Notes in Computer Science, vol 7632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34123-6_10
DOI: https://doi.org/10.1007/978-3-642-34123-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34122-9
Online ISBN: 978-3-642-34123-6
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.iapr.org/
