Abstract
Rapid advancements in next-generation sequencing are revolutionizing the way in which biologists and, increasingly, clinicians analyze genomic data. These advances have substantially decreased the time and cost required to sequence the genomes of new patients, thereby making genomic techniques more mainstream and giving rise to the new era of precision medicine. National-scale genome programs have been launched in various parts of the world, including the USA, the United Kingdom, and Saudi Arabia. One of the key insights from this mainstream adoption is that even though the time and cost of generating sequence data have decreased dramatically, the cost of analyzing the data to yield clinically relevant information has not decreased proportionally. On the contrary, downstream analysis of genomic data now dominates the cost in terms of time, effort, and money. This can be attributed to a number of factors: the sheer volume of data; limited knowledge of phenotypic, regulatory, and epigenetic artifacts within the genome; and the limited computational capabilities of existing data analysis tools and infrastructure. Overcoming these challenges is central to realizing a more accurate, sophisticated, and cost-effective genomic medicine. In this paper we address the challenge related to the limited analytic capabilities of existing computational and storage infrastructure. We discuss how novel trends in hardware, including the emergence of cheap, high-performance, high-endurance solid-state storage combined with low-latency interconnects and software-defined orchestration, can help create a high-performance storage tier that improves data acquisition, storage, transmission, and analysis over current commercial alternatives.
Acknowledgments
This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST). Our thanks to Majed Alelaiwi, Gabriele Paciucci, Craig Rhodes, Adam Roe and Ahmad Al-jeshi of Intel for their collaboration throughout the project. We would also like to thank Faheem Karim and Martin Galle of Supermicro for their advice on chassis and configuration, and Vaughn Wittorff and Dan Greenfield of PetaGene (Cambridge, UK) for allowing us to use their test compression runs.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kaul, G., Shah, Z.A., Abouelhoda, M. (2017). A High Performance Storage Appliance for Genomic Data. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_43
DOI: https://doi.org/10.1007/978-3-319-56154-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science (R0)