Abstract
Next Generation Sequencing (NGS) allows sequencing of a human genome within hours, enabling large-scale applications such as sequencing the genome of each patient in a clinical study. Each individual human genome has about 3.5 million genetic differences from the so-called reference genome, the consensus genome of a healthy human. These differences, called variants, determine individual phenotypes, and certain variants are known to indicate disease predispositions. Finding associations between variant patterns or affected genes and these diseases requires combined analysis of variants from multiple individuals and hence efficient solutions for accessing and filtering the variant data. We present Variant-DB, our in-house database solution that allows such efficient access to millions of variants from hundreds to thousands of individuals. Variant-DB stores individual variant genotypes and annotations. It features a REST API and a web-based front-end for filtering variants based on annotations, individuals, families and studies. We explain Variant-DB and its front-end and demonstrate how the Variant-DB API can be included in data integration workflows.
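The kind of filtering described in the abstract (by annotations, individuals, families and studies) can be illustrated with a minimal in-memory sketch. This is not Variant-DB's schema or its REST API, neither of which is described here; the `Genotype` record, the `filter_genotypes` function, the field names, and the toy data are all hypothetical and serve only to show the idea of storing genotypes per individual, storing annotations per variant, and combining both in one filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    # Hypothetical record layout, assumed for illustration only.
    individual: str
    family: str
    study: str
    variant: str   # toy key of the form "chrom:pos:ref:alt"
    alleles: str   # genotype call, e.g. "0/1"


def filter_genotypes(genotypes, annotations, study=None, family=None,
                     individuals=None, **required):
    """Return genotypes matching the sample filters and all required annotations.

    `annotations` maps a variant key to a dict of annotation fields;
    `required` holds annotation field/value pairs that must all match.
    """
    selected = []
    for g in genotypes:
        if study is not None and g.study != study:
            continue
        if family is not None and g.family != family:
            continue
        if individuals is not None and g.individual not in individuals:
            continue
        ann = annotations.get(g.variant, {})
        if all(ann.get(key) == value for key, value in required.items()):
            selected.append(g)
    return selected


# Toy data with placeholder names; not taken from any real study.
ANNOTATIONS = {
    "1:1000:A:G": {"gene": "GENE_A", "effect": "missense"},
    "2:2000:C:T": {"gene": "GENE_B", "effect": "synonymous"},
}
GENOTYPES = [
    Genotype("ind1", "fam1", "studyX", "1:1000:A:G", "0/1"),
    Genotype("ind2", "fam1", "studyX", "2:2000:C:T", "1/1"),
    Genotype("ind3", "fam2", "studyY", "1:1000:A:G", "0/1"),
]

# Missense variants observed in individuals of one study.
hits = filter_genotypes(GENOTYPES, ANNOTATIONS, study="studyX", effect="missense")
print([g.individual for g in hits])
```

Keeping annotations keyed by variant rather than copied onto every genotype means each annotation is stored once, however many individuals carry the variant. A real system would apply the same idea with indexed database tables rather than a Python loop.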
Acknowledgments
We would like to thank Dheeraj R. Bobbili for his help with the PPMI data, Marek Ostaszewski and Piotr Gawron for their support with the MINERVA API, and Venkata Satagopam, Wei Gu, and Sascha Herzinger for their help with tranSMART. JK and PM were supported by the FNR NCER-PD grant. PM was supported by the JPND Courage-PD project. Data used in the preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) database (www.ppmi-info.org/data). For up-to-date information on the study, visit www.ppmi-info.org. PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, including Abbvie, Avid, Biogen, Bristol-Myers Squibb, Covance, GE Healthcare, Genentech, GlaxoSmithKline, Lilly, Lundbeck, Merck, Meso Scale Discovery, Pfizer, Piramal, Roche, Servier, Teva, UCB, and Golub Capital.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kutzera, J., May, P. (2017). Variant-DB: A Tool for Efficiently Exploring Millions of Human Genetic Variants and Their Annotations. In: Da Silveira, M., Pruski, C., Schneider, R. (eds) Data Integration in the Life Sciences. DILS 2017. Lecture Notes in Computer Science, vol 10649. Springer, Cham. https://doi.org/10.1007/978-3-319-69751-2_3
DOI: https://doi.org/10.1007/978-3-319-69751-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69750-5
Online ISBN: 978-3-319-69751-2
eBook Packages: Computer Science, Computer Science (R0)