EVE: Cloud-Based Annotation of Human Genetic Variants
Annotation of human genetic variants enables genotype-phenotype association studies at the gene, pathway, and tissue level. Annotation results are difficult to reproduce across study sites because software versions shift over time and the sites lack a unified hardware interface. Cloud computing offers a promising solution by integrating hardware and software into reproducible virtual appliances that can be launched on demand and shared across institutions. We developed ENSEMBL VEP on EC2 (EVE), a cloud-based virtual appliance for annotation of human genetic variants built around the ENSEMBL Variant Effect Predictor. We integrated virtual hardware infrastructure, open-source software, and publicly available genomic datasets to annotate genetic variants in the context of genes/transcripts, Gene Ontology pathways, tissue-specific expression from the Gene Expression Atlas, miRNA annotations, minor allele frequencies from the 1000 Genomes Project and the Exome Aggregation Consortium, and deleteriousness scores from Combined Annotation Dependent Depletion. We demonstrate the utility of EVE by annotating the genetic variants in a case-control study of glaucoma. Cloud computing can reduce the difficulty of replicating complex software pipelines, such as annotation pipelines, across study sites. We provide a publicly available CloudFormation template of the EVE virtual appliance that automatically provisions and deploys a parameterized, preconfigured hardware/software stack ready for annotation of human genetic variants (github.com/epistasislab/EVE). This approach offers increased reproducibility in human genetic studies by providing a unified appliance to researchers across the world.
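As a rough illustration of what deploying a parameterized CloudFormation stack can look like, the following minimal sketch assembles the arguments for the CloudFormation CreateStack call as exposed by boto3. It is not taken from the EVE repository: the stack name, template URL, and the parameter names `InstanceType` and `KeyName` are hypothetical placeholders, and the real template may define different parameters.

```python
from typing import Dict


def build_stack_request(stack_name: str, template_url: str,
                        parameters: Dict[str, str]) -> dict:
    """Assemble keyword arguments for a CloudFormation CreateStack call."""
    # CloudFormation expects parameters as a list of key/value records.
    param_list = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(parameters.items())
    ]
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": param_list,
    }


def launch_stack(request: dict, region: str = "us-east-1") -> str:
    """Submit the request with boto3 (requires AWS credentials).

    Returns the ID of the stack that CloudFormation starts creating.
    """
    import boto3  # imported lazily so the sketch loads without boto3

    client = boto3.client("cloudformation", region_name=region)
    response = client.create_stack(**request)
    return response["StackId"]


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not EVE defaults.
    request = build_stack_request(
        stack_name="eve-demo",
        template_url="https://example-bucket.s3.amazonaws.com/eve-template.json",
        parameters={"InstanceType": "m4.large", "KeyName": "my-key-pair"},
    )
    print(request)  # inspect the request before calling launch_stack(request)
```

Keeping the request construction separate from the network call lets the same parameter set be recorded alongside a study and replayed at another site, which is the reproducibility property the appliance approach aims for.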
Keywords: Annotation; GWAS; Cloud computing; Reproducibility; Infrastructure-as-Code
This work is supported by an Amazon Web Services Cloud Credits for Research award to BSC and NIH AI116794 to JHM.