Abstract
Although new sequencing technology has enabled biologists to examine genome-wide variation in their taxa of interest, hurdles in experimental design, analysis, and cost keep many researchers from realizing the benefits of next-generation sequencing. Here we present a workflow that helps researchers identify genomic intervals associated with a desired gene function by using gene ontology terms to filter an annotated genome. The resulting intervals make up a target region that can then be used to design complementary sequences, or baits, for capturing the desired targets in a target enrichment experiment. We show that the target regions produced by our workflow and by Ensembl Genebuild share intervals but nonetheless differ, and we then demonstrate our workflow’s utility for target enrichment experiments involving non-model organisms. Using available turtle genomes and bait sequences designed to capture a workflow-generated target region, in silico analysis predicts that between 48 and 86% of baits will bind to complementary sequences in genomes across the order Testudines (turtles and tortoises). We then use these bait sequences in an actual target enrichment experiment and show that bait performance falls within the range predicted by the in silico analysis. We show that by selecting a reference genome as closely related as possible to the taxa of interest and by focusing on important and likely conserved gene functions, users can acquire valuable genomic data from non-model organisms.
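The filtering step described above can be pictured with a minimal Python sketch. It is not the GO2TR implementation: the function names, the half-open interval convention, the toy ontology links, and the placeholder identifiers (`GO:CHILD_A`, `GO:GRANDCHILD`, `GO:OTHER`) are all assumptions made for illustration. Only the term GO:0006955 (immune response) comes from the article. The sketch collects a term and its descendants, keeps the annotated features that carry any of those terms, and merges overlapping intervals into a target region.

```python
from collections import defaultdict


def descendants(children, root):
    """Return the root term plus every term reachable through child links."""
    seen = {root}
    stack = [root]
    while stack:
        term = stack.pop()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def target_region(features, wanted_terms):
    """Keep features annotated with a wanted term, then merge overlaps.

    Features are (sequence, start, end, terms) with half-open coordinates
    (an assumption of this sketch). Overlapping or touching intervals on
    the same sequence are merged into one.
    """
    by_seq = defaultdict(list)
    for seq, start, end, terms in features:
        if wanted_terms & set(terms):
            by_seq[seq].append((start, end))

    merged = []
    for seq in sorted(by_seq):
        cur_start, cur_end = None, None
        for start, end in sorted(by_seq[seq]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                merged.append((seq, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((seq, cur_start, cur_end))
    return merged


# Toy ontology fragment: the child links and placeholder IDs are hypothetical.
children = {
    "GO:0006955": ["GO:CHILD_A"],
    "GO:CHILD_A": ["GO:GRANDCHILD"],
}

# Toy annotation: scaffold names and coordinates are invented.
features = [
    ("scaffold1", 100, 500, {"GO:0006955"}),
    ("scaffold1", 400, 900, {"GO:CHILD_A"}),
    ("scaffold1", 2000, 2500, {"GO:OTHER"}),
    ("scaffold2", 10, 60, {"GO:GRANDCHILD"}),
]

wanted = descendants(children, "GO:0006955")
region = target_region(features, wanted)
print(region)  # [('scaffold1', 100, 900), ('scaffold2', 10, 60)]
```

In this toy run the two overlapping immune-related features on scaffold1 collapse into one interval, and the feature annotated only with an unrelated term is dropped. A real analysis would read the ontology and the annotation from files and would more likely rely on an established interval tool.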
Acknowledgments
We thank LSU and the LSU AgCenter for financial and logistical support. The Lucius Gilbert Foundation provided support for sequencing and for J.P.E. We are grateful to Richard Carmouche of Pennington Biomedical Research Center’s Genomic Core Facility for performing next-generation sequencing laboratory work. This project used Genomics core facilities that are supported in part by COBRE (NIH 8P20GM103528) and NORC (NIH 2P30DK072476) center grants from the National Institutes of Health. We thank High Performance Computing at LSU for resources to analyze next-generation sequencing data.
Author contributions
J.P.E. designed the study and wrote the workflow scripts. J.P.E. and S.S.T. wrote the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Below is the link to the electronic supplementary material.
12686_2015_487_MOESM1_ESM.docx
Supplementary Table 1. Comparison of two target regions made by filtering the Pelodiscus sinensis 1.0 genome assembly using the gene ontology term immune response (GO:0006955) and its descendants. GO2TR’s target region was generated using standard GO2TR settings and mRNA predicted by the NCBI Eukaryotic Genome Annotation Pipeline. Ensembl-GOanna’s target region was generated using gene annotations and GO term associations determined by GOanna. For each comparison, features found in the first target region are compared to features in the second. Shared features overlap by at least 99%, while unique features are present in the first target region but not in the second. Shared intervals differ slightly between the two target regions because some were classified as flanking: intervals in the first target region that extend beyond the corresponding intervals in the second, leaving non-overlapping 5′ or 3′ bases. (DOCX 12 kb)
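The shared-versus-unique comparison in the caption above can be illustrated with a short Python sketch. This is an assumed reading, not the authors' procedure: the function names, the half-open coordinates, and the "partial" label for intervals that overlap by less than the threshold are inventions of this example. Only the 99% cutoff comes from the caption. The second target region is assumed to be already merged, so that no base is counted twice.

```python
def covered_fraction(interval, others):
    """Fraction of an interval's bases that fall inside the other region.

    Intervals are (sequence, start, end) with half-open coordinates.
    `others` is assumed to contain no overlapping intervals.
    """
    seq, start, end = interval
    covered = 0
    for o_seq, o_start, o_end in others:
        if o_seq == seq:
            covered += max(0, min(end, o_end) - max(start, o_start))
    return covered / (end - start)


def classify(first, second, threshold=0.99):
    """Label each interval of `first` relative to `second`."""
    labels = {}
    for interval in first:
        frac = covered_fraction(interval, second)
        if frac >= threshold:
            labels[interval] = "shared"
        elif frac == 0:
            labels[interval] = "unique"
        else:
            # Hypothetical label: the caption does not name this case.
            labels[interval] = "partial"
    return labels


# Invented coordinates for illustration only.
first = [("s1", 0, 1000), ("s1", 2000, 3000), ("s1", 5000, 5100)]
second = [("s1", 5, 1200), ("s1", 2500, 2600)]

labels = classify(first, second)
```

Here the first interval has 995 of its 1000 bases covered and counts as shared, the second is covered for only 100 of its 1000 bases, and the third has no counterpart in the second region.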
Cite this article
Elbers, J.P., Taylor, S.S. GO2TR: a gene ontology-based workflow to generate target regions for target enrichment experiments. Conservation Genet Resour 7, 851–857 (2015). https://doi.org/10.1007/s12686-015-0487-6
DOI: https://doi.org/10.1007/s12686-015-0487-6