
GO2TR: a gene ontology-based workflow to generate target regions for target enrichment experiments

Methods and Resources Article
Conservation Genetics Resources

Abstract

Although new sequencing technology has enabled biologists to examine genome-wide variation in their taxa of interest, new hurdles in experimental design, analysis, and cost prevent many researchers from taking advantage of next-generation sequencing. Here we present a workflow that helps researchers identify genomic intervals associated with a desired gene function by using gene ontology terms to filter an annotated genome. The resulting intervals comprise a target region that can then be used to design complementary sequences, or baits, for capturing desired targets in a target enrichment experiment. We show that target regions produced by our workflow and by Ensembl Genebuild share intervals but nonetheless differ, and we then demonstrate our workflow's utility for target enrichment experiments involving non-model organisms. Using available turtle genomes and bait sequences designed to capture a workflow-generated target region, in silico analysis predicts that between 48 and 86 % of baits will bind to complementary sequences in genomes across the order Testudines (turtles and tortoises). We then use these bait sequences in an actual target enrichment experiment and show that bait performance falls within the range predicted by in silico analysis. Finally, we show that by selecting a reference genome as closely related as possible to the taxa of interest and by focusing on important and likely conserved gene functions, users can acquire valuable genomic data from non-model organisms.
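The core filtering idea (collect a GO term and all of its descendants, keep annotated intervals tagged with any of those terms, then merge them into a target region) can be sketched as follows. This is a minimal illustration under stated assumptions, not GO2TR's actual implementation: the toy GO subset, the annotation tuples, and the function names `descendants` and `target_region` are hypothetical, and a real run would parse a GO ontology file and a genome annotation rather than in-memory lists. Intervals use 0-based, half-open (BED-style) coordinates.

```python
from collections import defaultdict

# Toy, hypothetical data for illustration only (not from the paper).
# Simplified GO subset: child term -> list of parent terms.
# Parents outside this subset are omitted.
GO_PARENTS = {
    "GO:0006955": [],              # immune response (query term)
    "GO:0045087": ["GO:0006955"],  # innate immune response
    "GO:0002250": ["GO:0006955"],  # adaptive immune response
    "GO:0007049": [],              # cell cycle (unrelated term)
}

# Hypothetical annotations: (chrom, start, end, GO terms), 0-based half-open.
ANNOTATIONS = [
    ("chr1", 100, 500, ["GO:0045087"]),
    ("chr1", 450, 900, ["GO:0002250"]),
    ("chr1", 2000, 2600, ["GO:0007049"]),
    ("chr2", 50, 300, ["GO:0006955"]),
]


def descendants(root, parents):
    """Return `root` plus every term below it in the GO graph."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def target_region(annotations, parents, root):
    """Keep intervals annotated with `root` or a descendant, then merge them."""
    wanted = descendants(root, parents)
    hits = sorted(
        (chrom, start, end)
        for chrom, start, end, terms in annotations
        if wanted & set(terms)
    )
    merged = []
    for chrom, start, end in hits:
        # Extend the previous interval if this one overlaps or abuts it.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


print(target_region(ANNOTATIONS, GO_PARENTS, "GO:0006955"))
# → [('chr1', 100, 900), ('chr2', 50, 300)]
```

The two overlapping immune-response intervals on `chr1` collapse into one, while the cell-cycle interval is dropped. Merging overlapping or book-ended intervals in this way is similar to what a tool such as BEDTools `merge` does by default.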


Fig. 1
Fig. 2



Acknowledgments

We thank LSU and the LSU AgCenter for financial and logistical support. The Lucius Gilbert Foundation provided support for sequencing and for J.P.E. We are grateful to Richard Carmouche of Pennington Biomedical Research Center's Genomic Core Facility for performing next-generation sequencing laboratory work. This project used Genomics core facilities that are supported in part by COBRE (NIH 8P20GM103528) and NORC (NIH 2P30DK072476) center grants from the National Institutes of Health. We thank High Performance Computing at LSU for resources to analyze next-generation sequencing data.

Author contributions

J.P.E. designed the study and wrote the workflow scripts. J.P.E. and S.S.T. wrote the paper.

Author information


Corresponding author

Correspondence to Jean P. Elbers.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12686_2015_487_MOESM1_ESM.docx

Supplementary Table 1 Comparison of two target regions made by filtering the Pelodiscus sinensis 1.0 genome assembly using the gene ontology term immune response (GO:0006955) and its descendants. GO2TR's target region was generated using standard GO2TR settings and mRNA predicted by the NCBI Eukaryotic Genome Annotation Pipeline. Ensembl-GOanna's target region was generated using gene annotations and GO term associations determined by GOanna. For each comparison, features found in the first target region are compared to features in the second. Shared features overlap by at least 99 %, while unique features are present in the first but not the second target region. Shared intervals differ slightly between the two target regions because some were classed as flanking: intervals in the first target region that extend beyond intervals in the second, leaving non-overlapping 5′ or 3′ bases. (DOCX 12 kb)
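The comparison described in Supplementary Table 1 can be approximated with a simple interval-overlap check. The sketch below is an illustration under assumptions, not the authors' code: the 99 % threshold comes from the caption, but the function name `compare_regions`, the extra "partial" label for overlaps below the threshold, and the example intervals are hypothetical. Strand is ignored, so 5′ and 3′ overhangs are reported as left and right overhangs, and intervals use 0-based, half-open coordinates.

```python
def compare_regions(first, second, min_shared=0.99):
    """Classify each (chrom, start, end) interval in `first` against `second`.

    Assumes `second` is already merged (no self-overlaps), so summing the
    overlaps does not double count bases. Returns tuples of
    (interval, label, fraction_overlapped, left_overhang, right_overhang).
    """
    results = []
    for chrom, start, end in first:
        hits = [
            (max(start, s), min(end, e))
            for c, s, e in second
            if c == chrom and s < end and start < e
        ]
        if not hits:
            # Present in the first region but absent from the second.
            results.append(((chrom, start, end), "unique", 0.0, 0, 0))
            continue
        covered = sum(hi - lo for lo, hi in hits)
        frac = covered / (end - start)
        left = min(lo for lo, _ in hits) - start  # bases before the first overlap
        right = end - max(hi for _, hi in hits)   # bases after the last overlap
        label = "shared" if frac >= min_shared else "partial"
        results.append(((chrom, start, end), label, frac, left, right))
    return results


# Hypothetical example intervals, not data from the paper.
FIRST = [("chr1", 100, 900), ("chr1", 1000, 1200), ("chr2", 50, 300), ("chr3", 0, 1000)]
SECOND = [("chr1", 100, 900), ("chr1", 1050, 1200), ("chr3", 5, 1000)]

for row in compare_regions(FIRST, SECOND):
    print(row)
```

Here the `chr3` interval counts as shared (99.5 % overlap) yet still carries 5 overhanging bases, which is one reading of how shared intervals can differ slightly while some are labelled flanking.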

Reprints and permissions

About this article


Cite this article

Elbers, J.P., Taylor, S.S. GO2TR: a gene ontology-based workflow to generate target regions for target enrichment experiments. Conservation Genet Resour 7, 851–857 (2015). https://doi.org/10.1007/s12686-015-0487-6


  • DOI: https://doi.org/10.1007/s12686-015-0487-6
