
TaF: a web platform for taxonomic profile-based fungal gene prediction

Abstract

Introduction

The accurate prediction and annotation of gene structures from the genome sequence of an organism enable genome-wide functional analyses to obtain insight into the biological properties of an organism.

Objectives

We recently developed a highly accurate gene prediction pipeline and web platform for filamentous fungi called TaF. TaF is a homology-based gene predictor that uses large-scale taxonomic profiling to identify close relatives of the query genome.
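The Python sketch below shows one way taxonomic profiling could pick close relatives. It ranks candidate species by best-hit voting. It assumes the query's proteins have already been searched against a reference protein database (for example with BLAST) and parsed into (query_id, species, bitscore) tuples. The function names, the voting scheme and the `top_k` cutoff are illustrative assumptions, not TaF's published procedure.

```python
from collections import Counter


def taxonomic_profile(hits):
    """Return {species: fraction of query proteins whose best hit is that species}.

    hits: iterable of (query_id, species, bitscore) tuples (assumed input format).
    """
    best = {}  # query_id -> (best bitscore, species); first hit wins on ties
    for query_id, species, bitscore in hits:
        current = best.get(query_id)
        if current is None or bitscore > current[0]:
            best[query_id] = (bitscore, species)
    # One vote per query protein for the species of its best hit
    votes = Counter(species for _, species in best.values())
    total = sum(votes.values())
    return {sp: n / total for sp, n in votes.items()} if total else {}


def close_relatives(hits, top_k=3):
    """Hypothetical helper: the top_k species by vote share (ties broken by name)."""
    profile = taxonomic_profile(hits)
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sp for sp, _ in ranked[:top_k]]


hits = [
    ("q1", "sp_A", 250.0), ("q1", "sp_B", 180.0),
    ("q2", "sp_A", 90.0), ("q2", "sp_C", 120.0),
    ("q3", "sp_A", 300.0),
]
print(close_relatives(hits, top_k=2))  # ['sp_A', 'sp_C']
```

Voting on best hits keeps one strongly conserved gene family from dominating the profile. The species that win the most proteins would then supply the orthologous sequences for hint generation.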

Methods

The TaF pipeline consists of four processing steps: (1) taxonomic profiling to identify close relatives of the query; (2) generation of hints for exon–intron boundaries from orthologous protein sequences of the profiled species; (3) gene prediction combining ab initio and evidence-based methods; and (4) homology search for the resulting gene models.
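Step (2) can be illustrated by converting a spliced protein-to-genome alignment into intron hints. The sketch below assumes the exon blocks of one aligned ortholog are already available as 1-based inclusive genomic coordinates. It writes hints in the tab-separated GFF layout that AUGUSTUS accepts for extrinsic evidence: feature `intron`, attribute `src=P` for protein evidence, and `grp=` to group hints from the same alignment. The helper name and the `taf_hint` source label are hypothetical. The source does not specify how TaF formats its hints.

```python
def intron_hints(seqname, strand, exon_blocks, group_id, source="P"):
    """Convert aligned exon blocks into AUGUSTUS-style intron hint lines.

    exon_blocks: list of (start, end) 1-based inclusive genomic coordinates
    of the exons of one protein-to-genome alignment (assumed input).
    Each gap between consecutive blocks becomes one 'intron' hint that
    spans exactly the bases between them.
    """
    blocks = sorted(exon_blocks)
    lines = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        intron_start, intron_end = prev_end + 1, next_start - 1
        if intron_end < intron_start:
            continue  # adjacent or overlapping blocks: no intron to report
        fields = [
            seqname, "taf_hint", "intron",        # "taf_hint" is a hypothetical label
            str(intron_start), str(intron_end),
            ".", strand, ".",                     # score and frame left unset
            f"src={source};grp={group_id}",       # evidence source and hint group
        ]
        lines.append("\t".join(fields))
    return lines


for line in intron_hints("contig1", "+", [(300, 450), (100, 200), (451, 500)], "ortho1"):
    print(line)
```

In this example, the blocks (300, 450) and (451, 500) are directly adjacent, so only the gap from 201 to 299 is reported. Restricting hints to orthologs from closely related species is what would keep such boundaries trustworthy.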

Results

TaF generates extrinsic evidence for candidate exon–intron boundaries from the orthologous protein sequences of closely related species. This reduces the false-positive gene structures that arise when evidence is drawn from distantly related orthologs. In particular, gene prediction guided by taxonomic profiling achieves high accuracy, with high sensitivity and specificity for gene models. This suggests a new approach to homology-based gene prediction for newly sequenced or uncharacterized fungal genomes, with the potential to improve annotation quality.
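In gene-finding benchmarks, sensitivity and specificity are conventionally defined as Sn = TP/(TP+FN) and Sp = TP/(TP+FP). Under this convention, specificity is the fraction of predicted features that are correct. The minimal sketch below applies these definitions at the exon level and counts an exon as correct only when both of its boundaries match the reference exactly. The abstract does not say whether TaF's accuracy was measured at the nucleotide, exon or gene level, so the exon-level choice and the function name are assumptions.

```python
def exon_accuracy(predicted, reference):
    """Exon-level sensitivity and specificity.

    predicted, reference: iterables of (seqname, strand, start, end) exons.
    An exon is a true positive only if both boundaries match exactly.
    Specificity follows the gene-finding convention TP / (TP + FP).
    """
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                      # exactly matching exons
    sensitivity = tp / len(ref) if ref else 0.0     # TP / (TP + FN)
    specificity = tp / len(pred) if pred else 0.0   # TP / (TP + FP)
    return sensitivity, specificity


reference = [("c1", "+", 1, 100), ("c1", "+", 201, 300),
             ("c1", "+", 401, 500), ("c1", "+", 601, 700)]
predicted = [("c1", "+", 1, 100), ("c1", "+", 201, 300),
             ("c1", "+", 401, 480)]          # third exon has a wrong 3' boundary
print(exon_accuracy(predicted, reference))   # (0.5, 0.6666666666666666)
```

Exact-boundary matching is strict on purpose: a hint that shifts a splice site by a single base makes the exon count as both a false positive and a false negative. This is why boundary evidence from close relatives would be expected to raise both measures.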

Conclusion

TaF will be a useful tool for fungal genome-wide analyses, including the identification of target genes associated with a trait, transcriptome profiling, comparative genomics, and evolutionary analysis.


Fig. 1
Fig. 2
Fig. 3


Acknowledgements

This work was supported by the Strategic Initiative for Microbiomes in Agriculture and Food (Grant no. 914008-04) and by the Golden Seed Project (Grant no. 213007-05-1-SBH20) of the Ministry of Agriculture, Food and Rural Affairs of the Republic of Korea.

Author information


Contributions

SGP, DR, and HL developed the TaF server, evaluated the accuracy of gene prediction, and drafted the paper. HR and YJA performed the RNA isoform sequencing and gene prediction for L. edodes. JK and CPH conceived the study, participated in its design and coordination, and drafted the manuscript. All the authors read and approved the final manuscript.

Corresponding authors

Correspondence to Junsu Ko or Chang Pyo Hong.

Ethics declarations

Conflict of interest

Sin-Gi Park, DongSung Ryu, Hyunsung Lee, Hojin Ryu, Yong Ju Ahn, Seung il Yoo, Junsu Ko, and Chang Pyo Hong declare that they have no conflict of interest.

Electronic supplementary material


Supplementary material 1 (PPTX 62 KB)


About this article


Cite this article

Park, SG., Ryu, D., Lee, H. et al. TaF: a web platform for taxonomic profile-based fungal gene prediction. Genes Genom 41, 337–342 (2019). https://doi.org/10.1007/s13258-018-0766-1


Keywords

  • Ab initio
  • Exon–intron boundary
  • Filamentous fungal genome
  • Homology-based gene prediction
  • Taxonomic profile
  • Web platform