Abstract
Introduction
Accurate prediction and annotation of gene structures from an organism's genome sequence enable genome-wide functional analyses that provide insight into its biological properties.
Objectives
We recently developed TaF, a highly accurate gene prediction pipeline and web platform for filamentous fungi. TaF is a homology-based gene predictor that employs large-scale taxonomic profiling to identify species closely related to the query genome.
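The abstract does not describe how the taxonomic profiling is carried out. As a minimal sketch, assuming profiling amounts to recording, for each query gene, which reference species supplies its best-scoring homology hit (for example, from BLAST tabular output annotated with species names) and then ranking species by that count, the idea looks like this. The function `profile_taxa`, the `(query_id, species, bitscore)` tuple layout, and the toy data are hypothetical and are not TaF's implementation.

```python
from collections import Counter


def profile_taxa(hits, top_n=3):
    """Rank reference species by how many query genes have their best hit there.

    hits: iterable of (query_id, species, bitscore) tuples (assumed input layout).
    """
    best = {}  # query_id -> (bitscore, species) of its best-scoring hit so far
    for query_id, species, bitscore in hits:
        if query_id not in best or bitscore > best[query_id][0]:
            best[query_id] = (bitscore, species)
    # Count how often each species supplies the best hit, then rank them.
    counts = Counter(species for _, species in best.values())
    return counts.most_common(top_n)


# Toy hits for illustration only (not data from the TaF study).
toy_hits = [
    ("g1", "Lentinula edodes", 250.0),
    ("g1", "Agaricus bisporus", 180.0),
    ("g2", "Lentinula edodes", 90.0),
    ("g2", "Pleurotus ostreatus", 120.0),
    ("g3", "Lentinula edodes", 300.0),
]

print(profile_taxa(toy_hits))
# [('Lentinula edodes', 2), ('Pleurotus ostreatus', 1)]
```

Counting only the best hit per gene keeps a single query gene from inflating a species' score through many weaker paralogous hits.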
Methods
The TaF pipeline consists of four processing steps: (1) taxonomic profiling to identify close relatives of the query, (2) generation of hints for determining exon–intron boundaries from the orthologous protein sequences of the profiled species, (3) gene prediction combining ab initio and evidence-based methods, and (4) homology search for the predicted gene models.
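Step (2) can be illustrated with a small sketch. Assuming an orthologous protein has already been spliced-aligned to the query genome, yielding exon blocks in 1-based inclusive coordinates, each gap between consecutive blocks implies a candidate intron. The sketch emits these as GFF-style hint lines of the kind AUGUSTUS accepts as extrinsic evidence, with `src=P` marking protein-derived hints. The choice of AUGUSTUS as the downstream predictor, the helper `intron_hints`, the source label `orthoprot`, and the priority value are assumptions for illustration, not TaF's documented implementation.

```python
def intron_hints(seqname, strand, exons, group, priority=4):
    """Convert spliced-alignment exon blocks into AUGUSTUS-style intron hints.

    exons: (start, end) genomic coordinates, 1-based and inclusive.
    Returns tab-separated GFF-like lines, one per inferred intron.
    """
    blocks = sorted(exons)
    hints = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end <= 1:
            continue  # touching or overlapping blocks imply no intron
        fields = [
            seqname,
            "orthoprot",               # hypothetical source label
            "intron",
            str(prev_end + 1),         # first intronic base
            str(next_start - 1),       # last intronic base
            ".",                       # no score
            strand,
            ".",                       # frame is not meaningful for introns
            f"src=P;grp={group};pri={priority}",
        ]
        hints.append("\t".join(fields))
    return hints


for line in intron_hints("scaffold_1", "+", [(320, 500), (100, 250), (560, 700)], "orthoA"):
    print(line)
```

Giving every hint from one alignment the same `grp` value tells AUGUSTUS that those hints belong together, so introns supported by a single ortholog are treated as one chain of evidence rather than as unrelated signals.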
Results
TaF generates extrinsic evidence for possible exon–intron boundaries from the orthologous protein sequences of closely related species, thereby reducing false-positive gene structure predictions that arise from data on distantly related orthologs. In particular, gene prediction guided by taxonomic profiling achieves very high accuracy, with high sensitivity and specificity for gene models. This suggests a new approach to homology-based gene prediction for newly sequenced or uncharacterized fungal genomes, with the potential to improve the quality of gene prediction.
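The abstract reports sensitivity and specificity without defining them. In gene-prediction benchmarks they are commonly computed as Sn = TP/(TP + FN) and Sp = TP/(TP + FP), so "specificity" here corresponds to what other fields call precision. A minimal exon-level sketch follows, assuming an exact-coordinate match criterion; the criterion and feature level TaF was actually evaluated at are not stated here, and the function name and toy exons are hypothetical.

```python
def sensitivity_specificity(predicted, reference):
    """Exon-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP) under exact matching.

    predicted, reference: iterables of (seqname, start, end, strand) tuples.
    """
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                    # exons predicted with exact coordinates
    sn = tp / len(ref) if ref else 0.0      # len(ref) == TP + FN
    sp = tp / len(pred) if pred else 0.0    # len(pred) == TP + FP
    return sn, sp


# Toy exons for illustration only.
reference = [("s1", 100, 250, "+"), ("s1", 320, 500, "+"), ("s1", 560, 700, "+")]
predicted = [("s1", 100, 250, "+"), ("s1", 320, 500, "+"),
             ("s1", 800, 900, "+"), ("s1", 950, 990, "+")]
print(sensitivity_specificity(predicted, reference))  # (0.6666666666666666, 0.5)
```

Two of three reference exons are recovered (Sn = 2/3), and two of four predicted exons are correct (Sp = 1/2); reducing false-positive predictions, as described above, raises Sp directly.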
Conclusion
TaF will be a useful tool for fungal genome-wide analyses, including the identification of targeted genes associated with a trait, transcriptome profiling, comparative genomics, and evolutionary analysis.
Acknowledgements
This work was supported by the Strategic Initiative for Microbiomes in Agriculture and Food (Grant no. 914008-04) and by the Golden Seed Project (Grant no. 213007-05-1-SBH20) of the Ministry of Agriculture, Food and Rural Affairs of the Republic of Korea.
Author information
Contributions
SGP, DR, and HL developed the TaF server, evaluated the accuracy of gene prediction, and drafted the paper. HR and YJA performed the RNA isoform sequencing and gene prediction for L. edodes. JK and CPH conceived the study, participated in its design and coordination, and drafted the manuscript. All the authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
Sin-Gi Park, DongSung Ryu, Hyunsung Lee, Hojin Ryu, Yong Ju Ahn, Seung il Yoo, Junsu Ko, and Chang Pyo Hong declare that they have no conflict of interest.
About this article
Cite this article
Park, SG., Ryu, D., Lee, H. et al. TaF: a web platform for taxonomic profile-based fungal gene prediction. Genes Genom 41, 337–342 (2019). https://doi.org/10.1007/s13258-018-0766-1
DOI: https://doi.org/10.1007/s13258-018-0766-1