Crop Genome Annotation: A Case Study for the Brassica rapa Genome

Pang, Erli; Cao, Huifeng; Zhang, Bowen; Lin, Kui

doi:10.1007/978-3-662-47901-8_5

Chapter
First Online: 01 January 2015

Part of the book series: Compendium of Plant Genomes (CPG)

Abstract

Genome annotation is crucial for bridging the gap between sequence and biology. It is also a dynamic process of continuous improvement toward a better understanding of the molecular biology of the genome. Deep RNA sequencing of eight Brassica rapa tissues should allow protein-coding genes to be predicted more accurately when this RNA evidence is incorporated into the analysis. We therefore used the annotation pipeline we built to re-annotate the B. rapa genome at three levels: repetitive elements, protein-coding genes, and non-coding RNA genes. In total, we identified 139.9 Mb of repetitive elements, 6,088 non-coding RNA genes, and 45,149 protein-coding genes. These results, together with those published previously, provide a valuable resource for further understanding of B. rapa.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant: 31171235). We thank the people who have contributed to building and maintaining the genome annotation pipeline in the Laboratory of Computational Molecular Biology at Beijing Normal University.

Author information

Authors and Affiliations

College of Life Sciences, Beijing Normal University, Beijing, 100875, China
Erli Pang, Huifeng Cao, Bowen Zhang & Kui Lin

Corresponding author

Correspondence to Kui Lin.

Editor information

Editors and Affiliations

Chinese Academy of Agricultural Sciences, Institute of Vegetables and Flowers, Beijing, China
Xiaowu Wang
Department of Genetics & Plant Breeding, Bidhan Chandra Krishi Viswavidyalaya, Mohanpur, Nadia, West Bengal, India
Chittaranjan Kole

About this chapter

Cite this chapter

Pang, E., Cao, H., Zhang, B., Lin, K. (2015). Crop Genome Annotation: A Case Study for the Brassica rapa Genome. In: Wang, X., Kole, C. (eds) The Brassica rapa Genome. Compendium of Plant Genomes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47901-8_5

DOI: https://doi.org/10.1007/978-3-662-47901-8_5
Published: 06 September 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47900-1
Online ISBN: 978-3-662-47901-8
eBook Packages: Biomedical and Life Sciences (R0)