Plant Genome Annotation Methods

Plant Genomics

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 513))

Summary

Annotation of plant genomic sequences can be separated into structural and functional annotation. Structural annotation is the foundation of all genomics: without accurate gene models, understanding gene function or the evolution of genes across taxa is impeded. Structural annotation depends on sensitive, specific computational programs and deep experimental evidence to identify gene features within genomic DNA. Functional annotation depends heavily on sequence similarity to other known genes or proteins, because the majority of initial “first-pass” functional annotation on a genomic scale is transitive, that is, inferred from the annotation of a similar sequence rather than from direct evidence. Coupling structural and functional annotation across genomes in a comparative manner promotes more accurate annotation as well as an understanding of gene and genome evolution. With the increasing availability of plant genome sequence data, the value of comparative annotation will increase. As with any new field, methodologies for genome annotation are still evolving and will improve in the future.


Acknowledgements

We acknowledge the efforts of the TIGR Bioinformatics department, which has made a number of robust annotation tools readily available to our group. Genome annotation is funded by grants to C.R.B. from the National Science Foundation (DBI-0218166 and DBI-0321538). Note added in proof: Subsequent to the submission of this chapter, the rice genome project at TIGR moved to Michigan State University (new URL: http://rice.plantbiology.msu.edu/).

Author information

Corresponding author

Correspondence to C. Robin Buell.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Ouyang, S., Thibaud-Nissen, F., Childs, K., Zhu, W., Buell, C. (2009). Plant Genome Annotation Methods. In: Gustafson, J., Langridge, P., Somers, D. (eds) Plant Genomics. Methods in Molecular Biology™, vol 513. Humana Press. https://doi.org/10.1007/978-1-59745-427-8_14

  • DOI: https://doi.org/10.1007/978-1-59745-427-8_14

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-997-0

  • Online ISBN: 978-1-59745-427-8

  • eBook Packages: Springer Protocols
