Obtaining Accurate Translations from Expressed Sequence Tags

  • Protocol
Expressed Sequence Tags (ESTs)

Part of the book series: Methods in Molecular Biology (MIMB, volume 533)

Abstract

The genomes of an increasing number of species are being investigated through the generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, making downstream annotation difficult. Annotation would be greatly improved with robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training purposes, a resource that is not available for the majority of EST projects. As part of our ongoing EST programs investigating these “neglected” genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We describe how this integrated approach goes a long way to overcoming the deficit in training data.

Acknowledgments

We would like to thank our colleagues in Edinburgh and Toronto for support and user feedback, and the authors of the Decoder and ESTScan programs for permission to use their code. This work was carried out while JW was a BBSRC-funded PhD student.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Wasmuth, J., Blaxter, M. (2009). Obtaining Accurate Translations from Expressed Sequence Tags. In: Parkinson, J. (ed.) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_10

  • DOI: https://doi.org/10.1007/978-1-60327-136-3_10

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-759-4

  • Online ISBN: 978-1-60327-136-3

  • eBook Packages: Springer Protocols
