EST Processing: From Trace to Sequence

Expressed Sequence Tags (ESTs)

Part of the book series: Methods in Molecular Biology ((MIMB,volume 533))

Abstract

A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the use of a software pipeline (available from http://www.nematodes.org/bioinformatics/) that was developed to make the most of EST datasets. This modular software solution comprises a series of Perl scripts and is targeted toward small- to medium-sized EST projects. The software design is based on our experience with EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for submitting the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects, or unigenes, and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web.
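
To make the idea of grouping ESTs into putative gene objects more concrete, the following is a minimal Python sketch of single-linkage clustering with a union-find structure. It is an illustration only and does not reproduce PartiGene's actual clustering or assembly method. The function name `cluster_ests`, the EST identifiers, and the list of similarity links are all hypothetical; the sketch assumes that some upstream sequence comparison has already decided which pairs of ESTs are similar.

```python
from collections import defaultdict


def cluster_ests(est_ids, links):
    """Group EST identifiers into clusters by single linkage.

    est_ids: iterable of EST identifiers (hypothetical names).
    links:   iterable of (id_a, id_b) pairs judged similar by an
             upstream comparison (assumed to exist; not part of
             this sketch).
    """
    # Every EST starts as the representative of its own cluster.
    parent = {est: est for est in est_ids}

    def find(est):
        # Walk up to the cluster representative, shortening the
        # path along the way (path halving).
        while parent[est] != est:
            parent[est] = parent[parent[est]]
            est = parent[est]
        return est

    # Merge the clusters of every linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their representative.
    clusters = defaultdict(list)
    for est in parent:
        clusters[find(est)].append(est)

    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: five ESTs and three pairwise similarity links.
ests = ["EST001", "EST002", "EST003", "EST004", "EST005"]
links = [("EST001", "EST002"), ("EST002", "EST003"), ("EST004", "EST005")]
clusters = cluster_ests(ests, links)
print(clusters)
```

With this input, EST001, EST002, and EST003 end up in one cluster because they are connected through a chain of links, and EST004 and EST005 form a second cluster. In a real project each cluster would then be assembled into one or more consensus sequences, a step this sketch leaves out.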

Acknowledgements

The authors would like to thank all contributors and users of trace2dbest, PartiGene, and other tools of the Edinburgh EST pipeline, in particular Alasdair Anthony, John Parkinson, James Wasmuth, and Ann Hedley. Funding was in part from the NERC Environmental Genomics Thematic Data Program.


Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Schmid, R., Blaxter, M. (2009). EST Processing: From Trace to Sequence. In: Parkinson, J. (eds) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_9

  • DOI: https://doi.org/10.1007/978-1-60327-136-3_9

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-759-4

  • Online ISBN: 978-1-60327-136-3

  • eBook Packages: Springer Protocols
