EST Processing: From Trace to Sequence

  • Ralf Schmid
  • Mark Blaxter
Part of the Methods in Molecular Biology book series (MIMB, volume 533)


A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the use of a software pipeline (available from …), which has been developed to make the most of EST datasets. This modular software solution is targeted toward small- to medium-sized EST projects and comprises a series of Perl scripts. The software design is based on our experience during EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for the submission of the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects, or unigenes, and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web.
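
Turning a base-called trace into a usable sequence commonly includes trimming low-quality ends from each read. The Python sketch below shows one generic way to do this from phred-style quality values. It is not taken from trace2dbest, which is written in Perl. The function names, the default threshold of 20, and the maximal-scoring-segment approach are all assumptions made for illustration.

```python
from typing import List, Tuple


def trim_by_quality(qualities: List[int], min_q: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the best-quality segment, end exclusive.

    Hypothetical helper: each base contributes (q - min_q) and the
    contiguous segment with the highest total is kept (Kadane's
    algorithm). Returns (0, 0) when no base scores above min_q.
    """
    best_sum = 0
    best = (0, 0)
    cur_sum = 0
    cur_start = 0
    for i, q in enumerate(qualities):
        # Restart the candidate segment once its running total is non-positive.
        if cur_sum <= 0:
            cur_sum = 0
            cur_start = i
        cur_sum += q - min_q
        if cur_sum > best_sum:
            best_sum = cur_sum
            best = (cur_start, i + 1)
    return best


def trim_read(seq: str, qualities: List[int],
              min_q: int = 20, min_len: int = 0) -> str:
    """Trim a read to its best-quality segment.

    Returns an empty string if the trimmed read is shorter than min_len
    (an assumed minimum-length filter, not a documented pipeline setting).
    """
    if len(seq) != len(qualities):
        raise ValueError("sequence and quality lists differ in length")
    start, end = trim_by_quality(qualities, min_q)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_len else ""


if __name__ == "__main__":
    read = "ACGTACGTAC"
    quals = [5, 8, 30, 35, 40, 38, 33, 10, 6, 4]
    print(trim_read(read, quals))
```

In practice the threshold and minimum length would be chosen per project, and vector or adaptor removal would be handled as a separate step.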

Key words

EST, expressed sequence tags, dbEST, Bioinformatics, PartiGene, trace2dbest



Acknowledgments

The authors would like to thank all contributors and users of trace2dbest, PartiGene, and other tools of the Edinburgh EST pipeline, in particular Alasdair Anthony, John Parkinson, James Wasmuth, and Ann Hedley. Funding was in part from the NERC Environmental Genomics Thematic Data Program.



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Ralf Schmid (1)
  • Mark Blaxter (2)
  1. Department of Biochemistry, University of Leicester, Leicester, UK
  2. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
