EST Processing: From Trace to Sequence

Schmid, Ralf; Blaxter, Mark

doi:10.1007/978-1-60327-136-3_9

Part of the book series: Methods in Molecular Biology (MIMB, volume 533)

Abstract

A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the usage of a software pipeline (available from http://www.nematodes.org/bioinformatics/), which has been developed to make the most of EST datasets. This modular software solution is targeted toward small- to medium-sized EST projects and comprises a series of Perl scripts. The software design is based on our experience during EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for the submission of the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects or unigenes and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web.

Acknowledgements

The authors would like to thank all contributors and users of trace2dbest, PartiGene, and other tools of the Edinburgh EST pipeline, in particular Alasdair Anthony, John Parkinson, James Wasmuth, and Ann Hedley. Funding was in part from the NERC Environmental Genomics Thematic Data Program.

Author information

Authors and Affiliations

Department of Biochemistry, University of Leicester, Leicester, UK
Ralf Schmid
Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
Mark Blaxter

Editor information

Editors and Affiliations

Molecular Structure and Function Hospital for Sick Children, Departments of Biochemistry & Molecular Genetics, University of Toronto, Toronto, ON, Canada
John Parkinson

About this protocol

Cite this protocol

Schmid, R., Blaxter, M. (2009). EST Processing: From Trace to Sequence. In: Parkinson, J. (ed.) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_9

DOI: https://doi.org/10.1007/978-1-60327-136-3_9
Published: 10 March 2009
Publisher Name: Humana Press
Print ISBN: 978-1-58829-759-4
Online ISBN: 978-1-60327-136-3
eBook Packages: Springer Protocols
