EST Databases and Web Tools for EST Projects

Expressed Sequence Tags (ESTs)

Part of the book series: Methods in Molecular Biology (MIMB, volume 533)

Abstract

This chapter outlines key considerations for constructing and implementing an EST database. Rather than presenting the technical details step by step, it emphasizes how to design an EST database suited to the specific needs of an EST project and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to consider in database construction and the steps for data population and annotation. The process employs technologies such as PostgreSQL, Perl, and PHP to build the database and its interface, and tools such as AutoFACT for data processing and annotation. We compare these with other available technologies and tools and explain the reasons for our choices.

References

  1. Keeling, P. J., Burger, G., Durnford, D. G., Lang, B. F., Lee, R. W., Pearlman, R. E., Roger, A. J., and Gray, M. W. (2005) The tree of eukaryotes. Trends Ecol Evol 20, 670–6.

  2. O'Brien, E. A., Koski, L. B., Zhang, Y., Yang, L., Wang, E., Gray, M. W., Burger, G., and Lang, B. F. (2007) TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res 35, D445–51.

  3. Koski, L. B., Gray, M. W., Lang, B. F., and Burger, G. (2005) AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics 6, 151.

  4. Kumar, C. G., LeDuc, R., Gong, G., Roinishivili, L., Lewin, H. A., and Liu, L. (2004) ESTIMA, a tool for EST management in a multi-project environment. BMC Bioinformatics 5, 176.

  5. Korth, H. F., and Silberschatz, A. (1991) Database System Concepts (2nd edn.). McGraw-Hill, Columbus, Ohio.

  6. Date, C. J. (2000) An Introduction to Database Systems (7th edn.). Addison-Wesley, Boston, Massachusetts.

  7. D'Agostino, N., Aversano, M., and Chiusano, M. L. (2005) ParPEST: a pipeline for EST data analysis based on parallel computing. BMC Bioinformatics 6 Suppl 4, S9.

  8. Ayoubi, P., Jin, X., Leite, S., Liu, X., Martajaja, J., Abduraham, A., Wan, Q., Yan, W., Misawa, E., and Prade, R. A. (2002) PipeOnline 2.0: automated EST processing and functional data sorting. Nucleic Acids Res 30, 4761–9.

  9. Lottaz, C., Iseli, C., Jongeneel, C. V., and Bucher, P. (2003) Modeling sequencing errors by combining Hidden Markov models. Bioinformatics 19, ii103–ii112.

  10. Hatzigeorgiou, A. G., Fiziev, P., and Reczko, M. (2001) DIANA-EST: a statistical analysis. Bioinformatics 17, 913–9.

  11. Wuyts, J., Perriere, G., and Van De Peer, Y. (2004) The European ribosomal RNA database. Nucleic Acids Res 32, D101–3.

  12. Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 32, D262–6.

  13. Apweiler, R., Bairoch, A., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M. J., Natale, D. A., O'Donovan, C., Redaschi, N., and Yeh, L. S. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32, D115–9.

  14. Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J., and Natale, D. A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.

  15. Tatusov, R. L., Koonin, E. V., and Lipman, D. J. (1997) A genomic perspective on protein families. Science 278, 631–7.

  16. Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28, 27–30.

  17. Sonnhammer, E. L., Eddy, S. R., and Durbin, R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405–20.

  18. Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci USA 95, 5857–64.

  19. Klein, P., Kanehisa, M., and DeLisi, C. (1984) Prediction of protein function from sequence properties. Discriminant analysis of a data base. Biochim Biophys Acta 787, 221–6.

  20. Jensen, L. J., Gupta, R., Blom, N., Devos, D., Tamames, J., Kesmir, C., Nielsen, H., Staerfeldt, H. H., Rapacki, K., and Workman, C. (2002) Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 319, 1257–65.

  21. Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. E. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299, 501–22.

  22. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O., and Eisenberg, D. (1999) A combined algorithm for genome-wide prediction of protein function. Nature 402, 83–6.

  23. Enright, A. J., Iliopoulos, I., Kyrpides, N. C., and Ouzounis, C. A. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402, 86–90.

  24. Overbeek, R., Fonstein, M., D'Souza, M., Pusch, G. D., and Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96, 2896–901.

  25. Ettema, T., van der Oost, J., and Huynen, M. (2001) Modularity in the gain and loss of genes: applications for function prediction. Trends Genet 17, 485–7.

  26. Zheng, Y., Roberts, R. J., and Kasif, S. (2002) Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol 3, research0060.1–60.9.

  27. Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D., and Yeates, T. O. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96, 4285–8.

  28. King, R. D., Karwath, A., Clare, A., and Dehaspe, L. (2000) Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 17, 283–93.

  29. Hua, S., and Sun, Z. (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17, 721–8.

  30. Nair, R., and Rost, B. (2005) Mimicking cellular sorting improves prediction of subcellular localization. J Mol Biol 348, 85–100.

  31. Xie, D., Li, A., Wang, M., Fan, Z., and Feng, H. (2005) LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Res 33, W105–10.

  32. Bannai, H., Tamada, Y., Maruyama, O., Nakai, K., and Miyano, S. (2002) Extensive feature detection of N-terminal protein sorting signals. Bioinformatics 18, 298–305.

  33. Guda, C., Fahy, E., and Subramaniam, S. (2004) MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins. Bioinformatics 20, 1785–94.

  34. Bhasin, M., and Raghava, G. P. (2004) ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res 32, W414–9.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Shen, YQ., O’Brien, E., Koski, L., Lang, B.F., Burger, G. (2009). EST Databases and Web Tools for EST Projects. In: Parkinson, J. (eds) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_11

  • DOI: https://doi.org/10.1007/978-1-60327-136-3_11

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-759-4

  • Online ISBN: 978-1-60327-136-3

  • eBook Packages: Springer Protocols
