EST Databases and Web Tools for EST Projects

Shen, Yao-Qing; O’Brien, Emmet; Koski, Liisa; Lang, B. Franz; Burger, Gertraud

doi:10.1007/978-1-60327-136-3_11

Part of the book series: Methods in Molecular Biology (MIMB, volume 533)

Abstract

This chapter outlines key considerations for constructing and implementing an EST database. Rather than walking through the technical details step by step, it emphasizes how to design an EST database suited to the specific needs of an EST project and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to consider in database construction and the steps for data population and annotation. This process employs technologies such as PostgreSQL, Perl, and PHP to build the database and interface, and tools such as AutoFACT for data processing and annotation. We compare these with other available technologies and tools, and explain the reasons for our choices.

Author information

Authors and Affiliations

Robert-Cedergren Centre for Bioinformatics and Genomics, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
Yao-Qing Shen, Emmet O’Brien, Liisa Koski, B. Franz Lang & Gertraud Burger

Editor information

Editors and Affiliations

Molecular Structure and Function, Hospital for Sick Children, Departments of Biochemistry & Molecular Genetics, University of Toronto, Toronto, ON, Canada
John Parkinson

About this protocol

Cite this protocol

Shen, YQ., O’Brien, E., Koski, L., Lang, B.F., Burger, G. (2009). EST Databases and Web Tools for EST Projects. In: Parkinson, J. (eds) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_11

DOI: https://doi.org/10.1007/978-1-60327-136-3_11
Published: 10 March 2009
Publisher Name: Humana Press
Print ISBN: 978-1-58829-759-4
Online ISBN: 978-1-60327-136-3
eBook Packages: Springer Protocols
