A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets

Armano, Giuliano; Manconi, Andrea

doi:10.1007/978-3-642-21384-7_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 361)

Abstract

The large gap between the number of known sequences and the number of known tertiary structures has fostered the development of automated methods and systems for protein analysis. When these systems are built using machine learning techniques, the ability to train them with suitable data becomes of paramount importance. From this perspective, searching for (and generating) specialized datasets that meet specific requirements is a prominent activity for researchers. To help researchers in these activities, we developed ProDaMa-C, a web application aimed at generating specialized protein structure datasets and fostering collaboration among researchers. ProDaMa-C provides a collaborative environment where researchers with similar interests can meet and collaborate to generate new datasets. Datasets are generated by selecting proteins through user-defined pipelines of methods/operators. Each pipeline can also be used as a starting point for building further pipelines that enforce additional selection criteria. Freely available as a web application at http://iasc.diee.unica.it/prodamac , ProDaMa-C has proven to be a useful tool for researchers involved in the task of generating specialized protein structure datasets.
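
The pipeline idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ProDaMa-C implementation or API: the `Protein` record, the `Pipeline` class, its `then` and `run` methods, the two operators, and the sample entries are all hypothetical names and data introduced here only to show how a pipeline of selection operators might be reused as the starting point for a stricter one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical protein record; the fields are illustrative only."""
    pdb_id: str
    method: str   # experimental method, e.g. "X-RAY" or "NMR"
    length: int   # number of residues


# An operator is assumed to be a predicate that accepts or rejects a protein.
Operator = Callable[[Protein], bool]


@dataclass(frozen=True)
class Pipeline:
    """Immutable sequence of selection operators (an assumption of this sketch)."""
    operators: Tuple[Operator, ...] = ()

    def then(self, op: Operator) -> "Pipeline":
        # Return a new pipeline; the original stays usable as a starting point.
        return Pipeline(self.operators + (op,))

    def run(self, proteins: Iterable[Protein]) -> List[Protein]:
        # Keep only the proteins accepted by every operator, in input order.
        return [p for p in proteins if all(op(p) for op in self.operators)]


def is_xray(p: Protein) -> bool:
    return p.method == "X-RAY"


def has_at_least_100_residues(p: Protein) -> bool:
    return p.length >= 100


# Made-up entries, not real PDB data.
PROTEINS = [
    Protein("P1", "X-RAY", 250),
    Protein("P2", "X-RAY", 80),
    Protein("P3", "NMR", 300),
]

base = Pipeline().then(is_xray)               # first selection criterion
strict = base.then(has_at_least_100_residues)  # extends base, leaves it intact
```

Because `then` returns a new pipeline instead of modifying the existing one, `base` can still be run on its own or extended in a different direction after `strict` has been built from it.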


Author information

Authors and Affiliations

Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy
Giuliano Armano
Institute for Biomedical Technologies, National Research Council, Milano, Italy
Andrea Manconi

Editor information

Editors and Affiliations

InterAnalytics, Rue des Savoises, 19, 1205, Geneva, Switzerland
Vincenzo Pallotta
CRS4, Center of Advanced Studies Research and Development in Sardinia, Parco Scientifico della Sardegna, Ed. 1, 09010, Loc. Piscinamanna Pula, CA, Italy
Alessandro Soro
Department of Electrical and Electronic Engineering, University of Cagliari, 09123, Piazza d’Armi, Cagliari, Italy
Eloisa Vargiu

About this chapter

Cite this chapter

Armano, G., Manconi, A. (2011). A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_2

DOI: https://doi.org/10.1007/978-3-642-21384-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21383-0
Online ISBN: 978-3-642-21384-7
eBook Packages: Engineering, Engineering (R0)