A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets

Part of the Studies in Computational Intelligence book series (SCI, volume 361)


The large gap between the number of known protein sequences and the number of known tertiary structures has fostered the development of automated methods and systems for protein analysis. When these systems are built using machine learning techniques, the ability to train them with suitable data becomes of paramount importance. From this perspective, searching for (and generating) specialized datasets that meet specific requirements is a prominent activity for researchers. To help researchers in these activities, we developed ProDaMa-C, a web application aimed at generating specialized protein structure datasets and fostering collaboration among researchers. ProDaMa-C provides a collaborative environment where researchers with similar interests can meet and collaborate to generate new datasets. Datasets are generated by selecting proteins through user-defined pipelines of methods/operators. Each pipeline can also be used as a starting point for building further pipelines that enforce additional selection criteria. Freely available as a web application at the URL , ProDaMa-C has proven to be a useful tool for researchers involved in the task of generating specialized protein structure datasets.
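
The pipeline idea described in the abstract can be illustrated with a minimal Python sketch. This is not the ProDaMa-C implementation or its API: the `Protein` record, the `Pipeline` class, the operator names (`max_resolution`, `min_length`, `solved_by`), and the four sample entries are all hypothetical, introduced only to show how a pipeline of selection operators might be composed and then extended with an additional criterion.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical protein record; the fields are illustrative only."""
    pdb_id: str
    sequence: str
    resolution: float  # in angstroms
    method: str        # experimental method label


# An operator is assumed here to be a predicate that keeps or rejects a protein.
Operator = Callable[[Protein], bool]


@dataclass(frozen=True)
class Pipeline:
    """Immutable, ordered collection of selection operators."""
    operators: Tuple[Operator, ...] = ()

    def extend(self, *more: Operator) -> "Pipeline":
        # Return a new pipeline, so the original stays reusable as a starting point.
        return Pipeline(self.operators + more)

    def run(self, proteins: Iterable[Protein]) -> List[Protein]:
        # Keep only the proteins accepted by every operator.
        return [p for p in proteins if all(op(p) for op in self.operators)]


def max_resolution(limit: float) -> Operator:
    return lambda p: p.resolution <= limit


def min_length(n: int) -> Operator:
    return lambda p: len(p.sequence) >= n


def solved_by(method: str) -> Operator:
    return lambda p: p.method == method


# Made-up sample data; the identifiers are placeholders, not real PDB entries.
proteins = [
    Protein("AAAA", "M" * 120, 1.8, "X-RAY"),
    Protein("BBBB", "M" * 40, 1.5, "X-RAY"),
    Protein("CCCC", "M" * 200, 3.2, "X-RAY"),
    Protein("DDDD", "M" * 150, 2.0, "NMR"),
]

base = Pipeline().extend(max_resolution(2.5), min_length(50))
refined = base.extend(solved_by("X-RAY"))  # builds on base, adds one criterion

base_ids = [p.pdb_id for p in base.run(proteins)]
refined_ids = [p.pdb_id for p in refined.run(proteins)]
```

In this sketch `extend` returns a new pipeline instead of mutating the existing one, which is one simple way to let a shared pipeline serve as the basis for stricter variants without affecting the researchers who still use the original.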


Protein Data Bank; Protein Secondary Structure; Local Database; Biological Source; Nucleic Acid Research
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy
  2. Institute for Biomedical Technologies, National Research Council, Milano, Italy
