A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets

  • Chapter
Advances in Distributed Agent-Based Retrieval Tools

Part of the book series: Studies in Computational Intelligence (SCI, volume 361)

Abstract

The large gap between the number of known sequences and the number of known tertiary structures has fostered the development of automated methods and systems for protein analysis. When these systems are built using machine learning techniques, the ability to train them with suitable data becomes of paramount importance. From this perspective, the search for (and the generation of) specialized datasets that meet specific requirements are prominent activities for researchers. To help researchers in these activities we developed ProDaMa-C, a web application aimed at generating specialized protein structure datasets and fostering collaboration among researchers. ProDaMa-C provides a collaborative environment where researchers with similar interests can meet and collaborate to generate new datasets. Datasets are generated by selecting proteins through user-defined pipelines of methods/operators. Each pipeline can also be used as a starting point for building further pipelines that enforce additional selection criteria. Freely available as a web application at http://iasc.diee.unica.it/prodamac, ProDaMa-C has proven to be a useful tool for researchers involved in the task of generating specialized protein structure datasets.
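The pipeline idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ProDaMa-C implementation or its API: the `Protein` record, the `keep_if` helper, the `Pipeline` class, and the sample entries are all hypothetical names and made-up values chosen for illustration. The sketch only shows how a chain of selection operators might filter a protein collection, and how an existing pipeline might serve as the starting point for a stricter one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical protein record; the fields are illustrative assumptions."""
    entry_id: str
    method: str        # experimental method, e.g. "X-RAY" or "NMR"
    resolution: float  # in angstroms; 0.0 used here when not applicable
    length: int        # number of residues


# An operator maps a collection of proteins to a (usually smaller) collection.
Operator = Callable[[Iterable[Protein]], Iterable[Protein]]


def keep_if(predicate: Callable[[Protein], bool]) -> Operator:
    """Build a selection operator from a per-protein predicate."""
    def operator(proteins: Iterable[Protein]) -> Iterable[Protein]:
        return (p for p in proteins if predicate(p))
    return operator


@dataclass(frozen=True)
class Pipeline:
    """Immutable chain of operators, applied in order."""
    operators: Tuple[Operator, ...] = ()

    def then(self, operator: Operator) -> "Pipeline":
        # Return a new pipeline, so the original can still be reused as is.
        return Pipeline(self.operators + (operator,))

    def run(self, proteins: Iterable[Protein]) -> List[Protein]:
        for operator in self.operators:
            proteins = operator(proteins)
        return list(proteins)


# Made-up sample entries, not taken from any real database.
sample = [
    Protein("P001", "X-RAY", 1.8, 120),
    Protein("P002", "NMR", 0.0, 80),
    Protein("P003", "X-RAY", 3.1, 300),
]

# A base pipeline, and a second one built on top of it with an extra criterion.
base = Pipeline().then(keep_if(lambda p: p.method == "X-RAY"))
strict = base.then(keep_if(lambda p: p.resolution <= 2.0))

base_ids = [p.entry_id for p in base.run(sample)]
strict_ids = [p.entry_id for p in strict.run(sample)]
```

Making `Pipeline` immutable is one possible design choice: extending a shared pipeline yields a new object, so a collaborator who adds criteria does not alter the dataset definition that others rely on.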

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Armano, G., Manconi, A. (2011). A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_2

  • DOI: https://doi.org/10.1007/978-3-642-21384-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21383-0

  • Online ISBN: 978-3-642-21384-7

  • eBook Packages: Engineering, Engineering (R0)