P2P proteomics -- data sharing for enhanced protein identification
To tackle the important and challenging proteomics problem of identifying known and new protein sequences with high-throughput methods, we propose a data-sharing platform that uses fully distributed P2P technologies to share specifications of peer-interaction protocols and service components. With such a platform, the information to be searched is no longer centralised in a few repositories; instead, it is gathered from experiments in peer proteomics laboratories and can subsequently be searched by fellow researchers.
The system runs, in a distributed fashion, a data-sharing protocol specified in the Lightweight Communication Calculus that underlies the system, and researchers interact through it via message passing. To do so, researchers use dedicated components that link to database querying systems based on BLAST and/or OMSSA and to GUI-based visualisation environments. We tested the proposed platform with data drawn from pre-existing MS/MS data reservoirs from the 2006 ABRF (Association of Biomolecular Resource Facilities) test sample, which was extensively tested during the ABRF Proteomics Standards Research Group 2006 worldwide survey. In particular, we took the data available from a subset of the proteomics laboratories of Spain's National Institute for Proteomics, ProteoRed, a network for the coordination, integration and development of the Spanish proteomics facilities.
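The message-passing interaction described above can be illustrated with a minimal in-process sketch. This is not the Lightweight Communication Calculus or the platform's actual components: the `LabPeer` class, the `("query", ...)` / `("hits", ...)` message shapes, the peptide string and the protein identifiers are all hypothetical, and a dictionary lookup stands in for a BLAST- or OMSSA-based search.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hit:
    protein_id: str
    score: float
    source: str  # name of the peer that reported the hit


@dataclass
class LabPeer:
    """Hypothetical peer wrapping one laboratory's local identifications."""
    name: str
    # Toy local store: peptide sequence -> list of (protein_id, score).
    # A real peer would delegate to a search engine instead.
    store: dict = field(default_factory=dict)

    def handle(self, message):
        """Answer a ("query", peptide) message with a ("hits", [...]) message."""
        kind, peptide = message
        if kind != "query":
            return ("error", f"unsupported message kind: {kind}")
        matches = self.store.get(peptide, [])
        return ("hits", [Hit(pid, score, self.name) for pid, score in matches])


def broadcast_query(peptide, peers):
    """Send the same query message to every peer and collect the returned hits."""
    collected = []
    for peer in peers:
        kind, payload = peer.handle(("query", peptide))
        if kind == "hits":
            collected.extend(payload)
    return collected


# Toy data only: three peers, one of which has nothing to report.
lab_a = LabPeer("lab_a", {"PEPTIDEK": [("PROT_1", 42.0)]})
lab_b = LabPeer("lab_b", {"PEPTIDEK": [("PROT_1", 37.5), ("PROT_2", 12.0)]})
lab_c = LabPeer("lab_c", {})
hits = broadcast_query("PEPTIDEK", [lab_a, lab_b, lab_c])
```

Each hit keeps the name of the peer that produced it, so the requester can later weigh evidence by its origin.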
Results and Discussion
We performed queries against nine databases: those of seven ProteoRed proteomics laboratories, the NCBI Swiss-Prot database and the local database of the CSIC/UAB Proteomics Laboratory. A detailed analysis of the results indicated the presence of a protein that was supported by other NCBI matches and by high-scoring matches in several proteomics laboratories. The analysis clearly indicated that the protein was a contaminant present at relatively high concentration in the ABRF sample. This is evident from the information derived from the proposed P2P proteomics system. It is not straightforward, however, to arrive at the same conclusion by conventional means, because it is difficult to rule out organic contamination of samples. The actual presence of this contaminant was established only after the ABRF had studied all the identifications reported by the laboratories.
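The kind of cross-laboratory evidence described above can be sketched as a simple aggregation over query results. The score threshold, the minimum number of supporting sources, and all identifiers and scores below are illustrative assumptions, not values taken from the study.

```python
from collections import defaultdict


def cross_source_support(hits, min_score=30.0, min_sources=3):
    """Return proteins reported with a sufficient score by enough distinct sources.

    hits: iterable of (protein_id, source, score) tuples.
    min_score and min_sources are assumed cut-offs chosen for illustration.
    """
    sources = defaultdict(set)
    for protein_id, source, score in hits:
        if score >= min_score:
            sources[protein_id].add(source)
    return {
        pid: sorted(srcs)
        for pid, srcs in sources.items()
        if len(srcs) >= min_sources
    }


# Toy data only: PROT_X is well supported across sources, PROT_Y is not.
toy_hits = [
    ("PROT_X", "lab_1", 55.0),
    ("PROT_X", "lab_2", 48.0),
    ("PROT_X", "ncbi", 60.0),
    ("PROT_Y", "lab_1", 35.0),
    ("PROT_Y", "lab_3", 10.0),
]
supported = cross_source_support(toy_hits)
```

A protein that no single laboratory would report with confidence can stand out once independent sources agree, which is the effect the shared platform is meant to expose.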
- Online Date: January 2012
- Publisher: BioMed Central
- Author Affiliations
- 1. Artificial Intelligence Research Institute, IIIA-CSIC, Bellaterra (Barcelona), Spain
- 2. CSIC/UAB Proteomics Laboratory, IIBB-CSIC, IDIBAPS, Bellaterra (Barcelona), Spain