Protein Networks and Pathway Analysis, pp 75-95
Manual Annotation of Protein Interactions
Abstract
Protein interactions are the basic building blocks for the assembly of pathways and networks. Almost any biologically meaningful functionality (for instance, linear signaling pathways, chains of metabolic reactions, transcription factor dimers, protein complexes of the transcriptosome, gene–disease associations) can be represented as a combination of binary relationships between “network objects” (genes, proteins, RNA species, bioactive compounds). Naturally, the assembled pathways and networks are only as good as their “weakest” link (i.e., a wrongly assigned interaction), and the errors multiply in multi-step pathways. Therefore, the utility of “systems biology” depends fundamentally on the quality and relevance of protein interactions. The second important parameter is the sheer number of interactions assembled in the database. One needs a “critical mass” of species-specific interactions in order to build cohesive networks for a gene list, rather than a constellation of unconnected proteins and protein pairs. The third issue is semantic consistency between interactions of different types. Transient physical signal transduction interactions, reactions of endogenous metabolism, transcription factor–promoter binding, and kinetic drug–target interactions are all very different in nature. Yet they have to fit into one database format and be mutually consistent in order to be useful in the reconstruction of cellular processes.
High-quality protein interactions are available in peer-reviewed “small experiment” literature and, to a much smaller extent, in patents. However, it is very challenging to find the interactions, annotate them with searchable (and computable) parameters, catalogue them in a computer-readable format, and assemble them into a database. There are hundreds of thousands of mammalian interactions scattered across tens of thousands of papers in a few thousand scientific journals. There are no widely used standards for reporting interactions in scientific texts and, therefore, text-mining tools have only limited applicability. In order to generate a meaningful database of protein interactions, one needs a well-developed technology of manual curation, equipped with computational solutions, managerial procedures, quality control, and user feedback. Here we describe our ever-evolving annotation approach, the important annotation issues and our solutions, and the mammalian protein interactions database MetaBase™, which we have been working on for over 8 years.
Key words
Protein interactions, manual annotation, interaction database, literature curation, biological pathways, networks