Abstract
Protein interactions are the basic building blocks for the assembly of pathways and networks. Almost any biologically meaningful functionality (for instance, linear signaling pathways, chains of metabolic reactions, transcription factor dimers, protein complexes of the transcriptosome, gene–disease associations) can be represented as a combination of binary relationships between “network objects” (genes, proteins, RNA species, bioactive compounds). Naturally, the assembled pathways and networks are only as good as their “weakest” link (i.e., a wrongly assigned interaction), and the errors multiply in multi-step pathways. Therefore, the utility of “systems biology” depends fundamentally on the quality and relevance of protein interactions. The second important parameter is the sheer number of interactions assembled in the database. One needs a “critical mass” of species-specific interactions in order to build cohesive networks for a gene list, rather than a constellation of unconnected proteins and protein pairs. The third issue is semantic consistency between interactions of different types. Transient physical signal transduction interactions, reactions of endogenous metabolism, transcription factor–promoter binding, and kinetic drug–target interactions are all very different in nature. Yet they have to fit into one database format and be mutually consistent in order to be useful in the reconstruction of cellular processes.
High-quality protein interactions are available in the peer-reviewed “small experiment” literature and, to a much smaller extent, in patents. However, it is very challenging to find the interactions, annotate them with searchable (and computable) parameters, catalogue them in a computer-readable format, and assemble them into a database. There are hundreds of thousands of mammalian interactions scattered across tens of thousands of papers in a few thousand scientific journals. There are no widely used standards for reporting interactions in scientific texts and, therefore, text-mining tools have only limited applicability. In order to generate a meaningful database of protein interactions, one needs a well-developed technology of manual curation, supported by computational solutions, managerial procedures, quality control, and user feedback. Here we describe our ever-evolving annotation approach, the important annotation issues and our solutions, and the mammalian protein interactions database MetaBase™, which we have been working on for over 8 years.
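The abstract's points about binary relationships, errors multiplying along a pathway, and the need for enough interactions to connect a gene list can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the MetaBase schema or method: the `Interaction` record, its field names, the per-link `confidence` value, the toy objects A to E, and the simplification that link errors are independent.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Interaction:
    """One binary relationship between two network objects (hypothetical schema)."""
    source: str        # e.g. a gene, protein, RNA species, or compound
    target: str
    kind: str          # e.g. "binding", "phosphorylation"
    confidence: float  # assumed probability that the link is correctly assigned


def pathway_confidence(steps):
    """Probability that every link is correct, assuming independent errors."""
    return prod(step.confidence for step in steps)


def connected_groups(genes, interactions):
    """Split a gene list into connected groups using a simple union-find."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for link in interactions:
        # Only links whose two ends are both in the gene list can connect it.
        if link.source in parent and link.target in parent:
            parent[find(link.source)] = find(link.target)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())


# Toy three-step pathway; names and confidence values are made up.
pathway = [
    Interaction("A", "B", "binding", 0.9),
    Interaction("B", "C", "phosphorylation", 0.9),
    Interaction("C", "D", "transcription regulation", 0.9),
]
print(round(pathway_confidence(pathway), 3))  # 0.729
print(connected_groups(["A", "B", "C", "D", "E"], pathway))
```

Under these assumptions, three links that are each 90% reliable give a pathway that is fully correct only about 73% of the time, and object E, which has no recorded interaction, remains an isolated node rather than part of the network.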
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Bureeva, S., Zvereva, S., Romanov, V., Serebryiskaya, T. (2009). Manual Annotation of Protein Interactions. In: Nikolsky, Y., Bryant, J. (eds) Protein Networks and Pathway Analysis. Methods in Molecular Biology, vol 563. Humana Press. https://doi.org/10.1007/978-1-60761-175-2_5
DOI: https://doi.org/10.1007/978-1-60761-175-2_5
Publisher Name: Humana Press
Print ISBN: 978-1-60761-174-5
Online ISBN: 978-1-60761-175-2
eBook Packages: Springer Protocols