Amino Acids, Volume 38, Issue 4, pp 1237–1252

A mouse protein interactome through combined literature mining with multiple sources of interaction evidence

Original Article

Abstract

Protein–protein interactions (PPIs) play crucial roles in many biological processes. Protein interaction networks (PINs) have recently been generated for several model organisms and for humans, but few large-scale studies have been carried out for the mouse, either experimentally or computationally. In this work, we set out to map a mouse PIN from the protein interactions hidden in the enormous body of biomedical literature. After a co-occurrence-based text-mining step, a probabilistic naïve Bayesian model was used to filter out false-positive interactions by integrating heterogeneous evidence from genomic and proteomic datasets. A support vector machine (SVM) algorithm was then used to select protein pairs with physical interactions. Comparison with the currently available PPI datasets from several model organisms and humans showed that the derived mouse PINs share similar topological properties at the global level but diverge strongly at the local level. The mouse protein interaction dataset is stored in the Mouse protein–protein interaction DataBase (MppDB), a useful source of information for a system-level understanding of gene function and biological processes in mammals. MppDB is publicly available at http://bio.scu.edu.cn/mppi.
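The naïve Bayesian filtering step can be illustrated with a minimal sketch of one common way such a filter works; it is not the authors' implementation. It assumes that each evidence source is discretised into bins, that a likelihood ratio (LR) per bin is estimated from gold-standard interacting and non-interacting pairs, and that sources are conditionally independent, so their LRs multiply and scale the prior odds into posterior odds. The evidence source names, bin labels, counts, prior odds and cutoff below are all hypothetical placeholders, not values from MppDB.

```python
import math

# Hypothetical gold-standard counts per evidence bin:
# (positive pairs in bin, negative pairs in bin). Illustrative only.
GOLD_COUNTS = {
    "coexpression": {"high": (60, 200), "low": (40, 800)},
    "shared_go":    {"yes": (70, 300),  "no": (30, 700)},
    "domain_pair":  {"yes": (25, 50),   "no": (75, 950)},
}


def likelihood_ratio(source: str, bin_label: str) -> float:
    """LR = P(bin | interacting) / P(bin | non-interacting), from gold-standard counts."""
    bins = GOLD_COUNTS[source]
    total_pos = sum(p for p, _ in bins.values())
    total_neg = sum(n for _, n in bins.values())
    pos, neg = bins[bin_label]
    # Cross-multiplied form of (pos / total_pos) / (neg / total_neg).
    return (pos * total_neg) / (neg * total_pos)


def posterior_odds(evidence: dict, prior_odds: float) -> float:
    """Naive Bayes: with conditionally independent sources, the LRs multiply."""
    lr_total = math.prod(likelihood_ratio(s, b) for s, b in evidence.items())
    return prior_odds * lr_total


def keep_pair(evidence: dict, prior_odds: float, cutoff: float = 1.0) -> bool:
    """Keep a text-mined pair only if its posterior odds exceed a (hypothetical) cutoff."""
    return posterior_odds(evidence, prior_odds) > cutoff


if __name__ == "__main__":
    strong = {"coexpression": "high", "shared_go": "yes", "domain_pair": "yes"}
    weak = {"coexpression": "low", "shared_go": "no", "domain_pair": "no"}
    for name, ev in (("strong", strong), ("weak", weak)):
        print(name, round(posterior_odds(ev, prior_odds=0.05), 4), keep_pair(ev, 0.05))
```

In this toy setup, a co-occurring pair supported by all three sources has a combined LR of about 35, which lifts a prior odds of 0.05 above the cutoff, while a pair with only weak evidence is discarded. The independence assumption is what makes the model "naïve"; correlated sources would have their evidence over-counted.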

Keywords

Interactome; Mouse; Protein interaction network; Protein–protein interaction

Supplementary material

Supplementary material (PDF 880 kb)


Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Sichuan Key Laboratory of Molecular Biology and Biotechnology, Ministry of Education Key Laboratory for Bio-resource and Eco-environment, College of Life Sciences, Sichuan University, Chengdu, People’s Republic of China
  2. Sichuan Animal Science Academy, Chengdu, People’s Republic of China
