Automated Computational Inference of Multi-protein Assemblies from Biochemical Co-purification Data

  • Florian Goebels
  • Lucas Hu
  • Gary Bader
  • Andrew Emili
Part of the Methods in Molecular Biology book series (MIMB, volume 1764)


Biology has amassed a wealth of information about the function of a multitude of protein-coding genes across species. The challenge now is to understand how all these proteins work together to form a living organism, and a crucial step for gaining this knowledge is a complete description of the molecular “wiring circuits” that underlie cellular processes. In this chapter, we describe a general computational framework for predicting multi-protein assemblies from biochemical co-fractionation data.

Key words

Protein-protein interaction, Bioinformatics, Machine learning, Systems biology, Protein interaction prediction, Protein complex prediction, Python, Docker, Cytoscape



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Florian Goebels (1)
  • Lucas Hu (1)
  • Gary Bader (1)
  • Andrew Emili (1, corresponding author)
  1. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
