Background

The scientific literature and large public databases currently contain only ~39,000 experimentally confirmed human protein-protein interactions, out of a potential 330,000,000 (assuming one protein per gene). Computational methods are required to bridge this gap and to guide further experimental work.
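As a rough check on the scale of this gap, the number of unordered pairs among n proteins is n(n-1)/2. The sketch below assumes a gene count of about 25,700 and excludes self-pairs. Both are assumptions made here, chosen only because they reproduce a total close to 330,000,000; neither is stated in the text.

```python
from math import comb

# Hypothetical gene count (an assumption, not from the text), picked so that
# the pair count lands near the ~330,000,000 figure quoted above.
ASSUMED_GENE_COUNT = 25_700
# Approximate number of experimentally confirmed interactions quoted above.
CONFIRMED_INTERACTIONS = 39_000


def possible_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins (self-pairs excluded)."""
    return comb(n_proteins, 2)


total_pairs = possible_pairs(ASSUMED_GENE_COUNT)
fraction_confirmed = CONFIRMED_INTERACTIONS / total_pairs

print(f"{total_pairs:,} possible pairs")
print(f"{fraction_confirmed:.4%} of pairs experimentally confirmed")
```

Under these assumptions, the confirmed interactions cover only a very small fraction of the pair space, which is the gap the predictor is meant to narrow.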

Results

The PIPs framework [1] uses a naïve Bayesian method that combines the predictive power of numerous features to calculate the likelihood of interaction between two proteins. Features considered by the predictor include co-expression, orthology, domain co-occurrence, post-translational modification and a new feature that analyses the semantic similarity of Gene Ontology terms. The predictor now also includes two modules that make predictions based on the topology of the predicted protein-protein interaction network. We predict 318,800 interactions, of which 310,732 (96.3%) are not present in other publicly available databases. Several of the predictions have been experimentally validated by external groups.
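To make the naïve Bayesian combination concrete, the following is a minimal sketch and not the PIPs implementation. It assumes that each feature yields a likelihood ratio for a protein pair and that, under the naïve independence assumption, these ratios multiply with the prior odds to give the posterior odds of interaction. The feature names, ratio values, prior odds and decision threshold are all hypothetical placeholders.

```python
from math import prod
from typing import Mapping


def posterior_odds(prior_odds: float, likelihood_ratios: Mapping[str, float]) -> float:
    """Multiply the prior odds by each feature's likelihood ratio.

    Treating the features as conditionally independent is the "naive"
    assumption that lets the ratios simply be multiplied together.
    """
    return prior_odds * prod(likelihood_ratios.values())


# Hypothetical prior odds that a randomly chosen protein pair interacts.
PRIOR_ODDS = 1 / 1000

# Hypothetical per-feature likelihood ratios for a single protein pair.
# A ratio of 1.0 means the feature carries no evidence either way.
pair_evidence = {
    "co_expression": 4.0,
    "orthology": 25.0,
    "domain_co_occurrence": 20.0,
    "go_semantic_similarity": 1.0,
}

odds = posterior_odds(PRIOR_ODDS, pair_evidence)

# Assumed decision rule: call the pair interacting when the posterior odds exceed 1.
predicted_to_interact = odds > 1.0
print(f"posterior odds = {odds:.3f}, predicted interaction: {predicted_to_interact}")
```

One convenience of this formulation is that a feature with no data for a given pair can be assigned a ratio of 1.0, so it leaves the combined score unchanged.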

The PIPs website (http://www.compbio.dundee.ac.uk/pips) [2] is an easy-to-use system for exploring the predictions. Searches can be initiated with a protein identifier (IPI, RefSeq or UniProt) or with a keyword. All predicted protein-protein interactions are returned ranked by their likelihood of interaction. The website allows the user to analyse the evidence used to calculate the likelihood of interaction and provides links to external databases and publications from which the source data can be retrieved.

Conclusions

The set of predictions made in this work increases the coverage of the human interactome and helps to guide future research.