Chapter

Research in Computational Molecular Biology

Volume 7821 of the series Lecture Notes in Computer Science pp 173-185

NP-MuScL: Unsupervised Global Prediction of Interaction Networks from Multiple Data Sources

  • Kriti Puniyani, School of Computer Science, Carnegie Mellon University
  • Eric P. Xing, School of Computer Science, Carnegie Mellon University

Abstract

Inference of gene interaction networks from expression data usually focuses on either supervised or unsupervised edge prediction from a single data source. However, in many real-world applications, multiple data sources, such as microarray and ISH measurements of mRNA abundances, are available to offer multi-view information about the same set of genes. We propose NP-MuScL (nonparanormal multi-source learning) to estimate a gene interaction network that is consistent with such multiple data sources, which are expected to reflect the same underlying relationships between the genes. NP-MuScL casts the network estimation problem as estimating the structure of a sparse undirected graphical model. We use the semiparametric Gaussian copula to model the distribution of the different data sources, with the different copulas sharing the same precision (i.e., inverse covariance) matrix, and we present an efficient algorithm to estimate such a model in the high-dimensional scenario. Results are reported on synthetic data, where NP-MuScL significantly outperforms baseline algorithms, even in the presence of noisy data sources. Experiments are also run on two real-world scenarios: two yeast microarray data sets, and three Drosophila embryonic gene expression data sets, where NP-MuScL predicts a higher number of known gene interactions than existing techniques.
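
The idea of several data sources sharing one precision matrix under a Gaussian copula can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in the abstract. It assumes each source is a samples-by-genes array over the same genes, Gaussianizes every gene by rank-based normal scores, and pools the per-source correlation matrices with sample-size weights. The function names `normal_scores` and `pooled_correlation`, the weighting scheme, and the toy data are all illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(X):
    """Map each column of X (samples x genes) to approximate Gaussian scores.

    Ranks are rescaled into (0, 1) and passed through the standard normal
    quantile function, which undoes any monotone distortion of the marginals.
    Ties are broken by position, which is adequate for continuous data.
    """
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1.0)                          # strictly inside (0, 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(u)


def pooled_correlation(sources):
    """Sample-size-weighted average of per-source normal-score correlations.

    Assumption of this sketch: all sources measure the same genes in the
    same column order and share one latent correlation structure.
    """
    total = sum(X.shape[0] for X in sources)
    p = sources[0].shape[1]
    S = np.zeros((p, p))
    for X in sources:
        Z = normal_scores(X)
        S += (X.shape[0] / total) * np.corrcoef(Z, rowvar=False)
    return S


# Toy data: two "sources" that distort the same latent Gaussian differently.
rng = np.random.default_rng(0)
latent_cov = np.array([[1.0, 0.8, 0.0],
                       [0.8, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
A = rng.multivariate_normal(np.zeros(3), latent_cov, size=200)
B = rng.multivariate_normal(np.zeros(3), latent_cov, size=100)
sources = [np.exp(A), B ** 3]  # monotone transforms of the latent variables

S = pooled_correlation(sources)
print(np.round(S, 2))
```

A sparse precision matrix could then be estimated from `S` with any sparse inverse covariance estimator, for example `sklearn.covariance.graphical_lasso(S, alpha=...)`, whose nonzero off-diagonal entries would be read as predicted interactions. The choice of estimator and penalty here is an assumption, not something the abstract specifies.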

Keywords

interaction networks, gene expression, multi-source learning, sparsity, Gaussian graphical models, nonparanormal, copula