Learning Gene Regulatory Networks with High-Dimensional Heterogeneous Data
The Gaussian graphical model is a widely used tool for learning gene regulatory networks from high-dimensional gene expression data. Most existing methods for Gaussian graphical models assume that the data are homogeneous, i.e., that all samples are drawn from a single Gaussian distribution. In many real problems, however, the data are heterogeneous: they may contain subgroups or come from different sources. This paper proposes to model heterogeneous data with a mixture Gaussian graphical model and to apply the imputation-consistency algorithm, combined with the ψ-learning algorithm, to estimate the parameters of the mixture model and cluster the samples into subgroups. An integrated Gaussian graphical network is learned across the subgroups over the iterations of the imputation-consistency algorithm. The proposed method is compared with an existing method for learning mixture Gaussian graphical models, as well as with several methods developed for homogeneous data, such as graphical Lasso, nodewise regression, and ψ-learning. The numerical results indicate that the proposed method is superior in parameter estimation, cluster identification, and network construction. They also indicate its generality: it can be applied to homogeneous data without significant harm. The accompanying R package GGMM is available at https://cran.r-project.org.
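For readers unfamiliar with the setup, a mixture Gaussian graphical model can be written in its generic textbook form as below. This is only a sketch: the symbols (K, π_k, μ_k, Σ_k, Θ^{(k)}, τ_{nk}) are assumed notation for illustration, not necessarily the notation or the exact parameterization used in the chapter, and the abstract does not say how the subgroup networks are combined into the integrated network.

```latex
% Generic K-component Gaussian mixture density (assumed notation)
f(x) = \sum_{k=1}^{K} \pi_k \, \phi\!\left(x;\, \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,

% where \phi(\cdot;\, \mu, \Sigma) is the multivariate Gaussian density.
% In a Gaussian graphical model, edges correspond to nonzero
% off-diagonal entries of the precision matrix \Theta^{(k)} = \Sigma_k^{-1}:
(i, j) \in E_k \iff \Theta^{(k)}_{ij} \neq 0, \qquad i \neq j.

% Posterior probability that sample x_n belongs to subgroup k,
% which is the quantity used to cluster the samples:
\tau_{nk} = \frac{\pi_k \, \phi\!\left(x_n;\, \mu_k, \Sigma_k\right)}
                 {\sum_{l=1}^{K} \pi_l \, \phi\!\left(x_n;\, \mu_l, \Sigma_l\right)}.
```

Setting K = 1 recovers the homogeneous case of a single Gaussian distribution, which is the setting assumed by methods such as graphical Lasso and nodewise regression.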
The authors thank the book editor Dr. Yichuan Zhao and two referees for their constructive comments which have led to significant improvement of this paper. Liang’s research was supported in part by the grants DMS-1612924 and DMS/NIGMS R01-GM117597.