Learning Gene Regulatory Networks with High-Dimensional Heterogeneous Data

  • Bochao Jia
  • Faming Liang
Chapter
Part of the ICSA Book Series in Statistics book series (ICSABSS)

Abstract

The Gaussian graphical model is a widely used tool for learning gene regulatory networks from high-dimensional gene expression data. Most existing methods for Gaussian graphical models assume that the data are homogeneous, i.e., that all samples are drawn from a single Gaussian distribution. In many real problems, however, the data are heterogeneous: they may contain subgroups or come from different sources. This paper proposes to model heterogeneous data using a mixture Gaussian graphical model and to apply the imputation-consistency algorithm, combined with the ψ-learning algorithm, to estimate the parameters of the mixture model and cluster the samples into subgroups. An integrated Gaussian graphical network is learned across the subgroups along with the iterations of the imputation-consistency algorithm. The proposed method is compared with an existing method for learning mixture Gaussian graphical models as well as with several methods developed for homogeneous data, such as graphical Lasso, nodewise regression, and ψ-learning. The numerical results indicate the superiority of the proposed method in all aspects of parameter estimation, cluster identification, and network construction. They also indicate the generality of the proposed method: it can be applied to homogeneous data without significant harm. The accompanying R package GGMM is available at https://cran.r-project.org.
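
For orientation, a mixture Gaussian graphical model of the kind described above can be written in the standard form below. The notation (K components with weights π_k, means μ_k, covariances Σ_k, and precision matrices Θ_k) is assumed here for illustration and is not taken from the chapter. How the chapter combines the component-level networks into one integrated network is not specified in the abstract, so it is not shown.

```latex
% Assumed notation: K mixture components, weights \pi_k,
% means \mu_k, covariances \Sigma_k; \phi is the multivariate Gaussian density.
\begin{gather*}
f(x) = \sum_{k=1}^{K} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1, \\
% Within one Gaussian component, with precision matrix \Theta_k = \Sigma_k^{-1},
% a missing edge (i, j) corresponds to a zero precision entry:
X_i \perp X_j \mid X_{-\{i,j\}} \iff (\Theta_k)_{ij} = 0 .
\end{gather*}
```

The second line is the usual Gaussian graphical model property: two variables are conditionally independent given all the others exactly when the corresponding off-diagonal entry of the precision matrix is zero.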

Notes

Acknowledgements

The authors thank the book editor, Dr. Yichuan Zhao, and two referees for their constructive comments, which have led to a significant improvement of this paper. Liang’s research was supported in part by the grants DMS-1612924 and DMS/NIGMS R01-GM117597.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Eli Lilly and Company, Lilly Corporate Center, IN, USA
  2. Department of Statistics, Purdue University, West Lafayette, USA
