Novel model selection criteria on sparse biological networks

  • G. B. Bülbül
  • V. Purutçuoğlu
  • E. Purutçuoğlu


In the statistical literature, gene networks are represented by graphical models, which are characterised by sparsity in high dimensions. In this study, we propose novel model selection criteria, namely ICOMP, CAIC and CAICF, and apply them to simulated gene networks in order to select an optimal model among alternative estimated network structures. We build the models with the Gaussian graphical model (GGM) and infer the GGM via the graphical lasso method. To assess the proposed criteria, we compare their accuracy with that of other well-known criteria in this field under various network dimensions and topologies.
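As a rough illustration of how a criterion of this kind can rank candidate networks, the sketch below scores candidate precision matrices with CAIC in its standard form, -2 log L + k(log n + 1). The function names, the zero-mean Gaussian likelihood and the parameter count (diagonal entries plus nonzero upper-triangular entries) are assumptions made for this example and are not taken from the paper. In practice the candidates would typically be graphical lasso estimates obtained at different penalty values.

```python
import numpy as np


def gaussian_loglik(theta, S, n):
    """Log-likelihood of n zero-mean Gaussian samples.

    theta: candidate precision (inverse covariance) matrix, shape (p, p)
    S:     sample covariance matrix (1/n) * X.T @ X, shape (p, p)
    """
    p = S.shape[0]
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        raise ValueError("precision matrix must have a positive determinant")
    return 0.5 * n * (logdet - np.trace(S @ theta) - p * np.log(2.0 * np.pi))


def count_params(theta, tol=1e-8):
    """Assumed parameter count: p diagonal entries plus the nonzero edges."""
    p = theta.shape[0]
    upper = np.triu(theta, k=1)  # strictly upper-triangular part
    return p + int(np.sum(np.abs(upper) > tol))


def caic(loglik, k, n):
    """Consistent AIC: -2 log L + k (log n + 1)."""
    return -2.0 * loglik + k * (np.log(n) + 1.0)


def select_by_caic(candidates, S, n):
    """Return the index of the candidate with the smallest CAIC, and all scores."""
    scores = [
        caic(gaussian_loglik(theta, S, n), count_params(theta), n)
        for theta in candidates
    ]
    return int(np.argmin(scores)), scores
```

A call such as `select_by_caic([theta_1, theta_2], S, n)` returns the position of the preferred candidate together with the CAIC value of each one, so the same loop could be reused with a different criterion by swapping out `caic`.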


Model selection criteria; Simulated gene networks; Gaussian graphical model



The authors thank the METU Research Grant (No. BAP-08-11-2017-035) and the COSTNET Project (No. CA15109) for their support. They also thank the anonymous referees and the editor for their valuable comments, which improved the quality of the paper significantly.



Copyright information

© Islamic Azad University (IAU) 2019

Authors and Affiliations

  • G. B. Bülbül (1)
  • V. Purutçuoğlu (1, corresponding author)
  • E. Purutçuoğlu (2)

  1. Department of Statistics, Middle East Technical University (METU), Ankara, Turkey
  2. Department of Social Service, Ankara University, Ankara, Turkey