Abstract
Many modeling approaches, each resting on different assumptions, have been suggested for describing biological networks. The major difficulties for these models and their associated inference methods stem from the complexity of biological systems: a high number of model parameters, few observations for each variable in the system, sparse network structures, and high correlation between model parameters. Recent studies have shown that nonparametric methods can ameliorate these challenges and offer a strong alternative. Furthermore, not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems appear to be promising choices. In this study, we therefore propose the classification and regression tree (CART) method as a new approach for constructing complex systems when the system’s activity is described under its steady-state condition. CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and then networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions, and compare the results with the outputs of the Gaussian graphical model (GGM), which is the most well-known model under the given condition of the system. We also compare the performance of CART with the GGM findings on real systems. For this purpose, we choose pathways that play a crucial role in cervical cancer. We consider this particular illness because it is the second most common cancer type in women after breast cancer, both in Turkey and worldwide, and because only limited information is available for describing this complex system disease.
Keywords
- Classification And Regression Tree (CART)
- Gaussian Graphical Models (GGM)
- Twoing Rule
- Gini Rule
- Split Question
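The Gini and twoing rules listed among the keywords are the two standard CART split criteria. As a rough illustration only, and not the chapter's own implementation, the following Python sketch scores one candidate binary split with each rule using their textbook definitions. The function names (`class_proportions`, `gini_impurity`, `gini_gain`, `twoing_value`) and the toy labels are hypothetical, chosen for this example, and both child nodes are assumed to be non-empty.

```python
from collections import Counter


def class_proportions(labels):
    """Return {class: relative frequency} for a non-empty list of labels."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels).values())


def gini_gain(left, right):
    """Decrease in Gini impurity obtained by splitting a parent node
    into the (non-empty) children `left` and `right`."""
    left, right = list(left), list(right)
    n = len(left) + len(right)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(left + right) - weighted_children


def twoing_value(left, right):
    """Twoing criterion: (P_L * P_R / 4) * (sum_k |p(k|L) - p(k|R)|)^2."""
    left, right = list(left), list(right)
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    prop_l, prop_r = class_proportions(left), class_proportions(right)
    classes = set(prop_l) | set(prop_r)
    spread = sum(abs(prop_l.get(k, 0.0) - prop_r.get(k, 0.0)) for k in classes)
    return p_left * p_right / 4.0 * spread ** 2


if __name__ == "__main__":
    # Hypothetical toy split: a perfectly separating question.
    left, right = ["a", "a"], ["b", "b"]
    print(gini_gain(left, right), twoing_value(left, right))
```

Under either rule, a tree-growing procedure would evaluate every candidate split question at a node and keep the one with the largest score.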
Acknowledgements
The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Seçilmiş, D., Purutçuoğlu, V. (2019). Modeling of Biochemical Networks via Classification and Regression Tree Methods. In: Taş, K., Baleanu, D., Machado, J. (eds) Mathematical Methods in Engineering. Nonlinear Systems and Complexity, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-90972-1_7
DOI: https://doi.org/10.1007/978-3-319-90972-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90971-4
Online ISBN: 978-3-319-90972-1
eBook Packages: Engineering, Engineering (R0)