Modeling of Biochemical Networks via Classification and Regression Tree Methods

Part of the Nonlinear Systems and Complexity book series (NSCH, volume 24)

Abstract

In the description of biological networks, a number of modeling approaches have been suggested, each based on different assumptions. The major problems for these models and their associated inference approaches are the complexity of biological systems, which results in a high number of model parameters; the few observations available for each variable in the system; the sparse structure of these systems; and the high correlation between model parameters. Recent studies have shown that nonparametric methods can ameliorate these challenges and can be a strong alternative. Furthermore, it has been observed that not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems can be a promising choice. In this study, we therefore propose the classification and regression tree (CART) method as a new approach for constructing complex systems when the system’s activity is described under its steady-state condition. Basically, CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and then networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions, and compare our results with the outputs of the Gaussian graphical model (GGM), which is the most well-known model under the given condition of the system. We also compare the performance of CART with the GGM findings on real systems. For this purpose, we choose pathways that have a crucial role in cervical cancer. We consider this particular illness because it is the second most common cancer type in women, after breast cancer, both in Turkey and in the world, and because only limited information is available for describing this complex system disease.
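The abstract does not spell out how trees are turned into a network, so the following is only a minimal sketch under stated assumptions, not the chapter's procedure. It assumes a node-wise scheme borrowed from neighborhood selection: each variable is regressed on all the others with a small regression tree, and an undirected edge is drawn whenever one variable is used as a split variable for another. The function names (`sse`, `split_variables`, `tree_network`), the squared-error impurity, and the stopping parameters (`depth`, `min_leaf`, `gain_fraction`) are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (regression-tree node impurity)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def split_variables(rows, target, predictors, depth, min_leaf, min_gain, used):
    """Grow a small regression tree for `target`; collect split variables in `used`."""
    if depth == 0 or len(rows) < 2 * min_leaf:
        return
    node_sse = sse([r[target] for r in rows])
    best = None  # (gain, predictor index, cut position in the sorted rows)
    for k in predictors:
        ordered = sorted(rows, key=lambda r: r[k])
        for i in range(min_leaf, len(ordered) - min_leaf + 1):
            if ordered[i - 1][k] == ordered[i][k]:
                continue  # no threshold separates tied predictor values
            gain = (node_sse
                    - sse([r[target] for r in ordered[:i]])
                    - sse([r[target] for r in ordered[i:]]))
            if best is None or gain > best[0]:
                best = (gain, k, i)
    if best is None or best[0] <= 0 or best[0] < min_gain:
        return  # stop: no split reduces the impurity enough
    _, k, i = best
    used.add(k)
    ordered = sorted(rows, key=lambda r: r[k])
    for child in (ordered[:i], ordered[i:]):
        split_variables(child, target, predictors, depth - 1,
                        min_leaf, min_gain, used)


def tree_network(rows, depth=2, min_leaf=10, gain_fraction=0.2):
    """Node-wise trees: link j and k if either tree splits on the other (OR rule)."""
    p = len(rows[0])
    adj = [[0] * p for _ in range(p)]
    for j in range(p):
        predictors = [k for k in range(p) if k != j]
        root_sse = sse([r[j] for r in rows])
        used = set()
        # A split must remove at least `gain_fraction` of the root impurity.
        split_variables(rows, j, predictors, depth, min_leaf,
                        gain_fraction * root_sse, used)
        for k in used:
            adj[j][k] = adj[k][j] = 1
    return adj


# Toy steady-state data (assumed): variable 1 tracks variable 0, variable 2 is noise.
random.seed(0)
rows = []
for _ in range(300):
    a = random.gauss(0, 1)
    rows.append([a, a + random.gauss(0, 0.1), random.gauss(0, 1)])

adjacency = tree_network(rows)
print(adjacency)
```

On this toy data the sketch should link only variables 0 and 1 and leave the pure-noise variable isolated. A Gini or twoing criterion, as named in the keywords, would replace the squared-error impurity if the variables were treated as class labels instead of continuous responses.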

Keywords

  • Classification And Regression Tree (CART)
  • Gaussian Graphical Models (GGM)
  • Twoing Rule
  • Gini Rule
  • Split Question

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Acknowledgements

The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.

Author information

Corresponding author

Correspondence to Vilda Purutçuoğlu.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Seçilmiş, D., Purutçuoğlu, V. (2019). Modeling of Biochemical Networks via Classification and Regression Tree Methods. In: Taş, K., Baleanu, D., Machado, J. (eds) Mathematical Methods in Engineering. Nonlinear Systems and Complexity, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-90972-1_7

  • DOI: https://doi.org/10.1007/978-3-319-90972-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90971-4

  • Online ISBN: 978-3-319-90972-1

  • eBook Packages: Engineering, Engineering (R0)