Ensemble Modeling for Bio-medical Applications

  • Christian Merkwirth
  • Jörg Wichard
  • Maciej J. Ogorzałek

Abstract

In this paper we propose to use ensembles of models constructed using methods of Statistical Learning. The input data for model construction consists of real measurements taken from the physical system under consideration. Furthermore, we present a program toolbox that allows the construction of single models as well as heterogeneous ensembles of linear and nonlinear model types. Several well-performing model types have been implemented, among them ridge regression, k-nearest-neighbor models and neural networks. Ensembles of heterogeneous models typically yield better generalization performance than homogeneous ensembles. The toolbox additionally provides methods for model validation and assessment, as well as adaptor classes performing transparent feature selection or random subspace training on a large number of input variables. The toolbox is implemented in Matlab and C++ and is available under the GPL. Several applications of the described methods and of the numerical toolbox itself are presented. These include ECG modeling, classification of activity in drug design and ...
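
The following is a minimal sketch of the heterogeneous-ensemble idea described in the abstract: several different model types (a linear ridge regression, a k-nearest-neighbor model and a small neural network) are fitted to the same data and their predictions are averaged. It uses scikit-learn estimators as stand-ins; the class name HeterogeneousEnsemble, the synthetic data and all parameter values are illustrative assumptions, not the API of the authors' Matlab/C++ toolbox.

```python
# Sketch of a heterogeneous regression ensemble: several model types are
# trained on the same data and their predictions are averaged.
# NOTE: scikit-learn is used here as a stand-in; this is NOT the authors'
# Matlab/C++ toolbox, and all names and parameters are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


class HeterogeneousEnsemble:
    """Average the predictions of a list of heterogeneous base models."""

    def __init__(self, base_models):
        self.base_models = base_models

    def fit(self, X, y):
        for model in self.base_models:
            model.fit(X, y)
        return self

    def predict(self, X):
        # Simple unweighted average over all ensemble members.
        return np.mean([model.predict(X) for model in self.base_models], axis=0)


if __name__ == "__main__":
    # Synthetic regression problem (purely illustrative).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    members = [
        Ridge(alpha=1.0),                    # linear model
        KNeighborsRegressor(n_neighbors=5),  # local (k-nearest-neighbor) model
        MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
    ]
    ensemble = HeterogeneousEnsemble(members).fit(X_train, y_train)

    # Compare held-out errors: the averaged ensemble is often close to,
    # or better than, the best single member (not guaranteed in general).
    for model in members:
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"{type(model).__name__:20s} test MSE: {mse:.4f}")
    ens_mse = mean_squared_error(y_test, ensemble.predict(X_test))
    print(f"{'Ensemble average':20s} test MSE: {ens_mse:.4f}")
```

Weighted averaging or cross-validation-based member selection, corresponding to the model validation and assessment facilities mentioned in the abstract, would be natural extensions of this sketch.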



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Christian Merkwirth (1)
  • Jörg Wichard (2, 4)
  • Maciej J. Ogorzałek (1, 3)

  1. Department of Information Technologies, Jagiellonian University, Cracow, Poland
  2. Institute of Molecular Pharmacology, Berlin, Germany
  3. AGH University of Science and Technology, Cracow, Poland
  4. Institut für Medizinische Informatik, Berlin, Germany
