Evolving Regression Models

  • Roberto Baragona
  • Francesco Battaglia
  • Irene Poli
Part of the Statistics and Computing book series (SCO)


Regression models are well-established tools in statistical analysis that date back as early as the eighteenth century. Nonetheless, problems in their implementation and application across a wide range of fields are still the object of active research. Regression model estimation is preceded by an identification step, which selects the variables of interest, detects the relationships among them, and distinguishes dependent from independent variables. In addition, generalized regression models often have nonlinear and nonconvex log-likelihoods, so maximum likelihood estimation requires the optimization of complicated functions. This chapter presents evolutionary computation methods that have been developed to either support or replace analytic tools when problem size and complexity limit their efficiency.
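To make the identification step concrete, here is a minimal sketch of how a genetic algorithm might search over subsets of candidate regressors. It is an illustration under stated assumptions, not the procedure developed in the chapter: each candidate model is encoded as a binary string (1 = variable included), the fitness is the Akaike information criterion of an ordinary least-squares fit, and the operators are tournament selection, one-point crossover, bit-flip mutation and elitism. All function names, parameter values and the simulated data are hypothetical.

```python
import math
import random


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept and X[:, cols]."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    # Augmented normal equations [Z'Z | Z'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes Z'Z is nonsingular).
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * z for b, z in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def aic(X, y, bits):
    """AIC (up to an additive constant) of the model encoded by the binary string."""
    cols = [j for j, b in enumerate(bits) if b]
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search for the binary string with the lowest AIC (illustrative settings)."""
    rng = random.Random(seed)
    m = len(X[0])
    cache = {}

    def fitness(bits):
        # Cache evaluations: the same subset is visited many times.
        key = tuple(bits)
        if key not in cache:
            cache[key] = aic(X, y, bits)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best string always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, m - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best


def simulate(n=80, m=6, seed=1):
    """Hypothetical data: only variables 0 and 2 enter the true model."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    y = [2.0 * r[0] - 3.0 * r[2] + rng.gauss(0, 0.5) for r in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print(ga_select(X, y))  # binary string marking the selected variables
```

With only six candidates an exhaustive search over the 64 subsets would be just as easy; the evolutionary search becomes worthwhile when the number of candidate variables makes enumeration impractical, which is the situation the abstract describes.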


Genetic Algorithm, Fitness Function, Independent Component Analysis, Binary String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Roberto Baragona (1)
  • Francesco Battaglia (2)
  • Irene Poli (3)
  1. Department of Communication and Social Research, Sapienza University of Rome, Rome, Italy
  2. Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
  3. Department of Statistics, Ca’ Foscari University of Venice, Venice, Italy
