Annals of Mathematics and Artificial Intelligence

Volume 61, Issue 2, pp 105–123

Feature extraction from optimization samples via ensemble based symbolic regression

  • Kalyan K. Veeramachaneni
  • Ekaterina Vladislavleva
  • Una-May O’Reilly


We demonstrate a means of knowledge discovery through feature extraction that exploits the search history of a search-based optimization run. We regress a symbolic model ensemble from the search points of an optimization run and their objective scores. The frequency with which a variable appears in the models of the ensemble indicates the extent to which it is an influential feature. Our demonstration uses a genetic programming symbolic regression software package that is designed to be “off-the-shelf”. By default, the only parameter needed to evolve a suite of models is how long the user is willing to wait. The user can then easily specify which models should go forward in terms of sufficient accuracy and complexity. For illustration purposes, we consider a sequencing heuristic used to chain remote sensors from one to the next: “place the most reliable sensor last”. The heuristic is derived from the mathematical form of the optimization objective function, which places emphasis on the decision variable pertaining to the last sensor. Feature extraction on optimized sensor sequences demonstrates that the heuristic is usually effective, though it is not always trustworthy. This is consistent with knowledge in sensor processing.
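The variable-frequency idea described above can be illustrated with a minimal sketch. It assumes the ensemble is available as a list of expression strings and measures, for each variable, the fraction of models in which it appears. The function name `variable_presence`, the variable names `x1` to `x3`, and the four example models are hypothetical; this is not the interface of the software package used in the paper, and the paper's own frequency measure may be defined differently.

```python
import re
from collections import Counter


def variable_presence(models, variables):
    """Return, for each variable, the fraction of models that contain it.

    `models` is a list of expression strings; `variables` is a list of
    variable names. A variable counts once per model, however many times
    it occurs inside that model.
    """
    if not models:
        return {v: 0.0 for v in variables}
    counts = Counter()
    for expr in models:
        # Collect identifier-like tokens so that "x1" does not match "x10".
        tokens = set(re.findall(r"[A-Za-z_]\w*", expr))
        for v in variables:
            if v in tokens:
                counts[v] += 1
    return {v: counts[v] / len(models) for v in variables}


# Hypothetical ensemble of four symbolic regression models (illustrative only).
models = [
    "0.8*x3 + 0.1*x1",
    "x3*x3 - 0.2*x2",
    "exp(x3) / (1 + x1)",
    "0.5*x3",
]
presence = variable_presence(models, ["x1", "x2", "x3"])

# Rank variables from most to least frequently used.
ranking = sorted(presence, key=presence.get, reverse=True)
```

In this toy ensemble `x3` appears in every model, so it would be ranked as the most influential feature, which mirrors the role the abstract assigns to the decision variable of the last sensor.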


Feature selection, Symbolic regression, Genetic programming, Ensemble methods



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Kalyan K. Veeramachaneni (1)
  • Ekaterina Vladislavleva (2)
  • Una-May O’Reilly (1)

  1. CSAIL, Massachusetts Institute of Technology, Cambridge, USA
  2. Evolved Analytics Europe BVBA, Wijnegem, Belgium
