Genetic Programming Theory and Practice V, pp 107-124
Investigating Problem Hardness of Real Life Applications
This chapter represents a first attempt to characterize the fitness landscapes of real-life Genetic Programming applications by means of a predictive algebraic difficulty indicator. The indicator used is the Negative Slope Coefficient, whose efficacy has recently been demonstrated empirically on a large set of hand-tailored theoretical test functions and well-known GP benchmarks. The real-life problems studied belong to the field of biomedical applications and consist of automatically assessing a mathematical relationship between a set of molecular descriptors from a given dataset of drugs and some important pharmacokinetic parameters. The parameters considered here are Human Oral Bioavailability, Median Oral Lethal Dose, and Plasma Protein Binding levels. The availability of good prediction tools for pharmacokinetic parameters like these is critical for optimizing the efficiency of therapies, maximizing medical success rate, and minimizing toxic effects. The experimental results presented in this chapter show that the Negative Slope Coefficient seems to be a reasonable tool to characterize the difficulty of these problems, and that it can be used to choose the most effective Genetic Programming configuration (fitness function, representation, parameter values) from a given set.
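The abstract does not define the Negative Slope Coefficient, so the following is only a minimal sketch of the general idea as it is usually described in the literature: a fitness cloud of (fitness, neighbour fitness) points is split into bins along the fitness axis, the mean point of each bin is computed, and the negative slopes of the segments joining consecutive mean points are summed. The function name `negative_slope_coefficient`, the `num_bins` parameter, and the use of equal-width bins are assumptions made for illustration; the published method uses a different, size-driven partitioning scheme, and the way neighbours are sampled is not modeled here.

```python
from typing import List, Sequence, Tuple


def negative_slope_coefficient(
    cloud: Sequence[Tuple[float, float]], num_bins: int = 10
) -> float:
    """Sum of the negative segment slopes of a binned fitness cloud.

    `cloud` holds (fitness, neighbour_fitness) pairs. Equal-width binning
    is an illustrative assumption, not the partitioning used in the chapter.
    """
    if not cloud or num_bins < 1:
        return 0.0
    fitnesses = [f for f, _ in cloud]
    lo, hi = min(fitnesses), max(fitnesses)
    if hi == lo:
        return 0.0  # a single abscissa gives no segments to measure

    # Assign every point to an equal-width bin along the fitness axis.
    width = (hi - lo) / num_bins
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(num_bins)]
    for f, g in cloud:
        idx = min(int((f - lo) / width), num_bins - 1)
        bins[idx].append((f, g))

    # Mean point (mean fitness, mean neighbour fitness) of each non-empty bin.
    means = [
        (sum(f for f, _ in b) / len(b), sum(g for _, g in b) / len(b))
        for b in bins
        if b
    ]

    # Add up only the negative slopes between consecutive mean points.
    nsc = 0.0
    for (x0, y0), (x1, y1) in zip(means, means[1:]):
        if x1 == x0:
            continue  # guard against a degenerate zero-length segment
        nsc += min(0.0, (y1 - y0) / (x1 - x0))
    return nsc
```

Under this sketch, a value of zero means no segment slopes downward, which the literature on the indicator reads as a sign of an easy problem, while increasingly negative values are read as signs of increasing difficulty. Comparing the value across candidate configurations is the kind of use the abstract describes.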
Keywords
problem difficulty, fitness landscapes, real life applications, fitness clouds, negative slope coefficient