Analytical and Bioanalytical Chemistry, Volume 380, Issue 3, pp 430–444

Total ranking models by the genetic algorithm variable subset selection (GA–VSS) approach for environmental priority settings

  • M. Pavan
  • A. Mauri
  • R. Todeschini


Total order ranking (TOR) strategies, which are based on elementary methods of discrete mathematics, appear to be attractive and simple tools for data analysis. Order-ranking strategies also seem useful not only for data exploration but also for developing order ranking models, a possible alternative to conventional quantitative structure–activity relationship (QSAR) methods. When the data are characterised by uncertainties, order methods can be used as an alternative to statistical methods such as multilinear regression (MLR), because they do not require specific functional relationships between the independent and dependent variables (responses). A ranking model is a relationship between a set of dependent attributes, which are investigated experimentally, and a set of independent attributes (the model attributes), which are calculated. As in regression and classification models, variable selection is one of the main steps in finding predictive models. In this work the genetic algorithm–variable subset selection (GA–VSS) approach is proposed as the variable selection method for searching for the best ranking models within a wide set of variables. The models based on the selected subsets of variables are compared with the experimental ranking and evaluated by Spearman's rank index. A case study is presented in which a TOR model is developed for polychlorinated biphenyl (PCB) compounds, analysed according to some of the physicochemical properties that play an important role in their environmental impact.
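The workflow summarised above (select a subset of variables, rank the objects with it, and compare that ranking with the experimental one through Spearman's rank index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ranking function (mean of min-max scaled attributes), the genetic operators (truncation selection, uniform crossover, bit-flip mutation) and all names and parameter values (`score`, `ga_vss`, `pop_size`, `p_mut`, and so on) are assumptions chosen only to keep the example short.

```python
import random

def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def score(data, subset):
    """Assumed ranking function: mean of the min-max scaled selected columns."""
    cols = [j for j, keep in enumerate(subset) if keep]
    if not cols:
        return None  # an empty subset cannot rank anything
    scaled = []
    for j in cols:
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return [sum(c[i] for c in scaled) / len(cols) for i in range(len(data))]

def fitness(data, response, subset):
    """Agreement between the model ranking and the experimental ranking."""
    s = score(data, subset)
    return -1.0 if s is None else spearman(s, response)

def ga_vss(data, response, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Toy genetic search over boolean masks of variables (illustrative settings)."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(data, response, s), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda s: fitness(data, response, s))
    return best, fitness(data, response, best)
```

Calling `ga_vss(data, response)` with `data` as a list of rows of calculated attributes and `response` as the experimental values returns a boolean mask of the selected variables together with its Spearman index. Because the better half of each generation is carried over unchanged, the best index found never decreases from one generation to the next.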


Keywords: Multicriteria decision making; Priority setting; Total order ranking models; GA–VSS; PCB



Financial support from the Commission of the European Union (R&D project “Beam”, EVK1-CT1999-00012) is acknowledged.



Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  1. Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano-Bicocca, Milano, Italy
