Weighted averaging partial least squares regression (WA-PLS): an improved method for reconstructing environmental variables from species assemblages

  • Cajo J. F. ter Braak
  • Steve Juggins
Part of the Developments in Hydrobiology book series (DIHY, volume 90)


Weighted averaging regression and calibration form a simple yet powerful method for reconstructing environmental variables from species assemblages. Based on the concepts of niche-space partitioning and ecological optima of species (indicator values), it performs well with noisy, species-rich data that cover a long ecological gradient (> 3 SD units). Partial least squares regression is a linear method for multivariate calibration that is popular in chemometrics as a robust alternative to principal component regression. It successively selects linear components so as to maximize predictive power. In this paper the ideas of the two methods are combined. It is shown that the weighted averaging method is a form of partial least squares regression, applied to transformed data, that uses only the first PLS component. The new combined method, weighted averaging partial least squares, uses further components, namely as many as are useful in terms of predictive power. The further components utilize the residual structure in the species data to improve the species parameters (‘optima’) in the final weighted averaging predictor. Simulations show that the new method can give a 70% reduction in prediction error in data sets with low noise, but only a small reduction in noisy data sets. In three real data sets of diatom assemblages collected for the reconstruction of acidity and salinity, the reduction in prediction error was zero, 19% and 32%.
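The abstract describes WA-PLS only in words. The Python sketch below illustrates the idea under our own assumptions and is not the authors' software: `wa_pls_fit` and `wa_pls_predict` are hypothetical names, site totals are assumed as regression weights, each new component is made weighted-orthogonal to the earlier ones by Gram-Schmidt, and the number of components is fixed by the caller rather than chosen by cross-validation. Because every component is a weighted average of some species vector, the fitted model collapses into a single set of species coefficients (updated "optima"), so the final predictor is still a weighted average.

```python
import numpy as np


def wa_pls_fit(Y, x, n_comp):
    """Sketch of WA-PLS fitting (hypothetical helper, not the published program).

    Y: n sites x m species abundances; x: n observed environmental values.
    Returns species coefficients beta; a sample y is predicted as
    sum_k y_k * beta_k / sum_k y_k, i.e. a weighted average.
    Assumes every site and species total is > 0 and n_comp is small
    enough that no component degenerates to zero.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    w = Y.sum(axis=1)                # site totals y_i+, assumed regression weights
    col = Y.sum(axis=0)              # species totals y_+k

    def wmean(z):
        return np.dot(w, z) / w.sum()

    b0 = wmean(x)                    # intercept: weighted mean of x
    resid = x - b0                   # centred environmental variable
    beta = np.full(Y.shape[1], b0)   # WA of a constant species vector is that constant
    comps = []                       # pairs (site scores r, species vector v) with r = WA(v)
    for _ in range(n_comp):
        u = Y.T @ resid / col        # species scores: WA of the current x residuals
        r = Y @ u / w                # site scores: WA of the species scores
        v = u.copy()
        # centre r and make it weighted-orthogonal to earlier components,
        # applying the same linear operations to v so that r = WA(v) still holds
        c = wmean(r)
        r, v = r - c, v - c
        for r_prev, v_prev in comps:
            g = np.dot(w * r, r_prev) / np.dot(w * r_prev, r_prev)
            r, v = r - g * r_prev, v - g * v_prev
        comps.append((r, v))
        # with orthogonal components the weighted regression of x on all of
        # them separates into one coefficient per component
        b = np.dot(w * resid, r) / np.dot(w * r, r)
        resid = resid - b * r        # residual x drives the next component
        beta = beta + b * v          # fold the component into the species coefficients
    return beta


def wa_pls_predict(Y_new, beta):
    """Predict x for each row of Y_new as a weighted average of beta."""
    Y_new = np.asarray(Y_new, dtype=float)
    return Y_new @ beta / Y_new.sum(axis=1)
```

With percentage data (equal site totals) and `n_comp=1`, this sketch reproduces ordinary weighted averaging followed by inverse deshrinking, in line with the abstract's statement that weighted averaging is PLS using only the first component. Extra components can only lower the training-set error, so in practice their number should be judged by cross-validated prediction error.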

Key words

diatoms, gradient analysis, indicator values, palaeo-environments, partial least squares regression, PLS, species-environment calibration, transfer function





Copyright information

© Springer Science+Business Media Dordrecht 1993

Authors and Affiliations

  • Cajo J. F. ter Braak (1, 2)
  • Steve Juggins (3, 4)
  1. Agricultural Mathematics Group-DLO, Wageningen, The Netherlands
  2. DLO-Institute for Forestry and Nature Research, Wageningen, The Netherlands
  3. Environmental Change Research Centre, Department of Geography, University College London, London, UK
  4. Botanical Institute, University of Bergen, Bergen, Norway