Volume 269, Issue 1, pp 485–502

Weighted averaging partial least squares regression (WA-PLS): an improved method for reconstructing environmental variables from species assemblages

  • Cajo J. F. ter Braak
  • Steve Juggins
Research tools


Weighted averaging regression and calibration form a simple, yet powerful method for reconstructing environmental variables from species assemblages. Based on the concepts of niche-space partitioning and ecological optima of species (indicator values), it performs well with noisy, species-rich data that cover a long ecological gradient (>3 SD units). Partial least squares regression is a linear method for multivariate calibration that is popular in chemometrics as a robust alternative to principal component regression. It successively selects linear components so as to maximize predictive power. In this paper the ideas of the two methods are combined. It is shown that the weighted averaging method is a form of partial least squares regression applied to transformed data that uses the first PLS-component only. The new combined method, weighted averaging partial least squares (WA-PLS), consists of using further components, namely as many as are useful in terms of predictive power. The further components utilize the residual structure in the species data to improve the species parameters (‘optima’) in the final weighted averaging predictor. Simulations show that the new method can give a 70% reduction in prediction error in data sets with low noise, but only a small reduction in noisy data sets. In three real data sets of diatom assemblages collected for the reconstruction of acidity and salinity, the reductions in prediction error were zero, 19% and 32%.
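The one-component case described above, plain weighted averaging, can be illustrated with a minimal sketch. It is not the authors' implementation and it does not compute the further WA-PLS components. The function names `wa_fit` and `wa_predict`, the toy abundance matrix and the pH values are all hypothetical, and the linear "inverse" deshrinking step is an assumption, chosen because it is a common companion to weighted averaging.

```python
import numpy as np

def wa_fit(Y, x):
    """Weighted averaging regression on a training set.

    Y: (sites x species) non-negative abundances; x: environmental value per site.
    Assumes every species occurs at least once and every site has some abundance.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    # WA regression: a species optimum is the abundance-weighted mean of x
    optima = (Y.T @ x) / Y.sum(axis=0)
    # WA calibration: initial site estimate is the abundance-weighted mean optimum
    x_init = (Y @ optima) / Y.sum(axis=1)
    # Assumed deshrinking step: regress observed x on the initial estimates
    slope, intercept = np.polyfit(x_init, x, 1)
    return optima, intercept, slope

def wa_predict(Y_new, optima, intercept, slope):
    """Weighted averaging calibration for new assemblages, then deshrinking."""
    Y_new = np.asarray(Y_new, dtype=float)
    x_init = (Y_new @ optima) / Y_new.sum(axis=1)
    return intercept + slope * x_init

# Hypothetical toy data: 4 sites, 3 species, pH as the environmental variable
Y = np.array([[8, 2, 0],
              [4, 5, 1],
              [1, 6, 3],
              [0, 2, 8]])
x = np.array([4.5, 5.5, 6.5, 7.5])

optima, intercept, slope = wa_fit(Y, x)
pred = wa_predict(Y, optima, intercept, slope)
print("optima:", optima)
print("predicted pH:", pred)
```

Averaging is applied twice (once for the optima, once for the site estimates), so the initial estimates are pulled toward the mean of the training gradient. The deshrinking regression stretches them back, which is why the fitted slope exceeds 1 on this toy data.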

Key words:

diatoms, gradient analysis, indicator values, palaeo-environments, partial least squares regression, PLS, species-environment calibration, transfer function





Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Cajo J. F. ter Braak (1, 2)
  • Steve Juggins (3, 4)
  1. Agricultural Mathematics Group-DLO, Wageningen, The Netherlands
  2. DLO-Institute for Forestry and Nature Research, Wageningen, The Netherlands
  3. Environmental Change Research Centre, Department of Geography, University College London, London, United Kingdom
  4. Botanical Institute, University of Bergen, Bergen, Norway
