Tracking Environmental Change Using Lake Sediments

Volume 5 of the series Developments in Paleoenvironmental Research pp 249-327


Statistical Learning in Palaeolimnology

  • Gavin L. Simpson (email author), Environmental Change Research Centre, University College London
  • H. John B. Birks, Environmental Change Research Centre, University College London; Department of Biology and Bjerknes Centre for Climate Research, University of Bergen; School of Geography and the Environment, University of Oxford



This chapter considers a range of numerical techniques that lie outside the familiar statistical methods of linear regression, analysis of variance, and generalised linear models, and outside data-analytical techniques such as ordination, clustering, and partitioning. The techniques outlined have developed as a result of the spectacular increase in computing power since the 1980s. They make fewer distributional assumptions than classical statistical methods and can be applied to more complicated estimators and to huge data-sets. They are part of the ever-increasing array of ‘statistical learning’ techniques (sensu Hastie T, Tibshirani R, Friedman J, The elements of statistical learning, 2nd edn. Springer, New York, 2011) that try to make sense of the data at hand, to detect major patterns and trends, to understand ‘what the data say’, and thus to learn from the data.

A range of tree-based and network-based techniques are presented. These are classification and regression trees, multivariate regression trees, bagged trees, random forests, boosted trees, multivariate adaptive regression splines, artificial neural networks, self-organising maps, Bayesian networks, and genetic algorithms. Principal curves and surfaces are also discussed as they relate to unsupervised self-organising maps. The chapter concludes with a discussion of current developments in shrinkage methods and variable selection in statistical modelling that can help in model selection and can minimise collinearity problems. These include principal components regression, ridge regression, the lasso, and the elastic net.
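The shrinkage idea closing the paragraph above can be illustrated with a minimal sketch. For a single centred predictor with no intercept, the ridge estimate has the closed form β̂ = Σxᵢyᵢ / (Σxᵢ² + λ), so increasing the penalty λ shrinks the coefficient towards zero. The toy data and the `ridge_slope` helper below are hypothetical illustrations, not material from the chapter:

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one centred predictor, no intercept:
    beta_hat = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centred predictor
y = [-3.9, -2.1, 0.1, 2.0, 3.9]   # response roughly 2*x

ols = ridge_slope(x, y, 0.0)      # lam = 0 recovers ordinary least squares
shrunk = ridge_slope(x, y, 10.0)  # heavier penalty pulls the slope towards 0

print(ols, shrunk)                # the penalised slope is smaller in magnitude
```

The lasso and elastic net replace (or mix in) an absolute-value penalty, which has no such closed form but additionally drives some coefficients exactly to zero, performing variable selection as well as shrinkage.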


Keywords: Artificial neural networks · Bagging trees · Bayesian belief networks · Bayesian decision networks · Bayesian networks · Boosted trees · Classification trees · Data-mining · Decision trees · Genetic algorithms · Genetic programmes · Multivariate adaptive regression splines · Multivariate regression trees · Random forests · Regression trees · Self-organising maps · Principal curves and surfaces · Shrinkage · Ridge regression · The lasso · The elastic net · Model selection · Statistical learning · Supervised learning · Unsupervised learning