A Review of Statistics in Palaeoenvironmental Research

Palaeoecologists use sequences of fossils within deposits from continents and oceans all over the world in order to produce time-series of past environmental dynamics over decades to millennia or longer. Such information can place current and future environmental change into context, for example by showing how climate, environments, ecosystems and humans interacted during past events, and by enabling verification of climate models through ‘hind-casting’ of such events. Through a meta-analysis and focused literature review of currently used statistical approaches in palaeoecological research, we highlight potential pitfalls and suggest ways forward to a fuller statistical understanding of the possibilities and limitations of palaeoecological studies. Statisticians or at least statistical reasoning should be involved in order to quantify uncertainties across the full analytical pipeline of obtaining, analysing and interpreting fossil time-series and could help optimizing the analytical decisions taken at all these steps. Supplementary materials accompanying this paper appear online.


INTRODUCTION
Climate, environments and ecosystems (CEEs) are almost never constant but fluctuate over a wide range of time-scales. Current CEEs can be measured and/or manipulated in order to investigate the nature of their dynamics and interactions, and similarly their dynamics can be investigated using measurements of CEEs over time, which can range back to some seasons, years, decades or centuries. However, directly measured time-series cover only some areas of our planet, and no direct observations are available on time-scales from multiple centuries to millennia or even millions of years. For such time-series, we thus need to rely on physical, chemical or biological proxies of CEEs (Jackson 2007). Such proxies can be found as partly fossilized remains (sub-fossils) within sites that have accumulated over long time-scales, and have been used to identify past extinctions, climate perturbations, human impacts and other major environmental events (e.g. Seddon et al. 2014).
The types of research questions approached using such fossil proxy studies vary, with some studies aiming to reconstruct past ecosystems (palaeoecology, e.g. forest composition from pollen in lakes or bogs; Prentice et al. 1987), whereas others aim to decipher climate dynamics (palaeoclimatology, e.g. regional temperature time-series from tree-ring widths; Mann et al. 1998), mainly manifested through its intermediate of the palaeo-environment. Some deposits can be seen as quite simple systems that store information about CEEs as they accumulate over millennia. Ice-sheets from polar or montane regions for example can contain acid peaks which can be interpreted as traces of volcanic eruptions, dust which indicates wind sources, CO 2 in air bubbles which indicate past atmospheric concentrations, and stable isotopes of oxygen and hydrogen within the ice itself which indicate sea-ice cover, precipitation sources and air temperatures (e.g. Steffensen et al. 2008). Other types of deposits form more indirect CEE proxy records since they form part of more intricate environments or ecosystems. For example, trees in temperate regions form annual rings and their width, density, colour and composition can serve as proxies for local climate and immediate environmental conditions such as fires or earthquakes (Swetnam and Betancourt 1990). Lakes and oceans contain sediment that accumulates over millennia and within which sub-fossilized pollen, remains of organisms (e.g. diatoms, foraminifera, and insects), isotopes and other proxy information of local environmental conditions can be found (e.g. indicators of lake levels which can be related to precipitation and temperature; Verschuren et al. 2009). Bogs accumulate over millennia and store pollen, macrofossils such as seeds and leaves, small animals, and isotopes within the resulting peat (e.g. Mauquoy and van Geel 2007). Additional types of fossil deposits containing proxy records include corals, speleothems, and other cave deposits. Since these sites accumulate over time, depth can be used as a proxy for time. By analysing the environmental proxies of a sequence of d's along a core, time series of fossil proxy information are obtained.
A range of statistical issues has to be taken into account when analysing and interpreting these types of proxy records. Here we provide a short overview of the processes involved, assisted by a review of papers published in the Q1-ranked disciplinary journal Quaternary Science Reviews. We searched through all papers published in this journal through Volumes 183-181 and selected all that presented proxy records within cores sampled from sedimentary deposits (n = 50; other types of studies such as those based purely on modelling or geomorphology were not taken into account). For each paper (Table 1), we analysed any statistical techniques used in the four subsequent steps of (i) selecting which types of archives, sites, cores, and core depths to analyse for a number of proxies, (ii) interpreting the proxies in terms of CEEs, (iii) building chronologies, and (iv) putting the results into a wider spatiotemporal context. After this overview, we look into more detail at the latter two of these steps since the authors are most familiar with research in these areas. While for chronology-building, statistical methods have been widely adopted throughout the fossil proxy communities, this is much less the case for the step of placing studies into context.  (11), loss-on-ignition (7), X-ray fluorescence (5), magnetism (5), grain size (5) Bayesian age-depth models using Bacon (12) or Oxcal's P_sequence (7) Layer-counting (1) None (26) Not mentioned (7) None (in-text literature review only; 11) Consideration of uncertainties Uncertainties visualized (17) Dates and age-model (19), dates only (7), model only (2) Uncertainty visualisation of none (23), some (13) or all (3) time-series in context None (33) None (22) Numbers within brackets indicate amount of studies in the meta-analysis which used that approach. Since none of the studies mentioned any usage of statistics during the stage of sampling for sites, proxies, cores or core depths, no column was added for this step. For more details, see Supplementary Information/Appendix 1.
It is hoped that this overview will encourage a deeper focus on the development and use of statistics during investigations based on fossil proxy records, and thus develop a fuller understanding of the uncertainties involved at all steps.

OVERVIEW
As outlined above, here based on a meta-analysis and additional literature we will outline the steps taken to obtain, analyse and interpret fossil time series. The first step is that of selecting study sites and which proxies, cores and core depths to analyse.

SAMPLING STRATEGIES
Many research teams throughout the world are involved in producing fossil proxy time series. At first sight, the fossil proxy sites available within one of the major (though nonexhaustive) palaeo-databases, the US National Oceanic and Atmospheric Administration's National Climatic Data Center (NOAA-NCDC), appear to cover large parts of the Earth (Fig. 1). However, coverage is patchy, with large areas of for example the Pacific Ocean, Africa, the Amazon and Australia being poorly represented, and tree-ring records covering some temperate and montane regions only. Some of this poor representation is due to absence of sites (e.g. regions that have unsuitable climates to support ice sheets, trees, lakes or bogs), but at smaller scales there could be sampling biases related to distance from laboratories, roads or other infrastructure. Whereas some studies attempt to select sites that cover specific environmental gradients of interest (e.g. Charman et al. 2013), and all studies will have made decisions on how best to spend their limited funds and research time in order to obtain a proxy record that would answer their research questions, very few studies (none of the 50 in Table 1) mention the use of statistics to guide their choice of archive, site, core length and density of proxy analysis. The Quaternary research community could benefit from statistical analyses of sites sampled so far, for example to strengthen our understanding of which regions (e.g. in climate or ecosystem space) are currently under-sampled and would benefit from more research effort, or to identify fossil time-series that show high correlation with other regions or studies and would thus form valuable targets for further research efforts. Once a site is selected, statistical approaches to sampling design could be useful in deciding which depths to analyse (e.g. Christen and Buck 1998;Blaauw et al. 2018).

INTERPRETING PROXY RECORDS
The spectra of sub-fossils analysed within sediment cores cannot inform us directly about past CEEs. Instead, some kind of modelling (calibration) is required before they can be interpreted in terms of past CEE dynamics. These fossil calibrations are based on the assumption that the same natural laws and processes that operate now have always operated in the past (Uniformitarianism; Hutton 1788). In order to illustrate this, we can consider these laws as functions (G(.)) which depend on the state of the environment (μ e ) at time t (G(μ e (t))). For example, from observing where current species of beetles are found, we can infer their preferences for distinct CEEs. If we then encounter the same species of beetles fossilised within a deposit, we can infer that when that deposit was formed, conditions must have been preferable for those beetles. We can thus calibrate our fossils into 'proxy' time-series of CEEs by substituting space (CEEs where species are currently found) for time (periods when fossil specimens of the same species are found within deposits). Even so, since current distributions of biota do not necessarily cover all possible climates and ecosystems, and human activity will have impacted current vegetation patterns, some ancient ecosystems will have no current analogue (Goring et al. 2016). This is equivalent to observing a function in sections with missing parts (Fig. 2).
Many conceptual or statistical methods are used to translate fossil proxies into proxy variables of interest. In the late 1800s and early 1900s, Axel Blytt and Rutger Sernander used the spatial distribution of vegetation types within regions of Scandinavia, in order to interpret peat layers found within Danish bogs as indicating distinct climatic periods in time (von Post 1946). Since the 1980s, spatial distributions of assemblages of living beetles have been used to reconstruct past temperatures through identifying 'mutual climatic ranges' (Bray et al. 2006). More modern quantitative space-to-time transformations include ordination (Hill and Gauch 1980;Birks et al. 2012), weighted-averaging partial-least-squares regression (WA-PLS; ter Braak and Juggins 1993) and transfer functions (Imbrie and Kipp 1971;Juggins et al. 2014); 10% of the papers in Table 1 use such methods. These methods have to be used with care, because many of the techniques were originally aimed at exploratory analysis rather than hypothesis testing, relationships can change over space and time (including through recent human impacts), multiple causal factors can interact, and ecosystems can show nonlinear responses which depend on initial conditions (Juggins 2013). The most commonly used technique (CONISS, 12%), a clustering algorithm based time CEE state proxies data t 1 , t 2 , t 3 , ... , t i , ..., t n forward models on squared Euclidian dissimilarity, produces zones to visualise the main trends in (mostly) pollen diagrams (Grimm 1987). The majority of papers within our meta-analysis (52%) use no statistical techniques to interpret their proxies, and only 34% use error bars or other visualisations of proxy uncertainties (even though techniques to do so have been available for decades ;Maher 1972).
To summarize, even though a wide range of statistical techniques to interpret proxy time series has been available for decades, our meta-analysis suggests they are under-utilized within the palaeo community. Greater statistical rigour could advance interpretations of fossil time-series. For example, not all biota fossilize equally, causing a degree of bias in the final observed record, and there is also some evidence that fossils can move vertically within cores (Payne and Gehrels 2009). Perhaps statistical tools could be developed to model this. Null-models (Blaauw et al. 2010b;Barr 2017) could be useful in separating signal from noise in fossil time-series (e.g. Correa-Metrio et al. 2014;Turner et al. 2016).

BUILDING CHRONOLOGIES
On its own, each fossil time series provides only limited information about the dynamics of the environment at that site. Even if say the pollen record of an undated lake core could tell us how the nearby environment has evolved into its current state, only by putting the records onto a common time-scale can multiple records be compared to each other. Through such comparisons, inferences of spatiotemporal patterns can be made, which themselves are essential to test proposed cause-effect mechanisms. For example, a compilation of radiocarbon dates across Ireland indicates that a major societal collapse occurred at around the time of a climate event (Armit et al. 2014) and could thus be argued to show how vulnerable past societies were to climate change. However, since detailed analysis of the available data showed that the societal collapse happened some time before said climate event, in this case no such causal link could be established.
Our literature meta-analysis shows that a wide range of absolute and relative dating is used, and that most models that relate sediment core depths to ages are either based on classical methods such as linear interpolation (30%), smooth splines (10%) or regression (6%). Bayesian methods (see next section) include Bacon (24%, Blaauw and Christen 2011) and the P_Sequence within the Bayesian age-modelling software Oxcal (14%, Bronk Ramsey 2008). A minority (14%) do not mention which approaches were used, and most do not show uncertainties (44%). Therefore, even though the step of chronology-building relies more on statistical techniques than do the previous two steps, many studies ignore the associated uncertainties. Chronology-building, including potential areas of future statistical interest, will be discussed in more detail in Sect. 3.

PLACEMENT INTO CONTEXT
Many studies of past CEE dynamics place their study in context by comparison to other regional records (e.g. cores from nearby lakes), to potential forcing factors (e.g. solar irradiance), and/or to other types of time-series coeval with the study's record (e.g. archaeological records), in order to identify trends or events which then can be interpreted in terms of underlying forcing factors. Indeed, most papers in our analysis (70%, Table 1) plot their CEE record(s) together with other relevant time-series on the same calendar scale (often aided by shading or lines to highlight common dynamics). However, most of these figures do not show estimates of chronological or proxy uncertainties (59%), and 22% do not show figures and rely on in-text discussions of other studies only. A minority of studies present conceptual models to interpret their records (16%), and 1 study used a formal change-point analysis. Thus, much of this critical stage of placing records into context is done without considering the inherent statistical uncertainties. This will be discussed more in Sect. 4, outlining potential research lines including statistics.

BUILDING CHRONOLOGIES
Some records form annual layers, and by counting them secure chronologies can be obtained that sometimes span many millennia. Many trees in temperate areas form such layers (rings), several ice sheets in Greenland and elsewhere do, and even some lakes and ocean basins preserve annual bands (varves; De Geer 1912;Zolitschka et al. 2015) during sedimentation, e.g. through seasonal sediment inwash into glacial lakes or seasonal biological activity such as in Lake Suigetsu in Japan (Bronk Ramsey et al. 2012) or the Cariaco Basin offshore of Venezuela (Haug et al. 2001). Distinct patterns of thicker and thinner rings or varves can be used to infer environmental change but also to link chronologies within sites (e.g. within a single forest), across sites within a region (e.g. forests across Ireland or Northwestern Europe), or even between continents (De Geer 1921). Tree time-series can be extended backward in time by overlaying the rings of living trees with those of timbers from historical buildings or fossilised trunks in river beds or bogs.
Many records do not record such annual layering, and for these records other chronological methods have to be used. These are often based on measuring radioactive isotopes of for example lead, carbon, or uranium/thorium. Radioactive isotopes are unstable and decay at a constant rate (half-life)-based on measuring the remaining concentrations of these isotopes within samples, absolute ages can be calculated with a degree of error. The age calculations depend on the processes involved. For example, 210 Pb has a half-life of 22.3 years and can be used to date sediments back to c. 100-200 years in time. Some of the 210 Pb in a sediment core will have rained in from the atmosphere, while the rest will have come from in-situ decay of parent isotopes. These contributions are unique to each site and must thus be estimated (Aquino-López et al. 2018). Most of the radiocarbon or 14 C in organic material will have come from atmospheric 14 CO 2 , which itself is derived from oxidation of 14 C particles produced through collisions of cosmic rays with particles high up in the atmosphere. Measurements of 14 C in tree-rings of independently obtained known calendar age have shown that atmospheric 14 C concentrations have varied over time. Therefore, a calibration curve is required in order to put 14 C dates on a calendar scale (Reimer et al. 2013).
Besides such absolute dates, there are also relative dates, or so-called isochronous events across sites. For example, a volcanic eruption could deposit an ash layer (tephra) across continents, and if it possesses a unique chemical fingerprint, then this layer can be used to align sites that contain the layer in question. If some of these sites carry chronological information about the age of this tephra (e.g. through layer counting in ice cores, or radiocarbon dating in lakes), then this age information can be transferred (or tie-pointed) to other sites that recorded the same tephra. Sometimes the same approach is used to produce site chronologies by tie-pointing assumed synchronous climate events -however often these layers cannot be identified securely, and subsequent climate interpretations can suffer from circular reasoning (Blaauw 2012).
Absolute dates are expensive and time-consuming to obtain, some core depths will not contain enough isotopes for reliable dating, and often cores will contain only few identifiable tephra layers. Therefore, often age-depth modelling has to be used to estimate the ages t of any depth d of a core, t = F(d). Traditionally this is done by drawing some sort of linear interpolation or regression curve through the dated depths (Bennett 1994;Blaauw 2010; see Table 1). However, this is not straightforward, because some measurements have asymmetric errors (e.g. calibrated 14 C dates), often some dates appear to be outlying (e.g. through contamination, re-deposition of older sediment, or transport of material between depths in a core), and sediments are unlikely to have accumulated linearly over long stretches of depth or time. Moreover, sometimes unexpected scatter is found among multiple 14 C dates (Scott 2013;Christen and Perez 2009). As a result of the above problems, basic age-models likely underestimate chronological uncertainties .
Bayesian age-depth models that use MCMC simulations to simulate a more variable accumulation of sediment, such as Bpeat (Blaauw and Christen 2005), Bchron Parnell 2008), Oxcal's P_Sequence (Bronk Ramsey 2008), and Bacon (Blaauw and Christen 2011), have become quite popular among the Quaternary research community. Bpeat (Blaauw and Christen 2005) and its successor Bacon (Blaauw and Christen 2011) use piece-wise linear models, together with constraints on sedimentation times and how they vary between sections. The prior for sedimentation time was informed by analysing >200 lakes (Goring et al. 2012). The posterior information can be used to estimate the age distribution of any core depth, or even display the possible ages of proxy values (Fig. 3).
Bchron models sediment accumulation through simulating jumps in both depth and time using Gamma distributions, while OxCal's P_Sequence uses a Poisson process to simulate sediment accumulation. The Bayesian age-models listed here deal well with outlying dates, either through simulating the process of dates becoming outliers (Bronk Ramsey 2009) or by using student-t distributions with wide tails (Christen and Perez 2009;Blaauw et al. 2018). Because sediment can be assumed to have accumulated over time without reversals, the above Bayesian approaches enforce age-models to be monotonous. Such prior information on necessarily positive accumulation rates can result in much enhanced chronological precision, especially in cores with dates that overlap in time. For reviews on age-depth modelling, see Blaauw and Heegaard (2012), Parnell et al. (2011) and Trachsel and Telford (2016). As outlined above, chronology-building has received considerable statistical attention over the past decade or so, and could be thus considered as a relatively well-developed component of numerical palaeoecology. In the future, process-based, forward models could be of use in producing more geologically realistic age-depth models. Such models could start to make use of real-world information such as how the shapes of lake basins shape sedimentation rates (Bennett and Buck 2016). Using physical climate models, sedimentation could be simulated mechanistically, e.g. through modelling how different climate-dependent vegetation types and other landscape factors affect erosion (Merritt et al. 2003) and thus sediment input into lakes. More thorough approaches to estimating time-series from annually banded records would also be welcome (see Comboul et al. 2014;Boers et al. 2017).

PLACING FOSSIL TIME-SERIES INTO CONTEXT
As discussed in Sect. 2, by far most studies place their results into context through plotting their environmental inferences, or raw proxy data, against calendar age, together with a number of nearby or otherwise relevant time-series (Table 1), after which any shared events or other dynamics are interpreted. This raises questions regarding (i) spatial scales, (ii) impacts of the chronological quality of individual time-series, and (iii) potential mechanistic links between the time-series.
Most of the cores containing the fossil time-series outlined above are generally less than 10 cm wide, and thus the reconstructions are essentially based on one-dimensional time-series. These records are then extrapolated to spatial scales at which the records are thought to respond. To start at the smallest scales, even within a core there is some degree of horizontal variability (Fig. 2 of Wörmer et al. 2014). Within bogs, twigs, seeds and other macrofossils can be assumed to have derived from very close to where a core was sampled and are thus used to reconstruct highly local hydrological changes (e.g. Blaauw and Mauquoy 2012). Diatom records within lake sediment indicate ecological dynamics within a lake (or local conditions in specific areas of a lake; Rimet et al. 2016). On slightly larger spatial scales, pollen within a bog or lake is often assumed to have originated both from within or just around the site as well as from the surrounding ecosystems, and can thus be used to reconstruct the past dynamics of forests, agriculture and other vegetation from a wider area around the site. The source area of pollen is a topic of active modelling research (e.g. Sugita 2007; Theuerkauf and Couwenberg 2016)-for example, an increase in the amount of pine pollen within a core could have been caused by a growing pine population in the wider region, but also by a few nearby pine trees starting to produce pollen as they reached maturity. Similarly, a narrow ring found within one tree could indicate a purely local event such as disease, but if replicated within a site, between sites and/or across continents would imply a widely felt event and thus more likely caused by a large-scale factor such as climate. On the largest spatial scales, events in many fossil records are compared and/or aligned to those found in distant Greenland ice cores (e.g. Itambi et al. 2009) or global stacks of ocean sediment records (e.g. Lisiecki and Raymo 2005).
In the quest for identifying large-scale events from multiple fossil time-series across regions or continents (e.g. to test how well such events can be reproduced by climate models), a series of important decisions has to be made. Some fossil records are much better dated than others. Blois et al. (2011) compiled well-dated and poorly dated pollen records within a region, and then aligned pollen events shared between records in order to enhance the chronologies of the poorly dated records. Similarly, layer-counted Greenland ice cores form precisely dated records of climate events and are used as reference frames to which more poorly dated archives on other continents are linked (e.g. Haesaerts et al. 2009). However, even closely spaced ice cores show differences in event timings, erroneous tie-points could be chosen, and circular reasoning could mask real timing differences of events within or across regions (Blaauw 2012). Chronological uncertainties of aligned records have been modelled using Gaussian processes , but more work remains to be done on quantifying our confidence in chosen tie-points or in modelling accumulation between tie-points.
Most of the commonly used approaches to interpret proxies are 'reverse' models since they start with the outcomes (fossils) in order to infer their causes (past CEEs); this is equivalent to observing G(μ e ) and inferring μ e from it (Fig. 2). This introduces several problems, such as poor quantifications of uncertainties, dangers of correlations with noncausal variables, no obvious ways to combine multiple records, and difficulties in estimating the smoothness in changes over time and in combining multiple records. Some of the above problems could be avoided by building simple or more complicated process-based or forward models, where changes in climate parameters steer changes in environments and ecosystems, which are then registered as changes in fossil spectra-in other words, defining the function G(μ e ). Blaauw et al. (2010b) developed a basic conceptual model where m fossil types have individual preferences and tolerances for n environmental factors. As these factors vary over time, conditions become more or less favourable for each of the fossil types, causing changes in their relative abundances. Simple conceptual models (Jackson 2012) could also enhance our qualitative understanding of the possibilities and limitations of proxy data. Building on Haslett et al. (2006), Parnell et al. (2016) produced realistic yet computationally demanding forward Bayesian models, creating iterative 'climate histories' through linking modular components with proxies (climate, modern vegetation distribution, and dated sediment cores with pollen counts) and assuming some smoothing over time. Other Bayesian compilation approaches include Holmström et al. (2005) and Ilvonen et al. (2016).
It is important to note that most Quaternary research questions do not involve absolute proxy values, but rather their changes over time and the speeds of these changes. This introduces a dependence between neighbouring proxy measurements, as well as on the age-depth models. Some work has been done on identifying changes, including changepoint analysis (Blois et al. 2011), probabilistic identification of proxy events (Blaauw et al. 2010a), and directional analysis (Holmström 2010). Some past-climate studies search for periodicities using wavelet analysis (Debret et al. 2007; see also Table 1). Such techniques are sensitive to uneven sample spacing, and to our knowledge, the impacts of chronological uncertainties on wavelet analysis have not yet been investigated.

DISCUSSION
As outlined above, a wide range of statistical techniques has been developed and used in order to better reconstruct past climates, environments and ecosystems from fossil records. However, many of the statistical approaches to proxy interpretation rely on exploratory analysis rather than hypothesis-testing. Some of these techniques were developed by the Quaternary community itself, whereas others were done through collaborations with the statistics community. Recent exciting collaborative developments in past CEE dynamics include tipping points (e.g. Scheffer et al. 2009), forward models (Parnell et al. 2016) and artificial intelligence (Anderson et al. 2014). One of the main remaining problems however is that, even though many approaches to quantify uncertainties are now available, the Quaternary community continues to largely ignore them at all steps (Table 1). Also, not all uncertainties can be quantified (Jackson 2012). We would thus welcome continued collaboration between the palaeo and statistics communities (e.g. https://www.pastearth.net/, http://sites.nd.edu/paleonproject/, https://www.earthcube.org). Most of all, efforts should be strengthened to educate the palaeo community about the best ways to dealing with the large uncertainties inherent in palaeoenvironmental studies. More effort should also be made to communicate these uncertainties to climate modellers and other research communities that make use of fossil-derived CEE reconstructions (Jackson 2012).