Background

The archaeological record includes physical features (e.g., structures, remnants of agricultural fields, rock art, and stone or ceramic artifacts) associated with human activity; these exist within ecological landscapes and provide a unique and irreplaceable long-term perspective on human–environment interactions (Scharf 2014; Hayashida 2005; Kelly and Thomas 2016). Archaeological resources are vulnerable to the impacts of wildfires, which can cause physical or chemical changes in materials or addition of residues that temporarily or permanently alter attributes important for determining artifact origin, age, cultural affiliation, or technology of production (Romme et al. 1993; Buenger 2003; Davis 2018). In this manner, wildfire impacts can compromise the interpretation of the archaeological record and its potential for documenting millennial-scale human history (Lissoway and Propper 1988; Johnson 2004; Rude and Jones 2012). Although a growing, global body of work identifies vulnerabilities of archaeological resources to sea level rise, storm events, and glacial retreat (Rockman 2015; Daly 2011; Morgan et al. 2016), less attention has been paid to the impacts of wildfires, and particularly lacking are studies that associate climate and landscape drivers of wildfire activity with potential archaeological impacts.

Climate changes, linked to increasing wildfire severity in many landscapes (Miller et al. 2009; van Mantgem et al. 2013; Singleton et al. 2019), exacerbate and accelerate fire’s effects on archaeological resources (Davis 2018). Uncharacteristically severe fires can expose archaeological materials to novel—and potentially more damaging—fire environments than occurred in the long-term ecological and cultural history of the region (Liebmann et al. 2016; Roos and Scott 2018; Roos et al. 2020). In the southwestern United States, landscapes and archaeological materials have been repeatedly exposed to wildfires in the past; however, these fires were less severe than those that have occurred in recent decades (Swetnam et al. 2016; Roos et al. 2021). More than a century of livestock grazing, logging, and fire exclusion have lengthened intervals between fires in dry conifer forests, leading to unnaturally high fuel loads, dense forest canopies, and reduced structural and spatial heterogeneity of vegetation (Covington et al. 1994; Allen et al. 2002; Reynolds et al. 2013; Kalies and Kent 2016). Fires in these forests are more intense, with larger patches of high-severity fire, than occurred historically (Covington et al. 1994; Westerling et al. 2006; Swetnam et al. 2016; Singleton et al. 2019). Amplified fire activity and proportionally larger areas of high-severity fire within burned areas is likely to continue with warming temperatures (Abatzoglou and Williams 2016; Loehman et al. 2018, Singleton et al. 2019) until fire–fuel feedbacks eventually limit fire occurrence, size, and severity (Parks et al. 2016; Schoennagel et al. 2017); thus, archaeological sites are at increasing risk for negative impacts from wildfires in the coming decades.

In this study, we analyzed relationships among environmental predictors and observed fire impacts to archaeological sites, features, and artifacts in the Jemez Mountains, New Mexico, USA—a region that contains the nation’s best-preserved archaeological record (Lissoway and Propper 1988). Although legislation (e.g., the National Historic Preservation Act [NHPA 1966] and the National Environmental Policy Act [NEPA 1970]) and agreements between American Indian tribes and federal agencies require consideration of management effects and disturbances to the archaeological record and its preservation (Dockry et al. 2017; Davis 2018), management and planning activities are constrained by lack of predictive information on how and where archaeological resources are likely to be affected by wildfires. Most previous studies have focused on inventorying the range of archaeological fire effects present after specific fires, or experimentally determining the temperatures at which archaeological resources are negatively affected (e.g., Bronitsky and Hamer 1986; Lissoway and Propper 1988; Traylor et al. 1990; Fiero 1995; Lentz et al. 1996; Nisengard et al. 2002; Buenger 2003; Sturdevant et al. 2009). To our knowledge, ours is the only study to explicitly model the environmental conditions likely to result in the presence and severity of archaeological fire effects, thus providing managers with information that can aid in pre- and post-fire planning.

Across the western United States, the occurrence of potentially damaging high-severity wildfires has been related to climate, topography, vegetation type, and fuel availability (Alexander et al. 2006; Lentile et al. 2006; Holden et al. 2009; Dillon et al. 2011; Birch et al. 2015; Parks et al. 2018), and low-moisture conditions associated with seasonal or persistent droughts that can increase the ignitability and flammability of fuels (Westerling and Swetnam 2003; Holden et al. 2009; Holsinger et al. 2016; Parks et al. 2018). We expected that, if wildfire impacts to archaeological resources were driven by the same underlying mechanisms that dictate ecological burn severity, we would find that predictors of higher ecological burn severity also predict increasingly severe archaeological impacts. If true, models of ecological burn severity (e.g., Holden et al. 2009; Dillon et al. 2011) could also be used to anticipate archaeological fire impacts.

Through this analysis, we determined the environmental conditions under which wildfires result in observable impacts to archaeological sites, artifacts, and features. Specifically, we sought to determine the importance of variables related to topography, vegetation, fuels, or weather as predictors of the following: (1) presence of fire effects on archaeological artifacts and features, and (2) the observed burn severity (i.e., unburned, low, moderate, or high severity) at archaeological sites.

Methods

Study area

Fires and archaeological sites included in this study are within the Jemez Mountains of northern New Mexico (Fig. 1), a mainly forested landscape of mesas and canyons ranging in elevation from 1500 to 3500 m above sea level. Along this gradient, vegetation communities include woodlands of piñon pine (Pinus edulis var. edulis Engelm.) and juniper (Juniperus monosperma [Engelm.] Sarg., Juniperus scopulorum Sarg.), forests of ponderosa pine (Pinus ponderosa P. Lawson & C. Lawson), frequently with a Gambel oak (Quercus gambelii Nutt.) understory, and mixed conifer forests of ponderosa pine, Gambel oak, white fir (Abies concolor [Gordon & Glend.] Lindl. ex Hildebr.), Douglas-fir (Pseudotsuga menziesii var. glauca [Mayr] Franco), southwestern white pine (Pinus strobiformis Engelm.), limber pine (Pinus flexilis E. James), and aspen (Populus tremuloides Michx.) (Touchan et al. 1996; Muldavin et al. 2011). Prior to human land use and management activities that limited fire occurrence and spread, dry conifer forests in the region experienced frequent, low-severity surface fires; however, in the past four decades, wildfires with larger components of high-severity fire than occurred in fires of the past have burned across the Jemez Mountains (Grissino Mayer and Swetnam 2000; Allen 2007; Coop et al. 2016; Roos et al. 2020).

Fig. 1
figure 1

Location of archaeological sites (blue dots) and wildfires (black perimeters, labeled) in the Jemez Mountains, New Mexico, USA (2000 to 2011) used in this study’s analysis of environmental predictors of fire effects and burn severity. Fires included the Cerro Grande Fire (2000), Lakes Fire (2002), Virgin Fire (2003), Virgin Canyon Fire (2010), and the Las Conchas Fire (2011). Note that some areas and archaeological sites have experienced multiple (repeat) fires within the study period

The Jemez Mountains contain a large number and variety of archaeological sites, features, and artifacts (Elliott 1991; Anschuetz and Scheick 2006). At higher elevations (above about 2500 m), most sites are scatters of stone artifacts or obsidian quarries, associated with momentary usage and resource procurement (Kulisheck 2010). At elevations below 2500 m, the majority of archaeological sites consist of stone masonry architectural features with associated scatters of ceramic and stone artifacts (Elliott 1986; Kulisheck 2005). Two types of structures are recognized: (1) apartment-style pueblo villages of 50 to 1800 rooms; and (2) small, isolated one- to four-room structures (field houses) that are generally associated with agricultural fields and may have been used for temporary residence or as hunting camps, lookouts, or storage structures (Preucel 1990; Kulisheck 2005). Collectively, these sites are evidence of a dual-residence land-use pattern by ancestral pueblo peoples between approximately 1200 and 1600 CE (Common Era; Preucel 1990; Orcutt 1999), in which field houses were occupied and used in association with agricultural activities distal to the pueblo villages that sustained concentrated occupation outside of the growing season.

Data

Archaeological data sources

Our dataset was derived from archaeological assessments conducted after five fires that occurred in the Jemez Mountains during or after the year 2000, corresponding to the time period in which the Landscape Fire and Resource Management Planning Tools (LANDFIRE) program began to produce spatial vegetation and fuels data. Archaeological sites in our analysis included prehistoric artifact scatters (predominantly ceramic artifacts, i.e., sherds), field houses and other small masonry structures, multi-roomed pueblo villages, rock shelters, rock alignments, and rock art, all dating from prior to the eighteenth century CE (Kulisheck and Elliott 2005). We excluded historical period (eighteenth century CE and later) sites (e.g., wooden cabins and corrals, or trash scatters) to maintain the focus on the prehistoric archaeological record and because previous studies have established that historical wood structures are consumed by fire regardless of severity (Fiero 1995). We also excluded prehistoric obsidian scatters (quarries), because these occurred across a relatively narrow range of forest types and elevations within our study area. Further, obsidian is known to exhibit few observed fire effects at low temperatures but very dramatic alterations at higher temperatures (Loyd et al. 2002; Steffen 2005), a finding that would have biased our models for other, more fire-sensitive artifact and site types.

Samples for the analysis included 858 individual archaeological sites exposed to one or more wildfires that burned on the Santa Fe National Forest (US Forest Service), in Bandelier National Monument (National Park Service), or in the Valles Caldera National Preserve (National Park Service), New Mexico, USA, between 2000 and 2011 (Fig. 1). Five fires were included: Cerro Grande (2000), Lakes (2002), Virgin (2003), Virgin Canyon (2010), and Las Conchas (2011). We extracted archaeological fire effects and site severity data from assessment forms filled out by archaeologists during post-fire reconnaissance of archaeological sites. These assessments, conducted immediately after wildfires, provide field-measured information on site condition, burn severity, fire effects to features and artifacts, potential erosion issues, and other fire effects information, and are conducted to determine need for follow-up action to reduce potential negative after-effects of fire (e.g., hazards associated with standing dead trees, flooding, or erosion) (Clark and McKinley 2011; Tillery et al. 2011). All known (i.e., previously recorded) sites within four of the five fires were assessed. For the Cerro Grande Fire, only a portion of known sites were targeted for assessment, based on assumed burn severity and likelihood of fire impacts to sites. As has been reported elsewhere, fire can reveal new archaeological sites or expose artifacts at known sites (Nisengard et al. 2002; Johnson 2004; Davis 2018). During travel to known sites for assessments after the Las Conchas Fire, new archaeological sites were discovered. These new sites, when encountered, were briefly documented and post-fire assessments were completed.

We used observations from the 858 assessments to train and test our predictive models of archaeological fire effects and observed burn severity as a function of environmental and climatic factors. We consolidated data from the assessment forms to summarize two measures of fire impact at each archaeological site: a two-class variable representing fire effect on artifacts or features within the site (“fire effect”) and a four-class variable representing severity of fire at the site as a whole (“site burn severity”). For fire effects models, we created a binomial response variable that could be derived from all archaeological assessment records: fire effect present on artifacts or features within each archaeological site; or fire effect absent. Fire effects observed on artifacts and features included changes in surface color and sheen, fine fracturing and cracking, spalling or fracturing, and deposition of resins and soot (Ryan et al. 2012). Assigning a binomial response allowed us to standardize observations across post-fire assessment forms that varied across different fires; for instance, some forms recorded the presence of fire effects for a site as a whole, whereas other forms recorded fire effects for individual artifacts or features within a site.

Archaeologists classified site burn severity using the following criteria: unburned sites were locations that fire did not cross (Fig. 2A); sites burned at low severity had partially blackened to partially consumed litter, partially burned or scorched woody debris, and partially to completely consumed foliage and smaller twigs but mainly intact branches (Fig. 2B); sites burned at moderate severity had mostly to entirely consumed litter with residual coarse light ash, deeply charred duff and unaltered underlying soils, mainly consumed woody debris and deeply charred logs, and completely consumed foliage and smaller twigs but mainly intact branches (Fig. 2C); and sites burned at high severity had entirely consumed litter and duff with residual fine white ash, visibly altered mineral soil, deeply charred logs, and total consumption of all plants including most major stems and trunks (Fig. 2D). Terminology varied slightly across different post-fire assessments, so we standardized all observations to one of four site burn severity classes: (1) unburned, (2) low (scorched or lightly burned), (3) moderate, or (4) high (severely burned).

Fig. 2
figure 2

Archaeological sites in the Jemez Mountains, New Mexico, USA, exhibiting (A) unburned, (B) low, (C) moderate, and (D) high burn severity as determined by field-recorded, post-fire archaeological assessments. Sites are representative of those included in this study’s assessment of environmental predictors of fire effects and burn severity, 2000 to 2011. Photo credits: USDA Forest Service (A) 2014, (B) 2012, and (C and D) 2011

Environmental predictors

We developed a suite of 25 continuous or categorical variables related to topography, vegetation, fuels, or weather as potential predictors of observed site burn severity and fire effects (Tables 1 and 2). To derive site-specific measures, we created a 10 m buffer around each recorded site point location, which was considered to represent the average size of archaeological sites in the Jemez Mountains, and assigned predictor values based on mean (for continuous variables) or majority count (for categorical variables) within that buffer zone. We included topographic, climatic, and biotic predictors associated with ecological fire severity in Southwestern ecosystems (Holden et al. 2009; Dillon et al. 2011), predictors that influence fire impacts through their controls on biomass production and fuel condition (Abatzoglou and Kolden 2013; Holsinger et al. 2016), and predictors that are commonly used by fire ecologists and managers to model potential fire behavior and effects (Scott and Burgan 2005; Lutes et al. 2009).

Table 1 Continuous variables used to predict observed, artifact- and feature-level fire effects (fire effect) and burn severity class for archaeological sites in the Jemez Mountains, New Mexico, USA, from 2000 to 2011. Ranges were calculated as mean values for 10 m buffers around 858 recorded point locations of archaeological sites. Climate variables were summarized for the month of fire, fire season (here considered May through September), winter preceding the fire year (December through February), or year of fire, as indicated. Variable definitions are provided in italics. Variable units are specified where present; otherwise, values are indices or dimensionless
Table 2 Categorical variables used to predict observed artifact- and feature-level fire effects (fire effect) and burn severity class for archaeological sites in the Jemez Mountains, New Mexico, USA, from 2000 to 2011. See Additional file 1 for variable descriptions associated with LANDFIRE codes

We acquired monthly, seasonal, and annual weather predictors (PRISM Climate Group 2014; http://www.prism.oregonstate.edu, 800 m resolution) for each fire year and for the baseline period 1981 to 2010 (30-year normal) (Table 1). For temperature minima and maxima, we calculated the departure from the 30-year normal for the May to September fire season (TMINfs and TMAXfs, respectively) and the fire year (TMINan and TMAXan, respectively). For precipitation, we calculated the percent of the 30-year normal for the winter months preceding the fire (December through February, PRCPwin), the fire month (PRCPmo), and the fire year (PRCPan). We obtained a regional drought measure, the Palmer drought severity index (PDSI; Palmer 1965; http://www.cpc.ncep.noaa.gov/products/monitoring_and_data/drought.shtml), as a proxy for fuel and soil moisture conditions during the fire month (PDSImo) and May through September wildfire season (PDSIfs). The PDSI is useful as a predictor of wildfire activity because it integrates precipitation and temperature anomalies over several months to several seasons (Westerling and Swetnam 2003). We included two fire danger indices, Energy Release Component (ERC) and Burning Index (BI), calculated as the average for the fire month (ERCmo and BImo, respectively) or the May to September fire season (ERCfs and BIfs, respectively), for fuel model G (Abatzoglou 2013). The ERC is a multi-day composite index that reflects the contribution of all live and dead fuels to potential fire intensity and is considered a proxy for fuel dryness or heat per unit area available to the flaming front (Riley and Loehman 2016). The BI uses ERC and a fire spread component to provide a seasonal profile of fire danger and fire behavior for vegetation types with significant dead fuel loadings (Preisler et al. 2004).

Topographic predictors were computed from a 10-meter digital elevation model (DEM) acquired from the US Geological Survey (https://gisdata.nd.gov/Metadata/ISO/html/metadata_DEM_NED_10m.html) using the ArcGIS toolbox for surface gradient and geomorphometric modeling (Evans et al. 2014; Table 1). We calculated a compound topographic index (CTI), a steady-state soil wetness index that models water flow accumulation as a function of upstream contributing area and slope and is commonly used to quantify topographic control on hydrological processes (Gessler et al. 1995). Lower CTI values represent drier areas with steep slopes and low catchment capacity, and higher values represent areas of greater soil moisture accumulation, typically with large catchments and flatter slopes. The heat load index (HLI) is a topographically corrected measure of incident solar radiation that integrates slope and aspect, with lower values representing cooler, northeast slopes (McCune and Keon 2002). We calculated two aspect measures: a transformation from circular to linear aspect values (Aspect_Lin), and a Slope-Cosine-Aspect Index (Aspect_Cos) that incorporates topographic effects on solar irradiance and surface moisture, based on Stage’s (1976) method (slope percentage times the cosine of aspect). The Slope-Cosine-Aspect Index ranges from −1 to 1 for areas with slopes up to 100%, for which the cosine of aspect is −1 at 180° (south) and 1 at 0° (north), flat areas are assigned a value of −1, and areas with slopes greater than 100% are assigned a value of 1. We also extracted slope (percent) and elevation (meters above sea level) from the DEM.

Predictors representing pre-fire vegetation and fuels were acquired from the Landscape Fire and Resource Management Planning Tools (LANDFIRE) program as 30-meter raster grids (Rollins 2009; Table 2). We associated fires with the LANDFIRE data layers that most closely pre-dated them, including those from the years 2001 (LF1.0.5), 2008 (LF1.1.0), and 2010 (1.2.0) (http://landfire.cr.usgs.gov/viewer/, accessed July 2015). We included two fuels data layers commonly used in fire behavior prediction: Fuel Loading Model (FLM), a surface fuel classification with representative loadings for woody and non-woody fuel components; and the 40 Scott and Burgan Fire Behavior Fuel Models (F40), descriptive of distributions of surface fuel loadings for grass, shrub timber, and slash fuel types (Scott and Burgan 2005). Additional data layers representing type, amount, arrangement, and continuity of vegetation (fuels) were canopy bulk density (CBD), the available canopy fuel within a forested stand; vegetation condition class (VCC), a discrete measure that quantifies the degree to which current vegetation has departed from simulated historical vegetation reference conditions; existing vegetation type (EVT), the current distribution of plant community types; and existing vegetation cover (EVC), which represents the vertically projected percent cover of live canopy. All LANDFIRE data except CBD were included as categorical predictors in our models. We consolidated several of these datasets (Table 2) to reduce bias and other issues associated with unequal and sparsely distributed classes, particularly given the large number of potential class combinations within these data.

Analysis

Predicting likelihood of fire effects and site burn severity as a function of environment

We used a Random Forest (RF) classification algorithm (Breiman 2001), implemented in the randomForest (Liaw and Wiener 2002) and caret (Kuhn 2008) packages in the R statistical computing program (R 3.3.3, R Development Core Team 2021), to identify predictors associated with (1) the likelihood of observed fire effects on prehistoric artifacts (fire effect present or absent), and (2) site burn severity classification. Random Forest is a variant of the Classification and Regression Tree algorithm (CART; Breiman et al. 1984) that is able to produce accurate predictions without over-fitting (Breiman 2001) even when data are non-normal (Kane et al. 2015), spatially autocorrelated (Cutler et al. 2007), or exhibit non-linear relationships. During modeling, Random Forest permutes a large number of “trees” based on subsets of observations and predictors. These trees are compiled into a “forest” and classification probabilities are made based on the number of trees. RF implements a bootstrapping procedure using approximately two thirds of the data for training and the remaining one third for validation and to assess predictor importance. Individual predictor importance measures are calculated by comparing out-of-bag (OOB) classification errors between runs with and without each predictor.

We tuned each model by adjusting the number of predictors permuted for each tree (mtry) and the number of bootstrap replicates, and assessed best models based on Kappa (binomial dataset) or ROC (multinomial dataset) statistics (Kuhn 2008). The final RF models for site burn severity and fire effects present or absent used mtry = 2 and 2000 replicates. Next, we ran several RF iterations to determine the best method for accounting for class imbalances (Additional file 1), which might affect interpretation and assessment of model performance. We compared models over several resampling schemes: upsampled, downsampled, Synthetic Minority Oversampling Technique (SMOTE), and Random Over-Sampling Examples (ROSE, binomial responses only) by implementing R packages SMOTE (Chawla et al. 2002), DMwR (Torgo 2010), caret, pROC (Xavier et al. 2011), and ROSE (Lunardon et al. 2014). Models were trained on a subset of resampled data (75%) and tested against the remaining 25% of observations. Models to determine site burn severity class performed best on upsampled data, a method that randomly resamples smaller classes until they have an equal number of observations as compared to the highest class, that has proven robust for decision tree analyses (Crone and Finlay 2012). Model performance for fire effects present or absent was most improved using the SMOTE algorithm, which simultaneously upsamples the less common class and downsamples the most common class (Chawla et al. 2002).

We optimized predictor sets as described in Dillon et al. (2011). Briefly, for each model, we ran ten replicates (2000 trees per run) using resampled datasets and all predictors (Tables 1 and 2) and calculated the mean importance rank for each predictor (produced under the caret “rf” method) to create a final rank value. We then performed a k-cluster analysis (Hartigan and Wong 1979) to form five groups based on mean importance values. We identified an optimal model by running a series of models, each with ten replicates and 5-fold cross validation, on datasets with progressively reduced predictor sets based on the k-cluster ranking. Multi-class (site burn severity) analysis was run using the multiClassSummary function to produce mnlogLoss (negative Logarithmic Loss) metrics of model performance. Logarithmic loss measures the performance of a classification model in which predictions are probabilities between 0 and 1. As predicted probabilities approach 1, LogLoss approaches 0; perfect models will have a LogLoss value of 0. Binomial (fire effects) analysis was run using the twoClassSummary function to produce ROC (receiver operating characteristic) curves and accuracy and Kappa metrics of model performance. The ROC metrics are appropriate for binary classification problems and are calculated by plotting the true positive rate (sensitivity) as a function of the false positive rate (100-specificity). Accuracy is the percentage of correctly classified classes. Kappa is the normalized baseline classification accuracy that can be useful when using data with imbalance among classes. We used two measures of prediction accuracy to gauge model performance and identify optimal models: (1) mean decrease in accuracy, measured as the changes in tree accuracy when each predictor is permutated (more important predictors result in greater decreases when removed); and (2) mean Gini, a measure of node impurity in which predictors that split data into nodes with higher purity result in a greater decrease in the Gini coefficient (a higher mean decrease in Gini indicates higher variable importance). For each analysis, we determined the optimal model to be the one with the fewest predictors while also having performance metrics within one standard error of the best performing model. Finally, we implemented the optimal model using all observations in randomForest, RandomForestExplainer (Paluszynska et al. 2020), and PDP (Greenwell 2018) packages to develop partial dependence plots for model interpretation. Partial dependence plots graphically characterize relationships between individual predictor variables and predicted probabilities of observed fire effect or site burn severity class (Cutler et al. 2007).

We considered three additional measures of importance related to the forest structure to interpret RandomForest outputs and variable interactions: mean minimal depth, number of nodes, and P-values. Mean minimal depth assesses how well a predictor is able to split observations into classes. Variables with high impact on the prediction lead to larger splits between observations, tend to occur earlier in tree formation, and have lower minimal depth. Mean minimal depth can also highlight potentially important predictor interactions by identifying the conditional (comparative) mean minimal depth of variable pairs used to make first and second splits in individual trees; for example, in some instances, predictor importance is more apparent when it co-occurs with another variable than when it is considered individually. Number of nodes indicates how often a variable was used to split observations; higher values indicate greater importance. Finally, P-values were calculated based on a one-sided binomial test; P-values < 0.01 occur when the number of nodes using a given predictor exceeds the theoretical number of times that predictor would be used for a split, if predictors were selected randomly.

Results

Archaeological fire effects as a function of environment

Of the 858 sites surveyed, 503 (59%) recorded the presence of at least one type of fire effect and 355 (41%) recorded no fire effect. The SMOTE resampling resulted in an analysis dataset of 2485 observations, 1420 with observed fire effects and 1065 without. We compared three sets of models: (1) using all 25 predictors (full model), (2) removing the least important cluster, and (3) removing the two least important clusters. The best performing model for predicting archaeological fire effects (present or absent) retained all 25 predictors (Additional file 1; sensitivity = 0.82, specificity = 0.93). The OOB of the final model based on all observations was 10.98%, with individual class errors of 17% and 6% for predictions of effect absent and effect present, respectively.

Overall, topographic predictors best improved model accuracy, and elevation and weather predictors resulted in the greatest mean decrease in Gini coefficient (higher node purity) (Fig. 3). The EVT and EVC variables were also important predictors of observed fire effects. Partial dependence plots of top predictors show non-linear (complex bimodal and quadratic) relationships between predictors and fire effects (Fig. 4; Additional file 1: Figure A1.1). Fire effects on sites were more likely when minimum temperatures for the fire season (TMINfs) were 1 to 1.3 degrees warmer than the 30-year normal, and when average fire month ERC (ERCmo) values were greater than 95 or average fire season ERC (ERCfs) exceeded 70 (Fig. 4B; Additional file 1: Figure A1.1). Fire effects were also more likely on sites with tree cover versus sites with shrub or mainly herbaceous vegetation, and this effect was amplified for sites with neutral or slightly positive Aspect_Cos values (Fig. 4A; Additional file 1: Figure A1.1). Among EVT forest classes, sites with ponderosa pine or piñon–juniper vegetation were most likely to have fire effects (Additional file 1: Figures A1.1, A1.2). Among other important weather predictors, probability of fire effects appeared greatest when PRCPmo was <100% of normal, at PRCPwin values ranging between 65 and 75% of normal, or when PRCPan was 60% of normal (Additional file 1: Figures A1.1, A1.2). The probability of fire effects increased for sites with intermediate CTI values (~8 to 10), on sites with less than 50% slope, and on sites with more western aspects (HLI and Linear Aspect; Additional file 1: Figures A1.1, A1.2).

Fig. 3
figure 3

Multiway importance plot of 25 variables (top ten predictors are labeled) used to predict probability of fire effects on prehistoric artifacts and features after five fires that occurred between 2000 and 2011 in the Jemez Mountains, New Mexico, USA (n = 858). See Table 2 and Additional file 1 for full names and descriptions of predictors. For mean decrease in accuracy (x-axis), more important predictors result in greater decreases when removed. The Gini coefficient represents node impurity. Variables that lead to more homogenous nodes give greater decreases in mean Gini. Therefore, a higher mean decrease in Gini (y-axis) indicates higher variable importance in the model. P-values <0.01 indicate predictors that were selected more frequently as tree nodes than would be expected if predictors were selected randomly

Fig. 4
figure 4

Partial dependence between observed site fire effects on artifacts and features and important predictors for archaeological sites within perimeters of fires that occurred from 2000 to 2011 in the Jemez Mountains, New Mexico, USA (n = 858). (A) Two-way partial dependence (y-axis) for Existing Vegetation Cover (EVC; each box) × Slope-Cosine_Aspect Index (x-axis). (B) Two-way partial dependence values for mean fire season ERC (Energy Release Component, a composite fuel moisture index) and percent of normal, mean annual precipitation. Larger (more positive) partial dependence values, shown as lighter green and yellow colors, indicate a greater likelihood that a fire effect will be observed at the corresponding predictor value. Smaller (more negative) partial dependence values, shown as darker green and blue colors, indicate that a fire effect is less likely at the corresponding predictor value. See Table 2 for full description of EVC classes and other variables. Plots for other predictors are presented in Additional file 1: Figures A1.1 and A1.2

Vegetation type (EVT) and vegetation cover (EVC) were important for distinguishing between presence or absence of fire effects, as indicated by comparisons of predictor importance based on their usefulness for splitting data into classes (e.g., mean minimal depth and number of nodes; Additional file 1: Figure A1.3, A1.7). Analysis of the most frequent variable interactions showed that the lowest mean minimal depth was achieved when EVT and EVC occurred as the root (primary split) with secondary splits by Elevation, Aspect_Cos, HLI, and CTI (Additional file 1: Figure A1.3).

Site burn severity as a function of environment

Among post-fire site assessments, 20% (171) of sites were identified as unburned, 60% (513) recorded low burn severity, 13% (111) recorded moderate burn severity, and 7% (63) recorded high burn severity. Upsampling created a new dataset in which data for under-represented classes were upsampled until each severity class had 513 observations. Models using all predictors (full model), models without the least important cluster, and models without the two least important clusters all performed well (Additional file 1: Figure A1.4). The optimal site burn severity model (accuracy = −0.92, Kappa = 0.91; see Additional file 1: Figure A1.4) retained 19 predictors after dropping the lowest ranking (FLM, CBD, EVC, EVT, F40, and VCC). The final model using all observations gave an OOB of 6.49% with class errors of 5%, 2%, 3%, and 0.5% for unburned, low-, moderate-, and high-severity classifications, respectively. Balanced accuracy values are reported in Additional file 1. Averaged across all categories, topographic predictors were most important for individual tree accuracy and were best able to partition the sites into the correct site burn severity classes (Additional file 1: Figure A1.7). Averaged across all classes, analysis of the most frequent interactions showed that the lowest mean minimal depth occurred when the Elevation variable formed a root and various topographic variables formed secondary splits (Additional file 1: Figure A1.6). The ERCmo and Elevation pairs, although slightly less frequent, also gave low mean minimal depth. These results correspond with importance rankings for Elevation, topographic predictors, and ERCmo derived from mean decrease Gini and accuracy measures.

Archaeological sites designated as unburned or burned at low or moderate severity exhibited a high degree of overlap and were relatively clustered in multidimensional space (Fig. 5), indicating similarity in predictor values among and within these classes. Environmental characteristics of sites with a recorded high burn severity were distributed across a much greater range of values, varied more within their class, and were less likely to overlap with other burn severity classes (Fig. 5). Within each burn-severity class, predictor importance varied although variable importance was generally similar for unburned, moderate, and high burn-severity classes (Fig. 6). Topographic predictors were highly important for distinguishing unburned, moderate, and high site burn severity classifications (Fig. 6). As indicated by partial dependence plots, high archaeological site burn severity was more likely in settings with more southerly orientation (with peak probabilities occurring at Aspect_Lin values of around 130 to180 degrees, where 180 degrees = south), on southwestern slopes (HLI values >0.8), and on southern aspects or flatter slopes (more negative Aspect_Cos values) (Fig. 7, Additional file 1: Figures A1.4, A1.5). High burn severity was also more likely on steeper slopes, with probabilities increasing when CTI < 6 and Slope > 100%, and at elevations >2200 m. Moderate and low burn severity were more likely present on sites in settings where the Slope-Cosine-Aspect Index neared 0 and at higher HLI (Fig. 7). The likelihood of an unburned site classification increased with increasing Aspect_Cos values and where CTI > 5 and decreased as elevation and HLI increased (Fig. 7). Some weather variables were important for predicting low burn-severity sites (Fig. 6) and partial dependence plots show that low site burn severity was more likely when ERCmo was >95, PRCPfs was >80%, PRCPan was 60% of normal, TMINan was between 1.5 and 2.4 degrees above normal, and PDSImo was less than −4 (Additional file 1: Figures A1.4, A1.5).

Fig. 5
figure 5

Multidimensional scatter (MDS) plot for observed site burn severity classifications for 858 archeological sites sampled after five fires (2000 to 2011) in the Jemez Mountains of New Mexico, USA. Four designations of site burn severity (unburned, low, moderate, high) were obtained from post-fire assessments of archaeological sites and used to identify environmental predictors of burn severity using Random Forest. MDS projects the proximity matrix produced by Random Forest to a 2-dimensional space (Dim 1 and Dim 2), in which more similar observations appear closer together in the plot

Fig. 6
figure 6

Rank of importance (mean decrease Gini) of each variable for predicting each site burn severity class (unburned, low, moderate, high) as determined by post-fire archaeological assessments. The Gini coefficient represents node impurity. Variables that lead to more homogenous nodes give greater decreases in mean Gini. Variable importance was tested as part of an analysis to determine the best environmental predictors of burn severity for 858 sites burned during five wildfires in the Jemez Mountains, New Mexico, USA, from 2000 to 2011. Numbers represent variable importance rank, with 1 being most important. Predictors are presented in order of importance (greatest to least) for predicting sites with a high severity designation. See Tables 1 and 2 for predictor definitions. Additional file 1 reports overall importance ranks for variables as measured by accuracy and mean decrease in Gini

Fig. 7
figure 7

Partial dependence between site burn severity and environmental predictors for archaeological sites in the Jemez Mountains, New Mexico, USA. Shown are six predictors determined to be important for predicting site burn severity in our study of 858 sites assessed after five wildfires that occurred between 2000 and 2011: (A) Elevation, (B) Heat Load Index, (C) Slope-Cosine-Aspect Index, (D) Percent Slope, (E) Compound Topographic Index, and (F) Linear Aspect Index. For (F) Linear Aspect Index, N = north, S = south, E = east, and W = west. Burn severity is represented as follows: high burn severity is a short dashed line, moderate severity is a dotted line, low severity is a long dashed line, and unburned is a solid black line. The y-axis represents the probability of observing a particular burn severity class for each predictor value, holding all other predictors constant at their average value. Larger (more positive) partial dependence values correspond to a greater likelihood that a severity class will be observed at the predictor value indicated on the x-axis. Smaller (more negative) partial dependence values indicate that a severity class is less likely at the corresponding predictor value

Discussion

We found that the likelihood of observing fire effects on archaeological sites, artifacts, or features was best estimated with a combination of variables related to topography, pre-fire weather, and fuel condition (Fig. 3). Models indicated environmental thresholds above which fire effects were more likely to be present, including fire season minimum temperature at least 1.0 °C warmer than the 30-year normal, high ERC during the fire season and fire month, and low winter precipitation. Fire effects were more likely for sites located above 2300 m in elevation, on sloped terrain, and under tree cover (Fig. 4; Additional file 1: Figures A1.1, A1.2). When considering predictor interactions, the influence of topography and fire-weather varied according to vegetation type and cover (Fig. 4; Additional file 1: Figure A1.3). In particular, fire effects were more likely to occur on forested sites with more dense vegetation cover versus those with shrub or herbaceous cover. The influence of topographic predictors remained relatively consistent across vegetation type and cover classes.

Our models for predicting fire severity at archaeological sites contrast somewhat with previous analyses that identified topography and vegetation type or structure as predictors of ecological burn severity, as assessed by spectral (satellite derived) measures (Dillon et al. 2011; Birch et al. 2015). Archaeological site burn severity was best predicted by topographic variables and pre-fire weather, whereas vegetation and fuel cover, type, and amount were less important predictors (Fig. 6; Additional file 1: Figure A1.8). In addition, our models indicated the importance of fuel condition—as measured indirectly by variables that reflect fuel aridity (e.g., southern aspect, incident solar radiation, surface heating, and soil moisture accumulation)—for predicting potential site burn severity. In general, increased probability of high site burn severity was associated with topographic settings indicative of drier fuel conditions (e.g., trends for HLI and Aspect_Cos, and CTI; Fig. 7; Additional file 1: Figures A1.4, A1.5). Partial dependence plots indicated a large distinction between high burn severity versus all other severity classifications (Fig. 7; Additional file 1: Figures A1.4, A1.5). This result may indicate a threshold below which fire effects are more difficult to identify or classify, resulting in less specific predicted intervals for burn-severity classes other than high severity.

Unburned sites were best predicted by topographic variables, and these predictors were also best able to distinguish between unburned sites versus those with recorded low or moderate burn severity (Figs. 6, 7), but low burn-severity sites were best predicted by weather conditions (Fig. 6). We interpret this to mean that, all else being equal, if a site were not predisposed by its vegetation class and topographical position to burn at high severity, then the likelihood of the site burning at low versus moderate severity is determined by local weather conditions and, specifically, those that directly relate to lower fuel moisture. The presence of archaeological fire effects was best predicted by the same weather variables (ERCmo, PRCPmo, and PRCPan) found important for predicting low burn-severity sites (Figs. 3, 6). We interpret this to mean that the likelihood of observing a fire effect is greater when local weather conditions create drier fuelbeds.

Low burn-severity sites predominated among post-fire archaeological assessments, yet were among the most difficult to correctly classify using Random Forest algorithms. We suggest two reasons for this: first, given their larger representation, sites classified as low burn severity likely occurred across a broader environmental range as compared to other classes. Second, because sites classified as low burn severity were the most common in our dataset, our resampling scheme (upsampling less frequent categories) may not have changed the proportional frequency of site characteristics. However, for less common classes (e.g., high site burn severity, 7% of the dataset before upsampling), upsampling might emphasize a relatively narrower range of characteristics, resulting in a more specific range of predictor values. If true, the ability of our models to identify ecological conditions associated with high burn severity may be less robust than indicated by the performance metrics.

Random Forest is a powerful tool for classification problems such as those presented in our study, but it does not lend itself to easy interpretations of why some variables are more important predictors than others. Model optimization procedures favored models with high numbers of predictors, and partial dependence plots of those predictors indicate that the probability of observing fire effects or a certain site burn severity class depended upon a suite of interactions (Figs. 4, 7; Additional file 1: Figures A1.1 through A1.6). Partial dependence plots are one method for visualizing relationships, and we considered individual and additive relationships between variables and predicted values using these plots (e.g., Figs. 4 and 7; Additional file 1: Figures A1.1 through A1.5). Partial dependence plots displaying predicted trends across the range of values for single variables indicate that most predictors exhibited non-linear relationships with observed fire effects. Some of these non-linear patterns are likely driven by complex interactions among predictors that influence probability of observing fire effects or high fire severity. By examining the relative structure of individual trees through RandomForestExplainer, we identified potential pairs of variables driving each model’s ability to distinguish among sites by fire effect and severity, important interactions within these systems (Additional file 1: Figures A1.3, A1.6). However, it remains outside the scope of the current analysis to assess how this covariation influences likelihood of seeing fire effect or severity.

Although models performed well against test data (Additional file 1: Figure A1.2), caution may be warranted when extrapolating these results to other sites given the complexity of predictor interactions and the nature of our study design. Our sample size was robust and spanned a wide range of forest types and topographic settings, but known site location is objectively constrained and biased within the archaeological record to places where (1) past peoples chose to live, work, or leave material signs of their presence; and (2) archaeologists have conducted surveys and assessments that led to detection and recording of sites. These locations likely do not include all possible, unique settings within the Jemez Mountains or other similar southwestern US landscapes. However, we posit that our large sample size is representative within the settings where archaeological sites are typical and where post-fire surveys and assessments have been conducted (over the past five decades). Our models’ ability to predict fire effects in other areas—for example, those with very different environmental settings or fuel types or those with archaeological sites and artifacts of different sensitivity to fire impacts—may be limited.

Conclusions

In this study, we analyzed the relationships among environmental variables and observed fire effects on archaeological sites, artifacts, and features. Our study area, the Jemez Mountains of New Mexico, is a region with a dense and varied prehistoric record that is significant in the context of land management and research on past human–environment interactions and cultural history, and is of great importance for contemporary descendant communities (Liebmann et al. 2016). Our analysis identified the critical environmental settings and conditions that are related to the presence and severity of archaeological fire effects in this region. Increased high-severity wildfire—driven by climate changes and legacy land use and management activities—increases the vulnerability of archaeological sites to negative fire impacts that are likely to compromise the integrity, interpretation, and cultural significance of the prehistoric material record. Archaeological site burn severity is largely linked to topographic or weather variables related to drier, more combustible fuels, conditions that occur with warming and drying trends related to anthropogenic climate change (Westerling and Swetnam 2003). As a result, climate changes are likely to increase the potential for high-severity fire activity on cultural landscapes where fuels are not limiting (Parks et al. 2015; Abatzoglou and Williams 2016). As cultural resource considerations are increasingly integrated into fire management planning and operations (Lissoway and Propper 1988; Haines and Schofer 2008), models for predicting where and when fires may negatively affect the archaeological record can be used to prioritize fuel treatments, inform fire management plans, and guide post-fire rehabilitation efforts.