Introduction

Throughout the Late Pleistocene, there was a global extinction of 97 genera of megafauna (Barnosky et al., 2004; Koch & Barnosky, 2006). Natural and swift species extinctions are frequent events in the fossil record of this period. However, the worldwide disappearance of this select group of mammals within a relatively short time is exceptional. Scientists have suggested several hypotheses to explain this distinctive mass extinction involving large mammals. For example, human overhunting and habitat destruction, climate change, an extra-terrestrial impact, and hyper-virulent, hyper-lethal diseases are just a few hypotheses in the current literature (Burney & Flannery, 2005; Koch & Barnosky, 2006; Firestone et al., 2007). To date, ideas regarding how climate change, human dispersals, and overhunting might have affected megafauna have been the most extensively investigated. Nonetheless, the degree to which these hypotheses are supported varies broadly. Importantly, the relative contribution of anthropogenic impacts to these global extinctions remains contentious among scientists worldwide (e.g., Grayson & Meltzer, 2003; Koch & Barnosky, 2006). This discussion is largely hampered by a dearth of archaeological evidence securely associating human foraging behavior with megafauna subfossil remains across multiple regions, including Australia (Fillios et al., 2010; Wroe & Field, 2006), Madagascar (Anderson et al., 2018; Hansford et al., 2018; Mitchell, 2020; Virah-Sawmy et al., 2010), South America (Fariña & Castilla, 2007; Fariña et al., 2014; Suárez et al., 2014), and North America (Eren et al., 2021; Haynes, 2017; Meltzer, 2020). In North America, for example, where all large mammals over 1000 kg went extinct toward the end of the Late Pleistocene (Koch & Barnosky, 2006), scientists across relevant fields have debated the cause of these extinctions for decades with little consensus.
While some argue that human overhunting caused the megafauna extinctions (Alroy, 2001; Haynes, 2018; Martin, 1973; Mosimann & Martin, 1975; Surovell et al., 2016; Surovell & Waguespack, 2008), others contend that the effect of humans on these extinctions is unclear. Instead, they propose that a combination of climatic changes and human interference caused the demise of megafauna such as proboscideans (mammoth and mastodon) (Agam & Barkai, 2018; Araujo et al., 2017; Barnosky, 2008; Barnosky et al., 2004; Grayson & Meltzer, 2003; Koch & Barnosky, 2006; Meltzer, 2015). These inferential differences are rooted in an empirical problem: the evidence for intensive human predation, butchery, or consumption of megafauna is scarce and often ambiguous (Eren et al., 2021; Grayson, 1984; Grayson & Meltzer, 2003; Meltzer, 2015). This has led some archaeologists to conclude that the degree to which humans included hunted megafauna in their diet remains uncertain and, with respect to certain species, was possibly minimal (e.g., Eren et al., 2021).

When studying prehistoric proboscidean hunting and butchery, researchers frequently consider stone tools that are clearly associated with zooarchaeological remains as strong evidence of human-animal interactions. In North America, such examples have been discovered at multiple archaeological sites associated with proboscideans, including Lehner, Colby, Murray Springs, Blackwater Draw, Dent, Domebo, Naco, and Escapule (Grayson, 1984; Grayson & Meltzer, 2002, 2003; Johnson, 2007; Surovell & Waguespack, 2008). Despite this evidence, diagnostic stone tools have been relatively infrequent at proboscidean kill and butchery sites (Grayson, 1984; Grayson & Meltzer, 2002, 2003). In the absence of such tools, researchers have looked to bone surface modifications (BSM) as potential trace evidence of human butchery behavior (Grayson & Meltzer, 2002). However, to date, researchers have mainly used qualitative methods to distinguish butchery marks from other types of BSM (e.g., Johnson, 2007; Lyman, 1987; Olsen & Shipman, 1988; Shipman, 1981). Consequently, many BSM identifications have been contentious, and researchers have disagreed over their interpretation (Blumenschine, 1995; Bunn, 1981; de Heinzelin et al., 1999; Grayson & Meltzer, 2002; Haynes, 2017; Holen, 2006; Holen et al., 2017; McPherron et al., 2010).

To resolve conflicting interpretations, researchers have applied quantitative methods to interpret BSM. For example, archaeologists have increasingly demonstrated the viability of studying BSM using three-dimensional (3D) methods (Bello & Soligo, 2008; Bonney, 2014; Otárola-Castillo et al., 2018; Pante et al., 2017). These methods enable researchers to quantitatively compare the shapes of experimentally generated BSM to the shapes of archaeological BSM and thus identify possible agents of the archaeological BSM. By statistically comparing 3D scans of these marks, researchers can assign probability values to multiple agents hypothesized as potential makers of the mark (Harris et al., 2017; Otárola-Castillo et al., 2018). This approach also presents the possibility of revisiting specimens that have exhibited BSM yet have lacked strong lithic evidence of human hunting or butchery, as might be the case with the Bowser Road mastodon (BR mastodon) assemblage.

In December 2018, after learning of Otárola-Castillo et al.’s (2018) research, R. M. Gramly contacted Otárola-Castillo (EOC) to ask if a similar BSM morphometric analysis could be conducted on the skeletal element assemblage of the BR mastodon, a recently excavated mastodon from Orange County, New York (Fig. 1). The BR mastodon is an example of a large Pleistocene-aged mammal recovered with relatively scarce lithic evidence, but with potential prehistoric, butchery-generated BSM (Gramly, 2017). Gramly (2017) observed several BSM on the BR mastodon and attributed many of them to human butchery behavior, specifically chopping with stone tools. As such, the faunal remains of this individual provide an opportunity to evaluate the probability that these BSM are trace evidence of human–proboscidean interaction, adding to the body of evidence used to infer the degree of megafauna inclusion in the diet of early North Americans.

Fig. 1 Map showing the location of the Bowser Road mastodon site within the United States and New York State

To accomplish this objective, this study conducted a controlled experiment to generate experimental stone tool chopping marks using a device calibrated to deliver the same amount of force that a human would use to chop large animal carcasses when processing their soft tissues for food products. We used 3D scans of these experimental BSM to assess the probability that the BR mastodon marks were created by the methods we simulated. We calculated mark probabilities by applying 3D geometric morphometrics (GM) coupled with Bayesian probabilities through discriminant function analysis (DFA).

Background

Human-Animal Interactions and Megafauna Extinctions

Evidence of Pleistocene megafauna butchery is meaningful for accurately depicting this human-animal interaction—whether scavenging or hunting—and estimating its intensity worldwide. This inferential process is essential toward reconciling the multiple hypotheses proposed to characterize the effect of human foraging behavior on the extinction of ice age megafauna. Of course, inferring that BSM evidence is consistent with butchery behavior alone is not direct evidence of an active predator–prey interaction where Late Pleistocene people hunted now-extinct megafauna. However, as mentioned in the introduction, decreasing the uncertainty surrounding the agent responsible for BSM creation is necessary. Doing this will improve the current body of archaeological evidence related to hypotheses of megafauna inclusion in the diet of Pleistocene foragers worldwide.

Securely identifying instances of humans hunting or scavenging extinct megafauna is necessary to resolve the ongoing debates on human environmental impacts globally. Such controversies surrounding BSM identification, for example, are at the forefront of debates surrounding the colonization of Madagascar by humans. Hansford et al. (2018) relied on cut-marked elephant-bird bones to propose that humans settled Madagascar 10,000 years ago. Hansford et al.’s hypothesis pushes the timing of initial human occupation on the island back by over 6000 years. Peers have been critical of this hypothesis, however, suggesting that such butchery mark identifications on Madagascar’s megafauna predating traditional colonization timeframes are inaccurately attributed to human activity (Anderson et al., 2018; Mitchell, 2020). Additionally, controversial BSM classifications also cloud discussions of human activity impacting biotic change during the Late Pleistocene colonization of Sahul. For instance, at Cuddie Springs, in Australia, Fillios et al. (2010: Table 5) identified cut marks on the extinct Macropus giganteus titan, suggesting human butchery. However, they emphasize the uncertainty in their identification and mention that cut marks may be mistaken for trampling marks. In addition, Field et al. (2013) express skepticism and add that although these data may suggest hunting or butchery, more evidence of human agency is necessary for a secure inference. Moreover, archaeologists in South America have also identified controversial cut marks on 30,000-year-old megafauna fossils in Uruguay (Domínguez-Rodrigo et al., 2021; Fariña & Castilla, 2007; Fariña et al., 2014). This claim, if accepted, pushes the timing of human colonization of the Americas back by nearly 15,000 years and would overturn current paradigms. However, Suárez et al. (2014) question the methodology used to identify these BSM and, instead, suggest that a complete paradigm shift regarding the peopling of the Americas is unnecessary and unfounded (see also McPherron et al., 2022; Calder et al., 2022 on machine learning applications to BSM).

In North America, similar arguments surround evidence dating to the end of the Pleistocene, around 12,000 years ago, when 38 genera of large mammals, including proboscideans, giant ground sloths, saber-toothed cats, and others, went extinct across the continent (Grayson, 1991; Grayson et al., 2021; Heintzman et al., 2017; Koch & Barnosky, 2006; Meltzer, 2015). The ongoing debate on the causes of these global extinctions began in Europe (Grayson & Meltzer, 2003). In the 1960s, radiocarbon dating revealed that the North American extinctions had occurred at a time when humans were arriving in and dispersing throughout the continent. The seeming contemporaneity of North American large mammal disappearances and human appearance events led researchers to hypothesize causal linkages between them (Alroy, 2001; Martin, 1973; Mosimann & Martin, 1975; Nagaoka et al., 2018; Surovell & Waguespack, 2008). Evidence from the archaeological record suggests that some Late Pleistocene people may have hunted large mammals or scavenged their remains (Brunswig & Pitblado, 2007; Figgins, 1933; Grayson & Meltzer, 2002; Haury, 1956). At the time, evidence from these sites and others strongly suggested that Paleoindians hunted proboscidean megafauna, likely using spears tipped with the fluted projectile points that were found associated with the animals. In light of this evidence, researchers in the 1960s proposed the widely popular Overkill Hypothesis (Martin, 1966, 1967). According to the Overkill Hypothesis, similar human–animal associations were evidence of a predator–prey relationship between humans and proboscideans that had driven the megafauna to extinction. However, the Overkill Hypothesis has been difficult to evaluate because the material evidence for such a human-driven extinction event remains scarce.

Although several megafauna kill sites across Europe have yielded a wealth of archaeological evidence of human presence (Koch & Barnosky, 2006; Surovell & Waguespack, 2008; Surovell et al., 2005), the North American record consists of only 15 sites with secure evidence of human interaction (Grayson & Meltzer, 2015). In these cases, the presence of lithic evidence in clear association with large faunal remains has formed the basis for the assumption of a human–megafauna association (Grayson & Meltzer, 2002). According to Grayson and Meltzer (2015), of these kill/butchery/scavenging sites, 11 represent human-mammoth and two are human-mastodon associations (see Eren et al., 2021). Unfortunately, even at these sites, stratigraphic information indicating the location of secure artifacts in relation to proboscideans has at times been absent, increasing the difficulty of determining a clear human–animal association (Grayson & Meltzer, 2002, 2003, 2015). Alternatively, others argue that the number of systemic associations between humans and proboscideans more likely ranges between 17 and as many as 26 (Mackie & Randall, 2021). Regardless, to many, such a low frequency of proboscidean kill sites does not seem like an overabundant source of evidence and counters the Overkill argument (Grayson & Meltzer, 2015), while others contend that a low kill site frequency is exactly what archaeologists should expect (Martin, 1973; Mosimann & Martin, 1975; Surovell & Grund, 2012; see also Surovell & Waguespack, 2008).

In the absence of clear stratigraphic associations between hunting and butchery stone tools and faunal remains, researchers have used alternative lines of evidence such as BSM as trace evidence of human butchery (Barnosky et al., 2004: 72; Haynes, 2007: 86; Surovell et al., 2005: 6232). This type of evidence, however, has been contentious (Baquedano et al., 2012; Domínguez-Rodrigo et al., 2010; Haynes & Hutson, 2013; McPherron et al., 2010), largely due to the current practices of BSM identification and categorization, which rely on subjective assessment. To address this limitation, recent studies have turned to more objective and replicable approaches to identify BSM (e.g., Harris et al., 2017; Otárola-Castillo et al., 2018). This study provides an updated quantitative approach to improve researchers’ ability to identify BSM on proboscidean faunal remains and to evaluate arguments on the predator–prey relationship between humans and proboscideans (and other megafauna).

Identification of Bone Surface Modifications

The study of BSM has been an area of ongoing investigation by zooarchaeologists and taphonomists. Research has established that different taphonomic processes such as root etching, trampling, carnivore feeding, and human butchery leave distinct marks on the surface of the bone; however, different processes may also produce similar marks that are difficult to differentiate from one another (e.g., Behrensmeyer et al., 1986; Blumenschine, 1995; Bunn, 1981; Domínguez-Rodrigo et al., 2009; Fisher, 1995; Lyman, 1987; Olsen & Shipman, 1988; Shipman, 1981; Walker & Long, 1977; Yravedra et al., 2018). Researchers have historically relied on an experimental approach to differentiate BSM produced by different processes. Such studies generate experimental marks on bones, simulating a wide variety of processes and agents such as butchery using stone and metal tools (Greenfield, 1999; Potts & Shipman, 1981), carnivore modifications (Baquedano et al., 2012), and trampling marks (Domínguez-Rodrigo et al., 2009). Researchers then compare this experimental “known” set of marks to an “unknown” set (usually derived from fossil contexts), using the naked eye (Lyman, 1987; Potts & Shipman, 1981; Shipman, 1981). Such studies have yielded great insight into the surface topology of the marks unique to each BSM agent. For example, researchers have found that stone tools used for hunting and butchery leave distinct marks with internal striations on bone; at the same time, marks on bones subjected to large mammal and human trampling experiments generate a wide variety of shaped marks, including “V-shaped” striations, but are generally more “U-shaped,” i.e., wider, with a flatter bottom and internal striations (Domínguez-Rodrigo et al., 2009; Shipman, 1981).

The subjective nature of the methods used to identify BSM (Domínguez-Rodrigo et al., 2010; Lyman, 1987; Olsen & Shipman, 1988; Shipman, 1981), however, has led to disagreements across world archaeology, as detailed at the beginning of this “Background” section, including the archaeology of initial human dispersals into North America (Blumenschine, 1995; Bunn, 1981; Domínguez-Rodrigo et al., 2010; Grayson & Meltzer, 2002, 2015; Haynes & Hutson, 2013; Haynes & Krasinski, 2010; McPherron et al., 2010). In the overkill debate, for example, marks observed on the remains of the Hebior (J. M. Kramer et al., 1996), Schaefer (Overstreet et al., 1993; Joyce, 2006), and Mud Lake (Overstreet, 1996, 1998) mammoths, the Fenske mastodon (Overstreet, 1998), and the Firelands ground sloth (Megalonyx jeffersonii) were once considered to be butchery marks. These marks were held as evidence that humans had hunted and butchered these megafauna (Johnson, 2007; Redmond et al., 2012). However, recent studies have questioned whether the marks observed on the bones of these specimens are, in fact, marks resulting from butchery (see Grayson & Meltzer, 2002, 2015; Haynes, 2022; Haynes & Hutson, 2013 for further examples of contentiously identified BSM on megafauna remains). Instead, these researchers ask whether these and other potential marks of similar age are butchery mark “mimics” (Haynes & Krasinski, 2010), resulting from taphonomic agents such as animal trampling and carnivore modification, among others.

Much research has been conducted on the problem of butchery mark identification and how to differentiate such marks from others that occur naturally on bone, especially sediment abrasion marks due to trampling (Behrensmeyer et al., 1986; Domínguez-Rodrigo et al., 2009; Olsen & Shipman, 1988). However, most of this research has taken the form of visual recognition of nonmetric morphological features, with little focus on quantitative measurement. Blind test studies have demonstrated that these methods are robust when examining BSM from certain agents of modification but are not sufficient in all contexts (Blumenschine et al., 1996; Wong, 2010). In this procedure, an analyst identifies marks based on the visual patterning of their shape; then, two independent observers verify the identifications. Some taphonomists have criticized this method, citing the lack of an accepted protocol or diagnosis as problematic and arguing that subjective description and appraisal during analysis should be kept to a minimum (Domínguez-Rodrigo et al., 2009).

To reduce individual researcher bias, a few quantitative studies have combined metric and nonmetric characteristics of BSM (cut marks) in univariate and multivariate statistical models (e.g., Domínguez-Rodrigo et al., 2009; Johnson et al., 2013; Krasinski, 2018). Others have also measured 2D profiles of cut marks (e.g., Bello & Soligo, 2008). Moreover, some researchers have developed novel statistical and computationally intensive approaches to measure the 3D surfaces of BSM. These quantitative approaches aim to improve the reliability and replicability of BSM analysis by accurately discriminating among BSM that have resulted from different butchery behaviors, as well as marks created by non-human agents (Otárola-Castillo et al., 2018; Pante et al., 2017). Harris et al. (2017), for example, used Bayesian inference to synthesize both shape and trait data associated with individual BSM. In contrast to the subjective “expert approach,” Harris et al. (2017) provide a probabilistic solution wherein an agent is assigned to an individual BSM with an associated probability of assignment. Otárola-Castillo et al. (2018) created experimental cut marks using chert flakes held at two different angles. The researchers scanned the marks directly using a confocal microscope, cleaned the scans using MeshLab software, and conducted a landmark-based 3D GM analysis to compare the marks. Once the marks had been digitized, the researchers conducted a DFA that correctly identified the type of mark 88% of the time (Otárola-Castillo et al., 2018). These studies demonstrate the potential of 3D imaging to identify BSM more accurately than qualitative methods of classification.
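The probabilistic assignment described above can be illustrated with a deliberately simplified sketch. It is not the procedure used by Harris et al. (2017) or Otárola-Castillo et al. (2018); it assumes a single hypothetical shape score per mark (standing in for the multivariate shape variables a landmark-based analysis would produce), Gaussian group distributions with a pooled variance (the linear discriminant assumption), and equal prior probabilities. The agent names, scores, and function names are invented for illustration.

```python
import math
from statistics import mean


def pooled_variance(groups):
    """Pooled within-group variance, the shared-variance assumption of linear DFA."""
    sum_sq = 0.0
    n_total = 0
    for values in groups.values():
        m = mean(values)
        sum_sq += sum((v - m) ** 2 for v in values)
        n_total += len(values)
    return sum_sq / (n_total - len(groups))


def posterior_probabilities(x, groups, priors=None):
    """Return P(agent | x) for each agent via Gaussian likelihoods and Bayes' rule."""
    if priors is None:
        # Assumption: every candidate agent is equally likely a priori.
        priors = {agent: 1.0 / len(groups) for agent in groups}
    var = pooled_variance(groups)
    weighted = {}
    for agent, values in groups.items():
        m = mean(values)
        likelihood = math.exp(-((x - m) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        weighted[agent] = priors[agent] * likelihood
    total = sum(weighted.values())
    return {agent: w / total for agent, w in weighted.items()}


# Hypothetical shape scores for experimentally produced marks (not real data).
experimental = {
    "chopping": [1.8, 2.1, 2.4, 2.0, 2.2],
    "trampling": [0.4, 0.9, 0.7, 1.1, 0.6],
}

# Posterior agent probabilities for one hypothetical "unknown" mark.
unknown_mark = posterior_probabilities(1.9, experimental)
print(unknown_mark)
```

The output is a probability for each candidate agent rather than a single categorical label, which is the key difference from the expert approach. A mark whose score falls midway between the two group means receives a probability of 0.5 for each agent under equal priors.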

Studies of Chop Marks as Bone Surface Modifications and the Bowser Road Mastodon

Visual inspection of the BR mastodon’s skeletal elements identified numerous BSM. Due to the size and shape of the observed BSM, the original analysts attributed their creation to human butchery. Notably, chopping marks or chop marks were observed on several elements belonging to the axial and appendicular skeleton, linking the mastodon remains to human activity (Gramly, 2017: 307–308, Appendix V). It is possible that early Americans used stone tool chopping to butcher the BR mastodon; however, the resulting BSM from this action are understudied. Historically, taphonomically oriented zooarchaeologists have defined chop marks as BSM possessing a broad V-shaped cross-section resulting from sharp-force trauma inflicted during primary butchery activities, including disarticulation and the removal of bulk tissue (Gifford-Gonzalez, 2018: 290). After creating chop marks using an experimental stone chopper, Shipman (1981) described that the production of chop marks on bone occurred when “a stone artifact is used to strike a bone surface with a blow directed roughly perpendicular to the bone surface. Because the edge of the artifact is not drawn across the bone but is rather pushed in it, there are no [internal] striations,” a common feature associated with stone-tool-mediated butchery. This supports Walker and Long’s (1977) earlier description of experimental chop marks generated by metal axes as “primarily a product of compression forces applied during a relatively short time during which there is less lateral movement of the tool.” Shipman & Rose (1983) proposed that chop marks had greater breadth than cut marks, describing chop marks as “broad and V-shaped in cross-section; like punctures, they often show fragments of bones crushed inwards at the nadir. Unlike punctures, chopping marks are elongated ovals or grooves in outline.”

Similarly, Olsen (1988) experimentally generated chop marks on “fresh bone” using an obsidian chopper, describing them as relatively deep, broad, and V-shaped BSM. Along the same lines, Olsen & Shipman (1988) also described chop marks as having “a broad V shaped profile,” adding that they are “sometimes [observed] with parallel striations running vertically down” the marks. In the early 1990s, White (1992) synthesized some of the literature on chop marks available at the time (Cook, 1986; Voigt, 1983; Noe Nygaard, 1989; Binford, 1981; Flynn, 1976), concluding that chop marks presented little ambiguity in their identification. However, some ambiguity was apparent when distinguishing BSM that were the result of sharp force trauma delivered with the intent to “hack” or otherwise manipulate soft tissue from BSM that were the result of an attempt at “bone breakage,” for example, using a (sharp-edged) hammerstone to access the marrow. From White’s perspective, it seemed reasonable to propose that equifinality may play a role, with completely different behaviors producing similar BSM, such as those created by sharp-edged hammerstones and those produced by bifacially reduced stone choppers. We concur.

In the context of the BR mastodon, Gramly (2017) argues that several of the BSM observed on the mastodon are chop marks. He further infers that these resulted from sharp force trauma inflicted by bifacially reduced lithic hand axes to intensively butcher the mastodon, disarticulate its limbs (Gramly, 2017: 34) and “sever cartilage and tendons of a mastodon skeleton” (Gramly, 2017: 115). Gramly (2017) describes one stone chopper identified at the Bowser Road site as made from “quartzite (BR-14) with a width of 92 cm and a weight of 230.5 g. One edge is flat and thick, providing a comfortable grip or a regular surface for a stick or cord haft. Its cutting edge has been trimmed by many narrow removals superimposed upon step-fractures.” Gramly (2017) provides further examples of butchery tools (e.g., Figs. 79, 80, and 84) and chop marks.

Gramly (2017) describes that, axially, chop marks were identified near the margin of the right tusk, potentially indicating tusk removal behaviors. Nearly all vertebral elements seem to show evidence of chopping behaviors, with some vertebrae reduced to small fragments. Gramly (2017) hypothesizes that ancient butchers completely chopped and fragmented the atlas in order to access the nutrient-rich mastodon brain through the foramen magnum. While some rib bones are missing from the assemblage, nearly all rib bones excavated exhibited evidence of chop marks. Gramly (2017) proposes that these rib fragments may have been worked into flats or laths to function as tools.

The left scapula of the BR mastodon is another skeletal element present in the fossil assemblage with a large BSM inferred as a chopper mark. According to the investigators (Gramly, 2017: 64), this chopper mark likely originated from the removal of the mastodon’s left front limb. However, the left humerus and radius do not exhibit chopper marks. The left ulna and carpal bones do exhibit marks the excavators attributed to human chopping behaviors. The right forelimb exhibits a similar chop mark pattern as the left ulna. There are chop marks on nearly all skeletal elements of the left hindlimb. Although the majority of the right hindlimb is missing from the BR mastodon assemblage, the right foot bones also preserve evidence of chopping behavior.

These observations led Gramly (2017) to infer that “[t]he Bowser Road mastodon was intensively butchered; every surviving anatomical unit was affected to a high degree. Some limb bone fragments even show multiple chop-marks upon their interior surface (Fig. 76)—a fact suggesting that Clovis people may have partitioned mastodon bones for food or fuel just as Gravettian hunters did with mammoth bones.” Overall, the Bowser Road investigator identified extensive chopping behaviors on nearly all skeletal elements present in the assemblage, alongside further evidence of human butchery behavior through potentially burned rib fragments. Other taphonomic agents could also have modified the BR mastodon remains, as evidenced by BSM inferred to be carnivore tooth marks (Gramly, 2017: 81).

Materials and methods

The Bowser Road Mastodon

Excavators recovered the BR mastodon in Orange County, New York. In 2013, the landowners discovered the mastodon in a swampy agricultural field. Volunteers from the American Society for Amateur Archaeology excavated the site in 2014–2015. At the time of excavation, the fragmentary and disarticulated nature of the BR mastodon suggested to the excavators that the animal may not have died of natural causes. Researchers left the remains untreated to preserve the integrity of the surface modifications (Gramly, 2017). In 2017, the mastodon was donated to the Harvard Museum of Comparative Zoology (MCZ), where it is now housed (2017).

The BR mastodon was dated using collagen and bioapatite extracted from the right tusk (Gramly, 2017: 195). Pooling the radiocarbon dates results in an approximate age of 10,880 ± 32 RCBP (Table 1). This is a potentially significant find because it dates to a time when humans are known to have been present on the continent. Therefore, the death of this individual might have been caused by human predation. Based on a qualitative differential diagnosis of the morphological features of the BSM found on the specimens, investigators inferred that the BR mastodon was the product of human butchery (Gramly, 2017). Several stone artifacts that may have been used for butchery, including potential hammerstones, flakes, scrapers, choppers, and anvils, were also recovered; however, excavators found no stone tools at the site that were diagnostic of a specific archaeological culture. Gramly (2017) believes that many of the BSM are chop marks made by stone hand axes or hammerstones (Fig. 2).
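As an illustration of what pooling radiocarbon determinations can involve, the sketch below computes an inverse-variance weighted mean and its standard error, together with a chi-square-style consistency statistic. This is one common way to combine uncalibrated dates on the same event; the source does not state which pooling procedure was used, so the method is an assumption here. The two input dates are hypothetical round numbers and are not the values in Table 1.

```python
import math


def pool_radiocarbon_dates(dates):
    """Combine (age, sigma) pairs with inverse-variance weights.

    Returns the pooled age, its standard error, and a consistency
    statistic T (sum of squared standardized deviations), which can be
    compared to a chi-square distribution with len(dates) - 1 degrees
    of freedom.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in dates]
    pooled_age = sum(w * age for w, (age, _) in zip(weights, dates)) / sum(weights)
    pooled_sigma = math.sqrt(1.0 / sum(weights))
    t_stat = sum(((age - pooled_age) / sigma) ** 2 for age, sigma in dates)
    return pooled_age, pooled_sigma, t_stat


# Hypothetical determinations in radiocarbon years BP (not the Table 1 values).
hypothetical_dates = [(10900.0, 50.0), (10850.0, 50.0)]

age, sigma, t_stat = pool_radiocarbon_dates(hypothetical_dates)
print(f"Pooled age: {age:.0f} +/- {sigma:.0f} RCBP, T = {t_stat:.2f}")
```

Because more precise determinations receive larger weights, the pooled error is smaller than any single input error. The pooled value should only be reported if the consistency statistic suggests the determinations plausibly date the same event.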

Table 1 Radiocarbon dates for the Bowser Road mastodon (Gramly, 2017), calibrated using IntCal20 (Reimer et al., 2020) in OxCal 4.4 (Bronk Ramsey, 2009)

Fig. 2 Sample of BSM images inferred as chop marks on the Bowser Road mastodon (Gramly, 2017). A BR-39 (atlas; posterior face), B BR-36 (right calcaneus, upper dorsal), C BR-250 (right radius, proximal articular surface), and D BR-40 (left femur, lateral side of medial condyle)

However, in cases where there is a lack of diagnostic material culture (e.g., a Clovis point) to increase the certainty of human involvement in the demise of megafauna, evidence may need to be derived from sites “where artifacts were found intimately associated with bones and/or teeth […] or the presence of compelling evidence of cut marks or human-caused bone breakage […] or cases of clear spatial manipulation of skeletal elements” (Grayson & Meltzer, 2002). Especially given that the faunal assemblage was recovered from a swampy agricultural area, the unsteady ground and use of heavy machinery might have caused the bones to move and incur marks (Gramly, 2017: 31–32). This raises questions regarding the interpretation that humans caused the BSM on the BR mastodon. Ultimately, the need to clarify the association between BSM and lithic artifacts makes the BR mastodon a good candidate for further BSM analyses.

In December 2017, Gramly invited one of us (EOC) to conduct BSM analyses on the Bowser Road assemblage, using the same approach as Otárola-Castillo et al. (2018). EOC and EJS traveled to the MCZ to observe the assemblage. At the time, however, not all specimens had been processed and catalogued by the MCZ. As such, we requested access to the mastodon’s appendicular skeleton, where some of the most notable BSM were reported. We also observed other skeletal elements. With the aid of magnification (× 40), we observed accessible specimens and recorded the presence and absence of BSM, including potential carnivore tooth marks, root etching, trampling marks, cut marks, and breakage (S. J. Olsen, 1979). Additionally, we photographed large or ambiguous marks for further study. Furthermore, as detailed in the “Digitization of Bone Surface Modifications” section, to conduct a quantitative hypothesis test similar to that of Otárola-Castillo et al. (2018), we recorded the BSM from the BR mastodon by creating a cast of each mark (Table 5).

Comparative Sample

Given the size and shape of the BSM found on the BR mastodon, the excavators posited that Clovis-age hunter-gatherers had butchered the animal using stone hand axes and “hand-held” choppers (Gramly, 2017: 115). To test this “hand-chopping” hypothesis, we conducted an experimental comparative analysis of the BSM observed on the BR mastodon. We designed a series of chopping experiments to generate BSM on fully fleshed animal bones under controlled conditions. We constructed a device, hereinafter referred to as “Arnold” or “the chopper,” that swings a stone chopper with controlled force (Fig. 3). We secured the stone chopper between the boards of the arm, with two padded bolts positioned above the chopper to keep it in place and mitigate damage to the tool (Fig. 3). We also connected a stainless-steel spring to each side of the arm to create vertical resistance when raising the arm. We equipped the chopper with an electric winch that was powerful enough to lift the arm under spring tension and a release mechanism to enable the spring tension to pull the arm downward. This system allowed us to control and systematically vary the amount of force.

Fig. 3 “Arnold” the force-calibrated chopper shown from two angles. Experimenters added a power winch and casters for ease of chopping function and mobility

To introduce experimental control, we calibrated the amount of force delivered by the chopper to replicate the force generated by human chopping. We created experimental axes consisting of expediently flintknapped bifaces (see raw material description below). We conducted two preliminary experiments for calibration. In the first, we attached an experimental stone axe to a 2-in × 2-in × 4-ft square of unfinished pine board and used this hafted axe to deliver blows on a force plate. In the second, we delivered blows to the force plate using a handheld stone axe to simulate hand-chopping. The plate recorded the force exerted with each hit and thus measured the force generated by both the handheld and hafted choppers. We held the hafted axe with one hand and delivered 10 blows from a standing position to a three-foot-tall surface. The average force from these trials was approximately 1600 N. Holding the choppers by hand alone, we then delivered 10 blows from a kneeling butchery position. The average force from these trials was approximately 100 N. These measurements enabled us to calibrate the chopper by systematically measuring the amount of force it delivered to the force plate as a function of arm height.
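The calibration idea can be sketched as a simple interpolation: given a table of arm heights and the mean force each height produced on the plate, look up the height expected to deliver a target force. The height and force pairs below are invented placeholders, not measurements from this study; they were chosen only so that the roughly 1600 N hafted-axe force falls at the 30 cm arm height reported for the control condition. The function name `height_for_force` is likewise hypothetical.

```python
# Hypothetical calibration table: arm height (cm) vs. mean force on the plate (N).
# These pairs are placeholders for illustration, NOT measurements from the study.
heights_cm = [10.0, 20.0, 30.0, 40.0]
forces_n = [500.0, 1050.0, 1600.0, 2150.0]


def height_for_force(target_n, heights, forces):
    """Linearly interpolate the arm height expected to deliver target_n newtons."""
    pairs = sorted(zip(forces, heights))
    if not pairs[0][0] <= target_n <= pairs[-1][0]:
        raise ValueError("target force lies outside the calibrated range")
    # Walk consecutive calibration points until the target is bracketed.
    for (f0, h0), (f1, h1) in zip(pairs, pairs[1:]):
        if f0 <= target_n <= f1:
            return h0 + (h1 - h0) * (target_n - f0) / (f1 - f0)


# Height needed to reproduce the approximate hafted-axe force from the text.
hafted_height = height_for_force(1600.0, heights_cm, forces_n)
```

A target outside the calibrated range raises an error rather than extrapolating, which seems the safer choice when the force response of a spring-loaded arm is not known to be linear.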

To evaluate the hand-chopping hypothesis, we set the chopper to deliver the amount of force generated by the hafted stone axe. We used this chopping method as our control hypothesis. This method swings a chopper with greater force than hand-chopping and thus causes more damage to the bone. If the damage inflicted on bone by the control was more extensive than that observed in the mastodon BSM, we would lower the force delivered and experiment again. However, if the control method resulted in less extensive bone damage than the mastodon BSM, then neither chopping method likely caused the damage, and both hypotheses may be discarded. We delivered the amount of force generated by the control hypothesis by raising the chopper arm to a height of 30 cm (see short video https://bit.ly/33XXLEV).

To replicate the technology of North American butchers 12,000 years ago, we obtained Texas Flint from Craig Ratzat, who sources material from the Fredericksburg, Texas area (www.neolithics.com). Expert flintknappers crafted 10 bifacially reduced stone choppers for this experiment (Eren and Shea; Fig. 4). The experimental choppers were modeled on stone choppers excavated from early Paleoindian sites such as the Gault site and Topper site (Smallwood, 2006, 2010). Each flintknapper created five stone choppers. Individual stone tools were attached to the chopper’s arm to deliver chopping blows on three cow forelimbs with all meat still covering the bone (specimens 1–3, Fig. 5).

Fig. 4

Bifacially reduced lithic hand axes (stone choppers) created by experienced flintknappers (M. I. Eren and J. J. Shea)

Fig. 5

Specimen #1 in the chopper ready for experiment (experimenters from left to right EOC, EJS, and JBR)

Cow limbs were obtained from the Purdue Butcher Block facility. Specimen one was a right forelimb from a six-month-old calf, and specimens two and three were the forelimbs of a mature cow. The skin was removed but the limbs were otherwise intact. We placed each complete forelimb specimen beneath the chopper’s arm, ensuring that each strike was aimed directly on top of the specimen, where bones are most likely located, to maximize the probability of tool-to-bone contact. Beginning with the first tools from each knapper, we delivered chopping blows on the distal ends of the limbs (near the carpals)—one from each on both the medial and lateral side. We moved toward the proximal end of the limbs and switched to a new tool every 10 cm, which resulted in two chops per tool per leg for a total of 40 blows and potential chop marks. Before each experimental chop, we ensured the forelimbs were stable but loose enough on the chopping machine’s surface to allow bone kickback and the choppers to slip across the bone. Specimen one was a test subject; it received nine chopping blows using two different stone tools created by each flintknapper. In total, the chopper delivered 49 blows to the three specimens (Table 2).

Table 2 Basic description of faunal specimens used for experimentation, including specimen number, skeletal elements, side, and number of chopper blows delivered to each specimen

After completing the experimental chop marks, we removed the soft tissue from the cow forelimbs using expedient stone tools made from chert (unretouched flakes and blades). We left soft tissue around the areas with chop marks so as not to confuse the two types of BSM. To closely examine the experimental BSM, we removed the remaining soft tissue and grease from the cow bones by maceration. We simmered each bone in a large slow cooker, checking every couple of hours until the soft tissue fell away. We removed the remaining grease with dish soap and warm water (King & Birch, 2015) and allowed the bones to dry for several days (Fig. 6).

Fig. 6

Experimental assemblages 2 and 3 after maceration, grease removal, and drying

We followed the aforementioned procedures to examine the experimental BSM. We distinguished between chop marks produced by the chopper and cut marks from butchery based on their location. The resulting chop marks on each specimen (Fig. 7) were recorded, labeled, photographed, and prepared for casting and 3D scanning.

Fig. 7

Sample of experimental chop marks on a cow humerus (LCA-1-HM-R) created using a stone hand axe and the calibrated chopper

Digitization of Bone Surface Modifications

We recorded the BR mastodon BSM and the experimental BSM by creating a cast of each mark (a “peel”) using the Struers RepliSet, a non-destructive system for replicating 3D structures at sub-micron resolution (0.2 µm, Fig. 8). The RepliSet system consists of a handheld dispensing gun used to apply a flexible, silicone-based compound that captures surface microstructure down to the sub-micron scale, including the BSM. After setting for several minutes, the compound can be removed as a peel without damaging the underlying bone. This peel results in a precise 3D representation of the BSM, from which digital 3D models can be created. The peels of the Bowser Road mastodon BSM were scanned using a Keyence VR 3200 3D microscope, which has a maximum resolution of 0.1 µm, while the peels of the experimental BSM were scanned using a Nanovea Jr100 profilometer, which has a maximum resolution of 5 µm.

Fig. 8

Peel of chop mark (CH-1) from a cow humerus (LCA-1-HM-R)

3D Model Creation and Cleaning

To comparatively study the chop mark morphologies, we used previously established protocols (Otárola-Castillo et al., 2018). The release of a new version of the open-source software MeshLab (version 2020.07) required specific parameters to be updated in order to model 3D surfaces from point cloud data. For example, to compute the geometric normals of each point set, we used 5 and 8 nearest neighbors for the BR mastodon and experimental chopper marks, respectively, with no smoothing iterations (Cignoni et al., 2008). We modeled the surface mesh with the following parameters: reconstruction depth = 11, adaptive octree depth = 5, conjugate gradient depth = 0, scale factor = 1.1, minimum number of samples = 1.2, and Gauss–Seidel relaxations = 8. This protocol results in a clean, smooth 3D mesh model that can be exported in several standard 3D file formats. We then isolated each mark from the surrounding bone by visual selection and prepared each mark for morphometric analysis (Fig. 9).
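For readers unfamiliar with neighbor-based normal estimation, the general idea can be sketched as a local principal component analysis: the normal at a point is taken as the direction of least spread among its k nearest neighbors. The NumPy version below is an illustrative stand-in under that assumption, not MeshLab's implementation. The function name `estimate_normals` and the flat synthetic patch are hypothetical.

```python
import numpy as np


def estimate_normals(points, k=8):
    """Estimate a unit normal per point from its k nearest neighbours (local PCA)."""
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances; fine for small demo clouds only.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    neighbour_idx = np.argsort(d2, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    normals = np.empty_like(points)
    for i, idx in enumerate(neighbour_idx):
        patch = points[idx] - points[idx].mean(axis=0)
        # eigh sorts eigenvalues in ascending order, so column 0 is the
        # direction along which the neighbourhood varies least.
        _, vecs = np.linalg.eigh(patch.T @ patch)
        normals[i] = vecs[:, 0]
    return normals


# Synthetic flat patch in the z = 0 plane; its normals should point along +/- z.
rng = np.random.default_rng(0)
flat_patch = np.column_stack([rng.uniform(size=200), rng.uniform(size=200), np.zeros(200)])
normals = estimate_normals(flat_patch, k=8)
```

The sign of each normal is arbitrary in this sketch. Orienting normals consistently is a separate step that surface reconstruction tools usually handle themselves.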

Fig. 9

Illustration of BSM digitizing process beginning with the original digitized chopper mark LCA-1-HM-R-CH-1 (A) and mark isolation (B). Black point cloud represents the complete specimen (C), while the red point cloud is the specimen after k-means downsampling (D). The template point cloud (blue) is then translated and rotated (D) and warped onto the downsampled specimen using the Thin Plate Spline (E) to ensure point correspondence and generate the final specimen used in the analysis (F)

Each digitized chop mark point cloud generated hundreds of thousands of 3D points. To make the analysis manageable, we reduced the size of each mesh, while maintaining a large enough number of corresponding points (see the “Geometric Morphometric Results” section) to ensure that the morphological variability was well quantified (e.g., Gunz & Mitteroecker, 2013; Gunz et al., 2005; Mitteroecker & Gunz, 2009; Serb et al., 2011). To accomplish this, we downsampled the specimens and prepared them for morphometric analysis using protocols similar to those established in human evolutionary studies (e.g., Gunz & Mitteroecker, 2013; Gunz et al., 2005; Fig. 9).

Following Otárola-Castillo et al. (2018), we used the R programming language to conduct morphometric analyses. Each digitized mark was downsampled to 1152 points. Unlike Otárola-Castillo et al. (2018), however, we used a k-means algorithm to conduct downsampling. The k-means procedure allowed us to obtain a more even coverage of points across the morphology of each mark. The downsampled points were matched to a template to ensure point correspondence between each specimen (Fig. 9; Otárola-Castillo et al., 2018). We inspected the fit of the template to each point cloud and its subsequent downsampling to ensure that no anomalies had occurred (twisting, turning, etc.). Our detailed visual inspection indicated that the morphologies of all downsampled specimens were representative of the form of the original specimens. The robustness of the match between the template and specimens, and thus the accuracy of the specimen point sampling, depends on the number of alignment points used for matching (see Otárola-Castillo et al., 2018: 59–60). The higher the number of alignment points, the closer the match between template and specimen.
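A minimal sketch of k-means downsampling may help show why it yields even coverage: each retained point is the centroid of a compact cluster of original points, so dense and sparse regions of the scan are both represented. The analysis itself was run in R, so the NumPy code below is only an illustration using plain Lloyd iterations. The function name, the synthetic cloud, and k = 64 (instead of the 1152 points used in the study) are assumptions made to keep the example small.

```python
import numpy as np


def kmeans_downsample(points, k, n_iter=20, seed=0):
    """Reduce a point cloud to k centroids with plain Lloyd's k-means."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct points drawn from the cloud (fancy indexing copies).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


rng = np.random.default_rng(1)
cloud = rng.normal(size=(5000, 3))  # synthetic stand-in for a scanned mark
downsampled = kmeans_downsample(cloud, k=64)
```

Note that the centroids are averages and therefore need not coincide with any original scan point, which is one reason a visual check of the downsampled form against the original remains useful.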

Statistical Analyses

We followed established protocols to conduct all analyses (details can be found in Otárola-Castillo et al., 2018). To study the shape of the marks, we completed landmark-based GM using the software package geomorph (version 3.3.2) in R (Adams & Otárola‐Castillo, 2013; Otárola-Castillo et al., 2018). GM combines the geometric information embedded within the spatial position of landmarks with the generalized Procrustes analysis (GPA) to examine the shape of an object. We conducted a complete 3D geometric morphometric analysis using the sliding semilandmark method. This method minimizes the introduction of artificial variation by allowing non-homologous points to mathematically “slide” along curves and surfaces during GPA (e.g., Gunz & Mitteroecker, 2013; Gunz et al., 2012; Mitteroecker & Gunz, 2009; Otárola-Castillo et al., 2018). We used a generalized Procrustes superimposition (Rohlf & Slice, 1990) to align all the specimens and yield Procrustes shape coordinates for the subsequent analyses (Bookstein et al., 1999; Mitteroecker & Bookstein, 2008; Mitteroecker et al., 2004).
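The core of a generalized Procrustes superimposition can be sketched in a few lines: center each configuration, scale it to unit centroid size, and rotate it repeatedly onto the evolving mean shape. The NumPy version below is a simplified illustration, not the geomorph implementation. It omits the sliding of semilandmarks entirely, and the function names and synthetic shapes are hypothetical.

```python
import numpy as np


def center_scale(shape):
    """Remove translation and scale: centre on the origin, unit centroid size."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)


def rotate_onto(shape, target):
    """Least-squares rotation of shape onto target (orthogonal Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(shape.T @ target)
    if np.linalg.det(u @ vt) < 0:  # forbid reflections
        u[:, -1] *= -1
    return shape @ (u @ vt)


def gpa(shapes, n_iter=10):
    """Iteratively align all shapes to their evolving mean (consensus) shape."""
    aligned = np.array([center_scale(s) for s in shapes])
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = np.array([rotate_onto(s, mean) for s in aligned])
        mean = center_scale(aligned.mean(axis=0))
    return aligned, mean


# Two copies of one synthetic shape that differ only in position, size, and orientation.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation
moved = 2.5 * base @ q + np.array([4.0, -1.0, 3.0])
aligned, consensus = gpa([base, moved])
```

Because the two inputs differ only in translation, scale, and rotation, their aligned coordinates coincide. Any residual difference after superimposition is, by construction, a difference in shape.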

Because the number of shape variables (1152) exceeded the number of specimens, we used nonparametric multivariate analysis of variance (NP MANOVA) to test the hypothesis that the marks would differ in overall shape (Adams & Otárola‐Castillo, 2013; Dines et al., 2014; Otárola-Castillo et al., 2018; Schultz et al., 2017; Serb et al., 2011). We used this approach to compute the Euclidean Procrustes distances between individual marks and groups in “morphospace” and to compare between-group variation to within-group variation (for details, see M. J. Anderson, 2001). Statistical significance was determined using 10,000 permutations. Due to the high dimensionality of these data, we conducted principal component analysis (PCA) to visualize patterns of variation within and among marks.
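The logic of a permutation-based MANOVA can be sketched as follows: compute a pseudo-F ratio of between-group to within-group sums of squares, then shuffle the group labels many times to see how often chance alone produces a ratio at least as large. For Euclidean distances, the sums of squares about group centroids used here are equivalent to the distance-based formulation. The NumPy code is an illustration with synthetic data and a reduced number of permutations. It is not the geomorph routine used in the study, and all names are hypothetical.

```python
import numpy as np


def pseudo_f(x, labels):
    """Between-group over within-group sum of squares, scaled by degrees of freedom."""
    groups = np.unique(labels)
    n, a = len(x), len(groups)
    ss_total = ((x - x.mean(axis=0)) ** 2).sum()
    ss_within = sum(((x[labels == g] - x[labels == g].mean(axis=0)) ** 2).sum() for g in groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permutation_manova(x, labels, n_perm=999, seed=0):
    """P value = share of label shufflings with F at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(x, labels)
    hits = sum(pseudo_f(x, rng.permutation(labels)) >= observed for _ in range(n_perm))
    # The observed arrangement counts as one permutation, hence the +1 terms.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: two groups of 20 and 25 "marks" with shifted means.
rng = np.random.default_rng(42)
labels = np.repeat(["chopper", "unknown"], [20, 25])
x = np.vstack([rng.normal(size=(20, 6)), rng.normal(loc=1.5, size=(25, 6))])
f_stat, p_value = permutation_manova(x, labels)
```

Because the test relies only on relabeling, it makes no demand that the number of specimens exceed the number of variables, which is the property that motivates its use for high-dimensional shape data.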

Sample Size and Statistical Power

One concern in statistical testing is whether we can detect that sampled observations come from different populations. This means ensuring enough statistical power to recognize whether our BSM samples belong to different populations of taphonomic agents. If a study is not sufficiently powered, statistical analyses may fail to detect differences between samples created by different populations, even though those differences exist. To avoid these errors in our BSM study, we wrote R routines to conduct power analyses. Our R code computes the per-group sample size needed to detect whether the BSM in this study were created by different populations of taphonomic agents while avoiding spurious effects (false alarms, e.g., Kramer et al., 2016).

To characterize the patterning of the shape data, we estimated the independent variance–covariance matrices (VCV) of the Bowser Rd. and chopper data using the PCA results. To assess power, we incrementally sampled random multivariate variables from each VCV. Samples ranged between 5 and 100 for each group in increments of 5. We repeated this procedure 1000 times for each sample size increment. This resulted in 20,000 dataset replicates of variable sample sizes.

We evaluated the probability of the replicate datasets at each sample size to detect differences in the data structure by conducting an NP MANOVA and recording the resulting p-values. In other words, we wanted to know, on average, what sample size allows us to detect a difference between BSM groups (alpha = 0.05) if one were present.
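The simulation-based power analysis can be sketched as a loop over candidate sample sizes: draw two groups from multivariate normal distributions with the estimated means and covariance matrices, run the permutation test, and record how often it reaches significance. The Python code below is a scaled-down illustration, not the authors' R routines. The means, identity covariance matrices, sample sizes, and replicate counts are placeholders chosen so the example runs quickly, and all function names are hypothetical.

```python
import numpy as np


def pseudo_f(x, labels):
    """Two-group pseudo-F: between-group vs. within-group sum of squares."""
    ss_total = ((x - x.mean(axis=0)) ** 2).sum()
    ss_within = sum(((x[labels == g] - x[labels == g].mean(axis=0)) ** 2).sum() for g in (0, 1))
    return (ss_total - ss_within) / (ss_within / (len(x) - 2))


def perm_p(x, labels, rng, n_perm=99):
    """Permutation P value for the pseudo-F statistic."""
    observed = pseudo_f(x, labels)
    hits = sum(pseudo_f(x, rng.permutation(labels)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def power_by_sample_size(mean_a, cov_a, mean_b, cov_b, sizes, n_rep=30, alpha=0.05, seed=0):
    """Share of simulated datasets with p < alpha at each per-group sample size."""
    rng = np.random.default_rng(seed)
    power = {}
    for n in sizes:
        labels = np.repeat([0, 1], n)
        significant = 0
        for _ in range(n_rep):
            x = np.vstack([rng.multivariate_normal(mean_a, cov_a, size=n),
                           rng.multivariate_normal(mean_b, cov_b, size=n)])
            significant += int(perm_p(x, labels, rng) < alpha)
        power[n] = significant / n_rep
    return power


# Placeholder inputs: 4 variables, identity covariance, unit shift in every variable.
dims = 4
power = power_by_sample_size(np.zeros(dims), np.eye(dims), np.ones(dims), np.eye(dims),
                             sizes=[5, 10, 15])
```

Under the criterion described in the text, a sample size would be judged adequate once the recorded share of significant tests reaches 0.95.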

Discriminant Function and Probability

We conducted discriminant function analysis (DFA) to obtain a probabilistic function that could differentiate between the two groups (BR mastodon [unknown] and chopper). DFA constructs a model in which the input variables are combined into a linear function. For this study, the DFA used prior probabilities to compute the Bayesian posterior probability of each mark belonging to the respective agent (for mathematical details, see Baxter, 2003; Legendre & Legendre, 2012: 673–690; Otárola-Castillo & Torquato, 2018; Otárola-Castillo et al., 2022). Group assignment of each specimen was computed using the “jackknife,” also known as the “leave-one-out,” method. This method systematically excludes one specimen at a time and then constructs the discriminant functions to be used in a Bayesian model. The model is then used to predict the group assignment of the specimen excluded from model building. Adding a prior probability, the model computes the posterior probability of a specimen belonging to either group (Harris et al., 2017). With limited information about the marks, we chose to model our prior probabilities as non-informative uniform probabilities. Due to the high dimensionality of the data, we used the principal components that explained most variation, together with the normalized centroid size, as the variables in the DFA. To find these components, we conducted a null principal components comparison to identify which PC axes significantly contributed to the interpretation of shape variability between chop and mastodon marks (see supplementary materials and code for more details). The DFA was conducted using the MASS package (Venables & Ripley, 2002) in the R computing environment (R Core Team, 2021). This approach is similar to that of Harris et al. (2017) in that it assigns a probability of agency to a given mark, rather than returning a discrete result of “yes” or “no,” as has been the norm in all prior BSM analyses.
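The leave-one-out scheme with uniform priors can be illustrated with a small two-group linear discriminant model: for each specimen, the group means and pooled covariance are estimated without it, and Bayes' rule then gives the posterior probability of each group for the held-out specimen. The study used the MASS package in R, so the NumPy code below is a stand-in that assumes Gaussian groups with a shared covariance matrix. The synthetic data and function names are hypothetical.

```python
import numpy as np


def lda_posteriors(train_x, train_y, test_x, priors=(0.5, 0.5)):
    """Posterior probability of each class for test_x under a two-class LDA."""
    classes = np.unique(train_y)
    means = [train_x[train_y == c].mean(axis=0) for c in classes]
    # Pooled within-group covariance shared by both classes.
    resid = np.vstack([train_x[train_y == c] - m for c, m in zip(classes, means)])
    inv = np.linalg.inv(resid.T @ resid / (len(train_x) - len(classes)))
    # Log prior plus Gaussian log likelihood (up to a constant shared by both classes).
    log_post = np.array([
        np.log(p) - 0.5 * np.einsum("ij,jk,ik->i", test_x - m, inv, test_x - m)
        for p, m in zip(priors, means)
    ]).T
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def leave_one_out(x, y):
    """Refit the model without each specimen, then score that specimen."""
    post = np.empty((len(x), 2))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        post[i] = lda_posteriors(x[keep], y[keep], x[i:i + 1])[0]
    return post


# Synthetic example: 20 marks in group 0 and 25 in group 1, three variables each.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], [20, 25])
x = np.vstack([rng.normal(size=(20, 3)), rng.normal(loc=3.0, size=(25, 3))])
posteriors = leave_one_out(x, y)
accuracy = (posteriors.argmax(axis=1) == y).mean()
```

Each row of `posteriors` sums to one, so a mark is reported with a graded probability of belonging to each agent rather than a yes or no label.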

Null Range of High DFA Probability

Because the DFA approach computes the probabilities of multiple marks (45) based on high-dimensional data, there is room for classification error and for high probabilities assigned by random chance alone. We therefore wanted to know how often to expect the DFA to assign high probability scores to BR BSM due to random chance alone. We defined high probability scores as P > 0.90, or probabilities greater than 90%. We then conducted an analysis to calculate the null range, that is, the range of expected high probabilities caused by random chance alone. To accomplish this, we simulated completely random data following the mean and standard deviations of the BR and chopper shape variables’ PCA. We then input these data into a DFA to observe how frequently such random data result in high probability classifications. This approach allowed us to compute an expected null distribution for the number of high probability values resulting from random data alone. We compared this null distribution against the DFA classification results for the chopper and BR marks. Details and code to replicate the analysis are in the supplementary materials (S2).
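The null-range idea can be sketched by repeatedly generating data with no group structure, running a leave-one-out discriminant analysis, and counting how many specimens nonetheless receive a posterior above 0.90. The code below is a rough illustration, not the supplementary code (S2). It assumes standard normal variables rather than the PCA-derived means and standard deviations, 11 variables (10 PC axes plus centroid size), 100 simulations, and a count based on the larger of the two class posteriors. All of these choices are assumptions made for the example.

```python
import numpy as np


def loo_max_posteriors(x, y):
    """Leave-one-out two-group LDA; return each held-out specimen's larger posterior."""
    out = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        xt, yt = x[keep], y[keep]
        m0, m1 = xt[yt == 0].mean(axis=0), xt[yt == 1].mean(axis=0)
        resid = np.vstack([xt[yt == 0] - m0, xt[yt == 1] - m1])
        inv = np.linalg.inv(resid.T @ resid / (len(xt) - 2))
        # With equal priors the log posterior odds are linear in x.
        log_odds = (x[i] - (m0 + m1) / 2) @ inv @ (m1 - m0)
        p1 = 1.0 / (1.0 + np.exp(-log_odds))
        out[i] = max(p1, 1.0 - p1)
    return out


def null_high_counts(n_a=20, n_b=25, n_vars=11, n_sim=100, cutoff=0.90, seed=0):
    """Count posteriors above cutoff when the data carry no group signal at all."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], [n_a, n_b])
    counts = []
    for _ in range(n_sim):
        x = rng.normal(size=(n_a + n_b, n_vars))  # pure noise, arbitrary labels
        counts.append(int((loo_max_posteriors(x, y) > cutoff).sum()))
    return np.array(counts)


counts = null_high_counts()
median_count = float(np.median(counts))
```

Summaries of `counts`, such as its median or a central interval, play the role of the null range: observed counts of high-probability classifications that fall inside it cannot be distinguished from chance under these assumptions.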

Results

Bowser Road Mastodon

Our observation of the BR remains at the MCZ focused on obtaining high-quality peels of the chop marks identified on the assemblage. We observed multiple types of BSM on the specimens to which we had access. Some of the bones had clear root etching on their surface, combined with the presence of small roots within the cancellous bone (Binford, 1981). We did not observe any pieces available in the collection with obvious signs of burning, and none exhibited marks traditionally associated with trampling or carnivore modification (Blumenschine, 1995; Domínguez-Rodrigo et al., 2009; Olsen & Shipman, 1988). Most of the mastodon was represented in the collection, although much of it was fragmentary. While we were at the MCZ, we collected a total of 25 peels from the BSM we observed on the mastodon bones (Fig. 2). These were compared to the experimentally generated chop marks.

Experimental Chopping

The experimental butchery was conducted over three days (10/25/2018, 02/27/2019, and 03/06/2019). We focused on a single specimen at a time. After each experiment, we cleaned the bones and counted the number of blows delivered by the calibrated chopper that resulted in identifiable chop marks. Although some of the chopper blows left observable marks on the bones, Table 3 details that only 41% of blows delivered by the chopper resulted in observable chop marks on the cow bones.

Table 3 Number of experimental chopper marks observed on each specimen following experimentation and cleaning, compared to the number of blows received

Modeling

Following our procedures, the modeling of the 3D digital specimens generated consistent results. Figure 9 shows the specimens before and after mesh cleaning and modeling. A detailed visual inspection of individual point clouds indicated that our approach was satisfactory. Visually, our procedure did not affect specimen shape; rather, it eliminated point cloud elements believed to have resulted from digitizing error. Such errors can be introduced by lighting conditions, specimen cleanliness, or instrument sensitivity.

Results of Downsampling

We inspected the fit of the template to each point cloud and its subsequent downsampling to ensure that no anomalies had occurred (twisting, turning, etc.). The downsampled points were matched to each specimen to ensure correspondence between points (Fig. 9; Otárola-Castillo et al., 2018). Our detailed visual inspection indicated that the morphologies of all downsampled specimens were representative of the morphologies of the originals. As explained in Otárola-Castillo et al. (2018: 59–60), the degree of fidelity between the template and specimens depends on the degree of shape variation of the objects under analysis and the number of alignment points used for matching.

Geometric Morphometric Results

Chopper and mastodon BSM were aligned and compared using GPA. Figure 10 represents the superimposed GM results of the 20 chopper marks and the 25 BR mastodon BSM. Figure 11 summarizes the mean shape of the 1152 landmarks recorded from each of the 45 BSM analyzed.

Fig. 10

Sample of all marks after the generalized Procrustes superimposition analysis (GPA) to adjust for translation, rotation, and scale

Fig. 11

Following GPA, marks can be averaged into a consensus, to view central tendencies and surrounding variation

Statistical Results

Sample Size and Power Analysis

We determined appropriate power at the point when 95% of the P values resulting from NP MANOVA analyses were less than 0.05, that is, below the red dashed line illustrated in Fig. 12. This figure shows that when the per-group sample size is 5 or 10 BSM, the 95% range of tests produces some P values that exceed the 0.05 cutoff. At sample sizes of 15 per group, however, 95% of the P values fall below 0.05. Thus, a per-group sample size of 15 is appropriate to identify the observed differences in shape data between chopper and mastodon marks.

Fig. 12

Plot of power analysis. Red dashed line highlights p-value of 0.05. An appropriate power level occurs when the 95% distribution of the p-value is below the red line. This illustration shows that the per group sample size of at least 15 specimens is appropriate to detect a signal if one is present

Hypothesis Tests

The NP MANOVA indicated a statistically significant difference in shape between chop marks and the BR mastodon BSM (F = 4.2763; P value = 0.0001). On average, the BR BSM were larger than the chopper BSM (log centroid size: BR = 4.53, SE = 0.13; chopper = 4.45, SE = 0.08). However, a two-sample Welch t-test of the log of the centroid size showed no statistically significant difference between the experimental chopper and archaeological BSM (t = 0.56, df = 39.85, P value = 0.5803). Overall, these results strongly suggest that the chopper tool marks and the BR mastodon BSM derive from different populations.

Principal Component Analysis

The PCA plots in Fig. 13 depict the relationship between the mastodon and experimental marks. The first three PC axes explain over 38% of the total shape variation in the data. The null PCA determined that PC axes 1 to 10 rise above random noise and are therefore the most important for representing and differentiating chop and mastodon marks (Figure S1.2). These 10 PC axes explain over 68% of the variation in shape data.
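Percentages of explained variation such as those quoted above are typically obtained from the eigenvalues of the PCA: each axis's share is its variance divided by the total, and cumulative sums give figures like "the first three axes" or "the first ten axes". The NumPy sketch below illustrates the computation on synthetic data. It is not the study's analysis, and the data and function name are hypothetical.

```python
import numpy as np


def pca_explained(x):
    """Return PC scores and the share of total variance carried by each axis."""
    centered = x - x.mean(axis=0)
    # SVD of the centred data; singular values come out sorted from largest to smallest.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (len(x) - 1)
    return u * s, variance / variance.sum()


# Synthetic data: 45 specimens, 60 variables with decreasing spread.
rng = np.random.default_rng(7)
scales = np.linspace(3.0, 0.1, 60)
data = rng.normal(size=(45, 60)) * scales
scores, share = pca_explained(data)
cumulative = np.cumsum(share)  # cumulative[2] is the share of the first three axes
```

With fewer specimens than variables, at most n − 1 axes carry any variance, which is why the scores matrix here has 45 columns rather than 60.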

Fig. 13

Three-dimensional plot of the principal component analysis (PCA) of the marks after GPA illustrating the “shape-space” of these data. Note that the shape variation of the marks observed on the BR specimen differs from and has little overlap with marks created by the chopper

Linear Discriminant Analysis, Null Range, and Bayesian Interpretations

The simulation analysis to calculate the null range DFA probabilities using random data (S2) resulted in two outcomes, classification accuracy and the frequency with which random data had classification probabilities > 0.9. First, Figure S2.1 shows that we calculated an average model accuracy of close to 0.50, as might be expected by random chance alone. After computing the simulations’ percentiles between 10 and 90%, we observed that 90% resulted in accuracies ranging between 35 and 64% from random data. Second, we computed an average frequency of six marks with probabilities > 0.90. In other words, although model classification using random data may perform at expected levels of ~ 50%, we should expect that some individual specimens will be assigned probabilities > 0.90 just by random chance. In this case, we calculate six, on average (median = 5). The 68% highest density value of the simulation results ranged between 1 and 7. This means that ~ 68% of the time (similar to 1 standard deviation) we should expect between 1 and 7 specimens to have probabilities greater than 0.90, simply due to random chance alone. Consequently, we set a null range of classification due to random error of 5 specimens (median estimate).

Moving to the observed shape data, the linear discriminant analysis (LDA) using the first 10 PC axes was capable of differentiating chopper marks and the BR mastodon BSM with 91.1% accuracy (Table 4; 41/45 marks). This is far greater than the upper end of the 90% range calculated above (64% accuracy using random data). Figure 14 shows the LDA classifications for chopper and mastodon BSM.

Table 4 Confusion matrix indicating the results of DFA classification of the observed chopper and Bowser Rd. marks

Fig. 14

Histograms of marks by group (y-axis) along their linear discriminant function (LDF) axis (x). White histogram represents the Bowser Rd. marks; gray represents marks made by the chopper. Separation of these data is visible along the discriminant function (x-axis). Classification errors are visible via histogram overlap

The LDA model classified three BR marks as having been created by chopper tools and one chopper mark as BR. Even if classified with a high probability (> 90%), these four specimens are not above the threshold set for classification error due to random chance. The correctly classified chopper marks have an average posterior probability of 93%, which indicates the high classification confidence of the LDA model. One chopper mark is classified as BR. However, this mark had a lower posterior probability than the average, at 75%. Three BR marks are classified as chopper marks, with an average posterior probability of 84.1% (Table 5). However, the posterior probability of one of these three BR marks as a chopper mark was quite low, at only 59% (BR-79–4). Consequently, confidence in this mark’s classification as a chopper mark is also low.
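The reported accuracy follows directly from the classification counts. The short sketch below rebuilds a confusion matrix from the figures given in the prose (20 chopper marks with one classified as BR, and 25 BR marks with three classified as chopper) and recomputes the overall accuracy. The counts are inferred from the text rather than copied from Table 4, and the variable names are hypothetical.

```python
# Rows are the true group, columns the group assigned by the classifier.
# Counts inferred from the prose: 20 chopper marks (1 misclassified),
# 25 BR marks (3 misclassified).
confusion = {
    "chopper": {"chopper": 19, "BR": 1},
    "BR": {"chopper": 3, "BR": 22},
}

correct = sum(confusion[group][group] for group in confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = correct / total  # 41 of 45 marks
```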

Table 5 Classification of the Bowser Rd (unknown category) and chopper marks using Bayesian posterior probabilities of the discriminant function (likelihood) and uninformative prior (uniform)

Discussion

Analysis

We used a controlled experiment to generate chop marks on cow bones and subsequently digitized the marks as 3D models. The purpose of the experimental design and 3D modeling was to conduct a statistical comparison between the experimentally produced chop marks and the marks observed on mastodon bones. The mastodon marks were hypothesized to have been chop marks created by Clovis-aged humans. To make these comparisons, we constructed “the chopper,” a calibrated mechanical arm with attached stone bifaces that generated forces similar to hafted chopping tools. Analytically, we followed the protocols for cleaning, isolating, and analyzing 3D objects advanced by Otárola-Castillo et al. (2018) and introduced new ones (k-means downsampling).

The results of the hypothesis tests of the BR mastodon BSM and the experimental BSM showed that, on average, the two samples of BSM were significantly different in their shape and likely derived from different populations. Many of the individual BR mastodon marks (Fig. 2) were large with exceptionally smooth shoulders and no internal striations. We have yet to create chopping BSM with these characteristics under the conditions detailed here. The experimental chop marks were, on average, smaller (log centroid size: BR = 4.53, SE = 0.13; chopper = 4.45, SE = 0.08; though that difference was not statistically significant) and exhibited crushed shoulders (Fig. 7). The BR mastodon marks were more variable, and many were larger and deeper than any of the experimental chop marks we created.

The statistical hypothesis testing using NP MANOVA showed that the BR mastodon BSM and experimental BSM were most likely generated by different populations of taphonomic agents. Moreover, the results from the LDA showed that individual marks of the two types could be differentiated with high accuracy (~ 91%). Cross-validation results of the LDA model provide confidence in the classifications.

Bayesian inference of shape showed that most BR mastodon BSM had a low probability of being classified as chop marks created by Arnold, the experimental chopper. As shown by the number of unknown marks classified with high confidence as chopper BSM (P = 0.90 or above; n = 2 BR BSM classified as experimental chopper), there was some overlap between the groups. This overlap was an obviously important consideration when further investigating the two mastodon marks classified as chop marks with probabilities > 0.90. Because we know that BSM “mimics” are possible, the question was whether the number of mastodon marks classified as chop marks fell within the range of expected variation and classification error generated by the null distribution simulation. The null distribution described above (S2) showed that the number of mastodon BSM classified as chop marks was no greater than expected by classification error due to random chance (the “Linear Discriminant Analysis, Null Range, and Bayesian Interpretations” section above). These results strongly suggest that the BR mastodon BSM were created by an agent of modification other than that used in this study.

This study furthers our knowledge of human–megafauna interactions by assessing the evidence presented in this context. Inferences about the effect of human hunting on the demise of proboscideans require secure associations between prehistoric humans and animals. As detailed in the “Background” section, however, reliable evidence on this phenomenon is limited. Moreover, in the case of BSM inferred to be the result of butchery, the evidence is ambiguous and, due to the use of subjective methodologies, this inference is often contentious. Consequently, such methodologies introduce a great deal of uncertainty around the effect of human hunting on a megafauna assemblage. This problem is not unique to the North American archaeological record. Rather, it is pervasive in investigations of the global record of human environmental impacts. For example, similar contentious BSM exist in Africa, Australia, Asia, Madagascar, and South America, where the ambiguity surrounding BSM identification is high. As a result, the reliability of the archaeological evidence directly linking human behavior to these extinct mammals seems uncertain. Consequently, inferences about the human impact on the extinction of megafauna species during the Pleistocene are thus equally uncertain.

Our study helps to reduce such uncertainty by using a replicable methodology to measure BSM and evaluate whether they were the result of butchery activities. In this case, the morphometric and statistical analyses did not lend support to the idea that the BR mastodon marks were made by humans using bifacially reduced choppers for butchery. Further systematic evaluation of these and other marks will continue to reveal the most likely agent responsible for these and other BSM on the BR. To decrease the uncertainty of inferences about the causes of the demise of other megafauna, we recommend that investigators conduct similar, less subjective morphometric analyses.

Future Areas of Study

The experiments outlined here test only one possible source of the BSM exhibited on the BR mastodon (human chopping) among other possible explanations. The agricultural nature of the site presents the possibility that other types of human behavior involving other types of tools made chop-like marks on the bones. Given that the faunal remains were found in a previously farmed field, farm equipment such as plows or discs might have caused damage to the remains. Furthermore, the site report notes that the remains were discovered when the landowners had begun digging in the area using a trackhoe (Gramly, 2017). Disrupting the land in this manner could have caused additional marks on the specimen. There is no list of the bones that were removed by this method, so we could not consider this information in our experiment. To ensure a more comprehensive analysis of the BSM, therefore, future models will need to include experimentally generated marks from relevant farm equipment and compare them to the BR BSM. For example, plowing over dry cow bones buried at various depths could test the hypothesis that a plow created the BR mastodon marks. The shape of marks made by machines such as backhoes should also be analyzed. These approaches would test alternative explanations as to the source of the BSM observed on the BR.

Conclusion

Research on Late Pleistocene human-megafauna interactions continues, including the extent to which such animals contributed to people's diets and the effect of human foraging on their extinction. To address this continuing discussion on worldwide environmental outcomes in prehistory, archaeologists need secure evidence linking people to the hunting or scavenging of extinct mammals. However, archaeological data that link human behavior to the remains of extinct animals with high certainty are scarce across many geographic regions, which further complicates this already contentious debate.

New methods for studying trace evidence such as BSM, including those employed in this study, can decrease the uncertainty around the agents responsible for BSM on archaeological remains. In doing so, these methods help increase confidence in inferences about human-megafauna interactions, especially where lithic evidence of human hunting or butchery is scarce. The methods piloted by Bonney (2014) and Otárola-Castillo et al. (2018) demonstrate that GM analysis and DFA can accurately distinguish among BSM from different sources. To date, this study has focused on creating experimental BSM by chopping with stone hand axes for comparison to the BSM observed on the BR mastodon specimen.

Our analyses showed that the BR mastodon's BSM differ from experimental chopping marks (made by [Arnold], the chopper). This finding allows three interpretations: (1) tools other than choppers made these marks, (2) choppers manufactured differently from those in our experimental conditions made these marks, or (3) the marks resulted from some as-yet-unidentified human activity or natural, non-human agency. Further research will shed light on this issue. For now, however, we (and our colleagues) must reserve judgment on whether the BR Mastodon Site is an archaeological site or a non-anthropogenic paleontological site.

One critical point to highlight is that our approach, along with others (Harris et al., 2017), offers a solution to a problem that plagues research on the early peopling of the New World, namely, over-reliance on visually assessed morphological analogy and appeals to authority when identifying possibly anthropogenic tool marks and BSM.