Controlled experiments in lithic technology and function


Introduction
From the earliest manifestations of tool production, technologies have played a fundamental role in the acquisition of different resources and are representative of daily activities in the lives of ancient humans, such as hunting (stone-tipped spears) and meat processing (chipped stone tools) (Lombard 2005; McPherron et al. 2010; Lombard and Phillipson 2010; Brown et al. 2012; Wilkins et al. 2012; Sahle et al. 2013; Joordens et al. 2015; Ambrose 2001; Stout 2001). Yet many questions remain, such as how and why technological changes took place in earlier populations, and how technological traditions, innovations, and novelties enabled hominins to survive and disperse across the globe (Klein 2000; McBrearty and Brooks 2000; Henshilwood et al. 2001; Marean et al. 2007; Brown et al. 2012; Režek et al. 2018). By understanding how and why past humans used different tools, we could answer key questions related to human technological evolution, such as ecological decision-making processes, as well as cultural transmission dynamics (Eerkens and Lipo 2007; Whiten et al. 2009; Goodale and Andrefsky 2015; Lycett et al. 2015; Ferguson and Neeley 2010; Morgan et al. 2015).
Most of the widely employed assumptions about lithic tools are based on tool design and morphology according to archeologists' intuitions and ethnographic observations (Binford 1973; McCall 2012). Several researchers (e.g., Shea 2011, 2014; Holdaway and Douglass 2012; Dibble et al. 2017) have recently challenged the validity of these assumptions, making it clear that we cannot continue with "business-as-usual," casual interpretations, arguments by experience, and subjective opinion in stone tool research. Therefore, experimental replication of past human activities, a part of our methodological arsenal since the 1970s (Tringham et al. 1974; Coles 1979; Odell and Odell-Vereecken 1980; Outram 2008), has returned to prominence among approaches to interpreting stone tool variability.
However, many of the new experiments lacked rigor and the conclusions drawn from them are therefore uncertain. In the last decade, there have been major critiques of the limitations of experimental designs in both technological (production) and functional (use) studies (Collins 2008; Pfleging et al. 2015; Eren et al. 2016; Lin et al. 2018). These can be summarized as follows:
(a) There is a lack of clear research questions, including hypotheses and assumptions to be tested;
(b) Alternative hypotheses are rarely tested;
(c) There are insufficient details on the materials and methods;
(d) The number of trials is often too low, leading to statistically underdetermined results;
(e) The organization and definition of the control and manipulation of the different variables are poorly identified;
(f) Confounding variables are not accounted for; and, finally,
(g) Qualitative methods dominate over quantitative ones.
This article is part of the Topical Collection on Controlled experiments in lithic technology and function.
Consequently, various researchers highlight the need to break up the experimental program into different levels of experimentation (pilot/exploratory and controlled experiments). Although they complement each other, studies working within different levels should have explicitly different goals and seek different observations and types of data. Because most human tasks, including stone tool production and use, involve a wide array of different variables, these need to be tested individually in order to evaluate their influence on observable units of variation in stone tools. This includes testing the interaction among variables and the development of units of analysis and measurement.
Based on this evaluation, controlled experiments have often used mechanical or automated instruments which reduce human variability while at the same time providing adequate control and manipulation of the system variables (Tomenchuk 1985; Collins 2008; Dibble and Rezek 2009; Eren et al. 2011; Iovita et al. 2014; Magnani et al. 2014; Pfleging et al. 2015; Key 2016; Martisius et al. 2018; Schmidt et al. 2019). Using only assumptions based on physical principles, which operate uniformly across space and time (as in geological uniformitarianism), helps to build analytical units of measure. In turn, these provide concrete data for the observed connections between the identified patterns and processes that can be used as proxies for inferring past human behavior. Here, quantification methods and techniques have several advantages over qualitative descriptions: they are easily verifiable, do not depend as much on research tradition, and avoid the trap of arguing from authority. At the same time, planning and designing experiments around the principles of reproducibility and repeatability, in line with the research questions and the hypotheses being tested, improves accuracy, data quality, comparability, and the evaluation of the final results. Using these principles, it becomes possible to identify the relevant major variables to be tested and generate falsifiable explanations for how lithic tools were produced and used. In turn, this information can be used to build higher-order theories of hominin behavior and contribute to the study of cultural evolution.

Contributions
Although focused on lithic tools, the papers in this special issue go beyond the nitty-gritty of lithic analysis, contributing significantly to global debates in the interpretation of the archeological record. They address questions of broad significance while also representing a new wave of methodological research in archeology as a whole, showcasing groundbreaking methods and techniques. While the focus is on using laboratory-controlled experiments, most of the papers also highlight the importance of combining qualitative and quantitative methods of analysis to address both methodological and archeological questions. Contributions to this special issue address four main research questions on lithic technology and function: (1) the impact of post-depositional processes on the identification and interpretation of archeological lithics; (2) the influence of raw material variability on lithic production and use, including aspects of tool efficiency, durability, and inferring past human decision-making; (3) the establishment of standards and protocols of lithic experimental replication and use-wear studies; and (4) the evaluation and quantification of use-wear formation processes and their relevance to reconstructing aspects of human tool use behaviors.
In his paper, Schoville (2018) explores the impact of natural and biological disturbance processes at archeological sites (such as trampling, bioturbation, and displacement) on the preservation of lithic tools and their spatial distribution in an archeological site. Although used as major lines of evidence to infer human intentional modification and use of lithic tools, attributes such as edge damage, retouch, and use-wear traces are potentially mimicked by post-depositional processes during and after human occupation at the site. The experiment presented in the paper tests different dependent variables (movement direction, distance, artifact size) in an animal trampling-monitored setup, where independent variables were also evaluated (slope and fluvial activity). Besides the importance of this approach for assessing misinterpretations within archeological assemblages, this paper also highlights the contribution of lithic studies to inferences about site preservation and formation processes, from which natural and biological patterns can be identified and associated with human occupation.
One of the most discussed topics in this issue is the use of controlled experimental replication to identify and interpret the impact of raw material variation on past human technological decision-making. These papers explore the manipulation of the raw material (Mackay et al. 2018), lithic tool production and design (Dogandžić et al. 2020; Pargeter et al. 2018), tool efficiency and durability (Abrunhosa et al. 2019; Pereira et al. submitted), and tool use (Pfleging et al. 2018).
In their paper, Mackay and colleagues evaluate the impact of heat treatment methods on manipulating the material properties of silcrete and consequently tool production and use. By testing the assumption that silcrete is sensitive to rapid changes in temperature, this paper also addresses the use and control of heating methods in the past as a major indication of human behavioral complexity among early modern humans. In this study, the use of controlled experiments allows the researchers to test several assumptions and contradict arguments in the literature. One of the most interesting aspects is the observed variation in response to different raw material sources. The effect of the variation in material properties among silcrete sources is likely to be the strongest factor for the observed results. This variability still needs to be explored, as standardized protocols for material characterization are needed in order to evaluate the effect of past heat treatment methods on rocks other than silcrete (e.g., flint and quartzite).
The impact of rock mechanical properties is also explored by Dogandžić et al. (2020). Here, the researchers follow previous investigations on fundamental aspects of lithic production, including the association between core and tool morphological attributes and knapping force. They describe and quantify raw material constraints on knapping, with implications for interpreting the archeological record. Focused on the mechanical principles behind the physics of knapping, previous work used a standard raw material (glass) to show a clear cause-and-effect relation between debitage platform, core surface morphology, and flake size and shape (Dibble and Rezek 2009; Rezek et al. 2011; Magnani et al. 2014). In this paper, the researchers explore the application of their model to other raw materials observed in the archeological record, such as basalt, flint, and obsidian, under the same controlled experimental conditions. In contrast to the results of Mackay and colleagues, their results show that the cause-and-effect relation between dependent and independent variables previously observed in glass is similar in these different types of rocks. In other words, the variability of mechanical properties observed in these different rocks does not significantly affect the model, confirming both the internal and external validity of the experiments. The difference between the results of Mackay et al. and Dogandžić et al. suggests that the influence of rock mechanical properties is likely to be task- and scale-dependent, and it is clear that more research is needed to fully elucidate the matter.
Similar questions are investigated by Pargeter et al. (2018), who assess the role of raw material in bipolar technology and its relationship with tool morphology and fragmentation processes. Through experimental replication, Pargeter et al. aim to obtain quantifiable guidelines for identifying bipolar reduction in archeological assemblages. Aspects such as the relation between efficiency and diagnostic technological attributes are explored. In this case study, driven by the nature of the archeological assemblages, quartz and flint are used and compared. Following previous studies discussed here, the authors also highlight the significance of raw material properties for the final results and evaluation. As in the other works mentioned above, the similar behavior observed for milky quartz and flint is likely related to the similarity in their properties, such as brittleness.
Two other papers, Abrunhosa et al. (2019) and Pereira et al. (submitted), compare the impact of raw material properties and their internal and external variability on the performance of tools, by measuring edge durability and overall tool efficiency. Using a controlled setup, results from both papers show that the mechanical properties of the raw material play an important role in the durability and efficiency of the lithic tools. This pattern seems to be observed in each experiment, independent of the worked material and the force applied. These results support the idea that past humans were conscious of the suitability of different raw materials, to which they likely adjusted their decision-making processes (see also Braun et al. 2009). This also has fundamental implications when assessing the variability observed in the archeological record and inferences about the evolution of different human behaviors.
The application of controlled experiments to use-wear studies is also explored by Pfleging et al. (2018). Traditional use-wear experiments (performed by colleagues carrying out prehistoric-like tasks in academic settings) have established a set of fundamental variables considered relevant for the formation of the different types of use-wear traces. Controlled experiments build upon this initial exploration to generate secure inferential chains that allow us to fully understand these variables' impact on wear formation. This paper quantitatively evaluates force and duration involved in a given tool task, in this case scraping. Using a force- and impedance-controlled robot allows for the control and manipulation of force with somewhat realistic dynamic trajectories, which could not be achieved either by humans or by simple mechanical wear testers, such as tribometers. This paper outlines the importance of quantification methods when characterizing micro-surface texture to infer lithic tool use. ISO surface texture parameters are used to measure the tool surface and are tested through the sequential experiments, measuring changes in micro-surface texture as a function of the two main variables, force and duration. It is important to highlight that these types of experiments not only improve the identification and interpretation of the key factors involved in the formation of use-wear traces but, based on quantitative data, also contribute to a more rigorous qualitative labeling of use-wear, which is fundamental when analyzing archeological artifacts.
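To make the idea of tracking surface texture across sequential experiments concrete, the following minimal sketch computes two height parameters on the same area before and after a use cycle. The choice of Sa (arithmetic mean height) and Sq (root mean square height) is an assumption for illustration; the text above does not state which ISO parameters the paper uses. The height maps and function names are hypothetical, and the leveling and filtering steps normally applied before computing such parameters are omitted.

```python
import math


def surface_height_parameters(heights):
    """Return (Sa, Sq) for a rectangular grid of surface heights.

    Sa is the mean absolute deviation from the mean height and Sq is the
    root mean square deviation. Form removal and filtering are NOT done here.
    """
    values = [z for row in heights for z in row]
    if not values:
        raise ValueError("height map is empty")
    mean = sum(values) / len(values)
    deviations = [z - mean for z in values]
    sa = sum(abs(d) for d in deviations) / len(deviations)
    sq = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return sa, sq


def relative_change(before, after):
    """Fractional change of a parameter between two measurement stages."""
    return (after - before) / before


# Hypothetical height maps (micrometers) of the same area of interest,
# measured before and after one scraping cycle at fixed force and duration.
before_use = [[0.0, 2.0], [2.0, 0.0]]
after_use = [[0.5, 1.5], [1.5, 0.5]]

sa_before, sq_before = surface_height_parameters(before_use)
sa_after, sq_after = surface_height_parameters(after_use)
print("Sa change:", relative_change(sa_before, sa_after))
print("Sq change:", relative_change(sq_before, sq_after))
```

Repeating the same calculation after each cycle, for each combination of force and duration, would yield the kind of parameter-versus-variable series described above.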
Finally, Calandra et al. (2019b) tackle the issue of repeatability in microscopic use-wear studies. They develop a relative coordinate system protocol for experimental samples. This work significantly improves accuracy and reproducibility in sequential use-wear experiments, by allowing the analysis of the same area of interest before, during, and after the experimental cycles. Although applied here to artifact microwear analysis, this method can be applied to different materials (even archeological samples) and on different scales of analysis. The protocol, tested and evaluated on two different machines in two separate labs, represents an important step towards both repeatability and reproducibility in experimental archeology.
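The general idea behind relocating an area of interest with sample-relative coordinates can be sketched as follows. This is a generic illustration and not the published protocol, whose details are not given here: it assumes two reference marks on the sample, a flat sample that only rotates and shifts on the stage between sessions, and hypothetical function names and coordinates.

```python
import math


def make_relocator(ref_a_old, ref_b_old, ref_a_new, ref_b_new):
    """Build a function mapping stage coordinates from an earlier session
    to a later one, using two reference marks seen in both sessions.

    Assumes rigid in-plane motion (rotation plus translation, no scaling).
    """
    angle_old = math.atan2(ref_b_old[1] - ref_a_old[1], ref_b_old[0] - ref_a_old[0])
    angle_new = math.atan2(ref_b_new[1] - ref_a_new[1], ref_b_new[0] - ref_a_new[0])
    theta = angle_new - angle_old  # rotation of the sample between sessions
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def relocate(point):
        # Express the point relative to mark A, rotate, then re-anchor at
        # the new position of mark A.
        dx = point[0] - ref_a_old[0]
        dy = point[1] - ref_a_old[1]
        return (
            ref_a_new[0] + cos_t * dx - sin_t * dy,
            ref_a_new[1] + sin_t * dx + cos_t * dy,
        )

    return relocate


# Hypothetical stage coordinates (mm): the sample was remounted rotated by
# a quarter turn and shifted between the "before" and "after" sessions.
relocate = make_relocator((0.0, 0.0), (10.0, 0.0), (5.0, 5.0), (5.0, 15.0))
area_of_interest = (2.0, 3.0)
print(relocate(area_of_interest))
```

With real measurements, more than two marks and a least-squares fit would make the relocation less sensitive to error in any single mark.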

Conceptual framework, limitations, and future directions
The focus of the papers reflects the complexity of archeological experiments when addressing questions related to lithic technology and function. Although this special issue attempts to bring together different methodological and research questions in lithic studies, such an approach can be applied to other fields of archeological artifact analysis.
Like many other subfields in archeology, studies on lithic technology and function have undergone many conceptual and methodological adjustments depending on changes in their theoretical scope, main research questions, and challenges presented by new archeological finds. Although it is beyond the scope of this special issue to offer a new set of methods or theoretical directions, the aim here was to bring together contributions from a new wave in methodology, which brings with it a new set of research questions. From these papers, we learn that moving towards a more reliable, reproducible, and repeatable method in archeological experiments will involve several important steps: 1. The formulation of detailed and clear research questions; 2. The identification, control, and careful manipulation of the experimental variables; 3. The development and implementation of controlled experimental apparatuses; 4. A greater use of quantitative methods and protocols; and 5. A stronger link between the experimental results and archeological data.
Thus, a clear description of the main research question(s) should not just include the archeological evidence that triggered the need for experimentation, but also a detailed hypothesis to be experimentally tested. This approach is a decisive starting point for all experimental designs, because it leads to choosing the relevant variables, both dependent and independent. A clear categorization of the identified variables (e.g., dependent, independent, and confounding) in each experiment is a key aspect of experimental validation (see Lin 2014 for more details). Here, we argue that only this approach can establish causal relationships between the variables and the final result, while also allowing for the recognition of patterns and comparison within and between experiments.
Finally, for all their many advantages, controlled experiments also have their methodological and conceptual limitations. The first is that, due to the complexity of the archeological record, not all variables can be controlled and manipulated. For example, in most of the contributions in this special issue, researchers emphasize that raw material variability has a major influence on experimental results and therefore is of key importance for understanding lithic technology and function. Nevertheless, characterizing and evaluating the influence of the mechanical and material properties of the different types of rocks has shown itself likely to depend on specific aspects of the research questions, such as scale of analysis (e.g., efficiency, durability, damage formation) and the hypothesis being tested (e.g., tool knapping or use). Scale issues are likely to play a role in understanding other variables as well, raising questions of how to identify the best variables to focus on, since controlled experiments exclude so many variables from their analysis. Here, a fundamental aspect concerns time and funding constraints in each research project. Controlled experiments are time-consuming and, depending on the research questions, can result in a high number of variable combinations that need to be tested individually in a given experiment. Potentially, sample standardization and controlled experimental instruments would reduce the number of samples and experiments needed. On the other hand, funding constraints might prevent researchers from preparing samples to given standards and from building or purchasing experimental mechanical instruments, which in our opinion seems to be the major reason that prevents this approach from becoming established in the archeological research community. Nevertheless, simple controlled setups can be used, and experimental designs can be tweaked to control variables in the absence of mechanical apparatuses.
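How quickly the number of variable combinations grows can be illustrated with a short sketch that enumerates a full factorial design. The factors, levels, and number of replicates below are hypothetical and chosen only for illustration; they are not taken from any of the papers discussed here.

```python
from itertools import product


def full_factorial(factors, replicates=1):
    """List every combination of factor levels, repeated `replicates` times.

    `factors` maps a variable name to the list of levels to be tested.
    """
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        for replicate in range(1, replicates + 1):
            run = dict(zip(names, levels))
            run["replicate"] = replicate
            runs.append(run)
    return runs


# Hypothetical independent variables for a scraping experiment.
factors = {
    "raw_material": ["flint", "quartzite", "obsidian"],
    "force_n": [10, 20, 40],
    "duration_strokes": [50, 250, 1000],
}

runs = full_factorial(factors, replicates=5)
print(len(runs))  # 3 * 3 * 3 levels * 5 replicates = 135 runs
print(runs[0])
```

Adding a single further variable with three levels would triple the total again, which is one reason the choice of variables has to follow from the research question.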
Also, we would like to emphasize that research and institutional collaborations would be of major importance for the development of the discipline. Although controlled experimental designs aim to test major variables individually, the outcome of the experiments can be analyzed in different ways and can serve different research questions and projects.
In this respect, quantitative methods and open data are equally important for improving and sharing experimental protocols and results and feeding them back to the general archeological community. They not only help identify patterns in the data and check the validity of experiments and models but also facilitate the communication of data and results among researchers (Marwick 2017a, b; Calandra et al. 2019a). By openly sharing data, researchers with access to expensive research infrastructure can transmit information and progress to colleagues who may not have access to such tools, helping to democratize the field.
Despite this fundamental role of quantified and automated processes in experimental replication and data analysis and modeling, the discipline still faces many challenges concerning terminology and definitions. Much still needs to be done in terms of defining common standards, protocols, and a common descriptive terminology in the field. As in other applied research fields, an interdisciplinary approach is needed, in which standards and protocols from other disciplines can be adapted to the different questions addressed through lithic experimentation. As shown in several papers in this issue, the study of different aspects of the production, manipulation, and use of archeological artifacts involves a profound understanding of major principles of each raw material, which can be achieved when integrating knowledge from different disciplines and techniques such as tribology, metrology, material sciences, fracture mechanics, mechanical engineering, petrology, and others.

Final words
Controlled experiments have been regarded among archeologists with a certain degree of skepticism. This is in large part due to the perception that, in controlled setups, especially those employing machines, artifact production or the experimental replication of artifact use is too far removed from authentic ancient human action to be meaningful. In that sense, the unease stems from a belief that, on the one hand, a machine is so unrealistic that whatever advantages might come from it are not helpful, and, on the other hand, that a mechanical device might introduce more biases than it helps resolve. Responding to the first objection, we would say that most controlled experiments do not aim to replicate human action or real-life activities faithfully, and nor should they. Instead, they reduce the complexity of human action by isolating, controlling, and measuring the causal effect of variables on observable properties of archeological materials. From these, a more detailed and precise understanding of the process leading to the formation of the observable variation in these materials can be deduced. Only by having a solid basis at this level can we begin to reinstate complexity and study the aspects of behavior which brought us to archeology in the first place. As to the second objection, we agree that, depending on how they are constructed, certain setups could introduce their own, artificial biases, but these would be mistakes in individual study designs, and not a flaw inherent to all controlled experiments. In closing, we would therefore like to invite all our colleagues to engage with this new trend in experimental archeology, and to keep an open mind to integrating some of the results presented here and elsewhere into their own research. At the same time, we urge researchers with different and complementary facilities to work together more.
Controlled experiments in lithic technology and function are just getting started, and judging by the number of follow-up questions stemming from the research in this issue, they will need all the help they can get.