The results of lithic experiments performed on glass cores are applicable to other raw materials

About 10 years ago, a new experimental design based on a mechanical flaking apparatus allowed complete control over several independent variables essential to flintknapping. This experimental setting permitted the investigation of more fundamental aspects of stone technology, including the effect of particular platform attributes, core surface morphology, and the application of force on flake size and shape. These experiments used cores made of glass that were molded to exact configurations. Here we set out to investigate whether results obtained from experiments on glass cores can be extended to other materials, in this case varieties of basalt, flint, and obsidian that were cut to the exact core configurations. We focused on the relationships between the independent variables of exterior platform angle and platform depth and the dependent variables of overall size (weight or mass), volume, and linear dimensions. We found that in almost every comparison, all four materials show similar relationships in nature and degree. What differs instead is the amount of force needed to detach a flake. In other words, given the same core morphology and platform attributes, the resulting flakes will be the same, but harder materials require more force to remove them. These results were additionally verified on Middle Paleolithic archeological materials made mostly on Late Cretaceous flints. Our results demonstrate that experiments using glass cores are valid and can be generalized and extended to other materials.


Introduction
Controlled experiments on flake manufacture have a long history in lithic studies (Bonnichsen 1977; Dibble and Whittaker 1981; Faulkner 1972; Pelcin and Dibble 1995; Pelcin 1997a, b, 1998; Speth 1972, 1975, 1981). These experiments were designed to investigate the factors that influence flake size and shape. The relationship between platform variables and flake size has further been used to estimate the original size of a blank before it was reduced by retouch. Reduction intensity can then be quantified by comparing the estimated original mass with the mass of the retouched piece (Dibble 1987, 1995). This has been valuable for curation studies, particularly for questions of Middle Paleolithic industrial variability, since resharpening drives much of the typological variability established by F. Bordes (Bordes 1961; Dibble 1987, 1995; Clarkson 2007, 2015; Turq 1989). The tight correlation between platform attributes and flake size has motivated a number of studies to refine platform measurements, both to strengthen this relationship and to better assess reduction intensity in lithic assemblages (Braun et al. 2008; Clarkson and Hiscock 2011; Davis and Shea 1998; Muller and Clarkson 2014; Shott et al. 2000).
One of the primary criticisms of controlled experiments has concerned their validity: the products themselves, and/or the means by which force was applied to remove them, bear little resemblance to knapped flakes and cores. About 10 years ago, a new experimental design was developed to enhance the validity of both the experimental flaking process and the resulting flakes (Dibble and Rezek 2009). While the resulting cores and flakes more closely resembled knapped artifacts, the cores themselves were made of soda-lime glass (henceforth glass), which can be reliably molded to specific shapes. This allowed experiments on the role of various independent variables (those under the control of the knapper) in determining specific flake characteristics. Cores produced from the same mold were essentially identical in size and shape, so that important variables could be held constant, a major advantage in any experiment. Different core designs could also be produced to examine the effects of core surface morphology on flake variation (Rezek et al. 2011).
Over the course of numerous glass-core experiments, several independent variables were examined, including exterior platform angle, platform depth, hammer speed, force, angle of blow (Pelcin and Dibble 1995), core morphology (Rezek et al. 2011), hammer type (Magnani et al. 2014), and platform beveling (Leader et al. 2017). The results of these experiments have helped to describe and quantify how knappers achieve particular results, with direct implications for interpreting the archeological record (Lin et al. 2013; Režek et al. 2018). While glass itself is highly amenable to percussion flaking, the question remains whether the results of these experiments are valid for, and applicable to, the kinds of raw materials actually knapped in the past.
Much of the archeological literature on the role that raw materials play in variation among lithic assemblages deals with accessibility and quantity (Andrefsky  Bamforth 1991; Brantingham 2003; Dibble 1991; Roth and Dibble 1998), nodule size and shape (Brantingham 2000; Eren et al. 2011, 2014; Inizan et al. 1995; Kuhn 1995; Whittaker 1994), the physical properties of different rocks (Braun et al. 2009; Goodman 1944; Jones 1979; Whittaker 1994), and, related to the latter, heat treatment (Bleed and Meier 1980; Collins 1973; Crabtree and Butler 1964; Domanski et al. 1994; Domanski and Webb 1992; Mercieca 2000; Rick and Chappell 1983; Schmidt et al. 2013). Although the "knappability" of different materials is often a topic of informal discussion among modern knappers, the literature on it is more limited (Domanski and Webb 1992; Webb and Domanski 2008). It is accepted that some materials are more suitable than others for producing certain kinds of products, and that certain knapping techniques may be better suited to some materials than to others (e.g., Manninen 2016). A replication experiment testing whether raw material differences determine artifact shape (in this case, handaxes) found no significant relationship between raw material type and artifact morphology (Eren et al. 2014). To what extent, however, do different materials (referring specifically, of course, to those that fracture conchoidally) respond similarly in nature and degree when knapped? And how do they compare with glass? Answering these questions will help assess the validity of the results of flake-formation experiments that used glass cores. Our paper presents the results of an experiment using a variety of raw materials, including glass (plus a comparison with archeological materials), to test this question empirically.
This experiment focused on particular relationships between independent variables (exterior platform angle and platform depth) and dependent variables of flake size (Fig. 2).

Materials and methods
In this experiment, we examined whether the effects of exterior platform angle (henceforth EPA) and platform depth (henceforth PD) on flake size observed in previous controlled experiments using glass cores (Dibble and Rezek 2009; Pelcin and Dibble 1995; Rezek et al. 2011) also hold, in similar ways, for raw materials other than glass. Raw material type is therefore one of the independent variables in this experiment. Cores of four raw materials are tested: basalt, flint, obsidian, and glass. The glass cores are molded identically to a specific size and surface morphology. The same core shape is then reproduced in the three other raw materials (obsidian, basalt, and flint). These cores, however, are shaped by hand, so their size and shape vary slightly (Fig. 1). Cores of all four materials have an exterior surface with a central ridge where two faces converge at the top of the core. In our analysis, we exclude flakes with terminations other than feather (hinge, step, overshot), as well as broken flakes whose size could not be measured. Thus, of the 81 flakes produced in this experiment, 57 were available for analysis (Table 1).
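The exclusion rule can be expressed as a simple filter over flake records. This is a minimal Python sketch: the `Flake` record, its field names, and the toy records are hypothetical illustrations, not the authors' data structure or R code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Flake:
    # Hypothetical record; field names are illustrative only.
    flake_id: str
    raw_material: str  # "basalt", "flint", "obsidian", or "glass"
    termination: str   # "feather", "hinge", "step", or "overshot"
    broken: bool       # True when flake size could not be measured


def analyzable(flakes):
    """Keep complete flakes with feather terminations; drop all others."""
    return [f for f in flakes if f.termination == "feather" and not f.broken]


# Toy records (not experimental data).
sample = [
    Flake("A1", "flint", "feather", False),
    Flake("A2", "flint", "hinge", False),
    Flake("B1", "basalt", "feather", True),
    Flake("C1", "obsidian", "feather", False),
    Flake("D1", "glass", "overshot", False),
]
kept = analyzable(sample)
print(len(kept), Counter(f.raw_material for f in kept))
```

Applied to the full experimental record, a filter of this kind is what reduces the 81 flakes to the analyzable subset; counting by material afterwards gives per-material sample sizes like those in Table 1.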
The raw materials come from three locations: the basalt from near Ashfork, Arizona; the obsidian from Jalisco, Mexico; and the flint from Ingram, England. The raw material nodules were purchased from a flintknapping supply house (www.neolithics.com), where they were also shaped into our specific cores. The three materials differ in microstructure: basalt is a fine-grained mafic rock, flint is a very fine cryptocrystalline quartz, and obsidian shows no observable microstructure.
The other independent variables are EPA, PD (Fig. 2), and displacement speed. Cores of all raw materials are cut with a rock saw at one end to a specific EPA. We varied EPA across three values: 65°, 75°, and 85°. During the knapping process, we held constant the variables known to affect flake size and shape: angle of blow (10°), hammer tip (steel edge), and core surface morphology (no further modifications are made to the exterior surface, which was pre-shaped to the center-ridge morphology described above). The experiment followed protocols similar to those of previous controlled experiments using glass alone (Dibble and Rezek 2009; Leader et al. 2017; Lin et al. 2013; Magnani et al. 2014). Each core is inserted into a stable mount in an Instron Servohydraulic testing press (Model 1331), where it is supported on three sides and against side rails on the exterior surface. Cores are then positioned so that the angle between the platform surface and the hammer strike (angle of blow) is set to a specific value. The press is retrofitted with an external control panel from which displacement speed and hammer distance can be controlled. Displacement speed is held constant at 0.25 in./s. The hydraulics move a central actuator capable of producing 20,000 lbs of force, and a steel hammer strikes the core on the striking surface, detaching the flake. Achieving a specific PD proved difficult: the striker rarely hits the core at the marked PD value, most likely because of the thickness of the striker (approximately 1 mm). The resulting PD therefore shows a slight offset from the desired value and is treated as a continuous variable ranging from 2 to 10 mm. Dependent variables include flake weight, volume, length, width, and thickness. Linear measurements are recorded with calipers to the nearest 0.05 mm.
Platform depth is measured independently by four individuals, and the four values are averaged to mitigate inter-observer error (see Leader et al. 2017). Flakes and cores are weighed to the nearest 0.1 g. However, because the raw materials used in this experiment differ in density, their weights are not directly comparable, so we use volume rather than weight as a proxy for size. We first obtained the volume of one core of each raw material from its 3D scan in Meshlab software and calculated the density of each material by dividing the core's weight by its volume. These density values are then used to calculate the volume of each flake from its weight.
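The density-based conversion from weight to volume, and the averaging of observer PD measurements, are simple arithmetic. The sketch below assumes, as described, that one scanned core per material provides that material's density; the function names and the numbers in the example are hypothetical and are not measurements from this experiment.

```python
from statistics import mean


def density(core_weight_g, core_volume_cm3):
    """Raw-material density (g/cm^3): core weight divided by scanned core volume."""
    return core_weight_g / core_volume_cm3


def flake_volume(flake_weight_g, material_density):
    """Flake volume (cm^3) estimated from flake weight and material density."""
    return flake_weight_g / material_density


def mean_platform_depth(measurements_mm):
    """Average independent PD measurements (one per observer)."""
    return mean(measurements_mm)


# Illustrative numbers only (not measurements from the experiment).
rho = density(250.0, 100.0)           # 2.5 g/cm^3
vol = flake_volume(5.0, rho)          # 2.0 cm^3
pd = mean_platform_depth([4.1, 4.3, 4.2, 4.2])
print(rho, vol, round(pd, 2))
```

Because each flake's volume is derived from a single per-material density, any heterogeneity within a nodule is ignored; this is the simplifying assumption that makes volumes comparable across materials.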
To test the relationships derived from the experimental setting against actual archeological data, several archeological assemblages are also used in this study. Our sample is composed of complete flakes from four excavated Middle Paleolithic sites in southwest France: Abri Peyrony (Soressi et al. 2013), Pech de l'Azé IV (Turq et al. 2011), Combe-Capelle Bas (Dibble and Lenoir 1995), and Roc de Marsal (Table 2). While some inter-observer error in the data can be expected, all attributes were recorded in a similar manner. Included are artifacts larger than 2.5 cm in their largest dimension that have recorded values for the variables analyzed here. Only cases with EPA between 50° and 90° and PD between 1.5 and 11 mm are included, so as to correspond to the intervals in the experimental dataset.
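The inclusion criteria for the archeological sample can be written as a single predicate. This sketch assumes the bounds on EPA and PD are inclusive and the 2.5 cm size threshold is strict, since the text does not say; the argument order and example rows are hypothetical.

```python
def in_experimental_range(max_dim_mm, epa_deg, pd_mm):
    """Inclusion rule for archeological flakes, matching the experimental intervals.

    Assumption: inclusive bounds for EPA (50-90 degrees) and PD (1.5-11 mm),
    a strict lower bound on the largest dimension (> 25 mm), and exclusion of
    any flake missing one of these values.
    """
    if None in (max_dim_mm, epa_deg, pd_mm):
        return False
    return max_dim_mm > 25.0 and 50.0 <= epa_deg <= 90.0 and 1.5 <= pd_mm <= 11.0


# Hypothetical rows: (largest dimension in mm, EPA in degrees, PD in mm)
rows = [(41.0, 72.0, 5.5), (22.0, 80.0, 4.0), (35.0, 95.0, 6.0), (30.0, 65.0, None)]
print([in_experimental_range(*r) for r in rows])
```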
Data and figures are produced in R software (R Core Team 2018). To examine the relationship between independent and dependent variables, we used linear modeling. Response variables are flake volume and the linear dimensions of length, width, and thickness; predictor variables are PD, EPA, and raw material. Volume as a response variable is transformed to its cube root, so that the relationship between volume, which increases in three dimensions, and PD, which increases in one dimension, becomes linear. When incorporating the interaction of EPA and PD in a model, we standardized both with the z-transformation to a mean of 0 and a standard deviation of 1. For each model, we checked for violations of its assumptions by examining the residual distribution and the variance inflation factor. Influential cases were identified by Cook's distance, leverage, and by comparing fitted values between the model using all data and models with each case excluded in turn (DFFITS); influential cases are then removed from the dataset. The covariates EPA and PD are highly correlated in our dataset, which raises concerns about multicollinearity. This correlation, however, is due to the nature of our experimental design, which results in a high negative correlation of EPA and PD. We therefore rely on the variance inflation factor to detect multicollinearity among predictors. We used ANOVA to compare reduced and full models when adding independent variables or interaction terms. Additional R packages used are dplyr (Wickham et al. 2018), ggplot2 (Wickham 2016), car (Fox and Weisberg 2011), coin (Hothorn et al. 2006), and lm.beta (Behrendt 2014).
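Several of the transformations named here are short computations. The Python sketch below shows the cube-root transform, the z-transformation (using the sample standard deviation, as R's `scale()` does by default), one row of predictors for an EPA x PD interaction model, and the variance inflation factor for the special case of exactly two predictors, where it reduces to 1 / (1 - r²). With an interaction term or more predictors, the VIF requires the full predictor set (R's car package provides `vif()` for fitted models); that general case is not shown. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cube_root(volume):
    """Cube root of a non-negative volume, linearizing it against a 1-D predictor."""
    return volume ** (1.0 / 3.0)


def z_scores(values):
    """z-transform: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF of either predictor in a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


def design_row(z_epa, z_pd):
    """Predictors for one flake: intercept, EPA, PD, and EPA x PD (z-scored inputs)."""
    return (1.0, z_epa, z_pd, z_epa * z_pd)


print(z_scores([65, 75, 85]))
print(vif_two_predictors([1, 2, 3, 4], [2, 1, 4, 3]))
```

Centering before forming the product term is the usual reason for z-scoring in interaction models: it keeps the main-effect columns from being nearly collinear with their product.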

Flake volume
In many previously published experiments performed on glass (Dibble and Rezek 2009; Dibble and Whittaker 1981; Pelcin and Dibble 1995; Speth 1972, 1975, 1981), flake weight was shown to be primarily a function of two independent variables, EPA and PD, with an increase in either resulting in larger flakes. Here, we modeled the effect of raw material on the relationship between EPA, PD, and flake size. We first used multiple regression to predict flake volume from EPA and PD using the entire experimental assemblage, including all raw materials. We compared the model with EPA and PD as predictors to a model that also includes the interaction of these two covariates (one influential case is excluded from the dataset). ANOVA model comparison shows that adding the interaction improves the model (F(1,53) = 60.297, p < 0.001). The model shows that EPA and PD are highly predictive of flake size (r² = 0.88, F(3,52) = 131.9, p < 0.001). Raw material is then added as a categorical predictor to this model, which is compared with the reduced model (EPA and PD with their interaction) using ANOVA. Raw material improves the model as well (F(3,49) = 5.178, p = 0.003); however, r² as a measure of fit shows only a negligible increase (Table 2; r² = 0.9, F(6,49) = 84.41, p < 0.001). Moreover, while only obsidian shows a significant effect, its confidence intervals overlap with those of the other raw materials and its beta coefficient is low (Table 3), suggesting that this significance, especially given the small sample size, should be interpreted cautiously.
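The ANOVA comparisons of nested models reported here rest on the standard partial F statistic, F = [(RSS_r - RSS_f) / (df_r - df_f)] / (RSS_f / df_f), where RSS is the residual sum of squares and df the residual degrees of freedom of the reduced (r) and full (f) model. A minimal sketch follows; the sums of squares in the example are hypothetical and do not reproduce the reported statistics.

```python
def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """Partial F statistic comparing a reduced linear model with a full one.

    rss_*: residual sums of squares; df_*: residual degrees of freedom.
    Returns (F, numerator df, denominator df).
    """
    num_df = df_reduced - df_full
    if num_df <= 0:
        raise ValueError("full model must have fewer residual df than the reduced model")
    f = ((rss_reduced - rss_full) / num_df) / (rss_full / df_full)
    return f, num_df, df_full


def r_squared(rss, tss):
    """Coefficient of determination from residual and total sums of squares."""
    return 1.0 - rss / tss


# Hypothetical sums of squares, not the values behind the reported statistics.
print(nested_f(rss_reduced=100.0, df_reduced=53, rss_full=50.0, df_full=52))
```

A p-value would additionally require the upper tail of the F distribution (for example `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-free.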
Looking at the relationship between flake volume and platform depth, cores with identical EPAs can be compared, as shown in Fig. 3. Each material displays the same relationship: increasing PD results in larger flakes. Dibble and Rezek (2009) showed, for a separate sample of glass flakes, that the relationship between flake volume and PD changes as EPA increases; here the same holds for all four materials. Slope values increase slightly at an EPA of 75° and substantially at 85° (Fig. 3).
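The per-EPA slopes compared in Fig. 3 can be sketched as separate least-squares fits of flake size on PD within each EPA group. This sketch assumes size is expressed as the cube root of volume (as in the regression models) and uses synthetic records constructed so that the slope rises with EPA; it is an illustration, not the authors' code or data.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_epa(records):
    """Slope of cube-root flake volume on PD, fitted separately for each EPA value.

    records: iterable of (epa_deg, pd_mm, volume_cm3) tuples.
    """
    groups = defaultdict(lambda: ([], []))
    for epa, pd, volume in records:
        xs, ys = groups[epa]
        xs.append(pd)
        ys.append(volume ** (1.0 / 3.0))
    return {epa: ols_slope(xs, ys) for epa, (xs, ys) in sorted(groups.items())}


# Synthetic records built so that cube-root volume = k * PD, with k rising with EPA.
toy = [(epa, pd, (k * pd) ** 3)
       for epa, k in ((65, 0.5), (75, 0.6), (85, 1.0))
       for pd in (2.0, 4.0, 6.0)]
print(slopes_by_epa(toy))
```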
Another way to see whether flakes with different EPAs are larger or smaller relative to their PDs is to look at the ratio of the cube root of flake volume to PD. Unlike EPA, PD cannot be set to a specific value in our setup. For this reason, flakes are made within a wide range of PD, usually from 2 to 10 mm, depending on the EPA being tested (PDs that are too large result in flakes that overshoot the core, which are not used in these analyses). As shown in Fig. 4, for all materials, the ratio of flake volume (cube root) to platform depth increases with each value of EPA. This means that at higher values of EPA, equivalent changes in PD result in proportionally larger changes in flake volume. Again, this has been repeatedly observed in glass alone (Dibble and Rezek 2009; Lin et al. 2013; Pelcin 1997a, b). The volume-to-platform-depth ratio is less variable at an EPA of 65° than at 75°, and especially at 85°. This is probably due to measurement error in PD: as EPA increases, each unit increase in PD has a more pronounced effect on flake size, so small errors in measurement result in higher standard deviations at higher EPAs.
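The ratio and the measurement-error explanation can be illustrated together. In the sketch below (hypothetical data, hypothetical function name), each EPA group has the same true PD of 4 mm measured with the same ±0.5 mm error, but cube-root volume scales with PD by a factor k that is larger at higher EPA. The standard deviation of the ratio then scales with k, so the identical PD error produces twice the spread at k = 1.0 as at k = 0.5.

```python
from collections import defaultdict
from statistics import mean, stdev


def ratio_summary(records):
    """Mean and sample SD of cube-root(volume) / PD for each EPA group.

    records: iterable of (epa_deg, pd_mm, volume_cm3); each group needs >= 2 flakes.
    """
    groups = defaultdict(list)
    for epa, pd, volume in records:
        groups[epa].append(volume ** (1.0 / 3.0) / pd)
    return {epa: (mean(r), stdev(r)) for epa, r in sorted(groups.items())}


# True PD is 4 mm in both groups; observed PD is off by -0.5 or +0.5 mm.
toy = [(epa, pd_obs, (k * 4.0) ** 3)
       for epa, k in ((65, 0.5), (85, 1.0))
       for pd_obs in (3.5, 4.5)]
summary = ratio_summary(toy)
for epa, (m, sd) in summary.items():
    print(epa, round(m, 3), round(sd, 3))
```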
As an additional test, we verified these results with archeological data, because they represent materials that were actually knapped in the past. As seen in Fig. 5, these assemblages show the same pattern, namely that weight is a function of both EPA and PD. Similar patterns in archeological materials were also presented in Dibble (1997).

Individual dimensions
Flake volume is a function of the three linear dimensions of length, width, and thickness, so it is of interest to see how these individual dimensions are themselves influenced by EPA and PD across the four materials. All three dimensions have previously been reported to be affected by both EPA and PD, so again both independent variables must be considered at the same time. When predicting linear dimensions from EPA and PD with multiple regression, adding an interaction term improved the model for length (F(1,52) = 34.278, p < 0.001) and width (F(1,49) = 7.6372, p = 0.008), but not for thickness (Table 4). Comparing these models with models in which raw material is added as a categorical covariate revealed that raw material is not a significant predictor of the linear dimensions. Figures 6, 8, and 10 show the effect of EPA on linear dimensions, standardized by PD. Similar effects are observed in archeological materials (Figs. 7, 9, 11).
Standardized coefficients (β) of the models for the experimental assemblages suggest that EPA has a greater effect than PD on length, while the opposite is true for width and thickness, which are more affected by PD than by EPA (Table 4; Figs. 6-11). Similar trends have been observed in actualistic experimental assemblages (Dogandžić et al. 2015). Because EPA and PD affect the linear dimensions differently, changes in the shape of the resulting flakes, e.g., elongation or relative thickness, can be predicted. In other words, particular flake characteristics can be achieved by adjusting these two platform variables (Lin et al. 2013). For instance, to obtain a long and thin blank, one has to keep PD low while increasing EPA. The fact that the effects of EPA and PD on linear dimensions are insensitive to differences in raw material supports the archeological application of different allometric ratios. In particular, the ratio of thickness, a dimension that is not affected by retouch, to length, width, or weight, dimensions that decrease with reduction, can confidently be used as a measure of resharpening across different materials.

Fig. 9 Boxplots showing relationship between platform depth and width for exterior platform angle groups in archeological assemblages (Table 2)
Fig. 10 Dotplot showing the effect of exterior platform angle on flake thickness (standardized by platform depth) for different raw materials
Archaeol Anthropol Sci (2020) 12:44
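Standardized coefficients of the kind reported in Table 4 rescale each unstandardized slope b by the ratio of predictor to response standard deviations, β = b · sd(x) / sd(y). This is the usual definition of a standardized coefficient and, as we understand it, the rescaling performed by lm.beta. The sketch below uses hypothetical slopes and standard deviations to show how a comparison such as "EPA matters more than PD for length" is read off β values; none of these numbers come from the experiment, and the function names are hypothetical.

```python
from statistics import stdev


def standardized_beta(b, sd_x, sd_y):
    """Standardized coefficient: slope b rescaled by sd(x) / sd(y)."""
    return b * sd_x / sd_y


def standardized_beta_from_data(b, x, y):
    """Same rescaling, computing sample standard deviations from raw data."""
    return standardized_beta(b, stdev(x), stdev(y))


# Hypothetical length model: mm of length per degree of EPA and per mm of PD.
beta_epa = standardized_beta(0.8, sd_x=8.0, sd_y=12.0)
beta_pd = standardized_beta(2.0, sd_x=2.0, sd_y=12.0)
print(round(beta_epa, 3), round(beta_pd, 3))
```

The raw slopes (0.8 vs. 2.0) would suggest that PD dominates, but on the standardized scale EPA has the larger effect, because EPA varies much more across flakes than PD does. This is why β, rather than b, is the appropriate basis for comparing the two platform variables.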

Force
As described in Dibble and Rezek (2009), the current experimental design, in which the steel hammer is attached to a load cell, allows us to measure the load, or force, required to remove a flake. As the hammer makes contact with the platform surface, the load increases until the exact moment the flake is released, at which point the load returns to zero. Dibble and Rezek (2009) showed that force is highly correlated with flake weight (larger flakes require more force for their removal) and that this relationship was not affected by variation in EPA, PD, or angle of blow, except insofar as these independent variables all contribute to flake weight. For example, two flakes of the same weight can result from a high EPA and low PD or vice versa, but the force required for their removal is the same. Force is a significant predictor of weight (r² = 0.38, F(1,50) = 32.51, p < 0.001), and when raw material is added as a term in the model, it also contributes to weight prediction (r² = 0.55, F(4,47) = 16.63, p < 0.001; ANOVA model comparison: F(−3,50) = 7.26, p < 0.001). This is the one comparison tested here that does show a major difference among raw materials: the harder materials, flint and basalt, require more force than glass and obsidian (Fig. 12), as most knappers would readily attest. Our data also suggest that flint, glass, and obsidian require more force at lower EPAs to remove a flake of a given size (Fig. 12); adding EPA improves the model in which weight is predicted by force and raw material (ANOVA: F(1,46) = 4.26, p = 0.04; model r² = 0.58, F(5,46) = 15.08, p < 0.001). However, this effect is not large.

Fig. 11 Boxplots showing relationship between platform depth and thickness for exterior platform angle groups in archeological assemblages (Table 2)
Fig. 12 Relationship of force required to remove flakes by weight (log transformed) for different raw materials and varying exterior platform angles
More data are needed to improve our understanding of the relationship of force, platform variables, and size.
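The comparison of force and weight across raw materials can be sketched as a regression of log-transformed weight on force. As a simplification, this sketch fits each material separately rather than adding raw material as a term in a single model, as in the analysis reported here; it assumes weight is the log-transformed variable, as the Fig. 12 caption suggests, and the data are synthetic, built so that flint needs twice the force of glass to produce a flake of the same weight. Function names are hypothetical.

```python
import math


def simple_ols(x, y):
    """Intercept, slope, and r^2 of an ordinary least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1.0 - rss / tss


def fit_by_material(rows):
    """Fit log(weight) on force separately for each raw material.

    rows: iterable of (material, force, weight_g); force units as recorded.
    """
    by_material = {}
    for material, force, weight in rows:
        forces, log_weights = by_material.setdefault(material, ([], []))
        forces.append(force)
        log_weights.append(math.log(weight))
    return {m: simple_ols(f, w) for m, (f, w) in sorted(by_material.items())}


# Synthetic rows: for flint, the same weight requires twice the force as for glass.
rows = [(material, force, math.exp(-2.0 + rate * force))
        for material, rate in (("glass", 0.002), ("flint", 0.001))
        for force in (500.0, 1000.0, 1500.0, 2000.0)]
fits = fit_by_material(rows)
for material, (a, b, r2) in fits.items():
    print(material, round(b, 4), round(r2, 3))
```

A shallower slope of log weight on force means more force per unit of flake size, which is how a "harder material" shows up in this kind of plot.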

Discussion and conclusions
The purpose of the present experiment was to address whether the results of controlled experiments on glass cores are applicable to raw materials that were used in the past. We addressed this question by modeling the effect of raw material on the relationships of EPA and PD to flake size, as measured by volume and linear dimensions. In nearly every comparison, we were unable to show that raw material affects the relationship of EPA and PD to size. Increasing either EPA or PD will increase flake size, regardless of raw material type (including glass). Furthermore, the two variables interact such that the effect of changes in PD is amplified at higher values of EPA. Our results show that knappers can use the same combination of EPA, PD, and core morphology to obtain similar products without making adjustments for different materials. What does make a difference in knapping different raw materials is the amount of force needed to remove a flake. With the same core configuration, EPA, and PD, flakes struck from different raw materials will be of comparable size, but more force is required to remove flakes of flint and basalt than of glass or obsidian.
Some variations in flake size are noted in the experimental assemblage. For instance, the variation in standardized volume or linear dimensions is higher with higher values of EPA. Perhaps this is due to slight differences in the core surface topography-the basalt, flint, and obsidian cores were all shaped by hand by the same individual, while the glass cores were molded according to a slightly different design. Potentially, the effect of core surface morphology on size might increase with higher values of EPA. Nevertheless, the effects of EPA and PD are the same in all raw materials, and this variation appears irrespective of raw material.
It is useful to put the present study in the theoretical context of experimental design and inferential validity in archeology. In comparing the archeological record with experiments, Eren et al. (2016) argue that the former is high in external validity, given that it represents direct evidence of past behavior, but low in internal validity, because it is a biased and incomplete record that does not allow for control or randomization of variables. Experiments, on the other hand, have high internal validity owing to their repeatability and the potential for controlling variables, but can never precisely replicate the behavior behind the archeological record. Lin et al. (2018) use these terms differently when discussing the reliability of inferences derived from experiments. They distinguish three kinds of inferential validity in experiments. The first is internal validity, which refers to the level of precision in the experimental design and the degree to which the experiment can isolate the effects of a single independent variable. The second, external validity, refers to whether the experimental results can be generalized to contexts beyond the experimental setting itself. The third is ecological validity, which refers to how well experimental results reproduce what is observed in the field. Here we use these terms as defined by Lin et al. (2018).
Early controlled experiments on flake production (Speth 1972; Faulkner 1972; Bonnichsen 1977; Dibble and Whittaker 1981; Dibble 1995; see also Dibble and Rezek 2009) tended to be reasonably high in internal validity, being able to isolate, or vary in a controlled fashion, the effects of any single independent variable. Unfortunately, due largely to their experimental design, they were low in ecological validity: put simply, the flakes produced by these experiments bore little resemblance to those found in archeological sites because the shapes of the cores used did not resemble those in prehistoric assemblages (see Rezek et al. 2016 for review). In addition, a number of earlier experiments (Dibble and Whittaker 1981; Pelcin and Dibble 1995) were performed on plate glass, with flakes removed from the edge. While these flakes did resemble burin spalls, flake width was invariable, which limited the ability to generalize the results to how width and two-dimensional flake shape were affected by the independent variables under study. Not surprisingly, there were few attempts to apply the findings from these experiments to archeological collections (Dibble 1981, 1997). A new experimental design, first used about 10 years ago (Dibble and Rezek 2009) and used again here, overcame some of these limitations by using molded glass cores whose surface configurations yielded flakes that more closely resembled archeological ones. In itself, this new design helped to increase both the external and ecological validity of the experiment. The use of glass cores also maintained high internal validity, because all of the cores were molded to be identical in size and shape.
However, using glass raises a question as to the external and ecological validity of the experiments simply because glass is not a material that was used in prehistoric knapping.
Our results verified that the relationships between the platform attributes EPA and PD and flake size, obtained from experiments on glass cores, can indeed be extended to other materials that were used in the past, in this case varieties of basalt, flint, and obsidian. This is important because it shows that experiments using glass cores can be generalized and extended to other materials; in other words, they also have a relatively high degree of external validity. We note that, besides glass, porcelain is also gaining popularity in knapping experiments (e.g., Khreisheh et al. 2013), and this material too should be evaluated to ensure its external and ecological validity, and therefore its suitability as a test material.
There are similarities in how the raw materials tested in these experiments behave during knapping, and these results can potentially be generalized to all knappable materials. It is possible, however, that other (or even many) materials used in the past respond differently. Moreover, not all flints, basalts, and obsidians are the same; considerable variability exists within these broad categories of rock that may also affect their fracture properties. While it is impossible to perform experiments on every variety of rock that exists in nature, a more fruitful approach would be to study fracture patterns in relation to basic material properties, such as hardness, brittleness, and homogeneity. This could ultimately enable us to generalize experimental results to a much larger degree and to model how those properties might affect particular characteristics of lithic assemblages. Such an approach is beyond the scope of the present paper, but it represents a promising avenue for future research. We reiterate, however, that the three raw materials tested here vary in their properties and yet respond to changes in EPA and PD in a comparable way, both to one another and to glass.
Comparisons with archeological materials allow us to assess the ecological validity of these experimental results. The general finding is that the effects of EPA and PD are observable in archeological assemblages as well. Note, however, that these samples also include only a limited variety of raw materials, mostly Late Cretaceous flints. While the comparisons presented here are concerned only with overall size and the three linear flake dimensions, previously published experiments have presented other such comparisons with archeological data (see Table 5).
The applicability of the results of the experiments presented here to archeological contexts is reflected in their corroboration of methods for estimating the original size of already reduced and discarded artifacts found in the archeological record (Braun et al. 2008; Dibble 1997; Lin et al. 2013). In that regard, the convergence of results from controlled and free-hand experiments (Clarkson and Hiscock 2011; Davis and Shea 1998; Dogandžić et al. 2015; Clarkson 2014, 2016; Shott et al. 2000) on the relationship between platform variables and size speaks further to their ecological validity.
The clear conclusion, both from our experiments and from the comparison with archeological data, is that EPA and PD predict flake size irrespective of raw material, including glass. Our results show that experiments performed on glass are valid and generalizable to several other materials knapped in the past, and that they also provide insights into significant aspects of prehistoric lithic assemblage variability (see also Lin et al. 2013; Režek et al. 2018). We showed that different amounts of force are needed to remove flakes of the same size (weight) from materials with different properties. We did not, however, investigate how other aspects of flaking, such as hammer type and angle of blow, might affect the results across different raw materials. There is undoubtedly still much to be learned about the fracture properties of various materials, but there is little doubt that a better understanding of how materials break will significantly aid in developing accurate interpretations of prehistoric chipped stone industries.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.