Testing inter-observer error under a collaborative research framework for studying lithic shape variability

Timbrell, Lucy; Scott, Christopher; Habte, Behailu; Tefera, Yosef; Monod, Hélène; Qazzih, Mouna; Marais, Benjamin; Black, Wendy; Maroma, Christine; Ndiema, Emmanuel; Henderson, Struan; Elmes, Katherine; Plomp, Kimberly; Grove, Matt

doi:10.1007/s12520-022-01676-2

Testing inter-observer error under a collaborative research framework for studying lithic shape variability

Research
Open access
Published: 01 October 2022

Volume 14, article number 209, (2022)
Cite this article

Download PDF

You have full access to this open access article

Archaeological and Anthropological Sciences Aims and scope Submit manuscript

Testing inter-observer error under a collaborative research framework for studying lithic shape variability

Download PDF

Lucy Timbrell¹,
Christopher Scott¹,
Behailu Habte²,
Yosef Tefera²,
Hélène Monod³,
Mouna Qazzih⁴,
Benjamin Marais⁵,
Wendy Black⁵,
Christine Maroma⁶,
Emmanuel Ndiema^6,7,
Struan Henderson⁸,
Katherine Elmes⁸,
Kimberly Plomp^9,10 &
…
Matt Grove¹

3603 Accesses
7 Citations
21 Altmetric
Explore all metrics

Abstract

Evaluating error that arises through the aggregation of data recorded by multiple observers is a key consideration in many metric and geometric morphometric analyses of stone tool shape. One of the most common approaches involves the convergence of observers for repeat trails on the same set of artefacts: however, this is logistically and financially challenging when collaborating internationally and/or at a large scale. We present and evaluate a unique alternative for testing inter-observer error, involving the development of 3D printed copies of a lithic reference collection for distribution among observers. With the aim of reducing error, clear protocols were developed for photographing and measuring the replicas, and inter-observer variability was assessed on the replicas in comparison with a corresponding data set recorded by a single observer. Our results demonstrate that, when the photography procedure is standardized and dimensions are clearly defined, the resulting metric and geometric morphometric data are minimally affected by inter-observer error, supporting this method as an effective solution for assessing error under collaborative research frameworks. Collaboration is becoming increasingly important within archaeological and anthropological sciences in order to increase the accessibility of samples, encourage dual-project development between foreign and local researchers and reduce the carbon footprint of collection-based research. This study offers a promising validation of a collaborative research design whereby researchers remotely work together to produce comparable data capturing lithic shape variability.

Evaluating the effects of parallax in archaeological geometric morphometric analyses

Article 29 June 2020

Interrogating Random and Systematic Measurement Error in Morphometric Data

Article 16 February 2024

A pan-European dataset revealing variability in lithic technology, toolkits, and artefact shapes ~15-11 kya

Article Open access 07 September 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Shape analyses are becoming an increasingly popular methodology for examining lithic variability in the archaeological record. As such, traditional linear metrics and geometric morphometrics (GMM) are often employed to capture morphological information on stone tools (Cardillo 2010; Lycett and von Cramon-Taubadel 2015; Matzig et al. 2021). Combining morphological data from multiple observers is frequently necessary in studies of lithic assemblages, to increase sample size and/or to perform inter-site/inter-assemblage analyses, yet this can be problematic due to the possibility of introducing inter-observer error into the data (Lyman and VanPool 2009). Such error has multiple potential sources, can be introduced at various stages in the workflow, and can skew results by obscuring any “real” signals in the data (Fruciano 2016); examining the magnitude of inter-observer error is therefore imperative to validate whether meta-analyses are robust. International researchers are increasingly being encouraged to work collaboratively in order to remotely produce archaeological and anthropological datasets (Chang and Alfaro 2015; O’Leary and Kaufman 2011; Scerri et al. 2020; Timbrell 2020, 2022)—in some case even crowdsourcing morphometric data (Chang and Alfaro 2015). However, it is frequently impossible for observers to converge on the same material to record repeat trials for an inter-observer repeatability assessment. Such control tests therefore need to be appropriate for the specific research design, and customized solutions for evaluating error under collaborative research frameworks should be developed (Fruciano 2016). Here, we present an innovative analysis of inter-observer error involving the compilation of standardized photographs and measurements of lithics from multiple observers for metric and GMM analysis (Timbrell 2022).

Traditionally, lithic shape variation has been examined through qualitative descriptions (Inizan et al. 1999), typological classification (Bordes 1961) and/or linear measurements (Roe 1964; McNabb 2017). Advancements in biological morphometrics and computing have meant that geometric morphometrics are now also routinely applied in the analysis of lithic morphologies (Bookstein 1991; Buchanan et al. 2018; Cardillo 2010; Lycett 2009; Serwatka and Riede 2016). GMM approaches are split into methods that use landmarks and outlines, the former representing shape through homologous points (landmarks) superimposed on a two-dimensional (2D) or three-dimensional (3D) object and the latter applying geometric descriptions of homologous outlines or surfaces (Mitteroecker 2021). Landmark-based methods allow for specific aspects of morphology to be captured without the inclusion of random noise (i.e. shape dimensions that are not pertinent to the research question); however, their application to certain non-biological structures, such as lithics and other archaeological artefacts, is often more difficult as the identification of homologous landmarks can be subjective (Okumura and Araujo 2018). Outline-based GMM, on the other hand, avoids certain issues of homology through quantifying the gross shape of each specimen (Klingenberg 2008), making them ideal for describing shape variation of lithics in archaeological studies (e.g. Iovita 2009, 2011; Ivaonovaité et al. 2020; Matzig et al. 2021; Mesfin et al. 2020; Wang and Marwick 2020).

Assessment of the levels of inter- and intra-observer error under different methodological approaches to studying lithic shape is vital, and several studies have examined error in metric and GMM analyses at different phases of the workflow (Evin et al. 2020; Fagerton et al. 2014; Lyman and VanPool 2009; Macdonald et al. 2020; Menedez 2017; Osis et al. 2015; Perini et al. 2005; Robinson and Terhune 2017; von Cramon-Taubadel et al. 2007; Yezerinac et al. 1992). Problematic landmarks, i.e. those that are difficult to consistently locate, can be a source of error in landmark-based GMM analysis (Fagerton et al. 2014; Menedez 2017; Robinson and Terhune 2017; von Cramon-Taubadel et al. 2007), even for experienced observers (Chang and Alfaro et al. 2015). von Cramon-Taubadel et al. (2007) found that repeating the digitization procedure was the most suitable method for assessing the precision of landmarks, with adequate landmark definitions imperative for reducing error. Yezerinac et al. (1992) also found that ill-defined measurements were a factor increasing error in metric data; in addition to operator experience, the precision of the measuring device and the conditions under which the measurements are made, such as lighting. Combining metric measurements from more than one observer, therefore, is likely to be suitable only when the dimensions are standardized and easily measured, and the conditions, the precision and quality of the equipment and the technique of recording the data are comparable (Lyman and VanPool 2009).

Comparatively, fewer studies have examined the levels of inter-observer error in outline-based GMM methods. Evin et al. (2020), in an investigation of error between morphometric approaches, found that although methods that employ landmarks were the most sensitive to error, outline data saw relatively lower levels of intra-observer error compared to inter-observer error, with photography being an influential source of variance between observers. Digital photography is widely used in 2D GMM as it is inexpensive, easy to perform and does not require extremely specialist knowledge or equipment, with the digitization of landmarks and/or outlines on the resulting images providing a 2D representation of the 3D object. The focal length and specifications of the lens used can, however, cause parallax error; the optical distortion that occurs when the specimen is too close or not directly centered beneath the lens (fisheye). Nonetheless, several studies employing both landmark and outline methods suggest that 2D GMM data are minimally affected by parallax error, especially when the camera set-up is standardized and calibrated, with deviations small and constant enough for accurate analyses (Caple et al. 2018; MacDonald et al. 2020; Mullin and Taylor 2002; Riano et al. 2009). Overall, outline-based methods are likely more suitable for collaborative research designs in studies of lithic shape; due to the objectivity of data capture, the fact that landmark methods have high rates of inter-observer error, though this is more pertinent during landmark digitization than object photography (Evin et al., 2020), and the potential to reduce inter-observer error through the standardization of the photography procedure.

Although the inter-observer error is a concern in any collaborative research design, collating data from multiple observers is often necessary in archaeological research, be it to increase sample sizes, facilitate interdisciplinary research and/or enable access to disparate data (Timbrell 2020). The latter is especially important when considering issues of income-disparity, childcare and disability that can disproportionately disadvantage researchers who are unable to travel extensively to collect data. Global catastrophes, such as pandemics, climate change and conflict, can also temporarily delay international research through the constraints imposed on travel and safety, requiring researchers to develop scientifically sound remote models of data generation (Scerri et al. 2020). Timbrell (2022) presents such a framework, which involves the documentation of lithic shape by multiple collaborators. These types of approaches have additional benefits for decreasing the carbon footprint associated with accessing multiple international samples and fostering knowledge-sharing through dual project development and the division of responsibilities so that both foreign and local researchers take on principal roles within a given project, which is particularly crucial across the Global North–South divide (Chirikure 2015; Douglass et al. 2020; Else, 2022). Indeed, collaborative approaches accord with the open science initiative in archaeology, which advocates that data stewardship should be centered around researchers collecting and sharing data on behalf of the scientific community, as opposed for the betterment of a single individual’s career (Marwick et al. 2017).

While collaborative data collection offers a promising new framework for generating and sharing data internationally, the analysis of inter-observer error is imperative to validate such an approach. Here, we present a unique control test that involves the production of 3D printed replicas of a lithic reference collection, which can be distributed among observers and measured following the same protocols used to collect the actual data. We then examine the differences between the datasets, knowing that each collaborator has recorded the same data from identical copies of the artefacts. Using this approach, we evaluate whether the compilation of data from multiple observers is conducive to error, and thus could negatively bias the results of a collaborative study.

Materials

Six lithic points were knapped using fine-grained flint from Caistor Quarry, Caister St Edumunds, UK and scanned for 3D printing at the University of Liverpool (Fig. 1). The reference tools varied in both size and shape, encapsulating a range of morphologies characteristic of the empirical sample to be studied in the main project (African Middle Stone Age assemblages). While flint is not a feature of African lithic assemblages, it could be considered representative of the finer-grained materials, such as obsidian, chert and heat-treated silcrete, exploited during the Middle Stone Age (Key et al. 2021; Sahle et al. 2013). The tools were produced on flakes and retouched using: (1) direct freehand hard hammer percussion (quartzite hammerstones), (2) direct soft free-hand hammer percussion using an antler hammer and (3) handheld pressure flaking using an antler tine supported in a tanned leather pad. Each tool was colored blue using craft enamel spray paint to aid scanning.

Next, each lithic was scanned with a freshly calibrated Einscan Pro 2X structured light scanner with a colour camera, using the combined feature and texture mapping in the high-resolution setting. Initial scans were performed with the lithics placed vertically in a foam holder using fixed scan mode aligned with an automated turntable and coded targets (scans taken every 11.25 degrees, i.e. 32 scans). The models were then completed by switching to the “align by” feature using the turntable (32 scans), and the lithic was rescanned (2–3 times) until a complete model was achieved. All alignment was automatic to produce a watertight mesh; no holes were filled. Each model was sharpened using the Einscan high-setting and saved as.obj files without decimation (see Supplementary Online Table 1 for further data on each model).

The 3D models were processed for printing using Chitubox v1.8.1. Medium-sized automated supports were applied using this software at 90% total coverage to provide a strong foundation for the 3D prints. We used an Elegoo Mars 2 Resin printer, with a new printer film, using standard grey Elegoo LCD UV curing 405 Nm photopolymer resin with recommended Elegoo settings (Fig. 2). The prints were extracted from the print bed, and the supports were removed by hand prior to being rinsed in ethanol and cured in direct sunlight. Each tool was printed six times to create six copies of the assemblage, resulting in 36 prints in total.

Methods

Each tool was assigned a number (Tool 1–6; Fig. 1), and a replica copy of the assemblage was sent to researchers at six independent institutions (Table 1). Data collection protocols, outlined in detail by Timbrell (2022) and described in Supplementary Online Resource S1, were developed to standardize the documentation of lithic shapes through photography and measurements. These procedures were followed by all observers across the study to produce equivalent data. Instructions for object position, camera position and settings and lighting were specified and tightly controlled (Supplementary Online Resource S1). In addition, a scale (sourced in situ by the observers) was placed in each photograph to ensure a measure of size was recorded. Table 1 reports the camera and lenses used to capture each replica assemblage in the study; high-quality equipment was accessed by all observers either through their institution directly or through funding provided by the project. Three basic measurements on each tool were also taken to record morphological length, width and thickness (see Supplementary Online Fig. 1 for a schematic) at a resolution of 0.1 mm. We defined length as the maximum dimension of the point, width as the maximum measurement in the perpendicular dimension to length and thickness as the maximum measurement in the third dimension, following Shea (2020).

Prior to distribution among institutions, all 36 replicas were also recorded by a single observer (LT) to produce a comparative dataset. Photography was performed using a Canon M50 camera with an EF-S 60 mm f/2.8 Macro USM lens and the three measurements were taken using digital calipers. This enabled us to determine the magnitude of intra-observer measurement error, for comparison with the magnitude of inter-observer error, had the project been carried out by a single individual under a traditional research framework.

Data were uploaded onto a communal data sharing platform (Google Drive) by each observer for processing and analysis by a single observer (LT). Analyses were performed in the R software environment (R Core Team 2020). Data and code can be found on the GitHub repository for the project: https://github.com/lucytimbrell/error_analysis_lithics/.

Table 1 Summary of the observers and the photography equipment used. This equipment was sourced locally; in most cases, the institutions already had access to the necessary apparatus; however, in some cases, it was rented and/or purchased and donated to the institution after the project, following guidelines provided by The Wenner Gren Foundation

Full size table

Metric analyses

We first computed the intra-class correlation coefficient ($ICC$) using the “psych” R package (Revelle, 2022) to assess the agreement between the six observers in measuring the six tools for length, width and thickness. The $ICC$ compares the variability within repeat measurements whilst contrasting variability between groups of measurements (Barlett and Frost 2008; Fruciano 2016; Koo and Li 2016; Shrout and Fleiss 1979). Specifically, we used a two-way mixed effects model to compute the $ICC$, with the set of observers considered a fixed effect. To assess the reliability of data collection, we next calculated and compared the mean, variance, technical error of measurement ($TEM$) and percentage technical error of measurement ($\% TEM$). The mean and variance (expressed as the standard deviation) were calculated for each measurement on each tool, with the $TEM$ and $\% TEM$ calculated to compare pairs of observers across all measurements on all tools. The $TEM$ reflects measurement precision between observers, and is calculated as:

$$TEM = \sqrt{\frac{\left({\sum }_{1}^{N}\left({\sum }_{1}^{K}{M}^{2}\right)\right) - \left({\left({\sum }_{1}^{\mathrm{K}}M\right)}^{2}/K\right)}{N\left(K-1\right)}}$$

where $\mathrm{N}$ is the number of subjects, $\mathrm{K}$ is the number of observers, and $\mathrm{M}$ is the measurement (modified from Ulijaszek and Kerr [1999]). The %TEM represents the magnitude of the error as a percentage of the mean of the measurement/variable studied. It is calculated as:

$$\%TEM = 100 (\frac{TEM}{\overline{v} })$$

where $\overline{v }$ is the average value of the raw measurements, taken across all measurements on all tools by multiple observers. The values obtained for these metrics must be subjectively assessed according to the research question, as there is no standard applied threshold of error deemed to be “acceptable”. Following Lyman and VanPool (2009)’s analyses of projectile points, we propose that a $\%TEM$ of < 4 could be an acceptable level of error without negative consequences on the results. Lastly, we calculated the coefficient of reliability $(R)$, which ranges from 0 to 1, with 1 indicating very high congruence between measures. We used the following formula outlined in Lyman and VanPool (2009):

$$R = {\sigma }_{v}^{2}/\left({\sigma }_{\mathrm{v}}^{2} + {\sigma }_{\mathrm{d}}^{2}\right)$$

where ${\sigma }_{\mathrm{v}}^{2}$ is the variance of all raw measurements on all tools taken by two observers and ${\sigma }_{\mathrm{d}}^{2}$ is the variance of the difference between those two sets of measurements. Similarly to the $ICC$, the coefficient of reliability distinguishes between the variability between the specimens and that which results from random measurement error. However, whilst $R$ can only be calculated between pairs of observers, the $ICC$ represents an overall metric for measurement error across all observers.

Random error can inflate the amount of variance within a sample, resulting in a loss of statistical power as noise obscures true differences in means (Fruciano 2006; Yezerinac et al. 1992). To evaluate the levels of error in the multiple observer data in relation to the single observer data, we used two-sample t-tests to compare differences in mean and F-tests to compare differences in variance. If there is high inter- and/or intra-observer error, variation within replicas of the same tool will be increased and differences in the mean values for each tool will be significantly different.

Two-dimensional geometric morphometric analysis

In preparation for GMM analysis, each image was processed using the “object select’ tool in Adobe Photoshop, which automatically determines the contour of the object. Once the contour was highlighted, the object was filled with solid black to help facilitate the extraction of outline data. All processed images were then synthesized into a single thin-plate spline (.tps) file using tpsUtil, and the outline data were extracted using tpsDig2. The outline of each artefact was represented by an average of 2856 equidistant points, which were scaled through the specification of the pixel-to-centimeter ratio for each image (see Supplementary Online Fig. 2 for a visualization of the data). The outline data were saved as (x, y) coordinates within the.tps file and imported into R.

Using the “Momocs” R package (Bonhomme et al. 2014), the outlines were standardized following Bonhomme et al. (2017) by normalizing to a common centroid, scaling to centroid size and aligning along the long axis of the object. We then performed elliptic Fourier analysis (EFA) to convert the geometric data to frequency data, with the outline decomposed into a series of repeating trigonometric functions, referred to as harmonics (Caple et al. 2017; Fig. 3). The appropriate number of harmonics were identified to capture sufficient information on shape; this was deemed to be 8 harmonics, achieving 99% harmonic power (Caple et al. 2017).

Next, we performed a principal components analysis (PCA) on the elliptic Fourier coefficients to reduce the dimensionality of the data. Principal components (PCs) are constructed to highlight the main axes of morphological variance (Zelditch et al. 2004). Like with the metric data, we calculated the $ICC$ and $R$ values to partition the variance from the inter-observer error for the PC scores of repeat captures (Daboul et al. 2018; Fruciano 2006). Due to the nature of PC scores, we were unable to obtain an informative relative measure of dispersion ($\%TEM)$ and instead refer to the standard deviation (calculated as the square root of the variance) as absolute measures of dispersion. This is because, when the mean of a set of repeat captures falls close to the mean of a PC (~ 0) and has a low standard deviation (~ 0), the %TEM would be very high despite the tight clustering of the repeated measures along that PC. In addition, we applied linear discriminant analysis (LDA) to the PC scores, with the equal sample sizes used as the prior group probabilities (1/6) of a repeat belonging to a certain group based on their outline shape alone (Mitteroecker and Bookstein 2011). In this analysis, we tested firstly whether the tools could be distinguished based on their shape alone, and then whether the observers could be identified. One would expect high classification results when discriminating between tools and low classification results when discriminating between observers if inter-observer error is low.

Results

Linear metric analysis

We first explored whether the measurements were recorded consistently on the replicas between observers. Figure 4 shows the distribution of the multiple observer data through boxplots; most of the measurements have very limited variance around the mean, and all tools were significantly different to each other across all measurements when tested using Tukey’s honestly significant difference (HSD; p < 0.001). Thickness is the most variable dimension recorded, probably because it is more difficult to orient the tool for this measurement than it is for length or width. Calculation of the coefficient of reliability between each pair of observers found that all values of $R$ were > 0.999, suggesting that over 99% of the variance in each measurement is due to variability between the specimens as opposed to error. We calculated the $TEM$ as 0.368 and the $\%TEM$ as 0.908, supporting that less than 1% of the variance in the dataset is related to measurement error. Finally, the $ICC$ score confirmed that there is a very high absolute agreement between the observers ($ICC$ = 1, p < 0.001).

We then compared the measurements taken by multiple observers with those taken by a single observer as a means of comparing intra- and inter-observer errors. We first calculated the coefficient of reliability for the single observer for each pair of replica assemblages—we found that all values were > 0.999, indicating very high congruence between repeat captures by the single observer. Table 2 reports the mean and standard deviation of length, width and thickness for the single observer compared to multiple observers; two-sample t-tests found that there were almost no statistically significant differences in means between the data sets (1/36 = p < 0.05; Table 3). However, F-tests found that half of the measurements show statistically significance differences in variance, particularly along length and width (Table 3). This demonstrates that the single observer is generally less prone to error, which is likely due to a combination of the familiarity of this observer to both the metric definitions and the assemblage and the fact that the same equipment was used to measure all of the replicas. Nonetheless, the fact that these differences in variance only resulted in a single instance of significant difference in mean, plus the standard deviation does not exceeds 0.7 mm, suggests that the effects of inter-observer error are minimal on the results.

Table 2 Summary statistics reporting the mean (m) and standard deviation (sd) obtained for length, width and thickness, recorded by multiple observers versus a single observer for each tool (1–6). Standard deviation values have been rounded to 3 decimal places

Full size table

Table 3 P-values from t-tests (difference in mean) and F-tests (difference in variance) comparing the metrics (length, width and thickness) for each tool (1–6) measured by multiple observers versus a single observer. Statistical significance (p < 0.05) is marked by an asterisk (*). All values have been rounded to 3 decimal places

Full size table

Geometric morphometric analysis

PCA was used to highlight variance in the multiple observer data. The first 3 PCs represented > 90% of the variation between the replicas, and thus were explored in this study. Figure 5 demonstrates the shape differences highlighted by PC1-3. PC1 represents 59.7% of the total variance, whilst PC2 and PC3 account for 33.4% and 3%, respectively (see Supplementary Online Fig. 3 for scree plot of PC loadings and cumulative variance).

When the first 3 PCs are plotted against each other, clear clustering occurs, demonstrating that replicas of the same tool tend to share more similarities than that of different tools (Fig. 6). However, there is notable variation within tools along PC3, suggesting that inter-observer error deriving from photography equipment and set-up is prevalent in this dimension. PC3 is an axis of variation represented by slight asymmetries in convexity at the proximal end (Fig. 5), thus likely reflecting parallax error between observers. Additionally, we note some overlap between certain tool groups, although this is primarily because these tools share similar shapes once size is removed (Supplementary Online Fig. 2). For example, Tool 5 sometimes plots within the range of variation for Tool 1 and only shows statistically significant differences in mean from this tool along PC2 (p < 0.008; see Supplementary Online Table 2 for Tukey’s HSD results comparing differences in mean between tools). To tease apart the variation between the tools and that associated with the error, we calculated the coefficient of reliability between each pair of observers, which ranged between 0.960 and 0.999 (Table 4), suggesting that < 4% of the variance is due to inter-observer error, which lies within our acceptable threshold. The $ICC$ was computed using the first 3 PC scores to determine levels of similarity between the six observers, whilst taking into account the variability between the tools, and found an almost perfect agreement ($ICC$ = 0.99, p < 0.001). Finally, we found that an LDA could discriminate accurately between the replica groups (94% classification accuracy) and could not differentiate between observers (0% classification accuracy).

Table 4 Coefficient of reliability ($R$) values for pair-wise combinations of observers using the first 3 PC scores. For observer abbreviations and associated assemblage numbers, see Table 1. All values have been rounded to 3 decimal places

Full size table

Next, we compared the levels of error obtained when collating photographs from multiple observers and that which arises when all replicas are photographed by the same observer. We performed another PCA with data acquired from both sets of images (see Supplementary Online Figs. 4-5 for PC contributions and loadings) and produced scatterplots of PC1–3. Figure 7 demonstrates clear clustering between tools recorded in both sets of data along PC1 and PC2. However, along PC3 there is clear variability within repeats when grouped by the observer (multiple vs single). F-tests found that the variance among certain tools was only significantly higher for the multiple observers in three cases, i.e. tool 4 and 1 along PC3 and tool 4 along PC1 (Table 5). Two-sample t-tests found statistically significant differences in means between the data sets, but these are limited (5/36 = p < 0.05; Table 5). Table 6 and Fig. 7 demonstrate that the data collected by a single observer returns lower variance, though this pattern is not strong, and, in a few cases, it is slightly higher under this strategy, though not significantly so. We finally calculated the coefficient of reliability for the single observer between each of capture of the replica assemblages—Supplementary Online Table 3 shows that the $R$ values ranged from 0.994 to 0.999, suggesting that < 1% of the variance in the single observer data is due to an intra-observer error.

Table 5 P-values from t-tests (difference in mean) and F-tests (difference in variance) comparing the principal component (PC) scores of the repeats of each tool (1–6) captured by multiple observers verses a single observer. Statistical significance (p < 0.05) is marked by an asterisk (*). All values have been rounded to 3 decimal places

Full size table

Table 6 Summary statistics reporting mean (m) and standard deviation (sd) of principal component (PC) scores of the repeats of each tool (1–6), captured by multiple observers versus a single observer. All values have been rounded to 3 decimal places

Full size table

Discussion

Here we present a control study that validates the use of the collaborative data collection protocol presented in Timbrell (2022), which can now be used more extensively by other researchers to reduce travel and carbon emissions, as well as to bring researchers from other geographical areas into the collaborative process more directly. Our results demonstrate that the levels of inter-observer error permeating shape data collated under a collaborative research framework fall within the acceptable threshold, thanks to the establishment of clear research protocols followed by each collaborator. We found that, inevitably, increases in error occur as a consequence of relying on multiple observers, who each have access to different equipment, yet we do not deem this to be significant enough to highly distort the results towards a different conclusion about the data. Therefore, our innovative 3D printing approach and the results reported here have important implications for error assessments of linear metric and GMM data when recording lithic shape as well as the aggregation of data collected by multiple observers.

Outline-based GMM was found to be slightly more sensitive to inter-observer error than metric methods. As Caple et al. (2018) point out, EFA involves global descriptors capturing around 99% of the variance in the outline shape, and therefore, discrepancies between images lead to error in the coefficients dispersed throughout the full outline. Therefore, even if the error is not equally distributed, it is measured as such, and consequently, outline methods are often more sensitive to error than linear methods that capture only certain dimensions of an object. 2D outline-based GMM provides comprehensive morphological information on the gross outline shape of an object, whereas linear metrics are able to capture aspects of the 3D shape but in much less detail; the increase in the morphological information captured, plus the added potential for automated data capture (e.g. Bonhomme et al. 2014; Matzig 2021) and impressive shape visualization (e.g. Figure 5), will be worth the potential increase in error with 2D GMM in many scenarios.

Our use of PCA to highlight axes of variance within lithic shape assemblages also demonstrates that inter-observer error does not affect all PCs equally. As outlined by Page (1976), subtle errors in each variable are combined in multivariate analyses and can be extracted by a single or small set of PCs, although they may also describe real aspects of covariance and so require careful consideration as to their source. When undertaking metric analyses, it is possible to assess error in each individual measurement; if the metrics are combined via dimension reduction methods such as PCA, the contributions of each individual measurement to each PC are readily identifiable through the PCA coefficients. This is less feasible with GMM data, particularly when using outlines and semi-landmarks, and in such cases, it is preferable to assess error on each of the leading PCs, as demonstrated above, rather than on each set of coordinates, which can be very numerous. Overall, the error is impossible to avoid completely, and indeed, the imperfect fidelity of cultural transmission means that copying errors can naturally occur during the knapping process and inflate variance between and within assemblages (Eerkens and Lipo 2005; Schillinger et al. 2014). In this sense, the error is certain to arise within a data set capturing lithic variability; however, steps can be taken to ensure it is minimized, such as standardization of data acquisition, processing, and analytical procedures, calibration, high-quality equipment and assessment of error through repeat measures (Evin et al. 2020; Lyman and VanPool 2009; Robinson and Terhune 2017; Yezerinac et al. 1992). In the case of the current study, we determine that inter-observer error is low enough for accurate analyses under both methods, especially as the high $ICC$ and $R$ values demonstrate acceptable levels of congruence between the six observers.

Through the development of clear research protocols, our results demonstrate that multiple observers can successfully work together to produce sets of comparable data for aggregation. We believe that collaborative research designs, such as the one reported in Timbrell (2022), play an integral role in addressing the vulnerabilities of international research to disruption, revealed most recently in 2020 by the outbreak of coronavirus (COVID-19), which halted both domestic and international travel as well as social interaction. Our results suggest that, as well as single researchers visiting multiple collections to independently access lithic samples, international colleagues are also able to work together in situ to generate data, thereby building resilience in archaeological practice (Douglass et al., 2020; Scerri et al., 2020). We stress though that collaborative research designs should involve an equitable partnership in relation to the data, following the imminent Cape Town statement (see Else, 2022), with all researchers being involved in all stages of the research, from planning and protocol development to publication and dissemination (Chirikure 2015; Douglass et al. 2020). In this way, dual project development can enable local researchers to benefit from international archaeological research, thereby avoiding some (but not all) of the neo-colonial “helicopter” practices that have been hugely criticized in archaeological and anthropological sciences, particularly in Africa (Ackermann 2019; Athreya and Ackermann 2019; Sahle 2021). We have provided here an initial pilot test of collaborative data collection using a 3D printing approach. This approach is unique and, to our knowledge, has not yet been applied in the context of lithic variability nor inter-observer error assessments. We propose that future studies should aim to reproduce our approach with more expanded samples of replica artefacts and discuss three important aspects of potential future study design below.

The first aspect relates to the use of statistics and simple metrics for reporting the inter-observer error. Statistics such as the $ICC$ and %$TEM$ express the error variance relative to the overall variance of the sample; the variance is decomposed into that due to genuine variation among the artefacts and that due to variation among the observers (including that due to different individuals, their different cameras, lenses, etc.). Whilst this approach has many advantages, one immediate drawback is that these statistics are directly affected by the magnitude of genuine variation in both the sample of artefacts and in the dimensions measured. A given constant level of measurement error will appear large when the artefacts measured are highly standardized, but small when the artefacts measured are highly variable. Even if one were to measure the widths and lengths of a set of highly standardized artefacts, a given level of measurement error would appear smaller the further the ratio of width to length is from unity, as this would increase the magnitude of genuine variation in the measurements taken. For this reason, it is always valuable to present simple indices of absolute error (such as standard deviation or variance) for single measurements alongside the indices of relative error variance across all measurements provided by the $ICC$ and $\%TEM$. Such simple indices are valuable in assessing inter-observer error even when the ultimate study involves more sophisticated morphological analyses, such as those based on GMM. In the current study, Table 2 presents such indices and demonstrates that levels of error are minimal (the largest standard deviation among multiple observers for a single measurement = 0.613 mm).

The second aspect relates to the exploration of the effects of the raw material used for the production of the reference collection on the results of comparative studies. In this study, we used flint because it was available and accessible at the University of Liverpool, where the materials were prepared. This fine-grained raw material tends to produce well-defined features and edges, and so it would be interesting to replicate the approach with a more coarse-grained material, such as quartzite, chert, calcrete or sandstone. This is especially pertinent in our case as the shapes obtained from these materials are likely to be more representative of the actual African stone tools that have been recorded in the main project. However, we note that heat-treated silcrete may achieve a grain as fine as flint (Key et al. 2021), and that obsidian can be even finer-grained than flint; since both silcrete and obsidian are raw materials commonly found in African Middle Stone Age assemblages. We suggest that the flint used here acts as a suitable middle ground in terms of granularity and can therefore be considered as broadly comparable to those raw materials studied in the main project.

Finally, an aspect of variation between individual replicas that we did not explicitly measure is that which can arise through 3D printing. Zeng and Zou (2019) outline some of the factors that can affect the precision of 3D printing, which include slicing and support errors. However, we propose that, even if there are printing errors present in our replicas, these are likely minimal due to the highly comparable data obtained across the project. Additionally, printing errors should not contribute to differences between the two data collection strategies as both the multiple observers and the single observer recorded measurements from the same set of replicas. Depending on the local accessibility of 3D printers, our approach to inter-observer testing could be further streamlined through the direct sharing of the virtual 3D models, with each collaborator printing their own copies to measure. This would alleviate potential logistical problems with global distribution, both via mail or directly, though further research is required to ascertain the variation in objects printed using different models of 3D printers.

Conclusion

Aggregating lithic shape data requires careful consideration in to order reduce potential sources of inter-observer error that can result in detrimental consequences on the results and their interpretation. Our analysis of metric and outline-based 2D GMM data from multiple observers found that the former performed slightly better than the latter in our tests of inter- and intra-observer error, primarily due to differences in the nature and detail of the morphological information obtained, though both approaches returned levels of error deemed acceptable for accurate analyses. Standardization of the data collection procedure is vital for ensuring that congruence between observers is maintained, though we note that this alone cannot completely eradicate error as we find that variability between observers can still be detected within our data to a (sometimes) significant extent. Nonetheless, we believe that producing replica samples through 3D printing could have many useful applications within archaeological and anthropological sciences beyond the study of error in the analysis of lithic assemblages and should be adopted more widely in assessments of inter-observer error as an integral component of international collaborations between institutions.

Data availability

All data and R code can be found on the project’s repository and was made available for the peer-review of this article: https://github.com/lucytimbrell/error_analysis_lithics/.

References

Ackermann RR (2019) Reflections on the history and legacy of scientific racism in South African paleoanthropology and beyond. J Hum Evol 126:106–111. https://doi.org/10.1016/j.jhevol.2018.11.007
Article Google Scholar
Athreya S, Ackermann RR (2019) Colonialism and narratives of human origins in Asia and Africa. In: Porr M, Matthews JM (eds) interrogating human origins: decolonisation and the deep past. Routledge, London. https://doi.org/10.4324/9780203731659-4
Bartlett JW, Frost C (2008) Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 31:466–475. https://doi.org/10.1002/uog.5256
Article Google Scholar
Bonhomme V, Picq S, Gaucherel C, Claude J (2014) Momocs: outline analysis using R. J Stat Softw 56:1–24. https://doi.org/10.18637/jss.v056.i13
Article Google Scholar
Bonhomme V, Forster E, Wallace M, Stillman E, Charles M, Jones G (2017) Identification of inter- and intra-species variation in cereal grains through geometric morphometric analysis, and its resilience under experimental charring. J Archaeol Sci 86:60–67. https://doi.org/10.1016/j.jas.2017.09.010
Article Google Scholar
Bookstein FL (1991) Morphometric tools for landmark data: geometry and biology. Cambridge University Press, Cambridge
Google Scholar
Bordes F (1961) Typologie du Paléolithique Ancien et Moyen. Bordeaux
Buchanan B, Andrews B, O’Brien M, Eren MI (2018) An assessment of stone weapon tip standardization during the Clovis-Folsom transition in the Western United States. Am Antiq 83:721–734. https://doi.org/10.1017/aaq.2018.53
Article Google Scholar
Caple J, Byrd J, Stephan CN (2017) Elliptical fourier analysis: fundamentals, applications, and value for forensic anthropology. Int J Leg Med 131:1675–1690. https://doi.org/10.1007/s00414-017-1555-0
Article Google Scholar
Caple J, Byrd J, Stephan CN (2018) The utility of elliptical Fourier analysis for estimating ancestry and sex from lateral skull photographs. Forensic Sci Int 289:352–362. https://doi.org/10.1016/j.forsciint.2018.06.009
Article Google Scholar
Cardillo M (2010) Some applications of geometric morphometrics to archaeology. In: Elewa AMT (ed.) morphometrics for nonmorphometricians. Springer, Berlin, Heidelberg. 325–341. https://doi.org/10.1007/978-3-540-95853-6_15
Chang J, Alfaro ME (2015) Crowdsourced geometric morphometrics enable rapid large-scale collection and analysis of phenotypic data. Methods Eco Evol 7(4):472–482. https://doi.org/10.1111/2041-210X.12508
Article Google Scholar
Chirikure S (2015) “Do as I Say and Not as I Do”. On the gap between good ethics and reality in african archaeology. In: Haber A, Shepherd N (eds.) after ethics. Ethical Archaeologies: The Politics of Social Justice, vol 3. Springer, New York. https://doi.org/10.1007/978-1-4939-1689-4_3.
Daboul A, Ivanovska T, Bülow R, Biffar R, Cardini A (2018) Procrustes-based geometric morphometrics on MRI images: an example of inter-operator bias in 3D landmarks and its impact on big datasets. PLoS ONE 13(5):e0197675. https://doi.org/10.1371/journal.pone.0197675
Article Google Scholar
Douglass K (2020) Amy ty lilin-draza’ay: building archaeological practice on principles of community. Afr Archaeol Rev 37(481):485. https://doi.org/10.1007/s10437-020-09404-8
Article Google Scholar
Else, H. (2022) African researchers lead campaign for equity in global collaborations. Nature News. https://www.nature.com/articles/d41586-022-01604-3
Evin A, Bonhomme V, Claude J (2020) Optimizing digitalization effort in morphometrics. Biol Methods Protoc 5(1):bpaa023. https://doi.org/10.1093/biomethods/bpaa023
Article Google Scholar
Fagertun J, Harder S, Rosengren A, Moeller C, Werge T, Paulsen RR, Hansen TF (2014) 3D facial landmarks: inter-operator variability of manual annotation. BMC Med Imaging 14:35. https://doi.org/10.1186/1471-2342-14-35
Article Google Scholar
Fruciano C (2016) Measurement error in geometric morphometrics. Dev Genes Evol 226:139–158. https://doi.org/10.1007/s00427-016-0537-4
Article Google Scholar
Inizan ML, Reduron-Ballinger M, Roche H, Tixier J (1999) Technology and terminology of knapped stone. Cercle de recherches et d'études préhistoriques
Iovita R (2009) Ontogenetic scaling and lithic systematics: method and application. J Archaeol Sci 36(7):1447–1457. https://doi.org/10.1016/j.jas.2009.02.008
Article Google Scholar
Iovita R (2011) Shape variation in Aterian tanged tools and the origins of projectile technology: a morphometric perspective on stone tool function. PLoS ONE 6(12):e29029. https://doi.org/10.1371/journal.pone.0029029
Article Google Scholar
Ivanovaitė L, Serwatka K, Hoggard CS, Sauer F, Riede F (2020) All these fantastic cultures? Research history and regionalization in the Late Palaeolithic tanged point cultures of Eastern Europe. Eur J Archaeol 23(2):162–185. https://doi.org/10.1017/eaa.2019.59
Article Google Scholar
Key A, Pargeter J, Schmidt P (2021) Heat treatment significantly increases the sharpness of silcrete stone tools. Archaeometry 63(3):447–466. https://doi.org/10.1111/arcm.12619
Article Google Scholar
Klingenberg C (2008) Novelty and ‘homology-free’ morphometrics: what’s in a name? Evol Biol 35:186–190. https://doi.org/10.1007/s11692-008-9029-4
Article Google Scholar
Koo T, Li M (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15:155–163. https://doi.org/10.1016/j.jcm.2016.02.012
Article Google Scholar
Lycett SJ (2009) Quantifying transitions: morphometric approaches to Palaeolithic variability and technological change. In: Camps M, Chauhan P (eds) sourcebook of paleolithic transitions: methods, theories, and interpretations, Springer New York, pp 9–92. https://doi.org/10.1007/978-0-387-76487-0_5
Lycett SJ, von Cramon-Taubadel N (2015) Toward a “quantitative genetic” approach to lithic variation. J Archaeol Method Theory 22:646–675. https://doi.org/10.1007/s10816-013-9200-9
Article Google Scholar
Lyman LL, VanPool TL (2009) Metric data in archaeology: a study of intra-analyst and inter-analyst variation. Am Antiq 74(3):485–504. https://doi.org/10.1017/S0002731600048721
Article Google Scholar
MacDonald DA, Royal K, Buchanan B (2020) Evaluating the effects of parallax in archaeological geometric morphometric analyses. Archaeol Anthropol Sci 12:149. https://doi.org/10.1007/s12520-020-01111-4
Article Google Scholar
Marwick B, Guedes JA, Barton CM et al (2017) Open science in archaeology. SAA Archaeol Rec. https://doi.org/10.17605/OSF.IO/3D6XX
Article Google Scholar
Matzig DN (2021) OutlineR: an R package to derive outline shapes from (multiple) artefacts on JPEG images. Zenodo. https://doi.org/10.5281/ZENODO.4527469
Article Google Scholar
Matzig DN, Hussain ST, Riede F (2021) Design space constraints and the cultural taxonomy of european final palaeolithic large tanged points: a comparison of typological, landmark-based and whole-outline geometric morphometric approaches. J Palaeolithic Archaeol 4(27). https://doi.org/10.1007/s41982-021-00097-2
McNabb J (2017) Journeys in space and time Assessing the Link between Acheulean Handaxes and Genetic Explanations. J Archaeol Sci Rep 13:403
Google Scholar
Menéndez LP (2017) Comparing methods to assess intraobserver measurement error of 3D craniofacial landmarks using geometric morphometrics through a digitizer arm. J Forensic Sci 62(3):741–746. https://doi.org/10.1111/1556-4029.13301
Article Google Scholar
Mesfin I, Leplongeon A, Pleurdeau D, Borel A (2020) Using morphometrics to reappraise old collections: the study case of the Congo Basin Middle Stone Age bifacial industry. J Lithic Stud 7(1). https://doi.org/10.2218/jls.4329
Mitteroecker P (2021) Morphometrics in evolutionary developmental biology. In: de la Rosa LN, Muller GB (eds) evolutionary development biology. Springer, Cham, pp 941–951. https://doi.org/10.1007/978-3-319-32979-6_119
Mitteroecker P, Bookstein F (2011) Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evol Biol 38(1):100–144. https://doi.org/10.1007/s11692-011-9109-8
Article Google Scholar
Mullin SK, Taylor PJ (2002) The effects of parallax on geometric morphometric data. Comput Biol Med 32(6):455–464. https://doi.org/10.1016/S0010-4825(02)00037-9
Article Google Scholar
O’Leary MA, Kaufman S (2011) MorphoBank: phylophenomics in the ‘cloud.’ Cladistics 27(5):529–537. https://doi.org/10.1111/j.1096-0031.2011.00355.x
Article Google Scholar
Okumura M, Araujo AGM (2018) Archaeology, biology, and borrowing: a critical examination of geometric morphometrics in archaeology. J Archaeol Sci 101:149–158. https://doi.org/10.1016/j.jas.2017.09.015
Article Google Scholar
Osis S, Hettinga B, Macdonald S, Ferber R (2015) A novel method to evaluate error in anatomical marker placement using a modified generalized Procrustes analysis. Comput Methods Biomech Biomed Eng 18:1108–1116. https://doi.org/10.1080/10255842.2013.873034
Article Google Scholar
Page JW (1976) A note on interobserver error in multivariate analyses of populations. Am J Phys Anthropol 44(3):521–525. https://doi.org/10.1002/ajpa.1330440315
Article Google Scholar
Perini TA, de Oliveira GL, Ornellas JDS, de Oliveira FP (2005) Technical error of measurement in anthropmetry. Rev Bras Med Esporte 11(1):86–90. https://doi.org/10.1590/S1517-86922005000100009
Article Google Scholar
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Riaño HC, Jaramillo N, Dujardin J-P (2009) Growth changes in Rhodnius pallescens under simulated domestic and sylvatic conditions. Infect Genet Evol 9:162–168. https://doi.org/10.1016/j.meegid.2008.10.009
Article Google Scholar
Robinson C, Terhune CE (2017) Error in geometric morphometric data collection: combining data from multiple sources. Am J Phys Anthropol 164(1):62–75. https://doi.org/10.1002/ajpa.23257
Article Google Scholar
Roe DA (1964) The British Lower and Middle Paleolithic: some problems, methods of study and preliminary results. Proceedings of the Prehistoric Society.
Sahle Y (2021) Fossil men: the quest for the oldest skeleton and the origins of humankind. Am J Phys Anthropol 176(2):340–341. https://doi.org/10.1002/ajpa.24359
Article Google Scholar
Sahle Y, Hutchings WK, Braun, et al (2013) Earliest stone-tipped projectiles from the Ethiopian Rift date to >279,000 years ago. PLoS ONE 8(11):e78092. https://doi.org/10.1371/journal.pone.0078092
Article Google Scholar
Scerri EML, Kühnert D, Blinkhorn J et al (2020) Field-based sciences must transform in response to COVID-19. Nat Ecol Evol 4:1571–1574. https://doi.org/10.1038/s41559-020-01317-8
Article Google Scholar
Schillinger K, Mesoudi A, Lycett S (2014) Copying error and the cultural evolution of “Additive” vs. “Reductive” material traditions: an experimental assessment. Am Antiq 79(1):128–143. https://doi.org/10.7183/0002-7316.79.1.128
Article Google Scholar
Serwatka K, Riede F (2016) 2D geometric morphometric analysis casts doubt on the validity of large tanged points as cultural markers in the European Final Palaeolithic. J Archaeol Sci Rep 9:150–159. https://doi.org/10.1016/j.jasrep.2016.07.018
Article Google Scholar
Shea JJ (2020) Prehistoric stone tools of Eastern Africa: a guide. Cambridge University Press
Book Google Scholar
Shott MJ, Trail BW (2010) Exploring new approaches to lithic analysis: laser scanning and geometric morphometrics. Lithic Technol 35(2):195–220. https://doi.org/10.1080/01977261.2010.11721090
Article Google Scholar
Shrout PE, Fleiss JL (1979) Intraclass correlation: uses in assessing rater reliability. Psychol Bull 86:420–428. https://doi.org/10.1037//0033-2909.86.2.420
Article Google Scholar
Timbrell L (2020) Strength in numbers: combining old datasets to answer new questions. In Kaercher K, Arntz M, Bomentre N, Hermoso Buxán XL, Day K, Ki S, Macleod R, Muñoz Mojado H, Timbrell L, Wisher I (eds) New Frontiers in Archaeology: Proceedings of the Cambridge Annual Student Archaeology Conference 2019, Archaeopress: Access Archaeology. ISBN 978–1–78969–794–0
Timbrell L (2022) A collaborative model for lithic shape digitization in museum settings. Lithic Technol. https://doi.org/10.1080/01977261.2022.2092299
Article Google Scholar
Ulijaszek SJ, Kerr DA (1999) Anthropometric measurement error and the assessment of nutritional status. Br J Nutr 82:165–177. https://doi.org/10.1017/s0007114599001348
Article Google Scholar
von Cramon-Taubadel N, Frazier BC, Lahr MM (2007) The problem of assessing landmark error in geometric morphometrics: theory, methods, and modifications. Am J Phys Anthropol 134:24–35. https://doi.org/10.1002/ajpa.20616
Article Google Scholar
Wang L-Y, Marwick B (2020) Standardization of ceramic shape: a case study of Iron Age pottery from northeastern Taiwan. J Archaeol Sci Rep 33:102554. https://doi.org/10.1016/j.jasrep.2020.102554
Article Google Scholar
Yezerinac SM, Lougheed SC, Handford P (1992) Measurement error and morphometric studies: statistical power and observer experience. Systematic Biol 41(4):471–482. https://doi.org/10.1093/sysbio/41.4.471
Article Google Scholar
Zelditch ML, Swiderski DL, Sheets DH, Fink WL (2004) Geometric morphometrics for biologists: a primer. Academic Press
Google Scholar
Zeng L, Zou X (2019) Error analysis and experimental research on 3D printing. IOP Conf Ser Mater Sci Eng 592:012150
Article Google Scholar

Download references

Acknowledgements

We would like to thank Stéphanie Bonilauri, Laurence Glemarec, and Roland Nespoulet (Musée de l’Homme), Abdelouahed Ben-Ncer and Hassan Dermouk (Institut National des Sciences de l’Archéologie et du Patriome), Simon Mboya and Sharon Manura (National Museums of Kenya) and Curtis Marean (Mossel Bay Archaeological Project) for their roles in facilitating the project. LT would like to thank Paloma de la Peña, Amy Way and Christian Hoggard for their continued collaboration. We would also like to thank Todd VanPool and an anonymous reviewer for their constructive feedback during the review of this article.

Funding

This project was supported by funding awarded to LT by the Leakey Foundation (Movement, interaction, and structure: modelling population networks and cultural diversity in the African Middle Stone Age), the Wenner Gren Foundation (Gr. 10157) and the Lithic Studies Society (Jacobi Bursary, 2020).

Author information

Authors and Affiliations

Department of Archaeology, Classics and Egyptology, University of Liverpool, Liverpool, UK
Lucy Timbrell, Christopher Scott & Matt Grove
Authority for Research and Conservation of Cultural Heritage, National Museum of Ethiopia, Addis Ababa, Ethiopia
Behailu Habte & Yosef Tefera
Département Homme Et Environnement, Musée de L’Homme, Paris, France
Hélène Monod
Institut National Des Sciences de L’Archéologie Et du Patrimoine, Rabat, Morocco
Mouna Qazzih
Archaeology Unit, Iziko Museums of South Africa, Cape Town, South Africa
Benjamin Marais & Wendy Black
Department of Archaeology, National Museums of Kenya, Nairobi, Kenya
Christine Maroma & Emmanuel Ndiema
Department of Archaeology, Max Planck Institute for the Science of Human History, Jena, Germany
Emmanuel Ndiema
Mossel Bay Archaeological Project, Western Cape Province, Cape Town, South Africa
Struan Henderson & Katherine Elmes
Archaeological Studies Program, University of the Philippines, Quezon City, Philippines
Kimberly Plomp
Department of Archaeology, Simon Fraser University, Burnaby, British Colombia, Canada
Kimberly Plomp

Authors

Lucy Timbrell
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Scott
View author publications
You can also search for this author in PubMed Google Scholar
Behailu Habte
View author publications
You can also search for this author in PubMed Google Scholar
Yosef Tefera
View author publications
You can also search for this author in PubMed Google Scholar
Hélène Monod
View author publications
You can also search for this author in PubMed Google Scholar
Mouna Qazzih
View author publications
You can also search for this author in PubMed Google Scholar
Benjamin Marais
View author publications
You can also search for this author in PubMed Google Scholar
Wendy Black
View author publications
You can also search for this author in PubMed Google Scholar
Christine Maroma
View author publications
You can also search for this author in PubMed Google Scholar
Emmanuel Ndiema
View author publications
You can also search for this author in PubMed Google Scholar
Struan Henderson
View author publications
You can also search for this author in PubMed Google Scholar
Katherine Elmes
View author publications
You can also search for this author in PubMed Google Scholar
Kimberly Plomp
View author publications
You can also search for this author in PubMed Google Scholar
Matt Grove
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors contributed to the study conception and design. Funding was acquired by LT. CS and LT performed the material preparation. Data collection was carried out by HM, MQ, BM, CM, BH, YT, SH, KE and LT, under the supervision of EM, WB, KP and MG. LT and MG performed data processing and analysis. The first draft of the manuscript was written by LT and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lucy Timbrell.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflicts of interest

The authors declare no competing interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 720 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Timbrell, L., Scott, C., Habte, B. et al. Testing inter-observer error under a collaborative research framework for studying lithic shape variability. Archaeol Anthropol Sci 14, 209 (2022). https://doi.org/10.1007/s12520-022-01676-2

Download citation

Received: 13 June 2022
Accepted: 20 September 2022
Published: 01 October 2022
DOI: https://doi.org/10.1007/s12520-022-01676-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Testing inter-observer error under a collaborative research framework for studying lithic shape variability

Abstract

Similar content being viewed by others

Evaluating the effects of parallax in archaeological geometric morphometric analyses

Interrogating Random and Systematic Measurement Error in Morphometric Data

A pan-European dataset revealing variability in lithic technology, toolkits, and artefact shapes ~15-11 kya

Introduction

Materials