Abstract
Inter- and intra-rater reliability studies in experimental archaeology promote consistency and replicability in the lithic analysis methods that are applied to interpretations of the archaeological record. Replication attempts to classify a knapper’s hand preference post-hoc using published methodologies that focus on right- and left-oriented flake features have been largely unsuccessful. We tested the validity of flake feature categories described in three studies as useful for determining a knapper’s hand preference (Bargalló and Mosquera, Laterality, 19(1), 37–63, 2014; Dominguez-Ballesteros and Arrizabalaga, Journal of Archaeological Science: Reports, 3, 313–320, 2015; Rugg and Mullane, Laterality, 6(3), 247–259, 2001). Five experienced lithic analysts independently made blind predictions of knapper hand preference on an experimental assemblage of mode I flakes produced by 18 knappers (9 left-handed), which included 344 complete flakes from 43 knapped cores. Inter-rater reliability measures (using Fleiss’ Kappa) showed significant agreement between raters for only one of the features (eraillure scar), with fair agreement for impact point, and poor agreement for the other features (cone of percussion, hackles, ripples, extraction axis, and platform inclination); poor agreement was found even within raters. Chi-squared tests and correspondence analyses showed that raters failed to perform significantly better than chance at predicting hand preference. These results suggest not only that these flake features are unreliable predictors of a knapper’s hand preference, but also that most of these features do not represent objective categories. We therefore urge caution in applying these methods to archaeological assemblages pending further independent replication.
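For readers unfamiliar with the agreement statistic named above, the following is a minimal Python sketch of Fleiss’ Kappa (Fleiss 1971). It is not the authors’ analysis code: the reference list indicates the study used R with the irr package. The function name `fleiss_kappa`, the three categories (left, right, indeterminate), and the example counts are hypothetical and serve only as illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa for a table where counts[i][j] is the number of
    raters who assigned item i (e.g. a flake) to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: proportion of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_exp = sum(p * p for p in p_j)     # agreement expected by chance
    if p_exp == 1.0:
        # All ratings fall in a single category; kappa is undefined.
        return float("nan")
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical data: 4 flakes, 5 raters, categories (left, right, indeterminate).
example = [
    [4, 1, 0],
    [1, 3, 1],
    [0, 2, 3],
    [2, 2, 1],
]
kappa = fleiss_kappa(example)
```

Values near 1 indicate agreement well beyond chance, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.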
Notes
Using the feature frequency table provided by Bargalló and Mosquera (2014) for their experimental sample, the frequency of non-skewed features (distal or not present) is as follows: 50% for ridge, 77% for eraillure scar, 64% for hackles, 60% for ripples, 46% for extraction axis, 42% for impact point, and 55% for platform inclination. Within our data set, at least one rater shows roughly the same frequencies for each feature, with many of our frequencies differing by less than 5% from the original study. Furthermore, the frequencies of skewed flake features from the archaeological assemblages in Bargalló et al. (2017) differ from their experimental assemblage frequencies by as much as 27%. Finally, specifically for the ridge on the cone of percussion, Rugg and Mullane (2001) were only able to find skewed features in 25% of their sample, suggesting that wide variation in these feature frequencies is a non-issue compared to other topics, such as subjectivity in categorical lithic analysis.
References
Bargalló, A., & Mosquera, M. (2014). Can hand laterality be identified through lithic technology? Laterality, 19(1), 37–63.
Bargalló, A., Geribàs, N., & Mosquera, M. (2013). Programa Experimental para Identificar la Lateralidad Manual a Partir de la Tecnología Lítica y la Distribución Espacial de los Restos. In A. Palomo, R. Piqué, & X. Terradas (Eds.), Experimentación en Arqueología: Estudio y Difusión del Pasado (pp. 161–175). Archaeology Museum of Catalonia: Barcelona.
Bargalló, A., Mosquera, M., & Lozano, S. (2017). In pursuit of our ancestors’ hand laterality. Journal of Human Evolution, 111, 18–32.
Bargalló, A., Mosquera, M., & Lorenzo, C. (2018). Identifying handedness at knapping; an analysis of the scatter pattern of lithic remains. Archaeological and Anthropological Sciences, 10(3), 587–598.
Beck, C., & Jones, G. T. (1989). Bias and archaeological classification. American Antiquity, 54(2), 244–262.
Bingham, P., & McNabb, J. (2013). How reliable are traditional methods of assessing handaxes? Lithics, 34, 5–13.
Brennan, P., & Silman, A. (1992). Statistical methods for assessing observer variability in clinical measures. BMJ, 304(6840), 1491–1494.
Byrne, F., Proffitt, T., Arroyo, A., & de la Torre, I. (2016). A comparative analysis of bipolar and freehand experimental knapping products from Olduvai Gorge, Tanzania. Quaternary International, 424, 58–68.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cueva-Temprana, A., Lombao, D., Morales, J. I., Geribàs, N., & Mosquera, M. (2019). Gestures during knapping: a two-perspective approach to Pleistocene Technologies. Lithic Technology, 44(2), 74–89. https://doi.org/10.1080/01977261.2019.1587255.
Daniel, C., Putt, S. S., & Franciscus, R. G. (2016). Investigating other causes for stone flake features attributed to handedness. Poster presented at the 81st Annual Meeting of the Society for American Archaeology, Orlando, FL.
Dominguez-Ballesteros, E., & Arrizabalaga, A. (2015). Flint knapping and determination of human handedness: methodological proposal with quantifiable results. Journal of Archaeological Science: Reports, 3, 313–320.
Fish, P. R. (1978). Consistency in archaeological measurement and classification: a pilot study. American Antiquity, 43(1), 86–89.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Fletcher, J. P., & Bandy, W. D. (2008). Intrarater reliability of CROM measurement of cervical spine active range of motion in persons with and without neck pain. Journal of Orthopaedic & Sports Physical Therapy, 38(10), 640–652.
Florence, J. M., Pandya, S., King, W. M., Robison, J. D., Baty, J., Miller, J. P., Schlerbecker, J., & Signore, L. C. (1992). Intrarater reliability of manual muscle test (Medical Research Council Scale) Grades in Duchenne’s Muscular Dystrophy. Physical Therapy, 72(2), 115–127.
Freeman, H. D., & Gosling, S. D. (2010). Personality in nonhuman primates: a review and evaluation of past research. American Journal of Primatology, 72(8), 653–671.
Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2012). irr: Various coefficients of interrater reliability and agreement. R package version 0.84. https://CRAN.R-project.org/package=irr. Accessed 15 Aug 2018.
Geribas, N., Mosquera, M., & Verges, J. M. (2010). The gesture substratum of stone tool making: an experimental approach. Annali Dell’Universita Di Ferrara Museologia Scientifica e Naturalistica, 6, 155–162.
Gill, M. R., Reiley, D. G., & Green, S. M. (2004). Interrater reliability of Glasgow Coma Scale Scores in the Emergency Department. Annals of Emergency Medicine, 43(2), 215–224.
Gnaden, D., & Holdaway, S. (2000). Understanding observer variation when recording stone artifacts. American Antiquity, 65(4), 739–747.
Hammer, Ø., Harper, D. A. T., & Ryan, P. D. (2001). PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica, 4(1), 1–9.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lobbestael, J., Leurgans, M., & Arntz, A. (2011). Inter-rater reliability of the structured clinical interview for DSM-IV Axis I disorders (SCID I) and Axis II disorders (SCID II). Clinical Psychology & Psychotherapy, 18(1), 75–79.
Lozano, M., Mosquera, M., Bermúdez de Castro, J. M., Arsuaga, J. L., & Carbonell, E. (2009). Right handedness of Homo heidelbergensis from Sima de los Huesos (Atapuerca, Spain) 500,000 years ago. Evolution and Human Behavior, 30(5), 369–376.
Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia, 9(1), 97–113.
Patterson, L., & Sollberger, J. (1986). Comments on Toth’s right-handedness study. Lithic Technology, 15(3), 109–111.
Pobiner, B. (1999). The use of stone tools to determine handedness in hominids. Current Anthropology, 40(1), 90–92.
Poza-Rey, E. M., Lozano, M., & Arsuaga, J. (2017). Brain asymmetries and handedness in the specimens from the Sima de los Huesos Site (Atapuerca, Spain). Quaternary International, 433(A), 32–44.
Proffitt, T., & de la Torre, I. (2014). The effect of raw material on inter-analyst variation and analyst accuracy for lithic analysis: a case study from Olduvai Gorge. Journal of Archaeological Science, 45(1), 270–283.
Putt, S. S. (2016). Human brain activity during stone tool production: tracing the evolution of cognition and language. Ph.D. Dissertation, University of Iowa, IA, USA.
Putt, S. S., Wijeakumar, S., Franciscus, R. G., & Spencer, J. P. (2017). The functional brain networks that underlie Early Stone Age tool manufacture. Nature Human Behaviour, 1(6), 0102.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/. Accessed 15 Aug 2018.
Rein, R., Nonaka, T., & Bril, B. (2014). Movement Pattern Variability in Stone Knapping: Implications for the Development of Percussive Traditions. PLoS ONE, 9(11), e113567. https://doi.org/10.1371/journal.pone.0113567.
Ruck, L. (2014). Experimental Archaeology and Hominid Evolution: Establishing a Methodology for Determining Handedness in Lithic Materials as a Proxy for Cognitive Evolution. M.A. Thesis, Florida Atlantic University, FL, USA.
Ruck, L., Broadfield, D. C., & Brown, C. T. (2015). Determining hominid handedness in lithic debitage: a review of current methodologies. Lithic Technology, 40(3), 171–188.
Rugg, G., & Mullane, M. (2001). Inferring handedness from lithic evidence. Laterality, 6(3), 247–259.
Toth, N. (1985). Archaeological evidence for preferential right-handedness in the Lower and Middle Pleistocene, and its possible implications. Journal of Human Evolution, 14(6), 607–614.
Trinkaus, E., Churchill, S. E., & Ruff, C. B. (1994). Postcranial robusticity in Homo. II: humeral bilateral asymmetry and bone plasticity. American Journal of Physical Anthropology, 93(1), 1–34.
Uomini, N. T. (2001). Lithic indications of handedness: Assessment of methodologies and the evolution of laterality in hominids. M.Sc. Dissertation, University of Durham, UK.
Uomini, N. T. (2006). In the knapper’s hands: Testing markers of laterality in hominin lithic production, with reference to the common substrate of language and handedness. Ph.D. Dissertation, University of Southampton, UK.
Uomini, N. T., & Ruck, L. (2018). Manual laterality and cognition through evolution: an archeological perspective. Progress in Brain Research, 238, 295–323.
Vicente-Rodríguez, G., Rey-López, J. P., Mesana, M. I., Poortvliet, E., Ortega, F. B., Polito, A., Nagy, E., Widhalm, K., Sjöström, M., Moreno, L. A., & HELENA Study Group. (2012). Reliability and intermethod agreement for body fat assessment among two field and two laboratory methods in adolescents. Obesity, 20(1), 221–229.
Walrath, D. E., Turner, P., & Bruzek, J. (2004). Reliability test of the visual assessment of cranial traits for sex determination. American Journal of Physical Anthropology, 125(2), 132–137.
Whittaker, J. C., Caulkins, D., & Kamp, K. A. (1998). Evaluating consistency in typology and classification. Journal of Archaeological Method and Theory, 5(2), 129–164.
Williams, E. M., Gordon, A. D., & Richmond, B. G. (2010). Upper limb kinematics and the role of the wrist during stone tool production. American Journal of Physical Anthropology, 143(1), 134–145. https://doi.org/10.1002/ajpa.21302.
Woods, S. P., Rippeth, J. D., Frol, A. B., Levy, J. K., Ryan, E., Soukup, V. M., Hinkin, C. H., Lazzaretto, D., Cherner, M., Marcotte, T. D., Gelman, B. B., Morgello, S., Singer, E. J., Grant, I., & Heaton, R. K. (2004). Interrater reliability of clinical ratings and neurocognitive diagnoses in HIV. Journal of Clinical and Experimental Neuropsychology, 26(6), 759–778.
Acknowledgments
We thank D. Jones for her assistance in the lab, and P. T. Schoenemann for his helpful comments on this paper.
Funding
Funding for this study was provided by the Iowa Center for Research by Undergraduates, the Dewey Stuit Fund for Undergraduates, and the John Templeton Foundation (Grant No. 52935).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
ESM 1
(XLSX 190 kb)
Cite this article
Ruck, L., Holden, C., Putt, S.S.J. et al. Inter- and Intra-rater Reliability in Lithic Analysis: a Case Study in Handedness Determination Methodologies. J Archaeol Method Theory 27, 220–244 (2020). https://doi.org/10.1007/s10816-019-09424-y
DOI: https://doi.org/10.1007/s10816-019-09424-y