
Inter- and Intra-rater Reliability in Lithic Analysis: a Case Study in Handedness Determination Methodologies

Journal of Archaeological Method and Theory

Abstract

Inter- and intra-rater reliability studies in experimental archaeology promote consistency and replicability in the lithic analysis methods used to interpret the archaeological record. Attempts to replicate post-hoc classification of a knapper’s hand preference with published methodologies that focus on right- and left-oriented flake features have been largely unsuccessful. We tested whether the flake feature categories described in three studies are useful for determining a knapper’s hand preference (Bargalló and Mosquera, Laterality, 19(1), 37–63, 2014; Dominguez-Ballesteros and Arrizabalaga, Journal of Archaeological Science: Reports, 3, 313–320, 2015; Rugg and Mullane, Laterality, 6(3), 247–259, 2001). Five experienced lithic analysts independently made blind predictions of knapper hand preference on an experimental assemblage of Mode I flakes produced by 18 knappers (9 left-handed), comprising 344 complete flakes from 43 knapped cores. Inter-rater reliability measures (Fleiss’ kappa) showed significant agreement between raters for only one feature (eraillure scar), fair agreement for impact point, and poor agreement for the remaining features (cone of percussion, hackles, ripples, extraction axis, and platform inclination); agreement was poor even within individual raters. Chi-squared tests and correspondence analyses show that raters did not predict hand preference significantly better than chance. These results suggest not only that these flake features are unreliable predictors of a knapper’s hand preference, but also that most of them do not represent objective categories. We therefore urge caution in applying these methods to archaeological assemblages pending further independent replication.
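The abstract reports two kinds of statistics: Fleiss’ kappa for agreement among raters, and chi-squared tests of predicted against actual hand preference. As a minimal sketch, the textbook forms of both can be computed in Python with only the standard library. This is not the authors’ analysis code (the reference list cites R, the irr package, and PAST); the input layout, the function names, and the example labels are hypothetical, and the chi-squared helper is deliberately restricted to a 2×2 table (one degree of freedom, no continuity correction) so that its p-value can use `math.erfc`.

```python
from collections import Counter
from math import erfc, sqrt


def fleiss_kappa(ratings):
    """Fleiss' (1971) kappa for categorical ratings.

    ratings[i] lists every rater's label for item i (hypothetical layout,
    e.g. "left", "right", "none" for one flake feature on one flake).
    Every item must carry the same number of ratings.
    """
    if not ratings:
        raise ValueError("no items to rate")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]  # n_ij for each item i

    # Mean observed agreement across items (P-bar).
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement (P_e) from the overall category proportions.
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def chi_squared_2x2(table):
    """Pearson chi-squared for a 2x2 table, without continuity correction.

    table[i][j]: flakes from actual hand i that a rater assigned to hand j.
    Returns (statistic, p-value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Illustrative labels only (three flakes, three raters), not study data.
    example = [
        ["left", "left", "right"],
        ["right", "right", "right"],
        ["none", "left", "none"],
    ]
    print(fleiss_kappa(example))
    # Rows: actual hand (left, right); columns: predicted hand (left, right).
    print(chi_squared_2x2([[12, 8], [9, 11]]))
```

Larger contingency tables (for example, with an "indeterminate" prediction category) would need a general chi-squared survival function, such as the one in SciPy, rather than the one-degree-of-freedom shortcut used here.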




Notes

  1. Using the feature frequency table that Bargalló and Mosquera (2014) provide for their experimental sample, the frequencies of non-skewed features (distal or not present) are: 50% for ridge, 77% for eraillure scar, 64% for hackles, 60% for ripples, 46% for extraction axis, 42% for impact point, and 55% for platform inclination. In our data set, at least one rater shows roughly the same frequency for each feature, and many of our frequencies differ by less than 5% from those of the original study. Furthermore, the frequencies of skewed flake features in the archaeological assemblages of Bargalló et al. (2017) differ from their experimental assemblage frequencies by as much as 27%. Finally, for the ridge on the cone of percussion specifically, Rugg and Mullane (2001) found skewed features in only 25% of their sample. Together, these observations suggest that wide variation in feature frequencies is a minor concern compared with other issues, such as subjectivity in categorical lithic analysis.
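The comparison in the note is simple percentage-point arithmetic. The sketch below checks a set of observed non-skewed frequencies against the Bargalló and Mosquera (2014) values quoted in the note. Reading “less than 5%” as an absolute difference of five percentage points is an assumption, and the helper name and the per-rater dictionary input are hypothetical illustrations, not the authors’ procedure.

```python
# Non-skewed feature frequencies (%) for Bargalló and Mosquera's (2014)
# experimental sample, as quoted in the note.
PUBLISHED_NON_SKEWED = {
    "ridge": 50,
    "eraillure scar": 77,
    "hackles": 64,
    "ripples": 60,
    "extraction axis": 46,
    "impact point": 42,
    "platform inclination": 55,
}


def compare_to_published(observed, reference=PUBLISHED_NON_SKEWED, tolerance=5.0):
    """Hypothetical helper: for each feature, return (absolute difference in
    percentage points, whether that difference is below `tolerance`).

    `tolerance` is assumed to be in percentage points, not relative percent.
    """
    missing = set(reference) - set(observed)
    if missing:
        raise KeyError(f"no observed frequency for: {sorted(missing)}")
    result = {}
    for feature, ref in reference.items():
        diff = abs(observed[feature] - ref)
        result[feature] = (diff, diff < tolerance)
    return result
```

A rater’s frequencies would be passed as a dictionary with the same seven keys; any feature missing from the input raises a `KeyError` rather than being silently skipped.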

References

  • Bargalló, A., & Mosquera, M. (2014). Can hand laterality be identified through lithic technology? Laterality, 19(1), 37–63.

  • Bargalló, A., Geribàs, N., & Mosquera, M. (2013). Programa Experimental para Identificar la Lateralidad Manual a Partir de la Tecnología Lítica y la Distribución Espacial de los Restos. In A. Palomo, R. Piqué, & X. Terradas (Eds.), Experimentación en Arqueología: Estudio y Difusión del Pasado (pp. 161–175). Archaeology Museum of Catalonia: Barcelona.

  • Bargalló, A., Mosquera, M., & Lozano, S. (2017). In pursuit of our ancestors’ hand laterality. Journal of Human Evolution, 111, 18–32.

  • Bargalló, A., Mosquera, M., & Lorenzo, C. (2018). Identifying handedness at knapping; an analysis of the scatter pattern of lithic remains. Archaeological and Anthropological Sciences, 10(3), 587–598.

  • Beck, C., & Jones, G. T. (1989). Bias and archaeological classification. American Antiquity, 54(2), 244–262.

  • Bingham, P., & McNabb, J. (2013). How reliable are traditional methods of assessing handaxes? Lithics, 34, 5–13.

  • Brennan, P., & Silman, A. (1992). Statistical methods for assessing observer variability in clinical measures. BMJ, 304(6840), 1491–1494.

  • Byrne, F., Proffitt, T., Arroyo, A., & de la Torre, I. (2016). A comparative analysis of bipolar and freehand experimental knapping products from Olduvai Gorge, Tanzania. Quaternary International, 424, 58–68.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

  • Cueva-Temprana, A., Lombao, D., Morales, J. I., Geribàs, N., & Mosquera, M. (2019). Gestures during knapping: a two-perspective approach to Pleistocene Technologies. Lithic Technology, 44(2), 74–89. https://doi.org/10.1080/01977261.2019.1587255.

  • Daniel, C., Putt, S. S., & Franciscus, R. G. (2016). Investigating other causes for stone flake features attributed to handedness. Poster presented at the 81st Annual Meeting of the Society for American Archaeology, Orlando, FL.

  • Dominguez-Ballesteros, E., & Arrizabalaga, A. (2015). Flint knapping and determination of human handedness: methodological proposal with quantifiable results. Journal of Archaeological Science: Reports, 3, 313–320.

  • Fish, P. R. (1978). Consistency in archaeological measurement and classification: a pilot study. American Antiquity, 43(1), 86–89.

  • Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

  • Fletcher, J. P., & Bandy, W. D. (2008). Intrarater reliability of CROM measurement of cervical spine active range of motion in persons with and without neck pain. Journal of Orthopaedic & Sports Physical Therapy, 38(10), 640–652.

  • Florence, J. M., Pandya, S., King, W. M., Robison, J. D., Baty, J., Miller, J. P., Schlerbecker, J., & Signore, L. C. (1992). Intrarater reliability of manual muscle test (Medical Research Council Scale) Grades in Duchenne’s Muscular Dystrophy. Physical Therapy, 72(2), 115–127.

  • Freeman, H. D., & Gosling, S. D. (2010). Personality in nonhuman primates: a review and evaluation of past research. American Journal of Primatology, 72(8), 653–671.

  • Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2012). irr: Various coefficients of interrater reliability and agreement. R package version 0.84. https://CRAN.R-project.org/package=irr. Accessed 15 Aug 2018.

  • Geribàs, N., Mosquera, M., & Vergès, J. M. (2010). The gesture substratum of stone tool making: an experimental approach. Annali Dell’Universita Di Ferrara Museologia Scientifica e Naturalistica, 6, 155–162.

  • Gill, M. R., Reiley, D. G., & Green, S. M. (2004). Interrater reliability of Glasgow Coma Scale Scores in the Emergency Department. Annals of Emergency Medicine, 43(2), 215–224.

  • Gnaden, D., & Holdaway, S. (2000). Understanding observer variation when recording stone artifacts. American Antiquity, 65(4), 739–747.

  • Hammer, Ø., Harper, D. A. T., & Ryan, P. D. (2001). PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica, 4(1), 1–9.

  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

  • Lobbestael, J., Leurgans, M., & Arntz, A. (2011). Inter-rater reliability of the structured clinical interview for DSM-IV Axis I disorders (SCID I) and Axis II disorders (SCID II). Clinical Psychology & Psychotherapy, 18(1), 75–79.

  • Lozano, M., Mosquera, M., Bermúdez de Castro, J. M., Arsuaga, J. L., & Carbonell, E. (2009). Right handedness of Homo heidelbergensis from Sima de los Huesos Atapuerca, Spain 500,000 years ago. Evolution and Human Behavior, 30(5), 369–376.

  • Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia, 9(1), 97–113.

  • Patterson, L., & Sollberger, J. (1986). Comments on Toth’s right-handedness study. Lithic Technology, 15(3), 109–111.

  • Pobiner, B. (1999). The use of stone tools to determine handedness in hominids. Current Anthropology, 40(1), 90–92.

  • Poza-Rey, E. M., Lozano, M., & Arsuaga, J. (2017). Brain asymmetries and handedness in the specimens from the Sima de los Huesos Site (Atapuerca, Spain). Quaternary International, 433(A), 32–44.

  • Proffitt, T., & de la Torre, I. (2014). The effect of raw material on inter-analyst variation and analyst accuracy for lithic analysis: a case study from Olduvai Gorge. Journal of Archaeological Science, 45(1), 270–283.

  • Putt, S. S. (2016). Human brain activity during stone tool production: tracing the evolution of cognition and language. Ph.D. Dissertation, University of Iowa, IA, USA.

  • Putt, S. S., Wijeakumar, S., Franciscus, R. G., & Spencer, J. P. (2017). The functional brain networks that underlie Early Stone Age tool manufacture. Nature Human Behaviour, 1(6), 0102.

  • R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/. Accessed 15 Aug 2018.

  • Rein, R., Nonaka, T., & Bril, B. (2014). Movement Pattern Variability in Stone Knapping: Implications for the Development of Percussive Traditions. PLoS ONE, 9(11), e113567. https://doi.org/10.1371/journal.pone.0113567.

  • Ruck, L. (2014). Experimental Archaeology and Hominid Evolution: Establishing a Methodology for Determining Handedness in Lithic Materials as a Proxy for Cognitive Evolution. M.A. Thesis, Florida Atlantic University, FL, USA.

  • Ruck, L., Broadfield, D. C., & Brown, C. T. (2015). Determining hominid handedness in lithic debitage: a review of current methodologies. Lithic Technology, 40(3), 171–188.

  • Rugg, G., & Mullane, M. (2001). Inferring handedness from lithic evidence. Laterality, 6(3), 247–259.

  • Toth, N. (1985). Archaeological evidence for preferential right-handedness in the Lower and Middle Pleistocene, and its possible implications. Journal of Human Evolution, 14(6), 607–614.

  • Trinkaus, E., Churchill, S. E., & Ruff, C. B. (1994). Postcranial robusticity in Homo. II: humeral bilateral asymmetry and bone plasticity. American Journal of Physical Anthropology, 93(1), 1–34.

  • Uomini, N. T. (2001). Lithic indications of handedness: Assessment of methodologies and the evolution of laterality in hominids. M.Sc. Dissertation, University of Durham, UK.

  • Uomini, N. T. (2006). In the knapper’s hands: Testing markers of laterality in hominin lithic production, with reference to the common substrate of language and handedness. Ph.D. Dissertation, University of Southampton, UK.

  • Uomini, N. T., & Ruck, L. (2018). Manual laterality and cognition through evolution: an archeological perspective. Progress in Brain Research, 238, 295–323.

  • Vicente-Rodríguez, G., Rey-López, J. P., Mesana, M. I., Poortvliet, E., Ortega, F. B., Polito, A., Nagy, E., Widhalm, K., Sjöström, M., Moreno, L. A., & HELENA Study Group. (2012). Reliability and intermethod agreement for body fat assessment among two field and two laboratory methods in adolescents. Obesity, 20(1), 221–229.

  • Walrath, D. E., Turner, P., & Bruzek, J. (2004). Reliability test of the visual assessment of cranial traits for sex determination. American Journal of Physical Anthropology, 125(2), 132–137.

  • Whittaker, J. C., Caulkins, D., & Kamp, K. A. (1998). Evaluating Consistency in Typology and Classification. Journal of Archaeological Method and Theory, 5(2), 129–164.

  • Williams, E. M., Gordon, A. D., & Richmond, B. G. (2010). Upper limb kinematics and the role of the wrist during stone tool production. American Journal of Physical Anthropology, 143(1), 134–145. https://doi.org/10.1002/ajpa.21302.

  • Woods, S. P., Rippeth, J. D., Frol, A. B., Levy, J. K., Ryan, E., Soukup, V. M., Hinkin, C. H., Lazzaretto, D., Cherner, M., Marcotte, T. D., Gelman, B. B., Morgello, S., Singer, E. J., Grant, I., & Heaton, R. K. (2004). Interrater reliability of clinical ratings and neurocognitive diagnoses in HIV. Journal of Clinical and Experimental Neuropsychology, 26(6), 759–778.


Acknowledgments

We thank D. Jones for her assistance in the lab, and P. T. Schoenemann for his helpful comments on this paper.

Funding

Funding for this study was provided by the Iowa Center for Research by Undergraduates, the Dewey Stuit Fund for Undergraduates, and the John Templeton Foundation (Grant No. 52935).

Author information

Corresponding author

Correspondence to Lana Ruck.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(XLSX 190 kb)


About this article


Cite this article

Ruck, L., Holden, C., Putt, S.S.J. et al. Inter- and Intra-rater Reliability in Lithic Analysis: a Case Study in Handedness Determination Methodologies. J Archaeol Method Theory 27, 220–244 (2020). https://doi.org/10.1007/s10816-019-09424-y


  • DOI: https://doi.org/10.1007/s10816-019-09424-y
