Abstract
Reliable measurements are fundamental for the empirical sciences. In observational research, measurements often consist of observers categorizing behavior into nominal-scaled units. Since the categorization is the outcome of a complex judgment process, it is important to evaluate the extent to which these judgments are reproducible, by having multiple observers independently rate the same behavior. A challenge in determining interrater agreement for timed-event sequential data is to develop clear objective criteria to determine whether two raters’ judgments relate to the same event (the linking problem). Furthermore, many studies presently report only raw agreement indices, without considering the degree to which agreement can occur by chance alone. Here, we present a novel, free, and open-source toolbox (EasyDIAg) designed to assist researchers with the linking problem, while also providing chance-corrected estimates of interrater agreement. Additional tools are included to facilitate the development of coding schemes and rater training.
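The contrast between raw and chance-corrected agreement can be illustrated with a minimal sketch of Cohen's (1960) kappa. It assumes the linking problem has already been solved, so that each position in the two lists refers to the same event. The function names and the example labels are hypothetical, and the code is not taken from the EasyDIAg toolbox.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Proportion of linked events on which both raters chose the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = raw_agreement(labels_a, labels_b)
    n = len(labels_a)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of ten already-linked events.
rater_1 = ["gesture", "gesture", "rest", "rest", "gesture",
           "rest", "rest", "rest", "gesture", "rest"]
rater_2 = ["gesture", "rest", "rest", "rest", "gesture",
           "rest", "gesture", "rest", "gesture", "rest"]

p_o = raw_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"raw agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 events, yet about half of that agreement would be expected from the marginal distributions alone, so kappa is noticeably lower than the raw index.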
Notes
It is, of course, also possible to analyze annotations of audio clips.
A McNemar test for 2 × 2 contingency tables is provided in a freely available Excel® worksheet (Mackinnon, 2000; available online at www.mhri.edu.au/biostats/DAG_Stat).
References
Bakeman, R., & Quera, V. (1992). SDIS: A sequential data interchange standard. Behavior Research Methods, Instruments, & Computers, 24, 554–559. doi:10.3758/BF03203604
Bakeman, R., & Quera, V. (2011). Sequential analysis and observational methods for the behavioral sciences. New York: Cambridge University Press.
Bakeman, R., Quera, V., & Gnisci, A. (2009). Observer agreement for timed-event sequential data: A comparison of time-based and event-based algorithms. Behavior Research Methods, 41, 137–147. doi:10.3758/brm.41.1.137
Bakeman, R., & Robinson, B. F. (1994). Understanding log-linear analysis with ILOG: An interactive approach. Hillsdale, NJ: Erlbaum.
Bavelas, J. B., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Language, 58, 495–520.
Bavelas, J. B., Kenwood, C., & Phillips, B. (2002). Discourse analysis. In M. Knapp & M. Daly (Eds.), Handbook of interpersonal communication (3rd ed., pp. 102–129). Thousand Oaks, CA: Sage.
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Deming, W. E., & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11, 427–444. doi:10.2307/2235722
Dijkstra, W., & Taris, T. (1995). Measuring the agreement between sequences. Sociological Methods & Research, 24, 214–231. doi:10.1177/0049124195024002004
Haccou, P., & Meelis, E. (1994). Statistical analysis of behavioural data: An approach based on time-structured models. Oxford: Oxford University Press.
Holle, H., & Rein, R. (2013). The modified Cohen’s kappa: Calculating interrater agreement for segmentation and annotation. In H. Lausberg (Ed.), Understanding body movement: A guide to empirical research on nonverbal behaviour (With an introduction to the NEUROGES coding system, pp. 261–275). Frankfurt am Main: Peter Lang Verlag.
Jansen, R. G., Wiertz, L. F., Meyer, E. S., & Noldus, L. P. (2003). Reliability analysis of observational data: Problems, solutions, and software implementation. Behavior Research Methods, Instruments, & Computers, 35, 391–399.
Kaufman, A. B., & Rosenthal, R. (2009). Can you believe my eyes? The importance of interobserver reliability statistics in observations of animal behaviour. Animal Behaviour, 78, 1487–1491. doi:10.1016/j.anbehav.2009.09.014
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Lausberg, H. (2013). Understanding body movement: A guide to empirical research on nonverbal behaviour (With an introduction to the NEUROGES coding system). Frankfurt am Main: Peter Lang Verlag.
Lausberg, H., & Sloetjes, H. (2009). Coding gestural behavior with the NEUROGES–ELAN system. Behavior Research Methods, 41, 841–849. doi:10.3758/brm.41.3.841
Mackinnon, A. (2000). A spreadsheet for the calculation of comprehensive statistics for the assessment of diagnostic tests and inter-rater agreement. Computers in Biology and Medicine, 30, 127–134.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Quera, V., Bakeman, R., & Gnisci, A. (2007). Observer agreement for event sequences: Methods and software for sequence alignment and reliability estimates. Behavior Research Methods, 39, 39–49.
Rein, R. (2013). Using 3D kinematics of hand segments for segmentation of gestures: A pilot study. In H. Lausberg (Ed.), Understanding body movement: A guide to empirical research on nonverbal behaviour (With an introduction to the NEUROGES coding system, pp. 163–187). Frankfurt am Main: Peter Lang Verlag.
Acknowledgments
We are grateful to Hedda Lausberg for sharing the annotation data. The acquisition of this data set was supported by a grant awarded to H.L. from the DFG (LA 1249/2-1). The EasyDIAg toolbox described in the article can be downloaded free of charge from http://sourceforge.net/projects/easydiag/.
Electronic supplementary material
Below is the link to the electronic supplementary material.
ESM 1
(PDF 24 kb)
Cite this article
Holle, H., Rein, R. EasyDIAg: A tool for easy determination of interrater agreement. Behav Res 47, 837–847 (2015). https://doi.org/10.3758/s13428-014-0506-7
DOI: https://doi.org/10.3758/s13428-014-0506-7