The paper discusses several reliability measures, namely Scott’s pi, Krippendorff’s alpha, free-marginal adjustment (Bennett, Alpert and Goldstein’s \(S\)), Cohen’s kappa, and Perreault and Leigh’s \(I\), together with the assumptions on which they are based. It suggests that correlation coefficients between the distribution of qualitative codes, on the one hand, and word co-occurrences and the distribution of the categories identified with the help of the dictionary based on substitution, on the other, complement the other reliability measures. The paper shows that the choice of reliability measure depends on the format of the text (stylistic versus rhetorical) and the type of reading (comprehension versus interpretation). In particular, Cohen’s kappa and Bennett, Alpert and Goldstein’s \(S\) emerge as reliability measures particularly suited to the perspectival reading of rhetorical texts. The analysis draws on the outcomes of a content analysis of 57 texts performed by four coders with the help of the computer program QDA Miner.
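The measures listed above share the observed agreement \(p_o\) and differ mainly in how they estimate agreement expected by chance: Cohen’s kappa uses each coder’s own distribution of codes, Scott’s pi and Krippendorff’s alpha use the pooled distribution, and \(S\) and \(I\) assume that all \(k\) codes are equally likely. The following is a minimal Python sketch for two coders and nominal codes, assuming every unit is coded by both coders and no values are missing. The function names are hypothetical helpers, not QDA Miner’s interface, and this is not presented as the authors’ own computation.

```python
from collections import Counter
from math import sqrt


def observed_agreement(a, b):
    """Share of units that both coders assigned to the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def _correct_for_chance(p_o, p_e):
    """Generic chance correction (p_o - p_e) / (1 - p_e)."""
    if p_e == 1:
        raise ValueError("expected agreement is 1; the coefficient is undefined")
    return (p_o - p_e) / (1 - p_e)


def scott_pi(a, b):
    # Chance agreement from the POOLED distribution of both coders' codes.
    p_o = observed_agreement(a, b)
    n = len(a)
    pooled = Counter(a) + Counter(b)
    p_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return _correct_for_chance(p_o, p_e)


def cohen_kappa(a, b):
    # Chance agreement from EACH coder's own distribution of codes.
    p_o = observed_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca.keys() & cb.keys()) / n ** 2
    return _correct_for_chance(p_o, p_e)


def bennett_s(a, b, k):
    # Free-marginal chance agreement: 1/k for a code book of k codes.
    if k < 2:
        raise ValueError("the code book must contain at least two codes")
    return _correct_for_chance(observed_agreement(a, b), 1 / k)


def perreault_leigh_i(a, b, k):
    # I = sqrt((p_o - 1/k) * k / (k - 1)) if p_o >= 1/k, else 0; this equals sqrt(S).
    if k < 2:
        raise ValueError("the code book must contain at least two codes")
    p_o = observed_agreement(a, b)
    return sqrt((p_o - 1 / k) * k / (k - 1)) if p_o >= 1 / k else 0.0


def krippendorff_alpha_nominal(a, b):
    # Two coders, nominal data, no missing values: alpha = 1 - D_o / D_e,
    # where D_e comes from the pooled values (n = 2 * number of units).
    d_o = 1 - observed_agreement(a, b)
    counts = Counter(a) + Counter(b)
    n = 2 * len(a)
    d_e = 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all values are identical; alpha is undefined")
    return 1 - d_o / d_e


# Toy data: two hypothetical coders, six units, three codes.
coder_1 = ["A", "A", "B", "B", "C", "A"]
coder_2 = ["A", "B", "B", "B", "C", "A"]
for name, value in [
    ("kappa", cohen_kappa(coder_1, coder_2)),
    ("pi", scott_pi(coder_1, coder_2)),
    ("alpha", krippendorff_alpha_nominal(coder_1, coder_2)),
    ("S", bennett_s(coder_1, coder_2, k=3)),
    ("I", perreault_leigh_i(coder_1, coder_2, k=3)),
]:
    print(f"{name}: {value:.3f}")
# kappa: 0.739, pi: 0.733, alpha: 0.756, S: 0.750, I: 0.866
```

On the same toy data the five coefficients diverge only through their chance terms: the coders’ marginals differ slightly, so kappa sits a little above pi, while \(S\) ignores the marginals altogether.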
A similar assumption underpins the use of Cronbach’s \(\alpha \) in cultural consensus theory: the agreement between coders presumably depends on how well they know the content of a cultural domain that exists independently of their input (Weller 2007, p. 343).
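In cultural consensus theory the informants, here the coders, are commonly treated like the items of a scale, so Cronbach’s \(\alpha \) can be computed over a units-by-coders matrix. A minimal Python sketch, assuming numeric scores (for instance 1 if a code was applied to a unit and 0 otherwise) and no missing values; `cronbach_alpha` is a hypothetical helper, not a function from any package mentioned here.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha with coders as 'items'.

    rows: one list per unit, holding one numeric score per coder.
    alpha = k / (k - 1) * (1 - sum of per-coder variances / variance of unit totals)
    """
    k = len(rows[0])  # number of coders
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two coders and a complete matrix")
    coder_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("unit totals do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(coder_vars) / total_var)


# Hypothetical presence/absence scores: five units, three coders.
units_by_coders = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(round(cronbach_alpha(units_by_coders), 3))  # 0.913
```

Population and sample variances give the same \(\alpha \) here, because the same scaling factor appears in the numerator and the denominator of the ratio.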
This assumption also serves to minimize the influence of the coders’ values on the outcomes of content analysis. If the content analysis is not value-free, the coders are less likely to agree on the distribution of the categories. Arrow’s possibility theorem, which applies to choices guided by values, states that “for any method of deriving social choices by aggregating individual preference patterns which satisfies certain natural conditions, it is possible to find individual preference patterns which give rise to a social choice pattern which is not linear ordering” (Arrow 1950, p. 330).
This rationale should not be confused with another, more “positivist” argument advanced by Krippendorff (2004a, p. 249): “we must estimate the distribution of categories in the population of phenomena from the judgments of as many observers as possible (at least two), making the common assumption that observer differences wash out in their average”.
As a recent university graduate, the fourth co-author has not yet produced enough publications. She played the role of a “perfect reader”, whose take on a text is not affected by the authorship of the other texts included in the sample.
When assessing this level of inter-coder agreement, one has to bear in mind that it reflects both the reliability of unitizing and the reliability of coding.
The dictionary based on substitution was subject to small edits only at the third stage.
The code book contained 13 codes for analyzing the first co-author’s texts, 15 codes for the second co-author’s texts, and nine codes for the third co-author’s texts.
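Because free-marginal measures set chance agreement at \(1/k\), the size of each code book enters \(S\) and \(I\) directly. A worked illustration with a hypothetical observed agreement \(p_o = 0.8\) (not a figure reported in the study), using the three code-book sizes above:

```latex
\begin{aligned}
S &= \frac{p_o - 1/k}{1 - 1/k} = \frac{k\,p_o - 1}{k - 1},
  \qquad I = \sqrt{S}\ \text{ if } p_o \ge 1/k \ \text{(otherwise } I = 0\text{)}\\
k = 13:&\quad S = \frac{13 \times 0.8 - 1}{12} = \frac{9.4}{12} \approx 0.783,
  \qquad I \approx 0.885\\
k = 15:&\quad S = \frac{15 \times 0.8 - 1}{14} = \frac{11}{14} \approx 0.786,
  \qquad I \approx 0.886\\
k = 9:&\quad S = \frac{9 \times 0.8 - 1}{8} = \frac{6.2}{8} = 0.775,
  \qquad I \approx 0.880
\end{aligned}
```

As \(k\) grows, \(1/k\) shrinks and \(S\) approaches \(p_o\), so with code books of this size the free-marginal correction is mild; it matters more for short code books.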
The distributions of the reliability measures were visually inspected prior to the correlation analysis. This “eyeballing” suggested that the normality assumption was not seriously violated.
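Pearson’s product-moment \(r\) is the coefficient whose usual significance test relies on the (bivariate) normality that such an inspection is meant to check. A minimal Python sketch, assuming two equally long sequences of values paired by text; `pearson_r` is a hypothetical helper and the numbers are invented for illustration, not the study’s data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-text values of two reliability measures.
kappa_per_text = [0.61, 0.72, 0.55, 0.80, 0.67]
s_per_text = [0.66, 0.75, 0.58, 0.83, 0.70]
print(round(pearson_r(kappa_per_text, s_per_text), 3))
```

Had the visual check suggested marked non-normality, a rank-based coefficient such as Spearman’s rho would be the usual nonparametric fallback.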
Arrow, K.J.: A difficulty in the concept of social welfare. J. Polit. Econ. 58(4), 328–346 (1950)
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Bennett, E., Alpert, R., Goldstein, A.C.: Communications through limited-response questioning. Public Opin. Q. 18(3), 303–308 (1954)
Bryman, A., Bell, E., Teevan, J.J.: Social Research Methods, 3rd edn. Oxford University Press, Don Mills (2012)
Camp, S.D., Saylor, W.G., Harer, M.D.: Aggregating individual-level evaluations of the organizational social climate: a multilevel investigation of the work environment at the Federal Bureau of Prisons. Justice Q. 14(4), 739–762 (1997)
Dijkstra, L., van Eijnatten, F.M.: Agreement and consensus in a Q-mode research design: an empirical comparison of measures, and an application. Qual. Quant. 43(5), 757–771 (2009)
Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. SAGE, Thousand Oaks (2004a)
Krippendorff, K.: Measuring the reliability of qualitative text analysis data. Qual. Quant. 38(6), 787–800 (2004b)
Lotman, Y.: Universe of the Mind: A Semiotic Theory of Culture. Indiana University Press, Bloomington (1990)
Muñoz-Leiva, F., Montoro-Ríos, F.J., Luque-Martínez, T.: Assessment of interjudge reliability in the open-ended questions coding process. Qual. Quant. 40(4), 519–537 (2006)
Neuendorf, K.A.: The Content Analysis Guidebook. SAGE, Thousand Oaks (2002)
Norris, S.P., Phillips, L.M.: The relevance of a reader’s knowledge within a perspectival view of reading. J. Read. Behav. 26(4), 391–412 (1994)
Oleinik, A.: Mixing quantitative and qualitative content analysis: triangulation at work. Qual. Quant. 45(4), 859–873 (2010)
Oleinik, A., Kirdina, S., Popova, I., Shatalova, T.: Kak uchenye chitayut drug druga: osnova teorii akademicheskogo chteniya [How scientists read: on a theory of academic reading]. SOCIS 8 (2013)
Perreault, W.D., Leigh, L.E.: Reliability of nominal data based on qualitative judgments. J. Mark. Res. 26(2), 135–148 (1989)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill Book Co., New York (1983)
Scott, W.A.: Reliability of content analysis: the case of nominal scale coding. Public Opin. Q. 19(3), 321–325 (1955)
Siegel, S., Castellan, N.J.: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York (1988)
Skinner, Q.: Visions of Politics. Cambridge University Press, Cambridge (2002)
Warner, R.M.: Applied Statistics. SAGE, Thousand Oaks (2008)
Weller, S.C.: Cultural consensus theory: applications and frequently asked questions. Field Methods 19(4), 339–368 (2007)
The authors would like to thank the anonymous reviewers of Quality & Quantity for their helpful and constructive suggestions and comments. However, all remaining errors and inaccuracies are solely attributable to the authors.
Oleinik, A., Popova, I., Kirdina, S. et al. On the choice of measures of reliability and validity in the content-analysis of texts. Qual Quant 48, 2703–2718 (2014). https://doi.org/10.1007/s11135-013-9919-0
Keywords
- Reliability measures
- Content analysis
- Correlation analysis
- Stylistic texts
- Rhetorical texts