Abstract
To measure the degree of agreement between R observers who independently classify n subjects within K categories, various kappa-type coefficients are often used. When R = 2, it is common to use Cohen's kappa, Scott's pi, Gwet's AC1/2, and Krippendorf's alpha coefficients (weighted or not). When R > 2, a pairwise version based on the aforementioned coefficients is normally used; in the same order as above: Hubert's kappa, Fleiss's kappa, Gwet's AC1/2, and Krippendorf's alpha. However, all these statistics are based on biased estimators of the expected index of agreements, since they estimate the product of two population proportions through the product of their sample estimators. This article has three aims. First, to provide statistics based on unbiased estimators of the expected index of agreements and to determine their variance based on the variance of the original statistic. Second, to make pairwise extensions of some measures. And third, to show that the old and new estimators of Cohen's kappa and Hubert's kappa coefficients match the well-known estimators of the concordance and intraclass correlation coefficients, if the former are defined assuming quadratic weights. The article shows that the new estimators are always greater than or equal to the classic ones, except in the case of Gwet, where it is the other way around, although these differences are only relevant with small sample sizes (e.g. n ≤ 30).
1 Introduction
It is often necessary to assess the degree of concordance or agreement between R raters who independently classify n subjects within K ≥ 2 categories (Fleiss 1971; Landis and Koch 1975a, b; Warrens 2010; Schuster and Smith 2005).
Consider first the case of only two raters (R = 2) and nominal categories. As some of the observed agreements may be due to chance, it is most common to eliminate the effect of chance by defining a kappa-type coefficient of the form κ = (Io − Ie)/(1 − Ie). In that expression, Io is the observed index of agreements (the sum of the observed proportions of agreements), Ie is the expected index of agreements (the sum of the proportions of agreements that would happen if the two raters acted independently) and κ is the population value of the proposed agreement measure. Note that the previous indexes only consider the agreements obtained. When the categories are ordinal, the indexes defined are similar to the previous ones, but they also consider the disagreements obtained, to which certain weights are assigned (see Sect. 2.1); this leads to a weighted kappa coefficient. From now on, κ will allude to one or the other indistinctly. According to the definition adopted for Ie, the different kappa coefficients are obtained: κS (Scott 1955), κC (Cohen 1960, 1968), and κG (Gwet 2008). The estimation of these coefficients has the general form \(\hat{\kappa } = \left( {\hat{I}_{o} - \hat{I}_{e} } \right)/\left( {1 - \hat{I}_{e} } \right)\), where the values \(\hat{\kappa }\), \(\hat{I}_{o}\) and \(\hat{I}_{e}\) are the sample estimators of the previous population parameters. It can be seen that κ and \(\hat{\kappa }\) are decreasing functions of Ie and \(\hat{I}_{e}\), respectively. Additionally, Krippendorf (1970, 2004) provides an estimator \(\hat{\kappa }_{K}\) of κS that differs slightly from the more classical \(\hat{\kappa }_{S}\) because of its new definition of \(\hat{I}_{o}\).
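The kappa-type scheme above can be sketched numerically. The following is a minimal Python illustration (the function name and the 2 × 2 table are our own, not from the article) of \(\hat{\kappa } = (\hat{I}_{o} - \hat{I}_{e} )/(1 - \hat{I}_{e} )\), using Cohen's definition of \(\hat{I}_{e}\) in the unweighted nominal case:

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa from a K x K contingency table O_ij
    (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)            # total number of subjects
    K = len(table)
    p = [[o / n for o in row] for row in table]   # observed proportions p_ij
    I_o = sum(p[i][i] for i in range(K))          # observed index of agreements
    p_row = [sum(p[i][j] for j in range(K)) for i in range(K)]  # p_i.
    p_col = [sum(p[i][j] for i in range(K)) for j in range(K)]  # p_.j
    I_e = sum(p_row[i] * p_col[i] for i in range(K))            # expected index
    return (I_o - I_e) / (1 - I_e)

# Hypothetical 2 x 2 table: I_o = 0.7, I_e = 0.5
print(round(cohen_kappa([[20, 5], [10, 15]]), 6))  # -> 0.4
```

Here \(\hat{I}_{o}\) = 0.7 and \(\hat{I}_{e}\) = 0.5, so the raters attain 40% of the agreement margin not attributable to chance.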
Consider now the multi-rater case (R ≥ 2). The different coefficients κ of the case R = 2 can be generalized to multiple raters in several ways, depending on how the phrase “an agreement has occurred” is interpreted. The most common interpretation is that of Fleiss (1971) and Hubert (1977), “an agreement occurs if and only if two raters categorize an object consistently”, or the pairwise definition of agreement. This is the definition used in this article. Hubert (1977) also makes the following interpretation, “an agreement occurs if and only if all raters agree on the categorization of an object”, or the R-wise definition (Conger 1980). The R-wise extension κHR of κC can be seen in Conger (1980), Schuster and Smith (2005) and Martín Andrés and Álvarez Hernández (2020). The best-known pairwise extensions of the coefficients κS, κC and κG are the coefficients κF (Fleiss 1971), κH (Hubert 1977; Conger 1980) and κG (Gwet 2008), respectively. All of them are defined under the same format as in the case of R = 2. Additionally, Krippendorf (1970, 2004) provides an estimator \(\hat{\kappa }_{K}\) of κF that differs slightly from the more classical \(\hat{\kappa }_{F}\), again because of the definition of \(\hat{I}_{o}\). An overview of all of the above can be seen in Gwet's book (2021).
However, all \(\hat{\kappa }_{X}\) expressions are based on biased estimators (X refers to any of the letters used above), since they estimate the product of two population proportions (a term that is present in Ie) through the product of their sample estimators. The first objective of this article is to correct this bias by proposing unbiased estimators \(\hat{I}_{eU}\) of Ie (so the new estimator of κX will be \(\hat{\kappa }_{XU} = \left( {\hat{I}_{o} - \hat{I}_{eU} } \right)/\left( {1 - \hat{I}_{eU} } \right)\)), as well as to determine the variance of \(\hat{\kappa }_{XU}\). This methodology is easy to apply to any other kappa coefficient. A second objective is to make pairwise extensions of some measures, but in a different way from the traditional pairwise extensions.
The previous description is very general, since it is necessary to specify what the “subject population” and the “rater population” are. Regarding the population of subjects, the n subjects may be: (a) a random sample of an infinite population of subjects, which is what is assumed in the rest of the sections; (b) a random sample of a finite population of subjects, in which case a finite population correction (Gwet 2021a, b) must be applied to the formulas of the variance; and (c) the only subjects of interest, in which case only \(\hat{\kappa }_{X}\) makes sense: there is no κX parameter to estimate and it makes no sense to define \(\hat{\kappa }_{XU}\).
Regarding the population of raters, the R raters may be (Shrout and Fleiss 1979): (1) different for each subject (even different in number) and extracted from an infinite population of raters; (2) the same for all of the subjects and extracted from an infinite population of raters; and (3) the same for all of the subjects and the only raters of interest, which is what is assumed in the rest of the sections. When the replies of the raters are quantitative, a traditional way of measuring the degree of agreement between them is through the intraclass correlation coefficients (ICC) ρI1, ρI2, and ρI3, which are obtained from the corresponding one-way random model, two-way random model, or two-way mixed model, respectively. In the last two cases it is assumed that there is no interaction. Nevertheless, in this context of measures of agreement, Shrout and Fleiss (1979) and Carrasco and Jover (2003) point out that in case (3) it is also necessary to include the variability between raters in the total variability, so that in cases (2) and (3) we should use ρI2. Additionally, and for case (3), Lin (1989, 2000) and Barnhart et al. (2002) propose using the concordance correlation coefficient (CCC) ρL as a measure of agreement.
As is logical, different researchers have shown interest in searching for relations between the coefficients κX, ρIi, and ρL, as well as between their estimators \(\hat{\kappa }_{X}\), \(\hat{\rho }_{Ii}\), and \(\hat{\rho }_{L}\). Landis and Koch (1977) demonstrated that \(\hat{\kappa }_{F}\) is asymptotically equivalent to ρI1 when the replies are binary. Furthermore, Barnhart et al. (2002) and Carrasco and Jover (2003) demonstrated that ρL = ρI2. Since, in the case of R = 2, Martín Andrés and Álvarez Hernández (2020) demonstrated that ρL = κC (assuming, as from now on, that the weights of the disagreements are quadratic), the satisfactory property κC = ρL = ρI2 is obtained when R = 2. The equivalences between the estimators of these parameters are more complex, since their values depend on the method of estimating their components. For example, Fleiss and Cohen (1973) demonstrated that \(\hat{\kappa }_{C}\) is asymptotically equivalent to ρI2, King and Chinchilli (2001) and Martín Andrés and Álvarez Hernández (2020) demonstrated that \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\) when direct (biased) estimators are used, and Davis and Fleiss (1982) verified that \(\hat{\kappa }_{H}\) is asymptotically equivalent to ρI2 when the replies are binary. The third objective of this article is to relate κH to ρL, as well as the estimators \(\hat{\kappa }_{CU}\) and \(\hat{\kappa }_{HU}\) to the estimators \(\hat{\rho }_{I2}\) and \(\hat{\rho }_{LU}\), the latter being based on unbiased estimators of the components of ρI2 and ρL, respectively.
For the aforementioned reasons, this article assumes that n subjects, extracted randomly from an infinite population, are scored a single time by R fixed raters (who are the only ones of interest). It is also assumed that there are no missing data, i.e. that every rater gives a reply for every subject.
2 Case of two raters
Let there be two raters (R = 2) who independently classify n subjects within K categories. Let Oij be the number of subjects whom observer 1 classifies as type i (i = 1, 2, …, K) and observer 2 as type j (j = 1, 2, …, K). This gives rise to a table of absolute frequencies Oij like those in Tables 1 and 2, with observed proportions \(\hat{p}_{ij}\) = Oij/n, where ΣiΣjOij = n and ΣiΣj\(\hat{p}_{ij}\) = 1. The usual notation is adopted for the row totals (Oi· and \(\hat{p}_{i \cdot }\)), column totals (O·j and \(\hat{p}_{ \cdot j}\)) and the grand total (O·· = n and \(\hat{p}_{ \cdot \cdot }\) = 1); for example, \(\hat{p}_{i \cdot }\) = Σj\(\hat{p}_{ij}\). If the subjects have been chosen randomly and both raters classify all of the subjects, then the observed dataset {Oij} comes from a multinomial distribution of parameters n and {pij}, where pij is the probability that a subject will be classified in cell (i, j). Additionally, {pi·} and {p·j} will be the marginal distributions of the row and column observers, respectively. Obviously, \(\hat{p}_{ij}\), \(\hat{p}_{i \cdot }\) and \(\hat{p}_{ \cdot j}\) are the maximum likelihood estimators of pij, pi· and p·j, respectively. At the end of “Appendix 2”, another type of sampling is discussed.
2.1 Weighted and unweighted kappa and observed index of agreements
It has already been indicated that κ depends on the indexes of agreement Io (observed) and Ie (expected). To evaluate any of them it is necessary to previously define the weight or degree of agreement wij that is assigned to the answer (i, j), with 0 ≤ wij ≤ 1, wii = 1, and generally wij = wji < 1 (i ≠ j). When categories are ordinal, there are many ways to assign values to wij (Schuster and Smith 2005). If we assume that categories 1, 2, …, K are ordered from lowest to highest, it is usual that wij is related to the value of (i − j). A classic definition, to which we will refer later, is the quadratic weighting wij = 1 − [(i − j)/(K − 1)]2 of Fleiss and Cohen (1973). When categories are nominal, it is traditional to assign the weights wii = 1 and wij = 0 (i ≠ j); that is, only actual agreements are considered. Historically, the different coefficients κ were defined first in the unweighted case and later extended to the weighted case. However, this article is developed for the general weighted case, since the unweighted case is a particular instance of it: wij = δij, where δij is the Kronecker delta.
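As an illustration, the two weighting schemes just described can be generated as follows (a small Python sketch; the function names are ours, not from the article):

```python
def quadratic_weights(K):
    """Quadratic agreement weights of Fleiss and Cohen (1973):
    w_ij = 1 - ((i - j)/(K - 1))**2, so w_ii = 1 and w_{1K} = 0."""
    return [[1 - ((i - j) / (K - 1)) ** 2 for j in range(K)] for i in range(K)]

def identity_weights(K):
    """Nominal (unweighted) case: w_ij is the Kronecker delta."""
    return [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]

print(quadratic_weights(3))
# -> [[1.0, 0.75, 0.0], [0.75, 1.0, 0.75], [0.0, 0.75, 1.0]]
```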
All coefficients κ are defined based on the same value of the observed index of agreements. Therefore, it is appropriate to state its definition (Io) and its estimator (\(\hat{I}_{o}\)) as a general reference for all of Sect. 2:

\(I_{o} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} p_{ij} } } ,\quad \hat{I}_{o} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \hat{p}_{ij} } } ,\)

where \(\hat{I}_{o}\) is an unbiased estimator of Io.
2.2 Cohen's kappa and the intraclass and concordance correlation coefficients
Cohen (1960, 1968) defines the classical measure of agreement

\(\kappa_{C} = \frac{{I_{o} - I_{e} }}{{1 - I_{e} }},\quad I_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} p_{i \cdot } p_{ \cdot j} } } ,\) (1)

and proposes to estimate it by

\(\hat{\kappa }_{C} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad \hat{I}_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \hat{p}_{i \cdot } \hat{p}_{ \cdot j} } } .\)
As indicated in “Appendix 1”, \(\hat{p}_{i \cdot } \hat{p}_{ \cdot j}\) is not an unbiased estimator of pi·p·j since

\(E\left( {\hat{p}_{i \cdot } \hat{p}_{ \cdot j} } \right) = \left\{ {\left( {n - 1} \right)p_{i \cdot } p_{ \cdot j} + p_{ij} } \right\}/n,\) (2)

although it is asymptotically unbiased, as happens in the other cases that follow. Therefore \(E\left( {\hat{I}_{e} } \right) = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} E\left( {\hat{p}_{i \cdot } \hat{p}_{ \cdot j} } \right)} }\) = {(n − 1)Ie + Io}/n and \(\hat{I}_{e}\) is also not an unbiased estimator of Ie. From expression (2) it follows that the unbiased estimators of pi·p·j and Ie are

\(\frac{{n\hat{p}_{i \cdot } \hat{p}_{ \cdot j} - \hat{p}_{ij} }}{n - 1},\quad \hat{I}_{eU} = \frac{{n\hat{I}_{e} - \hat{I}_{o} }}{n - 1},\) (3)
respectively. Thus, the new estimator \(\hat{\kappa }_{CU}\) of κC will be

\(\hat{\kappa }_{CU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{n\hat{\kappa }_{C} }}{{\left( {n - 1} \right) + \hat{\kappa }_{C} }},\) (4)
and its variance, which is deduced in “Appendix 2”, is

\(V\left( {\hat{\kappa }_{CU} } \right) = \left[ {\frac{{\left( {n - \kappa_{C} } \right)^{2} }}{{n\left( {n - 1} \right)}}} \right]^{2} V\left( {\hat{\kappa }_{C} } \right),\) (5)
where \(V\left( {\hat{\kappa }_{C} } \right)\) refers to the formula of Fleiss et al. (1969), which can be seen in the book by Gwet (2021b); this book also contains all of the variances that are needed in what follows. This type of correction is similar to the one used by Miettinen and Nurminen (1985) for the score statistics in 2 × 2 tables. Because of expression (3), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(- \left( {\hat{I}_{o} - \hat{I}_{e} } \right)\) ≤ 0 if and only if \(\hat{\kappa }_{C}\) ≥ 0. As \(\hat{\kappa }_{C}\) decreases with \(\hat{I}_{e}\), then \(\hat{\kappa }_{CU}\) ≥ \(\hat{\kappa }_{C}\) in the case of positive agreement (\(\hat{\kappa }_{C}\) ≥ 0, which is the case of greatest interest). It is easy to see that \(V\left( {\hat{\kappa }_{CU} } \right) \le V\left( {\hat{\kappa }_{C} } \right)\) if and only if κC ≥ n0.5/{n0.5 + (n − 1)0.5}. Something similar happens with the other variances obtained below.
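The correction of this subsection can be checked numerically. The sketch below (our own code; unweighted case by default, illustrative table included) computes \(\hat{\kappa }_{C}\) and \(\hat{\kappa }_{CU}\) and verifies that the direct definition through \(\hat{I}_{eU} = (n\hat{I}_{e} - \hat{I}_{o} )/(n - 1)\) coincides with the closed-form relation \(\hat{\kappa }_{CU} = n\hat{\kappa }_{C} /\{ (n - 1) + \hat{\kappa }_{C} \}\) mentioned in the text:

```python
def cohen_indices(table, w=None):
    """Sample values of n, I_o and I_e for two raters; unweighted by default."""
    n = sum(sum(row) for row in table)
    K = len(table)
    if w is None:
        w = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    p = [[o / n for o in row] for row in table]
    p_row = [sum(row) for row in p]                             # p_i.
    p_col = [sum(p[i][j] for i in range(K)) for j in range(K)]  # p_.j
    I_o = sum(w[i][j] * p[i][j] for i in range(K) for j in range(K))
    I_e = sum(w[i][j] * p_row[i] * p_col[j] for i in range(K) for j in range(K))
    return n, I_o, I_e

def kappa_c(table, w=None):
    n, I_o, I_e = cohen_indices(table, w)
    return (I_o - I_e) / (1 - I_e)

def kappa_cu(table, w=None):
    """Bias-corrected version, using I_eU = (n*I_e - I_o)/(n - 1)."""
    n, I_o, I_e = cohen_indices(table, w)
    I_eU = (n * I_e - I_o) / (n - 1)
    return (I_o - I_eU) / (1 - I_eU)

t = [[20, 5], [10, 15]]            # hypothetical table with n = 50
kc, kcu = kappa_c(t), kappa_cu(t)
assert abs(kcu - 50 * kc / (49 + kc)) < 1e-12  # same value by both routes
assert kcu >= kc                               # positive agreement case
```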
Let there now be two raters with quantitative answers x1 and x2 with means μ1 and μ2, variances \(\sigma_{1}^{2}\) and \(\sigma_{2}^{2}\), and covariance σ12. Lin (1989, 2000) established the following measure of quantitative agreement ρL (known as the CCC) and its estimator \(\hat{\rho }_{L}\)

\(\rho_{L} = \frac{{2\sigma_{12} }}{{\sigma_{1}^{2} + \sigma_{2}^{2} + \left( {\mu_{1} - \mu_{2} } \right)^{2} }},\quad \hat{\rho }_{L} = \frac{{2S_{12} }}{{S_{1}^{2} + S_{2}^{2} + \left( {\overline{x}_{1} - \overline{x}_{2} } \right)^{2} }},\) (6)
where \(S_{i}^{2}\) and S12 are the biased estimators of the variances and covariance, respectively (both with denominator n), and \(\overline{x}_{i}\) are the sample means. As mentioned in the Introduction, the quadratic weighting has the advantage of achieving that κC = ρL = ρI2 and that \(\hat{\rho }_{L} = \hat{\kappa }_{C}\). On the other hand, Carrasco and Jover (2003) replaced the values of \(\sigma_{i}^{2}\), σ12 and (μ1 − μ2)2 by their unbiased estimators \(s_{i}^{2}\), s12 (their sample variances and covariance with denominator n − 1) and \(\widehat{{\left( {\mu_{1} - \mu_{2} } \right)^{2} }}\) = (\(\overline{x}_{1}\) − \(\overline{x}_{2}\))2 − (\(s_{1}^{2}\) + \(s_{2}^{2}\) − 2s12)/n in the first expression of (6), which led to the following estimator \(\hat{\rho }_{LU}\) of ρL,

\(\hat{\rho }_{LU} = \frac{{2s_{12} }}{{s_{1}^{2} + s_{2}^{2} + \widehat{{\left( {\mu_{1} - \mu_{2} } \right)^{2} }}}}.\) (7)
Note that \(\hat{\rho }_{LU}\) = n\(\hat{\rho }_{L}\)/{(n − 1) + \(\hat{\rho }_{L}\)}, which is the same function of expression (4) that relates \(\hat{\kappa }_{CU}\) with \(\hat{\kappa }_{C}\). Therefore, as \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\), then \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) and the two new estimators of ρL and κC (quadratic weights) are the same. Additionally, \(\hat{\rho }_{LU}\) ≥ \(\hat{\rho }_{L}\) if \(\hat{\rho }_{L}\) ≥ 0. In “Appendix 3” it is proved that \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\), thus \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\).
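The relation \(\hat{\rho }_{LU} = n\hat{\rho }_{L} /\{ (n - 1) + \hat{\rho }_{L} \}\) gives a direct way to compute the corrected CCC. A minimal Python sketch (the function names and data are ours, for illustration only):

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, with biased
    (denominator n) variance and covariance estimators."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((v - mx) ** 2 for v in x) / n
    sy2 = sum((v - my) ** 2 for v in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)

def lin_ccc_unbiased(x, y):
    """Carrasco-Jover corrected CCC via rho_LU = n*rho_L/((n-1)+rho_L)."""
    n, r = len(x), lin_ccc(x, y)
    return n * r / ((n - 1) + r)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 6]
assert lin_ccc(x, x) == 1.0                     # perfect agreement
assert lin_ccc_unbiased(x, y) >= lin_ccc(x, y)  # larger when agreement >= 0
```

With quadratic weights, this `lin_ccc_unbiased` value would also equal \(\hat{\kappa }_{CU}\), per the equality stated above.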
2.3 Scott's pi
Scott (1955) defines the following measure of agreement

\(\kappa_{S} = \frac{{I_{o} - I_{e} }}{{1 - I_{e} }},\quad I_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \pi_{i} \pi_{j} } } ,\quad \pi_{i} = \frac{{p_{i \cdot } + p_{ \cdot i} }}{2},\) (8)

and proposes to estimate it by

\(\hat{\kappa }_{S} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad \hat{I}_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \hat{\pi }_{i} \hat{\pi }_{j} } } ,\quad \hat{\pi }_{i} = \frac{{\hat{p}_{i \cdot } + \hat{p}_{ \cdot i} }}{2}.\) (9)
As indicated in “Appendix 1”, \(\hat{\pi }_{i} \hat{\pi }_{j}\) is not an unbiased estimator of πiπj since

\(E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right) = \left[ {\left( {n - 1} \right)\pi_{i} \pi_{j} + \left( {2\delta_{ij} \pi_{i} + p_{ij} + p_{ji} } \right)/4} \right]/n.\) (10)

Therefore, E(\(\hat{I}_{e}\)) = \(\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right)} }\) = {(n − 1)Ie + (1 + Io)/2}/n, assuming that wij = wji, and \(\hat{I}_{e}\) is not an unbiased estimator of Ie. From expression (10) it is deduced that the unbiased estimators of πiπj and Ie are

\(\frac{{n\hat{\pi }_{i} \hat{\pi }_{j} - \left( {2\delta_{ij} \hat{\pi }_{i} + \hat{p}_{ij} + \hat{p}_{ji} } \right)/4}}{n - 1},\quad \hat{I}_{eU} = \frac{{n\hat{I}_{e} - \left( {1 + \hat{I}_{o} } \right)/2}}{n - 1},\) (11)
respectively. Therefore, the new estimator \(\hat{\kappa }_{SU}\) of κS will be

\(\hat{\kappa }_{SU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{S} + 1}}{{\left( {2n - 1} \right) + \hat{\kappa }_{S} }},\) (12)
and its variance, as deduced in “Appendix 2”, is

\(V\left( {\hat{\kappa }_{SU} } \right) = \left[ {\frac{{\left\{ {\left( {2n - 1} \right) - \kappa_{S} } \right\}^{2} }}{{4n\left( {n - 1} \right)}}} \right]^{2} V\left( {\hat{\kappa }_{S} } \right).\) (13)
Because of expression (11), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(- \left\{ {\left( {1 - \hat{I}_{e} } \right) + \left( {\hat{I}_{o} - \hat{I}_{e} } \right)} \right\}\) which is also proportional to \(- \left\{ {1 + \hat{\kappa }_{S} } \right\}\) ≤ 0 if and only if \(\hat{\kappa }_{S}\) ≥ − 1. As \(\hat{\kappa }_{S}\) decreases with \(\hat{I}_{e}\), then \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\) in the case of a positive agreement.
2.4 Krippendorf's alpha
Krippendorf (1970, 2004) proposed to estimate κS as in expression (9), but with a small-sample correction for \(\hat{I}_{o}\), though Gwet (2021b, p. 65) considers that “The need for such an adjustment and its potential benefits have not been documented”. The new estimator is

\(\hat{\kappa }_{K} = \frac{{\hat{I}_{oC} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\) (14)

where \(\hat{I}_{oC} = \hat{I}_{o} + \left( {1 - \hat{I}_{o} } \right)/2n\); therefore,

\(\hat{\kappa }_{K} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{S} + 1}}{2n},\quad \hat{\kappa }_{KU} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{SU} + 1}}{2n}.\) (15)
The first expression follows from expressions (9) and (14); the second is obtained by replacing \(\hat{I}_{e}\) with the value of \(\hat{I}_{eU}\) in expression (11). From expressions (15) it is deduced that \(\hat{\kappa }_{K}\) ≥ \(\hat{\kappa }_{S}\) and \(\hat{\kappa }_{KU}\) ≥ \(\hat{\kappa }_{SU}\). Also, as for positive degrees of agreement it occurs that \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\), then, due to expressions (15), \(\hat{\kappa }_{KU}\) ≥ \(\hat{\kappa }_{K}\). Finally, if in the first expression of Eq. (15) \(\hat{\kappa }_{S}\) is replaced by {(2n − 1)\(\hat{\kappa }_{SU}\) − 1}/{(2n − 1) − \(\hat{\kappa }_{SU}\)} (which is deduced from expression (12)), then \(\hat{\kappa }_{K}\) = 2(n − 1)\(\hat{\kappa }_{SU}\)/{(2n − 1) − \(\hat{\kappa }_{SU}\)} and \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{K}\) if \(\hat{\kappa }_{SU}\) ≥ 0. The overall conclusion is that \(\hat{\kappa }_{S}\) ≤ \(\hat{\kappa }_{K}\) ≤ \(\hat{\kappa }_{SU}\) ≤ \(\hat{\kappa }_{KU}\) for positive degrees of agreement.
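The chain \(\hat{\kappa }_{S}\) ≤ \(\hat{\kappa }_{K}\) ≤ \(\hat{\kappa }_{SU}\) ≤ \(\hat{\kappa }_{KU}\) can be checked numerically. The sketch below (our own code, unweighted case, hypothetical table) computes the four estimators from a two-rater table:

```python
def scott_family(table):
    """kappa_S, kappa_K (Krippendorf) and their bias-corrected versions,
    unweighted two-rater case."""
    n = sum(sum(row) for row in table)
    K = len(table)
    p = [[o / n for o in row] for row in table]
    I_o = sum(p[i][i] for i in range(K))
    # pi_i = (p_i. + p_.i)/2, the averaged marginal of Scott
    pi = [(sum(p[i]) + sum(p[j][i] for j in range(K))) / 2 for i in range(K)]
    I_e = sum(v * v for v in pi)
    I_oC = I_o + (1 - I_o) / (2 * n)            # Krippendorf's corrected I_o
    I_eU = (n * I_e - (1 + I_o) / 2) / (n - 1)  # unbiased estimator of I_e
    k_S = (I_o - I_e) / (1 - I_e)
    k_K = (I_oC - I_e) / (1 - I_e)
    k_SU = (I_o - I_eU) / (1 - I_eU)
    k_KU = (I_oC - I_eU) / (1 - I_eU)
    return k_S, k_K, k_SU, k_KU

k_S, k_K, k_SU, k_KU = scott_family([[20, 5], [10, 15]])  # n = 50
assert k_S <= k_K <= k_SU <= k_KU        # the ordering stated in the text
# relation kappa_K = 2(n-1)kappa_SU / {(2n-1) - kappa_SU}, also from the text
assert abs(k_K - 2 * 49 * k_SU / (99 - k_SU)) < 1e-12
```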
Regarding the variance, it is sufficient to use the first part of the second expression of (15) and then replace V(\(\hat{\kappa }_{SU}\)) with its value in expression (13); thus

\(V\left( {\hat{\kappa }_{KU} } \right) = \left( {\frac{2n - 1}{{2n}}} \right)^{2} V\left( {\hat{\kappa }_{SU} } \right).\)
2.5 Gwet's AC1/2
Gwet (2008) defines the following measure for the case of AC2 (AC1 refers to the unweighted case),

\(\kappa_{G} = \frac{{I_{o} - I_{e} }}{{1 - I_{e} }},\quad I_{e} = \frac{{W\left( {1 - \sum\nolimits_{i} {\pi_{i}^{2} } } \right)}}{{K\left( {K - 1} \right)}},\quad W = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} } } ,\) (16)

and proposes to estimate it by

\(\hat{\kappa }_{G} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad \hat{I}_{e} = \frac{{W\left( {1 - \sum\nolimits_{i} {\hat{\pi }_{i}^{2} } } \right)}}{{K\left( {K - 1} \right)}},\) (17)
where πi and \(\hat{\pi }_{i}\) are obtained as in expressions (8) and (9). Once again, \(\hat{I}_{e}\) is not an unbiased estimator of Ie, because \(\hat{\pi }_{i}^{2}\) is not an unbiased estimator of \(\pi_{i}^{2}\) either. Using the first expression of (11) to estimate \(\pi_{i}^{2}\) in an unbiased way, we obtain that the unbiased estimators of \(\pi_{i}^{2}\) and Ie are, respectively,

\(\frac{{n\hat{\pi }_{i}^{2} - \left( {\hat{\pi }_{i} + \hat{p}_{ii} } \right)/2}}{n - 1},\quad \hat{I}_{eU} = \frac{{n\hat{I}_{e} - W\left( {1 - \sum\nolimits_{i} {\hat{p}_{ii} } } \right)/\left\{ {2K\left( {K - 1} \right)} \right\}}}{n - 1}.\) (18)
Therefore, the new estimator \(\hat{\kappa }_{GU}\) of κG will be

\(\hat{\kappa }_{GU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }}.\) (19)
In “Appendix 1” it is proved that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0, so it always happens that \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\). It can be observed that it is not feasible to determine V(\(\hat{\kappa }_{GU}\)) directly from the value of V(\(\hat{\kappa }_{G}\)).
3 Case of multi-raters
Let there be n subjects (s = 1, 2, …, n) classified by R raters (r = 1, 2, …, R) in K types (i = 1, 2, …, K). Let xsr = 1, 2, …, K be the answer of rater r in subject s, values that are usually presented in a two-dimensional table in which the subjects are in rows and the raters in columns. For each row (subject), let Ris be the number of raters that answer i in subject s; obviously Ri+ = ΣsRis is the total number of i answers (for every rater), R+s = ΣiRis = R and R++ = ΣiΣsRis = nR. For each column (rater), let nir be the number of subjects classified as i by rater r; obviously n+r = Σinir = n, ni+ = Σrnir = Ri+ is the total number of i answers and n++ = ΣiΣrnir = nR = R++. The results of Ris and nir are usually presented as in Table 3(a) and (b) respectively.
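The counts Ris and nir can be tabulated directly from the answers xsr; a small Python sketch (the data are hypothetical, not from the article):

```python
def rating_counts(x, K):
    """From the n x R table x[s][r] in {1,...,K}, build
    R_is (raters assigning category i to subject s) and
    n_ir (subjects classified as i by rater r)."""
    n, R = len(x), len(x[0])
    R_is = [[sum(1 for r in range(R) if x[s][r] == i + 1) for s in range(n)]
            for i in range(K)]
    n_ir = [[sum(1 for s in range(n) if x[s][r] == i + 1) for r in range(R)]
            for i in range(K)]
    return R_is, n_ir

# answers of R = 3 raters on n = 4 subjects, K = 2 categories
x = [[1, 1, 1], [1, 1, 2], [2, 2, 2], [1, 2, 2]]
R_is, n_ir = rating_counts(x, 2)
assert R_is == [[3, 2, 0, 1], [0, 1, 3, 2]]
assert all(sum(col) == 3 for col in zip(*R_is))  # R_+s = R for every subject
```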
3.1 Pairwise methods and the observed index of agreement
To define and estimate the measures for the R > 2 case, pairwise methods will be used. These methods in some way average what happens in the R(R − 1) possible ordered pairs of raters (r, r′), with r, r′ = 1, 2, …, R and r ≠ r′. This obliges us to change the notation used in Sect. 2, since it is necessary to indicate, for each parameter, from which pair (r, r′) its value comes. Parameters pij, pi· and p·j of Sect. 2 will now be notated as pir,jr′, pir and pjr′, respectively. Additionally, we define the new parameter pi+ = Σrpir = Σr′pir′, which is the sum over raters of the proportions of i answers. The same applies to the estimated values: \(\hat{p}_{ij}\) becomes \(\hat{p}_{ir,jr^{\prime}}\), etc. Note that the estimators \(\hat{p}_{ir}\) of pir and \(\hat{p}_{i + }\) of pi+ are

\(\hat{p}_{ir} = \frac{{n_{ir} }}{n},\quad \hat{p}_{i + } = \sum\nolimits_{r} {\hat{p}_{ir} } = \frac{{R_{i + } }}{n},\) (20)
respectively, where Σi\(\hat{p}_{ir}\) = 1 and ΣrΣi\(\hat{p}_{ir}\) = R. Parameters κ, Io and Ie of Sect. 2 will now be denoted as κ(r, r′), Io(r, r′) and Ie(r, r′), respectively; therefore

\(\kappa \left( {r,r^{\prime}} \right) = \frac{{I_{o} \left( {r,r^{\prime}} \right) - I_{e} \left( {r,r^{\prime}} \right)}}{{1 - I_{e} \left( {r,r^{\prime}} \right)}},\quad I_{o} \left( {r,r^{\prime}} \right) = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} p_{ir,jr^{\prime}} } } ,\) (21)
and the same for the estimated values \(\hat{\kappa }\left( {r,r^{\prime}} \right)\) etc.
With pairwise methods there are several ways to average the results over the pairs of raters (r, r′), but all procedures of interest define the global value of Io as

\(I_{o} = \frac{1}{{R\left( {R - 1} \right)}}\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {I_{o} \left( {r,r^{\prime}} \right)} } ,\) (22)
thus Io = ΣrΣr′≠rΣiΣjwijpir,jr′/{R(R − 1)}. As is traditional, the measure of global agreement will be κ = (Io − Ie)/(1 − Ie), where Ie is yet to be defined. If Ie is defined in a similar way to Io,

\(I_{e} = \frac{1}{{R\left( {R - 1} \right)}}\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {I_{e} \left( {r,r^{\prime}} \right)} } ,\) (23)
we say that the procedure that defines the global κ is a “two-pairwise” procedure, and the population coefficient thereby obtained will be

\(\kappa_{2} = \frac{{I_{o} - I_{e} }}{{1 - I_{e} }} = \frac{{\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\left\{ {I_{o} \left( {r,r^{\prime}} \right) - I_{e} \left( {r,r^{\prime}} \right)} \right\}} } }}{{\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\left\{ {1 - I_{e} \left( {r,r^{\prime}} \right)} \right\}} } }}.\)
It can be noticed that κ2 is also obtained by dividing the sum of all the possible numerators (ΣrΣr′≠r) from expression (21) by the sum of all possible denominators, which indicates that κ2 is the weighted average of the R(R − 1) values of κ(r, r′) (the weights being the denominators). This procedure is the one recommended by Janson and Olsson (2001), Conger (1980) and Gwet (2021b). Notice that ΣrΣr′≠rIo(r, r′) = 2ΣrΣr′>rIo(r, r′), and similarly with Ie. We have preferred to use the first expression because it facilitates some proofs, but for calculations the second expression seems preferable. All of the above also applies to the case of the estimated values.
As the base values of Io and \(\hat{I}_{o}\) are the same in every κ measure, they should be specified; their values are (see “Appendix 1”)

\(I_{o} = \frac{{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {p_{ir,jr^{\prime}} } } } } }}{{R\left( {R - 1} \right)}},\quad \hat{I}_{o} = \frac{{\sum\nolimits_{s} {\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} R_{is} \left( {R_{js} - \delta_{ij} } \right)} } } }}{{nR\left( {R - 1} \right)}}.\)
3.2 Hubert's kappa pairwise and the intraclass and concordance correlation coefficients
The κH coefficient of Hubert (Hubert 1977; Conger 1980) is a two-pairwise coefficient, which is why expression (23) can be applied with Cohen's value of Ie(r, r′). Adjusting expression (1) to the current format, Ie(r, r′) = ΣiΣjwijpirpjr′ and, due to “Appendix 1”,

\(I_{e} = \frac{1}{{R\left( {R - 1} \right)}}\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} p_{ir} p_{jr^{\prime}} } } } } .\)

Using expressions (20), the following estimator is obtained

\(\hat{I}_{e} = \frac{1}{{R\left( {R - 1} \right)}}\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \hat{p}_{ir} \hat{p}_{jr^{\prime}} } } } } .\)
It can be observed that for R = 2 it occurs that κC = κH and \(\hat{\kappa }_{C} = \hat{\kappa }_{H}\). In order to obtain an unbiased estimator of Ie, the second expression of (3), applied with the current notation, indicates that \(\hat{I}_{eU} \left( {r,r^{\prime}} \right) = \left\{ {n\hat{I}_{e} \left( {r,r^{\prime}} \right) - \hat{I}_{o} \left( {r,r^{\prime}} \right)} \right\}/\left( {n - 1} \right)\); therefore R(R − 1)\(\hat{I}_{eU}\) = \(\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{I}_{eU} \left( {r,r^{\prime}} \right)} } = \left\{ {n\sum\nolimits_{r} {\sum\nolimits_{r^{\prime}} {\hat{I}_{e} \left( {r,r^{\prime}} \right)} } - \sum\nolimits_{r} {\sum\nolimits_{r^{\prime}} {\hat{I}_{o} \left( {r,r^{\prime}} \right)} } } \right\}/\left( {n - 1} \right)\) and so \(\hat{I}_{eU}\) = (n\(\hat{I}_{e}\) − \(\hat{I}_{o}\))/(n − 1). As this expression is the same as the second expression of (3), the conclusions of Sect. 2.2 remain valid, changing the letter C to the letter H. Thus,

\(\hat{\kappa }_{HU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{n\hat{\kappa }_{H} }}{{\left( {n - 1} \right) + \hat{\kappa }_{H} }},\) (26)
and \(\hat{\kappa }_{HU}\) ≥ \(\hat{\kappa }_{H}\) in the case of positive agreement.
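As a numerical check, the following sketch (our own code, unweighted case, hypothetical data) computes \(\hat{\kappa }_{H}\) and \(\hat{\kappa }_{HU}\) from the raw answers, using \(\hat{I}_{eU} = (n\hat{I}_{e} - \hat{I}_{o} )/(n - 1)\):

```python
def hubert_kappa(x, K):
    """Unweighted pairwise Hubert kappa and its bias-corrected version,
    from the n x R answers x[s][r] in {1,...,K}."""
    n, R = len(x), len(x[0])
    # p_ir: proportion of subjects classified as i by rater r
    p = [[sum(1 for s in range(n) if x[s][r] == i + 1) / n for r in range(R)]
         for i in range(K)]
    # pairwise observed agreement, averaged over ordered pairs (r, r')
    I_o = sum(sum(1 for s in range(n) if x[s][r] == x[s][r2])
              for r in range(R) for r2 in range(R) if r2 != r) / (n * R * (R - 1))
    # pairwise chance agreement from the products of marginals
    I_e = sum(p[i][r] * p[i][r2]
              for i in range(K) for r in range(R) for r2 in range(R)
              if r2 != r) / (R * (R - 1))
    k_H = (I_o - I_e) / (1 - I_e)
    I_eU = (n * I_e - I_o) / (n - 1)     # unbiased estimator of I_e
    k_HU = (I_o - I_eU) / (1 - I_eU)
    return k_H, k_HU

x = [[1, 1, 1], [1, 1, 2], [2, 2, 2], [1, 2, 2]]   # n = 4, R = 3, K = 2
k_H, k_HU = hubert_kappa(x, 2)
assert abs(k_H - 5 / 13) < 1e-12
assert abs(k_HU - 5 / 11) < 1e-12
assert abs(k_HU - 4 * k_H / (3 + k_H)) < 1e-12     # relation of the text
```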
Generalizing the first expression of (6) to the case of two raters r and r′ with answers xr and xr′, means μr and μr′, variances \(\sigma_{r}^{2}\) and \(\sigma_{r^{\prime}}^{2}\), and covariance σrr′, we obtain \(\rho_{L} \left( {r,r^{\prime}} \right) = 2\sigma_{rr^{\prime}} /\left\{ {\sigma_{r}^{2} + \sigma_{r^{\prime}}^{2} + \left( {\mu_{r} - \mu_{r^{\prime}} } \right)^{2} } \right\}\). If we apply to this expression the two-pairwise criterion, which consists of adding ΣrΣr′≠r in the numerator and in the denominator, the CCC ρL of Lin (1989, 2000) and Barnhart et al. (2002) is obtained for the case of multi-raters; its estimated value \(\hat{\rho }_{L}\) is obtained in the same way as the second expression of (6). In this way,

\(\rho_{L} = \frac{{2\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\sigma_{rr^{\prime}} } } }}{{\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\left\{ {\sigma_{r}^{2} + \sigma_{r^{\prime}}^{2} + \left( {\mu_{r} - \mu_{r^{\prime}} } \right)^{2} } \right\}} } }}.\)
Carrasco and Jover (2003) justified that \(\hat{\rho }_{L}\) is based on biased estimators and proposed the following estimator, which is based on unbiased estimators (\(s_{rr^{\prime}}\) and \(s_{r}^{2}\))
It is easy to see that the same thing can be obtained by applying the two-pairwise method to the first expression of (7). Since for R = 2 it occurred that κC = ρL and \(\hat{\kappa }_{C} = \hat{\rho }_{L}\) when the weights were quadratic, and in both cases the value for R > 2 is obtained in the same way (the sum of the numerators divided by the sum of the denominators), then also κH = ρL and \(\hat{\kappa }_{H} = \hat{\rho }_{L}\) in the case of R > 2. Additionally, κHR = κH = ρL = ρI2 since ρL = ρI2 (Carrasco and Jover 2003) and κHR = ρL (Martín Andrés and Álvarez Hernández 2020). Furthermore, as \(\hat{\rho }_{LU} = n\hat{\rho }_{L} /\left\{ {\left( {n - 1} \right) + \hat{\rho }_{L} } \right\}\) (an expression which has the same form as (26)), then also
where the last two equalities are demonstrated in the “Appendix 3”. In the last expression, which is simpler for the calculation, it is understood that xs· = \(\sum\nolimits_{r} {x_{sr} }\), x·r = \(\sum\nolimits_{s} {x_{sr} }\), and x·· = \(\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr} } }\). Something similar happens with the estimators based on the biased estimation of their components (see “Appendix 3”),
3.3 Fleiss' kappa pairwise
Fleiss (1971) extended κS to the case of R > 2 by defining the following value of Ie, which is not of the two-pairwise type,

\(I_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \pi_{i} \pi_{j} } } ,\quad \pi_{i} = \frac{{p_{i + } }}{R},\) (31)
and proposes the following estimators

\(\hat{\kappa }_{F} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad \hat{I}_{e} = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \hat{\pi }_{i} \hat{\pi }_{j} } } ,\quad \hat{\pi }_{i} = \frac{{\hat{p}_{i + } }}{R},\) (32)
since pi+ is estimated as in the second expression of Eq. (20). As indicated in “Appendix 1”, \(\hat{I}_{e}\) is not an unbiased estimator of Ie since nE(\(\hat{I}_{e}\)) = (n − 1)Ie + R−1{1 + (R − 1)Io}. This is why the unbiased estimator \(\hat{I}_{eU}\) of Ie and the new estimator \(\hat{\kappa }_{FU}\) of κF will be

\(\hat{I}_{eU} = \frac{{n\hat{I}_{e} - R^{ - 1} \left\{ {1 + \left( {R - 1} \right)\hat{I}_{o} } \right\}}}{n - 1},\quad \hat{\kappa }_{FU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }}.\) (33)
Its variance, as deduced in “Appendix 2”, is
Through the first expression of Eq. (33), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(\hat{I}_{e}\) − R−1{1 + (R − 1) \(\hat{I}_{o}\)}, which is also proportional to − {1 + (R − 1)\(\hat{\kappa }_{F}\)} ≤ 0 if and only if \(\hat{\kappa }_{F}\) ≥ − (R − 1)−1. Therefore, \(\hat{\kappa }_{FU}\) ≥ \(\hat{\kappa }_{F}\) in the case of positive agreement.
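Analogously, \(\hat{\kappa }_{F}\) and \(\hat{\kappa }_{FU}\) can be computed from the Ris counts using \(\hat{I}_{eU} = [n\hat{I}_{e} - R^{ - 1} \{ 1 + (R - 1)\hat{I}_{o} \} ]/(n - 1)\); a Python sketch with hypothetical data (unweighted case, our own code):

```python
def fleiss_kappa(x, K):
    """Unweighted Fleiss kappa and its bias-corrected version,
    from the n x R answers x[s][r] in {1,...,K}."""
    n, R = len(x), len(x[0])
    # c[s][i] = R_is: number of raters assigning category i to subject s
    c = [[sum(1 for r in range(R) if x[s][r] == i + 1) for i in range(K)]
         for s in range(n)]
    I_o = sum(c[s][i] * (c[s][i] - 1) for s in range(n) for i in range(K)) \
          / (n * R * (R - 1))
    pi = [sum(c[s][i] for s in range(n)) / (n * R) for i in range(K)]
    I_e = sum(v * v for v in pi)
    k_F = (I_o - I_e) / (1 - I_e)
    I_eU = (n * I_e - (1 + (R - 1) * I_o) / R) / (n - 1)  # unbiased estimator
    k_FU = (I_o - I_eU) / (1 - I_eU)
    return k_F, k_FU

x = [[1, 1, 1], [1, 1, 2], [2, 2, 2], [1, 2, 2]]  # n = 4, R = 3, K = 2
k_F, k_FU = fleiss_kappa(x, 2)
assert abs(k_F - 1 / 3) < 1e-12
assert abs(k_FU - 7 / 16) < 1e-12
assert k_FU >= k_F   # positive agreement case
```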
Another way of extending κS is to use the two-pairwise method. In this case, in “Appendix 1” it is demonstrated that
and therefore its estimated values in a traditional way would be
In order to obtain the unbiased estimator of Ie, the second expression of Eq. (11) is, in the current terms, \(\hat{I}_{eU} \left( {r,r^{\prime}} \right)\) = [\(n\hat{I}_{e} \left( {r,r^{\prime}} \right)\) − {1 + \(\hat{I}_{o} \left( {r,r^{\prime}} \right)\)}/2]/(n − 1). Applying expressions (22) and (23), it is obtained that the second expression of Eq. (11) also applies to the current case, in such a way that the conclusions obtained in the case of Scott's pi are valid, changing the letter S to F2. In this way
and \(\hat{\kappa }_{F2U}\) ≥ \(\hat{\kappa }_{F2}\) when \(\hat{\kappa }_{F2}\) ≥ 0. Nevertheless, to the best of our knowledge, the value of V(\(\hat{\kappa }_{F2}\)) is not known.
3.4 Krippendorf's multi-rater alpha
Now the objective is similar to that of Sect. 2.4: to estimate κF as in expression (32), but replacing the value of \(\hat{I}_{o}\) by a value \(\hat{I}_{oC}\) defined as in expression (14). In this way
Given the formal equality of the expressions, all of the previous conclusions can be accepted, with the necessary changes. In particular,
In a similar way for the two-pairwise method, where now
Therefore, expressions (36) to (38) are also valid putting number “2” after the letters K or F in the sub-indexes of these expressions.
3.5 Gwet's multi-rater AC1/2
For the case of multi-raters, Gwet (2008) defined the same measures of agreement AC1/2, κG and \(\hat{\kappa }_{G}\), of expressions (16) and (17) respectively, but with πi and \(\hat{\pi }_{i}\) alluding to the Fleiss values of expressions (31) and (32), respectively. Therefore, Ie = W\(\left( {1 - \sum\nolimits_{i} {\pi_{i}^{2} } } \right)\)/{K(K − 1)} = W\(\left( {1 - \sum\nolimits_{i} {p_{i + }^{2} } /R^{2} } \right)\)/{K(K − 1)} and

\(\hat{\kappa }_{G} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad \hat{I}_{e} = \frac{{W\left( {1 - \sum\nolimits_{i} {\hat{p}_{i + }^{2} } /R^{2} } \right)}}{{K\left( {K - 1} \right)}}.\)
"Appendix 1" demonstrates that \(\hat{\pi }_{i}^{2}\) is not an unbiased estimator of \(\pi_{i}^{2}\) (see expression (48)), so that \(\hat{I}_{e}\) is also not an unbiased estimator of Ie; the same Appendix justifies that the unbiased estimator \(\hat{I}_{eU}\) of Ie is
Therefore, the new estimator \(\hat{\kappa }_{GU}\) of κG will be,
It can be observed that now it is not viable to determine V(\(\hat{\kappa }_{GU}\)) directly from the value of V(\(\hat{\kappa }_{G}\)). “Appendix 1” demonstrates that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0, so that now we also find that \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\).
An alternative is to use the two-pairwise method. In this case, “Appendix 1” demonstrates that
and therefore its estimated (biased) values are, because of expression (20)
To obtain unbiased estimator of Ie, expression (18) is, in current terms, \(\hat{I}_{eU} \left( {r,r^{\prime } } \right)\) = [\(n\hat{I}_{e} \left( {r,r^{\prime } } \right)\) − W{1 − \(\sum\nolimits_{i} {\hat{p}_{{ir,ir^{\prime } }} }\)}/{2K(K − 1)}]/(n − 1). Applying expression (23) we obtain the value for the current \(\hat{I}_{eU}\), which provides the value of \(\hat{\kappa }_{G2U}\); i.e.
and \(\hat{I}_{oN}\) as in expression (40). Note that in this expression \(\hat{I}_{eU}\) has the same form as in expression (18), so that \(\hat{\kappa }_{G2U}\) can be put as a function of \(\hat{\kappa }_{G2}\) in a similar way to expression (19):
As in the case R = 2 it occurred that \(\hat{I}_{eU} \left( {r,r^{\prime } } \right)\) ≥ \(\hat{I}_{e} \left( {r,r^{\prime } } \right)\), through expression (23) it is deduced that in the current case \(\hat{I}_{eU}\) ≥ \(\hat{I}_{e}\); therefore \(\hat{\kappa }_{G2U}\) ≤ \(\hat{\kappa }_{G2}\). “Appendix 1” provides a more direct demonstration of the previous statement. To the best of our knowledge, the value of V(\(\hat{\kappa }_{G2}\)) is not known.
4 Examples
Table 1(a) contains the data from a classic example by Fleiss et al. (2003) in which R = 2 raters diagnose n = 100 individuals in K = 3 categories (Psychotic, Neurotic, and Organic). Its part (b) specifies the values of the eight kappa coefficients mentioned in Sect. 2, all of which are calculated for the non-weighted case (wij = δij). It can be observed that the eight coefficients verify the properties mentioned in Sect. 2; for example, all of the new estimators have a value greater than or equal to that of the classic ones, except in the case of the coefficient of Gwet, in which case the opposite happens. Nevertheless, the former differ only slightly from the latter, because the sample size (n = 100) is too large for the differences between the estimators to show. When the sample size is small (n = 8), as occurs in the example of Gwet (2021b, p. 109) in Table 2(a) (R = 2, K = 3), the differences are more evident, as shown by the results in Table 2(b).
For the case of more than two raters, Table 3(a) and (b) show the values of Ris and nir, respectively, values which are obtained from the data xsr in an example by Gwet (2021b, p. 341) related to the change in the coloring of Stickleback fish (R = 4, K = 5, n = 50). Table 3(c) shows the values of the fourteen kappa coefficients mentioned in Sect. 3, all of which are also calculated for the non-weighted case (wij = δij). It can be observed that the fourteen coefficients verify the properties mentioned in Sect. 3. It is also observed that, although the values of n and \(\hat{\kappa }\) are moderate, all of the new coefficients are greater than the classic ones by at least one unit in the second decimal place. The exception is the case of the two coefficients of Gwet, in which the differences obtained are very small.
5 Simulation
This section has two objectives. Firstly, to assess the bias of the two estimators of κX (\(\hat{\kappa }_{X}\) and \(\hat{\kappa }_{XU}\)) in the case of R = 2, where X refers to C, S, K or G. Secondly, to assess the behaviour of the estimator of the variance \(\hat{V}\left( {\hat{\kappa }_{CU} } \right)\), in order to show that the new variances behave coherently in relation to the classic ones.
To assess the two estimators, the procedure is as follows. Let us consider that the observed frequencies in Table 1(a), divided by n = 100, are the true probabilities pij of the problem mentioned, in which R = 2 and K = 3; for example, p11 = 75/100 = 0.75. In that case the value \(\hat{\kappa }_{C}\) = 0.676 of Table 1(b) becomes the population value κC = 0.676 of the Cohen kappa coefficient, since the values \(\hat{I}_{o}\) and \(\hat{I}_{e}\) of \(\hat{\kappa }_{C}\) become the values Io and Ie of κC. If we now extract N = 10,000 random samples from the multinomial distribution of parameters {pij, n = 100}, each sample will provide two estimators \(\hat{\kappa }_{Ch}\) and \(\hat{\kappa }_{CUh}\) of κC. The means \(\overline{\hat{\kappa }}_{C} = {{\Sigma_{h} \hat{\kappa }_{Ch} } \mathord{\left/ {\vphantom {{\Sigma_{h} \hat{\kappa }_{Ch} } N}} \right. \kern-0pt} N}\) and \(\overline{\hat{\kappa }}_{CU} = {{\Sigma_{h} \hat{\kappa }_{CUh} } \mathord{\left/ {\vphantom {{\Sigma_{h} \hat{\kappa }_{CUh} } N}} \right. \kern-0pt} N}\) of the values \(\hat{\kappa }_{Ch}\) and \(\hat{\kappa }_{CUh}\) should be approximately equal to κC = 0.676 if the estimators were unbiased. The results of this simulation are provided in the sixteenth line of results in Table 4. The rest of the lines, where other values of K, n, and κC are used, were obtained in a similar way. It can be seen that in general κC = \(\overline{\hat{\kappa }}_{CU}\) ≥ \(\overline{\hat{\kappa }}_{C}\), except in two cases in which κC > \(\overline{\hat{\kappa }}_{C}\) ≥ \(\overline{\hat{\kappa }}_{CU}\). Therefore, \(\hat{\kappa }_{CU}\) is less biased than \(\hat{\kappa }_{C}\) and, for the accuracy used, is generally unbiased. Nevertheless, \(\hat{\kappa }_{C}\) is only unbiased for values n ≥ 50 or 100, depending on the value of K.
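The procedure above can be sketched in code. The 2 × 2 probability table below is a hypothetical placeholder (not the Table 1 data), and the new estimator is computed through the relation \(\hat{\kappa }_{CU}\) = n\(\hat{\kappa }_{C}\)/(n − 1 + \(\hat{\kappa }_{C}\)), obtained by inverting the relation given in "Appendix 2":

```python
import numpy as np

rng = np.random.default_rng(0)

def kappa_cohen(tab):
    """Classic unweighted Cohen kappa from a K x K table of proportions."""
    Io = np.trace(tab)                       # observed index of agreements
    Ie = tab.sum(axis=1) @ tab.sum(axis=0)   # sum_i of p_i. * p_.i
    return (Io - Ie) / (1 - Ie)

def kappa_cohen_u(tab, n):
    """New estimator via kappa_CU = n kappa_C / (n - 1 + kappa_C)."""
    k = kappa_cohen(tab)
    return n * k / (n - 1 + k)

# hypothetical population probabilities p_ij (K = 2)
p = np.array([[0.40, 0.10],
              [0.05, 0.45]])
kappa_pop = kappa_cohen(p)                   # (0.85 - 0.50)/(1 - 0.50) = 0.70

n, N = 20, 10_000                            # small n makes the bias visible
mean_c = mean_cu = 0.0
for draw in rng.multinomial(n, p.ravel(), size=N):
    tab = draw.reshape(2, 2) / n
    mean_c += kappa_cohen(tab) / N
    mean_cu += kappa_cohen_u(tab, n) / N

print(kappa_pop, mean_c, mean_cu)
```

With n = 20 the mean of the classic estimator should fall slightly below the population value 0.70, while the mean of the corrected estimator should sit close to it.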
The same tables and previous simulations allow us to obtain the corresponding results of the other two pairs of estimators (see the rest of Table 4). In the case of Scott's pi coefficient, it is also observed that κS = \(\overline{\hat{\kappa }}_{SU}\) ≥ \(\overline{\hat{\kappa }}_{S}\), except in four cases in which κS > \(\overline{\hat{\kappa }}_{SU}\) ≥ \(\overline{\hat{\kappa }}_{S}\), so that \(\hat{\kappa }_{SU}\) is also generally unbiased; additionally \(\overline{\hat{\kappa }}_{SU}\) = \(\overline{\hat{\kappa }}_{S}\) only for n = 100. The conclusions are a little different in the case of Krippendorf's alpha coefficient; in general it still occurs that κK = \(\overline{\hat{\kappa }}_{KU}\) ≥ \(\overline{\hat{\kappa }}_{K}\), except in five cases in which κK < \(\overline{\hat{\kappa }}_{KU}\) or κK > \(\overline{\hat{\kappa }}_{KU}\), in such a way that \(\hat{\kappa }_{KU}\) may also underestimate κK; now \(\overline{\hat{\kappa }}_{KU}\) = \(\overline{\hat{\kappa }}_{K}\) on some occasions when n ≥ 50. As can be seen, the three previous pairs of estimators are either unbiased or they underestimate the value of the population parameter. In the case of Gwet's AC1 coefficient, the opposite happens. In general κG = \(\overline{\hat{\kappa }}_{GU}\) ≤ \(\overline{\hat{\kappa }}_{G}\), except in four cases in which κG < \(\overline{\hat{\kappa }}_{GU}\) or κG > \(\overline{\hat{\kappa }}_{GU}\), so that both estimators are either unbiased or they overestimate the value of the population parameter. Now the equality \(\overline{\hat{\kappa }}_{GU}\) = \(\overline{\hat{\kappa }}_{G}\) generally happens when K > 2 and n ≥ 50.
The general conclusion is that the estimators \(\hat{\kappa }_{XU}\) are generally unbiased and, when they are biased, their bias is lower than that of the estimators \(\hat{\kappa }_{X}\). When there is bias, it is positive in the case of the Gwet coefficient, and is negative in the other three cases.
Let us now consider the case of variance. The classic estimator \(\hat{\kappa }_{C}\) has an unknown variance \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) which can be estimated in a quite precise way through the sample variance \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) of the values \(\hat{\kappa }_{Ch}\) of the 10,000 simulations. Moreover, each simulation provides an estimator \(\hat{V}_{h} \left( {\hat{\kappa }_{C} } \right)\) of \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) obtained through the formula of Fleiss et al. (1969); the average value \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) of these 10,000 estimators, compared to \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\), allows us to check the bias of this estimator of the variance. The same reasoning is used in the case of the estimator \(\hat{\kappa }_{CU}\), although now \(\hat{V}_{h} \left( {\hat{\kappa }_{CU} } \right)\) is obtained through expression (5). The results are in Table 5. It can be seen that \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\) ≈ \({\hat{\text{V}}}_{E} \left( {\hat{\kappa }_{C} } \right)\) for n ≥ 20, being in general \(V_{E} \left( {\hat{\kappa }_{CU} } \right)\) > ( <) \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) when κC = 0.4 (0.8). It is also observed that the classic variance \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) usually underestimates (overestimates) \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) when κC = 0.4 (0.8), the differences being small when n ≥ 50. However, the new variance \(\overline{\hat{V}}\left( {\hat{\kappa }_{CU} } \right)\) almost always underestimates \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\), the differences being small when n ≥ 50, but somewhat higher than in the previous case.
In general, \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) is closer to \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) than \(\overline{\hat{V}}\left( {\hat{\kappa }_{CU} } \right)\) is to \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\).
6 Assessment of the difference between each pair of estimators
The objective of this section is to assess the difference ΔXU = ∣\(\hat{\kappa }_{XU}\) − \(\hat{\kappa }_{X}\)∣, where \(\hat{\kappa }_{X}\) is any of the traditional estimators. In general, these differences are only appreciable with small samples, so it is of interest to determine from what value of n onwards it is practically indifferent whether one calculates \(\hat{\kappa }_{XU}\) or \(\hat{\kappa }_{X}\).
For \(\hat{\kappa }_{CU}\), in which \(\hat{\kappa }_{CU}\) ≥ \(\hat{\kappa }_{C}\), through expression (4), ΔCU = \(\hat{\kappa }_{C}\)(1 − \(\hat{\kappa }_{C}\))/{(n − 1) + \(\hat{\kappa }_{C}\)}. Its maximum value in \(\hat{\kappa }_{C}\) ≥ 0 is reached at \(\hat{\kappa }_{C}\) = (n − 1)0.5/{n0.5 + (n − 1)0.5} and is {n0.5 + (n − 1)0.5}−2. Therefore, ΔCU < 0.01 (or 0.02) when n > 50 (or 17). The conclusion is also valid for ΔHU and ΔLU = ∣\(\hat{\rho }_{LU} - \hat{\rho }_{L}\)∣, since \(\hat{\kappa }_{HU}\) and \(\hat{\rho }_{LU}\) have the same form as \(\hat{\kappa }_{CU}\).
For \(\hat{\kappa }_{SU}\), in which \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\), ΔSU = (1 − \(\hat{\kappa }_{S}^{2}\))/{(2n − 1) + \(\hat{\kappa }_{S}\)} through expression (12). Its maximum value in \(\hat{\kappa }_{S}\) ≥ 0 is reached at \(\hat{\kappa }_{S}\) = 0 and is 1/(2n − 1). Therefore, ΔSU < 0.01 (or 0.02) when n > 100 (or 33). The conclusion is also valid for ΔF2U, since \(\hat{\kappa }_{F2U}\) has the same form as \(\hat{\kappa }_{SU}\). The case of \(\hat{\kappa }_{KU}\) for R = 2 − last expression of Eq. (15) − provides a maximum for ΔKU of 1/2n and leads to the same conclusion as above. The conclusion is also maintained for \(\hat{\kappa }_{KU}\) in R > 2 and \(\hat{\kappa }_{K2U}\), since they have the same form as \(\hat{\kappa }_{KU}\) for R = 2.
The case of \(\hat{\kappa }_{FU}\), in which \(\hat{\kappa }_{FU}\) ≥ \(\hat{\kappa }_{F}\), is somewhat more complex. Through expression (33), ΔFU = (1 − \(\hat{\kappa }_{F}\)){R − (R − 1)(1 − \(\hat{\kappa }_{F}\))}/{Rn − (R − 1)(1 − \(\hat{\kappa }_{F}\))}. Its maximum value in \(\hat{\kappa }_{F}\) ≥ 0 is reached at \(\hat{\kappa }_{F}\) = {(R − 1)(n − 1)0.5 − n0.5}/[(R − 1){n0.5 + (n − 1)0.5}] and is {R/(R − 1)} × {n0.5 + (n − 1)0.5}−2. Note that for R = 2 this value is twice that obtained for \(\hat{\kappa }_{CU}\). Therefore, if we require that ΔFU < 0.01 (or 0.02), the value of n depends on the value of R. For example: n > 100 (or 33) for R = 2, n > 75 (or 25) for R = 3, n > 63 (or 21) for R = 5, and n > 56 (or 19) for R = 10. Moreover, ΔFU is a decreasing function in R, taking its extreme values \(\hat{\kappa }_{F}\)(1 − \(\hat{\kappa }_{F}\))/{(n − 1) + \(\hat{\kappa }_{F}\)} at R = ∞, and (1 − \(\hat{\kappa }_{F}^{2}\))/{(2n − 1) + \(\hat{\kappa }_{F}\)} at R = 2. As those expressions have the same form as ΔCU and ΔSU respectively, the precise minimum values of n for this case lie between the pairs of values indicated for those two cases. This is compatible with the numerical results above.
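As a numerical check of the maximiser and maximum just stated, a brute-force grid search over \(\hat{\kappa }_{F}\) can be compared with the closed forms; the values of n and R below are arbitrary:

```python
import math

def delta_FU(k, n, R):
    """Difference Delta_FU between corrected and classic Fleiss kappa."""
    return (1 - k) * (R - (R - 1) * (1 - k)) / (R * n - (R - 1) * (1 - k))

def argmax_closed_form(n, R):
    """Closed-form maximising kappa_F and maximum of Delta_FU."""
    s, t = math.sqrt(n), math.sqrt(n - 1)
    k_star = ((R - 1) * t - s) / ((R - 1) * (s + t))
    d_max = (R / (R - 1)) / (s + t) ** 2
    return k_star, d_max

n, R = 30, 3
k_star, d_max = argmax_closed_form(n, R)

# brute-force check over a fine grid of kappa_F in [0, 1]
d_grid = max(delta_FU(i / 100_000, n, R) for i in range(100_001))
print(k_star, d_max, d_grid)   # d_grid should match d_max to grid accuracy
```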
The case of \(\hat{\kappa }_{GU}\), in which \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\), is much more complex since its values ΔGU also depend on \(\hat{I}_{e}\) because of expression (41). In the simplest situation (the unweighted case), it can be demonstrated that ΔGU ≤ {R/(R − 1)} × {m0.5 + (m − 1)0.5}−2, with m = (n − 1)(K − 1), an expression that depends on n, R and K; the bound is also valid for the weighted case, although it is then conservative. Therefore, if R = 2 and we require that ΔGU < 0.01 (or 0.02), the value of n depends on the value of K. For example: n > 101 (or 34) for K = 2, n > 51 (or 17) for K = 3, and n > 26 (or 9) for K = 5. The conclusion is also valid for ΔG2U, since \(\hat{\kappa }_{G2U}\) has the same form as \(\hat{\kappa }_{GU}\).
The previous formulas provide values which are compatible with the results of Tables 1, 2, 3 and 4. Excluding the Gwet estimators and adopting the criterion that we want to guarantee that ΔXU < 0.02 (0.01), the overall conclusion is that we should use the new estimators at least when n ≤ 17 (50) in the case of \(\hat{\kappa }_{CU}\) and \(\hat{\kappa }_{HU}\), or when n ≤ 33 (100) in the rest of the cases.
7 Conclusions
There are different types of kappa coefficients which measure the experimental degree of agreement between R raters. In this article, we have focused on Cohen's kappa (Cohen 1960, 1968), Scott's pi (Scott 1955), Gwet's AC1/2 (Gwet 2008) and Krippendorf's alpha (Krippendorf 1970, 2004) coefficients, whether weighted or not, for R = 2, and on their pairwise-type extensions, Hubert's kappa (Hubert 1977; Conger 1980), Fleiss's kappa (Fleiss 1971), Gwet's AC1/2 and Krippendorf's alpha coefficients, for R > 2. In this last case (R > 2), the four measures of agreement use the pairwise method to determine the observed index of agreements Io, but only Hubert's kappa also uses the pairwise method to determine the expected index of agreements Ie. We have called the measures obtained in this last way two-pairwise measures. We have also defined the other three coefficients (Fleiss's kappa, Gwet's AC1/2 and Krippendorf's alpha) from the two-pairwise point of view, thus obtaining the two-pairwise versions of these three coefficients (the two-pairwise Fleiss's kappa, etc.). This is why eleven agreement coefficients have been defined in total.
The article demonstrates that all of the traditional estimators of the eleven coefficients are based on biased estimators of Ie. The alternative is to use the eleven new proposed coefficients, which are based on unbiased estimators of Ie. In all cases, the traditional estimators are smaller than or equal to the new ones, except for the case of Gwet, where it is the other way around. The simulations carried out for the case of R = 2 show that the classic estimators \(\hat{\kappa }_{X}\) usually underestimate κX (or overestimate, in the case of X = G), while the new estimators \(\hat{\kappa }_{XU}\) are usually approximately unbiased. Additionally, it is verified that the new estimators \(\hat{\kappa }_{XU}\) may be unnecessary when the sample size n is sufficiently large (e.g. n > 30). The article also provides the variances of the new estimators as a function of the variances of the classic estimators, except in the case of the Gwet estimators.
One question of interest is the relation between the coefficients and estimators of Hubert's kappa (Hubert 1977; Conger 1980), the CCC (Lin 1989, 2000), and the ICC (Shrout and Fleiss 1979; Carrasco and Jover 2003), when in the first case quadratic weights are used. In the article it has been justified that: (1) κH = ρL = ρI2, with respect to the coefficients; (2) \(\hat{\kappa }_{H}\) = \(\hat{\rho }_{L}\), with respect to classical estimators based on biased estimators of the components of the coefficients; and (3) \(\hat{\kappa }_{HU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\), with respect to classical (\(\hat{\rho }_{LU}\) and \(\hat{\rho }_{I2}\)) or new (\(\hat{\kappa }_{HU}\)) estimators based on unbiased estimators of all components of the coefficients. These statements are true for R ≥ 2, so that for R = 2 it is obtained that: κC = ρL = ρI2, \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\), and \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\).
Finally, the entire article has been developed for the general case in which the measures are defined based on any wij weights, thus avoiding a repetition of expressions and demonstrations. Nevertheless, the non-weighted case (wij = δij) is very common. To make the text more reader-friendly, “Appendix 4” includes the eleven non-weighted coefficients mentioned in this article.
References
Barnhart HX, Haber M, Song J (2002) Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics 58:1020–1027. https://doi.org/10.1111/j.0006-341X.2002.01020.x
Carrasco JL, Jover LL (2003) Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59:849–858
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20:37–46
Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213–220
Conger AJ (1980) Integration and generalization of kappas for multiple raters. Psychol Bull 88:322–328. https://doi.org/10.1037/0033-2909.88.2.322
Davies M, Fleiss JL (1982) Measuring agreement for multinomial data. Biometrics 38:1047–1051
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76:378–382
Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measur 33(3):613–619
Fleiss JL, Cohen J, Everitt BS (1969) Large sample standard errors of kappa and weighted kappa. Psychol Bull 72:323–327. https://doi.org/10.1037/h0028106
Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd edn. Wiley, New York
Gwet KL (2008) Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol 61(1):29–68
Gwet KL (2021a) Large-sample variance of Fleiss generalized kappa. Educ Psychol Measur 81(4):781–790. https://doi.org/10.1177/0013164420973080
Gwet KL (2021b) Handbook of inter-rater reliability. Volume 1: analysis of categorical ratings, 5th edn. Gaithersburg, MD, USA
Hubert L (1977) Kappa revisited. Psychol Bull 84(2):289–297
Janson S, Olsson U (2001) A measure of agreement for interval or nominal multivariate observations. Educ Psychol Measur 61(2):277–289
King TS, Chinchilli VM (2001) A generalized concordance correlation coefficient for continuous and categorical data. Statist Med 20(14):2131–2147. https://doi.org/10.1002/sim.845
Krippendorff K (1970) Estimating the reliability, systematic error, and random error of interval data. Educ Psychol Measur 30:61–70
Krippendorff K (2004) Measuring the reliability of qualitative text analysis data. Qual Quant 38:787–800
Landis JR, Koch GG (1975a) A review of statistical methods in the analysis of data arising from observer reliability studies (Part I). Stat Neerl 29:101–123
Landis JR, Koch GG (1975b) A review of statistical methods in the analysis of data arising from observer reliability studies (Part II). Stat Neerl 29:151–161
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Lin LI-K (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255–268
Lin LI-K (2000) A note on the concordance correlation coefficient. Letter to the Editor (Corrections). Biometrics 56:324–325
Martín Andrés A, Álvarez Hernández M (2020) Hubert’s multi-rater kappa revisited. Br J Math Stat Psychol 73(1):1–22
Miettinen O, Nurminen M (1985) Comparative analysis of two rates. Stat Med 4:213–226. https://doi.org/10.1002/sim.4780040211
Schuster C, Smith DA (2005) Dispersion-weighted kappa: an integrative framework for metric and nominal scale agreement coefficients. Psychometrika 70(1):135–146
Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19:321–325
Shrout PE, Fleiss JL (1979) Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 86:420–428
Warrens MJ (2010) Inequalities between multi-rater kappas. Adv Data Anal Classif 4:271–286
Acknowledgements
This research was supported by the Ministry of Science and Innovation (Spain), Grant PID2021-126095NB-I00 funded by MCIN/AEI/https://doi.org/10.13039/501100011033 and by “ERDF A way of making Europe”.
Funding
Funding for open access publishing: Universidad de Granada/CBUA. The authors are grateful to the reviewers for their invaluable comments.
Appendices
Appendix 1: Average values of some functions of parameters of the multinomial distribution and simplification of some expressions
In a multinomial distribution M{n; pi}, it occurs that E(\(\hat{p}_{i}\)) = pi, V(\(\hat{p}_{i}\)) = E(\(\hat{p}_{i}^{2}\)) − E2(\(\hat{p}_{i}\)) = pi(1 − pi)/n and Cov(\(\hat{p}_{i}\),\(\hat{p}_{j}\)) = E(\(\hat{p}_{i} \hat{p}_{j}\)) − E(\(\hat{p}_{i}\)) × E(\(\hat{p}_{j}\)) = − pipj/n (if i ≠ j). Therefore
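Equivalently, E(\(\hat{p}_{i} \hat{p}_{j}\)) = {(n − 1)pipj + δijpi}/n. This moment identity can be verified by Monte Carlo; the probability vector below is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])       # hypothetical multinomial probabilities
n, N = 10, 200_000                  # sample size and Monte Carlo replicates

p_hat = rng.multinomial(n, p, size=N) / n                    # N draws of p_hat
emp = (p_hat[:, :, None] * p_hat[:, None, :]).mean(axis=0)   # E(p_i^ * p_j^)

# theoretical matrix {(n-1) p_i p_j + delta_ij p_i} / n
theory = ((n - 1) * np.outer(p, p) + np.diag(p)) / n
print(np.abs(emp - theory).max())   # should be at Monte Carlo noise level
```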
In the case of Sect. 2, applying the previous point to the distribution M{n; pij} it is deduced that E(\(\hat{p}_{i \cdot } \hat{p}_{ \cdot j}\)) = \(E\left[ {\left( {\sum\nolimits_{h} {\hat{p}_{ih} } } \right)\left( {\sum\nolimits_{t} {\hat{p}_{tj} } } \right)} \right]\) = \(\sum\nolimits_{h} {\sum\nolimits_{t} {E\left( {\hat{p}_{ih} \hat{p}_{tj} } \right)} }\) = \(\sum\nolimits_{h} {\sum\nolimits_{t} {{{\left\{ {\left( {n - 1} \right)p_{ih} p_{tj} + \delta_{ti} \delta_{hj} p_{ij} } \right\}} \mathord{\left/ {\vphantom {{\left\{ {\left( {n - 1} \right)p_{ih} p_{tj} + \delta_{ti} \delta_{hj} p_{ij} } \right\}} n}} \right. \kern-0pt} n}} }\), where the last equality is due to expression (45), and h, t = 1, 2, …, K. Operating it is obtained that E(\(\hat{p}_{i \cdot } \hat{p}_{ \cdot j}\)) = {(n − 1)pi·p·j + pij}/n, as in expression (2). In the same way, E(\(\hat{p}_{i \cdot } \hat{p}_{j \cdot }\)) = \(\sum\nolimits_{h} {\sum\nolimits_{t} {E\left( {\hat{p}_{ih} \hat{p}_{jt} } \right)} }\) = \(\sum\nolimits_{h} {\sum\nolimits_{t} {{{\left\{ {\left( {n - 1} \right)p_{ih} p_{jt} + \delta_{ij} \delta_{ht} p_{ih} } \right\}} \mathord{\left/ {\vphantom {{\left\{ {\left( {n - 1} \right)p_{ih} p_{jt} + \delta_{ij} \delta_{ht} p_{ih} } \right\}} n}} \right. \kern-0pt} n}} }\) = {(n − 1)pi·pj· + δijpi·}/n so that,
The result for \(\hat{p}_{ \cdot i} \hat{p}_{ \cdot j}\) is obtained in a similar way. As \(\hat{\pi }_{i} \hat{\pi }_{j} = {{\left( {\hat{p}_{i \cdot } \hat{p}_{j \cdot } + \hat{p}_{ \cdot i} \hat{p}_{ \cdot j} + \hat{p}_{i \cdot } \hat{p}_{ \cdot j} + \hat{p}_{ \cdot i} \hat{p}_{j \cdot } } \right)} \mathord{\left/ {\vphantom {{\left( {\hat{p}_{i \cdot } \hat{p}_{j \cdot } + \hat{p}_{ \cdot i} \hat{p}_{ \cdot j} + \hat{p}_{i \cdot } \hat{p}_{ \cdot j} + \hat{p}_{ \cdot i} \hat{p}_{j \cdot } } \right)} 4}} \right. \kern-0pt} 4}\) because of expression (9), then, applying the previous equalities, expression (10) is obtained. Finally, regarding the end of Sect. 2.5, through expression (18) it is deduced that \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to n \(\hat{I}_{e}\) − W(1 − \(\sum\nolimits_{i} {\hat{p}_{ii} }\))/{2(K − 1)} − (n − 1)\(\hat{I}_{e}\) = \(\hat{I}_{e}\) − W(1 − \(\sum\nolimits_{i} {\hat{p}_{ii} }\))/{2(K − 1)} which, through expression (17), is also proportional to 1 + \(\sum\nolimits_{i} {\hat{p}_{ii} }\) − 2 \(\sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\) = \(\sum\nolimits_{i} {\left\{ {\hat{\pi }_{i} + \hat{p}_{ii} - 2\hat{\pi }_{i}^{2} } \right\}}\). Taking into account the value of \(\hat{\pi }_{i}\) in expression (9) and operating, it is deduced that each term i of the previous expression is also proportional to Si(1 − Si) + \(\hat{p}_{ii} \left( {1 - \hat{p}_{ii} } \right)\) + 2 \(\hat{p}_{ii}\)(1 − Si) ≥ 0, where Si = \(\hat{p}_{i \cdot } + \hat{p}_{ \cdot i} - \hat{p}_{ii}\) verifies 0 ≤ Si ≤ 1, since it is the probability that at least one of the two raters chooses category i. The conclusion is always that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0.
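The last step can be checked numerically: for any joint table, 2(\(\hat{\pi }_{i}\) + \(\hat{p}_{ii}\) − 2\(\hat{\pi }_{i}^{2}\)) equals Si(1 − Si) + \(\hat{p}_{ii}\)(1 − \(\hat{p}_{ii}\)) + 2\(\hat{p}_{ii}\)(1 − Si), with Si ≤ 1 because Si is the probability of a union of events. A sketch over random tables:

```python
import numpy as np

rng = np.random.default_rng(2)

for _ in range(1000):
    # random joint table p_hat for R = 2 raters, K = 3 categories
    p = rng.dirichlet(np.ones(9)).reshape(3, 3)
    for i in range(3):
        a, b, c = p[i].sum(), p[:, i].sum(), p[i, i]   # p_i., p_.i, p_ii
        pi = (a + b) / 2                               # expression (9)
        S = a + b - c                                  # P(row i or column i)
        lhs = 2 * (pi + c - 2 * pi ** 2)
        rhs = S * (1 - S) + c * (1 - c) + 2 * c * (1 - S)
        assert abs(lhs - rhs) < 1e-12 and 0 <= S <= 1 and rhs >= -1e-12
print("identity and nonnegativity hold on all random tables")
```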
In the case of Sect. 3, expression (46) adopts the form,
Let Io = ΣrΣr'≠rΣiΣjwijpir,jr'/{R(R − 1)} = ΣiΣjwijΣrΣr'≠rpir,jr'/{R(R − 1)}, defined in Sect. 3.1, be the value we are trying to estimate. For a given subject s, the possible pairs of replies (i, j), with i ≠ j, are RisRjs, and the possible pairs of replies (i, i) are Ris(Ris − 1), since the two raters must be different. Summing over s and dividing by n we obtain the estimators ΣrΣr'≠r \(\hat{p}_{ir,jr^{\prime}}\) and ΣrΣr'≠r \(\hat{p}_{ir,ir^{\prime}}\) of ΣrΣr'≠rpir,jr' and ΣrΣr'≠rpir,ir' respectively. Therefore, the estimator \(\hat{I}_{o}\) of the value Io above will verify that nR(R − 1)\(\hat{I}_{o}\) = ΣiΣj≠iwijΣsRisRjs + ΣiwiiΣsRis(Ris − 1) = ΣiΣjwijΣsRisRjs − nR, since ΣiΣswiiRis = nR as wii = 1. This leads to the second expression of Eq. (24).
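In the non-weighted case this reduces to nR(R − 1)\(\hat{I}_{o}\) = ΣiΣs\(R_{is}^{2}\) − nR, which can be coded directly from the n × K matrix of counts Ris; the data below are hypothetical:

```python
import numpy as np

def observed_index(Ris, w=None):
    """Pairwise observed index of agreements I_o-hat from the n x K matrix
    Ris of category counts per subject; w is the K x K weight matrix."""
    Ris = np.asarray(Ris, dtype=float)
    n, K = Ris.shape
    R = int(Ris[0].sum())                  # raters per subject (constant)
    if w is None:
        w = np.eye(K)                      # non-weighted case w_ij = delta_ij
    # n R (R-1) I_o = sum_ij w_ij sum_s Ris Rjs - n R   (since w_ii = 1)
    total = np.einsum('ij,si,sj->', w, Ris, Ris)
    return (total - n * R) / (n * R * (R - 1))

# hypothetical data: n = 4 subjects, K = 3 categories, R = 3 raters
Ris = [[3, 0, 0],     # full agreement
       [2, 1, 0],
       [1, 1, 1],     # full disagreement
       [0, 0, 3]]     # full agreement
print(observed_index(Ris))   # (9+5+3+9 - 12)/24 = 7/12
```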
The value of Ie of Sect. 3.2 is given by Ie = ΣrΣr'≠rIe(r, r')/{R(R − 1)} = ΣrΣr'≠rΣiΣjwijpirpjr'/{R(R − 1)} = ΣiΣjwijΣrΣr'≠rpirpjr'/{R(R − 1)} = ΣiΣjwij(pi+pj+ − Σrpirpjr)/{R(R − 1)}, since ΣrΣr'≠rpirpjr' = ΣrΣr'pirpjr' − Σrpirpjr = ΣrpirΣr'pjr' − Σrpirpjr = pi+pj+ − Σrpirpjr. This leads to the second expression of Eq. (25).
Regarding what is highlighted in the first paragraph of Sect. 3.3, \(R^{2} E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right)\) = \(E\left\{ {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime}} {\hat{p}_{ir} \hat{p}_{jr^{\prime}} } } } \right\}\) = \(E\left\{ {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir} \hat{p}_{jr^{\prime}} } + \sum\nolimits_{r} {\hat{p}_{ir} \hat{p}_{jr} } } } \right\}\). Through expressions (46) and (2), written in the notation of Sect. 3, \(nR^{2} E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right)\) = (n − 1)\(\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {p_{ir} p_{jr^{\prime}} } }\) + (n − 1)\(\sum\nolimits_{r} {p_{ir} p_{jr} }\) + \(\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {p_{ir,jr^{\prime}} } }\) + \(\delta_{ij} \sum\nolimits_{r} {p_{ir} }\), where the sum of the first two terms is (n − 1)\(\sum\nolimits_{r} {\sum\nolimits_{r^{\prime}} {p_{ir} p_{jr^{\prime}} } }\) = (n − 1)pi+pj+ = (n − 1)R2πiπj. Therefore, \(\hat{\pi }_{i} \hat{\pi }_{j}\) is not an unbiased estimator of πiπj since,
As \(E\left( {\hat{I}_{e} } \right)\) = \(\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right)} }\) = n−1[(n − 1)Ie + R−2{ΣiΣjwijΣrΣr'≠rpir,jr' + ΣiΣjwijδijΣrpir}] = n−1[(n − 1)Ie + R−2{R(R − 1)Io + R}], from here we deduce the first expression of (33).
Regarding what is highlighted in the second paragraph of Sect. 3.3, through expression (8) 4Ie(r, r') = ΣiΣjwij(pir + pir')(pjr + pjr') = ΣiΣjwij(pirpjr + pir'pjr' + pirpjr' + pir' pjr) and, through expression (23), 4R(R − 1)Ie = ΣiΣjwij[ΣrΣr'≠rpirpjr + ΣrΣr'≠rpir'pjr' + ΣrΣr'≠rpirpjr' + ΣrΣr'≠rpir'pjr]. As ΣrΣr'≠rpirpjr + ΣrΣr'≠rpir'pjr' = 2(R − 1)Σrpirpjr and ΣrΣr'≠rpirpjr' + ΣrΣr'≠rpir'pjr = 2pi+pj+ − 2Σrpirpjr, then expression (35) is deduced.
Regarding the first paragraph of Sect. 3.5, expression (47) for i = j is,
Therefore, the unbiased estimator of \(\pi_{i}^{2}\) is \(\widehat{{\pi_{i}^{2} }}\) = (n − 1)−1[n \(\hat{\pi }_{i}^{2}\) − \({{\left\{ {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir,ir^{\prime}} } } + \sum\nolimits_{r} {\hat{p}_{ir} } } \right\}} \mathord{\left/ {\vphantom {{\left\{ {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir,ir^{\prime}} } } + \sum\nolimits_{r} {\hat{p}_{ir} } } \right\}} {R^{2} }}} \right. \kern-0pt} {R^{2} }}\)] and that of \(\sum\nolimits_{i} {\pi_{i}^{2} }\) will be \(\sum\nolimits_{i} {\widehat{{\pi_{i}^{2} }}}\) = (n − 1)−1[n \(\sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\) − \({{\left\{ {\sum\nolimits_{i} {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir,ir^{\prime}} } } } + \sum\nolimits_{i} {\sum\nolimits_{r} {\hat{p}_{ir} } } } \right\}} \mathord{\left/ {\vphantom {{\left\{ {\sum\nolimits_{i} {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir,ir^{\prime}} } } } + \sum\nolimits_{i} {\sum\nolimits_{r} {\hat{p}_{ir} } } } \right\}} {R^{2} }}} \right. \kern-0pt} {R^{2} }}\)]. In this last expression, \(\sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\) = R−2{1 − K(K − 1)\(\hat{I}_{e}\)/W} through expression (39), \(\sum\nolimits_{i} {\sum\nolimits_{r} {\hat{p}_{ir} } }\) = R since \(\sum\nolimits_{i} {\hat{p}_{ir} }\) = 1, and \(\sum\nolimits_{i} {\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{p}_{ir,ir^{\prime}} } } }\) = R(R − 1) \(\hat{I}_{oN}\), where \(\hat{I}_{oN}\) is obtained from the second expression of Eq. (22) applied to the non-weighted case of ωij = δij. Substituting all of these values in W(1 − \(\sum\nolimits_{i} {\widehat{{\pi_{i}^{2} }}}\))/{K(K − 1)} we obtain the value of \(\hat{I}_{eU}\) of expression (40). 
Regarding the statement that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0 one must take into account that \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(\hat{I}_{e}\) − A = \(\hat{I}_{e}\) − W(R − 1)(1 − \(\hat{I}_{oN}\))/{RK(K − 1)}; substituting in this expression the estimators \(\hat{I}_{e}\) and \(\hat{I}_{oN}\) with their values from the last expressions of Eq. (39) and Eq. (40) respectively, it is obtained that \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(\sum\nolimits_{i} {\sum\nolimits_{s} {R_{is}^{2} } }\) − \({{\sum\nolimits_{i} {R_{i + }^{2} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{i} {R_{i + }^{2} } } n}} \right. \kern-0pt} n}\) = ΣiΣs(Ris − \(\overline{R}_{i}\))2 ≥ 0, where \(\overline{R}_{i}\) = ΣsRis/n.
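The final sum-of-squares identity, ΣiΣs\(R_{is}^{2}\) − Σi\(R_{i + }^{2}\)/n = ΣiΣs(Ris − \(\overline{R}_{i}\))2, is easy to confirm numerically; a sketch with hypothetical counts:

```python
import numpy as np

rng = np.random.default_rng(3)
Ris = rng.integers(0, 5, size=(12, 4)).astype(float)   # hypothetical counts

lhs = (Ris ** 2).sum() - (Ris.sum(axis=0) ** 2).sum() / Ris.shape[0]
Rbar = Ris.mean(axis=0)                                # R_i bar = R_i+ / n
rhs = ((Ris - Rbar) ** 2).sum()
print(lhs, rhs)    # equal up to floating error; both nonnegative
```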
Regarding what is highlighted in the second paragraph of Sect. 3.5, through expression (16) {K(K − 1)/W}Ie(r, r') = 1 − Σi(pir + pir')2/4 = 1 − Σi(\(p_{ir}^{2}\) + \(p_{ir^{\prime}}^{2}\) + 2pirpir')/4. But through expression (23), {K(K − 1)/W}Ie = 1 − Σi[2(R − 1)Σr \(p_{ir}^{2}\) + 2\(p_{i + }^{2}\) − 2Σr \(p_{ir}^{2}\)]/{4R(R − 1)}; this leads to expression (42). Finally, to demonstrate that in the two-pairwise case it also occurs that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0, one must take into account that through expression (44) \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(\hat{I}_{e}\) − XN = \(\hat{I}_{e}\) − W(1 − \(\hat{I}_{oN}\))/{2K(K − 1)}. Substituting in this expression the estimators \(\hat{I}_{e}\) and \(\hat{I}_{oN}\) with their values from the last expressions of Eqs. (43) and (40) respectively, it is obtained that \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to nR(R − 2) + ΣiΣs \(R_{is}^{2}\) − Σi \(n_{i + }^{2}\)/n − (R − 2)ΣiΣr \(n_{ir}^{2}\)/2 = ΣiΣs(Ris − \(\overline{R}_{i}\))2 + (R − 2)ΣiΣrnir(n − nir)/n ≥ 0.
As stated previously, all of the above is valid if there is only one multinomial sample. Let us suppose that R = 2, that the rater in the rows is a standard one and that the frequencies Oij are obtained from K multinomial distributions {Oi·; p1, p2, …, pK}, with Σpi = 1. Now \(\hat{I}_{e} = {{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} O_{i \cdot } \hat{p}_{j} } } } \mathord{\left/ {\vphantom {{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} O_{i \cdot } \hat{p}_{j} } } } n}} \right. \kern-0pt} n} = {{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} O_{i \cdot } O_{ \cdot j} } } } \mathord{\left/ {\vphantom {{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} O_{i \cdot } O_{ \cdot j} } } } n}} \right. \kern-0pt} n}^{2}\) is an unbiased estimator of Ie = ΣiΣjwijOi·pj/n, since E(\(\hat{p}_{j}\)) = pj.
Appendix 2: Variances of the new estimators of kappa
From here on it is assumed that the new estimators of kappa are approximately unbiased, since they are based on unbiased estimators of Io and Ie. In the case of Sect. 2, from expression (4) it is deduced that \(\hat{\kappa }_{C}\) = (n − 1)\(\hat{\kappa }_{CU}\)/(n − \(\hat{\kappa }_{CU}\)). Therefore d \(\hat{\kappa }_{C}\)/d \(\hat{\kappa }_{CU}\) = n(n − 1)/(n − \(\hat{\kappa }_{CU}\))2, whose value in E(\(\hat{\kappa }_{CU}\)) ≈ κC is n(n − 1)/(n − κC)2 and, through the delta method, V(\(\hat{\kappa }_{C}\)) = n2(n − 1)2 V(\(\hat{\kappa }_{CU}\))/(n − κC)4. This leads to expression (5). In a similar way, from expression (12) it is deduced that \(\hat{\kappa }_{S}\) = {(2n − 1) \(\hat{\kappa }_{SU}\) − 1}/(2n − 1 − \(\hat{\kappa }_{SU}\)). Therefore, d \(\hat{\kappa }_{S}\)/d \(\hat{\kappa }_{SU}\) = 4n(n − 1)/(2n − 1 − \(\hat{\kappa }_{SU}\))2, whose value in E(\(\hat{\kappa }_{SU}\)) ≈ κS is 4n(n − 1)/(2n − 1 − κS)2 and V(\(\hat{\kappa }_{S}\)) = 16n2(n − 1)2 V(\(\hat{\kappa }_{SU}\))/(2n − 1 − κS)4. This leads to expression (13). In the case of Sect. 3.3, from the second expression of Eq. (33) it is deduced that \(\hat{\kappa }_{F}\) = {(nR − R + 1)\(\hat{\kappa }_{FU}\) − 1}/{(nR − 1) − (R − 1)\(\hat{\kappa }_{FU}\)}. Therefore d \(\hat{\kappa }_{F}\)/d \(\hat{\kappa }_{FU}\) = R2n(n − 1)/{(Rn − 1) − (R − 1)\(\hat{\kappa }_{FU}\)}2, whose value in E(\(\hat{\kappa }_{FU}\)) ≈ κF is R2n(n − 1)/{(nR − 1) − (R − 1)κF}2 and V(\(\hat{\kappa }_{F}\)) = R4n2(n − 1)2 V(\(\hat{\kappa }_{FU}\))/{(nR − 1) − (R − 1)κF}4. This leads to expression (34). The variances V(\(\hat{\kappa }_{KU}\)) and V(\(\hat{\kappa }_{F2U}\)) are obtained in a similar way.
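The derivatives used above can be double-checked by finite differences; a sketch for the Cohen case, with hypothetical values of n, κ and V(\(\hat{\kappa }_{CU}\)):

```python
def g(k_cu, n):
    """kappa_C as a function of kappa_CU: (n-1) k / (n - k)."""
    return (n - 1) * k_cu / (n - k_cu)

def dg(k_cu, n):
    """Closed-form derivative n(n-1)/(n - kappa_CU)^2."""
    return n * (n - 1) / (n - k_cu) ** 2

n, k = 25, 0.6                      # hypothetical sample size and kappa value
h = 1e-6
numeric = (g(k + h, n) - g(k - h, n)) / (2 * h)   # central difference
print(numeric, dg(k, n))                           # should agree closely

# delta method: V(kappa_C) ~ [g'(kappa)]^2 * V(kappa_CU)
v_cu = 0.004                                       # hypothetical variance
v_c = dg(k, n) ** 2 * v_cu
print(v_c)
```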
1.3 Appendix 3: Justification of the equalities \(\hat{\rho }_{LU} = \hat{\rho }_{I2}\) and \(\hat{\rho }_{L} = \hat{\rho }_{I2S}\), and their simplified formulas
Using the notation from the end of Sect. 3.2, the expression for \(\hat{\rho }_{LU}\) in (28) is equivalent to the following one, where \(\overline{x}_{ \cdot r}\) = x·r/n:
As srr´ = Σs(xsr − \(\overline{x}_{ \cdot r}\))(xsr´ − \(\overline{x}_{{ \cdot r^{\prime } }}\))/(n − 1) = {Σsxsrxsr´ − x·rx·r´/n}/(n − 1), \(s_{r}^{2}\) = Σs(xsr − \(\overline{x}_{ \cdot r}\))²/(n − 1) = (Σs\(x_{sr}^{2}\) − \(x_{ \cdot r}^{2}\)/n)/(n − 1), and \(\left( {\overline{x}_{ \cdot r} - \overline{x}_{{ \cdot r^{\prime } }} } \right)^{2}\) = (x·r − x·r´)²/n², it follows that
\(\begin{aligned} & \sum\nolimits_{r} {\sum\nolimits_{{r^{\prime } \ne r}} {s_{{rr^{\prime } }} } } = {{\left( {n\sum\nolimits_{s} {x_{s \cdot }^{2} } + \sum\nolimits_{r} {x_{ \cdot r}^{2} - n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } } \right)} \mathord{\left/ {\vphantom {{\left( {n\sum\nolimits_{s} {x_{s \cdot }^{2} } + \sum\nolimits_{r} {x_{ \cdot r}^{2} - n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } } \right)} {\left\{ {n\left( {n - 1} \right)} \right\}}}} \right. \kern-0pt} {\left\{ {n\left( {n - 1} \right)} \right\}}}, \\ & \quad \sum\nolimits_{r} {s_{r}^{2} } = {{\left( {n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } } - \sum\nolimits_{r} {x_{ \cdot r}^{2} } } \right)} \mathord{\left/ {\vphantom {{\left( {n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } } - \sum\nolimits_{r} {x_{ \cdot r}^{2} } } \right)} {\left\{ {n\left( {n - 1} \right)} \right\}}}} \right. \kern-0pt} {\left\{ {n\left( {n - 1} \right)} \right\}}},\quad {\text{and}}\\ &\quad \sum\nolimits_{r} {\sum\nolimits_{{r^{\prime } }} {\left( {\overline{x}_{ \cdot r} - \overline{x}_{{ \cdot r^{\prime } }} } \right)^{2} } } = 2{{\left( {R\sum\nolimits_{r} {x_{ \cdot r}^{2} } - x_{ \cdot \cdot }^{2} } \right)} \mathord{\left/ {\vphantom {{\left( {R\sum\nolimits_{r} {x_{ \cdot r}^{2} } - x_{ \cdot \cdot }^{2} } \right)} {n^{2} }}}\right. \kern-0pt} {n^{2} }}. \\\end{aligned}\)
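These three identities can be checked numerically on arbitrary data; the numpy sketch below (variable names are ours) evaluates both sides of each on a random n × R score matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 12, 4
x = rng.normal(size=(n, R))          # x[s, r]: score of rater r on subject s

S = np.cov(x, rowvar=False)          # R x R sample covariances, denominator n - 1
xs = x.sum(axis=1)                   # row totals x_{s.}
xr = x.sum(axis=0)                   # column totals x_{.r}
xtot = x.sum()                       # grand total x_{..}

# sum_r sum_{r' != r} s_{rr'}: off-diagonal sum of the covariance matrix
lhs1 = S.sum() - np.trace(S)
rhs1 = (n * (xs ** 2).sum() + (xr ** 2).sum()
        - n * (x ** 2).sum() - xtot ** 2) / (n * (n - 1))
assert np.isclose(lhs1, rhs1)

# sum_r s_r^2: diagonal sum (the rater variances)
lhs2 = np.trace(S)
rhs2 = (n * (x ** 2).sum() - (xr ** 2).sum()) / (n * (n - 1))
assert np.isclose(lhs2, rhs2)

# sum_r sum_{r'} (xbar_{.r} - xbar_{.r'})^2
xbar = xr / n
lhs3 = ((xbar[:, None] - xbar[None, :]) ** 2).sum()
rhs3 = 2 * (R * (xr ** 2).sum() - xtot ** 2) / n ** 2
assert np.isclose(lhs3, rhs3)
```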
Substituting into expression (49), the last expression of Eq. (29) is obtained. Similarly, expression (27) for \(\hat{\rho }_{L}\) leads to the last expression of Eq. (30).
On the other hand, the estimator \(\hat{\rho}_{I2}\) of ρI2, which is based on the unbiased estimators of its components, is the ICC(2, 1) of Shrout and Fleiss (1979):
where MSS = SSS/(n − 1), MSR = SSR/(R − 1), and MSE = SSE/{(n − 1)(R − 1)} denote the mean squares (and SSS, SSR, and SSE the sums of squares) for subjects, raters, and error (residual) in the analysis of variance, respectively. In addition, SSE = SST − SSS − SSR, with SST the total sum of squares. As
then, substituting into expression (50), expression (29) is obtained again. Therefore \(\hat{\rho }_{LU} = \hat{\rho }_{I2}\).
1.4 Appendix 4: Classic non-weighted kappa coefficients
We now provide the values needed to define any non-weighted coefficient κ = (Io − Ie)/(1 − Ie) and to calculate the value of its classic estimator \(\hat{\kappa } = \left( {\hat{I}_{o} - \hat{I}_{e} } \right)/\left( {1 - \hat{I}_{e} } \right)\). The new estimator \(\hat{\kappa }_{U}\) is obtained with the corresponding formulas from the main text.
When R = 2 all of the kappa coefficients are based on Io = Σpii and \(\hat{I}_{o} = \sum\nolimits_{i} {\hat{p}_{ii} }\) = \({{\sum\nolimits_{i} {O_{ii} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{i} {O_{ii} } } n}} \right. \kern-0pt} n}.\) The actual and estimated values of Ie in each coefficient are:
(a) κC and \(\hat{\kappa }_{C}\) (Cohen's kappa): Ie = Σipi·p·i and \(\hat{I}_{e} = \sum\nolimits_{i} {\hat{p}_{i \cdot } \hat{p}_{ \cdot i} } = \sum\nolimits_{i} {O_{i \cdot } O_{ \cdot i} } /n^{2}\).
(b) κS and \(\hat{\kappa }_{S}\) (Scott's pi): Ie = \(\sum\nolimits_{i} {\pi_{i}^{2} }\), where πi = (pi· + p·i)/2, and \(\hat{I}_{e} = \sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\), where \(\hat{\pi }_{i} = \left( {\hat{p}_{i \cdot } + \hat{p}_{ \cdot i} } \right)/2 = \left( {O_{i \cdot } + O_{ \cdot i} } \right)/(2n)\).
(c) \(\hat{\kappa }_{K}\) (Krippendorf's alpha), which estimates κS: \(\hat{I}_{e} = \sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\), with \(\hat{\pi }_{i}\) as in (b), but \(\hat{I}_{o}\) is special: \(\hat{I}_{o} = \left\{ {\left( {2n - 1} \right)\sum\nolimits_{i} {\hat{p}_{ii} } + 1} \right\}/(2n) = \left\{ {\left( {2n - 1} \right)\sum\nolimits_{i} {O_{ii} } + n} \right\}/(2n^{2})\).
(d) κG and \(\hat{\kappa }_{G}\) (Gwet's AC1): Ie = Σiπi(1 − πi)/(K − 1) and \(\hat{I}_{e} = \sum\nolimits_{i} {\hat{\pi }_{i} \left( {1 - \hat{\pi }_{i} } \right)} /\left( {K - 1} \right)\), with \(\hat{\pi }_{i}\) as in case (b). Note that \(\hat{I}_{e} = \left\{ {1 - \sum\nolimits_{i} {\hat{\pi }_{i}^{2} } } \right\}/\left( {K - 1} \right)\), where \(\sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\) is the value of \(\hat{I}_{e}\) in (b). In this case, the formula of \(\hat{\kappa }_{GU}\) has a particular expression:
$$\hat{\kappa }_{GU} = \frac{{\left( {n - 1} \right)\hat{\kappa }_{G} + Y_{N} }}{{\left( {n - 1} \right) + Y_{N} }}\quad {\text{where}}\quad Y_{N} = \frac{{1 - \hat{\kappa }_{G} }}{{2\left( {K - 1} \right)}} - \frac{{\hat{I}_{e} }}{{1 - \hat{I}_{e} }}$$
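To make the two-rater estimators (a)–(d) concrete, here is a minimal Python sketch (the function name is ours) that computes the classic \(\hat{\kappa }_{C}\), \(\hat{\kappa }_{S}\), \(\hat{\kappa }_{K}\), and \(\hat{\kappa }_{G}\) from a K × K table of joint counts Oij:

```python
import numpy as np

def two_rater_kappas(O):
    """Classic estimators (a)-(d) from a K x K table O of joint counts."""
    O = np.asarray(O, dtype=float)
    n = O.sum()
    K = O.shape[0]
    Io = np.trace(O) / n                                # common observed index
    pi = (O.sum(axis=1) + O.sum(axis=0)) / (2 * n)      # pi_i hat = (O_i. + O_.i)/(2n)

    Ie_C = (O.sum(axis=1) * O.sum(axis=0)).sum() / n ** 2   # (a) Cohen
    Ie_S = (pi ** 2).sum()                                  # (b) Scott
    Ie_G = (1 - Ie_S) / (K - 1)                             # (d) Gwet AC1
    Io_K = ((2 * n - 1) * np.trace(O) + n) / (2 * n ** 2)   # (c) special I_o

    kappa = lambda io, ie: (io - ie) / (1 - ie)
    return {"Cohen": kappa(Io, Ie_C), "Scott": kappa(Io, Ie_S),
            "Krippendorf": kappa(Io_K, Ie_S), "Gwet": kappa(Io, Ie_G)}

# Perfect agreement: every estimator equals 1
res = two_rater_kappas([[5, 0], [0, 5]])
assert all(abs(v - 1) < 1e-12 for v in res.values())
```

On a table with some disagreement, e.g. O = [[4, 1], [1, 4]], Cohen, Scott, and Gwet all give 0.6 while Krippendorf's alpha gives 0.62, illustrating how its special \(\hat{I}_{o}\) shifts the estimate for small n.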
When R ≥ 2, all of the non-weighted kappa coefficients are based on Io = ΣrΣr'≠rΣipir,ir'/{R(R − 1)} and \(\hat{I}_{o}\) = {\(\sum\nolimits_{i} {\sum\nolimits_{s} {R_{is}^{2} } }\) − nR}/{nR(R − 1)}. The actual and estimated values of Ie are:
(A) κH and \(\hat{\kappa }_{H}\) (Hubert's kappa): Ie = \(\sum\nolimits_{i} {\left\{ {p_{i + }^{2} - \sum\nolimits_{r} {p_{ir}^{2} } } \right\}}\)/{R(R − 1)} and \(\hat{I}_{e}\) = \(\sum\nolimits_{i} {\left\{ {n_{i + }^{2} - \sum\nolimits_{r} {n_{ir}^{2} } } \right\}}\)/{n²R(R − 1)}.
(B) κF and \(\hat{\kappa }_{F}\) (Fleiss's kappa): Ie = \(\sum\nolimits_{i} {p_{i + }^{2} }\)/R² and \(\hat{I}_{e}\) = \(\sum\nolimits_{i} {R_{i + }^{2} }\)/(nR)².
(C) κF2 and \(\hat{\kappa }_{F2}\) (Fleiss's kappa two-pairwise): Ie = [(R − 2)\(\sum\nolimits_{i} {\sum\nolimits_{r} {p_{ir}^{2} } }\) + \(\sum\nolimits_{i} {p_{i + }^{2} }\)]/{2R(R − 1)} and \(\hat{I}_{e}\) = [(R − 2)\(\sum\nolimits_{i} {\sum\nolimits_{r} {n_{ir}^{2} } }\) + \(\sum\nolimits_{i} {n_{i + }^{2} }\)]/{2n²R(R − 1)}.
(D) \(\hat{\kappa }_{K}\) (Krippendorf's alpha), which estimates κF: \(\hat{I}_{e} = \sum\nolimits_{i} {R_{i + }^{2} } /\left( {nR} \right)^{2}\), but \(\hat{I}_{o}\) is special: \(\hat{I}_{o}\) = {(2n − 1)T + 1}/(2n), where T = {\(\sum\nolimits_{i} {\sum\nolimits_{s} {R_{is}^{2} } }\) − nR}/{nR(R − 1)}.
(E) \(\hat{\kappa }_{K2}\) (Krippendorf's alpha two-pairwise), which estimates κF2: \(\hat{I}_{o}\) is the same as in paragraph (D) and \(\hat{I}_{e}\) = \(\left\{ {\left( {R - 2} \right)\sum\nolimits_{i} {\sum\nolimits_{r} {n_{ir}^{2} } } + \sum\nolimits_{i} {n_{i + }^{2} } } \right\}/\left\{ {2n^{2} R\left( {R - 1} \right)} \right\}\).
(F) κG and \(\hat{\kappa }_{G}\) (Gwet's AC1): Ie = \(\left( {1 - \sum\nolimits_{i} {p_{i + }^{2} } /R^{2} } \right)\)/(K − 1) and \(\hat{I}_{e}\) = \(\left\{ {1 - \sum\nolimits_{i} {R_{i + }^{2} } /\left( {nR} \right)^{2} } \right\}\)/(K − 1). It can be observed that \(\hat{\kappa }_{G}\) ≥ \(\hat{\kappa }_{F}\), since \(\hat{I}_{e}\)(Gwet) − \(\hat{I}_{e}\)(Fleiss) is proportional to \(K^{ - 1} - \sum\nolimits_{i} {\hat{\pi }_{i}^{2} } \le 0\); the first statement holds because of expressions (39) and (32) respectively; the second one because \(\sum\nolimits_{i} {\hat{\pi }_{i}^{2} }\) reaches its minimum value of 1/K when all \(\hat{\pi }_{i}\) = 1/K. In this case, the formula of \(\hat{\kappa }_{GU}\) has a particular expression:
$$\hat{\kappa }_{GU} = \frac{{\left( {n - 1} \right)\hat{\kappa }_{G} + B_{N} }}{{\left( {n - 1} \right) + B_{N} }}\quad {\text{where}}\quad B_{N} = \frac{{\left( {R - 1} \right)\left( {1 - \hat{\kappa }_{G} } \right)}}{{R\left( {K - 1} \right)}} - \frac{{\hat{I}_{e} }}{{1 - \hat{I}_{e} }}.$$
(G) κG2 and \(\hat{\kappa }_{G2}\) (Gwet's AC1 two-pairwise):
$$\begin{aligned} & I_{e} = \frac{1}{K - 1}\left[ {1 - \frac{1}{{2R\left( {R - 1} \right)}}\left\{ {\left( {R - 2} \right)\sum\limits_{i} {\sum\limits_{r} {p_{ir}^{2} + \sum\limits_{i} {p_{i + }^{2} } } } } \right\}} \right],\,{\text{and}} \\ & \quad \hat{I}_{e} = \frac{1}{K - 1}\left[ {1 - \frac{1}{{2n^{2} R\left( {R - 1} \right)}}\left\{ {\left( {R - 2} \right)\sum\limits_{i} {\sum\limits_{r} {n_{ir}^{2} + \sum\limits_{i} {n_{i + }^{2} } } } } \right\}} \right]. \\ \end{aligned}$$
In this case, the formula of \(\hat{\kappa }_{G2U}\) also has a particular expression:
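To fix ideas for the multi-rater case, the following Python sketch (the function name is ours; only cases (B) and (F) are implemented) computes \(\hat{\kappa }_{F}\) and the multi-rater \(\hat{\kappa }_{G}\) from the counts Ris, stored here as an n × K matrix whose row s gives the number of raters placing subject s in each category:

```python
import numpy as np

def multi_rater_kappas(M):
    """M[s, i] = number of the R raters placing subject s in category i."""
    M = np.asarray(M, dtype=float)
    n, K = M.shape
    R = int(round(M[0].sum()))                            # each row sums to R
    # common observed index: {sum_i sum_s R_is^2 - nR} / {nR(R - 1)}
    Io = ((M ** 2).sum() - n * R) / (n * R * (R - 1))
    Rip = M.sum(axis=0)                                   # R_{i+}
    Ie_F = (Rip ** 2).sum() / (n * R) ** 2                # (B) Fleiss
    Ie_G = (1 - Ie_F) / (K - 1)                           # (F) Gwet AC1
    kappa = lambda io, ie: (io - ie) / (1 - ie)
    return kappa(Io, Ie_F), kappa(Io, Ie_G)

# Perfect agreement among R = 3 raters on n = 4 subjects, K = 2 categories
kF, kG = multi_rater_kappas(np.array([[3, 0], [3, 0], [0, 3], [0, 3]]))
assert abs(kF - 1) < 1e-12 and abs(kG - 1) < 1e-12
assert kG >= kF - 1e-12   # the inequality noted in paragraph (F)
```

With partial agreement, e.g. rows [2, 1], [3, 0], [1, 2], [3, 0], the sketch gives \(\hat{\kappa }_{F}\) = 1/9 and \(\hat{\kappa }_{G}\) = 7/15, again with \(\hat{\kappa }_{G}\) ≥ \(\hat{\kappa }_{F}\).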
Martín Andrés, A., Álvarez Hernández, M. Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements. Adv Data Anal Classif (2024). https://doi.org/10.1007/s11634-024-00581-x
Keywords
- Agreement
- Cohen’s kappa
- Concordance and intraclass correlation coefficients
- Conger’s kappa
- Fleiss’ kappa
- Gwet's AC1/2
- Hubert’s kappa
- Krippendorf's alpha
- Pairwise multi-rater kappa
- Scott's pi