1 Introduction

It is often necessary to assess the degree of concordance or agreement among R raters that independently classify n subjects into K ≥ 2 categories (Fleiss 1971; Landis and Koch 1975a, b; Warrens 2010; Schuster and Smith 2005).

Consider first the case of only two raters (R = 2) and nominal categories. As some of the observed agreements may be due to chance, it is most common to eliminate the effect of chance by defining a kappa-type coefficient of the form κ = (Io − Ie)/(1 − Ie). In that expression Io is the observed index of agreements (the sum of the observed proportions of agreements), Ie is the expected index of agreements (the sum of the proportions of agreements that would occur if the two raters acted independently) and κ is the population value of the proposed agreement measure. Note that the previous indexes only consider the agreements obtained. When the categories are ordinal, the indexes defined are similar to the previous ones, but the disagreements obtained are also considered, with certain weights assigned to them (see Sect. 2.1); this leads to a weighted kappa coefficient. From now on, κ will allude to either indistinctly. Depending on the definition adopted for Ie, the different kappa coefficients are obtained: κS (Scott 1955), κC (Cohen 1960, 1968), and κG (Gwet 2008). The estimation of these coefficients has the general form \(\hat{\kappa } = (\hat{I}_{o} - \hat{I}_{e} )/(1 - \hat{I}_{e} ),\) where \(\hat{\kappa }\), \(\hat{I}_{o}\) and \(\hat{I}_{e}\) are the sample estimators of the previous population parameters. It can be seen that κ and \(\hat{\kappa }\) are decreasing functions of Ie and \(\hat{I}_{e}\) respectively. Additionally, Krippendorf (1970, 2004) provides an estimator \(\hat{\kappa }_{K}\) of κS that differs slightly from the more classical \(\hat{\kappa }_{S}\) because of its new definition of \(\hat{I}_{o}\).

Consider now the multi-rater case (R ≥ 2). The different coefficients κ of the case R = 2 can be generalized to multi-raters in several ways, depending on how the phrase “an agreement has occurred” is interpreted. The most common interpretation is that of Fleiss (1971) and Hubert (1977), “an agreement occurs if and only if two raters categorize an object consistently”, or pairwise definition of agreement. This is the definition used in this article. Hubert (1977) also makes the following interpretation, “an agreement occurs if and only if all raters agree on the categorization of an object”, or R-wise definition (Conger 1980). The R-wise extension κHR of κC can be seen in Conger (1980), Schuster and Smith (2005) and Martín Andrés and Álvarez Hernández (2020). The best-known pairwise extensions of the coefficients κS, κC and κG are the coefficients κF (Fleiss 1971), κH (Hubert 1977; Conger 1980) and κG (Gwet 2008) respectively. All of them are defined under the same format as in the case of R = 2. Additionally, Krippendorf (1970, 2004) provides an estimator \(\hat{\kappa }_{K}\) of κF that differs slightly from the more classical \(\hat{\kappa }_{F}\), again because of the definition of \(\hat{I}_{o}\). An overview of all of the above can be seen in Gwet's book (2021b).

However, all \(\hat{\kappa }_{X}\) expressions are based on biased estimators (X refers to any of the letters used above), since they estimate the product of two population proportions (a term that is present in Ie) through the product of their sample estimators. The first objective of this article is to correct this bias by proposing unbiased estimators \(\hat{I}_{eU}\) of Ie, so that the new estimator of κX will be \(\hat{\kappa }_{XU} = (\hat{I}_{o} - \hat{I}_{eU} )/(1 - \hat{I}_{eU} )\), as well as to determine the variance of \(\hat{\kappa }_{XU}\). This methodology is easy to apply to any other kappa coefficient studied. A second objective is to make pairwise extensions of some measures, but in a different way to the traditional pairwise extensions.

The previous description is very general, since it is necessary to specify what the “subject population” and the “rater population” are. Regarding the population of subjects, the n subjects may be: (a) a random sample of an infinite population of subjects, which is what is assumed in the rest of the sections; (b) a random sample of a finite population of subjects, in which case a finite population correction (Gwet 2021a, b) must be applied to the formulas of the variance; and (c) the only subjects of interest, in which case only \(\hat{\kappa }_{X}\) makes sense, there is no κX parameter to estimate and it makes no sense to define \(\hat{\kappa }_{XU}\).

Regarding the population of raters, the R raters may be (Shrout and Fleiss 1979): (1) different for each subject (even different in number) and extracted from an infinite population of raters; (2) the same for all of the subjects and extracted from an infinite population of raters; and (3) the same for all of the subjects and the only raters of interest, which is what is assumed in the rest of the sections. When the replies of the raters are quantitative, a traditional way of measuring the degree of agreement between them is through the intraclass correlation coefficients (ICC) ρI1, ρI2, and ρI3, which are obtained from the corresponding one-way random model, two-way random model, or two-way mixed model, respectively. In the last two cases it is assumed that there is no interaction. Nevertheless, in this context of measures of agreement, Shrout and Fleiss (1979) and Carrasco and Jover (2003) point out that in case (3) it is also necessary to include the variability between raters in the total variability, so that in cases (2) and (3) we should use ρI2. Additionally, and for case (3), Lin (1989, 2000) and Barnhart et al. (2002) propose using as a measure of agreement the concordance correlation coefficient (CCC) ρL.

As is logical, different researchers have shown interest in searching for relations between the coefficients κX, ρIi, and ρL, as well as between their estimators \(\hat{\kappa }_{X}\), \(\hat{\rho }_{Ii}\), and \(\hat{\rho }_{L}\). Landis and Koch (1977) demonstrated that \(\hat{\kappa }_{F}\) is asymptotically equivalent to ρI1 when the replies are binary. Furthermore, Barnhart et al. (2002) and Carrasco and Jover (2003) demonstrated that ρL = ρI2. Since, in the case of R = 2, Martín Andrés and Álvarez Hernández (2020) demonstrated that ρL = κC (assuming, as from now on, that the weights of the disagreements are quadratic), the satisfactory property κC = ρL = ρI2 is obtained when R = 2. The equivalences between the estimators of these parameters are more complex, since their values depend on the method of estimating their components. For example, Fleiss and Cohen (1973) demonstrated that \(\hat{\kappa }_{C}\) is asymptotically equivalent to ρI2, King and Chinchilli (2001) and Martín Andrés and Álvarez Hernández (2020) demonstrated that \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\) when direct (biased) estimators are used, and Davis and Fleiss (1982) verified that \(\hat{\kappa }_{H}\) is asymptotically equivalent to ρI2 when the replies are binary. The third objective of this article is to relate κH to ρL, as well as the estimators \(\hat{\kappa }_{CU}\) and \(\hat{\kappa }_{HU}\) with the estimators \(\hat{\rho }_{I2}\) and \(\hat{\rho }_{LU}\), the latter being based on the unbiased estimators of the components of ρI2 and ρL, respectively.

For the aforementioned reasons, in this article it is assumed that n subjects, extracted randomly from an infinite population, are scored a single time by R fixed raters (who are the only ones of interest). It is also assumed that there are no missing data, i.e. that every rater gives a reply for every subject.

2 Case of two raters

Let there be two raters (R = 2) that independently classify n subjects within K categories. Let Oij be the number of subjects that observer 1 classifies as type i (i = 1, 2, …, K) and observer 2 as type j (j = 1, 2, …, K). This gives rise to a table of absolute frequencies Oij like those in Tables 1 and 2, with observed proportions \(\hat{p}_{ij}\) = Oij/n, where ΣiΣjOij = n and ΣiΣj\(\hat{p}_{ij}\) = 1. The notation for row totals (Oi· and \(\hat{p}_{i \cdot }\)), column totals (O·j and \(\hat{p}_{ \cdot j}\)) or the grand total (O·· = n and \(\hat{p}_{ \cdot \cdot }\) = 1) is the usual one; for example \(\hat{p}_{i \cdot }\) = Σj\(\hat{p}_{ij}\). If the subjects have been chosen randomly and both raters classify all of the subjects, then the observed dataset {Oij} comes from a multinomial distribution of parameters n and {pij}, where pij is the probability that a subject will be classified in cell (i, j). Additionally, {pi·} and {p·j} will be the marginal distributions of the row and column observers respectively. Obviously, \(\hat{p}_{ij}\), \(\hat{p}_{i \cdot }\) and \(\hat{p}_{ \cdot j}\) are the maximum likelihood estimators of pij, pi· and p·j respectively. At the end of “Appendix 2”, another type of sampling is discussed.

Table 1 Diagnosis of n = 100 subjects by R = 2 raters in K = 3 categories (Fleiss et al. 2003)
Table 2 Classification of n = 8 subjects by R = 2 raters in K = 3 categories (Gwet 2021b, p 109)

2.1 Weighted and unweighted kappa and observed index of agreements

It has already been indicated that κ depends on the indexes of agreement Io (observed) and Ie (expected). To evaluate either of them it is necessary to first define the weight or degree of agreement wij that is assigned to the answer (i, j), with 0 ≤ wij ≤ 1, wii = 1, and generally wij = wji < 1 (i ≠ j). When categories are ordinal, there are many ways to assign values to wij (Schuster and Smith 2005). If we assume that categories 1, 2, …, K are ordered from lowest to highest, it is usual that wij is related to the value of (i − j). A classic definition, to which we will refer later, is the quadratic weighting wij = 1 − [(i − j)/(K − 1)]2 of Fleiss and Cohen (1973). When categories are nominal, it is traditional to assign the weights wii = 1 and wij = 0 (i ≠ j), that is, only exact agreements are considered. Historically, the different coefficients κ were defined first in the unweighted case and later extended to the weighted case. However, this article is developed for the general weighted case, since the unweighted one is a particular case of it: wij = δij, where δij is the Kronecker delta.

All coefficients κ are defined based on the same value of the observed index of agreements. Therefore, it is appropriate to indicate its definition (Io) and its estimate (\(\hat{I}_{o}\)) as a general reference for all of Sect. 2:

$$I_{o} = \sum\limits_{i} \sum\limits_{j} w_{ij} p_{ij} \quad {\text{and}}\quad \hat{I}_{o} = \sum\limits_{i} \sum\limits_{j} w_{ij} \hat{p}_{ij} = \sum\limits_{i} \sum\limits_{j} w_{ij} O_{ij} /n,$$

where \(\hat{I}_{o}\) is an unbiased estimator of Io.
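
As a minimal computational sketch (illustrative Python, not part of the original methodology; the 3 × 3 table is hypothetical, chosen to be consistent with the Table 1 example, n = 100), the quadratic weights of Fleiss and Cohen (1973) and the estimator \(\hat{I}_{o}\) can be computed as follows:

```python
# Minimal sketch: quadratic weights w_ij = 1 - [(i - j)/(K - 1)]^2 and the
# observed index of agreements I_o-hat of a K x K contingency table O.
import numpy as np

def quadratic_weights(K):
    i, j = np.indices((K, K))
    return 1.0 - ((i - j) / (K - 1)) ** 2

def observed_index(O, w):
    # I_o-hat = sum_i sum_j w_ij O_ij / n (an unbiased estimator of I_o)
    return (w * O).sum() / O.sum()

# Hypothetical 3 x 3 table, consistent with the Table 1 example (n = 100)
O = np.array([[75, 1, 4],
              [5, 4, 1],
              [0, 0, 10]])
print(observed_index(O, np.eye(3)))             # unweighted: 0.89
print(observed_index(O, quadratic_weights(3)))  # quadratic weights
```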

2.2 Cohen's kappa and the intraclass and concordance correlation coefficients

Cohen (1960, 1968) defines the classical measure of agreement

$$\kappa_{C} = (I_{o} - I_{e} )/(1 - I_{e} )\quad {\text{where}}\quad I_{e} = \sum\limits_{i} \sum\limits_{j} w_{ij} p_{i \cdot } p_{ \cdot j} ,$$
(1)

and proposes to estimate it by,

$$\hat{\kappa }_{C} = (\hat{I}_{o} - \hat{I}_{e} )/(1 - \hat{I}_{e} )\quad {\text{where}}\quad \hat{I}_{e} = \sum\limits_{i} \sum\limits_{j} w_{ij} \hat{p}_{i \cdot } \hat{p}_{ \cdot j} = \sum\limits_{i} \sum\limits_{j} w_{ij} O_{i \cdot } O_{ \cdot j} /n^{2} .$$

As indicated in “Appendix 1”, \(\hat{p}_{i \cdot } \hat{p}_{ \cdot j}\) is not an unbiased estimator of pp·j since

$$E\left( {\hat{p}_{i \cdot } \hat{p}_{ \cdot j} } \right) = \frac{{\left( {n - 1} \right)p_{i \cdot } p_{ \cdot j} + p_{ij} }}{n},$$
(2)

although it is asymptotically unbiased, as happens in the other cases that follow. Therefore \(E\left( {\hat{I}_{e} } \right) = \sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} E\left( {\hat{p}_{i \cdot } \hat{p}_{ \cdot j} } \right)} }\) = {(n − 1)Ie + Io}/n and \(\hat{I}_{e}\) is also not an unbiased estimator of Ie. From expression (2) it follows that the unbiased estimators of pi·p·j and Ie are

$$\widehat{{p_{i \cdot } p_{ \cdot j} }} = \frac{{n\hat{p}_{i \cdot } \hat{p}_{ \cdot j} - \hat{p}_{ij} }}{n - 1}\quad {\text{and}}\quad \hat{I}_{eU} = \sum\limits_{i} \sum\limits_{j} w_{ij} \widehat{{p_{i \cdot } p_{ \cdot j} }} = \frac{{n\hat{I}_{e} - \hat{I}_{o} }}{n - 1},$$
(3)

respectively. Thus, the new estimator \(\hat{\kappa }_{CU}\) of κC will be

$$\hat{\kappa }_{CU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{n\hat{\kappa }_{C} }}{{\left( {n - 1} \right) + \hat{\kappa }_{C} }},$$
(4)

and its variance, which is deduced in “Appendix 2”, is

$$V\left( {\hat{\kappa }_{CU} } \right) = \frac{{\left( {n - \kappa_{C} } \right)^{4} }}{{\left\{ {n\left( {n - 1} \right)} \right\}^{2} }}V\left( {\hat{\kappa }_{C} } \right),$$
(5)

where \(V\left( {\hat{\kappa }_{C} } \right)\) refers to the formula of Fleiss et al. (1969), which can be seen in the book by Gwet (2021b); this book also contains all of the variances that are needed in what follows. This type of correction is similar to the one used by Miettinen and Nurminen (1985) for the score statistics in 2 × 2 tables. Because of expression (3), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(- \left( {\hat{I}_{o} - \hat{I}_{e} } \right)\) ≤ 0 if and only if \(\hat{\kappa }_{C}\) ≥ 0. As \(\hat{\kappa }_{C}\) decreases with \(\hat{I}_{e}\), then \(\hat{\kappa }_{CU}\) ≥ \(\hat{\kappa }_{C}\) in the case of positive agreement (\(\hat{\kappa }_{C}\) ≥ 0, which is the case of greatest interest). It is easy to see that \(V\left( {\hat{\kappa }_{CU} } \right) \le V\left( {\hat{\kappa }_{C} } \right)\) if and only if κC ≥ n^0.5/{n^0.5 + (n − 1)^0.5}. Something similar happens with the other variances obtained below.
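
As an illustrative sketch (hypothetical helper name; the same table as above), \(\hat{\kappa }_{C}\) and the corrected \(\hat{\kappa }_{CU}\) of expression (4) can be computed as follows:

```python
# Minimal sketch: Cohen's kappa-hat and the bias-corrected kappa-hat_CU (4).
import numpy as np

def cohen_kappa(O, w):
    n = O.sum()
    p = O / n
    Io = (w * p).sum()                                       # observed index
    Ie = (w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()  # expected index
    k = (Io - Ie) / (1 - Ie)
    kU = n * k / ((n - 1) + k)                               # expression (4)
    return k, kU

O = np.array([[75, 1, 4], [5, 4, 1], [0, 0, 10]])
k, kU = cohen_kappa(O, np.eye(3))
print(round(k, 3), round(kU, 3))   # kU >= k, as expected for positive agreement
```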

Let there now be two raters with quantitative answers x1 and x2 with means μ1 and μ2, variances \(\sigma_{1}^{2}\) and \(\sigma_{2}^{2}\), and covariance σ12. Lin (1989, 2000) established the following measure of quantitative agreement ρL (known as CCC) and its estimation \(\hat{\rho }_{L}\)

$$\rho_{L} = \frac{{2\sigma_{12} }}{{\sigma_{1}^{2} + \sigma_{2}^{2} + \left( {\mu_{1} - \mu_{2} } \right)^{2} }}\quad {\text{and}}\quad \hat{\rho }_{L} = \frac{{2S_{12} }}{{S_{1}^{2} + S_{2}^{2} + \left( {\overline{x}_{1} - \overline{x}_{2} } \right)^{2} }},$$
(6)

where \(S_{i}^{2}\) and S12 are the biased estimators of the variances and covariance respectively (both with denominator n) and \(\overline{x}_{i}\) are the sample means. As mentioned in the Introduction, the quadratic weighting has the advantage of achieving that κC = ρL = ρI2 and that \(\hat{\rho }_{L} = \hat{\kappa }_{C}\). On the other hand, Carrasco and Jover (2003) replaced the values of \(\sigma_{i}^{2}\), σ12 and (μ1 − μ2)^2 with their unbiased estimators \(s_{i}^{2}\), s12 (the sample variances and covariance with denominator n − 1) and \(\widehat{{\left( {\mu_{1} - \mu_{2} } \right)^{2} }}\) = (\(\overline{x}_{1}\) − \(\overline{x}_{2}\))^2 − (\(s_{1}^{2}\) + \(s_{2}^{2}\) − 2s12)/n in the first expression (6), which led to the following estimator \(\hat{\rho }_{LU}\) of ρL,

$$\hat{\rho }_{LU} = \frac{{2ns_{12} }}{{\left( {s_{1}^{2} + s_{2}^{2} } \right)\left( {n - 1} \right) + n\left( {\overline{x}_{1} - \overline{x}_{2} } \right)^{2} + 2s_{12} }} = \frac{{2nS_{12} }}{{\left( {n - 1} \right)\left\{ {S_{1}^{2} + S_{2}^{2} + \left( {\overline{x}_{1} - \overline{x}_{2} } \right)^{2} } \right\} + 2S_{12} }},$$
(7)

Note that \(\hat{\rho }_{LU}\) = n\(\hat{\rho }_{L}\)/{(n − 1) + \(\hat{\rho }_{L}\)}, which is the same function of expression (4) that relates \(\hat{\kappa }_{CU}\) with \(\hat{\kappa }_{C}\). Therefore, as \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\), then \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) and the two new estimators of ρL and κC (quadratic weights) are the same. Additionally, \(\hat{\rho }_{LU}\) ≥ \(\hat{\rho }_{L}\) if \(\hat{\rho }_{L}\) ≥ 0. In “Appendix 3” it is proved that \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\), thus \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\).
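
A quick numerical check of these identities is possible; the following minimal sketch (hypothetical two-rater scores) verifies that, with quadratic weights, \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\) and \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\):

```python
# Minimal sketch: with quadratic weights, kappa-hat_C equals rho-hat_L (6)
# and kappa-hat_CU equals rho-hat_LU (7).
import numpy as np

x1 = np.array([1, 2, 3, 3, 2, 1, 2, 3])   # rater 1 (hypothetical scores)
x2 = np.array([1, 2, 3, 2, 2, 1, 3, 3])   # rater 2
K, n = 3, len(x1)

# CCC with biased (denominator n) moments, expressions (6) and (7)
S12 = ((x1 - x1.mean()) * (x2 - x2.mean())).mean()
rho_L = 2 * S12 / (x1.var() + x2.var() + (x1.mean() - x2.mean()) ** 2)
rho_LU = n * rho_L / ((n - 1) + rho_L)

# Cohen's kappa with quadratic weights on the cross-classification of x1, x2
O = np.zeros((K, K))
for a, b in zip(x1, x2):
    O[a - 1, b - 1] += 1
i, j = np.indices((K, K))
w = 1.0 - ((i - j) / (K - 1)) ** 2
p = O / n
Io = (w * p).sum()
Ie = (w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()
k_C = (Io - Ie) / (1 - Ie)
k_CU = n * k_C / ((n - 1) + k_C)

print(np.isclose(k_C, rho_L), np.isclose(k_CU, rho_LU))   # True True
```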

2.3 Scott's pi

Scott (1955) defines the following measure of agreement

$$\kappa_{S} = (I_{o} - I_{e} )/(1 - I_{e} )\quad {\text{where}}\quad I_{e} = \sum\limits_{i} \sum\limits_{j} w_{ij} \pi_{i} \pi_{j} ,\;{\text{with}}\;\pi_{i} = (p_{i \cdot } + p_{ \cdot i} )/2,$$
(8)

and proposes to estimate it by

$$\hat{\kappa }_{S} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \sum\limits_{i} \sum\limits_{j} w_{ij} \hat{\pi }_{i} \hat{\pi }_{j} \quad {\text{with}}\quad \hat{\pi }_{i} = \frac{{\hat{p}_{i \cdot } + \hat{p}_{ \cdot i} }}{2} = \frac{{O_{i \cdot } + O_{ \cdot i} }}{2n}.$$
(9)

As indicated in “Appendix 1”, \(\hat{\pi }_{i} \hat{\pi }_{j}\) is not an unbiased estimator of πiπj since

$$E(\hat{\pi }_{i} \hat{\pi }_{j} ) = \frac{{\left( {n - 1} \right)\pi_{i} \pi_{j} + \left\{ {\delta_{ij} \left( {p_{i \cdot } + p_{ \cdot j} } \right) + \left( {p_{ij} + p_{ji} } \right)} \right\}/4}}{n}.$$
(10)

Therefore, E(\(\hat{I}_{e}\)) = \(\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} E\left( {\hat{\pi }_{i} \hat{\pi }_{j} } \right)} }\) = {(n − 1)Ie + (1 + Io)/2}/n, assuming that wij = wji, and \(\hat{I}_{e}\) is not an unbiased estimator of Ie. From expression (10) it is deduced that the unbiased estimators of πiπj and Ie are

$$\begin{aligned} &\widehat{{\pi_{i} \pi_{j} }} = \frac{{n\hat{\pi }_{i} \hat{\pi }_{j} - \left\{ {\left( {\hat{p}_{i \cdot } + \hat{p}_{ \cdot j} } \right)\delta_{ij} + \left( {\hat{p}_{ij} + \hat{p}_{ji} } \right)} \right\}/4}}{n - 1}\quad {\text{and}}\\ & \hat{I}_{eU} = \sum\limits_{i} \sum\limits_{j} w_{ij} \widehat{{\pi_{i} \pi_{j} }} = \frac{{n\hat{I}_{e} - \left( {1 + \hat{I}_{o} } \right)/2}}{n - 1},\end{aligned}$$
(11)

respectively. Therefore, the new estimator \(\hat{\kappa }_{SU}\) of κS will be

$$\hat{\kappa }_{SU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{S} + 1}}{{\left( {2n - 1} \right) + \hat{\kappa }_{S} }},$$
(12)

and its variance, as deduced in “Appendix 2”, is

$$V\left( {\hat{\kappa }_{SU} } \right) = \frac{{\left( {2n - 1 - \kappa_{S} } \right)^{4} }}{{\left\{ {4n\left( {n - 1} \right)} \right\}^{2} }}V\left( {\hat{\kappa }_{S} } \right).$$
(13)

Because of expression (11), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(- \left\{ {\left( {1 - \hat{I}_{e} } \right) + \left( {\hat{I}_{o} - \hat{I}_{e} } \right)} \right\}\), which is also proportional to \(- \left\{ {1 + \hat{\kappa }_{S} } \right\}\) ≤ 0 if and only if \(\hat{\kappa }_{S}\) ≥ −1. As \(\hat{\kappa }_{S}\) decreases with \(\hat{I}_{e}\), then \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\) in the case of positive agreement.
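
As a minimal sketch (hypothetical helper name, same table as in Sect. 2.2), \(\hat{\kappa }_{S}\) and the corrected \(\hat{\kappa }_{SU}\) of expression (12) can be computed as follows:

```python
# Minimal sketch: Scott's pi-hat (9) and the bias-corrected kappa-hat_SU (12).
import numpy as np

def scott_pi(O, w):
    n = O.sum()
    p = O / n
    Io = (w * p).sum()
    pi = (p.sum(axis=1) + p.sum(axis=0)) / 2          # pi-hat_i of (9)
    Ie = (w * np.outer(pi, pi)).sum()
    k = (Io - Ie) / (1 - Ie)
    kU = ((2 * n - 1) * k + 1) / ((2 * n - 1) + k)    # expression (12)
    return k, kU

O = np.array([[75, 1, 4], [5, 4, 1], [0, 0, 10]])
print(scott_pi(O, np.eye(3)))   # kU >= k for positive agreement
```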

2.4 Krippendorf's alpha

Krippendorf (1970, 2004) proposed to estimate κS as in expression (9), but with a small-sample correction for \(\hat{I}_{o}\), though Gwet (2021b, p. 65) considers that “The need for such an adjustment and its potential benefits have not been documented”. The new estimator is,

$$\hat{\kappa }_{K} = \frac{{\hat{I}_{oC} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{oC} = \frac{{\left( {2n - 1} \right)\hat{I}_{o} + 1}}{2n}\quad {\text{and}}\quad \hat{I}_{e} = \sum\limits_{i} {\sum\limits_{j} {w_{ij} \hat{\pi }_{i} \hat{\pi }_{j} } } ,$$
(14)

where \(\hat{I}_{oC} = \hat{I}_{o} + \left( {1 - \hat{I}_{o} } \right)/2n\); therefore,

$$\begin{aligned}\hat{\kappa }_{K} = &\frac{{\left( {2n - 1} \right)\hat{\kappa }_{S} + 1}}{2n}\quad {\text{and}}\\ \hat{\kappa }_{KU} =& \frac{{\left( {2n - 1} \right)\hat{\kappa }_{SU} + 1}}{2n} = \frac{{\left( {n - 1} \right) + \left\{ {2n\left( {n - 1} \right) + 1} \right\}\hat{\kappa }_{K} }}{{2n\left( {n - 1} \right) + n\hat{\kappa }_{K} }}. \end{aligned}$$
(15)

The first expression follows from expressions (9) and (14); the second is obtained by replacing \(\hat{I}_{e}\) with the value of \(\hat{I}_{eU}\) in expression (11). From expressions (15) it is deduced that \(\hat{\kappa }_{K}\) ≥ \(\hat{\kappa }_{S}\) and \(\hat{\kappa }_{KU}\) ≥ \(\hat{\kappa }_{SU}\). Also, as for positive degrees of agreement it occurs that \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\), then, due to expressions (15), \(\hat{\kappa }_{KU}\) ≥ \(\hat{\kappa }_{K}\). Finally, if in the first expression of Eq. (15) \(\hat{\kappa }_{S}\) is replaced by {(2n − 1)\(\hat{\kappa }_{SU}\) − 1}/{(2n − 1) − \(\hat{\kappa }_{SU}\)} (which is deduced from expression (12)), then \(\hat{\kappa }_{K}\) = 2(n − 1)\(\hat{\kappa }_{SU}\)/{(2n − 1) − \(\hat{\kappa }_{SU}\)} and \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{K}\) if \(\hat{\kappa }_{SU}\) ≥ 0. The overall conclusion is that \(\hat{\kappa }_{S}\) ≤ \(\hat{\kappa }_{K}\) ≤ \(\hat{\kappa }_{SU}\) ≤ \(\hat{\kappa }_{KU}\) for positive degrees of agreement.

Regarding the variance, it is sufficient to use the first part of the second expression (15) and then replace V(\(\hat{\kappa }_{SU}\)) with the value in expression (13); thus

$$V\left( {\hat{\kappa }_{KU} } \right) = \left( {\frac{2n - 1}{{2n}}} \right)^{2} \times \frac{{\left( {2n - 1 - \kappa_{S} } \right)^{4} }}{{\left\{ {4n\left( {n - 1} \right)} \right\}^{2} }}V\left( {\hat{\kappa }_{S} } \right).$$

2.5 Gwet's AC1/2

Gwet (2008) defines the following measure, known as AC2 (AC1 refers to the unweighted case),

$$\kappa_{G} = (I_{o} - I_{e} )/(1 - I_{e} )\quad {\text{where}}\quad I_{e} = W \times \sum\limits_{i} {\pi_{i} (1 - \pi_{i} )} /\{ K(K - 1)\} \quad {\text{and}}\quad W = \sum\limits_{i} \sum\limits_{j} w_{ij} ,$$
(16)

and proposes to estimate it by

$$\hat{\kappa }_{G} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \frac{W}{{K\left( {K - 1} \right)}}\sum\limits_{i} {\hat{\pi }_{i} \left( {1 - \hat{\pi }_{i} } \right)} ,$$
(17)

where πi and \(\hat{\pi }_{i}\) are obtained as in expressions (8) and (9). Once again \(\hat{I}_{e}\) is not an unbiased estimator of Ie, because \(\hat{\pi }_{i}^{2}\) is not an unbiased estimator of \(\pi_{i}^{2}\) either. Using the first expression (11) to estimate \(\pi_{i}^{2}\) in an unbiased way, we obtain that the unbiased estimators of \(\pi_{i}^{2}\) and Ie are, respectively,

$$\widehat{{\pi_{i}^{2} }} = \frac{{n\hat{\pi }_{i}^{2} - \left\{ {\left( {\hat{p}_{i \cdot } + \hat{p}_{ \cdot i} } \right) + 2\hat{p}_{ii} } \right\}/4}}{n - 1}\quad {\text{and}}\quad \hat{I}_{eU} = \frac{1}{n - 1}\left\{ {n\hat{I}_{e} - X} \right\},\quad {\text{where}}\quad X = \frac{{W\left( {1 - \sum\limits_{i} {\hat{p}_{ii} } } \right)}}{{2K\left( {K - 1} \right)}}.$$
(18)

Therefore, the new estimator \(\hat{\kappa }_{GU}\) of κG will be

$$\hat{\kappa }_{GU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {n - 1} \right)\hat{\kappa }_{G} + Y}}{{\left( {n - 1} \right) + Y}}\quad {\text{where}}\quad Y = \frac{{X - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}.$$
(19)

In “Appendix 1” it is proved that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0, so it always happens that \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\). It can be observed that it is not feasible to determine V(\(\hat{\kappa }_{GU}\)) directly from the value of V(\(\hat{\kappa }_{G}\)).
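
The following minimal sketch (hypothetical helper name, same table as above) computes \(\hat{\kappa }_{G}\) and the corrected \(\hat{\kappa }_{GU}\) via expressions (17)–(19):

```python
# Minimal sketch: Gwet's AC2-hat (17) and the corrected kappa-hat_GU
# of expressions (18)-(19); with w = identity it reduces to AC1.
import numpy as np

def gwet_ac(O, w):
    n, K = O.sum(), O.shape[0]
    p = O / n
    W = w.sum()
    Io = (w * p).sum()
    pi = (p.sum(axis=1) + p.sum(axis=0)) / 2
    Ie = W * (pi * (1 - pi)).sum() / (K * (K - 1))
    k = (Io - Ie) / (1 - Ie)
    X = W * (1 - np.trace(p)) / (2 * K * (K - 1))   # X of expression (18)
    IeU = (n * Ie - X) / (n - 1)                    # unbiased I_e estimator
    kU = (Io - IeU) / (1 - IeU)                     # expression (19)
    return k, kU

O = np.array([[75, 1, 4], [5, 4, 1], [0, 0, 10]])
print(gwet_ac(O, np.eye(3)))   # here kU <= k, as stated above
```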

3 Case of multi-raters

Let there be n subjects (s = 1, 2, …, n) classified by R raters (r = 1, 2, …, R) into K types (i = 1, 2, …, K). Let xsr = 1, 2, …, K be the answer of rater r for subject s, values that are usually presented in a two-dimensional table in which the subjects are in rows and the raters in columns. For each row (subject), let Ris be the number of raters that answer i for subject s; obviously Ri+ = ΣsRis is the total number of i answers (over all raters), R+s = ΣiRis = R and R++ = ΣiΣsRis = nR. For each column (rater), let nir be the number of subjects classified as i by rater r; obviously n+r = Σinir = n, ni+ = Σrnir = Ri+ is the total number of i answers and n++ = ΣiΣrnir = nR = R++. The values of Ris and nir are usually presented as in Table 3(a) and (b) respectively.

Table 3 Results of the classification of n = 29 fish by R = 4 raters in K = 5 colorations (Gwet 2021b, p 341)

3.1 Pairwise methods and the observed index of agreement

To define and estimate the measures in the R > 2 case, pairwise methods will be used. These methods in some way average what happens in the R(R − 1) possible ordered pairs of raters (r, r′), with r, r′ = 1, 2, …, R and r ≠ r′. This obliges us to change the notation used in Sect. 2, since it is necessary to indicate for each parameter from which pair (r, r′) its value comes. Parameters pij, pi· and p·j of Sect. 2 will now be denoted pir,jr′, pir and pjr′ respectively. Additionally, we define the new parameter pi+ = Σrpir = Σr′pir′, which is the sum over raters of the proportions of i answers. A similar thing occurs with the estimated values \(\hat{p}_{ir,jr^{\prime}}\), \(\hat{p}_{ir}\), etc. Note that the estimators \(\hat{p}_{ir}\) of pir and \(\hat{p}_{i + }\) of pi+ are

$$\hat{p}_{ir} = \frac{{n_{ir} }}{n}\quad {\text{and}}\quad \hat{p}_{i + } = \sum\nolimits_{r} {\hat{p}_{ir} } = \frac{{n_{i + } }}{n} = \frac{{R_{i + } }}{n},$$
(20)

respectively, where Σi \(\hat{p}_{ir}\) = 1 and ΣrΣi \(\hat{p}_{ir}\) = R. Parameters κ, Io and Ie of Sect. 2 will be denoted as κ(r, r'), Io(r, r') and Ie (r, r') respectively; therefore

$$\kappa \left( {r,r^{\prime } } \right) = \{ I_{o} \left( {r,r^{\prime } } \right) - I_{e} \left( {r,r^{\prime } } \right)\} /\{ 1 - I_{e} \left( {r,r^{\prime } } \right)\} ,$$
(21)

and the same for the estimated values \(\hat{\kappa }\left( {r,r^{\prime}} \right)\) etc.

With pairwise methods there are several ways to average the results of every pair of raters (r, r'), but all procedures of interest define the global value of Io as

$$I_{o} = \sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {I_{o} \left( {r,r^{\prime } } \right)} } /\{ R(R - 1)\} ,$$
(22)

thus Io = ΣrΣr′≠rΣiΣjwijpir,jr′/{R(R − 1)}. As is traditional, the measure of global agreement will be κ = (Io − Ie)/(1 − Ie), where Ie is yet to be defined. If Ie is defined in a similar way to Io

$$I_{e} = \sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {I_{e} \left( {r,r^{\prime } } \right)} } /\{ R(R - 1)\} ,$$
(23)

we say that the procedure that defines global κ is a “two-pairwise” procedure and the population coefficient thereby obtained will be,

$$\kappa_{2} = \left\{ {\sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {I_{o} \left( {r,r^{\prime } } \right)} } - \sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {I_{e} \left( {r,r^{\prime } } \right)} } } \right\}/\left\{ {R(R - 1) - \sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {I_{e} \left( {r,r^{\prime } } \right)} } } \right\}.$$

It can be noticed that κ2 is also obtained by dividing the sum of all the possible numerators (ΣrΣr′≠r) of expression (21) by the sum of all the possible denominators, which indicates that κ2 is the weighted average of the R(R − 1) values of κ(r, r′), the weights being the denominators. This procedure is the one recommended by Janson and Olsson (2001), Conger (1980) and Gwet (2021b). Notice that ΣrΣr′≠rIo(r, r′) = 2ΣrΣr′>rIo(r, r′), and similarly for Ie. We have preferred to use the first expression because it facilitates some proofs, but for calculations the second expression seems preferable. All of the above also applies to the case of estimated values.

As the base values Io and \(\hat{I}_{o}\) are the same for all κ measures, they should be specified here; their values are (see “Appendix 1”)

$$\begin{aligned} I_{o} &= \frac{{\sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {\sum\limits_{i} {\sum\limits_{j} {w_{ij} p_{{ir,jr^{\prime } }} } } } } }}{{R\left( {R - 1} \right)}},\\ \hat{I}_{o} &= \frac{{\sum\limits_{r} {\sum\limits_{{r^{\prime } \ne r}} {\sum\limits_{i} {\sum\limits_{j} {w_{ij} \hat{p}_{{ir,jr^{\prime } }} } } } } }}{{R\left( {R - 1} \right)}} = \frac{{\sum\limits_{i} {\sum\limits_{j} {w_{ij} \sum\limits_{s} {R_{is} R_{js} } } } - nR}}{{nR\left( {R - 1} \right)}}.\end{aligned}$$
(24)

3.2 Hubert's kappa pairwise and the intraclass and concordance correlation coefficients

The κH coefficient of Hubert (Hubert 1977; Conger 1980) is a two-pairwise coefficient, which is why expression (23) can be applied to Cohen's value Ie(r, r′). Adjusting expression (1) to the current format, Ie(r, r′) = ΣiΣjwijpirpjr′ and, due to “Appendix 1”,

$$\kappa_{H} = (I_{o} - I_{e} )/(1 - I_{e} )\quad {\text{where}}\quad I_{e} = \sum\limits_{i} \sum\limits_{j} {w_{ij} } \left( {p_{i + } p_{j + } - \sum\limits_{r} {p_{ir} p_{jr} } } \right)/\{ R(R - 1)\} .$$
(25)

Using expressions (20), the following estimator is obtained

$$\hat{\kappa }_{H} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \frac{1}{{n^{2} R\left( {R - 1} \right)}}\sum\limits_{i} {\sum\limits_{j} {w_{ij} \left\{ {n_{i + } n_{j + } - \sum\limits_{r} {n_{ir} n_{jr} } } \right\}} } .$$

It can be observed that for R = 2, κC = κH and \(\hat{\kappa }_{C} = \hat{\kappa }_{H}\). In order to obtain an unbiased estimator of Ie, the second expression of (3), applied with the current notation, indicates that \(\hat{I}_{eU} \left( {r,r^{\prime}} \right) = \left\{ {n\hat{I}_{e} \left( {r,r^{\prime}} \right) - \hat{I}_{o} \left( {r,r^{\prime}} \right)} \right\}/\left( {n - 1} \right)\); therefore R(R − 1)\(\hat{I}_{eU}\) = \(\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{I}_{eU} \left( {r,r^{\prime}} \right)} }\) = \(\left\{ {n\sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{I}_{e} \left( {r,r^{\prime}} \right)} } - \sum\nolimits_{r} {\sum\nolimits_{r^{\prime} \ne r} {\hat{I}_{o} \left( {r,r^{\prime}} \right)} } } \right\}/\left( {n - 1} \right)\), and so \(\hat{I}_{eU}\) = (n\(\hat{I}_{e}\) − \(\hat{I}_{o}\))/(n − 1). As this expression is the same as the second expression of (3), the conclusions of Sect. 2.2 remain valid, changing the letter C to H. Thus,

$$\hat{\kappa }_{HU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{n\hat{\kappa }_{H} }}{{\left( {n - 1} \right) + \hat{\kappa }_{H} }},$$
(26)

and \(\hat{\kappa }_{HU}\) ≥ \(\hat{\kappa }_{H}\) in the case of positive agreement.
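
A minimal sketch (hypothetical data and helper name; the nir table is one rater assignment consistent with the Ris above) of \(\hat{\kappa }_{H}\) and the corrected \(\hat{\kappa }_{HU}\) of expression (26):

```python
# Minimal sketch: Hubert's pairwise kappa-hat and kappa-hat_HU (26), from
# the counts R_is and the K x R table n_ir.
import numpy as np

def hubert_kappa(Ris, nir, w):
    K, n = Ris.shape
    R = nir.shape[1]
    Io = ((w * (Ris @ Ris.T)).sum() - n * R) / (n * R * (R - 1))   # (24)
    ni = nir.sum(axis=1)                                           # n_{i+}
    Ie = (w * (np.outer(ni, ni) - nir @ nir.T)).sum() / (n**2 * R * (R - 1))
    k = (Io - Ie) / (1 - Ie)
    kU = n * k / ((n - 1) + k)                                     # (26)
    return k, kU

Ris = np.array([[3, 2, 1, 0], [0, 1, 2, 3]])   # as in the sketch above
nir = np.array([[2, 2, 2], [2, 2, 2]])         # n_ir, columns sum to n = 4
print(hubert_kappa(Ris, nir, np.eye(2)))       # (0.333..., 0.4)
```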

Generalizing the first expression of (6) to the case of two raters r and r′ with answers xr and xr′, means μr and μr′, variances \(\sigma_{r}^{2}\) and \(\sigma_{r^{\prime}}^{2}\), and covariance σrr′, we obtain \(\rho_{L} \left( {r,r^{\prime}} \right) = 2\sigma_{rr^{\prime}} /\left\{ {\sigma_{r}^{2} + \sigma_{r^{\prime}}^{2} + \left( {\mu_{r} - \mu_{r^{\prime}} } \right)^{2} } \right\}.\) If we apply to this expression the two-pairwise criterion, which consists of adding ΣrΣr′≠r in the numerator and in the denominator, the CCC ρL of Lin (1989, 2000) and Barnhart et al. (2002) is obtained for the case of multi-raters; its estimated value \(\hat{\rho }_{L}\) is obtained in the same way as the second expression of (6). In this way,

$$\begin{aligned}&\rho_{L} = \frac{{2\sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {\sigma_{{rr^{\prime } }} } } }}{{\left( {R - 1} \right)\sum\limits_{r} {\sigma_{r}^{2} } + \sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {\left( {\mu_{r} - \mu_{{r^{\prime } }} } \right)^{2} } } }},\\ &\hat{\rho }_{L} = \frac{{2\sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {S_{{rr^{\prime } }} } } }}{{\left( {R - 1} \right)\sum\limits_{r} {S_{r}^{2} } + \sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {\left( {\overline{x}_{r} - \overline{x}_{{r^{\prime } }} } \right)^{2} } } }}. \end{aligned}$$
(27)

Carrasco and Jover (2003) justified that \(\hat{\rho }_{L}\) is based on biased estimators and proposed the following estimator, which is based on unbiased estimators (srr′ and \(s_{r}^{2}\))

$$\hat{\rho }_{LU} = \frac{{2n\sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {s_{{rr^{\prime } }} } } }}{{\left( {R - 1} \right)\left( {n - 1} \right)\sum\limits_{r} {s_{r}^{2} } + n\sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {\left( {\overline{x}_{r} - \overline{x}_{{r^{\prime } }} } \right)^{2} + 2\sum\limits_{r} {\sum\limits_{{r^{\prime } > r}} {s_{{rr^{\prime } }} } } } } }}.$$
(28)

It is easy to see that the same thing can be obtained by applying the two-pairwise method to the first expression (7). As for R = 2 it occurred that κC = ρL and \(\hat{\kappa }_{C} = \hat{\rho }_{L}\) when the weights were quadratic, and in both cases the value for R > 2 is obtained in the same way (the sum of the numerators divided by the sum of the denominators), then also κH = ρL and \(\hat{\kappa }_{H} = \hat{\rho }_{L}\) in the case of R > 2. Additionally, κHR = κH = ρL = ρI2, since ρL = ρI2 (Carrasco and Jover 2003) and κHR = ρL (Martín Andrés and Álvarez Hernández 2020). Furthermore, as \(\hat{\rho }_{LU} = n\hat{\rho }_{L} /\left\{ {\left( {n - 1} \right) + \hat{\rho }_{L} } \right\}\) (an expression which has the same form as (26)), then also

$$\hat{\kappa }_{HU} = \hat{\rho }_{LU} = \hat{\rho }_{I2} = \frac{{n\sum\nolimits_{s} {x_{s \cdot }^{2} } + \sum\nolimits_{r} {x_{ \cdot r}^{2} - n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } }}{{\sum\nolimits_{s} {x_{s \cdot }^{2} } + \sum\nolimits_{r} {x_{ \cdot r}^{2} + \left( {nR - n - R} \right)\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } }},$$
(29)

where the last two equalities are demonstrated in “Appendix 3”. In the last expression, which is simpler for the calculation, it is understood that xs· = \(\sum\nolimits_{r} {x_{sr} }\), x·r = \(\sum\nolimits_{s} {x_{sr} }\), and x·· = \(\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr} } }\). Something similar happens with the estimators based on the biased estimation of their components (see “Appendix 3”),

$$\hat{\kappa }_{H} = \hat{\rho }_{L} = \frac{{n\sum\nolimits_{s} {x_{s \cdot }^{2} } + \sum\nolimits_{r} {x_{ \cdot r}^{2} - n\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } }}{{\sum\nolimits_{r} {x_{ \cdot r}^{2} + n\left( {R - 1} \right)\sum\nolimits_{s} {\sum\nolimits_{r} {x_{sr}^{2} } - x_{ \cdot \cdot }^{2} } } }}.$$
(30)

3.3 Fleiss' kappa pairwise

Fleiss (1971) extended κS to the case of R > 2 by defining the following value of Ie, which is not of the two-pairwise type,

$$\begin{aligned} &\kappa_{F} = (I_{o} - I_{e} )/(1 - I_{e} )\quad {\text{where}}\quad I_{e} = \sum\limits_{i} \sum\limits_{j} w_{ij} \pi_{i} \pi_{j} \\ {\text{and}}\quad &\pi_{i} = \sum\limits_{r} {p_{ir} } /R = p_{i + } /R, \end{aligned}$$
(31)

and proposes the following estimators

$$\begin{aligned} &\hat{\kappa }_{F} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \sum\limits_{i} {\sum\limits_{j} {w_{ij} \hat{\pi }_{i} \hat{\pi }_{j} } } = \frac{1}{{n^{2} R^{2} }}\sum\limits_{i} {\sum\limits_{j} {w_{ij} R_{i + } R_{j + } } } \\ {\text{and}}\quad &\hat{\pi }_{i} = \frac{{\hat{p}_{i + } }}{R},\end{aligned}$$
(32)

since pi+ is estimated as in the second expression of Eq. (20). As indicated in “Appendix 1”, \(\hat{I}_{e}\) is not an unbiased estimator of Ie since nE(\(\hat{I}_{e}\)) = (n − 1)Ie + {1 + (R − 1)Io}/R. This is why the unbiased estimator \(\hat{I}_{eU}\) of Ie and the new estimator \(\hat{\kappa }_{FU}\) of κF will be

$$\begin{aligned}&\hat{I}_{eU} = \frac{{n\hat{I}_{e} - \left\{ {1 + \left( {R - 1} \right)\hat{I}_{o} } \right\}/R}}{n - 1}\\ {\text{and}}\quad &\hat{\kappa }_{FU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {Rn - 1} \right)\hat{\kappa }_{F} + 1}}{{\left( {R - 1} \right)\hat{\kappa }_{F} + \left\{ {R\left( {n - 1} \right) + 1} \right\}}}.\end{aligned}$$
(33)

Its variance, as deduced in “Appendix 2”, is

$$V\left( {\hat{\kappa }_{FU} } \right) = \frac{{\left\{ {\left( {nR - 1} \right) - \left( {R - 1} \right)\kappa_{F} } \right\}^{4} }}{{\left\{ {R^{2} n\left( {n - 1} \right)} \right\}^{2} }}V\left( {\hat{\kappa }_{F} } \right).$$
(34)

Through the first expression of Eq. (33), \(\hat{I}_{eU} - \hat{I}_{e}\) is proportional to \(\hat{I}_{e}\) − {1 + (R − 1)\(\hat{I}_{o}\)}/R, which is also proportional to − {1 + (R − 1)\(\hat{\kappa }_{F}\)} ≤ 0 if and only if \(\hat{\kappa }_{F}\) ≥ −1/(R − 1). Therefore, \(\hat{\kappa }_{FU}\) ≥ \(\hat{\kappa }_{F}\) in the case of positive agreement.
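
A minimal sketch (hypothetical helper name, reusing the Ris data above) of \(\hat{\kappa }_{F}\) and the corrected \(\hat{\kappa }_{FU}\) of expression (33):

```python
# Minimal sketch: Fleiss' pairwise kappa-hat (32) and the bias-corrected
# kappa-hat_FU (33), from the counts R_is.
import numpy as np

def fleiss_kappa(Ris, w):
    K, n = Ris.shape
    R = int(Ris[:, 0].sum())
    Io = ((w * (Ris @ Ris.T)).sum() - n * R) / (n * R * (R - 1))     # (24)
    Rip = Ris.sum(axis=1)                                            # R_{i+}
    Ie = (w * np.outer(Rip, Rip)).sum() / (n * R) ** 2               # (32)
    k = (Io - Ie) / (1 - Ie)
    kU = ((R * n - 1) * k + 1) / ((R - 1) * k + R * (n - 1) + 1)     # (33)
    return k, kU

Ris = np.array([[3, 2, 1, 0], [0, 1, 2, 3]])   # hypothetical data as before
print(fleiss_kappa(Ris, np.eye(2)))            # (0.333..., 0.4375)
```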

Another way of extending κS is to use the two-pairwise method. In this case, in “Appendix 1” it is demonstrated that

$$\begin{aligned} &\kappa_{F2} = (I_{o} - I_{e} )/(1 - I_{e} )\quad\\ {\text{where}}\quad &I_{e} = \left[ {\sum\limits_{i} \sum\limits_{j} {w_{ij} \left\{ {(R - 2)\sum\limits_{r} {p_{ir} p_{jr} } + p_{i + } p_{j + } } \right\}} } \right]/\{ 2R(R - 1)\} ,\end{aligned}$$
(35)

and therefore its estimated values in a traditional way would be

$$\hat{\kappa }_{F2} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \frac{1}{{2n^{2} R\left( {R - 1} \right)}}\sum\limits_{i} {\sum\limits_{j} {w_{ij} \left\{ {\left( {R - 2} \right)\sum\limits_{r} {n_{ir} n_{jr} + } n_{i + } n_{j + } } \right\}} } .$$

In order to obtain the unbiased estimator of Ie, the second expression of Eq. (11) is, in the current terms, \(\hat{I}_{eU} \left( {r,r^{\prime}} \right)\) = [\(n\hat{I}_{e} \left( {r,r^{\prime}} \right)\) − {1 + \(\hat{I}_{o} \left( {r,r^{\prime}} \right)\)}/2]/(n − 1). Applying expressions (22) and (23), it is obtained that the second expression of Eq. (11) also applies to the current case, in such a way that the conclusions obtained in the case of Scott's pi are valid, changing the letter S to F2. In this way

$$\begin{aligned}&\hat{\kappa }_{F2U} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{F2} + 1}}{{\left( {2n - 1} \right) + \hat{\kappa }_{F2} }},\\ {\text{where}}\quad &\hat{I}_{eU} = \frac{{n\hat{I}_{e} - \left( {1 + \hat{I}_{o} } \right)/2}}{n - 1},\quad V\left( {\hat{\kappa }_{F2U} } \right) = \frac{{\left( {2n - 1 - \kappa_{F2} } \right)^{4} }}{{\left\{ {4n\left( {n - 1} \right)} \right\}^{2}}}V\left( {\hat{\kappa }_{F2} } \right),\end{aligned}$$

and \(\hat{\kappa }_{F2U}\) ≥ \(\hat{\kappa }_{F2}\) when \(\hat{\kappa }_{F2}\) ≥ 0. Nevertheless, to the best of our knowledge, the value of V(\(\hat{\kappa }_{F2}\)) is currently unknown.

3.4 Krippendorf's multi-rater alpha

Now the objective is similar to that of Sect. 2.4: to estimate κF as in expression (32), but changing the value of \(\hat{I}_{o}\) to a value \(\hat{I}_{oC}\) defined as in expression (14). In this way

$$\hat{\kappa }_{K} = \frac{{\hat{I}_{oC} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{oC} = \frac{{\left( {2n - 1} \right)\hat{I}_{o} + 1}}{2n}\quad {\text{and}}\quad \hat{I}_{e} = \frac{1}{{n^{2} R^{2} }}\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} R_{i + } R_{j + } } } .$$

Given the formal equality of the expressions, all of the previous conclusions can be accepted, with the necessary changes. In particular,

$$\hat{\kappa }_{K} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{F} + 1}}{2n}\quad {\text{and}}\quad \hat{\kappa }_{KU} = \frac{{\left( {2n - 1} \right)\hat{\kappa }_{FU} + 1}}{2n} = \frac{{\left( {n - 1} \right) + \left\{ {2n\left( {n - 1} \right) + 1} \right\}\hat{\kappa }_{K} }}{{2n\left( {n - 1} \right) + n\hat{\kappa }_{K} }},$$
(36)
$$\hat{\kappa }_{F} \le \hat{\kappa }_{K} \le \hat{\kappa }_{FU} \le \hat{\kappa }_{KU} ,$$
(37)
$$V\left( {\hat{\kappa }_{KU} } \right) = \left( {\frac{2n - 1}{{2n}}} \right)^{2} \times \frac{{\left\{ {\left( {nR - 1} \right) - \left( {R - 1} \right)\kappa_{F} } \right\}^{4} }}{{\left\{ {R^{2} n\left( {n - 1} \right)} \right\}^{2} }}V\left( {\hat{\kappa }_{F} } \right).$$
(38)
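
The transformation (36) is a one-liner; the following minimal sketch (hypothetical helper name) obtains \(\hat{\kappa }_{K}\) and \(\hat{\kappa }_{KU}\) from the Fleiss values computed above and checks the ordering (37):

```python
# Minimal sketch: Krippendorf's multi-rater alpha-hat and its corrected
# version from kappa-hat_F and kappa-hat_FU, via expression (36).
def krippendorf_from_fleiss(kF, kFU, n):
    kK = ((2 * n - 1) * kF + 1) / (2 * n)
    kKU = ((2 * n - 1) * kFU + 1) / (2 * n)
    return kK, kKU

kF, kFU = 1 / 3, 0.4375                      # values from the sketch above
kK, kKU = krippendorf_from_fleiss(kF, kFU, n=4)
print(kF <= kK <= kFU <= kKU)                # the ordering (37): True
```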

The two-pairwise method proceeds in a similar way, where now

$$\begin{aligned}&\hat{\kappa }_{K2} = \frac{{\hat{I}_{oC} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }},\quad {\text{where}}\quad \hat{I}_{oC} = \frac{{\left( {2n - 1} \right)\hat{I}_{o} + 1}}{2n},\\ {\text{and}}\quad &\hat{I}_{e} = \frac{{\sum\nolimits_{i} {\sum\nolimits_{j} {w_{ij} \left\{ {\left( {R - 2} \right)\sum\nolimits_{r} {n_{ir} n_{jr} + } n_{i + } n_{j + } } \right\}} } }}{{2n^{2} R\left( {R - 1} \right)}}.\end{aligned}$$

Therefore, expressions (36) to (38) are also valid after adding the number “2” after the letter K or F in the subindexes of these expressions.

3.5 Gwet's multi-rater AC1/2

For the case of multi-raters, Gwet (2008) defined the same measures of agreement AC1/2, κG and \(\hat{\kappa }_{G}\), of expressions (16) and (17) respectively, but with πi and \(\hat{\pi }_{i}\) alluding to the Fleiss values of expressions (31) and (32) respectively. Therefore, Ie = W\(\left( {1 - \sum\nolimits_{i} {\pi_{i}^{2} } } \right)\)/{K(K − 1)} = W\(\left( {1 - \sum\nolimits_{i} {p_{i + }^{2} } /R^{2} } \right)\)/{K(K − 1)} and

$$\hat{I}_{e} = \frac{W}{{K\left( {K - 1} \right)}}\left\{ {1 - \sum\limits_{i} {\hat{\pi }_{i}^{2} } } \right\} = \frac{W}{{K\left( {K - 1} \right)}}\left\{ {1 - \frac{{\sum\nolimits_{i} {\hat{p}_{i + }^{2} } }}{{R^{2} }}} \right\} = \frac{W}{{K\left( {K - 1} \right)}}\left\{ {1 - \frac{{\sum\nolimits_{i} {R_{i + }^{2} } }}{{n^{2} R^{2} }}} \right\}.$$
(39)

"Appendix 1" demonstrates that \(\hat{\pi }_{i}^{2}\) is not an unbiased estimator of \(\pi_{i}^{2}\) − see expression (48) − , so that \(\hat{I}_{e}\) is also not an unbiased estimator of Ie, which is justified in this same Appendix as the unbiased estimator \(\hat{I}_{eU}\) of Ie is

$$\hat{I}_{eU} = \frac{{n\hat{I}_{e} - A}}{n - 1},\quad {\text{where}}\quad A = \frac{{W\left( {R - 1} \right)\left( {1 - \hat{I}_{oN} } \right)}}{{RK\left( {K - 1} \right)}}\quad {\text{and}}\quad \hat{I}_{oN} = \frac{{\sum\nolimits_{i} {\sum\nolimits_{s} {R_{is}^{2} } } - nR}}{{nR\left( {R - 1} \right)}}.$$
(40)

Therefore, the new estimator \(\hat{\kappa }_{GU}\) of κG will be,

$$\hat{\kappa }_{GU} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {n - 1} \right)\hat{\kappa }_{G} + B}}{{\left( {n - 1} \right) + B}}\quad {\text{where}}\quad B = \frac{{A - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}.$$
(41)

It can be observed that now it is not feasible to determine V(\(\hat{\kappa }_{GU}\)) directly from the value of V(\(\hat{\kappa }_{G}\)). “Appendix 1” demonstrates that \(\hat{I}_{eU} - \hat{I}_{e}\) ≥ 0, so that now we also find that \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\).

An alternative is to use the two-pairwise method. In this case, “Appendix 1” demonstrates that

$$\kappa_{G2} = \frac{{I_{o} - I_{e} }}{{1 - I_{e} }}\quad {\text{where}}\quad I_{e} = \frac{W}{{K\left( {K - 1} \right)}}\left[ {1 - \frac{1}{{2R\left( {R - 1} \right)}}\left\{ {\left( {R - 2} \right)\sum\nolimits_{i} {\sum\nolimits_{r} {p_{ir}^{2} + \sum\nolimits_{i} {p_{i + }^{2} } } } } \right\}} \right],$$
(42)

and therefore its estimated (biased) values are, because of expression (20)

$$\hat{\kappa }_{G2} = \frac{{\hat{I}_{o} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}\quad {\text{where}}\quad \hat{I}_{e} = \frac{W}{{K\left( {K - 1} \right)}}\left[ {1 - \frac{1}{{2R\left( {R - 1} \right)n^{2} }}\left\{ {\left( {R - 2} \right)\sum\nolimits_{i} {\sum\nolimits_{r} {n_{ir}^{2} + \sum\nolimits_{i} {n_{i + }^{2} } } } } \right\}} \right].$$
(43)

To obtain an unbiased estimator of Ie, expression (18) is, in current terms, \(\hat{I}_{eU} \left( {r,r^{\prime } } \right)\) = [\(n\hat{I}_{e} \left( {r,r^{\prime } } \right)\) − W{1 − \(\sum\nolimits_{i} {\hat{p}_{{ir,ir^{\prime } }} }\)}/{2K(K − 1)}]/(n − 1). Applying expression (23), we obtain the value of the current \(\hat{I}_{eU}\), which provides the value of \(\hat{\kappa }_{G2U}\); i.e.

$$\hat{\kappa }_{G2U} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }}\quad {\text{where}}\quad \hat{I}_{eU} = \frac{{n\hat{I}_{e} - X_{N} }}{n - 1},X_{N} = \frac{{W\left( {1 - \hat{I}_{oN} } \right)}}{{2K\left( {K - 1} \right)}},$$
(44)

and \(\hat{I}_{oN}\) as in expression (40). Note that in this expression \(\hat{I}_{eU}\) has the same form as in expression (18), so that \(\hat{\kappa }_{G2U}\) can be put as a function of \(\hat{\kappa }_{G2}\) in a similar way as in expression (19):

$$\hat{\kappa }_{G2U} = \frac{{\hat{I}_{o} - \hat{I}_{eU} }}{{1 - \hat{I}_{eU} }} = \frac{{\left( {n - 1} \right)\hat{\kappa }_{G2} + Y_{N} }}{{\left( {n - 1} \right) + Y_{N} }}\quad {\text{where}}\quad Y_{N} = \frac{{X_{N} - \hat{I}_{e} }}{{1 - \hat{I}_{e} }}.$$

As in the case R = 2 it occurred that \(\hat{I}_{eU} \left( {r,r^{\prime } } \right)\) ≥ \(\hat{I}_{e} \left( {r,r^{\prime } } \right)\), through expression (23) it is deduced that in the current case \(\hat{I}_{eU}\) ≥ \(\hat{I}_{e}\); therefore \(\hat{\kappa }_{G2U}\) ≤ \(\hat{\kappa }_{G2}\). “Appendix 1” provides a more direct demonstration of the previous statement. To the best of our knowledge, the value of V(\(\hat{\kappa }_{G2}\)) is not known.

4 Examples

Table 1(a) contains the data from a classic example by Fleiss et al. (2003) in which R = 2 raters diagnose n = 100 individuals in K = 3 categories (Psychotic, Neurotic, and Organic). Its part (b) specifies the values of the eight kappa coefficients mentioned in Sect. 2, all of which are calculated for the non-weighted case (wij = δij). It can be observed that the eight coefficients verify the properties mentioned in Sect. 2; for example, all of the new estimators have a value greater than or equal to that of the classic ones, except in the case of the Gwet coefficient, where the opposite happens. Nevertheless, the former differ only slightly from the latter. This is because the current sample size (n = 100) is too large for the differences between the estimators to show. When the sample size is small (n = 8), as occurs in the example of Gwet (2021b, p 109) in Table 2(a) (R = 2, K = 3), the differences are more evident, as shown by the results in Table 2(b).

For the case of more than two raters, Table 3(a) and (b) show the values of Ris and nir, respectively, which are obtained from the data xsr in an example by Gwet (2021b, p 341) related to the change in the coloring of Stickleback fish (R = 4, K = 5, n = 50). Table 3(c) shows the values of the fourteen kappa coefficients mentioned in Sect. 3, all of which are also calculated for the non-weighted case (wij = δij). It can be observed that the fourteen coefficients verify the properties mentioned in Sect. 3. It is also observed that, although the values of n and \(\hat{\kappa }\) are moderate, all of the new coefficients are greater than the classic ones by at least one unit in the second decimal. The exception is the case of the two Gwet coefficients, for which the differences obtained are very small.

5 Simulation

This section has two objectives. The first is to assess the bias of the two estimators of κX (\(\hat{\kappa }_{X}\) and \(\hat{\kappa }_{XU}\)) in the case of R = 2, where X refers to C, S, K or G. The second is to assess the behaviour of the estimator of the variance \(\hat{V}\left( {\hat{\kappa }_{CU} } \right)\), in order to exemplify that the new variances behave coherently in relation to the classic ones.

To assess the two estimators, the procedure is as follows. Let us consider that the observed frequencies in Table 1(a), divided by n = 100, are the true probabilities pij of the problem mentioned, in which R = 2 and K = 3; for example, p11 = 75/100 = 0.75. In that case the value \(\hat{\kappa }_{C}\) = 0.676 of Table 1(b) becomes the population value κC = 0.676 of the Cohen kappa coefficient, since the values \(\hat{I}_{o}\) and \(\hat{I}_{e}\) of \(\hat{\kappa }_{C}\) become the values Io and Ie of κC. If we now extract N = 10,000 random samples from the multinomial distribution of parameters {pij, n = 100}, each sample will provide two estimators \(\hat{\kappa }_{Ch}\) and \(\hat{\kappa }_{CUh}\) of κC. The means \(\overline{\hat{\kappa }}_{C} = \Sigma_{h} \hat{\kappa }_{Ch} /N\) and \(\overline{\hat{\kappa }}_{CU} = \Sigma_{h} \hat{\kappa }_{CUh} /N\) of the values \(\hat{\kappa }_{Ch}\) and \(\hat{\kappa }_{CUh}\) should be approximately equal to κC = 0.676 if the estimators were unbiased. The results of this simulation are provided on the sixteenth line of results in Table 4. The rest of the lines, where other values of K, n, and κC are used, were obtained in a similar way. It can be seen that in general κC = \(\overline{\hat{\kappa }}_{CU}\) ≥ \(\overline{\hat{\kappa }}_{C}\), except in two cases in which κC > \(\overline{\hat{\kappa }}_{C}\) ≥ \(\overline{\hat{\kappa }}_{CU}\). Therefore, \(\hat{\kappa }_{CU}\) is less biased than \(\hat{\kappa }_{C}\) and, for the accuracy used, is generally unbiased. Nevertheless, \(\hat{\kappa }_{C}\) is only unbiased for values n ≥ 50 or 100, depending on the value of K.
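
A minimal sketch of this simulation scheme (hypothetical seed; the probabilities are those described above) could be:

```python
# Minimal sketch: N multinomial samples from {p_ij} and the means of the
# two estimators of kappa_C, to check their bias as described above.
import numpy as np

rng = np.random.default_rng(12345)                       # hypothetical seed
p = np.array([[75, 1, 4], [5, 4, 1], [0, 0, 10]]) / 100  # true p_ij
n, N, K = 100, 10_000, 3
w = np.eye(K)                                            # unweighted case

def kappas(O):
    q = O / n
    Io = (w * q).sum()
    Ie = (w * np.outer(q.sum(axis=1), q.sum(axis=0))).sum()
    k = (Io - Ie) / (1 - Ie)
    return k, n * k / ((n - 1) + k)                      # kappa_C, kappa_CU

samples = rng.multinomial(n, p.ravel(), size=N).reshape(N, K, K)
est = np.array([kappas(O) for O in samples])
print(est.mean(axis=0))   # both means should be close to kappa_C = 0.676
```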

Table 4 Results of the 10,000 simulations performed for the kappa values indicated

The same tables and simulations allow us to obtain the corresponding results for the other three pairs of estimators (see the rest of Table 4). In the case of Scott's pi coefficient, it is also observed that κS = \(\overline{\hat{\kappa }}_{SU}\) ≥ \(\overline{\hat{\kappa }}_{S}\), except in four cases in which κS > \(\overline{\hat{\kappa }}_{SU}\) ≥ \(\overline{\hat{\kappa }}_{S}\), so that \(\hat{\kappa }_{SU}\) is also generally unbiased; additionally, \(\overline{\hat{\kappa }}_{SU}\) = \(\overline{\hat{\kappa }}_{S}\) only for n = 100. The conclusions are a little different in the case of Krippendorf's alpha coefficient; in general it still occurs that κK = \(\overline{\hat{\kappa }}_{KU}\) ≥ \(\overline{\hat{\kappa }}_{K}\), except in five cases in which κK < \(\overline{\hat{\kappa }}_{KU}\) or κK > \(\overline{\hat{\kappa }}_{KU}\), in such a way that \(\hat{\kappa }_{KU}\) may also under- or overestimate κK; now \(\overline{\hat{\kappa }}_{KU}\) = \(\overline{\hat{\kappa }}_{K}\) on some occasions when n ≥ 50. As can be seen, the three previous pairs of coefficients are either unbiased or they underestimate the value of the population parameter. In the case of Gwet's AC1 coefficient, the opposite happens. In general κG = \(\overline{\hat{\kappa }}_{GU}\) ≤ \(\overline{\hat{\kappa }}_{G}\), except in four cases in which κG < \(\overline{\hat{\kappa }}_{GU}\) or κG > \(\overline{\hat{\kappa }}_{GU}\), so that both estimators are either unbiased or they overestimate the value of the population parameter. Now the equality \(\overline{\hat{\kappa }}_{GU}\) = \(\overline{\hat{\kappa }}_{G}\) generally happens when K > 2 and n ≥ 50.

The general conclusion is that the estimators \(\hat{\kappa }_{XU}\) are generally unbiased and, when they are biased, their bias is lower than that of the estimators \(\hat{\kappa }_{X}\). When there is bias, it is positive in the case of the Gwet coefficient and negative in the other three cases.

Let us now consider the case of the variance. The classic estimator \(\hat{\kappa }_{C}\) has an unknown variance \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) which can be estimated quite precisely through the sample variance \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) of the values \(\hat{\kappa }_{Ch}\) of the 10,000 simulations. Moreover, each simulation provides an estimator \(\hat{V}_{h} \left( {\hat{\kappa }_{C} } \right)\) of \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) obtained through the formula of Fleiss et al. (1969); the average value \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) of these 10,000 estimators, compared to \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\), allows us to check the bias of this estimator of the variance. The same reasoning is used in the case of the estimator \(\hat{\kappa }_{CU}\), although now \(\hat{V}_{h} \left( {\hat{\kappa }_{CU} } \right)\) is obtained through expression (5). The results are in Table 5. It can be seen that \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\) ≈ \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) for n ≥ 20, with in general \(V_{E} \left( {\hat{\kappa }_{CU} } \right)\) > (<) \(V_{E} \left( {\hat{\kappa }_{C} } \right)\) when κC = 0.4 (0.8). It is also observed that the classic variance \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) usually underestimates (overestimates) \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) when κC = 0.4 (0.8), the differences being small when n ≥ 50. However, the new variance \(\overline{\hat{V}}\left( {\hat{\kappa }_{CU} } \right)\) almost always underestimates \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\), the differences being small when n ≥ 50, but somewhat higher than in the previous case. In general, \(\overline{\hat{V}}\left( {\hat{\kappa }_{C} } \right)\) is closer to \(\hat{V}_{E} \left( {\hat{\kappa }_{C} } \right)\) than \(\overline{\hat{V}}\left( {\hat{\kappa }_{CU} } \right)\) is to \(\hat{V}_{E} \left( {\hat{\kappa }_{CU} } \right)\).

Table 5 Results of the 10,000 simulations performed for the variances of two estimators of the Cohen kappa coefficient

6 Assessment of the difference between each pair of estimators

The objective of this section is to assess the difference ΔXU = ∣\(\hat{\kappa }_{XU}\) − \(\hat{\kappa }_{X}\)∣, where \(\hat{\kappa }_{X}\) is any of the traditional estimators. In general, these differences are only appreciable with small samples, so it is of interest to determine from what value of n onwards it is practically indifferent whether one calculates \(\hat{\kappa }_{XU}\) or \(\hat{\kappa }_{X}\).

For \(\hat{\kappa }_{CU}\), in which \(\hat{\kappa }_{CU}\) ≥ \(\hat{\kappa }_{C}\), expression (4) gives ΔCU = \(\hat{\kappa }_{C}\)(1 − \(\hat{\kappa }_{C}\))/{(n − 1) + \(\hat{\kappa }_{C}\)}. Its maximum value in \(\hat{\kappa }_{C}\) ≥ 0 is reached at \(\hat{\kappa }_{C}\) = (n − 1)^0.5/{n^0.5 + (n − 1)^0.5} and is {n^0.5 + (n − 1)^0.5}^−2. Therefore, ΔCU < 0.01 (or 0.02) when n > 50 (or 17). The conclusion is also valid for ΔHU and ΔLU = ∣\(\hat{\rho }_{LU} - \hat{\rho }_{L}\)∣, since \(\hat{\kappa }_{HU}\) and \(\hat{\rho }_{LU}\) have the same form as \(\hat{\kappa }_{CU}\).

For \(\hat{\kappa }_{SU}\), in which \(\hat{\kappa }_{SU}\) ≥ \(\hat{\kappa }_{S}\), expression (12) gives ΔSU = (1 − \(\hat{\kappa }_{S}^{2}\))/{(2n − 1) + \(\hat{\kappa }_{S}\)}. Its maximum value in \(\hat{\kappa }_{S}\) ≥ 0 is reached at \(\hat{\kappa }_{S}\) = 0 and is 1/(2n − 1). Therefore, ΔSU < 0.01 (or 0.02) when n > 100 (or 33). The conclusion is also valid for ΔF2U, since \(\hat{\kappa }_{F2U}\) has the same form as \(\hat{\kappa }_{SU}\). The case of \(\hat{\kappa }_{KU}\) for R = 2 (last expression of Eq. (15)) provides a maximum for ΔKU of 1/2n and leads to the same conclusion as above. The conclusion also holds for \(\hat{\kappa }_{KU}\) in R > 2 and for \(\hat{\kappa }_{K2U}\), since they have the same form as \(\hat{\kappa }_{KU}\) for R = 2.
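
The two upper bounds just derived (for ΔCU and ΔSU) are easy to evaluate; a minimal sketch (hypothetical helper names):

```python
# Minimal sketch: upper bounds (over positive agreement) of Delta_CU and
# Delta_SU, as derived above.
import math

def bound_CU(n):
    # max of Delta_CU for kappa-hat_C >= 0: {n^0.5 + (n - 1)^0.5}^-2
    return (math.sqrt(n) + math.sqrt(n - 1)) ** -2

def bound_SU(n):
    # max of Delta_SU for kappa-hat_S >= 0: 1/(2n - 1)
    return 1.0 / (2 * n - 1)

for n in (10, 20, 50, 100):
    print(n, round(bound_CU(n), 4), round(bound_SU(n), 4))
```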

The case of \(\hat{\kappa }_{FU}\), in which \(\hat{\kappa }_{FU}\) ≥ \(\hat{\kappa }_{F}\), is somewhat more complex. Through expression (33), ΔFU = (1 − \(\hat{\kappa }_{F}\)){R − (R − 1)(1 − \(\hat{\kappa }_{F}\))}/{Rn − (R − 1)(1 − \(\hat{\kappa }_{F}\))}. Its maximum value in \(\hat{\kappa }_{F}\) ≥ 0 is reached at \(\hat{\kappa }_{F}\) = {(R − 1)(n − 1)^0.5 − n^0.5}/[(R − 1){n^0.5 + (n − 1)^0.5}] and is {R/(R − 1)} × {n^0.5 + (n − 1)^0.5}^−2. Note that for R = 2 this value is double the one obtained for \(\hat{\kappa }_{CU}\). Therefore, if we require that ΔFU < 0.01 (or 0.02), the value of n depends on the value of R. For example: n > 100 (or 33) for R = 2, n > 75 (or 25) for R = 3, n > 63 (or 21) for R = 5, and n > 56 (or 19) for R = 10. Moreover, ΔFU is a decreasing function of R, taking its extreme values \(\hat{\kappa }_{F}\)(1 − \(\hat{\kappa }_{F}\))/{(n − 1) + \(\hat{\kappa }_{F}\)} at R = ∞ and (1 − \(\hat{\kappa }_{F}^{2}\))/{(2n − 1) + \(\hat{\kappa }_{F}\)} at R = 2. As those expressions have the same form as ΔCU and ΔSU respectively, the precise minimum values of n for this case lie between the pairs of values indicated for those two cases. This is compatible with the numerical results above.

The case of \(\hat{\kappa }_{GU}\), in which \(\hat{\kappa }_{GU}\) ≤ \(\hat{\kappa }_{G}\), is much more complex, since its values ΔGU also depend on \(\hat{I}_{e}\) because of expression (41). In the simplest situation (the unweighted case), it can be demonstrated that ΔGU ≤ {R/(R − 1)} × {m^0.5 + (m − 1)^0.5}^−2, with m = (n − 1)(K − 1), an expression that depends on n, R and K; the bound is also valid for the weighted case, although it is conservative. Therefore, if R = 2 and we require that ΔGU < 0.01 (or 0.02), the value of n depends on the value of K. For example: n > 101 (or 34) for K = 2, n > 51 (or 17) for K = 3, and n > 26 (or 9) for K = 5. The conclusion is also valid for ΔG2U, since \(\hat{\kappa }_{G2U}\) has the same form as \(\hat{\kappa }_{GU}\).

The previous formulas provide values which are compatible with the results of Tables 1, 2, 3 and 4. Excluding the Gwet estimators and adopting the criterion that we want to guarantee that ΔXU < 0.02 (0.01), the overall conclusion is that we should use the new estimators at least when n ≤ 17 (50) in the case of \(\hat{\kappa }_{CU}\) and \(\hat{\kappa }_{HU}\), or when n ≤ 33 (100) in the rest of the cases.

7 Conclusions

There are different types of kappa coefficients which measure the experimental degree of agreement between R raters. In this article, we have focused on Cohen's kappa (Cohen 1960, 1968), Scott's pi (Scott 1955), Gwet's AC1/2 (Gwet 2008) and Krippendorf's alpha (Krippendorf 1970, 2004), whether weighted or not, for R = 2, and on their pairwise-type extensions, Hubert's kappa (Hubert 1977; Conger 1980), Fleiss's kappa (Fleiss 1971), Gwet's AC1/2 and Krippendorf's alpha, for R > 2. In this last case (R > 2), the four measures of agreement use the pairwise method to determine the observed index of agreements Io, but only Hubert's kappa also uses the pairwise method to determine the expected index of agreements Ie. We have called the measures obtained in this last way two-pairwise measures. We have also defined the other three coefficients (Fleiss's kappa, Gwet's AC1/2 and Krippendorf's alpha) from the two-pairwise point of view, thus obtaining the two-pairwise Fleiss's kappa, etc. That is why the number of agreement coefficients defined is eleven.

The article demonstrates that all of the traditional estimators of the eleven coefficients are based on biased estimators of Ie. The alternative is to use the eleven new proposed coefficients, which are based on unbiased estimators of Ie. In all cases, the traditional estimators are smaller than or equal to the new ones, except in the case of Gwet, where it is the other way around. The simulations carried out for the case of R = 2 show that the classic estimators \(\hat{\kappa }_{X}\) usually underestimate κX (or overestimate it, in the case of X = G), while the new estimators \(\hat{\kappa }_{XU}\) are usually approximately unbiased. Additionally, it is verified that the new estimators \(\hat{\kappa }_{XU}\) may be unnecessary when the sample size n is sufficiently large (e.g. n > 30). The article also provides the variances of the new estimators as a function of the variances of the classic estimators, except in the case of the Gwet estimators.

One question of interest is the relation between the coefficients and estimators of Hubert's kappa (Hubert 1977; Conger 1980), the CCC (Lin 1989, 2000), and the ICC (Shrout and Fleiss 1979; Carrasco and Jover 2003), when in the first case quadratic weights are used. In the article it has been justified that: (1) κH = ρL = ρI2, with respect to the coefficients; (2) \(\hat{\kappa }_{H}\) = \(\hat{\rho }_{L}\), with respect to classical estimators based on biased estimators of the components of the coefficients; and (3) \(\hat{\kappa }_{HU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\), with respect to classical (\(\hat{\rho }_{LU}\) and \(\hat{\rho }_{I2}\)) or new (\(\hat{\kappa }_{HU}\)) estimators based on unbiased estimators of all components of the coefficients. These statements are true for R ≥ 2, so that for R = 2 it is obtained that: κC = ρL = ρI2, \(\hat{\kappa }_{C}\) = \(\hat{\rho }_{L}\), and \(\hat{\kappa }_{CU}\) = \(\hat{\rho }_{LU}\) = \(\hat{\rho }_{I2}\).

Finally, the entire article has been developed for the general case in which the measures are defined based on arbitrary wij weights, thus avoiding a repetition of expressions and demonstrations. Nevertheless, the non-weighted case (wij = δij) is very common. To make the text more reader-friendly, “Appendix 4” includes the eleven non-weighted coefficients mentioned in this article.