1 Introduction

In data analysis and classification, similarity coefficients are commonly used to quantify the strength of a relationship between two objects, variables, features or classifications (Goodman and Kruskal 1954; Gower and Warrens 2017). Similarity coefficients may be used to summarize parts of a research study, for example, an agreement or reliability study. They can also be used as input for methods of multivariate analysis such as factor analysis and cluster analysis (Bartholomew et al. 2011; Hennig et al. 2016).

Well-known examples of similarity coefficients are the Pearson correlation, which is a standard tool for assessing linear association between two variables; coefficient alpha (Cronbach 1951; Hoekstra et al. 2019), which is frequently used in classical test theory to estimate the reliability of a test score; the Jaccard coefficient (Jaccard 1912), which is commonly used for assessing co-occurrence of two species types; and the Hubert-Arabie adjusted Rand index (Hubert and Arabie 1985; Steinley et al. 2016), which is a standard tool for measuring agreement between two partitions of the same set of objects.

In the social, behavioral and biomedical sciences, kappa coefficients are commonly used for quantifying agreement between two classifications with identical categories (Vanbelle 2016; Warrens 2014, 2017). Agreement between classifications with nominal categories is usually assessed with Cohen’s kappa (Cohen 1960; Brennan and Prediger 1981; Maclure and Willett 1987; Kundel and Polansky 2003; Hsu and Field 2003; Conger 2017), whereas agreement between classifications with ordinal categories is commonly assessed with weighted kappa coefficients (Cohen 1968; Vanbelle and Albert 2009; Warrens 2011, 2012; Yang and Zhou 2015; Vanbelle 2016; Moradzadeh et al. 2017). These commonly used kappa coefficients have been extended in various directions. Kappa coefficients have been developed for multiple raters (Conger 1980; Warrens 2010), for hierarchical data (Vanbelle et al. 2012; Yang and Zhou 2014, 2015), for fuzzy classifications (Dou et al. 2007; Warrens 2016), for circular classifications (Warrens and Pratiwi 2016), and for situations with missing data (Strijbos and Stahl 2007; De Raadt et al. 2019).

Categories of nominal classifications are mutually exclusive and (usually) collectively exhaustive. There are essentially two types of nominal classifications; the distinction hinges on whether the classification includes an ‘absence’ category. When there is no ‘absence’ category, a classification consists of three, four or more unordered categories of ‘presence’ that specify, for example, various disorders. This type of classification can be compared to a classification that contains an ‘absence’ category (for example, no disorder) and two or more ‘presence’ categories. The first type of classification will simply be referred to as a regular nominal classification. The second type will be called a dichotomous-nominal classification, following terminology used in Cicchetti et al. (1992) for ordinal classifications.

Let us consider two examples of dichotomous-nominal classifications. The first example comes from the diagnosis of movement disorders (Son et al. 2014). Movement disorders are clinical syndromes that cause abnormally increased movements, or reduced or slow movements. Examples of movement disorders are dyskinesia (excessive, often repetitive, involuntary movement), akinesia (lack of voluntary movement) and hypokinesia (reduced amplitude of movement) (Fahn et al. 2011). Table 1 presents hypothetical pairwise classifications of 169 individuals with assumed movement disorder into four categories by two classifiers. The first three categories \(A_1\), \(A_2\) and \(A_3\) correspond to movement disorders. The last category \(A_4\) is the ‘absence’ category. Because the categories of the rows and columns of Table 1 are in the same order, the elements on the main diagonal are the numbers of individuals on which the classifiers agreed. All off-diagonal elements are numbers of individuals on which the classifiers disagreed.

As a second example we consider the diagnosis of personality disorders (Spitzer and Fleiss 1974; Loranger et al. 1997). These are mental disorders characterized by enduring maladaptive patterns of behavior and cognition. Personality disorders are usually grouped into three types: suspicious disorders, emotional and impulsive disorders, and anxious personality disorders. The first type further consists of paranoid, schizoid, schizotypal and antisocial personality disorders. Table 2 presents hypothetical pairwise classifications of 255 individuals with assumed suspicious personality disorder into five categories by two classifiers. The first four categories \(A_1\), \(A_2\), \(A_3\) and \(A_4\) correspond to suspicious personality disorders. The last category \(A_5\) is the ‘absence’ category.

Table 1 Hypothetical pairwise classifications of 169 individuals with assumed movement disorder by two human classifiers
Table 2 Hypothetical pairwise classifications of 255 individuals with assumed suspicious personality disorder by two human classifiers

Cohen’s kappa coefficient (Cohen 1960; Warrens 2011, 2015) can be used for assessing agreement between two regular nominal classifications. If one uses Cohen’s kappa to quantify agreement between the classifications, the distances between all categories are considered equal, which makes sense if all nominal categories reflect different types of ‘presence’. However, there are no coefficients for assessing agreement between two dichotomous-nominal classifications with the same categories. Until now, Cohen’s kappa (and its extensions) has been used to analyze agreement between such classifications.

However, disagreement between classifiers on a ‘presence’ category and the ‘absence’ category may be much more serious than disagreement on two ‘presence’ categories, for example, for clinical treatment. The crucial clinical implication is that, in the quantification of agreement, distances between a ‘presence’ category and the ‘absence’ category should be treated differently from distances between two ‘presence’ categories. Cohen’s kappa does not accomplish this. In this manuscript we therefore develop new kappa coefficients for assessing agreement between dichotomous-nominal classifications. In addition, we present various properties of the coefficients.

The manuscript is organized as follows. In Sect. 2 we introduce the notation and present several definitions. A family of kappa coefficients for dichotomous-nominal classifications with identical categories is defined in Sect. 3. In Sect. 4 we present various properties of the coefficients. Among other things, it is shown that the values of the new kappa coefficients can be ordered in two ways. One ordering is more likely to occur in practice; the second ordering is the reverse of the first. In Sect. 5 it is shown that the values of the new kappa coefficients increase with the number of categories for a class of agreement tables with constant values of observed agreement and disagreement. A discussion and several recommendations are presented in Sect. 6.

2 Notation and weighted kappa

Suppose that two fixed classifiers (for example, expert observers, algorithms, rating instruments) have independently classified the same set of n objects (for example, individuals, scans, products) into \(c\ge 2\) unordered categories \(A_1,A_2,\ldots ,A_c\) that were defined in advance. We assume that the first \(c-1\) categories, labeled \(A_1,A_2,\ldots ,A_{c-1}\), are the ‘presence’ categories, and that the last category, labeled \(A_c\), denotes the ‘absence’ category.

For a population of objects, let \(\pi _{ij}\) denote the proportion of objects that is classified into category \(A_i\) by the first classifier and into category \(A_j\) by the second classifier, where \(i,j\in \left\{ 1,2,\ldots ,c\right\} \). We assume that the categories of the rows and columns of the table \(\left\{ \pi _{ij}\right\} \) are in the same order, so that the diagonal elements \(\pi _{ii}\) reflect the exact agreement between the two classifiers. In the context of agreement studies the table \(\left\{ \pi _{ij}\right\} \) is sometimes called an agreement table. The table \(\left\{ \pi _{ij}\right\} \) summarizes the pairwise information between the two nominal classifications (by classifiers 1 and 2). Furthermore, table \(\left\{ \pi _{ij}\right\} \) contains all information needed to define and calculate kappa coefficients.

Define the marginal totals

$$\begin{aligned} \pi _{i+}:=\sum ^c_{j=1}\pi _{ij}\quad \text{ and }\quad \pi _{+i}:=\sum ^c_{j=1}\pi _{ji}. \end{aligned}$$
(1)

The marginal probabilities \(\pi _{i+}\) and \(\pi _{+i}\) reflect how often the categories were used by the first and second classifier, respectively. Furthermore, if the ratings of the two classifiers are statistically independent, the expected value of \(\pi _{ij}\) is given by \(\pi _{i+}\pi _{+j}\). The table \(\left\{ \pi _{i+}\pi _{+j}\right\} \) contains the expected values of the elements of table \(\left\{ \pi _{ij}\right\} \) under statistical independence of the classifiers.
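To make the notation concrete, the following minimal sketch in Python with NumPy computes the marginal totals in (1) and the expected table under statistical independence. The table of frequencies and all variable names are our own illustration; they do not correspond to Tables 1 and 2.

```python
import numpy as np

# Hypothetical table of observed frequencies {n_ij}; rows are the
# categories used by classifier 1, columns those used by classifier 2.
n = np.array([[40,  5,  3,  2],
              [ 4, 35,  6,  3],
              [ 2,  5, 30,  4],
              [ 3,  2,  5, 20]])

pi = n / n.sum()               # estimated cell proportions {pi_ij}
row = pi.sum(axis=1)           # marginal totals pi_{i+} of Eq. (1)
col = pi.sum(axis=0)           # marginal totals pi_{+i} of Eq. (1)
expected = np.outer(row, col)  # table {pi_{i+} pi_{+j}} under independence
```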

In the next section we define kappa coefficients for dichotomous-nominal classifications as special cases of weighted kappa (Cohen 1968). Weighted kappa allows the user to describe the closeness between categories using weights (Vanbelle and Albert 2009; Warrens 2011, 2012; Yang and Zhou 2015; Vanbelle 2016; Moradzadeh et al. 2017). Let the real number \(0\le w_{ij}\le 1\) denote the weight corresponding to cell (ij) of tables \(\left\{ \pi _{ij}\right\} \) and \(\left\{ \pi _{i+}\pi _{+j}\right\} \). The weighted kappa coefficient is defined as (Warrens 2011)

$$\begin{aligned} \kappa :=\frac{O-E}{1-E}, \end{aligned}$$
(2)

where

$$\begin{aligned} O=\sum ^c_{i=1}\sum ^c_{j=1}w_{ij}\pi _{ij},\quad \text{ and }\quad E=\sum ^c_{i=1}\sum ^c_{j=1}w_{ij}\pi _{i+}\pi _{+j}. \end{aligned}$$
(3)

The cell probabilities of the table \(\left\{ \pi _{ij}\right\} \) are not directly observed. Let table \(\left\{ n_{ij}\right\} \) denote the contingency table of observed frequencies. Tables 1 and 2 are two examples of \(\left\{ n_{ij}\right\} \). Assuming a multinomial sampling model with the total number of objects n fixed, the maximum likelihood estimate of \(\pi _{ij}\) is given by \(\hat{\pi }_{ij}=n_{ij}/n\) (Yang and Zhou 2014, 2015). Furthermore, under the multinomial sampling model, the maximum likelihood estimate of \(\kappa \) is

$$\begin{aligned} \hat{\kappa }=\frac{\hat{O}-\hat{E}}{1-\hat{E}}, \end{aligned}$$
(4)

where

$$\begin{aligned} \hat{O}=\sum ^c_{i=1}\sum ^c_{j=1}w_{ij}\frac{n_{ij}}{n},\quad \text{ and }\quad \hat{E}=\sum ^c_{i=1}\sum ^c_{j=1}w_{ij}\frac{n_{i+}n_{+j}}{n^2}. \end{aligned}$$
(5)

The estimates in (4) and (5) are obtained by substituting \(\hat{\pi }_{ij}=n_{ij}/n\) for the cell probabilities \(\pi _{ij}\) in (2) and (3), respectively.
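A minimal sketch of the estimates in (4) and (5), assuming an arbitrary table of observed frequencies and a user-supplied weight matrix (the function name weighted_kappa is ours):

```python
import numpy as np

def weighted_kappa(n, w):
    """ML estimate of weighted kappa, Eqs. (4) and (5).

    n : (c, c) array of observed frequencies {n_ij}
    w : (c, c) array of weights, with 0 <= w_ij <= 1
    """
    n = np.asarray(n, dtype=float)
    w = np.asarray(w, dtype=float)
    p = n / n.sum()                     # estimates n_ij / n of pi_ij
    row, col = p.sum(axis=1), p.sum(axis=0)
    O = np.sum(w * p)                   # \hat{O} of Eq. (5)
    E = np.sum(w * np.outer(row, col))  # \hat{E} of Eq. (5)
    return (O - E) / (1.0 - E)          # Eq. (4)

# With identity weights (w_ij = 1 if i = j, else 0) this reduces to
# Cohen's unweighted kappa: weighted_kappa(n, np.eye(len(n))).
```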

Next, we define several quantities for notational convenience. Consider the table \(\left\{ \pi _{ij}\right\} \) of cell probabilities, and define the quantities

$$\begin{aligned} \lambda _0&:=\sum ^c_{i=1}\pi _{ii}, \end{aligned}$$
(6a)
$$\begin{aligned} \lambda _1&:=\sum ^{c-2}_{i=1}\sum ^{c-1}_{j=i+1}(\pi _{ij}+\pi _{ji}), \end{aligned}$$
(6b)
$$\begin{aligned} \lambda _2&:=1-\lambda _0-\lambda _1. \end{aligned}$$
(6c)

Quantity \(\lambda _0\) is the total observed agreement, the proportion of objects that have been classified into the same categories by both classifiers. Furthermore, quantity \(\lambda _1\) is the proportion of observed disagreement between the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\). Moreover, quantity \(\lambda _2\) is the proportion of observed disagreement between ‘absence’ category \(A_c\) on the one hand, and the ‘presence’ categories on the other hand.

Next, consider the table \(\left\{ \pi _{i+}\pi _{+j}\right\} \), and define the quantities

$$\begin{aligned} \mu _0&:=\sum ^c_{i=1}\pi _{i+}\pi _{+i}, \end{aligned}$$
(7a)
$$\begin{aligned} \mu _1&:=\sum ^{c-2}_{i=1}\sum ^{c-1}_{j=i+1}(\pi _{i+}\pi _{+j}+\pi _{j+}\pi _{+i}), \end{aligned}$$
(7b)
$$\begin{aligned} \mu _2&:=1-\mu _0-\mu _1. \end{aligned}$$
(7c)

Quantities \(\mu _0\), \(\mu _1\) and \(\mu _2\) are the expected values of quantities \(\lambda _0\), \(\lambda _1\) and \(\lambda _2\), respectively, under statistical independence of the classifiers.
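For illustration, the quantities in (6) and (7) can be computed directly from the table of cell proportions. The following sketch (function name ours) assumes, as in our notation, that the ‘absence’ category is the last row and column:

```python
import numpy as np

def lambda_mu(p):
    """Quantities (6a)-(6c) and (7a)-(7c) for a table of cell
    proportions p; the 'absence' category is assumed to be last."""
    c = p.shape[0]
    e = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected table
    pres = np.arange(c - 1)                     # 'presence' indices
    lam0 = np.trace(p)                                        # (6a)
    lam1 = p[np.ix_(pres, pres)].sum() - p[pres, pres].sum()  # (6b)
    lam2 = 1.0 - lam0 - lam1                                  # (6c)
    mu0 = np.trace(e)                                         # (7a)
    mu1 = e[np.ix_(pres, pres)].sum() - e[pres, pres].sum()   # (7b)
    mu2 = 1.0 - mu0 - mu1                                     # (7c)
    return (lam0, lam1, lam2), (mu0, mu1, mu2)
```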

3 New kappa coefficients

In this section we define a family of kappa coefficients that can be used for quantifying agreement between two dichotomous-nominal classifications with the same categories. The kappas differ only in the value of a single parameter. To model the agreement and disagreement between the categories we use three different numbers. As usual with kappa coefficients, we give full weight 1 to the entries on the main diagonal of \(\left\{ \pi _{ij}\right\} \) (Cohen 1968; Warrens 2012, 2013). Furthermore, let \(u\in [0,1]\) be a real number. We give a partial weight u to model the disagreement among the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\). All other weights are set to zero: the weight 0 is used to model the disagreement between the ‘presence’ categories and the single ‘absence’ category \(A_c\). The weighting scheme is then given by

$$\begin{aligned} w_{ij}:={\left\{ \begin{array}{ll} 1,&{}\text{ if } i=j;\\ u,&{}\text{ if } i,j\in \left\{ 1,2,\ldots ,c-1\right\} \;\text{ with }\;i\ne j;\\ 0,&{}\text{ otherwise }. \end{array}\right. } \end{aligned}$$
(8)

The weighting scheme in (8) makes sense if we expect that disagreement between the classifiers on the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\) is similar for all pairs of categories, and if disagreement among \(A_1,\ldots ,A_{c-1}\) is less serious than between \(A_1,\ldots ,A_{c-1}\) on the one hand and \(A_c\) on the other hand.

Using the quantities in (6) the weighted observed agreement with parameter u is defined as

$$\begin{aligned} O_u:=\lambda _0+u\lambda _1. \end{aligned}$$
(9)

Furthermore, using the quantities in (7) the expected value of (9) under statistical independence is given by

$$\begin{aligned} E_u:=\mu _0+u\mu _1. \end{aligned}$$
(10)

By using higher values of u in (9) and (10) more weight is given to the disagreement among categories \(A_1,\ldots ,A_{c-1}\).

Using (9) and (10) a family of kappas with parameter u can be defined as

$$\begin{aligned} \kappa _u:=\frac{O_u-E_u}{1-E_u}=\frac{\lambda _0+u\lambda _1-\mu _0-u\mu _1}{1-\mu _0-u\mu _1}. \end{aligned}$$
(11)

The value of (11) is equal to 1 if there is perfect agreement between the classifiers (i.e. \(\lambda _0=1\)), and 0 when \(\lambda _0+u\lambda _1=\mu _0+u\mu _1\). Formula (11) is also obtained if one uses weighting scheme (8) in the general formula (2). Under the multinomial sampling model, the maximum likelihood estimate of \(\kappa _u\) is, using (4) and (5),

$$\begin{aligned} \hat{\kappa }_u=\frac{\hat{O}_u-\hat{E}_u}{1-\hat{E}_u}, \end{aligned}$$
(12)

where

$$\begin{aligned} \hat{O}_u=\sum ^c_{i=1}\frac{n_{ii}}{n}+u\sum ^{c-2}_{i=1}\sum ^{c-1}_{j=i+1} \frac{n_{ij}+n_{ji}}{n}, \end{aligned}$$
(13)

and

$$\begin{aligned} \hat{E}_u=\sum ^c_{i=1}\frac{n_{i+}n_{+i}}{n^2}+u\sum ^{c-2}_{i=1} \sum ^{c-1}_{j=i+1}\frac{n_{i+}n_{+j}+n_{j+}n_{+i}}{n^2}. \end{aligned}$$
(14)
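For illustration, the estimate in (12) can be computed by constructing the weight matrix (8) and applying (13) and (14). The following sketch does exactly this; the function name is ours and the ‘absence’ category is assumed to be last:

```python
import numpy as np

def kappa_u_hat(n, u):
    """Estimate (12) of kappa_u, using weighting scheme (8);
    the 'absence' category is assumed to be the last one."""
    n = np.asarray(n, dtype=float)
    c = n.shape[0]
    w = np.full((c, c), float(u))  # weight u among 'presence' categories
    w[c - 1, :] = 0.0              # zero weight for cells involving the
    w[:, c - 1] = 0.0              # 'absence' category ...
    np.fill_diagonal(w, 1.0)       # ... and full weight on the diagonal
    p = n / n.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)
    O = np.sum(w * p)                    # \hat{O}_u, Eq. (13)
    E = np.sum(w * np.outer(row, col))   # \hat{E}_u, Eq. (14)
    return (O - E) / (1.0 - E)

# kappa_u_hat(n, 0.0) gives Cohen's kappa; kappa_u_hat(n, 1.0) gives
# the 'absence' category kappa (the special cases discussed below).
```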

A large sample variance estimator for (12) is given by (Fleiss et al. 1969; Yang and Zhou 2015)

$$\begin{aligned} \hat{\text{ var }}(\hat{\kappa }_u)=\frac{1}{n(1-\hat{E}_u)^4} \left[ M-\left( \hat{O}_u\hat{E}_u-2\hat{E}_u+\hat{O}_u\right) ^2\right] , \end{aligned}$$
(15)

where the quantity M is given by

$$\begin{aligned} M=\sum ^c_{i=1}\sum ^c_{j=1}\frac{n_{ij}}{n}\left[ w_{ij}\left( 1-\hat{E}_u \right) -(\bar{w}_{i+}+\bar{w}_{+j})\left( 1-\hat{O}_u\right) \right] ^2, \end{aligned}$$
(16)

and quantities \(\bar{w}_{i+}\) and \(\bar{w}_{+j}\) are given by

$$\begin{aligned} \bar{w}_{i+}=\sum ^c_{j=1}w_{ij}\frac{n_{+j}}{n},\quad \text{ and }\quad \bar{w}_{+j}=\sum ^c_{i=1}w_{ij}\frac{n_{i+}}{n}. \end{aligned}$$
(17)

Formula (15), together with (16) and (17), will be used to compute 95% confidence intervals around the point estimate \(\hat{\kappa }_u\) (see Table 3 below).
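A sketch of the interval estimation, combining (12) with the variance estimator in (15)–(17); it uses the normal approximation with critical value 1.96 for a 95% confidence interval, and again assumes the ‘absence’ category is last (function name ours):

```python
import numpy as np

def kappa_u_ci(n, u, z=1.96):
    """Point estimate (12) with variance (15)-(17) and a 95% CI."""
    n = np.asarray(n, dtype=float)
    c, N = n.shape[0], n.sum()
    w = np.full((c, c), float(u))  # weighting scheme (8)
    w[c - 1, :] = 0.0
    w[:, c - 1] = 0.0
    np.fill_diagonal(w, 1.0)
    p = n / N
    row, col = p.sum(axis=1), p.sum(axis=0)
    O = np.sum(w * p)                        # \hat{O}_u, Eq. (13)
    E = np.sum(w * np.outer(row, col))       # \hat{E}_u, Eq. (14)
    k = (O - E) / (1.0 - E)                  # \hat{kappa}_u, Eq. (12)
    wr = w @ col                             # \bar{w}_{i+}, Eq. (17)
    wc = w.T @ row                           # \bar{w}_{+j}, Eq. (17)
    inner = w * (1.0 - E) - (wr[:, None] + wc[None, :]) * (1.0 - O)
    M = np.sum(p * inner**2)                 # Eq. (16)
    var = (M - (O*E - 2.0*E + O)**2) / (N * (1.0 - E)**4)  # Eq. (15)
    se = np.sqrt(var)
    return k, (k - z*se, k + z*se)
```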

Let us consider two special cases of (11). For \(u=0\) we obtain

$$\begin{aligned} \kappa _0=\frac{\lambda _0-\mu _0}{1-\mu _0}=\frac{\sum \nolimits ^c_{i=1}(\pi _{ii}-\pi _{i+}\pi _{+i})}{1-\sum \nolimits ^c_{i=1}\pi _{i+}\pi _{+i}}. \end{aligned}$$
(18)

The coefficient in (18) is Cohen’s ordinary kappa (Cohen 1960; Yang and Zhou 2014; Warrens 2011, 2015), a standard tool for assessing agreement in the case of regular nominal classifications. The value of (18) is equal to 1 when there is perfect agreement between the classifiers (i.e. \(\lambda _0=1\)), 0 when the observed agreement is equal to that expected under independence (i.e. \(\lambda _0=\mu _0\)), and negative when agreement is less than expected by chance.

Table 3 Point and interval estimates of various versions of the coefficient in (12) for the data in Tables 1 and 2

For \(u=1\) we obtain

$$\begin{aligned} \kappa _1=\frac{\lambda _0+\lambda _1-\mu _0-\mu _1}{1-\mu _0-\mu _1}. \end{aligned}$$
(19)

At first sight it is unclear how the kappa coefficient in (19) may be interpreted. An interpretation of coefficient (19) is presented in Theorem 2 in the next section. Table 3 presents point and interval estimates of (12) for the data in Tables 1 and 2, and for five values of parameter u. The values in Table 3 illustrate that the value of the coefficient in (12) increases in the parameter u for the data in Tables 1 and 2. This property is formally proved in Theorem 6 in the next section.

4 Properties of the kappa coefficients

In this section several relationships between the new kappa coefficients for dichotomous-nominal classifications are presented.

Theorem 1 shows that if the classifiers do not use the ‘absence’ category, then the kappa coefficient in (11) is identical to Cohen’s ordinary kappa (Cohen 1960). This property makes a lot of sense, since if the ‘absence’ category is not used, dichotomous-nominal classifications are de facto regular nominal classifications, and Cohen’s kappa is a standard tool for quantifying agreement between regular nominal classifications with identical categories.

Theorem 1

If ‘absence’ category \(A_c\) is not used by the classifiers, then \(\kappa _u=\kappa _0\).

Proof

If only ‘presence’ categories are used we have \(\pi _{c+}=\pi _{+c}=0\). In this case we have \(\lambda _2=0\) and \(\mu _2=0\), and thus the identities \(\lambda _1=1-\lambda _0\) and \(\mu _1=1-\mu _0\). Using these identities in (11) we obtain

$$\begin{aligned} \kappa _u=\frac{\lambda _0+u(1-\lambda _0)-\mu _0-u(1-\mu _0)}{1-\mu _0-u(1-\mu _0)}=\frac{(1-u)\lambda _0-(1-u)\mu _0}{1-u-(1-u)\mu _0}. \end{aligned}$$
(20)

Dividing the numerator and the denominator on the right-hand side of (20) by \((1-u)\), which is nonzero for \(u<1\), yields the coefficient in (18). \(\square \)

Theorem 2 shows that the kappa coefficient in (19) is identical to the coefficient that is obtained if we combine all the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\) into a single ‘presence’ category, and then calculate Cohen’s ordinary kappa for the collapsed \(2\times 2\) table.

Theorem 2

Coefficient \(\kappa _1\) is obtained if we combine the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\), and then calculate coefficient (18) for the collapsed \(2\times 2\) table.

Proof

Let \(\lambda ^*_0\), \(\mu ^*_0\) and \(\kappa ^*_0\) denote, respectively, the values of \(\lambda _0\), \(\mu _0\) and \(\kappa _0\) for the collapsed \(2\times 2\) table. If we combine categories \(A_1,\ldots ,A_{c-1}\) we have \(\lambda ^*_0=\lambda _0+\lambda _1\) and \(\mu ^*_0=\mu _0+\mu _1\). Hence, the coefficient in (18) for the collapsed \(2\times 2\) table is equal to

$$\begin{aligned} \kappa ^*_0=\frac{\lambda ^*_0-\mu ^*_0}{1-\mu ^*_0} =\frac{\lambda _0+\lambda _1-\mu _0-\mu _1}{1-\mu _0-\mu _1}, \end{aligned}$$
(21)

which is equivalent to the coefficient in (19). \(\square \)

Theorem 2 provides several ways to interpret coefficient \(\kappa _1\) in (19). First of all, the coefficient may be interpreted as a ‘presence’ versus ‘absence’ kappa coefficient. Furthermore, the procedure of combining all other categories (in this case all ‘presence’ categories) except a category of interest (in this case the ‘absence’ category), followed by calculating Cohen’s ordinary kappa for the collapsed \(2\times 2\) table, defines a category kappa for the category of interest (in this case the ‘absence’ category) (Kraemer 1979; Warrens 2011, 2015). Category kappas can be used to quantify agreement between the classifiers for individual categories. Hence, coefficient \(\kappa _1\) in (19) is the kappa coefficient for the ‘absence’ category.

Theorem 3 presents an alternative formula for coefficient \(\kappa _1\). It turns out that only three numbers are needed to calculate this coefficient, regardless of the total number of categories, namely the values of \(\pi _{cc}\), \(\pi _{c+}\) and \(\pi _{+c}\).

Theorem 3

Coefficient \(\kappa _1\) can be calculated using

$$\begin{aligned} \kappa _1=\frac{\pi _{cc}-\pi _{c+}\pi _{+c}}{\dfrac{\pi _{c+}+\pi _{+c}}{2}-\pi _{c+}\pi _{+c}}. \end{aligned}$$
(22)

Proof

Using identities (6c) and (7c), the coefficient in (19) can be expressed as

$$\begin{aligned} \kappa _1=\frac{1-\lambda _2-(1-\mu _2)}{1-(1-\mu _2)} =\frac{\mu _2-\lambda _2}{\mu _2}. \end{aligned}$$
(23)

Using the identities \(\lambda _2=\pi _{c+}+\pi _{+c}-2\pi _{cc}\) and \(\mu _2=\pi _{c+}(1-\pi _{+c})+\pi _{+c}(1-\pi _{c+})\) in (23) yields

$$\begin{aligned} \kappa _1=\frac{2(\pi _{cc}-\pi _{c+}\pi _{+c})}{\pi _{c+}+\pi _{+c}-2\pi _{c+}\pi _{+c}}. \end{aligned}$$
(24)

Dividing the numerator and the denominator on the right-hand side of (24) by 2, we get the expression in (22). \(\square \)
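Theorems 2 and 3 are easy to verify numerically. The following sketch, using the hypothetical table from the earlier sketches (repeated here to keep the snippet self-contained), computes \(\kappa _1\) both by collapsing the ‘presence’ categories (Theorem 2) and by the closed form (22) (Theorem 3):

```python
import numpy as np

# Hypothetical table of observed frequencies ('absence' category last).
n = np.array([[40,  5,  3,  2],
              [ 4, 35,  6,  3],
              [ 2,  5, 30,  4],
              [ 3,  2,  5, 20]], dtype=float)
p = n / n.sum()
c = p.shape[0]

# Theorem 2: combine the 'presence' categories and compute Cohen's
# kappa, Eq. (18), for the collapsed 2x2 table.
q = np.array([[p[:c-1, :c-1].sum(), p[:c-1, c-1].sum()],
              [p[c-1, :c-1].sum(),  p[c-1, c-1]]])
lam0 = np.trace(q)
mu0 = np.trace(np.outer(q.sum(axis=1), q.sum(axis=0)))
kappa1_collapsed = (lam0 - mu0) / (1.0 - mu0)

# Theorem 3: the closed form (22) needs only pi_cc, pi_{c+} and pi_{+c}.
rc, cc = p[c-1, :].sum(), p[:, c-1].sum()
kappa1_closed = (p[c-1, c-1] - rc*cc) / ((rc + cc)/2.0 - rc*cc)

assert np.isclose(kappa1_collapsed, kappa1_closed)
```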

Theorem 4 shows that all special cases of (11) coincide when there are \(c=2\) categories.

Theorem 4

If \(c=2\), then \(\kappa _u=\kappa _0\).

Proof

If \(A_1\) is the only ‘presence’ category and \(A_2\) is the ‘absence’ category, there is no disagreement between the classifiers on ‘presence’ categories, that is, \(\lambda _1=0\) and \(\mu _1=0\). Using \(\lambda _1=0\) and \(\mu _1=0\) in (11) we obtain

$$\begin{aligned} \kappa _u=\frac{\lambda _0-\mu _0}{1-\mu _0}, \end{aligned}$$
(25)

which is the coefficient in (18). \(\square \)

Since all special cases coincide when \(c=2\) (Theorem 4), we assume from here on that \(c\ge 3\).

Theorem 5 states that the kappa coefficient in (11) is a weighted average of the kappa coefficients in (18) and (19). The proof of Theorem 5 follows from simplifying the expression on the right-hand side of (26).

Theorem 5

Coefficient \(\kappa _u\) is a weighted average of \(\kappa _0\) and \(\kappa _1\) using, respectively, \((1-u)(1-\mu _0)\) and \(u(1-\mu _0-\mu _1)\) as weights:

$$\begin{aligned} \kappa _u=\frac{(1-u)(1-\mu _0)\kappa _0+u(1-\mu _0-\mu _1) \kappa _1}{(1-u)(1-\mu _0)+u(1-\mu _0-\mu _1)}. \end{aligned}$$
(26)

Since \(\kappa _u\) is a weighted average of \(\kappa _0\) and \(\kappa _1\) (Theorem 5) all values of \(\kappa _u\) for \(u\in (0,1)\) are between \(\kappa _0\) and \(\kappa _1\) when \(\kappa _0\ne \kappa _1\). Coefficients \(\kappa _0\) and \(\kappa _1\) are the minimum and maximum values of \(\kappa _u\) on \(u\in [0,1]\). For example, consider the numbers in Table 3. For both Tables 1 and 2 coefficient \(\kappa _0\) is the minimum and \(\kappa _1\) is the maximum value.
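The weighted-average identity (26) can likewise be checked numerically. The sketch below, again on the same hypothetical table, verifies (26) for a few values of u:

```python
import numpy as np

# Numerical check of Theorem 5 on a hypothetical table.
n = np.array([[40,  5,  3,  2],
              [ 4, 35,  6,  3],
              [ 2,  5, 30,  4],
              [ 3,  2,  5, 20]], dtype=float)
p = n / n.sum()
c = p.shape[0]
e = np.outer(p.sum(axis=1), p.sum(axis=0))
pres = np.arange(c - 1)
lam0, mu0 = np.trace(p), np.trace(e)
lam1 = p[np.ix_(pres, pres)].sum() - p[pres, pres].sum()
mu1 = e[np.ix_(pres, pres)].sum() - e[pres, pres].sum()

k0 = (lam0 - mu0) / (1.0 - mu0)                     # Eq. (18)
k1 = (lam0 + lam1 - mu0 - mu1) / (1.0 - mu0 - mu1)  # Eq. (19)
for u in (0.2, 0.5, 0.8):
    ku = (lam0 + u*lam1 - mu0 - u*mu1) / (1.0 - mu0 - u*mu1)  # Eq. (11)
    w0, w1 = (1.0 - u)*(1.0 - mu0), u*(1.0 - mu0 - mu1)       # weights of (26)
    assert np.isclose(ku, (w0*k0 + w1*k1) / (w0 + w1))
```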

Theorem 6 shows that there exist precisely two orderings of the kappa coefficients for dichotomous-nominal classifications, as long as \(\kappa _0\ne \kappa _1\). If we have \(\kappa _0=\kappa _1\) the value of \(\kappa _u\) does not depend on u, or in other words, the values of all kappa coefficients for dichotomous-nominal classifications coincide.

Theorem 6

If \(\kappa _0<\kappa _1\), then \(\kappa _u\) is strictly increasing and concave upward on \(u\in [0,1]\). Conversely, if \(\kappa _0>\kappa _1\), then \(\kappa _u\) is strictly decreasing and concave downward on \(u\in [0,1]\).

Proof

The first derivative of (26) with respect to u is given by

$$\begin{aligned} \frac{d\kappa _u}{du}=\frac{(\kappa _1-\kappa _0)(1-\mu _0)(1-\mu _0-\mu _1)}{[(1-u)(1-\mu _0)+u(1-\mu _0-\mu _1)]^2}, \end{aligned}$$
(27)

and the second derivative of (26) with respect to u is given by

$$\begin{aligned} \frac{d^2\kappa _u}{du^2}=\frac{2\mu _1(\kappa _1-\kappa _0)(1-\mu _0) (1-\mu _0-\mu _1)}{[(1-u)(1-\mu _0)+u(1-\mu _0-\mu _1)]^3}. \end{aligned}$$
(28)

Since the quantities \((1-\mu _0)\) and \((1-\mu _0-\mu _1)\) in the numerators of (27) and (28), together with the denominators of (27) and (28), are strictly positive, and since \(\mu _1>0\) whenever each classifier uses at least two ‘presence’ categories, (27) and (28) are strictly positive if \(\kappa _0<\kappa _1\). Since (27) is strictly positive if \(\kappa _0<\kappa _1\), (26) and (11) are strictly increasing on \(u\in [0,1]\). Furthermore, since (28) is strictly positive if \(\kappa _0<\kappa _1\), (26) and (11) are concave upward on \(u\in [0,1]\). \(\square \)

The properties presented in Theorem 6 can be illustrated with the numbers in Table 3. For both Tables 1 and 2 the values of the new coefficients are strictly increasing from \(\kappa _0\) to \(\kappa _1\). Furthermore, the coefficient values near \(u=0\) (i.e. near \(\kappa _0\)) are closer together than the coefficient values near \(u=1\) (i.e. near \(\kappa _1\)). The latter illustrates the concave upward property.

Theorem 7 presents a condition that is equivalent to the inequality \(\kappa _0<\kappa _1\). The latter inequality holds if the ratio of observed to expected disagreement between the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\) exceeds the ratio of observed to expected disagreement between ‘absence’ category \(A_c\) on the one hand and the ‘presence’ categories on the other hand (condition ii. of Theorem 7).

Theorem 7

The following conditions are equivalent.

i. \(\kappa _0<\kappa _1\);

ii. \(\dfrac{\lambda _1}{\mu _1}>\dfrac{\lambda _2}{\mu _2}\).

Proof

Using identities (6c) and (7c) we have

$$\begin{aligned} \kappa _0=\frac{\mu _1+\mu _2-\lambda _1-\lambda _2}{\mu _1+\mu _2}=1-\frac{\lambda _1+\lambda _2}{\mu _1+\mu _2} \end{aligned}$$
(29)

and

$$\begin{aligned} \kappa _1=\frac{\mu _2-\lambda _2}{\mu _2}=1-\frac{\lambda _2}{\mu _2}. \end{aligned}$$
(30)

Hence, condition i. (inequality \(\kappa _0<\kappa _1\)) is equivalent to

$$\begin{aligned} \frac{\lambda _1+\lambda _2}{\mu _1+\mu _2}>\frac{\lambda _2}{\mu _2}. \end{aligned}$$
(31)

Condition ii. is then obtained by cross-multiplying the terms of (31), canceling the terms that appear on both sides of the inequality, and rearranging the remaining terms. \(\square \)

Theorems 6 and 7 show that if one of the two conditions of Theorem 7 holds then all special cases of (11) are strictly ordered. Moreover, the kappa coefficients can be ordered in precisely two ways. Furthermore, Theorem 7 also provides a condition under which all the new kappa coefficients obtain the same value, which can be empirically checked:

$$\begin{aligned} \frac{\lambda _1}{\mu _1}=\frac{\lambda _2}{\mu _2}. \end{aligned}$$
(32)

If (32) holds, we have \(\kappa _0=\kappa _1\) and all the new kappa coefficients produce the same value.
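Condition ii. of Theorem 7 and condition (32) are straightforward to check empirically. A sketch on the same hypothetical table as before:

```python
import numpy as np

# Checking condition ii. of Theorem 7 and condition (32)
# ('absence' category last).
n = np.array([[40,  5,  3,  2],
              [ 4, 35,  6,  3],
              [ 2,  5, 30,  4],
              [ 3,  2,  5, 20]], dtype=float)
p = n / n.sum()
c = p.shape[0]
e = np.outer(p.sum(axis=1), p.sum(axis=0))
pres = np.arange(c - 1)
lam1 = p[np.ix_(pres, pres)].sum() - p[pres, pres].sum()
mu1 = e[np.ix_(pres, pres)].sum() - e[pres, pres].sum()
lam2 = 1.0 - np.trace(p) - lam1
mu2 = 1.0 - np.trace(e) - mu1

if np.isclose(lam1/mu1, lam2/mu2):
    print("condition (32): all kappa_u coincide")
elif lam1/mu1 > lam2/mu2:
    print("kappa_u is strictly increasing in u (kappa_0 < kappa_1)")
else:
    print("kappa_u is strictly decreasing in u (kappa_0 > kappa_1)")
```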

5 Dependence on the number of categories

In this section, a possible dependence of the new kappa coefficients on the number of categories is studied. In Theorem 8, it is assumed that the data can be described by the specific structure presented in (33). Theorem 8 presents an example of a class of agreement tables for which all kappa coefficients for dichotomous-nominal classifications are increasing in the number of categories c. The agreement tables in this class vary in size (i.e. have different numbers of categories), but they have the same observed agreement, the same observed disagreement between the ‘presence’ categories \(A_1,\ldots ,A_{c-1}\), and the same observed disagreement between ‘absence’ category \(A_c\) on the one hand and the ‘presence’ categories on the other hand. The specific values of the proportions of observed agreement and disagreement (denoted by \(b_0\), \(b_1\) and \(b_2\)) are, however, not fixed.

Theorem 8

Let \(c\ge 3\) and let \(0\le b_0,b_1,b_2\le 1\) with \(b_0+b_1+b_2=1\). Furthermore, let the elements of \(\left\{ \pi _{ij}\right\} \) be given by

$$\begin{aligned} \pi _{ij}:={\left\{ \begin{array}{ll} b_0/c,&{}\text{ for } i=j;\\ b_1/[(c-1)(c-2)],&{}\text{ for } i,j\in \left\{ 1,2,\ldots ,c-1\right\} \;\text{ with }\;i\ne j;\\ b_2/[2(c-1)],&{}\text{ otherwise }. \end{array}\right. } \end{aligned}$$
(33)

Then \(\kappa _u\) is strictly increasing in c for all \(u\in [0,1]\).

Proof

Under the conditions of the theorem we have \(\lambda _0=b_0\), \(\lambda _1=b_1\) and \(\lambda _2=b_2\), and thus the quantity

$$\begin{aligned} O_u=\lambda _0+u\lambda _1=b_0+ub_1. \end{aligned}$$
(34)

Formula (34) shows that, for all \(u\in [0,1]\), the quantity \(O_u\) is not affected by the number of categories c. We also have

$$\begin{aligned} \pi _{i+}=\pi _{+i}=\frac{b_0}{c}+\frac{(c-2)b_1}{(c-1)(c-2)} +\frac{b_2}{2(c-1)}=\frac{b_0}{c}+\frac{2b_1+b_2}{2(c-1)}, \end{aligned}$$
(35)

for \(i\in \left\{ 1,2,\ldots ,c-1\right\} \), and

$$\begin{aligned} \pi _{c+}=\pi _{+c}=\frac{b_0}{c}+\frac{(c-1)b_2}{2(c-1)}=\frac{b_0}{c}+\frac{b_2}{2}. \end{aligned}$$
(36)

Since (35) and (36) are strictly decreasing in c, the quantity \(E_u=\mu _0+u\mu _1\), with \(\mu _0\) and \(\mu _1\) defined in (7a) and (7b), respectively, is also strictly decreasing in c, under the conditions of the theorem, for all \(u\in [0,1]\).

Finally, the first order partial derivative of (11) with respect to \(E_u\) is given by

$$\begin{aligned} \frac{\partial \kappa _u}{\partial E_u}=\frac{O_u-1}{(1-E_u)^2}. \end{aligned}$$
(37)

If agreement is not perfect (i.e. \(\lambda _0<1\)), (37) is strictly negative. Hence, (11) is strictly decreasing in \(E_u\). Since \(E_u\) is strictly decreasing in c and since \(O_u\) is not affected by c, \(\kappa _u\) in (11) is strictly increasing in c for all \(u\in [0,1]\). \(\square \)

Theorem 8 shows that if we consider a series of agreement tables of the form (33) and keep the values of \(b_0\), \(b_1\) and \(b_2\) fixed, then the values of the new kappa coefficients increase with the size of the table.
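The behavior described in Theorem 8 can be illustrated by constructing tables of the form (33) for increasing c and computing \(\kappa _u\) via (11). The function names below are ours, and the chosen values of \(b_0\), \(b_1\), \(b_2\) and u are arbitrary:

```python
import numpy as np

def table_33(c, b0, b1, b2):
    """Table of cell proportions with the structure of Eq. (33)."""
    p = np.full((c, c), b1 / ((c - 1) * (c - 2)))  # presence-presence cells
    p[c - 1, :] = b2 / (2.0 * (c - 1))             # 'absence' row ...
    p[:, c - 1] = b2 / (2.0 * (c - 1))             # ... and column
    np.fill_diagonal(p, b0 / c)                    # agreement cells
    return p

def kappa_u_from_p(p, u):
    """kappa_u of Eq. (11) from a table of cell proportions."""
    c = p.shape[0]
    e = np.outer(p.sum(axis=1), p.sum(axis=0))
    pres = np.arange(c - 1)
    lam0, mu0 = np.trace(p), np.trace(e)
    lam1 = p[np.ix_(pres, pres)].sum() - p[pres, pres].sum()
    mu1 = e[np.ix_(pres, pres)].sum() - e[pres, pres].sum()
    return (lam0 + u*lam1 - mu0 - u*mu1) / (1.0 - mu0 - u*mu1)

# With b0, b1, b2 held fixed, kappa_u increases with c (Theorem 8).
for c in range(3, 8):
    print(c, round(kappa_u_from_p(table_33(c, 0.7, 0.2, 0.1), 0.5), 4))
```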

6 Discussion

A family of kappa coefficients for assessing agreement between two dichotomous-nominal classifications with identical categories was presented. This type of classification includes an ‘absence’ category in addition to two or more ‘presence’ categories. Cohen’s unweighted kappa (Cohen 1960; Warrens 2011, 2015) can be used to quantify agreement between two regular nominal classifications (i.e. classifications without an ‘absence’ category). However, Cohen’s kappa may not be appropriate for quantifying agreement between two dichotomous-nominal classifications, since disagreement between classifiers on a ‘presence’ category and the ‘absence’ category may be much more serious than disagreement on two ‘presence’ categories, for example, for clinical treatment.

The following properties of the new kappa coefficients for dichotomous-nominal classifications were formally proved. If the ‘absence’ category is not used, the dichotomous-nominal classifications reduce to regular nominal classifications, and all kappa coefficients are identical to Cohen’s kappa (Theorem 1). The values of the kappa coefficients for dichotomous-nominal classifications all coincide if the agreement table has two categories (Theorem 4). Furthermore, the values of the new kappa coefficients can be strictly ordered in precisely two ways (Theorem 6). Finally, for a particular class of structured agreement tables it was shown that the values of the kappa coefficients for dichotomous-nominal classifications increase with the number of categories (Theorem 8).

The values of the new kappa coefficients can be strictly ordered in two different ways. In practice, one ordering is more likely to occur than the other. Tables 1 and 2 and the associated numbers in Table 3 give examples of the likely ordering. In this likely ordering Cohen’s kappa produces the minimum value, and the values of the new kappa coefficients increase as more weight is assigned to the disagreement between the ‘presence’ categories. The strict ordering of their values suggests that the new kappa coefficients are measuring the same concept, but to a different extent.

The new kappa coefficients for dichotomous-nominal classifications allow the user to specify how much weight should be assigned to the disagreement between the ‘presence’ categories, using a value between 0 and 1. The higher the value of the weight, the greater the distinction that is made between disagreement among the ‘presence’ categories and disagreement between the ‘absence’ category and the ‘presence’ categories. Finding the optimal value of the weight for real-world applications is an important topic for future research. If one does not use the new kappa coefficients, but instead uses Cohen’s unweighted kappa for regular nominal classifications to quantify agreement, the agreement will likely be underestimated, since Cohen’s kappa will usually produce a lower value. How much the agreement is underestimated depends on the data at hand.

Several authors have presented magnitude guidelines for evaluating the values of kappa coefficients (Landis and Koch 1977; Altman 1991; Fleiss et al. 2003). For example, it has been suggested that a value of 0.80 for Cohen’s kappa may indicate good or even excellent agreement. However, there is general consensus in the literature that uncritical application of such target values leads to practically questionable decisions (Vanbelle and Albert 2009; Warrens 2015). Since kappa coefficients that give a large weight to the disagreement between the ‘presence’ categories appear to produce substantially higher values than kappa coefficients that give a small weight to this disagreement, the same magnitude guidelines cannot be used for all the new kappa coefficients. If it is desirable to use magnitude guidelines, then it seems reasonable to use stricter criteria for kappa coefficients that tend to produce high values.