Kappa coefficients for dichotomous-nominal classifications

Two types of nominal classifications are distinguished, namely regular nominal classifications and dichotomous-nominal classifications. The first type does not include an 'absence' category (for example, no disorder), whereas the second type does. Cohen's unweighted kappa can be used to quantify agreement between two regular nominal classifications with the same categories, but there are no coefficients for assessing agreement between two dichotomous-nominal classifications. Kappa coefficients for dichotomous-nominal classifications with identical categories are therefore defined. All proposed coefficients belong to a one-parameter family. It is studied how the coefficients for dichotomous-nominal classifications are related and whether their values depend on the number of categories. It turns out that the values of the new kappa coefficients can be strictly ordered in precisely two ways. The orderings suggest that the new coefficients are measuring the same thing, but to a different extent. If one accepts the use of magnitude guidelines, it is recommended to use stricter criteria for the new coefficients that tend to produce higher values.


Introduction
In data analysis and classification, similarity coefficients are commonly used to quantify the strength of a relationship between two objects, variables, features or classifications (Goodman and Kruskal 1954; Gower and Warrens 2017). Similarity coefficients may be used to summarize parts of a research study, for example, an agreement or reliability study. They can also be used as input for methods of multivariate analysis such as factor analysis and cluster analysis (Bartholomew et al. 2011; Hennig et al. 2016).
Well-known examples of similarity coefficients are the Pearson correlation, which is a standard tool for assessing linear association between two variables, coefficient alpha (Cronbach 1951; Hoekstra et al. 2019), which is frequently used in classical test theory to estimate the reliability of a test score, the Jaccard coefficient (Jaccard 1912), which is commonly used for assessing co-occurrence of two species types, and the Hubert-Arabie adjusted Rand index (Hubert and Arabie 1985; Steinley et al. 2016), which is a standard tool for measuring agreement between two partitions of the same set of objects.
Categories of nominal classifications are mutually exclusive and (usually) collectively exhaustive. There are, basically, two types of nominal classifications. The distinction hinges upon whether the classification does or does not include an 'absence' category. When there is no 'absence' category, a classification can be described as having three, four or more unordered categories of 'presence' that specify, for example, various disorders. This type of classification can be compared to a classification that contains an 'absence' category (for example, no disorder) and two or more 'presence' categories. The first type of classification will simply be referred to as a regular nominal classification. The second type of classification will be called a dichotomous-nominal classification, following terminology used in Cicchetti et al. (1992) for ordinal classifications.
Let us consider two examples of dichotomous-nominal classifications. The first example comes from the diagnosis of movement disorders (Son et al. 2014). Movement disorders are clinical syndromes that cause abnormally increased movements, or reduced or slow movements. Examples of movement disorders are dyskinesia (excessive, often repetitive, involuntary movement), akinesia (lack of voluntary movement) and hypokinesia (reduced amplitude of movement) (Fahn et al. 2011). Table 1 presents hypothetical pairwise classifications of 169 individuals with assumed movement disorder into four categories by two classifiers. The first three categories A 1 , A 2 and A 3 correspond to movement disorders. The last category A 4 is the 'absence' category. Because the categories of the rows and columns of Table 1 are in the same order, the elements on the main diagonal are the numbers of individuals on which the classifiers agreed. All off-diagonal elements are numbers of individuals on which the classifiers disagreed.
As a second example we consider the diagnosis of personality disorders (Spitzer and Fleiss 1974;Loranger et al. 1997). These are mental disorders characterized by enduring maladaptive patterns of behavior and cognition. Personality disorders are usually grouped into three types, suspicious disorders, emotional and impulsive disorders, and anxious personality disorders. The first type further consists of paranoid, schizoid, schizotypical and antisocial personality disorders. Table 2 presents hypothetical pairwise classifications of 255 individuals with assumed suspicious personality disorder into five categories by two classifiers. The first four categories A 1 , A 2 , A 3 and A 4 correspond to suspicious personality disorders. The last category A 5 is the 'absence' category.
Cohen's kappa coefficient (Cohen 1960; Warrens 2011, 2015) can be used for assessing agreement between two regular nominal classifications. If one uses Cohen's kappa to quantify agreement between the classifications, the distances between all categories are considered equal, which makes sense if all nominal categories reflect different types of 'presence'. However, there are no coefficients for assessing agreement between two dichotomous-nominal classifications with the same categories. Up to now, Cohen's kappa (and its extensions) have been used to analyze agreement between these classifications.
However, disagreement between classifiers on a 'presence' category and the 'absence' category may be much more serious than disagreement on two 'presence' categories, for example, for clinical treatment. The crucial clinical implication is that, in the quantification of agreement, distances between a 'presence' category and the 'absence' category should be treated differently from distances between two 'presence' categories. Cohen's kappa does not accomplish this. In this manuscript we therefore develop new kappa coefficients for assessing agreement between dichotomous-nominal classifications. In addition, we present various properties of the coefficients.
The manuscript is organized as follows. In Sect. 2 we introduce the notation and present several definitions. A family of kappa coefficients for dichotomous-nominal classifications with identical categories is defined in Sect. 3. In Sect. 4 we present various properties of the coefficients. Among other things, it is shown that the values of the new kappa coefficients can be ordered in two ways. One ordering is more likely to occur in practice; the second ordering is the reverse of the first. In Sect. 5 it is shown that the values of the new kappa coefficients increase with the number of categories for a class of agreement tables with constant values of observed agreement and disagreement. A discussion and several recommendations are presented in Sect. 6.

Notation and weighted kappa
Suppose that two fixed classifiers (for example, expert observers, algorithms, rating instruments) have independently classified the same set of n objects (for example, individuals, scans, products) into c ≥ 2 unordered categories A 1 , A 2 , . . . , A c that were defined in advance. We assume that the first c − 1 categories, labeled A 1 , A 2 , . . . , A c−1 , are the 'presence' categories, and that the last category, labeled A c , denotes the 'absence' category.
For a population of objects, let π i j denote the proportion of objects that is classified into category A i by the first classifier and into category A j by the second classifier, where i, j ∈ {1, 2, . . . , c}. We assume that the categories of the rows and columns of the table π i j are in the same order, so that the diagonal elements π ii reflect the exact agreement between the two classifiers. In the context of agreement studies the table π i j is sometimes called an agreement table. The table π i j summarizes the pairwise information between the two nominal classifications (by classifiers 1 and 2). Furthermore, table π i j contains all information needed to define and calculate kappa coefficients.
Define the marginal totals

$$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij} \quad\text{and}\quad \pi_{+j} = \sum_{i=1}^{c} \pi_{ij}. \tag{1}$$

The marginal probabilities π i+ and π +i reflect how often the categories were used by the first and second classifier, respectively. Furthermore, if the ratings of the two classifiers are statistically independent, the expected value of π i j is given by π i+ π + j . The table π i+ π + j contains the expected values of the elements of table π i j under statistical independence of the classifiers.
In the next section we define kappa coefficients for dichotomous-nominal classifications as special cases of weighted kappa (Cohen 1968). Weighted kappa allows the user to describe the closeness between categories using weights (Vanbelle and Albert 2009; Warrens 2011, 2012; Yang and Zhou 2015; Vanbelle 2016; Moradzadeh et al. 2017). Let the real number 0 ≤ w i j ≤ 1 denote the weight corresponding to cell (i, j) of tables π i j and π i+ π + j . The weighted kappa coefficient is defined as (Warrens 2011)

$$\kappa = \frac{O - E}{1 - E}, \tag{2}$$

where

$$O = \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\pi_{ij} \quad\text{and}\quad E = \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\pi_{i+}\pi_{+j} \tag{3}$$

denote the weighted observed agreement and the weighted expected agreement, respectively. The cell probabilities of the table π i j are not directly observed. Let table n i j denote the contingency table of observed frequencies. Tables 1 and 2 are two examples of n i j . Assuming a multinomial sampling model with the total number of objects n fixed, the maximum likelihood estimate of π i j is given by π̂ i j = n i j /n (Yang and Zhou 2014, 2015). Furthermore, under the multinomial sampling model, the maximum likelihood estimate of κ is

$$\hat\kappa = \frac{\hat O - \hat E}{1 - \hat E}, \tag{4}$$

with

$$\hat O = \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\hat\pi_{ij} \quad\text{and}\quad \hat E = \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\,\hat\pi_{i+}\hat\pi_{+j}. \tag{5}$$

The estimates in (4) and (5) are obtained by substituting π̂ i j = n i j /n for the cell probabilities π i j in (2) and (3), respectively. Next, we define several quantities for notational convenience. Consider the table π i j with cell probabilities, and define the quantities

$$\lambda_0 = \sum_{i=1}^{c} \pi_{ii}, \tag{6a}$$

$$\lambda_1 = \sum_{\substack{i,j=1\\ i\neq j}}^{c-1} \pi_{ij}, \tag{6b}$$

$$\lambda_2 = \sum_{i=1}^{c-1} \left(\pi_{ic} + \pi_{ci}\right). \tag{6c}$$

Quantity λ 0 is the total observed agreement, the proportion of objects that have been classified into the same categories by both classifiers. Furthermore, quantity λ 1 is the proportion of observed disagreement between the 'presence' categories A 1 , . . . , A c−1 . Moreover, quantity λ 2 is the proportion of observed disagreement between 'absence' category A c on the one hand, and the 'presence' categories on the other hand.
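To make the definitions concrete, the sketch below computes the weighted kappa in (2) and its maximum likelihood estimate (4) from a table of counts. The function names are ours and the code is an illustrative sketch, not an implementation from the paper.

```python
# Hedged sketch: weighted kappa of Cohen (1968) for a table of joint
# proportions p (p[i, j] = proportion placed in category i by classifier 1
# and category j by classifier 2) and a weight matrix w with entries in [0, 1].
import numpy as np

def weighted_kappa(p, w):
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    # Expected cell proportions under independence: pi_i+ * pi_+j.
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    observed_agreement = (w * p).sum()
    expected_agreement = (w * expected).sum()
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)

def weighted_kappa_from_counts(n, w):
    # Maximum likelihood estimate: substitute n_ij / n for pi_ij.
    n = np.asarray(n, dtype=float)
    return weighted_kappa(n / n.sum(), w)
```

With the identity matrix as weight matrix, this reduces to Cohen's unweighted kappa.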
Next, consider the table π i+ π + j , and define the quantities

$$\mu_0 = \sum_{i=1}^{c} \pi_{i+}\pi_{+i}, \tag{7a}$$

$$\mu_1 = \sum_{\substack{i,j=1\\ i\neq j}}^{c-1} \pi_{i+}\pi_{+j}, \tag{7b}$$

$$\mu_2 = \sum_{i=1}^{c-1} \left(\pi_{i+}\pi_{+c} + \pi_{c+}\pi_{+i}\right). \tag{7c}$$

Quantities μ 0 , μ 1 and μ 2 are the expected values of quantities λ 0 , λ 1 and λ 2 , respectively, under statistical independence of the classifiers.
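The quantities in (6) and (7) can be computed directly from an agreement table. The following minimal sketch (helper names are ours) returns the observed triple (λ 0 , λ 1 , λ 2 ) and the expected triple (μ 0 , μ 1 , μ 2 ); each triple sums to 1.

```python
# Hedged sketch of the quantities lambda_0, lambda_1, lambda_2 (observed)
# and mu_0, mu_1, mu_2 (expected under independence) for a c x c table of
# joint proportions whose last category is the 'absence' category.
import numpy as np

def agreement_quantities(p):
    p = np.asarray(p, dtype=float)
    e = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected cell proportions
    c = p.shape[0]

    def split(t):
        agree = np.trace(t)                                   # diagonal cells
        pres = t[:c-1, :c-1].sum() - np.trace(t[:c-1, :c-1])  # presence vs presence
        absent = t[c-1, :c-1].sum() + t[:c-1, c-1].sum()      # presence vs absence
        return agree, pres, absent

    return split(p), split(e)
```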

New kappa coefficients
In this section we define a family of kappa coefficients that can be used for quantifying agreement between two dichotomous-nominal classifications with the same categories. The kappas differ only by one parameter. To model the agreement and disagreement between the categories we use three different numbers. As usual with kappa coefficients, we give full weight 1 to the entries on the main diagonal of π i j (Cohen 1968; Warrens 2012, 2013). Furthermore, let u ∈ [0, 1] be a real number. We give a partial weight u to the disagreement among 'presence' categories A 1 , . . . , A c−1 . All other weights are set to zero: the weight 0 is used for the disagreement between all 'presence' categories and the single 'absence' category A c . The weighting scheme is then given by

$$w_{ij} = \begin{cases} 1 & \text{if } i = j,\\ u & \text{if } i \neq j \text{ and } i, j \leq c - 1,\\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

Using the quantities in (6), the weighted observed agreement with parameter u is defined as

$$O_u := \lambda_0 + u\lambda_1. \tag{9}$$
Furthermore, using the quantities in (7), the expected value of (9) under statistical independence is given by

$$E_u := \mu_0 + u\mu_1. \tag{10}$$
By using higher values of u in (9) and (10) more weight is given to the disagreement among categories A 1 , . . . , A c−1 .
Using (9) and (10), a family of kappas with parameter u can be defined as

$$\kappa_u = \frac{O_u - E_u}{1 - E_u} = \frac{\lambda_0 + u\lambda_1 - \mu_0 - u\mu_1}{1 - \mu_0 - u\mu_1}. \tag{11}$$

The value of (11) is equal to 1 if there is perfect agreement between the classifiers (i.e. λ 0 = 1), and 0 when λ 0 + uλ 1 = μ 0 + uμ 1 . Formula (11) is also obtained if one uses weighting scheme (8) in the general formula (2). Under the multinomial sampling model, the maximum likelihood estimate of κ u is, using (4) and (5),

$$\hat\kappa_u = \frac{\hat O_u - \hat E_u}{1 - \hat E_u}, \tag{12}$$

where

$$\hat O_u = \hat\lambda_0 + u\hat\lambda_1 \tag{13}$$

and

$$\hat E_u = \hat\mu_0 + u\hat\mu_1 \tag{14}$$

are obtained by substituting π̂ i j = n i j /n for the cell probabilities in (6) and (7). A large sample variance estimator for (12) is given by (Fleiss et al. 1969; Yang and Zhou 2015)

$$\hat v = \frac{1}{n\left(1 - \hat E_u\right)^4}\left\{\sum_{i=1}^{c}\sum_{j=1}^{c} \hat\pi_{ij}\left[w_{ij}\left(1 - \hat E_u\right) - \left(\bar w_{i+} + \bar w_{+j}\right)\left(1 - \hat O_u\right)\right]^2 - M^2\right\}, \tag{15}$$

where the quantity M is given by

$$M = \hat O_u \hat E_u - 2\hat E_u + \hat O_u, \tag{16}$$

and quantities w̄ i+ and w̄ + j are given by

$$\bar w_{i+} = \sum_{j=1}^{c} w_{ij}\,\hat\pi_{+j} \quad\text{and}\quad \bar w_{+j} = \sum_{i=1}^{c} w_{ij}\,\hat\pi_{i+}. \tag{17}$$

Formula (15), together with (16) and (17), will be used to estimate 95% confidence intervals of the point estimate κ̂ u (see Table 3 below). Let us consider two special cases of (11). For u = 0 we obtain

$$\kappa_0 = \frac{\lambda_0 - \mu_0}{1 - \mu_0}. \tag{18}$$

The coefficient in (18) is Cohen's ordinary kappa (Cohen 1960; Yang and Zhou 2014; Warrens 2011, 2015), a standard tool for assessing agreement in the case of regular nominal classifications. The value of (18) is equal to 1 when there is perfect agreement between the classifiers (i.e. λ 0 = 1), 0 when the observed agreement is equal to that expected under independence (i.e. λ 0 = μ 0 ), and negative when agreement is less than expected by chance.
For u = 1 we obtain

$$\kappa_1 = \frac{\mu_2 - \lambda_2}{\mu_2} = 1 - \frac{\lambda_2}{\mu_2}, \tag{19}$$

since O 1 = λ 0 + λ 1 = 1 − λ 2 and E 1 = μ 0 + μ 1 = 1 − μ 2 . At first sight it is unclear how the kappa coefficient in (19) may be interpreted. An interpretation of coefficient (19) is presented in Theorem 2 in the next section. Table 3 presents point and interval estimates of (12) for the data in Tables 1 and 2, and for five values of parameter u. The values in Table 3 illustrate that the value of the coefficient in (12) increases in the parameter u for the data in Tables 1 and 2. This property is formally proved in Theorem 6 in the next section.
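To illustrate the family numerically, the sketch below evaluates the estimate in (12) for several values of u on a small hypothetical table of counts (not the data of Tables 1 or 2). For this table the values increase in u. The helper name is ours.

```python
# Hedged sketch: the estimate of kappa_u from a table of observed counts.
# The counts below are hypothetical and only illustrate the family.
import numpy as np

def kappa_u_from_counts(n, u):
    p = np.asarray(n, dtype=float)
    p = p / p.sum()
    c = p.shape[0]
    e = np.outer(p.sum(axis=1), p.sum(axis=0))
    lam0, mu0 = np.trace(p), np.trace(e)
    lam1 = p[:c-1, :c-1].sum() - np.trace(p[:c-1, :c-1])
    mu1 = e[:c-1, :c-1].sum() - np.trace(e[:c-1, :c-1])
    o_u = lam0 + u * lam1   # weighted observed agreement (9)
    e_u = mu0 + u * mu1     # its expectation under independence (10)
    return (o_u - e_u) / (1.0 - e_u)

counts = [[35, 5, 4],
          [6, 30, 5],
          [4, 6, 35]]   # last category plays the role of 'absence'
values = [kappa_u_from_counts(counts, u) for u in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

For these counts the values increase from roughly 0.65 at u = 0 to roughly 0.68 at u = 1.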

Properties of the kappa coefficients
In this section several relationships between the new kappa coefficients for dichotomous-nominal classifications are presented. Theorem 1 shows that if the classifiers do not use the 'absence' category, then the kappa coefficient in (11) is identical to Cohen's ordinary kappa (Cohen 1960). This property makes sense: if the 'absence' category is not used, dichotomous-nominal classifications are de facto regular nominal classifications, and Cohen's kappa is a standard tool for quantifying agreement between regular nominal classifications with identical categories.

Theorem 1 If 'absence' category A c is not used by the classifiers, then κ u = κ 0 for all u ∈ [0, 1].
Proof If only 'presence' categories are used we have π c+ = π +c = 0. In this case we have λ 2 = 0 and μ 2 = 0, and thus the identities λ 1 = 1 − λ 0 and μ 1 = 1 − μ 0 . Using these identities in (11) we obtain

$$\kappa_u = \frac{(1-u)\lambda_0 - (1-u)\mu_0}{(1-u)(1-\mu_0)}. \tag{20}$$

Dividing all terms on the right-hand side of (20) by (1 − u) yields the coefficient in (18).
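Theorem 1 can also be verified numerically. In the hypothetical table below the 'absence' category (last row and column) is never used, and the estimate of κ u is the same for every u < 1; the value u = 1 is degenerate here because E 1 = 1 when the 'absence' category is empty. The helper is our own sketch.

```python
# Numerical check of Theorem 1 with a hypothetical table in which the
# 'absence' category (the last row and column) is never used.
import numpy as np

def kappa_u_from_counts(n, u):
    p = np.asarray(n, dtype=float) / np.sum(n)
    c = p.shape[0]
    e = np.outer(p.sum(axis=1), p.sum(axis=0))
    lam0, mu0 = np.trace(p), np.trace(e)
    lam1 = p[:c-1, :c-1].sum() - np.trace(p[:c-1, :c-1])
    mu1 = e[:c-1, :c-1].sum() - np.trace(e[:c-1, :c-1])
    return (lam0 + u * lam1 - mu0 - u * mu1) / (1.0 - mu0 - u * mu1)

counts = [[40, 10, 0],
          [15, 35, 0],
          [0, 0, 0]]   # 'absence' category unused
kappas = [kappa_u_from_counts(counts, u) for u in (0.0, 0.3, 0.6, 0.9)]
```

All four values equal Cohen's kappa of the upper-left 2 × 2 block.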
Theorem 2 provides several ways to interpret coefficient κ 1 in (19). First of all, the coefficient may be interpreted as a 'presence' versus 'absence' kappa coefficient. Furthermore, the procedure of combining all categories except a category of interest (in this case, combining all 'presence' categories), followed by calculating Cohen's ordinary kappa for the collapsed 2 × 2 table, defines a category kappa for the category of interest (in this case the 'absence' category) (Kraemer 1979; Warrens 2011, 2015). Category kappas can be used to quantify agreement between the classifiers for individual categories. Hence, coefficient κ 1 in (19) is the kappa coefficient for the 'absence' category.
Theorem 3 presents an alternative formula for coefficient κ 1 . It turns out that we only need three numbers to calculate this coefficient, regardless of the total number of categories, namely, the values of π cc , π c+ and π +c :

$$\kappa_1 = \frac{\pi_{cc} - \pi_{c+}\pi_{+c}}{\dfrac{\pi_{c+} + \pi_{+c}}{2} - \pi_{c+}\pi_{+c}}. \tag{22}$$

Proof Using λ 2 = π c+ + π +c − 2π cc and μ 2 = π c+ + π +c − 2π c+ π +c in (19), we obtain

$$\kappa_1 = \frac{2\pi_{cc} - 2\pi_{c+}\pi_{+c}}{\pi_{c+} + \pi_{+c} - 2\pi_{c+}\pi_{+c}}. \tag{24}$$

Dividing all terms on the right-hand side of (24) by 2, we get the expression in (22).
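The category-kappa interpretation of κ 1 (Theorem 2) can be checked numerically: collapsing all 'presence' categories into one and computing Cohen's kappa for the resulting 2 × 2 table gives the same value as κ 1 . The sketch below assumes the closed form κ 1 = 1 − λ 2 /μ 2 and uses our own helper names and a hypothetical table.

```python
# Numerical check: kappa_1 equals Cohen's kappa of the table obtained by
# collapsing all 'presence' categories into a single category.
import numpy as np

def cohen_kappa(p):
    p = np.asarray(p, dtype=float)
    e = np.outer(p.sum(axis=1), p.sum(axis=0))
    return (np.trace(p) - np.trace(e)) / (1.0 - np.trace(e))

def kappa_1(p):
    # Assumed closed form: kappa_1 = 1 - lambda_2 / mu_2.
    p = np.asarray(p, dtype=float)
    c = p.shape[0]
    lam2 = p[c-1, :c-1].sum() + p[:c-1, c-1].sum()
    e = np.outer(p.sum(axis=1), p.sum(axis=0))
    mu2 = e[c-1, :c-1].sum() + e[:c-1, c-1].sum()
    return 1.0 - lam2 / mu2

p = np.array([[0.30, 0.10, 0.00],
              [0.10, 0.20, 0.10],
              [0.00, 0.10, 0.10]])   # hypothetical joint proportions
c = p.shape[0]
collapsed = np.array([[p[:c-1, :c-1].sum(), p[:c-1, c-1].sum()],
                      [p[c-1, :c-1].sum(), p[c-1, c-1]]])
```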
Theorem 4 shows that all special cases of (11) coincide when there are c = 2 categories, that is, κ u = κ 0 for all u ∈ [0, 1] if c = 2.
Proof If A 1 is the only 'presence' category and A 2 is the 'absence' category, there is no disagreement between the classifiers on 'presence' categories, that is, λ 1 = 0 and μ 1 = 0. Using λ 1 = 0 and μ 1 = 0 in (11) we obtain

$$\kappa_u = \frac{\lambda_0 - \mu_0}{1 - \mu_0},$$

which is the coefficient in (18).
Since all special cases coincide with c = 2 categories (Theorem 4), we assume from here on that c ≥ 3.
Theorem 5 states that the kappa coefficient in (11) is a weighted average of the kappa coefficients in (18) and (19), namely

$$\kappa_u = \frac{(1-u)(1-\mu_0)\,\kappa_0 + u\mu_2\,\kappa_1}{(1-u)(1-\mu_0) + u\mu_2}. \tag{26}$$

The proof of Theorem 5 follows from simplifying the expression on the right-hand side of (26).
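Theorem 5's weighted-average property can be checked numerically. The sketch below uses the weights (1 − u)(1 − μ 0 ) for κ 0 and uμ 2 for κ 1 , an algebraic form we derived from the definitions in (6), (7) and (11); it may differ in appearance from the paper's formula (26), but it reproduces κ u exactly on the hypothetical table used here.

```python
# Hedged sketch: check that kappa_u equals a weighted average of kappa_0
# and kappa_1 with weights (1 - u)(1 - mu_0) and u * mu_2 (our derived form).
import numpy as np

p = np.array([[35, 5, 4],
              [6, 30, 5],
              [4, 6, 35]], dtype=float)   # hypothetical counts
p /= p.sum()
c = p.shape[0]
e = np.outer(p.sum(axis=1), p.sum(axis=0))

lam0, mu0 = np.trace(p), np.trace(e)
lam1 = p[:c-1, :c-1].sum() - np.trace(p[:c-1, :c-1])
mu1 = e[:c-1, :c-1].sum() - np.trace(e[:c-1, :c-1])
lam2 = p[c-1, :c-1].sum() + p[:c-1, c-1].sum()
mu2 = e[c-1, :c-1].sum() + e[:c-1, c-1].sum()

kappa0 = (lam0 - mu0) / (1 - mu0)
kappa1 = 1 - lam2 / mu2    # assumed closed form of kappa_1

diffs = []
for u in (0.2, 0.5, 0.8):
    kappa_u = (lam0 + u * lam1 - mu0 - u * mu1) / (1 - mu0 - u * mu1)
    w0, w1 = (1 - u) * (1 - mu0), u * mu2
    weighted_average = (w0 * kappa0 + w1 * kappa1) / (w0 + w1)
    diffs.append(abs(kappa_u - weighted_average))
```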
Theorem 6 shows that there exist precisely two orderings of the kappa coefficients for dichotomous-nominal classifications, as long as κ 0 ≠ κ 1 . If κ 0 = κ 1 , the value of κ u does not depend on u; in other words, the values of all kappa coefficients for dichotomous-nominal classifications coincide.
The properties presented in Theorem 6 can be illustrated with the numbers in Table 3. For both Tables 1 and 2 the values of the new coefficients are strictly increasing from κ 0 to κ 1 . Furthermore, the coefficient values near u = 0 (i.e. near κ 0 ) are closer together than the coefficient values near u = 1 (i.e. near κ 1 ). The latter illustrates the concave upward property.
Theorem 7 presents a condition that is equivalent to the inequality κ 0 < κ 1 . The latter inequality holds if the ratio of the observed disagreement between the 'presence' categories A 1 , . . . , A c−1 to the corresponding expected disagreement under independence of the classifiers exceeds the ratio of the observed disagreement between 'absence' category A c and the 'presence' categories to the corresponding expected disagreement (i.e. condition ii. of Theorem 7).

Theorem 7
The following conditions are equivalent:

i. κ 0 < κ 1 ;

ii. λ 1 /μ 1 > λ 2 /μ 2 .
Proof Using identities (6c) and (7c) we have

$$1 - \lambda_0 = \lambda_1 + \lambda_2$$

and

$$1 - \mu_0 = \mu_1 + \mu_2.$$

Hence, condition i. (inequality κ 0 < κ 1 ) is equivalent to

$$\frac{\lambda_1 + \lambda_2}{\mu_1 + \mu_2} > \frac{\lambda_2}{\mu_2}. \tag{31}$$

Condition ii. is then obtained by cross multiplying the terms of (31), followed by deleting terms that are on both sides of the inequality, and finally rearranging the remaining terms.
Theorems 6 and 7 show that if one of the two conditions of Theorem 7 holds, then all special cases of (11) are strictly ordered. Moreover, the kappa coefficients can be ordered in precisely two ways. Furthermore, Theorem 7 also provides a condition under which all the new kappa coefficients obtain the same value, and this condition can be checked empirically: if (32), that is, λ 1 μ 2 = λ 2 μ 1 , holds, then κ 0 = κ 1 and all the new kappa coefficients produce the same value.
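Condition ii. of Theorem 7 can be checked empirically from a fitted table. In the hypothetical example below the cross-product form λ 1 μ 2 > λ 2 μ 1 of condition ii. holds, and accordingly κ 0 < κ 1 . The code is our own sketch, assuming the closed form κ 1 = 1 − λ 2 /μ 2 .

```python
# Empirical check of Theorem 7: lambda_1 * mu_2 > lambda_2 * mu_1 holds for
# this hypothetical table, and kappa_0 < kappa_1 as the theorem asserts.
import numpy as np

p = np.array([[35, 5, 4],
              [6, 30, 5],
              [4, 6, 35]], dtype=float)   # hypothetical counts
p /= p.sum()
c = p.shape[0]
e = np.outer(p.sum(axis=1), p.sum(axis=0))

lam0, mu0 = np.trace(p), np.trace(e)
lam1 = p[:c-1, :c-1].sum() - np.trace(p[:c-1, :c-1])
mu1 = e[:c-1, :c-1].sum() - np.trace(e[:c-1, :c-1])
lam2 = p[c-1, :c-1].sum() + p[:c-1, c-1].sum()
mu2 = e[c-1, :c-1].sum() + e[:c-1, c-1].sum()

kappa0 = (lam0 - mu0) / (1 - mu0)
kappa1 = 1 - lam2 / mu2               # assumed closed form of kappa_1
condition = lam1 * mu2 > lam2 * mu1   # condition ii. (cross-product form)
ordered = kappa0 < kappa1             # condition i.
```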

Dependence on the number of categories
In this section, a possible dependence of the new kappa coefficients on the number of categories is studied. In Theorem 8, it is assumed that the data can be described by the specific structure presented in (33). Theorem 8 presents an example of a class of agreement tables for which all kappa coefficients for dichotomous-nominal classifications are increasing in the number of categories c. The agreement tables in this class vary in size (i.e. have different numbers of categories), but they have the same observed agreement, the same observed disagreement between the 'presence' categories A 1 , . . . , A c−1 , and the same observed disagreement between 'absence' category A c on the one hand, and the 'presence' categories on the other hand. The specific values of the proportions of observed agreement and disagreement (denoted by b 0 , b 1 and b 2 ) are, however, not fixed.

Theorem 8 Let c ≥ 3 and let b 0 , b 1 and b 2 be nonnegative real numbers with b 0 + b 1 + b 2 = 1.
Furthermore, let the elements of π i j be given by the structure in (33). Then κ u is strictly increasing in c for all u ∈ [0, 1].
Proof Under the conditions of the theorem we have λ 0 = b 0 , λ 1 = b 1 and λ 2 = b 2 , and thus the quantity

$$O_u = b_0 + u b_1. \tag{34}$$

Formula (34) shows that, for all u ∈ [0, 1], the quantity O u is not affected by the number of categories c. We also have the marginal products in (35), for i ∈ {1, 2, . . . , c − 1}, and in (36). Since (35) and (36) are strictly decreasing in c, the quantity E u = μ 0 + uμ 1 , with μ 0 and μ 1 defined in (7a) and (7b), respectively, is also strictly decreasing in c, under the conditions of the theorem, for all u ∈ [0, 1]. Finally, the first order partial derivative of (11) with respect to E u is given by

$$\frac{\partial \kappa_u}{\partial E_u} = \frac{O_u - 1}{\left(1 - E_u\right)^2}. \tag{37}$$

If agreement is not perfect (i.e. λ 0 < 1), (37) is strictly negative. Hence, (11) is strictly decreasing in E u . Since E u is strictly decreasing in c and since O u is not affected by c, κ u in (11) is strictly increasing in c for all u ∈ [0, 1].
Theorem 8 shows that if we consider a series of agreement tables of the form (33) and keep the values of λ 0 and λ 1 fixed, then the values of the new kappa coefficients increase with the size of the table.

Discussion
A family of kappa coefficients for assessing agreement between two dichotomous-nominal classifications with identical categories was presented. This type of classification includes an 'absence' category in addition to two or more 'presence' categories. Cohen's unweighted kappa (Cohen 1960; Warrens 2011, 2015) can be used to quantify agreement between two regular nominal classifications (i.e. classifications without an 'absence' category). However, Cohen's kappa may not be appropriate for quantifying agreement between two dichotomous-nominal classifications, since disagreement between classifiers on a 'presence' category and the 'absence' category may be much more serious than disagreement on two 'presence' categories, for example, for clinical treatment.
The following properties of the new kappa coefficients for dichotomous-nominal classifications were formally proved. If the 'absence' category is not used, the dichotomous-nominal classifications reduce to regular nominal classifications, and all kappa coefficients are identical to Cohen's kappa (Theorem 1). The values of the kappa coefficients for dichotomous-nominal classifications all coincide if the agreement table has two categories (Theorem 4). Furthermore, the values of the new kappa coefficients can be strictly ordered in precisely two ways (Theorem 6). Finally, for a particular, yet general class of agreement tables it was shown that the values of the kappa coefficients for dichotomous-nominal classifications increase with the number of categories (Theorem 8).
The values of the new kappa coefficients can be strictly ordered in two different ways. In practice, one ordering is more likely to occur than the other. Tables 1 and 2 and the associated numbers in Table 3 give examples of the likely ordering. In this likely ordering Cohen's kappa produces the minimum value, and the values of the new kappa coefficients increase as more weight is assigned to the disagreement between the 'presence' categories. The strict ordering of their values suggests that the new kappa coefficients are measuring the same concept, but to a different extent.
The new kappa coefficients for dichotomous-nominal classifications allow the user to specify how much weight should be assigned to the disagreement between the 'presence' categories, using a value between 0 and 1. The higher the value of this weight, the bigger the difference between how disagreement on two 'presence' categories and disagreement between the 'absence' category and a 'presence' category are treated. Finding the optimal value of the weight for real-world applications is a necessary topic for future research. If one does not use the new kappa coefficients, but instead uses Cohen's unweighted kappa for regular nominal classifications to quantify agreement, the agreement will likely be underestimated, since Cohen's kappa will usually produce a lower value. How much the agreement is underestimated depends on the data at hand.
Several authors have presented magnitude guidelines for evaluating the values of kappa coefficients (Landis and Koch 1977; Altman 1991; Fleiss et al. 2003). For example, it has been suggested that a value of 0.80 for Cohen's kappa may indicate good or even excellent agreement. However, there is general consensus in the literature that uncritical application of such target values leads to practically questionable decisions (Vanbelle and Albert 2009; Warrens 2015). Since kappa coefficients for dichotomous-nominal classifications that give a large weight to the disagreement between the 'presence' categories appear to produce values that are substantially higher than those of kappa coefficients that give a small weight to this disagreement, the same magnitude guidelines cannot be used for all the new kappa coefficients. If it is desirable to use magnitude guidelines, then it seems reasonable to use stricter criteria for kappa coefficients that tend to produce high values.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.