Kappa Coefficients for Circular Classifications

Circular classifications are classification scales with categories that exhibit a certain periodicity. Since the standard weighted kappas for linear scales assume that the scale has endpoints, they are not appropriate for analyzing agreement between two circular classifications. A family of kappa coefficients for circular classifications is defined. The kappas differ only in one parameter. We study how the circular kappas are related and whether their values depend on the number of categories. It turns out that the values of the circular kappas can be strictly ordered in precisely two ways. The orderings suggest that the circular kappas measure the same thing, but to a different extent. If one accepts the use of magnitude guidelines, it is recommended to use stricter criteria for circular kappas that tend to produce higher values.


Introduction
Similarity coefficients are used in pattern recognition, data analysis and classification to quantify the strength of a relationship between two variables or classifications. Similarity coefficients can be used to summarize parts of a research study, but can also be used as input for data-analytic techniques, for example, cluster analysis. Well-known examples of similarity coefficients are Pearson's product-moment correlation for measuring linear dependence between two interval variables, the Jaccard coefficient for measuring co-occurrence of two species types, and the Hubert-Arabie adjusted Rand index for comparing partitions obtained with different clustering algorithms (Warrens 2008, 2014). Kappa coefficients are commonly used to quantify agreement between classifications with identical categories (Vanbelle, Mutsvari, Declerck, and Lesaffre 2012; Warrens 2010a, 2011a; Yang and Zhou 2015).
In social and behavioral science and biomedical research, it is frequently required to assess agreement between two classifications with identical categories. For example, to assess the reliability of a rating scale, researchers typically let two observers independently rate the same set of objects. The categories of the rating scale are defined in advance, and the agreement between the observers can be used to investigate the reliability of the rating scale. Standard tools for quantifying agreement between classifications with identical categories are Cohen's kappa in the case of nominal categories (Yang and Zhou 2014; Warrens 2010b), and weighted kappa in the case of ordinal categories (Vanbelle 2015; Yang and Zhou 2015; Warrens 2012, 2013, 2015). Both coefficients correct for agreement due to chance. Although interval and ordinal data are usually measured on a linear scale, data may also exhibit a certain periodicity, for example, if the data are naturally measured on a circular scale. Examples of circular interval data are directions measured in degrees and the time of day (Berens 2009). Examples of categorical classifications that have been measured on a circular scale are the day of the week, affect states (Posner, Russell, and Peterson 2005; Watson, Wiese, Vaidya, and Tellegen 1999; Watson and Tellegen 1985), vocational interests (Brown 1992), and phases of cell cycle genes (Rueda, Fernández, and Peddada 2009). In social and behavioral science and biomedical research, circular scales that measure social or psychological constructs are usually referred to as circumplex models.
With circular scales the designation of high and low is arbitrary. Furthermore, with categorical circular scales an anchor point is usually not appropriate. For example, Russell (1980) hypothesized that the following eight affect categories can be ordered on a circular scale: Arousal, Excitement, Pleasure, Contentment, Sleepiness, Depression, Misery and Distress. The categories in Table 1 thus form a circular scale. More elaborate circular scales of affect states can be found in Posner, Russell, and Peterson (2005) and Watson, Wiese, Vaidya, and Tellegen (1999). A second example comes from career assessment. Vocational interest is assessed to give insight into a person's interests, so that participants can be helped to choose an occupation that will sustain their interests and keep them usefully employed throughout their working life. Vocational interest is usually measured with an interest inventory. A participant who completes an interest inventory expresses preferences about items concerning a field of work or recreation. The outcome of an interest inventory is one or a combination of the following six ordered categories: Realistic, Investigative, Artistic, Social, Enterprising and Conventional. Table 2 presents hypothetical pairwise classifications of the primary vocational interest of 120 participants obtained with two different interest inventories. The elements on the main diagonal are the numbers of participants with the same vocational interest according to both inventories. All off-diagonal elements are numbers of participants for whom the inventories disagreed. Most disagreement is on categories that are adjacent in the depicted ordering. Furthermore, the disagreement between Realistic and Conventional and their adjacent categories suggests that the two categories should be adjacent on the scale. The categories in Table 2 thus form a circular scale.
The standard weighted kappas for linear scales studied in, for example, Vanbelle (2015) and Warrens (2012, 2013, 2014, 2015) are not appropriate with circular scales, since they require that the scale has endpoints, which circular scales lack. A kappa coefficient for circular classifications with identical categories was first presented, as a special case of weighted kappa, in Gwet (2012, pp. 63-64). In an example with four categories, Gwet suggested assigning weights only to agreement and to disagreement on adjacent categories. In this paper, we generalize this idea and formally define a family of kappa coefficients for circular classifications. Furthermore, we show how the circular kappas are related and study whether they depend on the number of categories.
The paper is organized as follows. In Section 2, we introduce the notation and present several definitions. A family of circular kappas is defined in Section 3. In Section 4, it is shown that the circular kappas can be ordered in two ways. One ordering is more likely to occur in practice. The second ordering is the reverse ordering of the first one. Furthermore, it is shown that a specific class of circular kappas can be interpreted as weighted averages of the Cohen's kappas of all collapsed tables that are obtained by combining two adjacent categories. In Section 5, a possible dependence of the circular kappas on the number of categories is studied. A result is presented that suggests that the circular kappas tend to increase with the number of categories. A discussion and several recommendations are presented in Section 6.

Notation and Weighted Kappa
Suppose that two fixed classifiers (for example, expert observers, algorithms, rating instruments) have independently classified the same set of $n$ objects (for example, individuals, scans, products) into the categories $A_1, A_2, \ldots, A_c$ that were defined in advance. For a population of objects, let $\pi_{ij}$ denote the proportion of objects classified into category $A_i$ by the first classifier and into category $A_j$ by the second classifier, where $i, j \in \{1, 2, \ldots, c\}$. We assume that the categories of the rows and columns of the table $\{\pi_{ij}\}$ are in the same order, so that the diagonal elements $\pi_{ii}$ reflect the exact agreement between the two classifiers. In the context of agreement studies, the table $\{\pi_{ij}\}$ is usually called an agreement table. Define

$$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij} \tag{1}$$

and

$$\pi_{+i} = \sum_{j=1}^{c} \pi_{ji}. \tag{2}$$

The marginal probabilities $\pi_{i+}$ and $\pi_{+i}$ reflect how often the categories were used by the first and second classifier, respectively. Furthermore, if the ratings of the two classifiers are statistically independent, the expected value of $\pi_{ij}$ is given by $\pi_{i+}\pi_{+j}$. The table $\{\pi_{i+}\pi_{+j}\}$ contains these expected values.
In the next section, we define kappa coefficients for circular classifications as special cases of weighted kappa. Weighted kappa is a standard tool for quantifying the degree of agreement between two classifications with ordinal categories. With ordered categories, there is usually more disagreement between the classifiers on adjacent categories than on categories that are further apart. Weighted kappa allows the user to describe the closeness between categories using weights (Vanbelle 2015; Warrens 2013, 2014). The real number $0 \le w_{ij} \le 1$ denotes the weight corresponding to cell $(i, j)$ of the tables $\{\pi_{ij}\}$ and $\{\pi_{i+}\pi_{+j}\}$. The weighted kappa coefficient is defined as (Warrens 2011b)

$$\kappa_w = \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\pi_{ij} - \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\pi_{i+}\pi_{+j}}{1 - \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\pi_{i+}\pi_{+j}}. \tag{3}$$

The cell probabilities of the table $\{\pi_{ij}\}$ are not directly observed. Let $\{n_{ij}\}$ denote the contingency table of observed frequencies. Assuming a multinomial sampling model with the total number of objects $n$ fixed, the maximum likelihood estimate of $\pi_{ij}$ is given by $\hat\pi_{ij} = n_{ij}/n$ (Yang and Zhou 2014, 2015). The corresponding estimate of weighted kappa is

$$\hat\kappa_w = \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\hat\pi_{ij} - \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\hat\pi_{i+}\hat\pi_{+j}}{1 - \sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\hat\pi_{i+}\hat\pi_{+j}}. \tag{4}$$

Estimate (4) is obtained by substituting $\hat\pi_{ij} = n_{ij}/n$ for the cell probabilities $\pi_{ij}$ in (3). Furthermore, a large sample standard error for weighted kappa is presented in Fleiss, Cohen, and Everitt (1969) (see also Yang and Zhou 2015). This formula is used to compute 95% confidence intervals for the point estimate $\hat\kappa$ (see Table 3).
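As a minimal sketch of estimate (4), the following Python function plugs $\hat\pi_{ij} = n_{ij}/n$ into (3). The function name `weighted_kappa` and the list-of-lists table representation are illustrative assumptions; the large sample standard error of Fleiss, Cohen, and Everitt (1969) is not implemented here.

```python
from typing import Sequence

Table = Sequence[Sequence[float]]


def weighted_kappa(counts: Table, weights: Table) -> float:
    """Estimate (4): weighted kappa with plug-in proportions n_ij / n.

    counts  -- c x c table of observed frequencies {n_ij}
    weights -- c x c table of weights w_ij with 0 <= w_ij <= 1
    """
    c = len(counts)
    n = sum(sum(row) for row in counts)
    p = [[counts[i][j] / n for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]                       # estimates of pi_{i+}
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]  # estimates of pi_{+j}
    observed = sum(weights[i][j] * p[i][j]
                   for i in range(c) for j in range(c))
    expected = sum(weights[i][j] * row[i] * col[j]
                   for i in range(c) for j in range(c))
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights give Cohen's kappa
    print(weighted_kappa([[20, 5], [10, 15]], identity))
```

For the hypothetical $2 \times 2$ table in the example, the observed agreement is 0.7 and the chance-expected agreement is 0.5, so the printed value is 0.4 up to floating-point rounding.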
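Next, we define several quantities for notational convenience. Consider the table $\{\pi_{ij}\}$ with relative frequencies, and define the quantities

$$\lambda_0 = \sum_{i=1}^{c} \pi_{ii}, \tag{5a}$$

$$\lambda_1 = \sum_{i=1}^{c-1} \left(\pi_{i,i+1} + \pi_{i+1,i}\right) + \pi_{1c} + \pi_{c1}, \tag{5b}$$

$$\lambda_2 = 1 - \lambda_0 - \lambda_1. \tag{5c}$$

Quantity $\lambda_0$ is the total observed agreement, the proportion of objects that have been classified into the same categories by both classifiers. Quantity $\lambda_1$ is the sum of the elements on the first diagonal above the main diagonal of the table $\{\pi_{ij}\}$ and the first diagonal below the main diagonal, together with the elements $\pi_{1c}$ and $\pi_{c1}$. Quantity $\lambda_1$ is the proportion of disagreement on adjacent categories of the circular scale. Since $1 - \lambda_0$ is the total disagreement, quantity $\lambda_2$ is the disagreement that is not part of $\lambda_1$, that is, the disagreement on non-adjacent categories. Next, consider the table $\{\pi_{i+}\pi_{+j}\}$, and define the quantities

$$\mu_0 = \sum_{i=1}^{c} \pi_{i+}\pi_{+i}, \tag{6a}$$

$$\mu_1 = \sum_{i=1}^{c-1} \left(\pi_{i+}\pi_{+(i+1)} + \pi_{(i+1)+}\pi_{+i}\right) + \pi_{1+}\pi_{+c} + \pi_{c+}\pi_{+1}, \tag{6b}$$

$$\mu_2 = 1 - \mu_0 - \mu_1. \tag{6c}$$

Quantities $\mu_0$, $\mu_1$ and $\mu_2$ are the expected values of $\lambda_0$, $\lambda_1$ and $\lambda_2$, respectively, under statistical independence of the classifiers.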
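The quantities in (5) and (6) can be computed by classifying each cell by its circular distance: 0 for the diagonal, 1 for adjacent categories (including the pair $A_1$, $A_c$), and 2 or more otherwise. The Python sketch below uses hypothetical function names and assumes $c \ge 3$. It accumulates $\lambda_2$ and $\mu_2$ directly from the remaining cells, which agrees with (5c) and (6c) because the proportions sum to 1.

```python
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]


def circular_distance(i: int, j: int, c: int) -> int:
    """Number of steps between categories i and j on a circle of c categories."""
    d = abs(i - j)
    return min(d, c - d)


def agreement_quantities(counts: Table) -> Tuple[List[float], List[float]]:
    """Return ([lambda_0, lambda_1, lambda_2], [mu_0, mu_1, mu_2]) as in (5) and (6)."""
    c = len(counts)
    if c < 3:
        raise ValueError("this sketch assumes c >= 3 categories")
    n = sum(sum(row) for row in counts)
    p = [[counts[i][j] / n for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    lam = [0.0, 0.0, 0.0]
    mu = [0.0, 0.0, 0.0]
    for i in range(c):
        for j in range(c):
            k = min(circular_distance(i, j, c), 2)  # 0 agree, 1 adjacent, 2 other
            lam[k] += p[i][j]
            mu[k] += row[i] * col[j]
    return lam, mu
```

For a $4 \times 4$ table with all counts on the diagonal and uniform marginals, this gives $\lambda = (1, 0, 0)$ and $\mu = (0.25, 0.5, 0.25)$.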

Circular Kappas
Table 3. Point and interval estimates of the circular kappas in (10)

In this section, we define a family of kappa coefficients that can be used for quantifying agreement between two circular classifications. The kappas differ only in one parameter. Let $0 \le u < 1$ be a real number. The number $u$ will be used as a parameter to assign weight to the disagreement on adjacent categories. Similar to the small example in Gwet (2012), we give full weight to the entries on the main diagonal of $\{\pi_{ij}\}$ and a partial weight $u$ to the entries corresponding to adjacent categories; all other weights are set to 0:

$$w_{ij} = \begin{cases} 1 & \text{if } i = j, \\ u & \text{if } |i - j| = 1 \text{ or } |i - j| = c - 1, \\ 0 & \text{otherwise}. \end{cases} \tag{7}$$

This weighting scheme makes sense if we expect some disagreement between the classifiers on adjacent categories but no serious disagreement on categories that are further apart on the scale. Using the quantities in (5), the weighted observed agreement with parameter $u$ is defined as

$$O_u = \lambda_0 + u\lambda_1. \tag{8}$$

Furthermore, using the quantities in (6), the expected value of (8) under independence is given by

$$E_u = \mu_0 + u\mu_1. \tag{9}$$

By using higher values of $u$ in (8) and (9), more weight is given to the total disagreement on adjacent categories. Using (8) and (9), a family of circular kappas with parameter $u$ can be defined as

$$\kappa_u = \frac{O_u - E_u}{1 - E_u} = \frac{\lambda_0 + u\lambda_1 - \mu_0 - u\mu_1}{1 - \mu_0 - u\mu_1}. \tag{10}$$

The value of (10) is equal to 1 if there is perfect agreement between the classifiers ($\lambda_0 = 1$), and 0 when $\lambda_0 + u\lambda_1 = \mu_0 + u\mu_1$. Formula (10) is also obtained if one uses weighting scheme (7) in the general formula (3).
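The following Python sketch computes (10) directly from (8) and (9) and, as a cross-check, also evaluates the general formula (3) with weighting scheme (7); as stated above, the two should agree. The function names and the $4 \times 4$ table are hypothetical, chosen only for illustration, and the sketch assumes $c \ge 3$.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def _proportions(counts: Table):
    """Relative frequencies n_ij / n together with row and column marginals."""
    c = len(counts)
    n = sum(sum(r) for r in counts)
    p = [[counts[i][j] / n for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    return p, row, col


def circular_weights(c: int, u: float) -> List[List[float]]:
    """Weighting scheme (7): 1 on the diagonal, u for adjacent categories, 0 elsewhere."""
    w = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            d = abs(i - j)
            if d == 0:
                w[i][j] = 1.0
            elif d == 1 or d == c - 1:  # d == c - 1 is the pair (A_1, A_c)
                w[i][j] = u
    return w


def circular_kappa(counts: Table, u: float) -> float:
    """Circular kappa (10) with O_u = lambda_0 + u*lambda_1 and E_u = mu_0 + u*mu_1."""
    if not 0 <= u < 1:
        raise ValueError("u must satisfy 0 <= u < 1")
    p, row, col = _proportions(counts)
    c = len(p)
    nxt = [(i + 1) % c for i in range(c)]  # cyclic successor of each category
    lam0 = sum(p[i][i] for i in range(c))
    lam1 = sum(p[i][nxt[i]] + p[nxt[i]][i] for i in range(c))
    mu0 = sum(row[i] * col[i] for i in range(c))
    mu1 = sum(row[i] * col[nxt[i]] + row[nxt[i]] * col[i] for i in range(c))
    o_u = lam0 + u * lam1  # (8)
    e_u = mu0 + u * mu1    # (9)
    return (o_u - e_u) / (1 - e_u)


def weighted_kappa(counts: Table, w: Table) -> float:
    """General weighted kappa (3) with plug-in proportions."""
    p, row, col = _proportions(counts)
    c = len(p)
    observed = sum(w[i][j] * p[i][j] for i in range(c) for j in range(c))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c))
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    table = [[20, 4, 0, 2], [3, 18, 5, 0], [0, 4, 22, 3], [2, 0, 3, 14]]  # hypothetical
    for u in (0.0, 0.25, 0.5, 0.75):
        print(u, round(circular_kappa(table, u), 3),
              round(weighted_kappa(table, circular_weights(4, u)), 3))
```

Setting `u = 0` reproduces Cohen's kappa, and the two columns printed for each `u` coincide up to rounding.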
For $u = 0$ we obtain Cohen's kappa (Yang and Zhou 2014; Warrens 2010b)

$$\kappa_0 = \frac{\lambda_0 - \mu_0}{1 - \mu_0}. \tag{11}$$

This is an important special case of (10). The value of (11) is equal to 1 when there is perfect agreement between the classifiers ($\lambda_0 = 1$), 0 when the observed agreement is equal to that expected under independence ($\lambda_0 = \mu_0$), and negative when agreement is less than expected by chance. Table 3 presents point and interval estimates of (10) for the data in Tables 1 and 2, and for four values of $u$. For example, for Table 1 the estimate of Cohen's kappa is $\hat\kappa_0 = 0.75$ with 95% CI from 0.68 to 0.81. The values in Table 3 illustrate that, for the data in Tables 1 and 2, the values of the circular kappas in (10) increase with the parameter $u$. Lemma 2 in the next section establishes the condition under which this ordering holds.
We end this section with the following result. Lemma 1 shows that all special cases of (10) coincide when there are $c = 3$ categories (and thus also when $c = 2$). More precisely, with $c = 3$ categories all circular kappas coincide with Cohen's kappa in (11).

Lemma 1. If $c = 3$, then $\kappa_u = \kappa_0$.
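A short derivation of Lemma 1 follows from (5c), (6c) and (10). With $c = 3$ every pair of distinct categories is adjacent, so $\lambda_2 = \mu_2 = 0$, that is, $\lambda_1 = 1 - \lambda_0$ and $\mu_1 = 1 - \mu_0$. Substituting these into (10) and using $u < 1$ gives

$$\begin{aligned}
\kappa_u &= \frac{\lambda_0 + u(1 - \lambda_0) - \mu_0 - u(1 - \mu_0)}{1 - \mu_0 - u(1 - \mu_0)} \\
&= \frac{(1 - u)(\lambda_0 - \mu_0)}{(1 - u)(1 - \mu_0)} = \frac{\lambda_0 - \mu_0}{1 - \mu_0} = \kappa_0.
\end{aligned}$$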
Cohen's kappa is a standard tool for quantifying agreement between two classifications with nominal categories (Yang and Zhou 2014;Warrens 2010b). Lemma 1 shows that in the case of three categories the kappa coefficients for nominal categories and circular categories coincide. With three circular categories, all categories are adjacent to one another. Nominal categories are unordered, and thus none of the categories is adjacent to another category. It appears that from a mathematical perspective the two situations are quite similar.

Relationships Between Circular Kappas
In this section several relationships between the circular kappas are presented. One result involves all circular kappas (Lemma 2), while another result only applies to certain circular kappas (Lemma 4). Since all special cases coincide with c = 3 categories (Lemma 1), we assume from here on that c ≥ 4. Lemma 2 shows that there exist precisely two orderings of the circular kappas.
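Lemma 2. Let $0 \le u < v < 1$, and suppose that $\mu_1, \mu_2 > 0$. Then

$$\frac{\lambda_1}{\mu_1} > \frac{\lambda_2}{\mu_2} \tag{13}$$

if and only if

$$\kappa_u < \kappa_v. \tag{14}$$

Proof. We first show that (14) is equivalent to (17). Since $1 - E_u$ and $1 - E_v$ are positive numbers, cross-multiplying the terms of (14) yields

$$(O_u - E_u)(1 - E_v) < (O_v - E_v)(1 - E_u). \tag{15}$$

Writing $O_u - E_u = (O_u - O_v) + (O_v - E_v) + (E_v - E_u)$ on the left-hand side of (15), cancelling the term $(O_v - E_v)(1 - E_v)$ that then appears on both sides, and collecting terms, we obtain

$$(1 - O_v)(E_v - E_u) < (O_v - O_u)(1 - E_v). \tag{16}$$

Since $1 - E_v$ and $E_v - E_u$ are positive numbers, inequality (16) is equivalent to

$$\frac{1 - O_v}{1 - E_v} < \frac{O_v - O_u}{E_v - E_u}. \tag{17}$$

Next, using definitions (8) and (9), we have $O_v - O_u = (v - u)\lambda_1$ and $E_v - E_u = (v - u)\mu_1$, so inequality (17) becomes

$$\frac{1 - \lambda_0 - v\lambda_1}{1 - \mu_0 - v\mu_1} < \frac{\lambda_1}{\mu_1}. \tag{18}$$

Inserting definitions (5c) and (6c) on the left-hand side of (18), we obtain

$$\frac{\lambda_2 + (1 - v)\lambda_1}{\mu_2 + (1 - v)\mu_1} < \frac{\lambda_1}{\mu_1}. \tag{19}$$

Cross-multiplying the terms of (19), the terms $(1 - v)\lambda_1\mu_1$ on both sides cancel, leaving $\lambda_2\mu_1 < \lambda_1\mu_2$, which is equivalent to inequality (13).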
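Lemma 2 can also be illustrated numerically. The Python sketch below draws random $5 \times 5$ tables of positive counts (a hypothetical simulation setup, not data from this paper), evaluates condition (13) through the sign of $\lambda_1\mu_2 - \lambda_2\mu_1$, and compares it with the ordering of $\kappa_u$ and $\kappa_v$ for $u = 0.2$ and $v = 0.7$. Tables for which (13) is an equality up to rounding are skipped.

```python
import random
from typing import Sequence, Tuple

Table = Sequence[Sequence[float]]


def lambdas_mus(counts: Table) -> Tuple[float, float, float, float, float, float]:
    """(lambda_0, lambda_1, lambda_2, mu_0, mu_1, mu_2) as in (5) and (6); assumes c >= 3."""
    c = len(counts)
    n = sum(sum(r) for r in counts)
    p = [[counts[i][j] / n for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    nxt = [(i + 1) % c for i in range(c)]
    lam0 = sum(p[i][i] for i in range(c))
    lam1 = sum(p[i][nxt[i]] + p[nxt[i]][i] for i in range(c))
    mu0 = sum(row[i] * col[i] for i in range(c))
    mu1 = sum(row[i] * col[nxt[i]] + row[nxt[i]] * col[i] for i in range(c))
    return lam0, lam1, 1 - lam0 - lam1, mu0, mu1, 1 - mu0 - mu1


def kappa_u(q: Tuple[float, ...], u: float) -> float:
    """Circular kappa (10) from the quantities returned by lambdas_mus."""
    lam0, lam1, _, mu0, mu1, _ = q
    return (lam0 + u * lam1 - mu0 - u * mu1) / (1 - mu0 - u * mu1)


def check_lemma2(trials: int = 2000, c: int = 5, u: float = 0.2,
                 v: float = 0.7, seed: int = 1) -> Tuple[bool, int]:
    """Return (all checked tables agree with Lemma 2, number of tables checked)."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        counts = [[rng.randint(1, 20) for _ in range(c)] for _ in range(c)]
        q = lambdas_mus(counts)
        lam1, lam2, mu1, mu2 = q[1], q[2], q[4], q[5]
        gap = lam1 * mu2 - lam2 * mu1  # positive exactly when (13) holds
        if abs(gap) < 1e-12:
            continue                   # (13) is an equality up to rounding
        checked += 1
        if (gap > 0) != (kappa_u(q, u) < kappa_u(q, v)):
            return False, checked
    return True, checked


if __name__ == "__main__":
    print(check_lemma2())
```

Positive counts guarantee $\mu_1, \mu_2 > 0$, as Lemma 2 requires; the check is an illustration only.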
Lemma 2 shows that if inequality (13) holds, all special cases of (10) are strictly ordered. In fact, the circular kappas can be ordered in precisely two ways. If (13) holds, Cohen's kappa $\kappa_0$ has the smallest value and we have $\kappa_u < \kappa_v$ if $u < v$. Note that inequality (13) holds for the data in Table 1, since $\hat\lambda_2 = 0$ for these data. Furthermore, the inequality also holds for the data in Table 2, since for these data

$$\frac{\hat\lambda_1}{\hat\mu_1} > \frac{\hat\lambda_2}{\hat\mu_2}. \tag{20}$$

Table 3 shows that the point estimates of the circular kappas are indeed ordered as predicted by Lemma 2.
The reverse ordering holds if the converse of condition (13) holds, that is, $\lambda_1/\mu_1 < \lambda_2/\mu_2$. In this case Cohen's kappa $\kappa_0$ has the highest value and we have $\kappa_u > \kappa_v$ if $u < v$. All circular kappas coincide if $c = 2$ or $c = 3$ (Lemma 1), or if (13) holds with equality, that is, $\lambda_1/\mu_1 = \lambda_2/\mu_2$. This second ordering is less likely to occur, since it requires that, relative to what is expected under independence, there is more disagreement on categories that are not adjacent in the ordering than on categories that are adjacent.
By Lemma 2, the value of the circular kappa $\kappa_u$ lies between $\kappa_0$ and $\lim_{u \to 1} \kappa_u$. Lemma 3 presents an expression for this limit.
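A direct computation gives this limit. Assuming $\mu_2 > 0$, letting $u \to 1$ in (10) and using (5c) and (6c):

$$\lim_{u \to 1} \kappa_u = \frac{\lambda_0 + \lambda_1 - \mu_0 - \mu_1}{1 - \mu_0 - \mu_1} = \frac{(1 - \lambda_2) - (1 - \mu_2)}{\mu_2} = 1 - \frac{\lambda_2}{\mu_2}.$$

For the data in Table 1, where $\hat\lambda_2 = 0$, the estimated limit is therefore equal to 1.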
Next, we consider a different type of relationship between the circular kappas. Lemma 4 below provides an interpretation for a specific class of circular kappas. It sometimes makes sense to combine two categories, for example, if two categories are not clearly defined or are easily confused. Combining the categories removes the disagreement between them. Since the categories of a circular scale are ordered, it only makes sense to combine categories that are adjacent in the ordering. With $c$ categories there are $c$ adjacent pairs of categories, and thus $c$ different ways to collapse a $c \times c$ agreement table into a $(c - 1) \times (c - 1)$ agreement table.
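It turns out that certain circular kappas are weighted averages of the values of Cohen's kappa corresponding to the $c$ collapsed tables that are obtained by combining two adjacent categories. This is proved in the following lemma.

Lemma 4. For $i \in \{1, 2, \ldots, c\}$, let $\kappa_0(i)$ denote Cohen's kappa of the $(c - 1) \times (c - 1)$ table obtained by combining the adjacent categories $A_i$ and $A_{i+1}$, where $A_{c+1} = A_1$, and let $O_0(i)$ and $E_0(i)$ denote the observed and expected agreement of this collapsed table. Then

$$\kappa_{1/c} = \frac{\sum_{i=1}^{c} \left(1 - E_0(i)\right)\kappa_0(i)}{\sum_{i=1}^{c} \left(1 - E_0(i)\right)}.$$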
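Then, using (5b), (23) and (24), we find that

$$\sum_{i=1}^{c} O_0(i) = c\lambda_0 + \lambda_1. \tag{25}$$

Using similar arguments, we find that

$$\sum_{i=1}^{c} E_0(i) = c\mu_0 + \mu_1. \tag{26}$$

The quantities $\kappa_0(i)$, $O_0(i)$ and $E_0(i)$ are related by the formula

$$\kappa_0(i) = \frac{O_0(i) - E_0(i)}{1 - E_0(i)}, \tag{27}$$

or equivalently, $\kappa_0(i)(1 - E_0(i)) = O_0(i) - E_0(i)$. Finally, using the latter identity, together with (25) and (26), we have

$$\frac{\sum_{i=1}^{c} \left(1 - E_0(i)\right)\kappa_0(i)}{\sum_{i=1}^{c} \left(1 - E_0(i)\right)} = \frac{c\lambda_0 + \lambda_1 - c\mu_0 - \mu_1}{c - c\mu_0 - \mu_1} = \frac{\lambda_0 + \lambda_1/c - \mu_0 - \mu_1/c}{1 - \mu_0 - \mu_1/c} = \kappa_{1/c}.$$

An interesting application of Lemma 4 occurs when we have $c = 4$ categories. Since all circular kappas coincide with $c = 3$ categories (Lemma 1), the circular kappa $\kappa_{0.25}$ can be interpreted as a weighted average of the four kappas of the collapsed $3 \times 3$ tables that are obtained by combining adjacent categories.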
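The weighted-average identity can be verified numerically. The Python sketch below collapses each adjacent pair $(A_i, A_{i+1})$, with $A_{c+1} = A_1$, computes Cohen's kappa of every collapsed table, forms the weighted average with weights $1 - E_0(i)$, and compares it with $\kappa_{1/c}$ computed from (10). The helper names and the example tables are hypothetical.

```python
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]


def observed_expected(counts: Table) -> Tuple[float, float]:
    """Observed agreement O and chance-expected agreement E of a square table."""
    c = len(counts)
    n = sum(sum(r) for r in counts)
    row = [sum(counts[i]) / n for i in range(c)]
    col = [sum(counts[i][j] for i in range(c)) / n for j in range(c)]
    o = sum(counts[i][i] for i in range(c)) / n
    e = sum(row[i] * col[i] for i in range(c))
    return o, e


def collapse(counts: Table, i: int) -> List[List[float]]:
    """Merge category i with its cyclic successor (i + 1) mod c."""
    c = len(counts)
    j = (i + 1) % c
    keep = [k for k in range(c) if k != j]
    label = {k: pos for pos, k in enumerate(keep)}
    label[j] = label[i]  # category j is absorbed into category i
    out = [[0.0] * (c - 1) for _ in range(c - 1)]
    for a in range(c):
        for b in range(c):
            out[label[a]][label[b]] += counts[a][b]
    return out


def circular_kappa(counts: Table, u: float) -> float:
    """Circular kappa (10); assumes c >= 3."""
    c = len(counts)
    n = sum(sum(r) for r in counts)
    row = [sum(counts[i]) / n for i in range(c)]
    col = [sum(counts[i][j] for i in range(c)) / n for j in range(c)]
    nxt = [(i + 1) % c for i in range(c)]
    lam0 = sum(counts[i][i] for i in range(c)) / n
    lam1 = sum(counts[i][nxt[i]] + counts[nxt[i]][i] for i in range(c)) / n
    mu0 = sum(row[i] * col[i] for i in range(c))
    mu1 = sum(row[i] * col[nxt[i]] + row[nxt[i]] * col[i] for i in range(c))
    return (lam0 + u * lam1 - mu0 - u * mu1) / (1 - mu0 - u * mu1)


def collapsed_average(counts: Table) -> float:
    """Weighted average of the c collapsed Cohen's kappas with weights 1 - E_0(i)."""
    num = den = 0.0
    for i in range(len(counts)):
        o, e = observed_expected(collapse(counts, i))
        kappa_i = (o - e) / (1 - e)  # Cohen's kappa of the collapsed table
        num += (1 - e) * kappa_i     # weight 1 - E_0(i)
        den += 1 - e
    return num / den


if __name__ == "__main__":
    t = [[20, 4, 0, 2], [3, 18, 5, 0], [0, 4, 22, 3], [2, 0, 3, 14]]
    print(collapsed_average(t), circular_kappa(t, 0.25))
```

For a $4 \times 4$ table the two printed values agree up to rounding, in line with the interpretation of $\kappa_{0.25}$ given above.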

Dependence on the Number of Categories
In this section, a possible dependence of the circular kappas on the number of categories is studied. In Lemma 5, it is assumed that the data can be described by the specific structure presented in (28). This data structure is perhaps not realistic, but it yields a theoretical result on how the circular kappas depend on the number of categories. Lemma 5 presents an example of a class of agreement tables for which all circular kappas are increasing in the number of categories $c$.
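Lemma 5. Let $c \ge 4$ and let $0 \le a_0, a_1, a_2 \le 1$ with $a_0 + a_1 + a_2 = 1$. Furthermore, let the entries of $\{\pi_{ij}\}$ be given by

$$\pi_{ij} = \begin{cases} a_0/c & \text{if } i = j, \\ a_1/(2c) & \text{if } |i - j| = 1 \text{ or } |i - j| = c - 1, \\ a_2/\left(c(c - 3)\right) & \text{otherwise}. \end{cases} \tag{28}$$

Then $\kappa_u$ is strictly increasing in $c$ for all $u$.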
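Proof. Under the conditions of the lemma we have $\lambda_0 = a_0$, $\lambda_1 = a_1$ and $\lambda_2 = a_2$. Using the identity $a_0 + a_1 + a_2 = 1$ we also have

$$\pi_{i+} = \pi_{+i} = \frac{1}{c} \quad \text{for all } i \in \{1, 2, \ldots, c\}. \tag{29}$$

Using identity (29) we have $\mu_0 = 1/c$ and $\mu_1 = 2/c$, and thus

$$\kappa_u = \frac{a_0 + ua_1 - (1 + 2u)/c}{1 - (1 + 2u)/c} = \frac{c(a_0 + ua_1) - 1 - 2u}{c - 1 - 2u}. \tag{30}$$

Using the right-hand side of (30), the first partial derivative with respect to $c$ is given by

$$\frac{\partial \kappa_u}{\partial c} = \frac{(1 + 2u)(1 - a_0 - ua_1)}{(c - 1 - 2u)^2}. \tag{31}$$

Because $u < 1$, we have

$$a_0 + ua_1 \le a_0 + a_1 \le 1,$$

with equality throughout only if $a_0 = 1$, the trivial case of perfect agreement in which $\kappa_u = 1$ for every $c$. Excluding this case, we have the inequality $a_0 + ua_1 < 1$. It follows from this latter inequality that (31) is strictly positive. Thus, under the conditions of the lemma, $\kappa_u$ is strictly increasing in $c$.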
Lemma 5 shows that if we consider a series of agreement tables of the form (28) and keep the values of the total observed agreement $\lambda_0$ and the total disagreement on adjacent categories $\lambda_1$ fixed, then the values of the circular kappas increase with the size of the table. Using identity (30), we find that with a large number of categories the value of $\kappa_u$ approaches

$$\lim_{c \to \infty} \kappa_u = a_0 + ua_1 = \lambda_0 + u\lambda_1.$$
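The behaviour described by Lemma 5 can be illustrated numerically. The Python sketch below builds a table with $a_0/c$ on the diagonal, $a_1/(2c)$ in each cell of adjacent categories (including the pair $A_1$, $A_c$) and $a_2/(c(c-3))$ elsewhere, computes $\kappa_u$ from the cells for a range of $c$, and compares the result with the closed form (30). The parameter values $a_0 = 0.6$, $a_1 = 0.3$, $a_2 = 0.1$ and $u = 0.5$ are arbitrary choices for illustration, and the function names are hypothetical.

```python
from typing import List


def structured_table(c: int, a0: float, a1: float, a2: float) -> List[List[float]]:
    """Table with a0/c on the diagonal, a1/(2c) on adjacent cells, a2/(c(c-3)) elsewhere."""
    if c < 4:
        raise ValueError("this structure requires c >= 4")
    t = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            d = abs(i - j)
            if d == 0:
                t[i][j] = a0 / c
            elif d == 1 or d == c - 1:
                t[i][j] = a1 / (2 * c)
            else:
                t[i][j] = a2 / (c * (c - 3))
    return t


def circular_kappa(p: List[List[float]], u: float) -> float:
    """Circular kappa (10) for a table of proportions that sum to 1 (c >= 3)."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    nxt = [(i + 1) % c for i in range(c)]
    lam0 = sum(p[i][i] for i in range(c))
    lam1 = sum(p[i][nxt[i]] + p[nxt[i]][i] for i in range(c))
    mu0 = sum(row[i] * col[i] for i in range(c))
    mu1 = sum(row[i] * col[nxt[i]] + row[nxt[i]] * col[i] for i in range(c))
    return (lam0 + u * lam1 - mu0 - u * mu1) / (1 - mu0 - u * mu1)


def closed_form(c: float, a0: float, a1: float, u: float) -> float:
    """Right-hand side of (30)."""
    return (c * (a0 + u * a1) - 1 - 2 * u) / (c - 1 - 2 * u)


if __name__ == "__main__":
    a0, a1, a2, u = 0.6, 0.3, 0.1, 0.5
    for c in (4, 6, 10, 20, 40):
        print(c, round(circular_kappa(structured_table(c, a0, a1, a2), u), 4))
    print("limit a0 + u*a1 =", a0 + u * a1)
```

With these parameter values, $\kappa_u$ rises from 0.5 at $c = 4$ to about 0.737 at $c = 40$, approaching the limit $a_0 + ua_1 = 0.75$.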

Discussion
A family of kappa coefficients for assessing agreement between two circular classifications with identical categories was presented. If the categories form a circular scale, the categories exhibit a certain periodicity, and the designation of high and low is arbitrary. The standard weighted kappas used with linear scales, that is, linearly and quadratically weighted kappa (Vanbelle 2015; Warrens 2013, 2015), are not appropriate for analyzing agreement between circular classifications. The following properties of the circular kappas were formally proved. The circular kappas all coincide if the agreement table has two or three categories (Lemma 1). Furthermore, the values of the circular kappas can be strictly ordered in precisely two ways (Lemma 2). Moreover, certain circular kappas can be interpreted as weighted averages of the Cohen's kappas of all collapsed tables that are obtained by combining two adjacent categories (Lemma 4). Finally, for a particular type of agreement table it was shown that the values of the circular kappas increase with the number of categories (Lemma 5).
The values of the circular kappas can be strictly ordered in two different ways, but one ordering is more likely to occur in practice than the other. In the likely ordering (see Tables 1 and 2), Cohen's kappa produces the smallest value, and the values of the circular kappas increase as more weight is assigned to the total disagreement on adjacent categories. The strict ordering suggests that the circular kappas measure the same concept, but to a different extent. This in turn suggests that we might as well use the best-known kappa coefficient in this family, which is Cohen's kappa. Using Cohen's kappa has several advantages. First, if we use Cohen's kappa it is not necessary to specify a positive value of the parameter $u$, which is to some extent arbitrary. Second, Cohen's kappa has been used in thousands of applications and many of its properties are well understood (Yang and Zhou 2014; Warrens 2008, 2010b, 2013). Various authors have presented target values for evaluating the values of kappa coefficients (Landis and Koch 1977; Altman 1991). For example, a value of 0.80 for Cohen's kappa generally indicates good or excellent agreement. There is general consensus in the literature that uncritical application of such magnitude guidelines leads to practically questionable decisions. Since the circular kappas tend to measure the same thing, the same guidelines cannot be applied to all of them: circular kappas that give a large weight to the total disagreement on adjacent categories appear to produce values that are substantially higher than those of circular kappas that give a small weight to this disagreement. If one accepts the use of magnitude guidelines, it seems reasonable to use stricter criteria for circular kappas that tend to produce high values.
If one is interested in using a circular kappa other than Cohen's kappa, then Lemma 4 provides an interesting case when we have $c = 4$ categories. Since all circular kappas coincide with $c = 3$ categories (Lemma 1), the circular kappa $\kappa_{0.25}$ can be interpreted as a weighted average of the four kappas of the collapsed $3 \times 3$ tables that are obtained by combining adjacent categories. Since $\kappa_{0.25}$ is a weighted average, its value lies between the minimum and maximum values of these four kappas. Furthermore, because the four kappas usually have distinct values with real-life data, there exist two adjacent categories whose combination yields a kappa value higher than $\kappa_{0.25}$, and two adjacent categories whose combination yields a lower value. This minor existence result does not tell us which categories these are, only that they exist.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.