Generalized Cramér's coefficient via f-divergence for contingency tables (2022)

This study proposes measures describing the strength of association between the row and column variables via the f-divergence. Cramér's coefficient is a possible mechanism for the analysis of two-way contingency tables. Tomizawa et al. (2004) proposed more general measures, including Cramér's coefficient, using the power-divergence. In this paper, we propose more general measures and show some of their properties, demonstrating that the proposed measures are beneficial for comparing the strength of association in several tables.


Introduction
Contingency tables and their analysis are important for various fields, such as medicine, psychology, education, and social science. Typically, contingency tables are used to evaluate whether row and column variables are statistically independent. If the independence of the two variables is rejected, for example, through Pearson's chi-squared test, or if they are clearly related, then we are interested in their strength of association. Many coefficients have been proposed to measure the strength of association between the two variables, namely, to measure the degree of departure from independence. Pearson's coefficient φ^2 of mean square contingency, Pearson's coefficient of contingency P, and Tschuprow's coefficient T (Tschuprow, 1925, 1939) serve as prime examples (see, e.g., Bishop et al., 2007; Everitt, 1992; Agresti, 2003). These measures represent the strength of association within the interval from 0 to 1, where the value 0 indicates independence in the contingency table. However, the problem with φ^2 is that its value does not attain 1 even when the contingency table has a complete association structure (i.e., maximum departure from independence). Similarly, P and T do not always attain the value of 1, depending on the number of rows and columns in the table. To address this issue, Cramér (1946) proposed Cramér's coefficient V^2, which can reach the value of 1 if the contingency table has a complete association structure for all rows and columns. Specifically, V^2 indicates the strength of association in the contingency table as 0 ≤ V^2 ≤ 1, with the value of 0 identifying the independence structure and the value of 1 identifying the complete association structure. Rényi (1961) introduced a class of measures of the divergence of two distributions. Recent studies have linked contingency tables and divergence. Tomizawa et al. (2004) proposed measures V^2_{t(λ)} (t = 1, 2, 3) based on the power-divergence with parameter λ ≥ 0.
Their work extended the measure, which had previously been limited to V^2 (λ = 1), and showed the measures to be members of a single-parameter family that includes a measure based on the KL-divergence (λ = 0). (For more details on the power-divergence, see Cressie and Read (1984) and Read and Cressie (1988).) Furthermore, the f-divergence was introduced by Ali and Silvey (1966) and Csiszár (1963) as a useful generalization of the relative entropy, which retains some of its major properties; it is also called the φ-divergence. In contingency table analysis, a considerable amount of literature has been published on modeling using the f-divergence (e.g., Kateri and Papaioannou, 1994, 1997; Kateri and Agresti, 2007; Fujisawa and Tahata, 2020; Tahata, 2022; Yoshimoto et al., 2019). Many studies on goodness-of-fit tests using the f-divergence have also been conducted, showing its usefulness (e.g., Pardo, 2018; Felipe et al., 2014, 2018). However, discussions on measures of association using the f-divergence are limited.
In this paper, we propose a wider class of measures than the conventional ones via the f-divergence. This study's contribution is proving that a measure applying any function f(x) that satisfies the conditions of the f-divergence has desirable properties for measuring the strength of association in contingency tables. This contribution allows us to easily construct a new measure using a divergence that has desirable properties for the analyst. For example, we conduct numerical experiments with a measure applying the θ-divergence. Furthermore, we can give further interpretations of the association between rows and columns in the contingency table, which could not be obtained with the conventional measures.
The rest of this paper is organized as follows. Section 2 proposes new measures to express the strength of the association between the row and column variables in two-way contingency tables. Furthermore, the section shows that the proposed measures have desirable properties for measuring the strength of association. Section 3 presents the relationship between the measures and the correlation coefficient in the bivariate normal distribution of the latent variables in the contingency tables. Section 4 demonstrates a simulation experiment. Section 5 presents the approximate confidence intervals of the proposed measures. Section 6 presents analysis examples of the proposed measures applying the power-divergence and the θ-divergence with actual data. Finally, Section 7 provides some concluding remarks.

Generalized measure
We consider measures of association using the f-divergence for an r × c contingency table. Let p_{ij} denote the probability that an observation falls in the ith row and jth column of the table (i = 1, ..., r; j = 1, ..., c). Moreover, let p_{i·} and p_{·j} denote the marginal probabilities p_{i·} = Σ_{t=1}^{c} p_{it} and p_{·j} = Σ_{s=1}^{r} p_{sj}. Hereinafter, we assume that {p_{i·} ≠ 0, p_{·j} ≥ 0} when r ≤ c and {p_{i·} ≥ 0, p_{·j} ≠ 0} when r > c.
In Sason and Verdú (2016), the f-divergence from P to Q is defined as I_f(P; Q) = ∫ f(dP/dQ) dQ, where f is a convex function and P ≪ Q. For the r × c contingency table, P and Q are given as discrete distributions {p_{ij}} and {q_{ij}}. Accordingly, we have dP/dQ = {p_{ij}/q_{ij}}. Thus, the f-divergence from {p_{ij}} to {q_{ij}} is given as

I_f({p_{ij}}; {q_{ij}}) = Σ_i Σ_j q_{ij} f(p_{ij}/q_{ij}),

where f(x) is a once-differentiable and strictly convex function on (0, +∞) with f(1) = 0, lim_{x→0} f(x) = 0, 0f(0/0) = 0, and 0f(a/0) = a lim_{x→∞} f(x)/x (see Csiszár and Shields, 2004). By choosing the function f, many important divergences, such as the KL-divergence (f(x) = x log x), the Pearson's divergence (f(x) = x^2 − x), the power-divergence (f(x) = (x^{λ+1} − x)/λ(λ + 1)), and the θ-divergence (f(x) = (x − 1)^2/(θx + 1 − θ) + (x − 1)/(1 − θ)), are included as special cases of the f-divergence (see, e.g., Sason and Verdú, 2016; Ichimori, 2013). Furthermore, the f-divergence is one of the monotone and regular divergences. The class of monotone and regular divergences was introduced in Cencov (2000) and studied in Corcuera and Giummolè (1998) as a wide class of invariant divergences with respect to Markov embeddings. This class is often used for measures of goodness of prediction (see Geisser, 1993; Corcuera and Giummolè, 1999a,b, etc.). Studying these measures aims to obtain a quantitative assessment of how well a row or column variable predicts the other variable. Therefore, we consider that measures using the f-divergence are appropriate for measuring the association and are a natural generalization of those of Tomizawa et al. (2004).
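As a concrete illustration, the discrete f-divergence above can be sketched in a few lines of Python; the helper names are ours, and the generator functions are exactly the four special cases listed in the text.

```python
import math

# Sketch of the discrete f-divergence I_f({p_ij}; {q_ij}) = sum_ij q_ij f(p_ij/q_ij),
# with the convention 0*f(0/0) = 0. Tables are flattened into lists of cell
# probabilities; the names here are illustrative, not from the paper.

def f_divergence(p, q, f):
    total = 0.0
    for p_ij, q_ij in zip(p, q):
        if q_ij == 0:
            continue  # handles the 0*f(0/0) = 0 convention; assumes p_ij = 0 here
        total += q_ij * f(p_ij / q_ij)
    return total

# The four generator functions named in the text:
f_kl      = lambda x: x * math.log(x) if x > 0 else 0.0   # KL-divergence
f_pearson = lambda x: x * x - x                            # Pearson's divergence
f_power   = lambda lam: (lambda x: (x**(lam + 1) - x) / (lam * (lam + 1)))
f_theta   = lambda th: (lambda x: (x - 1)**2 / (th * x + 1 - th) + (x - 1) / (1 - th))

# A 2x2 table {p_ij} compared against its independence table {p_i. p_.j}:
p = [0.3, 0.2, 0.2, 0.3]
q = [0.25, 0.25, 0.25, 0.25]  # product of the uniform marginals
```

Any f satisfying the stated conditions gives a nonnegative divergence that vanishes exactly at independence; note that at θ = 0 the θ-divergence generator reduces pointwise to the Pearson generator x^2 − x.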
Measures that present the strength of association between row and column variables are proposed in three cases:
(I) when the row and column variables are response and explanatory variables, respectively;
(II) when they are explanatory and response variables, respectively;
(III) when response and explanatory variables are undefined.
Further, we define measures for the asymmetric situation (Cases (I) and (II)) and for the symmetric situation (Case (III)).
The following are the three properties that should be possessed by the measures:
(i) the measures are contained within an interval (e.g., from 0 to 1);
(ii) when the measure is minimal, the row and column variables are statistically independent;
(iii) when the measure is maximal, the categories of one variable can be identified from the other.
Conventional measures satisfy all of these properties. In the remainder of this section, we prove that the proposed measures also satisfy these three properties.

Case I
For an asymmetric situation wherein the column variable is the explanatory variable and the row variable is the response variable, we propose the measure V^2_{1(f)}, which presents the strength of association between the row and column variables and is defined by normalizing the f-divergence I_f({p_{ij}}; {p_{i·} p_{·j}}) so that the value 1 is attained under complete association. Then, the following theorem holds for the measure V^2_{1(f)}.

Theorem 1. For each convex function f,
(i) 0 ≤ V^2_{1(f)} ≤ 1;
(ii) V^2_{1(f)} = 0 if and only if a structure of null association exists in the table (i.e., {p_{ij} = p_{i·} p_{·j}});
(iii) V^2_{1(f)} = 1 if and only if a structure of complete association exists; that is, for each column j (j = 1, 2, ..., c), there uniquely exists i_j such that p_{i_j, j} > 0 and p_{ij} = 0 for all other i (≠ i_j) (assuming p_{i·} > 0 for all i).
The proof of Theorem 1 is provided in the Appendix. Similar to the interpretation of the measure V^2_{1(λ)}, V^2_{1(f)} indicates the degree to which the prediction of the row category of an individual may be improved if knowledge regarding the column category of the individual exists. In this sense, V^2_{1(f)} shows the strength of association between the row and column variables. Examples of the f-divergence are given below. When f(x) = x log x, the KL-divergence is derived, and V^2_{KL} is identical to Theil's uncertainty coefficient U (see Theil, 1970). When f(x) = x^2 − x, the Pearson's divergence is derived, and the measure coincides with Cramér's coefficient V^2. When f(x) = (x − 1)^2/(θx + 1 − θ) + (x − 1)/(1 − θ), the θ-divergence is derived; the resulting measure is also a single-parameter measure and one of the generalizations of V^2, which agrees with V^2 at θ = 0. The numerator coincides with the triangular discrimination ∆ at θ = 0.5 (see Dragomir et al., 2000; Topsoe, 2000). Unlike the power-divergence, the θ-divergence can measure departures from independence in a manner similar to the Euclidean distance, especially in the case of the triangular discrimination ∆, which can measure symmetrical distances between {p_{ij}} and {p_{i·} p_{·j}}. In the numerical experiments discussed in Sections 4 and 6, we treat the θ-divergence-type measure as an example of a new single-parameter measure obtained by extending V^2 and compare it with the conventional one. Moreover, analysis corresponding to various contingency tables can be performed by changing the function.
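As a quick sketch (our own code, not from the paper), the Pearson case f(x) = x^2 − x recovers the familiar computation of Cramér's coefficient, since I_f({p_{ij}}; {p_{i·} p_{·j}}) then equals Pearson's φ^2 and V^2 = φ^2 / min(r − 1, c − 1):

```python
def cramers_v2(table):
    """Cramér's V^2 for a probability table (list of rows), via phi^2."""
    r, c = len(table), len(table[0])
    row = [sum(row_i) for row_i in table]                         # p_i.
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]  # p_.j
    phi2 = 0.0
    for i in range(r):
        for j in range(c):
            q = row[i] * col[j]
            if q > 0:
                # q * f(p/q) with f(x) = x^2 - x reduces to (p - q)^2 / q
                phi2 += (table[i][j] - q) ** 2 / q
    return phi2 / min(r - 1, c - 1)

# Complete association (one nonzero cell per column) gives V^2 = 1,
# and an independence table gives V^2 = 0:
print(cramers_v2([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
print(cramers_v2([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```

The two printed cases mirror conditions (ii) and (iii) of Theorem 1 for this particular choice of f.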

Case II
For the asymmetric situation wherein the row and column variables are the explanatory and response variables, respectively, we propose the measure V^2_{2(f)}, which presents the strength of association between the row and column variables and is defined analogously to V^2_{1(f)} with the roles of the rows and columns interchanged. Then, the following theorem is obtained for the measure V^2_{2(f)}.
Theorem 2. For each convex function f,
(i) 0 ≤ V^2_{2(f)} ≤ 1;
(ii) V^2_{2(f)} = 0 if and only if a structure of null association exists in the table;
(iii) V^2_{2(f)} = 1 if and only if a structure of complete association exists; that is, for each row i (i = 1, 2, ..., r), there uniquely exists j_i such that p_{i, j_i} > 0 and p_{ij} = 0 for all other j (≠ j_i) (assuming p_{·j} > 0 for all j).
The proof of Theorem 2 is obtained in a manner similar to that of Theorem 1. V^2_{2(f)} coincides with the value of V^2_{1(f)} when the row and column variables are interchanged in the table, so V^2_{2(f)} has no special characteristics compared with V^2_{1(f)}. However, it is proposed because of its importance in Case III.

Case III
In an r × c contingency table wherein explanatory and response variables are undefined, using V^2_{1(f)} or V^2_{2(f)} alone is inappropriate if we are interested in determining the degree to which knowledge about the value of one variable can help us predict the value of the other. For this symmetric situation, we propose the following measure, which combines the ideas of both V^2_{1(f)} and V^2_{2(f)}:

V^2_{3(f)} = h^{-1}( w_1 h(V^2_{1(f)}) + w_2 h(V^2_{2(f)}) ),

where h is a monotonic function and w_1 + w_2 = 1 (w_1, w_2 ≥ 0). Then, the following theorem is attained for the measure V^2_{3(f)}.

Theorem 3. For each convex function f,
(i) 0 ≤ V^2_{3(f)} ≤ 1;
(ii) V^2_{3(f)} = 0 if and only if a structure of null association exists in the table (i.e., {p_{ij} = p_{i·} p_{·j}});
(iii) V^2_{3(f)} = 1 if and only if a structure of complete association exists; that is, at most one nonzero probability appears in each row or each column (assuming all marginal probabilities are nonzero).
The proof of Theorem 3 is provided in the Appendix. We can show that, if h(u) = log u and w_1 = w_2 = 1/2, then V^2_{3(f)} = V^2_{G(f)} = (V^2_{1(f)} V^2_{2(f)})^{1/2}, and if h(u) = 1/u and w_1 = w_2 = 1/2, then V^2_{3(f)} = V^2_{H(f)} = 2 V^2_{1(f)} V^2_{2(f)} / (V^2_{1(f)} + V^2_{2(f)}).
Notably, V 2 G(f ) and V 2 H(f ) are the geometric mean and harmonic mean of V 2 1(f ) and V 2 2(f ) , respectively.We confirm that, when (Miyamoto et al., 2007).
For an r × r contingency table with the same row and column classifications, V^2_{3(f)} = 1 if and only if the main diagonal cell probabilities in the r × r table are nonzero and the off-diagonal cell probabilities are all zero after interchanging some row and column categories. Therefore, all observations concentrate on the main diagonal cells. When predicting the values of the categories of an individual, V^2_{3(f)} specifies the degree to which the prediction can be improved if knowledge about the value of one variable exists. In this sense, V^2_{3(f)} also indicates the strength of association between the row and column variables. If only the marginal distributions {p_{i·}} and {p_{·j}} are known, we consider predicting the values of the individual row and column categories in terms of probabilities under the independence structure.
Theorem 4. For any fixed convex function f and monotonic functions h, min(V^2_{1(f)}, V^2_{2(f)}) ≤ V^2_{3(f)} ≤ max(V^2_{1(f)}, V^2_{2(f)}); in particular, V^2_{H(f)} ≤ V^2_{G(f)}.

Relationship between measures and bivariate normal distribution
In the analysis of two-way contingency tables, Tallis (1962), Lancaster and Hamdan (1964), Kirk (1973), and Divgi (1979) proposed an approach based on the bivariate normal distribution. This approach assumes that the classification of rows and columns results from continuous random variables with a bivariate normal distribution; that is, the sample contingency table comes from a discretized bivariate normal distribution. In many contexts, this assumption is invalid, and a more general approach is needed. Therefore, Goodman (1981, 1985) presented an approximation close to the correlation structure of discrete bivariate distributions based on the association model, and Becker (1989) made a similar proposal based on the KL-divergence. Assuming a bivariate normal distribution is important for examining the correlation structure of the contingency table, and previous studies have considered the association based on this model. In this section, we explain the relationship between the measures V^2_{t(f)} (t = 1, 2, 3) and the correlation coefficient ρ when a bivariate normal distribution can be assumed for the latent variables in the contingency table.
Assuming latent variables, the (i, j) cell probability p_{ij} of the r × c contingency table is denoted as

p_{ij} = ∫_{x_{i−1}}^{x_i} ∫_{y_{j−1}}^{y_j} f_{X*,Y*}(x, y) dy dx ≈ f_{X*,Y*}(x̃_i, ỹ_j) Δx_i Δy_j,

where x_{i−1} < x̃_i ≤ x_i, y_{j−1} < ỹ_j ≤ y_j, and f_{X*,Y*}(x, y) is a continuous joint density function of the random variables X* and Y*. Δx_i and Δy_j are the widths of the intervals (x_{i−1}, x_i] and (y_{j−1}, y_j], respectively. In this situation, it is possible to approximate I_f({p_{ij}}; {p_{i·} p_{·j}}) as follows:

I_f({p_{ij}}; {p_{i·} p_{·j}}) ≈ ∫ ∫ f_{X*}(x) f_{Y*}(y) f( f_{X*,Y*}(x, y) / (f_{X*}(x) f_{Y*}(y)) ) dy dx,   (1)

where f_{X*}(x) and f_{Y*}(y) are the marginal probability density functions of f_{X*,Y*}(x, y).
Let X* and Y* be random variables distributed according to the bivariate normal distribution, whose joint density function is

f_{X*,Y*}(x, y) = 1/(2π σ_x σ_y (1 − ρ^2)^{1/2}) exp[ −1/(2(1 − ρ^2)) { (x − µ_x)^2/σ_x^2 − 2ρ(x − µ_x)(y − µ_y)/(σ_x σ_y) + (y − µ_y)^2/σ_y^2 } ],

where ρ is the correlation coefficient between X* and Y*. The value of the correlation coefficient ranges from −1 to 1. In the formula, the standard deviations σ_x and σ_y are positive constants, whereas the means µ_x and µ_y need not be positive constants. When applying f(x) = (x^{λ+1} − x)/λ(λ + 1), the relationship between the power-divergence and the correlation coefficient ρ is expressed as

I_λ({p_{ij}}; {p_{i·} p_{·j}}) ≈ (1/λ(λ + 1)) [ (1 − ρ^2)^{−λ/2} (1 − λ^2 ρ^2)^{−1/2} − 1 ],   (2)

where λ < 1/|ρ|. Therefore, it is better to use values of λ less than 1 under this assumption. If we want to capture the relationship between the measures and the correlation coefficient ρ by applying the value at λ = 0, which is taken to be the continuous limit as λ → 0 (i.e., f(x) = x log x), it can be expressed as

I_{KL}({p_{ij}}; {p_{i·} p_{·j}}) ≈ −(1/2) log(1 − ρ^2).   (3)

When we consider the latent variables and approximate the divergence, the relationship can be shown as in (2) and (3). These equations show that the value is monotonically increasing with respect to |ρ|. Therefore, by considering the measures, the relationship can be captured and an upper limit can be established. This section showed the relationship between the measures and the correlation coefficient ρ using the bivariate normal distribution and f(x) = (x^{λ+1} − x)/λ(λ + 1) as examples. However, in the θ-divergence and more general divergence cases, it is difficult to calculate (1) in closed form. Therefore, in the next section, we confirm that the value of the measures increases monotonically as the correlation coefficient moves away from 0, even when the θ-divergence is applied.
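The bivariate-normal approximations for the power-divergence and the KL-divergence can be checked numerically; the sketch below (function names are ours, and the closed forms are our reconstruction, valid for λ < 1/|ρ|) confirms that the power-divergence expression tends to the KL expression as λ → 0 and that both increase in |ρ|:

```python
import math

# Reconstructed bivariate-normal approximations:
#   I_lambda ~ [ (1 - rho^2)^(-lam/2) * (1 - lam^2 rho^2)^(-1/2) - 1 ] / (lam (lam + 1))
#   I_KL     ~ -(1/2) log(1 - rho^2)        (the lam -> 0 limit)
# Both are increasing in |rho|, which is the monotonicity used in the text.

def i_power(rho, lam):
    # closed form requires lam < 1/|rho|
    num = (1 - rho**2) ** (-lam / 2) * (1 - (lam * rho) ** 2) ** (-0.5) - 1
    return num / (lam * (lam + 1))

def i_kl(rho):
    return -0.5 * math.log(1 - rho**2)
```

Evaluating i_power at a very small λ agrees with i_kl to within the size of λ, which numerically supports treating (3) as the λ → 0 limit of (2).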

Numerical study
This section compares the measures by function and parameter. In the numerical study, we use artificial data generated from discrete bivariate distributions with zero means and unit variances, as in Goodman (1981, 1985) and Becker (1989). The bivariate normal distribution is partitioned using cut-points that generate uniform marginal distributions. For instance, when creating a 4 × 4 probability table, we split the bivariate normal distribution using z_{0.25}, z_{0.50}, and z_{0.75} as cut-points. The 4 × 4 artificial probability tables created for the numerical study are given in the Appendix. The benefit of this method is that the strength of association between the row and column variables in the contingency table is known from the bivariate normal distribution, which is appropriate for examining the measures. For the comparison of the measures, we use Tomizawa's power-divergence-type measures (f(x) = (x^{λ+1} − x)/λ(λ + 1) for 0 ≤ λ ≤ 1) and the newly proposed θ-divergence-type measures (f(x) = (x − 1)^2/(θx + 1 − θ) + (x − 1)/(1 − θ) for 0 ≤ θ < 1), both of which are single-parameter divergences and extensions of Cramér's coefficient V^2.
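The data-generating scheme described above can be sketched as follows; since exact bivariate-normal rectangle probabilities need a bivariate CDF, this illustrative version (all names are ours) approximates the discretized table by Monte Carlo:

```python
import random
from statistics import NormalDist

def discretized_table(rho, k=4, n=200_000, seed=0):
    """Approximate a k x k probability table from a bivariate normal with
    correlation rho, cut so that both marginals are uniform."""
    rng = random.Random(seed)
    # cut-points z_{1/k}, ..., z_{(k-1)/k}; for k = 4: z_{0.25}, z_{0.50}, z_{0.75}
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    cell = lambda v: sum(v > c for c in cuts)   # index of the interval v falls in
    table = [[0.0] * k for _ in range(k)]
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x, y = z1, rho * z1 + (1 - rho**2) ** 0.5 * z2   # corr(x, y) = rho
        table[cell(x)][cell(y)] += 1 / n
    return table
```

With rho = 0 the table is close to uniform (all cells near 1/16), and larger |rho| concentrates mass on the diagonal, so the association strength of each generated table is controlled by ρ as intended.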
Table 1 presents the values of the measures V^2_{t(f)} (t = 1, 2, 3) for each 4 × 4 probability table with ρ = 0.0, 0.4, 0.8, 1.0. Notably, in the case of the r × r artificial contingency tables, V^2_{1(f)}, V^2_{2(f)}, and V^2_{3(f)} coincide. Table 1 shows that, as the correlation moves away from 0, V^2_{t(f)} approaches 1.0. Further, ρ = 0 if and only if the measures show that a structure of null association exists in the table, and ρ = 1.0 if and only if the measures confirm that a structure of complete association exists. The sharp increase around ρ = 1.0 can be explained by the previous section's relationship between the measures and the correlation coefficient ρ. Another important finding is how each measure increases at ρ = 0.4, 0.8. In the case of V^2 (λ = 1, θ = 0), the increasing trend of V^2 with the change of ρ is slower than that of most measures. These results suggest that V^2 may not accurately determine small differences in the strength of association when comparing multiple contingency tables generated from the bivariate normal distribution, so having a broader perspective through the extension may allow more careful analysis. The same is true for the power-divergence-type measures, which have an increasing trend similar to V^2. It may be better to use the θ-divergence-type measures with θ = 0.7 in order to determine small differences in the strength of association. Values of V^2_{t(f)} for other ρ and the coverage probabilities are provided in the Appendix.

Approximate confidence intervals for measure
In the previous section, we confirmed the values of the proposed measures with simulated data. However, when analyzing real data, p_{ij} is unknown, and these values are also unknown. Hence, it is necessary to construct confidence intervals. Therefore, in this section, we construct asymptotic confidence intervals by using the delta method. Let {n_{ij}} denote the observed frequencies from a multinomial distribution, and let n denote the total number of observations, namely, n = Σ_{i=1}^{r} Σ_{j=1}^{c} n_{ij}. The approximate standard error and large-sample confidence interval are obtained.

Table 1: Values of measures V^2_{t(f)} (t = 1, 2, 3) applying (a) the power-divergence for each λ and (b) the θ-divergence for each θ in 4 × 4 probability tables with ρ = 0, 0.4, 0.8, 1.0.
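The paper's delta-method variance is not reproduced in this excerpt, so as an illustrative stand-in the following sketch approximates a 95% interval for a plug-in measure by a nonparametric bootstrap over the multinomial counts {n_ij} (here applied to Cramér's V^2, the f(x) = x^2 − x case; all names are ours):

```python
import random

def v2_hat(counts):
    """Plug-in Cramér's V^2 from an r x c table of counts."""
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]
    r, c = len(p), len(p[0])
    pr = [sum(row) for row in p]                              # row marginals
    pc = [sum(p[i][j] for i in range(r)) for j in range(c)]   # column marginals
    phi2 = sum((p[i][j] - pr[i] * pc[j]) ** 2 / (pr[i] * pc[j])
               for i in range(r) for j in range(c) if pr[i] * pc[j] > 0)
    return phi2 / min(r - 1, c - 1)

def bootstrap_ci(counts, stat=v2_hat, b=500, seed=0):
    """Percentile bootstrap interval for stat over multinomial resamples."""
    rng = random.Random(seed)
    r, c = len(counts), len(counts[0])
    cells = [(i, j) for i in range(r) for j in range(c)]
    weights = [counts[i][j] for i, j in cells]
    n = sum(weights)
    reps = []
    for _ in range(b):
        boot = [[0] * c for _ in range(r)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            boot[i][j] += 1
        reps.append(stat(boot))
    reps.sort()
    return reps[int(0.025 * b)], reps[int(0.975 * b)]
```

This resampling route is a generic substitute, not the paper's closed-form delta-method interval; it only requires the plug-in statistic itself, so the same helper works for any V^2_{t(f)}.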

Example 1
Consider the data in Table 15, taken from the 2006 General Social Survey. These data show the relationship between family income and education in the United States, separately for the black and white categories of race. By applying the measures V^2_{1(f)}, we consider the degree to which the prediction of education can be improved when the family income of an individual in the black or white category is known. Table 16 shows the estimates of the measures, standard errors, and 95% confidence intervals. Tables 16(a1, a2) and 16(b1, b2) show the results of the analysis of Tables 15(a) and 15(b), respectively. One interesting finding is that the confidence intervals for all V^2_{1(f)} do not contain zero for any λ and θ. The results show that the two actual data sets have an association structure from points of view other than Cramér's coefficient V^2. Another important finding is the comparison of the confidence intervals. For the conventional power-divergence-type measures, a comparison of Tables 16(a1) and 16(b1) shows that the confidence intervals overlap for each λ. Meanwhile, when θ = 0.9 in Tables 16(a2) and 16(b2), the confidence intervals do not overlap. Table 15(a), where the estimate is closer to 0, is closer to independence. Therefore, this analysis revealed the merit of using the measures extended with the f-divergence to express differences that did not appear with the conventional ones.

Example 2
Consider the data in Table 21, obtained from Tomizawa (1985). These tables provide information on the unaided distance vision of 4746 university students aged 18 to about 25 and 3168 elementary students aged 6 to about 12. In Table 21, the row and column variables are the right and left eye grades, respectively, with the categories ordered from the highest grade (1) to the lowest grade (4). As the right and left eye grades have similar classifications, we apply the measure V^2_{H(f)}. Table 22 provides the estimates of the measures, standard errors, and confidence intervals. Tables 22(a1, a2) and 22(b1, b2) show the results of the analysis of Tables 21(a) and 21(b), respectively. The results of this analysis show that the two actual data sets have a strong structure of association in terms of the estimates and confidence intervals for all λ and θ. After comparing the values of the measures between Tables 22(a1, a2) and 22(b1, b2), we found that the strength of association between the right and left eyes is greater for elementary school students in terms of the estimates.

Table 3: Estimate of the measure V^2_{1(f)}, estimated approximate standard error for V^2_{1(f)}, and approximate 95% confidence interval of V^2_{1(f)} applying (a1, b1) the power-divergence for any λ and (a2, b2) the θ-divergence for any θ.
We recommend using various values instead of only one. V^2_{t(f)} is a broad class of measures that includes, as special cases, the power-divergence-type measures (f(x) = (x^{λ+1} − x)/λ(λ + 1)) and the θ-divergence-type measures (f(x) = (x − 1)^2/(θx + 1 − θ) + (x − 1)/(1 − θ)), which encompass Cramér's coefficient V^2 and others. It may be useful when we are interested in exploring new contingency table data with correlated row and column variables. While it is important to investigate various functions and parameters to make the results of the analysis more reliable, the user should not select values merely for their own convenience. In the actual analysis, the safest evaluation is obtained by varying the tuning parameter of a function that includes the most reasonable distance in the user's research field. However, when the reasonable distance is unclear, it is necessary to investigate many functions so as to give consideration from various viewpoints (e.g., the squared L_2 family of distances, the Shannon entropy family of distances, etc.). In addition, some suggestions for the choice can be made by considering Cramér's coefficient V^2 itself.
Cramér's coefficient V^2 is a popular measure for evaluating the degree of association between row and column variables in two-way contingency tables, but it has several limitations. Kvålseth (2018) points out some limitations of Cramér's coefficient V^2. The main objective of this study is to generalize Cramér's coefficient V^2, but it is also possible to provide two improvements. The first limitation is that meaningful interpretations of its values are difficult. If the function applied to the f-divergence is integrable in (1), as the power-divergence is in (2) and (3), an operationally meaningful interpretation can be given to the value of the measure corresponding to the correlation coefficient ρ. The second limitation is that the degree of association may be overestimated when the observed frequencies are small. This limitation also applies to Tomizawa's power-divergence-type measures V^2_{t(λ)} (t = 1, 2, 3), which are not improved by generalization with the tuning parameter λ. In such cases, an evaluation can be given from a point of view similar to the Euclidean distance by using the θ-divergence (except θ = 0) or another divergence such as the K-divergence. As an example, consider the artificial data in Table 6. The data clearly indicate near independence between the row and column categories, with the Euclidean distance |p_{ij} − p_{i·} p_{·j}| being either 0 or 0.01 and Σ_{i=1}^{3} Σ_{j=1}^{3} |p_{ij} − p_{i·} p_{·j}| = 0.04. However, Cramér's coefficient and Tomizawa's power-divergence-type measure both have large values, while the θ-divergence-type measure is close to the Euclidean distance in Table 7. There are other limitations, but it may be possible to improve the limitations of Cramér's coefficient V^2 while ensuring the properties of the measure through the function to be applied.

Conclusion
We found that the strength of association between the row and column variables in two-way contingency tables can be safely analyzed by proposing the measures V^2_{t(f)} (t = 1, 2, 3), which generalize Cramér's coefficient V^2 via the f-divergence. First, this study proved that a measure applying a function f(x) that satisfies the conditions of the f-divergence has desirable properties for measuring the strength of association in contingency tables. Hence, we can easily construct a new measure using a divergence that has essential properties for the analyst. Furthermore, we can give a further interpretation of the association between rows and columns in contingency tables, which could not be obtained with the conventional ones. Second, we showed the relationship between the proposed measures V^2_{t(f)} and the bivariate normal distribution. We found that the relationship between the power-divergence and the correlation coefficient ρ is approximately formulated and is more succinct at λ = 0.
The measures V^2_{t(f)} always range between 0 and 1, independent of the dimensions r and c and the sample size n. Thus, they are useful for comparing the strength of association between the row and column variables in several tables. This is crucial in checking the relative magnitude of the strength of association between the row and column variables with respect to the degree of complete association. Specifically, V^2_{1(f)} (V^2_{2(f)}) would be effective when the row and column variables are the response (explanatory) and explanatory (response) variables, respectively, while V^2_{3(f)} would be useful when explanatory and response variables are not defined. Furthermore, to analyze the strength of association between the row and column variables, we first need to check whether independence holds by using a test statistic, such as Pearson's chi-squared statistic. Then, if it is determined that a structure of association exists, the next step would be to measure the strength of the association by using V^2_{t(f)}. However, if the table is determined to be independent, employing V^2_{t(f)} may not be meaningful. Furthermore, V^2_{t(f)} is invariant under any permutation of the categories. Therefore, we can apply it to data analysis on a nominal or ordinal scale.
We observe that (i) the estimate of the strength of association should be considered in terms of an approximate confidence interval for V^2_{t(f)} rather than V^2_{t(f)} itself, and (ii) the measure helps describe relative magnitudes of the strength of association, rather than absolute magnitudes.
Lemma A.1. Let f be a strictly convex function on [0, +∞), and let g be defined by (A.1). Then, g is a strictly monotonically increasing function.
Proof of Lemma A.1. Due to the strict convexity of f, for any x_1, x_2 ∈ [0, +∞) with x_1 ≠ x_2 and for any p ∈ (0, 1), it holds that f(p x_1 + (1 − p) x_2) < p f(x_1) + (1 − p) f(x_2). Setting x_2 = 0 and x_1 ≠ 0, we obtain the desired inequality. Thus, g is a strictly monotonically increasing function.
Proof of Theorem 1. The f-divergence is first transformed using the function g given by (A.1). From Lemma A.1, since g is a strictly monotonically increasing function, an upper bound is obtained. Furthermore, from Jensen's inequality, we have (A.2). When the equality in (A.2) holds, we have p_{ij}/(p_{i·} p_{·j}) = 1 for all i and j. Thus, we obtain {p_{ij} = p_{i·} p_{·j}} from the properties of the f-divergence.
Finally, if for each column j there uniquely exists i_j such that p_{i_j, j} > 0 and p_{ij} = 0 for all other i (≠ i_j), the measure V^2_{1(f)} attains the value 1. From Lemma A.1, as g is a strictly monotonically increasing function, the equality is satisfied if and only if, for each column, there is only one i_j such that p_{i_j, j} > 0 and p_{ij} = 0 for all other i (≠ i_j).
Proof of Theorem 3. First, consider the weighted average of h(V^2_{1(f)}) and h(V^2_{2(f)}). Notably, h is a monotonic function, so the equality is satisfied at the null association structure. Besides, V^2_{3(f)} = 1 is obvious in Cases (I) and (II). As mentioned previously, h is a monotonic function, so the equality is satisfied at the complete association structure, which

is satisfied under situations (I) and (II).
Proof of Theorem 4. Inequality 1 in Theorem 4 has already been validated in the proof of Theorem 3, so its proof is omitted. We show inequality 2. Let h_1(u) = 1/u and h_2(u) = log u; then it holds that

B Asymptotic variance for measures
Using the delta method,

C Additional numerical studies
Appendix C presents tables of the results of numerical studies on the measures by function and parameter. In Numerical study 1, we compare the measures by function and parameter using 4 × 4 artificial data with different degrees of association. Numerical study 2 further investigates the effect of the number of rows and columns.

Numerical study 1
Consider Table 8 to examine the relationship between the degree of association and the measures. These tables are 4 × 4 probability tables obtained by discretizing the bivariate normal distribution, and the correlation coefficient ρ corresponds to the association structure of the contingency table. Therefore, by changing the correlation coefficient ρ in steps of 0.2 from −1 to 1, it is possible to capture changes in the measures depending on the degree of association.
Table 9 presents the values of the measures V^2_{t(f)} (t = 1, 2, 3). Notably, in the case of the r × r artificial contingency tables, V^2_{1(f)}, V^2_{2(f)}, and V^2_{3(f)} coincide. Table 9 shows that, as the correlation moves away from 0, V^2_{t(f)} approaches 1. Besides, ρ = 0 if and only if the measures show that there is a structure of null association in the table, and ρ = ±1.0 if and only if the measures confirm that there is a structure of complete association. Furthermore, V^2_{t(f)} is invariant under any permutation of the categories. Therefore, when the absolute values of ρ are equal, the values of V^2_{t(f)} are also equal. We evaluate the performance of the approximate confidence interval for the proposed measures by the coverage probability. Suppose that 4 × 4 contingency tables with a sample size of 5000 are generated by multinomial random numbers based on the probability distributions in Table 8. The number of iterations is 100,000. This experiment enables us to evaluate whether the confidence interval for the proposed measures tends to change with the strength of the association in the contingency table.
Table 10 shows the coverage probability of the approximate 95% confidence interval for V^2_{t(f)}. Note that, when ρ = ±1.0, the diagonal or antidiagonal components of the contingency table are nonzero and the probabilities of the other (i, j) cells are zero, as shown in Table 8, so that V^2_{t(f)} = 1, as can be seen from Table 9, and the confidence intervals are [1, 1]. Therefore, it is reasonable that the coverage probabilities are 1.000 when ρ = ±1.0, as shown in Table 10. Another finding is that the coverage probabilities of the 95% confidence interval for all V^2_{t(f)}, regardless of function or parameter, exceed 0.95. This indicates that, when the sample size is sufficiently large, the performance of the confidence intervals is good regardless of the strength of association in the contingency tables.

Numerical study 2
In numerical experiment 2, we investigate the behavior of the measures when the number of rows and columns is varied. The artificial data are generated from a bivariate normal distribution with ρ = 0.4, and the numbers of rows and columns are increased to 4, 8, and 12, respectively. Table 11 reports the measures V^2_1(f), V^2_2(f), and V^2_3(f) (specifically V^2_H(f) and V^2_G(f)). In Table 11, as the number of columns increases, V^2_1(f) increases but V^2_2(f) decreases for each λ and θ. Conversely, for each λ and θ, as the number of rows increases, V^2_1(f) decreases but V^2_2(f) increases. These changes are attributed to the increase in the number of categories of the explanatory variable, which makes the explanatory variable easier to capture. The measure V^2_3(f) combines the measures V^2_1(f) and V^2_2(f), quantifying the extent to which knowledge of the value of one variable helps us predict the value of the other. Accordingly, for each λ and θ, as the number of rows or columns increases, V^2_H(f) and V^2_G(f) decrease, and their values remain the same when the numbers of rows and columns are interchanged. In addition, closer inspection of Table 11 shows that the θ-divergence-type measures are less affected by the number of rows and columns than the power-divergence-type measures. Although this finding may be somewhat influenced by the discretized bivariate normal distribution, it may be better to use the θ-divergence-type measures with θ ≥ 0.5 in a contingency table where either the number of rows or the number of columns is large.
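The effect of the table dimensions can be illustrated with the classical (symmetric) V^2, which, like V^2_H(f) and V^2_G(f), is unchanged when rows and columns are interchanged. The sketch below discretizes a bivariate normal with ρ = 0.4 into r × c tables; the equal-width cut points on [−2, 2] are an assumption, as the paper's cut points are not given in this excerpt.

```python
import math

def bvn_density(x, y, rho):
    """Standard bivariate normal density with correlation rho (|rho| < 1)."""
    z = (x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho))
    return math.exp(-z) / (2 * math.pi * math.sqrt(1 - rho * rho))

def discretized_table(rho, r, c, lim=6.0, n=300):
    """r x c probability table using equal-width cuts of [-2, 2] on each axis
    (an assumed discretization, not the paper's)."""
    def edges(k):
        return [-lim] + [-2.0 + 4.0 * i / k for i in range(1, k)] + [lim]
    def cell(v, e):
        for i in range(len(e) - 1):
            if v < e[i + 1]:
                return i
        return len(e) - 2
    xe, ye = edges(r), edges(c)
    h = 2 * lim / n
    grid = [-lim + (k + 0.5) * h for k in range(n)]
    xi = [cell(x, xe) for x in grid]
    yi = [cell(y, ye) for y in grid]
    t = [[0.0] * c for _ in range(r)]
    for a, x in zip(xi, grid):
        for b, y in zip(yi, grid):
            t[a][b] += bvn_density(x, y, rho) * h * h
    total = sum(map(sum, t))
    return [[p / total for p in row] for row in t]

def cramer_v2(table):
    """Classical V^2 = phi^2 / min(r - 1, c - 1); symmetric under transpose."""
    pr = [sum(row) for row in table]
    pc = [sum(col) for col in zip(*table)]
    phi2 = sum((table[i][j] - pr[i] * pc[j]) ** 2 / (pr[i] * pc[j])
               for i in range(len(pr)) for j in range(len(pc)))
    return phi2 / (min(len(pr), len(pc)) - 1)
```

Under this discretization, the 4 × 4 table yields a larger classical V^2 than the 8 × 8 table at the same ρ, and swapping r and c leaves the value unchanged, consistent with the dimension effects and transpose invariance described for Table 11.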
We evaluate the performance of the approximate confidence interval for the proposed measures by the coverage probability. Suppose that contingency tables with a sample size of 5000, with varying numbers of rows and columns, are generated as multinomial random numbers based on the probability distributions obtained from a discretized bivariate normal distribution. The number of iterations is 100,000. This experiment enables us to evaluate whether the number of rows and columns in the contingency table tends to affect the confidence interval for the proposed measures.
Table 12 shows the coverage probability of the approximate 95% confidence interval for V^2_t(f). The important finding is that the coverage probabilities of the 95% confidence interval for all V^2_t(f), regardless of function or parameter, exceed 0.91. This indicates that when the sample size is sufficiently large, the confidence intervals perform well regardless of the number of rows and columns in the contingency table.

Example 1
Consider the data in Table 13, taken from Andersen (1994). These data, from a Danish Welfare Study, give the cross-classification of alcohol consumption and social rank. Alcohol consumption is grouped according to the number of "units" consumed per day, where a unit is typically a beer, half a bottle of wine, or 2 cl of 40% alcohol. For these data, the row variable can be regarded as the response variable and the column variable as the explanatory variable. By applying the measure V^2_1(f), we consider to what degree the prediction of alcohol consumption can be improved when the social rank of an individual is known (Andersen, 1994). Table 14 shows the estimates of the measure, standard errors, and confidence intervals. The confidence intervals for V^2_1(f) do not contain zero for any λ or θ. This shows that there is a rather weak association between alcohol consumption and social rank from the perspectives of, among others, the KL-divergence (λ = 0), the Pearson divergence (λ = 1, θ = 0), and the triangular discrimination (θ = 0.5). Another important finding is that, for example, when λ = 1.0 and θ = 0.0, V^2_1(f) estimates the strength of association between alcohol consumption and social rank to be 0.015 times the complete association. Hence, when predicting the alcohol consumption of an individual, we can predict it 1.5% better than when the social rank is unknown.
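In the same spirit as Table 14, the sketch below computes a plug-in estimate of the classical (Pearson-type) V^2 from a count table together with a rough interval. A delete-one jackknife standard error stands in for the paper's approximate standard error, whose formula is not reproduced in this excerpt, and the count table is hypothetical, not the Danish Welfare Study data.

```python
import math

def v2(counts):
    """Plug-in classical Cramér V^2 (the Pearson, lambda = 1 case) from counts."""
    n = sum(map(sum, counts))
    p = [[x / n for x in row] for row in counts]
    pr = [sum(row) for row in p]
    pc = [sum(col) for col in zip(*p)]
    phi2 = sum((p[i][j] - pr[i] * pc[j]) ** 2 / (pr[i] * pc[j])
               for i in range(len(pr)) for j in range(len(pc))
               if pr[i] > 0 and pc[j] > 0)
    return phi2 / (min(len(pr), len(pc)) - 1)

def jackknife_se(counts):
    """Delete-one-observation jackknife SE, grouped by cell for speed:
    observations in the same cell give identical leave-one-out estimates."""
    n = sum(map(sum, counts))
    thetas, weights = [], []
    for i in range(len(counts)):
        for j in range(len(counts[0])):
            if counts[i][j] == 0:
                continue
            counts[i][j] -= 1          # remove one observation from cell (i, j)
            thetas.append(v2(counts))
            counts[i][j] += 1          # restore
            weights.append(counts[i][j])
    mean = sum(w * t for w, t in zip(weights, thetas)) / n
    var = (n - 1) / n * sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas))
    return math.sqrt(var)

# Hypothetical 4x4 count table (NOT the Danish Welfare Study data), n = 1000.
counts = [[120, 60, 40, 30],
          [60, 120, 40, 30],
          [40, 40, 120, 50],
          [30, 30, 50, 140]]
est = v2(counts)
se = jackknife_se(counts)
ci = (est - 1.96 * se, est + 1.96 * se)
```

An interval excluding zero, as in Table 14, points to association, and the estimate can be read in the same way as the 1.5% figure in the text: knowing the explanatory variable improves prediction of the response by roughly est × 100%.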

Example 2
Consider the data in Table 15, taken from the 2006 General Social Survey. These data show the relationship between family income and education in the United States, separately for the black and white categories of race. For these data, the row variable can be regarded as the response variable and the column variable as the explanatory variable. By applying the measure V^2_1(f), we consider to what degree the prediction of family income can be improved when the education of an individual is known.

Table 14: Estimates of the measure V^2_1(f), estimated approximate standard errors for V^2_1(f), and approximate 95% confidence intervals for V^2_1(f), applying (a) the power-divergence for each λ and (b) the θ-divergence for each θ.

Example 3
Consider the data in Table 17, taken from Read and Cressie (1988). These are data on 4831 car accidents, cross-classified according to accident type and accident severity. For these data, the row variable can be regarded as the explanatory variable and the column variable as the response variable. By applying the measure V^2_2(f), we consider to what degree the prediction of accident severity can be improved when the accident type is known (Read and Cressie, 1988). Table 18 gives the estimates of the measure, standard errors, and confidence intervals. Likewise, the confidence intervals for V^2_2(f) do not contain zero for any λ or θ. This shows that there is a weak association between accident type and accident severity from the perspectives of, among others, the KL-divergence (λ = 0), the Pearson divergence (λ = 1, θ = 0), and the triangular discrimination (θ = 0.5). Another important finding is that, for instance, when λ = 1.0 and θ = 0.0, V^2_2(f) estimates the strength of association between accident type and accident severity to be 0.060 times the complete association. Therefore, when predicting the severity of an accident, we can predict it 6.0% better than when the accident type is unknown.

Table 13: Association between alcohol consumption and social rank

Table 15: Data on educational degrees and family income, by race
