1 Introduction

Contingency tables and their analysis are important in various fields, such as medicine, psychology, education, and social science. Typically, contingency tables are used to evaluate whether the row and column variables are statistically independent. If the independence of the two variables is rejected, for example, through Pearson’s chi-squared test, or if they are clearly related, then we are interested in the strength of their association. Many coefficients have been proposed to measure the strength of association between the two variables, namely, to measure the degree of departure from independence. Pearson’s coefficient \(\phi ^2\) of mean square contingency, the coefficient of contingency P, and Tschuprow’s coefficient T (Tschuprow 1925; Tschuprow 1939) serve as prime examples (see, e.g., Bishop et al. 2007; Everitt 1992; Agresti 2003). These measures represent the strength of association on the interval from 0 to 1, where the value 0 indicates independence in the contingency table. However, the problem with \(\phi ^2\) is that it does not attain 1 even when the contingency table has a complete association structure (i.e., maximum departure from independence). Similarly, P and T do not always attain the value 1, depending on the number of rows and columns in the table. To address this issue, Cramér (1946) proposed Cramér’s coefficient \(V^2\), which reaches the value 1 whenever the contingency table has a complete association structure, for any numbers of rows and columns. Specifically, \(V^2\) indicates the strength of association in the contingency table as \(0 \le V^2 \le 1\), with the value 0 identifying the independence structure and the value 1 identifying the complete association structure.

Rényi (1961) introduced a class of measures of divergence between two distributions. Recent studies have linked contingency table analysis with such divergences. Tomizawa et al. (2004) proposed measures \(V^2_{t(\lambda )}\) (\(t=1, 2, 3\)) based on the power-divergence with parameter \(\lambda \ge 0\). This study extended the measure, previously limited to \(V^2\) (\(\lambda = 1\)), and showed the measures to be members of a single-parameter family that includes a measure based on the KL-divergence (\(\lambda = 0\)). (For more details on the power-divergence, see Cressie and Read (1984) and Read and Cressie (1988).) Furthermore, the f-divergence was introduced by Ali and Silvey (1966) and Csiszár (1963) as a useful generalization of the relative entropy that retains some of its major properties; it is also called the \(\phi\)-divergence. In contingency table analysis, a considerable amount of literature has been published on modeling using the f-divergence (e.g., Kateri and Papaioannou 1994; Kateri and Papaioannou 1997; Kateri and Agresti 2007; Fujisawa and Tahata 2020; Tahata 2022; Yoshimoto et al. 2019). Many studies on goodness-of-fit tests using the f-divergence have also demonstrated its usefulness (e.g., Pardo 2018; Felipe et al. 2014, 2018). However, discussions of association measures based on the f-divergence remain limited.

In this paper, we propose a wider class of measures than the conventional ones via the f-divergence. This study’s contribution is to prove that any measure constructed from a function f(x) satisfying the conditions of the f-divergence has the properties desirable for measuring the strength of association in contingency tables. This result allows analysts to easily construct new measures from whichever divergence has properties suited to their purposes. As an example, we conduct numerical experiments with a measure based on the \(\theta\)-divergence. Furthermore, we can give interpretations of the association between rows and columns in the contingency table that cannot be obtained with the conventional measures.

The rest of this paper is organized as follows. Section 2 proposes new measures to express the strength of association between the row and column variables in two-way contingency tables and shows that the proposed measures have the properties desirable for measuring the strength of association. Section 3 presents the relationship between the measures and the correlation coefficient when a bivariate normal distribution is assumed for the latent variables of the contingency table. Section 4 presents a numerical study. Section 5 derives approximate confidence intervals for the proposed measures. Section 6 presents analysis examples applying the power-divergence and the \(\theta\)-divergence to actual data. Finally, Section 7 provides some concluding remarks.

2 Generalized measure

We consider association measures using the f-divergence for an \(r \times c\) contingency table. Let \(p_{ij}\) denote the probability that an observation falls in the ith row and jth column of the table \((i = 1, \dots , r; j=1, \dots , c)\), and let \(p_{i\cdot }\) and \(p_{\cdot j}\) denote the marginal probabilities \(p_{i \cdot } = \sum ^c_{t=1} p_{it}\) and \(p_{\cdot j} = \sum ^r_{s=1} p_{sj}\). Hereinafter, we assume that \(\{p_{i\cdot } \ne 0,\) \(p_{\cdot j} \ge 0\}\) when \(r \le c\) and \(\{p_{i\cdot } \ge 0,\) \(p_{\cdot j} \ne 0\}\) when \(r > c\).
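As a minimal illustration of this notation, the following sketch (in Python with numpy; the \(3\times 3\) table entries are hypothetical) computes a table's marginal probabilities:

```python
import numpy as np

p = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])   # hypothetical cell probabilities p_ij

p_row = p.sum(axis=1)                # marginals p_i. = sum_t p_it
p_col = p.sum(axis=0)                # marginals p_.j = sum_s p_sj
assert np.isclose(p.sum(), 1.0)      # cell probabilities sum to one
```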

In Sason and Verdú (2016), the f-divergence from P to Q is defined as \(I_f(P;Q) = \int f(dP/dQ) dQ\), where f is a convex function and \(P \ll Q\). For the \(r \times c\) contingency table, P and Q are given as discrete distributions \(\{p_{ij}\}\) and \(\{q_{ij}\}\). Accordingly, we have \(dP/dQ = \{p_{ij}/q_{ij}\}\). Thus, the f-divergence from \(\{p_{ij}\}\) to \(\{q_{ij}\}\) is given as

$$\begin{aligned} I_f(P;Q) = I_f(\{p_{ij}\};\{q_{ij}\})&= \sum _{i}\sum _{j} q_{ij} f\left( \frac{p_{ij}}{ q_{ij}} \right) , \end{aligned}$$

where f(x) is a once-differentiable and strictly convex function on \((0, +\infty )\) with \(f(1) = 0\), \(\lim _{x \rightarrow 0}f(x) = 0\), \(0f(0/0) = 0\), and \(0f(a/0) = a\lim _{x \rightarrow \infty }f(x) / x\) (see Csiszár 2004). By choosing the function f, many important divergences are obtained as special cases of the f-divergence, such as the KL-divergence (\(f(x) = x\log x\)), the Pearson’s divergence (\(f(x) = x^2-x\)), the power-divergence (\(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\)), and the \(\theta\)-divergence (\(f(x) = (x-1)^2/(\theta x + 1 - \theta ) + (x-1)/(1 - \theta )\)) (see, e.g., Sason and Verdú 2016; Ichimori 2013). Furthermore, the f-divergence belongs to the class of monotone and regular divergences. This class was introduced in Cencov (2000) and studied in Corcuera and Giummolé (1998) as a wide class of divergences invariant with respect to Markov embeddings. Monotone and regular divergences are often used as measures of goodness of prediction (see, e.g., Geisser 1993; Corcuera and Giummolè 1999). The aim of studying such measures is to quantify how well a row or column variable predicts the other variable. Therefore, we consider the measures based on the f-divergence to be appropriate for measuring the association and a natural generalization of those of Tomizawa et al. (2004).
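As a concrete illustration, the discrete f-divergence above can be computed directly. The following is a minimal sketch (ours, not from any library; numpy assumed), together with the generating functions named in the text:

```python
import numpy as np

def f_divergence(p, q, f):
    """I_f(P;Q) = sum_ij q_ij f(p_ij/q_ij); cells with q_ij = 0 are skipped,
    which matches the convention 0 f(0/0) = 0 when q_ij = p_i. p_.j and a
    marginal vanishes (then p_ij = 0 as well)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = q > 0
    return float(np.sum(q[m] * f(p[m] / q[m])))

# generating functions named in the text
f_kl      = lambda x: np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0)       # KL
f_pearson = lambda x: x**2 - x                                                        # Pearson
f_power   = lambda lam: lambda x: (x**(lam + 1) - x) / (lam * (lam + 1))              # power, lam > 0
f_theta   = lambda th: lambda x: (x - 1)**2 / (th * x + 1 - th) + (x - 1) / (1 - th)  # theta
```

For instance, `f_divergence(p, np.outer(p.sum(1), p.sum(0)), f_kl)` evaluates the KL-divergence between a table \(\{p_{ij}\}\) and its independence counterpart \(\{p_{i \cdot }p_{\cdot j}\}\).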

Measures that present the strength of association between row and column variables are proposed for three cases: (I) the row variable is the response and the column variable is the explanatory variable; (II) the row variable is the explanatory and the column variable is the response variable; (III) neither variable is designated as response or explanatory. Accordingly, we define measures for the asymmetric situations (Cases I and II) and for the symmetric situation (Case III).

The following three properties should be possessed by the measures: (i) the measures are contained within a fixed interval (e.g., from 0 to 1); (ii) when the measure is minimal, the row and column variables are statistically independent; (iii) when the measure is maximal, the categories of one variable can be identified from those of the other. Conventional measures satisfy all of these properties. In the remainder of this section, we prove that the proposed measures satisfy them as well.

2.1 Case I

For an asymmetric situation wherein the column variable is the explanatory variable and the row variable is the response variable, we propose the following measure that presents the strength of association between the row and column variables by

$$\begin{aligned} V^2_{1(f)}&= \frac{I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})}{K_{1(f)}}, \end{aligned}$$

where

$$\begin{aligned} I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})&= \sum ^r_{i=1} \sum ^c_{j=1}p_{i \cdot }p_{\cdot j} f\left( \frac{p_{ij}}{p_{i \cdot }p_{\cdot j}} \right) , \\ K_{1(f)}&= \sum ^r_{i=1}p^2_{i \cdot } f\left( \frac{1}{p_{i \cdot }} \right) . \end{aligned}$$

Then, the following theorem for the measure \(V^2_{1(f)}\) is obtained.

Theorem 1

For each convex function f,

  1. (i)

    \(0 \le V^2_{1(f)} \le 1\).

  2. (ii)

    \(V^2_{1(f)} = 0\) if and only if a structure of null association exists in the table (i.e., \(\{p_{ij} = p_{i \cdot }p_{\cdot j}\})\).

  3. (iii)

    \(V^2_{1(f)} = 1\) if and only if a structure of complete association exists; that is, for each column j \((j = 1, 2, \dots , c)\), there uniquely exists \(i_j\) such that \(p_{i_j, j} > 0\) and \(p_{ij} = 0\) for all other \(i(\ne i_j)\) (assuming \(p_{i\cdot } > 0\) for all i).

The proof of Theorem 1 is provided in the Supplementary Material. Similar to the interpretation of measure \(V^2_{1(\lambda )}\), \(V^2_{1(f)}\) indicates the degree to which the prediction of the row category of an individual can be improved when the column category of the individual is known. In this sense, \(V^2_{1(f)}\) shows the strength of association between the row and column variables. Examples for particular choices of the f-divergence are given below. When \(f(x) = x \log x\), \(I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})\) is identical to the KL-divergence, and \(V^2_{1(f)}\) is represented by

$$\begin{aligned} V_{KL}&= \frac{\displaystyle \sum\nolimits ^r_{i=1} \sum\nolimits ^c_{j=1}p_{ij}\log \left( \frac{p_{ij}}{p_{i \cdot }p_{\cdot j}} \right) }{- \displaystyle \sum\nolimits ^r_{i=1} p_{i\cdot }\log p_{i\cdot }} \end{aligned}$$

and \(V_{KL}\) is identical to Theil’s uncertainty coefficient U (see Theil 1970). When \(f(x) = x^2-x\), the Pearson’s divergence is derived, and \(V^2_{1(f)}\) is identical to Cramér’s coefficient \(V^2\) with \(r \le c\). When \(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\), \(V^2_{1(f)}\) is identical to the power-divergence-type measure

$$\begin{aligned} V^2_{1(\lambda )}&= \frac{\displaystyle \sum\nolimits ^r_{i=1} \sum\nolimits ^c_{j=1}p_{ij} \left[ \left( \frac{p_{ij}}{p_{i \cdot }p_{\cdot j}} \right) ^{\lambda } - 1 \right] }{\displaystyle \sum\nolimits ^r_{i=1} p^{1-\lambda }_{i\cdot } - 1}. \end{aligned}$$

Further, in the case of \(f(x) = (x-1)^2/(\theta x + 1 - \theta ) + (x-1)/(1 - \theta )\) for \(0 \le \theta < 1\), \(I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})\) is identical to the \(\theta\)-divergence, and \(V^2_{1(f)}\) is represented by the \(\theta\)-divergence-type measure

$$\begin{aligned} V^2_{1(\theta )}&= \frac{\displaystyle \sum\nolimits ^r_{i=1} \sum\nolimits ^c_{j=1}\frac{(p_{ij}-p_{i \cdot }p_{\cdot j})^2 }{\theta p_{ij} + (1 - \theta )p_{i \cdot }p_{\cdot j}}}{\displaystyle \sum\nolimits ^r_{i=1} \frac{p_{i\cdot }(1-p_{i\cdot })}{(1-\theta ) \left( \theta + (1-\theta )p_{i\cdot } \right) } }. \end{aligned}$$

Measure \(V^2_{1(\theta )}\), like \(V^2_{1(\lambda )}\), is a single-parameter measure and one of the generalizations of \(V^2\); it agrees with \(V^2\) at \(\theta = 0\). The numerator coincides with the triangular discrimination \(\Delta\) at \(\theta = 0.5\) (see Dragomir et al. 2000; Topsoe 2000). Unlike the power-divergence, the \(\theta\)-divergence can measure departures from independence in a manner similar to the Euclidean distance; in particular, the triangular discrimination \(\Delta\) measures a symmetric distance between \(\{p_{ij}\}\) and \(\{p_{i \cdot }p_{\cdot j}\}\). In the numerical experiments discussed in Sections 4 and 6, we treat the \(\theta\)-divergence-type measure as an example of a new single-parameter measure obtained by extending \(V^2\) and compare it with the conventional one. Moreover, analysis suited to various contingency tables can be performed by changing the function.
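The following sketch (ours, under the definitions above; function names are illustrative and positive marginals are assumed) implements the power- and \(\theta\)-divergence-type measures:

```python
import numpy as np

def V2_1_power(p, lam):
    """Power-divergence-type measure V^2_{1(lambda)}, lam > 0;
    equals Cramer's V^2 at lam = 1 when r <= c."""
    p = np.asarray(p, float)
    r, c = p.sum(axis=1), p.sum(axis=0)
    q = np.outer(r, c)                 # independence table p_i. * p_.j
    m = p > 0                          # cells with p_ij = 0 contribute 0
    num = np.sum(p[m] * ((p[m] / q[m])**lam - 1))
    den = np.sum(r**(1 - lam)) - 1
    return num / den

def V2_1_theta(p, theta):
    """theta-divergence-type measure V^2_{1(theta)}, 0 <= theta < 1;
    equals Cramer's V^2 at theta = 0 when r <= c."""
    p = np.asarray(p, float)
    r, c = p.sum(axis=1), p.sum(axis=0)
    q = np.outer(r, c)
    num = np.sum((p - q)**2 / (theta * p + (1 - theta) * q))
    den = np.sum(r * (1 - r) / ((1 - theta) * (theta + (1 - theta) * r)))
    return num / den
```

For instance, with the hypothetical table `p` from Section 2, `V2_1_theta(p, 0.0)` recovers Cramér’s coefficient \(V^2\) for \(r \le c\).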

2.2 Case II

For the asymmetric situation wherein the row and column variables are the explanatory and response variables, respectively, we propose the following measure, which presents the strength of association between the row and column variables:

$$\begin{aligned} V^2_{2(f)}&= \frac{I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})}{K_{2(f)}}, \end{aligned}$$

where

$$\begin{aligned} K_{2(f)}&= \sum ^c_{j=1}p^2_{\cdot j} f\left( \frac{1}{p_{\cdot j}} \right) . \end{aligned}$$

Therefore, the following theorem is obtained for measure \(V^2_{2(f)}\).

Theorem 2

For each convex function f,

  1. (i)

    \(0 \le V^2_{2(f)} \le 1\).

  2. (ii)

    \(V^2_{2(f)} = 0\) if and only if a structure of null association exists in the table (i.e., \(\{p_{ij} = p_{i \cdot }p_{\cdot j}\})\).

  3. (iii)

    \(V^2_{2(f)} = 1\) if and only if a structure of complete association exists; that is, for each row i \((i = 1, 2, \ldots , r)\), \(j_i\) uniquely exists such that \(p_{i, j_i} > 0\) and \(p_{ij} = 0\) for all other \(j(\ne j_i)\) (assuming \(p_{\cdot j} > 0\) for all j).

The proof of Theorem 2 is obtained in a manner similar to that of Theorem 1. \(V^2_{2(f)}\) coincides with \(V^2_{1(f)}\) computed after interchanging the rows and columns of the table, and it has no special characteristics compared with \(V^2_{1(f)}\). However, it is introduced because of its importance in Case III.

2.3 Case III

In an \(r \times c\) contingency table wherein explanatory and response variables are undefined, using \(V^2_{1(f)}\) or \(V^2_{2(f)}\) alone is inappropriate if we are interested in the degree to which knowledge of the value of one variable helps us predict the value of the other. For this symmetric situation, we propose the following measure, which combines the ideas of both \(V^2_{1(f)}\) and \(V^2_{2(f)}\):

$$\begin{aligned} V^2_{3(f)}&= h^{-1} \left( w_1h\left( V^2_{1(f)} \right) + w_2h\left( V^2_{2(f)} \right) \right) , \end{aligned}$$

where h is a monotonic function and \(w_1 + w_2 = 1\) \((w_1, w_2 \ge 0)\). Then, the following theorem is attained for measure \(V^2_{3(f)}\).

Theorem 3

For each convex function f,

  1. (i)

    \(0 \le V^2_{3(f)} \le 1\).

  2. (ii)

    \(V^2_{3(f)} = 0\) if and only if a structure of null association exists in the table (i.e., \(\{p_{ij} = p_{i \cdot }p_{\cdot j}\})\).

  3. (iii)

    \(V^2_{3(f)} = 1\) if and only if a structure of complete association exists; that is, at most one nonzero probability appears in each row or each column (assuming all marginal probabilities are nonzero).

The proof of Theorem 3 is provided in the Supplementary Material. We can show that, if \(h(u) = \log u\) and \(w_1=w_2\), \(V^2_{3(f)}\) is denoted by

$$\begin{aligned} V^2_{G(f)}&= \frac{I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})}{\sqrt{K_{1(f)}K_{2(f)}}} = \sqrt{V^2_{1(f)} V^2_{2(f)}}, \end{aligned}$$

and if \(h(u) = 1/u\) and \(w_1=w_2\), \(V^2_{3(f)}\) is represented by

$$\begin{aligned} V^2_{H(f)}&= \frac{2I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})}{K_{1(f)} + K_{2(f)}} = \frac{2V^2_{1(f)} V^2_{2(f)}}{V^2_{1(f)} + V^2_{2(f)}}. \end{aligned}$$

Notably, \(V^2_{G(f)}\) and \(V^2_{H(f)}\) are the geometric and harmonic means of \(V^2_{1(f)}\) and \(V^2_{2(f)}\), respectively. We confirm that, when \(f(x) = x^2-x\) with \(r=c\), \(V^2_{3(f)}\) is identical to Cramér’s coefficient \(V^2\). Moreover, for \(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\), \(V^2_{3(f)}\) is consistent with Miyamoto’s measure \(G^2_{(\lambda )}\) (Miyamoto et al. 2007).

For an \(r \times r\) contingency table with the same row and column classifications, \(V^2_{3(f)}=1\) if and only if, after interchanging some row and column categories, the main diagonal cell probabilities in the \(r \times r\) table are nonzero and the off-diagonal cell probabilities are all zero; that is, all observations concentrate on the main diagonal cells. When predicting the category values of an individual, \(V^2_{3(f)}\) specifies the degree to which the prediction can be improved if knowledge about the value of one variable exists. In this sense, \(V^2_{3(f)}\) also indicates the strength of association between the row and column variables. If only the marginal distributions \(\{p_{i\cdot }\}\) and \(\{p_{\cdot j}\}\) are known, we predict the values of the individual row and column categories in terms of probabilities under the independence structure.
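The combination rule and its geometric and harmonic special cases can be sketched as follows (ours; it assumes an asymmetric measure such as `V2_1_theta` above, and uses the fact that \(V^2_{2(f)}\) equals \(V^2_{1(f)}\) applied to the transposed table):

```python
import numpy as np

def V2_3(p, v2_1, h, h_inv, w1=0.5):
    """V^2_{3(f)} = h^{-1}( w1 h(V^2_{1(f)}) + w2 h(V^2_{2(f)}) ), w2 = 1 - w1;
    v2_1 is any asymmetric measure, applied to p and to p transposed."""
    p = np.asarray(p, float)
    return h_inv(w1 * h(v2_1(p)) + (1 - w1) * h(v2_1(p.T)))

# h(u) = log u gives the geometric mean; h(u) = 1/u gives the harmonic mean
def V2_G(p, v2_1):
    return V2_3(p, v2_1, np.log, np.exp)

def V2_H(p, v2_1):
    return V2_3(p, v2_1, lambda u: 1.0 / u, lambda u: 1.0 / u)
```

With equal weights, `V2_G` reduces to \(\sqrt{V^2_{1(f)} V^2_{2(f)}}\) and `V2_H` to \(2V^2_{1(f)}V^2_{2(f)}/(V^2_{1(f)}+V^2_{2(f)})\), as displayed above.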

Theorem 4

For any fixed convex function f and monotonic function h,

  1. 1.

    \(\min (V^2_{1(f)}, V^2_{2(f)}) \le V^2_{3(f)} \le \max (V^2_{1(f)}, V^2_{2(f)})\),

  2. 2.

    \(\min (V^2_{1(f)}, V^2_{2(f)}) \le V^2_{H(f)} \le V^2_{G(f)} \le \max (V^2_{1(f)}, V^2_{2(f)})\).

The proof of Theorem 4 is provided in the Supplementary Material. When \(f(x) = x^2-x\) with \(r=c\), we observe that \(V^2_{1(f)} = V^2_{2(f)} = V^2_{3(f)} = V^2_{H(f)} = V^2_{G(f)} = V^2\) (Cramér’s coefficient).

3 Relationship between measures and bivariate normal distribution

In the analysis of two-way contingency tables, Tallis (1962), Lancaster and Hamdan (1964), Kirk (1973), and Divgi (1979) proposed approaches based on the bivariate normal distribution. These approaches assume that the row and column classifications result from continuous random variables with a bivariate normal distribution, that is, that the sample contingency table comes from a discretized bivariate normal distribution. In many contexts, this assumption is invalid and a more general approach is needed. Therefore, Goodman (1981, 1985) presented an approximation to the correlation structure of discrete bivariate distributions based on the association model, and Becker (1989) made a similar proposal based on the KL-divergence. Nevertheless, assuming a bivariate normal distribution is important for examining the correlation structure of a contingency table, and previous studies have considered the association based on this model. In this section, we explain the relationship between the measures \(V^2_{t(f)}\) (\(t = 1,2,3\)) and the correlation coefficient \(\rho\) when a bivariate normal distribution can be assumed for the latent variables underlying the contingency table.

Assuming the latent variables, the (i,j) cell probability \(p_{ij}\) of the \(r \times c\) contingency table is denoted as

$$\begin{aligned} p_{ij}&= P(X = i, Y=j) \\&= P(x_{i-1}< X^* \le x_i, y_{j-1} < Y^* \le y_j) \\&= f_{X^*, Y^*}({\tilde{x}}_i, {\tilde{y}}_j)\Delta _{x_i} \Delta _{y_j}, \end{aligned}$$

where \(x_{i-1} < {\tilde{x}}_i \le x_i\), \(y_{j-1} < {\tilde{y}}_j \le y_j\) and \(f_{X^*, Y^*}({\tilde{x}}_i, {\tilde{y}}_j)\) is a continuous joint density function of random variables \(X^*\) and \(Y^*\). \(\Delta _{x_i}\) and \(\Delta _{y_j}\) are the width of intervals \((x_{i-1}, x_{i}]\) and \((y_{j-1}, y_{j}]\), respectively. In this situation, it is possible to approximate \(I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})\) as follows:

$$\begin{aligned} \begin{aligned}&I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\}) \\&= \sum ^r_{i=1} \sum ^c_{j=1} f_{X^*}({\tilde{x}}_i) f_{Y^*}({\tilde{y}}_j) f\left( \frac{f_{X^*,Y^*}({\tilde{x}}_i, {\tilde{y}}_j)}{f_{X^*}({\tilde{x}}_i) f_{Y^*}({\tilde{y}}_j)} \right) \Delta _{x_i} \Delta _{y_j} \\&\xrightarrow [\Delta _{x_i} \Delta _{y_j} \rightarrow 0]{} \int ^{\infty }_{-\infty } \int ^{\infty }_{-\infty } f_{X^*}(x) f_{Y^*}(y) f\left( \frac{f_{X^*,Y^*}(x, y)}{f_{X^*}(x) f_{Y^*}(y)} \right) dx dy, \end{aligned} \end{aligned}$$
(1)

where \(f_{X^*}(x)\) and \(f_{Y^*}(y)\) are marginal probability density functions of \(f_{X^*,Y^*}(x, y)\).

Let \(X^*\) and \(Y^*\) be random variables distributed according to the bivariate normal distribution with joint density function

$$\begin{aligned} \begin{aligned} f_{X^*,Y^*}(x, y)&= \frac{1}{2\pi \sigma _x \sigma _y \sqrt{1-\rho ^2}} \exp \left[ -\frac{1}{2(1-\rho ^2)} \right. \\&\quad \left. \qquad \left\{ \left( \frac{x-\mu _x}{\sigma _x} \right) ^2 - 2\rho \left( \frac{x-\mu _x}{\sigma _x} \right) \left( \frac{y-\mu _y}{\sigma _y} \right) + \left( \frac{y-\mu _y}{\sigma _y} \right) ^2 \right\} \right] \\&\quad -\infty< x< +\infty , \quad -\infty< y < +\infty \end{aligned} \end{aligned}$$

where \(\rho\) is the correlation coefficient between \(X^*\) and \(Y^*\), taking values from \(-1\) to 1. In the formula, the standard deviations \(\sigma _x\) and \(\sigma _y\) are positive constants, whereas the means \(\mu _x\) and \(\mu _y\) may be any real constants. When applying \(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\), the relationship between the power-divergence and the correlation coefficient \(\rho\) is expressed as

$$\begin{aligned} I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})&\approx \frac{1}{\lambda (\lambda + 1)} \left\{ (1-\rho ^2)^{-\frac{\lambda }{2}}(1-\lambda ^2 \rho ^2)^{-\frac{1}{2}}-1 \right\} , \end{aligned}$$
(2)

where \(\lambda < 1/\vert \rho \vert\). Therefore, values of \(\lambda\) less than 1 are preferable under this assumption. To capture the relationship between the measures and the correlation coefficient \(\rho\) at \(\lambda = 0\), obtained as the continuous limit as \(\lambda \rightarrow 0\) (i.e., \(f(x) = x\log x\)), the divergence can be expressed as

$$\begin{aligned} I_f(\{p_{ij}\};\{p_{i \cdot }p_{\cdot j}\})&\approx - \frac{1}{2}\log (1-\rho ^2). \end{aligned}$$
(3)

When we consider the latent variables and approximate the divergence, the relationship can be expressed as in (2) and (3). These equations show that the divergence is monotonically increasing with respect to \(\vert \rho \vert\). Therefore, the measures capture this relationship while remaining bounded above.
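As a quick numerical check, the right-hand sides of (2) and (3) can be evaluated directly; the following sketch (ours; numpy assumed) shows that both increase with \(\vert\rho\vert\):

```python
import numpy as np

def power_div_approx(rho, lam):
    """Right-hand side of (2); requires lam < 1/|rho|."""
    return ((1 - rho**2)**(-lam / 2) * (1 - lam**2 * rho**2)**(-0.5) - 1) / (lam * (lam + 1))

def kl_approx(rho):
    """Right-hand side of (3), the lambda -> 0 limit."""
    return -0.5 * np.log(1 - rho**2)

for rho in [0.0, 0.2, 0.4, 0.6, 0.8]:
    print(f"rho={rho:.1f}  KL~{kl_approx(rho):.4f}  power(lam=0.5)~{power_div_approx(rho, 0.5):.4f}")
```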

This section showed the relationship between the measures and the correlation coefficient \(\rho\) using the bivariate normal distribution and \(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\) as examples. However, for the \(\theta\)-divergence and more general divergences, it is difficult to calculate (1) in closed form. Therefore, in the next section, we confirm numerically that the value of the measures increases monotonically as the correlation coefficient moves away from 0, even when the \(\theta\)-divergence is applied.

4 Numerical study

This section compares the measures across functions and parameters. In the numerical study, we use artificial data generated from discrete bivariate distributions with zero means and unit variances, as in Goodman (1981, 1985) and Becker (1989). The bivariate normal distribution is partitioned using cut-points that generate uniform marginal distributions. For instance, when creating a \(4\times 4\) probability table, we split the bivariate normal distribution using \(z_{0.25}\), \(z_{0.50}\), and \(z_{0.75}\) as cut-points. The \(4\times 4\) artificial probability tables created for the numerical study are given in the Supplementary Material. The benefit of this method is that the strength of association between the row and column variables in the contingency table is known from the bivariate normal distribution, which makes it appropriate for examining the measures. For the comparison of the measures, we use Tomizawa’s power-divergence-type measures (\(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\) for \(0 \le \lambda \le 1\)) and the newly proposed \(\theta\)-divergence-type measures (\(f(x) = (x-1)^2/(\theta x + 1 - \theta ) + (x-1)/(1 - \theta )\) for \(0 \le \theta < 1\)), each of which is a single-parameter divergence extending Cramér’s coefficient \(V^2\).
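The table construction can be sketched as follows (SciPy assumed; \(\pm 8\) standard deviations stand in for \(\pm\infty\)):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def bvn_table(rho, probs=(0.25, 0.50, 0.75)):
    """Discretize a standard bivariate normal into a probability table using
    quartile cut-points, which yield uniform marginal distributions."""
    edges = np.concatenate(([-8.0], norm.ppf(probs), [8.0]))   # +/-8 ~ +/-infinity
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    F = lambda x, y: mvn.cdf(np.array([x, y]))
    k = len(edges) - 1
    p = np.empty((k, k))
    for i in range(k):
        for j in range(k):   # rectangle probability via inclusion-exclusion on the CDF
            p[i, j] = (F(edges[i + 1], edges[j + 1]) - F(edges[i], edges[j + 1])
                       - F(edges[i + 1], edges[j]) + F(edges[i], edges[j]))
    return p / p.sum()       # renormalize away tiny truncation error

p = bvn_table(rho=0.4)       # e.g., a 4x4 table of the kind summarized in Table 1
```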

Table 1 presents the values of the measures \(V^2_{t(f)}\) (\(t=1,2,3\)) for each \(4\times 4\) probability table with \(\rho = 0.0, 0.4, 0.8, 1.0\). Notably, in the case of \(r \times r\) artificial contingency tables, \(\{p_{i \cdot }\}\) and \(\{p_{\cdot j}\}\) are each constant, so that \(V^2_{1(f)} = V^2_{2(f)} = V^2_{3(f)}\). Table 1 shows that, as the correlation moves away from 0, the values of \(V^2_{t(f)}\) approach 1.0. Further, the measures equal 0 if and only if \(\rho = 0\) (a structure of null association), and they equal 1 if and only if \(\rho = 1.0\) (a structure of complete association). The sharp increase near \(\rho = 1.0\) can be explained by the relationship between the measures and the correlation coefficient \(\rho\) shown in the previous section. Another important finding is how each measure increases at \(\rho = 0.4, 0.8\). In the case of \(V^2\) (\(\lambda = 1, \theta = 0\)), the increase of \(V^2\) with \(\rho\) is slower than that of most other measures. These results suggest that \(V^2\) may not accurately distinguish small differences in the strength of association when comparing multiple contingency tables generated from bivariate normal distributions; a broader family of measures therefore permits a more careful analysis. The same is true for the power-divergence-type measures, which have an increasing trend similar to \(V^2\). It may thus be better to use the \(\theta\)-divergence-type measures with \(\theta =0.7\) to detect small differences in the strength of association. Values of \(V^2_{t(f)}\) for other \(\rho\), together with coverage probabilities, are provided in the Supplementary Material.

Table 1 Values of measures \(V^2_{t(f)}\) \((t =1, 2, 3)\) setting (a) the power-divergence for any \(\lambda\) and (b) the \(\theta\)-divergence for any \(\theta\) in \(4\times 4\) probability tables with \(\rho = 0, 0.4, 0.8, 1.0\)

5 Approximate confidence intervals for measure

In the previous section, we confirmed the values of the proposed measures with simulated data. However, when analyzing real data, \(\{p_{ij}\}\) is unknown, and so are the values of the measures. Hence, it is necessary to construct confidence intervals. In this section, we construct asymptotic confidence intervals using the delta method. Let \(\{n_{ij}\}\) denote the observed frequencies from a multinomial distribution, and let n denote the total number of observations, namely, \(n = \sum ^r_{i=1} \sum ^c_{j=1}n_{ij}\). The approximate standard error and large-sample confidence interval are obtained for \(V^2_{t(f)}\) \((t =1, 2, 3)\) using the delta method, which is described in, for example, Agresti (2003) and Bishop et al. (2007). The estimator \({\hat{V}}^2_{t(f)}\) of \(V^2_{t(f)}\) is given by \(V^2_{t(f)}\) with \(\{p_{ij}\}\) replaced by \(\{ {\hat{p}}_{ij}\}\), where \({\hat{p}}_{ij} = n_{ij}/n\). By the delta method, \(\sqrt{n}({\hat{V}}^2_{t(f)} - V^2_{t(f)})\) is asymptotically normally distributed (i.e., as \(n \rightarrow \infty\)) with mean 0 and variance \(\sigma ^2[V^2_{t(f)}]\). Refer to the Supplementary Material for the form of \(\sigma ^2[V^2_{t(f)}]\).

We define f(x) as once-differentiable and strictly convex, and let \(f'(x)\) denote the first derivative of f(x) with respect to x. Let \({\hat{\sigma }}^2[V^2_{t(f)}]\) be \(\sigma ^2[V^2_{t(f)}]\) with \(\{p_{ij}\}\) replaced by \(\{ {\hat{p}}_{ij}\}\). Then, an estimated standard error of \({\hat{V}}^2_{t(f)}\) is \({\hat{\sigma }}[V^2_{t(f)}] / \sqrt{n}\), and an approximate \(100(1-\alpha )\) percent confidence interval of \(V^2_{t(f)}\) is \({\hat{V}}^2_{t(f)} \pm z_{\alpha /2} {\hat{\sigma }}[V^2_{t(f)}] / \sqrt{n}\), where \(z_{\alpha /2}\) is the upper \(\alpha /2\) percentage point of the standard normal distribution.
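The closed-form variance is not reproduced here. As a generic stand-in for readers who want interval estimates without deriving \(\sigma ^2[V^2_{t(f)}]\), the following percentile-bootstrap sketch (our illustration, not the paper’s delta-method interval) applies to any plug-in measure:

```python
import numpy as np

def bootstrap_ci(counts, measure, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a plug-in measure of an observed frequency
    table; `measure` maps a probability table to a scalar (e.g., V2_1_theta)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, float)
    n = int(counts.sum())
    p_hat = (counts / n).ravel()
    stats = [measure(rng.multinomial(n, p_hat).reshape(counts.shape) / n)
             for _ in range(B)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```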

6 Examples

In this section, we explain the benefits of using the f-divergence to extend the measures, with some actual data examples. We use Tomizawa’s power-divergence-type measures (\(f(x)=(x^{\lambda +1}-x)/\lambda (\lambda +1)\) for \(\lambda = 0.0, 0.6, 1.0, 1.2, 1.5\)) and the newly proposed \(\theta\)-divergence-type measures (\(f(x) = (x-1)^2/(\theta x + 1 - \theta ) + (x-1)/(1 - \theta )\) for \(\theta = 0.0, 0.3, 0.5, 0.7, 0.9\)), each of which is a single-parameter divergence extending Cramér’s coefficient \(V^2\). We observe the estimates of the measures and their confidence intervals.

6.1 Example 1

Consider the data in Table 2, taken from the 2006 General Social Survey. The data show the relationship between family income and education in the United States, separately for the black and white categories of race. By applying the measure \(V^2_{1(f)}\), we consider the degree to which the prediction of family income for an individual can be improved when the individual’s educational degree is known, for each racial category.

Table 2 Data on educational degrees and family income, by race

Table 3 shows the estimates of the measures, standard errors, and \(95\%\) confidence intervals. Tables 3(a1, a2) and 3(b1, b2) show the results of the analysis of Tables 2(a) and 2(b), respectively. One interesting finding is that the confidence intervals for \(V^2_{1(f)}\) do not contain zero for any \(\lambda\) or \(\theta\). Thus, both data sets exhibit an association structure from every point of view, not only that of Cramér’s coefficient \(V^2\). Another important finding concerns the comparison of the confidence intervals. For the conventional power-divergence-type measures, a comparison of Tables 3(a1) and 3(b1) shows that the confidence intervals overlap for all \(\lambda\). Meanwhile, when \(\theta = 0.9\) in Tables 3(a2) and 3(b2), the confidence intervals do not overlap; Table 2(a), whose estimate is closer to 0, is closer to independence. Therefore, this analysis reveals the merit of the measures extended with the f-divergence: they express differences that do not appear with the conventional ones.

Table 3 Estimate of the measure \(V^2_{1(f)}\), estimated approximate standard error for \({\hat{V}}^2_{1(f)}\), and approximate \(95\%\) confidence interval of \(V^2_{1(f)}\) applying (a1, b1) the power-divergence for any \(\lambda\) and (a2, b2) the \(\theta\)-divergence for any \(\theta\)

6.2 Example 2

Consider the data in Table 4, obtained from Tomizawa (1985). These tables provide information on the unaided distance vision of 4746 university students aged 18 to about 25 and 3168 elementary students aged 6 to about 12. In Table 4, the row and column variables are the right and left eye grades, respectively, with the categories ordered from the highest grade (1) to the lowest grade (4). As the right and left eye grades have similar classifications, we apply measure \(V^2_{H(f)}\).

Table 4 Unaided distance vision data for university and elementary students

Table 5 provides the estimates of the measures, standard errors, and confidence intervals. Tables 5(a1, a2) and 5(b1, b2) show the results of the analysis of Tables 4(a) and 4(b), respectively. The results show that both data sets have a strong association structure in terms of the estimates and confidence intervals for all \(\lambda\) and \(\theta\). Comparing the values of the measures between Tables 5(a1, a2) and 5(b1, b2), we found that the strength of association between the right and left eyes is greater for university students in terms of the estimates for each parameter. Comparing the confidence intervals leads to a similar conclusion: the confidence intervals for all \(\lambda\) overlap, but those for \(\theta = 0.5, 0.7, 0.9\) do not. Another interesting finding is that, unlike in Example 1, the confidence intervals in Example 2 do not overlap even when \(\theta = 0.5, 0.7\). In terms of the triangular discrimination \(\Delta\) (\(\theta = 0.5\)), a view not available with \(V^2\) or the power-divergence-type measures, this result provides evidence that Table 4(a) has a stronger association structure. Therefore, the extension by the f-divergence helps us perform the analysis safely.

Table 5 Estimate of measures \(V^2_{H(f)}\), estimated approximate standard error for \({\hat{V}}^2_{H(f)}\), and approximate \(95\%\) confidence interval of \(V^2_{H(f)}\) applying (a1, b1) the power-divergence for any \(\lambda\) and (a2, b2) the \(\theta\)-divergence for any \(\theta\)

Remark 1

(Brief guideline for choosing functions and parameters) In an analysis with our proposed measures \(V^2_{t(f)}\) \((t=1,2,3)\), users need to choose a divergence and set its parameters. These choices should be determined by the notion of distance from which users want to examine the data. An instance of how to choose is described below, along with a limitation of Cramér’s coefficient \(V^2\). Kvålseth (2018) points out some limitations of \(V^2\); although the main objective of this study is to generalize \(V^2\), the generalization can also provide an improvement. One limitation is that the degree of association may be overestimated when the observed frequencies are small. This limitation also applies to Tomizawa’s power-divergence-type measures \(V^2_{t(\lambda )}\) \((t=1,2,3)\), which are not improved in this respect by the generalization with the parameter \(\lambda\). In such cases, an evaluation can be given from a point of view similar to the Euclidean distance by using the \(\theta\)-divergence (except \(\theta =0\)). As an example, consider the artificial data in Table 6. The data are clearly very near independence, with the elementwise distance \(\vert p_{ij}-p_{i\cdot }p_{\cdot j}\vert\) being either 0 or 0.01 and \(\sum ^3_{i=1} \sum ^3_{j=1} \vert p_{ij}-p_{i\cdot }p_{\cdot j}\vert = 0.04\). However, Table 7 shows that Cramér’s coefficient and Tomizawa’s power-divergence-type measure both take large values, whereas the \(\theta\)-divergence-type measure stays close to the Euclidean distance. In this way, it is necessary to choose the divergence and set the parameters according to the viewpoint from which users evaluate the degree of departure from independence.

Table 6 Artificial data to show differences from Cramér’s coefficient \(V^2\) and Tomizawa’s power-divergence-type measures \(V^2_{t(\lambda )} (t=1,2,3)\)
Table 7 Values of measures \(V^2_{H(f)}\) setting (a) the power-divergence for any \(\lambda\) and (b) the \(\theta\)-divergence for any \(\theta\), computed for Table 6

7 Conclusion

We found that the strength of association between the row and column variables in two-way contingency tables can be safely analyzed with the proposed measures \(V^2_{t(f)}\) \((t =1, 2, 3)\), which generalize Cramér’s coefficient \(V^2\) via the f-divergence. First, this study proved that any measure constructed from a function f(x) satisfying the conditions of the f-divergence has the properties desirable for measuring the strength of association in contingency tables. Hence, we can easily construct a new measure from a divergence that has the properties essential for the analyst. Furthermore, we can give interpretations of the association between rows and columns in contingency tables that cannot be obtained with a conventional measure. Second, we showed the relationship between the proposed measures \(V^2_{t(f)}\) and the bivariate normal distribution. We found that the relationship between the power-divergence and the correlation coefficient \(\rho\) has an approximate closed form, which is most succinct at \(\lambda = 0\).

Measures \(V^2_{t(f)}\) always range between 0 and 1, independently of the dimensions r and c and the sample size n. Thus, they are useful for comparing the strength of association between the row and column variables across several tables. This is crucial for checking the relative magnitude of the strength of association against the degree of complete association. Specifically, \(V^2_{1(f)}\) (\(V^2_{2(f)}\)) is effective when the row and column variables are the response (explanatory) and explanatory (response) variables, respectively, while \(V^2_{3(f)}\) is useful when explanatory and response variables are not defined. In practice, we first need to check whether independence holds by using a test statistic, such as Pearson’s chi-squared statistic. If a structure of association is found, the next step is to measure its strength by using \(V^2_{t(f)}\); if the table is determined to be independent, applying \(V^2_{t(f)}\) may not be meaningful. Furthermore, \(V^2_{t(f)}\) is invariant under any permutation of the categories, so it can be applied to data on a nominal or ordinal scale.

When using our proposed measures \(V^2_{t(f)}\), a brief guideline for choosing the divergence and setting the parameters is described in Remark 1. If users do not have a particular divergence or parameter in mind, they should examine the data in depth by applying several divergences and parameters, as in the selection method described in Momozaki et al. (2023), with which we agree. Statisticians may be interested in choosing the divergence and parameters mathematically, based on, for example, characteristics of the data or relationships between the row and column variables. However, this would be difficult to discuss in this article, so these aspects are left for future studies. In addition, extending the measures to multi-way contingency tables remains necessary.