Asymmetry Models Based on Non-integer Scores for Square Contingency Tables

Ando, Shuji

doi:10.1007/s44199-022-00039-z

Research Article
Open access
Published: 25 January 2022

Journal of Statistical Theory and Applications, Volume 21, pages 21–30 (2022)

Shuji Ando ORCID: orcid.org/0000-0003-1663-1897¹

Abstract

Square contingency tables with ordinal classifications are used in many disciplines, including data science, engineering, and medical research. This study proposes two original asymmetry models based on non-integer scores for the analysis of square contingency tables. The ordinal quasi-symmetry model applies to data sets that can be assigned known ordered scores for all categories. When equally spaced scores are assigned to the categories, the ordinal quasi-symmetry model is equivalent to the linear diagonals-parameter symmetry model. The ordinal quasi-symmetry model, however, is not applicable to data sets that cannot be assigned known ordered scores for all categories. This study addresses this issue. The proposed models apply to data sets that: (i) can be assigned the known ordered scores for all except one category and (ii) cannot be assigned the known ordered scores for all categories. For the real-world data analyzed, these two models provide a better fit than existing models.

1 Introduction

Consider $R \times R$ square contingency tables with the same row and column ordinal classifications. Square contingency tables are used in many disciplines, including data science, engineering, and medical research (see, for example, Agresti [1]).

Let $s_{k}$ be the ordered score of category k for all $k=1,\dots ,R$, where $s_{1}<\cdots < s_{R}$. We consider the data sets that: (i) can be assigned the known ordered scores for all categories, (ii) can be assigned the known ordered scores for all except one category, and (iii) cannot be assigned the known ordered scores for all categories.

Typical examples of types (i) and (ii) are categorical variables defined as intervals of a continuous variable. When there is clear information about the category intervals, it is recommended that the ordered scores be the midpoints of the intervals (midpoint scores) instead of equally spaced scores (see Graubard and Korn [2] and Senn [3]).

For the analysis of data sets of type (i), the ordinal quasi-symmetry (OQS) model proposed by Agresti [1] is often used. The OQS model indicates the asymmetric structure of the cell probabilities with respect to the main-diagonal cells of the table. The OQS model assumes that the row category k and column category k are assigned the same known score $s_{k}$ for all $k=1,\dots ,R$. This assumption is natural for square contingency tables with the same row and column ordinal classifications.

Table 1 Cross-classification of 1995 income data (in units of 10,000 yen) for individuals (male) and their spouses (female) in Japan, derived from the Social Science Japan Data Archive (available at https://nesstar.iss.u-tokyo.ac.jp/webview/)

We consider the data set in Table 1, which presents the cross-classification of 1995 income data for married couples in Japan. Individual and spouse incomes are categorized as “less than 70”, “from 70 to less than 150”, “from 150 to less than 450”, and “450 or more”. We assign the midpoint scores 35, 110, and 300 as the ordered scores for the first, second, and third categories, respectively. However, as the fourth category is unbounded above, we cannot assign a known ordered score to it.
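The midpoint scores quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `midpoint_scores` is hypothetical, and taking 0 as the lower bound of the first category is an assumption that the text does not state explicitly.

```python
def midpoint_scores(bounds):
    """Return the midpoint of each bounded interval [bounds[k], bounds[k+1])."""
    return [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

# Category boundaries in units of 10,000 yen. The lower bound 0 for the first
# category is an assumption; the open-ended category "450 or more" has no
# upper bound and therefore receives no midpoint score.
bounds = [0, 70, 150, 450]
scores = midpoint_scores(bounds)
print(scores)  # [35.0, 110.0, 300.0]
```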

Gautam [4] suggests that the ordered scores for a data set with an open-ended category should be assigned as follows: the scores $s_{1}$ to $s_{R-1}$ are midpoint scores, and the score $s_{R}$ is unknown. The score $s_{R}$ can therefore be expressed as $s_{R}=w_{0}+w$, where $w_{0}$ is the smallest value of the interval for the open-ended category and $w\ge 0$ is unknown. Gautam [4] assumes that the row category k and column category k are assigned the same score $s_{k}$ for all $k=1,\dots ,R$. Additionally, as ordered scores for a data set with an open-ended category, Aktas and Wu [5] introduce standardized z-scores for the row and column categories and propose a model that indicates an asymmetric structure depending on the standardized z-scores. However, the standardized z-scores assign different scores to the row category k and the column category k for all $k=1,\dots ,R$. Therefore, for the analysis of data sets of type (ii), we are interested in a model that indicates the asymmetric structure of the cell probabilities depending on the scores proposed by Gautam [4].

We further consider the data set in Table 2, obtained from Agresti [1], which presents the cross-classification of occupational status categories for father and son dyads in Britain. In this case, the scores $s_{1}$ to $s_{R}$ are treated as unknown because it is difficult to assign a known ordered score to any category.

For the analysis of data sets of type (iii), the ridit score type quasi-symmetry (RQS) model proposed by Iki et al. [6] is often used. The RQS model indicates the asymmetric structure of the cell probabilities depending on the ridit scores. The ridit scores are unknown ordered scores, defined as the average of the row and column marginal ridits. Thus, the RQS model also assumes that the row category k and column category k are assigned the same unknown score $s_{k}$ for all $k=1,\dots ,R$.

Bagheban and Zayeri [7] propose a power parameter score. Although Bagheban and Zayeri [7] treated the power parameter score as a known ordered score, we modify it so that it can be treated as an unknown ordered score. Thus, for the analysis of data sets of type (iii), we are interested in proposing a model that indicates the asymmetric structure of the cell probabilities depending on the power parameter score.

Table 2 Cross-classification of occupational status categories for father and son dyads in Britain, obtained from Agresti [1]

This study proposes two original asymmetry models based on non-integer scores for square contingency tables with the same row and column ordinal classifications. One of the proposed models is useful for data sets with an open-ended category, namely type (ii). The other is applicable to data sets that cannot be assigned the known ordered scores for all categories, namely type (iii).

The remainder of this paper is organized as follows. Sect. 2 proposes two original models based on non-integer scores. Sect. 3 demonstrates the utility of these proposed models as applied to the real-world data presented in Tables 1 and 2. We conclude the paper in Sect. 4.

2 Proposed Models

2.1 Existing Models Based on Non-integer Scores

Let $p_{ij}$ denote the probability that an observation will fall in the (i, j)th cell of the table ($i=1,\dots ,R;j=1,\dots ,R$).

This study focuses on a model having the following formula:

$$\begin{aligned} p_{ij} = \delta ^{s_j-s_i}p_{ji} \quad (i<j). \end{aligned}$$

Note that $s_{k}$ is the ordered score of category k for all $k=1,\dots ,R$, where $s_{1}<\cdots < s_{R}$. This model can represent various models depending on how $s_k$ is set.

As the model based on integer scores (i.e., $s_k = k$ for all $k=1,\dots ,R$), the linear diagonals-parameter symmetry (LDPS) model proposed by Agresti [8] was defined as

$$\begin{aligned} p_{ij} = \delta ^{j-i}p_{ji} \quad (i<j). \end{aligned}$$

We now introduce existing models based on non-integer scores (i.e., $s_k \ne k$ for all $k=1,\dots ,R$). When we can assign the known ordered scores $s_{k}$ to the category k for all $k=1,\dots ,R$, the ordinal quasi-symmetry (OQS) model proposed by Agresti [1] is defined as

$$\begin{aligned} p_{ij} = \delta ^{s_j-s_i}p_{ji} \quad (i<j). \end{aligned}$$

Note that the OQS model corresponds to data sets of type (i). The OQS model with equally spaced scores ($s_k = s_1 + (k-1)d$ for $k=1,\dots ,R$) is equivalent to the LDPS model. The OQS model with $\delta =1$ is identical to the symmetry (S) model proposed by Bowker [9]. Kateri and Agresti [10] considered the OQS model based on f-divergence; see also Saigusa et al. [11].
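The equivalence with the LDPS model can be seen from a one-line reparametrization. The symbol $\delta ^{*}$ is introduced here only for illustration, and $d>0$ is the common spacing of the scores:

$$\begin{aligned} s_k = s_1 + (k-1)d \quad \Longrightarrow \quad \delta ^{s_j-s_i} = \delta ^{d(j-i)} = (\delta ^{*})^{j-i}, \qquad \delta ^{*} = \delta ^{d}. \end{aligned}$$

Hence the OQS model with parameter $\delta$ and equally spaced scores has the same form as the LDPS model with parameter $\delta ^{*}$.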

Let X and Y denote the row and column variables, respectively. Let $p_{k\cdot }=\sum ^{R}_{l=1}p_{kl}$ and $p_{\cdot k}=\sum ^{R}_{l=1}p_{lk}$ denote the marginal probabilities, and let $F^{X}_{k} = \sum ^{k}_{l=1}p_{l\cdot }$ and $F^{Y}_{k} = \sum ^{k}_{l=1}p_{\cdot l}$ denote the marginal distribution functions for $k=1,\dots ,R$, where $F^{X}_{R}=1$ and $F^{Y}_{R} = 1$. Then the marginal ridits are defined as

$$\begin{aligned} r^{X}_k = \sum ^{k-1}_{l=1} p_{l \cdot } + \frac{p_{k \cdot }}{2} \quad \mathrm{and} \quad r^{Y}_k = \sum ^{k-1}_{l=1} p_{\cdot l} + \frac{p_{\cdot k}}{2} \quad (k=1,\dots ,R), \end{aligned}$$

see Bross [12].

When we cannot assign the known ordered scores $s_{k}$ to the category k for all $k=1,\dots ,R$, we adopt the RQS model proposed by Iki et al. [6]:

$$\begin{aligned} p_{ij} = \delta ^{s_j-s_i}p_{ji} \quad (i<j), \end{aligned}$$

where $s_{k}=(r^{X}_{k} + r^{Y}_{k})/2$ for all $k=1,\dots ,R$. Note that $\{s_{k}\}$ in the RQS model are unspecified (i.e., the unknown ordered scores), and the RQS model corresponds to the data set of type (iii).
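The ridit-based scores can be illustrated numerically. The sketch below plugs sample proportions in place of the unknown probabilities, which is an assumption made here for illustration; the function names and the 3×3 counts are hypothetical and are not taken from the paper.

```python
def marginal_ridits(marginal):
    """Ridit of category k: probability mass below k plus half the mass at k."""
    ridits, cumulative = [], 0.0
    for p in marginal:
        ridits.append(cumulative + p / 2)
        cumulative += p
    return ridits

def rqs_scores(table):
    """Average of the row and column marginal ridits for a square table of counts."""
    size = len(table)
    total = sum(sum(row) for row in table)
    row_marginal = [sum(row) / total for row in table]
    col_marginal = [sum(row[k] for row in table) / total for k in range(size)]
    r_x = marginal_ridits(row_marginal)
    r_y = marginal_ridits(col_marginal)
    return [(x + y) / 2 for x, y in zip(r_x, r_y)]

# Hypothetical 3x3 counts, for illustration only.
counts = [[10, 5, 5],
          [5, 10, 5],
          [5, 5, 10]]
scores = rqs_scores(counts)
print(scores)
```

Because the ridits are cumulative, the resulting scores are increasing in k whenever every category has positive marginal mass.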

We highlight that a model corresponding to the data set of type (ii) does not exist in a similar form to the OQS and RQS models.

2.2 Proposed Models Based on Non-integer Scores

We propose two original models based on non-integer scores corresponding to the data set of types (ii) and (iii). First, we propose an original model corresponding to the data set of type (ii), defined as

$$\begin{aligned} p_{ij} = \delta ^{s_j-s_i}p_{ji} \quad (i<j), \end{aligned}$$

where $s_1$ to $s_{R-1}$ are known, and $s_R$ is unknown. That is, $s_{1}$ to $s_{R-1}$ are assigned known ordered scores (e.g., midpoint scores), and $s_{R}$ is defined as $s_{R}=w_{0}+w$, where $w_{0}$ is the smallest value of the interval for the open-ended category, and $w\ (\ge 0)$ is unspecified. We refer to this model as the open-ended category type asymmetry (OEAS) model.

Second, we consider a model corresponding to the data set of type (iii). Bagheban and Zayeri [7] consider a power parameter score as follows:

$$\begin{aligned} k^{a} \quad (k=1,\dots ,R), \end{aligned}$$

where $a > 0$. The power parameter score has the following properties:

(1) if $a < 1$, then the difference in scores between categories $k+1$ and k decreases as k increases;
(2) if $a > 1$, then the difference in scores between categories $k+1$ and k increases as k increases;
(3) if $a = 1$, then the power parameter score is equivalent to the equally spaced score.
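These three properties are easy to check numerically. The following sketch uses hypothetical helper names and arbitrary illustrative values of a with $R=5$:

```python
def power_scores(num_categories, a):
    """Power parameter scores k**a for k = 1, ..., num_categories."""
    return [k ** a for k in range(1, num_categories + 1)]

def successive_differences(scores):
    """Differences between the scores of adjacent categories."""
    return [hi - lo for lo, hi in zip(scores, scores[1:])]

# Illustrative values of a: below, equal to, and above 1.
diffs_by_a = {a: successive_differences(power_scores(5, a)) for a in (0.5, 1.0, 2.0)}
for a, diffs in diffs_by_a.items():
    print(a, [round(d, 3) for d in diffs])
```

With $a=0.5$ the printed differences shrink, with $a=1$ they are all equal to 1, and with $a=2$ they grow.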

Bagheban and Zayeri [7] treated a as known but did not discuss how to select the optimal value of a. In the OQS model, Ando [13] used the power parameter score as the known ordered scores and selected the optimal value of a by a grid search. In contrast, we propose the following original model treating a as unknown:

$$\begin{aligned} p_{ij} = \delta ^{s_j-s_i}p_{ji} \quad (i<j), \end{aligned}$$

where $s_{k}=k^a$ for $k=1,\dots ,R$, and $a \ (>0)$ is unknown. We refer to this model as the power parameter type asymmetry (PPAS) model. The PPAS model with $a=1$ is identical to the LDPS model.

Under the OEAS and PPAS models, the following properties hold:

(1) if $\delta >1$, then $F^{X}_{k} > F^{Y}_{k}$ for all $k=1,\dots ,R-1$ because $p_{ij} > p_{ji}$ for all $i<j$;
(2) if $\delta <1$, then $F^{X}_{k} < F^{Y}_{k}$ for all $k=1,\dots ,R-1$ because $p_{ij} < p_{ji}$ for all $i<j$;
(3) if $\delta = 1$, then the S model holds because $p_{ij} = p_{ji}$ for all $i<j$.

By properties (1) and (2), the value of $\delta$ in the OEAS or PPAS model indicates whether X is stochastically greater than Y or vice versa.
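Properties (1) and (2) follow from a short calculation, sketched here in the notation of Sect. 2.1 and assuming $p_{ji}>0$ for the cells involved. For $k=1,\dots ,R-1$,

$$\begin{aligned} F^{X}_{k} - F^{Y}_{k} = \sum ^{k}_{i=1}\sum ^{R}_{j=1} (p_{ij} - p_{ji}) = \sum ^{k}_{i=1}\sum ^{R}_{j=k+1} (p_{ij} - p_{ji}) = \sum ^{k}_{i=1}\sum ^{R}_{j=k+1} \left( \delta ^{s_j-s_i} - 1\right) p_{ji}, \end{aligned}$$

where the second equality holds because the terms with both $i\le k$ and $j\le k$ cancel in pairs. Since $s_j > s_i$ for $i<j$, every summand is positive when $\delta >1$ and negative when $\delta <1$.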

2.3 Goodness-of-Fit Test

Let $n_{ij}$ denote the observed frequency in the (i, j)th cell of the table ($i, j = 1,\dots ,R$). Assume that a multinomial distribution applies to the $R\times R$ table. The maximum likelihood estimates of the expected frequencies under each model can be obtained by solving the log-likelihood equations with the Newton–Raphson method.
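As a rough illustration of the estimation step, the following Python sketch fits the simpler case in which all scores are known (the OQS-type model), not the OEAS or PPAS models with an additional unknown score parameter. It relies on the observation that, under $p_{ij} = \delta ^{s_j-s_i}p_{ji}$, each pair total $n_{ij}+n_{ji}$ splits binomially with probability $\delta ^{s_j-s_i}/(1+\delta ^{s_j-s_i})$, so $\log \delta$ can be updated by a one-dimensional Newton–Raphson step. The function names and the 3×3 counts are hypothetical and are not taken from the paper.

```python
import math

def fit_delta(counts, scores, iterations=50, tol=1e-10):
    """Newton-Raphson for theta = log(delta) under p_ij = delta**(s_j - s_i) * p_ji.

    Assumes at least one off-diagonal pair has a positive total.
    """
    size = len(counts)
    pairs = [(i, j) for i in range(size) for j in range(i + 1, size)]
    theta = 0.0  # start at delta = 1, i.e. the symmetry model
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for i, j in pairs:
            d = scores[j] - scores[i]
            n_pair = counts[i][j] + counts[j][i]
            prob = 1.0 / (1.0 + math.exp(-theta * d))  # P(cell (i, j) | pair {i, j})
            score += d * (counts[i][j] - n_pair * prob)
            info += d * d * n_pair * prob * (1.0 - prob)
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return math.exp(theta)

def expected_frequencies(counts, scores, delta):
    """Expected frequencies: diagonal cells are fitted exactly, pairs are split by delta."""
    size = len(counts)
    m = [[float(counts[i][j]) for j in range(size)] for i in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            ratio = delta ** (scores[j] - scores[i])
            n_pair = counts[i][j] + counts[j][i]
            m[i][j] = n_pair * ratio / (1.0 + ratio)
            m[j][i] = n_pair / (1.0 + ratio)
    return m

# Hypothetical 3x3 counts and integer scores, for illustration only.
counts = [[20, 12, 6],
          [8, 30, 10],
          [3, 7, 25]]
scores = [1, 2, 3]
delta_hat = fit_delta(counts, scores)
m_hat = expected_frequencies(counts, scores, delta_hat)
print(delta_hat)
```

For the OEAS and PPAS models the score parameter (w or a) would have to be estimated jointly with $\delta$, for example by wrapping this fit in an outer search over that parameter; that extension is not shown here.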

Each model can be tested for goodness-of-fit by the likelihood ratio chi-square statistic (denoted by $G^{2}$) with the corresponding degrees of freedom. The test statistic $G^{2}$ of model M is given as

$$\begin{aligned} G^{2}(M)=2\sum ^{R}_{i=1}\sum ^{R}_{j=1} n_{ij} \log \left( \frac{n_{ij}}{\hat{m}_{ij}}\right) , \end{aligned}$$

where $\hat{m}_{ij}$ is the maximum likelihood estimate (MLE) of the expected frequency $m_{ij}$ under model M.
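A minimal sketch of this statistic in Python is given below. To keep the example self-contained, it is evaluated under the S model, whose fitted values are $\hat{m}_{ij}=(n_{ij}+n_{ji})/2$; the function names and the 3×3 counts are hypothetical and are not taken from the paper.

```python
import math

def g_squared(observed, expected):
    """Likelihood ratio chi-square statistic; cells with n_ij = 0 contribute 0."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for n, m in zip(obs_row, exp_row):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total

def symmetry_fit(observed):
    """Fitted values under the symmetry model: the average of each symmetric pair."""
    size = len(observed)
    return [[(observed[i][j] + observed[j][i]) / 2 for j in range(size)]
            for i in range(size)]

# Hypothetical 3x3 counts, for illustration only.
counts = [[20, 12, 6],
          [8, 30, 10],
          [3, 7, 25]]
g2_s = g_squared(counts, symmetry_fit(counts))
print(g2_s)
```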

The number of degrees of freedom for both the OEAS and PPAS models is $(R^{2}-R-4)/2$. Note that this is one less than that of the LDPS, OQS, and RQS models, and two less than that of the S model.

Applied economists often use the Akaike information criterion (AIC) as a quick method for choosing the best-fitting model among alternatives. The AIC is defined as

$$\begin{aligned} \text {AIC} = -2(\text {the maximum log likelihood}) + 2(\text {the number of parameters}), \end{aligned}$$

for each model (see Akaike [14]). This criterion recommends the model with the minimum AIC as the best-fitting model. When two models are compared, only the difference between their AICs is required. It is therefore possible to ignore the constant common to all models and use a modified AIC defined as

$$\begin{aligned} \text {AIC}^{+} = G^{2} - 2(\text {the number of degrees of freedom}). \end{aligned}$$

Thus, the model with the minimum AIC$^{+}$ (i.e., the minimum AIC) is the best-fitting model among the applied models.
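The degrees of freedom stated above and the modified criterion can be collected in a small helper. This is a sketch with hypothetical function names; the $G^{2}$ values in the example are made-up numbers used only to show the comparison, not results from the paper.

```python
def degrees_of_freedom(num_categories):
    """Degrees of freedom of each model for an R x R table, as stated in the text."""
    s_df = num_categories * (num_categories - 1) // 2
    return {"S": s_df,
            "LDPS": s_df - 1, "OQS": s_df - 1, "RQS": s_df - 1,
            "OEAS": s_df - 2, "PPAS": s_df - 2}

def aic_plus(g2, df):
    """Modified AIC: G^2 minus twice the degrees of freedom."""
    return g2 - 2 * df

df = degrees_of_freedom(4)

# Made-up G^2 values for illustration only.
example_g2 = {"S": 30.0, "LDPS": 18.0, "OEAS": 5.0}
criteria = {model: aic_plus(g2, df[model]) for model, g2 in example_g2.items()}
best = min(criteria, key=criteria.get)
print(df, criteria, best)
```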

3 Application to Real-World Data

3.1 Application to Income Data

We apply the S, LDPS, and OEAS models to the data set in Table 1. The ordered scores $s_1, s_2$, and $s_3$ of the OEAS model are assigned as 35, 110, and 300 respectively, and $s_4$ is assigned $450+w$ ($w\ge 0$).

Table 3 shows the MLEs of the expected frequencies under the OEAS model. The goodness-of-fit results in Table 4 reveal that (1) the S and LDPS models fit poorly, (2) the OEAS model fits well, and (3) the OEAS model is significantly better compared to the LDPS model.

Under the OEAS model, the MLEs of $\delta$ and w are $\hat{\delta }=0.986$ and $\hat{w}=39.203$, respectively. Thus, the MLE of $s_4$ is $\hat{s}_4 = 489.203$. Since $s_2-s_1 = 75$, $s_3-s_2 = 190$, and $\hat{s}_4-s_3 = 189.203$, the differences $s_{k+1}-s_k$ for $k=1,\dots ,R-1$ are not equally spaced. Since $\hat{\delta } < 1$, we infer that the incomes of male individuals tend to be higher than those of their female spouses.
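The arithmetic behind these score differences can be retraced as follows. The sketch only reuses the estimates reported in the text; the variable names are hypothetical, and the implied ratios $\hat{\delta }^{s_{k+1}-s_k}$ for adjacent categories are computed purely for illustration.

```python
# Estimates reported in the text.
w0, w_hat, delta_hat = 450, 39.203, 0.986

# Midpoint scores for the three bounded categories, then the open-ended one.
scores = [35, 110, 300, w0 + w_hat]
gaps = [hi - lo for lo, hi in zip(scores, scores[1:])]

# Implied ratio p_ij / p_ji for adjacent categories under the fitted model.
adjacent_ratios = [delta_hat ** gap for gap in gaps]
print(gaps, adjacent_ratios)
```

All implied ratios are below 1 because $\hat{\delta } < 1$ and every score difference is positive.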

Table 3 The maximum likelihood estimates of expected frequencies under the open-ended category type asymmetry model applied to the data set in Table 1 are shown in parentheses in the second line

Table 4 Values of the likelihood ratio chi-square statistic ($G^2$) and the modified Akaike information criterion ($\text {AIC}^{+}$) for each model applied to the data in Table 1

3.2 Application to Occupational Status Data

We apply the S, LDPS, RQS, and PPAS models to the data set in Table 2.

Table 5 shows the MLEs of expected frequencies under the PPAS model. The results in Table 6 reveal that (1) the S and LDPS models fit poorly, (2) the RQS and PPAS models fit well, and (3) the PPAS model is preferred over the RQS model when values of $\text {AIC}^{+}$ are compared.

Under the PPAS model, the MLEs of $\delta$ and a are $\hat{\delta }=1.000013$ and $\hat{a}=6.441$, respectively. As $\hat{a} > 1$, the difference in scores between categories $k+1$ and k increases as k increases. Indeed, $\hat{s}_2-\hat{s}_1 = 85.891$, $\hat{s}_3-\hat{s}_2 = 1096.712$, $\hat{s}_4-\hat{s}_3 = 6366.527$, and $\hat{s}_5-\hat{s}_4 = 24230.730$. As $\hat{\delta } >1$, we infer that the occupational statuses of fathers tend to be higher than those of their sons.
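The reported score differences can be approximately reproduced from the rounded estimate $\hat{a}=6.441$. The sketch below uses hypothetical variable names; small discrepancies from the reported values are expected because $\hat{a}$ is rounded to three decimals in the text.

```python
a_hat = 6.441  # rounded estimate reported in the text

# Power parameter scores k**a for the five occupational status categories.
scores = [k ** a_hat for k in range(1, 6)]
gaps = [hi - lo for lo, hi in zip(scores, scores[1:])]

# Differences reported in the text, for comparison.
reported = [85.891, 1096.712, 6366.527, 24230.730]
relative_errors = [abs(g - r) / r for g, r in zip(gaps, reported)]
print([round(g, 3) for g in gaps], relative_errors)
```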

Table 5 The maximum likelihood estimates of expected frequencies under the power parameter type asymmetry model applied to the data set in Table 2 are shown in parentheses in the second line

Table 6 Values of the likelihood ratio chi-square statistic ($G^2$) and the modified Akaike information criterion ($\text {AIC}^{+}$) for each model applied to the data in Table 2

4 Conclusion

This study introduced three types of data set, namely those that: (i) can be assigned the known ordered scores for all categories, (ii) can be assigned the known ordered scores for all except one category, and (iii) cannot be assigned the known ordered scores for all categories. This study proposed two original asymmetry models based on non-integer scores corresponding to data sets of types (ii) and (iii). The proposed models are simple asymmetry models, and therefore easier to apply and interpret. The findings demonstrate that the proposed models are applicable to real-world data.

Data Availability

The data set of Table 1 is available at https://nesstar.iss.u-tokyo.ac.jp/webview/.

Abbreviations

OQS: Ordinal quasi-symmetry
LDPS: Linear diagonals-parameter symmetry
S: Symmetry
RQS: Ridit score type quasi-symmetry
OEAS: Open-ended category type asymmetry
PPAS: Power parameter type asymmetry
MLE: Maximum likelihood estimate
AIC: Akaike information criterion

References

1. Agresti, A.: Categorical data analysis, 2nd edn. Wiley, New York (2002)
2. Graubard, B.I., Korn, E.L.: Choice of column scores for testing independence in ordered $2 \times k$ contingency tables. Biometrics 43, 471–476 (1987)
3. Senn, S.: Drawbacks to noninteger scoring for ordered categorical data. Biometrics 63, 296–299 (2007)
4. Gautam, S.: Test for linear trend in $2 \times k$ ordered tables with open-ended categories. Biometrics 53, 1163–1169 (1997)
5. Aktas, S., Wu, S.: Marginal homogeneity model for ordered categories with open ends in square contingency tables. REVSTAT-Statistical Journal 13(3), 233–243 (2015)
6. Iki, K., Tahata, K., Tomizawa, S.: Ridit score type quasi-symmetry and decomposition of symmetry for square contingency tables with ordered categories. Austrian Journal of Statistics 38, 183–192 (2009)
7. Bagheban, A.A., Zayeri, F.: A generalization of the uniform association model for assessing rater agreement in ordinal scales. Journal of Applied Statistics 37, 1265–1273 (2010)
8. Agresti, A.: A simple diagonals-parameter symmetry and quasi-symmetry model. Statistics & Probability Letters 1, 313–316 (1983)
9. Bowker, A.H.: A test for symmetry in contingency tables. Journal of the American Statistical Association 43, 572–574 (1948)
10. Kateri, M., Agresti, A.: A class of ordinal quasi-symmetry models for square contingency tables. Statistics & Probability Letters 77(6), 598–603 (2007)
11. Saigusa, Y., Tahata, K., Tomizawa, S.: Orthogonal decomposition of symmetry model using the ordinal quasi-symmetry model based on f-divergence for square contingency tables. Statistics & Probability Letters 101, 33–37 (2015)
12. Bross, I.D.J.: How to use ridit analysis. Biometrics 14, 18–38 (1958)
13. Ando, S.: Asymmetry models based on ordered score and separations of symmetry model for square contingency tables. Biometrical Letters 58(1), 27–39 (2021)
14. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 716–723 (1974)

Acknowledgements

The author would like to thank the anonymous reviewers and the editors for their comments and suggestions to improve this paper. The data set in Table 1 used in this analysis, namely Social Stratification and Mobility (SSM95A), was provided by the Social Science Japan Data Archive, Center for Social Research and Data Archives, Institute of Social Science, The University of Tokyo.

Funding

The authors have solely funded the research by themselves.

Author information

Authors and Affiliations

Department of Information and Computer Technology, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo, 1258585, Japan
Shuji Ando

Authors

Shuji Ando

Corresponding author

Correspondence to Shuji Ando.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

Not applicable.

Consent for Publication

Not applicable.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ando, S. Asymmetry Models Based on Non-integer Scores for Square Contingency Tables. J Stat Theory Appl 21, 21–30 (2022). https://doi.org/10.1007/s44199-022-00039-z

Received: 25 October 2021
Accepted: 14 January 2022
Published: 25 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s44199-022-00039-z

Mathematics Subject Classification

62H17
