1 Introduction

Consider an R × R square contingency table with the same row and column ordinal classifications. Let X and Y denote the row and column variables, respectively, and let \(\text {Pr}(X = i, Y = j) = p_{ij}\) for i = 1,…,R; j = 1,…,R. The marginal homogeneity (MH) model is defined by

$$ p_{i \cdot} = p_{\cdot i} \quad \text{for} ~ i = 1, {\ldots} , R, $$
(1)

where \(p_{i \cdot } = {\sum }^{R}_{t=1}p_{it}\) and \(p_{\cdot i} = {\sum }^{R}_{s=1}p_{si}\) (see e.g., Stuart 1955; Bishop et al. 1975, p.294). The MH model states that the row marginal distribution is identical to the column marginal distribution.

Using the marginal cumulative probability, this model can be expressed as

$$ {F^{X}_{i}} = {F^{Y}_{i}} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(2)

where \({F^{X}_{i}} = {\sum }^{i}_{s=1} p_{s \cdot } = \text {Pr}(X \leq i)\) and \({F^{Y}_{i}} = {\sum }^{i}_{t=1} p_{\cdot t} = \text {Pr}(Y \leq i)\). The MH model can also be expressed as

$$ G_{1(i)} = G_{2(i)} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(3)

where \(G_{1(i)} = {\sum }^i_{s=1} {\sum }^R_{t=i+1} p_{st} = \text {Pr}(X \leq i, Y > i)\) and \(G_{2(i)} = {\sum }^{R}_{s=i+1} {\sum }^{i}_{t=1} p_{st} = \text {Pr}(X > i, Y \leq i)\) (see e.g., Tomizawa 1993; Tahata and Tomizawa 2008). Furthermore, Tahata et al. (2006) expressed the MH model using marginal ridits (see e.g., Bross 1958; Fleiss et al. 2003, pp.198-205; Agresti 2010, p.10). Moreover, the MH model can be expressed with other formulas (see e.g., Iki et al. 2010; Altun and Aktaş 2018).
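The equivalence of expressions (1), (2), and (3) can be checked numerically. The following is a minimal sketch, assuming a small hypothetical table of cell probabilities; the function names are illustrative and do not come from the cited works. It relies on the identity \({F^{X}_{i}} - {F^{Y}_{i}} = G_{1(i)} - G_{2(i)}\), which holds for any table because both cumulative probabilities share the term Pr(X ≤ i, Y ≤ i).

```python
def margins(p):
    """Return the row and column marginal probabilities of a square table p."""
    R = len(p)
    row = [sum(p[i][t] for t in range(R)) for i in range(R)]
    col = [sum(p[s][j] for s in range(R)) for j in range(R)]
    return row, col


def cumulative(m):
    """Cumulative probabilities F_1, ..., F_{R-1} of a marginal distribution m."""
    out, acc = [], 0.0
    for v in m[:-1]:
        acc += v
        out.append(acc)
    return out


def g_pairs(p):
    """G_1(i) = Pr(X <= i, Y > i) and G_2(i) = Pr(X > i, Y <= i) for i = 1..R-1."""
    R = len(p)
    g1 = [sum(p[s][t] for s in range(i) for t in range(i, R)) for i in range(1, R)]
    g2 = [sum(p[s][t] for s in range(i, R) for t in range(i)) for i in range(1, R)]
    return g1, g2


# Hypothetical symmetric table, so marginal homogeneity holds by construction.
p = [[0.20, 0.05, 0.05],
     [0.05, 0.20, 0.10],
     [0.05, 0.10, 0.20]]
row, col = margins(p)
FX, FY = cumulative(row), cumulative(col)
G1, G2 = g_pairs(p)
print(row, col)   # expression (1): identical marginals
print(FX, FY)     # expression (2): identical cumulative probabilities
print(G1, G2)     # expression (3): identical off-diagonal block sums
```

For a table that violates marginal homogeneity, all three comparisons fail together, which is what the equivalence of the expressions implies.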

When the MH model does not fit the data, a model with weaker restrictions is of interest. One example is the extension based on expression (1) proposed by Miyamoto et al. (2006) for a square contingency table with nominal classifications. For a square contingency table with ordinal classifications, the marginal cumulative logistic (ML) model is defined by

$$ \log \left( \frac{{F^{X}_{i}}}{1-{F^{X}_{i}}} \right) = \log \left( \frac{{F^{Y}_{i}}}{1-{F^{Y}_{i}}} \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1. $$
(4)

See e.g., McCullagh (1977), Agresti (2010, p.241), and Kurakami et al. (2013). Saigusa et al. (2018) proposed the marginal cumulative complementary log-log (MCL) model, defined by

$$ \log \left( -\log \left( 1-{F^{X}_{i}} \right) \right) =\log \left( -\log \left( 1-{F^{Y}_{i}} \right) \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1. $$
(5)

The ML (MCL) model indicates that one marginal distribution is a location shift of the other on a logistic (complementary log-log) scale. Setting Δ = 0 in either the ML or the MCL model gives the MH model. These models are extensions of the MH model based on expression (2). Furthermore, Tahata and Tomizawa (2008) proposed extensions of the MH model based on expression (3).

Herein we examine a new expression of the MH model using the continuation-ratio (see e.g., Fienberg 1980, pp.110-111; Agresti 2010, p.45). The MH model can be expressed as

$$ {c^{X}_{i}} = {c^{Y}_{i}} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(6)

where

$$ {c^{X}_{i}} = \frac{p_{i \cdot}}{1-{F^{X}_{i}}}, \quad {c^{Y}_{i}} = \frac{p_{\cdot i}}{1-{F^{Y}_{i}}}. $$

This states that the row marginal continuation-ratio is identical to the column marginal continuation-ratio. Many studies have focused on the continuation-ratio (see e.g., Thompson 1977; McCullagh 1980; Läärä and Matthews 1985; Tutz 1991; Greenland 1994). As an example, Thompson (1977) used the continuation-ratio in modeling discrete survival time data. When the lengths of the time intervals approach zero, his model converges to the Cox proportional hazards model.

Marginal homogeneity has been studied extensively in the analysis of square contingency tables. However, little of this research has been framed in terms of the continuation-ratio, which is an important concept in categorical data analysis. As an example, the ML model cannot be interpreted in terms of the continuation-ratio. The purpose of this study is to provide new insight into the analysis of square contingency tables by studying the continuation-ratio. Considering the properties of the continuation-ratio also deepens the understanding of previous research. The plan of the paper is as follows. Section 2 extends the MH model based on expression (6). Section 3 decomposes the MH model. Section 4 extends the models to multi-way tables. Section 5 gives a goodness-of-fit test for the models. Section 6 provides some examples, and Section 7 discusses this paper in the context of related work.

2 Models

2.1 The Marginal Continuation Odds Ratio Model

The ratio of marginal continuation-ratios is

$$ \psi_{i} = \frac{{c^{X}_{i}}}{{c^{Y}_{i}}} = \frac{p_{i \cdot}/\left( 1-{F^{X}_{i}} \right)}{p_{\cdot i}/\left( 1-{F^{Y}_{i}} \right)} = \frac{\text{Pr}(X=i)/\text{Pr}(X>i)}{\text{Pr}(Y=i)/\text{Pr}(Y>i)}, $$

for i = 1,…,R − 1. We refer to the ratio of marginal continuation-ratios as the marginal continuation odds ratio. Note that this is different from the continuation odds ratio (Agresti 2010, p.24), and the quasi-symmetry model based on the continuation odds ratio was presented by Kateri et al. (2017).
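As an illustration, the marginal continuation odds ratio can be computed directly from the two marginal distributions. The sketch below assumes hypothetical marginals chosen for convenience; the function names are illustrative only.

```python
def continuation_ratios(m):
    """c_i = m_i / (1 - F_i) = Pr(V = i) / Pr(V > i) for i = 1..R-1."""
    ratios, tail = [], 1.0
    for v in m[:-1]:
        tail -= v                 # tail is now Pr(V > i)
        ratios.append(v / tail)
    return ratios


def marginal_continuation_odds_ratios(row, col):
    """psi_i = c^X_i / c^Y_i for i = 1..R-1."""
    cx, cy = continuation_ratios(row), continuation_ratios(col)
    return [a / b for a, b in zip(cx, cy)]


row = [0.5, 0.3, 0.2]   # hypothetical row marginal (X)
col = [0.4, 0.3, 0.3]   # hypothetical column marginal (Y)
psi = marginal_continuation_odds_ratios(row, col)
print(psi)              # both ratios are 1.5 up to floating-point rounding
```

In this hypothetical case the ratio is the same for every i, so the two marginals differ by a constant factor on the continuation-ratio scale.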

We propose a new model defined by

$$ \log \psi_{i} = {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(7)

where the parameter Δ is unspecified. This model indicates that the ratios of marginal continuation-ratios are equal to \(\exp \left ({\Delta } \right )\). A special case of this model obtained by setting Δ = 0 is the MH model. We shall refer to model (7) as the marginal continuation odds ratio (MCOR) model.

Let

$$ {\omega^{X}_{i}} = \frac{p_{i \cdot}}{1-F^{X}_{i-1}} = \text{Pr}(X = i \mid X \geq i), $$

and

$$ {\omega^{Y}_{i}} = \frac{p_{\cdot i}}{1-F^{Y}_{i-1}} = \text{Pr}(Y = i \mid Y \geq i), $$

for i = 1,…,R − 1 with \({F^{X}_{0}}={F^{Y}_{0}}=0\). For models based on these conditional probabilities, see e.g., Läärä and Matthews (1985) and McCullagh and Nelder (1983, pp.102-104). Then the marginal continuation odds ratio is also expressed as

$$ \psi_{i} = \frac{{\omega^{X}_{i}} \left( 1-{\omega^{Y}_{i}} \right)}{{\omega^{Y}_{i}} \left( 1-{\omega^{X}_{i}} \right)} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$

since

$$ {c^{X}_{i}} = \frac{{\omega^{X}_{i}}}{1-{\omega^{X}_{i}}}, $$

and

$$ {c^{Y}_{i}} = \frac{{\omega^{Y}_{i}}}{1-{\omega^{Y}_{i}}}. $$

Then the MCOR model can be expressed as

$$ \log \left( \frac{{\omega^{X}_{i}}}{1-{\omega^{X}_{i}}} \right) = \log \left( \frac{{\omega^{Y}_{i}}}{1-{\omega^{Y}_{i}}} \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1. $$
(8)

Under this model, Δ > 0 is equivalent to \(\{ {\omega ^{X}_{i}} > {\omega ^{Y}_{i}} \}\).

The MCOR model can also be expressed as

$$ {\omega^{X}_{i}} = \frac{\exp \left( \theta_{i} + {\Delta} \right)}{1 + \exp \left( \theta_{i} + {\Delta} \right)} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$

where \(\theta _{i} = \log \left ({\omega ^{Y}_{i}} / \left (1-{\omega ^{Y}_{i}} \right ) \right )\). Therefore, the MCOR model indicates that the conditional probability \({\omega ^{X}_{i}}\) is a location shift of the conditional probability \({\omega ^{Y}_{i}}\) on a logistic scale. Thus, the MCOR model can also be called a marginal continuation-ratio logit model.

The interpretation of the proposed model is described using the following example. Consider the comparison of therapeutic effects when two drugs are administered to the same patient. The treatment effect is an ordinal score with R stages (larger scores indicate more severe symptoms). We thus obtain an R × R contingency table with the same row and column classifications (the row variable is drug A; the column variable is drug B). We are interested in the odds that an observation falls in score category i instead of score category i + 1 or above, for any i. From Eq. (7), under the MCOR model, the parameter Δ is the log odds ratio between drugs A and B: if Δ is zero, the MH model holds, i.e., there is no difference between drugs A and B; if Δ is positive, the odds are \(\exp \left ({\Delta } \right )\) times higher for drug A than for drug B, i.e., drug A has a greater therapeutic effect than drug B. The MCOR model can also be interpreted in two further ways. From Eq. (8), given that an observation falls in score category i or above, the odds that the observation falls in score category i rather than above i are \(\exp \left ({\Delta } \right )\) times higher for drug A than for drug B. Moreover, the conditional probability for drug A is a location shift of that for drug B on a logistic scale.

Note that model (4) can be transformed into model (8) by replacing the marginal cumulative probability with the corresponding marginal conditional probability. However, the meanings of these models completely differ, and the likelihood ratio chi-squared statistics for testing the goodness-of-fit of these models do not coincide.
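The difference between models (4) and (8) can be seen on a small numerical case. The sketch below assumes hypothetical marginals for which the logit of the conditional probabilities is shifted by a constant, and shows that the logit of the cumulative probabilities is then not shifted by a constant. All names are illustrative.

```python
import math


def conditional_probs(m):
    """omega_i = Pr(V = i | V >= i) for i = 1..R-1."""
    out, at_risk = [], 1.0
    for v in m[:-1]:
        out.append(v / at_risk)   # at_risk is Pr(V >= i)
        at_risk -= v
    return out


def cumulative(m):
    """F_i = Pr(V <= i) for i = 1..R-1."""
    out, acc = [], 0.0
    for v in m[:-1]:
        acc += v
        out.append(acc)
    return out


def logit(x):
    return math.log(x / (1.0 - x))


row = [0.5, 0.3, 0.2]   # hypothetical row marginal (X)
col = [0.4, 0.3, 0.3]   # hypothetical column marginal (Y)

# Model (8): logit differences of the conditional probabilities.
shift_conditional = [logit(a) - logit(b)
                     for a, b in zip(conditional_probs(row), conditional_probs(col))]
# Model (4): logit differences of the cumulative probabilities.
shift_cumulative = [logit(a) - logit(b)
                    for a, b in zip(cumulative(row), cumulative(col))]
print(shift_conditional)   # constant in i, equal to log(1.5)
print(shift_cumulative)    # varies with i
```

Here the MCOR structure holds exactly while the ML structure does not, so a fit of one model says nothing definite about the fit of the other.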

2.2 The Generalized Marginal Continuation-ratio Model

Model (8) is an extension of the MH model using the logit transformation. Hence, model (8) may be based on the idea of model (4). If we focus on the idea of model (5), a distinct extension can be derived using the complementary log-log transformation. Therefore, using a strictly increasing function such as a logit or a complementary log-log function, we propose a generalization of the MCOR model by

$$ h^{-1}\left( {\omega^{X}_{i}} \right) = h^{-1}\left( {\omega^{Y}_{i}} \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(9)

where the parameter Δ is unspecified and h(⋅) is a twice-differentiable and strictly increasing function with \(\displaystyle \lim _{x \to - \infty } h(x)=0\) and \(\displaystyle \lim _{x \to \infty } h(x)=1\). We shall refer to model (9) as the generalized marginal continuation-ratio (GMC) model. A special case of this model obtained by setting Δ = 0 is the MH model.

By setting \(h^{-1}\left ({\omega ^{Y}_{i}} \right ) = \theta _{i}\), the GMC model can be expressed as

$$ {\omega^{X}_{i}} = h \left( \theta_{i} + {\Delta} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1. $$

Let

$$ g (x) = \frac{ h (x) }{1 - h (x)}. $$

Note that g(⋅) is a strictly increasing function that gives \(\displaystyle \lim _{x \to - \infty } g(x) = 0\), \(\displaystyle \lim _{x \to \infty } g(x) = \infty \), and g(⋅) > 0. The GMC model can also be expressed as

$$ \frac{{\omega^{X}_{i}}}{1-{\omega^{X}_{i}}} = \frac{ h \left( \theta_{i} + {\Delta} \right) }{ 1 - h \left( \theta_{i} + {\Delta} \right) } = g \left( \theta_{i} + {\Delta} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1. $$

Furthermore, since

$$ \frac{{\omega^{X}_{i}}}{1-{\omega^{X}_{i}}} = \frac{p_{i \cdot}}{1-{F^{X}_{i}}}, $$

the GMC model can be expressed as

$$ \frac{p_{i \cdot}}{1-{F^{X}_{i}}} = g \left( \theta_{i} + {\Delta} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1. $$

Especially, when \(h^{-1}(x) = \log \left (x / \left (1-x \right ) \right )\), i.e., \(g(x) = \exp (x)\), the GMC model is equivalent to the MCOR model.
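The role of h(⋅) and g(⋅) can be sketched in a few lines. The code below assumes two choices of h (the inverse logit and the inverse complementary log-log) and hypothetical values of \(\theta_{i}\) and Δ; it is an illustration of the definitions above rather than an estimation routine.

```python
import math


def inv_logit(x):
    """h(x) for the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def inv_cloglog(x):
    """h(x) for the complementary log-log transformation."""
    return 1.0 - math.exp(-math.exp(x))


def g(h, x):
    """g(x) = h(x) / (1 - h(x)), the continuation odds implied by h."""
    return h(x) / (1.0 - h(x))


def gmc_omega_x(h, theta, delta):
    """omega^X_i = h(theta_i + delta) under the GMC model."""
    return [h(t + delta) for t in theta]


theta = [-0.4, 0.0, 0.8]   # hypothetical theta_1, ..., theta_{R-1}
delta = 0.5                # hypothetical shift parameter
print(gmc_omega_x(inv_logit, theta, delta))
print(gmc_omega_x(inv_cloglog, theta, delta))
print(g(inv_logit, 0.7), math.exp(0.7))   # g(x) = exp(x) for the logit case
```

For the complementary log-log case the same algebra gives g(x) = exp(exp(x)) − 1, so the continuation odds grow much faster in x than under the logit choice.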

2.3 Properties

In this section, we focus on the complementary log-log and probit transformations as the major transformations for the GMC model.

2.3.1 The Marginal Continuation-ratio Complementary Log-log Model

When \(h^{-1}(x) = \log (-\log (1-x))\), the GMC model is expressed as

$$ \log \left( -\log \left( 1-{\omega^{X}_{i}} \right) \right) = \log \left( -\log \left( 1-{\omega^{Y}_{i}} \right) \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1. $$
(10)

We shall refer to model (10) as the marginal continuation-ratio complementary log-log (MCC) model.

Läärä and Matthews (1985) noted that the complementary log-log transformation for the conditional probabilities is equivalent to the one using the same transformation but with the cumulative probabilities. This leads to the following property.

Property 1.

The MCC model is equivalent to the MCL model.

We give the proof of Property 1 below: The conditional probabilities \({\omega ^{X}_{i}}\) can be expressed as

$$ {\omega^{X}_{i}} = \frac{p_{i \cdot}}{1 - F^{X}_{i-1}} = 1 - \frac{1 - {F^{X}_{i}}}{1 - F^{X}_{i-1}}, $$

then

$$ 1 - {\omega^{X}_{i}} = \frac{1 - {F^{X}_{i}}}{1 - F^{X}_{i-1}}, $$

for i = 1,…,R − 1. Therefore, the MCC model is expressed as

$$ \log \left( 1-{F^{X}_{i}} \right) - \log \left( 1-F^{X}_{i-1} \right) = \exp({\Delta}) \left[ \log \left( 1-{F^{Y}_{i}} \right) - \log \left( 1-F^{Y}_{i-1} \right) \right], $$

for i = 1,…,R − 1. When i = 1, we see

$$ \log \left( 1-{F^{X}_{1}} \right) = \exp({\Delta}) \log \left( 1-{F^{Y}_{1}} \right). $$

When i = 2, we see

$$ \log \left( 1-{F^{X}_{2}} \right) - \log \left( 1-{F^{X}_{1}} \right) = \exp({\Delta}) \left[ \log \left( 1-{F^{Y}_{2}} \right) - \log \left( 1-{F^{Y}_{1}} \right) \right], $$

thus,

$$ \log \left( 1-{F^{X}_{2}} \right) = \exp({\Delta}) \log \left( 1-{F^{Y}_{2}} \right). $$

Hence, in a similar manner we see that the MCC model is expressed as

$$ \log \left( 1-{F^{X}_{i}} \right) = \exp({\Delta}) \log \left( 1-{F^{Y}_{i}} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1. $$

This expression represents the MCL model.

From above, the parameter Δ in the MCC model can reflect the degree of inhomogeneity not only between \(\{ {\omega ^{X}_{i}} \}\) and \(\{ {\omega ^{Y}_{i}} \}\) but also between \(\{ {F^{X}_{i}} \}\) and \(\{ {F^{Y}_{i}} \}\). Hence, the MCC model also states that one marginal distribution is a location shift of another marginal distribution on a complementary log-log scale.
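Property 1 can be verified numerically. The sketch below assumes a hypothetical column marginal and shift Δ, builds the row conditional probabilities from the MCC model, and checks that the cumulative probabilities are then shifted by the same Δ on the complementary log-log scale, as the MCL model requires. The function names are illustrative.

```python
import math


def cloglog(x):
    return math.log(-math.log(1.0 - x))


def inv_cloglog(x):
    return 1.0 - math.exp(-math.exp(x))


def conditional_probs(m):
    """omega_i = Pr(V = i | V >= i) for i = 1..R-1."""
    out, at_risk = [], 1.0
    for v in m[:-1]:
        out.append(v / at_risk)
        at_risk -= v
    return out


def cumulative_from_omega(omega):
    """F_i = 1 - prod_{s<=i} (1 - omega_s), since 1 - omega_s = (1 - F_s)/(1 - F_{s-1})."""
    out, surv = [], 1.0
    for w in omega:
        surv *= 1.0 - w
        out.append(1.0 - surv)
    return out


col = [0.2, 0.3, 0.1, 0.4]   # hypothetical column marginal (Y)
delta = 0.35                 # hypothetical shift parameter
wy = conditional_probs(col)
wx = [inv_cloglog(cloglog(w) + delta) for w in wy]     # MCC model (10)
FX, FY = cumulative_from_omega(wx), cumulative_from_omega(wy)
shifts = [cloglog(a) - cloglog(b) for a, b in zip(FX, FY)]
print(shifts)   # every entry equals delta up to rounding, i.e., the MCL model
```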

2.3.2 The Marginal Continuation-ratio Probit Model

Using the probit transformation, the GMC model is expressed as

$$ {\Phi}^{-1} \left( {\omega^{X}_{i}} \right) = {\Phi}^{-1} \left( {\omega^{Y}_{i}} \right) + {\Delta} \quad \text{for} ~ i = 1, {\ldots} , R-1, $$
(11)

where Φ(⋅) is the cumulative distribution function of the standard normal distribution. We refer to model (11) as the marginal continuation-ratio probit (MCP) model.

3 Decompositions of the Marginal Homogeneity Model

Consider the marginal mean equality (ME) model defined by

$$ \mathrm{E}(X) = \mathrm{E}(Y), $$

i.e.,

$$ \sum\limits^{R}_{i=1} i p_{i \cdot} = \sum\limits^{R}_{i=1} i p_{\cdot i}. $$

Note that the MH model implies the ME model.

We obtain the following lemmas and theorem.

Lemma 1.

The GMC model can also be expressed as

$$ p_{i \cdot} = \frac{ g \left( \theta_{i} + {\Delta} \right) }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) }, \quad p_{\cdot i} = \frac{ g \left( \theta_{i} \right) }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) }, $$

for i = 1,…,R − 1, and

$$ p_{R \cdot} = \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}, \quad p_{\cdot R} = \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right)}. $$

Proof 1.

The GMC model is expressed as

$$ \frac{p_{i \cdot}}{1-{F^{X}_{i}}} = g \left( \theta_{i} + {\Delta} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1. $$

When i = 1,

$$ \frac{p_{1 \cdot}}{1-{F^{X}_{1}}} = g \left( \theta_{1} + {\Delta} \right), $$

namely

$$ p_{1 \cdot} = \frac{g \left( \theta_{1} + {\Delta} \right)}{1+g \left( \theta_{1} + {\Delta} \right)}. $$

When i = 2,

$$ \frac{p_{2 \cdot}}{1-(p_{1 \cdot}+p_{2 \cdot})} = g \left( \theta_{2} + {\Delta} \right). $$

Namely

$$ \begin{array}{ll} \left( 1+g \left( \theta_{2} + {\Delta} \right) \right)p_{2 \cdot} &= g \left( \theta_{2} + {\Delta} \right)\left( 1-p_{1 \cdot} \right) \\ &= \displaystyle g \left( \theta_{2} + {\Delta} \right)\left( 1-\frac{g \left( \theta_{1} + {\Delta} \right)}{1+g \left( \theta_{1} + {\Delta} \right)} \right) \\ &= \displaystyle \frac{g \left( \theta_{2} + {\Delta} \right)}{1+g \left( \theta_{1} + {\Delta} \right)}. \end{array} $$

Thus

$$ p_{2 \cdot} = \frac{g \left( \theta_{2} + {\Delta} \right)}{{\prod}^{2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. $$

When i = 3,

$$ \frac{p_{3 \cdot}}{1-(p_{1 \cdot}+p_{2 \cdot}+p_{3 \cdot})} = g \left( \theta_{3} + {\Delta} \right). $$

Namely

$$ \begin{array}{@{}rcl@{}} &&\left( 1+g \left( \theta_{3} + {\Delta} \right) \right)p_{3 \cdot}\\ &=& g \left( \theta_{3} + {\Delta} \right)\left( 1-p_{1 \cdot}-p_{2 \cdot} \right) \\ &=& \displaystyle g \left( \theta_{3} + {\Delta} \right)\left( 1 - \frac{g \left( \theta_{1} + {\Delta} \right)}{1+g \left( \theta_{1} + {\Delta} \right)} - \frac{g \left( \theta_{2} + {\Delta} \right)}{{\prod}^{2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}\right) \\ &=& \displaystyle \frac{g \left( \theta_{3} + {\Delta} \right)}{{\prod}^{2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. \end{array} $$

Thus

$$ p_{3 \cdot} = \frac{g \left( \theta_{3} + {\Delta} \right)}{{\prod}^{3}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. $$

In a similar manner, we obtain

$$ p_{i \cdot} = \frac{ g \left( \theta_{i} + {\Delta} \right) }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) }, $$

for i = 1,…,R − 1. Moreover, we obtain

$$ \begin{array}{ll} p_{R \cdot} &= \displaystyle 1 - {\sum}^{R-1}_{i=1} p_{i \cdot} \\ &= \displaystyle 1 - \frac{g \left( \theta_{1} + {\Delta} \right)}{1+g \left( \theta_{1} + {\Delta} \right)} - {\sum}^{R-1}_{i=2} \frac{g \left( \theta_{i} + {\Delta} \right)}{{\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \\ &= \displaystyle \frac{1}{1+g \left( \theta_{1} + {\Delta} \right)} - \frac{g \left( \theta_{2} + {\Delta} \right)}{{\prod}^{2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} - {\sum}^{R-1}_{i=3} \frac{g \left( \theta_{i} + {\Delta} \right)}{{\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. \end{array} $$

Since

$$ \displaystyle \frac{1}{{\prod}^{k-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} - \frac{g \left( \theta_{k} + {\Delta} \right)}{{\prod}^{k}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} = \frac{1}{{\prod}^{k}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}, $$

for k = 2,…,R − 1, we see

$$ p_{R \cdot} = \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. $$

In a similar manner, we obtain

$$ p_{\cdot i} = \frac{ g \left( \theta_{i} \right) }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) }, $$

for i = 1,…,R − 1, and

$$ p_{\cdot R} = \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right)}. $$
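The closed form in Lemma 1 translates into a short routine. The sketch below assumes the logit case, \(g(x) = \exp (x)\), with hypothetical values of \(\theta_{i}\) and Δ; `marginal_from_odds` is an illustrative name.

```python
import math


def marginal_from_odds(odds):
    """Lemma 1: p_i = g_i / prod_{s<=i}(1 + g_s) for i < R, p_R = 1 / prod_{s<R}(1 + g_s)."""
    probs, denom = [], 1.0
    for gi in odds:
        denom *= 1.0 + gi
        probs.append(gi / denom)
    probs.append(1.0 / denom)
    return probs


theta = [-0.4, 0.0, 0.8]   # hypothetical theta_1, ..., theta_{R-1} (R = 4)
delta = 0.5                # hypothetical shift parameter
row = marginal_from_odds([math.exp(t + delta) for t in theta])   # {p_i.}
col = marginal_from_odds([math.exp(t) for t in theta])           # {p_.i}
print(row, sum(row))
print(col, sum(col))
```

Both vectors sum to one, and dividing each p_i by the remaining tail probability recovers the continuation odds that were supplied.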

Lemma 2.

Under the GMC model, we have

$$ \mathrm{E}(X) = 1 + \sum\limits^{R-1}_{i=1} \frac{ 1 }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) }, \quad \mathrm{E}(Y) = 1 + \sum\limits^{R-1}_{i=1} \frac{ 1 }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) }. $$

Proof 2.

Assume that the GMC model holds. From Lemma 1 we see

$$ \begin{array}{@{}rcl@{}} \mathrm{E}(X) &=& \displaystyle \sum\limits^{R}_{i=1} i p_{i \cdot} \\ &=& \displaystyle \sum\limits^{R-2}_{i=1} i \frac{g \left( \theta_{i} + {\Delta} \right)}{{\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \\ && + \displaystyle \left[ (R-1) \frac{g \left( \theta_{R-1} + {\Delta} \right)}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} + R \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \right] \\ &=& \displaystyle \sum\limits^{R-2}_{i=1} i \frac{g \left( \theta_{i} + {\Delta} \right)}{{\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \\ && + \displaystyle \left[ (R-1) \frac{1}{{\prod}^{R-2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} + \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \right] \\ &=& \displaystyle \sum\limits^{R-3}_{i=1} i \frac{g \left( \theta_{i} + {\Delta} \right)}{{\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \\ && + \displaystyle \left[ (R-2) \frac{g \left( \theta_{R-2} + {\Delta} \right)}{{\prod}^{R-2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} + (R - 1) \frac{1}{{\prod}^{R-2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \right] \\ && + \displaystyle \frac{1}{{\prod}^{R-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}. \end{array} $$

Since

$$ \begin{array}{@{}rcl@{}} && \displaystyle (k-1) \frac{g \left( \theta_{k-1} + {\Delta} \right)}{{\prod}^{k-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} + k \frac{1}{{\prod}^{k-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} \\ &=& \displaystyle (k-1) \frac{1}{{\prod}^{k-2}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)} + \frac{1}{{\prod}^{k-1}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right)}, \end{array} $$

where \({\prod }^{0}_{s=1} \left (1 + g \left (\theta _{s} + {\Delta } \right ) \right ) = 1\) for k = 2,…,R, we obtain

$$ \mathrm{E}(X) = 1 + \sum\limits^{R-1}_{i=1} \frac{ 1 }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) }. $$

In a similar manner, we obtain

$$ \mathrm{E}(Y) = 1 + \sum\limits^{R-1}_{i=1} \frac{ 1 }{ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) }. $$
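Lemma 2 can be checked against the direct definition of the mean. The sketch below assumes hypothetical continuation odds; it also reflects the tail-sum identity E(X) = 1 + Σ Pr(X > i), where Pr(X > i) equals the reciprocal of the product in Lemma 1.

```python
import math


def marginal_from_odds(odds):
    """Marginal probabilities from continuation odds, as in Lemma 1."""
    probs, denom = [], 1.0
    for gi in odds:
        denom *= 1.0 + gi
        probs.append(gi / denom)
    probs.append(1.0 / denom)
    return probs


def mean_direct(probs):
    """E(V) = sum_i i * p_i with categories scored 1, ..., R."""
    return sum((i + 1) * p for i, p in enumerate(probs))


def mean_lemma2(odds):
    """E(V) = 1 + sum_{i<R} 1 / prod_{s<=i}(1 + g_s), as in Lemma 2."""
    total, denom = 1.0, 1.0
    for gi in odds:
        denom *= 1.0 + gi
        total += 1.0 / denom
    return total


odds = [math.exp(t + 0.5) for t in (-0.4, 0.0, 0.8)]   # hypothetical g(theta_i + Delta)
print(mean_direct(marginal_from_odds(odds)), mean_lemma2(odds))
```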

Theorem 1.

The MH model holds if and only if both the GMC and ME models hold.

Proof 3.

If the MH model holds, then the GMC and ME models hold. Assuming that the GMC and ME models hold, we shall show that the MH model holds.

Since the ME model holds,

$$ \mathrm{E}(X) - \mathrm{E}(Y) = \sum\limits^{R-1}_{i=1} \frac{ \left[ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) \right] - \left[ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) \right] }{ \left[ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} + {\Delta} \right) \right) \right] \left[ {\prod}^{i}_{s=1} \left( 1 + g \left( \theta_{s} \right) \right) \right] } = 0, $$

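from Lemmas 1 and 2. Because g(⋅) is strictly increasing, Δ > 0 gives \(g \left( \theta_{s} + {\Delta} \right) > g \left( \theta_{s} \right)\) for every s, so each numerator in the sum is negative and E(X) − E(Y) < 0. Likewise, Δ < 0 gives E(X) − E(Y) > 0. Therefore, Δ = 0. Consequently, the MH model holds. The proof is complete. □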
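The sign argument in the proof can be illustrated numerically. The sketch below assumes the logit case, \(g(x) = \exp (x)\), and hypothetical \(\theta_{i}\); it evaluates E(X) − E(Y) from Lemma 2 over a grid of Δ values.

```python
import math


def mean_under_gmc(theta, delta, g=math.exp):
    """E(X) from Lemma 2 when the continuation odds are g(theta_i + delta)."""
    total, denom = 1.0, 1.0
    for t in theta:
        denom *= 1.0 + g(t + delta)
        total += 1.0 / denom
    return total


theta = [-0.4, 0.0, 0.8]               # hypothetical theta_1, ..., theta_{R-1}
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical grid of shift values
# E(Y) corresponds to delta = 0, so each gap is E(X) - E(Y).
gaps = [mean_under_gmc(theta, d) - mean_under_gmc(theta, 0.0) for d in deltas]
print(gaps)   # positive for delta < 0, zero at delta = 0, negative for delta > 0
```

The gap changes sign only at Δ = 0, which is why the ME model forces Δ = 0 once the GMC model holds.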

We can also describe the following decompositions of the MH model.

Corollary 1.

The MH model holds if and only if both the MCOR and ME models hold.

Corollary 2.

The MH model holds if and only if both the MCC and ME models hold.

Corollary 3.

The MH model holds if and only if both the MCP and ME models hold.

4 Extension into Multi-way Tables

We extend the models and decompositions in Sections 2 and 3 into multi-way contingency tables.

Note that extensions to multi-way tables must be considered from practical as well as theoretical viewpoints, since such tables are known to be sparse. Although issues of application are left for future research, we give the theoretical extensions here, in line with previous research.

4.1 Models

Consider an \(R^{T}\) table (T ≥ 2) with ordered categories. Let \(X_{t}\) denote the t-th random variable for t = 1,…,T, and let \(\text {Pr}(X_{1} = i_{1} , \ldots , X_{T} = i_{T}) = p_{i_{1} {\ldots } i_{T}}\) for \(i_{t}\) = 1,…,R. The MHT model can be expressed as

$$ p^{(1)}_{i} = p^{(2)}_{i} = {\cdots} = p^{(T)}_{i} \quad \text{for} ~ i = 1, {\ldots} , R, $$

where \(p^{(t)}_{i} = \text {Pr}(X_{t} = i)\). See e.g., Bhapkar and Darroch (1990) and Agresti (2013, p.439).

Let \(\omega ^{(t)}_{i} = \text {Pr}(X_{t} = i \mid X_{t} \geq i)\) for i = 1,…,R − 1; t = 1,…,T. Then we propose a model defined by

$$ h^{-1}\left( \omega^{(k)}_{i} \right) = h^{-1}\left( \omega^{(1)}_{i} \right) + {\Delta}_{k} \quad \text{for} ~ i = 1, {\ldots} , R-1;~k = 2, {\ldots} , T, $$
(12)

where the parameters \({\Delta}_{k}\) are unspecified. A special case of this model obtained by setting \({\Delta}_{2} = {\cdots} = {\Delta}_{T} = 0\) is the MHT model. We refer to model (12) as the GMCT model. Under the GMCT model, \({\Delta}_{k} > 0\) (k = 2,…,T) is equivalent to \(\omega ^{(k)}_{i} > \omega ^{(1)}_{i}\) for i = 1,…,R − 1. Therefore, the parameters \({\Delta}_{k}\) in the GMCT model reflect the degree of inhomogeneity between \(\{ \omega ^{(k)}_{i} \}\) and \(\{ \omega ^{(1)}_{i} \}\). Incidentally, by setting \(h^{-1}\left (\omega ^{(1)}_{i} \right ) = \theta _{i}\), the GMCT model can be expressed as

$$ \omega^{(t)}_{i} = h \left( \theta_{i} + {\Delta}_{t} \right) \quad \text{for} ~ i = 1, {\ldots} , R-1;~t = 1, {\ldots} , T, $$

where \({\Delta}_{1} = 0\). Hence, under the GMCT model, the conditional probability \(\omega ^{(k)}_{i}\) is a location shift of the conditional probability \(\omega ^{(1)}_{i}\) in terms of the above equation for k = 2,…,T.

Especially, when \(h^{-1}(x) = \log \left (x / \left (1-x \right ) \right )\), the GMCT model is expressed as

$$ \log \left( \frac{\omega^{(k)}_{i}}{1-\omega^{(k)}_{i}} \right) = \log \left( \frac{\omega^{(1)}_{i}}{1-\omega^{(1)}_{i}} \right) + {\Delta}_{k}, $$
(13)

for i = 1,…,R − 1; k = 2,…,T. We shall refer to model (13) as the MCORT model. Note that

$$ \frac{\omega^{(t)}_{i}}{1-\omega^{(t)}_{i}} = \frac{p^{(t)}_{i}}{1-F^{(t)}_{i}} \quad \text{for} ~ i = 1, {\ldots} , R-1;~t = 1, {\ldots} , T, $$

where \(F^{(t)}_{i} = {\sum }^{i}_{s=1} p^{(t)}_{s} = \text {Pr}(X_{t} \leq i)\). Using the marginal continuation odds ratio, the MCORT model can also be expressed as

$$ \log \psi^{(k)}_{i}= {\Delta}_{k} \quad \text{for} ~ i = 1, {\ldots} , R-1;~k = 2, {\ldots} , T, $$

where

$$ \psi^{(k)}_{i} = \frac{\omega^{(k)}_{i} \left( 1-\omega^{(1)}_{i} \right)}{\omega^{(1)}_{i} \left( 1-\omega^{(k)}_{i} \right)} = \frac{p^{(k)}_{i} \left( 1-F^{(1)}_{i} \right)}{p^{(1)}_{i} \left( 1-F^{(k)}_{i} \right)}. $$
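For a multi-way table, the ratios \(\psi^{(k)}_{i}\) depend only on the one-dimensional marginals. The sketch below assumes a hypothetical three-way table stored as a dictionary from cells to probabilities; the table is built as a product of marginals purely for convenience, which is not an assumption of the model. Indices are 0-based in the code.

```python
from itertools import product


def marginal(table, t, R):
    """p^(t)_i: sum the cell probabilities over all variables except the t-th."""
    m = [0.0] * R
    for cell, prob in table.items():
        m[cell[t]] += prob
    return m


def continuation_ratios(m):
    """p_i / (1 - F_i) for i = 1..R-1."""
    ratios, tail = [], 1.0
    for v in m[:-1]:
        tail -= v
        ratios.append(v / tail)
    return ratios


def psi(table, k, R):
    """psi^(k)_i relative to the first variable (k is a 0-based variable index)."""
    base = continuation_ratios(marginal(table, 0, R))
    other = continuation_ratios(marginal(table, k, R))
    return [a / b for a, b in zip(other, base)]


# Hypothetical marginals; the joint table is their product for simplicity.
m1, m2, m3 = [0.4, 0.3, 0.3], [0.5, 0.3, 0.2], [0.25, 0.25, 0.5]
table = {c: m1[c[0]] * m2[c[1]] * m3[c[2]] for c in product(range(3), repeat=3)}
psi2, psi3 = psi(table, 1, 3), psi(table, 2, 3)
print(psi2)   # constant in i (about 1.5)
print(psi3)   # constant in i (about 0.5)
```

Because each ratio is constant in i for both k, this hypothetical table satisfies the logit special case with \(\exp ({\Delta}_{2}) = 1.5\) and \(\exp ({\Delta}_{3}) = 0.5\).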

Using the complementary log-log transformation, the GMCT model is expressed as

$$ \log \left( -\log \left( 1-\omega^{(k)}_{i} \right) \right) = \log \left( -\log \left( 1-\omega^{(1)}_{i} \right) \right) + {\Delta}_{k}, $$
(14)

for i = 1,…,R − 1; k = 2,…,T. We shall refer to model (14) as the MCCT model.

Using the probit transformation, the GMCT model is expressed as

$$ {\Phi}^{-1} \left( \omega^{(k)}_{i} \right) = {\Phi}^{-1} \left( \omega^{(1)}_{i} \right) + {\Delta}_{k} \quad \text{for} ~ i = 1, {\ldots} , R-1;~k = 2, {\ldots} , T, $$
(15)

where Φ(⋅) is the cumulative distribution function of the standard normal distribution. We refer to model (15) as the MCPT model.

4.2 Decompositions of the Marginal Homogeneity Model

Consider the MET model defined by

$$ \mathrm{E}(X_{1}) = {\cdots} = \mathrm{E}(X_{T}), $$

i.e.,

$$ \sum\limits^{R}_{i=1} i p^{(1)}_{i} = {\cdots} = \sum\limits^{R}_{i=1} i p^{(T)}_{i}. $$

Note that the MHT model implies the MET model.

We obtain the following theorem.

Theorem 2.

For the \(R^{T}\) table, the MHT model holds if and only if both the GMCT and MET models hold.

The proof is omitted because it is obtained in a similar manner as the proof of Theorem 1. We also obtain the following corollaries.

Corollary 4.

For the \(R^{T}\) table, the MHT model holds if and only if both the MCORT and MET models hold.

Corollary 5.

For the \(R^{T}\) table, the MHT model holds if and only if both the MCCT and MET models hold.

Corollary 6.

For the \(R^{T}\) table, the MHT model holds if and only if both the MCPT and MET models hold.

5 Goodness-of-fit Test

Let \(n_{i_{1}{\ldots } i_{T}}\) denote the observed frequency in the \((i_{1}, \ldots , i_{T})\) cell of the \(R^{T}\) table with \(n = \sum {\cdots } \sum n_{i_{1}{\ldots } i_{T}}\), and let \(m_{i_{1}{\ldots } i_{T}}\) denote the corresponding expected frequency. Assume that \(\{ n_{i_{1}{\ldots } i_{T}} \}\) has a multinomial distribution. The maximum likelihood estimates (MLEs) of the expected frequencies under each model can be obtained using the Newton-Raphson method to solve the likelihood equations.

The likelihood ratio chi-squared statistic to test the goodness-of-fit of model M is given by

$$ G^{2}(M) = 2 \sum\limits^{R}_{i_{1}=1} {\cdots} \sum\limits^{R}_{i_{T}=1} n_{i_{1}{\ldots} i_{T}} \log \left( \frac{n_{i_{1}{\ldots} i_{T}}}{\hat{m}_{i_{1}{\ldots} i_{T}}} \right), $$

where \(\hat {m}_{i_{1}{\ldots } i_{T}}\) is the MLE of \(m_{i_{1}{\ldots } i_{T}}\) under the model. The numbers of degrees of freedom (df) of the statistics for testing the goodness-of-fit of the MH, GMC, and ME models are (T − 1)(R − 1), (T − 1)(R − 2), and T − 1, respectively. Consider two nested models, say \(M_{1}\) and \(M_{2}\), such that if model \(M_{1}\) holds, then model \(M_{2}\) holds. To test the goodness-of-fit of model \(M_{1}\) assuming that model \(M_{2}\) holds, the conditional likelihood ratio statistic is given by \(G^{2}(M_{1} \mid M_{2}) = G^{2}(M_{1}) - G^{2}(M_{2})\). The number of df for the conditional test is the difference between the numbers of df for models \(M_{1}\) and \(M_{2}\).
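The statistic and the degrees of freedom stated above can be written compactly. The sketch below assumes that the fitted frequencies have already been obtained by some fitting routine (none is shown here) and that the cells are supplied as flat lists; cells with a zero count contribute nothing, following the usual convention 0 log 0 = 0. The function names are illustrative.

```python
import math


def g_squared(observed, fitted):
    """Likelihood ratio statistic 2 * sum n * log(n / m_hat) over all cells."""
    return 2.0 * sum(n * math.log(n / m)
                     for n, m in zip(observed, fitted) if n > 0)


def degrees_of_freedom(R, T):
    """df for the goodness-of-fit tests of the MH, GMC, and ME models."""
    return {"MH": (T - 1) * (R - 1), "GMC": (T - 1) * (R - 2), "ME": T - 1}


def conditional_test(g2_m1, df_m1, g2_m2, df_m2):
    """G^2(M1 | M2) and its df, where M1 is nested in M2."""
    return g2_m1 - g2_m2, df_m1 - df_m2


print(degrees_of_freedom(R=3, T=2))   # square 3 x 3 table
print(degrees_of_freedom(R=3, T=3))   # 3 x 3 x 3 table
```

With R = 3 these counts agree with the degrees of freedom reported in the examples of this paper for T = 2 and T = 3.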

6 Examples

6.1 Example 1

We focus on a contingency table obtained by grouping a time scale, such as the sleep-onset time, into ordered categories. As an example, we used the research data of Marqueze et al. (2015a, 2015b), which are available from the Dryad Digital Repository. We created a square contingency table by grouping the sleep-onset times on work days and days-off (Table 1). We took the paired sleep-onset times for work days and days-off from the original data set and pooled the two variables. Incidentally, the variable names in the data set are “Bedtimew” and “Bedtimef”. We then calculated the first and third quartiles of the pooled data and created a square contingency table using these quartiles as the cut points. Namely, we classified the continuous bedtime into three levels: (1) below the first quartile, (2) the first quartile or more but less than the third quartile, and (3) the third quartile or more.
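The grouping procedure described above can be sketched as follows. The bedtimes in the code are synthetic values invented for illustration, not the Marqueze et al. data, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption, since the text does not state which convention was used. Placing work days in the rows is likewise an assumption.

```python
from statistics import quantiles


def level(value, q1, q3):
    """0: below Q1; 1: at least Q1 but below Q3; 2: at least Q3."""
    if value < q1:
        return 0
    if value < q3:
        return 1
    return 2


def square_table(work, off):
    """Cross-classify paired values using quartiles of the pooled data as cut points."""
    pooled = list(work) + list(off)
    q1, _, q3 = quantiles(pooled, n=4)      # first, second, and third quartiles
    table = [[0] * 3 for _ in range(3)]
    for w, f in zip(work, off):
        table[level(w, q1, q3)][level(f, q1, q3)] += 1
    return table


# Synthetic paired bedtimes in hours (values above 24 mean after midnight).
work = [21.5, 22.0, 22.5, 23.0, 23.5, 24.0]
off = [22.0, 23.0, 23.5, 24.0, 24.5, 25.0]
table = square_table(work, off)
for r in table:
    print(r)
```

Using pooled quartiles gives both variables the same category boundaries, which is what makes the resulting table a square table with identical row and column classifications.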

Table 1 Marqueze’s data expressing the bedtime for work days and days-off using three levels: (1) below the first quartile, (2) the first quartile or more but less than the third quartile, and (3) the third quartile or more (Marqueze et al. 2015a, 2015b). Parenthesized values are the MLEs of the expected frequencies under the MCOR model

We shall analyze the data in Table 1 using Corollary 1. The MCOR model fits these data well since \(G^{2}(\text{MCOR}) = 0.73\) with 1 df. However, the MH and ME models do not fit these data well since \(G^{2}(\text{MH}) = 68.86\) with 2 df and \(G^{2}(\text{ME}) = 58.48\) with 1 df.

We shall consider the hypothesis that the MH model holds under the assumption that the MCOR model holds; namely, the hypothesis that Δ = 0. Since \(G^{2}(\text{MH} \mid \text{MCOR}) = G^{2}(\text{MH}) - G^{2}(\text{MCOR}) = 68.13\) with 1 df, we reject this hypothesis at the 0.05 level. This shows Δ ≠ 0 in the MCOR model. Therefore, the MCOR model is preferable to the MH model for the data in Table 1. Under the MCOR model, the MLE of \(\exp \left ({\Delta } \right )\) is \(\exp \left (\hat {\Delta } \right ) = 1.38\). Noting that \({\omega ^{X}_{1}}/ \left (1-{\omega ^{X}_{1}} \right ) = p_{1 \cdot } / \left (p_{2 \cdot } + p_{3 \cdot } \right )\), \({\omega ^{X}_{2}} / \left (1-{\omega ^{X}_{2}} \right ) = p_{2 \cdot } / p_{3 \cdot }\), \({\omega ^{Y}_{1}} / \left (1-{\omega ^{Y}_{1}} \right ) = p_{\cdot 1} / \left (p_{\cdot 2} + p_{\cdot 3} \right )\), and \({\omega ^{Y}_{2}} / \left (1-{\omega ^{Y}_{2}} \right ) = p_{\cdot 2} / p_{\cdot 3}\), we see under the MCOR model that (i) the odds that the sleep-onset time is (1) below the first quartile, instead of (2) or (3), i.e., the first quartile or more, are estimated to be \(\exp \left (\hat {\Delta } \right ) = 1.38\) times higher for work days than for days-off, and (ii) the odds that it is (2) the first quartile or more but less than the third quartile, instead of (3) the third quartile or more, are estimated to be 1.38 times higher for work days than for days-off.

Section 7 discusses the interpretation of these results from the viewpoint of time scales.

6.2 Example 2

Consider the data in Table 2, which were obtained from the Meteorological Agency in Japan (Tahata et al. 2008). These data are the daily atmospheric temperatures at Hiroshima, Tokyo, and Sapporo in Japan in 2003, classified into three levels: (1) low, (2) normal, and (3) high. Variables \(X_{1}\), \(X_{2}\), and \(X_{3}\) denote the temperatures at Hiroshima, Tokyo, and Sapporo, respectively.

Table 2 Daily atmospheric temperatures at Hiroshima, Tokyo, and Sapporo in Japan in 2003, using three levels: (1) low, (2) normal, and (3) high (Tahata et al. 2008). Parenthesized values are the MLEs of the expected frequencies under the MCORT model

We shall analyze the data in Table 2 using Corollary 4. The MCORT model fits these data well since G2(MCORT) = 0.61 with 2 df, whereas the MHT and MET models do not fit these data well since G2(MHT) = 16.80 with 4 df and G2(MET) = 16.39 with 2 df.

We shall consider the hypothesis that the MHT model holds under the assumption that the MCORT model holds; namely, the hypothesis that Δ2 = Δ3 = 0. Since G2(MHT|MCORT) = G2(MHT) − G2(MCORT) = 16.19 with 2 df, we reject this hypothesis at the 0.05 level. Therefore, the MCORT model is preferable to the MHT model for these data.
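The goodness-of-fit statements and the conditional test for Table 2 can be checked numerically from the reported G2 values. The sketch below is illustrative only; the helper `chi2_sf_even_df` is an assumed name, and it relies on the closed form Pr(χ² > x) = exp(−x/2) Σ_{j<m} (x/2)^j / j! that holds when the degrees of freedom equal 2m.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df = 2m.

    Uses Pr(chi2 > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)**j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)**j / j!
        total += term
    return math.exp(-half) * total

# Reported likelihood ratio statistics and degrees of freedom.
g2 = {"MHT": (16.80, 4), "MET": (16.39, 2), "MCORT": (0.61, 2)}
p_values = {name: chi2_sf_even_df(stat, df) for name, (stat, df) in g2.items()}

# Conditional test of MHT given MCORT (Delta_2 = Delta_3 = 0).
g2_cond = g2["MHT"][0] - g2["MCORT"][0]
df_cond = g2["MHT"][1] - g2["MCORT"][1]
p_cond = chi2_sf_even_df(g2_cond, df_cond)
```

Under these values, only the MCORT model has a p-value above 0.05, and the conditional statistic 16.19 with 2 df is significant at the 0.05 level, in agreement with the conclusions stated above.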

We see from Corollary 4 that the poor fit of the MHT model is caused by the poor fit of the MET model rather than the MCORT model. That is, the mean temperatures at Hiroshima, Tokyo, and Sapporo differ. Under the MCORT model, the MLEs of {\(\exp \left ({\Delta }_{k} \right )\)} are \(\exp \left (\hat {\Delta }_{2} \right ) = 0.90\) and \(\exp \left (\hat {\Delta }_{3} \right ) = 1.33\). Noting that \(\omega ^{(t)}_{1} / \left (1-\omega ^{(t)}_{1} \right ) = p^{(t)}_{1} / \left (p^{(t)}_{2} + p^{(t)}_{3} \right )\) and \(\omega ^{(t)}_{2} / \left (1-\omega ^{(t)}_{2} \right ) = p^{(t)}_{2} / p^{(t)}_{3}\), we see under the MCORT model that the odds that the temperature is (1) low instead of (2) normal or (3) high are estimated to be \(\exp \left (\hat {\Delta }_{2} \right ) = 0.90\) times as high in Tokyo as in Hiroshima, and the odds that it is (2) normal instead of (3) high are estimated to be 0.90 times as high in Tokyo as in Hiroshima. Similarly, the odds that it is (1) low instead of (2) normal or (3) high are estimated to be \(\exp \left (\hat {\Delta }_{3} \right ) = 1.33\) times as high in Sapporo as in Hiroshima, and the odds that it is (2) normal instead of (3) high are estimated to be 1.33 times as high in Sapporo as in Hiroshima.
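To make the interpretation of \(\exp \left ({\Delta }_{k} \right )\) concrete, the following sketch computes the continuation odds \(p_{i} / (p_{i+1} + {\cdots } + p_{R})\) of a marginal distribution and constructs a second marginal whose continuation odds are all multiplied by a common factor. The reference marginal `p_ref` is hypothetical and is not taken from Table 2; the function names are likewise assumptions made for illustration.

```python
import math

def continuation_odds(p):
    """Odds p_i / (p_{i+1} + ... + p_R) for i = 1, ..., R-1.

    Assumes every entry of p is positive, so no tail sum is zero.
    """
    return [p[i] / sum(p[i + 1:]) for i in range(len(p) - 1)]

def shift_marginal(p, delta):
    """Marginal whose continuation odds are exp(delta) times those of p."""
    factor = math.exp(delta)
    remaining = 1.0  # Pr(X >= i) for the shifted marginal
    shifted = []
    for odds in continuation_odds(p):
        new_odds = factor * odds
        omega = new_odds / (1.0 + new_odds)  # shifted conditional probability
        shifted.append(remaining * omega)
        remaining *= 1.0 - omega
    shifted.append(remaining)  # the last category takes the leftover mass
    return shifted

p_ref = [0.3, 0.4, 0.3]    # hypothetical reference marginal (not from Table 2)
delta_3 = math.log(1.33)   # reported estimate exp(Delta_3) = 1.33
p_shifted = shift_marginal(p_ref, delta_3)

# Every continuation odds ratio between the two marginals equals exp(delta_3).
ratios = [a / b for a, b in
          zip(continuation_odds(p_shifted), continuation_odds(p_ref))]
```

By construction, both entries of `ratios` equal 1.33, which mirrors the statement that a single parameter describes the odds ratio at every cut point under the MCORT model.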

7 Discussion

7.1 Comparison Between Models

For the data in Tables 1 and 2, the goodness of fit of the MCORT, MCCT, and MCPT models differs remarkably (see Table 3). The MCORT and MCPT models fit the data in both Tables 1 and 2 very well. In contrast, the MCCT model fits the data in Table 2 well but does not fit the data in Table 1 well. Viewed as special cases of the GMCT model, the MCORT and MCPT models have conditional probabilities with a symmetric appearance, whereas that of the MCCT model is asymmetric: under the link \(\log (-\log (1-x))\), the conditional probability approaches 0 fairly slowly but approaches 1 quite sharply.
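The asymmetry of the complementary log-log link can be illustrated numerically. The sketch below, which uses assumed function names, evaluates the inverse links on points placed symmetrically around the value where each curve equals 0.5. For a symmetric curve the two values sum to 1; for the complementary log-log curve they do not.

```python
import math

def inv_logit(u: float) -> float:
    """Inverse of the logit link log(x / (1 - x))."""
    return 1.0 / (1.0 + math.exp(-u))

def inv_cloglog(u: float) -> float:
    """Inverse of the complementary log-log link log(-log(1 - x))."""
    return 1.0 - math.exp(-math.exp(u))

# Points at which each curve equals 0.5.
center_logit = 0.0
center_cloglog = math.log(math.log(2.0))

offsets = [0.5, 1.0, 2.0, 3.0]

# A symmetric curve satisfies h(c + d) + h(c - d) = 1 for every offset d.
sums_logit = [inv_logit(center_logit + d) + inv_logit(center_logit - d)
              for d in offsets]
sums_cloglog = [inv_cloglog(center_cloglog + d) + inv_cloglog(center_cloglog - d)
                for d in offsets]

# Distance from the limits 1 and 0 at the same offset from the center.
d = 2.0
gap_to_one = 1.0 - inv_cloglog(center_cloglog + d)   # small: sharp approach to 1
gap_to_zero = inv_cloglog(center_cloglog - d)        # larger: slow approach to 0
```

Here every entry of `sums_logit` equals 1 up to rounding, whereas every entry of `sums_cloglog` exceeds 1, and `gap_to_one` is much smaller than `gap_to_zero`.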

Table 3 Likelihood ratio statistic G2 for models applied to the data in Tables 1 and 2

The MCORT and MCCT models may be useful because the parameter \(\exp \left ({\Delta }_{k} \right )\) of the MCORT model can be interpreted as the marginal continuation odds ratio and the parameter Δk of the MCCT model can be regarded as a location shift between the marginal distributions. On the other hand, for some models such as the MCPT model, the parameter Δk is difficult to interpret. Thus, although the GMCT model offers various strictly increasing functions from which to choose the model best suited to the data, the interpretation of Δk may be difficult. Hence, it is important for the analyst to decide which model to employ for the data analysis.

7.2 Treating Conditional Probabilities as Discrete Time Hazards

From a different viewpoint, the conditional probability \(\omega ^{(t)}_{i} = \text {Pr}(X_{t} = i \mid X_{t} \geq i)\) may be regarded as a discrete-time hazard. That is, it is the conditional probability of experiencing an event in period i given that the event has not been experienced before period i. Namely, if Xt represents a categorized survival time, the conditional probability represents the probability that survival ends at time level i given survival to at least that level, which is the hazard rate. Hence, the MCORT model can describe hazard functions for grouped survival data, and a model using the complementary log-log transformation is also useful for such data. When we consider the MCCT model, we can also consider the ratio of survival functions. For discretely measured survival, let \(S^{(t)}_{i} = 1 - F^{(t)}_{i-1} = \text {Pr}(X_{t} \geq i)\) for i = 1,…,R; t = 1,…,T, where \(F^{(t)}_{0} = 0\) (Agresti 2010, p.128). Namely, \(S^{(t)}_{i}\) denotes the discrete survival function of Xt. The conditional probabilities \(\omega ^{(t)}_{i}\) can be expressed as

$$ \omega^{(t)}_{i} = \frac{p^{(t)}_{i}}{1 - F^{(t)}_{i-1}} = 1 - \frac{1 - F^{(t)}_{i}}{1 - F^{(t)}_{i-1}} = 1 - \frac{S^{(t)}_{i+1}}{S^{(t)}_{i}}, $$

for i = 1,…,R − 1; t = 1,…,T. Thus, the GMCT model can also be expressed as

$$ h^{-1} \left( 1 - \frac{S^{(k)}_{i+1}}{S^{(k)}_{i}} \right) = h^{-1} \left( 1 - \frac{S^{(1)}_{i+1}}{S^{(1)}_{i}} \right) + {\Delta}_{k}, $$

for i = 1,…,R − 1; k = 2,…,T.
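The identity \(\omega ^{(t)}_{i} = 1 - S^{(t)}_{i+1} / S^{(t)}_{i}\) can be verified numerically for any marginal distribution. The following is a minimal sketch with a hypothetical marginal `p`; the function names are assumptions introduced for illustration, and list positions 0,…,R − 1 correspond to categories 1,…,R.

```python
def survival(p):
    """Discrete survival function S_i = Pr(X >= i) for i = 1, ..., R."""
    s, remaining = [], 1.0
    for prob in p:
        s.append(remaining)   # S_i = 1 - F_{i-1}
        remaining -= prob
    return s

def hazards_from_probs(p):
    """omega_i = p_i / (1 - F_{i-1}) for i = 1, ..., R-1."""
    s = survival(p)
    return [p[i] / s[i] for i in range(len(p) - 1)]

def hazards_from_survival(p):
    """omega_i = 1 - S_{i+1} / S_i for i = 1, ..., R-1."""
    s = survival(p)
    return [1.0 - s[i + 1] / s[i] for i in range(len(p) - 1)]

p = [0.2, 0.5, 0.3]  # hypothetical marginal distribution with R = 3
omega_a = hazards_from_probs(p)
omega_b = hazards_from_survival(p)

# The two expressions agree up to floating-point rounding.
max_abs_diff = max(abs(a - b) for a, b in zip(omega_a, omega_b))
```

For this marginal the hazards are 0.2 and 0.625 under both expressions, since S = (1, 0.8, 0.3).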

Consider again the data in Table 1. Under the MCOR model, we can interpret the results in terms of not only the marginal continuation odds ratio but also the discrete-time hazards. The marginal continuation odds ratio is estimated to be \(\exp \left (\hat {\Delta } \right ) = 1.38\) (see Section 6.1). Equivalently, on the logit scale, the hazard for work days is estimated to be shifted by \(\hat {\Delta } = 0.32\) relative to that for days-off. Hence, the sleep-onset time for work days tends to be earlier than that for days-off, with a constant shift in the hazard on the logit scale.
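The relation between the two estimates is simply \(\hat {\Delta } = \log 1.38 \approx 0.32\). The sketch below checks this and shows how a hazard changes under such a shift on the logit scale. The days-off hazard of 0.25 is a hypothetical value chosen for illustration and is not an estimate from Table 1; the helper names are also assumptions.

```python
import math

def logit(w: float) -> float:
    """Log odds of a probability w in (0, 1)."""
    return math.log(w / (1.0 - w))

def expit(u: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-u))

odds_ratio_hat = 1.38                  # reported estimate of exp(Delta)
delta_hat = math.log(odds_ratio_hat)   # location shift on the logit scale

hazard_days_off = 0.25                 # hypothetical hazard, for illustration only
hazard_work_days = expit(logit(hazard_days_off) + delta_hat)

# The odds of the two hazards differ by the factor exp(delta_hat).
odds_days_off = hazard_days_off / (1.0 - hazard_days_off)
odds_work_days = hazard_work_days / (1.0 - hazard_work_days)
recovered_ratio = odds_work_days / odds_days_off
```

A positive shift raises the hazard at every time level, which corresponds to the event (here, sleep onset) tending to occur earlier.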

When a contingency table is obtained by grouping a time scale, as in survival studies, the proposed models and decompositions may be useful from the viewpoint of discrete-time hazards.