Abstract
The multinomial/categorical responses, whether they are nominal or ordinal, are recorded in counts under all categories/cells involved. The analysis of this type of multinomial data is traditionally done by exploiting the marginal cell probabilities-based likelihood function. As opposed to the nominal setup, the computation of the marginal probabilities is not easy in the ordinal setup. However, as the ordinal responses in practice are interpreted by collapsing multiple categories either to binary data using a single cut-point or to tri-nomial data using two cut-points, most studies over the last four decades first modeled the associated cumulative probabilities using a suitable link function such as the logit, probit, log-log, or complementary log-log. Next, the marginal cell probabilities were computed by subtraction, in order to construct the desired estimating function such as a moment or likelihood function. In this paper we take a new look at this ordinal categorical data analysis problem. As opposed to the existing studies, we first model the ordinal categories using a multinomial logistic marginal approach by pretending that the adjacent categories are nominal, and then construct the cumulative probabilities to develop the final model for ordinal responses. For inferences, we develop cut-point-based likelihood and generalized quasi-likelihood (GQL) estimating functions for the estimation of the underlying regression parameters. The new GQL estimation approach is developed in detail by utilizing both tri-nomial and binary (or binomial) collapsed structures. The likelihood analysis is also discussed. A data example is given to illustrate the proposed models and the estimation methodologies. Furthermore, we examine the asymptotic properties of the likelihood and GQL estimators of the regression parameters for both tri-nomial and binary types of cumulative response based models.
References
Agresti, A. (2010). Analysis of Ordinal Categorical Data, Second Edition. Wiley, New York.
Conaway, M. R. (1989). Analysis of repeated categorical measurements with conditional likelihood methods. J. Am. Stat. Assoc. 84, 53–62.
Crouchley, R. (1995). A random-effects model for ordered categorical data. J. Am. Stat. Assoc. 90, 489–498.
Fienberg, S. E., Bromet, E. J., Follmann, D., Lambert, D. and May, S. M. (1985). Longitudinal analysis of categorical epidemiological data: a study of Three Mile Island. Environ. Health Perspect. 63, 241–248.
Fokianos, K. and Kedem, B. (2003). Regression theory for categorical time series. Statistical Science 18, 357–376.
Fokianos, K. and Kedem, B. (2004). Partial likelihood inference for time series following generalized linear models. Journal of Time Series Analysis 25, 173–197.
Harville, D. and Mee, R. W. (1984). A mixed-model procedure for analyzing ordered categorical data. Biometrics 40, 393–408.
Loredo-Osti, J. C. and Sutradhar, B. C. (2012). Estimation of regression and dynamic dependence parameters for non-stationary multinomial time series. J. Time Ser. Anal. 33, 458–467.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press Inc, London.
McCullagh, P. (1980). Regression models for ordinal data. J. R. Statist. Soc. B 42, 109–142.
Sutradhar, B. C. (2003). An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393.
Sutradhar, B. C. (2014). Longitudinal Categorical Data Analysis. Springer, New York.
Tutz, G. and Hennevogl, W. (1996). Random effects in ordinal regression models. Computational Statistics and Data Analysis 22, 537–557.
Acknowledgments
This research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada. The authors would like to thank the Editor, Associate Editor, and a referee for their comments and suggestions leading to the improvement of the paper.
Appendices
Appendix A: GQL Estimation Details for Both Tri-nomial and Binary Types Ordinal Models
A1. GQL estimation aids for tri-nomial type ordinal model
In (3.3), the observations \(S_{[\ell ]j}\) for j = 1 and j = J are scalar, and their means and variances are given by
and
respectively. However, for j = 2,…,J − 1, the two components of the trinomial observation, namely \(C_{[\ell ]j}\) and \(K_{[\ell ]j}\), are correlated. More specifically, \(\text {cov}[C_{[\ell ]j},K_{[\ell ]j}]=-K_{[\ell ]}F_{[\ell ](j-1)}\pi _{[\ell ]j}\). Hence, for j = 2,…,J − 1, we obtain
For the computational purpose, the GQL estimating equation in (3.3) may then be written as
which may be solved by using the iterative formula
until convergence.
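The iterative formula itself is not reproduced in this text. As a rough illustration only, a Gauss-Newton type update of the generic form β(r+1) = β(r) + [Σ D′V⁻¹D]⁻¹ Σ D′V⁻¹(s − μ) may be sketched as follows. The names `gql_solve` and `binomial_logit`, and the toy binomial-logit groups used to exercise the solver, are assumptions made for illustration; they are not the tri-nomial quantities entering (3.3).

```python
import numpy as np

def gql_solve(groups, mean_cov_grad, beta0, tol=1e-10, max_iter=100):
    """Solve sum_l D_l' V_l^{-1} (s_l - mu_l) = 0 by Gauss-Newton type steps.

    groups        : list of (s_l, aux_l); s_l is the observed response vector
    mean_cov_grad : hypothetical callback returning (mu_l, V_l, D_l) at beta,
                    where D_l = d mu_l / d beta' has shape (len(mu_l), len(beta))
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for s, aux in groups:
            mu, V, D = mean_cov_grad(beta, aux)
            Vinv_D = np.linalg.solve(V, D)      # V^{-1} D (V is symmetric)
            score += Vinv_D.T @ (s - mu)        # D' V^{-1} (s - mu)
            info += D.T @ Vinv_D                # D' V^{-1} D
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def binomial_logit(beta, aux):
    """Toy model (assumption): scalar binomial count with a logit mean."""
    K, x = aux
    p = 1.0 / (1.0 + np.exp(-(x @ beta)))
    mu = np.array([K * p])
    V = np.array([[K * p * (1.0 - p)]])
    D = (K * p * (1.0 - p)) * x[None, :]
    return mu, V, D

# Two covariate groups of 100 individuals each, with 30 and 60 "successes".
groups = [
    (np.array([30.0]), (100, np.array([1.0, 0.0]))),
    (np.array([60.0]), (100, np.array([1.0, 1.0]))),
]
beta_hat = gql_solve(groups, binomial_logit, np.zeros(2))
```

For the tri-nomial model, the callback would instead return the scalar or bivariate mean vector, its covariance matrix, and the derivative matrix for each covariate group and cut point.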
A2. GQL estimation aids for binary type ordinal model
Write the mean and covariance matrix of \(S^{*}_{i}\) (defined in (3.7)) as
Notice that the formulas for \(\pi ^{*}_{ij}\) and \(V_{ijj}\) are already given in (3.6). Now, for two different cut points j < k, the covariance elements \(V_{ijk}\) in (6) have the formulas
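The covariance formulas are not reproduced in this text. As a hedged illustration, suppose each binary component is the indicator that the response falls at or below a cut point, with success probability equal to the cumulative probability F_j. This is an assumption about the definition in (3.4) and (3.6). Under it, the covariance for j < k is F_j(1 − F_k), and the same value results if the indicators are defined in the opposite direction. The sketch below computes this matrix from a hypothetical vector of marginal cell probabilities and checks it by direct enumeration; the function names are illustrative.

```python
import numpy as np

def cumulative_binary_cov(pi):
    """Covariance of the J-1 cumulative indicators 1{y <= j} (assumed definition)."""
    pi = np.asarray(pi, dtype=float)
    F = np.cumsum(pi)[:-1]                 # F_1, ..., F_{J-1}
    lo = np.minimum.outer(F, F)            # F_{min(j,k)}
    hi = np.maximum.outer(F, F)            # F_{max(j,k)}
    return F, lo * (1.0 - hi)

def enumerated_cov(pi):
    """Same covariance obtained by summing over the J possible categories."""
    pi = np.asarray(pi, dtype=float)
    J = pi.size
    cuts = np.arange(1, J)
    # Row g-1 holds the indicator vector for a response in category g.
    B = np.array([(g <= cuts).astype(float) for g in range(1, J + 1)])
    mean = pi @ B
    second = B.T @ (pi[:, None] * B)
    return second - np.outer(mean, mean)

pi_example = [0.2, 0.3, 0.1, 0.4]          # hypothetical cell probabilities, J = 4
F, V = cumulative_binary_cov(pi_example)
```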
A3. How to write \(S^{*}_{[\ell ]j}\) in practice?
Once again the elements in the \(S^{*}_{[\ell ]g}\) vector are either 0 or 1. By following the definition of the cumulative response given in (3.4), we write the observation vectors for g = 1,…,J, as
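The observation vectors are not reproduced in this text. A minimal sketch of one possible coding, assuming the j-th element equals 1 when the observed category g is at or below cut point j and 0 otherwise, is given below. The direction of the coding and the function name `cumulative_indicator` are assumptions, since the definition in (3.4) is not shown here.

```python
import numpy as np

def cumulative_indicator(g, J):
    """Return the (J-1)-vector of 0/1 cumulative responses for category g.

    Assumed coding: element j is 1 if g <= j, for cut points j = 1, ..., J-1.
    """
    if not 1 <= g <= J:
        raise ValueError("category g must lie in 1..J")
    cuts = np.arange(1, J)
    return (g <= cuts).astype(int)

# One vector per possible category g = 1, ..., J, here with J = 4.
vectors = {g: cumulative_indicator(g, 4) for g in range(1, 5)}
```

Under this coding, a response in the last category yields a vector of zeros, and a response in the first category yields a vector of ones.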
Appendix B: GQL Asymptotics for Both CTN and CBN Ordinal Models
B1. GQL asymptotics for the CTN ordinal model
For convenience, we re-express the GQL estimating equation in (5.4) as
where
We now exploit (b.1) and for true β define
where for a given value of \(\ell \), \(h_{1},\ldots ,h_{i},\ldots ,h_{K_{\ell }}\) are independent of each other as they are collected from \(K_{[\ell ]}\) independent individuals, and they are also identically distributed because
where for a given ℓ, the mean vectors and covariance matrices remain the same for individuals. Furthermore, by (b.4), it follows from (b.3) that
By (b.4) and (b.5), it then follows from the well known multivariate central limit theorem (see Mardia, Kent and Bibby (1979, p. 51), for example) that as \(\min \{K_{[\ell ]};\ell = 1,\ldots ,p + 1\}\rightarrow \infty \), the limiting distribution of \(Z_{K}=[V^{*}_{K}]^{-\frac {1}{2}}\bar {G}_{K,1}({\beta })\), say
is multivariate normal. More specifically,
Now because \(\hat {{\beta }}_{GQL}\) obtained from (5.4) or (b.1), is a solution of \(G_{1}(\hat {{\beta }}_{GQL})\) = 0, one may use (b.3) and solve
which by first order Taylor’s series expansion produces
That is,
It then follows by (b.6) that
yielding the normal distribution given in (5.5).
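As a compact reminder of the generic argument behind these steps, and not a reproduction of the paper's own displays, the expansion may be sketched as follows. Here \(G(\beta )={\sum }_{\ell }D^{\prime }_{\ell }V^{-1}_{\ell }(s_{\ell }-\mu _{\ell })\) denotes a generic GQL estimating function; the symbols \(D_{\ell }\), \(V_{\ell }\), \(s_{\ell }\) and \(\mu _{\ell }\) are placeholder notation assumed for illustration, with \(V_{\ell }\) taken to be the true covariance matrix of \(s_{\ell }\).

```latex
\begin{align*}
0 = G(\hat{\beta}) &\approx G(\beta)
   + \frac{\partial G(\beta)}{\partial \beta'}\,(\hat{\beta}-\beta), \\
E\left[\frac{\partial G(\beta)}{\partial \beta'}\right]
   &= -\sum_{\ell} D_{\ell}' V_{\ell}^{-1} D_{\ell}, \qquad
\text{cov}[G(\beta)] = \sum_{\ell} D_{\ell}' V_{\ell}^{-1} D_{\ell}, \\
\hat{\beta}-\beta &\approx
   \Big[\sum_{\ell} D_{\ell}' V_{\ell}^{-1} D_{\ell}\Big]^{-1} G(\beta), \qquad
\text{cov}[\hat{\beta}] \approx
   \Big[\sum_{\ell} D_{\ell}' V_{\ell}^{-1} D_{\ell}\Big]^{-1}.
\end{align*}
```

The second line uses the fact that the terms involving derivatives of \(D^{\prime }_{\ell }V^{-1}_{\ell }\) multiply the zero-mean quantity \(s_{\ell }-\mu _{\ell }\) and so vanish in expectation.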
B2. GQL asymptotics for the CBN ordinal model
In order to demonstrate the limiting distribution in (5.8) (under Section 5.2), we begin with \(E[S^{*}_{i}]=\pi ^{*}_{i}({\beta })\) and \(\text {cov}[S^{*}_{i}]=V^{*}_{i}({\beta })\) by (3.8), and also because \(S^{*}_{i}\)s are independent, one obtains
Next by similar calculations as in (b.9), one writes
where
by using the multivariate central limit theorem as in (b.6). The limiting distributional result in (5.8) follows by applying (b.13)-(b.12).
Appendix C: Likelihood Asymptotics for the CBN Ordinal Model: An Illustration
As indicated in Section 2.3, we demonstrate here that the likelihood regression estimators, for example under the CBN model, are consistent for the regression parameters and asymptotically follow the Gaussian distribution with suitable mean vector and covariance matrix.
For this purpose, we re-express the likelihood estimating equation (2.17) for β as
We now perform the last summation \({\sum }^{J-1}_{j = 1}\left [\cdot \right ]\) in (c.1) as follows. Use the elements shown in (⋅) for j = 1,…,J − 1, and construct the (J − 1) × 1 vector
Next write
Combining (c.2), (c.3) and (c.5), the summation \({\sum }^{J-1}_{j = 1}\left [\cdot \right ]\) in (c.1) may be expressed as
Consequently, the complete likelihood estimating equation in (c.1) reduces to
However, instead of solving (c.7), it is convenient to solve
Notice that in (c.8), \(E[d_{i\in \ell }(c_{i})]=0\) and \(\text {cov}[d_{i\in \ell }(c_{i})]=P_{i\in \ell }\). Also, all \(K_{[\ell ]}\) individuals are independent and the covariate groups are mutually exclusive. It then follows that
Suppose that we denote the solution of the likelihood equation (c.8) by \(\hat {{\beta }}_{MLE}\). By applying the multivariate central limit theorem and following the calculations in Appendix B1 (see Eqns. (b.6)-(b.10)), one obtains the limiting distribution of \(\hat {{\beta }}_{MLE}\) as
showing that \(\hat {{\beta }}_{MLE}\) is consistent for β.
Cite this article
Sutradhar, B.C., Variyath, A.M. A New Look at the Models for Ordinal Categorical Data Analysis. Sankhya B 82, 111–141 (2020). https://doi.org/10.1007/s13571-018-0180-3
DOI: https://doi.org/10.1007/s13571-018-0180-3
Keywords and phrases
- Binary mapping
- Cumulative response and probability
- Cut points
- Generalized quasi-likelihood inference
- Likelihood
- Linear versus non-linear logits
- Marginal multinomial model
- Ordinal categories