Abstract
This paper concerns models for a vector of probabilities whose elements must have a multiplicative structure and, at the same time, sum to 1; in certain applications, such as basket analysis, these models may be seen as a constrained version of quasi-independence. After reviewing the basic properties of the models, their geometric features as a curved exponential family are investigated. An improved algorithm for computing maximum likelihood estimates is introduced, and new insights are provided on the underlying geometry. The asymptotic distributions of three statistics for hypothesis testing are derived, and a small simulation study is presented to investigate the accuracy of the asymptotic approximations.
Acknowledgements
The author would like to thank A. Klimova and T. Rudas for sharing ideas concerning relational models and for several very enlightening discussions, A. Salvan for comments on the nature of the curved exponential family, and P. Giudici for providing the basket data.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix
1.1 Multinomial and Poisson as exponential families
Let \({\varvec{v}}\sim \) Mn\((n,{\varvec{\pi }})\), where \({\varvec{\pi }}\) has dimension q; a multivariate logistic transform of \({\varvec{\pi }}\) may be defined as \(\log {\varvec{\pi }}\) = \({\varvec{G}}{\varvec{\lambda }}-{\varvec{1}}_q\log [{\varvec{1}}_q^{\prime }\exp ({\varvec{G}}{\varvec{\lambda }})]\), where \({\varvec{\lambda }}\) is a vector of canonical parameters determined by \({\varvec{G}}\), an arbitrary \(q \times (q-1)\) matrix of full rank whose column space does not contain the vector of ones. The kernel of the log of the probability distribution may be written as
$$ {\varvec{v}}^{\prime }\log {\varvec{\pi }} = {\varvec{t}}^{\prime }{\varvec{\lambda }} - K({\varvec{\lambda }}); $$
both \({\varvec{\lambda }}\) and \({\varvec{t}}\) = \({\varvec{G}}^{\prime }{\varvec{v}}\), the vector of sufficient statistics, have size \(q-1\), and \(K({\varvec{\lambda }})\) = \(n\log [{\varvec{1}}^{\prime }\exp ({\varvec{G}}{\varvec{\lambda }})]\).
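As a numerical illustration, the transform and the kernel identity \({\varvec{v}}^{\prime }\log {\varvec{\pi }} = {\varvec{t}}^{\prime }{\varvec{\lambda }}-K({\varvec{\lambda }})\) can be checked directly; the particular \({\varvec{G}}\) below is a hypothetical choice made only for the sketch:

```python
import numpy as np

q = 4
rng = np.random.default_rng(0)
# A simple (hypothetical) choice of G: its q-1 columns are the first
# q-1 unit vectors of R^q, so G has full rank and its column span
# does not contain the vector of ones.
G = np.vstack([np.eye(q - 1), np.zeros((1, q - 1))])
lam = rng.normal(size=q - 1)                      # canonical parameters
log_pi = G @ lam - np.log(np.exp(G @ lam).sum())  # multivariate logistic transform
pi = np.exp(log_pi)                               # probabilities, summing to 1

n = 50
v = rng.multinomial(n, pi)                        # v ~ Mn(n, pi)
t = G.T @ v                                       # sufficient statistics, size q-1
K = n * np.log(np.exp(G @ lam).sum())
# kernel identity: v' log(pi) = t' lambda - K(lambda)
assert np.isclose(v @ log_pi, t @ lam - K)
```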
To derive an explicit expression for \({\varvec{\lambda }}\), let \({\varvec{R}}\) = \({\varvec{I}}_q-{\varvec{1}}_q{\varvec{1}}_q^{\prime }/q\) and
$$ {\varvec{D}} = ({\varvec{G}}^{\prime }{\varvec{R}}{\varvec{G}})^{-1}{\varvec{G}}^{\prime }{\varvec{R}}; $$
then \({\varvec{\lambda }}\) = \({\varvec{D}}\log {\varvec{\pi }}\) is a vector of \(q-1\) canonical parameters. To see why the coefficients of any linear constraint on the canonical parameters must sum to 0, note that \({\varvec{D}}{\varvec{1}}_q\) = \({\varvec{0}}_{q-1}\). To introduce linear restrictions on \({\varvec{\lambda }}\), assume that \({\varvec{G}}\) is partitioned as \(({\varvec{X}}\,\, {\varvec{Z}})\), where \({\varvec{Z}}\) is such that \({\varvec{Z}}^{\prime }{\varvec{R}}{\varvec{X}}\) = \({\varvec{0}}\); let also \({\varvec{H}}\) = \(({\varvec{Z}}^{\prime }{\varvec{R}}{\varvec{Z}})^{-1}{\varvec{Z}}^{\prime }{\varvec{R}}\) and define \({\varvec{\eta }}\) = \({\varvec{H}}\log {\varvec{\pi }}\). Then the model \({\varvec{\lambda }}\) = \({\varvec{X}}{\varvec{\theta }}\) is equivalent to assuming that \({\varvec{\eta }}\) = \({\varvec{0}}\).
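A numerical check of the two properties just stated, taking \({\varvec{D}} = ({\varvec{G}}^{\prime }{\varvec{R}}{\varvec{G}})^{-1}{\varvec{G}}^{\prime }{\varvec{R}}\) (an assumed form consistent with \({\varvec{\lambda }} = {\varvec{D}}\log {\varvec{\pi }}\) and \({\varvec{D}}{\varvec{1}}_q = {\varvec{0}}_{q-1}\)) and the same hypothetical \({\varvec{G}}\) as before:

```python
import numpy as np

q = 4
rng = np.random.default_rng(1)
G = np.vstack([np.eye(q - 1), np.zeros((1, q - 1))])  # hypothetical G, as above
R = np.eye(q) - np.ones((q, q)) / q                   # centring matrix
D = np.linalg.solve(G.T @ R @ G, G.T @ R)             # assumed D = (G'RG)^{-1} G'R

lam = rng.normal(size=q - 1)
log_pi = G @ lam - np.log(np.exp(G @ lam).sum())
assert np.allclose(D @ log_pi, lam)      # recovers the canonical parameters
assert np.allclose(D @ np.ones(q), 0.0)  # coefficients in each row sum to 0
```

The second assertion holds because \({\varvec{R}}{\varvec{1}}_q = {\varvec{0}}\), which is exactly why linear constraints on canonical parameters must have coefficients summing to 0.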
If, instead, the elements of \({\varvec{v}}\) were distributed as q independent Poisson variables with mean vector \({\varvec{\mu }}\), the kernel of the log of the probability distribution would be
$$ {\varvec{v}}^{\prime }{\varvec{\lambda }} - K({\varvec{\lambda }}), $$
where \({\varvec{\lambda }}\) = \(\log {\varvec{\mu }}\) and \(K({\varvec{\lambda }})\) = \({\varvec{1}}^{\prime }\exp ({\varvec{\lambda }})\).
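The Poisson kernel can be verified against the full log probability mass function, which differs from the kernel only by the \(-\log v_i!\) terms; the means and counts below are illustrative:

```python
import numpy as np
from math import lgamma

mu = np.array([2.0, 5.0, 1.5])         # illustrative Poisson means
v = np.array([3, 4, 0])                # observed counts
lam = np.log(mu)                       # canonical parameters
kernel = v @ lam - np.exp(lam).sum()   # v'lambda - K(lambda)

# full log-pmf of q independent Poisson variables
log_pmf = sum(vi * np.log(mi) - mi - lgamma(vi + 1) for vi, mi in zip(v, mu))
# kernel and log-pmf differ only by the -log(v_i!) terms
assert np.isclose(log_pmf, kernel - sum(lgamma(vi + 1) for vi in v))
```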
1.2 Proof of Lemma 1
Point (i) follows because \({\varvec{\theta }}\in \mathcal{F}({\varvec{X}})\) implies \(-{\varvec{X}}{\varvec{\theta }}>0\). Concerning (ii), let \({\varvec{C}}\) be a matrix whose columns are the generators of \(\mathcal{C}\), then any element in the interior of \(\mathcal{C}\) may be written as \({\varvec{c}}\) = \(x {\varvec{C}}{\varvec{w}}\), where \(x>0\) and the elements of \({\varvec{w}}\) are strictly positive and sum to 1. The derivative of c(x) with respect to x, computed by the chain rule, equals
To prove that d(x) is negative everywhere, note that the expression in square brackets is positive; the fact that the elements of the vector \((-{\varvec{X}}){\varvec{C}}{\varvec{w}}\) are also strictly positive follows from basic results on convex cones: the columns of \({\varvec{X}}^{\prime }\) are the generators of \(\mathcal{C}^0\), the dual cone, where an edge of \(\mathcal{C}^0\) can be orthogonal to, at most, \(k-1\) edges of \(\mathcal{C}\) and forms an obtuse angle with all the others. Because c(x) is continuous, strictly decreasing, positive for x close to 0 and negative for sufficiently large x, the value of x that satisfies (3) must be unique.
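The uniqueness argument rests only on continuity, strict monotonicity, and a sign change, so the root can be located by bisection; the function below is a purely illustrative stand-in for c(x), not the function of the lemma:

```python
# A hypothetical stand-in for c(x): continuous, strictly decreasing,
# positive near 0 and negative for large x -- the conditions used in the proof.
def c(x):
    return 1.0 - x - x**3

lo, hi = 1e-9, 10.0
assert c(lo) > 0 > c(hi)          # sign change on the bracketing interval
for _ in range(80):               # bisection; monotonicity makes the root unique
    mid = 0.5 * (lo + hi)
    if c(mid) > 0:
        lo = mid
    else:
        hi = mid
root = 0.5 * (lo + hi)
assert abs(c(root)) < 1e-9
```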
1.3 Proof of Lemma 2
To differentiate \(f(\gamma )\) = \(\log [{\varvec{1}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}({\varvec{\gamma }}))]\) note that (4) implies \({\varvec{\tau }}(\gamma )\) = \({\varvec{X}}^{\prime }{\varvec{\pi }}(\gamma )\) = \(\gamma {\varvec{X}}^{\prime }{\varvec{p}}\). By the chain rule
The result follows because, by construction, \({\varvec{X}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}(\gamma )) /[{\varvec{1}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}(\gamma ))]\) = \({\varvec{\tau }}(\gamma )\) = \(\gamma {\varvec{X}}^{\prime }{\varvec{p}}\) and
Differentiation of the function \(g(\gamma )\) is similar, except that, because \({\varvec{\tau }}(\gamma )\) = \({\varvec{s}}/\gamma \), the last component in the derivative is \(-{\varvec{s}}/\gamma ^2\).
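Combining the identities stated above, the chain-rule step for \(f(\gamma )\) can be written out as follows (a sketch reconstructed from those identities, not the published display):

```latex
f'(\gamma)
  = \left[\frac{{\varvec{X}}^{\prime}\exp({\varvec{X}}{\varvec{\theta}}(\gamma))}
               {{\varvec{1}}^{\prime}\exp({\varvec{X}}{\varvec{\theta}}(\gamma))}\right]^{\prime}
    \frac{\mathrm{d}{\varvec{\theta}}}{\mathrm{d}\gamma}
  = {\varvec{\tau}}(\gamma)^{\prime}\,
    \frac{\mathrm{d}{\varvec{\theta}}}{\mathrm{d}\gamma}
  = \gamma\,{\varvec{p}}^{\prime}{\varvec{X}}\,
    \frac{\mathrm{d}{\varvec{\theta}}}{\mathrm{d}\gamma}
```

For \(g(\gamma )\), with \({\varvec{\tau }}(\gamma ) = {\varvec{s}}/\gamma \), the middle factor is replaced accordingly and the extra term \(-{\varvec{s}}/\gamma ^2\) appears, as noted above.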
Forcina, A. Estimation and testing of multiplicative models for frequency data. Metrika 82, 807–822 (2019). https://doi.org/10.1007/s00184-019-00709-6