
Estimation and testing of multiplicative models for frequency data


Abstract

This paper is about models for a vector of probabilities whose elements must have a multiplicative structure and sum to 1 at the same time; in certain applications, like basket analysis, these models may be seen as a constrained version of quasi-independence. After reviewing the basic properties of the models, their geometric features as a curved exponential family are investigated. An improved algorithm for computing maximum likelihood estimates is introduced and new insights are provided on the underlying geometry. The asymptotic distributions of three statistics for hypothesis testing are derived and a small simulation study is presented to investigate the accuracy of asymptotic approximations.


Figures 1, 2 and 3 (captions not available in this preview).



Acknowledgements

The author would like to thank A. Klimova and T. Rudas for sharing ideas concerning relational models and for several very enlightening discussions, A. Salvan for comments on the nature of the curved exponential family and P. Giudici for providing the basket data.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Antonio Forcina.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


Appendix

1.1 Multinomial and Poisson as exponential families

Let \({\varvec{v}}\)\(\sim \) Mn\((n,{\varvec{\pi }})\) where \({\varvec{\pi }}\) has dimension q; a multivariate logistic transform of \({\varvec{\pi }}\) may be defined as \(\log {\varvec{\pi }}\) = \({\varvec{G}}{\varvec{\lambda }}-{\varvec{1}}_q\log [{\varvec{1}}_q^{\prime }\exp ({\varvec{G}}{\varvec{\lambda }})]\), where \({\varvec{\lambda }}\) is a vector of canonical parameters determined by \({\varvec{G}}\), an arbitrary \(q \times (q-1)\) matrix of full rank whose column span does not contain the unitary vector. The kernel of the log of the probability distribution may be written as

$$\begin{aligned} {\varvec{v}}^{\prime }{\varvec{\log }}{\varvec{\pi }}= {\varvec{v}}^{\prime }{\varvec{G}}{\varvec{\lambda }}- n\log [{\varvec{1}}_q^{\prime }\exp ({\varvec{G}}{\varvec{\lambda }})], \end{aligned}$$

both \({\varvec{\lambda }}\) and \({\varvec{t}}\) = \({\varvec{G}}^{\prime }{\varvec{v}}\), the vector of sufficient statistics, have size \(q-1\) and \(K({\varvec{\lambda }})\) = \(n\log [{\varvec{1}}^{\prime }\exp ({\varvec{G}}{\varvec{\lambda }})]\).
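As a small numerical illustration (the design matrix and parameter values below are hypothetical, not from the paper), the transform can be computed directly; choosing baseline-category logits as \({\varvec{G}}\), the canonical parameters become the log-odds of each cell against the last one:

```python
import numpy as np

# Sketch of the multivariate logistic transform:
# log(pi) = G @ lam - 1 * log(1' exp(G @ lam)),
# which maps any lam in R^{q-1} to a probability vector summing to 1.
def multilogit_inverse(G, lam):
    eta = G @ lam
    eta = eta - eta.max()        # stabilise the exponentials
    pi = np.exp(eta)
    return pi / pi.sum()         # normalisation enforces sum(pi) = 1

q = 4
# Baseline-category design: identity over the first q-1 cells, zero row last
G = np.vstack([np.eye(q - 1), np.zeros((1, q - 1))])
lam = np.array([0.3, -0.7, 1.2])
pi = multilogit_inverse(G, lam)
```

With this choice of \({\varvec{G}}\), \(\log (\pi _i/\pi _q) = \lambda _i\), which is easy to check on the output.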

To derive an explicit expression for \({\varvec{\lambda }}\), let \({\varvec{R}}\) = \({\varvec{I}}_q-{\varvec{1}}_q{\varvec{1}}_q^{\prime }/q\) and

$$\begin{aligned} {\varvec{D}} = ({\varvec{G}}^{\prime }{\varvec{R}} {\varvec{G}})^{-1} {\varvec{G}}^{\prime }{\varvec{R}}; \quad \Rightarrow {\varvec{D}}{\varvec{G}} ={\varvec{I}}_{q-1},\quad {\varvec{D}}{\varvec{1}}_q={\varvec{0}}_{q-1}, \end{aligned}$$

then \({\varvec{\lambda }}\) = \({\varvec{D}}\log {\varvec{\pi }}\) is a vector of \(q-1\) canonical parameters. To see why the coefficients of any linear constraint on the canonical parameters must sum to 0, note that \({\varvec{D}}{\varvec{1}}_q\) = \({\varvec{0}}_{q-1}\). To introduce linear restrictions on \({\varvec{\lambda }}\), assume that \({\varvec{G}}\) is partitioned as \(({\varvec{X}}\,\, {\varvec{Z}})\), where \({\varvec{Z}}\) is such that \({\varvec{Z}}^{\prime }{\varvec{R}}{\varvec{X}}\) = \({\varvec{0}}\), let also \({\varvec{H}}\) = \(({\varvec{Z}}^{\prime }{\varvec{R}}{\varvec{Z}})^{-1}{\varvec{Z}}^{\prime }{\varvec{R}}\); now define \({\varvec{\eta }}\) = \({\varvec{H}}\log {\varvec{\pi }}\). Then the model \({\varvec{\lambda }}\) = \({\varvec{X}}{\varvec{\theta }}\) is equivalent to assuming that \({\varvec{\eta }}\) = \({\varvec{0}}\).
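These identities are easy to verify numerically. In the sketch below (hypothetical \({\varvec{X}}\) and values, not from the paper), \({\varvec{Z}}\) is built as an orthonormal basis of the complement of the span of \(({\varvec{1}}_q, {\varvec{X}})\), so \({\varvec{Z}}^{\prime }{\varvec{R}}{\varvec{X}} = {\varvec{0}}\) holds by construction; the checks confirm that \({\varvec{D}}\) recovers the canonical parameters and that \({\varvec{\eta }}\) vanishes under the model:

```python
import numpy as np

# Numerical check of the appendix identities:
# D = (G'RG)^{-1} G'R satisfies DG = I_{q-1} and D 1_q = 0,
# so lam = D log(pi); and eta = H log(pi) = 0 whenever the model holds.
q, k = 5, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(q, k))                       # hypothetical design
# Z: orthonormal basis of the complement of span(1, X), so Z'RX = 0
Q, _ = np.linalg.qr(np.column_stack([np.ones(q), X]), mode="complete")
Z = Q[:, k + 1:]                                  # q x (q-1-k)
G = np.column_stack([X, Z])                       # full-rank q x (q-1)
R = np.eye(q) - np.ones((q, q)) / q               # centring projector
D = np.linalg.solve(G.T @ R @ G, G.T @ R)
H = np.linalg.solve(Z.T @ R @ Z, Z.T @ R)

theta = rng.normal(size=k)
xb = X @ theta
pi = np.exp(xb) / np.exp(xb).sum()                # pi satisfies lam = X theta
```

Because \({\varvec{D}}{\varvec{X}}\) picks out the first k columns of \({\varvec{I}}_{q-1}\), \({\varvec{D}}\log {\varvec{\pi }}\) returns \({\varvec{\theta }}\) padded with zeros, in agreement with \({\varvec{\eta }} = {\varvec{0}}\).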

If, instead, the elements of \({\varvec{v}}\) were distributed as q independent Poisson variables, the kernel of the log of the probability distribution would be

$$\begin{aligned} {\varvec{v}}^{\prime }\log {\varvec{\mu }}-{\varvec{1}}^{\prime }{\varvec{\mu }}= {\varvec{v}}^{\prime }{\varvec{\lambda }}-K({\varvec{\lambda }}), \end{aligned}$$

where \({\varvec{\lambda }}\) = \(\log {\varvec{\mu }}\) and \(K({\varvec{\lambda }})\) = \({\varvec{1}}^{\prime }\exp ({\varvec{\lambda }})\).
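In the Poisson case the gradient of the cumulant function \(K({\varvec{\lambda }}) = {\varvec{1}}^{\prime }\exp ({\varvec{\lambda }})\) is the mean vector \({\varvec{\mu }} = \exp ({\varvec{\lambda }})\); a quick finite-difference check (with hypothetical values):

```python
import numpy as np

# For independent Poisson counts, K(lam) = 1' exp(lam) and its gradient
# is the mean vector mu = exp(lam). Verify by central differences.
lam = np.array([0.2, -0.5, 1.0])
K = lambda l: np.exp(l).sum()
h = 1e-6
grad = np.array([(K(lam + h * e) - K(lam - h * e)) / (2 * h)
                 for e in np.eye(3)])
```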

1.2 Proof of Lemma 1

Point (i) follows because \({\varvec{\theta }}\in \mathcal{F}({\varvec{X}})\) implies \(-{\varvec{X}}{\varvec{\theta }}>0\). Concerning (ii), let \({\varvec{C}}\) be a matrix whose columns are the generators of \(\mathcal{C}\); then any element in the interior of \(\mathcal{C}\) may be written as \({\varvec{c}}(x)\) = \(x {\varvec{C}}{\varvec{w}}\), where \(x>0\) and the elements of \({\varvec{w}}\) are strictly positive and sum to 1. The derivative of c(x) with respect to x, computed by the chain rule, equals

$$\begin{aligned} d(x) = -\left[ \frac{\exp (-x(-{\varvec{X}}){\varvec{C}}{\varvec{w}})}{{\varvec{1}}^{\prime }\exp (-x(-{\varvec{X}}){\varvec{C}}{\varvec{w}})}\right] ^{\prime }(-{\varvec{X}}){\varvec{C}}{\varvec{w}}. \end{aligned}$$

To prove that d(x) is negative everywhere, note that the expression in square brackets is positive; the fact that the elements of the vector \((-{\varvec{X}}){\varvec{C}}{\varvec{w}}\) are also strictly positive follows from basic results on convex cones: the columns of \({\varvec{X}}^{\prime }\) are the generators of \(\mathcal{C}^0\), the dual cone, where an edge of \(\mathcal{C}^0\) can be orthogonal to, at most, \(k-1\) edges of \(\mathcal{C}\) and forms an obtuse angle with all the others. Because c(x) is continuous, strictly decreasing, positive for x close to 0 and negative for sufficiently large x, the value of x that satisfies (3) must be unique.
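The last step of the argument is constructive: a continuous, strictly decreasing function that is positive near 0 and eventually negative has exactly one root, which bisection locates. The sketch below uses a stand-in for c(x), since the actual function involves quantities defined in the main text:

```python
import numpy as np

# Generic sketch of the uniqueness argument: bisection on a continuous,
# strictly decreasing function that changes sign exactly once.
def bisect_root(c, lo=1e-8, hi=1.0, tol=1e-12):
    while c(hi) > 0:              # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c(mid) > 0:            # root lies to the right of mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = lambda x: np.exp(-x) - 0.5    # strictly decreasing stand-in for c(x)
x_star = bisect_root(c)
```

For this stand-in the unique root is \(\log 2\), which the search recovers to the requested tolerance.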

1.3 Proof of Lemma 2

To differentiate \(f(\gamma )\) = \(\log [{\varvec{1}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}({\varvec{\gamma }}))]\) note that (4) implies \({\varvec{\tau }}(\gamma )\) = \({\varvec{X}}^{\prime }{\varvec{\pi }}(\gamma )\) = \(\gamma {\varvec{X}}^{\prime }{\varvec{p}}\). By the chain rule

$$\begin{aligned} \frac{\partial f(\gamma )}{\partial \gamma } = \frac{\partial f(\gamma )}{\partial {\varvec{\theta }}(\gamma )^{\prime }} \frac{\partial {\varvec{\theta }}(\gamma )}{\partial {\varvec{\tau }}(\gamma )^{\prime }} \frac{\partial {\varvec{\tau }}(\gamma )}{\partial \gamma } = \frac{\exp ({\varvec{X}}{\varvec{\theta }}(\gamma ))^{\prime }}{{\varvec{1}}^{\prime }\exp ({\varvec{X}} {\varvec{\theta }}(\gamma ))} {\varvec{X}} \frac{\partial {\varvec{\theta }}(\gamma )}{\partial {\varvec{\tau }}(\gamma )^{\prime }}{\varvec{X}}^{\prime }{\varvec{p}}. \end{aligned}$$

The result follows because, by construction, \({\varvec{X}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}(\gamma )) /[{\varvec{1}}^{\prime }\exp ({\varvec{X}}{\varvec{\theta }}(\gamma ))]\) = \({\varvec{\tau }}(\gamma )\) = \(\gamma {\varvec{X}}^{\prime }{\varvec{p}}\) and

$$\begin{aligned} \frac{\partial {\varvec{\theta }}(\gamma )}{\partial {\varvec{\tau }}(\gamma )^{\prime }} = \left( \frac{\partial {\varvec{\tau }}(\gamma )}{\partial {\varvec{\theta }}(\gamma )^{\prime }}\right) ^{-1} = \left[ {\varvec{X}}^{\prime }\frac{\partial {\varvec{\pi }}(\gamma )}{\partial ({\varvec{X}}{\varvec{\theta }}(\gamma ))^{\prime }}{\varvec{X}}\right] ^{-1} = {\varvec{F}}(\gamma ). \end{aligned}$$

Differentiation of the function \(g(\gamma )\) is similar, except that, because \({\varvec{\tau }}(\gamma )\) = \({\varvec{s}}/\gamma \), the last component in the derivative is \(-{\varvec{s}}/\gamma ^2\).
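The chain-rule identity for \(f(\gamma )\) can be checked numerically. In the sketch below (hypothetical \({\varvec{X}}\) and \({\varvec{p}}\), not from the paper), \({\varvec{\theta }}(\gamma )\) is obtained by Newton's method on the moment equations \({\varvec{X}}^{\prime }{\varvec{\pi }} = \gamma {\varvec{X}}^{\prime }{\varvec{p}}\), and the implied derivative \(\gamma ({\varvec{X}}^{\prime }{\varvec{p}})^{\prime }{\varvec{F}}(\gamma )({\varvec{X}}^{\prime }{\varvec{p}})\) is compared with a finite difference of \(f(\gamma )\):

```python
import numpy as np

# Numerical check of the chain rule for f(gamma) = log 1'exp(X theta(gamma)),
# where theta(gamma) solves X'pi = gamma X'p and
# F(gamma) = [X'(diag(pi) - pi pi')X]^{-1}.
rng = np.random.default_rng(2)
q, k = 6, 2
X = rng.normal(size=(q, k))                        # hypothetical design

def pi_of(theta):
    e = np.exp(X @ theta - (X @ theta).max())      # stabilised softmax
    return e / e.sum()

theta0 = 0.5 * rng.normal(size=k)
p = pi_of(theta0)                  # p chosen on the model, so theta(1) = theta0

def theta_of(gamma):
    theta = theta0.copy()          # warm start near the solution
    for _ in range(50):            # Newton iterations on X'pi = gamma X'p
        pi = pi_of(theta)
        J = X.T @ (np.diag(pi) - np.outer(pi, pi)) @ X
        step = np.linalg.solve(J, gamma * (X.T @ p) - X.T @ pi)
        theta += step
        if np.abs(step).max() < 1e-13:
            break
    return theta

def f(gamma):
    return np.log(np.exp(X @ theta_of(gamma)).sum())

h = 1e-5
numeric = (f(1 + h) - f(1 - h)) / (2 * h)          # central difference at gamma = 1
pi1 = pi_of(theta_of(1.0))
F = np.linalg.inv(X.T @ (np.diag(pi1) - np.outer(pi1, pi1)) @ X)
implied = (X.T @ p) @ F @ (X.T @ p)                # gamma = 1 in the formula
```

The two values agree to finite-difference accuracy, as the lemma predicts.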


Cite this article

Forcina, A. Estimation and testing of multiplicative models for frequency data. Metrika 82, 807–822 (2019). https://doi.org/10.1007/s00184-019-00709-6
