Abstract
In statistical practice, model building, sensitivity and uncertainty are major concerns of the analyst. This paper looks at these issues from an information geometric point of view. Here, we define sensitivity to mean understanding how inference about a problem of interest changes with perturbations of the model. The approach is an example of what we call computational information geometry. The embedding of simple models in much larger information geometric spaces is shown to illuminate these critically important issues.
F. Critchley—This work has been partly funded by EPSRC grant EP/L010429/1.
P. Marriott—This work has been partly funded by NSERC discovery grant ‘Computational Information Geometry and Model Uncertainty’.
References
Altham, P. M. (1978). Two generalizations of the binomial distribution. Applied Statistics, 27(2), 162–167.
Amari, S.-I. (1985). Differential-geometrical methods in statistics. New York: Springer.
Anaya-Izquierdo, K., Critchley, F., & Marriott, P. (2014). When are first-order asymptotics adequate? A diagnostic. Stat, 3(1), 17–22.
Anaya-Izquierdo, K., Critchley, F., Marriott, P., & Vos, P. (2013). Computational information geometry in statistics: Foundations. Geometric science of information (pp. 311–318). New York: Springer.
Barndorff-Nielsen, O. (1976). Factorization of likelihood functions for full exponential families. Journal of the Royal Statistical Society. Series B (Methodological), 38(1), 37–44.
Barndorff-Nielsen, O. (1978). Information and exponential families in statistical theory. New Jersey: Wiley.
Barndorff-Nielsen, O., & Blaesild, P. (1983). Exponential models with affine dual foliations. Annals of Statistics, 11(3), 753–769.
Barndorff-Nielsen, O., & Koudou, A. (1995). Cuts in natural exponential families. Theory of Probability and Its Applications, 40, 220–229.
Box, G. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Box, G. (1980). Sampling and Bayes’ inference in scientific modelling and robustness (with discussion). Journal of the Royal Statistical Society, Series A, 143, 383–430.
Brown, L. (1986). Fundamentals of statistical exponential families: With applications in statistical decision theory. Hayward: Institute of Mathematical Statistics.
Christensen, B. J., & Kiefer, N. M. (1994). Local cuts and separate inference. Scandinavian Journal of Statistics, 21(4), 389–401.
Christensen, B. J., & Kiefer, N. M. (2000). Panel data, local cuts and orthogeodesic models. Bernoulli, 6(4), 667–678.
Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society, Series B, Methodological, 48, 133–155.
Cox, D. (1986). Comment on ‘Assessment of local influence’ by R. D. Cook. Journal of the Royal Statistical Society. Series B (Methodological), 133–169.
Cox, D., & Reid, N. (1987). Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society, Series B: Methodological, 49, 1–18.
Critchley, F., & Marriott, P. (2004). Data-informed influence analysis. Biometrika, 91, 125–140.
Critchley, F., & Marriott, P. (2014a). Computational information geometry in statistics: Theory and practice. Entropy, 16(5), 2454–2471.
Critchley, F., & Marriott, P. (2014b). Computing with Fisher geodesics and extended exponential families. Statistics and Computing, 1–8.
Csiszar, I., & Matus, F. (2005). Closures of exponential families. The Annals of Probability, 33(2), 582–600.
Efron, B. (1986). Double exponential families and their use in generalized linear regression. Journal of the American Statistical Association, 81(395), 709–721.
Fukuda, K. (2004). From the zonotope construction to the Minkowski addition of convex polytopes. Journal of Symbolic Computation, 38, 1261–1272.
Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electronic Journal of Statistics, 3, 259–289.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1), 1–14.
Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249.
Rinaldo, A., Fienberg, S. E., & Zhou, Y. (2009). On the geometry of discrete exponential families with applications to exponential random graph models. Electronic Journal of Statistics, 3, 446–484.
Tuy, H. (1998). Convex analysis and global optimization. London: Kluwer Academic Publishers.
Appendices
Appendix 1: The Model Space, Cuts and Closures
Model Space
A key concept in building the perturbation space is to first represent statistical models (sample spaces, together with probability distributions on them) and associated inference problems inside adequately large but finite dimensional spaces; see Critchley and Marriott (2014a) for details. Consider the general k-dimensional extended multinomial model
\[ \varDelta ^k := \left\{ \pi = (\pi _0, \pi _1, \dots , \pi _k)^T \,:\, \pi _i \ge 0, \ \sum _{i=0}^{k} \pi _i = 1 \right\} . \qquad (11) \]
The multinomial family on \(k+1\) categories can be identified with the (relative) interior of this space, \(int(\varDelta ^k)\), while the extended family, (11), allows the possibility of distributions with different support sets. This paper looks at (extended) exponential families embedded in \(\varDelta ^k\) and uses the following notation.
Definition 8
Let \( {\pi }^{0}=(\pi _{i}^{0})\in int (\varDelta ^{k})\), and V be a \((k+1) \times p\) matrix of the form \(( {v}^{(1)}| \dots | {v}^{(p)})=( {v}_{0}| \dots | {v}_{k})^{T}\) with linearly independent columns and chosen such that \(1_{k+1}:= (1, \dots , 1)^T\) \(\notin \mathrm{Range}(V)\). With these definitions there exists a p-dimensional full exponential family in \(\varDelta ^{k}\), denoted by \({\pi }( {\phi })= {\pi }_{( {\pi }^{0},V)}( {\phi })\), with general element
\[ \pi _i(\phi ) = \pi _i^0 \exp \{ v_i^T \phi - \psi (\phi ) \}, \]
\(i = 0, \dots , k\), with normalising constant
\[ \psi (\phi ) := \log \left( \sum _{i=0}^{k} \pi _i^0 \exp \{ v_i^T \phi \} \right) \]
for all \({\phi }\in \mathbb {R}^{p}\).
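Definition 8 can be made concrete numerically. The following is a minimal sketch, assuming the usual form \(\pi _i(\phi ) \propto \pi _i^0 \exp \{v_i^T\phi \}\) with the normalising constant fixing the sum to one; the function name `exp_family_point` and the example values of \(\pi ^0\) and V are hypothetical and chosen only for illustration.

```python
import math

def exp_family_point(pi0, V, phi):
    """Return pi(phi) with pi_i proportional to pi0_i * exp(v_i^T phi).

    pi0 : base point in the interior of the simplex (length k+1)
    V   : list of k+1 rows v_i, each of length p
    phi : natural parameter (length p)
    """
    log_w = [math.log(p0) + sum(v * f for v, f in zip(row, phi))
             for p0, row in zip(pi0, V)]
    shift = max(log_w)                      # guard against overflow
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)                          # equals exp(psi(phi) - shift)
    return [wi / total for wi in w]

# Hypothetical example with k = 3 (four categories) and p = 1.
pi0 = [0.1, 0.2, 0.3, 0.4]
V = [[0.0], [1.0], [2.0], [3.0]]
pi = exp_family_point(pi0, V, [0.5])
print(pi, sum(pi))
```

At \(\phi = 0\) the function returns the base point \(\pi ^0\) itself, which is one way to see why \(\pi ^0\) is called the base point of the family.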
Using this formalism, selecting a one dimensional model to undertake inference about \(\mu = E(V)\), as in Examples (1) and (2), requires selecting a sufficient statistic V and a base point \(\pi ^0\). Initially we concentrate on the case where the choice of model contributes the minimal amount of information to the inference problem. We call these least informative models.
Definition 9
(Least informative model) Let X be the random variable over the \(k+1\) categories of \(\varDelta ^{k}\) which takes values \(x_i\) in category i. The model \(\pi (\phi )= \pi _{(\pi ^0, V)}(\phi )\) is a one dimensional least informative model for the estimation of E(X) when V is \((k+1)\times 1\) and \(v^{(1)} \propto (x_i)\).
Both the models considered in Examples (1) and (2) are least informative for the parameter of interest. Choices between different least informative models then correspond to selecting different base measures \(\pi ^0 \in \varDelta ^{k}\). We can think of these geometrically as translations of exponential families in the affine geometry defined by the natural parameters.
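The translation property just described can be checked numerically. The sketch below is illustrative only: the category values `x` and the two base points are hypothetical, and the natural parameters are taken to be log-odds against category 0. Two least informative models that share the statistic \(v^{(1)} = (x_i)\) but start from different base points differ, in these coordinates, by a shift that does not depend on \(\phi \).

```python
import math

def tilt(pi0, x, phi):
    """One-dimensional family pi_i(phi) proportional to pi0_i * exp(phi * x_i)."""
    log_w = [math.log(p) + phi * xi for p, xi in zip(pi0, x)]
    shift = max(log_w)
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def mean(pi, x):
    """Expectation E(X) under pi."""
    return sum(p * xi for p, xi in zip(pi, x))

def natural_coords(pi):
    """Log-odds against category 0, i.e. multinomial natural parameters."""
    return [math.log(p / pi[0]) for p in pi[1:]]

x = [0.0, 1.0, 2.0, 3.0]            # hypothetical category values x_i
base_a = [0.25, 0.25, 0.25, 0.25]   # one choice of base point pi^0
base_b = [0.4, 0.3, 0.2, 0.1]       # a second choice of pi^0

for phi in (-1.0, 0.0, 1.0):
    pa, pb = tilt(base_a, x, phi), tilt(base_b, x, phi)
    shift = [b - a for a, b in zip(natural_coords(pa), natural_coords(pb))]
    print(f"phi={phi:+.1f}  E(X): {mean(pa, x):.4f} vs {mean(pb, x):.4f}  shift={shift}")
```

The printed shift is the same vector for every \(\phi \) (up to rounding), while the mean \(E(X)\) traced out by each family differs.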
Closures of Exponential Families
In this section we consider the closure of discrete p-dimensional exponential families which are subsets of \(\varDelta ^k\). For more general results on closures of exponential families see Barndorff-Nielsen (1978), Brown (1986), Lauritzen (1996) and Csiszar and Matus (2005). In the discrete case considered here, we can understand boundary behaviour in extended exponential families by considering the polar dual (Critchley and Marriott 2014b) or, alternatively, the directions of recession (Geyer 2009; Rinaldo et al. 2009), as described in detail in Anaya-Izquierdo et al. (2014).
We want to consider the limit points of the p-dimensional exponential family, so we consider the limiting behaviour of the path \(\phi (\lambda ) := \lambda q\) as \(\lambda \rightarrow \infty \) where \(q \in {\mathbb R}^p\), and \(\Vert q\Vert =1\). The support of the limiting distribution is determined by the maximal elements of the set
where \(s_i :=\left( S_0(i), \dots , S_p(i) \right) ^T\). There exists a correspondence between the limiting behaviour of exponential families in a certain direction (the direction of recession) and the set of normals to faces of a convex polygon, the polar dual; see Tuy (1998).
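The limiting behaviour along a direction q can be illustrated with a small numerical sketch. It assumes the family has the form \(\pi _i(\phi ) \propto \pi _i^0 \exp \{v_i^T\phi \}\), so that along \(\phi (\lambda ) = \lambda q\) the mass concentrates on the categories maximising \(v_i^T q\). The helper names and the example matrix V are hypothetical, and the rows \(v_i\) stand in for the vectors whose maximal elements are referred to above.

```python
import math

def limiting_support(V, q, tol=1e-12):
    """Categories i that maximise v_i^T q."""
    scores = [sum(v * qj for v, qj in zip(row, q)) for row in V]
    top = max(scores)
    return [i for i, s in enumerate(scores) if s >= top - tol]

def family_point(pi0, V, phi):
    """pi_i(phi) proportional to pi0_i * exp(v_i^T phi)."""
    log_w = [math.log(p) + sum(v * f for v, f in zip(row, phi))
             for p, row in zip(pi0, V)]
    shift = max(log_w)
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical two-dimensional family on four categories.
pi0 = [0.1, 0.2, 0.3, 0.4]
V = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 0.0]                                   # unit-norm direction

support = limiting_support(V, q)
far = family_point(pi0, V, [50.0 * qj for qj in q])
print(support, far)
```

Far along the path, the probabilities outside the maximising set are numerically negligible and the remaining mass is split in proportion to \(\pi ^0\) restricted to that set.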
Appendix 2: Empirical Likelihood for the Mean Parameter in a Multinomial Setting
Let T be a discrete random variable with \(k+1\) values \(\{t_0,\ldots ,t_k\}\) so that the probability mass function is \(P[T=t_i]=\pi _i\) for \(i=0,1,\dots ,k,\) where \(\sum _{i=0}^{k}\pi _i=1\) and \(\pi _i\ge 0\). The distribution of T depends on k free parameters and we are interested in making inferences about the expectation parameter
\[ \phi := E[T] = \sum _{i=0}^{k} t_i\,\pi _i \]
in the presence of the other \(k-1\) nuisance parameters.
Theorem 4
For a given random sample of size N from T, let \(t_-\) and \(t_+\) be the minimum and maximum observed values of T, and assume the generic case in which all \(t_i\)’s are distinct. Then for any \(\phi \in (t_-,t_+)\) the profile likelihood for the mean parameter \(\phi \) is given by
Here, \(n_j\) is the number of times that \(t_j\) appears in the sample so that \(N=\sum _{i=0}^k n_i\) and \(\hat{\delta }_\phi \) is the unique solution to the equation
in the interval \(\left( \frac{N}{t_--\phi },\frac{N}{t_+-\phi }\right) \).
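The computation behind Theorem 4 can be sketched numerically. The code below is not the authors' implementation. It assumes that mass is placed only on observed values and that the maximising probabilities take the form \(\pi _i = n_i/(N-\delta (t_i-\phi ))\), a form chosen because it is positive exactly on the interval stated in the theorem. The function name `profile_weights` and the bisection solver are illustrative choices.

```python
import math

def profile_weights(t, n, phi, iters=200):
    """Maximise sum_i n_i log(pi_i) subject to sum pi_i = 1 and sum t_i pi_i = phi.

    Assumption: pi_i = n_i / (N - delta * (t_i - phi)) on observed categories
    and pi_i = 0 on unobserved ones; delta solves H(delta) = 0.
    """
    obs = [(ti, ni) for ti, ni in zip(t, n) if ni > 0]
    N = sum(ni for _, ni in obs)
    t_lo = min(ti for ti, _ in obs)
    t_hi = max(ti for ti, _ in obs)
    if not t_lo < phi < t_hi:
        raise ValueError("phi must lie strictly between t_- and t_+")

    def H(delta):
        return sum(ni * (ti - phi) / (N - delta * (ti - phi)) for ti, ni in obs)

    # H is strictly increasing on (N/(t_- - phi), N/(t_+ - phi)), so bisect.
    lo, hi = N / (t_lo - phi), N / (t_hi - phi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:      # bracket has collapsed to machine precision
            break
        if H(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    pi = [ni / (N - delta * (ti - phi)) if ni > 0 else 0.0 for ti, ni in zip(t, n)]
    log_lik = sum(ni * math.log(p) for ni, p in zip(n, pi) if ni > 0)
    return pi, delta, log_lik

# Hypothetical data: the value t = 1 is never observed.
t = [0.0, 1.0, 2.0, 3.0]
n = [3, 0, 5, 2]
pi_hat, delta_hat, ll = profile_weights(t, n, phi=1.0)
print(pi_hat, delta_hat, ll)
```

When \(\phi \) equals the sample mean the root is \(\delta = 0\) and the weights reduce to the observed proportions \(n_i/N\); moving \(\phi \) away from the sample mean lowers the profile log-likelihood.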
Proof
The empirical (profile) likelihood for \(\phi \) can be found by solving the following optimization problem
where, we recall, \(\mathcal P=\{i \,:\, n_i>0\}\) and \(\mathcal Z=\{i \,:\, n_i=0\}\). The \(t_i\)’s are distinct, and we can also assume without loss of generality that \(\pi _i>0\) for \(i\in \mathcal P\), because otherwise \(\ell =-\infty \). The Lagrangian is given by
and the key turning point equations are given by
which give the solutions
with \(\hat{\delta }_\phi \) defined as the solution \(H_\phi (\delta )=0\) where
Calculations show that
giving
so that \(H_{\phi }(\delta )\) is a strictly increasing function. Also
so that \(H_\phi (\delta )=0\) has a unique solution in the interval \((\delta _{min},\delta _{max})\).
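For orientation, one set of displays consistent with the steps of this proof is sketched below. It is a reconstruction under stated assumptions, not the authors' own equations: the sign convention for \(\delta \) is chosen so that positivity of the weights gives exactly the interval of Theorem 4, and mass is assumed to sit only on observed values, that is \(\pi _i = 0\) for \(i\in \mathcal Z\).

```latex
% Sketch only: sign conventions and the restriction pi_i = 0 on Z are assumptions.
\begin{align*}
&\max_{\pi}\ \ell(\pi) = \sum_{i\in\mathcal P} n_i \log \pi_i
  \quad\text{subject to}\quad
  \sum_{i=0}^{k}\pi_i = 1,\quad \sum_{i=0}^{k} t_i\,\pi_i = \phi,\quad \pi_i \ge 0, \\
&\mathcal L(\pi,\lambda,\delta) = \sum_{i\in\mathcal P} n_i\log\pi_i
  - \lambda\Big(\sum_{i=0}^{k}\pi_i - 1\Big)
  + \delta\sum_{i=0}^{k}(t_i-\phi)\,\pi_i, \\
&\frac{\partial \mathcal L}{\partial \pi_i}
  = \frac{n_i}{\pi_i} - \lambda + \delta\,(t_i-\phi) = 0 \quad (i\in\mathcal P)
  \ \Longrightarrow\ \lambda = N
  \ \text{(multiply by $\pi_i$ and sum over $\mathcal P$)}, \\
&\hat\pi_i(\phi) = \frac{n_i}{N - \delta\,(t_i-\phi)} \quad (i\in\mathcal P),
  \qquad \hat\pi_i(\phi) = 0 \quad (i\in\mathcal Z), \\
&H_\phi(\delta) := \sum_{i\in\mathcal P} \frac{n_i\,(t_i-\phi)}{N - \delta\,(t_i-\phi)},
  \qquad
  H_\phi'(\delta) = \sum_{i\in\mathcal P}
  \frac{n_i\,(t_i-\phi)^2}{\big(N - \delta\,(t_i-\phi)\big)^2} > 0, \\
&\lim_{\delta\downarrow\delta_{min}} H_\phi(\delta) = -\infty,
  \qquad
  \lim_{\delta\uparrow\delta_{max}} H_\phi(\delta) = +\infty,
  \qquad
  \delta_{min} = \frac{N}{t_- - \phi},\quad \delta_{max} = \frac{N}{t_+ - \phi}.
\end{align*}
```

Under these conventions the condition \(H_\phi (\delta )=0\) is the mean constraint \(\sum _i (t_i-\phi )\hat{\pi }_i=0\), and it also implies \(\sum _i \hat{\pi }_i = 1\).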
Appendix 3: Sensitive Infinitesimal Perturbations
We proceed from the minimal exponential family representation of the multinomial for the observed counts \(n=(n_1,\ldots ,n_k)^T\)
where the relation with the probability parameter \(\pi \) is given by \(\eta _i(\pi )=\log \left( \frac{\pi _i}{1-\sum _{r=1}^k \pi _r}\right) \), \(\pi _i(\eta )=\frac{e^{\eta _i}}{1+\sum _{r=1}^{k}e^{\eta _r}}\) for \(i=1,\ldots ,k\), \(\varphi (\eta )=N\log (1+\sum _{r=1}^k e^{\eta _r}),\) and h(n) is the multinomial coefficient.
We define the following coordinate system in \(\mathcal N,\) the natural parameter space. Consider a fixed point \(\eta _0\in \mathbb R^k\) and \(d^T:=(t_1-t_0,\ldots ,t_k-t_0)/N\). Let \(\{v_1,\ldots ,v_{k-1}\}\) be an orthogonal basis for the orthogonal complement of d. If we take \( A=\left( d,v_1,\ldots ,v_{k-1}\right) , \) then for any \(\eta \in \mathbb R^k\) we can write \( \eta =\eta _0+A\phi \) for some \(\phi \in \mathbb R ^k\). So \(\phi \) defines a new parameterisation for the multinomial. By defining \(s:=A^T n+c\) with \(c^T=(t_0,0,\ldots ,0)\) we have
where
This is, of course, the same regular natural exponential family but now with natural parameter \(\phi \) and expectation parameter
We are interested in making inferences about \( \mu _1=E[s_1]=\sum _{i=0}^k t_i\,\pi _i =\phi .\)
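The maps between \(\pi \) and \(\eta \) and the statistic \(s_1=n^T d+t_0\) are easy to check numerically. The sketch below is illustrative only; the function names and example values are hypothetical. For convenience the counts are passed as \((n_0,\ldots ,n_k)\) so that N can be recovered, although only \(n_1,\ldots ,n_k\) enter \(n^T d\). It shows that \(s_1\) is the sample mean of the observed t values and that its expectation is \(\sum _{i=0}^k t_i\pi _i\).

```python
import math

def eta_from_pi(pi):
    """Natural parameters eta_i = log(pi_i / pi_0) for i = 1..k."""
    return [math.log(p / pi[0]) for p in pi[1:]]

def pi_from_eta(eta):
    """Inverse map; returns (pi_0, pi_1, ..., pi_k)."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [1.0 / denom] + [math.exp(e) / denom for e in eta]

def s1(counts, t):
    """s_1 = n^T d + t_0 with d_i = (t_i - t_0) / N and counts = (n_0, ..., n_k)."""
    N = sum(counts)
    d = [(ti - t[0]) / N for ti in t[1:]]
    return sum(ni * di for ni, di in zip(counts[1:], d)) + t[0]

# Hypothetical example with k = 2.
t = [1.0, 2.0, 4.0]
counts = [2, 3, 5]
pi = [0.2, 0.3, 0.5]

sample_mean = sum(ni * ti for ni, ti in zip(counts, t)) / sum(counts)
print(s1(counts, t), sample_mean)                 # s_1 equals the sample mean
print(pi_from_eta(eta_from_pi(pi)))               # round trip recovers pi
print(s1([10 * p for p in pi], t))                # E[s_1] with E[n_i] = N * pi_i
```

Since \(1-\sum _{r=1}^k\pi _r=\pi _0\), the natural parameters are simply log-odds against category 0, which is what `eta_from_pi` computes.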
According to the variance Condition 2 in Theorem 1: \(s_1=n^T d+t_0\) is an exact cut for the regular exponential family
if and only if its variance depends only on \(\mu _1\). If such an exact cut exists, we can then make exact marginal inferences for \(\mu _1\) using the marginal distribution of \(s_1\) given by
for some real valued functions \(h^*\) and \(\psi \). We define
and then we have
so we can check how much this varies as a function of \(\mu _{(1)}\). For any fixed \(\mu _1^0\) we would like to explore the variation of \(V(s_1;\mu )\) in the subspace of densities given by \(\mu _1=\mu _1^0\). We would like to find the direction in this subspace along which \(Var(s_1;\mu )\) changes the most.
We define the following inner products for \(u,v\in \mathbb R^k\)
and the orthogonal projection matrices are
If \(\omega \) is such that \(\omega _1=0\) and \(\mu _0=\mu (\phi )\) with \(\phi =0\) then
so the directional derivative at \(\mu _0\) along the vector \(\omega \) is given by \( \langle \omega , I(\phi _0) A^{-1}d^{(2)}\rangle _{\mu _0}. \) To explore the variation of \(Var(s_1;\mu _0+\lambda \,\omega )\) we define the following optimisation problem
where \(e_1^T=(1,0,\ldots ,0)\).
The solution is given by \(\hat{\omega }=\hat{u}/\Vert \hat{u}\Vert _{\mu _0}\) where
that is, the normalised projection of \(I(\phi _0)A^{-1}d^{(2)}\) orthogonal to \(I(\phi _0)A^{-1}d \) in the metric \(I(\mu _0)\). Note that \(A^{-1}d=e_1\) and also \(\Vert \hat{u}\Vert _{\mu _0}=\Vert P^\perp _{\eta _0}(d^{(2)};d)\Vert _{\eta _0}\). We can write \(\hat{\omega }\) as
The objective function evaluated at the maximum is
This has a nice interpretation. If we take \(\eta _0=\eta (\hat{\pi }_{Global})=\eta (n/N)\) we have
then it can be interpreted as \(\Vert d\Vert ^2_{\eta _0}\) times the \(+1\) curvature of the profile likelihood curve for \(\phi \) at \(\phi =\hat{\phi }_{Global}\). The profile likelihood curve defines a curved exponential family embedded in the multinomial. We have
so the \(+1\) embedding curvature of the profile likelihood curve at \(\phi =\hat{\phi }\) is given by
The solution \(\hat{\omega }\) determines a direction in the \(-1\) space of the exponential family \(\mathcal F\). If variation in this direction is small we can consider \(s_1\) as an approximate cut.
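The structure of the optimisation problem above, a linear objective maximised over unit-norm directions orthogonal to a fixed vector, can be illustrated generically. The sketch below assumes an inner product of the form \(\langle u,v\rangle _M=u^T M v\) for a symmetric positive definite matrix M. The matrix M and the vectors g and c are hypothetical stand-ins for the metric, the gradient-like vector and the constrained direction; they are not the quantities of this appendix.

```python
import math

def inner(u, v, M):
    """<u, v>_M = u^T M v for a symmetric positive definite matrix M."""
    return sum(ui * sum(mij * vj for mij, vj in zip(row, v))
               for ui, row in zip(u, M))

def steepest_direction(g, c, M):
    """Unit-norm omega with <omega, c>_M = 0 that maximises <omega, g>_M."""
    coef = inner(g, c, M) / inner(c, c, M)
    u = [gi - coef * ci for gi, ci in zip(g, c)]   # projection of g orthogonal to c
    norm_u = math.sqrt(inner(u, u, M))
    return [ui / norm_u for ui in u], norm_u       # norm_u is the attained maximum

# Hypothetical inputs, for illustration only.
M = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
g = [1.0, -2.0, 0.5]
c = [1.0, 0.0, 0.0]

omega, best = steepest_direction(g, c, M)
print(omega, best, inner(omega, g, M))
```

By the Cauchy–Schwarz inequality in the metric M, no other feasible unit direction gives a larger value than the norm of the projected vector, which is why the maximiser is a normalised projection.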
© 2017 Springer International Publishing AG
Anaya-Izquierdo, K., Critchley, F., Marriott, P., Vos, P. (2017). Towards the Geometry of Model Sensitivity: An Illustration. In: Nielsen, F., Critchley, F., Dodson, C. (eds) Computational Information Geometry. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-47058-0_2
DOI: https://doi.org/10.1007/978-3-319-47058-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47056-6
Online ISBN: 978-3-319-47058-0
eBook Packages: Engineering, Engineering (R0)