Abstract
This paper introduces a statistical procedure, to be applied after a goodness-of-fit test has rejected a null model, that provides diagnostic information to help the user decide on a better model. The procedure goes through a list of departures, each tested by a local smooth test. The list is organized into a hierarchy by seeking answers to the questions “Where is the problem?” and “What is the problem there?”. This hierarchy allows the procedure to focus on finer departures as the data become more abundant. The procedure controls the family-wise Type 1 error rate. Simulations show that the procedure can succeed in providing useful diagnostic information.
References
Ehm W, Kornmeier J, Heinrich SP (2010) Multiple testing along a tree. Electron J Stat 4:462–471
Finner H, Strassburger K (2002) The partitioning principle: a powerful tool in multiple decision theory. Ann Stat 30:1194–1213
Goeman JJ, Finos L (2012) The inheritance procedure: multiple testing of tree-structured hypotheses. Stat Appl Genet Mol Biol 11(1):1–18
Henze N, Klar B (1996) Properly rescaled components of smooth tests of fit are diagnostic. Aust J Stat 38:61–74
Henze N (1997) Do components of smooth tests of fit have diagnostic properties? Metrika 45:121–130
Inglot T, Kallenberg WCM, Ledwina T (1994) Power approximations to and power comparison of smooth goodness-of-fit tests. Scand J Stat 21:131–145
Klar B (2000) Diagnostic smooth tests of fit. Metrika 52:237–252
Komlos J, Major P, Tusnady G (1976) An approximation of partial sums of independent RV’s and the sample df. II. Z Wahrscheinlichkeitstheorie Verwandte Geb 34:33–58
Meinshausen N (2008) Hierarchical testing of variable importance. Biometrika 95:265–278
Pollard D (1979) General Chi-square goodness-of-fit test with data-dependent cells. Z Wahrscheinlichkeitstheorie Verwandte Geb 50:317–331
Rayner JCW, Best DJ (1989) Smooth tests of goodness of fit. Oxford University Press, New York
von Eye A, Bogat GA (2004) Testing the assumption of multivariate normality. Psychol Sci 46:243–258
Acknowledgments
The authors wish to thank the referees for their constructive comments.
Appendix: Proof of theorems
We prove Theorem 1 and concentrate on the more complex \({\mathcal {R}}_{M}^{{\hat{Q}}}\). To avoid trivial problems, assume that \({\hat{b}}-{\hat{a}}>0\). With \({\hat{X}}_{i}^{*}=F_{0}^{{\hat{Q}}}(X_{i})\) and \(X_{i}^{*}=F_{0}^{Q}(X_{i})\), consider
Now, \(L_{m}^{*}({\hat{X}}_{i}^{*})-L_{m}^{*}(X_{i}^{*})=C(X_{i})+D(X_{i}),\) where \(C(x)={\mathbb {I}}\{ x\in Q\} [L_{m}(F_{0}^{{\hat{Q}}}(x))-L_{m}(F_{0}^{Q}(x))]\) and \(D(x)=L_{m}(F_{0}^{{\hat{Q}}}(x))[{\mathbb {I}}\{ x\in {\hat{Q}}\} -{\mathbb {I}}\{ x\in Q\}].\) Hence,
Consider the second term on the right hand side of this equation. \(F_{0}^{Q}(x)\) is differentiable with respect to a and b except at \(a=x\) and \(b=x\). Because \(X_{i}\ne a,b\) with probability 1, we can Taylor expand \(L_{m}(F_{0}^{{\hat{Q}}}(x))-L_{m}(F_{0}^{Q}(x))\) about (a, b) for \(x\in (a,b)\) to get
By the law of large numbers, the term in brackets converges to its expectation. Because \({\mathbb {E}}_{H_{0}^{Q}}({\mathbb {I}}\{ X_{i}\in Q\} L_{m}(F_{0}^{Q}(X_{i})))=0\), Leibniz’s integral rule yields
Hence
Now, regarding the term involving the \(D(X_{i})\) in (7.1), it is easy to see that
where \(\langle a,{\hat{a}}\rangle = (\min (a,{\hat{a}}),\max (a,{\hat{a}}))\) and similarly for \(\langle b,{\hat{b}}\rangle \). Moreover
where \(\alpha _{n}(\cdot )\) is the stochastic process \(\sqrt{n}({\hat{F}}_{n}(\cdot )-\cdot )\). From the Komlos et al. (1976) strong approximation, there exists a sequence of Brownian bridges \(B_{n}(\cdot )\) uniformly approximating \(\alpha _{n}(\cdot )\) to the order \(O\left( \frac{\log n}{\sqrt{n}}\right) \). Thus
The term \(B_{n}(a)-B_{n}({\hat{a}})\) is \(o_{p}(1)\) by the continuity of Brownian bridges. Hence,
Regrouping, we get:
Going back to (7.1), we find after combining (7.2) and (7.3)
Finally, as a byproduct of the above, \(\frac{N^{{\hat{Q}}}}{n}=\frac{N^{Q}}{n}+o_{p}(1)=F(Q)+o_{p}(1)\) because \(N^{Q}\) has the binomial distribution B(n, F(Q)). Hence the asymptotic behavior of the \({\mathcal {L}}_{m}^{{\hat{Q}}}\) is the same as that of the \({\mathcal {L}}_{m}^{Q}\), which are easily shown to be independent \(\chi _{1}^{2}\) variables. \(\square \)
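The independent \(\chi _{1}^{2}\) behavior of the components can be checked by simulation. The sketch below is illustrative only: it assumes a known (not estimated) interval Q, a uniform null so that \(F_{0}\) is the identity, and the first two normalized shifted Legendre polynomials as the smooth-test basis, as in Rayner and Best (1989); the function names are hypothetical.

```python
import numpy as np

# First two normalized shifted Legendre polynomials on [0, 1]:
# each has mean 0 and variance 1 under Uniform(0, 1).
def L1(u): return np.sqrt(3.0) * (2.0 * u - 1.0)
def L2(u): return np.sqrt(5.0) * (6.0 * u**2 - 6.0 * u + 1.0)

def components(x, a, b):
    """Squared smooth-test components on the fixed interval Q = (a, b):
    restrict the sample to Q, map through F_0^Q(x) = (x - a)/(b - a),
    and standardize by the number N^Q of observations falling in Q."""
    in_q = (x > a) & (x < b)
    u = (x[in_q] - a) / (b - a)   # conditional null CDF on Q
    n_q = in_q.sum()
    return (L1(u).sum()**2 / n_q, L2(u).sum()**2 / n_q)

rng = np.random.default_rng(0)
reps = np.array([components(rng.uniform(size=500), 0.2, 0.7)
                 for _ in range(2000)])
# Under H_0 each component is approximately chi-square with 1 df
# (mean about 1), and the two components are nearly uncorrelated.
print(reps.mean(axis=0), np.corrcoef(reps.T)[0, 1])
```

The means hover around 1 and the correlation around 0, consistent with the asymptotic independent \(\chi _{1}^{2}\) claim.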
Next, we prove Theorem 2. Suppose for simplicity that the \(Q_{S}\) are single intervals. We consider the more complex case where the \(Q_{S}\) are estimated by \({\hat{Q}}_{S}\). Define the adjusted p value as \(\pi _{S}^{(adj)}=\frac{1}{{\mathbb {P}}_{H_{0}}[{\hat{Q}}_{S}]}\pi _{S}\), where \(\pi _{S}\) pertains to \({\mathcal {R}}_{M}^{{\hat{Q}}_{S}}\). Let \({\mathcal {T}}_{0}\) be as in Theorem 2. Define the hierarchical adjusted p value as
Let \({\mathcal {T}}_{rej}=\{ S\in {\mathcal {T}};H_{0}^{Q_{S}}\text { is rejected by the rule } \pi _{S}^{(h,adj)}<\alpha \}\). It is easy to see that the null hypotheses rejected using the hierarchical adjusted p value coincide with those rejected using the \(\pi _{S}^{(adj)}\). In particular, no null hypothesis gets rejected if its parent has not been rejected, because \(\pi _{S}^{(h,adj)}\ge \pi _{pa(S)}^{(h,adj)}\). The probability of a family-wise Type 1 error can be written as
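As an illustration of this rule, a minimal sketch (hypothetical function names; the toy tree and p values are demonstration inputs, not the paper's simulation settings): the hierarchical adjusted p value of a node is the maximum of its own adjusted p value and those of all its ancestors, and a node is rejected when this value falls below \(\alpha \).

```python
# Hierarchical adjustment of p values over a tree of hypotheses:
# pi_h(S) = max(pi_adj(S), pi_h(parent(S))), so a node can only be
# rejected if all of its ancestors are rejected.

def hierarchical_adjust(parent, pi_adj):
    """parent[S] is the parent of node S (None for the root);
    pi_adj[S] is the adjusted p value pi_S^(adj)."""
    pi_h = {}
    def resolve(S):
        if S not in pi_h:
            if parent[S] is None:
                pi_h[S] = pi_adj[S]
            else:
                pi_h[S] = max(pi_adj[S], resolve(parent[S]))
        return pi_h[S]
    for S in parent:
        resolve(S)
    return pi_h

def rejected(parent, pi_adj, alpha):
    """Nodes whose hierarchical adjusted p value falls below alpha."""
    pi_h = hierarchical_adjust(parent, pi_adj)
    return {S for S, p in pi_h.items() if p < alpha}

# Toy example: root R with children A, B; A with children A1, A2.
parent = {"R": None, "A": "R", "B": "R", "A1": "A", "A2": "A"}
pi_adj = {"R": 0.001, "A": 0.02, "B": 0.30, "A1": 0.01, "A2": 0.08}
print(sorted(rejected(parent, pi_adj, alpha=0.05)))  # ['A', 'A1', 'R']
```

Because a child's hierarchical p value dominates its parent's, rejections propagate top-down, matching the property that no hypothesis is rejected before its parent.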
Let \(\widetilde{{\mathcal {T}}}_{0}\) be a subset of \({\mathcal {T}}\) maximal in the sense that \(\widetilde{{\mathcal {T}}}_{0}:=\{ S\in {\mathcal {T}}_{0}: \not \exists C\in {\mathcal {T}}_{0}\text { with }S\subset C\}\). Obviously \(\widetilde{{\mathcal {T}}}_{0}\subseteq {\mathcal {T}}_{0}\). Also, the definition of \(\pi _{S}^{(h,adj)}\) implies that a falsely rejected \(S\in {\mathcal {T}}_{0}-\widetilde{{\mathcal {T}}}_{0}\) implies a falsely rejected \(S^{\prime }\in \widetilde{{\mathcal {T}}}_{0}\) with \(S\subset S^{\prime }\). Thus, we need only consider the probability of committing a Type 1 error in \(\widetilde{{\mathcal {T}}}_{0}\). But because \(\pi _{S}^{(h,adj)}\geqslant \pi _{S}^{(adj)}\),
by Bonferroni’s inequality. Notice that \({\mathbb {P}}_{H_{0}}[{\hat{Q}}_{S}] ={\mathbb {E}}_{H_{0}}({\hat{b}}-{\hat{a}}) = {\mathbb {P}}_{H_{0}}[Q_{S}]+o(1)\). Now, writing \(G_{n}^{S}(\cdot )\) and \(G_{\infty }^{S}(\cdot )\) for the exact and asymptotic CDF under \(\widetilde{{\mathcal {T}}}_{0}\) of the test statistic \({\mathcal {R}}_{M}^{{\hat{Q}}_{S}}\), we have
Hence, \(\sum \nolimits _{S\in \widetilde{{\mathcal {T}}}_{0}}{\mathbb {P}}_{{\mathcal {T}}_{0}}[\pi _{S}^{(adj)}<\alpha ] <\alpha \sum \nolimits _{S\in \widetilde{{\mathcal {T}}}_{0}}{\mathbb {P}}_{H_{0}}[Q_{S}]+o(1).\) It only remains to show that \(\sum _{S\in \widetilde{{\mathcal {T}}}_{0}}{\mathbb {P}}_{H_{0}}[Q_{S}] \leqslant 1\). But by the construction of \(\widetilde{{\mathcal {T}}}_{0}\), \(\forall S\ne S^{\prime }\in \widetilde{{\mathcal {T}}}_{0}: S\cap S^{\prime }=\emptyset \). Hence \(\bigcup _{S\in \widetilde{{\mathcal {T}}}_{0}}S\subseteq \{ 1,\ldots ,K\}\). Because \({\mathbb {P}}_{H_{0}}[Q_{S}]={\mathbb {P}}_{H_{0}}[X\in \bigcup _{k\in S}P_{k}]\), we have
and this completes the proof. \(\square \)
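For reference, the bounds established in this proof chain together as follows (a sketch assembled from the surrounding steps, not the paper's numbered display):

```latex
{\mathbb {P}}[\text{family-wise Type 1 error}]
  \le \sum _{S\in \widetilde{{\mathcal {T}}}_{0}}
        {\mathbb {P}}_{{\mathcal {T}}_{0}}\big [\pi _{S}^{(adj)}<\alpha \big ]
  < \alpha \sum _{S\in \widetilde{{\mathcal {T}}}_{0}}
        {\mathbb {P}}_{H_{0}}[Q_{S}] + o(1)
  \le \alpha + o(1),
```

where the last step holds because the maximal sets in \(\widetilde{{\mathcal {T}}}_{0}\) are pairwise disjoint, so their null probabilities sum to at most one.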
Finally, we prove Theorem 3, taking the \(Q_{S}\) as given for simplicity. From the above, it suffices to show that \(\sum \nolimits _{S\in \widetilde{{\mathcal {T}}}_{0}}{\mathbb {P}}_{H_{0}}^{eff}[Q_{S}] \le 1\). Further assume that there exists only one \(S^{*}\in \widetilde{{\mathcal {T}}}_{0}\) such that \({\mathbb {P}}_{H_{0}}^{eff}[Q_{S^{*}}]>{\mathbb {P}}_{H_{0}}[Q_{S^{*}}]\). Because the tree is binary,
Now, identifiability along with the relation \({\mathbb {P}}_{H_{0}}^{eff}[Q_{S^{*}}]>{\mathbb {P}}_{H_{0}}[Q_{S^{*}}]\) implies that \(si(S^{*})\notin \widetilde{{\mathcal {T}}}_{0}\), for otherwise \(pa(S^{*})\in \widetilde{{\mathcal {T}}}_{0}\), which in turn implies \(S^{*}\notin \widetilde{{\mathcal {T}}}_{0}\). The conclusion follows from
Ducharme, G.R., Al Akhras, W. Tree based diagnostic procedures following a smooth test of goodness-of-fit. Metrika 79, 971–989 (2016). https://doi.org/10.1007/s00184-016-0585-9