Abstract
We focus on valid definitions of p-values. A valid p-value (VpV) statistic can be used to make a prefixed level-\( \alpha \) decision. In this context, Kolmogorov–Smirnov goodness-of-fit tests and the normal two-sample problem are considered. We examine an issue regarding goodness-of-fit testability based on a single observation. We exemplify constructions of new test procedures, advocating practical reasons to implement VpV mechanisms. The VpV framework induces an extension of the conventional expected p-value (EPV) tool for measuring the performance of a test. Associating the EPV concept with the receiver operating characteristic (ROC) curve methodology, a well-established biostatistical approach, we propose a Youden's index-based optimality criterion for deriving critical values of tests. In these terms, the significance level \( \alpha = 0.05 \) is suggested. We introduce partial EPVs to characterize properties of tests, including their unbiasedness. We provide the intrinsic relationship between the Bayes factor (BF) test statistic and the BF of test statistics.
References
Bayarri, M. J., Berger, J. O. (2000). P values for composite null models. Journal of the American Statistical Association, 95, 1127–1142.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., et al. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10. https://doi.org/10.1038/s41562-017-0189-z.
Berger, R. L., Boos, D. D. (1994). P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association, 89, 1012–1016.
Dempster, A. P., Schatzoff, M. (1965). Expected significance level as a sensitivity index for test statistics. Journal of the American Statistical Association, 60, 420–436.
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309–368.
Henze, N., Meintanis, S. G. (2005). Recent and classical tests for exponentiality: A partial review with comparisons. Metrika, 61, 29–45.
Ionides, E. L., Giessing, A., Ritov, Y., Page, S. E. (2017). Response to the ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 71(1), 88–89. https://doi.org/10.1080/00031305.2016.1234977.
Lazar, N. A., Mykland, P. A. (1998). An evaluation of the power and conditionality properties of empirical likelihood. Biometrika, 85(3), 523–534.
Lehmann, E. L., Romano, J. P. (2006). Testing statistical hypotheses. New York: Springer.
Portnoy, S. (2019). Invariance, optimality and a 1-observation confidence interval for a normal mean. The American Statistician, 73, 10–15. https://doi.org/10.1080/00031305.2017.1360796.
R Development Core Team. (2002). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
Sackrowitz, H., Samuel-Cahn, E. (1999). P values as random variables: Expected p values. The American Statistician, 53, 326–331.
Schisterman, E. F., Vexler, A. (2008). To pool or not to pool, from whether to when: Applications of pooling to biospecimens subject to a limit of detection. Paediatric and Perinatal Epidemiology, 22, 486–496.
Schisterman, E. F., Perkins, N. J., Liu, A., Bondell, H. (2005). Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology, 16, 73–81.
Schisterman, E. F., Vexler, A., Ye, A., Perkins, N. J. (2011). A combined efficient design for biomarker data subject to a limit of detection due to measuring instrument sensitivity. The Annals of Applied Statistics, 5, 2651–2667.
Silvapulle, M. L. (1996). A test in the presence of nuisance parameters. Journal of the American Statistical Association, 91, 1690–1693.
Vexler, A., Hutson, A. D. (2018). Statistics in the health sciences: Theory, applications, and computing. New York: CRC Press.
Vexler, A., Yu, J. (2018). To t-test or not to t-test? A p-values-based point of view in the receiver operating characteristic curve framework. Journal of Computational Biology, 25, 541–550. https://doi.org/10.1089/cmb.2017.0216.
Vexler, A., Schisterman, E. F., Liu, A. (2008). Estimation of ROC based on stably distributed biomarkers subject to measurement error and pooling mixtures. Statistics in Medicine, 27, 280–296.
Vexler, A., Yu, J., Zhao, Y., Hutson, A. D., Gurevich, G. (2018). Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures. Statistical Methods in Medical Research, 27, 3560–3576. https://doi.org/10.1177/0962280217704451.
Wang, J., Tsang, W. W., Marsaglia, G. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18), 1–4.
Wasserstein, R. L., Lazar, N. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129–133.
Acknowledgements
Dr. Vexler’s effort was supported by the National Institutes of Health (NIH) Grant 1G13LM012241-01. I thank Professor Berger for many useful comments and discussions. The author is grateful to the Editor, the Associate Editor and the referees for suggestions that led to a substantial improvement in this paper.
Appendix
Proof of Proposition 1
Consider, for \( u > 0 \),
where the function \( 1 - \exp ( - \theta u) \) increases and the function \( \exp ( - \theta u) \) decreases with respect to \( u > 0 \). Then, the function \( D(u) \) increases, for \( u < X_{1} \), and decreases, for \( u \ge X_{1} \). Thus,
Assume \( 1 - \exp ( - \theta X_{1} ) < \exp ( - \theta X_{1} ) \). In this case, \( \theta < \log \left( 2 \right)/X_{1} \) and \( D_{1} (\theta ) = \exp ( - \theta X_{1} ) \), which is a decreasing function of \( \theta \). Assume \( 1 - \exp ( - \theta X_{1} ) \ge \exp ( - \theta X_{1} ) \). In this case, \( \theta \ge \log \left( 2 \right)/X_{1} \) and \( D_{1} (\theta ) = 1 - \exp ( - \theta X_{1} ) \), which is an increasing function of \( \theta \). Thus, \( D_{1} (\theta ) \) decreases, for \( \theta < \log \left( 2 \right)/X_{1} \), and increases, for \( \theta \ge \log \left( 2 \right)/X_{1} \). That is, we conclude that \( \inf_{0 < \theta < \infty } D_{1} (\theta ) = D_{1} \left( {\log (2)/X_{1} } \right) = 0.5 \). By virtue of (8), the proof is complete.□
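The infimum \( \inf_{0 < \theta < \infty } D_{1} (\theta ) = D_{1} \left( {\log (2)/X_{1} } \right) = 0.5 \) can be verified numerically. The following is a minimal Python sketch (the paper's computations use R); the grid, the illustrative value of \( X_{1} \), and the function names are ours:

```python
import math

def D1(theta, x1):
    # D_1(theta) = max{1 - exp(-theta*x1), exp(-theta*x1)}, as in the proof
    return max(1.0 - math.exp(-theta * x1), math.exp(-theta * x1))

x1 = 2.3  # an arbitrary positive observation (illustrative)
# Scan a grid of theta values; the minimum should occur at theta = log(2)/x1.
grid = [0.001 + 0.001 * k for k in range(5000)]
theta_star = min(grid, key=lambda t: D1(t, x1))
print(D1(theta_star, x1))   # close to 0.5
print(theta_star * x1)      # close to log(2) = 0.693...
```

The grid minimizer recovers \( \theta X_{1} \approx \log (2) \) with minimal value \( \approx 0.5 \), in agreement with the proof.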
Proof of Proposition 2
Define the notation \( H_{0} (\theta_{0} ) \) to indicate the hypothesis \( H_{0} \) when the true value of \( \theta \) is \( \theta_{0} \). Now, we will obtain bounds related to the interval \( C_{\beta } \). The function \( u^{ - 1} \exp \left( {u - 1} \right),u > 0, \) has a global minimum at \( u = 1 \). Then, the threshold \( A_{\beta } \) satisfies \( A_{\beta } > 1 \), in order to provide a solution of \( \Pr_{{H_{0} \left( {\theta_{0} } \right)}} \left\{ {\left( {\theta_{0} X_{1} } \right)^{ - 1} \exp \left( {\theta_{0} X_{1} - 1} \right) < A_{\beta } } \right\} = 1 - \beta \). Let \( 0 < u_{0} < 1 < u_{1} \) be roots of the equation \( u^{ - 1} \exp \left( {u - 1} \right) = A_{\beta } \). The roots \( 0 < u_{0} < 1 < u_{1} \) exist, since \( A_{\beta } > 1 \) and the function \( u^{ - 1} \exp \left( {u - 1} \right) \) monotonically decreases, for \( 0 < u \le 1 \), and increases, for \( u > 1 \). This behavior of the function \( u^{ - 1} \exp \left( {u - 1} \right) \) can be used to show that
This defines the system of equations
Then, given \( \beta \), one can derive values of \( u_{0} \) and \( u_{1} \) that do not depend on values of \( \theta \) and provide \( \Pr_{{H_{0} \left( {\theta_{0} } \right)}} \left\{ {u_{0} < \theta_{0} X_{1} < u_{1} } \right\} = 1 - \beta \). Figure 1 presents numerical solutions of (11), depending on \( \beta \in (0,1) \). Then, we have \( \log (2) \in (u_{0} ,u_{1} ) \), for \( \beta \le 0.75 \). According to the proof of Proposition 1, \( \inf_{0 < \theta < \infty } D_{1} (\theta ) = D_{1} \left( {\log (2)/X_{1} } \right) = 0.5 \). That is, \( \inf_{{\theta \in C_{\beta } }} D_{1} (\theta ) \) \( = \inf_{{u_{0} < \theta X_{1} < u_{1} }} D_{1} (\theta ) \) \( = D_{1} \left( {\log (2)/X_{1} } \right) = 0.5 \), for \( \beta \le 0.75 \). By virtue of (8), this completes the proof.□
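Although the display of system (11) is not reproduced here, the surrounding text indicates that it pairs the equal-height condition \( u_{0}^{ - 1} \exp (u_{0} - 1) = u_{1}^{ - 1} \exp (u_{1} - 1) = A_{\beta } \) shared by the two roots with the coverage condition \( \exp ( - u_{0} ) - \exp ( - u_{1} ) = 1 - \beta \), the latter because \( \theta_{0} X_{1} \) is standard exponential under \( H_{0} (\theta_{0} ) \). A Python sketch solving this system by bisection (the function names and the illustrative choice \( \beta = 0.5 \) are ours; the paper's numerical solutions, shown in its Figure 1, were computed in R):

```python
import math

def h(u):
    # h(u) = exp(u - 1)/u, u > 0, with global minimum h(1) = 1
    return math.exp(u - 1.0) / u

def upper_root(a):
    # Root u1 > 1 of h(u1) = a, by bisection (h increases on (1, inf)).
    lo, hi = 1.0, 2.0
    while h(hi) < a:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) < a else (lo, mid)
    return 0.5 * (lo + hi)

def solve_system(beta):
    """Find 0 < u0 < 1 < u1 with h(u0) = h(u1) and
    exp(-u0) - exp(-u1) = 1 - beta (theta0*X1 ~ Exp(1) under H0)."""
    lo, hi = 1e-12, 1.0 - 1e-12   # bisection on u0 in (0, 1)
    for _ in range(200):
        u0 = 0.5 * (lo + hi)
        u1 = upper_root(h(u0))
        coverage = math.exp(-u0) - math.exp(-u1)
        if coverage > 1.0 - beta:
            lo = u0   # interval too wide: raise u0
        else:
            hi = u0
    return u0, u1

u0, u1 = solve_system(0.5)
print(u0 < math.log(2) < u1)   # log(2) lies in (u0, u1) for beta <= 0.75
```

The coverage \( \exp ( - u_{0} ) - \exp ( - u_{1} ) \) decreases as \( u_{0} \) grows, which is what makes the outer bisection on \( u_{0} \) valid.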
Proof of Proposition 3
It is clear that
Assume \( F_{{X_{1} ,0}} \left( {X_{1} } \right) \ge 1 - F_{{X_{1} ,0}} \left( {X_{1} } \right) \), i.e., \( F_{{X_{1} ,0}} \left( {X_{1} } \right) \ge 1/2 \). In this case, \( F_{{X_{1} ,0}} \left( {X_{1} } \right) = \left( {2\pi } \right)^{ - 1/2} \int_{ - \infty }^{{X_{1} - \theta }} {\exp \left( { - z^{2} /2} \right){\text{d}}z} \), where \( \theta \le X_{1} \), and then \( D_{1} (\theta ) = F_{{X_{1} ,0}} \left( {X_{1} } \right) \) is a decreasing function with respect to \( \theta . \) Assume \( F_{{X_{1} ,0}} \left( {X_{1} } \right) < 1 - F_{{X_{1} ,0}} \left( {X_{1} } \right) \), i.e., \( F_{{X_{1} ,0}} \left( {X_{1} } \right) < 1/2 \). In this case, \( \theta > X_{1} \) and \( D_{1} (\theta ) = 1 - F_{{X_{1} ,0}} \left( {X_{1} } \right) \) increases with respect to \( \theta . \) Thus, \( \inf_{ - \infty < \theta < \infty } D_{1} (\theta ) = D_{1} \left( {X_{1} } \right) = 0.5 \). The point \( \theta = X_{1} \) belongs to the interval \( C_{\beta } = \left\{ {\theta :\,\exp \left( {\left( {X_{1} - \theta } \right)^{2} /2} \right) < A_{\beta } } \right\} \). Therefore, \( \inf_{{\theta \in C_{\beta } }} D_{1} (\theta ) = D_{1} \left( {X_{1} } \right) = 0.5 \). The proof is complete.□
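For the normal-mean case, the conclusion \( \inf_{ - \infty < \theta < \infty } D_{1} (\theta ) = D_{1} (X_{1} ) = 0.5 \) can likewise be confirmed numerically. A Python sketch (the observed value \( X_{1} \) and function names are illustrative; \( \Phi \) is computed via the error function):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def D1(theta, x1):
    # D_1(theta) = max{F_{X1,0}(X1), 1 - F_{X1,0}(X1)}, F_{X1,0}(X1) = Phi(x1 - theta)
    p = Phi(x1 - theta)
    return max(p, 1.0 - p)

x1 = 1.7  # an arbitrary observed value (illustrative)
thetas = [x1 + 0.01 * k for k in range(-500, 501)]
theta_star = min(thetas, key=lambda t: D1(t, x1))
print(theta_star)               # the minimizer is theta = x1
print(D1(theta_star, x1))       # minimal value 0.5
```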
Proof of Proposition 4
We have
If \( X_{1} > 0 \), then \( F_{{X_{1} ,0}} \left( {X_{1} } \right) > 1/2 \) and \( F_{{X_{1} ,0}} \left( {X_{1} } \right) > 1 - F_{{X_{1} ,0}} \left( {X_{1} } \right) \). In this case, since \( D_{1} (\theta ) = F_{{X_{1} ,0}} \left( {X_{1} } \right) \) is a decreasing function with respect to \( \theta > 0 \), \( \inf_{0 < \theta < \infty } D_{1} (\theta ) = D_{1} \left( \infty \right) = 0.5 \).
If \( X_{1} < 0 \), then \( D_{1} (\theta ) = 1 - F_{{X_{1} ,0}} \left( {X_{1} } \right) \) is a decreasing function with respect to \( \theta > 0 \), and \( \inf_{0 < \theta < \infty } D_{1} (\theta ) = D_{1} \left( \infty \right) = 0.5 \).
Now, we consider \( p_{C} \). Note that since the function \( u^{ - 1/2} \exp \left( {u/2 - 1/2} \right),u > 0, \) has a global minimum at \( u = 1 \), in order to provide a solution of \( \Pr \left\{ {\eta^{ - 1/2} \exp \left( {\eta /2 - 1/2} \right) > A_{\beta } } \right\} = \beta \), where \( \eta \sim\chi_{1}^{2} \), the threshold \( A_{\beta } \) should satisfy \( A_{\beta } > 1 \). Thus, we have \( 0 < u_{0} < 1 < u_{1} \) that are roots of the equation \( u^{ - 1} \exp \left( {u^{2} /2 - 1/2} \right) = A_{\beta } \) and
According to the above proof scheme, \( D_{1} (\theta ) \) is a decreasing function with respect to \( \theta > 0 \) and then we obtain \( p_{C} = 1 - F_{{KS_{n} ,0}}^{{}} \left( {D_{1} \left( {\left| {X_{1} } \right|/u_{0} } \right)} \right) + \beta \), for \( \theta \in C_{\beta } \), where
since \( D_{1} (\theta ) = \hbox{max} \left\{ {F_{{X_{1} ,0}} \left( {X_{1} } \right),1 - F_{{X_{1} ,0}} \left( {X_{1} } \right)} \right\} \). The distribution function \( \int_{ - \infty }^{u} {\exp \left( { - z^{2} /2} \right){\text{d}}z} /\left( {2\pi } \right)^{1/2} = 1 - \int_{ - \infty }^{ - u} {\exp \left( { - z^{2} /2} \right){\text{d}}z} /\left( {2\pi } \right)^{1/2} \) is symmetric. This implies
Now, one can use simple R code (R Development Core Team 2002) to compute accurate Monte Carlo approximations to \( p_{C} = 1 - F_{{KS_{n} ,0}}^{{}} \left( {D_{1} \left( {\int_{ - \infty }^{{u_{0} }} {\exp (- z^{2} /2 ) {\text{d}}z} /(2\pi )^{1/2} } \right)} \right) + \beta \), showing that \( p_{C} \ge 1 \) and that \( p_{C} \) increases when \( \beta \) increases. The proof is complete.□
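The roots \( 0 < u_{0} < 1 < u_{1} \) used in this proof can be computed in the same manner as in Proposition 2. A Python sketch (our function names; we use that, writing \( u = \eta^{1/2} \) with \( \eta \sim \chi_{1}^{2} \), the defining condition on \( A_{\beta } \) is equivalent to \( \Pr \{ u_{0} < |Z| < u_{1} \} = 2\left( {\Phi (u_{1} ) - \Phi (u_{0} )} \right) = 1 - \beta \) for standard normal \( Z \)):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def h(u):
    # h(u) = exp(u^2/2 - 1/2)/u, u > 0, with global minimum h(1) = 1
    return math.exp(0.5 * u * u - 0.5) / u

def upper_root(a):
    # Root u1 > 1 of h(u1) = a, by bisection (h increases on (1, inf)).
    lo, hi = 1.0, 2.0
    while h(hi) < a:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) < a else (lo, mid)
    return 0.5 * (lo + hi)

def solve_roots(beta):
    """Find 0 < u0 < 1 < u1 with h(u0) = h(u1) = A_beta and
    Pr{u0 < |Z| < u1} = 2*(Phi(u1) - Phi(u0)) = 1 - beta."""
    lo, hi = 1e-12, 1.0 - 1e-12   # bisection on u0 in (0, 1)
    for _ in range(200):
        u0 = 0.5 * (lo + hi)
        u1 = upper_root(h(u0))
        if 2.0 * (Phi(u1) - Phi(u0)) > 1.0 - beta:
            lo = u0   # interval too wide: raise u0
        else:
            hi = u0
    return u0, u1

u0, u1 = solve_roots(0.1)
print(0.0 < u0 < 1.0 < u1)
print(2.0 * (Phi(u1) - Phi(u0)))   # close to 1 - beta = 0.9
```

With \( u_{0} \) in hand, the term \( \int_{ - \infty }^{{u_{0} }} {\exp ( - z^{2} /2)\,{\text{d}}z} /(2\pi )^{1/2} \) in the expression for \( p_{C} \) is simply \( \Phi (u_{0} ) \); the Monte Carlo step over \( F_{{KS_{n} ,0}} \) is the part the paper delegates to R.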
Proof of Proposition 5
Consider, for non-random variables \( u \) and \( s \), the probability
This implies the inequalities
Dividing these inequalities by \( s \) and letting \( s \to 0 \), we obtain Proposition 5.□
Proof of Proposition 6
Define the power function \( g(u) = \Pr (p{\text{-value}} < u|H_{1} ) \). We have \( \int_{0}^{\alpha } {g(u)} {\text{d}}u \ge \alpha^{2} /2 \), where \( \int_{0}^{\alpha } {g(u)} {\text{d}}u = \left. {g(u)u} \right|_{u = 0}^{u = \alpha } - \int_{0}^{\alpha } {uw(u)} {\text{d}}u \), \( w(u) = {\text{d}}g(u)/{\text{d}}u \). Since \( g(u) = \Pr \left\{ {1 - F_{T,0} (T) < u|H_{1} } \right\} = 1 - \Pr \left\{ {T < F_{T,0}^{ - 1} (1 - u)|H_{1} } \right\} = 1 - F_{T,1} \left( {F_{T,0}^{ - 1} (1 - u)} \right) \), we obtain \( w(u) = f_{T,1} (C_{u} )/f_{T,0} (C_{u} ) \) with \( C_{u} = F_{T,0}^{ - 1} (1 - u) \). It is clear that when \( u \nearrow \), the corresponding critical values \( C_{u} \searrow \) and then the likelihood ratio \( f_{T,1} (C_{u} )/f_{T,0} (C_{u} ) \searrow \). This implies \( \alpha^{2} /2 \le \int_{0}^{\alpha } {g(u)} {\text{d}}u = g(\alpha )\alpha - \int_{0}^{\alpha } {uw(u)} {\text{d}}u \le g(\alpha )\alpha - w(\alpha )\int_{0}^{\alpha } u {\text{d}}u \) that completes the proof.□
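The chain of inequalities can be made concrete under the simplest shift alternative. The following Python sketch assumes \( T\sim N(0,1) \) under \( H_{0} \) and \( T\sim N(\delta ,1) \) under \( H_{1} \) (this model, and the values of \( \alpha \) and \( \delta \), are our illustrative assumptions, not part of the proof); it checks numerically that \( \int_{0}^{\alpha } {g(u)} \,{\text{d}}u \ge \alpha^{2} /2 \), a bound consistent with \( g(u) \ge u \) for an unbiased test:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p):
    # Inverse normal CDF by bisection (illustrative precision only)
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g(u, delta):
    # Power of the level-u test: g(u) = 1 - Phi(Phi^{-1}(1 - u) - delta)
    return 1.0 - Phi(Phi_inv(1.0 - u) - delta)

alpha, delta = 0.05, 1.0
n = 1000
# Trapezoid approximation of int_0^alpha g(u) du
xs = [alpha * k / n for k in range(n + 1)]
integral = sum(0.5 * (g(xs[k], delta) + g(xs[k + 1], delta)) * (alpha / n)
               for k in range(n))
print(integral >= alpha ** 2 / 2)   # the bound used in the proof
print(g(alpha, delta) > alpha)      # g(alpha) > alpha: the test is unbiased here
```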
Vexler, A. Valid p-values and expectations of p-values revisited. Ann Inst Stat Math 73, 227–248 (2021). https://doi.org/10.1007/s10463-020-00747-2