A Modified Neighborhood Hypothesis Test for Population Mean in Functional Data

Journal of Agricultural, Biological and Environmental Statistics

Abstract

When dealing with very high-dimensional and functional data, rank deficiency of the sample covariance matrix often complicates tests for the population mean. To alleviate this rank deficiency problem, Munk et al. (J Multivar Anal 99:815–833, 2008) proposed a neighborhood hypothesis testing procedure that tests whether the population mean is within a small, pre-specified neighborhood of a known quantity, M. How can we objectively specify a reasonable neighborhood, particularly when the sample space is unbounded? What should the size of the neighborhood be? In this article, we develop a modified neighborhood hypothesis testing framework to answer these two questions. We define the neighborhood as a proportion of the total amount of variation present in the population of functions under study and derive the asymptotic null distribution of the appropriate test statistic. Power analyses suggest that our approach is appropriate when the sample space is unbounded and is robust against error structures with nonzero mean. We then apply this framework to assess whether the near-default sigmoidal specification of dose-response curves is adequate for the widely used CCLE database. Results suggest that our methodology could be used as a pre-processing step before using conventional efficacy metrics obtained from sigmoid models (for example, IC\(_{50}\) or AUC) as downstream predictive targets.

References

  • Arya AK, El-Fert A, Devling T, Eccles RM, Aslam MA, Carlos P, Vlatković N, Fenwick J, Lloyd BH, Sibson DR et al (2010) Nutlin-3, the small-molecule inhibitor of MDM2, promotes senescence and radiosensitises laryngeal carcinoma cells harbouring wild-type p53. Br J Cancer 103(2):186–195

  • Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D et al (2012) The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483(7391):603–607

  • Berger JO, Delampady M (1987) Testing precise hypotheses. Stat Sci 2:317–352

  • Cohen J (1988) Statistical power analysis for the behavioral sciences. Routledge, Milton Park

  • De Niz C, Rahman R, Zhao X, Pal R (2016) Algorithms for drug sensitivity prediction. Algorithms 9(4):77

  • Dette H, Munk A (1998) Validation of linear regression models. Ann Stat 26:778–800

  • Dette H, Munk A (2003) Some methodological aspects of validation of models in nonparametric regression. Stat Neerl 57:207–244

  • Ellingson L, Patrangenaru V, Ruymgaart FH (2013) Nonparametric estimation of means on Hilbert manifolds and extrinsic analysis of mean shapes of contours. J Multivar Anal 122:317–333

  • Hodges JL, Lehmann EL (1954) Testing the approximate validity of statistical hypotheses. J Roy Stat Soc B 16(2):261–268

  • Kuelbs J, Vidyashankar A (2010) Asymptotic inference for high-dimensional data. Ann Stat 38(2):836–869

  • Ma J, Fong SH, Luo Y, Bakkenist CJ, Shen JP, Mourragui S, Wessels LFA, Hafner M, Sharan R, Peng J et al (2021) Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat Cancer 2(2):233–244

  • Munk A, Paige R, Pang J, Patrangenaru V, Ruymgaart F (2008) The one and multi sample problem for functional data with application to projective shape analysis. J Multivar Anal 99:815–833

  • Patrangenaru V, Ellingson L (2015) Nonparametric statistics on manifolds and their applications to object data analysis. Chapman & Hall/CRC, London

  • Ramsay JO, Silverman BW (2005) Functional data analysis. Springer series in statistics. Springer

  • Safikhani Z, Smirnov P, Thu KL, Silvester J, El-Hachem N, Quevedo R, Lupien M, Mak TW, Cescon D, Haibe-Kains B (2017) Gene isoforms as expression-based biomarkers predictive of drug response in vitro. Nat Commun 8:1126

  • Sawilowsky S (2009) New effect size rules of thumb. J Modern Appl Stat Methods 8(2):467–474. https://doi.org/10.22237/jmasm/1257035100

  • Wainwright MJ (2019) High-dimensional statistics: a non-asymptotic viewpoint, vol 48. Cambridge University Press, Cambridge

  • Wan Q, Pal R (2014) An ensemble based top performing approach for NCI-DREAM drug sensitivity prediction challenge. PLoS ONE 9(6):e101183

  • Xu M, Zhang D, Wu W (2014) L2 asymptotics for high-dimensional data. arXiv:1405.7244

Funding

Funding was provided by the National Science Foundation (CCF-2007418, CCF-2007903).

Author information

Corresponding author

Correspondence to Dhanamalee Bandara.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Proofs for Section 3 The Modified Neighborhood Hypothesis Test

Lemma 3.1

If \(X_{1}, \ldots , X_{n}\) are independent and identically distributed random elements in a Hilbert space \({\mathbb {H}}\) with population mean \(\mu \in {\mathbb {H}}\) and covariance operator \(\Sigma :{\mathbb {H}} \rightarrow {\mathbb {H}}\) such that \(E \left( ||X||^{4} \right) < \infty ,\) then

$$\begin{aligned} \sigma_1^2 &= \textrm{Var}\left( \frac{\sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right)}{\tau} \right) \\ &= 1 - \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \frac{\gamma^2}{\tau^2} \left[ E[\rho^4(\mu,X)] - {\textrm{v}_F}^2 \right]. \end{aligned}$$
(8)

Proof

The test statistic \(T_1\) can be decomposed as follows:

$$\begin{aligned} T_1 &= \frac{\sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F} + \gamma {\hat{\textrm{v}}_F} - \gamma {\textrm{v}_F}\right)}{\tau} \\ &= \frac{\sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right)}{\tau} + \frac{\gamma \sqrt{n}\left( {\hat{\textrm{v}}_F} - {\textrm{v}_F}\right)}{\tau}. \end{aligned}$$
(18)

From Patrangenaru and Ellingson (2015, pg. 179), we also know that

$$\begin{aligned} \sqrt{n} ({\hat{\textrm{v}}_F}- {\textrm{v}_F}) \rightarrow _d N \left( 0,E \left[ \rho ^4(\mu ,X) \right] -{\textrm{v}_F}^2 \right) . \end{aligned}$$

As such,

$$\begin{aligned} \sigma_2^2 = \textrm{Var}\left( \frac{\gamma}{\tau} \sqrt{n}\, ({\hat{\textrm{v}}_F} - {\textrm{v}_F}) \right) = \frac{\gamma^2}{\tau^2}\left( E\left[ \rho^4(\mu,X) \right] - {\textrm{v}_F}^2 \right). \end{aligned}$$
(19)

From Sect. 2, we know that \(\textrm{Var}(T_1)=1\). Combining this with the above results yields

$$\begin{aligned} 1 &= \textrm{Var}(T_1) = \sigma_1^2 + \sigma_2^2 + 2\,\textrm{Cov}\left( \frac{\sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right)}{\tau}, \frac{\gamma}{\tau} \sqrt{n}\, ({\hat{\textrm{v}}_F} - {\textrm{v}_F}) \right) \\ &= \sigma_1^2 + \sigma_2^2 + \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}, {\hat{\textrm{v}}_F} - {\textrm{v}_F}\right) \\ &= \sigma_1^2 + \sigma_2^2 + \frac{2\gamma n}{\tau^2} \left[ \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) - \textrm{Cov}\left( \varphi_M({\overline{X}}), {\textrm{v}_F}\right) - \gamma\, \textrm{Cov}\left( {\hat{\textrm{v}}_F}, {\hat{\textrm{v}}_F}\right) + \gamma\, \textrm{Cov}\left( {\hat{\textrm{v}}_F}, {\textrm{v}_F}\right) \right] \\ &= \sigma_1^2 + \sigma_2^2 + \frac{2\gamma n}{\tau^2} \left[ \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) - \gamma\, \textrm{Var}\left( {\hat{\textrm{v}}_F}\right) \right] \\ &= \sigma_1^2 + \sigma_2^2 + \frac{2\gamma n}{\tau^2} \left[ \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) - \gamma\, \frac{\tau^2}{\gamma^2 n}\, \sigma_2^2 \right] \\ &= \sigma_1^2 + \sigma_2^2 + \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) - \frac{2\gamma n}{\tau^2}\, \frac{\tau^2}{\gamma n}\, \sigma_2^2 \\ &= \sigma_1^2 - \sigma_2^2 + \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right). \end{aligned}$$
(20)

Solving for \(\sigma _1^2\) combined with (19) yields

$$\begin{aligned} \sigma _1^2=1- \frac{2\gamma n}{\tau ^2} \textrm{Cov} \left( \varphi _M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \frac{\gamma ^2}{\tau ^2} \left[ E[\rho ^4(\mu ,X)] - {\textrm{v}_F}^2 \right] . \end{aligned}$$
(21)

\(\square \)
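
As a cross-check, (21) can also be reached without the intermediate steps of (20) by expanding the variance of the difference directly. The sketch below assumes, as the decomposition (18) implies, that \(T_1 = \sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\textrm{v}_F}\right)/\tau\) with \({\textrm{v}_F}\) a constant, and, as in (19), it replaces \(n\,\textrm{Var}({\hat{\textrm{v}}_F})\) by its limiting value:

$$\begin{aligned} \sigma_1^2 &= \frac{n}{\tau^2}\, \textrm{Var}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right) \\ &= \frac{n}{\tau^2}\, \textrm{Var}\left( \varphi_M({\overline{X}}) \right) - \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \frac{\gamma^2 n}{\tau^2}\, \textrm{Var}\left( {\hat{\textrm{v}}_F}\right) \\ &= \textrm{Var}(T_1) - \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \sigma_2^2 \\ &= 1 - \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \frac{\gamma^2}{\tau^2} \left[ E[\rho^4(\mu,X)] - {\textrm{v}_F}^2 \right]. \end{aligned}$$

The first term in the second line equals \(\textrm{Var}(T_1)\) because subtracting the constant \(\gamma {\textrm{v}_F}\) does not change the variance, and the third term equals \(\sigma_2^2\) for the same reason.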

Lemma 3.2

Under the conditions of Lemma 3.1,

$$\begin{aligned} \frac{\sqrt{n}\left( \varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right)}{\tau} \rightarrow_d N\left( 0,\; 1 - \frac{2\gamma n}{\tau^2}\, \textrm{Cov}\left( \varphi_M({\overline{X}}), {\hat{\textrm{v}}_F}\right) + \frac{\gamma^2}{\tau^2} \left[ E[\rho^4(\mu,X)] - {\textrm{v}_F}^2 \right] \right). \end{aligned}$$
(9)

Proof

This follows immediately from the decomposition (18) and the variance expression (8). \(\square \)
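
The limit law for \({\hat{\textrm{v}}_F}\) used in the proof of Lemma 3.1 can be checked numerically in a simple special case. The following is a minimal sketch, not part of the paper's analysis: it assumes finite-dimensional Gaussian data \(N(0, I_d)\) in place of functional data, takes \(\rho\) to be the Euclidean distance, and takes \({\hat{\textrm{v}}_F}\) to be the mean squared distance to the sample mean. Under these assumptions \({\textrm{v}_F} = d\) and \(E[\rho^4(\mu,X)] - {\textrm{v}_F}^2 = 2d\). The function names are hypothetical.

```python
import numpy as np

def frechet_variance(sample):
    """Assumed form of v_hat_F: mean squared Euclidean distance to the sample mean."""
    centered = sample - sample.mean(axis=0)
    return float(np.mean(np.sum(centered ** 2, axis=1)))

def simulate_scaled_errors(n, d, reps, seed=0):
    """Return `reps` Monte Carlo draws of sqrt(n) * (v_hat_F - v_F) for N(0, I_d) data."""
    rng = np.random.default_rng(seed)
    v_true = float(d)  # E||X - mu||^2 = d when X ~ N(0, I_d)
    draws = np.empty(reps)
    for r in range(reps):
        sample = rng.standard_normal((n, d))
        draws[r] = np.sqrt(n) * (frechet_variance(sample) - v_true)
    return draws

d = 5
errors = simulate_scaled_errors(n=400, d=d, reps=2000)

# E[rho^4] = d^2 + 2d for a chi-square(d) squared distance, so the limit variance is 2d.
limit_var = (d ** 2 + 2 * d) - d ** 2

print("Monte Carlo variance:", errors.var())
print("Limiting variance   :", limit_var)
```

The two printed values should be close for moderately large `n`; the small remaining gap reflects Monte Carlo error and the finite-sample bias of \({\hat{\textrm{v}}_F}\).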

Theorem 3.1

Under the conditions of Lemma 3.1 and the mild assumption that \({\hat{\sigma }}_1^2 >0\), we arrive at the following asymptotic result:

$$\begin{aligned} T_2=\frac{\sqrt{n} \left( \varphi _M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F}\right) }{{{\hat{\tau }} {\hat{\sigma }}_1}} \rightarrow _d N(0,1). \end{aligned}$$
(11)

Proof

By the proof of Lemma 3.1 and results from nonparametric bootstrap theory, \({\hat{\sigma }}_1^2\) is a consistent estimator of \(\sigma _1^2\) provided that \({\hat{\sigma }}_1^2>0\). Applying Slutsky’s Theorem to the result of Lemma 3.2 then yields the result. \(\square \)
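
To make the statistic in (11) concrete, here is a minimal Python sketch of how \(T_2\) might be computed. It is an illustration under stated assumptions, not the authors' implementation: curves are assumed to be discretized on a common grid (one row per curve), \(\varphi_M(x)\) is assumed to be the squared distance \(\Vert x - M\Vert ^2\), \({\hat{\textrm{v}}_F}\) is assumed to be the mean squared distance to the sample mean, and the product \({\hat{\tau }}{\hat{\sigma }}_1\) is estimated in one step as the nonparametric bootstrap standard deviation of \(\sqrt{n}(\varphi_M({\overline{X}}) - \gamma {\hat{\textrm{v}}_F})\). None of these definitions appears in this appendix, and all function names are hypothetical.

```python
import numpy as np

def phi_M(x, M):
    """Assumed form of phi_M: squared Euclidean distance from x to the reference M."""
    return float(np.sum((x - M) ** 2))

def frechet_variance(sample):
    """Assumed form of v_hat_F: mean squared distance of the rows to their mean."""
    centered = sample - sample.mean(axis=0)
    return float(np.mean(np.sum(centered ** 2, axis=1)))

def numerator(sample, M, gamma):
    """phi_M(X_bar) - gamma * v_hat_F for one sample of discretized curves."""
    return phi_M(sample.mean(axis=0), M) - gamma * frechet_variance(sample)

def t2_statistic(sample, M, gamma, n_boot=500, seed=0):
    """Sketch of T_2, with tau_hat * sigma_hat_1 replaced by a bootstrap scale estimate."""
    rng = np.random.default_rng(seed)
    n = sample.shape[0]
    observed = np.sqrt(n) * numerator(sample, M, gamma)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample curves with replacement
        boot[b] = np.sqrt(n) * numerator(sample[idx], M, gamma)
    scale = boot.std(ddof=1)
    if scale <= 0:
        raise ValueError("bootstrap scale estimate must be positive")
    return observed / scale

# Usage on synthetic curves (illustrative data only).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
curves = np.sin(2 * np.pi * grid) + 0.3 * rng.standard_normal((80, grid.size))
M = 0.9 * np.sin(2 * np.pi * grid)
t2 = t2_statistic(curves, M, gamma=0.05)
print("T_2 =", t2)
```

The value would then be compared with a standard normal quantile; the direction of rejection depends on how the null and alternative hypotheses are stated in the main text, which this appendix does not reproduce.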

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bandara, D., Ellingson, L., Ghosh, S. et al. A Modified Neighborhood Hypothesis Test for Population Mean in Functional Data. JABES 29, 1–18 (2024). https://doi.org/10.1007/s13253-023-00549-y

  • DOI: https://doi.org/10.1007/s13253-023-00549-y
