Statistical Methods for Selective Biomarker Testing

Biomarker is a critically important tool in modern clinical diagnosis, prognosis, and classification/prediction. However, there are fiscal and analytical barriers to biomarker research. Selective Genotyping is an approach to increasing study power and efficiency where individuals with the most extreme phenotype (response) are chosen for genotyping (exposure) in order to maximize the information in the sample. In this article, we describe an analogous procedure in the biomarker testing landscape where both response and biomarker (exposure) are continuous. We propose an intuitive reverse-regression least squares estimator for the parameters relating biomarker value to response. Monte Carlo simulations show that this method is unbiased and efficient relative to estimates from random sampling when the joint normal distribution assumption is met. We illustrate application of proposed methods on data from a chronic pain clinical trial.


Introduction
A biological marker, or biomarker, is an objective measurement which indicates a biologic process or response [1].This umbrella definition captures a range of measurements that may be representative of a disease course, from simple indicators like the blood pressure to complex laboratory tests [2].
In this paper, we focus specifically on physical markers which are detectable via serum assays.In the past 20 years, the explosive use of biomarkers in medical research has coined the "biomarker revolution" [3].Clinically relevant biomarkers can provide information on both disease mechanisms and subsequent outcomes [4].In practice, a biomarker can serve as both a risk indicator and a surrogate for disease status.As such, the biomarker is a critically important tool in modern clinical diagnosis, prognosis, and classification/prediction. Despite the clinical utility and popularity of biomarkers and continual advancements in collection technology, there remain fiscal and analytical barriers to biomarker research.The cost of conducting biomarker assays for a sufficiently powered study is a major limitation.For example, a luminex assay with 39 sample slots and the capacity to detect up to 20 biomarkers can cost between $300 and $600 before fees for consultation, lab materials, and labor [5].This means that collecting biomarker samples for each person in a study of 800 participants could cost more than $12,600 when serum samples for each individual are collected at a single time point.Resource limitations have thus inspired the development of cost-effective experimental designs and corresponding statistical methodology [3,6].
In genetics literature, one such approach concentrates sampling to the most informative observation units [7].Selective genotyping [8] traditionally entails sampling individuals for genotyping based on extreme phenotypic values where genotype (exposure) is discrete (presence/absence; aa/Aa/AA) and phenotype (response) is continuous, most often assuming an ANOVA-style model [9].From a statistical perspective, this is distinct from, but perhaps inspired by, the information perspective [10] for which the optimal design in regression analysis (i.e.sample) is the one that minimizes the variance of coefficient estimates [11].
Under selective genotyping, the response no longer follows a normal distribution and missing data in the middle of the phenotypic distribution must be accounted for [12].Appropriate maximum likelihood methods have been developed for inference under such designs for QTL studies [7,9], which have been shown to increase statistical power relative to random sampling using a fraction of the original sample [13,7,14].However, the developed methods are highly specific to the field of genetics, modeling the exposure as discrete and accounting for elements such as backcross in models.Additionally, in genotyping studies it is advantageous to take a multistage approach wherein promising genetic markers are identified early out of a pool of candidates to meet study constraints [15,16].
In this paper we study a selective biomarker-testing scheme where, similar to selective genotyping, individuals with extreme response values are selected for biomarker-testing.In contrast to typing the discrete genotypes in the selective genotyping, here the individuals' continuous biomarkers are measured.Selective genotyping and our selective biomarker-testing are both special cases of Outcome Dependent Sampling (ODS) [10].ODS (also known as response-dependent sampling) is a form of biased sampling, which has been studied in the context of likelihood-based estimation [17].Zhou et al. [18] formally defined an ODS design, for a linearly related continuous exposure and outcome where all response values are observed and covariate values are observed, as (1) a simple random sample (SRS) from the full available cohort and (2) additional SRS's from regions of the response that are of particular interest.They proposed a semiparametric empirical likelihood estimation method for this ODS design which was shown to increase efficiency relative to simple random sampling [18].Weaver and Zhou [19] extended this design to incorporate all available information from the full sample including those with unknown exposure values taking an estimated likelihood approach.Zhou's original estimator has been expanded to accommodate different functional forms [20], auxiliary covariate information [21,22,23], multi-stage designs [24,23,25], mixed effects models [25], and more recently survival [26] and longitudinal endpoints [27,6,28].
Here we propose statistical analysis methods to analyze selective biomarker-testing data utilizing regression estimations available from standard statistical software.Notice that our selective biomarker-testing scheme can be considered as Extreme Outcome Dependent Sampling (EODS) since we only sample individuals with extreme response values without the SRS from the full cohort (thus no biomarker-testing is conducted for any individual in the mid range of response values).
The benefit, relative to the cost, of the incorporation of the primary SRS in traditional ODS designs is unknown.Selective genotyping literature suggests that there is no information to be gained by genotyping individuals outside the tails of the response variable distribution [7].Also, while the ODS estimators in above literature are efficient, unbiased, and flexible under their statistical assumptions, the complexity of the semiparametric likelihood-based approach is a key barrier to their widespread use in practice.The least squares approach to estimation in regression analysis is standard, and therefore most accessible to researchers conducting biomarker-testing studies.However, least squares has been shown to be most susceptible to the bias induced by extreme sampling based designs [11].To solve this issue, we propose a reverse-regression least squares estimation method for EODS designs with jointly normal distributed biomarker and response.
The organization of this article is as follows: In Section 2 we describe the BISP2 clinical trial dataset which motivated this research.In Section 3 we describe a general methodology for biomarker studies, including biomarker selection, the effect estimation, power/sample size considerations, and model checking methods.In Section 4 we present numerical results of our estimator.
We first study the finite sample properties of our estimator via simulation, then apply our estimator to the BISP2 Biomarker Study.

Motivating Example
Biopsychosocial Influence on Shoulder Pain Phase I (BISP) was a single-center, pre-clinical "proof of concept" study of 190 adults to identify genetic and psychological characteristics related to chronic musculoskeletal pain [29].Musculoskeletal pain is the general pain affecting the muscles, ligaments, tendons, or bones with chronic indicating that the pain is long-lasting or consistently recurring.In general, the musculoskeletal pain is a large contributor to the $635 billion yearly healthcare cost of chronic pain in the United States [29].Though this makes chronic pain a high priority research area, there are limited accepted treatment models due to the complexity of disease etiology.Treatment components must be personalized on the basis of genetic, psychological, environmental, and social risk factors, which all contribute to the individual variation in how people experience chronic pain.Specifically, BISP targeted chronic musculoskeletal pain affecting the shoulder region by comparing predictors of pain level among healthy individuals pre-and post-induced shoulder pain.
The target population was healthy adults.Participants were followed over the course of five days.
Baseline covariates and DNA samples were collected on the first day of the study, before inducing shoulder pain.An exercise-based protocol was then used to create controlled shoulder pain through inflammation and muscular fatigue [29].Four follow-up visits were conducted every 24 hours post-injury induction.Shoulder impairment, genetic testing, and other covariates related to pain level defined a priori based on clinical expertise were measured at baseline and each follow-up visit.BISP identified multiple prognostic factors which were associated with increased shoulder pain, including promising genes which showed evidence of predicting shoulder impairment.BISP Phase II (BISP2) was a single-center, randomized follow-up trial to BISP which aimed to test whether interventions personalized on the basis of genetic and psychological characteristics are effective for induced shoulder pain [30].The two-factor factorial design randomized 261 individuals to either propranolol or placebo and psychological education or general education.Propranolol is a drug chosen to target Catechol-O-methyl-transferace (COMT), which metabolizes adrenal hormones and is associated with pain sensitivity.The psychological education was designed to target pain rumination, which is magnification of pain by focusing on the pain with a pessimistic attitude.Pre-randomization, shoulder injury was induced using the same protocol.
BISP2 participants provided daily report on pain intensity and disability over the 5-day onsite observation period.Pain level was measured using the Brief Pain Inventory (BPI) [31], which is an 11-point scale ranging from 0 (no pain) to 10 (worst pain intensity imaginable).Participants rated the intensity of current pain and pain intensity at its worst, best, and average over the past 24 hours.Clinically relevant covariates and saliva samples for genetic testing were again taken at each follow up visit.
The BISP2 investigators targeted 14 genetic markers associated with pain in the study's exploratory biomarker analysis.These biomarkers were chosen a priori based on clinical knowledge of propensity to release pro-inflammatory cytokines.All genetic samples were saliva-based.The primary question of interest for the exploratory analysis was: which biomarkers are associated with recorded level of shoulder pain at 48-hours post-randomization?As the response, pain, is measured via the BPI to represent a continuum pain, linear regression with pain level as the response is a typical analysis approach.However, the investigators were limited by the budget and time constraints of the BISP2 clinical trial.It was infeasible to biomarker-test every individual in the study.Therefore, the investigators began by biomarker-testing 31 individuals in the high and low tails of worst pain experienced at 48-hours post shoulder pain induction.Determining an appropriate analysis method for data from this extreme sampling motivates the methodology in the remainder of this paper.

Methodology
In a clinical trial, we are interested in the relationship between a response variable Y and a biomarker variable X.If both variables are measured on all subjects enrolled in the trial, we have a full data set consisting of (X 1 , Y 1 ), ..., (X n F , Y n F ).Here n F denote the size of the full data set.
For the EODS biomarker analysis, we observe the response variable for the full data set, Y 1 , ..., Y n F , but only observe biomarker variable X on a subset corresponding to extreme values of Y .Say, we select γ proportion of the full data set and measure the biomarker X on the selected subset of size n S = [γn F ], where [d] where E(Y |X) denotes the conditional expectation of Y given X.When X and Y jointly follows a bivariate normal distribution, the full data set satisfy the standard regression model assumption: for i = 1, ..., n F .
However, for the EODS biomarker analysis, naively conducting linear regression analysis directly on the subset (X 1 , Y 1 ), ..., (X n S , Y n S ) does not work.The reason is that, the regression model assumption (2) does not hold on the selectively sampled subset because the response follows a truncated distribution due to the selection of extreme values for Y as shown by the red part of the curves in Figure 1a.The ordinary least square (OLS) fit on the selectively sampled subset will lead to a highly unbiased slope estimation as shown by the dotted red line.Therefore, specific analysis methods are needed for the selective biomarker analysis.In this paper, we propose a parametric analysis method which utilizes standard regression formulas on reverse-regression: regressing X on Y for the selectively sampled subset.The main insight for the proposal is that, when X and Y jointly follows a bivariate normal distribution, the standard regression model assumption holds for the reverse-regression on both full data set and the selectively sampled subset: As shown in Figure 1b, conditional on selected Y values, X is still normally distributed.Armed with this insight, we can conduct the reverse-regression E(X|Y ) = α X + β X Y using standard statistical software on the selectively sampled subset.Then we convert the reverse-regression fit results into regression inferences, with a little help of the additional response variable observations Y n S +1 , ..., Y n F .The conversion formulas are described in detail in the rest of this section, with the mathematical derivations provided in the Appendix.

Hypothesis Test
One main focus of the biomarker analysis is to test whether a biomarker X affects the response variable Y .Statistically, this is usually done by the linear regression hypothesis testing Mathematically, the null hypothesis H 0 : β Y = 0 is equivalent to the reverse-regression null hypothesis H 0 : β X = 0. Thus we can simply carry out the reverse-regression hypothesis test on the selectively sampled subset for The p-value for the test of ( 5) is also valid for testing (4).

Biomarker Selection
When multiple biomarkers X (1) , ..., X (M ) are under study, testing is done for the M biomarkers on the selectively sampled subset: (X n S , ..., X n S , Y n S ).The usual variable selection methods can be applied on X (1) , ..., X (M ) using the p-values from the reverse-regression.
That is, for each j = 1, ..., M , a p-value p (j) is yielded from testing on the selectively sampled subset.The biomarkers X (j) 's can then be sorted in an order of increasing p-values p (j) with smaller p-value indicating stronger biomarker effect on the response.
The strongest biomarkers can be selected for further study.To adjust for the multiple testing, common variable selection rules such as the Benjamini-Hochberg method [32] can be applied on these p-values.

Effect Estimation
The effect of the biomarker X on response variable Y is measured by the slope β Y in the regression function.Naively fitting the regression equation ( 1) on the selectively sampled subset would result in an overestimation of the biomarker effect.As illustrated in Figure 1a, the conditional (truncated) distribution of Y changes with the given X values, leading to OLS estimate (shown as the dotted red line) with much bigger slope magnitude than the true β Y value when β Y > 0. For consistent estimation of β Y , we use instead the parameter estimates βX and σ2 ε X from fitting the reverseregression (3) on the selectively sampled subset.These estimates can be obtained from conducting the reverse-regression using any standard software.Furthermore, from the full response variable observations of Y 1 , ..., Y n F , we have the sample mean μY = 1 Using equation (20) in the Appendix, we have an point estimation for Then the standard error for βY can be calculated as in equation (25), , whose detailed derivation is provided in the Appendix.Therefore, the (1 − α) confidence interval for β X can be calculated as βY ± t α/2,df =n S −2 s.e.{ βY }.
Here t α/2,df =n S −2 is the α/2-upper quantile for the t-distribution with n S − 2 degrees of freedom.
Similarly, using (21) in the Appendix, α Y is estimated by

Power/Sample Size Calculation
As derived in the Appendix, equation (28) gives the power formula of the hypothesis test in section 3.1: where W denotes a random variable following a non-central F-distribution with degrees of freedoms x 2 φ(x)dx, and F Q α,df1,df2 denotes the α-upper quantile of a central F-distribution with degrees of freedoms of df 1 and df 2 .
Here z α denotes the α-upper quantile of a standard normal distribution N (0, 1) whose density is denoted as φ(x).And f is the Cohen's effect size defined as where R 2 is the proportion of variation in the data explained by the regression equation.
Based on this, we can choose the proportion γ.We illustrate the sample size calculation with a simple example here.Assume that we have a full data set of sample size n F = 200, and we wish to detect a Cohen's effect size f = 0.  (9) gives power = 0.8985.Thus 20 individuals will be biomarker-tested to achieve the design goal of 90% power.
In contrast, the standard t-test in simple linear regression for testing (4) has power = 0.8983 when n = 118 and power = 0.9007 when n = 119 as given by equation (26).Thus a clinical trial without the EODS will require 119 individuals with both X and Y measurements, needing an almost 6-fold increase in cost of biomarker-testing for X than the EODS method.

Model Checking
Our methodology is based on the assumption that X and Y are jointly normally distributed.
Mathematically, that is equivalent to the following two assumptions holding simultaneously: (A) the response variable Y is normally distributed and (B) conditional on Y , the biomarker variable X is normally distributed.We now consider how to check these two assumptions on observed data.
Assumption (A) can be checked with standard methods such as the normal probability plot on the fully observed Y 1 , ..., Y n F .For assumption (B), we do not observe X for the whole range of Y .But we can still check whether (B) holds for the selected Y values by applying standard model checking methods on the reverse-regression, e.g., the normal probability plot of the reverseregression residuals.
4 Numerical Studies

Simulation Studies
In this section, we use simulations to check the finite sample performance of our proposed Outcome-Dependent Extreme biomarker-testing (ODEB) estimator.All simulations were conducted using R Statistical Software Version 3.6.0[33].

Simulation Comparison with Ordinary Least Square
We compare our ODEB estimation with the ordinary least square (OLS) estimation in various parameter settings here.Data was simulated according to the probability model described in Equation (2).Specifically, Y i is generated such that Individual data sets consisting of (X Figure 2 displays the bias of βY when an extreme sample of 10% is taken from each tail of the response distribution (thus resulting in a total of 20% extreme sampling).When there is truly no effect of the biomarker X, both OLS and ODEB estimation are unbiased regardless of whether the sample is extreme or random.When there is a biomarker effect present, βY,ODEB under extreme or random sampling and βY,OLS under random sampling remain mostly unbiased, with only a small appreciable bias when n = 100 which dissipates as the sample size reaches n = 400.However, βY,OLS under extreme sampling is severely biased in the positive direction.The bias initially increases, then tapers, with the size of β Y .For saving space, we omits the detailed numbers for simulations from other sampling proportions (10% and 40%) but summarize their bias patterns.
The bias of βY,OLS under extreme sampling is exacerbated when γ decreases to 10% (i.e., 5% per tail), with estimation bias exceeding twice the magnitude of β Y .The magnitude of the βY,OLS bias decreases slightly when γ = 0.4, reflecting contribution from more moderate observations, but the overall bias magnitude is still unacceptably large.
The root mean square error (RMSE) displays similar trends, as shown in Figure 3 In addition to being unbiased and more precise when compared to βY,OLS , βY,ODEB provides a practical increase in power to detect biomarker effect when combined with extreme sampling.First, for ODEB test under all sampling methods, the type I error rate for the test of the null hypothesis H 0 : β Y = 0 is controlled at the nominal level of 5% as illustrated in the upper-left panel of Figure 4.When β Y = 0, random sampling generally has less power to detect a biomarker effect than extreme sampling, with the dashed lines (random sampling) falling below the solid lines (extreme sampling) in each panel.This highlights the utility of focusing on those observations with extreme response values.Taking β Y = 0.2 as an illustrative example of this notion, the stars placed on the upper-right panel emphasize the increase in power afforded by successively more extreme samples.
For the three starred cases, each sample contains 80 biomarker-tested observations.We can see that a 10% extreme sample from 800 observations provides 96.2% power to detect the effect, while a 20% extreme sample from 400 observations provides 88.9% power and a 40% extreme sample from 200 observations provides only 73.8% power.A more extreme sample from a larger population has the advantage in terms of power.Intuitively, the more extreme sample contains more information about the relationship, and hence provides a formidable gain in power.
Figure 5 studies the performance of the 95% confidence interval for β Y for OLS and ODEB.The confidence level for the intervals given by ODEB under extreme and random sampling and OLS under random sampling is largely maintained at the nominal 95%.Unsurprisingly, the OLS-based confidence interval under extreme sampling does not achieve correct nominal 95% coverage.The poor coverage displayed is worsened both with total study sample size (simply reflecting a larger number of extreme observations contributing to estimation) and the size of β Y .This is particularly severe when β Y = 1, with coverage probabilities of effectively 0% for n = 200, 400, and 800.The OLS-based confidence interval under extreme sampling also displays the longest confidence interval length, though this decreases with β Y to approach the average lengths of the other intervals (Figure 6).ODEB estimation affords the most precise confidence interval compared to OLS, particularly when applied to an extreme sample.
The conclusions of comparison between OLS and our ODEB estimation under example sampling are also summarized in Figure 7 by fixing β Y and examining the resulting RMSE, bias, average confidence interval coverage, and average confidence interval length for various sampling proportions.RMSE, bias, and confidence interval length are globally higher for OLS estimation compared to ODEB estimation due to the violation of the conditional distribution illustrated in Figure 1.Accounting for the sampling scheme, ODEB estimation yields low RMSE and bias.ODEB-based confidence intervals not only maintain nominal coverage, but are shorter in length than those based on βY,OLS .

Simulations with Non-normal Residuals
To explore the degree to which the ODEB estimation method is affected by violations to the assumption of joint normality, the simulation above was repeated for ε Y,i following a scaled t distribution (scale √ 5) with DF ∈ {10, 20} and a shifted log-Normal distribution with a mode of zero and variance of 5.
From the upper-left panel (the case of no biomarker effect with β Y = 0) of Figure 8, the hypothesis test to detect the effect always has the correct level of α = 0.05 regardless the normality assumption holds or not.This is expected because, whatever the residual distribution is, there is no effect in the reverse-regression also when the regression effect β Y = 0.The power of the test improves when the residuals are skewed (shifted log-normal) and deteriorates when the residual distribution becomes more heavily tailed (smaller degrees of freedom for t-distribution).In all simulated settings, the power of detection exceeds 80% (shown as the horizontal dotted red line) when the full sample size is n F = 800 (with n S = 160 selected for biomarker-testing).
The bias of βY,ODEB is affected by the residual distributions as seen in Figure 9. Correspond- ingly, the coverage probability of the 95% confidence interval is no longer valid for non-normal residuals in Figure 10.The coverage deteriorates when the effect sizes increases, when the sample size increases, and when the tails of residual distribution become heavier.
Overall, when the normality assumption is violated, the effect detection test is still valid and powerful but the effect estimation is no longer reliable.

Comparison with MSELE Estimator
Additionally, we compared the ODEB estimator to the MSELE estimator [18].Notice that the MSELE estimator cannot be applied directly for data from extreme outcome-dependent sampling.
It requires an additional simple random sample (SRS) so that some of the individuals with medial Y values are biomarker-tested also.Thus we conduct the comparison under the ODS experimental design outlined by Zhou et al [18].
Again, for each of B = 20, 000 iterations, individual data sets consisting of (X 1 , Y 1 ), ..., (X n F , Y n F ) were randomly generated.This data was eligible for extreme outcome-dependent sam- pling.An additional SRS of size n SRS = 80 was generated.For each combination of parameter values described above, an extreme sample of size γn F was selected.Both βY,ODEB and βY,MSELE were calculated on the merged sample consisting of the extreme sample and the additional SRS for a total sample size of n = γn F + n SRS .The ODS package [34] in R was used to calculate βY,MSELE and its corresponding standard error.
Figure 11 compares the RMSE of βY,MSELE and βY,ODEB .The βY,MSELE is obtained from an iterative numerical optimization algorithm, and it does depend on the starting values which poses a challenge in convergence.We also plotted proportion of convergence of βY,MSELE out of B = 20, 000 simulation runs on the figure.The convergence rate of βY,MSELE is around 50% or lower when the effect size is small.As the effect size increases, the divergence issue alleviates, but there is still a significant proportion of divergence even when β Y = 1.The RMSE of βY,MSELE from the convergent runs are plotted, and are much higher than RMSE of βY,ODEB in corresponding settings.
The RMSE is heavily affected by outliers.Sometimes, βY,MSELE converges to a value that is far from the true value, indicating failure of the iterative algorithm to find the true root.Such wrong convergent cases inflate the RMSE of βY,MSELE a lot.To be fairer for βY,MSELE , Figure 12 presents the median absolute error (MAE) comparison instead.The MAE of βY,MSELE is now closer to, but still clearly exceeds, the MAE of βY,ODEB .This accuracy improvement by βY,ODEB is expected here as it utilizes the correct parametric assumption which βY,MSELE does not make.As β grows larger, this gap in performance between βY,MSELE and βY,ODEB shrinks.

Data Analysis/Application BISP2 Trial
The dependent variable of interest in the BISP2 Trial, pain intensity, is measured via the BPI to represent an underlying pain continuum for all participants.A pilot sample of 31 participants in the high and low tails of worst pain experienced at 48-hours post shoulder pain induction was selected for initial biomarker-testing.As the BPI is a discrete index represented on a continuous scale, there were ties for the highest and lowest pain scores.To determine which of the equal scores were included in the pilot sample, those who responded vs. did not respond to the treatment were selected in a balanced manner.Based on promising results from the initial pilot sample, a follow-up sample of 57 participants was selected in the same manner.All biomarkers were logarithm (base 10, for interpretability) transformed based on prior knowledge of the typical biomarker distribution.
We model the relationship between pain and log-transformed biomarker linearly as: Two pain-related outcomes of interest were explored using the above methodology: worst pain The results of the analysis of the combined stages 1 and 2 data for worst pain and change in pain are shown in Tables 1 and 2. Table 1 shows evidence of association between CCL2 and worst pain at 48 hours, with an effect estimate of 1.92 (95% CI: 0.24, 3.61).This indicates that for a 10-fold increase in CCL2, we expect the worst pain at 48 hours to increase by 1.92 points on the BPI scale.For change in pain, there is evidence of association from CRP ( β = 0.71 (95% CI: 0.05, 1.37)).We also see potential from IL10 and IL6, however the evidence in this sample is insufficient to conclude such.In additional analyses for self-reported disability outcomes (whose full result tables are omitted for succinctness), CRP shows evidence of association with Quick-DASH score.For a 10-fold increase in CRP, we expect an increase of 3.92 points in the Quick-DASH score.
Figure 14 provides the model checking diagnostic plots for the change in pain outcome and the residuals of the reverse regression of log-transformed CRP on change in pain.Our joint normality assumption appears to be reasonable for this example.

Discussion
We proposed a new analysis method for the extreme outcome-dependent sampling based on the joint normality assumption.Traditionally, the outcome-dependent sampling data is analyzed through likelihood approach [18] with an additional SRS to provide some observations in the middle range of response values.The main advantage of the likelihood approach is that no parametric distributional assumption is needed.In contrast, through the joint normality assumption, our method provides two significant practical advantages.First, our method allows concentration ing in great cost reduction for the expansive biomarker-testing.Second, our method is simple and can be conducted with existing standard statistical software.As seen in the simulation, often the likelihood method diverges or it converges to wrong parameter values, thus its data analysis often requires much care from an expert statistician.On the other hand, our method only requires applications of simple formulas on standard least-square estimates from the reverse-regression, and can be handled by any practitioner with a minimum statistical training.Furthermore, our normality assumption can be checked using some standard model checking techniques.When the normality assumption is violated, the confidence interval coverage deteriorate only for very heavy-tailed residuals, but the hypothesis testing for effect detection always remain valid.Thus our new EODS analysis method is particularly suited for exploratory biomarker studies due to its cost savings and ease of usage.
This paper is a first work demonstrating the feasibility of EODS regression analysis assuming bivariate joint normal distribution of a response variable and a biomarker variable.In the future, we will extend these analysis to more applications.For the multiple regression analysis, in principle we can similarly derive formulas assuming joint normal distribution of a response variable and multiple biomarker variables.To allow analysis of longitudinal data, we need to consider analysis Funding Dr. Wu was partially supported by the funding from the National Institutes of Health and National Institute of Arthritis and Musculoskeletal and Skin Diseases (AR055899).All authors were independent from this funding source.

Appendices
A.1.Mathematical derivation of formulas relating parameters of regression and reverse-regression We assume that (X, Y ) follows the bivariate normal distribution Under this assumption, we consider how to get regression parameters α Y , β Y and σ ε Y from the reverse-regression parameters α X , β X and σ ε X .
Conditional on X, we have the regression equation where Similarly, conditional on Y , we have the reverseregression equation where Firstly we derive how the regression parameters of equation ( 13) relate to the bivariate normal distribution parameters.To do this, we consider the standardized versions of X and Y as Hence we have where into equation (14) and compare with equation ( 13), we can express the reverse-regression parameters in terms of the bivariate normal distribution parameters as Secondly, by symmetry, similar derivations give the regression parameters' formulas in terms of the bivariate normal distribution parameters as We now find formulas to express regression parameters α Y , β Y and σ ε Y in terms of α X , β X , We start from the parameter of most interest β Y .Combine the middle equations in ( 15) and ( 16), we get Now, to get the desired expression of β Y , we only need to express σ 2 X in terms of α X , β X , σ ε X , µ Y and σ Y .
From the middle equation in (15), Square both sides of equation ( 18) and divides the last equation in (15), we get Therefore, we solve ρ 2 in this expression to get Plug this back into the last equation in (15) to solve for σ 2 X , we get Put this into equation ( 17), we have the desired expression for β Y as Next, we express α Y in terms of α X , β X , σ ε X , µ Y and σ Y .From the first equation in (15), Plug this and (20) both into the first equation of ( 16), we get Finally, for the expression of σ 2 ε Y , we plug (19) into the last equation of ( 16) to get

A.2. Mathematical derivation of power formulas
For the standard simple linear regression on a data set of size n, the power of an α level t-test (testing the zero slope null hypothesis) is given where N F ncp=nf 2 ,df1=1,df2=n−2 denotes a random variable following a non-central F-distribution with noncentral parameter nf 2 , degrees of freedoms of 1 and n − 2, and F Q α,df1=1,df2=n−2 denotes the α upper quantile of a central F-distribution with degrees of freedoms of 1 and n − 2.Here f is the Cohen's effect size defined as in equation ( 10) where R 2 is the proportion of variation in the data explained by the regression equation.
We are doing the reverse-regression hypothesis test for (5).If we have observations of both X and Y on the full data set, then the power is given by formula (26) with n = n F .Now we derive the power formula when the test is conducted on the selectively sampled subset.Besides the changes in sample size from n F to n S , the effect size also changes from the full data set to the selectively sampled subset.The R 2 in (10) would be larger in the subset due to selectively sampling the extreme values.To derive the change in effect size, we reexpress (10) as where the last equality comes from equation (19).Notice that both quantities β 2 X and σ 2 ε X remain invariant on the full data set and the subset since the reverse-regression model (3) holds on both data sets.σ 2 Y differs in the two data sets.Since only the extreme γ proportion of Y is selected, the variance of selected Y S is bigger on the selectively sampled subset than the variance of Y on the full data set.To calculate σ 2 Y S , we note that the selected Y S do not follow the N (µ Y , σ 2 Y ) distribution anymore.Rather it follows the normal distribution truncated at the upper and lower (γ/2)-tails.Let z γ/2 denotes the upper (γ/2)-quantile of the standard normal distribution N (0, 1).

Figure 1 :
Figure 1: Illustration of challenge posed to linear regression under extreme sampling.

( a )
Truncation of the conditional distribution of Y given X under extreme response selection with the linear model approach.Preservation of the conditional distribution of X given Y under extreme response selection.

Figure 4 :
Figure 4: Power to detect a nonzero effect of biomarker level on pain for various β Y 's.

Figure
Figure 5: 95% β Y confidence interval coverage under 20% of the total study sampled for various β Y 's.

Figure 11 :
Figure 11: Root mean square error (RMSE) of βY,MSELE compared to βY,ODEB under 40% of the total study sampled for various β Y 's.

Figure 13 :
Figure 13: Worst pain experienced at 48-hours shoulder pain induction for each sampling stage, combined stages, and the full BISP2 study population.

Figure 14 :
Figure 14: Model checking: Normal QQ plot example for the change in pain outcome and the reverse regression of Y P ain Change on log(X CRP ).

2 ,
from the variance formula of the chi-square distribution.Plug-in the point estimators for each quantity into equation(24), we estimate the standard error for βY as s.e.{ βY } = ( denotes the closest integer to d.Without loss of generality, we can denote the first n S subjects as the selected ones.That is, in the selective biomarker analysis, we observe (X 1 , Y 1 ), ..., (X n S , Y n S ) and Y n S +1 , ..., Y n F , while Y 1 , ..., Y n S corresponds to the top γ/2 proportion and the bottom γ/2 proportion of Y 1 , ..., Y n F .
extreme sampling and random sampling since the reverse-regression model assumption (3) hold in both cases.OLS would work well for the random sampling, but would provide inconsistent estimation under the extreme sampling scheme.The simulation results confirmed this.
Root mean square error (RMSE) of βY under 20% of the total study sampled for various β Y 's.
(7)nder extreme sampling, βY,ODEB has the smallest RMSE of the four considered combinations across both study sample sizes and true β Y 's.This reflects the efficient use of information by βY,ODEB , which incorporates information from observations with medial response values only through the response variance estimation in equation(7).In contrast, βY,OLS has the largest RMSE across Figure 3: study sample sizes and β Y 's when applied to an extreme sample.Of note, when biomarker-testing is conducted via random sample, βY,OLS and βY,ODEB often perform similarly in terms of RMSE, but βY,ODEB beats βY,OLS as β Y becomes bigger.
5: 95% β Y confidence interval coverage under 20% of the total study sampled for various β Y 's.
Figure 8: Power of βY,ODEB under skewed (log-normal) or heavy-tailed (t) residual distributions and an extreme sample of 20% (Normal residual distribution also plotted for reference).Figure 10: coverage of 95% confidence interval for β Y under under skewed (log-normal) or heavytailed (t) residual distributions and an extreme sample of 20% (Normal residual distribution also plotted for reference).
Figure 9: Bias of βY,ODEB under skewed (log-normal) or heavy-tailed (t) residual distributions and an extreme sample of 20% (Normal residual distribution also plotted for reference).

Table 1 :
Summaries from univariable reverse-regression based extreme sampling estimation and inference: Worst pain at 48 hours regressed on log-scale biomarkers.

Table 2 :
Summaries from univariable reverse-regression based extreme sampling estimation and inference: Change in worst pain from 48 hours to 96 hours regressed on log-scale biomarkers.