On Regularisation Methods for Analysis of High Dimensional Data
Abstract
High dimensional data are growing rapidly in many domains because technological advances help collect data with a large number of variables to better understand a given phenomenon of interest. Particular examples appear in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy. In the last two decades regularisation approaches have become the methods of choice for analysing such high dimensional data. This paper aims to study the performance of regularisation methods, including the recently proposed debiased lasso, for the analysis of high dimensional data under different sparse and nonsparse situations. Our investigation concerns prediction, parameter estimation and variable selection. We particularly study the effects of correlated variables, covariate location and effect size, which have not been well investigated. We find that correlated data, when associated with important variables, improve those common regularisation methods in all aspects, and that the level of sparsity can be reflected not only in the number of important variables but also in their overall effect size and locations. The latter may be seen under a nonsparse data structure. We demonstrate that the debiased lasso performs well especially in low dimensional data; however, it still suffers from issues, such as multicollinearity and multiple hypothesis testing, similar to the classical regression methods.
Keywords
Debiased lasso, High dimensional data, Lasso, Linear regression model, Regularisation, Sparsity
1 Introduction
1.1 Background and Importance
“High dimensional” refers to situations where the number of covariates or predictors is much larger than the number of data points (i.e., \(p\gg n\)). Such situations arise in many domains nowadays, where rapid technological advances help collect a large number of variables to better understand a given phenomenon of interest. Examples occur in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy, to name but a few.
In the last two decades regularisation approaches such as lasso, elastic net and ridge regression have become the methods of choice for analysing such high dimensional data. Much work has been done since the introduction of regularisation in tackling high dimensional linear regression problems. Regularisation methods, especially lasso and ridge regression [10, 31, 40], have been applied in many different disciplines [1, 15, 23, 26]. The theory behind regularisation methods often relies on sparsity assumptions to achieve theoretical guarantees on their performance, especially when dealing with high dimensional data. The performance of regularisation methods has been studied by many researchers; however, conditions other than sparsity, such as the effects of correlated variables, covariate location and effect size, have not been well understood. We investigate this in high dimensional linear regression models under sparse and nonsparse situations.
We assume no prior knowledge on \(\beta \). It is well known that the ordinary least squares (OLS) solution for estimating \(\beta \) is \(\hat{\beta }^{\text {OLS}} = (X^{T}X)^{-1}X^{T}y\) [10]. However, when \(p > n\), \(X^{T}X\) is singular because X no longer has full column rank, so the OLS problem has infinitely many solutions, leading to overfitting in the high dimensional case [14]. This kind of ill-posed problem arises in many applications as discussed above. Regularisation methods, which impose a penalty on the unknown parameters \(\beta \), are therefore a general and popular way to overcome the issue of ill-posed problems.
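The non-uniqueness of least squares when \(p > n\) can be seen in a toy example. The sketch below uses plain Python and a hypothetical design with \(n = 2\) and \(p = 3\) (all names and numbers are illustrative and do not come from this study): adding any multiple of a null-space vector of X to one exact solution gives another solution with the same fitted values.

```python
# Toy illustration (hypothetical data): with p = 3 covariates and n = 2
# observations, the least squares problem has infinitely many exact solutions.

def matvec(X, beta):
    """Return the fitted values X @ beta for a list-of-rows matrix X."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]          # n = 2 rows, p = 3 columns
y = [1.0, 2.0]

beta_a = [1.0, 2.0, 0.0]       # one exact solution: X @ beta_a equals y
null_vec = [1.0, 1.0, -1.0]    # lies in the null space: X @ null_vec is zero

# Any beta_a + c * null_vec reproduces y equally well; here c = 3.
beta_b = [b + 3.0 * v for b, v in zip(beta_a, null_vec)]

fit_a = matvec(X, beta_a)
fit_b = matvec(X, beta_b)
print(fit_a, fit_b)
```

Both coefficient vectors interpolate the data, so the data alone cannot distinguish between them; a penalty on \(\beta \) is one way to single out a solution.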
Issues due to the curse of dimensionality become apparent in the case of \(p \gg n\). A particular example occurs in fMRI image analysis, where selection from a large number of brain regions could lead to insensitive models on top of overfitting [30]. Also, numerical results from applying regularisation methods to high dimensional data are unsatisfactory in the sense that no single method can be identified as performing best most of the time [7]. To tackle these issues, the sparsity assumption utilises the idea of “less is more” [14], referring to the phenomenon that an underlying data structure can mostly be explained by a few out of many features. Such an assumption would help some regularisation methods to at least achieve consistent variable selection even when \(p \gg n\) [18].
In this paper, we investigate the following:
i. The effects of data correlation on the performance of common regularisation methods.
ii. The effects of covariate location on the performance of common regularisation methods.
iii. The impact of effect size on the performance of common regularisation methods.
iv. The performance of the recently developed debiased lasso [33, 37] in comparison to the common regularisation methods.
1.2 Related Work
Lasso and ridge regression, which use \(L_{1}\) and \(L_{2}\) penalties respectively (see Sect. 2.1), are the two most common regularisation methods. Many novel methods have been built upon them. For example, Zou and Hastie [40] developed the elastic net, which uses a combination of these two penalties. The elastic net is particularly effective in tackling multicollinearity, and it can generally outperform both lasso and ridge regression in such situations. The study on the elastic net used relatively low dimensions, with the sample size larger than the number of covariates [40]. Moreover, the number of covariates associated with truly nonzero coefficients was smaller than the sample size. Similar settings were also used when developing other novel methods [25, 36, 39]. Other new approaches with variations of standard techniques have also been investigated [13, 22, 26]. Also, reducing the bias of estimators such as the lasso estimator has recently been used to tackle the issues of zero standard errors and biased estimates [2, 37]. Before that, a method called the bias-corrected ridge regression utilised the idea of bias reduction by projecting each feature column onto the space of its complement columns to achieve, with Gaussian noise, asymptotic normality for a projected ridge regression estimator under a fixed design [4]. Regularisation methods were also evaluated in other situations for classification purposes [1, 12, 13, 23, 24, 26, 35].
Statistical inference such as hypothesis testing with regularisation methods was difficult for a long time due to mathematical limitations and the highly biased estimators in high dimensional models. Obenchain [27] argued that inference with biased estimators could be misleading when they are far away from their least squares region. The asymptotic theory has also shown that the lasso estimates can be 0 when the true values are indeed 0 [21], which can explain why the bootstrap with lasso estimators can lead to a zero standard error [31]. Park and Casella [28] developed a Bayesian approach to construct confidence intervals for lasso estimates as well as its hyperparameters. They considered a Laplace prior for the unknown parameters \(\beta \) in the regression model (1), conditional on the unknown variance of independent and identically distributed Gaussian noise, leading to conditional normality for y and \(\beta \). However, they did not account for the presence of bias in parameter estimators when using regularisation methods. The recent debiased lasso (see Sect. 2.2) instead reduces the bias of the lasso and enables statistical inference about a low dimensional parameter in the regression model (1). It is unknown whether or not the debiased lasso can outperform the lasso and other regularisation methods when increasing the data dimension in sparse and nonsparse situations.
More recently, Dalalyan et al. [6] conducted a theoretical study on the prediction performance of the lasso. Their main finding was that the incorporation of a correlation measure into the tuning parameter could lead to a nearly optimal prediction performance of the lasso. Also, Ročková and George [29] proposed the spike-and-slab lasso procedure for variable selection and parameter estimation in linear regression.
2 Regularisation Methods in High Dimensional Regression
2.1 Regularisation with a More General Penalty
Two special cases of (2) are the lasso with \(L_1\) penalty (i.e., \(q = 1\)) and the ridge regression with \(L_2\) penalty (i.e., \(q = 2\)). Also, the subset selection emerges as \(q \rightarrow 0\), and the lasso uses the smallest value of q (i.e., closest to subset selection) that yields a convex problem. Convexity is very beneficial for computational purposes [32].
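As a small illustration of how q changes the penalty, the sketch below evaluates \(\sum _{j}|\beta _{j}|^{q}\) for a hypothetical coefficient vector. The function name, the numbers, and the convention of skipping exact zeros (so that the sum approaches the number of nonzero coefficients as \(q \rightarrow 0\)) are assumptions made for illustration; the scaling of the penalty in (2) is not reproduced here.

```python
def lq_penalty(beta, q):
    """Sum of |beta_j|**q over the nonzero coefficients, for q > 0."""
    if q <= 0:
        raise ValueError("q must be positive")
    return sum(abs(b) ** q for b in beta if b != 0)

beta = [3.0, -4.0, 0.0]           # hypothetical coefficients
l1 = lq_penalty(beta, 1)          # lasso-type penalty: |3| + |-4|
l2 = lq_penalty(beta, 2)          # ridge-type penalty: 3^2 + 4^2
near_l0 = lq_penalty(beta, 1e-9)  # close to the number of nonzero entries
print(l1, l2, near_l0)
```

With q close to 0 the penalty essentially counts the nonzero coefficients, which is the sense in which subset selection emerges in the limit.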
Since the lasso provides a sparse solution (i.e., the number of nonzero parameter estimates is smaller than the sample size n) [31, 32], lasso regression requires the sparsity assumption, that is, many of the covariates are assumed to be unrelated to the response variable. It is appealing to be able to identify, out of a large number of predictors, a handful that are the main contributors to some desired predictions, particularly in genome-wide association studies (GWAS) [2, 5, 35]. This leads to parsimonious models from which the selected variables can be further examined, and greatly reduces subsequent computational costs in predictions.
A limitation of the lasso is that when there are contributing variables in a correlated group, the lasso tends to select only a few of them in a random manner. Yuan and Lin [36] proposed the group lasso method for performing variable selection on groups of variables to overcome this issue, given prior knowledge of the underlying data structure. Also, the choice of \(\lambda \) in lasso may not satisfy the oracle properties, which can lead to inconsistent selection results in high dimensions [9, 39]. As discussed in the introduction, Zou [39] developed the adaptive lasso to allow a weighted \(L_{1}\) penalty on individual coefficients, and showed that the new penalty satisfies the oracle properties. Further issues arise with high dimensional data, which can be summarised as the curse of dimensionality. Roughly speaking, the curse of dimensionality is a phenomenon in which ordinary approaches (in this case regularisation approaches) to a statistical problem are no longer reliable when the associated dimension is drastically high. We particularly investigate this in Sect. 3.
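The tendency of the lasso to keep only some members of a correlated group can be reproduced in a few lines. The sketch below is a plain-Python cyclic coordinate descent with soft-thresholding, the general approach named in the title of [11]; it is not the glmnet implementation, and the objective scaling \(\frac{1}{2n}\Vert y - X\beta \Vert ^{2} + \lambda \Vert \beta \Vert _{1}\), the toy data and the function names are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Hypothetical data: columns 0 and 1 are identical copies of the signal
# covariate, column 2 is an unrelated nuisance covariate.
X = [[1.0, 1.0, 1.0],
     [2.0, 2.0, -1.0],
     [3.0, 3.0, 1.0],
     [4.0, 4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]

beta_hat = lasso_cd(X, y, lam=0.1)
print(beta_hat)
```

In this sketch the whole weight goes to the first of the two identical columns simply because it is updated first, and the second stays at (numerically) zero; with real, imperfectly correlated data the choice within a group depends on the sample at hand, which is why it can look arbitrary.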
2.2 The Debiased Lasso
The debiased lasso is feasible when there is a good approximation of \(\overset{\sim }{G}\). To do so, a method called the lasso for nodewise regression has been suggested. Details regarding the nodewise regression and the theoretical results of asymptotic normality for debiased lasso can be found in [33]. Note that the residual term R is asymptotically negligible under some additional sparsity conditions when approximating \(\overset{\sim }{G}\).
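For orientation, the debiasing step can be written out explicitly. The display below is a sketch that assumes the linear model \(y = X\beta ^{0} + \varepsilon \), writes \(\hat{G} = X^{T}X/n\) for the Gram matrix, and takes \(\overset{\sim }{G}\) to be an approximate inverse of \(\hat{G}\), following the general construction in [33]. The symbol \(\hat{b}\) and the scaling of R are choices made here for illustration and may differ from the convention used for the residual term above.

\[
\hat{b} = \hat{\beta }^{\text {lasso}} + \frac{1}{n}\,\overset{\sim }{G} X^{T}\bigl (y - X\hat{\beta }^{\text {lasso}}\bigr ),
\qquad
\hat{b} - \beta ^{0} = \frac{1}{n}\,\overset{\sim }{G} X^{T}\varepsilon + R,
\qquad
R = \bigl (I - \overset{\sim }{G}\hat{G}\bigr )\bigl (\hat{\beta }^{\text {lasso}} - \beta ^{0}\bigr ).
\]

The second identity follows by substituting \(y = X\beta ^{0} + \varepsilon \) into the first. Under this reading, R vanishes when \(\overset{\sim }{G}\hat{G} = I\), which requires \(\hat{G}\) to be invertible (so \(p \le n\)); in that case \(\hat{b}\) reduces to the OLS estimator.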
It is shown that the debiased lasso is very effective in making statistical inference about a low dimensional parameter when the sparsity assumption holds [33, 37]. In Sect. 5, we investigate whether or not the debiased lasso can outperform the lasso and other regularisation methods when increasing the data dimension in sparse situations.
3 Curse of Dimensionality with a Fixed-Effect Gaussian Design
Table 1 The number of variables selected by the lasso for different values of p in the first 10 trials (columns are trials 1 to 10)
p     1    2    3    4    5    6    7    8    9    10
100   64   71   59   67   73   68   72   72   67   72
200   88   92   104  59   73   109  73   92   51   80
300   51   61   74   70   93   30   68   54   46   23
400   45   54   32   59   53   28   49   40   34   37
500   105  41   85   94   60   83   91   82   122  93
1000  45   44   0    36   35   29   58   19   9    10
2000  57   104  61   76   75   83   55   52   47   89
3000  1    0    66   0    107  6    8    48   115  4
4000  44   85   80   27   30   1    1    3    3    22
5000  100  48   90   53   36   6    27   29   61   34
The results, presented in Fig. 1, show that the average identification rate tends to decrease when the dimension p increases. In other words, the proportion of selected variables that are correctly identified decreases as p gets larger. Since statistical inference is not feasible with lasso regression [21], as already discussed, one may rely on the fitted model associated with the smallest prediction mean square error (MSE). From Fig. 1, it can be seen that the proportion of nonzero variables correctly identified by lasso regression among those selected decreases from 100% to 18.2% when moving from \(p = 100\) to \(p = 5000\). This is also reflected by the increasing MSE shown in Fig. 2. Table 1 provides more details regarding the first 10 trials, where we can see that the number of variables selected by lasso regression seems to be consistent from \(p = 100\) to \(p = 500\), but deteriorates as p becomes much larger. All these suggest feature selection inconsistency of lasso regression in high dimensional situations with very large p.
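The identification rate described above, the number of correctly identified variables over the number of selected variables, can be computed as in the following sketch. The function name, the example indices and the convention of returning None when no variable is selected are assumptions made for illustration.

```python
def identification_rate(selected, true_active):
    """Share of the selected variables that are truly active.

    Returns None when nothing is selected (an assumed convention, since
    the ratio is undefined in that case).
    """
    selected, true_active = set(selected), set(true_active)
    if not selected:
        return None
    return len(selected & true_active) / len(selected)

# Hypothetical indices: 4 variables selected, 3 of which are truly active.
rate = identification_rate([1, 2, 3, 50], range(1, 11))
print(rate)  # 0.75
```

An explicit convention for empty selections matters in practice, because some trials at large p select no variables at all.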
4 Performance of Regularisation Methods in Sparse and Nonsparse High Dimensional Data
In this section, we investigate the performance of three common regularisation methods (lasso, ridge regression and elastic net) in estimation, prediction and variable selection for high dimensional data under different sparse and nonsparse situations. In particular, we study the performance of these regularisation methods when data correlation, covariate location and effect size are taken into account in the high dimensional linear model (1), as explained in the sequel.

\(\hat{\beta }\in {\mathbb {R}}^{p}\): estimator of coefficients in the fitted model,

\(\hat{y} = X\hat{\beta }\in {\mathbb {R}}^{n}\): vector of predicted values using the fitted model,

\(S_{0}\): active set of variables of the true underlying model,

\(S_{\text {final}}\): active set of the fitted model,

\(S_{\text {small}}\subset S_{0}\): active subset of variables with small contributions in the true underlying model,
In a sparse situation, where the underlying data structure is truly sparse, we choose \(n = 150\), \(p = 10{,}000\), and \(p^{*} = 200\). Recall that \(p^{*}>n\) means it is impossible to identify all of the important features without overfitting. Given p can be much larger with the same n in practice [2], such identification may not be possible even in a sparse situation, thus \(\beta ^{0}_{j}\) is set to be 0 for any \(j = n+1,n+2,\ldots ,p\) unless stated otherwise. In a nonsparse situation, we change \(p^{*}\) to 1000. In this case, we either use \(\varSigma _{X} = V_{5}\) or \(\varSigma _{X} = V_{20}\) to account for data correlation.
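A coefficient vector of the kind used in these designs can be generated as below. This is a minimal sketch: the function name, the seed and the reading of N(0, 0.1) as a normal distribution with variance 0.1 are assumptions, and the index ranges shown are those listed for case 4 of simulation I.

```python
import math
import random

def make_beta(p, large_ranges, small_ranges, small_sd, seed=0):
    """Build a length-p coefficient vector that is zero outside the given
    1-based, inclusive index ranges."""
    rng = random.Random(seed)
    beta = [0.0] * p
    for lo, hi in large_ranges:          # large effects drawn from N(0, 1)
        for j in range(lo, hi + 1):
            beta[j - 1] = rng.gauss(0.0, 1.0)
    for lo, hi in small_ranges:          # small effects, standard deviation small_sd
        for j in range(lo, hi + 1):
            beta[j - 1] = rng.gauss(0.0, small_sd)
    return beta

# Assumption: N(0, 0.1) is parameterised by its variance, so sd = sqrt(0.1).
beta0 = make_beta(10_000, [(101, 200)], [(201, 300)], small_sd=math.sqrt(0.1))
n_nonzero = sum(1 for b in beta0 if b != 0.0)
print(n_nonzero)
```

If N(0, 0.1) is instead meant to denote a standard deviation of 0.1, only the `small_sd` argument changes.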
Each simulation is repeated 100 times and the average values are calculated for each performance measure in (6). We use the R package glmnet for implementing the regularisation methods considered here.
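To make the performance measures concrete, the sketch below gives one plausible reading of the prediction MSE, the mean absolute bias (MAB), and the power measures P and \(P_{small}\) in terms of the notation above. The exact definitions in (6) are not reproduced here, so the formulas, the function names, the choice of index set for the MAB and the toy numbers are all assumptions.

```python
def mse(y, y_hat):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mean_absolute_bias(beta_hat, beta_true, index_set):
    """Average |beta_hat_j - beta_true_j| over an index set chosen by the user."""
    idx = list(index_set)
    return sum(abs(beta_hat[j] - beta_true[j]) for j in idx) / len(idx)

def power(s_final, s_ref):
    """Share of a reference active set (S_0 or S_small) recovered by S_final."""
    s_ref = set(s_ref)
    return len(set(s_final) & s_ref) / len(s_ref)

# Hypothetical example with p = 4 (0-based indices).
beta_true = [2.0, 0.0, -1.0, 0.1]
beta_hat = [1.5, 0.0, 0.0, 0.0]
s0 = {0, 2, 3}        # true active set
s_small = {3}         # active variables with small effects
s_final = {j for j, b in enumerate(beta_hat) if b != 0.0}

p_all = power(s_final, s0)
p_small = power(s_final, s_small)
mab = mean_absolute_bias(beta_hat, beta_true, s0)
err = mse([1.0, 2.0], [1.5, 2.5])
print(p_all, p_small, mab, err)
```

Power measures of this kind are only defined for methods that return exact zeros, which would explain why they are reported for the lasso and elastic net but not for ridge regression and PCR.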
4.1 Data Correlation
Table 2 Complete results of simulation I

Setup for simulation study I with a sparse situation
Case  Indices j: \(\beta ^{0}_{j}\) generated from N(0,1)  Indices j: \(\beta ^{0}_{j}\) generated from N(0,0.1)  Choice of \(\varSigma _{X}\)
1  1–200  NA  \(I_{p\times p}\)
2  101–200, 701–800  NA  \(V_{3}\)
3  101–170, 351–420, 701–760  NA  \(V_{3}\)
4  101–200  201–300  \(I_{p\times p}\)
5  251–350  551–650  \(V_{3}\)
6  291–310  401–580  \(V_{3}\)
7  201–380  591–610  \(V_{3}\)

Average MSE summary of simulation I
Case  Lasso  Ridge  Elastic Net (0.5)  PCR
1  203.0375  193.0763  200.8516  194.7625
2  46.4572  58.9701  44.8423  53.0762
3  49.8918  61.0208  49.3757  53.6081
4  92.7584  92.2995  91.3694  92.7680
5  21.6402  27.3076  21.5200  25.1636
6  4.3967  6.6928  4.6757  5.3609
7  38.9556  44.9188  37.9353  40.4747

Average MAB summary of simulation I
Case  Lasso  Ridge  Elastic Net (0.5)  PCR
1  1.1797  0.7910  1.2846  0.7910
2  0.9208  0.8114  0.8986  0.8118
3  0.8537  0.7918  0.8890  0.7922
4  1.3710  0.4368  1.3741  0.4368
5  0.8510  0.4405  0.8764  0.4409
6  0.6222  0.1513  0.6144  0.1514
7  0.8116  0.7217  0.9442  0.7221

Average Power (P) summary of simulation I
Case  Lasso  Elastic Net (0.5)
1  0.0112  0.0153
2  0.0633  0.0880
3  0.0598  0.0835
4  0.0143  0.0187
5  0.0493  0.0705
6  0.0363  0.0528
7  0.0513  0.0768

Average Small Power (\(P_{small}\)) summary of simulation I
Case  Lasso  Elastic Net (0.5)
4  0.0023  0.0033
5  0.0133  0.0200
6  0.0150  0.0298
7  0.0083  0.0133
Table 3 Complete results of simulation III

Setup for simulation III with a nonsparse situation
Case  Indices j: \(\beta ^{0}_{j}\) generated from N(0,1)  Indices j: \(\beta ^{0}_{j}\) generated from N(0,0.1)  Choice of \(\varSigma _{X}\)
1  1–1000  NA  \(I_{p\times p}\)
2  51–250, 351–550, 651–850, 951–1150, 1251–1450  NA  \(V_{5}\)
3  1–500, 801–1300  NA  \(V_{5}\)
4  1–500, 801–1300  NA  \(V_{20}\)
5  1–500  501–1000  \(I_{p\times p}\)
6  1–500  501–1000  \(V_{20}\)
7  101–150  401–1350  \(V_{20}\)
8  101–1050  1401–1450  \(V_{20}\)

Average MSE summary of simulation III
Case  Lasso  Ridge  Elastic Net (0.5)  PCR
1  1107.7110  1057.6660  1091.0470  1069.1520
2  294.8247  294.3531  290.2807  278.7965
3  259.0461  259.3874  255.1258  242.4962
4  265.4878  262.2610  256.5398  249.1404
5  522.7113  507.2834  517.5554  511.7361
6  122.7950  138.4813  121.0770  132.2491
7  14.9461  18.3394  14.9496  16.2734
8  230.4608  240.7334  226.3334  235.7741

Average MAB summary of simulation III
Case  Lasso  Ridge  Elastic Net (0.5)  PCR
1  1.3913  0.7954  0.9473  0.7957
2  1.1660  0.8035  0.9259  0.8039
3  1.1698  0.7996  0.8868  0.8001
4  1.1487  0.7958  0.9016  0.7960
5  0.7533  0.4345  0.7058  0.4346
6  0.9142  0.4385  0.7920  0.4388
7  0.4784  0.1156  0.4433  0.1157
8  1.1093  0.7633  0.8896  0.7639

Average Power (P) summary of simulation III
Case  Lasso  Elastic Net (0.5)
1  0.0022  0.0021
2  0.0228  0.0339
3  0.0240  0.0372
4  0.0272  0.0399
5  0.0018  0.0035
6  0.0244  0.0360
7  0.0145  0.0202
8  0.0272  0.0401

Average Small Power (\(P_{small}\)) summary of simulation III
Case  Lasso  Elastic Net (0.5)
5  0.0015  0.0025
6  0.0076  0.0121
7  0.0092  0.0134
8  0.0013  0.0020
Regarding parameter estimation accuracy, the simulation results in Figs. 6 and 7 indicate that parameter estimation by ridge regression and PCR is largely unaffected by correlated data, with both having smaller average MAB than the lasso and elastic net; this performance is mainly because of their dense solutions.
Regarding the variable selection performance, Fig. 7 shows that elastic net performs better than lasso in the case of correlated data, which can be justified by the presence of the grouping effect. Note that the data correlation associated with nuisance and less important variables seems to have little effect on our results compared to the data correlation associated with important variables.
Without data correlation, lasso and elastic net have prediction performances similar to ridge regression and PCR under the sparse situation (see the results for case 1 in Table 2). This is probably because of the identification of important covariates being limited by sample size and high dimensionality, causing difficulty for the lasso and elastic net to outperform the ridge regression and PCR. Under the nonsparse situation, lasso and elastic net performed even worse in prediction compared to ridge regression and PCR (see the results for case 1 in Table 3).
Overall, when important covariates are associated with correlated data, our results showed that the prediction performance is improved across all these four methods under both sparse and nonsparse situations, and that the prediction performance flipped to favour the lasso and elastic net over the ridge regression and PCR.
4.2 Covariate Location
4.3 Effect Size
Given the same number of important covariates, our simulation results (see cases 1 and 4 in Table 2 and cases 1 and 5 in Table 3) suggest that having a smaller overall effect size helps prediction and parameter estimation performances across all the methods. This is reasonable since the magnitude of errors is smaller, in exchange for harder detection of covariates that have small contributions to the predictions.
With data correlation, our results also reveal that the overall effect size could alter our perception of underlying data structures in the nonsparse situation. Figure 9 shows the performance barplots for all four methods when there were 1000 important covariates, 950 of which had small effect sizes. Compared with Figs. 7 and 8, both of which had 1000 important covariates of similar effect sizes, Fig. 9 indicates that the lasso and elastic net tend to perform better than the ridge regression and PCR in terms of prediction accuracy in this situation. This is probably because selecting some of those 50 important features associated with large effects is sufficient to explain the majority of the effects, which masks those associated with small effects. Other than the sparsity level of important covariates, overall covariate effect size also seems to change the indication of whether an underlying data structure is sparse via observing prediction performances, especially in a nonsparse situation.
5 Performance of the Debiased Lasso
Similar to the lasso, sparsity assumptions play a major role in justifying the use of debiased lasso. In this section, we evaluate the performance of the debiased lasso in prediction, parameter estimation and variable selection, and compare the results with the other methods considered in the previous section. We are particularly interested in understanding how this recently developed method performs when the data dimension p increases, so we can provide a rough idea of its practicality to emerging challenges in big data analysis.
Table 4 Complete results of simulation IV

Setup information for simulation IV
Case  Dimension p  Number of truly nonzero coefficients \(p^{*}\)  Indices j: \(\beta ^{0}_{j}\) generated from N(0,1)  Indices j: \(\beta ^{0}_{j}\) generated from N(0,0.1)  Choice of \(\varSigma _{X}\)
1  50  5  1–5  NA  \(I_{p\times p}\)
2  100  10  1–10  NA  \(I_{p\times p}\)
3  600  60  1–60  NA  \(I_{p\times p}\)
4  600  60  1–60  NA  \(V_{2}\)
5  600  60  1–50  401–410  \(V_{2}\)

Average MSE summary of simulation IV
Case  Lasso  Ridge  Elastic Net (0.5)  PCR  Debiased lasso
1  1.0286  3.8180  1.1865  2.9341  0.8026
2  4.9766  9.9611  5.1586  9.7301  4.7964
3  60.0761  55.1843  58.4472  59.0408  59.1018
4  16.6779  17.4790  17.5231  14.5093  42.8449
5  11.6343  9.4145  9.8804  8.2314  23.1490

Average MAB summary of simulation IV
Case  Lasso  Ridge  Elastic Net (0.5)  PCR  Debiased lasso
1  0.3150  0.7965  0.3936  0.4798  0.0842
2  0.5335  0.7704  0.5786  0.6290  0.2760
3  0.8880  0.7709  1.0084  0.7655  NA
4  0.8551  0.7785  1.0063  0.7709  2.7113
5  1.2956  0.7114  1.0709  0.7036  1.9395

Average Power (P) summary of simulation IV
Case  Lasso  Elastic Net (0.5)  Debiased lasso
1  0.74  0.78  0.72
2  0.52  0.48  0.24
3  0.0233  0.03  0
4  0.0533  0.0833  0.0017
5  0.0383  0.0650  0.0083

Average Small Power (\(P_{small}\)) summary of simulation IV
Case  Lasso  Elastic Net (0.5)  Debiased lasso
5  0  0.01  0
6 Real Data Example
Table 5 The prediction results from applying all the methods to the riboflavin data (average MSE on the test data)
Lasso  Ridge regression  Elastic net  PCR  Debiased lasso
0.2946  0.3953  0.3331  0.3493  0.3278
We applied the debiased lasso and each of the other methods to the riboflavin data 100 times through different random partitions of training and testing sets, and compared their prediction performance using the average MSE as in (6). The prediction results are shown in Table 5. The results indicate that while PCR performed as well as, if not better than, ridge regression, the lasso and elastic net had smaller average MSE than both ridge regression and PCR. The MSE of the debiased lasso is similar to that of the elastic net and lasso. The potential correlation between genes helps the elastic net and lasso to perform better in prediction, which is consistent with our simulation results in the previous sections. Also, according to our simulation findings in Sects. 4 and 5, it seems the underlying structure of the riboflavin dataset is sparse in the sense that among all the unknown covariates that contribute to the production rate of vitamin \(B_{2}\), only a few have relatively large effects. We emphasise that this does not necessarily indicate sparsity in the number of important covariates compared to the data dimension. The riboflavin dataset has recently been analysed for statistical inference purposes, such as constructing confidence intervals and hypothesis tests, by some researchers including [3, 7, 17].
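The evaluation protocol described above, repeated random partitions followed by averaging the test MSE, can be sketched generically as follows. The function names, the test fraction of 0.2, the seed and the placeholder mean-only model are assumptions made for illustration; the actual split sizes and fitted models used for the riboflavin data are not reproduced here.

```python
import random

def repeated_split_mse(X, y, fit, predict, n_repeats=100,
                       test_fraction=0.2, seed=0):
    """Average test MSE over repeated random train/test partitions."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        sq_err = sum((y[i] - p) ** 2 for i, p in zip(test_idx, preds))
        scores.append(sq_err / n_test)
    return sum(scores) / len(scores)

# Placeholder model (an assumption): always predict the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, X_test):
    return [model] * len(X_test)

X_demo = [[float(i)] for i in range(10)]
y_demo = [5.0] * 10          # constant response, so the mean model is exact
avg_mse = repeated_split_mse(X_demo, y_demo, fit_mean, predict_mean)
print(avg_mse)  # 0.0
```

Any of the regularisation methods can be plugged in through the `fit` and `predict` arguments, so that all methods are compared on identical partitions when the same seed is used.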
7 Conclusions and Discussion

When important covariates are associated with correlated variables, the simulation results showed that the prediction performance improves across all the methods considered in the simulations, for both sparse and nonsparse high dimensional data. The prediction performance flipped to favour the lasso and elastic net over the ridge regression and PCR.

When the correlated variables are associated with nuisance and less important variables, we observed that the prediction performance is generally unaffected across all the methods compared to the situation when the data correlation is associated with important variables.

In the presence of correlated variables, the parameter estimation performance of the ridge regression, elastic net and PCR was not affected, but the lasso showed a poorer parameter estimation when moving from sparse data to nonsparse data.

The variable selection performance of the elastic net was better than the lasso in the presence of correlated data.

Regarding the effects of the covariate location, we found that important variables being more scattered among groups of correlated data tend to result in better prediction performances. Such behaviour was more obvious for nonsparse data. The lasso tends to randomly select covariates in a group of correlated data, so it is less likely to select nuisance covariates when most of them are important in such group, thus improving prediction and variable selection performances.

Unlike in prediction and variable selection, the impact of covariate location was very small on the parameter estimation performance across all the methods.

Given the same number of important covariates, the simulation results showed that having a smaller overall effect size helps the prediction and parameter estimation performances across all the methods. The simulation results indicated that the lasso and elastic net tend to perform better than the ridge regression and PCR in terms of prediction accuracy in such situation. In the presence of data correlation, the overall effect size could change our indication of whether an underlying data structure is sparse via observing prediction performances, especially in the nonsparse situations.

For the debiased lasso, the simulation results showed that it outperforms all the other methods in terms of prediction and parameter estimation in low dimensional sparse situations with uncorrelated data. When the data dimension p increases, the prediction by the debiased lasso is as good as that of the lasso and elastic net; however, the debiased lasso no longer identifies any important covariates when the dimension p is very large. The results also showed that inducing correlated data seems to help the debiased lasso identify important covariates when p is very large; however, its performance in prediction and parameter estimation is no longer comparable to the other methods in the presence of correlated data.
We also observed that the curse of dimensionality can yield inconsistent and undesirable feature selection in high dimensional regression. The choice of shrinkage parameter \(\lambda \) during the cross-validation process was found to be crucial. For high dimensional data, the error bars were too large for every cross-validated value of \(\lambda \), mainly due to the lack of sufficient observations compared to the number of covariates (\(p \gg n\)).
Finally, the debiased lasso can be used in a similar fashion to the OLS, but in ill-posed low dimensional problems. It therefore suffers from multicollinearity as well as the issue of too many hypothesis tests in high dimensional data [3, 7, 33]. With many procedures available to tackle issues from multiple hypothesis testing, a more accurate estimation procedure would be helpful when applying the debiased lasso to high dimensional data. It will be very useful to conduct research on how the debiased lasso combined with the bootstrap [8] performs in high dimensional data under the above three conditions.
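As an example of a multiple testing procedure that could be applied to p-values from the debiased lasso, the sketch below implements Holm's sequentially rejective method [16]. It is shown only as one standard option, not as the procedure used in this study; the function name and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: return one reject decision per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

# Hypothetical p-values for four coefficients.
decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
print(decisions)  # [True, False, False, True]
```

With p in the thousands, the first thresholds alpha / m become very small, which illustrates why the number of tests is a practical concern for inference in high dimensional data.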
References
1. Ayers KL, Cordell HJ (2010) SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet Epidemiol 34(8):879–891
2. Bühlmann P (2017) High-dimensional statistics, with applications to genome-wide association studies. EMS Surv Math Sci 4(1):45–75
3. Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view toward applications in biology. Ann Rev Stat Appl 1:255–278
4. Bühlmann P et al (2013) Statistical significance in high-dimensional linear models. Bernoulli 19(4):1212–1242
5. Cantor RM, Lange K, Sinsheimer JS (2010) Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am J Hum Genet 86(1):6–22
6. Dalalyan AS, Hebiri M, Lederer J (2017) On the prediction performance of the lasso. Bernoulli 23(1):552–581
7. Dezeure R, Bühlmann P, Meier L, Meinshausen N et al (2015) High-dimensional inference: confidence intervals, p-values and R-software hdi. Stat Sci 30(4):533–558
8. Dezeure R, Bühlmann P, Zhang CH (2017) High-dimensional simultaneous inference with the bootstrap. TEST 26(4):685–719
9. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
10. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer series in statistics. Springer, New York
11. Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1
12. García C, García J, López Martín M, Salmerón R (2015) Collinearity: revisiting the variance inflation factor in ridge regression. J Appl Stat 42(3):648–661
13. Guo Y, Hastie T, Tibshirani R (2006) Regularized linear discriminant analysis and its application in microarrays. Biostatistics 8(1):86–100
14. Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the lasso and generalizations. CRC Press, Boca Raton
15. Hoerl AE, Kennard RW (1970) Ridge regression: applications to nonorthogonal problems. Technometrics 12(1):69–82
16. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
17. Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15(1):2869–2909
18. Jia J, Yu B (2010) On model selection consistency of the elastic net when \(p \gg n\). Stat Sinica 20:595–611
19. Jolliffe IT (1982) A note on the use of principal components in regression. Appl Stat 31:300–303
20. Kendall M (1957) A course in multivariate statistics. Griffin, London
21. Knight K, Fu W (2000) Asymptotics for lasso-type estimators. Ann Stat 28:1356–1378
22. Ma S, Dai Y (2011) Principal component analysis based methods in bioinformatics studies. Brief Bioinform 12(6):714–722
23. Malo N, Libiger O, Schork NJ (2008) Accommodating linkage disequilibrium in genetic-association analyses via ridge regression. Am J Hum Genet 82(2):375–385
24. Marafino BJ, Boscardin WJ, Dudley RA (2015) Efficient and sparse feature selection for biomedical text classification via the elastic net: application to ICU risk stratification from nursing notes. J Biomed Inform 54:114–120
25. Meinshausen N (2007) Relaxed lasso. Comput Stat Data Anal 52(1):374–393
26. Nie F, Huang H, Cai X, Ding CH (2010) Efficient and robust feature selection via joint l2,1-norms minimization. In: Advances in neural information processing systems, pp 1813–1821
27. Obenchain R (1977) Classical F-tests and confidence regions for ridge regression. Technometrics 19(4):429–439
28. Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686
29. Ročková V, George EI (2018) The spike-and-slab lasso. J Am Stat Assoc 113(521):431–444
30. Ryali S, Chen T, Supekar K, Menon V (2012) Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59(4):3852–3861
31. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
32. Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B 73(3):273–282
33. Van de Geer S, Bühlmann P, Ritov Y, Dezeure R et al (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42(3):1166–1202
34. Mevik BH, Wehrens R (2007) The pls package: principal component and partial least squares regression in R. J Stat Soft 18:1–24
35. Wu TT, Chen YF, Hastie T, Sobel E, Lange K (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25(6):714–721
36. Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68(1):49–67
37. Zhang CH, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc Ser B 76(1):217–242
38. Zhao P, Yu B (2006) On model selection consistency of lasso. J Mach Learn Res 7(Nov):2541–2563
39. Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
40. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(2):301–320
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.