Penalized Partial Least Square applied to structured data

Nowadays, high-dimensional data analysis has become widespread. High-dimensional data sets can be built by gathering different independent data sets. However, each independent set can introduce its own bias. We can cope with this bias by introducing the observation set structure into our model. The goal of this article is to build a theoretical background for the dimension reduction method sparse Partial Least Square (sPLS) in the context of data presenting such an observation set structure. The innovation consists in building different sPLS models and linking them through a common-Lasso penalization. This theory could be applied to any field where observations present this kind of structure and, therefore, improve the sPLS in domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered in a priori groups, where the sPLS extends to a sparse group Partial Least Square.


Introduction
In recent years, data analysis applied to high dimension has arisen in all domains [21]. Extracting information from ever larger data has become a trend in numerous fields, and a large number of observations needs to be gathered to evaluate statistical models. When data are hard to retrieve, gathering existing data sets is an efficient way of assembling data of high dimension. However, this technique has its drawbacks: existing independent data sets can present intrinsic biases which can decrease the performance of the models used.
Those biases imply an unwanted underlying structure that interferes with the signal that we want to find. Bias can come from a difference in the source of information or in the process used during the collection of the data. This set structure has to be taken into account to improve the efficiency of the models. For instance, in genomics, data can be gathered from different studies because of the cost of the experimentation. Each clinical study may have been performed with its own chemistry protocol, with its own experimental material and on its specific population, and bias can arise among the different data sets obtained. This "batch effect" is well known and can significantly decrease the power of the analysis [5]. Another bias can occur in particular analyses where different "dynamics" exist between the studies: a predictor can be highly correlated with an independent variable, but the direction of the correlation depends on the study. For instance, pleiotropy [11] is studied in genetics when a gene (predictor) can have a particular effect on different phenotypes (independent variables). Data can be gathered from different studies where the nature of the phenotype differs. Therefore, a gene can be highly correlated with each phenotype, but an overall model struggles to catch the particularity of those effects.
In this article, we tackle the problem of the "batch effect" for dimension reduction methods such as the Partial Least Square (PLS) method introduced by Wold [7]. Common dimension reduction techniques are Canonical Correlation Analysis (CCA) [14], Principal Component Analysis (PCA) [12], and PLS [1]. All these methods rely on the projection of the data into a subspace of lower dimension which represents most of the variation of the data. They are often posed as an eigenvalue problem [3]. PLS and CCA both analyse two blocks of data and differ in the norm used, whereas PCA analyses one block. Aiming to apply our method to supervised analysis, the PLS approach is considered in this article.
In these dimension reduction techniques, results are formulated with new variables that are linear combinations of the original ones. These combinations can be hard to interpret due to the huge number of coefficients that they involve. To answer this problem, Lasso methods have been used. Introducing this penalization shrinks to zero the contribution to the model of the least relevant variables. Results highlight a smaller number of variables that are easier to explain. In addition, the noise of the signal is reduced and the power of the methods is boosted. These are called sparse methods and have been developed for linear regression [16,23], CCA [19], PCA [24], and Partial Least Square. The sparse PLS (sPLS) has shown encouraging results [2,8] and is the object of analysis in this article. The PLS and sPLS methods have also been used to control the "batch effect" when independent but related studies are combined to increase the sample size ([4,13]). In particular, building separate sPLS models and linking them can be an option, as in the Multivariate INTegrative method (MINT) proposed in [13]. However, this approach cannot identify the true signal in the presence of different dynamics.
For high-dimensional regression problems, using problem-specific prior information improves the accuracy of the prediction and the interpretability of the model [10]. For example, in genomics, genes within the same pathway have similar functions and act together in regulating a biological system. Incorporation of this grouping structure is becoming increasingly common due to the success of gene set enrichment analysis approaches [17]. Using a model that takes this variable group structure into account improves the performance and the readability of the results. To this end, the sparse group Partial Least Square (sgPLS) has been developed [9], where two overlaid Lasso penalizations translate the group structure into the Partial Least Square formulation. A structure with groups and sub-groups can also be handled by its generalization with three overlaid Lasso penalizations (sgsPLS [18]). Methods such as MINT do not take this kind of group structure into account.
In this article, we consider data that are composed of independent observation sets. The observation sets are assumed to be known and are expected to introduce bias in the data. The presented methods allow us to use the information about the construction of the data set to improve the performance of the analysis. Although this theory has been developed with the aim of answering a problem occurring in genomic public data sets, it can be applied to any field where such an observation set structure exists. Different methods using Lasso penalization on data structured in observation sets are discussed. In particular, a "penalized PLS for structured data" is defined, where separate PLS models are linked together with a common-Lasso penalization. In the end, the variables selected by the model are the same for all observation sets, but the underlying model computes separate models for each observation set, giving both readability and flexibility to the model. We present the theoretical background for this method. In particular, we show that the common-Lasso constraint that is used (i.e., a penalization across studies) can be written as a standard Lasso with an overlaid group structure in an equivalent formulation of the PLS problems. We also extend this idea of common-Lasso constraint to the case where an a priori structure is known, in which the variables are gathered into groups.
Fig. 1 Illustration of data structured by groups of observations. Observations are assumed to be ordered by observation set.

Notation: For a matrix A of size (a, b), for i ∈ {1, …, a}, its rows are noted A^(i,•), and for j ∈ {1, …, b}, its columns are noted A^(•,j); for subsets ã ⊂ {1, …, a} and b̃ ⊂ {1, …, b}, the row and column sub-matrices are noted resp. A^(ã,•) and A^(•,b̃). For any vector ω of size a, for i ∈ {1, …, a}, its elements are noted ω^(i), and for subsets ã ⊂ {1, …, a}, ω^(ã) represents the elements of the vector corresponding to the positions in the subset. Matrices will always be written in uppercase letters and vectors in lowercase letters to avoid any confusion.
The Frobenius norm on matrices is denoted ‖·‖_F. We note X^T the transpose of the matrix X. The cardinal of a set S is noted #S. The positive part of a real number x is noted (x)_+ = (|x| + x)/2.

Data with observation sets
Some data may present a structure among the observations, gathered in groups of observations. For instance, data can be composed of different studies, each one presenting its own mechanisms and biases. Let us consider M different sets in the data. Noting, for m ∈ {1, …, M}, M_m a subset of {1, …, n}, let M = (M_m)_{m=1..M} be a partition of {1, …, n} corresponding to the observation sets. We note #M_m = n_m. Row blocks are defined by these partitions in Fig. 1 (observations are assumed to be ordered by observation set).

Data with group of variables
Some data may present a structure among the variables, gathered in groups. Let us consider that the variables are gathered in K groups. Let P = (P_k)_{k=1..K} be a partition of {1, …, p} corresponding to this variable group structure. We note #P_k = p_k. We then have ∑_{k=1}^K p_k = p. This partition defines column blocks among the variables if the variables are assumed to be ordered by variable group. Both an observation set structure and a variable group structure can be defined at the same time, as in Fig. 2.

Formulation of the sparse Partial Least Square
In the literature, two formulations of the Partial Least Square exist; some extensions of the PLS follow a first one usually called "PLS1" [22] and other extensions follow a second one called "PLS2" [2]. In the context of this article, we study exclusively the first one. Let X be a predictor matrix of size (n, p) and Y a matrix of independent variables of size (n, q). PLS successively finds couples of vectors {u_1, v_1}, …, {u_r, v_r} for r ≤ min(p, q), where the couples are composed of vectors of length resp. p and q, maximizing Cov(X u_i, Y v_i) for any i ∈ {1, …, r}, under the constraint that the families of vectors u_1, …, u_r and v_1, …, v_r are both orthogonal families [7]. It can be solved by considering successive minimization problems [15], for h ∈ {1, …, r}:

min_{u_h, v_h} ‖X_{h−1}^T Y_{h−1} − u_h v_h^T‖²_F,

where X_{h−1} and Y_{h−1} are deflated matrices computed from X_{h−2} and Y_{h−2} for h ∈ {2, …, r} (with X_0 = X and Y_0 = Y). The deflation depends on the PLS mode that is chosen [7,20]. In this article, we focus on the enhancement of the optimization problem and its Lasso formulation at its hth step. According to [15], this step can be written as

min_{u, v} ‖X^T Y − u v^T‖²_F + λ P(u),   (1)

where P(•) is the Lasso penalty term for the sparse PLS and the index h is removed to simplify the formulation, because we are interested in only one of the r steps of the PLS.
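As an illustration (not taken from the article), a single non-sparse PLS step can be sketched in Python: the couple {u_1, v_1} maximizing Cov(Xu, Yv) under unit-norm constraints is given by the leading pair of singular vectors of the cross-product matrix X^T Y. The data below are synthetic.

```python
import numpy as np

def pls_step(X, Y):
    """One non-sparse PLS step: unit-norm loadings (u, v) maximizing
    Cov(Xu, Yv), obtained from the leading singular vectors of M = X^T Y."""
    M = X.T @ Y
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, 0], Vt[0, :]

# synthetic one-latent-variable data
rng = np.random.default_rng(0)
h = rng.normal(size=(100, 1))
X = h @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(100, 5))
Y = h @ rng.normal(size=(1, 3)) + 0.1 * rng.normal(size=(100, 3))
u, v = pls_step(X, Y)
```

The scores Xu and Yv then share the latent direction h, which is what the deflation step removes before the next couple is computed.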
The sparse PLS introduces a penalization in this formulation of the problem. The penalty P(•) forces the lowest values of u to be set to zero. The parameter controlling the degree of sparsity in the model is λ. In the presented formula, the sparsity is applied only to the vector u, but a similar penalization can be defined for v. In the context of this article, we treat only the penalization of u, but all the results also stand for a penalization of v. The following sections compare different ways of writing the sPLS optimization problem presented in Eq. (1), taking into account an observation and/or variable set structure.
Remark Before the analysis, the X and Y matrices are transformed by subtracting their column averages. Centering and scaling each column by its mean and standard deviation is also often recommended [6]. Thus, the cross-product matrix X^T Y is proportional to the empirical covariances between X- and Y-variables when the columns of X and Y are centered. When the columns are standardized, X^T Y is proportional to the empirical correlations between X- and Y-variables. In this article, the standardization is an important step to overcome the issue of the "batch effect" or to aggregate observations from different studies.
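The within-set standardization used in problems 2-6 below can be sketched as follows; `sets` is a list of row-index arrays, an illustrative interface not taken from the article.

```python
import numpy as np

def standardize_within_sets(A, sets):
    """Center and scale each column of A separately within each observation set,
    so that the per-set cross-product blocks become proportional to per-set
    empirical correlations."""
    A = np.array(A, dtype=float)
    for idx in sets:
        block = A[idx]
        A[idx] = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    return A

rng = np.random.default_rng(0)
X = rng.normal(loc=[0.0, 5.0], scale=[1.0, 3.0], size=(100, 2))
sets = [np.arange(0, 50), np.arange(50, 100)]
Xs = standardize_within_sets(X, sets)
```

After the transformation, every column has mean 0 and unit standard deviation inside each observation set, which removes additive and multiplicative per-set biases.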

Formulation of the penalized PLS
Six different formulations of the sPLS are presented. The first four correspond to data presenting an observation set structure as in Fig. 1. The last two correspond to data presenting both an observation set structure and a variable group structure as in Fig. 2, which correspond to sgPLS models (see [9]). We can note that problem 5 is a particular case of Fig. 2 where there is only one observation set (M = 1). The loading vectors introduced in those figures refer to the vectors formulated in the following problems. The study of problems 4 and 6 is the main contribution of the article.
- Problem 1 (standard sPLS): This approach consists in simply considering all the observation sets as one set. Data are standardized across all the sets, i.e., X and Y are standardized. The formulation is a standard sPLS problem:

min_{u, v} ‖X^T Y − u v^T‖²_F + λ ‖u‖_1.   (2)

In this model, the loading u is composed of p elements and the loading v is composed of q elements. The sparsity of u is controlled by the parameter λ: for a given λ, s_λ elements of u will be non-zero.
- Problem 2 (MINT): Introduced in [13], this approach consists in considering one sPLS problem on data standardized within each observation set, i.e., for every m ∈ {1, …, M}, X^(M_m,•) and Y^(M_m,•) are standardized instead of X and Y. The sPLS problem is the same as in the previous problem, Eq. (2).
In this model, the loading u is composed of p elements and the loading v is composed of q elements. The sparsity of u is controlled by the parameter λ: for a given λ, s_λ elements of u will be non-zero.
- Problem 3 (multiple sPLS): This approach consists in considering M different sPLS problems, corresponding to each of the M observation sets, solved independently. Data are standardized within each observation set, i.e., for every m ∈ {1, …, M}, X^(M_m,•) and Y^(M_m,•) are standardized instead of X and Y. The formulation is M separate sPLS problems, for m ∈ {1, …, M}:

min_{u_m, v_m} ‖(X^(M_m,•))^T Y^(M_m,•) − u_m v_m^T‖²_F + λ_m ‖u_m‖_1.   (3)

In this model, the set of loadings {u_m}_{m∈{1,…,M}} is composed of p × M elements (p elements per u_m). The set of loadings {v_m}_{m∈{1,…,M}} is composed of q × M elements (q elements per v_m). The sparsity of u_m is controlled by the parameter λ_m: for a given λ_m, s_{m,λ_m} elements of u_m will be non-zero. Therefore, the variables concerned by the shrinkage to zero depend on the observation set m.
- Problem 4 ("sparse PLS for structured data"): This approach consists in considering M different sPLS problems corresponding to each of the M observation sets. Data are standardized within each observation set, i.e., for every m ∈ {1, …, M}, X^(M_m,•) and Y^(M_m,•) are standardized instead of X and Y. All problems are solved at the same time with a common-Lasso. The formulation of the problem is

min_{U, V} ∑_{m=1}^{M} ‖(X^(M_m,•))^T Y^(M_m,•) − U^(•,m) (V^(•,m))^T‖²_F + λ ∑_{i=1}^{p} ‖U^(i,•)‖_2.   (4)

In this model, the set of loadings U is composed of p × M elements (p elements per U^(•,m)). The set of loadings V is composed of q × M elements (q elements per V^(•,m)). The sparsity of all U^(•,m) is controlled by the parameter λ: for a given λ, the same s_λ elements of each U^(•,m) will be non-zero.
- Problem 5 (classical sgPLS): When the variables can be gathered in groups (Fig. 2), the sgPLS proposes to add a group-Lasso penalization to the classical sPLS. Data are standardized within each observation set, i.e., for every m ∈ {1, …, M}, X^(M_m,•) and Y^(M_m,•) are standardized instead of X and Y. The formulation of the problem is

min_{u, v} ‖X^T Y − u v^T‖²_F + P_variable(u) + P_group(u),   (5)

with P_variable(u) = α λ ‖u‖_1 and P_group(u) = (1 − α) λ ∑_{k=1}^{K} √p_k ‖u^(P_k)‖_2. In this model, the loading vectors u and v are composed of resp. p and q elements. The penalization P_variable forces single variables to be set to zero, whereas the penalization P_group forces sets of variables to be set to zero. The general degree of sparsity in the model is controlled by λ, whereas the parameter controlling the balance between both kinds of sparsity is α. In this model, the elements of u corresponding to the least relevant variables and the least relevant groups of variables are set to zero.
- Problem 6 ("sgPLS for structured data"): In the same spirit as the adaptation of problem 2 into problem 4, problem 5 can be adapted with a common-Lasso penalization. Data are standardized within each observation set, i.e., for every m ∈ {1, …, M}, X^(M_m,•) and Y^(M_m,•) are standardized instead of X and Y. The formulation of the problem is

min_{U, V} ∑_{m=1}^{M} ‖(X^(M_m,•))^T Y^(M_m,•) − U^(•,m) (V^(•,m))^T‖²_F + α λ ∑_{i=1}^{p} ‖U^(i,•)‖_2 + (1 − α) λ ∑_{k=1}^{K} √p_k ‖U^(P_k,•)‖_F.   (6)

In this model, the set of loadings U is composed of p × M elements (p elements per U^(•,m)). The set of loadings V is composed of q × M elements (q elements per V^(•,m)). In this model, the same variables and variable groups, corresponding to the least significant variables, are set to zero for all U^(•,m), m ∈ {1, …, M}.
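The difference between the penalties above can be sketched numerically. The helper below is an illustration, not the article's code: it evaluates the common-Lasso term of problem 4 — an ℓ2 norm over each row of U, i.e., over the M set-specific loadings of one variable — and the group term of problem 6; the √p_k weighting is the usual group-lasso convention and is an assumption here.

```python
import numpy as np

def common_lasso(U):
    """Problem 4 penalty: sum over variables of the l2 norm of the row U^(i,.).
    Driving a row to zero removes variable i from all M set-specific models at once."""
    return float(np.sum(np.linalg.norm(U, axis=1)))

def group_term(U, groups):
    """Problem 6 group penalty: weighted Frobenius norm of each variable-group
    block of rows (sqrt(p_k) weights assumed, as in the standard group lasso)."""
    return float(sum(np.sqrt(len(g)) * np.linalg.norm(U[g], "fro") for g in groups))

U = np.array([[1.0, -1.0],   # variable kept in both sets (opposite signs allowed)
              [0.0,  0.0],   # variable removed from *all* sets
              [2.0,  0.5]])
print(common_lasso(U))       # sqrt(2) + 0 + sqrt(4.25)
```

Note that the first row has a large penalty contribution even though its two loadings have opposite signs: the common-Lasso only couples the *support* across sets, not the sign or magnitude of the per-set loadings.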

Solutions of the penalized PLS
The classical sPLS can be seen as a biconvex optimization problem. It can be solved by successively optimizing the loadings u and v [15]. For a given v, an optimized ũ is computed and the value of u is updated. Then, the same is performed permuting the roles of u and v. This optimization process relies on solving the problems:

min_u ‖X^T Y − u v^T‖²_F + λ P(u)  and  min_v ‖X^T Y − u v^T‖²_F.   (7)

The solution of problems 1-3 (composed of standard sPLS methods) is given by the following theorem:

Theorem 4.1 The marginal optima in ũ and ṽ in the sPLS (Eq. (1)) are the following. Fixing v, the optimal u_opt for (7) satisfies, for i ∈ {1, …, p},

u_opt^(i) = sign((X^T Y v)^(i)) (|(X^T Y v)^(i)| − λ/2)_+ / ‖v‖²_2.

Fixing u, the optimal v_opt for (7) is

v_opt = Y^T X u / ‖u‖²_2.

In this formula, a soft-thresholding sets to zero the loadings corresponding to variables whose scores are too low. Setting λ equal to zero, we find the formulation of the PLS problem without the Lasso constraint. A proof can be found in [8].
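A minimal sketch of the alternating scheme of Theorem 4.1, assuming the Lasso penalty P(u) = ‖u‖₁ and the normalization written above (illustrative code, not the authors' implementation):

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: sign(x) * (|x| - t)_+."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def spls_alternate(M, lam, n_iter=50):
    """Alternate the marginal optima of Theorem 4.1 on the cross-product M = X^T Y:
    u <- soft_threshold(M v, lam/2) / ||v||^2,  v <- M^T u / ||u||^2."""
    u = np.ones(M.shape[0])
    v = M.T @ u / (u @ u)
    for _ in range(n_iter):
        u = soft_threshold(M @ v, lam / 2.0) / (v @ v)
        v = M.T @ u / (u @ u)
    return u, v

# rank-one cross-product matrix for the check below
M = np.outer([3.0, 0.1, -2.0], [1.0, 0.5])
u0, v0 = spls_alternate(M, lam=0.0)   # lam = 0: plain PLS step, u0 v0^T recovers M
u1, v1 = spls_alternate(M, lam=0.5)   # lam > 0: small scores are shrunk toward 0
```

With λ = 0 the fixed point reproduces the rank-one factorization of M, which matches the remark in the theorem that the unpenalized case reduces to the PLS.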
For problems 4, 5, and 6, the solution is more complex.Problem 4 introduces a common-Lasso penalization, problem 5 introduces a variable group structure, and problem 6 introduces both common-Lasso penalization and variable group structure.We can note that problem 4 is a particular case of problem 6, where there is no group penalty, i.e., α = 1.Problem 5 is a particular case of problem 6, where there is only one observation set, i.e., M = 1.The solution of problem 6 is given in Theorem 4.2 (presented in the following), whereas solutions of problems 4 and 5 are corollaries of this theorem and can be found after the proof (Corollaries 4.4 and 4.5).

Theorem 4.2
The marginal optima in U and V in the "sparse group PLS for structured data" [Eq. (6)] are the following. Fixing V, the optimal U_opt for (6) is given by Eq. (10). Fixing U, the optimal V_opt for (6) is given by Eq. (11).

Proof The proof is composed of three steps. In Step 1, we settle the sub-gradient equation corresponding to the minimization problem. In Step 2, we bring out the sPLS thresholding in the equation. In Step 3, we bring out the group thresholding and prove the theorem.
Let us settle the sub-gradient equation. The optimal U for a given V minimizes the objective of Eq. (6). We note that the problem can be reformulated by making the column blocks corresponding to the variable groups appear, and that it can then be separated into K distinct problems, one for every k ∈ {1, …, K}. To solve this problem, let us consider the kth problem, developing the Frobenius norm. Taking the sub-gradient, the optimal U_opt verifies, for m ∈ {1, …, M}, a stationarity condition involving the (p × M) matrix Θ_g. We can note that when there is no penalty (i.e., λ = 0), U_opt = U_0 is the solution of the non-sparse problem. The sub-gradient equation is settled (Step 1). Let us now bring out the thresholding of the sPLS. We investigate in which case U^(P_k,•)_opt = 0, i.e., when the loading corresponding to a group of variables is set to zero. If U^(P_k,•)_opt = 0, then U^(i,•)_opt = 0 for every i ∈ P_k.

Let us define
We can establish the following lemma, which brings out in Eq. (12) the variable thresholding term of the sPLS, as in (1).

Lemma 4.3
and there is Θ_v such that the sub-gradient condition vanishes, and then U^(i,•)_0 = 0. The inequality is then true.
Furthermore, there is a Θ_v reaching the equality (16). Otherwise, U^(i,•)_1 ≠ 0 and the inequality (15) is true, because the equality (16) is reached. In any case, Lemma 4.3 is proved.
From Lemma 4.3, we can infer the bound in (12), and the inequality can be reached as an equality.
This establishes the condition under which U^(P_k,•)_opt = 0. Let us now consider that U^(P_k,•)_opt ≠ 0, i.e., U^(i,•)_opt ≠ 0 for at least one i ∈ P_k. From this point, we find successively the stationarity equations of the non-zero rows. Summing the squares over every element of P_k and extracting the value of ‖U^(P_k,•)‖_F from the resulting equation, we finally find the expressions of Theorem 4.2, which concludes the proof.

Corollary 4.4 The solution to Eq. (4) can be seen as a biconvex optimization problem. Fixing V, the optimal U_opt for (4) follows from Eq. (10) with α = 1 (no group penalty); fixing U, the optimal V_opt for (4) follows from Eq. (11).

Corollary 4.5 The solution to Eq. (5) can be seen as a biconvex optimization problem. Fixing v, the optimal u_opt for (5) follows from Eq. (10) with M = 1 (a single observation set); fixing u, the optimal v_opt for (5) follows from Eq. (11).
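The two-level shrinkage appearing in Theorem 4.2 and its corollaries can be sketched generically as a sparse-group soft-thresholding operator. The exact constants and normalizations of Eqs. (10)-(11) are those of the paper; the λ/2 and √p_k factors below are assumptions matching the usual sparse group lasso.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_group_threshold(S, groups, lam, alpha):
    """Apply variable-level soft-thresholding to the score matrix S (rows =
    variables, columns = observation sets), then zero or radially shrink each
    variable-group block of rows: a group survives only if the norm of its
    thresholded block exceeds the group threshold."""
    out = soft_threshold(S, alpha * lam / 2.0)
    for g in groups:
        nrm = np.linalg.norm(out[g])
        t = (1.0 - alpha) * lam * np.sqrt(len(g)) / 2.0
        out[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * out[g]
    return out

S = np.array([[3.0, -3.0],    # strong variable (kept, signs may differ per set)
              [0.2,  0.1],    # weak variable inside a kept group (zeroed)
              [0.1,  0.1],    # weak group (zeroed as a whole)
              [0.05, 0.02]])
groups = [np.array([0, 1]), np.array([2, 3])]
R = sparse_group_threshold(S, groups, lam=1.0, alpha=0.5)
```

The example exhibits both levels of sparsity at once: one whole group of rows is set to zero, and inside the surviving group a weak variable is also removed for every observation set.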

Discussion
Problems 1-4 are discussed in this part, but we consider that the following remarks can be transposed to problems 5 and 6, which are resp. the equivalents of problems 2 and 4 for data with a variable group structure.

Size of the data
The larger the data are (in terms of number of observations), the better the models are supposed to perform. We can see that problems 1 and 2 have the merit of performing an sPLS on data containing n observations, whereas problems 3 and 4 perform M different sPLS methods on data with resp. n_m observations for m ∈ {1, …, M}.
For some observation sets, the number of observations can be significantly smaller than the size of the whole data, which can have a negative impact on the result.

Number of loading elements in the model
The number of loading elements is an important parameter to control. On the one hand, the larger it is, the more information can be stored by the model; on the other hand, having too many loading elements can give results that are harder to interpret, and there is a higher risk of over-fitting. Problems 1 and 2 have only p loadings for u, whereas problems 3 and 4 have M × p. For problem 4, the number of loadings is large, but the number of non-zero variables will vary between 1 and p, in the same way as in problems 1 and 2.
Problem 4 gives readable results while keeping the flexibility of a model with a higher number of loading elements. For problem 3, the non-zero variables can differ from one study to another: we cannot control whether a variable will be null for all studies, and the number of non-zero variables will be significantly higher than for the other problems.

Sensibility to batch effect
A batch effect can arise when data provided by different sources present a bias. This effect can happen when the observation sets are expected to introduce their own intrinsic error. Therefore, the cross-product matrices could be represented by a model like

(X^(M_m,•))^T Y^(M_m,•) = Z + E_m,  for m ∈ {1, …, M},

where Z follows a given law and the E_m are Gaussian noises with parameters depending on m. Under this hypothesis, the standardization within studies can bypass this bias. Therefore, problems 3-4 can correct this kind of batch effect. However, more complex biases can exist. For instance, what happens if different observation sets have different dynamics? Let us consider a variable that is positively correlated in some observation sets and negatively correlated in others. In problems 1 and 2 and their overall sPLS, the variable will have a small corresponding loading, because the positive and negative effects compensate each other, and the variable will be cut by the sparsity heuristic. In problem 3, the distinction between all dynamics will be highlighted by the model. Finally, problem 4 will select the same variable, because it has a significant loading on every observation set. In the end, problem 4 can handle more cases where the observation sets introduce bias.
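A toy numerical check of the cancellation argument (not from the article): with opposite dynamics, the pooled cross-product is near zero while each per-set cross-product is large.

```python
import numpy as np

rng = np.random.default_rng(1)
h1 = rng.normal(size=200)
h2 = rng.normal(size=200)
# One predictor column and one response column; the link has opposite
# signs in the two observation sets.
x = np.concatenate([h1, h2])
y = np.concatenate([h1, -h2])
pooled = x @ y                               # what problems 1-2 see: near zero
per_set = np.array([h1 @ h1, -(h2 @ h2)])    # what problems 3-4 see: both large
```

The pooled score is the sum of two terms of opposite sign and nearly cancels, so an overall sPLS discards the variable; the per-set scores are both large in absolute value, so a per-set model (linked by the common-Lasso in problem 4) keeps it.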

Relation between problem 4 and classic sgPLS
We can also establish that problem 4 (the sPLS method with a common-Lasso penalization) applied to matrices X and Y of size resp. (n, p) and (n, q) is equivalent to a classical sgPLS, without the standard Lasso part, on well-chosen matrices X̃ and Ỹ of size resp. (n, p × M) and (n, q × M). Those matrices are constructed by shifting the row blocks of X and Y: they are block-diagonal matrices whose blocks are resp. X^(M_m,•) and Y^(M_m,•) for m ∈ {1, …, M}. The corresponding loading vectors, of size resp. p × M and q × M, are called here resp. u_e and v_e. The representation of those objects is shown in Fig. 3.
where P̃_k refers here to the variables of X̃ associated with the variables of group k. In this formulation, the loading vectors u_e and v_e can be seen as the concatenation of the rows of resp. U and V into a unique one-dimensional vector. This notation is interesting from a theoretical point of view, because it ensures that problem 4 can inherit properties from the sPLS. However, this notation is not wise for computational efficiency, because the matrices X̃ and Ỹ are M times bigger than X and Y, where M is the number of observation sets. For implementation, computing the solution directly from Eqs. (10) and (11) seems wiser.
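The block-diagonal construction of X̃ (Fig. 3) can be sketched as follows (illustrative code, not the authors' implementation):

```python
import numpy as np

def shift_row_blocks(A, sets):
    """Build the block-diagonal matrix Atilde of size (n, ncols * M) whose m-th
    diagonal block is A[M_m, :]; problem 4 on (X, Y) then reads as an sgPLS
    with the group penalty only on (Xtilde, Ytilde)."""
    n, ncols = A.shape
    At = np.zeros((n, ncols * len(sets)))
    for m, idx in enumerate(sets):
        At[idx, m * ncols:(m + 1) * ncols] = A[idx]
    return At

X = np.arange(12.0).reshape(6, 2)
sets = [np.arange(0, 3), np.arange(3, 6)]
Xt = shift_row_blocks(X, sets)   # shape (6, 4): two diagonal blocks
```

As the text notes, this representation multiplies the storage by M, so it is mainly a theoretical device rather than the recommended implementation.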

Application on simulated data
The presented methods are illustrated on simulated data. A first simulation case presents data where a batch effect exists, and a second simulation case presents data where different dynamics exist among the observation sets. For each case, different noise levels are considered. Every simulation is performed 50 times. The code can be found at https://github.com/camilobroc/sgPLS_for_structured_data.

Design of the simulated data
In the following, a training data set of 900 observations gathered in 3 observation sets of 300 observations is generated, as well as a test data set of 300 observations gathered in 3 observation sets of 100 observations (for the training data, M = 3 and n_1 = n_2 = n_3 = 300; for the test data, n_1 = n_2 = n_3 = 100).

Batch effect cases
In the first simulation case, the methods are applied to data presenting a batch effect. The simulation is performed with different noise levels. Data have an observation set structure and a variable group structure, as shown in Fig. 2. A batch effect implies that one same physical process is observed but the methods of measurement vary among the different groups of observations. We represent this difference of measurement by a bias depending on the observation set.
A matrix X with 1000 variables gathered in 50 groups of 20 variables (K = 50 and p_1 = ⋯ = p_K = 20) and a matrix Y with 3 variables (q = 3) are generated. To mimic a batch effect, the generation procedures of the matrices X and Y have different parameters depending on the different sub-matrices of resp. X and Y. Those matrices are composed of a signal and a noise. The signal corresponds to a PLS model with one latent variable.

Signal

The latent variable H is an (n × 1) column vector where each element follows a normal distribution of mean 0 and standard deviation 1. The loadings associated with this latent variable are resp. C, a (p × 1) column vector, and D, a (q × 1) column vector, corresponding resp. to X and Y.

Batch effect
The signal is blurred by a batch effect. The parameters λ^(X,B)_m and λ^(Y,B)_m are real numbers depending on the observation set m. They control the shape of the batch effect. The notations 1_{n_m,p} and 1_{n_m,q} correspond to the matrices whose elements are all equal to 1, of respective sizes n_m × p and n_m × q.

Noise
The noise is represented by E_X, an (n × p) matrix, and E_Y, an (n × q) matrix. The matrix E_X is constructed by groups of variables: for k ∈ {1, …, K}, the rows of E_X^(•,P_k) follow a multivariate normal distribution N_{p_k}(0_{p_k}, λ_{X,E} Σ_{p_k,ρ}), where ρ and λ_{X,E} are real parameters and Σ_{p_k,ρ} is a (p_k × p_k) matrix whose diagonal elements are equal to 1 − ρ and whose off-diagonal elements are equal to ρ. The notation 0_{p_k} stands for the vector of size p_k whose elements are all equal to 0. The rows of the matrix E_Y follow a multivariate normal distribution N_q(0_q, λ_{Y,E} Σ_{q,ρ}), where λ_{Y,E} is a real parameter and Σ_{q,ρ} is a (q × q) matrix whose diagonal elements are equal to 1 − ρ and whose off-diagonal elements are equal to ρ. The notation 0_q stands for the vector of size q whose elements are all equal to 0. The parameter ρ represents a correlation between variables of a same group, and λ_{X,E} and λ_{Y,E} represent the noise levels.
The non-null parameters of C are the first 15 variables of each of the first 4 groups of variables. Among those 60 elements, resp. 15, 30, and 15 are equal to resp. 1, −1, and 1.5, and the values are randomly distributed. The other parameters are given in Table 1 and the noise levels are indicated in Table 2.
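The batch-effect generation described above can be sketched as follows. The per-set batch levels below are illustrative values, and the within-group correlation Σ_{p_k,ρ} is replaced by i.i.d. noise for brevity; the loading pattern and dimensions follow the text.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_m, p, q = 3, 300, 1000, 3           # 3 observation sets of 300 rows
K, p_k = 50, 20                           # 50 variable groups of 20 variables
n = M * n_m

# C: first 15 variables of each of the first 4 groups carry the signal
c = np.zeros(p)
support = np.concatenate([np.arange(k * p_k, k * p_k + 15) for k in range(4)])
c[support] = rng.permutation([1.0] * 15 + [-1.0] * 30 + [1.5] * 15)
d = rng.normal(size=q)

h = rng.normal(size=(n, 1))               # latent variable H
X = h @ c[None, :]
Y = h @ d[None, :]
for m in range(M):                         # additive batch shift per observation set
    rows = slice(m * n_m, (m + 1) * n_m)
    X[rows] += (m + 1) * 1.0               # illustrative lambda_m^{(X,B)} values
    Y[rows] += (m + 1) * 0.5               # illustrative lambda_m^{(Y,B)} values
X += np.sqrt(2.0) * rng.normal(size=(n, p))   # noise, level lambda_{X,E} = 2
Y += np.sqrt(2.0) * rng.normal(size=(n, q))
```

Standardizing X and Y within each block of 300 rows removes the additive shifts, which is why the within-set standardization of problems 2-4 corrects this kind of batch effect.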

Effects of different magnitudes among group of observations
This simulation case mimics data presenting different dynamics among the observation sets. The generation process follows the same formulas as the previous one but with different parameters. The main difference with the previous case is that the parameters λ^(X,B)_m for m ∈ {1, …, M} can have opposite signs. While in the first case a batch effect could be represented by a difference of magnitude, the effects can here have opposite directions. In this simulation case, we are not interested in a bias on the mean, and the noise levels are indicated in Table 2.

Compared methods
In the first simulation case, the methods corresponding to problems 1, 2, and 5 are compared, whereas in the second simulation case, the methods corresponding to problems 1, 2, 4, and 6 are compared. For the methods corresponding to problems 1, 2, and 4, the penalization parameter λ is set such that the number of selected variables is equal to the true number of variables having an effect (in this case 60). For the methods corresponding to problems 5 and 6, the penalization parameter λ is set such that the number of selected groups of variables is equal to the true one (in this case 4), while α is chosen from the set {0.1, …, 0.9} by cross-validation: the value giving the best Mean Square Error Prediction is kept.

Results
The performances of the methods are measured through the True Positive Rate (TPR), the Total Discordance (TD), and the Mean Square Error Prediction (MSEP). The TPR is defined as

TPR = True Positives / (True Positives + False Negatives),

and the TD is defined as

TD = False Positives + False Negatives.

Results of the first and second simulations are given in Table 2.
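The two selection metrics can be written directly from their definitions (an illustrative helper; the function name and interface are assumptions):

```python
def tpr_td(selected, true_support):
    """True Positive Rate and Total Discordance of a variable selection."""
    sel, true = set(selected), set(true_support)
    tp = len(sel & true)        # true positives
    fp = len(sel - true)        # false positives
    fn = len(true - sel)        # false negatives
    return tp / (tp + fn), fp + fn

print(tpr_td([0, 1, 2], [0, 1, 3]))   # -> (0.666..., 2)
```

A perfect selection has TPR = 1 and TD = 0; the TD counts every disagreement with the true support, in both directions.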
In the first simulation case, the data present a bias that depends on the observation set. Different noise levels are generated (λ_{X,E} = 2, 20, 30). We can see that when the noise is small (λ_{X,E} = 2), MINT and "sparse group PLS for structured data" can retrieve the true variables, whereas a classic sPLS cannot. We can also see that the MSEP is better for "sparse group PLS for structured data" than for MINT. We can note that "sparse group PLS for structured data" misses a few true variables, which gives a non-null TD. This is due to the fact that the calibration of the method does not seek a selection of the true number of variables; hence, a small number of true variables can be missed. When the noise is greater (λ_{X,E} = 20), a difference in terms of detection of the true variables is observed. The classical sPLS has a much worse TPR and TD, while "sparse group PLS for structured data" is above MINT. When the noise is even greater (λ_{X,E} = 30), "sparse group PLS for structured data" clearly outperforms MINT.
In the second simulation case, the data present a magnitude in the latent variables that depends on the observation set. Different noise levels are generated (λ_{X,E} = 2, 10, 20). When the noise is small (λ_{X,E} = 2), "s(g)PLS for structured data" is better at retrieving the true variables than MINT, which is itself better than the sPLS. As in the first simulation case, "sgPLS for structured data" misses a few of the true variables, whereas "sPLS for structured data" does not, because of the specificity of the calibration. At noise level λ_{X,E} = 10, only the methods calibrated "for structured data" are able to retrieve the true variables. At the highest noise level (λ_{X,E} = 20), "sgPLS for structured data" stands clearly above "sPLS for structured data", and those two methods outperform the existing ones.

Conclusion
In the end, different ways of formulating an sPLS problem on data presenting an observation set structure have been discussed. The MINT formulation has the merit of being easy to implement and of correcting the batch effect. The novel method "sparse PLS for structured data" can also correct it. Furthermore, it allows taking into account many different biases, especially when the different observation sets do not have the same dynamics. Despite its high number of parameters, the common-Lasso penalization ensures that the result is readable, with a small number of selected variables in the overall analysis.
This article proved the ability of the method to inherit properties of the sPLS. Its adaptation to variable groups developed in this article, called "sparse group PLS for structured data", is a notable example of "sparse PLS for structured data" benefiting from an extension of the sPLS. We can also note that it can be applied to either quantitative or qualitative variables, as any sPLS can.
A simulation shows that the new methods can outperform the existing methods for detecting a small signal in a large noise. Because its requirements on the nature of the data are very general, we are confident that the method can be applied to the wide range of domains where the sPLS is competitive.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/),which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Fig. 2
Fig. 2 Illustration of data structured by groups of variables and groups of observations. Variables are assumed to be ordered by variable group.

Fig. 3 Notation of X̃ (grey rectangle) and Ỹ (red rectangle) used to write the sPLS for structured data as an sgPLS.

Table 1
Table of the parameters used in the first and second simulation cases

Table 2
Results for the first and second simulation cases.Results in terms of MSEP, TPR, and TD are presented for each noise level