Bilinear regression with random effects and reduced rank restrictions

Bilinear models with three types of effects are considered: fixed effects, random effects and latent variable effects. In the literature, bilinear models with random effects and bilinear models with latent variables have been discussed but there are no results available when combining random effects and latent variables. It is shown, via appropriate vector space decompositions, how to remove the random effects so that a well-known model comprising only fixed effects and latent variables is obtained. The spaces are chosen so that the likelihood function can be factored in a convenient and interpretable way. To obtain explicit estimators, an important standardization constraint on the random effects is assumed to hold. A theorem is presented where a complete solution to the estimation problem is given.


Introduction
In this article, we present a multivariate linear model which incorporates repeated measurements profiles (growth curves), covariate effects, random effects and effects due to latent variables. The model can be used to analyse quite complex data structures. We are not aware of any mathematical treatment of such a model. Focus is on obtaining explicit estimators because it is easier to understand explicit estimators than estimators derived by some algorithm. The derivation of explicit estimators also implies that it is easier to study properties of estimators and to perform model validation which is part of the statistical paradigm.
A base model here is a bilinear regression model which is often referred to as the growth curve model and was introduced by Potthoff and Roy (1964). It will be assumed that there exists covariate information (background information) which is modeled via fixed linear effects. It is also assumed that due to, for example, the sampling procedure there will be random effects which have an impact on the data, i.e., increase the variation. Further, we exploit the idea of adding latent process information to the model. When measuring many background variables, it is often the case that fewer latent processes are governing these variables. For example, if we make field trials unobserved soil characteristics can be important and it seems reasonable to think of the soil characteristics as latent variables. Another example is when measuring EEG signals on many places on the scalp, the response can be thought to be govern by a few latent variables. The latent variables in this article are taken into account by supposing rank restrictions on parameters and in our presentation a rank restriction on the mean parameters is applied. Note that sometimes in the literature, latent variables are motivating the use of random effects which is a different implementation of the concept of latent variable.
Before defining the model, some notation are introduced. Bold upper cases denote matrices: C(A) is the column vector space generated by the columns of A and C(A) ⟂ denotes its orthogonal complement. The orthogonal projector on C(A) is denoted P A and equals P A = A(A � A) − A � , where "-" denotes an arbitrary generalized inverse (g-inverse). Note that I − P A is a projector on C(A) ⟂ . The rank of A is denoted r(A) . Moreover, we will often write (M)() � instead of (M)(M) � , where M represents any matrix expression. The matrix normal distribution with mean : p × n and dispersion ⊗ (the symbol ⊗ stands for the Kronecker product) is denoted N p,n ( , , ) for matrices of size p × n and positive semi-definite matrices : p × p and : n × n (see Ohlson et al. 2013). Now the model which will be considered is presented in detail.

Definition 1 Let
where A : p × q 1 , C 1 : k 1 × n , C 2 : k 2 × n , F : k 3 × n , are all known matrices, Note the important standardization constraint ZZ � = I which will be utilized later. The parameters which are to be estimated are B 1 , B 2 , , u and e . Instead of C(F � ) ⊆ C(C � 1 ) given in Definition 1 some other condition can be used which

3
Japanese Journal of Statistics and Data Science (2020) 3:63-72 follows from the derivation of the estimates in the next section. Since in this article only estimation is considered, the random effect UZ will not be predicted. When in Definition 1 B 2 C 2 = 0 , F = 0 and UZ = 0 , the classical growth curve model appears (see Potthoff and Roy 1964;von Rosen 2018). When F = 0 and UZ = 0 , then we have the growth curve model with background information, i.e., a mixture of GMANOVA and MANOVA models; some references to this model, as well as more general models, are Chinchilli and Elswick (1985), Verbyla andVenables (1988), von Rosen (1989) and Bai and Shi (2007). If B 2 C 2 = 0 and F = 0 , then we have the growth curve model with random effects (see Ip et al. 2007) whereas if only F = 0 holds we refer to Yokoyama and Fujikoshi (1992) and Yokoyama (1995) where similar models are considered and where references to earlier works can be found. In these works, one puts structures on the covariance matrix which leads to somewhat different models than in this article.

Two examples
This section presents two examples where the proposed model in Definition 1 can be used.
Example 1 In the first example, a field experiment had been conducted where plant growth was studied under different treatment conditions. A couple of weeks after sowing, plant heights were collected weekly and in total p weeks were studied. In the study, treatments were randomly assigned to plots and each week plants were randomly chosen from plots and plant growth was measured. The experimental units were the plots which were modeled with respect to plant growth over time.
It was assumed that plant growth is linear and the basic model equaled: where with the help of A the linear plant growth over p weeks was modeled. For example when p = 4, The matrix C 1 in (1) is the design matrix connected to the treatments. If we would have had a univariate response or A = I , when comparing treatments, C 1 is the same design matrix as in ANOVA or MANOVA, respectively.
In particular, one wanted to take into account that fields are inhomogeneous objects with respect to soil characteristics since usually plant growth depends on soil characteristics. Plots were distributed over relatively large areas and, therefore, the characteristics varied among the plots. Thus, it was important to include in (1) soil characteristics as covariable: (1) X = AB 1 C 1 + E, (2) Moreover, when measuring plant growth, it was not possible to measure all plants within a plot. Instead each week a number of randomly selected plants were measured. Thus, it was natural to include a random effect in (2) leading to the model: Note that the condition ZZ � = I k 4 standardized the random effects according to the number of plots used to study a specific treatment.
Weather variables were also considered which are important variables for plant growth. However, there were too many weather-related variables which can have an influence on plant growth. For example, every hour during the day/night cycle temperature, precipitation, wind speed, etc., was recorded. All these variables are more or less connected and it was difficult to directly measure their influence on plant growth. Therefore, the concept of latent variable was of interest and latent variables were implemented in our model via rank restrictions on a parameter matrix. Hence, we ended up with the model presented in Definition 1, i.e., When the rank of equals r the interpretation is that there exist r latent processes affecting plant growth. The condition C(F � ) ⊆ C(C � 1 ) in Definition 1 means that the weather observations were taken at the same plots where plant growth was measured.
Example 2 In the second example, it will be shown that the model in Definition 1 can be used to analyze "small area" (small domain) problems. Small area estimation is an active research area in statistics with many applications. An introduction to the subject is for example given in the book by Rao and Molina (2015). The common thread for small area estimation problems is that a survey study (finite population case) has taken place and based on the survey the goal is to extract information about small domains by adding local information which was not accounted for in the comprehensive survey.
Sometimes survey samples are investigated several times. For example, suppose that there exists a national survey which collects information about some specific production from a certain type of companies. The survey samples are followed up once per year and this is ongoing for, say, 5 years. Moreover, suppose that there are 20 regions and each region consists of 10 subregions. The survey can consist of four companies from each subregion so the whole survey comprises 800 companies.
The survey produces for each subregion and year an estimate of the production of interest. The variance of its estimator is also obtained. However, in some sense these estimates are biased since they do not take care of local information (covariables). Moreover, the sample sizes are usually to small to draw firm conclusions about subareas. Therefore, the idea is to borrow strength across subareas via statistical models and background information. Let x be a vector where all survey estimates of a specific subarea variable are gathered (a specific production in our example). Moreover, if for these variables a linear model is assumed, we can write: (3) X = AB 1 C 1 + B 2 C 2 + UZ + E.
where ′ C 1 models the true value of the variable of concern and the error term is normally distributed with zero mean and known dispersion V , where the dispersion is determined solely by the survey design. Since V is known, one can equivalently study V −1∕2 x and, therefore, without loss of generality it can be assumed that V = I . Since the observations in the survey are repeatedly measured, and if it is supposed that there exists a second-order polynomial trend over the years, which is of the same form for all subregions, the following model can be set up: which models what happens with the companies over time.
Under the assumption of 20 regions, 10 subregions within each region, four companies within each subregion, repeated data collection for each company for 5 years and V = I , then E ∼ N 5,800 (0, , I) . In this case, for example, When expressing the survey estimators with the help of A and B 1 , we cannot say this is always true. Therefore, a random error U ∼ N p,l (0, u , I l ) is introduced in (4) and the following model emerges: where Z is related to C 1 , i.e., C(Z � ) ⊆ C(C � 1 ) and if additionally standardizing the effects ZZ � = I.
There usually exist two types of covariables. One type is accounted for in the survey study and another type of covariables are variables, for example, obtained from registers about the companies in the survey or there are base-line data which were available when the survey started. The effect of the second type of covariables can be modeled by B 2 C 2 which included in (5) yields the model: Moreover, suppose that a large cluster of socio-econometric variables exists. It is difficult to express a functional relationship between the survey estimators and the socio-econometric variables. Instead the idea of latent variables is employed leading to rank restrictions on an unknown parameter which should model the effect of these variables. Thus, we arrive again to the model presented in Definition 1:

Estimation
Let Q 1 and Q 2 be matrices of basis vectors such that and assume Q � (4) X = AB 1 C 1 + E, A one-one transformation of the model in Definition 1, using Q 1 and Q 2 , yields: Proof Since C(Z � ) ⊆ C(Q 1 ) and ZZ � = I the lemma is established by straight forward calculations of VV . ◻ From Lemma 1, it follows that: where : v × v is an orthogonal matrix. The identity in (6) is post-multiplied by , leading to the model: However, since the dispersion the model in (8) is split into two models.
Then, we have three models which will be used when finding estimators: The idea is to utilize (10) and (11) to estimate B 1 , B 2 , and e . Thereafter, these estimators are inserted in (9) which yields simple estimation equations for obtaining u . Suppose that estimators B 1 , B 2 , ̂ and ̂ e have been obtained, and let i.e., under the assumption of no randomness in B 1 , B 2 and ̂ we have a model: XQ 1 1 = AB 1 C 1 Q 1 1 + B 2 C 2 Q 1 1 + FQ 1 + UZQ 1 1 + EQ 1 1 , UZQ 1 1 ∼ N p,k 4 (0, u , I k 4 ), EQ 1 1 ∼ N p,k 4 (0, e , I k 4 ), (10) XQ 1 2 = AB 1 C 1 Q 1 2 + B 2 C 2 Q 1 2 + FQ 2 2 + EQ 1 2 , Based on this model, the maximum likelihood estimator, under the assumption that p ≤ k 4 , ̂ = k −1 4 Y 0 Y � 0 , which means that a natural estimator of u is given by This estimator can only be used if ̂ u is positive definite. However, if the estimator is not positive definite, via eigenvalues of ̂ u the estimator can be modified. Now we return to (10) and (11) to estimate B 1 , B 2 , ̂ and ̂ e , and it is convenient to merge the models, i.e., Let and then (12) is identical to Furthermore, the likelihood function which corresponds to the model in (15) equals with equality if and only if which, under some full rank conditions on D 2 , determines B 2 as a function of B 1 and . To show the inequality, we have used where both terms are positive semi-definite. The density in (16) corresponds to the model: Thus, we have a model which was treated by von Rosen and von Rosen (2017) and any further calculations are not necessary. For notational conveniences, we write the model in (17): 1 ) which is essential for being able to obtain explicit estimators (see von Rosen 1989).
The following theorem presents the estimators of the parameters in the model given by Definition 1.