Profile Analysis in High Dimensions

The three tests in profile analysis, namely the test of parallelism, the test of level and the test of flatness, are modified so that high-dimensional data can be analysed. Using specific scores, dimension reduction is performed and the exact null distributions are derived for the three hypotheses.


Introduction
In this article, we consider profile analysis from a high-dimensional perspective, i.e. there are so many parameters in the model that there are not enough degrees of freedom to test the hypotheses which are part of the analysis. Profile analysis consists of three tests, which are carried out in a certain order: (1) the test of parallelism; (2) the test of equal level; and (3) the test of flatness. Later we will be more specific about how the tests are performed. The approach differs somewhat from the usual likelihood ratio testing procedure since, in particular, the hypotheses are chosen in a certain order.
There are two possible distinct scenarios in the analysis of profiles which both imply a need to extend the classical theory to cover high-dimensional models: 1. The same variable is observed for each subject over many time points (repeated measurements) which often are tightly distributed along a finite interval. There can be more repeated measurements than the number of independent subjects. 2. For each subject many variables should be analysed simultaneously. There can be more variables than the number of independent observations.
In (1) there is a natural ordering of the repeated measurements. For example, it can be a growth curve or in general any stream of observations which is generated from some measurement device. In (2) there does not exist any ordering between variables. Instead one can measure hundreds of characteristics on a subject, for example components from a vehicle. Over the years profile analysis has been studied by many authors. One of the first contributions was published by Greenhouse and Geisser [4]. Originally, mean values were compared and modelled. Many years after profile analysis had been introduced, Srivastava [17] derived likelihood ratio-based test statistics together with their distributions. Illuminating chapters on profile analysis can be found in the books by Srivastava and Carter [19] and Srivastava [18]. A more sophisticated approach was suggested by Ohlson and Srivastava [13] who considered profile analysis of several groups, where the groups of subjects only partly had a common profile. In von Rosen [22] an overview of classical profile analysis has been given.
Fujikoshi [3], followed by Seo et al. [16], derived likelihood ratio tests for the parallelism, level and flatness hypothesis, respectively, when analysing growth curve data. For a parallel profile model, i.e. assuming parallel profiles, it has also been proposed to consider different covariance structures. In Yokoyama and Fujikoshi [25] and Yokoyama [24] the random-effect covariance structure was considered and some tests for the random-effect and flatness were derived. Later, Srivastava and Singull [20] constructed likelihood ratio tests in profile analysis, without any restrictions on the parameter space, for testing the covariance matrix for random-effect structure or sphericity.
Moreover, profile analysis has also been discussed under more general models. Okamoto et al. [14] studied the asymptotic expansions of the distributions of some test statistics considering elliptical distributions. Others have extended this model and discuss the asymptotic expansions for the null distribution of test statistics for profile analysis under non-normality, e.g. see Maruyama [12] and Harrar and Xu [6].
Our ideas about analysing high-dimensional profile data stem from works by Läuter [8,9] and Läuter et al. [10,11] where random scores were utilized in MANOVA models. Scores have been used in statistics for a long time and mostly they consist of known linear combinations of random vectors/matrices. The idea with the random scores is that they should be applied to test statistics which to some extent are robust, i.e. instead of being based on normally distributed vectors, the test statistics can be based on elliptically distributed vectors and still follow the same distribution as when the observed variables are normally distributed.
In Sect. 2 profile analysis is introduced and necessary background information for the rest of the paper is presented. Thereafter, in Sect. 3 the high-dimensional approach of this paper is described and in Sect. 4 the usual test statistics are modified so that high-dimensional data can be handled. Section 5 comprises some concluding remarks.
Concerning notation, bold upper case letters denote matrices and bold lower case letters denote vectors. Other notations are defined when they first appear.

Profile Analysis
Assume that there are q groups which should be compared, with n_i p-dimensional random vectors x_ij, j ∈ {1, ..., n_i}, which are independently normally distributed as N_p(μ_i, Σ), i ∈ {1, ..., q}, where μ_i = (μ_1,i, ..., μ_p,i)′ and Σ is an unknown positive definite dispersion matrix. As mentioned in the introduction there are three different hypotheses which are commonly considered in profile analysis: 1. Parallelism hypothesis, where A_1 stands for the alternative hypothesis, the parameters γ_i are unknown scalars and 1_p is a p-dimensional vector of ones; 2. Level hypothesis; 3. Flatness hypothesis. One can note that instead of H_3|H_1 the strategy can be to test H_3|H_2, when we have μ_1 = ⋯ = μ_q. In this way, profile analysis can be built up around a chain of tests.
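For orientation, the three hypotheses can be sketched in the form commonly used in the classical literature on profile analysis. This is a hedged reconstruction and not a quotation: the symbol γ for the unknown scalars, the use of group q as the reference group, and the names A_2 and A_3 (chosen by analogy with A_1) are assumptions.

```latex
\begin{align*}
H_1 &: \boldsymbol{\mu}_i - \boldsymbol{\mu}_q = \gamma_i \mathbf{1}_p,
      \quad i = 1, \dots, q-1, & A_1 &\neq H_1, \\
H_2 \mid H_1 &: \gamma_i = 0, \quad i = 1, \dots, q-1, & A_2 &\neq H_2 \mid H_1, \\
H_3 \mid H_1 &: \boldsymbol{\mu}_q = \gamma_q \mathbf{1}_p, & A_3 &\neq H_3 \mid H_1.
\end{align*}
```

Under H_1 and H_2 together all mean vectors coincide, which is consistent with the remark that H_3 may alternatively be tested given H_2.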
Profile analysis can also be reformulated with the help of matrices and the MANOVA and growth curve model (GMANOVA) as well as the extended growth curve model which all belong to the class of bilinear models (see von Rosen [23]). Moreover, for technical details we refer to the report by Cengiz and von Rosen [1].
Let the observation matrix be matrix normally distributed, i.e. X ∼ N_p,N(BC, Σ, I), where B: p × q and Σ: p × p consist of the unknown parameters and C: q × N is the design matrix describing the q groups. In this article, to simplify presentation, C is supposed to be of full rank. It can be noted that B = (μ_1, …, μ_q), and one choice of C is the matrix whose ith row equals 1 for the observations belonging to group i and 0 otherwise. The null hypothesis and the alternative hypothesis for parallelism can be written as in (1), where F and G are the contrast matrices given in (2). It can be noted that F is of size (p − 1) × p and G is of size q × (q − 1), with ranks p − 1 and q − 1, respectively.
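The roles of F, G and C can be illustrated with a small numerical sketch. The specific contrast matrices below, F = (I : −1) and G = (I : −1)′, and the group-indicator form of C are common choices assumed here for illustration; they are consistent with the stated sizes and ranks but are not taken from (2). The helper names are hypothetical.

```python
import numpy as np

def contrast_matrices(p, q):
    """Assumed contrasts: F is (p-1) x p, G is q x (q-1); F @ 1_p = 0 and G' @ 1_q = 0."""
    F = np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])
    G = np.vstack([np.eye(q - 1), -np.ones((1, q - 1))])
    return F, G

def design_matrix(group_sizes):
    """C is q x N with C[i, j] = 1 when observation j belongs to group i."""
    return np.repeat(np.eye(len(group_sizes)), group_sizes, axis=1)

F, G = contrast_matrices(p=5, q=3)
C = design_matrix([2, 3, 4])

# Parallel profiles: each group mean is a common profile plus a group shift.
base = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
B = np.column_stack([base + shift for shift in (0.0, 1.5, -2.0)])

# With these assumed contrasts, parallel profiles give F B G = 0.
print(np.allclose(F @ B @ G, 0.0))
```

With this parametrisation a non-parallel B, for example one where a single entry of one column is perturbed, yields a nonzero product F B G.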
If N − 2 is larger than p, the likelihood ratio statistic for the parallelism hypothesis can be given as in (3), where | • | stands for the determinant, P_M = M(M′M)^−1 M′ is the projection for any matrix expression M of full rank, and S = X(I − P_C′)X′. The notation of a projection will frequently be used in this article. Let r(M) denote the rank of a matrix M. Moreover, the statistic involves two expressions which are independently Wishart distributed, where W_p(Σ, n) denotes the Wishart distribution with scale parameter Σ and n degrees of freedom. Note that r(C) = q and r(G) = q − 1. If p = 1 then a Wishart variable is proportional to a chi-square variable. The ratio of determinants formed from two independent Wishart matrices follows what is known as Wilks' lambda distribution, and this gives the distribution of the likelihood ratio statistic in (3).
If the profiles are parallel, we can say that there is no interaction between the responses and the groups. Given that the parallelism hypothesis holds, the next step is to proceed with testing the second hypothesis, H_2, which indicates that there is no group effect. Moreover, if the first hypothesis holds, one may also want to test the third hypothesis, H_3, meaning that the response is constant "over time". Note that failing to reject H_1, as always, does not mean that the hypothesis is true, but in profile analysis it is used as a strategy for analysing data.
Finding an exact significance level for the level hypothesis test and the flatness hypothesis test is still an open question. The main reason is that the different "conditional" test statistics are dependent, which makes the problem difficult. If the significance level is important, a Bonferroni approach can be used. Since the focus in this article is to construct test statistics in "high dimensions", we leave it to the future to compare the classical profile analysis approach with an approach based on "unconditional tests".

High-Dimensional Setting
The focus in this article is on high-dimensional profile analysis. Several authors have approached the analysis of high-dimensional profiles. Onozawa et al. [15] and Harrar and Kong [5] (see also Hyodo [7]) derived test statistics for high-dimensional profile analysis with unequal covariance matrices. Takahashi and Shutoh [21] proposed new test statistics in profile analysis with high-dimensional data by applying the Cauchy-Schwarz inequality. The above-mentioned authors introduce different high-dimensional asymptotic frameworks and derive the test statistics in profile analysis under these frameworks. The approach in this article is different since we will not focus on the asymptotic distributions of the test statistics. Instead a fixed p (number of repeated measurements) and n (number of observations) are of interest with a p which can be much larger than n.
The method adopted in this article is mainly based on ideas put forward by Läuter [8,9] and Läuter et al. [10,11], who proposed a scoring method for dealing with high-dimensional problems in MANOVA. The method is more advanced than principal component analysis and tests based on these scores are exact. However, for the level test in profile analysis this article presents a completely new approach. The tests which arise from the approach of Läuter [8,9] and Läuter et al. [10,11] are based on linear scores which are constructed with the help of sums of products matrices. These scores are linear combinations of the repeated measurements. The approach implies that high-dimensional observations are compressed into low-dimensional observations, which are then used in the analysis instead of the original data. Note that the choice of scores is mentioned only very briefly and only one explicit expression of the scores is given in this work. However, there exist different kinds of scores; for details we refer to Läuter et al. [10], where several examples are presented.
We start with a brief introductory mathematical presentation of the theory. Suppose that X: p × n consists of n independent normally distributed observations with mean μ and dispersion matrix Σ, and consider a single score z′ = d′X, where d is the vector of weights and the z_j's, j ∈ {1, ..., n}, are the individual scores. Suppose that the null hypothesis μ = 0 is of interest. In this case one can choose the vector d to be a unique function of XX′, which is the total sums of products matrix of size p × p. Then, with z̄ = (1/n) z′1_n and s² denoting the sample variance of the scores, the one-sample t-statistic in (8) is t-distributed with n − 1 degrees of freedom. Note that the vector of random scores z is not normally distributed. The result on this type of "robustness" follows from a general result stated in the next lemma.
The lemma is useful if the distribution of the statistic mentioned in Lemma 3.1 can be derived for one member of Φ+. In particular, if it holds for u ∼ N_n(0, I), then, because u belongs to Φ+, the distribution has been obtained for all members of the class Φ+.
These facts, among others, establish why (8) is true for all spherical distributions.
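The exactness of the score-based t-test can be checked by simulation. The sketch below is illustrative only: the weights d = diag(XX′)^(−1/2), often called a standardised sum score, are one assumed choice of a function of XX′, and the function names, the dimensions and the covariance matrix AA′ are hypothetical.

```python
import numpy as np
from scipy import stats

def sum_score_weights(X):
    """Assumed weights d_k = 1 / sqrt((XX')_kk), a function of XX' only."""
    return 1.0 / np.sqrt(np.einsum("ij,ij->i", X, X))

def score_t_statistic(X):
    """One-sample t statistic sqrt(n) * mean(z) / sd(z) for the scores z' = d'X."""
    n = X.shape[1]
    z = sum_score_weights(X) @ X
    return np.sqrt(n) * z.mean() / z.std(ddof=1)

def rejection_rate(p=60, n=8, reps=4000, alpha=0.05, seed=0):
    """Monte Carlo size of the two-sided score t-test when mu = 0 and p > n."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((p, p))              # Sigma = AA' is an arbitrary choice
    crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    t_values = np.array([
        score_t_statistic(A @ rng.standard_normal((p, n))) for _ in range(reps)
    ])
    return np.mean(np.abs(t_values) > crit)

if __name__ == "__main__":
    print(rejection_rate())   # should be close to the nominal level alpha
```

Because the weights depend on the data only through XX′, they are unchanged when X is multiplied from the right by an orthogonal matrix, which is the invariance that the spherical-distribution argument relies on.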
Wilks' Λ statistic is frequently used in this article and Theorem 1 in Läuter et al. [11] implies the following theorem. In the theorem, Σ is involved in V ∼ W_p(Σ, m) and W ∼ W_p(Σ, n), but the distribution of the statistic is the same for all Σ. It is only crucial that the same Σ is included in the distributions for W and V. One way of constructing the weights D is to use the so-called principal component method (see Läuter et al. [11]), where the weights are determined by solving the eigenvalue problem (9), in which the positive eigenvalues are collected in a diagonal matrix.
The next corollary of the theorem is what is needed in this article. With d = g(XX′), the score is d′X = g′(XX′)X. Moreover, for any orthogonal matrix Γ, g(XX′) = g(XΓΓ′X′), and since XΓ has the same distribution as X, the score d′X = g′(XX′)X is spherically distributed. The statistic can be written in terms of the score d′X, and it follows from Lemma 3.1 that, since the statement is also true when d′X is replaced by a normally distributed variable with a dispersion matrix equal to I, the statistic is indeed β-distributed with parameters m/2 and n/2. ◻ Corresponding to (9), one alternative way to determine the weight d in Corollary 3.1 is by solving a corresponding eigenvalue/eigenvector problem.

Main Results
In Sect. 2 the three hypotheses in classical profile analysis were presented, i.e. for the case N > p + q. We now focus on the high-dimensional setting where p > N − q. Consider, for example, the test statistic given in (3). When p is large the problem is that S is singular and the determinant in (3) equals 0.
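The singularity of S = X(I − P_C′)X′ is easy to see numerically, since the projection I − P_C′ has rank N − r(C). The following minimal sketch uses simulated data and an assumed group-indicator design matrix; the dimensions are arbitrary.

```python
import numpy as np

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.pinv(M)

rng = np.random.default_rng(0)
p, sizes = 20, [4, 4, 4]                  # p = 20 exceeds N - q = 9
N, q = sum(sizes), len(sizes)
C = np.repeat(np.eye(q), sizes, axis=1)   # q x N group-indicator design (assumed)
X = rng.standard_normal((p, N))           # illustrative data, not from the article

S = X @ (np.eye(N) - proj(C.T)) @ X.T     # p x p residual sums of products matrix
rank_S = np.linalg.matrix_rank(S, tol=1e-8)
print(rank_S, p)                          # the rank is N - q, so S cannot be inverted
```

Since the rank of S is smaller than p, the determinant of S vanishes and S has no ordinary inverse, which is why both (3) and (5) need to be modified.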
Läuter [8,9] and Läuter et al. [10,11] directly applied a vector d to the observation matrix X . In this article, since there is a bilinear testing situation, it is proposed to apply d to FX where F is given in (2). Hence, d is of size p − 1.
Let Y = FX , then the following test statistic is proposed to test the hypothesis of parallel profiles which is based on the formulation in (1):

Proposition 4.1 Let the parallelism hypothesis be defined via (1) and let Y = FX . A test statistic for testing the hypothesis is given by
It can be noted that Corollary 3.1 establishes the next theorem.

Theorem 4.1 If d is a nonzero function (with probability 1) of Y(I − P_C′G^o)Y′, the test statistic for testing for parallel profiles in high dimensions, given in (10), follows a β-distribution with parameters (N − r(C))/2 and r(G)/2.
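The β-distribution stated in Theorem 4.1 can be examined by simulation. The sketch below rests on several assumptions that are not taken from the text: the statistic is assumed to be the Wilks-type ratio d′Y(I − P_C′)Y′d / d′Y(I − P_C′G^o)Y′d, the contrast matrix is F = (I : −1), G^o is the vector of ones (the orthogonal complement of the assumed G), small values are taken as evidence against parallelism, and d is a hypothetical standardised sum score.

```python
import numpy as np
from scipy import stats

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.pinv(M)

def parallelism_statistic(X, C, F, G_o):
    """Assumed ratio d'Y(I - P_C')Y'd / d'Y(I - P_{C'G^o})Y'd with Y = FX."""
    N = X.shape[1]
    Y = F @ X
    T = Y @ (np.eye(N) - proj(C.T @ G_o)) @ Y.T
    E = Y @ (np.eye(N) - proj(C.T)) @ Y.T
    d = 1.0 / np.sqrt(np.diag(T))        # hypothetical weights, a function of T only
    return (d @ E @ d) / (d @ T @ d)

def simulate_size(p=40, sizes=(4, 4, 4), reps=3000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate under parallel profiles with p > N."""
    rng = np.random.default_rng(seed)
    q, N = len(sizes), sum(sizes)
    C = np.repeat(np.eye(q), sizes, axis=1)
    F = np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])
    G_o = np.ones((q, 1))
    A = rng.standard_normal((p, p))                       # Sigma = AA', arbitrary
    B = rng.standard_normal((p, 1)) + rng.standard_normal((1, q))  # parallel means
    crit = stats.beta.ppf(alpha, (N - q) / 2, (q - 1) / 2)
    lam = np.array([
        parallelism_statistic(B @ C + A @ rng.standard_normal((p, N)), C, F, G_o)
        for _ in range(reps)
    ])
    return np.mean(lam < crit)

if __name__ == "__main__":
    print(simulate_size())   # should be close to alpha if the assumed form is right
```

Here r(C) = q and r(G) = q − 1, so the reference distribution has parameters (N − q)/2 and (q − 1)/2. The weights use only Y(I − P_C′G^o)Y′, in line with the condition on d in the theorem.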
In (5) the likelihood ratio was presented for the level hypothesis defined in (4). As can be seen from the expression in (5), problems occur in high dimensions because S^−1 does not exist when p > N − r(C) + 1. Moreover, from (11) it can be seen that if p > N − r(C) + 1, negative degrees of freedom appear, which of course is impossible. Thus, in order to test the level hypothesis in high dimensions the statistic in (5) has to be modified significantly. In this article the idea will be to modify the statistic in (5) as little as possible but so much that high-dimensional statistical analyses can be carried out.
A couple of ideas will bring us to a proposition where a test statistic and its distribution are given. The first idea is to prove (11) in detail and see whether anything can be modified so that reasonable expressions appear when p is large. To simplify notation, (A′S^−1 A)^−1 will be discussed where, for example, A = (F′)^o and the inverse is supposed to exist. The chain of equalities in (12) shows some interesting structure, with P given in (13). Since A′Σ^−1 X is independent of A^o′X, it is also independent of P. Moreover, P is idempotent and r(P) = N − r(C) − p + r(A). Thus, for any choice of A^o, conditionally on A^o′X, the expression in (12) is Wishart distributed, but it also appears that this distribution is independent of A^o′X and therefore (11) is established.
The critical point in the high-dimensional setting is that r(P) will approach 0, even if the inverse in (13) is replaced by a g-inverse. Therefore, it is proposed that P is modified in such a way that a projection (expressed in F′ instead of A^o) is used in its place, where d is a function of FX. Summarizing these calculations gives a quantity U which will be used as numerator in a test statistic for testing the level hypothesis. How to choose d in (14) is not clear. On the one hand, the distribution of U does not depend on d, but if explicit expressions are to be calculated, d will be a function in which data have replaced X.
From (5) it also follows that (A′S^−1 A)^−1 A′S^−1 has to be considered. Calculations similar to those in (12), replacing A by (F′)^o and using the same vector d as in (14), lead to an expression involving an idempotent matrix P_1. Corresponding to L in (5), the quantity Lh,prel in (21) is proposed to test the level hypothesis, where U and V are defined in (15) and (20), respectively. However, it is not possible to use Lh,prel in (21) directly because both U and V include the unknown dispersion matrix Σ, so a few more results have to be established.
If the numerator and denominator in (21) are multiplied by ((F′)^o′ Σ^−1 (F′)^o)^1/2, it follows, since the distribution of the resulting expressions is independent of Σ, that the distribution of Lh,prel is independent of Σ. Thus, in order to have a test statistic which is functionally independent of the dispersion matrix, Σ = I is chosen. In the next proposition the test statistic for the level hypothesis is stated.

Proposition 4.2 Let the level hypothesis testing problem be defined via (4). A test statistic for testing the level hypothesis is given by a ratio whose components are built from P, P_1 and Q defined in (14), (18) and (19), respectively.
How to choose d in (14) and (18) is, as noted above, not clear.
The third hypothesis of flatness was stated in (6). Since the approach for creating a test statistic is the same as for the parallelism hypothesis, the results are stated without any proofs.
The motivation for choosing d as a function of YY′ follows from results which, taken together, imply that the next theorem can be established.

Concluding Remarks
In this article the three well-known test statistics in profile analysis have been modified so that high-dimensional data can be handled in a non-asymptotic approach. The tests for parallelism and flatness were derived following ideas given by Läuter [8,9] and Läuter et al. [10,11], which originally were developed for handling MANOVA problems. Concerning the level test, a completely new approach is proposed. Here we modify the degrees of freedom and an exact test is derived. The vector d, which is utilized in our approach when testing for parallelism and flatness and which is a function of some sums of squares, has only briefly been considered in this article. Instead we refer to Läuter et al. [10], Section 4, where several different alternatives for determining d are proposed. Furthermore, a generalization of the approach presented in this article would be to apply a matrix D, i.e. to study several linear scores, instead of a vector d which only gives one score.
There is another important problem (observed by one of the reviewers): the choice of F can have an effect on the choice of d, and thus the test statistic in fact depends on the choice of F. It is important to continue this work and establish restrictions on the choice of d so that the vector only depends on the space generated by the columns in F, i.e. instead of using F, the projection F(F′F)^− F′ would be used.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.