1 Introduction

In this article we consider profile analysis from a high-dimensional perspective, i.e. the model contains so many parameters that there are not enough degrees of freedom to test the hypotheses which are part of the analysis. Profile analysis consists of three tests which are carried out in a specific order: (1) the test of parallelism; (2) the test of equal levels; and (3) the test of flatness. Later we will be more specific about how the tests are performed. The approach differs somewhat from the usual likelihood ratio testing procedure since, in particular, the hypotheses are tested in a specific order.

There are two possible distinct scenarios in the analysis of profiles which both imply a need to extend the classical theory to cover high-dimensional models:

  1.

    The same variable is observed for each subject over many time points (repeated measurements), which often are densely spaced within a finite interval. There can be more repeated measurements than the number of independent subjects.

  2.

    For each subject many variables should be analysed simultaneously. There can be more variables than the number of independent observations.

In (1) there is a natural ordering of the repeated measurements. For example, it can be a growth curve or in general any stream of observations which is generated from some measurement device. In (2) there does not exist any ordering between variables. Instead one can measure hundreds of characteristics on a subject, for example components from a vehicle.

Over the years profile analysis has been studied by many authors. One of the first contributions was published by Greenhouse and Geisser [4]. Originally, mean values were compared and modelled. Many years after profile analysis had been introduced, Srivastava [17] derived likelihood ratio-based test statistics together with their distributions. Illuminating chapters on profile analysis can be found in the books by Srivastava and Carter [19] and Srivastava [18]. A more sophisticated approach was suggested by Ohlson and Srivastava [13] who considered profile analysis of several groups, where the groups of subjects only partly had a common profile. In von Rosen [22] an overview of classical profile analysis has been given.

Fujikoshi [3], followed by Seo et al. [16], derived likelihood ratio tests for the parallelism, level and flatness hypothesis, respectively, when analysing growth curve data. For a parallel profile model, i.e. assuming parallel profiles, it has also been proposed to consider different covariance structures. In Yokoyama and Fujikoshi [25] and Yokoyama [24] the random-effect covariance structure was considered and some tests for the random-effect and flatness were derived. Later, Srivastava and Singull [20] constructed likelihood ratio tests in profile analysis, without any restrictions on the parameter space, for testing the covariance matrix for random-effect structure or sphericity.

Moreover, profile analysis has also been discussed under more general models. Okamoto et al. [14] studied the asymptotic expansions of the distributions of some test statistics considering elliptical distributions. Others have extended this model and discuss the asymptotic expansions for the null distribution of test statistics for profile analysis under non-normality, e.g. see Maruyama [12] and Harrar and Xu [6].

Our ideas about analysing high-dimensional profile data stem from works by Läuter [8, 9] and Läuter et al. [10, 11] where random scores were utilized in MANOVA models. Scores have for a long time been used in statistics and mostly they consist of known linear combinations of random vectors/matrices. The idea with random scores is to obtain test statistics which to some extent are robust: instead of requiring normally distributed vectors, the observations can be elliptically distributed and the test statistics still follow the same distribution as when the observed variables are normally distributed.

In Sect. 2 profile analysis is introduced and necessary background information for the rest of the paper is presented. Thereafter, in Sect. 3 the high-dimensional approach of this paper is described and in Sect. 4 the usual test statistics are modified so that high-dimensional data can be handled. Section 5 comprises some concluding remarks.

Concerning notation, bold upper case letters denote matrices and bold lower case letters denote vectors. Other notations are defined when they first appear.

2 Profile Analysis

Assume that there are q groups which should be compared with \(n_i\) p-dimensional random vectors \(\varvec{x}_{ij}\), \(j\in \{1,...,n_i\}\), which are independently normally distributed as \(N_p(\varvec{\mu }_i, \varvec{\Sigma })\), \(i\in \{1,...,q\}\), where \(\varvec{\mu }_i=(\mu _{1,i},...,\mu _{p,i})'\) and \(\varvec{\Sigma }\) is an unknown positive definite dispersion matrix. As mentioned in the introduction there are three different hypotheses which are commonly considered in profile analysis:

  1.

    Parallelism hypothesis

    \(H_1: \varvec{\mu }_i-\varvec{\mu }_q=\gamma _i\varvec{1}_p\), \(i\in \{1,...,q-1\}\) and \(A_1\ne H_1\), meaning that \(\varvec{\mu }_i-\varvec{\mu }_q\ne \gamma _i\varvec{1}_p\), \(i\in \{1,...,q-1\}\), where \(A_1\) stands for alternative hypothesis, the parameters \(\gamma _i\) are unknown scalars and \(\varvec{1}_p\) is a p-dimensional vector of ones;

  2.

    Level hypothesis

    \(H_2|H_1: \gamma _i=0\), \(i\in \{1,...,q-1\}\) and \(A_2\ne H_2|H_1\), implying that \(\gamma _i\ne 0\), \(i\in \{1,...,q-1\}\), where \(H_2|H_1\) means \(H_2\) under the assumption that \(H_1\) is true;

  3.

    Flatness hypothesis

    \(H_3|H_1: \varvec{\mu }_i=\psi _i\varvec{1}_p\), \(i\in \{1,...,q\}\) and \(A_3\ne H_3|H_1\), where \(H_3|H_1\) means \(H_3\) under the assumption that \(H_1\) is true, and the parameters \(\psi _i\) are unknown scalars.

One can note that instead of \(H_3|H_1\) the strategy can be to test \(H_3|H_2\), in which case \(\psi _1=\cdots =\psi _q\). In this way, profile analysis can be built up around a chain of tests.

Profile analysis can also be reformulated with the help of matrices and the MANOVA and growth curve model (GMANOVA) as well as the extended growth curve model which all belong to the class of bilinear models (see von Rosen [23]). Moreover, for technical details we refer to the report by Cengiz and von Rosen [1].

Let the observation matrix be matrix normally distributed, i.e. \(\varvec{X}\sim N_{p,N}(\varvec{BC},\varvec{\Sigma },\varvec{I})\), where \(\varvec{B}\): \(p\times q\) and \(\varvec{\Sigma }\): \(p\times p\) consist of the unknown parameters, and \(\varvec{C}\): \(q\times N\) is the design matrix describing the q groups. In this article, to simplify the presentation, \(\varvec{C}\) is supposed to be of full rank. It can be noted that \(\varvec{B}=(\varvec{\mu }_1,\dots ,\varvec{\mu }_q)\) and one choice of \(\varvec{C}\) is

$$\begin{aligned} \varvec{C}=\left( \begin{array}{cccc}\varvec{1}'_{n_1}&{}\varvec{0}&{}\cdots &{}\varvec{0}\\ \varvec{0}&{}\varvec{1}'_{n_2}&{}\cdots &{}\vdots \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ \varvec{0}&{}\varvec{0}&{}\cdots &{}\varvec{1}'_{n_q} \end{array}\right) . \end{aligned}$$
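As a small numerical sketch (our own, not part of the paper; the helper name `design_matrix` and the group sizes are arbitrary choices), the block-diagonal structure of \(\varvec{C}\) can be built for any group sizes \(n_1,\dots ,n_q\):

```python
# Sketch: the q x N design matrix C with the row vectors 1'_{n_i} placed
# block-diagonally, as displayed above.
import numpy as np

def design_matrix(group_sizes):
    """Return the q x N design matrix C for the given group sizes."""
    q, N = len(group_sizes), sum(group_sizes)
    C = np.zeros((q, N))
    start = 0
    for i, n_i in enumerate(group_sizes):
        C[i, start:start + n_i] = 1.0   # row i is 1'_{n_i} in its block
        start += n_i
    return C

C = design_matrix([3, 2, 4])            # q = 3 groups, N = 9 subjects
```

Note that \(\varvec{C}\varvec{C}'\) is then the diagonal matrix of group sizes, which is used repeatedly below.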

The null hypothesis and the alternative hypothesis for parallelism can be written

$$\begin{aligned} H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \nonumber \\ A_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \text {no restrictions on}\quad \varvec{B}, \end{aligned}$$
(1)

where \(\varvec{F}\) and \(\varvec{G}\) are contrast matrices given by

$$\begin{aligned} \varvec{F}= \begin{pmatrix} 1&{}-1&{}0&{}0&{}\cdots &{}0&{}0 \\ 0&{}1&{}-1&{}0&{}\cdots &{}0&{}0 \\ 0&{}0&{}1&{}-1&{}\cdots &{}0&{}0 \\ \vdots &{}\vdots &{}\vdots &{}\vdots &{} \ddots &{}\vdots &{}\vdots \\ 0&{}0&{}0&{}0&{}\cdots &{}1&{}-1 \end{pmatrix},\qquad \varvec{G}= \begin{pmatrix} 1&{}0&{}\cdots &{}0 \\ -1&{}1&{}\cdots &{}0 \\ 0&{}-1&{}\cdots &{}0 \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}1 \\ 0&{}0&{}\cdots &{}-1 \end{pmatrix}. \end{aligned}$$
(2)

It can be noted that \(\varvec{F}\) is of size \((p-1)\times p\) and \(\varvec{G}\) is of size \(q\times (q-1)\), respectively, of ranks \(p-1\) and \(q-1\).
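The contrast matrices in (2) are successive-difference matrices, so they are easy to generate for any p and q. The following sketch (our own; `contrast_F` is an assumed helper name and the specific \(\varvec{B}\) is simulated) also checks that \(\varvec{FBG}=\varvec{0}\) holds under parallel profiles \(\varvec{\mu }_i=\varvec{\mu }_q+\gamma _i\varvec{1}_p\):

```python
import numpy as np

def contrast_F(p):
    # (p-1) x p successive-difference matrix: rows (1, -1, 0, ...), etc.
    return np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

p, q = 4, 3
F = contrast_F(p)                  # (p-1) x p, rank p-1, F 1_p = 0
G = contrast_F(q).T                # q x (q-1), rank q-1, 1_q' G = 0

# Parallel profiles: mu_i = mu_q + gamma_i 1_p, i.e. B = mu_q 1_q' + 1_p gamma'.
rng = np.random.default_rng(0)
mu_q, gamma = rng.standard_normal(p), rng.standard_normal(q)
B = np.outer(mu_q, np.ones(q)) + np.outer(np.ones(p), gamma)
```

Since \(\varvec{F}\varvec{1}_p=\varvec{0}\) and \(\varvec{1}_q'\varvec{G}=\varvec{0}\), the product \(\varvec{FBG}\) vanishes exactly for such \(\varvec{B}\).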

If \(N-q\) is larger than \(p-1\), the likelihood ratio statistic for the parallelism hypothesis can be given as

$$\begin{aligned} \lambda _P=\frac{|\varvec{FSF}'|}{|\varvec{FSF}'+\varvec{FX}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'|}, \end{aligned}$$
(3)

where \(|\bullet |\) stands for the determinant, the projection \(\varvec{P}_{M}=\varvec{M}(\varvec{M}'\varvec{M})^{-1}\varvec{M}'\), for any matrix expression \(\varvec{M}\) of full rank and \(\varvec{S}=\varvec{X}(\varvec{I}-\varvec{P}_{C'})\varvec{X}'\). The notation of a projection will frequently be used in this article. Let \(r(\varvec{M})\) denote the rank of a matrix \(\varvec{M}\). Moreover,

$$\begin{aligned}&\varvec{FSF}' \sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C})), \\&\varvec{FX}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{G})), \end{aligned}$$

are two expressions which are independently distributed, where \(W_p(\varvec{\Psi },n)\) denotes the Wishart distribution with scale parameter \(\varvec{\Psi }\) and n degrees of freedom. Note that \(r(\varvec{C})=q\) and \(r(\varvec{G})=q-1\). If \(p=1\) then a Wishart variable is proportional to a chi-square variable. Furthermore, if \(\varvec{U}\sim W_p(\varvec{\Sigma },n)\) is independent of \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) then

$$\begin{aligned} \frac{|\varvec{U}|}{|\varvec{U}+\varvec{V}|}\sim \Lambda (p,m,n), \end{aligned}$$

which is known as Wilks’ lambda distribution. Hence, the distribution for the likelihood ratio statistic given in (3) is

$$\begin{aligned} \lambda _P\sim \Lambda (p-1, r(\varvec{G}), N-r(\varvec{C})). \end{aligned}$$

If the profiles are parallel, we can say that there is no interaction between the responses and the groups. Given that the parallelism hypothesis holds, the next step is to test the second hypothesis, \(H_2\), which states that there is no group effect. Moreover, if the first hypothesis holds, one may also want to test the third hypothesis, \(H_3\), meaning that the response is constant “over time”. Note that failing to reject \(H_1\) does not, as always, mean that the hypothesis is true, but in profile analysis it is used as a strategy for analysing data.
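The parallelism statistic (3) can be computed directly from simulated data. The sketch below is our own illustration (arbitrary dimensions, data generated under \(H_1\) with \(\varvec{B}=\varvec{0}\)); it follows the definitions of \(\varvec{S}\) and the projections given above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_i = 5, 3, 10                      # classical setting: N - q > p - 1
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))      # q x N design matrix
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T

def proj(M):
    # P_M = M (M'M)^{-1} M' for a full column rank matrix M
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under parallel profiles
S = X @ (np.eye(N) - proj(C.T)) @ X.T     # S = X (I - P_{C'}) X'
H = F @ X @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ X.T @ F.T
lam_P = np.linalg.det(F @ S @ F.T) / np.linalg.det(F @ S @ F.T + H)
```

By the distributional result above, repeating this simulation would reproduce the \(\Lambda (p-1,q-1,N-q)\) law of \(\lambda _P\).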

The level hypothesis in matrix form equals

$$\begin{aligned} H_2|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{BG}=\varvec{0}, \nonumber \\ A_2|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \end{aligned}$$
(4)

where \(\varvec{F}\) and \(\varvec{G}\) are defined in (2). Let \(\varvec{M}^{\circ }\) denote any matrix of full rank generating the orthogonal complement to the column space of \(\varvec{M}\). The corresponding likelihood ratio test statistic for the level hypothesis can be expressed as

$$\begin{aligned} \lambda _L=\frac{({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}}{({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}+\varvec{H}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\varvec{Q}^{-1}\varvec{G}' (\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{H}'}, \end{aligned}$$
(5)

where

$$\begin{aligned} \varvec{H}&= ({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1} {(\varvec{F}')^{\circ }}'\varvec{S}^{-1},\\ \varvec{Q}&= \varvec{G}'(\varvec{CC}')^{-1}\varvec{G}+\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{F}'(\varvec{FSF}')^{-1}\varvec{F}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}. \end{aligned}$$

Note that \(r{((\varvec{F}')^{\circ })}=1\) since \(r(\varvec{F})=p-1\). Then, it can be shown that

$$\begin{aligned}&({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1} \sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-p+1), \\&\varvec{H}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\varvec{Q}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{H}' \sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, r(\varvec{G})), \end{aligned}$$

which are independently distributed. Moreover, the Wilks’ lambda distribution for (5) is given by

$$\begin{aligned} \lambda _L\sim \Lambda (1, r(\varvec{G}), N-r(\varvec{C})-p+1) \end{aligned}$$

which equals a beta distribution with parameters \((N-r(\varvec{C})-p+1)/2\) and \(r(\varvec{G})/2\). Furthermore, by a one-to-one transformation \(\lambda _L\) can be converted into an F-statistic.
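The one-to-one transformation to an F-statistic can be sketched numerically (our own illustration; the helper name is an assumption): \(\Lambda (1,m,n)\) is \(\mathrm {Beta}(n/2,m/2)\)-distributed and \((1-\lambda )/\lambda \cdot n/m\) is \(F(m,n)\)-distributed, so the two p-values must coincide.

```python
# Sketch: equivalence of the beta and F forms of Lambda(1, m, n).
from scipy import stats

def level_pvalues(lam, m, n):
    # Small lam is evidence against the hypothesis, so the p-value is a
    # lower tail for the beta form and an upper tail for the F form.
    p_beta = stats.beta.cdf(lam, n / 2, m / 2)
    f_stat = (1 - lam) / lam * n / m
    p_f = stats.f.sf(f_stat, m, n)
    return p_beta, p_f

p_beta, p_f = level_pvalues(0.7, m=2, n=20)   # arbitrary illustrative values
```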

Assuming that the profiles are parallel, a test can be constructed to see if the profiles are flat, i.e.

$$\begin{aligned} H_3|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FB}=\varvec{0}, \nonumber \\ A_3|H_1:&\quad E(\varvec{X})=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \end{aligned}$$
(6)

where \(\varvec{F}\) and \(\varvec{G}\) are defined in (2). The likelihood ratio statistic for the flatness hypothesis equals

$$\begin{aligned} \lambda _F=\frac{|\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'|}{|\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'+\varvec{F}\varvec{X}\varvec{P}_{C'G^\circ }\varvec{X}'\varvec{F}'|} , \end{aligned}$$
(7)

where there are two independently distributed Wishart matrices:

$$\begin{aligned}&\varvec{F}\varvec{X}\varvec{P}_{C'G^\circ }\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F}\varvec{\Sigma }\varvec{F}', r(\varvec{C}'\varvec{G}^\circ )), \\&\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C})+r(\varvec{G})). \end{aligned}$$

Thus,

$$\begin{aligned} \lambda _F\sim \Lambda (p-1, r(\varvec{C}'\varvec{G}^\circ ), N-r(\varvec{C})+r(\varvec{G})) \end{aligned}$$

and \(r(\varvec{C}'\varvec{G}^\circ )=r(\varvec{C})-r(\varvec{G})=1\).
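The flatness statistic (7) can likewise be evaluated on simulated data. In this sketch (our own choices of dimensions, data generated under flat profiles) \(\varvec{G}^\circ =\varvec{1}_q\), since the columns of \(\varvec{G}\) sum to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n_i = 4, 2, 15
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T
Gc = np.ones((q, 1))                      # G° = 1_q, so r(C'G°) = 1

def proj(M):
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under the flatness hypothesis
S = X @ (np.eye(N) - proj(C.T)) @ X.T
M1 = F @ S @ F.T + F @ X @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ X.T @ F.T
M2 = F @ X @ proj(C.T @ Gc) @ X.T @ F.T   # FX P_{C'G°} X'F', a rank-1 Wishart matrix
lam_F = np.linalg.det(M1) / np.linalg.det(M1 + M2)
```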

How to find an exact significance level for the level hypothesis test and the flatness hypothesis test is still an open problem, mainly because the different “conditional” test statistics are dependent. If the overall significance level is important, a Bonferroni approach can be used. Since the focus in this article is on constructing test statistics in “high dimensions”, we leave it to future work to compare the classical profile analysis approach with an approach based on “unconditional tests”.

3 High-Dimensional Setting

The focus in this article is on high-dimensional profile analysis. Several authors have approached the analysis of high-dimensional profiles. Onozawa et al. [15] and Harrar and Kong [5] (see also Hyodo [7]) derived test statistics for high-dimensional profile analysis with unequal covariance matrices. Takahashi and Shutoh [21] proposed new test statistics in profile analysis with high-dimensional data by applying the Cauchy–Schwarz inequality. The above-mentioned authors introduce different high-dimensional asymptotic frameworks and derive the test statistics in profile analysis under these frameworks. The approach in this article is different since we will not focus on the asymptotic distributions of the test statistics. Instead, a fixed p (number of repeated measurements) and n (number of observations) are of interest, where p can be much larger than n.

The method adopted in this article is mainly based on ideas put forward by Läuter [8, 9] and Läuter et al. [10, 11], who proposed a scoring method for dealing with high-dimensional problems in MANOVA. The method is more advanced than principal component analysis and tests based on these scores are exact. However, for the level test in profile analysis this article presents a completely new approach. The tests which arise from the approach of Läuter [8, 9] and Läuter et al. [10, 11] are based on linear scores which are constructed with the help of sums of products matrices. These scores are linear combinations of the repeated measurements. The approach implies that high-dimensional observations are compressed into low-dimensional observations which are then used in the analysis instead of the original data. Note that we only briefly mention the choice of scores, and only one explicit expression for the scores is given in this work. However, there exist different kinds of scores; for details we refer to Läuter et al. [10], where several examples are presented.

We now give a brief introductory mathematical presentation of the theory. Suppose

$$\begin{aligned} \varvec{X}\sim N_{p,n}(\varvec{\mu }\varvec{1}_n', \varvec{\Sigma }, \varvec{I}_n) \end{aligned}$$

and consider a single score

$$\begin{aligned} \varvec{z}'=(z_1, z_2,\cdots , z_n)=(d_1, d_2,\cdots , d_p)\varvec{X}=\varvec{d}'\varvec{X}, \end{aligned}$$

where \(\varvec{d}\) is the vector of weights and \(z_j\)’s, \(j\in \{1,...,n\}\), are the individual scores. Suppose that the null hypothesis \(\varvec{\mu }=\varvec{0}\) is of interest. In this case one can choose the vector \(\varvec{d}\) to be a unique function of \(\varvec{XX}'\), which is the total sums of products matrix of size \(p\times p\). Then, with \(\bar{z}=\frac{1}{n}\varvec{z}'\varvec{1}_n\) and \(s_z^2=\frac{1}{n-1}(\varvec{z}'\varvec{z}-n\bar{z}^2)\),

$$\begin{aligned} t=\frac{\sqrt{n}\bar{z}}{s_z}, \end{aligned}$$
(8)

is t-distributed with \(n-1\) degrees of freedom. Note that the vector of random scores \(\varvec{z}\) is not normally distributed. The result on this type of “robustness” follows from a general result stated in the next lemma.
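A minimal sketch of this score construction (our own; taking \(\varvec{d}\) as the eigenvector of \(\varvec{XX}'\) belonging to the largest eigenvalue, which is one admissible function of \(\varvec{XX}'\)):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 50, 10                             # high-dimensional: p > n
X = rng.standard_normal((p, n))           # simulated under mu = 0

# One admissible choice: d is the eigenvector of the total sums of
# products matrix XX' belonging to its largest eigenvalue.
d = np.linalg.eigh(X @ X.T)[1][:, -1]

z = d @ X                                 # the n individual scores z_j = d'x_j
zbar = z.mean()
s2_z = (z @ z - n * zbar**2) / (n - 1)    # s_z^2 as defined above
t = np.sqrt(n) * zbar / np.sqrt(s2_z)     # t-distributed with n - 1 df under H_0
```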

Lemma 3.1

(Fang and Zhang [2], Theorem 2.5.8). Let \(\varvec{\Gamma z}\) and \(\varvec{z}\) have the same distribution for all orthogonal matrices \(\varvec{\Gamma }\) and suppose that \(P(\varvec{z}=\varvec{0})=0\) which defines a class of spherical distributions, say \(\Phi _+\). The distribution of a statistic \(t(\varvec{z})\), where the distribution of \(\varvec{z}\) belongs to \(\Phi _+\), is the same for each member of \(\Phi _+\), if for all \(\alpha >0\) the statistic \(t(\alpha \varvec{z})\) has the same distribution as \(t(\varvec{z})\).

The lemma is useful if the distribution of the statistic mentioned in Lemma 3.1 can be derived for one member of \(\Phi _+\). In particular, it is enough to derive it for \(\varvec{u}\sim N_n(\varvec{0},\varvec{I})\), since \(\varvec{u}\) belongs to \(\Phi _+\); the distribution is then obtained for all members of the class \(\Phi _+\). These facts, among others, establish why (8) is true for all spherical distributions.

Wilks’ \(\Lambda\) statistic is frequently used in this article and Theorem 1 in Läuter et al. [11] implies the following theorem:

Theorem 3.1

Let \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) be independently Wishart distributed and put

$$\begin{aligned} \lambda =\frac{|\varvec{D}'\varvec{W}\varvec{D}|}{|\varvec{D}'(\varvec{W}+\varvec{V})\varvec{D}|}, \end{aligned}$$

where \(\varvec{D}\): \(s\times p\), \(s\le p\) is a function of \(\varvec{W}+\varvec{V}\) and the rank of \(\varvec{D}'(\varvec{W}+\varvec{V})\varvec{D}\) equals s with probability 1. Then \(\lambda \sim \Lambda (s,m,n)\).

In the theorem there is \(\varvec{\Sigma }\) involved in \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) but the distribution of \(\lambda\) is the same for all \(\varvec{\Sigma }\). It is only crucial that the same \(\varvec{\Sigma }\) is included in the distributions for \(\varvec{W}\) and \(\varvec{V}\). One way of constructing the weights \(\varvec{D}\) is to use the so-called principal component method (see Läuter et al. [11]), where the weights are determined by solving the eigenvalue problem

$$\begin{aligned} (\varvec{W}+\varvec{V})\varvec{D}=\varvec{D}\varvec{\Psi },\qquad \varvec{D}'\varvec{D}=\varvec{I}, \end{aligned}$$
(9)

and \(\varvec{\Psi }\) is a diagonal matrix with the positive eigenvalues.
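The eigenvalue problem (9) is solved by a standard symmetric eigendecomposition. A sketch under the assumption \(\varvec{\Sigma }=\varvec{I}\) (the distributional result in Theorem 3.1 does not depend on \(\varvec{\Sigma }\); the dimensions and the number s of retained directions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n = 6, 4, 12
A = rng.standard_normal((p, m)); V = A @ A.T    # V ~ W_p(I, m)
B = rng.standard_normal((p, n)); W = B @ B.T    # W ~ W_p(I, n)

# Principal component weights: the columns of D solve (W + V) D = D Psi
# with D'D = I; here the s columns with the largest eigenvalues are kept.
psi, D_full = np.linalg.eigh(W + V)             # eigenvalues in ascending order
s = 2
D = D_full[:, -s:]
Psi = np.diag(psi[-s:])
```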

The next corollary of the theorem is what is needed in this article.

Corollary 3.1

Let \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) be independently Wishart distributed and

$$\begin{aligned} \beta =\frac{\varvec{d}'\varvec{V}\varvec{d}}{\varvec{d}'\varvec{W}\varvec{d}+\varvec{d}'\varvec{V}\varvec{d}}, \end{aligned}$$

where \(\varvec{d}\in \mathfrak {R}^p\) is a function of \(\varvec{W}+\varvec{V}\) such that \(\varvec{d}'(\varvec{W}+\varvec{V})\varvec{d}>0\) with probability 1. Then \(\beta\) is beta-distributed with parameters m/2 and n/2.

Proof

By definition of a Wishart distribution there exist \(\varvec{Y}\sim N_{p,m}(\varvec{0},\varvec{\Sigma },\varvec{I}_m)\) and \(\varvec{Z}\sim N_{p,n}(\varvec{0},\varvec{\Sigma },\varvec{I}_n)\) such that \(\varvec{V}=\varvec{YY}'\) and \(\varvec{W}=\varvec{ZZ}'\). Put \(\varvec{X}=(\varvec{Y}:\varvec{Z})\) and then \(\varvec{V}+\varvec{W}=\varvec{XX}'=\varvec{YY}'+\varvec{ZZ}'\). Since \(\varvec{d}=\varvec{g}(\varvec{V}+\varvec{W})\) for some function \(\varvec{g}(\bullet )\), \(\varvec{d}'\varvec{X}={\varvec{g}}'(\varvec{XX}')\varvec{X}\). Moreover, for any orthogonal matrix \(\varvec{\Gamma }\), \({\varvec{g}}(\varvec{XX}')={\varvec{g}}(\varvec{X}\varvec{\Gamma }\varvec{\Gamma }'\varvec{X}')\) and since \(\varvec{X}\varvec{\Gamma }\) has the same distribution as \(\varvec{X}\), the score \(\varvec{d}'\varvec{X}={\varvec{g}}'(\varvec{XX}')\varvec{X}\) is spherically distributed. Note that \(\beta\) can be written

$$\begin{aligned} \beta =\frac{\varvec{d}'\varvec{X}(\varvec{I}:\varvec{0})'(\varvec{I}:\varvec{0})\varvec{X}'\varvec{d}}{\varvec{d}'\varvec{XX}'\varvec{d}} \end{aligned}$$

and it follows from Lemma 3.1, since the statement also holds when \(\varvec{d}'\varvec{X}\) is replaced by a normally distributed variable with dispersion matrix equal to \({\varvec{I}}\), that \(\beta\) is indeed beta-distributed with parameters m/2 and n/2.\(\square\)

Corresponding to (9) one alternative way to determine the weight \(\varvec{d}\) in Corollary 3.1 is by solving the eigenvalue/eigenvector problem

$$\begin{aligned} (\varvec{W}+\varvec{V})\varvec{d}=\psi \varvec{d},\qquad \varvec{d}'\varvec{d}=1,\quad \psi >0. \end{aligned}$$
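Combining Corollary 3.1 with this choice of \(\varvec{d}\) gives the following sketch (our own illustration; \(\varvec{\Sigma }=\varvec{I}\) and \(\varvec{d}\) is the eigenvector for the largest eigenvalue, while p exceeds the total number of degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 30, 3, 8                         # p may exceed m + n
Yv = rng.standard_normal((p, m)); V = Yv @ Yv.T   # V ~ W_p(Sigma, m), Sigma = I
Zw = rng.standard_normal((p, n)); W = Zw @ Zw.T   # W ~ W_p(Sigma, n)

# d solves (W + V) d = psi d with d'd = 1; take the largest eigenvalue psi.
psi, E = np.linalg.eigh(W + V)
d = E[:, -1]

beta = (d @ V @ d) / (d @ W @ d + d @ V @ d)      # Beta(m/2, n/2) by Corollary 3.1
```

Note that the Wishart matrices here are singular since \(p>m\) and \(p>n\), yet the beta distribution of the statistic is unaffected.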

4 Main Results

In Sect. 2 the three hypotheses in classical profile analysis were presented, i.e. when \(N>p+q\). Now the focus is on the high-dimensional setting where \(p>N-q\). Consider, for example, the test statistic given in (3). When p is large the problem is that \(\varvec{S}\) is singular and the determinant in (3) equals 0.

Läuter [8, 9] and Läuter et al. [10, 11] directly applied a vector \(\varvec{d}\) to the observation matrix \(\varvec{X}\). In this article, since there is a bilinear testing situation, it is proposed to apply \(\varvec{d}\) to \(\varvec{FX}\) where \(\varvec{F}\) is given in (2). Hence, \(\varvec{d}\) is of size \(p-1\).

Let \(\varvec{Y}=\varvec{FX}\), then the following test statistic is proposed to test the hypothesis of parallel profiles which is based on the formulation in (1):

Proposition 4.1

Let the parallelism hypothesis be defined via (1) and let \(\varvec{Y}=\varvec{FX}\). A test statistic for testing the hypothesis is given by

$$\begin{aligned} \lambda _{Ph}=\frac{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}}{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}}. \end{aligned}$$
(10)

It can be noted that \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C}))\) is independent of \(\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{G}))\), and if \(\varvec{d}\) is a function of \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'=\varvec{Y}(\varvec{I}-\varvec{P}_{C'G^\circ })\varvec{Y}'\), Corollary 3.1 establishes the next theorem.

Theorem 4.1

If \(\varvec{d}\) is a nonzero function (with probability 1) of \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'G^{o}})\varvec{Y}'\) the test statistic for testing for parallel profiles in high dimensions, given in (10), follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C}))/2\) and \(r(\varvec{G})/2\).
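The complete high-dimensional parallelism test can be sketched as follows (our own illustration; the dimensions, the data generated under \(H_1\), and the choice of \(\varvec{d}\) as the top eigenvector are assumptions, and \(\varvec{G}^\circ =\varvec{1}_q\) since the columns of \(\varvec{G}\) sum to zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, q, n_i = 40, 3, 8                      # p = 40 exceeds N = 24: high-dimensional
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T

def proj(M):
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under parallel profiles
Y = F @ X

# d: eigenvector of Y (I - P_{C'G°}) Y' for the largest eigenvalue, an
# admissible function of that matrix by Theorem 4.1.
Gc = np.ones((q, 1))
T = Y @ (np.eye(N) - proj(C.T @ Gc)) @ Y.T
d = np.linalg.eigh(T)[1][:, -1]

num = d @ Y @ (np.eye(N) - proj(C.T)) @ Y.T @ d
den2 = d @ Y @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ Y.T @ d
lam_Ph = num / (num + den2)
# Theorem 4.1: lam_Ph ~ Beta((N - q)/2, (q - 1)/2); small values reject H_1.
p_value = stats.beta.cdf(lam_Ph, (N - q) / 2, (q - 1) / 2)
```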

In (5) the likelihood ratio was presented for the level hypothesis defined in (4). As it can be seen from the expression in (5) problems occur in high dimensions because \(\varvec{S}^{-1}\) does not exist when \(p>N-r(\varvec{C})+1\). Therefore, \(({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}\) and \(({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^{\circ }}'\varvec{S}^{-1}\) have to be studied. It can be noted that if \(p<N-r(\varvec{C})+1\),

$$\begin{aligned} ({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}\sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-p+1). \end{aligned}$$
(11)

Thus, if \(p>N-r(\varvec{C})+1\), negative degrees of freedom would appear, which of course is impossible. Hence, in order to test the level hypothesis in high dimensions the statistic in (5) has to be modified significantly. In this article the idea is to modify the statistic in (5) as little as possible, but enough for high-dimensional statistical analyses to be carried out.

A couple of ideas will bring us to a proposition where a test statistic and its distribution are given. The first idea is to prove (11) in detail and see whether anything can be modified so that reasonable expressions appear when p is large. To simplify notation, \((\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\) will be discussed, where, for example, \(\varvec{A}=(\varvec{F}')^\circ\), and the inverse is supposed to exist. The following chain of equalities shows some interesting structure:

$$\begin{aligned} (\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}(\varvec{S}-\varvec{S}\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{S}\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{S})\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\varvec{P}\varvec{X}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}, \end{aligned}$$
(12)

where

$$\begin{aligned} \varvec{P}=\varvec{I}-\varvec{P_{C'}}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{X}(\varvec{I}-\varvec{P_{C'}}). \end{aligned}$$
(13)

Since \(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\) is independent of \({\varvec{A}^\circ }'\varvec{X}\) it is also independent of \(\varvec{P}\). Moreover, \(\varvec{P}\) is idempotent and \(r(\varvec{P})=N-r(\varvec{C})-p+r(\varvec{A})\). Thus, for any choice of \(\varvec{A}^\circ\), conditionally on \({\varvec{A}^\circ }'\varvec{X}\), the expression in (12) is Wishart distributed but it also appears that this distribution is independent of \({\varvec{A}^\circ }'\varvec{X}\) and therefore (11) is established for the specific choice \(\varvec{A}=(\varvec{F}')^\circ\).

The critical point in the high-dimensional setting is that \(r(\varvec{P})\) will approach 0, even if the inverse in (13) is replaced by a g-inverse. Therefore, it is proposed that \(\varvec{P}\) is modified in such a way that instead of this projection (expressed in \(\varvec{F}'\) instead of \(\varvec{A}^\circ\))

$$\begin{aligned} \widetilde{\varvec{P}}=\varvec{I}-\varvec{P_{C'}}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d}(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1} \varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}}) \end{aligned}$$
(14)

will be used, where \(\varvec{d}\) is a function of \(\varvec{F}\varvec{X}\). Summarizing these calculations gives a quantity which will be used as the numerator in a test statistic for testing the level hypothesis:

$$\begin{aligned} U=({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}\varvec{X}\widetilde{\varvec{P}}\varvec{X}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}. \end{aligned}$$
(15)

How to choose \(\varvec{d}\) in (14) is not obvious. On the one hand, the distribution of U does not depend on \(\varvec{d}\); on the other hand, when explicit expressions are calculated, \(\varvec{d}\) becomes a function where the data have replaced \(\varvec{X}\). From (5) it also follows that \((\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\) has to be considered. Similar calculations to those in (12) yield

$$\begin{aligned}&(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\varvec{X}= (\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\varvec{X} \nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\nonumber \\&\quad \times \,(\varvec{I}-(\varvec{I}-\varvec{P}_{C'})\varvec{X}'\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{X} (\varvec{I}-\varvec{P}_{C'})\varvec{X}'\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{X}).\qquad \end{aligned}$$
(16)

Replacing \(\varvec{A}\) by \((\varvec{F}')^\circ\) and then using the same \(\varvec{d}\) vector as in (14) leads to

$$\begin{aligned} \varvec{h}=\varvec{P}_1'\varvec{X}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, \end{aligned}$$
(17)

where

$$\begin{aligned} \varvec{P}_1=\varvec{I}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d}(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1} \varvec{d}'\varvec{F}\varvec{X} \end{aligned}$$
(18)

is an idempotent matrix. Moreover, let

$$\begin{aligned} \widetilde{\varvec{Q}}&= \varvec{G}'(\varvec{CC}')^{-1}\varvec{G}+\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{F}'\varvec{d}\nonumber \\&\quad \times \,(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1}\varvec{d}'\varvec{F}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G} \end{aligned}$$
(19)

and put

$$\begin{aligned} V=\varvec{h}'\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\widetilde{\varvec{Q}}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{h}. \end{aligned}$$
(20)

Corresponding to \(\lambda _L\) in (5) the next quantity is proposed to test the level hypothesis:

$$\begin{aligned} \lambda _{Lh,prel}=\frac{U}{U+V}, \end{aligned}$$
(21)

where U and V are defined in (15) and (20), respectively. However, it is not possible to use \(\lambda _{Lh,prel}\) in (21) because both U and V include the unknown dispersion matrix \({\varvec{\Sigma }}\) so a few more results have to be established. It follows immediately that

$$\begin{aligned} U\sim & {} W_1(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-1), \end{aligned}$$
(22)
$$\begin{aligned} V\sim & {} W_1(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{-1}, r(\varvec{G})). \end{aligned}$$
(23)

If multiplying the numerator and denominator in (21) by \(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{1/2}\), it follows, since the distribution of

$$\begin{aligned} ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1/2}{(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}\varvec{X} \end{aligned}$$

is independent of \(\varvec{\Sigma }\), that the distribution of \(\lambda _{Lh,prel}\) is independent of \(\varvec{\Sigma }\). Thus, in order to have a test statistic which is functionally independent of the dispersion matrix, \(\varvec{\Sigma }=\varvec{I}\) is chosen. In the next proposition the test statistic for the level hypothesis is stated.

Proposition 4.2

Let the level hypothesis testing problem be defined via (4). A test statistic for testing the level hypothesis is given by

$$\begin{aligned} \lambda _{Lh}=\frac{\widetilde{U}}{\widetilde{U}+\widetilde{V}}, \end{aligned}$$

where

$$\begin{aligned} \widetilde{U}&=({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^\circ }'\varvec{X}\widetilde{\varvec{P}}\varvec{X}'(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1},\\ \widetilde{V}&=\widetilde{\varvec{h}}'\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\widetilde{\varvec{Q}}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\widetilde{\varvec{h}},\\ \widetilde{\varvec{h}}&=\varvec{P}_1'\varvec{X}'{(\varvec{F}')^\circ }({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1} \end{aligned}$$

with \(\widetilde{\varvec{P}}\), \(\varvec{P}_1\) and \(\widetilde{\varvec{Q}}\) defined in (14), (18) and (19), respectively.

How to choose \(\varvec{d}\) in (14) and (18) is not clear. On the one hand, the distribution of \(\lambda _{Lh}\) does not depend on the choice of \(\varvec{d}\), whereas on the other hand, when calculating an explicit value of \(\lambda _{Lh}\), the choice of \(\varvec{d}\) does have an impact. This type of phenomenon has been discussed very rarely; we are only aware of it having been discussed when a singular Gauss–Markov model has been considered.

Based on the results in (22) and (23) the next theorem can be formulated.

Theorem 4.2

Let \(\varvec{d}\) in (14) be a nonzero function (with probability 1) of \(\varvec{F}\varvec{X}\). The test statistic in Proposition 4.2 for testing the level hypothesis in (4) follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C})-1)/2\) and \(r(\varvec{G})/2\).
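A sketch of the reasoning behind this result (assuming, as in the classical theory, that U and V are independent): a one-dimensional Wishart distribution is a scaled chi-square distribution, so with \(\sigma ^2=({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}\), (22) and (23) give

```latex
U \sim \sigma^2\chi^2_{N-r(\varvec{C})-1}, \qquad
V \sim \sigma^2\chi^2_{r(\varvec{G})},
\qquad\text{and hence}\qquad
\frac{U}{U+V} \sim \beta\left(\tfrac{N-r(\varvec{C})-1}{2},\,
\tfrac{r(\varvec{G})}{2}\right),
```

since the scale \(\sigma ^2\) cancels in the ratio of independent chi-square variables; the same argument applies after the replacement \(\varvec{\Sigma }=\varvec{I}\), i.e. to \(\lambda _{Lh}\).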

The third hypothesis of flatness was stated in (6). Since the approach for creating a test statistic is the same as for the parallelism hypothesis, the results are stated without any proofs.

Proposition 4.3

Let the flatness hypothesis be defined via (6) and put \(\varvec{Y}=\varvec{F}\varvec{X}\). A test statistic for testing the hypothesis is given by

$$\begin{aligned} \lambda _{Fh}=\frac{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}}{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}'\varvec{d}}, \end{aligned}$$
(24)

where \(\varvec{d}\in \mathfrak {R}^{p-1}\), with probability 1, is a nonzero function in \(\varvec{Y}\varvec{Y}'\).
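To make (24) concrete, the following is a minimal numerical sketch, in which all dimensions, design matrices and the particular choice of \(\varvec{d}\) are hypothetical and chosen only for illustration. It computes \(\lambda _{Fh}\) for simulated data and verifies the decomposition of \(\varvec{Y}\varvec{Y}'\) displayed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: p variables, N subjects in k equal groups.
p, N, k = 4, 20, 2
C = np.kron(np.eye(k), np.ones((1, N // k)))   # k x N between-individual design
F = np.eye(p - 1, p) - np.eye(p - 1, p, 1)     # (p-1) x p difference matrix
G = np.array([[1.0], [-1.0]])                  # an assumed contrast matrix

def proj(A):
    """Orthogonal projector onto the column space of A (A of full column rank)."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

def orth_complement(A):
    """A°: columns spanning the orthogonal complement of the column space of A."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, np.sum(s > 1e-10):]

X = rng.standard_normal((p, N))                # data simulated under a flat profile
Y = F @ X

P_Cp = proj(C.T)                               # P_{C'}
P_CG = proj(C.T @ np.linalg.inv(C @ C.T) @ G)  # P_{C'(CC')^{-1}G}
P_CGo = proj(C.T @ orth_complement(G))         # P_{C'G°}

d = Y @ Y.T @ np.ones(p - 1)                   # one possible d: a function of YY'
z = Y.T @ d                                    # so d'Y M Y'd = z'Mz below

num = z @ (np.eye(N) - P_Cp) @ z + z @ P_CG @ z
denom = num + z @ P_CGo @ z
lam_Fh = num / denom                           # the statistic in (24)
```

Note that \(P_{C'}=P_{C'(CC')^{-1}G}+P_{C'G^\circ }\), so the denominator equals \(\varvec{d}'\varvec{Y}\varvec{Y}'\varvec{d}\), which is exactly the decomposition stated after the proposition.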

The motivation for letting \(\varvec{d}\) be a function of \(\varvec{Y}\varvec{Y}'\) follows from the fact that

$$\begin{aligned} \varvec{Y}\varvec{Y}'=\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'+\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}'. \end{aligned}$$

Moreover,

$$\begin{aligned}&\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C}'\varvec{G}^\circ )),\\&\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}' \sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{C}'\varvec{G}^\circ )). \end{aligned}$$

These results imply that the next theorem can be established.

Theorem 4.3

The test statistic for testing for flatness in high dimensions, given in Proposition 4.3, follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C}'\varvec{G}^\circ ))/2\) and \(r(\varvec{C}'\varvec{G}^\circ )/2\).
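Since the theorem is exact rather than asymptotic, it lends itself to a direct Monte Carlo check. The sketch below (dimensions, design matrices and the choice of \(\varvec{d}\) are again hypothetical) simulates data under the flatness hypothesis and compares the empirical mean of \(\lambda _{Fh}\) with the mean of the stated \(\beta\)-distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, k, reps = 4, 20, 2, 2000                 # hypothetical sizes

C = np.kron(np.eye(k), np.ones((1, N // k)))   # k x N between-individual design
F = np.eye(p - 1, p) - np.eye(p - 1, p, 1)     # (p-1) x p difference matrix
G = np.array([[1.0], [-1.0]])                  # assumed contrast matrix

def proj(A):
    """Orthogonal projector onto the column space of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

Go = np.linalg.svd(G, full_matrices=True)[0][:, 1:]   # G° (this G has rank 1)
P_CGo = proj(C.T @ Go)                         # P_{C'G°}, here of rank r(C'G°) = 1
I_N = np.eye(N)

L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)                # an arbitrary dispersion matrix

vals = np.empty(reps)
for i in range(reps):
    # Flat common profile: E[FX] = 0, arbitrary Sigma.
    X = np.linalg.cholesky(Sigma) @ rng.standard_normal((p, N))
    Y = F @ X
    d = Y @ Y.T @ np.ones(p - 1)               # d as a function of YY'
    z = Y.T @ d
    # Equals lambda_Fh in (24), using P_{C'} = P_{C'(CC')^{-1}G} + P_{C'G°}.
    vals[i] = z @ (I_N - P_CGo) @ z / (z @ z)

a, b = (N - 1) / 2, 1 / 2                      # beta parameters when r(C'G°) = 1
```

With these dimensions the theorem predicts a \(\beta ((N-1)/2, 1/2)\) distribution, whose mean \(a/(a+b)=0.95\) should be reproduced by the simulation up to Monte Carlo error.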

5 Concluding Remarks

In this article the three well-known test statistics in profile analysis have been modified so that high-dimensional data can be handled in a non-asymptotic approach. The tests for parallelism and flatness were derived following ideas given by Läuter [8, 9] and Läuter et al. [10, 11], which originally were developed for handling MANOVA problems. Concerning the level test, a completely new approach is proposed: here we modify the degrees of freedom and an exact test is derived.

The vector \(\varvec{d}\), which is utilized in our approach when testing for parallelism and flatness and which is a function of certain sums of squares, has only briefly been considered in this article. Instead we refer to Läuter et al. [10], Section 4, where several different alternatives for determining \(\varvec{d}\) are proposed. Furthermore, a generalization of the approach presented in this article would be to apply a matrix \(\varvec{D}\), i.e. to study several linear scores, instead of a vector \(\varvec{d}\), which only gives one score.

Another important problem (observed by one of the reviewers) is that the choice of \(\varvec{F}\) can have an effect on the choice of \(\varvec{d}\), so that the test statistic in fact depends on the choice of \(\varvec{F}\). It is important to continue this work and establish restrictions on the choice of \(\varvec{d}\) so that the vector only depends on the space generated by the columns of \(\varvec{F}\), i.e. instead of using \(\varvec{F}\), using the projection \(\varvec{F}(\varvec{F}'\varvec{F})^-\varvec{F}'\).
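The suggested remedy can be illustrated numerically: \(\varvec{F}(\varvec{F}'\varvec{F})^-\varvec{F}'\) depends only on the column space of \(\varvec{F}\), so any two matrices spanning the same space yield the same projector. A minimal sketch (the matrices and their sizes are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

F = rng.standard_normal((5, 3))     # any full-column-rank matrix (illustrative sizes)
T = rng.standard_normal((3, 3))     # nonsingular, so F @ T spans the same column space

def space_projector(F):
    """F (F'F)^- F': orthogonal projector onto the column space of F."""
    return F @ np.linalg.pinv(F.T @ F) @ F.T

# The projector is invariant under F -> F T for nonsingular T,
# so a statistic built from it depends only on the space generated by F.
P1 = space_projector(F)
P2 = space_projector(F @ T)
```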