1 Introduction

In this article we consider profile analysis from a high-dimensional perspective, i.e. the model contains so many parameters that there are not enough degrees of freedom to test the hypotheses which are part of the analysis. Profile analysis consists of three tests which are carried out in a specific order: (1) the test of parallelism; (2) the test of equal levels; and (3) the test of flatness. Later we will be more specific about how the tests are performed. The approach differs somewhat from the usual likelihood ratio testing procedure since, in particular, the hypotheses are tested in a specific order.

There are two possible distinct scenarios in the analysis of profiles which both imply a need to extend the classical theory to cover high-dimensional models:

  1.

    The same variable is observed for each subject over many time points (repeated measurements), which often are densely spaced within a finite interval. There can be more repeated measurements than the number of independent subjects.

  2.

    For each subject many variables should be analysed simultaneously. There can be more variables than the number of independent observations.

In (1) there is a natural ordering of the repeated measurements. For example, it can be a growth curve or in general any stream of observations which is generated from some measurement device. In (2) there does not exist any ordering between variables. Instead one can measure hundreds of characteristics on a subject, for example components from a vehicle.

Over the years profile analysis has been studied by many authors. One of the first contributions was published by Greenhouse and Geisser [4]. Originally, mean values were compared and modelled. Many years after profile analysis had been introduced, Srivastava [17] derived likelihood ratio-based test statistics together with their distributions. Illuminating chapters on profile analysis can be found in the books by Srivastava and Carter [19] and Srivastava [18]. A more sophisticated approach was suggested by Ohlson and Srivastava [13] who considered profile analysis of several groups, where the groups of subjects only partly had a common profile. In von Rosen [22] an overview of classical profile analysis has been given.

Fujikoshi [3], followed by Seo et al. [16], derived likelihood ratio tests for the parallelism, level and flatness hypothesis, respectively, when analysing growth curve data. For a parallel profile model, i.e. assuming parallel profiles, it has also been proposed to consider different covariance structures. In Yokoyama and Fujikoshi [25] and Yokoyama [24] the random-effect covariance structure was considered and some tests for the random-effect and flatness were derived. Later, Srivastava and Singull [20] constructed likelihood ratio tests in profile analysis, without any restrictions on the parameter space, for testing the covariance matrix for random-effect structure or sphericity.

Moreover, profile analysis has also been discussed under more general models. Okamoto et al. [14] studied the asymptotic expansions of the distributions of some test statistics considering elliptical distributions. Others have extended this model and discuss the asymptotic expansions for the null distribution of test statistics for profile analysis under non-normality, e.g. see Maruyama [12] and Harrar and Xu [6].

Our ideas about analysing high-dimensional profile data stem from works by Läuter [8, 9] and Läuter et al. [10, 11] where random scores were utilized in MANOVA models. Scores have for a long time been used in statistics and mostly they consist of known linear combinations of random vectors/matrices. The idea with random scores is to obtain test statistics which to some extent are robust: instead of requiring normally distributed vectors, the observations can be elliptically distributed and the test statistics still follow the same distribution as when the observed variables are normally distributed.

In Sect. 2 profile analysis is introduced and necessary background information for the rest of the paper is presented. Thereafter, in Sect. 3 the high-dimensional approach of this paper is described and in Sect. 4 the usual test statistics are modified so that high-dimensional data can be handled. Section 5 comprises some concluding remarks.

Concerning notation, bold upper case letters denote matrices and bold lower case letters denote vectors. Other notations are defined when they first appear.

2 Profile Analysis

Assume that there are q groups which should be compared with \(n_i\) p-dimensional random vectors \(\varvec{x}_{ij}\), \(j\in \{1,...,n_i\}\), which are independently normally distributed as \(N_p(\varvec{\mu }_i, \varvec{\Sigma })\), \(i\in \{1,...,q\}\), where \(\varvec{\mu }_i=(\mu _{1,i},...,\mu _{p,i})'\) and \(\varvec{\Sigma }\) is an unknown positive definite dispersion matrix. As mentioned in the introduction there are three different hypotheses which are commonly considered in profile analysis:

  1.

    Parallelism hypothesis

    \(H_1: \varvec{\mu }_i-\varvec{\mu }_q=\gamma _i\varvec{1}_p\), \(i\in \{1,...,q-1\}\) and \(A_1\ne H_1\), meaning that \(\varvec{\mu }_i-\varvec{\mu }_q\ne \gamma _i\varvec{1}_p\), \(i\in \{1,...,q-1\}\), where \(A_1\) stands for alternative hypothesis, the parameters \(\gamma _i\) are unknown scalars and \(\varvec{1}_p\) is a p-dimensional vector of ones;

  2.

    Level hypothesis

    \(H_2|H_1: \gamma _i=0\), \(i\in \{1,...,q-1\}\) and \(A_2\ne H_2|H_1\), implying that \(\gamma _i\ne 0\), \(i\in \{1,...,q-1\}\), where \(H_2|H_1\) means \(H_2\) under the assumption that \(H_1\) is true;

  3.

    Flatness hypothesis

    \(H_3|H_1: \varvec{\mu }_i=\psi _i\varvec{1}_p\), \(i\in \{1,...,q\}\) and \(A_3\ne H_3|H_1\), where \(H_3|H_1\) means \(H_3\) under the assumption that \(H_1\) is true, and the parameters \(\psi _i\) are unknown scalars.

One can note that instead of \(H_3|H_1\) the strategy can be to test \(H_3|H_2\), in which case \(\psi _1=\cdots =\psi _q\). In this way, profile analysis can be built up around a chain of tests.

Profile analysis can also be reformulated with the help of matrices and the MANOVA and growth curve model (GMANOVA) as well as the extended growth curve model which all belong to the class of bilinear models (see von Rosen [23]). Moreover, for technical details we refer to the report by Cengiz and von Rosen [1].

Let the observation matrix be matrix normally distributed, i.e. \(\varvec{X}\sim N_{p,N}(\varvec{BC},\varvec{\Sigma },\varvec{I})\), where \(\varvec{B}\): \(p\times q\) and \(\varvec{\Sigma }\): \(p\times p\) consist of the unknown parameters, and \(\varvec{C}\): \(q\times N\) is the design matrix describing the q groups. In this article, to simplify the presentation, \(\varvec{C}\) is supposed to be of full rank. It can be noted that \(\varvec{B}=(\varvec{\mu }_1,\dots ,\varvec{\mu }_q)\) and one choice of \(\varvec{C}\) is

$$\begin{aligned} \varvec{C}=\left( \begin{array}{cccc}\varvec{1}'_{n_1}&{}\varvec{0}&{}\cdots &{}\varvec{0}\\ \varvec{0}&{}\varvec{1}'_{n_2}&{}\cdots &{}\vdots \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ \varvec{0}&{}\varvec{0}&{}\cdots &{}\varvec{1}'_{n_q} \end{array}\right) . \end{aligned}$$
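As a small numerical sketch (our own, not part of the paper; the helper name `design_matrix` and the group sizes are arbitrary choices), the block-diagonal structure of \(\varvec{C}\) can be built for any group sizes \(n_1,\dots ,n_q\):

```python
# Sketch: the q x N design matrix C with the row vectors 1'_{n_i} placed
# block-diagonally, as displayed above.
import numpy as np

def design_matrix(group_sizes):
    """Return the q x N design matrix C for the given group sizes."""
    q, N = len(group_sizes), sum(group_sizes)
    C = np.zeros((q, N))
    start = 0
    for i, n_i in enumerate(group_sizes):
        C[i, start:start + n_i] = 1.0   # row i is 1'_{n_i} in its block
        start += n_i
    return C

C = design_matrix([3, 2, 4])            # q = 3 groups, N = 9 subjects
```

Note that \(\varvec{C}\varvec{C}'\) is then the diagonal matrix of group sizes, which is used repeatedly below.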

The null hypothesis and the alternative hypothesis for parallelism can be written

$$\begin{aligned} H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \nonumber \\ A_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \text {no restrictions on}\quad \varvec{B}, \end{aligned}$$
(1)

where \(\varvec{F}\) and \(\varvec{G}\) are contrast matrices given by

$$\begin{aligned} \varvec{F}= \begin{pmatrix} 1&{}-1&{}0&{}0&{}\cdots &{}0&{}0 \\ 0&{}1&{}-1&{}0&{}\cdots &{}0&{}0 \\ 0&{}0&{}1&{}-1&{}\cdots &{}0&{}0 \\ \vdots &{}\vdots &{}\vdots &{}\vdots &{} \ddots &{}\vdots &{}\vdots \\ 0&{}0&{}0&{}0&{}\cdots &{}1&{}-1 \end{pmatrix},\qquad \varvec{G}= \begin{pmatrix} 1&{}0&{}\cdots &{}0 \\ -1&{}1&{}\cdots &{}0 \\ 0&{}-1&{}\cdots &{}0 \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}1 \\ 0&{}0&{}\cdots &{}-1 \end{pmatrix}. \end{aligned}$$
(2)

It can be noted that \(\varvec{F}\) is of size \((p-1)\times p\) and \(\varvec{G}\) is of size \(q\times (q-1)\), respectively, of ranks \(p-1\) and \(q-1\).
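The contrast matrices in (2) are successive-difference matrices, so they are easy to generate for any p and q. The following sketch (our own; `contrast_F` is an assumed helper name and the specific \(\varvec{B}\) is simulated) also checks that \(\varvec{FBG}=\varvec{0}\) holds under parallel profiles \(\varvec{\mu }_i=\varvec{\mu }_q+\gamma _i\varvec{1}_p\):

```python
import numpy as np

def contrast_F(p):
    # (p-1) x p successive-difference matrix: rows (1, -1, 0, ...), etc.
    return np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

p, q = 4, 3
F = contrast_F(p)                  # (p-1) x p, rank p-1, F 1_p = 0
G = contrast_F(q).T                # q x (q-1), rank q-1, 1_q' G = 0

# Parallel profiles: mu_i = mu_q + gamma_i 1_p, i.e. B = mu_q 1_q' + 1_p gamma'.
rng = np.random.default_rng(0)
mu_q, gamma = rng.standard_normal(p), rng.standard_normal(q)
B = np.outer(mu_q, np.ones(q)) + np.outer(np.ones(p), gamma)
```

Since \(\varvec{F}\varvec{1}_p=\varvec{0}\) and \(\varvec{1}_q'\varvec{G}=\varvec{0}\), the product \(\varvec{FBG}\) vanishes exactly for such \(\varvec{B}\).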

If \(N-q\) is larger than \(p-1\), the likelihood ratio statistic for the parallelism hypothesis can be given as

$$\begin{aligned} \lambda _P=\frac{|\varvec{FSF}'|}{|\varvec{FSF}'+\varvec{FX}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'|}, \end{aligned}$$
(3)

where \(|\bullet |\) stands for the determinant, the projection \(\varvec{P}_{M}=\varvec{M}(\varvec{M}'\varvec{M})^{-1}\varvec{M}'\), for any matrix expression \(\varvec{M}\) of full rank and \(\varvec{S}=\varvec{X}(\varvec{I}-\varvec{P}_{C'})\varvec{X}'\). The notation of a projection will frequently be used in this article. Let \(r(\varvec{M})\) denote the rank of a matrix \(\varvec{M}\). Moreover,

$$\begin{aligned}&\varvec{FSF}' \sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C})), \\&\varvec{FX}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{G})), \end{aligned}$$

are two expressions which are independently distributed, where \(W_p(\varvec{\Psi },n)\) denotes the Wishart distribution with scale parameter \(\varvec{\Psi }\) and n degrees of freedom. Note that \(r(\varvec{C})=q\) and \(r(\varvec{G})=q-1\). If \(p=1\) then a Wishart variable is proportional to a chi-square variable. Furthermore, if \(\varvec{U}\sim W_p(\varvec{\Sigma },n)\) is independent of \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) then

$$\begin{aligned} \frac{|\varvec{U}|}{|\varvec{U}+\varvec{V}|}\sim \Lambda (p,m,n), \end{aligned}$$

which is known as Wilks’ lambda distribution. Hence, the distribution for the likelihood ratio statistic given in (3) is

$$\begin{aligned} \lambda _P\sim \Lambda (p-1, r(\varvec{G}), N-r(\varvec{C})). \end{aligned}$$

If the profiles are parallel, we can say that there is no interaction between the responses and the groups. Given that the parallelism hypothesis holds, the next step is to test the second hypothesis, \(H_2\), which states that there is no group effect. Moreover, if the first hypothesis holds, one may also want to test the third hypothesis, \(H_3\), meaning that the response is constant “over time”. Note that failing to reject \(H_1\) does not, as always, mean that the hypothesis is true, but in profile analysis it is used as a strategy for analysing data.
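The parallelism statistic (3) can be computed directly from simulated data. The sketch below is our own illustration (arbitrary dimensions, data generated under \(H_1\) with \(\varvec{B}=\varvec{0}\)); it follows the definitions of \(\varvec{S}\) and the projections given above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_i = 5, 3, 10                      # classical setting: N - q > p - 1
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))      # q x N design matrix
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T

def proj(M):
    # P_M = M (M'M)^{-1} M' for a full column rank matrix M
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under parallel profiles
S = X @ (np.eye(N) - proj(C.T)) @ X.T     # S = X (I - P_{C'}) X'
H = F @ X @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ X.T @ F.T
lam_P = np.linalg.det(F @ S @ F.T) / np.linalg.det(F @ S @ F.T + H)
```

By the distributional result above, repeating this simulation would reproduce the \(\Lambda (p-1,q-1,N-q)\) law of \(\lambda _P\).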

The level hypothesis in matrix form equals

$$\begin{aligned} H_2|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{BG}=\varvec{0}, \nonumber \\ A_2|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \end{aligned}$$
(4)

where \(\varvec{F}\) and \(\varvec{G}\) are defined in (2). Let \(\varvec{M}^{\circ }\) denote any matrix of full rank generating the orthogonal complement to the column space of \(\varvec{M}\). The corresponding likelihood ratio test statistic for the level hypothesis can be expressed as

$$\begin{aligned} \lambda _L=\frac{({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}}{({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}+\varvec{H}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\varvec{Q}^{-1}\varvec{G}' (\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{H}'}, \end{aligned}$$
(5)

where

$$\begin{aligned} \varvec{H}&= ({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1} {(\varvec{F}')^{\circ }}'\varvec{S}^{-1},\\ \varvec{Q}&= \varvec{G}'(\varvec{CC}')^{-1}\varvec{G}+\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{F}'(\varvec{FSF}')^{-1}\varvec{F}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}. \end{aligned}$$

Note that \(r{((\varvec{F}')^{\circ })}=1\) since \(r(\varvec{F})=p-1\). Then, it can be shown that

$$\begin{aligned}&({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1} \sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-p+1), \\&\varvec{H}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\varvec{Q}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{H}' \sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, r(\varvec{G})), \end{aligned}$$

which are independently distributed. Moreover, the Wilks’ lambda distribution for (5) is given by

$$\begin{aligned} \lambda _L\sim \Lambda (1, r(\varvec{G}), N-r(\varvec{C})-p+1) \end{aligned}$$

which equals a beta distribution with parameters \((N-r(\varvec{C})-p+1)/2\) and \(r(\varvec{G})/2\). Furthermore, by a one-to-one transformation \(\lambda _L\) can be converted into an F-statistic.
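The one-to-one transformation to an F-statistic can be sketched numerically (our own illustration; the helper name is an assumption): \(\Lambda (1,m,n)\) is \(\mathrm {Beta}(n/2,m/2)\)-distributed and \((1-\lambda )/\lambda \cdot n/m\) is \(F(m,n)\)-distributed, so the two p-values must coincide.

```python
# Sketch: equivalence of the beta and F forms of Lambda(1, m, n).
from scipy import stats

def level_pvalues(lam, m, n):
    # Small lam is evidence against the hypothesis, so the p-value is a
    # lower tail for the beta form and an upper tail for the F form.
    p_beta = stats.beta.cdf(lam, n / 2, m / 2)
    f_stat = (1 - lam) / lam * n / m
    p_f = stats.f.sf(f_stat, m, n)
    return p_beta, p_f

p_beta, p_f = level_pvalues(0.7, m=2, n=20)   # arbitrary illustrative values
```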

Assuming that the profiles are parallel, a test can be constructed to see if the profiles are flat, i.e.

$$\begin{aligned} H_3|H_1:&\quad E[\varvec{X}]=\varvec{BC}, \quad \varvec{FB}=\varvec{0}, \nonumber \\ A_3|H_1:&\quad E(\varvec{X})=\varvec{BC}, \quad \varvec{FBG}=\varvec{0}, \end{aligned}$$
(6)

where \(\varvec{F}\) and \(\varvec{G}\) are defined in (2). The likelihood ratio statistic for the flatness hypothesis equals

$$\begin{aligned} \lambda _F=\frac{|\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'|}{|\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}'+\varvec{F}\varvec{X}\varvec{P}_{C'G^\circ }\varvec{X}'\varvec{F}'|} , \end{aligned}$$
(7)

where there are two independently distributed Wishart matrices:

$$\begin{aligned}&\varvec{F}\varvec{X}\varvec{P}_{C'G^\circ }\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F}\varvec{\Sigma }\varvec{F}', r(\varvec{C}'\varvec{G}^\circ )), \\&\varvec{FSF}'+\varvec{F}\varvec{X}\varvec{P}_{C'(CC')^{-1}G}\varvec{X}'\varvec{F}' \sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C})+r(\varvec{G})). \end{aligned}$$

Thus,

$$\begin{aligned} \lambda _F\sim \Lambda (p-1, r(\varvec{C}'\varvec{G}^\circ ), N-r(\varvec{C})+r(\varvec{G})) \end{aligned}$$

and \(r(\varvec{C}'\varvec{G}^\circ )=r(\varvec{C})-r(\varvec{G})=1\).
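The flatness statistic (7) can likewise be evaluated on simulated data. In this sketch (our own choices of dimensions, data generated under flat profiles) \(\varvec{G}^\circ =\varvec{1}_q\), since the columns of \(\varvec{G}\) sum to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n_i = 4, 2, 15
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T
Gc = np.ones((q, 1))                      # G° = 1_q, so r(C'G°) = 1

def proj(M):
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under the flatness hypothesis
S = X @ (np.eye(N) - proj(C.T)) @ X.T
M1 = F @ S @ F.T + F @ X @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ X.T @ F.T
M2 = F @ X @ proj(C.T @ Gc) @ X.T @ F.T   # FX P_{C'G°} X'F', a rank-1 Wishart matrix
lam_F = np.linalg.det(M1) / np.linalg.det(M1 + M2)
```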

How to find an exact significance level for the level hypothesis test and the flatness hypothesis test is still an open problem, mainly because the different “conditional” test statistics are dependent. If the overall significance level is important, a Bonferroni approach can be used. Since the focus in this article is on constructing test statistics in “high dimensions”, we leave it to future work to compare the classical profile analysis approach with an approach based on “unconditional tests”.

3 High-Dimensional Setting

The focus in this article is on high-dimensional profile analysis. Several authors have approached the analysis of high-dimensional profiles. Onozawa et al. [15] and Harrar and Kong [5] (see also Hyodo [7]) derived test statistics for high-dimensional profile analysis with unequal covariance matrices. Takahashi and Shutoh [21] proposed new test statistics in profile analysis with high-dimensional data by applying the Cauchy–Schwarz inequality. The above-mentioned authors introduce different high-dimensional asymptotic frameworks and derive the test statistics in profile analysis under these frameworks. The approach in this article is different since we will not focus on the asymptotic distributions of the test statistics. Instead, a fixed p (number of repeated measurements) and n (number of observations) are of interest, where p can be much larger than n.

The method adopted in this article is mainly based on ideas put forward by Läuter [8, 9] and Läuter et al. [10, 11], who proposed a scoring method for dealing with high-dimensional problems in MANOVA. The method is more advanced than principal component analysis and tests based on these scores are exact. However, for the level test in profile analysis this article presents a completely new approach. The tests which arise from the approach of Läuter [8, 9] and Läuter et al. [10, 11] are based on linear scores which are constructed with the help of sums of products matrices. These scores are linear combinations of the repeated measurements. The approach implies that high-dimensional observations are compressed into low-dimensional observations which are then used in the analysis instead of the original data. Note that we only briefly mention the choice of scores, and only one explicit expression for the scores is given in this work. However, there exist different kinds of scores; for details we refer to Läuter et al. [10], where several examples are presented.

We now give a brief introductory mathematical presentation of the theory. Suppose

$$\begin{aligned} \varvec{X}\sim N_{p,n}(\varvec{\mu }\varvec{1}_n', \varvec{\Sigma }, \varvec{I}_n) \end{aligned}$$

and consider a single score

$$\begin{aligned} \varvec{z}'=(z_1, z_2,\cdots , z_n)=(d_1, d_2,\cdots , d_p)\varvec{X}=\varvec{d}'\varvec{X}, \end{aligned}$$

where \(\varvec{d}\) is the vector of weights and \(z_j\)’s, \(j\in \{1,...,n\}\), are the individual scores. Suppose that the null hypothesis \(\varvec{\mu }=\varvec{0}\) is of interest. In this case one can choose the vector \(\varvec{d}\) to be a unique function of \(\varvec{XX}'\), which is the total sums of products matrix of size \(p\times p\). Then, with \(\bar{z}=\frac{1}{n}\varvec{z}'\varvec{1}_n\) and \(s_z^2=\frac{1}{n-1}(\varvec{z}'\varvec{z}-n\bar{z}^2)\),

$$\begin{aligned} t=\frac{\sqrt{n}\bar{z}}{s_z}, \end{aligned}$$
(8)

is t-distributed with \(n-1\) degrees of freedom. Note that the vector of random scores \(\varvec{z}\) is not normally distributed. The result on this type of “robustness” follows from a general result stated in the next lemma.
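A minimal sketch of this score construction (our own; taking \(\varvec{d}\) as the eigenvector of \(\varvec{XX}'\) belonging to the largest eigenvalue, which is one admissible function of \(\varvec{XX}'\)):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 50, 10                             # high-dimensional: p > n
X = rng.standard_normal((p, n))           # simulated under mu = 0

# One admissible choice: d is the eigenvector of the total sums of
# products matrix XX' belonging to its largest eigenvalue.
d = np.linalg.eigh(X @ X.T)[1][:, -1]

z = d @ X                                 # the n individual scores z_j = d'x_j
zbar = z.mean()
s2_z = (z @ z - n * zbar**2) / (n - 1)    # s_z^2 as defined above
t = np.sqrt(n) * zbar / np.sqrt(s2_z)     # t-distributed with n - 1 df under H_0
```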

Lemma 3.1

(Fang and Zhang [2], Theorem 2.5.8). Let \(\varvec{\Gamma z}\) and \(\varvec{z}\) have the same distribution for all orthogonal matrices \(\varvec{\Gamma }\) and suppose that \(P(\varvec{z}=\varvec{0})=0\) which defines a class of spherical distributions, say \(\Phi _+\). The distribution of a statistic \(t(\varvec{z})\), where the distribution of \(\varvec{z}\) belongs to \(\Phi _+\), is the same for each member of \(\Phi _+\), if for all \(\alpha >0\) the statistic \(t(\alpha \varvec{z})\) has the same distribution as \(t(\varvec{z})\).

The lemma is useful if the distribution of the statistic mentioned in Lemma 3.1 can be derived for one member of \(\Phi _+\). In particular, it is enough to derive it for \(\varvec{u}\sim N_n(\varvec{0},\varvec{I})\), since \(\varvec{u}\) belongs to \(\Phi _+\); the distribution is then obtained for all members of the class \(\Phi _+\). These facts, among others, establish why (8) is true for all spherical distributions.

Wilks’ \(\Lambda\) statistic is frequently used in this article and Theorem 1 in Läuter et al. [11] implies the following theorem:

Theorem 3.1

Let \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) be independently Wishart distributed and put

$$\begin{aligned} \lambda =\frac{|\varvec{D}'\varvec{W}\varvec{D}|}{|\varvec{D}'(\varvec{W}+\varvec{V})\varvec{D}|}, \end{aligned}$$

where \(\varvec{D}\): \(s\times p\), \(s\le p\) is a function of \(\varvec{W}+\varvec{V}\) and the rank of \(\varvec{D}'(\varvec{W}+\varvec{V})\varvec{D}\) equals s with probability 1. Then \(\lambda \sim \Lambda (s,m,n)\).

In the theorem there is \(\varvec{\Sigma }\) involved in \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) but the distribution of \(\lambda\) is the same for all \(\varvec{\Sigma }\). It is only crucial that the same \(\varvec{\Sigma }\) is included in the distributions for \(\varvec{W}\) and \(\varvec{V}\). One way of constructing the weights \(\varvec{D}\) is to use the so-called principal component method (see Läuter et al. [11]), where the weights are determined by solving the eigenvalue problem

$$\begin{aligned} (\varvec{W}+\varvec{V})\varvec{D}=\varvec{D}\varvec{\Psi },\qquad \varvec{D}'\varvec{D}=\varvec{I}, \end{aligned}$$
(9)

and \(\varvec{\Psi }\) is a diagonal matrix with the positive eigenvalues.
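The eigenvalue problem (9) is solved by a standard symmetric eigendecomposition. A sketch under the assumption \(\varvec{\Sigma }=\varvec{I}\) (the distributional result in Theorem 3.1 does not depend on \(\varvec{\Sigma }\); the dimensions and the number s of retained directions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n = 6, 4, 12
A = rng.standard_normal((p, m)); V = A @ A.T    # V ~ W_p(I, m)
B = rng.standard_normal((p, n)); W = B @ B.T    # W ~ W_p(I, n)

# Principal component weights: the columns of D solve (W + V) D = D Psi
# with D'D = I; here the s columns with the largest eigenvalues are kept.
psi, D_full = np.linalg.eigh(W + V)             # eigenvalues in ascending order
s = 2
D = D_full[:, -s:]
Psi = np.diag(psi[-s:])
```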

The next corollary of the theorem is what is needed in this article.

Corollary 3.1

Let \(\varvec{V}\sim W_p(\varvec{\Sigma },m)\) and \(\varvec{W}\sim W_p(\varvec{\Sigma },n)\) be independently Wishart distributed and

$$\begin{aligned} \beta =\frac{\varvec{d}'\varvec{V}\varvec{d}}{\varvec{d}'\varvec{W}\varvec{d}+\varvec{d}'\varvec{V}\varvec{d}}, \end{aligned}$$

where \(\varvec{d}\in \mathfrak {R}^p\) is a function of \(\varvec{W}+\varvec{V}\) such that \(\varvec{d}'(\varvec{W}+\varvec{V})\varvec{d}>0\) with probability 1. Then \(\beta\) is beta-distributed with parameters m/2 and n/2.

Proof

By definition of a Wishart distribution there exist \(\varvec{Y}\sim N_{p,m}(\varvec{0},\varvec{\Sigma },\varvec{I}_m)\) and \(\varvec{Z}\sim N_{p,n}(\varvec{0},\varvec{\Sigma },\varvec{I}_n)\) such that \(\varvec{V}=\varvec{YY}'\) and \(\varvec{W}=\varvec{ZZ}'\). Put \(\varvec{X}=(\varvec{Y}:\varvec{Z})\) and then \(\varvec{V}+\varvec{W}=\varvec{XX}'=\varvec{YY}'+\varvec{ZZ}'\). Since \(\varvec{d}=\varvec{g}(\varvec{V}+\varvec{W})\) for some function \(\varvec{g}(\bullet )\), \(\varvec{d}'\varvec{X}={\varvec{g}}'(\varvec{XX}')\varvec{X}\). Moreover, for any orthogonal matrix \(\varvec{\Gamma }\), \({\varvec{g}}(\varvec{XX}')={\varvec{g}}(\varvec{X}\varvec{\Gamma }\varvec{\Gamma }'\varvec{X}')\) and since \(\varvec{X}\varvec{\Gamma }\) has the same distribution as \(\varvec{X}\), the score \(\varvec{d}'\varvec{X}={\varvec{g}}'(\varvec{XX}')\varvec{X}\) is spherically distributed. Note that \(\beta\) can be written

$$\begin{aligned} \beta =\frac{\varvec{d}'\varvec{X}(\varvec{I}:\varvec{0})'(\varvec{I}:\varvec{0})\varvec{X}'\varvec{d}}{\varvec{d}'\varvec{XX}'\varvec{d}} \end{aligned}$$

and it follows from Lemma 3.1, since the statement also holds when \(\varvec{d}'\varvec{X}\) is replaced by a normally distributed variable with dispersion matrix equal to \({\varvec{I}}\), that \(\beta\) is indeed beta-distributed with parameters m/2 and n/2.\(\square\)

Corresponding to (9) one alternative way to determine the weight \(\varvec{d}\) in Corollary 3.1 is by solving the eigenvalue/eigenvector problem

$$\begin{aligned} (\varvec{W}+\varvec{V})\varvec{d}=\psi \varvec{d},\qquad \varvec{d}'\varvec{d}=1,\quad \psi >0. \end{aligned}$$
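Combining Corollary 3.1 with this choice of \(\varvec{d}\) gives the following sketch (our own illustration; \(\varvec{\Sigma }=\varvec{I}\) and \(\varvec{d}\) is the eigenvector for the largest eigenvalue, while p exceeds the total number of degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 30, 3, 8                         # p may exceed m + n
Yv = rng.standard_normal((p, m)); V = Yv @ Yv.T   # V ~ W_p(Sigma, m), Sigma = I
Zw = rng.standard_normal((p, n)); W = Zw @ Zw.T   # W ~ W_p(Sigma, n)

# d solves (W + V) d = psi d with d'd = 1; take the largest eigenvalue psi.
psi, E = np.linalg.eigh(W + V)
d = E[:, -1]

beta = (d @ V @ d) / (d @ W @ d + d @ V @ d)      # Beta(m/2, n/2) by Corollary 3.1
```

Note that the Wishart matrices here are singular since \(p>m\) and \(p>n\), yet the beta distribution of the statistic is unaffected.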

4 Main Results

In Sect. 2 the three hypotheses in classical profile analysis were presented, i.e. when \(N>p+q\). Now the focus is on the high-dimensional setting where \(p>N-q\). Consider, for example, the test statistic given in (3). When p is large the problem is that \(\varvec{S}\) is singular and the determinant in (3) equals 0.

Läuter [8, 9] and Läuter et al. [10, 11] directly applied a vector \(\varvec{d}\) to the observation matrix \(\varvec{X}\). In this article, since there is a bilinear testing situation, it is proposed to apply \(\varvec{d}\) to \(\varvec{FX}\) where \(\varvec{F}\) is given in (2). Hence, \(\varvec{d}\) is of size \(p-1\).

Let \(\varvec{Y}=\varvec{FX}\), then the following test statistic is proposed to test the hypothesis of parallel profiles which is based on the formulation in (1):

Proposition 4.1

Let the parallelism hypothesis be defined via (1) and let \(\varvec{Y}=\varvec{FX}\). A test statistic for testing the hypothesis is given by

$$\begin{aligned} \lambda _{Ph}=\frac{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}}{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}}. \end{aligned}$$
(10)

It can be noted that \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C}))\) is independent of \(\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{G}))\), and if \(\varvec{d}\) is a function of \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'=\varvec{Y}(\varvec{I}-\varvec{P}_{C'G^\circ })\varvec{Y}'\), Corollary 3.1 establishes the next theorem.

Theorem 4.1

If \(\varvec{d}\) is a nonzero function (with probability 1) of \(\varvec{Y}(\varvec{I}-\varvec{P}_{C'G^{o}})\varvec{Y}'\) the test statistic for testing for parallel profiles in high dimensions, given in (10), follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C}))/2\) and \(r(\varvec{G})/2\).
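The complete high-dimensional parallelism test can be sketched as follows (our own illustration; the dimensions, the data generated under \(H_1\), and the choice of \(\varvec{d}\) as the top eigenvector are assumptions, and \(\varvec{G}^\circ =\varvec{1}_q\) since the columns of \(\varvec{G}\) sum to zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, q, n_i = 40, 3, 8                      # p = 40 exceeds N = 24: high-dimensional
N = q * n_i
C = np.kron(np.eye(q), np.ones(n_i))
F = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
G = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T

def proj(M):
    return M @ np.linalg.solve(M.T @ M, M.T)

X = rng.standard_normal((p, N))           # simulated under parallel profiles
Y = F @ X

# d: eigenvector of Y (I - P_{C'G°}) Y' for the largest eigenvalue, an
# admissible function of that matrix by Theorem 4.1.
Gc = np.ones((q, 1))
T = Y @ (np.eye(N) - proj(C.T @ Gc)) @ Y.T
d = np.linalg.eigh(T)[1][:, -1]

num = d @ Y @ (np.eye(N) - proj(C.T)) @ Y.T @ d
den2 = d @ Y @ proj(C.T @ np.linalg.solve(C @ C.T, G)) @ Y.T @ d
lam_Ph = num / (num + den2)
# Theorem 4.1: lam_Ph ~ Beta((N - q)/2, (q - 1)/2); small values reject H_1.
p_value = stats.beta.cdf(lam_Ph, (N - q) / 2, (q - 1) / 2)
```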

In (5) the likelihood ratio was presented for the level hypothesis defined in (4). As it can be seen from the expression in (5) problems occur in high dimensions because \(\varvec{S}^{-1}\) does not exist when \(p>N-r(\varvec{C})+1\). Therefore, \(({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}\) and \(({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^{\circ }}'\varvec{S}^{-1}\) have to be studied. It can be noted that if \(p<N-r(\varvec{C})+1\),

$$\begin{aligned} ({(\varvec{F}')^{\circ }}'\varvec{S}^{-1}(\varvec{F}')^\circ )^{-1}\sim W_1(({(\varvec{F}')^{\circ }}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-p+1). \end{aligned}$$
(11)

Thus, if \(p>N-r(\varvec{C})+1\), negative degrees of freedom would appear, which of course is impossible. Hence, in order to test the level hypothesis in high dimensions the statistic in (5) has to be modified significantly. In this article the idea is to modify the statistic in (5) as little as possible, but enough for high-dimensional statistical analyses to be carried out.

A couple of ideas will bring us to a proposition where a test statistic and its distribution are given. The first idea is to prove (11) in detail and see whether anything can be modified so that reasonable expressions appear when p is large. To simplify notation, \((\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\) will be discussed, where, for example, \(\varvec{A}=(\varvec{F}')^\circ\), and the inverse is supposed to exist. The following chain of equalities shows some interesting structure:

$$\begin{aligned} (\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}(\varvec{S}-\varvec{S}\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{S}\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{S})\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\varvec{P}\varvec{X}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}, \end{aligned}$$
(12)

where

$$\begin{aligned} \varvec{P}=\varvec{I}-\varvec{P_{C'}}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{X}(\varvec{I}-\varvec{P_{C'}}). \end{aligned}$$
(13)

Since \(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\) is independent of \({\varvec{A}^\circ }'\varvec{X}\) it is also independent of \(\varvec{P}\). Moreover, \(\varvec{P}\) is idempotent and \(r(\varvec{P})=N-r(\varvec{C})-p+r(\varvec{A})\). Thus, for any choice of \(\varvec{A}^\circ\), conditionally on \({\varvec{A}^\circ }'\varvec{X}\), the expression in (12) is Wishart distributed but it also appears that this distribution is independent of \({\varvec{A}^\circ }'\varvec{X}\) and therefore (11) is established for the specific choice \(\varvec{A}=(\varvec{F}')^\circ\).

The critical point in the high-dimensional setting is that \(r(\varvec{P})\) will approach 0, even if the inverse in (13) is replaced by a g-inverse. Therefore, it is proposed that \(\varvec{P}\) is modified in such a way that instead of this projection (expressed in \(\varvec{F}'\) instead of \(\varvec{A}^\circ\))

$$\begin{aligned} \widetilde{\varvec{P}}=\varvec{I}-\varvec{P_{C'}}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d}(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1} \varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}}) \end{aligned}$$
(14)

will be used, where \(\varvec{d}\) is a function of \(\varvec{F}\varvec{X}\). Summarizing these calculations gives a quantity which will be used as the numerator in a test statistic for testing the level hypothesis:

$$\begin{aligned} U=({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}\varvec{X}\widetilde{\varvec{P}}\varvec{X}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}. \end{aligned}$$
(15)

How to choose \(\varvec{d}\) in (14) is not obvious. On the one hand, the distribution of U does not depend on \(\varvec{d}\); on the other hand, when explicit expressions are calculated, \(\varvec{d}\) becomes a function where the data have replaced \(\varvec{X}\). From (5) it also follows that \((\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\) has to be considered. Similar calculations to those in (12) yield

$$\begin{aligned}&(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\varvec{X}= (\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A}(\varvec{A}'\varvec{S}^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}^{-1}\varvec{X} \nonumber \\&=(\varvec{A}'\varvec{\Sigma }^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{\Sigma }^{-1}\varvec{X}\nonumber \\&\quad \times \,(\varvec{I}-(\varvec{I}-\varvec{P}_{C'})\varvec{X}'\varvec{A}^\circ ({\varvec{A}^\circ }'\varvec{X} (\varvec{I}-\varvec{P}_{C'})\varvec{X}'\varvec{A}^\circ )^{-1}{\varvec{A}^\circ }'\varvec{X}).\qquad \end{aligned}$$
(16)

Replacing \(\varvec{A}\) by \((\varvec{F}')^\circ\) and then using the same \(\varvec{d}\) vector as in (14) leads to

$$\begin{aligned} \varvec{h}=\varvec{P}_1'\varvec{X}'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}, \end{aligned}$$
(17)

where

$$\begin{aligned} \varvec{P}_1=\varvec{I}-(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d}(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1} \varvec{d}'\varvec{F}\varvec{X} \end{aligned}$$
(18)

is an idempotent matrix. Moreover, let

$$\begin{aligned} \widetilde{\varvec{Q}}&= \varvec{G}'(\varvec{CC}')^{-1}\varvec{G}+\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\varvec{F}'\varvec{d}\nonumber \\&\quad \times \,(\varvec{d}'\varvec{F}\varvec{X}(\varvec{I}-\varvec{P_{C'}})\varvec{X}'\varvec{F}'\varvec{d})^{-1}\varvec{d}'\varvec{F}\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G} \end{aligned}$$
(19)

and put

$$\begin{aligned} V=\varvec{h}'\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\widetilde{\varvec{Q}}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{h}. \end{aligned}$$
(20)

Corresponding to \(\lambda _L\) in (5) the next quantity is proposed to test the level hypothesis:

$$\begin{aligned} \lambda _{Lh,prel}=\frac{U}{U+V}, \end{aligned}$$
(21)

where U and V are defined in (15) and (20), respectively. However, it is not possible to use \(\lambda _{Lh,prel}\) in (21) because both U and V include the unknown dispersion matrix \({\varvec{\Sigma }}\) so a few more results have to be established. It follows immediately that

$$\begin{aligned} U\sim & {} W_1(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{-1}, N-r(\varvec{C})-1), \end{aligned}$$
(22)
$$\begin{aligned} V\sim & {} W_1(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{-1}, r(\varvec{G})). \end{aligned}$$
(23)

If multiplying the numerator and denominator in (21) by \(({(\varvec{F}')^\circ }'\varvec{\Sigma ^{-1}}(\varvec{F}')^\circ )^{1/2}\), it follows, since the distribution of

$$\begin{aligned} ({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1/2}{(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}\varvec{X} \end{aligned}$$

is independent of \(\varvec{\Sigma }\), that the distribution of \(\lambda _{Lh,prel}\) is independent of \(\varvec{\Sigma }\). Thus, in order to have a test statistic which is functionally independent of the dispersion matrix, \(\varvec{\Sigma }=\varvec{I}\) is chosen. In the next proposition the test statistic for the level hypothesis is stated.

Proposition 4.2

Let the level hypothesis testing problem be defined via (4). A test statistic for testing the level hypothesis is given by

$$\begin{aligned} \lambda _{Lh}=\frac{\widetilde{U}}{\widetilde{U}+\widetilde{V}}, \end{aligned}$$

where

$$\begin{aligned} \widetilde{U}&=({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1}{(\varvec{F}')^\circ }'\varvec{X}\widetilde{\varvec{P}}\varvec{X}'(\varvec{F}')^\circ ({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1},\\ \widetilde{V}&=\widetilde{\varvec{h}}'\varvec{X}\varvec{C}'(\varvec{CC}')^{-1}\varvec{G}\widetilde{\varvec{Q}}^{-1}\varvec{G}'(\varvec{CC}')^{-1}\varvec{C}\varvec{X}'\widetilde{\varvec{h}},\\ \widetilde{\varvec{h}}&=\varvec{P}_1'\varvec{X}'{(\varvec{F}')^\circ }({(\varvec{F}')^\circ }'(\varvec{F}')^\circ )^{-1} \end{aligned}$$

with \(\widetilde{\varvec{P}}\), \(\varvec{P}_1\) and \(\widetilde{\varvec{Q}}\) defined in (14), (18) and (19), respectively.

How to choose \(\varvec{d}\) in (14) and (18) is not clear. On the one hand, the distribution of \(\lambda _{Lh}\) does not depend on the choice of \(\varvec{d}\), whereas on the other hand, when calculating an explicit value of \(\lambda _{Lh}\), the choice of \(\varvec{d}\) does have an impact. This type of phenomenon has been discussed very rarely; we are only aware of it having been discussed when a singular Gauss–Markov model has been considered.

Based on the results in (22) and (23) the next theorem can be formulated.

Theorem 4.2

Let \(\varvec{d}\) in (14) be a nonzero function (with probability 1) of \(\varvec{F}\varvec{X}\). The test statistic in Proposition 4.2 for testing the level hypothesis in (4) follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C})-1)/2\) and \(r(\varvec{G})/2\).
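A sketch of the reasoning behind this result (assuming, as in the classical theory, that U and V are independent): a one-dimensional Wishart distribution is a scaled chi-square distribution, so with \(\sigma ^2=({(\varvec{F}')^\circ }'\varvec{\Sigma }^{-1}(\varvec{F}')^\circ )^{-1}\), (22) and (23) give

```latex
U \sim \sigma^2\chi^2_{N-r(\varvec{C})-1}, \qquad
V \sim \sigma^2\chi^2_{r(\varvec{G})},
\qquad\text{and hence}\qquad
\frac{U}{U+V} \sim \beta\left(\tfrac{N-r(\varvec{C})-1}{2},\,
\tfrac{r(\varvec{G})}{2}\right),
```

since the scale \(\sigma ^2\) cancels in the ratio of independent chi-square variables; the same argument applies after the replacement \(\varvec{\Sigma }=\varvec{I}\), i.e. to \(\lambda _{Lh}\).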

The third hypothesis of flatness was stated in (6). Since the approach for creating a test statistic is the same as for the parallelism hypothesis, the results are stated without any proofs.

Proposition 4.3

Let the flatness hypothesis be defined via (6) and put \(\varvec{Y}=\varvec{F}\varvec{X}\). A test statistic for testing the hypothesis is given by

$$\begin{aligned} \lambda _{Fh}=\frac{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}}{\varvec{d}'\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\varvec{d}+\varvec{d}'\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}'\varvec{d}}, \end{aligned}$$
(24)

where \(\varvec{d}\in \mathfrak {R}^{p-1}\), with probability 1, is a nonzero function in \(\varvec{Y}\varvec{Y}'\).
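To make (24) concrete, the following is a minimal numerical sketch, in which all dimensions, design matrices and the particular choice of \(\varvec{d}\) are hypothetical and chosen only for illustration. It computes \(\lambda _{Fh}\) for simulated data and verifies the decomposition of \(\varvec{Y}\varvec{Y}'\) displayed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: p variables, N subjects in k equal groups.
p, N, k = 4, 20, 2
C = np.kron(np.eye(k), np.ones((1, N // k)))   # k x N between-individual design
F = np.eye(p - 1, p) - np.eye(p - 1, p, 1)     # (p-1) x p difference matrix
G = np.array([[1.0], [-1.0]])                  # an assumed contrast matrix

def proj(A):
    """Orthogonal projector onto the column space of A (A of full column rank)."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

def orth_complement(A):
    """A°: columns spanning the orthogonal complement of the column space of A."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, np.sum(s > 1e-10):]

X = rng.standard_normal((p, N))                # data simulated under a flat profile
Y = F @ X

P_Cp = proj(C.T)                               # P_{C'}
P_CG = proj(C.T @ np.linalg.inv(C @ C.T) @ G)  # P_{C'(CC')^{-1}G}
P_CGo = proj(C.T @ orth_complement(G))         # P_{C'G°}

d = Y @ Y.T @ np.ones(p - 1)                   # one possible d: a function of YY'
z = Y.T @ d                                    # so d'Y M Y'd = z'Mz below

num = z @ (np.eye(N) - P_Cp) @ z + z @ P_CG @ z
denom = num + z @ P_CGo @ z
lam_Fh = num / denom                           # the statistic in (24)
```

Note that \(P_{C'}=P_{C'(CC')^{-1}G}+P_{C'G^\circ }\), so the denominator equals \(\varvec{d}'\varvec{Y}\varvec{Y}'\varvec{d}\), which is exactly the decomposition stated after the proposition.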

The motivation for letting \(\varvec{d}\) be a function of \(\varvec{Y}\varvec{Y}'\) follows from the fact that

$$\begin{aligned} \varvec{Y}\varvec{Y}'=\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'+\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}'. \end{aligned}$$

Moreover,

$$\begin{aligned}&\varvec{Y}(\varvec{I}-\varvec{P}_{C'})\varvec{Y}'+\varvec{Y}\varvec{P}_{C'(CC')^{-1}G}\varvec{Y}'\sim W_{p-1}(\varvec{F\Sigma F}',N-r(\varvec{C}'\varvec{G}^\circ )),\\&\varvec{Y}\varvec{P}_{C'G^\circ }\varvec{Y}' \sim W_{p-1}(\varvec{F\Sigma F}',r(\varvec{C}'\varvec{G}^\circ )). \end{aligned}$$

These results imply that the next theorem can be established.

Theorem 4.3

The test statistic for testing for flatness in high dimensions, given in Proposition 4.3, follows a \(\beta\)-distribution with parameters \((N-r(\varvec{C}'\varvec{G}^\circ ))/2\) and \(r(\varvec{C}'\varvec{G}^\circ )/2\).
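Since the theorem is exact rather than asymptotic, it lends itself to a direct Monte Carlo check. The sketch below (dimensions, design matrices and the choice of \(\varvec{d}\) are again hypothetical) simulates data under the flatness hypothesis and compares the empirical mean of \(\lambda _{Fh}\) with the mean of the stated \(\beta\)-distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, k, reps = 4, 20, 2, 2000                 # hypothetical sizes

C = np.kron(np.eye(k), np.ones((1, N // k)))   # k x N between-individual design
F = np.eye(p - 1, p) - np.eye(p - 1, p, 1)     # (p-1) x p difference matrix
G = np.array([[1.0], [-1.0]])                  # assumed contrast matrix

def proj(A):
    """Orthogonal projector onto the column space of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

Go = np.linalg.svd(G, full_matrices=True)[0][:, 1:]   # G° (this G has rank 1)
P_CGo = proj(C.T @ Go)                         # P_{C'G°}, here of rank r(C'G°) = 1
I_N = np.eye(N)

L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)                # an arbitrary dispersion matrix

vals = np.empty(reps)
for i in range(reps):
    # Flat common profile: E[FX] = 0, arbitrary Sigma.
    X = np.linalg.cholesky(Sigma) @ rng.standard_normal((p, N))
    Y = F @ X
    d = Y @ Y.T @ np.ones(p - 1)               # d as a function of YY'
    z = Y.T @ d
    # Equals lambda_Fh in (24), using P_{C'} = P_{C'(CC')^{-1}G} + P_{C'G°}.
    vals[i] = z @ (I_N - P_CGo) @ z / (z @ z)

a, b = (N - 1) / 2, 1 / 2                      # beta parameters when r(C'G°) = 1
```

With these dimensions the theorem predicts a \(\beta ((N-1)/2, 1/2)\) distribution, whose mean \(a/(a+b)=0.95\) should be reproduced by the simulation up to Monte Carlo error.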

5 Concluding Remarks

In this article the three well-known test statistics in profile analysis have been modified so that high-dimensional data can be handled in a non-asymptotic approach. The tests for parallelism and flatness were derived following ideas given by Läuter [8, 9] and Läuter et al. [10, 11], which originally were developed for handling MANOVA problems. Concerning the level test, a completely new approach is proposed: here we modify the degrees of freedom and an exact test is derived.

The vector \(\varvec{d}\), which is utilized in our approach when testing for parallelism and flatness and which is a function of certain sums of squares, has only briefly been considered in this article. Instead we refer to Läuter et al. [10], Section 4, where several different alternatives for determining \(\varvec{d}\) are proposed. Furthermore, a generalization of the approach presented in this article would be to apply a matrix \(\varvec{D}\), i.e. to study several linear scores, instead of a vector \(\varvec{d}\), which only gives one score.

Another important problem (observed by one of the reviewers) is that the choice of \(\varvec{F}\) can have an effect on the choice of \(\varvec{d}\), so that the test statistic in fact depends on the choice of \(\varvec{F}\). It is important to continue this work and establish restrictions on the choice of \(\varvec{d}\) so that the vector only depends on the space generated by the columns of \(\varvec{F}\), i.e. instead of using \(\varvec{F}\), using the projection \(\varvec{F}(\varvec{F}'\varvec{F})^-\varvec{F}'\).
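The suggested remedy can be illustrated numerically: \(\varvec{F}(\varvec{F}'\varvec{F})^-\varvec{F}'\) depends only on the column space of \(\varvec{F}\), so any two matrices spanning the same space yield the same projector. A minimal sketch (the matrices and their sizes are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

F = rng.standard_normal((5, 3))     # any full-column-rank matrix (illustrative sizes)
T = rng.standard_normal((3, 3))     # nonsingular, so F @ T spans the same column space

def space_projector(F):
    """F (F'F)^- F': orthogonal projector onto the column space of F."""
    return F @ np.linalg.pinv(F.T @ F) @ F.T

# The projector is invariant under F -> F T for nonsingular T,
# so a statistic built from it depends only on the space generated by F.
P1 = space_projector(F)
P2 = space_projector(F @ T)
```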