Auto-regressive independent process analysis without combinatorial efforts


Abstract

We treat the problem of searching for hidden multi-dimensional independent auto-regressive processes (auto-regressive independent process analysis, AR-IPA). Independent subspace analysis (ISA) can be used to solve the AR-IPA task. The so-called separation theorem simplifies the ISA task considerably: it enables one to reduce the task to a one-dimensional blind source separation task followed by the grouping of the coordinates. However, the grouping of the coordinates still involves two types of combinatorial problems: (a) the number of the independent subspaces and their dimensions must be determined, and (b) the permutation of the estimated coordinates must be found. Here, we generalize the separation theorem. We also show a non-combinatorial procedure which, under certain conditions, can treat these two combinatorial problems. Numerical simulations have been conducted. We investigate problems that fulfill the sufficient conditions of the theory and also others that do not. The success of the numerical simulations indicates that further generalizations of the separation theorem may be feasible.



  1. The possibility of such a decomposition principle was suspected by Cardoso [3], who based his conjecture on numerical experiments.

  2.

  3. W^m_ICA denotes the component of the separation matrix W_ICA that corresponds to the mth sub-process.


References

  1. Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. Wiley, New York

  2. Cichocki A, Amari S (2002) Adaptive blind signal and image processing. Wiley, New York

  3. Cardoso J (1998) Multidimensional independent component analysis. In: International conference on acoustics, speech, and signal processing (ICASSP ’98), vol 4, pp 1941–1944

  4. Akaho S, Kiuchi Y, Umeyama S (1999) MICA: multimodal independent component analysis. In: International joint conference on neural networks (IJCNN ’99), vol 2, pp 927–932

  5. Vollgraf R, Obermayer K (2001) Multi-dimensional ICA to separate correlated sources. In: Neural information processing systems (NIPS 2001), vol 14. MIT Press, Cambridge, pp 993–1000

  6. Bach FR, Jordan MI (2003) Beyond independent components: trees and clusters. J Mach Learn Res 4:1205–1233

  7. Póczos B, Lőrincz A (2005) Independent subspace analysis using k-nearest neighborhood distances. Artif Neural Netw Formal Models Appl 3697:163–168

  8. Póczos B, Lőrincz A (2005) Independent subspace analysis using geodesic spanning trees. In: International conference on machine learning (ICML 2005), vol 119. ACM Press, New York, pp 673–680

  9. Theis FJ (2005) Blind signal separation into groups of dependent signals using joint block diagonalization. In: IEEE international symposium on circuits and systems (ISCAS 2005), vol 6, pp 5878–5881

  10. Van Hulle MM (2005) Edgeworth approximation of multivariate differential entropy. Neural Comput 17:1903–1910

  11. Póczos B, Takács B, Lőrincz A (2005) Independent subspace analysis on innovations. In: European conference on machine learning (ECML 2005), vol 3720, LNAI. Springer, Berlin, pp 698–706

  12. Hyvärinen A (1998) Independent component analysis for time-dependent stochastic processes. In: International conference on artificial neural networks (ICANN ’98). Springer, Berlin, pp 541–546

  13. Szabó Z, Póczos B, Lőrincz A (2006) Cross-entropy optimization for independent process analysis. In: Independent component analysis and blind signal separation (ICA 2006), vol 3889, LNCS. Springer, Berlin, pp 909–916

  14. Cheung Y, Xu L (2003) Dual multivariate auto-regressive modeling in state space for temporal signal separation. IEEE Trans Syst Man Cybern B 33:386–398

  15. Theis FJ (2004) Uniqueness of complex and multidimensional independent component analysis. Signal Process 84:951–956

  16. Rubinstein RY, Kroese DP (2004) The cross-entropy method. Springer, Berlin

  17. Hardy GH, Ramanujan S (1918) Asymptotic formulae in combinatory analysis. Proc Lond Math Soc 17:75–115

  18. Uspensky JV (1920) Asymptotic formulae for numerical functions which occur in the theory of partitions. Bull Russian Acad Sci 14:199–218

  19. Costa JA, Hero AO (2004) Manifold learning using k-nearest neighbor graphs. In: International conference on acoustics, speech, and signal processing (ICASSP 2004), vol 4, pp 988–991

  20. Gray AG, Moore AW (2000) ‘N-Body’ problems in statistical learning. In: Proceedings of NIPS, pp 521–527

  21. Cover TM, Thomas JA (1991) Elements of information theory. Wiley, New York

  22. Takano S (1995) The inequalities of Fisher information and entropy power for dependent variables. In: Proceedings of the 7th Japan–Russia symposium on probability theory and mathematical statistics

  23. Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London

  24. Frahm G (2004) Generalized elliptical distributions: theory and applications. PhD thesis, University of Köln

  25. Gupta AK, Song D (1997) Lp-norm spherical distribution. J Stat Plan Inference 60

  26. Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20:130–141

  27. Amari S, Cichocki A, Yang HH (1996) A new learning algorithm for blind signal separation. Adv Neural Inf Process Syst 8:757–763

  28. Bach FR, Jordan MI (2002) Kernel independent component analysis. J Mach Learn Res 3:1–48

  29. Theis FJ (2005) Multidimensional independent component analysis using characteristic functions. In: European signal processing conference (EUSIPCO 2005)

  30. Neumaier A, Schneider T (2001) Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans Math Softw 27:27–57

  31. Schneider T, Neumaier A (2001) Algorithm 808: ARfit - a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans Math Softw 27:58–65

  32. Hyvärinen A, Oja E (1997) A fast fixed-point algorithm for independent component analysis. Neural Comput 9:1483–1492

  33. Hyvärinen A, Hoyer PO (2000) Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 12:1705–1720

Acknowledgments

This research has been supported by the EC NEST ‘Perceptual Consciousness: Explication and Testing’ grant under contract 043261. Opinions and errors in this manuscript are the authors’ responsibility; they do not necessarily reflect those of the EC or other project members.

Author information



Corresponding author

Correspondence to András Lőrincz.



The theorems that we present here concern the ISA task obtained after the reduction of the AR-IPA task; thus, here s^m = e^m (m = 1, …, M). Although the ISA task also concerns sources, these sources are i.i.d. in time, so we keep the notation s^m. In the present work, the differential entropy H is defined with logarithm of base e, i.e., it is measured in nats.

Appendix 1: The ISA separation theorem (Proof)

The main idea of our ISA separation theorem is that the ISA task may be accomplished in two steps under certain conditions. In the first step, ICA is executed. The second step is a search for the optimal permutation of the ICA components.
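The two-step procedure can be sketched numerically. The snippet below is our own illustration, not the paper's algorithm: it assumes the ICA step has already separated the coordinates up to an unknown permutation, and it performs the grouping step with a simple energy-correlation heuristic as a proxy for coordinate dependence; all names in it are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build two 2-D "subspace" sources: within each pair the coordinates are
# dependent (they share a random radius), across pairs they are independent.
n = 20000
def pair(rng, n):
    r = rng.uniform(0.5, 1.5, n)             # shared radius -> within-pair dependence
    phi = rng.uniform(0, 2 * np.pi, n)
    return np.vstack([r * np.cos(phi), r * np.sin(phi)])

s = np.vstack([pair(rng, n), pair(rng, n)])  # 4 x n source matrix

# Pretend ICA returned the coordinates in a scrambled order.
perm = rng.permutation(4)
y = s[perm]

# Grouping heuristic: coordinates of the same subspace have correlated
# energies, while independent coordinates do not.
e = y ** 2
c = np.abs(np.corrcoef(e))
np.fill_diagonal(c, 0)

partner = c.argmax(axis=1)                   # most dependent coordinate for each row
groups = {tuple(sorted((i, partner[i]))) for i in range(4)}
print(sorted(groups))
```

With dependent coordinates inside each subspace, the energy correlations single out the correct partner for every coordinate, so the subspace partition is recovered without scanning all permutations.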

If the EPI (see Eq. 17) is satisfied on \(S^L\), then a further inequality holds:

Lemma 1

Suppose that continuous stochastic variables \(u_1,\ldots,u_L \in{\mathbb{R}}\) satisfy the w-EPI condition (see Eq. 19). Then, they also satisfy

$$ H\left(\sum_{i=1}^L w_iu_i\right)\ge\sum_{i=1}^Lw_i^2H\left(u_i\right), \forall{{\mathbf{w}}}\in S^L. $$

Note 4

The w-EPI condition holds, for example, for independent variables \(u_i\), because independence is not affected by multiplication with a constant.


Proof

Assume that \({{\mathbf{w}}} \in S^L\). Taking the logarithm of condition 19 and using the monotonicity of the ln function, we see that the first inequality in the following chain is valid:

$$ \begin{aligned} 2H\left(\sum_{i=1}^Lw_iu_i\right)\ge& \ln\left(\sum_{i=1}^L {\rm e}^{2H(w_iu_i)}\right)=\ln\left(\sum_{i=1}^L{\rm e}^{2H(u_i)}\cdot w_i^2\right)\\ \ge&\sum_{i=1}^Lw_i^2\cdot\ln\left({\rm e}^{2H(u_i)}\right)= \sum_{i=1}^Lw_i^2\cdot2H(u_i). \end{aligned} $$


  1. We used the relation [21]:

     $$ H(w_iu_i)=H(u_i)+\ln\left(\left|w_i\right|\right) $$

     for the entropy of the transformed variable. Hence

     $$ {\rm e}^{2H(w_iu_i)}={\rm e}^{2H(u_i)+2\ln\left(\left|w_i\right|\right)}= {\rm e}^{2H(u_i)}\cdot {\rm e}^{2\ln\left(\left|w_i\right|\right)}={\rm e}^{2H(u_i)}\cdot w_i^2. $$

  2. In the second inequality, we utilized the concavity of the ln function together with \(\sum_{i=1}^L w_i^2=1\), which holds because \({{\mathbf{w}}}\in S^L\).
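The scaling identity in step 1 admits a closed-form sanity check. The check below is ours; it uses the well-known differential entropy of a Gaussian, H = ½ ln(2πeσ²) (in nats, consistent with the base-e convention above):

```python
import math

def gauss_entropy(var):
    """Differential entropy (nats) of a zero-mean Gaussian with variance `var`."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

sigma2, w = 1.7, -0.6                   # arbitrary variance and (negative) weight
H_u  = gauss_entropy(sigma2)
H_wu = gauss_entropy(w * w * sigma2)    # w*u is Gaussian with variance w^2 * sigma^2

# H(w u) = H(u) + ln|w|
print(abs(H_wu - (H_u + math.log(abs(w)))) < 1e-12)   # True

# exponentiated form used in the proof: e^{2H(wu)} = e^{2H(u)} * w^2
print(abs(math.exp(2 * H_wu) - math.exp(2 * H_u) * w * w) < 1e-12)   # True
```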

Now we shall use Lemma 1 to proceed. The separation theorem will be a corollary of the following claim:

Proposition 3

Let \({{\mathbf{y}}} = [{{\mathbf{y}}}^1,\ldots,{{\mathbf{y}}}^M] = {{\mathbf{y}}}({{\mathbf{W}}}) = {{\mathbf{W}}}{{\mathbf{s}}}\), where \({{\mathbf{W}}}\in {{\mathcal{O}}}^D\) and \({{\mathbf{y}}}^m\) is the estimation of the mth component of the ISA task. Let \(y^m_i\) be the ith coordinate of the mth component. Similarly, let \(s^m_i\) stand for the ith coordinate of the mth source. Let us assume that the \({{\mathbf{s}}}^m\) sources satisfy condition 11. Then

$$ \sum_{m=1}^M\sum_{i=1}^dH\left(y^m_i\right)\ge \sum_{m=1}^M\sum_{i=1}^dH\left(s^m_i\right). $$


Proof

Let us denote the (i,j)th element of matrix \({{\mathbf{W}}}\) by \(W_{i,j}\). The coordinates of \({{\mathbf{y}}}\) and \({{\mathbf{s}}}\) will be denoted by \(y_i\) and \(s_i\), respectively. Let \({\mathcal{G}}^1, \ldots, {\mathcal{G}}^M\) denote the sets of indices belonging to the 1,…,M subspaces, respectively, that is, \({\mathcal{G}}^1:=\{1,\ldots,d\},\ldots, {\mathcal{G}}^M:=\{D-d+1,\ldots,D\}\). Now, writing the ith row of the matrix product \({{\mathbf{y}}}={{\mathbf{W}}}{{\mathbf{s}}}\), we have

$$ y_i=\sum_{j\in {\mathcal{G}}^1} W_{i,j}s_j+\cdots+\sum_{j\in {\mathcal{G}}^M} W_{i,j}s_j $$

and thus,

$$ H\left(y_i\right)=H\left(\sum_{j\in {\mathcal{G}}^1} W_{i,j}s_j+\cdots+\sum_{j\in {\mathcal{G}}^M} W_{i,j}s_j\right) $$
$$=H\left\{\sum_{m=1}^M\left[\left (\sum_{l\in{\mathcal{G}}^m}W_{i,l}^2\right)^{{\frac{1} {2}}}{\frac{\sum_{j\in{\mathcal{G}}^m}W_{i,j}s_j} {\left(\sum_{l\in{\mathcal{G}}^m}W_{i,l}^2\right)^{{\frac{1} {2}}}}}\right] \right\} $$
$$\ge\sum_{m=1}^M\left[\left(\sum_{l\in{\mathcal{G}}^m} W_{i,l}^2\right)H\left({\frac{\sum_{j\in{\mathcal{G}}^m}W_{i,j}s_j} {\left(\sum_{l\in{\mathcal{G}}^m}W_{i,l}^2\right)^{{\frac{1} {2}}}}}\right)\right] $$
$$=\sum_{m=1}^M\left[\left(\sum_{l\in{\mathcal{G}}^m} W_{i,l}^2\right) H\left(\sum_{j\in{\mathcal{G}}^m}{\frac{W_{i,j}} {\left(\sum_{l\in{\mathcal{G}}^m}W_{i,l}^2\right)^{{\frac{1} {2}}}}}s_j\right)\right] $$
$$\ge\sum_{m=1}^M\left[\left(\sum_{l\in{\mathcal{G}}^m} W_{i,l}^2\right) \sum_{j\in{\mathcal{G}}^m}\left({\frac{W_{i,j}} {\left(\sum_{l\in{\mathcal{G}}^m}W_{i,l}^2\right)^{{\frac{1} {2}}}}}\right)^2H\left(s_j\right)\right] $$
$$=\sum_{j\in{\mathcal{G}}^1}W_{i,j}^2H \left(s_j\right)+\cdots+\sum_{j\in{\mathcal{G}}^M}W_{i,j}^2H\left(s_j\right) $$

The above steps can be justified as follows:

  1. Equation 35: Eq. 34 was inserted into the argument of H.

  2. Equation 36: New terms were added in preparation for Lemma 1.

  3. Equation 37: The sources s^m are independent of each other, this independence is preserved upon mixing within the subspaces, and Lemma 1 could be applied because W is an orthogonal matrix.

  4. Equation 38: The numerators were moved inside the ∑_j sums.

  5. Equation 39: The variables s^m satisfy condition 11 according to our assumptions.

  6. Equation 40: We simplified the expression after squaring.

Summing this inequality over i, exchanging the order of the sums, and making use of the orthogonality of matrix W, we have

$$ \sum_{i=1}^DH(y_i)\ge\sum_{i=1}^D\sum_{m=1}^M \left[\sum_{j\in{\mathcal{G}}^m}W_{i,j}^2H\left(s_j\right)\right] $$
$$=\sum_{m=1}^M\left[\sum_{j\in{\mathcal{G}}^m} \left(\sum_{i=1}^DW^2_{i,j}\right)H\left(s_j\right)\right] $$
$$=\sum_{j=1}^DH(s_j). $$
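The final step relies only on the fact that the columns of an orthogonal matrix have unit norm, \(\sum_{i=1}^D W_{i,j}^2 = 1\). A one-line numerical check (ours, with a random orthogonal matrix obtained from a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random orthogonal matrix: the Q factor of a QR decomposition.
W, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# Orthogonality gives unit column norms, the fact used in the last step.
print(np.allclose((W ** 2).sum(axis=0), 1.0))
```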

Note 5

The proof remains valid if the dimensions of the subspaces are not equal. The same is true for the ISA separation theorem.

Having this proposition, now we prove our main theorem (Theorem 1).


Proof

ICA minimizes the LHS of Eq. 33, that is, it minimizes \(\sum_{m=1}^M\sum_{i=1}^dH\left(y^m_i\right)\). The set of minima is invariant to permutations and sign changes, and according to Proposition 3, the coordinates {s^m_i} of the components s^m of the ISA task belong to the set of minima.

Appendix 2: Sufficient conditions of the separation theorem

In the separation theorem, we assumed that relation 11 is fulfilled for the s^m sources. Below, we present sufficient conditions, together with proofs, under which this inequality is fulfilled.


According to Lemma 1, if the w-EPI property (i.e., Eq. 19) holds for the sources s^m, then inequality 11 holds, too.

Spherically symmetric sources

We shall make use of the following well-known property of spherically symmetric variables [23, 24]:

Lemma 2

Let \({{\mathbf{v}}}\) denote a d-dimensional spherically symmetric variable. Then the projections of \({{\mathbf{v}}}\) onto lines through the origin have identical univariate distributions.

Lemma 3

The expectation value and the variance of a d-dimensional spherically symmetric variable \({{\mathbf{v}}}\) are

$$ E[{{\mathbf{v}}}]= {{\mathbf{0}}}, $$
$$ Var[{{\mathbf{v}}}]= (constant)\cdot {{\mathbf{I}}}_d. $$

Proof of Proposition 2

Here, we show that the w-EPI property is fulfilled with equality for spherically symmetric sources. According to Eqs. 44 and 45, spherically symmetric sources s^m have zero expectation values and, up to constant multipliers, identity covariance matrices:

$$ E[{{\mathbf{s}}}^m]= {{\mathbf{0}}}, $$
$$ Var[{{\mathbf{s}}}^m]= c^m\cdot {{\mathbf{I}}}_d. $$

Note that our constraint on the ISA task, namely that the covariance matrices of the s^m sources should be equal to \({{\mathbf{I}}}_d\), is thus fulfilled up to constant multipliers.

Let \(P_{{{\mathbf{w}}}}\) denote the projection onto the straight line through the origin with direction \({{\mathbf{w}}} \in S^d\), i.e.,

$$ P_{{{\mathbf{w}}}}:{\mathbb{R}}^d \ni {{\mathbf{u}}} \mapsto \sum_{i=1}^d w_iu_i\in{\mathbb{R}}. $$

In particular, if w is chosen as the canonical basis vector e i (all components are 0, except the ith component, which is equal to 1), then

$$ P_{{{\mathbf{e}}}_i}({{\mathbf{u}}})=u_i. $$

In this interpretation, w-EPI (Eq. 19) concerns the entropies of the projections of the different sources onto straight lines crossing the origin. The LHS projects onto \({{\mathbf{w}}}\), whereas the RHS projects onto the canonical basis vectors. Let \({{\mathbf{u}}}\) denote an arbitrary source, i.e., \({{\mathbf{u}}} := {{\mathbf{s}}}^m\). According to Lemma 2, the distribution of the spherical \({{\mathbf{u}}}\) is the same for all such projections, and thus their entropies are identical. That is,

$$ \sum_{i=1}^{d} w_iu_i \mathop = \limits^{\hbox{distr}} u_1 \mathop = \limits^{\hbox{distr}}\cdots \mathop = \limits^{\hbox{distr}} u_d,\quad \forall {{\mathbf{w}}}\in S^d, $$
$$ H\left(\sum_{i=1}^d w_iu_i\right) = H\left(u_1\right) =\cdots = H\left(u_d\right),\quad \forall {{\mathbf{w}}}\in S^d. $$


  • The LHS of w-EPI is equal to \({\rm e}^{2H(u_1)}\).

  • The RHS of w-EPI can be written as follows:

    $$ \sum_{i=1}^d {\rm e}^{2H(w_iu_i)}=\sum_{i=1}^d{\rm e}^{2H(u_i)}\cdot w_i^2={\rm e}^{2H(u_1)}\sum_{i=1}^dw_i^2={\rm e}^{2H(u_1)}\cdot 1={\rm e}^{2H(u_1)}. $$

    At the first step, we used identity Eq. 32 for each of the terms. At the second step, Eq. 51 was utilized. Then the term \({\rm e}^{2H(u_1)}\) was pulled out, and we took into account that \({{\mathbf{w}}}\in S^d\). Thus, the LHS and the RHS of w-EPI are equal.
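Lemma 2, the ingredient behind Eq. 51, is easy to probe by simulation. The check below is ours; it uses a standard normal vector, the simplest spherically symmetric source, and compares the sample variance of a random projection with that of a coordinate projection:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 200000
u = rng.standard_normal((n, d))          # spherically symmetric source

w = rng.standard_normal(d)
w /= np.linalg.norm(w)                   # direction w on the unit sphere

proj = u @ w                             # projection onto direction w
# For a spherical source every such projection has the same distribution
# as any single coordinate, so their sample moments should agree.
print(np.var(proj), np.var(u[:, 0]))
```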

Note 6

We note that sources with spherically symmetric distributions have already been used in the context of ISA in [33]. In that work, a generative model was assumed in which the distributions of the norms of the sample projections onto the subspaces were independent. This restricted the task to spherically symmetric source distributions, which form a special case of the general ISA task.

Sources Invariant to 90° Rotation

By definition, spherical variables are invariant to orthogonal transformations (see Eq. 20). For mixtures of two-dimensional components (d = 2), a much milder condition, invariance to 90° rotation, suffices. First, we observe the following:

Note 7

In the ISA separation theorem, it is enough if some orthogonal transformation of the \({{\mathbf{s}}}^m\) sources, \({{\mathbf{C}}}^m{{\mathbf{s}}}^m\) \(({{\mathbf{C}}}^m\in{{\mathcal{O}}}^d)\), satisfies condition 11. In this case, the \({{\mathbf{C}}}^m{{\mathbf{s}}}^m\) variables are extracted by the permutation search after the ICA transformation. This is suitable because the ISA identification is ambiguous up to orthogonal transformations within the respective subspaces. In other words, for the ISA identification it is sufficient that for each component \({{\mathbf{u}}}:={{\mathbf{s}}}^m\in{\mathbb{R}}^d\) there exists an orthonormal basis (ONB) on which the

$$ h:{\mathbb{R}}^d\ni{{\mathbf{w}}} \mapsto H[\left<{{\mathbf{w}}}, {{\mathbf{u}}}\right>] $$

function takes its minimum. (Here, the stochastic variable \(\left<{{\mathbf{w}}}, {{\mathbf{u}}}\right>:=\sum_{i=1}^dw_iu_i\) is the projection of \({{\mathbf{u}}}\) onto the direction \({{\mathbf{w}}}\).) In this case, the entropy inequality 11 is met with equality on the elements of the ONB.

Now we present our theorem concerning the d = 2 case.

Theorem 2

Let us suppose that the density function f of the stochastic variable \({{\mathbf{u}}}=(u_1,u_2)(={{\mathbf{s}}}^m)\in{\mathbb{R}}^2\) exhibits the invariance

$$ f(u_1,u_2)=f(-u_2,u_1)=f(-u_1,-u_2)=f(u_2,-u_1)\quad\left(\forall {{\mathbf{u}}}\in{\mathbb{R}}^2\right), $$

that is, it is invariant to 90° rotation. If the function \(h({{\mathbf{w}}})=H[\left\langle{{\mathbf{w}}},{{\mathbf{u}}}\right\rangle]\) attains its minimum on the set \(\{{{\mathbf{w}}}\ge{{\mathbf{0}}}\}\cap S^2\), then it also attains its minimum on an ONB. (The relation \({{\mathbf{w}}}\ge{{\mathbf{0}}}\) is meant coordinate-wise.) Consequently, the ISA task can be identified by the use of the separation theorem.



Proof

Let

$$ {{\mathbf{R}}}:=\left[\begin{array}{ll} 0 & -1\\ 1 & 0 \\ \end{array}\right] $$

denote the matrix of the 90° counter-clockwise rotation, and let \({{\mathbf{w}}} \in S^2\). \(\left\langle{{\mathbf{w}}},{{\mathbf{u}}}\right\rangle\in{\mathbb{R}}\) is the projection of variable \({{\mathbf{u}}}\) onto \({{\mathbf{w}}}\). The value of the density function of the stochastic variable \(\left\langle{{\mathbf{w}}},{{\mathbf{u}}}\right\rangle\) at \(t\in{\mathbb{R}}\) (t moves along direction \({{\mathbf{w}}}\)) can be calculated by integrating from the point \({{\mathbf{w}}}t\) in the direction perpendicular to \({{\mathbf{w}}}\):

$$ f_{y=y({{\mathbf{w}}})=\left\langle{{\mathbf{w}}},{{\mathbf{u}}} \right\rangle}(t)= \int\limits_{{{\mathbf{w}}}^\perp} f ({{\mathbf{w}}}t+ {{\mathbf{z}}})\hbox{d}{{\mathbf{z}}}. $$

Using the supposed invariance of f and relation 56, we have

$$ f_{y({{\mathbf{w}}})}=f_{y({{\mathbf{Rw}}})}=f_{y({{\mathbf{R}}}^2 {{\mathbf{w}}})}=f_{y({{\mathbf{R}}}^3{{\mathbf{w}}})}, $$

where ‘=’ denotes the equality of functions. Consequently, it is enough to optimize h on the set \(S^{2}\cap\{{{\mathbf{w}}}\ge {{\mathbf{0}}}\}\). Let \({{\mathbf{w}}}_{\rm min}\) be a minimum point of function h on this set. According to Eq. 57, h takes constant and minimal values at the

$$ \{{{\mathbf{w}}}_{\rm min},{{\mathbf{Rw}}}_{\rm min},{{\mathbf{R}}}^2 {{\mathbf{w}}}_{\rm min},{{\mathbf{R}}}^3{{\mathbf{w}}}_{\rm min}\} $$

points. \(\{{{\mathbf{w}}}_{\rm min},{{\mathbf{Rw}}}_{\rm min}\}\) is then a suitable ONB in the sense of Note 7.
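The equality of the projected densities, \(f_{y({{\mathbf{w}}})}=f_{y({{\mathbf{Rw}}})}\), can also be probed by simulation. The check below is our own illustration; it uses the uniform density on the square [-1, 1]², which is invariant to 90° rotation, and compares sample quantiles of the two projections:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000
# Uniform density on the square [-1,1]^2: invariant to 90-degree rotation.
u = rng.uniform(-1.0, 1.0, (n, 2))

R = np.array([[0.0, -1.0], [1.0, 0.0]])       # 90-degree ccw rotation

theta = 0.3
w = np.array([np.cos(theta), np.sin(theta)])  # a unit direction w

p1 = u @ w          # projection onto w
p2 = u @ (R @ w)    # projection onto Rw

# The projected distributions should coincide: compare sample quantiles.
q = np.linspace(0.05, 0.95, 19)
print(np.max(np.abs(np.quantile(p1, q) - np.quantile(p2, q))))
```

The printed maximal quantile gap shrinks with the sample size, as the two projections are draws from the same distribution.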


About this article

Cite this article

Szabó, Z., Póczos, B. & Lőrincz, A. Auto-regressive independent process analysis without combinatorial efforts. Pattern Anal Applic 13, 1–13 (2010).



Keywords

  • Independent component analysis
  • Independent process analysis
  • Auto-regressive processes