A study of the fixed points and spurious solutions of the deflation-based FastICA algorithm

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The FastICA algorithm is one of the most popular algorithms in the domain of independent component analysis (ICA). Despite its success, it is observed that FastICA occasionally yields outcomes that do not correspond to any true solutions (known as demixing vectors) of the ICA problem. These outcomes are commonly referred to as spurious solutions. Although FastICA is a well-studied ICA algorithm, the occurrence of spurious solutions is not yet completely understood by the ICA community. In this contribution, we aim to address this issue. In the first part of this work, we characterize the relationship between the demixing vectors, the local optimizers of the contrast function and the (attractive or unattractive) fixed points of the FastICA algorithm. We show that there exists an inclusion relationship between these sets. In the second part, we investigate the possible scenarios in which spurious solutions occur. It is shown that when certain bimodal Gaussian mixture distributions are involved, there may exist spurious solutions that are attractive fixed points of FastICA. In this case, popular nonlinearities such as “Gauss” or “tanh” tend to yield spurious solutions, whereas “kurtosis” gives much more reliable results.


Notes

  1. In this paper, notation \(\subset\) stands for the “subset” rather than the “proper subset” inclusion. Hence, \({\mathbb {D}}\subset {\mathbb {L}}\) does not exclude \({\mathbb {D}}={\mathbb {L}}\).


Acknowledgments

The author would like to express the deepest gratitude to Prof. A. Dermoune for his invaluable guidance. The author would also like to thank the anonymous referees for carefully reading the manuscript and for their many helpful and constructive suggestions, which resulted in the present work.

Author information

Corresponding author

Correspondence to Tianwen Wei.

Appendix

1.1 Proof of Theorem 7

Without loss of generality, we may suppose that the mixing matrix \({\mathbf {A}}\) is the identity matrix. Let \({\mathbf {v}}\) be a fixed point, and write \({\mathbf {v}}=(v_1,\ldots ,v_d)^{\mathsf {T}}\). By (14), we have for any \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} {\mathbb {E}}\Big [\Big (\sum _{j=1}^d v_j s_j\Big )^3 s_i\Big ] = {\mathbb {E}}\Big [\Big (\sum _{j=1}^d v_j s_j\Big )^4\Big ]v_i, \end{aligned}$$

or equivalently, after some algebraic simplifications,

$$\begin{aligned} \kappa _i v_i^3 + 3 v_i = \Big (\sum _{j=1}^d v_j^4 \kappa _j +3\Big )v_i, \end{aligned}$$
(26)

where \(\kappa _i\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[s_i^4]-3\ne 0\) by assumption \({\mathcal {A}}_2\) (see Sect. 3.1). Denote by \({\mathcal {I}}\) the set of indices i such that \(v_i\ne 0\). It follows from (26) that for \(i\in {\mathcal {I}}\)

$$\begin{aligned} v_i^2 = \frac{1}{\kappa _i}\sum _{j=1}^d v_j^4 \kappa _j. \end{aligned}$$
(27)

Since \(\sum _{i=1}^d v_i^2=1\), we deduce from (27)

$$\begin{aligned} \sum _{i\in {\mathcal {I}}} \Big ( \frac{1}{\kappa _i}\sum _{j=1}^d v_j^4 \kappa _j \Big ) = 1, \end{aligned}$$

or equivalently

$$\begin{aligned} \sum _{j=1}^d v_j^4 \kappa _j= \Big (\sum _{i\in {\mathcal {I}}} \kappa _i^{-1} \Big )^{-1}. \end{aligned}$$

Then, we can rewrite (27) as

$$\begin{aligned} v_i^2 = \kappa _i^{-1} \Big (\sum _{j\in {\mathcal {I}}} \kappa _j^{-1} \Big )^{-1},\quad i\in {\mathcal {I}}. \end{aligned}$$
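
For concreteness, the closed form above is easy to verify numerically. The following is a minimal sketch (assuming numpy; the kurtoses \(\kappa _i\) and the support set \({\mathcal {I}}\) are arbitrary choices of ours, with the \(\kappa _i\) on \({\mathcal {I}}\) sharing a sign so that \(v_i^2\ge 0\)):

```python
import numpy as np

kappa = np.array([1.2, -0.8, 2.5, 0.9])    # hypothetical kurtoses, all nonzero (assumption A2)
support = [0, 2, 3]                        # the set I of indices with v_i != 0

# Closed form derived above: v_i^2 = kappa_i^{-1} (sum_{j in I} kappa_j^{-1})^{-1} for i in I.
c = 1.0 / np.sum(1.0 / kappa[support])
v = np.zeros_like(kappa)
v[support] = np.sqrt(c / kappa[support])

# Check the fixed-point equation (26) and the unit-norm constraint.
lhs = kappa * v**3 + 3 * v
rhs = (np.sum(v**4 * kappa) + 3) * v
print(np.allclose(lhs, rhs), np.isclose(v @ v, 1.0))   # True True
```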

Now let us calculate \({\mathbf {M}}\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {s}}{\mathbf {s}}^{\mathsf {T}}]\). We have

$$\begin{aligned} {\mathbf {M}}_{ij}= & {} {\mathbb {E}}\Big [3 \Big (\sum _k v_k s_k\Big )^2 s_is_j \Big ] = 6v_iv_j,\quad i\ne j,\\ {\mathbf {M}}_{ii}= & {} {\mathbb {E}}\Big [3 \Big (\sum _k v_k s_k\Big )^2 s_i^2 \Big ]=3(\kappa _i v_i^2 + 2v_i^2 + 1). \end{aligned}$$

From this, we deduce that \({\mathbf {M}}=3(2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}+ {\mathbf {I}}+{\mathbf {D}})\), where \({\mathbf {D}}\) is a diagonal matrix with the ith diagonal entry \({\mathbf {D}}_i=\kappa _i v_i^2\). Since

$$\begin{aligned} \alpha ({\mathbf {v}})={\mathbb {E}}\Big [3- \Big (\sum _{i=1}^dv_is_i\Big )^4\Big ]=-\sum _i v_i^4 \kappa _i, \end{aligned}$$

it follows from (27) that \({\mathbf {D}}_i=-\alpha ({\mathbf {v}})\) for \(i\in {\mathcal {I}}\) and \({\mathbf {D}}_i=0\) otherwise. Besides, since \({\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {I}}]=3{\mathbf {I}}\), we have \({\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {s}})({\mathbf {I}}-{\mathbf {s}}{\mathbf {s}}^{\mathsf {T}})]=3{\mathbf {I}}-{\mathbf {M}}=-3({\mathbf {D}}+ 2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})\). From this, we deduce that

$$\begin{aligned} \varvec{f}'({\mathbf {v}})= & {} \frac{({\mathbf {I}}-{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}){\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {x}})({\mathbf {I}}-{\mathbf {x}}{\mathbf {x}}^{\mathsf {T}})]}{ |\alpha ({\mathbf {v}}) |} \nonumber \\= & {} \frac{-3({\mathbf {I}}-{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})({\mathbf {D}}+2{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})}{ |\alpha ({\mathbf {v}}) |} \nonumber \\= & {} 3{\mathrm {sign}}(\alpha ({\mathbf {v}})) ({\mathbf {I}}-{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})\bar{\mathbf {D}}, \end{aligned}$$
(28)

where \(\bar{\mathbf {D}}\mathop {=}\limits ^{\mathrm {def}}-{\mathbf {D}}/\alpha ({\mathbf {v}})\). Clearly, the diagonal entries of \(\bar{\mathbf {D}}\) satisfy \(\bar{\mathbf {D}}_i=1\) for \(i\in {\mathcal {I}}\) and \(\bar{\mathbf {D}}_i=0\) otherwise. Besides, it is easy to see that \(\bar{\mathbf {D}}{\mathbf {v}}={\mathbf {v}}\), which implies \({\mathrm {span}}({\mathbf {v}})\subset {\mathrm {range}}(\bar{\mathbf {D}})\). Denote by \(\#{\mathcal {I}}\) the cardinality of \({\mathcal {I}}\). Since \(\dim ({\mathrm {span}}({\mathbf {v}}))=1\) and \(\dim ({\mathrm {range}}(\bar{\mathbf {D}}))=\#{\mathcal {I}}\ge 1\), this inclusion becomes an equality if and only if \(\#{\mathcal {I}}=1\), i.e., there is exactly one entry \(v_i\ne 0\), or equivalently, \({\mathbf {v}}=\pm {\mathbf {e}}_i\in {\mathbb {D}}\) for some i. If this is the case, then we have immediately \(\varvec{f}'({\mathbf {v}})=0\) by (28). Otherwise, take any vector \({\mathbf {u}}=(u_1,\ldots ,u_d)^{\mathsf {T}}\) such that \(\Vert {\mathbf {u}}\Vert =1\), \({\mathbf {u}}^{\mathsf {T}}{\mathbf {v}}=0\) and \(u_i=0\) for \(i\notin {\mathcal {I}}\). We have

$$\begin{aligned} \varvec{f}'({\mathbf {v}}){\mathbf {u}}=3{\mathrm {sign}}(\alpha ({\mathbf {v}})) ({\mathbf {I}}-{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}})\bar{\mathbf {D}}{\mathbf {u}}=3{\mathrm {sign}}(\alpha ({\mathbf {v}})){\mathbf {u}}. \end{aligned}$$

On the one hand, by the submultiplicativity of spectral norm,

$$\begin{aligned} \Vert \varvec{f}'({\mathbf {v}})\Vert \le 3\Vert {\mathbf {I}}-{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}\Vert \Vert \bar{\mathbf {D}}\Vert =3; \end{aligned}$$

on the other hand, there also holds

$$\begin{aligned} \Vert \varvec{f}'({\mathbf {v}})\Vert =\sup _{\mathbf {w}\in {\mathcal {S}}}\Vert \varvec{f}'({\mathbf {v}}){\mathbf {w}}\Vert \ge \Vert \varvec{f}'({\mathbf {v}}){\mathbf {u}}\Vert =3. \end{aligned}$$

It then follows that \(\Vert \varvec{f}'({\mathbf {v}})\Vert =3\) for any \({\mathbf {v}}\in {\mathbb {F}}\backslash {\mathbb {D}}\). This fact also implies \({\mathbb {D}}={\mathbb {L}}\).
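
As a further sanity check, the closed form (28) can be evaluated directly; the sketch below continues the previous snippet (reusing kappa, support and v from there) and confirms that the spectral norm of \(\varvec{f}'({\mathbf {v}})\) equals 3 whenever \(\#{\mathcal {I}}\ge 2\):

```python
# Evaluate f'(v) through the derived closed form and check its spectral norm.
alpha = -np.sum(v**4 * kappa)              # alpha(v) = -sum_i v_i^4 kappa_i
D = np.diag(kappa * v**2)                  # D_i = kappa_i v_i^2 (= -alpha(v) on I)
P = np.eye(len(v)) - np.outer(v, v)        # projector I - v v^T
fprime = -3 * P @ (D + 2 * np.outer(v, v)) / abs(alpha)

print(np.linalg.norm(fprime, 2))           # 3.0 when #I >= 2; 0.0 when #I == 1
```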

1.2 Proof of Proposition 8

Without loss of generality, in what follows we take \({\mathbf {A}}={\mathbf {I}}\) for simplicity.

1.2.1 Case \(d=2\)

Let us first consider the simplest case \(d=2\). If \(\alpha ({\mathbf {e}}_1)\) and \(\alpha ({\mathbf {e}}_2)\) have the same sign, say, positive, then \({\mathbf {e}}_1\) and \({\mathbf {e}}_2\) are local minimizers of the contrast function \({\mathcal {J}}(\cdot )\) on \({\mathcal {S}}\). Write

$$\begin{aligned} {\mathbf {w}}(\theta )= & {} \cos (\theta ){\mathbf {e}}_1+ \sin (\theta ){\mathbf {e}}_2 = \left(\begin{array}{l}\cos (\theta ) \\ \sin (\theta )\end{array}\right), \end{aligned}$$
(29)
$$\begin{aligned} f(\theta )= & {} {\mathcal {J}}({\mathbf {w}}(\theta ))={\mathbb {E}}[G(\cos (\theta )s_1 + \sin (\theta )s_2)] . \end{aligned}$$
(30)

Then, it is easy to see that \(\theta _1=0\) and \(\theta _2=\pi /2\) are local minimizers of \(f(\theta )\) on \({\mathbb {R}}\). From this we deduce immediately that f reaches a local maximum at some interior point \(\theta _0\in (\theta _1,\theta _2)\). The corresponding vector \({\mathbf {v}}\mathop {=}\limits ^{\mathrm {def}}\big (\cos (\theta _0), \sin (\theta _0)\big )^{\mathsf {T}}\) is then a local maximizer of \({\mathcal {J}}({\mathbf {w}})\) on \({\mathcal {S}}\).

We have thus proved the following result:

Lemma 10

Let \(s_1\) and \(s_2\) be two random variables such that the quantity \({\mathbb {E}}[g'(s_i) - g(s_i)s_i]\) has the same sign for \(i=1,2.\) Then, \(f(\theta )\mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[G(\cos (\theta )s_1 + \sin (\theta )s_2)]\) has a local optimum at some \(\theta _0\in (0,2\pi )\). Moreover, the angle \(\theta _0\) satisfies

$$\begin{aligned} {\mathbb {E}}\Big [g\big ({\mathbf {w}}(\theta _0)^{\mathsf {T}}{\mathbf {s}}\big ){\mathbf {s}}\Big ] = {\mathbb {E}}\Big [g\big ({\mathbf {w}}(\theta _0)^{\mathsf {T}}{\mathbf {s}}\big ){\mathbf {w}}(\theta _0)^{\mathsf {T}}{\mathbf {s}}\Big ]{\mathbf {w}}(\theta _0) \end{aligned}$$
(31)

Equality (31) comes directly from (14).
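
The interior optimum guaranteed by Lemma 10 is easy to locate by a coarse numerical scan. Below is a Monte Carlo sketch under assumptions of our own choosing: two unit-variance uniform sources and the kurtosis nonlinearity \(G(y)=y^4/4\), for which \({\mathbb {E}}[g'(s_i)-g(s_i)s_i]=-\kappa _i>0\) for both sources, so the hypothesis of the lemma holds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Two unit-variance sub-Gaussian sources: uniform on [-sqrt(3), sqrt(3)].
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)

G = lambda y: y**4 / 4                     # kurtosis nonlinearity, g(y) = y^3
thetas = np.linspace(0.0, np.pi / 2, 181)
f = np.array([np.mean(G(np.cos(t) * s1 + np.sin(t) * s2)) for t in thetas])

# theta = 0 and pi/2 are local minimizers here, so f has an interior local maximum;
# by the symmetry of the two sources it sits near pi/4.
print(thetas[f.argmax()])
```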

1.2.2 Case \(d>2\)

Suppose \({\mathbf {e}}_i\) and \({\mathbf {e}}_j\) are two demixing vectors such that both \(\alpha ({\mathbf {e}}_i)\) and \(\alpha ({\mathbf {e}}_j)\) are positive. Write

$$\begin{aligned} f^{(i,j)}(\theta ) \mathop {=}\limits ^{\mathrm {def}}{\mathbb {E}}[G(\cos (\theta )s_i + \sin (\theta )s_j)]. \end{aligned}$$

By Lemma 10, there exists \(\theta '\in (0,2\pi )\) such that \(\theta '\) maximizes \(f^{(i,j)}(\theta )\) and satisfies

$$\begin{aligned} {\mathbb {E}}\Big [g\big ({\mathbf {w}}(\theta ')^{\mathsf {T}}{\mathbf {s}}_{ij}\big ){\mathbf {s}}_{ij}\Big ]= {\mathbb {E}}\Big [g\big ({\mathbf {w}}(\theta ')^{\mathsf {T}}{\mathbf {s}}_{ij}\big ){\mathbf {w}}(\theta ')^{\mathsf {T}}{\mathbf {s}}_{ij}\Big ]{\mathbf {w}}(\theta '), \end{aligned}$$
(32)

where \({\mathbf {s}}_{ij}=(s_i,s_j)^{\mathsf {T}}\). Consider the vector \({\mathbf {u}}=(u_1,\ldots ,u_d)^{\mathsf {T}}\) with \(u_i=\cos (\theta ')\), \(u_j=\sin (\theta ')\) and \(u_k=0\) for \(k\ne i,j\). Clearly, \({\mathbf {u}}^{\mathsf {T}}{\mathbf {s}}=\cos (\theta ')s_i + \sin (\theta ')s_j={\mathbf {w}}(\theta ')^{\mathsf {T}}{\mathbf {s}}_{ij}\). It then follows from (32) that

$$\begin{aligned} {\mathbb {E}}[g({\mathbf {u}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {s}}] = {\mathbb {E}}[g({\mathbf {u}}^{\mathsf {T}}{\mathbf {s}}){\mathbf {u}}^{\mathsf {T}}{\mathbf {s}}]{\mathbf {u}}, \end{aligned}$$
(33)

which implies \({\mathbf {u}}\in {\mathbb {F}}\) by (14).

In the particular case that \(s_i\) and \(s_j\) have the same distribution, we must have \(\cos (\theta ')=\sin (\theta ')=1/\sqrt{2}\) by symmetry. This means \(\theta '=\pi /4\).
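
Continuing the previous sketch (same rng, n and source distribution), the embedding argument can be checked directly: place the two-dimensional optimum \(\theta '=\pi /4\) in coordinates \(i=1\), \(j=2\) of a \(d=4\) model and verify the fixed-point relation (33) by Monte Carlo:

```python
# Check (33) for the embedded vector u in dimension d = 4.
d = 4
S = rng.uniform(-np.sqrt(3), np.sqrt(3), (n, d))   # i.i.d. uniform unit-variance sources
u = np.zeros(d)
u[0] = u[1] = 1 / np.sqrt(2)                       # theta' = pi/4 by symmetry

g = lambda y: y**3
y = S @ u
lhs = (g(y)[:, None] * S).mean(axis=0)   # E[g(u^T s) s]
rhs = np.mean(g(y) * y) * u              # E[g(u^T s) u^T s] u
print(np.round(lhs - rhs, 2))            # approximately the zero vector
```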

1.3 Proof of Proposition 9

Suppose that \({\mathbf {u}}\in {\mathbb {R}}^2\) is a spurious attractive fixed point in the case \(d=2\) with \(s_1{\sim } {\mathcal {D}}_1\) and \(s_2{\sim } {\mathcal {D}}_2\). Then \(\Vert \varvec{f}'({\mathbf {u}})\Vert <1\) and

$$\begin{aligned} {\mathbf {u}}^{\mathsf {T}}{\mathbf {x}}= cs_1 + \sqrt{1-c^2}s_2 \end{aligned}$$

for some real scalar \(c\in (0,1)\). Now let us consider the case \(d=n>2\) with \(s_i{\sim } {\mathcal {D}}_1\) and \(s_j{\sim }{\mathcal {D}}_2\) for some indices \(i\ne j\). In the sequel, we assume \(i=1\), \(j=2\) for simplicity. Take

$$\begin{aligned} {\mathbf {v}}=c{\mathbf {a}}_1 + \sqrt{1-c^2}{\mathbf {a}}_2\in {\mathbb {R}}^d. \end{aligned}$$

It is easy to see that \({\mathbf {v}}^{\mathsf {T}}{\mathbf {x}}=cs_1 + \sqrt{1-c^2}s_2\) and \({\mathbf {v}}\in {\mathbb {F}}\). Next, we show \(\Vert \varvec{f}'({\mathbf {v}})\Vert =\Vert \varvec{f}'({\mathbf {u}})\Vert <1\), where \(\varvec{f}'({\mathbf {v}})\) and \(\varvec{f}'({\mathbf {u}})\) are, respectively, \(n\times n\) and \(2\times 2\) matrices. Note that these two “\(\varvec{f}'(\cdot )\)” are different mappings, since they are determined by different ICA models. Denote by \({\mathbf {A}}_{\mathbf {u}}\) and \({\mathbf {A}}_{\mathbf {v}}\), respectively, the mixing matrices for each case. Since the mixing matrix \({\mathbf {A}}_{\mathbf {u}}\) is orthogonal, for any \({\mathbf {w}}\in {\mathbb {R}}^2\) we have

$$\begin{aligned}&\Vert ({\mathbf {I}}-{\mathbf {w}}{\mathbf {w}}^{\mathsf {T}}){\mathbb {E}}[g'({\mathbf {w}}^{\mathsf {T}}{\mathbf {x}})({\mathbf {I}}-{\mathbf {x}}{\mathbf {x}}^{\mathsf {T}})]\Vert \nonumber \\&\quad =\Vert ({\mathbf {I}}-{\mathbf {A}}_{\mathbf {u}}^{\mathsf {T}}{\mathbf {w}}{\mathbf {w}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {u}}){\mathbb {E}}[g'({\mathbf {w}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {u}}{\mathbf {s}})({\mathbf {I}}-{\mathbf {s}}{\mathbf {s}}^{\mathsf {T}})]\Vert . \end{aligned}$$
(34)

A similar equality holds for \({\mathbf {A}}_{\mathbf {v}}\) and \({\mathbf {w}}\in {\mathbb {R}}^n\). Denote

$$\begin{aligned}&{\mathbf {B}}_{\mathbf {u}}\mathop {=}\limits ^{\mathrm {def}}({\mathbf {I}}-{\mathbf {A}}_{\mathbf {u}}^{\mathsf {T}}{\mathbf {u}}{\mathbf {u}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {u}}){\mathbb {E}}[g'({\mathbf {u}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {u}}{\mathbf {s}})({\mathbf {I}}-{\mathbf {s}}{\mathbf {s}}^{\mathsf {T}})] \\&{\mathbf {B}}_{\mathbf {v}}\mathop {=}\limits ^{\mathrm {def}}({\mathbf {I}}-{\mathbf {A}}_{\mathbf {v}}^{\mathsf {T}}{\mathbf {v}}{\mathbf {v}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {v}}){\mathbb {E}}[g'({\mathbf {v}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {v}}{\mathbf {s}})({\mathbf {I}}-{\mathbf {s}}{\mathbf {s}}^{\mathsf {T}})]. \end{aligned}$$

Using (34) and (16), we get

$$\begin{aligned} \Vert \varvec{f}'({\mathbf {u}})\Vert =\frac{\Vert {\mathbf {B}}_{\mathbf {u}}\Vert }{|\alpha ({\mathbf {u}})|},\quad \Vert \varvec{f}'({\mathbf {v}})\Vert =\frac{\Vert {\mathbf {B}}_{\mathbf {v}}\Vert }{|\alpha ({\mathbf {v}})|}. \end{aligned}$$
(35)

Notice that

$$\begin{aligned} {\mathbf {u}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {u}}= & {} (c,\sqrt{1-c^2})\in {\mathbb {R}}^2 \\ {\mathbf {v}}^{\mathsf {T}}{\mathbf {A}}_{\mathbf {v}}= & {} (c,\sqrt{1-c^2},0,\ldots ,0)\in {\mathbb {R}}^d \end{aligned}$$

by the construction of \({\mathbf {u}}\) and \({\mathbf {v}}\). This implies \({\mathbf {u}}^{\mathsf {T}}{\mathbf {x}}\) and \({\mathbf {v}}^{\mathsf {T}}{\mathbf {x}}\) have the same distribution and therefore \(\alpha ({\mathbf {u}})=\alpha ({\mathbf {v}})\). Besides, it is easily seen that

$$\begin{aligned} {\mathbf {B}}_{\mathbf {v}}=\left(\begin{array}{ll} {\mathbf {B}}_{\mathbf {u}}&{} {\mathbf {0}} \\ {\mathbf {0}} &{} {\mathbf {0}} \end{array}\right). \end{aligned}$$
(36)

From (36), we can deduce that \(\Vert {\mathbf {B}}_{\mathbf {u}}\Vert =\Vert {\mathbf {B}}_{\mathbf {v}}\Vert\). Finally, combining this result with (35) gives \(\Vert \varvec{f}'({\mathbf {v}})\Vert <1\).
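
The block structure (36), on which the proof hinges, can also be observed numerically. Below is a sketch under assumptions of our own (identity mixing, unit-variance Laplace sources and the “tanh” nonlinearity; any \({\mathbf {v}}\) supported on the first two coordinates will do):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400_000, 5
S = rng.laplace(size=(n, d)) / np.sqrt(2)     # i.i.d. unit-variance Laplace sources
c = 0.6
v = np.zeros(d)
v[0], v[1] = c, np.sqrt(1 - c**2)             # v supported on the first two coordinates

gp = lambda y: 1 - np.tanh(y)**2              # g'(y) for the "tanh" nonlinearity
y = S @ v
M = (S * gp(y)[:, None]).T @ S / n            # Monte Carlo estimate of E[g'(v^T s) s s^T]
B_v = (np.eye(d) - np.outer(v, v)) @ (gp(y).mean() * np.eye(d) - M)
print(np.round(B_v, 2))                       # entries outside the top-left 2x2 block ~ 0
```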

1.4 Probability distributions used in Table 1

1.4.1 Generalized Gaussian distribution

The generalized Gaussian density function with parameter \(\alpha\), zero mean and unit variance is given by

$$\begin{aligned} f_{\alpha }(x)=\frac{\alpha \beta _{\alpha }}{2\Gamma (1/\alpha )}\exp {\{-(\beta _{\alpha }|x|)^{\alpha }\}}, \end{aligned}$$

where \(\alpha\) is a positive parameter that controls the distribution's exponential rate of decay, \(\Gamma\) is the Gamma function and

$$\begin{aligned} \beta _{\alpha }=\sqrt{\frac{\Gamma (3/\alpha )}{\Gamma (1/\alpha )}}. \end{aligned}$$

This generalized Gaussian family includes the standard normal distribution for \(\alpha =2\), the Laplace distribution for \(\alpha =1\) and the uniform distribution in the limit \(\alpha \rightarrow \infty\).
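
For simulation purposes, this family can be sampled through a standard Gamma transform: if \(T\sim \mathrm {Gamma}(1/\alpha ,1)\), then \(T^{1/\alpha }/\beta _{\alpha }\) with a random sign has density \(f_{\alpha }\). A sketch assuming numpy and scipy:

```python
import numpy as np
from scipy.special import gamma as Gamma

def rgg(alpha, size, rng=None):
    """Sample the zero-mean, unit-variance generalized Gaussian f_alpha."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.sqrt(Gamma(3 / alpha) / Gamma(1 / alpha))
    t = rng.gamma(shape=1 / alpha, scale=1.0, size=size)   # (beta|X|)^alpha ~ Gamma(1/alpha, 1)
    return rng.choice([-1.0, 1.0], size=size) * t ** (1 / alpha) / beta

x = rgg(alpha=1.0, size=200_000)    # alpha = 1 recovers the Laplace distribution
print(x.mean(), x.var())            # approximately 0 and 1
```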

1.4.2 Bimodal distribution with Gaussian mixture

The bimodal distribution used in this paper is a mixture of two Gaussian distributions. Define the random variable

$$\begin{aligned} X=Z Y_1 + (1-Z)Y_2, \end{aligned}$$

where \(Y_i{\sim }{\mathcal {N}}(\mu _i,\sigma _i^2)\) and \(Z{\sim }{\mathcal {B}}(p)\) are mutually independent random variables. Here, \({\mathcal {B}}(p)\) denotes the Bernoulli distribution with parameter p, i.e., \({\mathbb {P}}(Z=1)=p\) and \({\mathbb {P}}(Z=0)=1-p\). It is easy to see that the probability density function (PDF) of X is given by

$$\begin{aligned} f_X(x)= p f_{Y_1}(x) + (1-p)f_{Y_2}(x), \end{aligned}$$

where \(f_{Y_i}\) is the PDF of \(Y_i\) for \(i=1,2\).

Now take any \(\mu _1,\mu _2\) such that \(\mu _1\mu _2<0\) and \(|\mu _1\mu _2|\le 1\) (so that the common variance below is nonnegative), and let \(\sigma ^2_1=\sigma ^2_2=1-|\mu _1\mu _2|\) and

$$\begin{aligned} p =\frac{|\mu _2|}{|\mu _1|+|\mu _2|}. \end{aligned}$$

Defined in this way, X is a random variable with zero mean, unit variance and two modes, at \(\mu _1\) and \(\mu _2\). Since the PDF of X is completely determined by \(\mu _1,\mu _2\), we use them as controlling parameters and denote by “Bimod\((\mu _1,\mu _2)\)” the distribution of X. Note that if \(\mu _1=-\mu _2\), then \(p=1/2\) and the distribution of X is symmetric. In this case, we write simply “Bimod\((\mu )\)” with \(\mu =|\mu _1|\). Note also that the “bpsk” distribution is actually Bimod(1).
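
To make the definition concrete, here is a sketch of a Bimod\((\mu _1,\mu _2)\) sampler, together with the standard one-unit FastICA iteration \({\mathbf {w}}\leftarrow {\mathbb {E}}[{\mathbf {x}}g({\mathbf {w}}^{\mathsf {T}}{\mathbf {x}})]-{\mathbb {E}}[g'({\mathbf {w}}^{\mathsf {T}}{\mathbf {x}})]{\mathbf {w}}\) followed by renormalization. The parameter choices (\(\mu =0.9\), the “tanh” nonlinearity, the seed) are ours; whether the iteration lands on a demixing vector or on a spurious point depends on them and on the initialization:

```python
import numpy as np

rng = np.random.default_rng(2)

def bimod(mu1, mu2, size):
    """Sample Bimod(mu1, mu2): zero mean, unit variance, modes at mu1 and mu2."""
    assert mu1 * mu2 < 0 and abs(mu1 * mu2) <= 1
    p = abs(mu2) / (abs(mu1) + abs(mu2))
    sigma = np.sqrt(1 - abs(mu1 * mu2))
    z = rng.random(size) < p
    return np.where(z, rng.normal(mu1, sigma, size), rng.normal(mu2, sigma, size))

# Two independent Bimod(0.9) sources; identity mixing, so x = s is already white.
n = 200_000
X = np.vstack([bimod(0.9, -0.9, n), bimod(0.9, -0.9, n)])

g = np.tanh
gp = lambda y: 1 - np.tanh(y)**2          # g'(y) for the "tanh" nonlinearity
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(200):                      # one-unit FastICA iteration
    y = w @ X
    w = (X * g(y)).mean(axis=1) - gp(y).mean() * w
    w /= np.linalg.norm(w)
print(np.round(w, 3))   # +-e_i would be a demixing vector; anything else is spurious
```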


Cite this article

Wei, T. A study of the fixed points and spurious solutions of the deflation-based FastICA algorithm. Neural Comput & Applic 28, 13–24 (2017). https://doi.org/10.1007/s00521-015-2033-6
