When applying independent component analysis (ICA), it is sometimes desirable that the connections between the observed mixtures and the recovered independent components (or the original sources) be sparse, to make the interpretation easier or to reduce the model complexity. In this paper we propose natural gradient algorithms for ICA with a sparse separation matrix, as well as for ICA with a sparse mixing matrix. The sparsity of the matrix is achieved by applying certain penalty functions to its entries. The properties of the penalty functions are investigated. Experimental results on both artificial data and causality discovery in financial stocks show the usefulness of the proposed methods.
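The abstract describes penalizing the entries of the separation matrix inside a natural gradient ICA update. A minimal sketch of that idea is below, assuming an L1 (lasso-type) penalty as the sparsity-inducing function and a tanh score for super-Gaussian sources; the specific penalty functions, step size, and penalty weight used in the paper are not given here, so these choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two super-Gaussian (Laplacian) sources, mixed by a matrix A whose
# inverse is sparse, so a sparsity penalty on the separation matrix W
# is appropriate for this toy problem.
n = 5000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])   # inv(A) = [[1, -0.5], [0, 1]] is sparse
X = A @ S

W = np.eye(2)          # separation matrix estimate
eta, lam = 0.05, 0.01  # step size and (hypothetical) L1 penalty weight

for _ in range(500):
    Y = W @ X
    # Natural-gradient ICA update, (I - E[g(y) y^T]) W, with g = tanh
    # for super-Gaussian sources, plus a subgradient of the L1 penalty
    # that shrinks small entries of W toward zero.
    grad = (np.eye(2) - np.tanh(Y) @ Y.T / n) @ W
    W += eta * (grad - lam * np.sign(W))

P = W @ A  # close to a scaled permutation matrix if separation succeeded
```

After convergence, each row of `W @ A` should be dominated by a single entry (separation up to scaling and permutation), while the penalty drives the entries of `W` corresponding to absent connections toward zero.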







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kun Zhang¹
  • Lai-Wan Chan¹
  1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
