Abstract
Neurobiological studies have shown that neurons in the primary visual cortex (V1) may employ sparse representations to encode stimuli. We describe a network model for sparse coding that consists of an input layer, a basis function layer and an output layer. We simulated standard sparse coding and sparse coding based on fast independent component analysis (ICA), and compared the time required to train the bases, the convergence speed of the objective function and the sparsity of the coefficient matrix. The results show that sparse coding based on fast ICA is more effective than standard sparse coding.
References
Treichler DG (1967) Are you missing the boat in training aids. Film Audio Vis Commun 1:14–16
Olshausen BA, Field DJ (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis Res 37:3313–3325
Field DJ (1994) What is the goal of sensory coding? Neural Comput 6:559–601
Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609
Simoncelli EP (2003) Vision and the statistics of the visual environment. Curr Opin Neurobiol 13:144–149
Delgutte B, Hammond B, Cariani P (1998) Psychophysical and physiological advances in hearing. Whurr Publishers, London
Ruderman DL, Bialek W (1994) Statistics of natural images: scaling in the woods. Phys Rev Lett 73(6):814–817
Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science, 4th edn. McGraw-Hill Medical, New York
Hyvarinen A (1999) Survey on independent component analysis. Neural Comput Surv 2(4):94–128
Olshausen BA, Field DJ (2004) Sparse coding of sensory inputs. Curr Opin Neurobiol 14:481–487
Lewicki MS (2002) Efficient coding of natural sounds. Nat Neurosci 5:356–363
Vinje W, Gallant J (2002) Natural stimulation of the non-classical receptive field increases information transmission efficiency in V1. J Neurosci 22:2904–2915
Hubel DH, Wiesel TN (1977) Functional architecture of macaque monkey visual cortex. Proc R Soc Lond B 198:1–59
Hyvarinen A, Hoyer PO (2002) A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vis Res 41(18):2413–2423
Simoncelli EP, Olshausen BA (2001) Natural image statistics and neural representation. Annu Rev Neurosci 24:1193–1216
Hoyer PO, Hyvarinen A (2000) Independent component analysis applied to feature extraction from colour and stereo images. Netw Comput Neural Syst 11(3):191–210
Haken H (2007) Towards a unifying model of neural net activity in the visual cortex. Cogn Neurodyn 1(1):15–25
Hyvarinen A, Hoyer PO, Inki M (2001) Topographic independent component analysis. Neural Comput 13(7):1527–1558
Li S, Wu S (2007) Robustness of neural codes and its implication on natural image processing. Cogn Neurodyn 1(3):261–272
van Hateren JH, Ruderman DL (1998) Independent component analysis of natural image sequences yields spatiotemporal filters similar to simple cells in primary visual cortex. Proc R Soc Lond B 265:2315–2320
Gong HY, Zhang YY, Liang PJ, Zhang PM (2010) Neural coding properties based on spike timing and pattern correlation of retinal ganglion cells. Cogn Neurodyn 4(4):337–346
Saglam M, Hayashida Y, Murayama N (2009) A retinal circuit model accounting for wide-field amacrine cells. Cogn Neurodyn 3(1):25–32
Pillow JW, Shlens J, Paninski L, Sher A, Litke AM (2008) Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature 454:995–999
Li CG, Li YK (2011) Fast and robust image segmentation by small-world neural oscillator networks. Cogn Neurodyn 5(2):209–220
Vialatte FB, Dauwels J, Maurice M, Yamaguchi Y, Cichocki A (2009) On the synchrony of steady state visual evoked potentials and oscillatory burst events. Cogn Neurodyn 3(3):251–261
Huberman AD, Feller MB, Chapman B (2008) Mechanisms underlying development of visual maps and receptive fields. Annu Rev Neurosci 31:479–509
Han JW, Zhao SJ, Hu XT, Guo L, Liu TM (2014) Encoding brain network response to free viewing of videos. Cogn Neurodyn 8(5):389–397
Wang RB, Tsuda I, Zhang ZK (2015) A new work mechanism on neuronal activity. Int J Neural Syst 25(03):1450037
Wang RB (2015) Can the activities of the large scale cortical network be expressed by neural energy? Cogn Neurodyn 1:1–5
Wang ZY, Wang RB, Fang RY (2015) Energy coding in neural network with inhibitory neurons. Cogn Neurodyn 9(2):129–144
Wang ZY, Wang RB (2014) Energy distribution property and energy coding of a structural neural network. Front Comput Neurosci 8(8):14
Wang RB, Zhang ZK (2011) Phase synchronization motion and neural coding in dynamic transmission of neural information. IEEE Trans Neural Netw 22(7):1097–1106
Wang RB, Zhang ZK (2007) Energy coding in biological neural network. Cogn Neurodyn 1(3):203–212
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
1.1 Sparse coding
Sparse coding comprises a class of unsupervised methods for learning overcomplete sets of basis vectors for efficient data representation. The aim of sparse coding is to find a set of basis vectors such that an input vector can be represented as a linear combination of them:
$$x = AS = \sum\limits_{i = 1}^{m} {a_{i} s_{i} }$$
where \(x = (x_{1} ,x_{2} , \ldots ,x_{n} )^{T}\) is the input vector, \(A = (a_{1} ,a_{2} , \ldots ,a_{m} )\) is the basis matrix whose ith column \(a_{i}\) is a basis function, and \(S = (s_{1} ,s_{2} , \ldots ,s_{m} )^{T}\) is the coefficient vector. With an overcomplete basis, \(S\) is no longer uniquely determined by the input vector \(x\); we therefore introduce the additional criterion of sparsity. We define sparsity as having few nonzero components, or few components that are not close to zero. The choice of sparsity as a desired characteristic of the representation is motivated by the observation that most sensory data, such as natural images, can be described as the superposition of a small number of atomic elements such as surfaces or edges. Other justifications, such as comparisons with the response properties of neurons in the primary visual cortex, have also been advanced.
Given a set of \(n\) input vectors \(x_{1} ,x_{2} , \ldots ,x_{n}\), we define the sparse coding cost function as:
$$\mathop {\min }\limits_{A,S} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j,i} } } \right\|^{2} } + \lambda \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {H(s_{j,i} )} }$$
where \(a_{j}\) is a basis function, \(s_{j,i}\) is the coefficient of basis function \(j\) for input \(x_{i}\), \(\lambda\) is a constant and \(H( \cdot )\) is a sparsity cost function that penalizes coefficients for being far from zero. A common choice for the sparsity cost is the L1 penalty \(H(s_{j} ) = \left| {s_{j} } \right|_{1}\), but it is not differentiable when the coefficient equals zero; we therefore use the smoothed penalty \(H(s_{j} ) = \sqrt {s_{j}^{2} + \varepsilon }\), where \(\varepsilon\) is a small constant.
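For concreteness, this smoothed penalty and its derivative can be written as a small helper function (a minimal sketch; the default value of \(\varepsilon\) is illustrative):

import numpy as np

def smooth_l1(s, eps=1e-6):
    # Smoothed L1 sparsity cost H(s) = sqrt(s^2 + eps), differentiable at s = 0.
    return np.sqrt(s ** 2 + eps)

def smooth_l1_grad(s, eps=1e-6):
    # Derivative dH/ds = s / sqrt(s^2 + eps), used in gradient-based updates.
    return s / np.sqrt(s ** 2 + eps)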
We interpret the first term of the sparse coding objective as a reconstruction term, which forces the algorithm to provide a good representation of \(x\), and the second term as a sparsity penalty, which forces that representation to be sparse. The constant \(\lambda\) determines the relative importance of these two contributions.
In addition, it is possible to make the sparsity penalty arbitrarily small by scaling down \(s_{j}\) and scaling up \(a_{j}\) by a correspondingly large constant. To prevent this, we constrain the squared norm of every basis function to be no larger than a constant \(C\): \(\left\| {a_{j} } \right\|^{2} \le C,\quad \forall j = 1,2, \ldots ,m\).
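For illustration, one direct way to respect this constraint is to rescale, after each update, any column of \(A\) whose squared norm exceeds \(C\); the relaxation used next replaces this projection with a weight-decay term. A sketch, with \(C = 1\) as an arbitrary choice:

import numpy as np

def project_columns(A, C=1.0):
    # Rescale columns of A so that ||a_j||^2 <= C for every j.
    norms_sq = np.sum(A ** 2, axis=0)
    scale = np.sqrt(np.minimum(1.0, C / np.maximum(norms_sq, 1e-12)))
    return A * scale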
The full sparse coding cost function, including this constraint, is:
$$\mathop {\min }\limits_{A,S} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j,i} } } \right\|^{2} } + \lambda \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {\sqrt {s_{j,i}^{2} + \varepsilon } } }$$
subject to \(\left\| {a_{j} } \right\|^{2} \le C,\quad \forall j = 1,2, \ldots ,m\).
However, the constraint \(\left\| {a_{j} } \right\|^{2} \le C,\quad \forall j = 1,2, \ldots ,m\) cannot be enforced using simple gradient-based methods, so it is weakened to a “weight decay” term designed to keep the entries of \(A\) small. Adding this term to the objective yields the new objective function:
$$F(A,S) = \left\| {X - AS} \right\|_{2}^{2} + \lambda \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {\sqrt {s_{j,i}^{2} + \varepsilon } } } + \gamma \left\| A \right\|_{2}^{2}$$
where \(\lambda\) and \(\gamma\) are constants, \(X = (x_{1} ,x_{2} , \ldots ,x_{n} )\) is the matrix of input vectors, \(A = (a_{1} ,a_{2} , \ldots ,a_{m} )\) is the basis matrix and \(S\) is the coefficient matrix with entries \(s_{j,i}\).
The objective function \(F(A,S)\) is non-convex in \(A\) and \(S\) jointly, and hence difficult to optimize well using gradient-based methods. However, for a given \(A\), the problem of finding the \(S\) that minimizes \(F(A,S)\) is convex; similarly, for a given \(S\), the problem of finding the \(A\) that minimizes \(F(A,S)\) is convex. This suggests alternately optimizing \(A\) for a fixed \(S\) and then optimizing \(S\) for a fixed \(A\).
Setting the gradient of \(F\) with respect to \(A\) to zero for fixed \(S\) gives the analytic solution of \(A\):
$$A = XS^{T} \left( {SS^{T} + \gamma I} \right)^{ - 1}$$
Setting the gradient with respect to \(S\) to zero, with the factor \(1/\sqrt {s_{j,i}^{2} + \varepsilon }\) held at its current value, gives the corresponding solution for each coefficient vector \(s_{i}\) (the ith column of \(S\)):
$$s_{i} = \left( {A^{T} A + \frac{\lambda }{2}\Lambda_{i} } \right)^{ - 1} A^{T} x_{i} ,\quad \Lambda_{i} = {\text{diag}}\left( {1/\sqrt {s_{j,i}^{2} + \varepsilon } } \right)$$
which must be iterated, since \(\Lambda_{i}\) depends on \(s_{i}\).
Accordingly, the basis functions \(a_{i}\) are learned by gradient descent on \(F\), using
$$\frac{\partial F}{\partial A} = - 2\left( {X - AS} \right)S^{T} + 2\gamma A$$
and the coefficients \(s_{i}\) are updated by gradient descent using
$$\frac{\partial F}{\partial S} = - 2A^{T} \left( {X - AS} \right) + \lambda \frac{S}{{\sqrt {S^{2} + \varepsilon } }}$$
where the square root and division in the last term are taken element-wise.
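The following is a minimal NumPy sketch of one pair of alternating updates derived from the objective \(F(A,S)\) above; the closed-form solve for \(A\), the fixed-point style update for \(S\), the number of inner iterations and the default \(\varepsilon\) are assumptions of this sketch rather than the exact implementation used here:

import numpy as np

def update_A(X, S, gamma):
    # Closed-form minimizer of F over A with S fixed:
    # A = X S^T (S S^T + gamma I)^(-1)
    m = S.shape[0]
    return X @ S.T @ np.linalg.inv(S @ S.T + gamma * np.eye(m))

def update_S(X, A, S, lam, eps=1e-6, inner_iters=20):
    # Fixed-point update of S with A fixed: the weight 1/sqrt(s^2 + eps)
    # is re-evaluated at the current S on every pass.
    AtA = A.T @ A
    AtX = A.T @ X
    for _ in range(inner_iters):
        S_new = np.empty_like(S)
        for i in range(X.shape[1]):  # one column (input vector) at a time
            Lam = np.diag(1.0 / np.sqrt(S[:, i] ** 2 + eps))
            S_new[:, i] = np.linalg.solve(AtA + 0.5 * lam * Lam, AtX[:, i])
        S = S_new
    return S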
Running this simple iterative algorithm on the full dataset (10,000 patches) makes each iteration slow and convergence prolonged. To accelerate convergence, the algorithm can instead be run on mini-batches, selecting at each iteration a random subset of 1000 patches from the 10,000 patches.
Faster and better convergence can also be obtained by initializing the feature matrix \(S\) well before using gradient descent (or other methods) to optimize the objective for \(S\) given \(A\). In practice, initializing \(S\) randomly at each iteration results in poor convergence unless a good optimum is found for \(S\) before optimizing \(A\). A better way to initialize \(S\) involves the following steps:
1. Randomly initialize \(A\).
2. Repeat until convergence:
   (a) Select a random mini-batch of patches.
   (b) Initialize \(S\) as \(S = A^{T} X\), dividing each feature by the norm of the corresponding basis vector in \(A\).
   (c) Find the \(S\) that minimizes \(F(A,S)\) for the \(A\) obtained in the previous step.
   (d) Find the \(A\) that minimizes \(F(A,S)\) for the \(S\) found in the previous step.

Using this method, good local optima can be reached relatively quickly; a sketch of the resulting training loop is given below.
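A minimal sketch of this training loop, reusing the update_A and update_S helpers sketched above; the patch dimensions, batch size, hyperparameter values and iteration count are illustrative assumptions, and random data stands in for the actual image patches:

import numpy as np

rng = np.random.default_rng(0)

n_dim, m, n_patches, batch_size = 64, 128, 10000, 1000
X_all = rng.standard_normal((n_dim, n_patches))  # stand-in for 8x8 image patches
lam, gamma = 0.1, 0.01

A = rng.standard_normal((n_dim, m))              # 1. random initialization of A
A /= np.linalg.norm(A, axis=0)                   # start with unit-norm basis vectors

for it in range(200):                            # 2. repeat until convergence
    idx = rng.choice(n_patches, batch_size, replace=False)
    X = X_all[:, idx]                            # (a) random mini-batch of patches
    S = A.T @ X                                  # (b) initialize S = A^T X ...
    S /= np.linalg.norm(A, axis=0)[:, None]      #     ... scaled by each basis norm
    S = update_S(X, A, S, lam)                   # (c) minimize F over S with A fixed
    A = update_A(X, S, gamma)                    # (d) minimize F over A with S fixed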
1.2 ICA and fast ICA
In neurobiology, sparse coding can be interpreted as encoding an input stimulus as completely as possible in the activity of only a few neurons. In this sense, a set of neurons is most efficient when the responses of the individual neurons are statistically independent. ICA attempts to decompose a multivariate signal into independent non-Gaussian signals; applied to natural images, it yields a set of independent linear basis functions and thus reveals the essential characteristics of the data [9]. We can use ICA for feature extraction and image processing, followed by sparse coding of images. The ICA model essentially captures the properties of simple-cell receptive fields in the primary visual cortex: its basis functions resemble Gabor functions, and the spatial receptive fields of primate cells are selectively tuned for location, orientation and frequency, which are precisely the properties of ICA basis functions [9, 13, 16].
Training on images with ICA therefore yields the basis matrix. However, when the independent components are extracted by ordinary gradient descent, the objective function converges very slowly and choosing the step size is difficult.
Fast ICA, developed by Hyvärinen at Helsinki University of Technology, is a fixed-point iterative algorithm with very fast convergence. It seeks an orthogonal rotation of the pre-whitened data, through a fixed-point iteration scheme that maximizes the non-Gaussianity of the rotated components [14].
The steps of fast ICA are as follows:
1. Center and whiten the input data \(X\) to obtain \(Z\).
2. Initialize \(W_{p}\) randomly.
3. Update
$$W_{p} = E [Zg (W_{p}^{T} Z ) ] - E [g^{\prime } (W_{p}^{T} Z ) ]W_{p}$$
where usually \(g( \cdot ) = \tanh ( \cdot )\) and \(E[ \cdot ]\) denotes the averaging (expectation) operation.
4. Deflate against the previously found components:
$$W_{p} = W_{p} - \sum\limits_{j = 1}^{p - 1} { (W_{p}^{T} W_{j} )} W_{j} .$$
5. Normalize: \(W_{p} = W_{p} /\left\| {W_{p} } \right\|.\)
6. Repeat steps 3–5 until \(W_{p}\) converges.
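A minimal NumPy sketch of this deflation scheme; the eigenvalue-based whitening, the convergence tolerance and the recovery of the mixing (basis) matrix through the de-whitening transform are assumptions of the example rather than details taken from the text:

import numpy as np

def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Center and whiten the data: Z = V (X - mean), so that Z Z^T / N = I.
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    V = eigvec @ np.diag(1.0 / np.sqrt(eigval + 1e-12)) @ eigvec.T  # whitening matrix
    Z = V @ Xc

    W = np.zeros((n_components, X.shape[0]))
    for p in range(n_components):
        w = rng.standard_normal(X.shape[0])            # 2. random initialization
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            # 3. fixed-point update with g = tanh, g' = 1 - tanh^2
            w_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
            # 4. deflation against previously extracted components
            w_new -= W[:p].T @ (W[:p] @ w_new)
            # 5. normalization
            w_new /= np.linalg.norm(w_new)
            # 6. stop once w has converged (up to sign)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w

    # Recover the basis (mixing) matrix in the original space by de-whitening.
    A = np.linalg.inv(V) @ W.T
    return A, W, V

Here the rows of W are the estimated unmixing vectors \(W_{p}\) in the whitened space, and the columns of A play the role of the learned basis functions.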
The basis matrix \(A\) can then be recovered from the estimated unmixing vectors \(W_{p}\) together with the whitening transform.