
In-Network Principal Component Analysis with Diffusion Strategies

International Journal of Wireless Information Networks

Abstract

Principal component analysis (PCA) is a well-known statistical analysis technique. In its conventional formulation, it requires the eigen-decomposition of the sample covariance matrix. Because of their high computational complexity and large memory requirements, the estimation of the covariance matrix and its eigen-decomposition do not scale to big data, such as data collected in large-scale networks. Numerous studies have been conducted to overcome this issue, often by partitioning the unknown matrix. In this paper, we propose a novel framework for estimating the principal axes, iteratively and in a distributed in-network scheme, without the need to estimate the covariance matrix. To this end, criteria for iterative PCA are coupled with several strategies for in-network processing. The investigated strategies can be grouped into two classes: noncooperative strategies, and cooperative strategies such as information diffusion and consensus. Theoretical results on the performance of these strategies are provided, as well as a convergence analysis. The performance of the proposed approach for in-network PCA is illustrated on diverse applications, such as image processing and time series in wireless sensor networks, with a comparison to state-of-the-art techniques.


References

  1. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2(4), 433–459 (2010). doi:10.1002/wics.101


  2. Honeine, P.: Online kernel principal component analysis: a reduced-order model. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(9), 1814–1826 (2012)


  3. Chitradevi, N., Baskaran, K., Palanisamy, V., Aswini, D.: Designing an efficient PCA based data model for wireless sensor networks. In: Proceedings of the 1st International Conference on Wireless Technologies for Humanitarian Relief, ACWR ’11, pp. 147–154. ACM, New York, NY, USA (2011)

  4. Chen, F., Wen, F., Jia, H.: Algorithm of data compression based on multiple principal component analysis over the WSN. In: 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM), pp. 1–4 (2010)

  5. Rooshenas, A., Rabiee, H., Movaghar, A., Naderi, M.: Reducing the data transmission in wireless sensor networks using the principal component analysis. In: Sixth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp. 133–138 (2010). doi:10.1109/ISSNIP.2010.5706781

  6. Ahmadi Livani, M., Abadi, M.: A PCA-based distributed approach for intrusion detection in wireless sensor networks. In: 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), pp. 55–60 (2011)

  7. Huang, L., Nguyen, X., Garofalakis, M., Jordan, M., Joseph, A., Taft, N.: In-network PCA and anomaly detection. In: Advances in Neural Information Processing Systems 19, pp. 617–624. MIT Press, Cambridge, MA (2006)


  8. Bunch, J.R., Nielsen, C.P.: Updating the singular value decomposition. Numerische Mathematik 31, 111–129 (1978)


  9. Hall, P.M., Marshall, A.D., Martin, R.R.: Merging and splitting eigenspace models. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(9), 1042–1049 (2000)


  10. Kargupta, H., Huang, W., Sivakumar, K., Park, B.H., Wang, S.: Collective principal component analysis from distributed, heterogeneous data. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 452–457. Springer-Verlag, London, UK (2000)

  11. Le Borgne, Y.A., Raybaud, S., Bontempi, G.: Distributed principal component analysis for wireless sensor networks. Sensors 8(8), 4821–4850 (2008). doi:10.3390/s8084821


  12. Ahmadi Livani, M., Abadi, M.: A PCA-based distributed approach for intrusion detection in wireless sensor networks. In: 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), pp. 55–60 (2011). doi:10.1109/CNDS.2011.5764585

  13. Fellus, J., Picard, D., Gosselin, P.: Dimensionality reduction in decentralized networks by gossip aggregation of principal components analyzers. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, pp. 171–176 (2014)

  14. Boyd, S., Ghosh, A., Prabhakar, B., Shah, D.: Randomized gossip algorithms. IEEE Transactions on Information Theory 52(6), 2508–2530 (2006). doi:10.1109/TIT.2006.874516


  15. Asensio-Marco, C., Beferull-Lozano, B.: Fast average gossiping under asymmetric links in WSNs. In: Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), pp. 131–135 (2014)

  16. Oja, E.: Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15(3), 267–273 (1982)


  17. Sayed, A.H., Tu, S.Y., Chen, J., Zhao, X., Towfic, Z.J.: Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior. IEEE Signal Processing Magazine 30(3), 155–171 (2013). doi:10.1109/MSP.2012.2231991


  18. Dony, R.D., Haykin, S.: Neural network approaches to image compression. Proceedings of the IEEE 83(2), 288–303 (1995). doi:10.1109/5.364461


  19. DeGroot, M.H.: Reaching a consensus. Journal of the American Statistical Association 69(345), 118–121 (1974)


  20. Tu, S.Y., Sayed, A.H.: Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks. IEEE Transactions on Signal Processing 60(12), 6217–6234 (2012). doi:10.1109/TSP.2012.2217338


  21. Nedic, A., Ozdaglar, A.: Cooperative distributed multi-agent optimization. In: Palomar, D.P., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 340–386. Cambridge University Press (2009). http://dx.doi.org/10.1017/CBO9780511804458.011

  22. Yuan, K., Ling, Q., Yin, W.: On the convergence of decentralized gradient descent. arXiv preprint arXiv:1310.7063 (2013)

  23. Ghadban, N., Honeine, P., Mourad-Chehade, F., Francis, C., Farah, J.: Diffusion strategies for in-network principal component analysis. In: 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2014). doi:10.1109/MLSP.2014.6958849

  24. Möller, R.: First-order approximation of Gram-Schmidt orthonormalization beats deflation in coupled PCA learning rules. Neurocomputing 69(13–15), 1582–1590 (2006)


  25. Srivastava, V.: A unified view of the orthogonalization methods. Journal of Physics A: Mathematical and General 33(35), 6219 (2000)


  26. Sanger, T.D.: Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 2, 459–473 (1989)


  27. Sanger, T.D.: Two iterative algorithms for computing the singular value decomposition from input/output samples. In: Cowan, J.D., Tesauro, G., Alspector, J. (eds.) Advances in Neural Information Processing Systems 6, pp. 144–151 (1994)

  28. Chen, L.H., Chang, S.: An adaptive learning algorithm for principal component analysis. IEEE Transactions on Neural Networks 6(5), 1255–1263 (1995)


  29. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A survey on sensor networks. IEEE Communications Magazine 40, 102–114 (2002)


  30. Chellaboina, V., Haddad, W.M.: Structured matrix norms for real and complex block-structured uncertainty. Automatica 33(5), 995–997 (1997)


  31. Lancaster, P., Tismenetsky, M.: The Theory of Matrices: With Applications. Computer Science and Scientific Computing Series (1985). https://books.google.fr/books?id=m8z6Xh1A3t8C

  32. Gazi, V., Passino, K.M.: Swarm Stability and Optimization, 1st edn. Springer (2011)


Acknowledgments

This work is supported by the Région Champagne-Ardenne (grant “WiDiD”) and the Lebanese University.


Corresponding author

Correspondence to Nisrine Ghadban.

Appendix 1: Theoretical Results on the Performance of Diffusion Strategies

In this appendix, we study the theoretical performance of the diffusion strategies for in-network PCA in terms of convergence and recursive error.

1.1 General Diffusion Model

We introduce a general diffusion model of which the ATC and CTA strategies are special cases. We consider the extraction of the first principal axis; the case of multiple axes can be derived as in the previous section. The general diffusion model is:

$$\begin{aligned} \varvec{\phi }_{k,t}&=\sum_{l\in{\mathcal{V}}_{k}} a_{1,kl} \, \varvec{w}_{l,t-1},\\ \varvec{\psi }_{k,t}&=\varvec{\phi }_{k,t}+\eta_{k,t} \, \sum_{l\in{\mathcal{V}}_k}c_{lk}\nabla_{\varvec{w}} J_l(\varvec{\phi }_{k,t}),\\ \varvec{w}_{k,t}&=\sum_{l\in{\mathcal{V}}_{k}} a_{2,kl} \, \varvec{\psi }_{l,t}, \end{aligned}$$

where \(a_{1,kl}\), \(c_{lk}\) and \(a_{2,kl}\) are nonnegative coefficients. Let \(\varvec{A}_1\), \(\varvec{C}\) and \(\varvec{A}_2\) be the matrices of these coefficients, respectively, so that the k-th columns of \(\varvec{A}_1\) and \(\varvec{A}_2\) consist respectively of \(\{ a_{1,kl}, l=1,\ldots ,N\}\) and \(\{ a_{2,kl}, l=1,\ldots ,N\}\), and the k-th row of \(\varvec{C}\) consists of \(\{ c_{kl}, l=1,\ldots ,N\}\). These matrices satisfy the following conditions:

$$\varvec{A}_1^\top {{1\!\!1}}_N={{1\!\!1}}_N, \quad \varvec{C}{{1\!\!1}}_N={{1\!\!1}}_N, \quad \varvec{A}_2^\top {{1\!\!1}}_N={{1\!\!1}}_N,$$

where \({{1\!\!1}}_N\) is the \((N\times 1)\) vector whose entries are equal to one. In other words, \(\varvec{A}_1\) and \(\varvec{A}_2\) are left stochastic matrices and \(\varvec{C}\) is a right stochastic matrix. Let \(\varvec{I}_N\) be the \((N \times N)\) identity matrix. Cooperation strategies are derived according to the choice of matrices \(\varvec{A}_1\), \(\varvec{C}\) and \(\varvec{A}_2\). For example, if \(\varvec{A}_1=\varvec{I}_N\) and \(\varvec{A}_2=\varvec{A}\), we get the ATC strategy. If \(\varvec{A}_1=\varvec{A}\) and \(\varvec{A}_2=\varvec{I}_N\), we have the CTA strategy. The choice \(\varvec{C}=\varvec{I}_N\) is the case of no information exchange with the exception of the aggregation step. Furthermore, if \(\varvec{A}_1=\varvec{A}_2=\varvec{C}=\varvec{I}_N\), we have the noncooperative strategy. Table 2 shows these different cases.

Table 2 Different choices of \(\varvec{A}_1\), \(\varvec{C}\) and \(\varvec{A}_2\) corresponding to different strategies
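To make these choices concrete, the following Python/NumPy sketch builds a simple left stochastic combination matrix from a network adjacency matrix and maps each strategy of Table 2 to its triplet \((\varvec{A}_1, \varvec{C}, \varvec{A}_2)\). The helper names (`uniform_combination_matrix`, `strategy_matrices`) and the uniform weighting rule are illustrative choices, not prescribed by the paper; any left stochastic \(\varvec{A}\) and right stochastic \(\varvec{C}\) are admissible.

```python
import numpy as np

def uniform_combination_matrix(adjacency):
    """Left stochastic combination matrix from a symmetric adjacency matrix:
    column k holds the (uniform) weights node k assigns to its neighborhood,
    including itself."""
    A = adjacency.astype(float) + np.eye(adjacency.shape[0])  # add self-loops
    return A / A.sum(axis=0, keepdims=True)                   # columns sum to 1

def strategy_matrices(A, strategy, C=None):
    """Return (A1, C, A2) for the strategies of Table 2. C defaults to the
    identity (no gradient exchange); any right stochastic matrix may be used."""
    N = A.shape[0]
    I = np.eye(N)
    if C is None:
        C = I
    if strategy == "ATC":            # adapt-then-combine: A1 = I, A2 = A
        return I, C, A
    if strategy == "CTA":            # combine-then-adapt: A1 = A, A2 = I
        return A, C, I
    return I, I, I                   # noncooperative: A1 = A2 = C = I

# Example: 4-node ring network
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
A = uniform_combination_matrix(adj)
A1, C, A2 = strategy_matrices(A, "ATC")
assert np.allclose(A1.T @ np.ones(4), np.ones(4))  # A1 is left stochastic
assert np.allclose(C @ np.ones(4), np.ones(4))     # C is right stochastic
```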

1.2 Recursive Error

Our goal is to examine the convergence of the estimated \(\varvec{w}_{k,t}\) obtained from the diffusion strategies toward the optimal solution of the centralized strategy, denoted \(\varvec{w}_*\). For this, we introduce the following error vectors:

$$\begin{aligned} \widetilde{\varvec{\phi }}_{k,t}&= \varvec{w}_*-\varvec{\phi }_{k,t} \\ \widetilde{\varvec{\psi }}_{k,t}&= \varvec{w}_*-\varvec{\psi }_{k,t} \\ \widetilde{\varvec{w}}_{k,t}&= \varvec{w}_*-\varvec{w}_{k,t} \end{aligned}$$

Each of these error vectors measures the residue on the principal axis \(\varvec{w}_*\). Before studying the convergence, we recall that:

$$y_{\varvec{w},k}=\varvec{w}^\top \varvec{x}_k =\varvec{x}_k^\top \varvec{w}.$$
(12)

Multiplying both sides on the left by \(\varvec{x}_k\) and taking the expectation, we get:

$$\begin{aligned}{\mathbb{E}}(\varvec{x}_k y_{\varvec{w},k})&={\mathbb{E}}(\varvec{x}_k\varvec{x}_k^\top )\varvec{w}, \\ \varvec{m}_{w,k}&=\varvec{\mathfrak{C}}\varvec{w},\end{aligned}$$
(13)

where \(\varvec{m}_{w,k}={\mathbb{E}}(\varvec{x}_k y_{\varvec{w},k})\). We deduce that, at the optimum,

$$\varvec{w}_*=\varvec{\mathfrak{C}}^{-1}\varvec{m}_{w,k}$$
(14)

is the solution of a linear system of equations, and that this solution can be computed by each node directly from \(\varvec{\mathfrak{C}}\) and \(\varvec{m}_{w,k}\). It is helpful to reinterpret this solution as the minimizer of the mean-square reconstruction error. Consider then the cost function:

$$J_k^{\text{exp}}(\varvec{w})= \tfrac{1}{4} {\mathbb{E}} \Vert \varvec{x}_k - y_{\varvec{w},k} \varvec{w}\Vert ^2.$$
(15)

By developing this expression, we get:

$$J_k^{\text{exp}}(\varvec{w})=\tfrac{1}{4} \Big [{\mathbb{E}} \Vert \varvec{x}_k \Vert ^2 -\varvec{m}_{w,k}^\top \varvec{w}-\varvec{w}^\top \varvec{m}_{w,k}+\varvec{w}^\top {\mathbb{E}}(y_{\varvec{w},k}^2)\varvec{w}\Big ].$$

Taking the gradient of \(J_k^{\text{exp}}(\varvec{w})\) with respect to \(\varvec{w}\) gives:

$$\nabla_{\varvec{w}} J_k^{\text{exp}}(\varvec{w})= {\mathbb{E}}(y_{\varvec{w},k}^2) \varvec{w}-\varvec{m}_{w,k}.$$
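Relation (13) also holds exactly for sample averages, since \(\tfrac{1}{T}\sum_t \varvec{x}_k(t) y_{\varvec{w},k}(t) = \big(\tfrac{1}{T}\sum_t \varvec{x}_k(t)\varvec{x}_k(t)^\top\big)\varvec{w}\). The short NumPy sketch below (synthetic data, illustrative variable names) checks this identity and evaluates the gradient expression given above.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 1000
X = rng.standard_normal((T, p)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # T samples at node k
w = rng.standard_normal(p)

y = X @ w                             # y_{w,k} = w^T x_k for every sample
m_wk = X.T @ y / T                    # sample estimate of E(x_k y_{w,k})
C_hat = X.T @ X / T                   # sample second-moment matrix (covariance for zero-mean data)
assert np.allclose(m_wk, C_hat @ w)   # Eq. (13): m_{w,k} = C w

grad = np.mean(y**2) * w - m_wk       # gradient of J_k^exp as stated above
```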

Substituting this gradient into the general diffusion model, we obtain:

$$\varvec{\phi }_{k,t}=\sum_{l\in{\mathcal{V}}_{k}} a_{1,kl} \, \varvec{w}_{l,t-1},$$
(16)
$$\varvec{\psi }_{k,t}=\varvec{\phi }_{k,t}+\eta_{k,t} \, \sum_{l\in{\mathcal{V}}_k}c_{lk}(\varvec{m}_{w,k}- {\mathbb{E}}(y_{\varvec{w},k}^2) \varvec{\phi }_{k,t}),$$
(17)
$$\varvec{w}_{k,t}=\sum_{l\in{\mathcal{V}}_{k}} a_{2,kl} \, \varvec{\psi }_{l,t},$$
(18)
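For completeness, a minimal NumPy sketch of the recursion (16)–(18) is given below. It assumes that each node holds a local sample covariance (whereas the body of the paper works with streaming, instantaneous estimates), uses a constant step size, and follows the general model in letting node k use the gradients of its neighbors \(l\in\mathcal{V}_k\); all function and variable names are illustrative.

```python
import numpy as np

def diffusion_pca_first_axis(X_nodes, A1, C, A2, eta=0.05, n_iter=300, seed=0):
    """Sketch of the general diffusion recursion (16)-(18) for the first
    principal axis, with per-node sample covariances."""
    rng = np.random.default_rng(seed)
    N, p = len(X_nodes), X_nodes[0].shape[1]
    covs = [Xk.T @ Xk / Xk.shape[0] for Xk in X_nodes]    # local covariance estimates
    W = rng.standard_normal((N, p))
    W /= np.linalg.norm(W, axis=1, keepdims=True)         # rows: current estimates w_k
    for _ in range(n_iter):
        Phi = A1.T @ W                                    # Eq. (16): phi_k = sum_l a1_{kl} w_l
        Psi = np.empty_like(Phi)
        for k in range(N):                                # Eq. (17): adaptation step
            grad = np.zeros(p)
            for l in range(N):
                if C[l, k] != 0.0:
                    m_l = covs[l] @ Phi[k]                # m evaluated with neighbor l's data
                    Ey2 = Phi[k] @ covs[l] @ Phi[k]       # E(y^2) with neighbor l's data
                    grad += C[l, k] * (m_l - Ey2 * Phi[k])
            Psi[k] = Phi[k] + eta * grad
        W = A2.T @ Psi                                    # Eq. (18): w_k = sum_l a2_{kl} psi_l
    return W
```

With cooperative choices of the combination matrices and a small step size, the node estimates converge toward a common direction approximating the network-wide principal axis, up to sign.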

Using Eqs. (12) and (13), and the fact that \(\varvec{w}_*\) is an eigenvector of the covariance matrix \(\varvec{\mathfrak{C}}\) with associated eigenvalue \(\lambda\), we have:

$$\varvec{m}_{w,k}=\varvec{\mathfrak{C}}\varvec{w}_*=\lambda \varvec{w}_*,$$

and

$${\mathbb{E}}(y_{\varvec{w},k}^2) = {\mathbb{E}}(\varvec{w}_*^\top \varvec{x}_k\varvec{x}_k^\top \varvec{w}_*) = \varvec{w}_*^\top \varvec{\mathfrak{C}}\varvec{w}_* = \lambda \Vert \varvec{w}_*\Vert ^2 =\lambda,$$

since \(\varvec{w}_*\) is a unit-norm vector. Subtracting \(\varvec{w}_*\) from both sides of the Eqs. (16)–(18), we get:

$$\widetilde{\varvec{\phi }}_{k,t}=\sum_{l\in{\mathcal{V}}_{k}} a_{1,kl} \, \widetilde{\varvec{w}}_{l,t-1},$$
(19)
$$\widetilde{\varvec{\psi }}_{k,t}=\Big ( \varvec{I}_p - \eta_{k,t} \, \sum_{l\in{\mathcal{V}}_k}c_{lk}\lambda \Big ) \widetilde{\varvec{\phi }}_{k,t},$$
(20)
$$\widetilde{ \varvec{w}}_{k,t}=\sum_{l\in{\mathcal{V}}_{k}} a_{2,kl} \, \widetilde{\varvec{\psi }}_{l,t}.$$
(21)

We can describe these relations compactly by collecting the information of the whole network into block vectors and matrices. We stack the error vectors of all nodes into \((N\times 1)\) block vectors, whose individual entries are each of size \((p \times 1)\):

$$\widetilde{\varvec{\psi }}_t=\left[\begin{array}{l} \widetilde{\varvec{\psi }}_{1,t} \\ \widetilde{\varvec{\psi }}_{2,t} \\ \vdots \\ \widetilde{\varvec{\psi }}_{N,t} \\ \end{array}\right], \quad \widetilde{\varvec{\phi }}_t=\left[\begin{array}{l} \widetilde{\varvec{\phi }}_{1,t} \\ \widetilde{\varvec{\phi }}_{2,t} \\ \vdots \\ \widetilde{\varvec{\phi }}_{N,t} \\ \end{array}\right], \quad \widetilde{\varvec{w}}_t=\left[\begin{array}{l} \widetilde{\varvec{w}}_{1,t} \\ \widetilde{\varvec{w}}_{2,t} \\ \vdots \\ \widetilde{\varvec{w}}_{N,t} \\ \end{array}\right].$$

These block vectors represent the error of the network at the iteration t. We also introduce the \((N\times N)\) diagonal matrices:

$$\begin{aligned} \varvec{{\mathcal{M}}}&= {\text{diag}}\left\{ \eta_{1,t}, \eta_{2,t}, \ldots , \eta_{N,t} \right\} \\ \varvec{\mathcal{L}}&= {\text{diag}} \left\{ \sum_{l\in{\mathcal{V}}_1}c_{l1}\lambda , \sum_{l\in{\mathcal{V}}_2}c_{l2}\lambda , \ldots , \sum_{l\in{\mathcal{V}}_N}c_{lN}\lambda \right\} \end{aligned}$$

In the case where \(\varvec{C}=\varvec{I}_N\), we have \(\varvec{\mathcal{L}}=\lambda \varvec{I}_N\). Furthermore, we introduce the Kronecker product:

$$\varvec{\mathcal{A}}_1=\varvec{A}_1\otimes \varvec{I}_p,\quad \varvec{\mathcal{A}}_2=\varvec{A}_2\otimes \varvec{I}_p$$

The matrix \(\varvec{\mathcal{A}}_1\) is an \((N\times N)\) block matrix, whose \((l,k)\)-th block is \(a_{1,lk}\varvec{I}_p\); similarly for \(\varvec{\mathcal{A}}_2\). Using these definitions, we can write Eqs. (19)–(21) as follows:

$$\widetilde{\varvec{\phi }}_{t}=\varvec{\mathcal{A}}_1^\top \, \widetilde{\varvec{w}}_{t-1},$$
(22)
$$\widetilde{\varvec{\psi }}_{t}=\Big ( \varvec{I}_N - \varvec{\mathcal{M}}\varvec{\mathcal{L}}\Big ) \widetilde{\varvec{\phi }}_{t},$$
(23)
$$\widetilde{ \varvec{w}}_{t}=\varvec{\mathcal{A}}_2^\top \, \widetilde{\varvec{\psi }}_{t}.$$
(24)
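These block quantities are conveniently formed with a Kronecker product in NumPy; the following small check (arbitrary sizes and an arbitrary left stochastic matrix) confirms the block structure described above.

```python
import numpy as np

N, p = 4, 3
A1 = np.full((N, N), 1.0 / N)        # some left stochastic choice (uniform weights)
cal_A1 = np.kron(A1, np.eye(p))      # (Np x Np) block matrix
# the (l, k)-th (p x p) block equals a_{1,lk} * I_p
l, k = 0, 1
assert np.allclose(cal_A1[l*p:(l+1)*p, k*p:(k+1)*p], A1[l, k] * np.eye(p))
```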

Therefore, the network error \(\widetilde{ \varvec{w}}_{t}\) evolves according to the following dynamics:

$$\widetilde{ \varvec{w}}_{t}=\varvec{\mathcal{A}}_2^\top (\varvec{I}_N - \varvec{\mathcal{M}}\varvec{\mathcal{L}}) \varvec{\mathcal{A}}_1^\top \widetilde{ \varvec{w}}_{t-1}.$$
(25)

If each node minimizes its own cost function \(J_k^{\text{exp}}(\varvec{w})\) using the noncooperative strategy, then the error vector of the N nodes evolves according to the following dynamics:

$$\widetilde{ \varvec{w}}_{t}= (\varvec{I}_N - \lambda \varvec{\mathcal{M}}) \widetilde{ \varvec{w}}_{t-1},$$

where, since \(\varvec{A}_1=\varvec{A}_2=\varvec{C}=\varvec{I}_N\) in this case, the matrices \(\varvec{\mathcal{A}}_1\) and \(\varvec{\mathcal{A}}_2\) do not appear and \(\varvec{\mathcal{L}}\) boils down to \(\lambda \varvec{I}_N\).

1.3 Convergence

The expression of the recursive error vector (25) involves block vectors and matrices. To examine the stability and convergence of this error, we rely on standard results for block vectors and matrices [30]. Indeed, the error vector \(\widetilde{ \varvec{w}}_{t}\) in Eq. (25) converges to zero if, and only if, the matrix \(\varvec{\mathcal{A}}_2^\top (\varvec{I}_N - \varvec{\mathcal{M}}\varvec{\mathcal{L}}) \varvec{\mathcal{A}}_1^\top\) is stable, i.e., all its eigenvalues lie strictly inside the unit disc [31]. Since \(\varvec{A}_1\) and \(\varvec{A}_2\) are left stochastic matrices [32], \(\varvec{\mathcal{A}}_2^\top (\varvec{I}_N - \varvec{\mathcal{M}}\varvec{\mathcal{L}}) \varvec{\mathcal{A}}_1^\top\) is stable if \((\varvec{I}_N - \varvec{\mathcal{M}}\varvec{\mathcal{L}})\) is stable. This stability holds under the following condition:

$$\eta_{k,t}<\frac{2}{\sum_{l\in{\mathcal{V}}_k} c_{lk} \lambda}$$
(26)

We note that this stability does not depend on the combination matrices \(\varvec{A}_1\) and \(\varvec{A}_2\), but only on the matrix \(\varvec{C}\). If there is no information exchange between the nodes apart from the aggregation step, i.e., \(\varvec{C}=\varvec{I}_N\), this condition becomes:

$$\eta_{k,t}<\frac{2}{ \lambda }$$
(27)

Similarly, in the case of the noncooperative strategy, i.e., \(\varvec{A}_1=\varvec{A}_2=\varvec{C}=\varvec{I}_N\), the same condition holds.
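As a numerical illustration of conditions (26) and (27), the sketch below evaluates the spectral radius of the recursion matrix in Eq. (25) for step sizes below and above the bound \(2/\lambda\) (here with \(\varvec{C}=\varvec{I}_N\)). Since the Kronecker factor \(\varvec{I}_p\) does not affect the spectral radius, the computation is carried out with the \((N\times N)\) matrices; the network, eigenvalue and step sizes are arbitrary examples.

```python
import numpy as np

def spectral_radius(B):
    return np.max(np.abs(np.linalg.eigvals(B)))

N, lam = 4, 2.0                                   # lam: largest covariance eigenvalue
A = np.full((N, N), 1.0 / N)                      # doubly stochastic combination matrix
A1, A2, C = np.eye(N), A, np.eye(N)               # ATC with C = I (aggregation only)

L = np.diag([lam * C[:, k].sum() for k in range(N)])   # diag{ sum_l c_{lk} * lambda }
for eta in (0.25 / lam, 1.5 / lam, 2.5 / lam):          # the bound is 2 / lam
    M = eta * np.eye(N)
    B = A2.T @ (np.eye(N) - M @ L) @ A1.T
    print(f"eta = {eta:.3f}  spectral radius = {spectral_radius(B):.3f}")
# radii: 0.75 and 0.50 (stable, step size below the bound), 1.50 (unstable, above it)
```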


Cite this article

Ghadban, N., Honeine, P., Mourad-Chehade, F. et al. In-Network Principal Component Analysis with Diffusion Strategies. Int J Wireless Inf Networks 23, 97–111 (2016). https://doi.org/10.1007/s10776-016-0308-1
