LSALSA: accelerated source separation via learned sparse coding
We propose an efficient algorithm for the generalized sparse coding (SC) inference problem. The proposed framework applies both to the single-dictionary setting, where each data point is represented as a sparse combination of the columns of one dictionary matrix, and to the multiple-dictionary setting of morphological component analysis (MCA), where the goal is to separate a signal into additive parts such that each part admits a distinct sparse representation in an appropriately chosen dictionary. Both the SC task and its generalization via MCA have been cast as \(\ell _1\)-regularized optimization problems that minimize a quadratic reconstruction error. To accelerate the traditional acquisition of sparse codes, we propose a deep learning architecture that constitutes a trainable, time-unfolded version of the Split Augmented Lagrangian Shrinkage Algorithm (SALSA), a special case of the alternating direction method of multipliers (ADMM). We empirically validate both variants of the algorithm, which we refer to as learned SALSA (LSALSA), on computer vision tasks and demonstrate that at inference time our networks substantially improve both the running time and the quality of the estimated sparse codes over common baselines, on both the classic SC and the MCA problems. We also demonstrate the visual advantage of our technique on the task of source separation. Finally, we present a theoretical framework for analyzing the LSALSA network: we show that the proposed approach exactly implements a truncated ADMM applied to a new, learned cost function whose curvature is modified by one of the learned parameter matrices. We extend a recent stochastic alternating optimization analysis framework to show that a gradient descent step along this learned loss landscape is equivalent to a modified gradient descent step along the original loss landscape.
In this framework, the acceleration achieved by LSALSA could potentially be explained by the network’s ability to learn a correction to the gradient that points in a direction of steeper descent.
Keywords: Sparse coding · Morphological component analysis · Deep learning
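The baseline that LSALSA unrolls can be summarized concretely. For the single-dictionary problem \(\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1\), SALSA splits the variable (\(x = v\)) and alternates a quadratic solve, a soft-thresholding step, and a dual update. The sketch below is a minimal NumPy illustration of these iterations, not the paper's implementation; the function name `salsa`, the penalty `mu`, and the iteration count are illustrative choices. In LSALSA, the fixed matrices and thresholds of a truncated version of this loop become learned parameters of an unfolded network.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft-thresholding: proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def salsa(y, A, lam=0.1, mu=1.0, n_iter=100):
    """SALSA/ADMM iterations for min_x 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    m, n = A.shape
    # The matrix inverted in every x-update; SALSA precomputes/caches this solve.
    G = A.T @ A + mu * np.eye(n)
    Aty = A.T @ y
    x = np.zeros(n)
    v = np.zeros(n)  # split copy of x carrying the l1 term
    d = np.zeros(n)  # scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(G, Aty + mu * (v - d))  # quadratic subproblem
        v = soft_threshold(x + d, lam / mu)         # shrinkage step
        d = d + x - v                               # dual update
    return v
```

On a small synthetic problem (`y = A @ x_true` with a sparse `x_true`), a few hundred such iterations recover the support of the true code; the learned, truncated variant aims to reach comparable code quality in a fixed, much smaller number of unfolded steps.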