Structured Sparsity: Discrete and Convex Approaches

  • Anastasios Kyrillidis
  • Luca Baldassarre
  • Marwa El Halabi
  • Quoc Tran-Dinh
  • Volkan Cevher
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)


Over the past decades, sparsity has proven to be of significant importance in fields such as compression, signal sampling and analysis, machine learning, and optimization. Indeed, most natural data can be sparsely represented: a small set of coefficients suffices to describe the data in an appropriate basis. Sparsity is also used to enhance interpretability in real-life applications, where the relevant information typically resides in a low-dimensional space. However, the true underlying structure of many signal processing and machine learning problems is often more sophisticated than sparsity alone. In practice, what distinguishes applications is the presence of specific sparsity patterns among the coefficients. To better understand the impact of such structured sparsity patterns, in this chapter we review some realistic sparsity models and unify their convex and non-convex treatments. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and hierarchical models. We also consider more general structures defined by set functions and present their convex proxies. Further, we discuss efficient optimization methods for structured sparsity problems and illustrate structured sparsity in action via three applications: image processing, neuronal signal processing, and confocal imaging.
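The abstract contrasts discrete (non-convex) sparsity models with their convex relaxations. The pure-Python sketch below illustrates that contrast for the simplest case only, assuming non-overlapping groups that partition the coefficient indices. The chapter's dispersive, hierarchical, and overlapping models require different algorithms. The function names are hypothetical and are not taken from the chapter. `hard_threshold` and `group_hard_threshold` are Euclidean projections onto the plain k-sparse and g-group-sparse models, the kind of step used inside iterative hard thresholding. `group_soft_threshold` is the proximal operator of the group lasso penalty, a standard convex proxy for group sparsity, as used in proximal gradient methods.

```python
import math


def hard_threshold(x, k):
    """Project x onto the k-sparse vectors: keep the k largest-magnitude entries."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def group_hard_threshold(x, groups, g):
    """Project x onto vectors supported on at most g groups.

    Assumes the groups are disjoint. The projection error equals the energy
    of the discarded groups, so the best choice keeps the g groups with the
    largest energy ||x_G||_2^2. Indices outside every group are set to zero.
    """
    energy = [sum(x[i] ** 2 for i in grp) for grp in groups]
    keep = sorted(range(len(groups)), key=lambda j: energy[j], reverse=True)[:g]
    out = [0.0] * len(x)
    for j in keep:
        for i in groups[j]:
            out[i] = x[i]
    return out


def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_G ||x_G||_2 for disjoint groups.

    Each group is shrunk toward zero by lam in Euclidean norm, and any group
    whose norm is at most lam is zeroed out entirely. Indices outside every
    group are not penalized, so they are returned unchanged.
    """
    out = [float(v) for v in x]
    for grp in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in grp))
        scale = 1.0 - lam / norm if norm > lam else 0.0
        for i in grp:
            out[i] = scale * x[i]
    return out


if __name__ == "__main__":
    x = [3.0, -1.0, 0.5, 0.2, 2.0, 1.5]
    groups = [[0, 1], [2, 3], [4, 5]]
    print(hard_threshold(x, 2))
    print(group_hard_threshold(x, groups, 1))
    print(group_soft_threshold(x, groups, 1.0))
```

In this toy example, keeping the two largest entries selects indices 0 and 4, whereas keeping the single most energetic group selects indices 0 and 1. The two models therefore choose different supports for the same data. The convex operator instead shrinks the surviving groups and zeroes out the weak middle group.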


Keywords (machine-generated, not supplied by the authors): Discrete Wavelet Transform; Discrete Model; Compressive Sensing; Convex Relaxation; Submodular Function



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Anastasios Kyrillidis, Luca Baldassarre, Marwa El Halabi, Quoc Tran-Dinh, Volkan Cevher: EPFL, Lausanne, Switzerland
