Bach, F. R., Jenatton, R., Mairal, J., & Obozinski, G. (2012). Optimization with sparsity-inducing penalties. *Foundations and Trends in Machine Learning*, *4*(1), 1–106.

Bakin, S. (1999). *Adaptive regression and model selection in data mining problems*. Ph.D. thesis, Australian National University.

Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. *SIAM Journal on Imaging Sciences*, *2*, 183–202.

Bertsekas, D. P. (1999). *Nonlinear programming*. Belmont: Athena Scientific.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In *Proceedings of conference on learning theory (COLT)* (pp. 144–152).

Chang, K. W., Hsieh, C. J., & Lin, C. J. (2008). Coordinate descent method for large-scale l2-loss linear support vector machines. *Journal of Machine Learning Research*, *9*, 1369–1398.

Combettes, P., & Wajs, V. (2005). Signal recovery by proximal forward-backward splitting. *Multiscale Modeling & Simulation*, *4*, 1168–1200.

Crammer, K., & Singer, Y. (2002). On the algorithmic implementation of multiclass kernel-based vector machines. *Journal of Machine Learning Research*, *2*, 265–292.

Dredze, M., Crammer, K., & Pereira, F. (2008). Confidence-weighted linear classification. In *Proceedings of international conference on machine learning (ICML)* (pp. 264–271).

Duchi, J., & Singer, Y. (2009a). Boosting with structural sparsity. In *Proceedings of international conference on machine learning (ICML)* (pp. 297–304).

Duchi, J., & Singer, Y. (2009b). Efficient online and batch learning using forward backward splitting. *Journal of Machine Learning Research*, *10*, 2899–2934.

Elisseeff, A., & Weston, J. (2001). A kernel method for multi-labelled classification. In *Proceedings of neural information processing systems (NIPS)* (pp. 681–687).

Fan, R. E., & Lin, C. J. (2007). *A study on threshold selection for multi-label classification*. Tech. rep., National Taiwan University.

Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. *The Annals of Applied Statistics*, *1*, 302–332.

Friedman, J., Hastie, T., & Tibshirani, R. (2010a). *A note on the group lasso and a sparse group lasso*. Tech. Rep. arXiv:1001.0736.

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010b). Regularization paths for generalized linear models via coordinate descent. *Journal of Statistical Software*, *33*, 1–22.

Fu, W. J. (1998). Penalized regressions: the bridge versus the lasso. *Journal of Computational and Graphical Statistics*, *7*, 397–416.

Lee, Y., Lin, Y., & Wahba, G. (2004). Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. *Journal of the American Statistical Association*, *99*, 67–81.

Mangasarian, O. (2002). A finite Newton method for classification. *Optimization Methods and Software*, *17*, 913–929.

Meier, L., Van de Geer, S., & Bühlmann, P. (2008). The group lasso for logistic regression. *Journal of the Royal Statistical Society. Series B. Statistical Methodology*, *70*(1), 53–71.

Obozinski, G., Taskar, B., & Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. *Statistics and Computing*, *20*(2), 231–252.

Qin, Z., Scheinberg, K., & Goldfarb, D. (2010). *Efficient block-coordinate descent algorithms for the group lasso*. Tech. rep., Columbia University.

Richtárik, P., & Takáč, M. (2012a). Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. *Mathematical Programming*, 1–38.

Richtárik, P., & Takáč, M. (2012b). *Parallel coordinate descent methods for big data optimization*. Tech. Rep. arXiv:1212.0873.

Rifkin, R., & Klautau, A. (2004). In defense of one-vs-all classification. *Journal of Machine Learning Research*, *5*, 101–141.

Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2010). Pegasos: primal estimated sub-gradient solver for SVM. *Mathematical Programming*, 1–28.

Shevade, S. K., & Keerthi, S. S. (2003). A simple and efficient algorithm for gene selection using sparse logistic regression. *Bioinformatics*, *19*(17), 2246–2253.

Tseng, P., & Yun, S. (2009). A coordinate gradient descent method for nonsmooth separable minimization. *Mathematical Programming*, *117*, 387–423.

Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009). Feature hashing for large scale multitask learning. In *Proceedings of international conference on machine learning (ICML)* (pp. 1113–1120).

Weston, J., & Watkins, C. (1999). Support vector machines for multi-class pattern recognition. In *Proceedings of European symposium on artificial neural networks, computational intelligence and machine learning* (pp. 219–224).

Wright, S. J. (2012). Accelerated block-coordinate relaxation for regularized optimization. *SIAM Journal on Optimization*, *22*, 159–186.

Wright, S. J., Nowak, R. D., & Figueiredo, M. A. T. (2009). Sparse reconstruction by separable approximation. *IEEE Transactions on Signal Processing*, *57*(7), 2479–2493.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. *Journal of the Royal Statistical Society, Series B*, *68*, 49–67.

Yuan, G. X., Chang, K. W., Hsieh, C. J., & Lin, C. J. (2010). A comparison of optimization methods and software for large-scale l1-regularized linear classification. *Journal of Machine Learning Research*, *11*, 3183–3234.

Yuan, G. X., Ho, C. H., & Lin, C. J. (2011). An improved glmnet for l1-regularized logistic regression. In *Proceedings of the international conference on knowledge discovery and data mining* (pp. 33–41).

Zhang, H. H., Liu, Y., Wu, Y., & Zhu, J. (2006). Variable selection for multicategory SVM via sup-norm regularization. *Electronic Journal of Statistics*, *2*, 149–167.

Zhao, P., & Yu, B. (2006). On model selection consistency of lasso. *Journal of Machine Learning Research*, *7*, 2541–2563.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. *Journal of the Royal Statistical Society, Series B*, *67*, 301–320.