D-trace estimation of a precision matrix using adaptive Lasso penalties

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

The accurate estimation of a precision matrix plays a crucial role in the current age of high-dimensional data explosion. To deal with this problem, one of the prominent and commonly used techniques is the \(\ell _1\) norm (Lasso) penalization for a given loss function. This approach guarantees the sparsity of the precision matrix estimate for properly selected penalty parameters. However, the \(\ell _1\) norm penalization often fails to control the bias of the obtained estimator because of its overestimation behavior. In this paper, we introduce two adaptive extensions of the recently proposed \(\ell _1\) norm penalized D-trace loss minimization method. These extensions aim to reduce the bias in the resulting estimator. Extensive numerical results, using both simulated and real datasets, show the advantage of our proposed estimators.


References

  • Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley-Interscience, New York

  • Banerjee O, El Ghaoui L, d’Aspremont A (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res 9:485–516

  • Banerjee S, Ghosal S (2015) Bayesian structure learning in graphical models. J Multivar Anal 136:147–162

  • Bickel PJ, Levina E (2008) Regularized estimation of large covariance matrices. Ann Stat 36(1):199–227

  • Cai T, Liu W, Luo X (2011) A constrained \({\ell _1}\) minimization approach to sparse precision matrix estimation. J Am Stat Assoc 106(494):594–607

  • Cai T, Yuan M (2012) Adaptive covariance matrix estimation through block thresholding. Ann Stat 40(4):2014–2042

  • Cui Y, Leng C, Sun D (2016) Sparse estimation of high-dimensional correlation matrices. Comput Stat Data Anal 93:390–403

  • d’Aspremont A, Banerjee O, Ghaoui L (2008) First-order methods for sparse covariance selection. SIAM J Matrix Anal Appl 30:56–66

  • Dempster A (1972) Covariance selection. Biometrics 28(1):157–175

  • Deng X, Tsui K (2013) Penalized covariance matrix estimation using a matrix-logarithm transformation. J Comput Graph Stat 22(2):494–512

  • Duchi J, Gould S, Koller D (2008) Projected subgradient methods for learning sparse Gaussians. In: Proceedings of the 24th conference on uncertainty in artificial intelligence, pp 153–160. arXiv:1206.3249

  • El Karoui N (2008) Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann Stat 36(6):2717–2756

  • Fan J, Feng J, Wu Y (2009) Network exploration via the adaptive Lasso and SCAD penalties. Ann Appl Stat 3(2):521–541

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Frahm G, Memmel C (2010) Dominating estimator for minimum-variance portfolios. J Econom 159:289–302

  • Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical Lasso. Biostatistics 9(3):432–441

  • Goto S, Xu Y (2015) Improving mean variance optimization through sparse hedging restrictions. J Finan Quant Anal 50(6):1415–1441

  • Haff LR (1980) Estimation of the inverse covariance matrix: random mixtures of the inverse Wishart matrix and the identity. Ann Stat 8(3):586–597

  • Hsieh C-J, Dhillon IS, Ravikumar PK, Sustik MA (2011) Sparse inverse covariance matrix estimation using quadratic approximation. In: Advances in neural information processing systems, vol 24, pp 2330–2338

  • Huang S, Li J, Sun L, Ye J, Fleisher A, Wu T, Chen K, Reiman E (2010) Learning brain connectivity of Alzheimer’s disease by sparse inverse covariance estimation. NeuroImage 50:935–949

  • Johnstone IM (2001) On the distribution of the largest eigenvalue in principal component analysis. Ann Stat 29(3):295–327

  • Jorissen RN, Lipton L, Gibbs P, Chapman M, Desai J, Jones IT, Yeatman TJ, East P, Tomlinson IP, Verspaget HW, Aaltonen LA, Kruhøffer M, Orntoft TF, Andersen CL, Sieber OM (2008) DNA copy-number alterations underlie gene expression differences between microsatellite stable and unstable colorectal cancers. Clin Cancer Res 14(24):8061–8069

  • Kourtis A, Dotsis G, Markellos N (2012) Parameter uncertainty in portfolio selection: shrinking the inverse covariance matrix. J Bank Finan 36:2522–2531

  • Kuerer HM, Newman LA, Smith TL, Ames FC, Hunt KK, Dhingra K, Theriault RL, Singh G, Binkley SM, Sneige N, Buchholz TA, Ross MI, McNeese MD, Buzdar AU, Hortobagyi GN, Singletary SE (1999) Clinical course of breast cancer patients with complete pathologic primary tumor and axillary lymph node response to doxorubicin-based neoadjuvant chemotherapy. J Clin Oncol 17(2):460–469

  • Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37(6B):4254–4278

  • Lauritzen S (1996) Graphical models. Clarendon Press, Oxford

  • Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88:365–411

  • Ledoit O, Wolf M (2012) Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann Stat 40(2):1024–1060

  • Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, New York

  • Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442–451

  • Maurya A (2014) A joint convex penalty for inverse covariance matrix estimation. Comput Stat Data Anal 75:15–27

  • McLachlan GJ (2004) Discriminant analysis and statistical pattern recognition. Wiley, New Jersey

  • Meinshausen N (2007) Relaxed Lasso. Comput Stat Data Anal 52(1):374–393

  • Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the Lasso. Ann Stat 34(3):1436–1462

  • Nguyen TD, Welsch RE (2010) Outlier detection and robust covariance estimation using mathematical programming. Adv Data Anal Classif 4(4):301–334

  • Ravikumar P, Wainwright M, Raskutti G, Yu B (2011) High-dimensional covariance estimation by minimizing \(\ell _1\)-penalized log-determinant divergence. Electron J Stat 5:935–980

  • Rothman A, Bickel P, Levina E (2009) Generalized thresholding of large covariance matrices. J Am Stat Assoc 104(485):177–186

  • Rothman A, Bickel P, Levina E, Zhu J (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515

  • Rothman AJ (2012) Positive definite estimators of large covariance matrices. Biometrika 99(2):733–740

  • Ryali S, Chen T, Supekar K, Menon V (2012) Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59(4):3852–3861

  • Schafer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4(1):Article 32

  • Scheinberg K, Ma S, Goldfarb D (2010) Sparse inverse covariance selection via alternating linearization methods. In: Advances in neural information processing systems, vol 23, pp 2101–2109

  • Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, deLongueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK (2010) The microarray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28(8):827–838

  • Stifanelli PF, Creanza TM, Anglani R, Liuzzi VC, Mukherjee S, Schena FP, Ancona N (2013) A comparative study of covariance selection models for the inference of gene regulatory networks. J Biomed Inf 46:894–904

  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58(1):267–288

  • Touloumis A (2015) Nonparametric Stein-type shrinkage covariance matrix estimators in high-dimensional settings. Comput Stat Data Anal 83:251–261

  • van de Geer S, Buhlmann P, Zhou S (2010) The adaptive and the thresholded Lasso for potentially misspecified models. arXiv preprint arXiv:1001.5176

  • Wang Y, Daniels MJ (2014) Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor. J Multivar Anal 130:21–26

  • Warton DI (2008) Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J Am Stat Assoc 103(481):340–349

  • Whittaker J (1990) Graphical models in applied multivariate statistics. Wiley, Chichester

  • Witten DM, Friedman JH, Simon N (2011) New insights and faster computations for the graphical Lasso. J Comput Graph Stat 20(4):892–900

  • Xue L, Ma S, Zou H (2012) Positive-definite \(\ell _1\)-penalized estimation of large covariance matrices. J Am Stat Assoc 107(500):1480–1491

  • Yin J, Li J (2013) Adjusting for high-dimensional covariates in sparse precision matrix estimation by \(\ell _1\)-penalization. J Multivar Anal 116:365–381

  • Yuan M (2010) High dimensional inverse covariance matrix estimation via linear programming. J Mach Learn Res 11:2261–2286

  • Yuan M, Lin Y (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94(1):19–35

  • Zerenner T, Friederichs P, Lehnertz K, Hense A (2014) A Gaussian graphical model approach to climate networks. Chaos: An Interdisciplinary Journal of Nonlinear Science 24(2):023103

  • Zhang C-H, Huang J (2008) The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat 36(4):1567–1594

  • Zhang T, Zou H (2014) Sparse precision matrix estimation via Lasso penalized D-trace loss. Biometrika 101(1):103–120

  • Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429

Acknowledgments

We would like to thank the Associate Editor, Coordinating Editor and two anonymous referees for their helpful comments that led to an improvement of this article. We express our gratitude to Teng Zhang and Hui Zou for sharing their Matlab code that solves the \(\ell _1\) norm penalized D-trace loss minimization problem. Andrés M. Alonso gratefully acknowledges financial support from CICYT (Spain) Grants ECO2012-38442 and ECO2015-66593. Francisco J. Nogales and Vahe Avagyan were supported by the Spanish Government through project MTM2013-44902-P. This paper is based on the first author’s dissertation submitted to the Universidad Carlos III de Madrid. At the time of publication, Vahe Avagyan is a Postdoctoral fellow at Ghent University.

Author information

Correspondence to Vahe Avagyan.

Appendices

Appendix A: Numerical results

See Tables 5, 6, 7, 8, 9, 10, 11 and 12.

Table 5 Average KL losses (with standard deviations) over 100 replications
Table 6 Average Frobenius norm losses (with standard deviations) over 100 replications
Table 7 Average operator norm losses (with standard deviations) over 100 replications
Table 8 Average matrix \(\ell _1\) norm losses (with standard deviations) over 100 replications
Table 9 Average specificity (with standard deviations) over 100 replications
Table 10 Average sensitivity (with standard deviations) over 100 replications
Table 11 Average MCC (with standard deviations) over 100 replications
Table 12 Average accuracy (with standard deviations) over 100 replications

Appendix B: Algorithms

In this section, we describe in detail the steps of the algorithm, based on the alternating direction method, used to obtain the estimators DT, ADT and WADT. First, we introduce two auxiliary matrices \(\Omega _0\) and \(\Omega _1\). Next, we consider the following optimization problem in place of problem (3):

$$\begin{aligned} \widehat{\Omega }_{\text {DT}}=\arg \min _{\Omega _1\succeq \epsilon I}\ \dfrac{1}{2}\text {trace}(\Omega ^2 S)-\text {trace}(\Omega )+\tau ||\Omega _0||_{1,\text {off}}, \quad \text {subject to}\ \ \{\Omega ,\Omega \}=\{\Omega _0,\Omega _1\}. \end{aligned}$$
(21)
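
For illustration, the following is a minimal Python/NumPy sketch of the penalized objective in (21), with the penalty taken as the off-diagonal \(\ell _1\) norm \(||\Omega ||_{1,\text {off}}\) used by the DT estimator. The function name and interface are ours, for illustration only (the authors' own code is in Matlab); we use Python/NumPy for all code sketches in this appendix.

```python
import numpy as np

def dtrace_objective(Omega, S, tau):
    """Penalized D-trace loss: 0.5 * tr(Omega^2 S) - tr(Omega) + tau * ||Omega||_{1,off}.

    Omega : (p, p) symmetric candidate precision matrix.
    S     : (p, p) sample covariance matrix.
    tau   : non-negative penalty parameter.
    """
    dtrace_loss = 0.5 * np.trace(Omega @ Omega @ S) - np.trace(Omega)
    off_diag_l1 = np.abs(Omega).sum() - np.abs(np.diag(Omega)).sum()  # off-diagonal l1 norm
    return dtrace_loss + tau * off_diag_l1
```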

Note that problems (3) and (21) are equivalent. The augmented Lagrangian of problem (21) has the following form:

$$\begin{aligned} L(\Omega ,\Omega _0,\Omega _1,\Lambda _0,\Lambda _1)= \dfrac{1}{2}\text {trace}(\Omega ^2 S)-\text {trace}(\Omega )+\tau ||\Omega _0||_{1,\text {off}}+h(\Omega _1\succeq \epsilon I) +\text {trace}(\Lambda _0(\Omega -\Omega _0))+\text {trace}(\Lambda _1(\Omega -\Omega _1)) +\dfrac{\rho }{2}||\Omega -\Omega _0||_2^2+\dfrac{\rho }{2}||\Omega -\Omega _1||_2^2, \end{aligned}$$
(22)

where \(\Lambda _0\) and \(\Lambda _1\) are the Lagrange multipliers, \(\rho >0\) is the augmented Lagrangian parameter, and \(h(\Omega _1\succeq \epsilon I)\) is an indicator function, which returns 0 if the statement \(\Omega _1\succeq \epsilon I\) is true and \(\infty \) otherwise. For simplicity, we take \(\rho =1\). Assume that \((\Omega ^t,\Omega ^t_0,\Omega ^t_1,\Lambda _0^t,\Lambda _1^t)\) is the solution at step t, for \(t=0,1,2,\ldots \). The solution is updated as follows:

$$\begin{aligned} \Omega ^{t+1}=\arg \min _{\Omega =\Omega ^T}L(\Omega ,\Omega ^t_0,\Omega ^t_1,\Lambda _0^t,\Lambda _1^t), \end{aligned}$$
(23)
$$\begin{aligned} \{\Omega ^{t+1}_0,\Omega ^{t+1}_1\}=\underset{\Omega _0=\Omega _0^T, \Omega _1\succeq \epsilon I}{\text {argmin}}L(\Omega ^{t+1},\Omega _0,\Omega _1,\Lambda _0^t,\Lambda _1^t), \end{aligned}$$
(24)
$$\begin{aligned} \{\Lambda ^{t+1}_0,\Lambda ^{t+1}_1\}=\{\Lambda ^{t}_0,\Lambda ^{t}_1\}+\{\Omega ^{t+1}-\Omega ^{t+1}_0,\Omega ^{t+1}-\Omega ^{t+1}_1\}. \end{aligned}$$
(25)

From Eq. (23) we have the following:

$$\begin{aligned} \Omega ^{t+1}=\underset{\Omega =\Omega ^T}{\text {argmin}}\dfrac{1}{2} \text {trace}(\Omega ^2 (S+2I))-\text {trace}(\Omega (I+\Omega _0^t+\Omega _1^t-\Lambda _0^t-\Lambda _1^t)). \end{aligned}$$
(26)

First, for any \(p\times p\) symmetric matrix \(Z\succ 0\) and any \(p\times p\) symmetric matrix \(Y\), we define a matrix \(G(Z,Y)\). Assuming that \(Z=UVU^T\) is the eigendecomposition of \(Z\), with \(V=\text {diag}(v_1,\ldots ,v_p)\) and eigenvalues \(v_1\ge \cdots \ge v_p\), we define

$$\begin{aligned} G(Z,Y)=U\{(U^T Y U)\circ C \}U^T, \end{aligned}$$
(27)

where \(C_{i,j}=\dfrac{2}{v_i+v_j}\) for \(1\le i,j\le p\) and \(\circ \) denotes the Hadamard product of matrices. Zhang and Zou (2014) proved that the solution of problem (26) can be written as \(\Omega ^{t+1}=G(S+2I,\ I+\Omega _0^t+\Omega _1^t-\Lambda _0^t-\Lambda _1^t )\). For more details, we refer to Theorem 1 in Zhang and Zou (2014).
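
A minimal NumPy sketch of the operator \(G(Z,Y)\) in (27) and of the resulting \(\Omega \)-update (26); the function names are ours, and the code is an illustration of the formulas above rather than the authors' implementation.

```python
import numpy as np

def g_operator(Z, Y):
    """G(Z, Y) = U ((U^T Y U) o C) U^T, where Z = U diag(v) U^T is the
    eigendecomposition of the symmetric matrix Z and C_ij = 2 / (v_i + v_j)."""
    v, U = np.linalg.eigh(Z)                 # eigenvalues v and orthonormal eigenvectors U
    C = 2.0 / (v[:, None] + v[None, :])      # C_ij = 2 / (v_i + v_j)
    return U @ ((U.T @ Y @ U) * C) @ U.T     # '*' is the entrywise (Hadamard) product

def update_omega(S, Omega0, Omega1, Lambda0, Lambda1):
    """Omega-update (26): Omega^{t+1} = G(S + 2I, I + Omega0 + Omega1 - Lambda0 - Lambda1)."""
    I = np.eye(S.shape[0])
    return g_operator(S + 2.0 * I, I + Omega0 + Omega1 - Lambda0 - Lambda1)
```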

From the first part of Eq. (24) it follows that

$$\begin{aligned} \Omega _0^{t+1}=\underset{\Omega _0=\Omega _0^T}{\text {argmin}}\dfrac{1}{2} \text {trace}(\Omega _0^2)-\text {trace}(\Omega _0(\Omega ^{t+1}+\Lambda _0^t))+\tau ||\Omega _0||_{1,\text {off}}. \end{aligned}$$
(28)

We rewrite the problem (28) in the following form:

$$\begin{aligned} \Omega _0^{t+1}=\underset{\Omega _0=\Omega _0^T}{\text {argmin}}\dfrac{1}{2} \text {trace}(\Omega _0^2)-\text {trace}(\Omega _0 A)+\tau ||\Omega _0||_{1,\text {off}}, \end{aligned}$$
(29)

where \(A=\Omega ^{t+1}+\Lambda _0^t\). It is easy to check that the solution of problem (29) is given by \(\Omega _0^{t+1}=T(A, \tau )\), where the operator T is defined in (6). As mentioned earlier, (29) is the crucial step of the algorithm, since it leads to the soft-thresholding operator T. Thus, by substituting the operator T with the operator AT or WAT, we obtain our proposed estimators ADT or WADT, respectively.
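
A sketch of the thresholding step (29), under the assumption that \(T(A,\tau )\) soft-thresholds the off-diagonal entries of A and leaves the diagonal untouched, and that the adaptive operators replace the common threshold \(\tau \) with entrywise thresholds \(\tau w_{ij}\) built from a preliminary estimate. Both function names are ours, and the weighted variant is only our reading of the AT/WAT operators defined earlier in the paper.

```python
import numpy as np

def soft_threshold_off(A, tau):
    """Assumed form of T(A, tau): entrywise soft-thresholding of the off-diagonal
    entries of A, with the diagonal left unpenalized."""
    Out = np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
    np.fill_diagonal(Out, np.diag(A))        # keep the diagonal entries of A
    return Out

def soft_threshold_off_weighted(A, tau, W):
    """Adaptive variant (our reading of AT/WAT): entry (i, j) uses its own
    threshold tau * W[i, j], with W computed from an initial estimate."""
    Out = np.sign(A) * np.maximum(np.abs(A) - tau * W, 0.0)
    np.fill_diagonal(Out, np.diag(A))
    return Out
```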

From the second part of Eq. (24) it follows that

$$\begin{aligned} \Omega _1^{t+1}=\underset{\Omega _1\succeq \epsilon I}{\text {argmin}}\dfrac{1}{2} \text {trace}(\Omega _1^2)-\text {trace}(\Omega _1(\Omega ^{t+1}+\Lambda _1^t)). \end{aligned}$$
(30)

The solution of the problem (30) is given as

$$\begin{aligned} \Omega _1^{t+1}=[\Omega ^{t+1}+\Lambda _1^t]_+, \end{aligned}$$
(31)

where, for any symmetric matrix Z with eigendecomposition \(Z=U_Z \text {diag}(\alpha _1,\ldots ,\alpha _p)U_Z^T\), the operator \([Z]_+\) is defined as \([Z]_+=U_Z \text {diag}(\max \{\alpha _1,\epsilon \},\ldots ,\max \{\alpha _p,\epsilon \})U_Z^T\).
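
A one-function NumPy sketch of the projection (31); the function name is ours, and the default value of \(\epsilon \) matches the value used at the end of this appendix.

```python
import numpy as np

def project_eigenfloor(Z, eps=1e-8):
    """[Z]_+ : eigendecompose the symmetric matrix Z and raise every eigenvalue
    below eps up to eps, so that the result satisfies Z >= eps * I."""
    alpha, U_Z = np.linalg.eigh(Z)
    return U_Z @ np.diag(np.maximum(alpha, eps)) @ U_Z.T
```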

Having provided all the steps of the alternating direction method above, we summarize them in Algorithm 1.

Algorithm 1
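
The algorithm itself appears in the paper as a figure; as a reading aid, the following sketch assembles the updates (23)–(25) into a single loop for the plain DT case with \(\rho =1\). It assumes that the helper functions from the sketches above (g_operator, soft_threshold_off, project_eigenfloor) are available in the same module; the identity-matrix initialization is our choice, and the stopping rule follows the relative-change criterion stated at the end of this appendix. The adaptive estimators would use the weighted thresholding operator instead of soft_threshold_off.

```python
import numpy as np
# Assumed to be defined as in the sketches above:
# g_operator, soft_threshold_off, project_eigenfloor

def admm_dtrace(S, tau, eps=1e-8, tol=1e-7, max_iter=500):
    """Sketch of Algorithm 1 (DT penalty, rho = 1) for problem (21)."""
    p = S.shape[0]
    I = np.eye(p)
    Omega, Omega0, Omega1 = I.copy(), I.copy(), I.copy()   # initialization (our choice)
    Lambda0, Lambda1 = np.zeros((p, p)), np.zeros((p, p))

    def rel_change(new, old):
        # ||new - old|| / max(1, ||old||, ||new||), with ||.|| taken as the Frobenius norm
        return np.linalg.norm(new - old) / max(1.0, np.linalg.norm(old), np.linalg.norm(new))

    for _ in range(max_iter):
        # (23): quadratic subproblem, solved in closed form via G
        Omega_new = g_operator(S + 2.0 * I, I + Omega0 + Omega1 - Lambda0 - Lambda1)
        # (24), first part: thresholding subproblem (T; AT/WAT for the adaptive estimators)
        Omega0_new = soft_threshold_off(Omega_new + Lambda0, tau)
        # (24), second part: projection onto {eigenvalues >= eps}
        Omega1_new = project_eigenfloor(Omega_new + Lambda1, eps)
        # (25): dual updates
        Lambda0 = Lambda0 + (Omega_new - Omega0_new)
        Lambda1 = Lambda1 + (Omega_new - Omega1_new)

        stop = (rel_change(Omega_new, Omega) < tol) and (rel_change(Omega0_new, Omega0) < tol)
        Omega, Omega0, Omega1 = Omega_new, Omega0_new, Omega1_new
        if stop:
            break
    return Omega0   # sparse iterate; at convergence Omega, Omega0 and Omega1 approximately coincide
```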

It is important to note that we can significantly reduce the computational time of Algorithm 1 by discarding the constraint \(\Omega \succeq \epsilon I\) in the initial optimization problem (DT, ADT or WADT). This allows us to omit the update \(\Omega _1^{t+1}=[\Omega ^{t+1}+\Lambda _1^t]_+\) in step 2(c) of Algorithm 1, which is the most computationally expensive part of the algorithm. We call the optimization problem without the constraint \(\Omega \succeq \epsilon I\) the secondary problem, defined as:

$$\begin{aligned} \tilde{\Omega }=\arg \min _{\Omega ^T = \Omega }\ \dfrac{1}{2}\text {trace}(\Omega ^2 S)-\text {trace}(\Omega )+\tau \text {PEN}(\Omega ), \end{aligned}$$
(32)

where the \(\text {PEN}(\Omega )\) term is defined according to the estimation method (DT, ADT or WADT). Following Zhang and Zou (2014), we also present a simplified version of Algorithm 1, namely Algorithm 2.

Algorithm 2

In other words, if \(\tilde{\Omega }\succeq \epsilon I\), we set \(\hat{\Omega }=\tilde{\Omega }\); otherwise, we use Algorithm 1 to find \(\hat{\Omega }\), taking \(\tilde{\Omega }\) as the initial value. Clearly, Algorithm 2 is not self-contained, and a run of Algorithm 1 may still be required in some cases. Nevertheless, using Algorithm 2 may reduce the computational time considerably.
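
A sketch of this simplified scheme follows: solve the secondary problem (32) first and fall back to Algorithm 1 only if the eigenvalue constraint is violated. The two-block ADMM used below for (32) is one natural choice (it simply drops the projection step of Algorithm 1), not necessarily the exact scheme used by the authors; it again assumes the helpers g_operator and soft_threshold_off from the earlier sketches, as well as admm_dtrace from the Algorithm 1 sketch.

```python
import numpy as np
# Assumed to be defined as in the sketches above:
# g_operator, soft_threshold_off, admm_dtrace

def solve_secondary(S, tau, tol=1e-7, max_iter=500):
    """Two-block ADMM sketch for the secondary problem (32) (DT penalty, rho = 1):
    Algorithm 1 without the Omega1 block and without the projection step."""
    p = S.shape[0]
    I = np.eye(p)
    Omega, Omega0 = I.copy(), I.copy()
    Lambda0 = np.zeros((p, p))
    for _ in range(max_iter):
        # quadratic subproblem: argmin 0.5*tr(Omega^2 (S+I)) - tr(Omega (I + Omega0 - Lambda0))
        Omega_new = g_operator(S + I, I + Omega0 - Lambda0)
        Omega0_new = soft_threshold_off(Omega_new + Lambda0, tau)
        Lambda0 = Lambda0 + (Omega_new - Omega0_new)
        done = np.linalg.norm(Omega0_new - Omega0) < tol * max(1.0, np.linalg.norm(Omega0))
        Omega, Omega0 = Omega_new, Omega0_new
        if done:
            break
    return Omega0

def dtrace_estimate(S, tau, eps=1e-8):
    """Sketch of Algorithm 2: accept the secondary solution when it already satisfies
    the eigenvalue constraint; otherwise fall back to Algorithm 1 (which in practice
    would be warm-started at the secondary solution)."""
    Omega_tilde = solve_secondary(S, tau)
    if np.linalg.eigvalsh(Omega_tilde).min() >= eps:
        return Omega_tilde
    return admm_dtrace(S, tau, eps=eps)
```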

For both algorithms, we declare convergence when the following two conditions are satisfied:

$$\begin{aligned} \dfrac{||\Omega ^{t+1}-\Omega ^{t}||_2}{\max (1,||\Omega ^{t}||_2,||\Omega ^{t+1}||_2)}<10^{-7}, \qquad \dfrac{||\Omega ^{t+1}_0-\Omega ^{t}_0||_2}{\max (1,||\Omega ^{t}_0||_2,||\Omega ^{t+1}_0||_2)}<10^{-7}. \end{aligned}$$

Finally, in both algorithms we use \(\epsilon =10^{-8}\).
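
In code, the criterion above amounts to the following check, with \(||\cdot ||_2\) taken here as the Frobenius norm (one possible reading of the norm in the display); the function name is ours.

```python
import numpy as np

def has_converged(Omega_new, Omega_old, Omega0_new, Omega0_old, tol=1e-7):
    """Relative-change stopping rule used for both algorithms."""
    def rel_change(new, old):
        return np.linalg.norm(new - old) / max(1.0, np.linalg.norm(old), np.linalg.norm(new))
    return rel_change(Omega_new, Omega_old) < tol and rel_change(Omega0_new, Omega0_old) < tol
```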

For more details, we refer to Zhang and Zou (2014).

Cite this article

Avagyan, V., Alonso, A.M. & Nogales, F.J. D-trace estimation of a precision matrix using adaptive Lasso penalties. Adv Data Anal Classif 12, 425–447 (2018). https://doi.org/10.1007/s11634-016-0272-8
