Computational Optimization and Applications

Volume 68, Issue 2, pp 333–362

An alternating direction and projection algorithm for structure-enforced matrix factorization


Abstract

Structure-enforced matrix factorization (SeMF) represents a large class of mathematical models appearing in various forms of principal component analysis, sparse coding, dictionary learning and other machine learning techniques useful in many applications, including neuroscience and signal processing. In this paper, we present a unified algorithmic framework, based on the classic alternating direction method of multipliers (ADMM), for solving a wide range of SeMF problems whose constraint sets permit low-complexity projections. We propose a strategy for adaptively adjusting the penalty parameters, which is key to achieving good performance with ADMM. We conduct extensive numerical experiments comparing the proposed algorithm with a number of state-of-the-art special-purpose algorithms on test problems including dictionary learning for sparse representation and sparse nonnegative matrix factorization. The results show that our unified SeMF algorithm can solve different types of factorization problems as reliably and as efficiently as special-purpose algorithms. In particular, our SeMF algorithm can explicitly enforce various combinatorial sparsity patterns, a capability that, to our knowledge, has not been considered in existing approaches.
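The kind of framework the abstract describes can be illustrated with a minimal sketch (our own, not the authors' code): ADMM applied to min ½‖M − XY‖²_F subject to X and Y lying in sets that admit cheap projections. Auxiliary variables U = X and V = Y carry the structural constraints, so each iteration alternates closed-form least-squares updates with projection steps. The function names, the fixed penalty parameter rho, and the specific splitting are illustrative assumptions, not the paper's exact algorithm (which, in particular, adapts rho).

```python
import numpy as np

def semf_admm(M, k, proj_X, proj_Y, rho=1.0, iters=200, seed=0):
    """Illustrative ADMM sketch for min 0.5*||M - X Y||_F^2
    s.t. X in a set enforced by proj_X, Y in a set enforced by proj_Y.
    Splitting X = U, Y = V; U and V carry the constraints via projection."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    U, V = proj_X(X), proj_Y(Y)
    Lam = np.zeros_like(X)   # scaled multiplier for X = U
    Pi = np.zeros_like(Y)    # scaled multiplier for Y = V
    I = np.eye(k)
    for _ in range(iters):
        # X-update: least squares plus proximity to U (closed form)
        X = (M @ Y.T + rho * (U - Lam)) @ np.linalg.inv(Y @ Y.T + rho * I)
        # Y-update: least squares plus proximity to V
        Y = np.linalg.solve(X.T @ X + rho * I, X.T @ M + rho * (V - Pi))
        # Projection steps enforce the structural constraints
        U = proj_X(X + Lam)
        V = proj_Y(Y + Pi)
        # Scaled dual (multiplier) updates
        Lam += X - U
        Pi += Y - V
    return U, V
```

For example, passing `proj_X = proj_Y = lambda Z: np.maximum(Z, 0.0)` specializes the sketch to nonnegative matrix factorization; swapping in a hard-thresholding projection would instead enforce an explicit sparsity pattern, in the spirit of the combinatorial constraints the abstract mentions.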

Keywords

Matrix factorization · Alternating direction method · Projection · Adaptive penalty parameter · Sparse optimization · Dictionary learning


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Dalian University of Technology, Dalian, People’s Republic of China
  2. Rice University, Houston, USA
