Higher Order Fused Regularization for Supervised Learning with Grouped Parameters

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)


In supervised learning, we often encounter situations in which parameters form groups of more than two. For example, parameters may correspond to words expressing the same meaning, music pieces in the same genre, or books released in the same year. Given such auxiliary information, it is reasonable to suppose that the parameters in a group play similar roles in the problem and take similar values. In this paper, we propose Higher Order Fused (HOF) regularization, which incorporates smoothness among parameters with group structure as prior knowledge in supervised learning. We define the HOF penalty as the Lovász extension of a submodular higher-order potential function; used as a regularizer, it encourages the parameters in a group to take similar estimated values. Moreover, we develop an efficient network flow algorithm for calculating the proximity operator for the regularized problem. We investigate the empirical performance of the proposed algorithm on synthetic and real-world data.
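To make the idea of a penalty built from a Lovász extension concrete, the following is a minimal sketch and not the paper's actual potential, which the abstract does not specify. It assumes a hypothetical group potential F(S) = min(|S ∩ G|, |G \ S|), a concave function of cardinality and therefore submodular, and evaluates its Lovász extension with the standard greedy (sorting) formula. The names `lovasz_extension` and `group_potential` are illustrative only.

```python
def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a set function F (with F(empty) = 0)
    at the point w, using the greedy ordering: sort coordinates in
    decreasing order and weight each by the marginal gain of F."""
    order = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    value = 0.0
    prefix = set()
    prev = F(frozenset())
    for i in order:
        prefix.add(i)
        cur = F(frozenset(prefix))
        value += w[i] * (cur - prev)  # marginal gain of adding index i
        prev = cur
    return value


def group_potential(group):
    """Hypothetical higher-order potential for one group (an assumption,
    not taken from the paper): F(S) = min(|S & G|, |G - S|)."""
    group = frozenset(group)

    def F(S):
        k = len(S & group)
        return min(k, len(group) - k)

    return F


# Penalty for a group of three parameters and for a pair.
penalty_triple = lovasz_extension(group_potential([0, 1, 2]), [3.0, 1.0, 2.0])
penalty_pair = lovasz_extension(group_potential([0, 1]), [1.0, 4.0])
penalty_equal = lovasz_extension(group_potential([0, 1, 2]), [5.0, 5.0, 5.0])
```

Under this assumed potential, a group of two parameters yields the absolute difference of the two values, as in the fused lasso, and a group of three yields the difference between the largest and smallest value. In both cases the penalty vanishes when all parameters in the group are equal, which is the smoothing behaviour the abstract describes.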





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. NTT Communication Science Laboratories, Kyoto, Japan
  2. The Institute of Scientific and Industrial Research (ISIR), Osaka University, Osaka, Japan
