Knowledge and Information Systems, Volume 55, Issue 3, pp 671–694

Heterogeneous representation learning with separable structured sparsity regularization

  • Pei Yang
  • Qi Tan
  • Yada Zhu
  • Jingrui He
Regular Paper


Motivated by real-world applications, heterogeneous learning has emerged as an important research area that aims to model the coexistence of multiple types of heterogeneity. In this paper, we propose a heterogeneous representation learning model with structured sparsity regularization (HERES) to learn from multiple types of heterogeneity. HERES leverages the rich correlations (e.g., task relatedness, view consistency, and label correlation) and the prior knowledge (e.g., the soft clustering of tasks) of heterogeneous data to improve learning performance. To this end, HERES integrates multi-task, multi-view, and multi-label learning into a principled framework based on representation learning to model the complex correlations, and employs structured sparsity to encode the prior knowledge of the data. The objective simultaneously minimizes the reconstruction loss incurred when the factor matrices are used to recover the heterogeneous data and the structured sparsity penalty imposed on the model. The resulting optimization problem is challenging because the structured sparsity penalty is both non-smooth and non-separable. We reformulate the problem using an auxiliary function and prove that the reformulation is separable, which leads to a family of efficient algorithms for solving structured sparsity penalized problems. Furthermore, we propose various HERES models based on different loss functions and subsume them into the weighted HERES, which is able to handle missing data. Experimental results in comparison with state-of-the-art methods demonstrate the effectiveness of the proposed approach.
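The key computational idea above, replacing a non-smooth, non-separable structured sparsity penalty with a separable surrogate through an auxiliary function, can be illustrated on a toy problem. The sketch below is not the HERES algorithm. It assumes a simple vector denoising objective instead of matrix factorization, an elementwise weight vector `m` in which 0 marks a missing entry (a loose analogue of the weighted loss), and a smoothed overlapping group-lasso penalty. As the auxiliary function it uses the standard variational bound sqrt(a^2 + eps^2) <= (a^2 + eps^2) / (2 * eta) + eta / 2, which is tight at eta = sqrt(a^2 + eps^2). All names (`reweighted_group_sparse`, `lam`, `eps`, `iters`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def objective(w: Sequence[float], y: Sequence[float], m: Sequence[float],
              groups: Sequence[Sequence[int]], lam: float, eps: float) -> float:
    """Weighted squared loss plus a smoothed overlapping group-lasso penalty."""
    loss = 0.5 * sum(mj * (wj - yj) ** 2 for wj, yj, mj in zip(w, y, m))
    penalty = sum(math.sqrt(sum(w[j] ** 2 for j in g) + eps ** 2) for g in groups)
    return loss + lam * penalty


def reweighted_group_sparse(y: Sequence[float], m: Sequence[float],
                            groups: Sequence[Sequence[int]], lam: float,
                            eps: float = 1e-6,
                            iters: int = 50) -> Tuple[List[float], List[float]]:
    """Alternate between auxiliary variables eta and a separable w-update.

    Hypothetical illustration of an auxiliary-function (majorize-minimize)
    scheme; it is not the HERES solver from the paper.
    """
    n = len(y)
    w = [mj * yj for yj, mj in zip(y, m)]  # start from the observed data
    history = [objective(w, y, m, groups, lam, eps)]
    for _ in range(iters):
        # Auxiliary step: choosing eta_g at the current w makes the bound tight.
        eta = [math.sqrt(sum(w[j] ** 2 for j in g) + eps ** 2) for g in groups]
        # With eta fixed, every group contributes w_j^2 / (2 * eta_g) to each of
        # its coordinates, so overlapping groups simply add up per coordinate.
        c = [0.0] * n
        for g, eg in zip(groups, eta):
            for j in g:
                c[j] += 1.0 / eg
        # Separable step: each w_j minimizes its own 1-D quadratic in closed form.
        # A coordinate with weight 0 (missing) and no group is left at 0.
        w = [
            (mj * yj) / (mj + lam * cj) if mj + lam * cj > 0 else 0.0
            for yj, mj, cj in zip(y, m, c)
        ]
        history.append(objective(w, y, m, groups, lam, eps))
    return w, history


if __name__ == "__main__":
    y = [3.0, -0.2, 0.1, 2.5, 0.05]
    m = [1.0, 1.0, 1.0, 0.0, 1.0]      # entry 3 is treated as missing
    groups = [[0, 1], [1, 2], [3, 4]]  # groups overlap on coordinate 1
    w, history = reweighted_group_sparse(y, m, groups, lam=0.5)
    print([round(v, 4) for v in w])
    print(history[0], history[-1])
```

Because the auxiliary variables are held fixed during the w-update, the coupling between overlapping groups disappears and each coordinate is updated by a single division. Alternating the two steps is a majorize-minimize scheme, so the smoothed objective is non-increasing across iterations, and a zero weight keeps a missing entry from influencing the fit.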


Keywords: Heterogeneous learning, Multi-task learning, Multi-view learning, Multi-label learning, Structured sparsity



This work is supported by the National Natural Science Foundation of China under Grant No. 61473123, the National Science Foundation under Grant No. IIS-1552654, the ONR under Grant No. N00014-15-1-2821, NASA under Grant No. NNX17AJ86A, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.


  1. 1.
    Argyriou A, Evgeniou T, Pontil M (2006) Multi-task feature learning. In: NIPS, pp 41–48Google Scholar
  2. 2.
    Argyriou A, Micchelli CA, Pontil M, Shen L, Xu Y (2011) Efficient first order methods for linear composite regularizers. CoRR, arXiv:1104.1436
  3. 3.
    Bhatia K, Jain H, Kar P, Varma M, Jain P (2015) Sparse local embeddings for extreme multi-label classification. In: NIPS, pp 730–738Google Scholar
  4. 4.
    Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: COLT, pp 92–100Google Scholar
  5. 5.
    Caruana R (1997) Multitask learning. Mach. Learn. 28(1):41–75MathSciNetCrossRefGoogle Scholar
  6. 6.
    Chang X, Nie F, Yang Y, Huang H (2014) A convex formulation for semi-supervised multi-label feature selection. In: AAAI, pp 1171–1177Google Scholar
  7. 7.
    Chen X, Lin Q, Kim S, Carbonell JG, Xing EP (2011) Smoothing proximal gradient method for general structured sparse learning. In: UAI, pp 105–114Google Scholar
  8. 8.
    Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from national university of Singapore. In: CIVRGoogle Scholar
  9. 9.
    Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. In: NIPS, pp 681–687Google Scholar
  10. 10.
    Farquhar JDR, Hardoon DR, Meng H, Shawe-Taylor J, Szedmák S (2005) Two view learning: SVM-2K, theory and practice. In: NIPSGoogle Scholar
  11. 11.
    Gong P, Ye J, Zhang C (2012) Robust multi-task feature learning. In: KDD, pp 895–903Google Scholar
  12. 12.
    Gong P, Zhou J, Fan W, Ye J (2014) Efficient multi-task feature learning with calibration. In: KDD, pp 761–770Google Scholar
  13. 13.
    Guo Y (2013) Convex subspace representation learning from multi-view data. In: AAAIGoogle Scholar
  14. 14.
    Han L, Zhang Y (2015) Learning tree structure in multi-task learning. In: KDD, pp 397–406Google Scholar
  15. 15.
    He J, Lawrence R (2011) A graph-based framework for multi-task multi-view learning. In: ICML, pp 25–32Google Scholar
  16. 16.
    Jacob L, Obozinski G, Vert J (2009) Group Lasso with overlap and graph Lasso. In: ICML, pp 433–440Google Scholar
  17. 17.
    Jenatton R, Audibert J, Bach FR (2011) Structured variable selection with sparsity-inducing norms. J Mach Learn Res 12:2777–2824MathSciNetzbMATHGoogle Scholar
  18. 18.
    Ji S, Tang L, Yu S, Ye J (2008) Extracting shared subspace for multi-label classification. In: KDD, pp 381–389Google Scholar
  19. 19.
    Ji S, Ye J (2009) An accelerated gradient method for trace norm minimization. In: ICML, pp 457–464Google Scholar
  20. 20.
    Kim S, Xing EP (2010) Tree-guided group Lasso for multi-task regression with structured sparsity. In: ICML, pp 543–550Google Scholar
  21. 21.
    Kong D, Ding CHQ, Huang H (2011) Robust nonnegative matrix factorization using L21-norm. In: CIKM, pp 673–682Google Scholar
  22. 22.
    Kong X, Ng MK, Zhou Z-H (2013) Transductive multilabel learning via label set propagation. IEEE Trans Knowl Data Eng 25(3):704–719CrossRefGoogle Scholar
  23. 23.
    Lewis DD, Yang Y, Rose TG, Li F (2004) Rcv1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397Google Scholar
  24. 24.
    Li Y, Tian X, Liu T, Tao D (2015) Multi-task model and feature joint learning. In: IJCAI, pp 3643–3649Google Scholar
  25. 25.
    Mairal J, Jenatton R, Obozinski G, Bach FR (2010) Network flow algorithms for structured sparsity. In: NIPS, pp 1558–1566Google Scholar
  26. 26.
    Mencía EL, Fürnkranz J (2008) Efficient pairwise multilabel classification for large-scale problems in the legal domain. In: ECML-PKDD, pp 126–135Google Scholar
  27. 27.
    Mosci S, Villa S, Verri A, Rosasco L (2010) A primal-dual algorithm for group sparse regularization with overlapping groups. In: NIPS, pp 2604–2612Google Scholar
  28. 28.
    Nie F, Huang H, Cai X, Ding CHQ (2010) Efficient and robust feature selection via joint \(\ell _{2,1}\)-norms minimization. In: NIPS, pp 1813–1821Google Scholar
  29. 29.
    Qin ZT, Goldfarb D (2012) Structured sparsity via alternating direction methods. J Mach Learn Res 13:1435–1468MathSciNetzbMATHGoogle Scholar
  30. 30.
    Simon N, Friedman J, Hastie T, Tibshirani R (2013) A sparse-group Lasso. J Comput Graph Stat 22(2):231MathSciNetCrossRefGoogle Scholar
  31. 31.
    Sindhwani V, Rosenberg DS (2008) An RKHS for multi-view learning and manifold co-regularization. In: ICML, pp 976–983Google Scholar
  32. 32.
    Sridharan K, Kakade SM (2008) An information theoretic framework for multi-view learning. In: COLT, pp 403–414Google Scholar
  33. 33.
    Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol 58(1):267–288MathSciNetzbMATHGoogle Scholar
  34. 34.
    Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494MathSciNetCrossRefzbMATHGoogle Scholar
  35. 35.
    White M, Yu Y, Zhang X, Schuurmans D (2012) Convex multi-view subspace learning. In: NIPS, pp 1682–1690Google Scholar
  36. 36.
    Xu C, Tao D, Xu C (2015) Multi-view intact space learning. IEEE Trans Pattern Anal Mach Intell 37(12):2531–2544CrossRefGoogle Scholar
  37. 37.
    Yang H, He J (2014) Learning with dual heterogeneity: a nonparametric bayes model. In: KDD, pp 582–590Google Scholar
  38. 38.
    Yang P, He J (2015) Model multiple heterogeneity via hierarchical multi-latent space learning. In: KDD, pp 1375–1384Google Scholar
  39. 39.
    Yang P, He J (2016) Heterogeneous representation learning with structured sparsity regularization. In: ICDM, pp 539–548Google Scholar
  40. 40.
    Yang P, He J, Yang H, Fu H (2014) Learning from label and feature heterogeneity. In: ICDM, pp 1079–1084Google Scholar
  41. 41.
    Yang S, Sun Q, Ji S, Wonka P, Davidson I, Ye J (2015) Structural graphical Lasso for learning mouse brain connectivity. In: KDD, pp 1385–1394Google Scholar
  42. 42.
    Yang X, Kim S, Xing EP (2009) Heterogeneous multitask learning with joint sparsity constraints. In: NIPS, pp 2151–2159Google Scholar
  43. 43.
    Yu H-F, Jain P, Kar P, Dhillon IS (2014) Large-scale multi-label learning with missing labels. In: ICML, pp 593–601Google Scholar
  44. 44.
    Yuan L, Liu J, Ye J (2013) Efficient methods for overlapping group Lasso. IEEE Trans Pattern Anal Mach Intell 35(9):2104–2116CrossRefGoogle Scholar
  45. 45.
    Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol 68(1):49–67MathSciNetCrossRefzbMATHGoogle Scholar
  46. 46.
    Zhang J, Huan J (2012) Inductive multi-task learning with multiple view data. In: KDD, pp 543–551Google Scholar
  47. 47.
    Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognit 40(7):2038–2048CrossRefzbMATHGoogle Scholar
  48. 48.
    Zhang M-L, Zhou Z-H (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837CrossRefGoogle Scholar
  49. 49.
    Zhou J, Chen J, Ye J (2011) Clustered multi-task learning via alternating structure optimization. In: NIPS, pp 702–710Google Scholar
  50. 50.
    Zhou J, Liu J, Narayan VA, Ye J (2012) Modeling disease progression via fused sparse group Lasso. In: KDD, pp 1095–1103Google Scholar

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  1. South China University of Technology, Guangzhou, China
  2. Arizona State University, Tempe, USA
  3. South China Normal University, Guangzhou, China
  4. IBM Research, Yorktown Heights, USA
