
Heterogeneous representation learning with separable structured sparsity regularization

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Motivated by real applications, heterogeneous learning has emerged as an important research area that aims to model the coexistence of multiple types of heterogeneity. In this paper, we propose a heterogeneous representation learning model with structured sparsity regularization (HERES) to learn from multiple types of heterogeneity. It leverages the rich correlations (e.g., task relatedness, view consistency, and label correlation) and the prior knowledge (e.g., the soft clustering of tasks) of heterogeneous data to improve learning performance. To this end, HERES integrates multi-task, multi-view, and multi-label learning into a principled representation-learning framework to model the complex correlations, and employs structured sparsity to encode the prior knowledge of the data. The objective is to simultaneously minimize the reconstruction loss incurred when the factor matrices are used to recover the heterogeneous data and the structured sparsity penalty imposed on the model. The resulting optimization problem is challenging due to the non-smoothness and non-separability of the structured sparsity penalty. We reformulate the problem using an auxiliary function and prove that the reformulation is separable, which leads to an efficient family of algorithms for solving structured-sparsity-penalized problems. Furthermore, we propose several HERES models based on different loss functions and subsume them into the weighted HERES, which is able to handle missing data. Experimental results in comparison with state-of-the-art methods demonstrate the effectiveness of the proposed approach.
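For orientation, the objective sketched in the abstract can be written schematically. The displays below are a hedged sketch, not the paper's exact formulation: they assume a factorization X ≈ UV per data source, a weight matrix W that zeroes out missing entries (as in the weighted HERES), and a generic structured-sparsity penalty Ω such as an ℓ2,1 / group-Lasso norm over groups G. The second display shows the kind of auxiliary quadratic upper bound that renders such a penalty separable; the precise auxiliary function used in the paper may differ.

% Schematic objective (illustrative only): weighted reconstruction loss
% plus a structured-sparsity penalty on a factor matrix.
\[
  \min_{U,\,V}\;\; \bigl\| W \odot \bigl( X - U V \bigr) \bigr\|_F^2
  \;+\; \lambda\, \Omega(U),
  \qquad
  \Omega(U) = \sum_{g \in \mathcal{G}} \| U_g \|_2 .
\]
% A standard auxiliary (majorization) bound that makes a group-norm penalty
% quadratic and hence separable, with equality at U_g = \tilde{U}_g:
\[
  \| U_g \|_2 \;\le\; \frac{\| U_g \|_2^2}{2\,\| \tilde{U}_g \|_2}
  \;+\; \frac{\| \tilde{U}_g \|_2}{2}.
\]

Alternating between minimizing the resulting quadratic surrogate and updating the auxiliary variables is the usual way such bounds yield efficient algorithms for non-smooth structured penalties.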



Notes

  1. http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm.

  2. http://lear.inrialpes.fr/people/verbeek/code.

  3. http://mulan.sourceforge.net/datasets-mlc.html.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61473123, the National Science Foundation under Grant No. IIS-1552654, ONR under Grant No. N00014-15-1-2821, NASA under Grant No. NNX17AJ86A, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.

Author information


Corresponding author

Correspondence to Pei Yang.


About this article


Cite this article

Yang, P., Tan, Q., Zhu, Y. et al. Heterogeneous representation learning with separable structured sparsity regularization. Knowl Inf Syst 55, 671–694 (2018). https://doi.org/10.1007/s10115-017-1094-5

