Low-Rank Transfer Learning

Abstract

Labeling real-world visual data for supervised learning is expensive. Leveraging auxiliary databases of well-labeled data for a new task can save considerable labeling effort. However, data in auxiliary databases are often obtained under conditions that differ from those of the new task. Transfer learning provides techniques for transferring learned knowledge from a source domain to a target domain by mitigating the divergence between them. In this chapter, we discuss transfer learning in a generalized subspace where each target sample can be represented by a combination of source samples under a low-rank constraint. Under this constraint, the underlying structures of both the source and target domains are considered in the knowledge transfer, which brings three benefits. First, good alignment between domains is ensured, in that only relevant data in some subspace of the source domain are used to reconstruct the data in the target domain. Second, the discriminative power of the source domain is naturally passed on to the target domain. Third, noisy information is filtered out during the knowledge transfer. Extensive experiments on synthetic data and on important computer vision problems, such as face recognition and visual domain adaptation for object recognition, demonstrate the superiority of the proposed approach over existing, well-established methods.

Keywords

  • Transfer learning
  • Low-rank constraint
  • Subspace learning
  • Domain adaptation

© 2014 Springer. Reprinted, with permission, from International Journal of Computer Vision, August 2014, Volume 109, Issue 1–2, pp. 74–93.

Notes

  1. So far, we still consider larger energies as being better for the subspace learning method. However, this will change later once we start minimizing rather than maximizing the objective function.

  2. We use the minimization form of PCA rather than the maximization form so that it fits the LTSL framework.

  3. Note that we use ULPP and UNPE to denote unsupervised LPP and NPE, and SLPP and SNPE to denote supervised LPP and NPE.

References

  1. A. Argyriou, T. Evgeniou, M. Pontil, Multi-task feature learning, Advances in Neural Information Processing Systems (MIT, Cambridge, 2007), pp. 41–48

  2. A. Arnold, R. Nallapati, W. Cohen, A comparative study of methods for transductive transfer learning. in International Conference on Data Mining (Workshops). IEEE (2007), pp. 77–82

  3. Y. Aytar, A. Zisserman, Tabula rasa: model transfer for object category detection. in IEEE International Conference on Computer Vision. IEEE (2011), pp. 2252–2259

  4. R.H. Bartels, G. Stewart, Solution of the matrix equation AX + XB = C [F4]. Commun. ACM 15(9), 820–826 (1972)

  5. P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)

  6. M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

  7. J. Blitzer, D. Foster, S. Kakade, Domain adaptation with coupled subspaces. JMLR Proc. Track 15, 173–181 (2011)

  8. J. Blitzer, R. McDonald, F. Pereira, Domain adaptation with structural correspondence learning. in Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (2006), pp. 120–128

  9. J.F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)

  10. E. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 11 (2011)

  11. E. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

  12. M. Chen, K. Weinberger, J. Blitzer, Co-training for domain adaptation. in Advances in Neural Information Processing Systems (2011)

  13. D. Coppersmith, S. Winograd, Matrix multiplication via arithmetic progressions. J. Symb. Comput. 9(3), 251–280 (1990)

  14. W. Dai, G. Xue, Q. Yang, Y. Yu, Co-clustering based classification for out-of-domain documents. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2007), pp. 210–219

  15. W. Dai, G.R. Xue, Q. Yang, Y. Yu, Transferring naive Bayes classifiers for text classification. in AAAI Conference on Artificial Intelligence (2007), pp. 540–545

  16. W. Dai, Q. Yang, G. Xue, Y. Yu, Boosting for transfer learning. in International Conference on Machine Learning. ACM (2007), pp. 193–200

  17. H. Daumé III, Frustratingly easy domain adaptation. Annu. Meet. ACL 45, 256–263 (2007)

  18. H. Daumé III, D. Marcu, Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 26(1), 101–126 (2006)

  19. L. Duan, I.W. Tsang, D. Xu, T.S. Chua, Domain adaptation from multiple sources via auxiliary classifiers. in International Conference on Machine Learning. ACM (2009), pp. 289–296

  20. L. Duan, D. Xu, S.F. Chang, Exploiting web images for event recognition in consumer videos: a multiple source domain adaptation approach. in IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012), pp. 1338–1345

  21. L. Duan, D. Xu, I. Tsang, Domain adaptation from multiple sources: a domain-dependent regularization approach. IEEE Trans. Neural Networks Learn. Syst. 23(3), 504–518 (2012)

  22. L. Duan, D. Xu, I.W.H. Tsang, J. Luo, Visual event recognition in videos by learning from web data. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1667–1680 (2012)

  23. J. Eckstein, D. Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)

  24. J. Gao, W. Fan, J. Jiang, J. Han, Knowledge transfer via multiple model local structure mapping. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2008), pp. 283–291

  25. X. Glorot, A. Bordes, Y. Bengio, Domain adaptation for large-scale sentiment classification: a deep learning approach. in International Conference on Machine Learning. ACM (2011), pp. 513–520

  26. B. Gong, Y. Shi, F. Sha, K. Grauman, Geodesic flow kernel for unsupervised domain adaptation. in IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012), pp. 2066–2073

  27. R. Gopalan, R. Li, R. Chellappa, Domain adaptation for object recognition: an unsupervised approach. in IEEE International Conference on Computer Vision. IEEE (2011), pp. 999–1006

  28. X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding. in International Conference on Computer Vision, vol. 2. IEEE (2005), pp. 1208–1213

  29. X. He, P. Niyogi, Locality preserving projections, Advances in Neural Information Processing Systems (MIT, Cambridge, 2004)

  30. J. Ho, M. Yang, J. Lim, K. Lee, D. Kriegman, Clustering appearances of objects under varying illumination conditions. in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE (2003), pp. I-11

  31. J. Hoffman, E. Rodner, J. Donahue, K. Saenko, T. Darrell, Efficient learning of domain-invariant image representations (2013), arXiv:1301.3224

  32. D. Huang, J. Sun, Y. Wang, The BUAA-VisNir face database instructions (2012), http://irip.buaa.edu.cn/research/The_BUAA-VisNir_Face_Database_Instructions.pdf

  33. I.H. Jhuo, D. Liu, D. Lee, S.F. Chang, Robust visual domain adaptation with low-rank reconstruction. in IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012), pp. 2168–2175

  34. J. Jiang, C. Zhai, Instance weighting for domain adaptation in NLP. Annu. Meet. ACL 45, 264–271 (2007)

  35. W. Jiang, E. Zavesky, S.F. Chang, A. Loui, Cross-domain learning methods for high-level visual concept classification. in IEEE International Conference on Image Processing. IEEE (2008), pp. 161–164

  36. R. Keshavan, A. Montanari, S. Oh, Matrix completion from noisy entries. J. Mach. Learn. Res. 11, 2057–2078 (2010)

  37. B. Kulis, P. Jain, K. Grauman, Fast similarity search for learned metrics. IEEE Trans. Pattern Anal. Mach. Intell. 31(12), 2143–2157 (2009)

  38. B. Kulis, K. Saenko, T. Darrell, What you saw is not what you get: domain adaptation using asymmetric kernel transforms. in IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2011), pp. 1785–1792

  39. N. Lawrence, J. Platt, Learning to learn with the informative vector machine. in International Conference on Machine Learning. ACM (2004), pp. 65–72

  40. J. Lim, R. Salakhutdinov, A. Torralba, Transfer learning by borrowing examples for multiclass object detection, Advances in Neural Information Processing Systems (MIT, Cambridge, 2011)

  41. Z. Lin, M. Chen, L. Wu, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical report, UILU-ENG-09-2215 (2009)

  42. G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 171–184 (2013)

  43. G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation. in International Conference on Machine Learning (2010), pp. 663–670

  44. D. Lopez-Paz, J. Hernández-Lobato, B. Schölkopf, Semi-supervised domain adaptation with non-parametric copulas, Advances in Neural Information Processing Systems (MIT, Cambridge, 2012)

  45. L. Lu, R. Vidal, Combined central and subspace clustering for computer vision applications. in International Conference on Machine Learning. ACM (2006), pp. 593–600

  46. L. Mihalkova, T. Huynh, R. Mooney, Mapping and revising Markov logic networks for transfer learning. in AAAI Conference on Artificial Intelligence (2007), pp. 608–614

  47. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

  48. G.J. Qi, C. Aggarwal, Y. Rui, Q. Tian, S. Chang, T. Huang, Towards cross-category knowledge propagation for learning visual concepts. in IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2011), pp. 897–904

  49. R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, Self-taught learning: transfer learning from unlabeled data. in International Conference on Machine Learning (2007), pp. 759–766

  50. S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)

  51. K. Saenko, B. Kulis, M. Fritz, T. Darrell, Adapting visual category models to new domains. in European Conference on Computer Vision (2010), pp. 213–226

  52. M. Shao, C. Castillo, Z. Gu, Y. Fu, Low-rank transfer subspace learning. in International Conference on Data Mining. IEEE (2012), pp. 1104–1109

  53. M. Shao, D. Kit, Y. Fu, Generalized transfer subspace learning through low-rank constraint. Int. J. Comput. Vision 109(1–2), 74–93 (2014)

  54. M. Shao, S. Xia, Y. Fu, Genealogical face recognition based on UB KinFace database. in IEEE Conference on Computer Vision and Pattern Recognition (Workshop on Biometrics) (2011), pp. 65–70

  55. S. Si, D. Tao, B. Geng, Bregman divergence-based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22(7), 929–942 (2010)

  56. Q. Sun, R. Chattopadhyay, S. Panchanathan, J. Ye, A two-stage weighting framework for multi-source domain adaptation, Advances in Neural Information Processing Systems (MIT, Cambridge, 2011)

  57. M. Turk, A. Pentland, Eigenfaces for recognition. J. Cognitive Neurosci. 3(1), 71–86 (1991)

  58. Z. Wang, Y. Song, C. Zhang, Transferred dimensionality reduction, Machine Learning and Knowledge Discovery in Databases (Springer, Heidelberg, 2008), pp. 550–565

  59. J. Wright, A. Ganesh, S. Rao, Y. Peng, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. Adv. Neural Inf. Proc. Syst. 22, 2080–2088 (2009)

  60. S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 40–51 (2007)

  61. J. Yang, R. Yan, A.G. Hauptmann, Cross-domain video concept detection using adaptive SVMs. in International Conference on Multimedia. ACM (2007), pp. 188–197

  62. J. Yang, W. Yin, Y. Zhang, Y. Wang, A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J. Imaging Sci. 2(2), 569–592 (2009)

  63. C. Zhang, J. Ye, L. Zhang, Generalization bounds for domain adaptation, Advances in Neural Information Processing Systems (MIT, Cambridge, 2012)

  64. T. Zhang, D. Tao, J. Yang, Discriminative locality alignment, European Conference on Computer Vision (Springer, Heidelberg, 2008), pp. 725–738

Acknowledgments

This research is supported in part by the NSF CNS award 1314484, Office of Naval Research award N00014-12-1-1028, Air Force Office of Scientific Research award FA9550-12-1-0201, U.S. Army Research Office grant W911NF-13-1-0160, and IC Postdoc Program Grant 2011-11071400006.

Author information

Corresponding author

Correspondence to Ming Shao.

Appendix

1.1 Proof of Theorem 2

Suppose \(X_\text{ s }\) and \(X_\text{ t }\) are strictly drawn from \(\mathcal {S}_i\) and \(\mathcal {T}_i\), respectively. We use \(Y_\text{ s } = P^\text{ T } X_\text{ s }\) and \(Y_\text{ t } = P^\text{ T } X_\text{ t }\) to denote their low-dimensional representations under the projection \(P\). Both \(Y_\text{ s }\) and \(Y_\text{ t }\) are of size \(m \times n\), with one sample per column. The energy of PCA in the source domain \(\mathcal {S}_i\) is therefore:

$$\begin{aligned} F(P, \mathcal {S}_i) = \frac{1}{n-1} \left\| Y_\text{ s } - Y_\text{ s }*\frac{1}{n}\mathbf {e}\mathbf {e^\text{ T }}\right\| _F^2, \end{aligned}$$
(16)

where \(\mathbf {e}\) is the \(n\)-dimensional all-ones column vector, i.e., \(\mathbf {e}_i = 1\) for \(i = 1,2, \ldots , n\). For simplicity, we drop the constant factor \(\frac{1}{n-1}\) and denote \(\frac{1}{n}\mathbf {e}\mathbf {e^\text{ T }}\) by the matrix \(C\). The PCA energies in \(\mathcal {S}_i\) and \(\mathcal {T}_i\) can then be rewritten as:

$$\begin{aligned} F(P, \mathcal {S}_i)&= \left\| Y_\text{ s } - Y_\text{ s }*C\right\| _F^2.\end{aligned}$$
(17)
$$\begin{aligned} F(P, \mathcal {T}_i)&= \left\| Y_\text{ s } Z_i - Y_\text{ s }*C Z_i \right\| _F^2. \end{aligned}$$
(18)
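
As a numerical sanity check of Eq. (16), the following NumPy sketch evaluates the PCA energy through the averaging matrix \(C\) and compares it with the trace of the sample covariance. The function name `pca_energy`, the matrix sizes, and the random data are illustrative assumptions and do not come from the chapter.

```python
import numpy as np

def pca_energy(Y):
    """PCA energy as in Eq. (16): ||Y - Y C||_F^2 / (n - 1), with C = (1/n) e e^T.

    Y is an m x n matrix whose columns are the n projected samples.
    """
    n = Y.shape[1]
    # Right-multiplying by C replaces every column of Y by the mean column.
    C = np.ones((n, n)) / n
    return np.linalg.norm(Y - Y @ C, "fro") ** 2 / (n - 1)

rng = np.random.default_rng(0)        # illustrative random data (assumption)
Y_s = rng.normal(size=(5, 20))        # m = 5 dimensions, n = 20 samples

energy = pca_energy(Y_s)
# np.cov treats rows as variables and divides by n - 1 by default,
# so its trace is the total variance of the projected samples.
total_variance = np.trace(np.cov(Y_s))
print(np.isclose(energy, total_variance))
```

The two quantities agree because subtracting \(Y C\) centers the samples, so the squared Frobenius norm sums the per-dimension squared deviations from the mean.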

Since the perturbed coefficient matrix \(\widehat{Z}_i = Z_i + \gamma \mathbf {I}\) is non-singular, we have \(\widehat{Z}_i\widehat{Z}_i^{-1} = \mathbf {I}\), and the source-domain energy in (17) can be rewritten as:

$$\begin{aligned} F(P, \mathcal {S}_i)&= \Vert Y_\text{ s } \widehat{Z}_i\widehat{Z}_i^{-1} - Y_\text{ s }*C \widehat{Z}_i\widehat{Z}_i^{-1} \Vert _F^2\nonumber \\&= \Vert \left( Y_\text{ s } \widehat{Z}_i - Y_\text{ s }*C \widehat{Z}_i \right) \widehat{Z}_i^{-1} \Vert _F^2\nonumber \\&\le \Vert Y_\text{ s } \widehat{Z}_i - Y_\text{ s }*C \widehat{Z}_i \Vert _F^2 \Vert \widehat{Z}_i^{-1} \Vert _F^2. \end{aligned}$$
(19)

The last line of (19) follows from the submultiplicativity of the Frobenius norm. Note that \(\Vert Y_\text{ s } \widehat{Z}_i - Y_\text{ s }\,*\,C \widehat{Z}_i \Vert _F^2 \) is the PCA energy of the target-domain data perturbed by \(\widehat{Z}_i\), that is, \(F(P, \widehat{\mathcal {T}}_i) = \Vert Y_\text{ s } \widehat{Z}_i - Y_\text{ s }*C \widehat{Z}_i \Vert _F^2\). Dividing both sides of (19) by \(\Vert \widehat{Z}_i^{-1} \Vert _F^2 > 0\) then gives:

$$\begin{aligned} F(P, \mathcal {\widehat{T}}_i) \ge F(P, \mathcal {S}_i) \Vert \widehat{Z}_i^{-1}\Vert _F^{-2}. \end{aligned}$$
(20)
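
The bounds in (19) and (20) can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the sizes, the value of `gamma`, and the Gaussian stand-in for \(Z_i\) are made up for the example and are not the coefficients learned by the method.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, gamma = 5, 20, 1e-2             # illustrative sizes and perturbation weight (assumptions)

Y_s = rng.normal(size=(m, n))         # projected source samples, one per column
C = np.ones((n, n)) / n               # averaging matrix (1/n) e e^T
Z = rng.normal(size=(n, n))           # random stand-in for the coefficient matrix Z_i
Z_hat = Z + gamma * np.eye(n)         # perturbed coefficients; non-singular for this draw
Z_hat_inv = np.linalg.inv(Z_hat)

def fro2(A):
    """Squared Frobenius norm."""
    return np.linalg.norm(A, "fro") ** 2

F_S = fro2(Y_s - Y_s @ C)                        # source energy, Eq. (17)
F_T_hat = fro2(Y_s @ Z_hat - Y_s @ C @ Z_hat)    # energy of the perturbed target data

upper = F_T_hat * fro2(Z_hat_inv)                # right-hand side of (19)
lower = F_S / fro2(Z_hat_inv)                    # right-hand side of (20)

print(F_S <= upper, F_T_hat >= lower)
```

Both comparisons hold for any non-singular \(\widehat{Z}_i\), since they rely only on \(\Vert AB\Vert _F \le \Vert A\Vert _F \Vert B\Vert _F\); in practice the bound is usually far from tight.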

If we add \(F(P, \mathcal {{T}}_i)\) to both sides of Inequality (20), then we derive Inequality (8).   \(\blacksquare \)

Three points are worth noting. First, \(F(P, \mathcal {T}_i)\) is the PCA energy of \(Y_\text{ s } Z_i\), whereas \(F(P, \widehat{\mathcal {T}}_i)\) is the PCA energy of \(Y_\text{ s } \widehat{Z}_i\), where \(\widehat{Z}_i = Z_i + \gamma \mathbf {I}\). Compared with \(Y_\text{ s } Z_i\), \(Y_\text{ s } \widehat{Z}_i\) adds a small term, \(\gamma \) times the corresponding column of \(Y_\text{ s }\), to each vector. This does not change \(F(P, \mathcal {T}_i)\) significantly, so the term \(\xi \) in Theorem 2 will not be very large. Second, although we only compare the PCA energies of \(\mathcal {S}_i\) and \(\mathcal {T}_i\) drawn from one subspace, the theorem extends easily to any other subspace included in the subspace union. Finally, the result can be proven similarly for other subspace learning methods, since they are all unified in the linear graph embedding framework [60]. For example, the proof for LDA holds the value of \(\mathbf {Tr}(P^\text{ T }SP)\) fixed, where \(S=S_\text{ b } + S_\text{ w }\), and then maximizes \(\mathbf {Tr}(P^\text{ T }S_\text{ b }P)\) in the same way as was done for PCA.
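
The first point can be illustrated numerically: a low-rank coefficient matrix is singular, adding \(\gamma \mathbf {I}\) makes it invertible, and the energy barely moves when \(\gamma \) is small. The sketch below is only an illustration under assumptions: it uses a symmetric positive semidefinite rank-`r` matrix as a stand-in for \(Z_i\) (so that invertibility of \(\widehat{Z}_i\) is guaranteed), and the sizes and `gamma` are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r, gamma = 5, 20, 3, 1e-3       # illustrative sizes, rank, and perturbation weight

Y_s = rng.normal(size=(m, n))         # projected source samples, one per column
C = np.ones((n, n)) / n               # averaging matrix (1/n) e e^T
U = rng.normal(size=(n, r))
Z = U @ U.T                           # rank-r PSD stand-in for a low-rank Z_i (assumption)
Z_hat = Z + gamma * np.eye(n)         # all eigenvalues >= gamma, hence invertible

def energy(Y, W):
    """Energy in the form of Eq. (18): ||Y W - Y C W||_F^2."""
    return np.linalg.norm(Y @ W - Y @ C @ W, "fro") ** 2

F_T = energy(Y_s, Z)                  # energy with the low-rank coefficients
F_T_hat = energy(Y_s, Z_hat)          # energy with the perturbed coefficients
rel_change = abs(F_T_hat - F_T) / F_T

print(np.linalg.matrix_rank(Z), np.linalg.matrix_rank(Z_hat))
print(rel_change)
```

Increasing `gamma` in this sketch makes the relative change grow, which matches the remark that the perturbation must stay small for \(\xi \) to remain small.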

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Shao, M., Kit, D., Fu, Y. (2014). Low-Rank Transfer Learning. In: Fu, Y. (eds) Low-Rank and Sparse Modeling for Visual Analysis. Springer, Cham. https://doi.org/10.1007/978-3-319-12000-3_5

  • DOI: https://doi.org/10.1007/978-3-319-12000-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11999-1

  • Online ISBN: 978-3-319-12000-3

  • eBook Packages: Computer Science, Computer Science (R0)