Generalized Transfer Subspace Learning Through Low-Rank Constraint

Abstract

Labeled real-world visual data are expensive to obtain for training supervised algorithms, so it is valuable to leverage existing databases of labeled data. However, the data in a source database are often collected under conditions that differ from those of the new task. Transfer learning provides techniques for transferring learned knowledge from a source domain to a target domain by finding a mapping between them. In this paper, we discuss a method that projects both source and target data to a generalized subspace in which each target sample can be represented by some combination of source samples. By imposing a low-rank constraint during this transfer, the structures of the source and target domains are preserved. This approach has three benefits. First, good alignment between the domains is ensured, because only relevant data in some subspace of the source domain are used to reconstruct the data in the target domain. Second, the discriminative power of the source domain is naturally passed on to the target domain. Third, noisy information is filtered out during knowledge transfer. Extensive experiments on synthetic data and on important computer vision problems, such as face recognition and visual domain adaptation for object recognition, demonstrate the superiority of the proposed approach over existing, well-established methods.


Notes

  1. So far, we still consider larger energies as being better for the subspace learning method. However, this will change later once we start minimizing rather than maximizing the objective function.

  2. We use the minimizing formulation of PCA instead of the maximizing one so that it fits the LTSL framework.

  3. Note that we use ULPP and UNPE to denote unsupervised LPP and NPE, and SLPP and SNPE to denote supervised LPP and NPE.

References

  • Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. In Advances in neural information processing systems, Cambridge, MA: The MIT Press.

  • Arnold, A., Nallapati, R., & Cohen, W. (2007). A comparative study of methods for transductive transfer learning. IEEE International Conference on Data Mining (Workshops), 77–82.

  • Aytar, Y., & Zisserman, A. (2011). Tabula rasa: Model transfer for object category detection. IEEE International Conference on Computer Vision, 2252–2259.

  • Bartels, R. H., & Stewart, G. (1972). Solution of the matrix equation AX + XB = C [F4]. Communications of the ACM, 15(9), 820–826.

  • Belhumeur, P., Hespanha, J., & Kriegman, D. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.

  • Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.

  • Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, (pp. 120–128).

  • Blitzer, J., Foster, D., & Kakade, S. (2011). Domain adaptation with coupled subspaces. Journal of Machine Learning Research-Proceedings Track, 15, 173–181.

  • Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20, 1956–1982.

  • Candès, E., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.

  • Candès, E., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM.

  • Chen, M., Weinberger, K., & Blitzer, J. (2011). Co-training for domain adaptation. Advances in Neural Information Processing Systems.

  • Coppersmith, D., & Winograd, S. (1990). Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3), 251–280.

  • Dai, W., Xue, G., Yang, Q., & Yu, Y. (2007). Co-clustering based classification for out-of-domain documents. In ACM SIGKDD International Conference on Knowledge Discovery And Data Mining (ACM), (pp. 210–219).

  • Dai, W., Xue, G.R., Yang, Q., & Yu, Y. (2007b). Transferring naive bayes classifiers for text classification. In AAAI Conference on Artificial Intelligence (pp. 540–545).

  • Dai, W., Yang, Q., Xue, G., & Yu, Y. (2007c). Boosting for transfer learning. In International Conference on Machine learning, ACM (pp. 193–200).

  • Daumé, H. (2007). Frustratingly easy domain adaptation. Annual Meeting-Association for Computational Linguistics, 45, 256–263.

  • Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26(1), 101–126.

  • Duan, L., Tsang, I.W., Xu, D., & Chua, T.S. (2009). Domain adaptation from multiple sources via auxiliary classifiers. In International Conference on Machine Learning, ACM (pp. 289–296).

  • Duan, L., Xu, D., & Chang, S.F. (2012a). Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. IEEE Conference on Computer Vision and Pattern Recognition, 1338–1345.

  • Duan, L., Xu, D., & Tsang, I. (2012b). Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Transactions on Neural Networks and Learning Systems, 23(3), 504–518.

  • Duan, L., Xu, D., Tsang, I. W. H., & Luo, J. (2012c). Visual event recognition in videos by learning from web data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1667–1680.

  • Eckstein, J., & Bertsekas, D. (1992). On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1), 293–318.

  • Gao, J., Fan, W., Jiang, J., & Han, J. (2008). Knowledge transfer via multiple model local structure mapping. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (pp. 283–291).

  • Glorot, X., Bordes, A., & Bengio, Y. (2011). Domain adaptation for large-scale sentiment classification: A deep learning approach. In International Conference on Machine Learning, ACM (pp. 513–520).

  • Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2066–2073).

  • Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In IEEE International Conference on Computer Vision (pp. 999–1006).

  • Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Tech. rep., California Institute of Technology.

  • He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems, vol 16. Cambridge, MA: The MIT Press.

  • He, X., Cai, D., Yan, S., & Zhang, H. (2005). Neighborhood preserving embedding. IEEE International Conference on Computer Vision, 2, 1208–1213.

  • Ho, J., Yang, M., Lim, J., Lee, K., & Kriegman, D. (2003). Clustering appearances of objects under varying illumination conditions, vol 1. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–11).

  • Hoffman, J., Rodner, E., Donahue, J., Saenko, K., & Darrell, T. (2013). Efficient learning of domain-invariant image representations. arXiv preprint arXiv:1301.3224.

  • Huang, D., Sun, J., & Wang, Y. (2012). The BUAA-VISNIR face database instructions. http://irip.buaa.edu.cn/research/The_BUAA-VisNir_Face_Database_Instructions.pdf.

  • Jhuo, I. H., Liu, D., Lee, D., & Chang, S. F. (2012). Robust visual domain adaptation with low-rank reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2168–2175).

  • Jiang, J., & Zhai, C. (2007). Instance weighting for domain adaptation in NLP. Annual Meeting-Association for Computational Linguistics, 45, 264–271.

  • Jiang, W., Zavesky, E., Chang, S. F., & Loui, A. (2008). Cross-domain learning methods for high-level visual concept classification. In IEEE International Conference on Image Processing (pp. 161–164).

  • Keshavan, R., Montanari, A., & Oh, S. (2010). Matrix completion from noisy entries. The Journal of Machine Learning Research, 99, 2057–2078.

  • Kulis, B., Jain, P., & Grauman, K. (2009). Fast similarity search for learned metrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2143–2157.

  • Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1785–1792).

  • Lawrence, N., & Platt, J. (2004). Learning to learn with the informative vector machine. In International Conference on Machine Learning, ACM (pp. 65–72).

  • Lim, J., Salakhutdinov, R., & Torralba, A. (2011). Transfer learning by borrowing examples for multiclass object detection. In Advances in neural information processing systems. Cambridge, MA: The MIT Press.

  • Lin, Z., Chen, M., Wu, L., & Ma, Y. (2009). The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report, UILU-ENG-09-2215.

  • Liu, G., Lin, Z., & Yu, Y. (2010). Robust subspace segmentation by low-rank representation. In International Conference on Machine Learning (pp. 663–670).

  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184.

  • Lopez-Paz, D., Hernández-Lobato, J., & Schölkopf, B. (2012). Semi-supervised domain adaptation with non-parametric copulas. In Advances in neural information processing systems. Cambridge, MA: The MIT Press.

  • Lu, L., & Vidal, R. (2006). Combined central and subspace clustering for computer vision applications. In International Conference on Machine Learning, ACM (pp. 593–600).

  • Mihalkova, L., Huynh, T., & Mooney, R. (2007). Mapping and revising markov logic networks for transfer learning. In AAAI Conference on Artificial Intelligence (pp. 608–614).

  • Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

  • Qi, G.J., Aggarwal, C., Rui, Y., Tian, Q., Chang, S., & Huang, T. (2011). Towards cross-category knowledge propagation for learning visual concepts. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 897–904).

  • Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. (2007). Self-taught learning: Transfer learning from unlabeled data. In International Conference on Machine Learning (pp. 759–766).

  • Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.

  • Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In European Computer Vision Conference (pp. 213–226).

  • Shao, M., Xia, S., & Fu, Y. (2011). Genealogical face recognition based on UB KinFace database. In IEEE Conference on Computer Vision and Pattern Recognition (Workshop on Biometrics) (pp. 65–70).

  • Shao, M., Castillo, C., Gu, Z., & Fu, Y. (2012). Low-rank transfer subspace learning. In IEEE International Conference on Data Mining (pp. 1104–1109).

  • Si, S., Tao, D., & Geng, B. (2010). Bregman divergence-based regularization for transfer subspace learning. IEEE Transactions on Knowledge and Data Engineering, 22(7), 929–942.

  • Sun, Q., Chattopadhyay, R., Panchanathan, S., & Ye, J. (2011). A two-stage weighting framework for multi-source domain adaptation. In Advances in neural information processing systems. Cambridge, MA: The MIT Press.

  • Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.

  • Wang, Z., Song, Y., & Zhang, C. (2008). Transferred dimensionality reduction. In Machine learning and knowledge discovery in databases (pp. 550–565). New York: Springer.

  • Wright, J., Ganesh, A., Rao, S., Peng, Y., & Ma, Y. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. Advances in Neural Information Processing Systems, 22, 2080–2088.

  • Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., & Lin, S. (2007). Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 40–51.

  • Yang, J., Yan, R., & Hauptmann, A. G. (2007). Cross-domain video concept detection using adaptive SVMs. In International Conference on Multimedia, ACM (pp. 188–197).

  • Yang, J., Yin, W., Zhang, Y., & Wang, Y. (2009). A fast algorithm for edge-preserving variational multichannel image restoration. SIAM Journal on Imaging Sciences, 2(2), 569–592.

  • Zhang, C., Ye, J., & Zhang, L. (2012). Generalization bounds for domain adaptation. In Advances in neural information processing systems, Cambridge, MA: The MIT Press.

  • Zhang, T., Tao, D., & Yang, J. (2008). Discriminative locality alignment. In European conference on computer vision (pp. 725–738). New York: Springer.

Acknowledgments

This research is supported in part by the NSF CNS award 1314484, Office of Naval Research award N00014-12-1-1028, Air Force Office of Scientific Research award FA9550-12-1-0201, U.S. Army Research Office grant W911NF-13-1-0160, and IC Postdoc Program Grant 2011-11071400006.

Corresponding author

Correspondence to Yun Fu.

Appendix

Proof of Lemma 1

Proof

$$\begin{aligned} \Vert AB\Vert _F^2&= \sum _j \Vert A [B]_{:,j}\Vert _2^2 \le \sum _j \Vert A\Vert _F^2\Vert [B]_{:,j} \Vert _2^2 \\&= \Vert A\Vert _F^2 \sum _j \Vert [B]_{:,j}\Vert _2^2 = \Vert A\Vert _F^2 \Vert B\Vert _F^2. \end{aligned}$$

\(\square \)
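As a quick numerical sanity check (not part of the original paper), the bound in Lemma 1 can be verified on random matrices with NumPy; the shapes below are arbitrary:

```python
import numpy as np

# Numerical check of Lemma 1: ||AB||_F^2 <= ||A||_F^2 * ||B||_F^2
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((4, 6))

lhs = np.linalg.norm(A @ B, "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2 * np.linalg.norm(B, "fro") ** 2
assert lhs <= rhs  # the lemma's inequality holds
```

The inequality follows column by column, as in the proof: each column of \(AB\) is \(A\) applied to the corresponding column of \(B\), and the spectral norm of \(A\) is bounded by its Frobenius norm.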

Next, due to space limitations, we show only a proof of Theorem 2 based on a specific subspace learning method, Principal Component Analysis (PCA). Proofs for the other methods follow a similar scheme.

Proof of Theorem 2

Proof

Suppose \(X_\mathrm{s}\) and \(X_\mathrm{t}\) are strictly drawn from \(\mathcal {S}_i\) and \(\mathcal {T}_i\). We use \(Y_\mathrm{s} = P^\mathrm{T} X_\mathrm{s}\) and \(Y_\mathrm{t} = P^\mathrm{T} X_\mathrm{t}\) to denote their low-dimensional representations in the subspace \(P\). Both \(Y_\mathrm{s}\) and \(Y_\mathrm{t}\) are of size \(m \times n\). Therefore, the energy of PCA in the source domain \(\mathcal {S}_i\) is:

$$\begin{aligned} F(P, \mathcal {S}_i) = \frac{1}{n-1} \left\| Y_\mathrm{s} - Y_\mathrm{s}*\frac{1}{n}\mathbf {e}\mathbf {e^\mathrm{T}}\right\| _F^2, \end{aligned}$$
(17)

where \(\mathbf {e}\) is the all-ones vector, i.e., \(\mathbf {e}_i = 1\) for \(i = 1, 2, \ldots , n\). For simplicity, we drop the constant factor \(\frac{1}{n-1}\) and write \(C\) for the matrix \(\frac{1}{n}\mathbf {e}\mathbf {e^\mathrm{T}}\). Then the energies of PCA in \(\mathcal {S}_i\) and \(\mathcal {T}_i\) can be rewritten as:

$$\begin{aligned} F(P, \mathcal {S}_i)&= \left\| Y_\mathrm{s} - Y_\mathrm{s}*C\right\| _F^2.\end{aligned}$$
(18)
$$\begin{aligned} F(P, \mathcal {T}_i)&= \left\| Y_\mathrm{s} Z_i - Y_\mathrm{s}*C Z_i \right\| _F^2. \end{aligned}$$
(19)

Since \(\widehat{Z}_i\) is non-singular, we have \(\widehat{Z}_i\widehat{Z}_i^{-1} = \mathbf {I}\) and the above function can be rewritten as:

$$\begin{aligned} F(P, \mathcal {S}_i)&= \Vert Y_\mathrm{s} \widehat{Z}_i\widehat{Z}_i^{-1} - Y_\mathrm{s}*C \widehat{Z}_i\widehat{Z}_i^{-1} \Vert _F^2\nonumber \\&= \Vert \left( Y_\mathrm{s} \widehat{Z}_i - Y_\mathrm{s}*C \widehat{Z}_i \right) \widehat{Z}_i^{-1} \Vert _F^2\nonumber \\&\le \Vert Y_\mathrm{s} \widehat{Z}_i - Y_\mathrm{s}*C \widehat{Z}_i \Vert _F^2 \Vert \widehat{Z}_i^{-1} \Vert _F^2. \end{aligned}$$
(20)

Note that \(\Vert Y_\mathrm{s} \widehat{Z}_i - Y_\mathrm{s}*C \widehat{Z}_i \Vert _F^2 \) is the PCA energy of the data from the target domain that has been perturbed by \(\widehat{Z}_i\). Therefore, \(F(P, \widehat{\mathcal {T}}_i) = \Vert Y_\mathrm{s} \widehat{Z}_i - Y_\mathrm{s}*C \widehat{Z}_i \Vert _F^2\). Combining that with the inequality in (20) results in:

$$\begin{aligned} F(P, \mathcal {\widehat{T}}_i) \ge F(P, \mathcal {S}_i) \Vert \widehat{Z}_i^{-1}\Vert _F^{-2}. \end{aligned}$$
(21)

If we add \(F(P, \mathcal {{T}}_i)\) to both sides of Inequality (21), then we derive Inequality (8). \(\square \)
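The chain of inequalities above can be checked numerically. The sketch below uses synthetic data (random \(X_\mathrm{s}\), a random orthonormal \(P\), and a random \(\widehat{Z}_i\); all placeholders, not the paper's actual data) and follows the definitions in Eqs. (18)–(20):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, n = 3, 8, 20                       # subspace dim, ambient dim, sample count

X_s = rng.standard_normal((d, n))        # source samples as columns
P, _ = np.linalg.qr(rng.standard_normal((d, m)))  # an arbitrary orthonormal basis P
Y_s = P.T @ X_s                          # low-dimensional representations

C = np.ones((n, n)) / n                  # centering matrix (1/n) e e^T
Z_hat = rng.standard_normal((n, n)) + 0.1 * np.eye(n)  # a (generically) non-singular Z_hat

F_s = np.linalg.norm(Y_s - Y_s @ C, "fro") ** 2                      # Eq. (18)
F_t_hat = np.linalg.norm(Y_s @ Z_hat - Y_s @ C @ Z_hat, "fro") ** 2  # perturbed target energy

bound = F_s / np.linalg.norm(np.linalg.inv(Z_hat), "fro") ** 2
assert F_t_hat >= bound                  # Inequality (21)
```

The assertion is guaranteed by Lemma 1: factoring \(\widehat{Z}_i\widehat{Z}_i^{-1}\) into \(F(P, \mathcal {S}_i)\) and applying the Frobenius-norm bound yields exactly the tested inequality.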

There are three points to be made. First, the difference between \(F(P, \mathcal {T}_i)\) and \(F(P, \widehat{\mathcal {T}}_i)\) is that the first one is the PCA energy of \(Y_\mathrm{s} Z_i\) while the second one is the PCA energy of \(Y_\mathrm{s} \widehat{Z}_i\) where \(\widehat{Z}_i = Z_i + \gamma \mathbf {I}\). Compared with \(Y_\mathrm{s} Z_i\), \(Y_\mathrm{s} \widehat{Z}_i\) adds a small term to each vector. However, this will not cause a significant change in \(F(P, \mathcal {T}_i)\). Therefore, term \(\xi \) in Theorem 2 will not be very large. Second, although we only compare the PCA energy of \(\mathcal {S}_i\) and \(\mathcal {T}_i\) that are drawn from one subspace, this theorem is easily extended to any other subspace included in the subspace union. Finally, other subspace learning methods can be similarly proven, since they are all unified in the linear graph embedding framework (Yan et al. 2007). For example, the proof for LDA holds the value of \(\mathbf {Tr}(P^\mathrm{T}SP)\) fixed, where \(S=S_\mathrm{b} + S_\mathrm{w}\), and then maximizes \(\mathbf {Tr}(P^\mathrm{T}S_\mathrm{b}P)\) in the same way as was done for PCA.
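The first point can also be illustrated numerically: perturbing \(Z_i\) by \(\gamma \mathbf {I}\) with a small \(\gamma \) changes the PCA energy only slightly, so \(\xi \) stays small. A sketch with synthetic placeholder matrices (again, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 20
Y_s = rng.standard_normal((m, n))        # low-dimensional source representations
Z = rng.standard_normal((n, n))          # reconstruction coefficient matrix
C = np.ones((n, n)) / n                  # centering matrix (1/n) e e^T

def pca_energy(Y):
    # un-normalized PCA energy: squared Frobenius norm after centering
    return np.linalg.norm(Y - Y @ C, "fro") ** 2

F_t = pca_energy(Y_s @ Z)                # energy of Y_s Z
gamma = 1e-6
F_t_hat = pca_energy(Y_s @ (Z + gamma * np.eye(n)))  # energy of Y_s Z_hat

rel_change = abs(F_t_hat - F_t) / F_t
assert rel_change < 1e-3                 # a small gamma barely changes the energy
```

Since the energy is a polynomial in \(\gamma \), the gap between \(F(P, \mathcal {T}_i)\) and \(F(P, \widehat{\mathcal {T}}_i)\) vanishes continuously as \(\gamma \rightarrow 0\).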

Cite this article

Shao, M., Kit, D. & Fu, Y. Generalized Transfer Subspace Learning Through Low-Rank Constraint. Int J Comput Vis 109, 74–93 (2014). https://doi.org/10.1007/s11263-014-0696-6


Keywords

  • Transfer learning
  • Domain adaptation
  • Low-rank constraint
  • Subspace learning