Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12349)

Included in the following conference series: ECCV: European Conference on Computer Vision

Abstract

Semi-Supervised Learning (SSL) based on Convolutional Neural Networks (CNNs) has recently proven to be a powerful tool for standard tasks such as image classification when sufficient labeled data is not available during training. In this work, we consider the general SSL setting for image classification, where the labeled and unlabeled data come from the same underlying distribution. We propose a new SSL method that adopts a hierarchical Optimal Transport (OT) technique to find a mapping from empirical unlabeled measures to the corresponding labeled measures by minimizing the transportation cost in the label space. Based on this mapping, pseudo-labels for the unlabeled data are inferred, and these are then used along with the labeled data to train the CNN. We evaluate our method against state-of-the-art SSL approaches on standard datasets and demonstrate its superiority.
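
To make the pseudo-labeling step concrete, the sketch below illustrates the general idea of OT-based pseudo-labeling: unlabeled feature points are coupled to class centroids computed from the labeled set by solving an entropically regularized OT problem with Sinkhorn iterations, and each point takes the class that receives most of its transported mass. This is a minimal sketch of plain (non-hierarchical) entropic OT under assumptions of our own (uniform marginals, squared-Euclidean ground cost, centroid-level label measures, and the hypothetical helpers `sinkhorn` and `ot_pseudo_labels`); it is not the authors' hierarchical OT algorithm.

```python
# Hypothetical sketch: OT-based pseudo-labeling with entropic regularization.
# NOT the paper's hierarchical OT method; a plain Sinkhorn coupling between
# unlabeled feature vectors and labeled class centroids, for illustration only.
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic OT plan between marginals a (rows) and b (cols) for a cost matrix."""
    K = np.exp(-cost / eps)                    # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):                   # alternate scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]         # transport plan; rows sum to a

def ot_pseudo_labels(feats_unlabeled, feats_labeled, labels):
    """Assign each unlabeled point the class whose centroid receives most of its mass."""
    classes = np.unique(labels)
    centroids = np.stack([feats_labeled[labels == c].mean(axis=0) for c in classes])
    # Squared-Euclidean ground cost between unlabeled points and class centroids.
    cost = ((feats_unlabeled[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                   # rescale for numerical stability
    n, k = cost.shape
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(k, 1.0 / k))
    return classes[plan.argmax(axis=1)]        # pseudo-label = column with most mass

if __name__ == "__main__":
    # Toy demo on two synthetic clusters standing in for CNN features.
    rng = np.random.default_rng(0)
    x_lab = np.concatenate([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
    y_lab = np.array([0] * 20 + [1] * 20)
    x_unl = np.concatenate([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
    print(ot_pseudo_labels(x_unl, x_lab, y_lab))
```

In an SSL loop, pseudo-labels produced this way would be mixed with the ground-truth labels to form the training targets for the CNN at each round.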

Author information

Corresponding author

Correspondence to Fariborz Taherkhani.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Taherkhani, F., Dabouei, A., Soleymani, S., Dawson, J., Nasrabadi, N.M. (2020). Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_30

  • DOI: https://doi.org/10.1007/978-3-030-58548-8_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58547-1

  • Online ISBN: 978-3-030-58548-8

  • eBook Packages: Computer Science, Computer Science (R0)
