Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets

Ebert, Sandra; Fritz, Mario; Schiele, Bernt

doi:10.1007/978-3-642-37331-2_18


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7724)

Included in the following conference series:

Asian Conference on Computer Vision


Abstract

Internet data sources provide us with large image datasets, most of which come without any explicit labeling. This setting is ideal for semi-supervised learning, which seeks to exploit labeled data as well as a large pool of unlabeled data points to improve learning and classification. While considerable progress has been made on theory and algorithms, there has been limited success in translating that progress to the large-scale datasets that inspired these methods. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with different possible speed-ups. Our findings lead to a new algorithm that scales to datasets up to 40 times larger than those handled by previous approaches and even increases classification performance. Our method is based on the key insight that a density-based measure can be used to select unlabeled data points, similar to an active learning scheme. This leads to a compact graph, which improves performance by up to 11.6% at reduced computational cost.
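The idea in the abstract, selecting a small set of unlabeled points by density and propagating labels over the resulting compact graph, can be illustrated with a minimal sketch. The abstract does not specify the density measure, the graph construction, or the propagation rule, so everything below is an assumption chosen for illustration: density is taken as the inverse mean distance to the k nearest neighbours, the graph is a symmetric k-NN graph with Gaussian weights, and labels are spread with the closed-form local-and-global-consistency rule. The function names `select_by_density`, `knn_graph`, `propagate`, and `demo` are hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Dense squared Euclidean distances; only practical for small n."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    return d2


def select_by_density(X, budget, k):
    """Assumed density score: inverse mean distance to the k nearest neighbours.

    Returns the indices of the `budget` densest points.
    """
    d2 = pairwise_sq_dists(X)
    knn_dist = np.sqrt(np.sort(d2, axis=1)[:, :k]).mean(axis=1)
    density = 1.0 / (knn_dist + 1e-12)
    return np.argsort(-density)[:budget]


def knn_graph(X, k):
    """Symmetric k-NN affinity matrix with Gaussian edge weights."""
    n = X.shape[0]
    d2 = pairwise_sq_dists(X)
    cols = np.argsort(d2, axis=1)[:, :k].ravel()
    rows = np.repeat(np.arange(n), k)
    sigma2 = d2[rows, cols].mean()  # bandwidth from mean neighbour distance
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)


def propagate(W, Y, alpha=0.99):
    """Closed-form propagation F = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(W.shape[0]) - alpha * S, Y)
    return F.argmax(axis=1)


def demo(seed=0, budget=60):
    """Two synthetic blobs, one labeled point per class, `budget` unlabeled points."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-3, 0.5, (200, 2)), rng.normal(3, 0.5, (200, 2))])
    y = np.repeat([0, 1], 200)
    labeled = np.array([0, 200])
    unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

    # Keep only the densest unlabeled points instead of the full pool.
    chosen = unlabeled[select_by_density(X[unlabeled], budget=budget, k=10)]
    nodes = np.concatenate([labeled, chosen])

    W = knn_graph(X[nodes], k=5)
    Y = np.zeros((len(nodes), 2))
    Y[np.arange(len(labeled)), y[labeled]] = 1.0  # one-hot rows for labeled nodes
    return nodes, propagate(W, Y), y[nodes]


if __name__ == "__main__":
    nodes, pred, truth = demo()
    print("graph size:", len(nodes), "accuracy on graph nodes:", (pred == truth).mean())
```

The linear solve is cubic in the number of graph nodes, so shrinking the graph from the full unlabeled pool to a fixed budget is where the savings in this sketch come from. How the selected subset affects accuracy depends on the data and on the density measure, which here is only a stand-in.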



Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Sandra Ebert, Mario Fritz & Bernt Schiele


Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, 151-744, Gwanak-gu, Seoul, Korea
Kyoung Mu Lee
Microsoft Research Asia, No. 5, Danling st., Haidian district, 100080, Beijing, P.R. China
Yasuyuki Matsushita
School of Interactive Computing, Georgia Institute of Technology, 801 Atlantic Drive, CCB 315, 30332, Atlanta, GA, USA
James M. Rehg
Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Zhong Quan Cun East Road 95, Haidian District, 100 190, Beijing, P.R. China
Zhanyi Hu

About this paper

Cite this paper

Ebert, S., Fritz, M., Schiele, B. (2013). Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37331-2_18


DOI: https://doi.org/10.1007/978-3-642-37331-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37330-5
Online ISBN: 978-3-642-37331-2
eBook Packages: Computer Science, Computer Science (R0)
