Abstract
Internet data sources provide us with large image datasets, most of which come without any explicit labels. This setting is ideal for semi-supervised learning, which seeks to exploit labeled data together with a large pool of unlabeled data points to improve learning and classification. While considerable progress has been made on the theory and algorithms, we have seen limited success in translating this progress to the large-scale datasets that inspired these methods. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms as well as several possible speed-ups. Our findings lead to a new algorithm that scales to datasets up to 40 times larger than previous approaches can handle, while even improving classification performance. Our method rests on the key insight that, by employing a density-based measure, unlabeled data points can be selected in a manner similar to an active learning scheme. This yields a compact graph that improves performance by up to 11.6% at reduced computational cost.
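The abstract does not specify the density measure or the propagation algorithm, so the following is only a rough sketch under stated assumptions. It pairs a hypothetical density-based selection rule (a Gaussian kernel density estimate whose scores are damped around each pick, in the style of subtractive clustering, chosen here for illustration and not taken from the paper) with the local and global consistency propagation of Zhou et al. (2004), one well-known graph-based method of this family. The names `select_by_density` and `propagate`, and the parameters `budget`, `sigma`, `alpha`, and `iters`, are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def gaussian(a: Point, b: Point, sigma: float) -> float:
    """Gaussian affinity exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def select_by_density(pool: List[Point], budget: int, sigma: float) -> List[int]:
    """HYPOTHETICAL selection rule: greedily keep `budget` unlabeled points.

    Each point is scored by an unnormalized Gaussian kernel density estimate.
    After a point is picked, the scores of its neighbours are damped
    (subtractive-clustering style) so later picks cover other dense regions.
    """
    score = [sum(gaussian(p, q, sigma) for q in pool) for p in pool]
    chosen: List[int] = []
    for _ in range(min(budget, len(pool))):
        best = max((i for i in range(len(pool)) if i not in chosen),
                   key=lambda i: score[i])
        chosen.append(best)
        peak = score[best]
        for i in range(len(pool)):
            score[i] -= peak * gaussian(pool[i], pool[best], sigma)
    return sorted(chosen)


def propagate(points: List[Point], labels: Dict[int, int], n_classes: int,
              sigma: float, alpha: float = 0.9, iters: int = 100) -> List[int]:
    """Local and global consistency (Zhou et al.), iterative form:
    F <- alpha * S F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2."""
    n = len(points)
    # Dense affinity matrix with zero diagonal: O(n^2) memory and time.
    W = [[0.0 if i == j else gaussian(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]
    # Symmetric normalization; W[i][j] > 0 implies d[i], d[j] > 0.
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot label matrix Y; unlabeled rows are all zeros.
    Y = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Predicted class = arg max over each row of F.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


if __name__ == "__main__":
    labeled = [(0.0, 0.0), (5.0, 5.0)]  # one labeled example per class
    pool = [(0.2, 0.1), (0.1, 0.3), (-0.2, 0.0), (0.3, -0.1),
            (5.1, 4.9), (4.8, 5.2), (5.3, 5.0), (4.9, 4.7)]
    keep = select_by_density(pool, budget=4, sigma=1.0)
    graph = labeled + [pool[i] for i in keep]
    pred = propagate(graph, labels={0: 0, 1: 1}, n_classes=2, sigma=1.0)
    print("kept pool indices:", keep)
    print("predicted labels:", pred)
```

The motivation for pruning the unlabeled pool before building the graph is that a dense affinity matrix `W` costs O(n²) memory and time, so a smaller, well-spread node set keeps graph construction and propagation tractable. The paper's actual density measure and speed-ups may differ from this sketch.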
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ebert, S., Fritz, M., Schiele, B. (2013). Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37331-2_18
DOI: https://doi.org/10.1007/978-3-642-37331-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37330-5
Online ISBN: 978-3-642-37331-2
eBook Packages: Computer Science, Computer Science (R0)