Is Attribute-Based Zero-Shot Learning an Ill-Posed Strategy?
Abstract
Attribute-based zero-shot learning is a transfer learning approach that has recently gained wide popularity. Its goal is to learn novel classes that were never seen during the training stage. The classical route to this goal is to incorporate prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject, no known algorithm has reported a predictive accuracy that exceeds the accuracy of supervised learning with very few training examples. For instance, on some popular benchmark datasets, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be as accurate as supervised learning trained on as few as two examples from each hidden class. In this paper, we argue that this lack of significant results in the literature is not a coincidence: attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the “correct meaning” of the attribute itself. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the illusory goal of learning with respect to one distribution of instances in the hope of being able to predict with respect to any arbitrary distribution. We demonstrate this overlooked fact on synthetic and real datasets. The data and software related to this paper are available at https://mine.kaust.edu.sa/Pages/zero-shot-learning.aspx.
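The distribution-mismatch argument in the abstract can be illustrated with a small synthetic sketch. This is not the paper's experiment: the two-feature data model, the least-squares attribute predictor, and all names (`sample`, `fit_linear`, `accuracy`, `spurious_sign`) are assumptions chosen for illustration. One feature noisily carries the intended meaning of a binary attribute, while a second, cleaner side cue agrees with the attribute on the seen classes and is flipped on a hypothetical unseen class.

```python
import numpy as np


def sample(n, spurious_sign, rng):
    """Draw n instances with a binary attribute a in {0, 1}.

    Feature 0 noisily carries the intended meaning of the attribute.
    Feature 1 is a hypothetical side cue: it agrees with the attribute
    when spurious_sign is +1 (seen classes) and is flipped when it is -1
    (unseen class). The data model is an assumption for illustration.
    """
    a = rng.integers(0, 2, size=n)
    s = 2.0 * a - 1.0  # attribute mapped to {-1, +1}
    x0 = s + 0.5 * rng.standard_normal(n)
    x1 = spurious_sign * 3.0 * s + 0.1 * rng.standard_normal(n)
    return np.column_stack([x0, x1]), a


def fit_linear(X, a):
    """Least-squares linear attribute predictor (no intercept)."""
    w, *_ = np.linalg.lstsq(X, 2.0 * a - 1.0, rcond=None)
    return w


def accuracy(w, X, a):
    """Fraction of instances whose thresholded score matches the attribute."""
    return float(np.mean((X @ w > 0).astype(int) == a))


rng = np.random.default_rng(0)
X_seen, a_seen = sample(2000, +1.0, rng)      # training distribution
X_unseen, a_unseen = sample(2000, -1.0, rng)  # shifted distribution

w = fit_linear(X_seen, a_seen)
seen_acc = accuracy(w, X_seen, a_seen)
unseen_acc = accuracy(w, X_unseen, a_unseen)

print(f"attribute accuracy on seen-class distribution:   {seen_acc:.3f}")
print(f"attribute accuracy on unseen-class distribution: {unseen_acc:.3f}")
```

Under these assumptions the predictor places almost all of its weight on the side cue, because that cue is the cleaner signal on the training distribution. It therefore predicts the attribute well on the seen classes but fails once the cue's relationship to the attribute changes, even though the attribute's intended meaning (feature 0) is unchanged.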
Keywords
Zero-shot learning, Attribute-based classification, Multi-label classification
Notes
Acknowledgment
Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST) and the Saudi Arabian Oil Company (Saudi Aramco).