Iterative Category Discovery via Multiple Kernel Metric Learning

Abstract

The goal of an object category discovery system is to annotate a pool of unlabeled image data, where the set of labels is initially unknown to the system and must therefore be discovered over time by querying a human annotator. The annotated data is then used to train object detectors in a standard supervised learning setting, possibly in conjunction with category discovery itself. Category discovery systems can be evaluated in terms of both the accuracy of the resulting object detectors and the efficiency with which they discover categories and annotate the training data. To improve the accuracy and efficiency of category discovery, we propose an iterative framework which alternates between optimizing nearest neighbor classification for known categories with multiple kernel metric learning, and detecting clusters of unlabeled image regions likely to belong to novel, unknown categories. Experimental results on the MSRC and PASCAL VOC2007 data sets show that the proposed method improves clustering for category discovery, and efficiently annotates image regions belonging to the discovered classes.

Notes

  1. In this setting, a true ranking is any ranking which places all relevant results before all irrelevant results.

  2. The Hilbert-Schmidt norm is a natural generalization of the Frobenius norm. For our purposes, this can be understood as treating \(L\) as a collection of \(n\) elements \(v_i \in \mathcal {H}\) (one per output dimension of \(L\)), and summing over the squared norms: \(\Vert L\Vert _\text {HS}=\sqrt{\sum _i \langle v_i, v_i\rangle _\mathcal {H}}\).

  3. Familiarity refers to a segment’s true label, which may or may not be available: an unlabeled or test segment may be familiar or unfamiliar.

  4. We chose spectral clustering over agglomerative clustering in this set of experiments to facilitate direct comparison to Lee and Grauman (2010).

  5. Weak labeling in the PASCAL dataset makes evaluation difficult, because background segments have no ground truth.

  6. In Table 6, MKLMNN (Galleguillos et al. 2010) has no MAP score for the class tree because there was only one test segment of that class predicted as unfamiliar.
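The Hilbert-Schmidt norm in note 2 can be checked numerically in the finite-dimensional case, where it coincides with the Frobenius norm. The following is a minimal sketch, not the authors' code. The function names are hypothetical, and the kernelized form assumes that each \(v_i\) is written as a linear combination of mapped training points with coefficient matrix `A`, so that \(\Vert L\Vert _\text {HS}^2 = \text {tr}(A K A^\top )\).

```python
import numpy as np

def hs_norm(V):
    """Hilbert-Schmidt norm of L, given its elements v_i as the rows of V."""
    # Sum the squared norms <v_i, v_i> over all output dimensions.
    return np.sqrt(sum(np.dot(v, v) for v in V))

def hs_norm_kernel(A, K):
    """Same norm when v_i = sum_j A[i, j] * phi(x_j) and K[j, l] = <phi(x_j), phi(x_l)>.

    Assumption: L is expressed through coefficients A over the training points.
    """
    # <v_i, v_i> = A[i] @ K @ A[i], so the sum over i is trace(A K A^T).
    return np.sqrt(max(np.trace(A @ K @ A.T), 0.0))

# Finite-dimensional check with a linear kernel K = X X^T.
X = np.arange(12, dtype=float).reshape(4, 3)   # 4 points, 3 features
A = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 2.0, 0.0, 1.0]])           # 2 output dimensions
L = A @ X                                      # explicit map in feature space
print(hs_norm(L), np.linalg.norm(L, "fro"), hs_norm_kernel(A, X @ X.T))
```

The three printed values agree, which is the sense in which the Hilbert-Schmidt norm generalizes the Frobenius norm.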

References

  1. Bart, E., Porteous, I., Perona, P., & Welling, M. (2008). Unsupervised learning of visual taxonomies. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  2. Branson, S., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P., et al. (2010). Visual recognition with humans in the loop. In European conference in computer vision (ECCV) (pp. 438–451).

  3. Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In European conference in computer vision (ECCV).

  4. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

  5. Defays, D. (1977). An efficient algorithm for a complete link method. The Computer Journal, 20(4), 364–366.

  6. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.

  7. Faktor, A., & Irani, M. (2012). “Clustering by composition”—unsupervised discovery of image categories. In European conference in computer vision (ECCV) (pp. 474–487). Springer.

  8. Forsyth, D. A., Malik, J., Fleck, M. M., Greenspan, H., Leung, T., Belongie, S., et al. (1995). Finding pictures of objects in large collections of images. In Object representation in computer vision II, Lecture Notes in Computer Science (Vol. 1144, pp. 335–360).

  9. Frome, A., Singer, Y., Sha, F., & Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In International conference in computer vision (ICCV) (pp. 1–8).

  10. Galleguillos, C., McFee, B., Belongie, S., & Lanckriet, G. (2010). Multi-class object localization by combining local contextual interactions. In Computer vision and pattern recognition (CVPR) (pp. 113–120).

  11. Galleguillos, C., McFee, B., Belongie, S., & Lanckriet, G. (2011). From region similarity to category discovery. In Computer vision and pattern recognition (CVPR) (pp. 2665–2672).

  12. Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In International conference in computer vision (ICCV).

  13. Globerson, A., & Roweis, S. (2007). Visualizing pairwise similarity via semidefinite embedding. In International conference on artificial intelligence and statistics (AISTATS).

  14. Grauman, K., & Darrell, T. (2006). Unsupervised learning of categories from sets of partially matching image features. In Computer vision and pattern recognition (CVPR).

  15. Heitz, G., & Koller, D. (2008). Learning spatial context: Using stuff to find things. In European conference in computer vision (ECCV) (pp. 30–43). Springer.

  16. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.

  17. Joachims, T. (2005). A support vector method for multivariate performance measures. In International conference on machine learning (pp. 377–384).

  18. Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59.

  19. Kang, H., Hebert, M., Efros, A. A., & Kanade, T. (2012). Connecting missing links: Object discovery from sparse observations using 5 million product images. In European conference in computer vision (ECCV) (pp. 794–807). Springer.

  20. Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5, 27–72.

  21. Lee, Y., & Grauman, K. (2010). Object-graphs for context-aware category discovery. In Computer vision and pattern recognition (CVPR).

  22. Lee, Y., & Grauman, K. (2011). Learning the easy things first: Self-paced visual category discovery. In Computer vision and pattern recognition (CVPR) (pp. 1721–1728).

  23. McFee, B., & Lanckriet, G. (2010). Metric learning to rank. In International conference on machine learning.

  24. Meila, M., & Shi, J. (2001). Learning segmentation by random walks. In Advances in neural information processing systems.

  25. Rabinovich, A., Lange, T., Buhmann, J., & Belongie, S. (2006). Model order selection and cue combination for image segmentation. In Computer vision and pattern recognition (CVPR).

  26. Russell, B., Freeman, W., Efros, A., Sivic, J., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In Computer vision and pattern recognition (CVPR).

  27. Schölkopf, B., Herbrich, R., Smola, A. J., & Williamson, R. (2001). A generalized representer theorem. In Computational learning theory (pp. 416–426).

  28. Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  29. Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In International conference in computer vision (ICCV).

  30. Sivic, J., Russell, B., Zisserman, A., Freeman, W., & Efros, A. (2008). Unsupervised discovery of visual object class hierarchies. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  31. Tian, Y., Liu, W., Xiao, R., Wen, F., & Tang, X. (2007). A face annotation framework with partial clustering and interactive labeling. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  32. Todorovic, S., & Ahuja, N. (2006). Extracting subimages of an unknown category from a set of images. In Computer vision and pattern recognition (CVPR).

  33. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. The Journal of Machine Learning Research, 6, 1453–1484.

  34. Tuytelaars, T., Lampert, C., Blaschko, M., & Buntine, W. (2010). Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2), 284–302.

  35. Varma, M., & Ray, D. (2007). Learning the discriminative power-invariance trade-off. In International conference in computer vision (ICCV).

  36. Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In International conference in computer vision (ICCV).

  37. Vijayanarasimhan, S., & Grauman, K. (2009). What’s it going to cost you? Predicting effort vs. informativeness for multi-label image annotations. In Computer vision and pattern recognition (CVPR).

  38. Wang, G., Hoiem, D., & Forsyth, D. (2010). Learning image similarity from flickr groups using stochastic intersection kernel machines. In Computer vision and pattern recognition (CVPR).

  39. Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2006). Distance metric learning for large margin nearest neighbor classification. Advances in neural information processing systems.

  40. Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83.

  41. Winn, J., Criminisi, A., & Minka, T. (2005). Object categorization by learned universal visual dictionary. In International conference in computer vision (ICCV) (Vol. 2, pp. 1800–1807).

  42. Zhao, Y., & Karypis, G. (2001). Criterion functions for document clustering: Experiments and analysis. Machine Learning.

  43. Zhu, J. Y., Wu, J., Wei, Y., Chang, E., & Tu, Z. (2012). Unsupervised object class discovery via saliency-guided multiple class learning. In Computer vision and pattern recognition (CVPR) (pp. 3218–3225).

Author information

Corresponding author

Correspondence to Carolina Galleguillos.

Appendix: Implementation

The implementation uses the 1-slack margin-rescaling cutting plane algorithm (Joachims et al. 2009) to solve for all \(W^t\) within a prescribed tolerance \(\epsilon = 0.01\). We further constrain each \(W^t\) to be a diagonal matrix, which simplifies the semidefinite program to a linear program. For \(m\) kernels and \(n\) training points, this also reduces the number of parameters to be learned from \(O(mn^{2})\) (\(m\) symmetric \(n\)-by-\(n\) matrices) to \(mn\).
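To make the effect of the diagonal constraint concrete, the following minimal sketch counts parameters in both cases and evaluates a distance under diagonal \(W^t\). It is an illustration, not the authors' implementation. The function names are hypothetical, and the distance form (a sum over kernels of quadratic forms on columns of each kernel matrix) is an assumption that the text above does not spell out.

```python
import numpy as np

def full_param_count(m, n):
    """Free parameters for m symmetric n-by-n matrices, which is O(m n^2)."""
    return m * n * (n + 1) // 2

def diag_param_count(m, n):
    """Free parameters for m diagonal n-by-n matrices."""
    return m * n

def mk_distance(K_list, w_list, i, j):
    """Distance between training points i and j with W^t = diag(w^t).

    Assumed form: sum_t (K^t[:, i] - K^t[:, j])^T W^t (K^t[:, i] - K^t[:, j]).
    """
    total = 0.0
    for K, w in zip(K_list, w_list):
        diff = K[:, i] - K[:, j]
        # With diagonal W^t the quadratic form is a weighted sum of squares,
        # so the cost is O(n) per kernel instead of O(n^2).
        total += float(np.dot(w * diff, diff))
    return total

print(full_param_count(3, 100), diag_param_count(3, 100))

K = np.array([[2.0, 1.0], [1.0, 2.0]])   # toy kernel matrix for n = 2 points
w = np.array([1.0, 0.5])                 # nonnegative diagonal of W
print(mk_distance([K], [w], 0, 1))
```

Requiring the entries of each `w` to be nonnegative keeps every \(W^t\) positive semidefinite, and a set of linear constraints of that kind is consistent with the reduction to a linear program described above.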

In all experiments with MKMLR, we choose the ranking loss \(\Delta \) as the normalized discounted cumulative gain (NDCG) (Järvelin and Kekäläinen 2002) truncated at \(10\). Slack parameters \(C\) and kernel bandwidth \(\sigma \) for spectral clustering were found by cross-validation on the training set. For testing, we fix \(k=17\) as the number of nearest neighbors for classification across all experiments. Multiple stable segmentations were computed for each image: 9 different segmentations, each of which contains between \(2\) and \(10\) segments, resulting in 54 segments per image (Rabinovich et al. 2006; Shi and Malik 2000).
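As an illustration of the truncated ranking measure, the sketch below computes NDCG at a cutoff of 10 for binary relevance labels. It uses a common formulation with a \(1/\log _2(i+1)\) discount at rank \(i\). The exact variant used in the experiments is not stated here, so the function and its defaults should be read as assumptions. The last lines check the segment count under the assumption that the 9 segmentations contain 2, 3, ..., 10 segments respectively.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """NDCG truncated at rank k for a ranked list of binary relevance labels."""
    rel = np.asarray(relevance, dtype=float)
    # Discount for ranks 1..k: 1 / log2(rank + 1).
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    top = rel[:k]
    dcg = np.sum(top * discounts[:len(top)])
    # Ideal ordering: all relevant results before all irrelevant ones.
    ideal = np.sort(rel)[::-1][:k]
    idcg = np.sum(ideal * discounts[:len(ideal)])
    return float(dcg / idcg) if idcg > 0 else 0.0

print(ndcg_at_k([1, 1, 0, 0]))   # relevant results ranked first
print(ndcg_at_k([0, 1]))         # the single relevant result is ranked second

# One segmentation for each segment count from 2 to 10 (assumed).
segments_per_image = sum(range(2, 11))
print(segments_per_image)
```

A ranking that places all relevant results before all irrelevant ones attains the maximum score of 1. A loss can then be derived from the score, for example as one minus NDCG, although that particular choice is an assumption here.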

About this article

Cite this article

Galleguillos, C., McFee, B. & Lanckriet, G.R.G. Iterative Category Discovery via Multiple Kernel Metric Learning. Int J Comput Vis 108, 115–132 (2014). https://doi.org/10.1007/s11263-013-0679-z

Keywords

  • Category discovery
  • Metric learning
  • Multiple kernel learning
  • Iterative discovery