Iterative Category Discovery via Multiple Kernel Metric Learning

  • Carolina Galleguillos
  • Brian McFee
  • Gert R. G. Lanckriet

Abstract

The goal of an object category discovery system is to annotate a pool of unlabeled image data, where the set of labels is initially unknown to the system and must therefore be discovered over time by querying a human annotator. The annotated data is then used to train object detectors in a standard supervised learning setting, possibly in conjunction with category discovery itself. Category discovery systems can be evaluated in terms of both the accuracy of the resulting object detectors and the efficiency with which they discover categories and annotate the training data. To improve the accuracy and efficiency of category discovery, we propose an iterative framework that alternates between optimizing nearest neighbor classification for known categories with multiple kernel metric learning, and detecting clusters of unlabeled image regions likely to belong to novel, unknown categories. Experimental results on the MSRC and PASCAL VOC2007 data sets show that the proposed method improves clustering for category discovery and efficiently annotates image regions belonging to the discovered classes.
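The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel weights are held fixed rather than learned, and the "cluster" is approximated by a hypothetical heuristic (the unlabeled item farthest from every labeled item, together with its nearest unlabeled neighbours). All function names, the `oracle` callback standing in for the human annotator, and the `cluster_size` parameter are assumptions made for illustration.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices (weights assumed non-negative)."""
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_distances(K):
    """Squared distances induced by a kernel: K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    return np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0)

def discovery_loop(kernels, weights, oracle, seed_idx, n_rounds=5, cluster_size=3):
    """Hypothetical discovery loop; seed_idx must contain at least one item."""
    # In the paper's framework the metric would be re-learned each round;
    # this sketch keeps the kernel weights fixed (an assumption).
    D = kernel_distances(combine_kernels(kernels, weights))
    n = D.shape[0]
    labels = {i: oracle(i) for i in seed_idx}
    for _ in range(n_rounds):
        unlabeled = [i for i in range(n) if i not in labels]
        if not unlabeled:
            break
        labeled = list(labels)
        # Distance from each unlabeled item to its nearest labeled neighbour.
        nn_dist = D[np.ix_(unlabeled, labeled)].min(axis=1)
        # The item farthest from all known data is the novelty candidate.
        candidate = unlabeled[int(nn_dist.argmax())]
        # Crude stand-in for clustering: the candidate plus its nearest
        # unlabeled neighbours are sent to the annotator together.
        order = np.argsort(D[candidate, unlabeled])[:cluster_size]
        for j in order:
            idx = unlabeled[int(j)]
            labels[idx] = oracle(idx)
    return labels
```

Calling `discovery_loop` with a list of precomputed kernel matrices, one weight per kernel, and a function that returns the true label of an item simulates a small number of annotation rounds. A faithful version would replace the fixed weights with a learned multiple kernel metric and the neighbourhood heuristic with a proper clustering step.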

Keywords

  • Category discovery
  • Metric learning
  • Multiple kernel learning
  • Iterative discovery

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Carolina Galleguillos (1)
  • Brian McFee (2)
  • Gert R. G. Lanckriet (3)

  1. SET Media Inc, San Francisco, USA
  2. Columbia University, New York, USA
  3. University of California, San Diego, La Jolla, USA
