The Visual Computer, Volume 31, Issue 4, pp. 367–375

Optimized recognition with few instances based on semantic distance

  • Hao Wu
  • Zhenjiang Miao
  • Yi Wang
  • Manna Lin
Original Article

Abstract

In this paper, we present a new object recognition model that learns from few instances based on semantic distance. Learning objects from many instances has been studied in computer vision for many years. However, in many cases not enough positive instances are available, especially for some special categories, so we must take full advantage of all instances, including those that do not belong to the category. The main insight is that, given a few positive instances from one category, we can treat some other candidate instances as positive instances, based on their semantic distance, in order to learn the model. Our model responds more strongly to instances whose semantic distance to the positive instances is small than to instances whose semantic distance is large. We use a regularized kernel machine algorithm to train on the images from the database. The superiority of our method over existing object recognition methods is demonstrated: experiments on an image database show that our method not only reduces the number of training instances but also maintains recognition accuracy.
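To make the training idea concrete, the sketch below gives one plausible reading of the abstract in Python, not the authors' actual implementation: candidate instances receive soft positive labels that decay with their semantic distance to the few true positives, and a regularized kernel machine, here instantiated as kernel ridge regression with an RBF kernel, is fit to those labels so that semantically closer instances score higher. The labeling rule soft_labels, the kernel width gamma, the distance scale tau, and the regularizer lam are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def soft_labels(semantic_dist, tau=1.0):
    """Hypothetical labeling rule: map each candidate's semantic distance
    to the few true positives into a soft label in (0, 1]; distance 0
    (a true positive) keeps label 1.0."""
    return np.exp(-semantic_dist / tau)

def train_kernel_machine(X, y, lam=1e-2, gamma=0.5):
    """Regularized kernel machine via kernel ridge regression:
    solve (K + lam * I) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def score(X_train, alpha, X_test, gamma=0.5):
    """Recognition score for test instances; by construction it is higher
    for instances close to the soft-positive training set."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy usage: two true positives plus candidates at increasing semantic distance.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 16))      # stand-ins for GIST/SIFT image descriptors
sem_dist = np.array([0.0, 0.0, 0.3, 0.5, 0.8, 1.2, 2.0, 2.5, 3.0, 4.0])
y = soft_labels(sem_dist)          # true positives get label 1.0
alpha = train_kernel_machine(X, y)
print(score(X, alpha, X[:3]))      # closer instances receive higher scores
```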

Keywords

Semantic distance · Object recognition · GIST · SIFT · AP value · AUC value

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported by the National Key Technology R&D Program of China (2012BAH01F03), the National Basic Research (973) Program of China (2011CB302203), and the research fund of the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
