Semiautomatic Learning of 3D Objects from Video Streams

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9371)


Object detection and recognition are classical problems in computer vision, but they remain challenging without a priori knowledge of the objects and with limited user interaction. In this work, a semiautomatic system for visual object learning from video streams is presented. The system detects movable foreground objects relying on FAST interest points. Once a view of an object has been segmented, the system relies on ORB features to create its descriptor, store it, and compare it with the descriptors of previously seen views. To this end, a visual similarity function based on the geometric consistency of the local features is used. The system groups similar views of the same object into clusters, relying on the transitivity of similarity among them. Each cluster identifies a 3D object, and the system learns to autonomously recognize a particular view by assessing its cluster membership. When ambiguities arise, the user is asked to validate the membership assignments. Experiments have demonstrated the ability of the system to group together unlabeled views, reducing the labeling work of the user.
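The grouping of views by transitivity of similarity can be illustrated with a small union-find sketch. This is not the authors' implementation: the function name `cluster_views`, the precomputed pairwise `similarity` scores, and the two thresholds `accept` and `ask` are assumptions introduced here for illustration only. Pairs scoring at least `accept` are merged automatically, while pairs in the intermediate band are returned so that a user could validate them, loosely mirroring the ambiguity handling described in the abstract.

```python
from typing import Dict, List, Tuple


def cluster_views(
    n_views: int,
    similarity: Dict[Tuple[int, int], float],
    accept: float = 0.7,  # hypothetical auto-merge threshold (not from the paper)
    ask: float = 0.4,     # hypothetical lower bound of the "ambiguous" band
) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Group views into clusters using transitive similarity (union-find)."""
    parent = list(range(n_views))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    ambiguous: List[Tuple[int, int]] = []
    for (a, b), score in sorted(similarity.items()):
        if score >= accept:
            # Confident match: merge the two clusters.
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
        elif score >= ask:
            # Uncertain match: defer to the user.
            ambiguous.append((a, b))

    # A pair no longer needs validation if transitivity already joined it.
    ambiguous = [(a, b) for a, b in ambiguous if find(a) != find(b)]

    groups: Dict[int, List[int]] = {}
    for view in range(n_views):
        groups.setdefault(find(view), []).append(view)
    return sorted(groups.values()), ambiguous
```

In this sketch, views 0 and 2 end up in the same cluster whenever both are confidently similar to view 1, even if their direct similarity is low, which is the effect of transitivity that the abstract relies on.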


Local Feature, Video Stream, Object View, Smart Camera, RANSAC Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. ISTI-CNR, Pisa, Italy
