VideoCut: Removing Irrelevant Frames by Discovering the Object of Interest

  • David Liu
  • Gang Hua
  • Tsuhan Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5302)


We propose a novel method for removing irrelevant frames from a video, given user-provided frame-level labels for a very small number of frames. We first hypothesize a number of candidate areas that possibly contain the object of interest, and then determine which area(s) truly contain it. Our method enjoys several favorable properties. First, compared to approaches that describe a whole frame with a single descriptor, each area's feature descriptor has the chance of genuinely describing the object of interest, and is therefore less affected by background clutter. Second, by exploiting the temporal continuity of a video instead of treating its frames as independent, we can hypothesize the locations of the candidate areas more accurately. Third, by infusing prior knowledge into the topic-motion model, we can precisely follow the trajectory of the object of interest. This allows us to greatly reduce the number of candidate areas, and hence the chance of overfitting the data during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips.
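The pipeline the abstract describes (hypothesize candidate areas per frame, score them for the object of interest, exploit temporal continuity, and discard frames without the object) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the scoring inputs, the moving-average smoothing, the threshold, and all function names are hypothetical stand-ins for the paper's topic-motion model.

```python
def smooth(scores, window=3):
    """Moving-average over neighboring frames, a crude stand-in for the
    temporal continuity the paper exploits."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def videocut(frame_candidates, threshold=0.5):
    """frame_candidates: one list of candidate-area scores (in [0, 1]) per
    frame, as produced by some area-level object model (hypothetical here).
    Returns the indices of frames judged to contain the object of interest;
    all other frames would be removed as irrelevant."""
    # Each frame is represented by its best candidate area, so a cluttered
    # background cannot drag down a genuinely object-containing area.
    best = [max(areas) if areas else 0.0 for areas in frame_candidates]
    # Temporal smoothing: neighboring frames inform each other.
    smoothed = smooth(best)
    return [i for i, s in enumerate(smoothed) if s >= threshold]

# Toy example: only the middle frames contain a high-scoring candidate area.
clips = [[0.1], [0.2, 0.1], [0.9, 0.3], [0.8], [0.7, 0.2], [0.1], [0.05]]
print(videocut(clips))  # frames 2-4 are kept
```

In the paper itself, the candidate areas and their scores come from the topic-motion model with infused prior knowledge rather than from a fixed threshold on smoothed scores; the sketch only conveys the frame-filtering structure.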



Supplementary material

978-3-540-88682-2_34_MOESM1_ESM.wmv (14.5 MB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Liu, Dept. of ECE, Carnegie Mellon University, USA
  • Gang Hua, Microsoft Live Labs, USA
  • Tsuhan Chen, Dept. of ECE, Carnegie Mellon University, USA
