Video Retrieval Based on User-Specified Appearance and Application to Animation Synthesis

  • Makoto Okabe
  • Yuta Kawate
  • Ken Anjyo
  • Rikio Onai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7733)


In our research group, we investigate techniques for retrieving videos based on user-specified appearances. In this paper, we introduce two of our research activities.

First, we present a user interface for quickly and easily retrieving scenes of a desired appearance from videos. Given an input image, our system allows the user to sketch a transformation of an object inside the image, and then retrieves scenes showing this object in the user-specified transformed pose. Our method employs two steps to retrieve the target scenes. We first apply a standard image-retrieval technique based on feature matching, and find scenes in which the same object appears in a similar pose. Then we find the target scene by automatically forwarding or rewinding the video, starting from the frame selected in the previous step. When the user-specified transformation is matched, we stop forwarding or rewinding, and thus the target scene is retrieved. We demonstrate that our method successfully retrieves scenes of a racing car, a running horse, and a flying airplane with user-specified poses and motions.
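The two-step search described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation. It assumes each frame is already summarized by a small descriptor vector and that the object's position in every frame is already known. The names `find_seed_frame`, `seek_transformed_frame`, `track`, and `target_offset` are hypothetical, and the user's sketched transformation is simplified to a 2D displacement.

```python
from math import hypot


def find_seed_frame(query_desc, frame_descs):
    """Step 1 (stand-in for feature matching): return the index of the
    frame whose descriptor is closest to the query image's descriptor."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(frame_descs)),
               key=lambda i: sqdist(query_desc, frame_descs[i]))


def seek_transformed_frame(track, seed, target_offset, tol=1.0):
    """Step 2: move outward from the seed frame, forwarding and rewinding,
    and stop at the first frame where the object's displacement from its
    seed position matches the user-specified offset within `tol`."""
    sx, sy = track[seed]
    tx, ty = target_offset
    for step in range(1, len(track)):
        for idx in (seed + step, seed - step):  # forward, then backward
            if 0 <= idx < len(track):
                dx, dy = track[idx][0] - sx, track[idx][1] - sy
                if hypot(dx - tx, dy - ty) <= tol:
                    return idx
    return None  # no frame matches the requested transformation


# Toy data: four frame descriptors and a five-frame object track (assumed).
frame_descs = [(0, 0), (1, 1), (5, 5), (9, 9)]
track = [(0, 0), (2, 0), (4, 0), (6, 0), (8, 0)]

seed = find_seed_frame((5, 4), frame_descs)
forward_hit = seek_transformed_frame(track, seed, (4, 0))
backward_hit = seek_transformed_frame(track, seed, (-4, 0))
```

A real system would replace the descriptor comparison with local feature matching and the precomputed `track` with motion estimated from the video. The outward scan only illustrates the stopping rule: playback stops as soon as the observed transformation agrees with the sketched one.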

Second, we present a method for synthesizing fluid animation from a single image, using a fluid video database. The user inputs a target painting or photograph of a fluid scene. Using the database of fluid video examples, the core algorithm of our technique then automatically retrieves appropriate fluid videos and assigns them to each part of the target image. The procedure can thus handle various paintings and photographs of rivers, waterfalls, fire, and smoke, and the resulting animations demonstrate that it is more powerful and efficient than our prior work.
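As a rough illustration of assigning database videos to parts of an image, the sketch below picks, for each region, the video whose feature vector is nearest. This is a simplified nearest-neighbor stand-in and not the algorithm of the paper. The function `assign_videos`, the region and clip names, and the three-component feature vectors are all assumptions made for the example.

```python
def assign_videos(region_features, video_features):
    """For each image region, return the name of the database video whose
    feature vector has the smallest squared distance to the region's."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {
        region: min(video_features,
                    key=lambda name: sqdist(feat, video_features[name]))
        for region, feat in region_features.items()
    }


# Hypothetical appearance features (for example, mean color) per region/clip.
regions = {
    "river": (0.1, 0.3, 0.7),
    "sky_smoke": (0.8, 0.8, 0.8),
}
database = {
    "smoke_clip": (0.7, 0.7, 0.7),
    "water_clip": (0.2, 0.4, 0.8),
    "fire_clip": (0.9, 0.4, 0.1),
}

assignment = assign_videos(regions, database)
```

Matching each region independently keeps the example short. A practical method would likely also consider motion and consistency between neighboring regions.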





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Makoto Okabe (1, 3)
  • Yuta Kawate (1)
  • Ken Anjyo (2)
  • Rikio Onai (1)

  1. The University of Electro-Communications, Tokyo, Japan
  2. OLM Digital, Inc. / JST CREST, Japan
  3. JST PRESTO, Japan
