Annotated Free-Hand Sketches for Video Retrieval Using Object Semantics and Motion

  • Rui Hu
  • Stuart James
  • John Collomosse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7131)


We present a novel video retrieval system that accepts annotated free-hand sketches as queries. Existing sketch-based video retrieval (SBVR) systems enable the appearance and movements of objects to be searched naturally through pictorial representations. Whilst visually expressive, such systems are an imprecise vehicle for conveying the semantics (e.g. object types) within a scene. Our contribution is to fuse the semantic richness of text with the expressivity of sketch, to create a hybrid ‘semantic sketch’-based video retrieval system. Trajectory extraction and clustering are applied to pre-process each clip into a video object representation that we augment with object classification and colour information. The result is a system capable of searching videos based on the desired colour, motion path, and semantic labels of the objects present. We evaluate the performance of our system over the TSF dataset of broadcast sports footage.
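
To make the idea of querying on colour, motion path, and semantic label concrete, the following is a minimal sketch of how such cues might be combined into a single ranking score. It is not the authors' method: the `VideoObject` record, the arc-length resampling, the equal-weight linear fusion, and all function names are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from math import dist

Point = tuple[float, float]


@dataclass
class VideoObject:
    """Hypothetical record for one video object (or one sketched query object)."""
    label: str                           # semantic class, e.g. "horse"
    colour: tuple[float, float, float]   # mean RGB, each channel in [0, 1]
    path: list[Point]                    # non-empty motion path in the unit square


def resample(path, n=16):
    """Resample a polyline to n points evenly spaced along its arc length."""
    seg = [dist(a, b) for a, b in zip(path, path[1:])]
    total = sum(seg)
    if total == 0:                       # single point or stationary object
        return [path[0]] * n
    out, i, walked = [], 0, 0.0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        t = min(max(t, 0.0), 1.0)
        (x0, y0), (x1, y1) = path[i], path[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def dissimilarity(query, obj, w_colour=1.0, w_motion=1.0, w_label=1.0):
    """Weighted sum of colour, motion-path, and label mismatch (assumed fusion)."""
    colour_d = dist(query.colour, obj.colour) / 3 ** 0.5   # scaled to [0, 1]
    q, p = resample(query.path), resample(obj.path)
    motion_d = sum(dist(a, b) for a, b in zip(q, p)) / len(q)
    label_d = 0.0 if query.label == obj.label else 1.0
    return w_colour * colour_d + w_motion * motion_d + w_label * label_d


def rank(query, clips):
    """Order clip ids by their best-matching object; clips maps id -> objects."""
    scored = {cid: min(dissimilarity(query, o) for o in objs)
              for cid, objs in clips.items() if objs}
    return sorted(scored, key=scored.get)
```

A query such as `VideoObject("horse", (0.5, 0.3, 0.1), [(0.0, 0.5), (1.0, 0.5)])` would then rank clips containing a brown, left-to-right moving object labelled "horse" ahead of clips whose objects differ in class, colour, or direction of travel. The weights are free parameters in this sketch and would need tuning against a labelled dataset.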


Video Frame, Medial Axis, Salient Object, Mean Average Precision, Video Object
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Rui Hu (1)
  • Stuart James (1)
  • John Collomosse (1)

  1. Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, U.K.
