Multimedia Tools and Applications, Volume 76, Issue 2, pp 1735–1774

A web-based tool for fast instance-level labeling of videos and the creation of spatiotemporal media fragments

  • Anastasia Ioannidou
  • Evlampios Apostolidis
  • Chrysa Collyda
  • Vasileios Mezaris


This paper presents a web-based interactive tool for time-efficient, instance-level spatiotemporal labeling of videos, based on the re-detection of manually selected objects of interest that appear in them. The tool lets the user select a number of instances of an object, which are used to annotate the video by detecting and spatially demarcating the object in the video frames, and provide a short description of the selected object. These instances are given as input to the tool's object re-detection module, which detects and spatially demarcates re-occurrences of the object in the video frames. The video segments that contain detected instances of the object can then be considered object-related media fragments, annotated with the user-provided information about the object. A key component of such a tool is an algorithm that re-detects the object throughout the video frames. Accordingly, the first part of this work presents our study of different approaches to object re-detection and the approach finally developed, which combines the recently proposed BRISK descriptors with a descriptor matching strategy that relies on the LSH algorithm. The second part describes the implemented tool, introducing the supported functionalities and demonstrating its use for object-specific labeling of videos. A set of experiments and a user study, evaluating the efficiency of the introduced object re-detection method and the performance of the developed tool, indicate that the proposed framework can be used for accurate and time-efficient instance-based annotation of videos and for the creation of object-related spatiotemporal media fragments.
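The descriptor matching step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not detail. It assumes descriptors are 512-bit binary strings (the size of a BRISK descriptor) stored as Python integers, and it uses bit-sampling LSH, a standard LSH family for Hamming distance. The class name `BitSamplingLSH` and the parameters `num_tables`, `key_bits` and `max_distance` are hypothetical choices made for illustration only.

```python
import random

DESCRIPTOR_BITS = 512  # assumption: BRISK-sized binary descriptors


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Hypothetical LSH index for binary descriptors (bit-sampling family)."""

    def __init__(self, num_tables: int = 8, key_bits: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # Each table hashes a descriptor by reading a fixed random subset of its bits.
        self.bit_sets = [
            rng.sample(range(DESCRIPTOR_BITS), key_bits) for _ in range(num_tables)
        ]
        self.tables = [dict() for _ in range(num_tables)]
        self.descriptors = []

    def _key(self, desc: int, bits):
        return tuple((desc >> b) & 1 for b in bits)

    def add(self, desc: int) -> int:
        """Index one descriptor of the selected object and return its position."""
        idx = len(self.descriptors)
        self.descriptors.append(desc)
        for bits, table in zip(self.bit_sets, self.tables):
            table.setdefault(self._key(desc, bits), []).append(idx)
        return idx

    def query(self, desc: int, max_distance: int):
        """Return (index, distance) of the closest colliding descriptor, or None."""
        candidates = set()
        for bits, table in zip(self.bit_sets, self.tables):
            candidates.update(table.get(self._key(desc, bits), ()))
        best = None
        # Only descriptors sharing a bucket are compared, not the whole index.
        for idx in candidates:
            d = hamming(desc, self.descriptors[idx])
            if d <= max_distance and (best is None or d < best[1]):
                best = (idx, d)
        return best
```

In a re-detection setting, the descriptors of the user-selected object instances would be added once with `add`, and every descriptor extracted from a video frame would be passed to `query`. Frames that collect enough matches would then be candidates for a geometric verification step. Similar descriptors collide in at least one table with high probability, though not with certainty, which is the usual trade-off of approximate nearest-neighbor search.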


Keywords: Instance-level video labeling; Object re-detection; BRISK descriptor; Locality Sensitive Hashing



This work was supported by the European Commission under contracts FP7-600826 ForgetIT and FP7-287911 LinkedTV.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Anastasia Ioannidou (1)
  • Evlampios Apostolidis (1)
  • Chrysa Collyda (1)
  • Vasileios Mezaris (1)
  1. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
