Video similarity detection using fixed-length Statistical Dominant Colour Profile (SDCP) signatures

Abstract

This paper presents a fast and effective technique for detecting and measuring the visual similarity of videos using compact fixed-length signatures. The proposed technique facilitates building real-time and scalable video matching/retrieval systems by generating a representative signature for a given video shot. The generated signature (Statistical Dominant Colour Profile, SDCP) effectively encodes the spatio-temporal patterns of colours in a given shot, enabling robust real-time matching. Furthermore, the SDCP signature is engineered to better address the visual similarity problem through its relaxed representation of shot contents. The compact fixed-length nature of the proposed signature is the key to its high matching speed (>1000 fps) compared with current techniques that rely on exhaustive processing, such as dense trajectories. The SDCP signature encodes a given video shot with only 294 values, regardless of the shot length, which facilitates fast signature extraction and matching. To maximize the benefit of the proposed technique, compressed-domain videos are used as a case study, given their wide availability. The proposed technique avoids full video decompression and operates on tiny frames rather than full-size decompressed frames. This is achieved by using the tiny DC-image sequence of the MPEG compressed stream. Experiments on various standard and challenging datasets (e.g. UCF101, 13k videos) show the technique’s robust performance in terms of both retrieval ability and computational performance.
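The practical consequence of a fixed-length signature can be illustrated with a minimal sketch: because every shot is reduced to 294 values, comparing two shots costs the same regardless of how long either shot is. The sketch below is not the SDCP method itself. The abstract does not state the distance measure, so an L1 (sum of absolute differences) distance is assumed here, the signatures are random placeholder vectors rather than real SDCP output, and the names `l1_distance` and `rank_by_similarity` are hypothetical.

```python
import random

SIGNATURE_LENGTH = 294  # signature size stated in the abstract


def l1_distance(sig_a, sig_b):
    """Sum of absolute differences between two fixed-length signatures.

    Assumption: the abstract does not name the distance measure;
    L1 is used here purely for illustration.
    """
    if len(sig_a) != SIGNATURE_LENGTH or len(sig_b) != SIGNATURE_LENGTH:
        raise ValueError("signatures must contain exactly 294 values")
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def rank_by_similarity(query, database):
    """Return (shot_id, distance) pairs, most similar shot first."""
    scored = ((shot_id, l1_distance(query, sig)) for shot_id, sig in database.items())
    return sorted(scored, key=lambda pair: pair[1])


# Placeholder signatures: random values standing in for real SDCP output.
rng = random.Random(0)
database = {
    f"shot_{i}": [rng.random() for _ in range(SIGNATURE_LENGTH)]
    for i in range(5)
}

# Query with a copy of one stored signature; it should rank first.
query = list(database["shot_3"])
ranking = rank_by_similarity(query, database)
print(ranking[0][0])
```

Each comparison touches exactly 294 pairs of values, so the cost of a query grows only with the number of stored shots, not with their duration.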

Notes

  1. Percent change: \(\left(|\text{Difference}| / \text{Reference Value}\right) \times 100\).

  2. Average of the direct difference between both values across the first 10 ranks.

  3. Percent change: \(\left(|\text{Difference}| / \text{Reference Value}\right) \times 100\).
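The two measures defined in these notes can be sketched in a few lines. This is an illustrative reading of the notes, not code from the paper: the function names are hypothetical, the reference value is assumed to be positive, "direct difference" is read as a signed difference, and the numbers in the usage lines are made up for the example.

```python
def percent_change(value, reference):
    """Percent change as defined in the notes: |difference| / reference * 100.

    Assumption: the reference value is positive (a zero reference is rejected).
    """
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(value - reference) / reference * 100


def mean_rank_difference(values_a, values_b, ranks=10):
    """Average of the signed differences over the first `ranks` positions.

    Assumption: "direct difference" means a - b without taking absolute values.
    """
    if len(values_a) < ranks or len(values_b) < ranks:
        raise ValueError("both sequences need at least `ranks` entries")
    return sum(a - b for a, b in zip(values_a[:ranks], values_b[:ranks])) / ranks


# Illustrative numbers only; they are not results from the paper.
change = percent_change(90.0, 120.0)
gap = mean_rank_difference(list(range(10, 20)), list(range(0, 10)))
print(change, gap)
```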

Author information

Correspondence to Saddam Bekhet.

About this article

Cite this article

Bekhet, S., Ahmed, A. Video similarity detection using fixed-length Statistical Dominant Colour Profile (SDCP) signatures. J Real-Time Image Proc 16, 1999–2014 (2019). https://doi.org/10.1007/s11554-017-0700-9

Keywords

  • Video matching
  • Statistical Dominant Colour Profile
  • SDCP
  • Compressed video
  • Signature