A novel feature fusion based framework for efficient shot indexing to massive web videos

Telecommunication Systems

Abstract

This study presents an automatic approach to analyzing the structure of large-scale web videos using visual and acoustic information. Video streams are macro-segmented by mining duplicate sequences. Both acoustic and visual information are used for mining so as to avoid missing true positives. Unlike TV data, where duplicate clips are nearly identical, web videos contain severe visual and acoustic distortions. To handle these distortions, we present novel visual-acoustic feature schemes. A shot-based indexing algorithm and several temporal constraints are used to mine the duplicate sequences: weak geometric verification is combined with direct hashing to achieve high efficiency and strong performance in image-based duplicate sequence detection, and dynamic programming is introduced to recover missed true positives in the audio-based stage. Experiments on a dataset of 500 h of content-unknown videos show that the F-measure of duplicate sequence mining for web videos reaches 95%, and that the proposed algorithm outperforms state-of-the-art approaches in both efficiency and detection performance.
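The general idea of mining duplicate sequences with a shot-level index and a temporal constraint can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each shot has already been reduced to a single exact-match hash key (standing in for the paper's visual-acoustic features), and the function name `mine_duplicate_sequences` and the parameter `min_len` are hypothetical. Weak geometric verification and the dynamic-programming recovery step are omitted.

```python
from collections import defaultdict


def mine_duplicate_sequences(shot_hashes, min_len=2):
    """Return sorted (start_a, start_b, length) runs of repeated shots.

    shot_hashes: one hashable key per shot, in temporal order (assumed input).
    min_len: hypothetical threshold on the number of consecutive matching shots.
    """
    # Direct hashing: inverted index from hash key to shot positions.
    index = defaultdict(list)
    for pos, key in enumerate(shot_hashes):
        index[key].append(pos)

    # Collect matching shot pairs (i, j), i < j, grouped by temporal offset j - i.
    by_offset = defaultdict(list)
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                by_offset[j - i].append(i)

    # Temporal constraint: keep only runs of consecutive shots sharing one offset.
    results = []
    for offset, starts in by_offset.items():
        starts.sort()
        run_start = prev = starts[0]
        for i in starts[1:] + [None]:  # None flushes the final run
            if i is not None and i == prev + 1:
                prev = i
                continue
            length = prev - run_start + 1
            if length >= min_len:
                results.append((run_start, run_start + offset, length))
            if i is not None:
                run_start = prev = i
    return sorted(results)


if __name__ == "__main__":
    shots = ["a", "b", "c", "x", "a", "b", "c", "y"]
    print(mine_duplicate_sequences(shots))
```

In this toy input, shots 0 to 2 repeat at positions 4 to 6, so the sketch reports one duplicate sequence of three shots. Isolated single-shot matches are discarded by the `min_len` threshold, which is one simple way a temporal constraint can suppress spurious hash collisions.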

Notes

  1. http://ffmpeg.org.

  2. http://www-nlpir.nist.gov/projects/tv2011/#ccd.

  3. http://sports.orange.fr/.

Acknowledgments

This work is sponsored by collaborative Research Project (SEV01100474) between Beijing University of Posts and Telecommunications and France Telecom – Orange Lab Beijing, the National High Technology Research and Development Program of China (863 Program, No. 2012AA012505), and the National Natural Science Foundation of China (61372169).

Author information

Corresponding author

Correspondence to Yuan Dong.

About this article

Cite this article

Dong, Y., Wang, L., Lian, S. et al. A novel feature fusion based framework for efficient shot indexing to massive web videos. Telecommun Syst 59, 401–413 (2015). https://doi.org/10.1007/s11235-014-9945-9

  • DOI: https://doi.org/10.1007/s11235-014-9945-9
