
Discovery of Topical Objects from Video: A Structured Dictionary Learning Approach

Published in Cognitive Computation.

Abstract

Automatic discovery of topical objects in video clips is a typical cognition-related task and is essential for understanding and summarizing video content. In this paper, we propose a novel framework based on structured dictionary learning for this task. Unlike existing work, which uses multiple segmentations to coarsely obtain object regions, we adopt a recently developed objectness operator to extract candidate objects; this has the significant advantage that objects of interest can be segmented more reliably. We then propose a structured dictionary learning method to discover the topical objects of a video clip. The optimization model exploits the temporal relations between video frames and therefore achieves better performance. Further, we develop a globally convergent algorithm to solve the structured dictionary learning problem. Extensive experiments on 10 Web video clips show that the proposed method outperforms state-of-the-art methods.
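As an illustration of the kind of model the abstract describes, the following is a minimal sketch of dictionary learning with a temporal-smoothness penalty coupling the codes of consecutive frames. The function names, the quadratic chain-Laplacian penalty, and the alternating proximal-gradient scheme are assumptions for illustration only; they are not the authors' exact formulation or their globally convergent algorithm.

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def structured_dictionary_learning(X, k, lam=0.1, mu=0.5, n_iter=100, seed=0):
    """Alternating minimization (sketch) for

        min_{D,A} 0.5*||X - D A||_F^2 + lam*||A||_1
                  + 0.5*mu*sum_t ||a_t - a_{t-1}||^2,

    where the columns of X are frame features in temporal order. The mu-term
    couples the codes of neighbouring frames, a simple stand-in for the
    temporal structure the abstract refers to."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    A = np.zeros((k, n))
    # Laplacian of the temporal chain graph over the n frames.
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    lipL = np.linalg.norm(L, 2)
    for _ in range(n_iter):
        # Dictionary step: one gradient step, then renormalize the columns.
        gD = (D @ A - X) @ A.T
        D = D - gD / max(np.linalg.norm(A @ A.T, 2), 1e-8)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-8)
        # Sparse-coding step: one ISTA (proximal-gradient) step on all codes.
        lip = np.linalg.norm(D, 2) ** 2 + mu * lipL
        grad = D.T @ (D @ A - X) + mu * A @ L
        A = soft_threshold(A - grad / lip, lam / lip)
    return D, A
```

In a sketch like this, frames whose codes load heavily on the same dictionary atoms across time would be grouped as belonging to the same topical object; the l1 term keeps each frame's code sparse while the smoothness term discourages abrupt code changes between adjacent frames.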


Notes

  1. For the experiments, we used the publicly available code of [19] without any modification or tuning. Computing the candidate windows for one image takes only about 0.01 s.
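Objectness operators such as [19] typically return many overlapping scored windows per frame. A common way to thin such candidates before further processing is greedy non-maximum suppression; the paper does not specify its post-processing, so the sketch below is purely illustrative, with box format (x1, y1, x2, y2) assumed.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring windows,
    dropping any window that overlaps an already kept one by IoU > thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

For example, of two nearly coincident high-scoring windows and one distant window, NMS keeps the better of the overlapping pair plus the distant one.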

References

  1. Zhao G, Yuan J, Xu J, Wu Y. Discovering the thematic object in commercial videos. IEEE Multimed. 2011;18(3):56–65.

  2. Liu H, Liu Y, Yu Y, Sun F. Diversified key-frame selection using structured L2,1 optimization. IEEE Trans Ind Inform. 2014;10(3):1736–45.

  3. Liu H, Liu Y, Sun F. Video key-frame extraction for smart phones. Multimed Tools Appl. In press.

  4. Navarretta C. The automatic identification of the producers of co-occurring communicative behaviours. Cogn Comput. 2014;6(4):689–98.

  5. Chen Y, Zhou Q, Luo W, Du J. Classification of Chinese texts based on recognition of semantic topics. Cogn Comput. In press.

  6. Yuan Y, Sun F. Data fusion-based resilient control system under DoS attacks: a game theoretic approach. Int J Control Autom Syst. 2015;13(3):513–20.

  7. Sivic J, Russell B, Efros A, Zisserman A, Freeman W. Discovering objects and their location in images. In: Proceedings of international conference on computer vision (ICCV), 2005, pp. 370–377.

  8. Russell B, Freeman W, Efros A, Sivic J, Zisserman A. Using multiple segmentations to discover objects and their extent in image collections. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2006, pp. 1605–1614.

  9. Tunnermann J, Mertsching B. Region-based artificial visual attention in space and time. Cogn Comput. 2014;6(1):125–43.

  10. Zhao G, Yuan J, Hua G. Topical video object discovery from key frames by modeling word co-occurrence prior. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2013, pp. 1602–1609.

  11. Tang J, Lewis P. Non-negative matrix factorization for object class discovery and image auto-annotation. In: Proceedings of international conference on content-based image and video retrieval, 2008, pp. 105–112.

  12. Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum HY. Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell. 2011;33(2):353–67.

  13. Zhu J, Wu J, Wei Y, Chang E, Tu Z. Unsupervised object class discovery via saliency-guided multiple class learning. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2012, pp. 3218–3225.

  14. Liu D, Chen T. A topic-motion model for unsupervised video object discovery. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2014, pp. 1–8.

  15. Zhao G, Yuan J. Discovering thematic patterns in videos via cohesive sub-graph mining. In: Proceedings of international conference on data mining (ICDM), pp. 1260–1265.

  16. Zhao J, Sun S, Liu X, Sun J, Yang A. A novel biologically inspired visual saliency model. Cogn Comput. 2014;6(4):841–8.

  17. Tu Z, Zheng A, Yang E, Luo B, Hussain A. A biologically inspired vision-based approach for detecting multiple moving objects in complex outdoor scenes. Cogn Comput. 2015;7(2):539–51.

  18. Alexe B, Deselaers T, Ferrari V. What is an object? In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2010, pp. 73–80.

  19. Cheng M, Zhang Z, Lin W, Torr P. BING: Binarized normed gradients for objectness estimation at 300fps. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2014, pp. 1–8.

  20. Cheng H, Liu Z, Hou L, Yang J. Sparsity induced similarity measure and its applications. IEEE Trans Circuits Syst Video Technol. In press. doi:10.1109/TCSVT.2012.2225911.

  21. Wang J, Su G, Xiong Y, Chen J, Shang Y, Liu J, Ren X. Sparse representation for face recognition based on constraint sampling and face alignment. Tsinghua Sci Technol. 2013;1:62–7.

  22. Zheng Y, Sheng H, Zhang B, Zhang J, Xiong Z. Weight-based sparse coding for multi-shot person re-identification. Sci China Inform Sci. 2015;58:100104(15).

  23. Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program. 2013;146:1–36.

  24. Bao C, Ji H, Quan Y, Shen Z. l0 norm based dictionary learning by proximal methods with global convergence. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), 2014, pp. 1–8.

  25. Yuan J, Zhao G, Fu Y, Li Z, Katsaggelos A, Wu Y. Discovering thematic objects in image collections and videos. IEEE Trans Image Process. 2012;21(4):2207–19.

Acknowledgments

This study was jointly supported by National Key Project for Basic Research of China under Grant 2013CB329403, National Natural Science Foundation of China under Grant 61327809, and National High-tech Research and Development Plan under Grant 2015AA042306.

Author information

Corresponding author

Correspondence to Huaping Liu.

Ethics declarations

Conflict of Interest

Huaping Liu and Fuchun Sun declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

About this article

Cite this article

Liu, H., Sun, F. Discovery of Topical Objects from Video: A Structured Dictionary Learning Approach. Cogn Comput 8, 519–528 (2016). https://doi.org/10.1007/s12559-015-9381-5
