
Recognizing Complex Events in Videos by Learning Key Static-Dynamic Evidences

  • Kuan-Ting Lai
  • Dong Liu
  • Ming-Syan Chen
  • Shih-Fu Chang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8691)

Abstract

Complex events consist of various human interactions with different objects in diverse environments. The evidences needed to recognize an event may occur in short time periods of variable length and can appear anywhere in a video, which prevents conventional machine learning algorithms from recognizing such events effectively. In this paper, we propose a novel method that automatically identifies the key evidences in videos for detecting complex events. Both static instances (objects) and dynamic instances (actions) are considered by sampling frames and temporal segments, respectively. To compare the discriminative power of these heterogeneous instances, we embed static and dynamic instances into a multiple instance learning framework via instance similarity measures, and cast the problem as an Evidence Selective Ranking (ESR) process. We impose an ℓ1 norm to select key evidences, while the infinite push loss function enforces that positive videos receive higher detection scores than negative videos. The resulting optimization problem is solved with the Alternating Direction Method of Multipliers (ADMM). Experiments on large-scale video datasets show that our method improves detection accuracy while providing a unique capability to discover the key evidences of each complex event.
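
To make the abstract's optimization concrete, here is a minimal sketch of how such an objective could be written; the notation is assumed and the exact formulation appears in the paper. Let φ(V) denote the similarity-based embedding of a video V against a pool of static and dynamic instance prototypes, so that each coordinate of φ(V) measures how strongly one candidate evidence appears in V. With m positive and n negative training videos, an ℓ1-regularized infinite push ranking objective of the kind described above could take the form

    \min_{w}\ \lambda \lVert w \rVert_1 \;+\; \max_{1 \le j \le n}\ \frac{1}{m} \sum_{i=1}^{m} \Bigl[\, 1 - w^{\top}\bigl(\phi(V_i^{+}) - \phi(V_j^{-})\bigr) \Bigr]_{+}

The max over negatives penalizes the worst-ranked negative video that outranks positives (the "infinite push" toward the top of the ranked list), while the ℓ1 penalty drives most coordinates of w to zero, so the surviving nonzero weights identify the key evidences. ADMM is a natural solver for such an objective because it splits the non-smooth ℓ1 term and the max-of-hinges loss into separate, more tractable subproblems.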

Keywords

Video Event Detection · Infinite Push · Key Evidence Selection · ADMM



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kuan-Ting Lai (1, 2)
  • Dong Liu (3)
  • Ming-Syan Chen (1, 2)
  • Shih-Fu Chang (3)

  1. Graduate Institute of Electrical Engineering, National Taiwan University, Taiwan
  2. Research Center for IT Innovation, Academia Sinica, Taiwan
  3. Department of Electrical Engineering, Columbia University, USA
