Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression

  • Original Article
Neural Computing and Applications

Abstract

Multimedia event detection (MED) is one of the most important branches of multimedia content analysis. Current research on MED focuses mainly on detecting specific events, such as sports events, news events and suspicious events. This falls well short of complex, generic MED, because such events usually involve many visual attributes, such as objects, scenes and human actions. Unlike visual features, visual attributes are hidden classes from the point of view of event detectors and event classifiers. Hence, a proper representation of these visual attributes could help in building a sophisticated and generic MED system. In this paper, we represent video events with a Gaussian mixture model (GMM), motivated by the idea that the individual component densities of the GMM could model some of the underlying hidden visual attributes, and we propose an ℓ2-regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complex MED. We also propose an efficient iterative algorithm that uses gradient descent, a standard convex optimization method, to optimize the LLGMM objective function. Finally, extensive experiments are conducted on the challenging TRECVID MED 2012 development dataset. The results demonstrate the effectiveness of the proposed LLGMM classifier for MED.
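
The abstract names two generic ingredients: an ℓ2-regularized logistic objective and gradient descent as the solver. The sketch below illustrates only those two ingredients on plain feature vectors. It is not the LLGMM classifier itself, whose objective is not given on this page. The function names, the toy data, the regularization weight lam and the step size are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def objective(w, b, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels y in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # log(1 + exp(z)) - y * z, written so that exp never overflows
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def fit(X, y, lam=0.1, step=0.5, iters=500):
    """Batch gradient descent on the objective above (bias is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of the l2 penalty (lam / 2) * ||w||^2 is lam * w
        w = [wj - step * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up 2-D data (hypothetical; not drawn from TRECVID)
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit(X, y)
```

With lam > 0 the penalized objective is strictly convex in w, which is why a simple fixed-step gradient method is a reasonable choice in this toy setting. How the GMM components enter the paper's model is outside the scope of this sketch.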


Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61070092, the Doctor Startup Foundation of Wuyi University under Grant No. 2014BS07, the Science and Technology Planning Project of Guangzhou under Grant No. 2013Y2-00073, and the Special Province-Ministry Funds for Industry, Education and Research of Guangdong Province under Grant No. 2013B090500087. We would like to thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Shoubin Dong.

About this article

Cite this article

Liu, C., Dong, S., Lu, B. et al. Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression. Neural Comput & Applic 26, 1561–1574 (2015). https://doi.org/10.1007/s00521-014-1810-y

  • DOI: https://doi.org/10.1007/s00521-014-1810-y
