Neural Computing and Applications, Volume 26, Issue 7, pp 1561–1574

Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression

  • Changyu Liu
  • Shoubin Dong (Email author)
  • Bin Lu
  • Mohamed Abdel-Mottaleb
Original Article

Abstract

Multimedia event detection (MED) is one of the most important branches of multimedia content analysis. Current research on MED focuses mainly on detecting specific events, such as sport events, news events and suspicious events. This falls far short of complicated, generic MED, because such events usually contain many visual attributes, such as objects, scenes and human actions. Unlike visual features, visual attributes are hidden classes to event detectors and event classifiers, so a proper representation of these attributes could help in building a sophisticated and generic MED system. In this paper, we represent video events with a Gaussian mixture model (GMM), motivated by the idea that the individual component densities of the GMM could model some underlying hidden visual attributes. On this basis we propose an ℓ2-regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complicated MED. We also propose an efficient iterative algorithm that uses gradient descent, a standard convex optimization method, to solve the objective function of LLGMM. Finally, extensive experiments are conducted on the challenging TRECVID MED 2012 development dataset. The results demonstrate the effectiveness of the proposed LLGMM classifier for MED.
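
To make the optimization step concrete, the following is a minimal sketch of gradient descent on an ℓ2-regularized logistic regression objective. It is not the LLGMM objective itself, which the abstract does not state: the GMM-based representation is omitted, and the features are taken as fixed vectors. The function names, the toy data, and the parameters `lam`, `step` and `iters` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def objective(w, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels y in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        # log(1 + exp(z)) - y * z, written to avoid overflow for large |z|
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def gradient(w, X, y, lam):
    """Gradient of the objective above with respect to w."""
    n = len(X)
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        err = sigmoid(dot(w, xi)) - yi
        for j, xj in enumerate(xi):
            grad[j] += err * xj / n
    return grad


def fit(X, y, lam=0.1, step=0.5, iters=500):
    """Plain fixed-step gradient descent starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(w, X, y, lam)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical toy data: two features plus a constant bias feature of 1.0.
X = [[-1.0, -0.8, 1.0], [-0.7, -1.0, 1.0], [0.9, 0.8, 1.0], [1.0, 0.7, 1.0]]
y = [0, 0, 1, 1]
w = fit(X, y)
predictions = [1 if sigmoid(dot(w, xi)) > 0.5 else 0 for xi in X]
```

Because the ℓ2 term makes this objective strictly convex, a sufficiently small fixed step converges to the unique minimizer. A line search or a quasi-Newton method would be a natural replacement for the fixed step in practice.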

Keywords

Multimedia event detection · ℓ2 regularization · Logistic regression · Gaussian mixture model · LLGMM classifier

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61070092, the Doctor Startup Foundation of Wuyi University under Grant No. 2014BS07, the Science and Technology Planning Project of Guangzhou under Grant No. 2013Y2-00073, and the Special Province-Ministry Funds for Industry, Education and Research of Guangdong Province under Grant No. 2013B090500087. We would like to thank the anonymous reviewers for their helpful comments.

Copyright information

© The Natural Computing Applications Forum 2015

Authors and Affiliations

  • Changyu Liu (1, 3, 4)
  • Shoubin Dong (1), Email author
  • Bin Lu (2)
  • Mohamed Abdel-Mottaleb (3, 5)

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  2. School of Computer Science, Wuyi University, Jiangmen, China
  3. Department of Electrical and Computer Engineering, University of Miami, Coral Gables, USA
  4. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  5. Adjunct, Effat University, Jeddah, Saudi Arabia
