Abstract
Multimedia event detection (MED) is one of the most important branches of multimedia content analysis. Current research on MED focuses mainly on detecting specific events, such as sports events, news events and suspicious events. This is far from achieving complex, generic MED, because these events usually contain many visual attributes, such as objects, scenes and human actions. Unlike visual features, visual attributes are hidden classes for event detectors and event classifiers. Hence, a proper representation of these visual attributes could help in building a sophisticated and generic MED system. In this paper, we use a Gaussian mixture model (GMM) to represent video events, motivated by the idea that the individual component densities of the GMM could model some underlying hidden visual attributes. On this basis, we propose an ℓ2-regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complex MED. We also propose an efficient iterative algorithm, which uses gradient descent, a standard convex optimization method, to solve the objective function of LLGMM. Finally, extensive experiments are conducted on the challenging TRECVID MED 2012 development dataset. The results demonstrate the effectiveness of the proposed LLGMM classifier for MED.
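As a rough illustration of the optimization style mentioned in the abstract, the sketch below trains a plain ℓ2-regularized logistic regression with batch gradient descent in Python. It is not the LLGMM objective, which the abstract does not spell out. The function names, the step size `lr`, the regularization weight `lam`, and the synthetic two-class data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def objective(X, y, w, b, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels y in {0, 1}."""
    z = X @ w + b
    # log(1 + exp(z)) - y * z, computed stably with logaddexp
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)


def fit_l2_logistic(X, y, lam=0.1, lr=0.5, n_iter=500):
    """Batch gradient descent on the objective above (hypothetical helper)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + lam * w  # loss gradient plus l2 term
        grad_b = np.mean(p - y)               # bias is left unregularized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic two-class data (an assumption, unrelated to TRECVID MED).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_l2_logistic(X, y)
train_accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

This objective is convex and smooth, so gradient descent with a sufficiently small fixed step size decreases it at every iteration. The ℓ2 term additionally keeps the weights bounded.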
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61070092, the Doctor Startup Foundation of Wuyi University under Grant No. 2014BS07, the Science and Technology Planning Project of Guangzhou under Grant No. 2013Y2-00073, and the Special Province-Ministry Funds for Industry, Education and Research of Guangdong Province under Grant No. 2013B090500087. We would like to thank the anonymous reviewers for their helpful comments.
Cite this article
Liu, C., Dong, S., Lu, B. et al. Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression. Neural Comput & Applic 26, 1561–1574 (2015). https://doi.org/10.1007/s00521-014-1810-y
DOI: https://doi.org/10.1007/s00521-014-1810-y