Multimedia event detection with ℓ2-regularized logistic Gaussian mixture regression

Liu, Changyu; Dong, Shoubin; Lu, Bin; Abdel-Mottaleb, Mohamed

doi:10.1007/s00521-014-1810-y

Original Article
Published: 21 January 2015

Volume 26, pages 1561–1574, (2015)

Neural Computing and Applications

Changyu Liu^1,3,4,
Shoubin Dong¹,
Bin Lu² &
…
Mohamed Abdel-Mottaleb^3,5

Abstract

Multimedia event detection (MED) is one of the most important branches of multimedia content analysis. Current research on MED focuses mainly on detecting specific events, such as sports events, news events and suspicious events. This is far from complicated, generic MED, because such events usually contain many visual attributes, such as objects, scenes and human actions. Unlike visual features, visual attributes are hidden classes to event detectors and event classifiers. Hence, a proper representation of these visual attributes could help in building a sophisticated and generic MED system. In this paper, we use a Gaussian mixture model (GMM) to represent video events, motivated by the idea that the individual component densities of the GMM could model some underlying hidden visual attributes. We propose an ℓ₂-regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complicated MED. We also propose an efficient iterative algorithm that uses gradient descent, a standard convex optimization method, to solve the objective function of LLGMM. Finally, extensive experiments are conducted on the challenging TRECVID MED 2012 development dataset. The results demonstrate the effectiveness of the proposed LLGMM classifier for MED.
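
The abstract does not state the LLGMM objective, so the following is only a minimal sketch of the generic building block it names: ℓ₂-regularized logistic regression fitted by plain gradient descent. It is not the authors' algorithm and contains no Gaussian mixture component. The function names (`fit`, `objective`, `predict_proba`), the regularization weight `lam`, the step size `lr`, the iteration count and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def objective(w, b, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2; the bias b is not penalized."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # log(1 + exp(z)) - y * z, written to avoid overflow for large |z|
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return loss / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def fit(X, y, lam=0.1, lr=0.5, n_iters=500):
    """Batch gradient descent on the l2-regularized logistic objective."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error drives the gradient of the logistic loss.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The l2 penalty contributes lam * w to the weight gradient.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: two well-separated clusters in 2-D.
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit(X, y)
```

Because the ℓ₂ term makes the objective strictly convex in the weights, gradient descent with a small enough step size moves toward the unique minimizer over the weights, which is one reason this penalty pairs naturally with a simple first-order solver.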

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61070092, the Doctor Startup Foundation of Wuyi University under Grant No. 2014BS07, the Science and Technology Planning Project of Guangzhou under Grant No. 2013Y2-00073, and the Special Province-Ministry Funds for Industry, Education and Research of Guangdong Province under Grant No. 2013B090500087. We would like to thank the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Changyu Liu & Shoubin Dong
School of Computer Science, Wuyi University, Jiangmen, 529020, China
Bin Lu
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, 33146, USA
Changyu Liu & Mohamed Abdel-Mottaleb
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Changyu Liu
Adjunct, Effat University, Jeddah, Saudi Arabia
Mohamed Abdel-Mottaleb

Corresponding author

Correspondence to Shoubin Dong.

About this article

Cite this article

Liu, C., Dong, S., Lu, B. et al. Multimedia event detection with ℓ₂-regularized logistic Gaussian mixture regression. Neural Comput & Applic 26, 1561–1574 (2015). https://doi.org/10.1007/s00521-014-1810-y

Received: 04 May 2014
Accepted: 19 December 2014
Published: 21 January 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s00521-014-1810-y
